The core development language with Impala is SQL. You can also use Java or other languages to interact with
Impala through the standard JDBC and ODBC interfaces used by many business intelligence tools. For
specialized kinds of analysis, you can supplement the SQL built-in functions by writing user-defined
functions (UDFs) in C++ or Java.
The Impala SQL dialect is highly compatible with the SQL syntax used in the Apache Hive component (HiveQL), so it is familiar to users who already run SQL queries on the Hadoop infrastructure. Currently, Impala SQL supports a subset of HiveQL statements, data types, and built-in functions. Impala also includes additional built-in functions for common industry features, to simplify porting SQL from non-Hadoop systems.
For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect might seem familiar:
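The SELECT statement includes familiar clauses such as WHERE, GROUP BY, and ORDER BY, along with familiar notions such as joins, aggregate functions, and built-in functions for processing strings, numbers, and dates.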
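From the data warehousing world, you will recognize the notion of partitioned tables. One or more columns serve as partition keys, and the data is physically arranged so that queries that filter on the partition key columns in the WHERE clause can skip partitions that do not match the filter conditions. For example, if you have ten years of data and use a clause such as WHERE year = 2015, Impala skips all the data for the non-matching years, greatly reducing the I/O for the query.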
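In Impala 1.2 and higher, UDFs let you perform custom comparisons and transformation logic during SELECT and INSERT...SELECT statements.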
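Partition pruning can be pictured with a small model. The Python sketch below is a toy, not Impala's query planner. A hypothetical in-memory dict stands in for a table declared with a clause such as PARTITIONED BY (year INT), and the names Table and scan_with_pruning are illustrative only. It shows that the filter is evaluated on the partition key before any rows are read, so partitions that cannot match are never scanned.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a partitioned table: each partition key value
# (here, a year) maps to the rows stored in that partition's data files.
Table = Dict[int, List[str]]


def scan_with_pruning(
    table: Table, keep_partition: Callable[[int], bool]
) -> Tuple[List[str], List[int]]:
    """Return the rows read plus the list of partitions actually scanned.

    The predicate looks only at the partition key, so a partition that
    cannot match is skipped without touching any of its rows.
    """
    rows: List[str] = []
    scanned: List[int] = []
    for year in sorted(table):
        if not keep_partition(year):
            continue  # pruned: no rows read from this partition
        scanned.append(year)
        rows.extend(table[year])
    return rows, scanned


if __name__ == "__main__":
    # Ten years of toy data (2006 through 2015), three rows per partition.
    logs: Table = {
        year: [f"event-{year}-{i}" for i in range(3)] for year in range(2006, 2016)
    }

    # Roughly analogous to: SELECT ... FROM logs WHERE year IN (2014, 2015)
    rows, scanned = scan_with_pruning(logs, lambda y: y in (2014, 2015))
    print(scanned)    # [2014, 2015]
    print(len(rows))  # 6
```

In a real HDFS-backed table, each partition is a separate directory of data files, which is why a pruned partition costs no I/O at all.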
For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect might require some learning and practice before you become proficient in the Hadoop environment:
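Impala SQL is focused on queries and includes relatively little DML. There is no UPDATE or DELETE statement. Stale data is typically discarded with DROP TABLE or ALTER TABLE ... DROP PARTITION statements, or replaced with INSERT OVERWRITE statements.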
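All data creation is done by INSERT statements, which typically insert data in bulk by querying from other tables. INSERT INTO appends to the existing data, while INSERT OVERWRITE replaces the entire contents of a table or partition. Although the INSERT ... VALUES syntax can create a small number of rows in a single statement, it is far more efficient to use INSERT ... SELECT to copy and transform large amounts of data from one table to another in a single operation.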
You often construct Impala table definitions and data files in some other environment, and then use Impala to run real-time queries against them. The same data files and table metadata are shared with other components of the Hadoop ecosystem. In particular, Impala can access tables created by Hive or data inserted by Hive, and Hive can access tables and data produced by Impala. Many other Hadoop components can write files in formats such as Parquet and Avro, which Impala can then query.
Because Hadoop and Impala are focused on data warehouse-style operations on large data sets, Impala SQL
includes some idioms that you might find in the import utilities for traditional database systems. For
example, you can create a table that reads comma-separated or tab-separated text files, specifying the
separator in the CREATE TABLE statement.
Because Impala reads large quantities of data that might not be perfectly tidy and predictable, it does
not require length constraints on string data types. For example, you can define a database column as
STRING, with no declared maximum length, rather than as CHAR(1) or VARCHAR(64).
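For a text table, Impala takes the field separator from the table definition, for example a clause such as ROW FORMAT DELIMITED FIELDS TERMINATED BY ','. The Python sketch below models what that declaration controls: the same file contents produce different columns depending on the declared separator. The function read_delimited is hypothetical. It is a simplification that ignores escape characters and other details of Impala's text format.

```python
from typing import List


def read_delimited(text: str, separator: str) -> List[List[str]]:
    """Split each non-empty line of a delimited text file on the declared separator.

    Simplified model: no escape characters, no quoting, no embedded newlines.
    """
    return [line.split(separator) for line in text.splitlines() if line]


if __name__ == "__main__":
    csv_file = "1,alice,NY\n2,bob,SF\n"
    tsv_file = "1\talice\tNY\n2\tbob\tSF\n"

    # With the matching separator, each line yields three columns.
    print(read_delimited(csv_file, ","))   # [['1', 'alice', 'NY'], ['2', 'bob', 'SF']]
    print(read_delimited(tsv_file, "\t"))  # [['1', 'alice', 'NY'], ['2', 'bob', 'SF']]

    # With the wrong separator, each whole line lands in a single column.
    print(read_delimited(csv_file, "\t"))  # [['1,alice,NY'], ['2,bob,SF']]
```

Because the separator belongs to the table definition rather than to the files, a mismatch does not raise an error. It silently produces wrongly shaped columns, so it is worth checking the declared separator against a sample of the data files.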
You can connect and submit requests to the Impala daemons through the impala-shell interactive command interpreter, the Hue web-based user interface, JDBC, and ODBC.
With these options, you can use Impala in heterogeneous environments, with JDBC or ODBC applications running on non-Linux platforms. You can also use Impala in combination with various business intelligence tools that use the JDBC and ODBC interfaces.
Each