mirror of
https://github.com/apache/impala.git
synced 2026-01-05 12:01:11 -05:00
A number of 'CDH' by itself were turned into substitution variables resolving to 'Apache Hadoop'. Also fixed some stray instances of CDH version numbers. In some cases, 'CDH' or 'CDH 5' by itself was superfluous and was just removed. Change-Id: I979ea73ccaa5873d4108545f18f598072fb5e05f Reviewed-on: http://gerrit.cloudera.org:8080/6352 Reviewed-by: John Russell <jrussell@cloudera.com> Tested-by: Impala Public Jenkins
331 lines
12 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="langref_hiveql_delta">

<title>SQL Differences Between Impala and Hive</title>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="SQL"/>
<data name="Category" value="Hive"/>
<data name="Category" value="Porting"/>
<data name="Category" value="Data Analysts"/>
<data name="Category" value="Developers"/>
</metadata>
</prolog>

<conbody>

<p>
<indexterm audience="hidden">Hive</indexterm>
<indexterm audience="hidden">HiveQL</indexterm>
Impala's SQL syntax follows the SQL-92 standard and includes many industry extensions in areas such as
built-in functions. See <xref href="impala_porting.xml#porting"/> for a general discussion of adapting SQL
code from a variety of database systems to Impala.
</p>

<p>
Because Impala and Hive share the same metastore database and their tables are often used interchangeably,
the following sections cover the differences between Impala and Hive in detail.
</p>

<p outputclass="toc inpage"/>
</conbody>

<concept id="langref_hiveql_unsupported">

<title>HiveQL Features Not Available in Impala</title>

<conbody>

<p>
The current release of Impala does not support the following SQL features that you might be familiar with
from HiveQL:
</p>

<!-- To do:
Yeesh, too many separate lists of unsupported Hive syntax.
Here, the FAQ, and in some of the intro topics.
Some discussion in IMP-1061 about how best to reorg.
Lots of opportunities for conrefs.
-->

<ul>
<!-- Now supported in <keyword keyref="impala23_full"/> and higher. Find places on this page (like already done under lateral views) to note the new data type support.
<li>
Non-scalar data types such as maps, arrays, structs.
</li>
-->

<li rev="1.2">
Extensibility mechanisms such as <codeph>TRANSFORM</codeph>, custom file formats, or custom SerDes.
</li>

<li rev="">
The <codeph>DATE</codeph> data type.
</li>

<li>
XML and JSON functions.
</li>

<li>
Certain aggregate functions from HiveQL: <codeph>covar_pop</codeph>, <codeph>covar_samp</codeph>,
<codeph>corr</codeph>, <codeph>percentile</codeph>, <codeph>percentile_approx</codeph>,
<codeph>histogram_numeric</codeph>, and <codeph>collect_set</codeph>. Impala supports the set of aggregate
functions listed in <xref href="impala_aggregate_functions.xml#aggregate_functions"/> and the analytic
functions listed in <xref href="impala_analytic_functions.xml#analytic_functions"/>.
</li>

<li>
Sampling.
</li>

<li>
Lateral views. In <keyword keyref="impala23_full"/> and higher, Impala supports queries on complex types
(<codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>), using join notation
rather than the <codeph>EXPLODE()</codeph> function.
See <xref href="impala_complex_types.xml#complex_types"/> for details about Impala support for complex types.
</li>

<li>
Multiple <codeph>DISTINCT</codeph> clauses per query, although Impala includes some workarounds for this
limitation.
<note conref="../shared/impala_common.xml#common/multiple_count_distinct"/>
</li>
</ul>
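
<p>
As a hedged illustration of the join notation mentioned in the list above, the following sketch contrasts a
Hive lateral view with a corresponding Impala query. The table <codeph>customers</codeph> and its
<codeph>ARRAY&lt;STRING&gt;</codeph> column <codeph>phone_numbers</codeph> are hypothetical names, assumed
only for this example.
</p>

<codeblock>-- Hive: flatten the array with a lateral view and explode().
SELECT c.id, p.phone
FROM customers c LATERAL VIEW explode(c.phone_numbers) p AS phone;

-- Impala 2.3 and higher: reference the array column in the FROM clause
-- using join notation. ITEM is the pseudocolumn holding each array element.
SELECT c.id, p.item AS phone
FROM customers c, c.phone_numbers p;
</codeblock>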

<p>
User-defined functions (UDFs) are supported starting in Impala 1.2. See <xref href="impala_udf.xml#udfs"/>
for full details on Impala UDFs.
<ul>
<li>
<p>
Impala supports high-performance UDFs written in C++, as well as reusing some Java-based Hive UDFs.
</p>
</li>

<li>
<p>
Impala supports scalar UDFs and user-defined aggregate functions (UDAFs). Impala does not currently
support user-defined table generating functions (UDTFs).
</p>
</li>

<li>
<p>
Only Impala-supported column types are supported in Java-based UDFs.
</p>
</li>

<li>
<p conref="../shared/impala_common.xml#common/current_user_caveat"/>
</li>
</ul>
</p>

<p>
Impala does not currently support these HiveQL statements:
</p>

<ul>
<li>
<codeph>ANALYZE TABLE</codeph> (the Impala equivalent is <codeph>COMPUTE STATS</codeph>)
</li>

<li>
<codeph>DESCRIBE COLUMN</codeph>
</li>

<li>
<codeph>DESCRIBE DATABASE</codeph>
</li>

<li>
<codeph>EXPORT TABLE</codeph>
</li>

<li>
<codeph>IMPORT TABLE</codeph>
</li>

<li>
<codeph>SHOW TABLE EXTENDED</codeph>
</li>

<li>
<codeph>SHOW INDEXES</codeph>
</li>

<li>
<codeph>SHOW COLUMNS</codeph>
</li>

<li rev="DOCS-656">
<codeph>INSERT OVERWRITE DIRECTORY</codeph>; use <codeph>INSERT OVERWRITE <varname>table_name</varname></codeph>
or <codeph>CREATE TABLE AS SELECT</codeph> to materialize query results into the HDFS directory associated
with an Impala table.
</li>
</ul>
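
<p>
The Impala equivalents noted in the list above can be sketched as follows. The table and column names
(<codeph>sales</codeph>, <codeph>sales_summary</codeph>, <codeph>region</codeph>, <codeph>amount</codeph>)
are hypothetical and chosen only for illustration.
</p>

<codeblock>-- Hive gathers statistics with a statement such as:
--   ANALYZE TABLE sales COMPUTE STATISTICS;
-- The Impala equivalent is COMPUTE STATS.
COMPUTE STATS sales;

-- Instead of INSERT OVERWRITE DIRECTORY, materialize query results into
-- the HDFS directory of an Impala table, either by creating a new table:
CREATE TABLE sales_summary AS
  SELECT region, SUM(amount) AS total_amount FROM sales GROUP BY region;

-- or by replacing the contents of an existing table:
INSERT OVERWRITE sales_summary
  SELECT region, SUM(amount) AS total_amount FROM sales GROUP BY region;
</codeblock>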
</conbody>
</concept>

<concept id="langref_hiveql_semantics">

<title>Semantic Differences Between Impala and HiveQL Features</title>

<conbody>

<p>
This section covers cases where Impala and Hive have similar functionality, sometimes including the
same syntax, but differ in the runtime semantics of those features.
</p>

<p>
<b>Security:</b>
</p>

<p>
Impala uses the <xref href="http://sentry.incubator.apache.org/" scope="external" format="html">Apache
Sentry</xref> authorization framework, which provides fine-grained role-based access control
to protect data against unauthorized access or tampering.
</p>

<p>
The Hive component now includes Sentry-enabled <codeph>GRANT</codeph>,
<codeph>REVOKE</codeph>, and <codeph>CREATE/DROP ROLE</codeph> statements. Earlier Hive releases had a
privilege system with <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements that were primarily
intended to prevent accidental deletion of data, rather than to act as a security mechanism that protects
against malicious users.
</p>

<p>
Impala can make use of privileges set up through Hive <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements.
Impala has its own <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in Impala 2.0 and higher.
See <xref href="impala_authorization.xml#authorization"/> for the details of authorization in Impala, including
how to switch from the original policy file-based privilege model to the Sentry service using privileges
stored in the metastore database.
</p>
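
<p>
As a rough sketch of the Impala <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements mentioned above,
assuming Impala 2.0 or higher with authorization through the Sentry service enabled. The role
<codeph>analyst_role</codeph>, the group <codeph>analysts</codeph>, and the table <codeph>sales</codeph> are
hypothetical names used only for illustration.
</p>

<codeblock>-- Create a role and associate it with an operating system group.
CREATE ROLE analyst_role;
GRANT ROLE analyst_role TO GROUP analysts;

-- Grant, and later revoke, a privilege on a table.
GRANT SELECT ON TABLE sales TO ROLE analyst_role;
REVOKE SELECT ON TABLE sales FROM ROLE analyst_role;
</codeblock>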

<p>
<b>SQL statements and clauses:</b>
</p>

<p>
The semantics of Impala SQL statements differ from HiveQL in some cases where the two use similar SQL
statement and clause names:
</p>

<ul>
<li>
Impala uses different syntax and names for query hints, <codeph>[SHUFFLE]</codeph> and
<codeph>[NOSHUFFLE]</codeph> rather than <codeph>MapJoin</codeph> or <codeph>StreamJoin</codeph>. See
<xref href="impala_joins.xml#joins"/> for the Impala details.
</li>

<li>
Impala does not expose MapReduce-specific features of <codeph>SORT BY</codeph>,
<codeph>DISTRIBUTE BY</codeph>, or <codeph>CLUSTER BY</codeph>.
</li>

<li>
Impala does not require queries to include a <codeph>FROM</codeph> clause.
</li>
</ul>
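
<p>
A minimal sketch of two of the points in the list above. The tables <codeph>big_orders</codeph> and
<codeph>big_customers</codeph> and their columns are hypothetical, assumed only for this example.
</p>

<codeblock>-- A query with no FROM clause is valid in Impala.
SELECT 1 + 1;

-- An Impala join hint is written in square brackets immediately after
-- the JOIN keyword, rather than in a Hive-style hint comment.
SELECT o.order_id, c.name
FROM big_orders o JOIN [SHUFFLE] big_customers c ON o.customer_id = c.id;
</codeblock>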

<p>
<b>Data types:</b>
</p>

<ul>
<li>
Impala supports a limited set of implicit casts. This can help avoid undesired results from unexpected
casting behavior.
<ul>
<li>
Impala does not implicitly cast between string and numeric or Boolean types. Always use
<codeph>CAST()</codeph> for these conversions.
</li>

<li>
Impala does perform implicit casts among the numeric types when going from a smaller or less precise
type to a larger or more precise one. For example, Impala will implicitly convert a
<codeph>SMALLINT</codeph> to a <codeph>BIGINT</codeph> or <codeph>FLOAT</codeph>, but converting from
<codeph>DOUBLE</codeph> to <codeph>FLOAT</codeph> or from <codeph>INT</codeph> to <codeph>TINYINT</codeph>
requires a call to <codeph>CAST()</codeph> in the query.
</li>

<li>
Impala does perform implicit casts from string to timestamp. Impala has a restricted set of literal
formats for the <codeph>TIMESTAMP</codeph> data type and the <codeph>from_unixtime()</codeph> format
string; see <xref href="impala_timestamp.xml#timestamp"/> for details.
</li>
</ul>
<p>
See <xref href="impala_datatypes.xml#datatypes"/> for full details on implicit and explicit casting for
all types, and <xref href="impala_conversion_functions.xml#conversion_functions"/> for details about
the <codeph>CAST()</codeph> function.
</p>
</li>

<li>
Impala does not store or interpret timestamps using the local time zone, to avoid undesired results from
unexpected time zone issues. Timestamps are stored and interpreted relative to UTC. This difference can
produce different results for some calls to similarly named date/time functions between Impala and Hive.
See <xref href="impala_datetime_functions.xml#datetime_functions"/> for details about the Impala
functions. See <xref href="impala_timestamp.xml#timestamp"/> for a discussion of how Impala handles
time zones, and configuration options you can use to make Impala match the Hive behavior more closely
when dealing with Parquet-encoded <codeph>TIMESTAMP</codeph> data or when converting between
the local time zone and UTC.
</li>

<li>
The Impala <codeph>TIMESTAMP</codeph> type can represent dates ranging from 1400-01-01 to 9999-12-31.
This is different from the Hive date range, which is 0000-01-01 to 9999-12-31.
</li>

<li>
<p conref="../shared/impala_common.xml#common/int_overflow_behavior"/>
</li>

</ul>
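
<p>
The explicit conversions described in the list above might look like the following sketch. The table
<codeph>metrics</codeph>, with an <codeph>INT</codeph> column <codeph>item_count</codeph> and a
<codeph>DOUBLE</codeph> column <codeph>precise_ratio</codeph>, is hypothetical and used only for illustration.
</p>

<codeblock>-- Converting a string to a number requires an explicit cast.
SELECT CAST('123' AS INT) + 1;

-- Narrowing numeric conversions (INT to TINYINT, DOUBLE to FLOAT)
-- also require an explicit cast.
SELECT CAST(item_count AS TINYINT), CAST(precise_ratio AS FLOAT)
FROM metrics;

-- A string in one of the supported literal formats can be converted to TIMESTAMP.
SELECT CAST('2015-06-01 12:30:00' AS TIMESTAMP);
</codeblock>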

<p>
<b>Miscellaneous features:</b>
</p>

<ul>
<li>
Impala does not provide virtual columns.
</li>

<li>
Impala does not expose locking.
</li>

<li>
Impala does not expose some configuration properties.
</li>
</ul>
</conbody>
</concept>
</concept>