mirror of
https://github.com/apache/impala.git
synced 2025-12-30 03:01:44 -05:00
For this change to land in master, the audience="hidden" code review needs to be completed first. Otherwise, the doc build would still work, but the audience="hidden" content would be visible rather than hidden as desired.

Some work happening in parallel might introduce additional instances of audience="Cloudera". I suggest addressing those in a followup CR so this global change can land quickly.

Since the changes apply across so many different files, but are so narrow in scope, I suggest that the way to validate (check that no extraneous changes were introduced accidentally) is to diff just the changed lines:

git diff -U0 HEAD^ HEAD

In patch set 2, I updated other topics marked audience="Cloudera" by CRs that were pushed in the meantime.

Change-Id: Ic93d89da77e1f51bbf548a522d98d0c4e2fb31c8
Reviewed-on: http://gerrit.cloudera.org:8080/5613
Reviewed-by: John Russell <jrussell@cloudera.com>
Tested-by: Impala Public Jenkins
331 lines
12 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="langref_hiveql_delta">

  <title>SQL Differences Between Impala and Hive</title>
  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="SQL"/>
      <data name="Category" value="Hive"/>
      <data name="Category" value="Porting"/>
      <data name="Category" value="Data Analysts"/>
      <data name="Category" value="Developers"/>
    </metadata>
  </prolog>

  <conbody>

    <p>
      <indexterm audience="hidden">Hive</indexterm>
      <indexterm audience="hidden">HiveQL</indexterm>
      Impala's SQL syntax follows the SQL-92 standard and includes many industry extensions in areas such as
      built-in functions. See <xref href="impala_porting.xml#porting"/> for a general discussion of adapting SQL
      code from a variety of database systems to Impala.
    </p>

    <p>
      Because Impala and Hive share the same metastore database and their tables are often used interchangeably,
      the following sections cover the differences between Impala and Hive in detail.
    </p>

    <p outputclass="toc inpage"/>
  </conbody>

  <concept id="langref_hiveql_unsupported">

    <title>HiveQL Features not Available in Impala</title>

    <conbody>

      <p>
        The current release of Impala does not support the following SQL features that you might be familiar with
        from HiveQL:
      </p>

      <!-- To do:
        Yeesh, too many separate lists of unsupported Hive syntax.
        Here, the FAQ, and in some of the intro topics.
        Some discussion in IMP-1061 about how best to reorg.
        Lots of opportunities for conrefs.
      -->

      <ul>
<!-- Now supported in <keyword keyref="impala23_full"/> and higher. Find places on this page (like already done under lateral views) to note the new data type support.
        <li>
          Non-scalar data types such as maps, arrays, structs.
        </li>
-->

        <li rev="1.2">
          Extensibility mechanisms such as <codeph>TRANSFORM</codeph>, custom file formats, or custom SerDes.
        </li>

        <li rev="CDH-41376">
          The <codeph>DATE</codeph> data type.
        </li>

        <li>
          XML and JSON functions.
        </li>

        <li>
          Certain aggregate functions from HiveQL: <codeph>covar_pop</codeph>, <codeph>covar_samp</codeph>,
          <codeph>corr</codeph>, <codeph>percentile</codeph>, <codeph>percentile_approx</codeph>,
          <codeph>histogram_numeric</codeph>, <codeph>collect_set</codeph>. Impala supports the set of aggregate
          functions listed in <xref href="impala_aggregate_functions.xml#aggregate_functions"/> and the analytic
          functions listed in <xref href="impala_analytic_functions.xml#analytic_functions"/>.
        </li>

        <li>
          Sampling.
        </li>

        <li>
          Lateral views. In <keyword keyref="impala23_full"/> and higher, Impala supports queries on complex types
          (<codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>), using join notation
          rather than the <codeph>EXPLODE()</codeph> function.
          See <xref href="impala_complex_types.xml#complex_types"/> for details about Impala support for complex types.
        </li>

        <li>
          Multiple <codeph>DISTINCT</codeph> clauses per query, although Impala includes some workarounds for this
          limitation.
          <note conref="../shared/impala_common.xml#common/multiple_count_distinct"/>
        </li>
      </ul>
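
      <p>
        The last two items in the preceding list can be sketched as follows. This is a minimal illustration, not
        a definitive recipe: the table <codeph>customers</codeph> (with an <codeph>id</codeph> column and an
        <codeph>ARRAY&lt;STRING&gt;</codeph> column named <codeph>orders</codeph>) and the table
        <codeph>t1</codeph> (with columns <codeph>col1</codeph> and <codeph>col2</codeph>) are hypothetical names
        introduced only for this example.
      </p>

<codeblock>-- Join notation on a complex type (hypothetical table and column names).
-- The array column is referenced like a table in the FROM clause;
-- ITEM is the pseudocolumn that holds each array element.
SELECT c.id, o.item
  FROM customers c, c.orders o;

-- One possible workaround for multiple DISTINCT clauses in a single query:
-- compute each COUNT(DISTINCT ...) in its own subquery, then combine the
-- single-row results with a cross join.
SELECT v1.c1, v2.c2
  FROM (SELECT COUNT(DISTINCT col1) AS c1 FROM t1) v1
  CROSS JOIN (SELECT COUNT(DISTINCT col2) AS c2 FROM t1) v2;
</codeblock>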

      <p>
        User-defined functions (UDFs) are supported starting in Impala 1.2. See <xref href="impala_udf.xml#udfs"/>
        for full details on Impala UDFs.
        <ul>
          <li>
            <p>
              Impala supports high-performance UDFs written in C++, as well as reusing some Java-based Hive UDFs.
            </p>
          </li>

          <li>
            <p>
              Impala supports scalar UDFs and user-defined aggregate functions (UDAFs). Impala does not currently
              support user-defined table generating functions (UDTFs).
            </p>
          </li>

          <li>
            <p>
              Only Impala-supported column types are supported in Java-based UDFs.
            </p>
          </li>

          <li>
            <p conref="../shared/impala_common.xml#common/current_user_caveat"/>
          </li>
        </ul>
      </p>
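
      <p>
        As a hedged sketch of reusing a Java-based Hive UDF, the following statements register a function from a
        JAR file and then call it. The function name <codeph>my_lower</codeph>, the HDFS path, and the table
        <codeph>t1</codeph> with its <codeph>name</codeph> column are all assumptions made for illustration; the
        JAR must already be present at a location that Impala can read.
      </p>

<codeblock>-- Register a Java-based Hive UDF (hypothetical function name and JAR path).
CREATE FUNCTION my_lower(STRING) RETURNS STRING
  LOCATION '/user/hive/udfs/hive-udfs.jar'
  SYMBOL='org.apache.hadoop.hive.ql.udf.UDFLower';

-- Call it like a built-in scalar function (hypothetical table and column).
SELECT my_lower(name) FROM t1;
</codeblock>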

      <p>
        Impala does not currently support these HiveQL statements:
      </p>

      <ul>
        <li>
          <codeph>ANALYZE TABLE</codeph> (the Impala equivalent is <codeph>COMPUTE STATS</codeph>)
        </li>

        <li>
          <codeph>DESCRIBE COLUMN</codeph>
        </li>

        <li>
          <codeph>DESCRIBE DATABASE</codeph>
        </li>

        <li>
          <codeph>EXPORT TABLE</codeph>
        </li>

        <li>
          <codeph>IMPORT TABLE</codeph>
        </li>

        <li>
          <codeph>SHOW TABLE EXTENDED</codeph>
        </li>

        <li>
          <codeph>SHOW INDEXES</codeph>
        </li>

        <li>
          <codeph>SHOW COLUMNS</codeph>
        </li>

        <li rev="DOCS-656">
          <codeph>INSERT OVERWRITE DIRECTORY</codeph>; use <codeph>INSERT OVERWRITE <varname>table_name</varname></codeph>
          or <codeph>CREATE TABLE AS SELECT</codeph> to materialize query results into the HDFS directory associated
          with an Impala table.
        </li>
      </ul>
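
      <p>
        The substitutions named in the preceding list might look like the following sketch. The table names
        <codeph>sales</codeph>, <codeph>sales_summary</codeph>, and <codeph>sales_by_region</codeph>, and the
        columns <codeph>region</codeph> and <codeph>amount</codeph>, are hypothetical.
      </p>

<codeblock>-- Instead of ANALYZE TABLE: gather table and column statistics.
COMPUTE STATS sales;

-- Instead of INSERT OVERWRITE DIRECTORY: replace the contents of an
-- existing table, which rewrites the data files in that table's HDFS directory.
INSERT OVERWRITE sales_summary
  SELECT region, SUM(amount) FROM sales GROUP BY region;

-- Or create a new table (and its HDFS directory) from the query results.
CREATE TABLE sales_by_region AS
  SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
</codeblock>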
    </conbody>
  </concept>

  <concept id="langref_hiveql_semantics">

    <title>Semantic Differences Between Impala and HiveQL Features</title>

    <conbody>

      <p>
        This section covers instances where Impala and Hive have similar functionality, sometimes including the
        same syntax, but there are differences in the runtime semantics of those features.
      </p>

      <p>
        <b>Security:</b>
      </p>

      <p>
        Impala uses the <xref href="http://sentry.incubator.apache.org/" scope="external" format="html">Apache
        Sentry</xref> authorization framework, which provides fine-grained role-based access control
        to protect data against unauthorized access or tampering.
      </p>

      <p>
        The Hive component included in <ph rev="upstream">CDH 5.1</ph> and higher includes Sentry-enabled <codeph>GRANT</codeph>,
        <codeph>REVOKE</codeph>, and <codeph>CREATE/DROP ROLE</codeph> statements. Earlier Hive releases had a
        privilege system with <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements that were primarily
        intended to prevent accidental deletion of data, rather than to serve as a security mechanism that protects
        against malicious users.
      </p>

      <p>
        Impala can make use of privileges set up through Hive <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements.
        Impala has its own <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in Impala 2.0 and higher.
        See <xref href="impala_authorization.xml#authorization"/> for the details of authorization in Impala, including
        how to switch from the original policy file-based privilege model to the Sentry service using privileges
        stored in the metastore database.
      </p>
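
      <p>
        A minimal sketch of the Impala <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements follows. It
        assumes Impala 2.0 or higher with authorization enabled through the Sentry service; the role name
        <codeph>analyst_role</codeph>, the group name <codeph>analysts</codeph>, and the database name
        <codeph>sales_db</codeph> are hypothetical.
      </p>

<codeblock>-- Create a role and associate it with a group (hypothetical names).
CREATE ROLE analyst_role;
GRANT ROLE analyst_role TO GROUP analysts;

-- Give the role read access to a database, then take it away again.
GRANT SELECT ON DATABASE sales_db TO ROLE analyst_role;
REVOKE SELECT ON DATABASE sales_db FROM ROLE analyst_role;
</codeblock>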

      <p>
        <b>SQL statements and clauses:</b>
      </p>

      <p>
        The semantics of Impala SQL statements differ from HiveQL in some cases where the two use similar SQL
        statement and clause names:
      </p>

      <ul>
        <li>
          Impala uses different syntax and names for query hints: <codeph>[SHUFFLE]</codeph> and
          <codeph>[NOSHUFFLE]</codeph> rather than <codeph>MapJoin</codeph> or <codeph>StreamJoin</codeph>. See
          <xref href="impala_joins.xml#joins"/> for the Impala details.
        </li>

        <li>
          Impala does not expose MapReduce-specific features of <codeph>SORT BY</codeph>, <codeph>DISTRIBUTE
          BY</codeph>, or <codeph>CLUSTER BY</codeph>.
        </li>

        <li>
          Impala does not require queries to include a <codeph>FROM</codeph> clause.
        </li>
      </ul>
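
      <p>
        The hint syntax and the optional <codeph>FROM</codeph> clause can be illustrated with the following
        sketch. The tables <codeph>t1</codeph> and <codeph>t2</codeph> and their columns are hypothetical.
      </p>

<codeblock>-- Impala-style join hint in square brackets, placed after the JOIN keyword
-- (hypothetical tables and columns).
SELECT t1.id, t2.name
  FROM t1 JOIN [SHUFFLE] t2 ON t1.id = t2.id;

-- Queries without a FROM clause, for example to evaluate an expression
-- or call a built-in function.
SELECT 1 + 1;
SELECT now();
</codeblock>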

      <p>
        <b>Data types:</b>
      </p>

      <ul>
        <li>
          Impala supports a limited set of implicit casts. This can help avoid undesired results from unexpected
          casting behavior.
          <ul>
            <li>
              Impala does not implicitly cast between string and numeric or Boolean types. Always use
              <codeph>CAST()</codeph> for these conversions.
            </li>

            <li>
              Impala does perform implicit casts among the numeric types, when going from a smaller or less precise
              type to a larger or more precise one. For example, Impala will implicitly convert a
              <codeph>SMALLINT</codeph> to a <codeph>BIGINT</codeph> or <codeph>FLOAT</codeph>, but converting from
              <codeph>DOUBLE</codeph> to <codeph>FLOAT</codeph> or from <codeph>INT</codeph> to <codeph>TINYINT</codeph>
              requires a call to <codeph>CAST()</codeph> in the query.
            </li>

            <li>
              Impala does perform implicit casts from string to timestamp. Impala has a restricted set of literal
              formats for the <codeph>TIMESTAMP</codeph> data type and the <codeph>from_unixtime()</codeph> format
              string; see <xref href="impala_timestamp.xml#timestamp"/> for details.
            </li>
          </ul>
          <p>
            See <xref href="impala_datatypes.xml#datatypes"/> for full details on implicit and explicit casting for
            all types, and <xref href="impala_conversion_functions.xml#conversion_functions"/> for details about
            the <codeph>CAST()</codeph> function.
          </p>
        </li>

        <li>
          Impala does not store or interpret timestamps using the local time zone, to avoid undesired results from
          unexpected time zone issues. Timestamps are stored and interpreted relative to UTC. This difference can
          produce different results for some calls to similarly named date/time functions between Impala and Hive.
          See <xref href="impala_datetime_functions.xml#datetime_functions"/> for details about the Impala
          functions. See <xref href="impala_timestamp.xml#timestamp"/> for a discussion of how Impala handles
          time zones, and configuration options you can use to make Impala match the Hive behavior more closely
          when dealing with Parquet-encoded <codeph>TIMESTAMP</codeph> data or when converting between
          the local time zone and UTC.
        </li>

        <li>
          The Impala <codeph>TIMESTAMP</codeph> type can represent dates ranging from 1400-01-01 to 9999-12-31.
          This is different from the Hive date range, which is 0000-01-01 to 9999-12-31.
        </li>

        <li>
          <p conref="../shared/impala_common.xml#common/int_overflow_behavior"/>
        </li>

      </ul>
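
      <p>
        The casting rules described in the preceding list can be sketched with explicit <codeph>CAST()</codeph>
        calls. The table <codeph>t1</codeph> and its columns <codeph>int_col</codeph> and
        <codeph>double_col</codeph> are hypothetical names used only for this example.
      </p>

<codeblock>-- String to numeric: no implicit cast, so convert explicitly.
SELECT CAST('100' AS INT) + 1;

-- Narrowing numeric conversions also need an explicit CAST()
-- (hypothetical table and columns).
SELECT CAST(int_col AS TINYINT), CAST(double_col AS FLOAT)
  FROM t1;

-- A string in one of the accepted literal formats converted to a TIMESTAMP.
SELECT CAST('2016-12-30 10:15:00' AS TIMESTAMP);
</codeblock>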

      <p>
        <b>Miscellaneous features:</b>
      </p>

      <ul>
        <li>
          Impala does not provide virtual columns.
        </li>

        <li>
          Impala does not expose locking.
        </li>

        <li>
          Impala does not expose some configuration properties.
        </li>
      </ul>
    </conbody>
  </concept>
</concept>