mirror of
https://github.com/apache/impala.git
synced 2025-12-19 09:58:28 -05:00
Add short notes about this release. Tests: - Built the html and pdf locally. Verified the new content. Change-Id: I4b9cc838de018c954f419ebb71d65c0d5725a4a9 Reviewed-on: http://gerrit.cloudera.org:8080/17676 Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
4432 lines
197 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
||
<!--
|
||
Licensed to the Apache Software Foundation (ASF) under one
|
||
or more contributor license agreements. See the NOTICE file
|
||
distributed with this work for additional information
|
||
regarding copyright ownership. The ASF licenses this file
|
||
to you under the Apache License, Version 2.0 (the
|
||
"License"); you may not use this file except in compliance
|
||
with the License. You may obtain a copy of the License at
|
||
|
||
http://www.apache.org/licenses/LICENSE-2.0
|
||
|
||
Unless required by applicable law or agreed to in writing,
|
||
software distributed under the License is distributed on an
|
||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||
KIND, either express or implied. See the License for the
|
||
specific language governing permissions and limitations
|
||
under the License.
|
||
-->
|
||
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
|
||
<concept rev="ver" id="new_features">
|
||
|
||
<title><ph audience="standalone">New Features in Apache Impala</ph><ph audience="integrated">What's New in Apache Impala</ph></title>
|
||
|
||
<prolog>
|
||
<metadata>
|
||
<data name="Category" value="Impala"/>
|
||
<data name="Category" value="Release Notes"/>
|
||
<data name="Category" value="New Features"/>
|
||
<data name="Category" value="What's New"/>
|
||
<data name="Category" value="Getting Started"/>
|
||
<data name="Category" value="Upgrading"/>
|
||
<data name="Category" value="Administrators"/>
|
||
<data name="Category" value="Developers"/>
|
||
<data name="Category" value="Data Analysts"/>
|
||
</metadata>
|
||
</prolog>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
This release of Impala contains the following changes and enhancements from previous releases.
|
||
</p>
|
||
|
||
<p outputclass="toc inpage"/>
|
||
|
||
</conbody>
|
||
<concept rev="4.0.0" id="new_features_400">
|
||
<title>New Features in <keyword keyref="impala40"/></title>
|
||
<conbody>
|
||
<p>
|
||
For the full list of issues closed in this release, including the
|
||
issues marked as <q>new features</q> or <q>improvements</q>, see the
|
||
<xref keyref="release_notes_40">release notes</xref> or
|
||
<xref keyref="changelog_40">changelog</xref> for
|
||
<keyword keyref="impala40"/>.
|
||
</p>
|
||
</conbody>
|
||
</concept>
|
||
<concept rev="3.4.0" id="new_features_34">
|
||
<title>New Features in <keyword keyref="impala34"/></title>
|
||
<conbody>
|
||
<p> The following sections describe the noteworthy improvements made in
|
||
<keyword keyref="impala34"/>. </p>
|
||
<p> For the full list of issues closed in this release, see the <xref
|
||
keyref="changelog_34">changelog for <keyword keyref="impala34"
|
||
/></xref>. </p>
|
||
<section id="section_cw4_nmw_pjb">
|
||
<title>Support for Hive Insert-Only Transactional Tables</title>
|
||
<p>Impala now supports truncating insert-only transactional
|
||
tables. </p>
|
||
<p>By default, Impala creates an insert-only transactional table when
|
||
you issue the <codeph>CREATE TABLE</codeph> statement.</p>
|
||
<p>Use Hive compaction to merge small files, which improves the
|
||
performance and scalability of metadata in transactional tables.</p>
|
||
<p>See <xref href="impala_transactions.xml#transactions"/> for more
|
||
information.</p>
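<p>As an illustrative sketch only, the following statements create an
insert-only transactional table explicitly and then truncate it. The table
name <codeph>txn_demo</codeph> is hypothetical, and the table properties
your deployment requires may differ:</p>
<codeblock>-- Hypothetical table; these properties mark it as insert-only transactional.
CREATE TABLE txn_demo (id INT, val STRING)
  TBLPROPERTIES ('transactional'='true',
                 'transactional_properties'='insert_only');

INSERT INTO txn_demo VALUES (1, 'a'), (2, 'b');

-- Truncating insert-only transactional tables is supported in this release.
TRUNCATE TABLE txn_demo;</codeblock>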
|
||
</section>
|
||
<section id="impala-8656">
|
||
<title>Server-side Spooling of Query Results</title>
|
||
<p>You can use the <codeph>SPOOL_QUERY_RESULTS</codeph> query option to
|
||
control how query results are returned to the client.</p>
|
||
<p>By default, when a client fetches a set of query results, the next
|
||
set of results is fetched in batches until all the result rows are
|
||
produced. If a client issues a query without fetching all the results,
|
||
the query fragments continue to hold on to the resources until the
|
||
query is canceled and unregistered, potentially tying up resources and
|
||
causing other queries to wait in admission control.</p>
|
||
<p>When the query result spooling feature is enabled, the result sets of
|
||
queries are eagerly fetched and buffered until they are read by the
|
||
client, and resources are freed up for other queries.</p>
|
||
<p>See <xref href="impala_query_results_spooling.xml#data_sink"/> for
|
||
the new feature and the query options.</p>
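<p>A minimal sketch of enabling result spooling for a session, assuming a
hypothetical table named <codeph>big_table</codeph>:</p>
<codeblock>-- Enable server-side spooling of results for the current session.
SET SPOOL_QUERY_RESULTS=true;

-- Result rows are buffered on the server until the client fetches them.
SELECT * FROM big_table;</codeblock>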
|
||
</section>
|
||
<section id="impala-8584">
|
||
<title>Cookie-based Authentication</title>
|
||
<p>Starting in this version, Impala supports cookies for authentication
|
||
when clients connect via HiveServer2 over HTTP. </p>
|
||
<p>You can use the <codeph>--max_cookie_lifetime_s</codeph> startup flag
|
||
to:</p>
|
||
<ul>
|
||
<li>Disable the use of cookies</li>
|
||
<li>Control how long generated cookies are valid for</li>
|
||
</ul>
|
||
<p>See <xref href="impala_client.xml#intro_client"/> for more
|
||
information.</p>
|
||
</section>
|
||
<section id="section_hw4_nmw_pjb">
|
||
<title>Object Ownership Support</title>
|
||
<p>Object ownership for tables, views, and databases is enabled by
|
||
default in Impala. When you create a database, a table, or a view, as
|
||
the owner of that object, you implicitly have the privileges on the
|
||
object. The privileges that owners have are specified in Ranger on the
|
||
special user, <codeph>{OWNER}</codeph>. </p>
|
||
<p>The <codeph>{OWNER}</codeph> user must be defined in Ranger for the
|
||
object ownership privileges to work in Impala.</p>
|
||
<p>See <xref href="impala_authorization.xml#authorization"/> for
|
||
details.</p>
|
||
</section>
|
||
<section id="impala-8752">
|
||
<title>New Built-in Functions for Fuzzy Matching of Strings</title>
|
||
<p>Use the new Jaro or Jaro-Winkler functions to perform fuzzy matches
|
||
on relatively short strings, e.g. to scrub user inputs of names
|
||
against the records in the database.</p>
|
||
<ul>
|
||
<li><codeph>JARO_DISTANCE</codeph>, <codeph>JARO_DST</codeph></li>
|
||
<li><codeph>JARO_SIMILARITY</codeph>, <codeph>JARO_SIM</codeph></li>
|
||
<li><codeph>JARO_WINKLER_DISTANCE</codeph>,
|
||
<codeph>JW_DST</codeph></li>
|
||
<li><codeph>JARO_WINKLER_SIMILARITY</codeph>,
|
||
<codeph>JW_SIM</codeph></li>
|
||
</ul>
|
||
<p>See <xref href="impala_string_functions.xml#string_functions"/> for
|
||
details.</p>
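<p>The following sketch shows how these functions might be used to rank
stored names against a user-supplied value. The table
<codeph>customers</codeph> and column <codeph>name</codeph> are
hypothetical, and each function is assumed to take two string arguments:</p>
<codeblock>-- Rank stored names by Jaro-Winkler similarity to the input (hypothetical table).
SELECT name,
       jaro_winkler_similarity(name, 'Jonathan') AS jw_score
FROM customers
ORDER BY jw_score DESC
LIMIT 5;

-- The short aliases listed above accept the same arguments.
SELECT jaro_sim('martha', 'marhta'), jw_dst('martha', 'marhta');</codeblock>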
|
||
</section>
|
||
<section id="impala-8376">
|
||
<title>Capacity Quota for Scratch Disks</title>
|
||
<p>When configuring scratch space for intermediate files used in large
|
||
sorts, joins, aggregations, or analytic function operations, use the
|
||
<codeph>--scratch_dirs</codeph> startup flag to optionally specify a
|
||
capacity quota per scratch directory, e.g.,
|
||
<codeph>--scratch_dirs=/dir1:5MB,/dir2</codeph>.</p>
|
||
<p>See <xref href="impala_file_formats.xml#file_formats"/> for
|
||
details.</p>
|
||
</section>
|
||
<section id="impala-8913">
|
||
<title>Query Option for Disabling HBase Row Estimation</title>
|
||
<p>During query plan generation, Impala samples underlying HBase tables
|
||
to estimate row count and row size, but the sampling process can
|
||
negatively impact the planning time. To alleviate the issue, when the
|
||
HBase table stats do not change much in a short time, disable the
|
||
sampling with the <codeph>DISABLE_HBASE_NUM_ROWS_ESTIMATE</codeph>
|
||
query option so that the Impala planner falls back to using Hive
|
||
Metastore (HMS) table stats instead. </p>
|
||
<p>See <xref
|
||
href="impala_disable_hbase_num_rows_estimate.xml#disable_hbase_num_rows_estimate"
|
||
/>.</p>
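<p>For example, a session that queries a slowly changing HBase table
could turn off sampling as sketched below. The table name
<codeph>hbase_table</codeph> is hypothetical:</p>
<codeblock>-- Skip HBase sampling; the planner falls back to HMS table stats.
SET DISABLE_HBASE_NUM_ROWS_ESTIMATE=true;

-- Inspect the plan that is generated without row-count sampling.
EXPLAIN SELECT count(*) FROM hbase_table;</codeblock>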
|
||
</section>
|
||
<section id="impala-8942">
|
||
<title>Query Option for Controlling Size of Parquet Splits on Non-block
|
||
Stores</title>
|
||
<p>To optimize query performance, the Impala planner uses the value of the
|
||
<codeph>fs.s3a.block.size</codeph> startup flag when calculating the
|
||
split size on non-block based stores, e.g. S3, ADLS, etc. Starting in
|
||
this release, the Impala planner uses the
|
||
<codeph>PARQUET_OBJECT_STORE_SPLIT_SIZE</codeph> query option to get
|
||
the Parquet file format specific split size. </p>
|
||
<p>For Parquet files, the <codeph>fs.s3a.block.size</codeph> startup
|
||
flag is no longer used.</p>
|
||
<p>The default value of the
|
||
<codeph>PARQUET_OBJECT_STORE_SPLIT_SIZE</codeph> query option is 256
|
||
MB.</p>
|
||
<p>See <xref href="impala_s3.xml#s3"/> for tuning Impala query
|
||
performance for S3.</p>
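<p>As a sketch, the split size could be lowered for a session as shown
below. This assumes the option value is given in bytes, and the table name
<codeph>s3_parquet_table</codeph> is hypothetical:</p>
<codeblock>-- 128 MB expressed in bytes (128 * 1024 * 1024); assumes a byte-valued option.
SET PARQUET_OBJECT_STORE_SPLIT_SIZE=134217728;

SELECT count(*) FROM s3_parquet_table;</codeblock>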
|
||
</section>
|
||
<section id="impala-5149">
|
||
<title>Query Profile Exported to JSON</title>
|
||
<p>On the Query Details page of Impala Daemon Web UI, you have a new
|
||
option, in addition to the existing Thrift and Text formats, to export
|
||
the query profile output in the JSON format.</p>
|
||
<p>See <xref href="impala_webui.xml#webui"/> for generating JSON query
|
||
profile outputs in Web UI.</p>
|
||
</section>
|
||
<section id="section_rnb_ny4_yjb">
|
||
<title>DATE Data Type Supported in Avro Tables</title>
|
||
<p>You can now use the <codeph>DATE</codeph> data type to query date
|
||
values from Avro tables.</p>
|
||
<p>See <xref href="impala_avro.xml#avro"/> for details.</p>
|
||
</section>
|
||
<section>
|
||
<title>Primary Key and Foreign Key Constraints</title>
|
||
<p>This release adds support for primary and foreign key constraints,
|
||
but the constraints are advisory only and are intended for
|
||
estimating cardinality during query planning in a future release.
|
||
There is no attempt to enforce constraints. See <xref
|
||
href="impala_create_table.xml"/> for details. </p>
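<p>The sketch below shows one way such constraints might be declared. The
table and column names are hypothetical, and the constraint syntax shown is
an assumption that should be checked against the
<codeph>CREATE TABLE</codeph> documentation:</p>
<codeblock>-- Hypothetical parent table with an advisory primary key.
CREATE TABLE dept (dept_id INT, dept_name STRING, PRIMARY KEY (dept_id));

-- Hypothetical child table; the foreign key is informational and not enforced.
CREATE TABLE emp (
  emp_id INT,
  dept_id INT,
  PRIMARY KEY (emp_id),
  FOREIGN KEY (dept_id) REFERENCES dept (dept_id)
);</codeblock>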
|
||
</section>
|
||
<section>
|
||
<title>Enhanced External Kudu Table</title>
|
||
<p>By default, HMS implicitly translates internal Kudu tables to external
Kudu tables with the <codeph>external.table.purge</codeph> property set to <codeph>true</codeph>. These
tables behave similarly to internal tables. You can explicitly create such
|
||
external Kudu tables. See <xref href="impala_create_table.xml"/>
|
||
for details.</p>
|
||
</section>
|
||
<section>
|
||
<title>Ranger Column Masking</title>
|
||
<p>This release supports Ranger column masking, which hides sensitive columnar
|
||
data in Impala query output. For example, you can define a policy that reveals
|
||
only the first or last four characters of column data. Column masking is enabled
|
||
by default. See <xref href="impala_authorization.xml#sec_ranger_col_masking"/>
|
||
for details.</p>
|
||
</section>
|
||
<section>
|
||
<title>BROADCAST_BYTES_LIMIT Query Option</title>
|
||
<p>You can set the default limit for the size of the broadcast input. Such a limit
|
||
can prevent possible performance problems.</p>
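<p>A minimal sketch of setting the limit for a session, assuming the value
is interpreted as a number of bytes:</p>
<codeblock>-- Cap the broadcast input at 1 GB (1024 * 1024 * 1024 bytes); the unit is an assumption.
SET BROADCAST_BYTES_LIMIT=1073741824;</codeblock>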
|
||
<!--Add link to details after file is published.-->
|
||
</section>
|
||
<section>
|
||
<title>Experimental Support for Apache Hudi</title>
|
||
<p>In this release, you can use Read Optimized Queries on Hudi tables. See
|
||
<xref href="impala_hudi.xml"/> for details. </p>
|
||
</section>
|
||
<section>
|
||
<title>ORC Reads Enabled by Default</title>
|
||
<p>Impala stability and performance have been improved. Consequently, ORC reads are now
|
||
enabled in Impala by default. To disable, set <codeph>--enable_orc_scanner</codeph> to
|
||
<codeph>false</codeph> when starting the cluster. See <xref href="impala_orc.xml"/> for
|
||
details.</p>
|
||
</section>
|
||
<section>
|
||
<title>Support for ZSTD and DEFLATE</title>
|
||
<p>This release supports ZSTD and DEFLATE compression codecs for text files. See
|
||
<xref href="impala_txtfile.xml#gzip"/> for details.</p>
|
||
</section>
|
||
</conbody>
|
||
</concept>
|
||
<concept rev="3.3.0" id="new_features_33">
|
||
<title>New Features in <keyword keyref="impala33"/></title>
|
||
<conbody>
|
||
<p> The following sections describe the noteworthy improvements made in
|
||
<keyword keyref="impala33"/>. </p>
|
||
<p> For the full list of issues closed in this release, see the <xref
|
||
keyref="changelog_33">changelog for <keyword keyref="impala33"
|
||
/></xref>. </p>
|
||
<section id="section_ezf_tnq_s3b">
|
||
<title>Increased Compatibility with Apache Projects</title>
|
||
<p>Impala is integrated with the following components:<ul>
|
||
<li dir="ltr">
|
||
<p dir="ltr">Apache Ranger: Use Apache Ranger to manage
|
||
authorization in Impala. See <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_authorization.html"
|
||
format="html" scope="external">Impala
Authorization</xref> for details.</p>
|
||
</li>
|
||
<li dir="ltr">
|
||
<p dir="ltr">Apache Atlas: Use Apache Atlas to manage data
|
||
governance in Impala.</p>
|
||
</li>
|
||
<li dir="ltr">
|
||
<p dir="ltr">Hive 3</p>
|
||
</li>
|
||
</ul></p>
|
||
</section>
|
||
<section id="section_ys5_k4n_t3b">
|
||
<title>Parquet Page Index </title>
|
||
<p>To improve performance when using Parquet files, Impala can now write
|
||
page indexes in Parquet files and use those indexes to skip pages for
|
||
faster scans.</p>
|
||
<p>See <xref href="impala_parquet.xml#parquet_performance"/> for
|
||
details.</p>
|
||
</section>
|
||
<section id="section_zs5_k4n_t3b">
|
||
<title>The Remote File Handle Cache Supports S3</title>
|
||
<p>Impala can now cache remote HDFS file handles for tables that
|
||
store their data in Amazon S3 cloud storage.</p>
|
||
<p>See <xref href="impala_scalability.xml#scalability_file_handle_cache"
|
||
/> for information on the remote file handle cache.</p>
|
||
</section>
|
||
<section id="section_jls_hxj_s3b">
|
||
<title>Support for Kudu Integrated with Hive Metastore</title>
|
||
<p>In Impala 3.3 and Kudu 1.10, Kudu is integrated with Hive Metastore
|
||
(HMS), and from Impala, you can create, update, delete, and query the
|
||
tables in the Kudu services integrated with HMS.</p>
|
||
<p>See <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_kudu.html"
|
||
format="html" scope="external">Using Kudu with Impala</xref> for
|
||
information on using Kudu tables in Impala.</p>
|
||
</section>
|
||
<section id="section_dp4_mxj_s3b">
|
||
<title>Zstd Compression for Parquet Files</title>
|
||
<p>Zstandard (Zstd) is a real-time compression algorithm offering a
|
||
tradeoff between speed and ratio of compression. Compression levels
|
||
from 1 up to 22 are supported. The lower the level, the faster the
|
||
speed at the cost of compression ratio.</p>
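<p>As a sketch, a compression level might be chosen when writing Parquet
data as shown below. The <codeph>codec:level</codeph> form of the
<codeph>COMPRESSION_CODEC</codeph> value is an assumption to verify against
the query option documentation, and the table names are hypothetical:</p>
<codeblock>-- Request Zstd at level 12; the codec:level syntax is an assumption.
SET COMPRESSION_CODEC=ZSTD:12;

-- Parquet files written by this statement use the codec set above.
CREATE TABLE sales_zstd STORED AS PARQUET AS SELECT * FROM sales;</codeblock>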
|
||
</section>
|
||
<section id="section_parquet_lz4_notes">
|
||
<title>Lz4 Compression for Parquet Files</title>
|
||
<p>Lz4 is a lossless compression algorithm providing extremely fast
|
||
and scalable compression and decompression.</p>
|
||
</section>
|
||
<section id="section_drv_nxj_s3b">
|
||
<title>Data Cache for Remote Reads</title>
|
||
<p>To improve performance on multi-cluster HDFS environments as well as
|
||
on object store environments, Impala now caches data for non-local
|
||
reads (e.g. S3, ABFS, ADLS) on local storage.</p>
|
||
<p>The data cache is enabled with the <codeph>--data_cache</codeph>
startup flag.</p>
|
||
<p>See <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_data_cache.html"
|
||
format="html" scope="external">Impala Remote Data Cache</xref> for
|
||
information and steps to enable the remote data cache.</p>
|
||
</section>
|
||
<section id="section_xp4_b1f_t3b">
|
||
<title>Metadata Performance Improvements </title>
|
||
<p>The following features to improve metadata performance are enabled by
|
||
default in this release:</p>
|
||
<ul>
|
||
<li>
|
||
<p>Incremental stats are now compressed in memory in
|
||
<codeph>catalogd</codeph>, reducing memory footprint in
|
||
<codeph>catalogd</codeph>.</p>
|
||
</li>
|
||
<li>
|
||
<p><codeph>impalad</codeph> coordinators fetch incremental stats from
|
||
<codeph>catalogd</codeph> on-demand, reducing the memory
|
||
footprint and the network requirements for broadcasting
|
||
metadata.</p>
|
||
</li>
|
||
<li>
|
||
<p>Time-based and memory-based automatic invalidation of metadata to
|
||
keep the size of metadata bounded and to reduce the chances of
|
||
<codeph>catalogd</codeph> cache running out of memory.</p>
|
||
</li>
|
||
<li>
|
||
<p>Automatic invalidation of metadata</p>
|
||
<p>With automatic metadata management enabled, you no longer have to
|
||
issue <codeph>INVALIDATE METADATA</codeph> / <codeph>REFRESH</codeph> in a
|
||
number of conditions.</p>
|
||
<p>In Impala 3.3, the following additional event in Hive Metastore
|
||
can trigger an automatic <codeph>INVALIDATE METADATA</codeph> / <codeph>REFRESH</codeph>:</p>
|
||
<ul>
|
||
<li>
|
||
<p>INSERT into tables and partitions from Impala or from Spark
|
||
on the same or multiple cluster configuration</p>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
<p>See <xref href="impala_metadata.xml#impala_metadata"/> for the
|
||
information on the above features.</p>
|
||
</section>
|
||
<section id="section_ztf_c4q_s3b">
|
||
<title>Scalable Pool Configuration in Admission Controller</title>
|
||
<p>To offer more dynamic and flexible resource management, Impala
|
||
supports new configuration parameters that scale with the number
|
||
of hosts in the resource pool. You can use the parameters to control
|
||
the number of running queries, queued queries, and maximum amount of
|
||
memory allocated for Impala resource pools. See <xref
|
||
href="impala_admission.xml#admission_control"/> for the information
|
||
about the new parameters and using them for admission control.</p>
|
||
</section>
|
||
<section id="section_b55_gxj_s3b">
|
||
<title>Query Profile</title>
|
||
<p>The following information was added to the Query Profile output for
|
||
better monitoring and troubleshooting of query performance.</p>
|
||
<ul>
|
||
<li>
|
||
<p>Network I/O throughput</p>
|
||
</li>
|
||
<li>
|
||
<p>System disk I/O throughput</p>
|
||
</li>
|
||
</ul>
|
||
<p>See <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_explain_plan.html"
|
||
format="html" scope="external">Impala Query Profile</xref> for
|
||
generating and reading query profiles.</p>
|
||
</section>
|
||
<section id="section_lbh_kzj_s3b">
|
||
<title>DATE Data Type and Functions</title>
|
||
<p>You can use the new DATE type to describe a particular
|
||
year/month/day, in the form YYYY-MM-DD.</p>
|
||
<p>This initial DATE type supports the TEXT, Parquet, and HBASE file
|
||
formats.</p>
|
||
<p>Support for the DATE data type includes the following features:</p>
|
||
<ul>
|
||
<li><codeph>DATE</codeph> type column as a partitioning key
|
||
column</li>
|
||
<li><codeph>DATE</codeph> literal</li>
|
||
<li>Implicit casting between <codeph>DATE</codeph> and other types:
|
||
<codeph>STRING</codeph> and <codeph>TIMESTAMP</codeph></li>
|
||
<li>Most of the built-in functions for <codeph>TIMESTAMP</codeph> now
|
||
allow the <codeph>DATE</codeph> type arguments, as well.</li>
|
||
</ul>
|
||
<p>See <xref href="impala_date.xml#date"/> and <xref
|
||
href="impala_datetime_functions.xml#datetime_functions"/> for using
|
||
the DATE type.</p>
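<p>The features listed above can be sketched as follows. The table
<codeph>events</codeph> is hypothetical, and <codeph>year()</codeph> is
used here on the assumption that it is among the
<codeph>TIMESTAMP</codeph> functions that accept <codeph>DATE</codeph>
arguments:</p>
<codeblock>-- DATE column used as a partitioning key column (hypothetical table).
CREATE TABLE events (id BIGINT, note STRING)
  PARTITIONED BY (event_date DATE)
  STORED AS PARQUET;

-- DATE literal in the form DATE 'YYYY-MM-DD'.
INSERT INTO events PARTITION (event_date = DATE '2019-08-01')
  VALUES (1, 'first');

SELECT id FROM events WHERE event_date = DATE '2019-08-01';

-- Casting from STRING, and a date/time function applied to a DATE value.
SELECT CAST('2019-08-01' AS DATE), year(DATE '2019-08-01');</codeblock>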
|
||
</section>
|
||
<section id="section_wpm_zzj_s3b">
|
||
<title>Support Hive Insert-Only Transactional Tables</title>
|
||
<p>Impala can now create, drop, query, and insert into the
|
||
insert-only type of transactional tables. </p>
|
||
|
||
<p>See <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_transactions.html"
|
||
format="html" scope="external">Impala Transactions</xref> for
|
||
details.</p>
|
||
</section>
|
||
<section id="section_ab2_41k_s3b">
|
||
<title>HiveServer2 HTTP Connection for Clients</title>
|
||
<p>Now client applications can connect to Impala over HTTP via
|
||
HiveServer2, with the option to use Kerberos SPNEGO and LDAP for
|
||
authentication. See <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_client.html"
|
||
format="html" scope="external">Impala Clients</xref> for
|
||
details.</p>
|
||
</section>
|
||
<section id="section_xxt_44q_s3b">
|
||
<title>Default File Format Changed to Parquet</title>
|
||
<p>When you create a table, the default format for that table data is
|
||
now Parquet.</p>
|
||
<p>For backward compatibility, you can use the
|
||
<codeph>DEFAULT_FILE_FORMAT</codeph> query option to set the default
|
||
file format to the previous default, text, or other formats.</p>
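<p>For instance, a session that depends on the earlier behavior could be
set up as sketched below. The table names are hypothetical:</p>
<codeblock>-- Restore the previous default for this session.
SET DEFAULT_FILE_FORMAT=TEXT;

-- Created as a text table because no STORED AS clause is given.
CREATE TABLE legacy_text (id INT, s STRING);

-- An explicit STORED AS clause still selects the format directly.
CREATE TABLE explicit_parquet (id INT) STORED AS PARQUET;</codeblock>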
|
||
</section>
|
||
<section id="section_m1h_mnf_t3b">
|
||
<title>Built-in Function to Process JSON Objects</title>
|
||
<p>The <codeph>GET_JSON_OBJECT()</codeph> function extracts a JSON object
|
||
from a string based on the path specified and returns the extracted
|
||
JSON object.</p>
|
||
<p>See <xref href="impala_misc_functions.xml#misc_functions">Impala
|
||
Miscellaneous Functions</xref> for details.</p>
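<p>A short sketch of the call, using an inline JSON string and a path
expression. The sample document is made up for illustration:</p>
<codeblock>-- Extract a nested field by path.
SELECT get_json_object('{"user": {"name": "ann", "tags": ["a", "b"]}}',
                       '$.user.name');

-- Extract an array element by index.
SELECT get_json_object('{"user": {"name": "ann", "tags": ["a", "b"]}}',
                       '$.user.tags[0]');</codeblock>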
|
||
</section>
|
||
<section id="section_acs_wck_s3b">
|
||
<title>Ubuntu 18.04</title>
|
||
<p>This version of Impala is certified to run on Ubuntu 18.04.</p>
|
||
</section>
|
||
</conbody>
|
||
</concept>
|
||
<concept rev="3.2.0" id="new_features_32">
|
||
<title>New Features in <keyword keyref="impala32"/></title>
|
||
<conbody>
|
||
<p> The following sections describe the noteworthy improvements made in
|
||
<keyword keyref="impala32"/>. </p>
|
||
<p> For the full list of issues closed in this release, see the <xref
|
||
keyref="changelog_32">changelog for <keyword keyref="impala32"
|
||
/></xref>. </p>
|
||
</conbody>
|
||
<concept id="rn_32_multi_cluster">
|
||
<title>Multi-cluster Support</title>
|
||
<conbody>
|
||
<ul>
|
||
<li dir="ltr">Remote File Handle Cache<p>Impala can now cache remote
|
||
HDFS file handles when the
|
||
<codeph>cache_remote_file_handles</codeph> impalad flag is set
|
||
to <codeph>true</codeph>. This feature does not apply to non-HDFS
|
||
tables, such as Kudu or HBase tables, and does not apply to the
|
||
tables that store their data on cloud services, such as S3 or
|
||
ADLS. See <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_scalability.html"
|
||
format="html" scope="external">Scalability Considerations</xref>
|
||
for file handle caching in Impala.</p></li>
|
||
</ul>
|
||
</conbody>
|
||
</concept>
|
||
<concept id="rn_32_ac">
|
||
<title>Enhancements in Resource Management and Admission Control</title>
|
||
<conbody>
|
||
<ul>
|
||
<li>Admission Debug page is available in <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_webui.html"
|
||
format="html" scope="external">Impala Daemon (impalad) web
|
||
UI</xref> at <codeph>/admission</codeph> and provides the
|
||
following information about Impala resource pools:<ul>
|
||
<li>Pool configuration</li>
|
||
<li>Relevant pool stats</li>
|
||
<li>Queued queries in order of being queued (local to the
|
||
coordinator)</li>
|
||
<li>Running queries (local to this coordinator)</li>
|
||
<li>Histogram of the distribution of peak memory usage by admitted
|
||
queries</li>
|
||
</ul></li>
|
||
</ul>
|
||
<ul>
|
||
<li>A new query option, <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_num_rows_produced_limit.html"
|
||
format="html" scope="external">NUM_ROWS_PRODUCED_LIMIT</xref>, was
|
||
added to limit the number of rows returned from queries.<p>Impala
|
||
will cancel a query if the query produces more rows than the limit
|
||
specified by this query option. The limit applies only when the
|
||
results are returned to a client, e.g. for a
|
||
<codeph>SELECT</codeph> query, but not an
|
||
<codeph>INSERT</codeph> query. This query option is a guardrail
|
||
against users accidentally submitting queries that return a large
|
||
number of rows.</p></li>
|
||
</ul>
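<p>The row limit described above can be sketched for a session as follows.
The table name <codeph>web_logs</codeph> and the limit value are
hypothetical:</p>
<codeblock>-- Cancel any query in this session that returns more than 1000 rows to the client.
SET NUM_ROWS_PRODUCED_LIMIT=1000;

-- Subject to the limit because the rows are returned to the client.
SELECT * FROM web_logs;</codeblock>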
|
||
</conbody>
|
||
</concept>
|
||
<concept id="rn_32_metadata">
|
||
<title>Metadata Performance Improvements</title>
|
||
<conbody>
|
||
<ul>
|
||
<li><xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_metadata.html"
|
||
format="html" scope="external">Automatic Metadata Sync using Hive
|
||
Metastore Notification Events</xref><p>When enabled, the
|
||
<codeph>catalogd</codeph> polls Hive Metastore (HMS)
|
||
notification events at a configurable interval and syncs with
|
||
HMS. You can use the new web UI pages of the
|
||
<codeph>catalogd</codeph> to check the state of the automatic
|
||
invalidate event processor. </p><p><b>Note</b>: This is a preview
|
||
feature in <keyword keyref="impala32">Impala
|
||
3.2</keyword>.</p></li>
|
||
</ul>
|
||
</conbody>
|
||
</concept>
|
||
<concept id="rn_32_usability">
|
||
<title>Compatibility and Usability Enhancements</title>
|
||
<conbody>
|
||
<ul>
|
||
<li>Impala can now read the <codeph>TIMESTAMP_MILLIS</codeph> and
|
||
<codeph>TIMESTAMP_MICROS</codeph> Parquet types. See <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_parquet.html"
|
||
format="html" scope="external">Using Parquet File Format for
|
||
Impala Tables</xref> for the Parquet support in Impala.</li>
|
||
<li>Impala can now read the complex types in ORC such as ARRAY,
|
||
STRUCT, and MAP. See <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_orc.html"
|
||
format="html" scope="external">Using ORC File Format for Impala
|
||
Tables</xref> for the ORC support in Impala.</li>
|
||
<li>The <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_string_functions.html"
|
||
format="html" scope="external">LEVENSHTEIN</xref> string function
|
||
is supported.<p>The function returns the Levenshtein distance
|
||
between two input strings, the minimum number of single-character
|
||
edits required to transform one string into the other.</p></li>
|
||
<li>The <codeph>IF NOT EXISTS</codeph> clause is supported in the
|
||
<xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_alter_table.html"
|
||
format="html" scope="external"><codeph>ALTER TABLE</codeph></xref>
|
||
statement.</li>
|
||
<li>The new <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_default_file_format.html"
|
||
format="html" scope="external"
|
||
><codeph>DEFAULT_FILE_FORMAT</codeph></xref> query option allows
|
||
you to set the default table file format. This removes the need for
|
||
the <codeph>STORED AS &lt;format&gt;</codeph> clause. Set this option
|
||
if you prefer a value that is not <codeph>TEXT</codeph>. The
|
||
supported formats are: <ul>
|
||
<li><codeph>TEXT</codeph></li>
|
||
<li><codeph>RC_FILE</codeph></li>
|
||
<li><codeph>SEQUENCE_FILE</codeph></li>
|
||
<li><codeph>AVRO</codeph></li>
|
||
<li><codeph>PARQUET</codeph></li>
|
||
<li><codeph>KUDU</codeph></li>
|
||
<li><codeph>ORC</codeph></li>
|
||
</ul></li>
|
||
<li>The extended or verbose <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_explain.html"
|
||
format="html" scope="external"><codeph>EXPLAIN</codeph></xref>
|
||
output includes the following new information for queries:<ul>
|
||
<li>The text of the analyzed query that may have been rewritten to
|
||
include various optimizations and implicit casts. </li>
|
||
<li>The implicit casts and literals shown with the actual
|
||
types.</li>
|
||
</ul></li>
|
||
<li>CPU resource utilization (user, system, iowait) metrics were added
|
||
to the <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_explain_plan.html"
|
||
format="html" scope="external">Impala profile</xref> output.</li>
|
||
</ul>
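<p>Several of the enhancements listed above can be tried with statements
such as the following sketch. The table <codeph>t1</codeph> and column
<codeph>c2</codeph> are hypothetical, and the placement of
<codeph>IF NOT EXISTS</codeph> in the <codeph>ADD COLUMNS</codeph> clause
is an assumption to verify against the <codeph>ALTER TABLE</codeph>
documentation:</p>
<codeblock>-- Three single-character edits separate these two strings.
SELECT levenshtein('kitten', 'sitting');

-- Add a column only if it is not already present (assumed clause placement).
ALTER TABLE t1 ADD IF NOT EXISTS COLUMNS (c2 STRING);

-- Request extended EXPLAIN output, which includes the analyzed query text.
SET EXPLAIN_LEVEL=EXTENDED;
EXPLAIN SELECT count(*) FROM t1 WHERE c2 = 'x';</codeblock>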
|
||
</conbody>
|
||
</concept>
|
||
<concept id="rn_32_security">
|
||
<title>Security Enhancement</title>
|
||
<conbody>
|
||
<ul>
|
||
<li>The <xref
|
||
href="https://impala.apache.org/docs/build/html/topics/impala_refresh_authorization.html"
|
||
format="html" scope="external">REFRESH AUTHORIZATION</xref>
|
||
statement was implemented for refreshing authorization data.</li>
|
||
</ul>
|
||
</conbody>
|
||
</concept>
|
||
</concept>
|
||
<!-- All 3.1.x new features go under here -->
|
||
<concept rev="3.1.0" id="new_features_31">
|
||
<title>New Features in <keyword keyref="impala31"/></title>
|
||
<conbody>
|
||
<p> For the full list of issues closed in this release, including the
|
||
issues marked as <q>new features</q> or <q>improvements</q>, see the
|
||
<xref keyref="changelog_31">changelog for <keyword keyref="impala31"
|
||
/></xref>. </p>
|
||
</conbody>
|
||
</concept>
|
||
|
||
<!-- All 3.0.x new features go under here -->
|
||
<concept rev="3.0.0" id="new_features_300">
|
||
<title>New Features in <keyword keyref="impala30"/></title>
|
||
<conbody>
|
||
<p>
|
||
For the full list of issues closed in this release, including the
|
||
issues marked as <q>new features</q> or <q>improvements</q>, see the
|
||
<xref keyref="changelog_300">changelog for <keyword keyref="impala30"
|
||
/></xref>.
|
||
</p>
|
||
</conbody>
|
||
</concept>
|
||
|
||
<!-- All 2.12.x new features go under here -->
|
||
|
||
<concept rev="2.12.0" id="new_features_2120">
|
||
|
||
<title>New Features in <keyword keyref="impala212_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
For the full list of issues closed in this release, including the issues
|
||
marked as <q>new features</q> or <q>improvements</q>, see the
|
||
<xref keyref="changelog_212">changelog for <keyword keyref="impala212"/></xref>.
|
||
</p>
|
||
|
||
</conbody>
|
||
</concept>
|
||
|
||
<!-- All 2.11.x new features go under here -->
|
||
|
||
<concept rev="2.11.0" id="new_features_2110">
|
||
|
||
<title>New Features in <keyword keyref="impala211_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
For the full list of issues closed in this release, including the issues
|
||
marked as <q>new features</q> or <q>improvements</q>, see the
|
||
<xref keyref="changelog_211">changelog for <keyword keyref="impala211"/></xref>.
|
||
</p>
|
||
|
||
</conbody>
|
||
</concept>
|
||
|
||
<!-- All 2.10.x new features go under here -->
|
||
|
||
<concept rev="2.10.0" id="new_features_2100">
|
||
|
||
<title>New Features in <keyword keyref="impala210_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
For the full list of issues closed in this release, including the issues
|
||
marked as <q>new features</q> or <q>improvements</q>, see the
|
||
<xref keyref="changelog_210">changelog for <keyword keyref="impala210"/></xref>.
|
||
</p>
|
||
|
||
</conbody>
|
||
</concept>
|
||
|
||
<!-- All 2.9.x new features go under here -->
|
||
|
||
<concept rev="2.9.0" id="new_features_290">
|
||
|
||
<title>New Features in <keyword keyref="impala29_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
For the full list of issues closed in this release, including the issues
|
||
marked as <q>new features</q> or <q>improvements</q>, see the
|
||
<xref keyref="changelog_29">changelog for <keyword keyref="impala29"/></xref>.
|
||
</p>
|
||
|
||
<p>
|
||
The following are some of the most significant new features in this release:
|
||
</p>
|
||
|
||
<ul id="feature_list">
|
||
<li>
|
||
<p rev="IMPALA-4729">
|
||
A new function, <codeph>replace()</codeph>, which is faster than
|
||
<codeph>regexp_replace()</codeph> for simple string substitutions.
|
||
See <xref keyref="string_functions"/> for details.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="2.9.0 IMPALA-3807 IMPALA-5147 IMPALA-5503">
|
||
Startup flags for the <cmdname>impalad</cmdname> daemon, <codeph>is_executor</codeph>
|
||
and <codeph>is_coordinator</codeph>, let you divide the work on a large, busy cluster
|
||
between a small number of hosts acting as query coordinators, and a larger number of
|
||
hosts acting as query executors. By default, each host can act in both roles,
|
||
potentially introducing bottlenecks during heavily concurrent workloads.
|
||
See <xref keyref="scalability_coordinator"/> for details.
|
||
</p>
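<p>
For illustration only, the role split might be expressed with startup flags along these lines.
Other required <cmdname>impalad</cmdname> flags are omitted here, and the way you pass flags
depends on how your cluster is managed:
</p>
<codeblock># On the small set of hosts that act only as coordinators:
impalad --is_coordinator=true --is_executor=false
# On the larger set of hosts that act only as executors:
impalad --is_coordinator=false --is_executor=true
</codeblock>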
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
</concept>
|
||
|
||
<!-- All 2.8.x new features go under here -->
|
||
|
||
<concept rev="2.8.0" id="new_features_280">
|
||
|
||
<title>New Features in <keyword keyref="impala28_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<ul id="feature_list">
|
||
<li>
|
||
<p>
|
||
Performance and scalability improvements:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p rev="IMPALA-4572">
|
||
The <codeph>COMPUTE STATS</codeph> statement can
|
||
take advantage of multithreading.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4135">
|
||
Improved scalability for highly concurrent loads by reducing the possibility of TCP/IP timeouts.
|
||
A configuration setting, <codeph>accepted_cnxn_queue_depth</codeph>, can be adjusted upwards to
|
||
avoid this type of timeout on large clusters.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Several performance improvements were made to the mechanism for generating native code:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p rev="IMPALA-3638">
|
||
Some queries involving analytic functions can take better advantage of native code generation.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4008">
|
||
Modules produced during intermediate code generation are organized
|
||
to be easier to cache and reuse during the lifetime of a long-running or complicated query.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4397 IMPALA-1430">
|
||
The <codeph>COMPUTE STATS</codeph> statement is more efficient
|
||
(less time for the codegen phase) for tables with a large number
|
||
of columns, especially for tables containing <codeph>TIMESTAMP</codeph>
|
||
columns.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3838 IMPALA-4495">
|
||
The logic for determining whether or not to use a runtime filter is more reliable, and the
|
||
evaluation process itself is faster because of native code generation.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3902">
|
||
The <codeph>MT_DOP</codeph> query option enables
|
||
multithreading for a number of Impala operations.
|
||
<codeph>COMPUTE STATS</codeph> statements for Parquet tables
|
||
use a default of <codeph>MT_DOP=4</codeph> to improve the
|
||
intra-node parallelism and CPU efficiency of this data-intensive
|
||
operation.
|
||
</p>
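<p>
A minimal sketch of setting the option explicitly for a session. The table name
<codeph>web_logs</codeph> is hypothetical, and the best degree of parallelism depends on
the number of cores available on each host:
</p>
<codeblock>-- Request 4 threads per host for statements that support multithreading.
SET MT_DOP=4;
COMPUTE STATS web_logs;
</codeblock>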
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4397">
|
||
The <codeph>COMPUTE STATS</codeph> statement is more efficient
|
||
(less time for the codegen phase) for tables with a large number
|
||
of columns.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2521">
|
||
A new hint, <codeph>CLUSTERED</codeph>,
|
||
allows Impala <codeph>INSERT</codeph> operations on a Parquet table
|
||
that use dynamic partitioning to process a high number of
|
||
partitions in a single statement. The data is ordered based on the
|
||
partition key columns, and each partition is only written
|
||
by a single host, reducing the amount of memory needed to buffer
|
||
Parquet data while the data blocks are being constructed.
|
||
</p>
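<p>
The following sketch shows where the hint might appear in a dynamic partition insert.
The table and column names are hypothetical:
</p>
<codeblock>-- The partition key columns (year, month) come last in the select list.
INSERT INTO sales_parquet PARTITION (year, month) /* +CLUSTERED */
  SELECT id, amount, year, month FROM sales_staging;
</codeblock>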
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3552">
|
||
The new configuration setting <codeph>inc_stats_size_limit_bytes</codeph>
|
||
lets you reduce the load on the catalog server when running the
|
||
<codeph>COMPUTE INCREMENTAL STATS</codeph> statement for very large tables.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1788">
|
||
Impala folds many constant expressions within query statements,
|
||
rather than evaluating them for each row. This optimization
|
||
is especially useful when using functions to manipulate and
|
||
format <codeph>TIMESTAMP</codeph> values, for example in
an expression such as <codeph>to_date(now() - interval 1 day)</codeph>.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4529">
|
||
Parsing of complicated expressions is faster. This speedup is
|
||
especially useful for queries containing large <codeph>CASE</codeph>
|
||
expressions.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4302">
|
||
Evaluation is faster for <codeph>IN</codeph> operators with many constant
|
||
arguments. The same performance improvement applies to other functions
|
||
with many constant arguments.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1286">
|
||
Impala optimizes identical comparison operators within multiple <codeph>OR</codeph>
|
||
blocks.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4193 IMPALA-3342">
|
||
The reporting for wall-clock times and total CPU time in profile output is more accurate.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3671">
|
||
A new query option, <codeph>SCRATCH_LIMIT</codeph>, lets you restrict the amount of
|
||
space used when a query exceeds the memory limit and activates the <q>spill to disk</q> mechanism.
|
||
This option helps to avoid runaway queries or make queries <q>fail fast</q> if they require more
|
||
memory than anticipated. You can prevent runaway queries from using excessive amounts of spill space,
|
||
without restarting the cluster to turn the spilling feature off entirely.
|
||
See <xref href="impala_scratch_limit.xml#scratch_limit"/> for details.
|
||
</p>
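<p>
For example, a session might cap its spill space as follows. The 10 GB figure is an
arbitrary illustration, not a recommended value:
</p>
<codeblock>-- Queries in this session fail rather than spill more than 10 GB to disk.
SET SCRATCH_LIMIT=10g;
</codeblock>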
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Integration with Apache Kudu:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p rev="">
|
||
The experimental Impala support for the Kudu storage layer has been folded
|
||
into the main Impala development branch. Impala can now directly access Kudu tables,
|
||
opening up new capabilities such as enhanced DML operations and continuous ingestion.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="">
|
||
The <codeph>DELETE</codeph> statement is a flexible way to remove data from a Kudu table. Previously,
|
||
removing data from an Impala table involved removing or rewriting the underlying data files, dropping entire partitions,
|
||
or rewriting the entire table. This Impala statement only works for Kudu tables.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="">
|
||
The <codeph>UPDATE</codeph> statement is a flexible way to modify data within a Kudu table. Previously,
|
||
updating data in an Impala table involved replacing the underlying data files, dropping entire partitions,
|
||
or rewriting the entire table. This Impala statement only works for Kudu tables.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3725">
|
||
The <codeph>UPSERT</codeph> statement is a flexible way to ingest data, modify data, or both within a Kudu table. Previously,
|
||
ingesting data that might contain duplicates involved an inefficient multi-stage operation, and there was no
|
||
built-in protection against duplicate data. The <codeph>UPSERT</codeph> statement, in combination with
|
||
the primary key designation for Kudu tables, lets you add or replace rows in a single operation, and
|
||
automatically avoids creating any duplicate data.
|
||
</p>
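<p>
The three Kudu DML statements can be sketched as follows, assuming a hypothetical Kudu table
<codeph>kudu_events</codeph> with columns <codeph>id</codeph> (the primary key),
<codeph>name</codeph>, and <codeph>status</codeph>:
</p>
<codeblock>-- Inserts the row, or replaces it if a row with id = 1 already exists.
UPSERT INTO kudu_events (id, name, status) VALUES (1, 'signup', 'new');
-- Modifies matching rows in place.
UPDATE kudu_events SET status = 'done' WHERE id = 1;
-- Removes matching rows without rewriting any data files.
DELETE FROM kudu_events WHERE status = 'done';
</codeblock>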
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3719 IMPALA-3726">
|
||
The <codeph>CREATE TABLE</codeph> statement gains some new clauses that are specific to Kudu tables:
|
||
<codeph>PARTITION BY</codeph>, <codeph>PARTITIONS</codeph>, <codeph>STORED AS KUDU</codeph>, and column
|
||
attributes <codeph>PRIMARY KEY</codeph>, <codeph>NULL</codeph> and <codeph>NOT NULL</codeph>,
|
||
<codeph>ENCODING</codeph>, <codeph>COMPRESSION</codeph>, <codeph>DEFAULT</codeph>, and <codeph>BLOCK_SIZE</codeph>.
|
||
These clauses replace the explicit <codeph>TBLPROPERTIES</codeph> settings that were required in the
|
||
early experimental phases of integration between Impala and Kudu.
|
||
</p>
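<p>
A minimal sketch combining several of these clauses. The table name, column names, and
the number of partitions are hypothetical:
</p>
<codeblock>CREATE TABLE kudu_events (
  id BIGINT PRIMARY KEY,
  name STRING NOT NULL,
  status STRING NULL DEFAULT 'new'
)
PARTITION BY HASH (id) PARTITIONS 4
STORED AS KUDU;
</codeblock>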
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2890">
|
||
The <codeph>ALTER TABLE</codeph> statement can change certain attributes of Kudu tables.
|
||
You can add, drop, or rename columns.
|
||
You can add or drop range partitions.
|
||
You can change the <codeph>TBLPROPERTIES</codeph> value to rename or point to a different underlying Kudu table,
|
||
independently from the Impala table name in the metastore database.
|
||
You cannot change the data type of an existing column in a Kudu table.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4403">
|
||
The <codeph>SHOW PARTITIONS</codeph> statement displays information about the distribution of data
|
||
between partitions in Kudu tables. A new variation, <codeph>SHOW RANGE PARTITIONS</codeph>,
|
||
displays information about the Kudu-specific partitions that apply across ranges of key values.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4379">
|
||
Not all Impala data types are supported in Kudu tables. In particular, currently the Impala
|
||
<codeph>TIMESTAMP</codeph> type is not allowed in a Kudu table. Impala does not recognize the
|
||
<codeph>UNIXTIME_MICROS</codeph> Kudu type when it is present in a Kudu table. (These two
|
||
representations of date/time data use different units and are not directly compatible.)
|
||
You cannot create columns of type <codeph>TIMESTAMP</codeph>, <codeph>DECIMAL</codeph>,
|
||
<codeph>VARCHAR</codeph>, or <codeph>CHAR</codeph> within a Kudu table. Within a query, you can
|
||
cast values in a result set to these types. Certain types, such as <codeph>BOOLEAN</codeph>,
|
||
cannot be used as primary key columns.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="">
|
||
Currently, Kudu tables are not interchangeable between Impala and Hive the way other kinds of Impala tables are.
|
||
Although the metadata for Kudu tables is stored in the metastore database, currently Hive cannot access Kudu tables.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="">
|
||
The <codeph>INSERT</codeph> statement works for Kudu tables. The organization
|
||
of the Kudu data makes it more efficient than with HDFS-backed tables to insert
|
||
data in small batches, such as with the <codeph>INSERT ... VALUES</codeph> syntax.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4283">
|
||
Some audit data is recorded for data governance purposes.
|
||
All <codeph>UPDATE</codeph>, <codeph>DELETE</codeph>, and <codeph>UPSERT</codeph> statements are characterized
|
||
as <codeph>INSERT</codeph> operations in the audit log. Currently, lineage metadata is not generated for
|
||
<codeph>UPDATE</codeph> and <codeph>DELETE</codeph> operations on Kudu tables.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4000">
|
||
Currently, Kudu tables have limited support for Sentry:
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
Access to Kudu tables must be granted to roles as usual.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Currently, access to a Kudu table through Sentry is <q>all or nothing</q>.
|
||
You cannot enforce finer-grained permissions such as at the column level,
|
||
or permissions on certain operations such as <codeph>INSERT</codeph>.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Only users with <codeph>ALL</codeph> privileges on <codeph>SERVER</codeph> can create external Kudu tables.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
Because non-SQL APIs can access Kudu data without going through Sentry
|
||
authorization, currently the Sentry support is considered preliminary.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4571">
|
||
Equality and <codeph>IN</codeph> predicates in Impala queries are pushed to
|
||
Kudu and evaluated efficiently by the Kudu storage layer.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
<li>
|
||
<p rev="">
|
||
<b>Security:</b>
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
Impala can take advantage of the S3 encrypted credential
|
||
store, to avoid exposing the secret key when accessing
|
||
data stored on S3.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1654">
|
||
[<xref keyref="IMPALA-1654">IMPALA-1654</xref>]
|
||
Several kinds of DDL operations
|
||
can now work on a range of partitions. The partitions can be specified
|
||
using operators such as <codeph>&lt;</codeph>, <codeph>&gt;=</codeph>, and
|
||
<codeph>!=</codeph> rather than just an equality predicate applying to a single
|
||
partition.
|
||
This new feature extends the syntax of several clauses
|
||
of the <codeph>ALTER TABLE</codeph> statement
|
||
(<codeph>DROP PARTITION</codeph>, <codeph>SET [UN]CACHED</codeph>,
|
||
<codeph>SET FILEFORMAT | SERDEPROPERTIES | TBLPROPERTIES</codeph>),
|
||
the <codeph>SHOW FILES</codeph> statement, and the
|
||
<codeph>COMPUTE INCREMENTAL STATS</codeph> statement.
|
||
It does not apply to statements that are defined to only apply to a single
|
||
partition, such as <codeph>LOAD DATA</codeph>, <codeph>ALTER TABLE ... ADD PARTITION</codeph>,
|
||
<codeph>SET LOCATION</codeph>, and <codeph>INSERT</codeph> with a static
|
||
partitioning clause.
|
||
</p>
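<p>
For example, with a hypothetical table <codeph>logs</codeph> partitioned by a
<codeph>year</codeph> column, each of the following statements might act on several
partitions at once:
</p>
<codeblock>-- Drop every partition older than 2010.
ALTER TABLE logs DROP PARTITION (year &lt; 2010);
-- List the data files for recent partitions only.
SHOW FILES IN logs PARTITION (year &gt;= 2015);
-- Compute incremental stats for all partitions except one.
COMPUTE INCREMENTAL STATS logs PARTITION (year != 2016);
</codeblock>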
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3973">
|
||
The <codeph>instr()</codeph> function accepts optional third and fourth arguments, representing
the character position at which to begin searching for the substring, and the Nth occurrence
of the substring to find.
|
||
</p>
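<p>
The following calls illustrate the optional position and occurrence arguments. The string
literal is arbitrary. Positions are counted starting from 1:
</p>
<codeblock>-- First occurrence of 'b': returns 5.
SELECT instr('foo bar bletch', 'b');
-- Start searching at position 7: returns 9.
SELECT instr('foo bar bletch', 'b', 7);
-- Second occurrence, searching from position 1: returns 9.
SELECT instr('foo bar bletch', 'b', 1, 2);
</codeblock>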
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3441 IMPALA-4387">
|
||
Improved error handling for malformed Avro data. In particular, incorrect
|
||
precision or scale for <codeph>DECIMAL</codeph> types is now handled.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Impala debug web UI:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p rev="IMPALA-1169">
|
||
In addition to <q>inflight</q> and <q>finished</q> queries, the web UI
|
||
now also includes a section for <q>queued</q> queries.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4048">
|
||
The <uicontrol>/sessions</uicontrol> tab now clarifies how many of the displayed
|
||
sessions are active, and lets you sort by <uicontrol>Expired</uicontrol> status
|
||
to distinguish active sessions from expired ones.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-4020">
|
||
Improved stability when DDL operations such as <codeph>CREATE DATABASE</codeph>
|
||
or <codeph>DROP DATABASE</codeph> are run in Hive at the same time as an Impala
|
||
<codeph>INVALIDATE METADATA</codeph> statement.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1616">
|
||
The <q>out of memory</q> error report was made more user-friendly, with additional
|
||
diagnostic information to help identify the spot where the memory limit was exceeded.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3983 IMPALA-3974">
|
||
Improved disk space usage for Java-based UDFs. Temporary copies of the associated JAR
|
||
files are removed when no longer needed, so that they do not accumulate across restarts
|
||
of the <cmdname>catalogd</cmdname> daemon and potentially cause an out-of-space condition.
|
||
These temporary files are also created in the directory specified by the <codeph>local_library_dir</codeph>
|
||
configuration setting, so that the storage for these temporary files can be independent
|
||
from any capacity limits on the <filepath>/tmp</filepath> filesystem.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
</concept>
|
||
|
||
<!-- All 2.7.x new features go under here -->
|
||
|
||
<concept rev="2.7.0" id="new_features_270">
|
||
|
||
<title>New Features in <keyword keyref="impala27_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<ul id="feature_list">
|
||
<li>
|
||
<p>
|
||
Performance improvements:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p rev="IMPALA-3206">
|
||
[<xref keyref="IMPALA-3206">IMPALA-3206</xref>]
|
||
Speedup for queries against <codeph>DECIMAL</codeph> columns in Avro tables.
|
||
The code that parses <codeph>DECIMAL</codeph> values from Avro now uses
|
||
native code generation.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3674">
|
||
[<xref keyref="IMPALA-3674">IMPALA-3674</xref>]
|
||
Improved efficiency in LLVM code generation can reduce codegen time, especially
|
||
for short queries.
|
||
</p>
|
||
</li>
|
||
<!-- Not actually a new feature, it's more a tip about when to expect remote reads and how to minimize them. To go somewhere in the performance / best practices / Parquet info.
|
||
<li>
|
||
<p rev="IMPALA-3885">
|
||
[<xref keyref="IMPALA-3885">IMPALA-3885</xref>]
|
||
Parquet files with multiple blocks can now be processed
|
||
without remote reads.
|
||
</p>
|
||
</li>
|
||
-->
|
||
<li>
|
||
<p rev="IMPALA-2979">
|
||
[<xref keyref="IMPALA-2979">IMPALA-2979</xref>]
|
||
Improvements to scheduling on worker nodes,
|
||
enabled by the <codeph>REPLICA_PREFERENCE</codeph> query option.
|
||
See <xref
|
||
href="impala_replica_preference.xml#replica_preference"/> for details.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
<li audience="hidden">
|
||
<p rev="IMPALA-3210"><!-- Patch didn't make it into <keyword keyref="impala27_full"/> -->
|
||
[<xref keyref="IMPALA-3210">IMPALA-3210</xref>]
|
||
The analytic functions <codeph>FIRST_VALUE()</codeph> and <codeph>LAST_VALUE()</codeph>
|
||
accept a new clause, <codeph>IGNORE NULLS</codeph>.
|
||
See <xref href="impala_analytic_functions.xml#first_value"/>
|
||
and <xref href="impala_analytic_functions.xml#last_value"/>
|
||
for details.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1683">
|
||
[<xref keyref="IMPALA-1683">IMPALA-1683</xref>]
|
||
The <codeph>REFRESH</codeph> statement can be applied to a single partition,
|
||
rather than the entire table. See <xref href="impala_refresh.xml#refresh"/>
|
||
and <xref href="impala_partitioning.xml#partition_refresh"/> for details.
|
||
</p>
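<p>
For example, after new data files are added to one partition of a hypothetical table
<codeph>sales</codeph> partitioned by <codeph>year</codeph> and <codeph>month</codeph>,
you might refresh only that partition:
</p>
<codeblock>REFRESH sales PARTITION (year=2016, month=7);
</codeblock>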
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Improvements to the Impala web user interface:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p rev="IMPALA-2767">
|
||
[<xref keyref="IMPALA-2767">IMPALA-2767</xref>]
|
||
You can now force a session to expire by clicking a link in the web UI,
|
||
on the <uicontrol>/sessions</uicontrol> tab.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3715">
|
||
[<xref keyref="IMPALA-3715">IMPALA-3715</xref>]
|
||
The <uicontrol>/memz</uicontrol> tab includes more information about
|
||
Impala memory usage.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3716">
|
||
[<xref keyref="IMPALA-3716">IMPALA-3716</xref>]
|
||
The <uicontrol>Details</uicontrol> page for a query now includes
|
||
a <uicontrol>Memory</uicontrol> tab.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3499">
|
||
[<xref keyref="IMPALA-3499">IMPALA-3499</xref>]
|
||
Scalability improvements to the catalog server. Impala handles internal communication
|
||
more efficiently for tables with large numbers of columns and partitions, where the
|
||
size of the metadata exceeds 2 GiB.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3677">
|
||
[<xref keyref="IMPALA-3677">IMPALA-3677</xref>]
|
||
You can send a <codeph>SIGUSR1</codeph> signal to any Impala-related daemon to write a
|
||
Breakpad minidump. For advanced troubleshooting, you can now produce a minidump
|
||
without triggering a crash. See <xref href="impala_breakpad.xml#breakpad"/> for
|
||
details about the Breakpad minidump feature.
|
||
</p>
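<p>
On a Linux host, sending the signal might look like the following sketch. It assumes that
the <cmdname>pgrep</cmdname> utility is available and that the process is named
<codeph>impalad</codeph>. Adjust the process lookup for your environment:
</p>
<codeblock># Ask the oldest running impalad process to write a minidump without crashing.
kill -USR1 "$(pgrep -o -x impalad)"
</codeblock>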
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3687">
|
||
[<xref keyref="IMPALA-3687">IMPALA-3687</xref>]
|
||
The schema reconciliation rules for Avro tables have changed slightly
|
||
for <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> columns. Now, if
|
||
the definition of such a column is changed in the Avro schema file,
|
||
the column retains its <codeph>CHAR</codeph> or <codeph>VARCHAR</codeph>
|
||
type as specified in the SQL definition, but the column name and comment
|
||
from the Avro schema file take precedence.
|
||
See <xref href="impala_avro.xml#avro_create_table"/> for details about
|
||
column definitions in Avro tables.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3575">
|
||
[<xref keyref="IMPALA-3575">IMPALA-3575</xref>]
|
||
Some network
|
||
operations now have additional timeout and retry settings. The extra
|
||
configuration helps to avoid failed queries caused by transient network
|
||
problems, to avoid hangs when a sender or receiver fails in the
|
||
middle of a network transmission, and to make cancellation requests
|
||
more reliable despite network issues. </p>
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
</concept>
|
||
<!-- All 2.6.x new features go under here -->
|
||
|
||
<concept rev="2.6.0" id="new_features_260">
|
||
|
||
<title>New Features in <keyword keyref="impala26_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
Improvements to Impala support for the Amazon S3 filesystem:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p rev="IMPALA-1878">
|
||
Impala can now write to S3 tables through the <codeph>INSERT</codeph>
|
||
or <codeph>LOAD DATA</codeph> statements.
|
||
See <xref href="impala_s3.xml#s3"/> for general information about
|
||
using Impala with S3.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3452">
|
||
A new query option, <codeph>S3_SKIP_INSERT_STAGING</codeph>, lets you
|
||
trade off between fast <codeph>INSERT</codeph> performance and
|
||
slower <codeph>INSERT</codeph>s that are more consistent if a
|
||
problem occurs during the statement. The new behavior is enabled by default.
|
||
See <xref href="impala_s3_skip_insert_staging.xml#s3_skip_insert_staging"/> for details
|
||
about this option.
|
||
</p>
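<p>
For example, to choose the slower but more consistent behavior for the current session:
</p>
<codeblock>-- Write to a staging directory first, then move the files into place.
SET S3_SKIP_INSERT_STAGING=false;
</codeblock>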
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
<li>
|
||
<p rev="">
|
||
Performance improvements for the runtime filtering feature:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p rev="IMPALA-3333">
|
||
The default for the <codeph>RUNTIME_FILTER_MODE</codeph>
|
||
query option is changed to <codeph>GLOBAL</codeph> (the highest setting).
|
||
See <xref href="impala_runtime_filter_mode.xml#runtime_filter_mode"/> for
|
||
details about this option.
|
||
</p>
|
||
</li>
|
||
<li rev="IMPALA-3007">
|
||
<p>
|
||
The <codeph>RUNTIME_BLOOM_FILTER_SIZE</codeph> setting is now only used
|
||
as a fallback if statistics are not available; otherwise, Impala
|
||
uses the statistics to estimate the appropriate size to use for each filter.
|
||
See <xref href="impala_runtime_bloom_filter_size.xml#runtime_bloom_filter_size"/> for
|
||
details about this option.
|
||
</p>
|
||
</li>
|
||
<li rev="IMPALA-3480">
|
||
<p>
|
||
New query options <codeph>RUNTIME_FILTER_MIN_SIZE</codeph> and
|
||
<codeph>RUNTIME_FILTER_MAX_SIZE</codeph> let you fine-tune
|
||
the sizes of the Bloom filter structures used for runtime filtering.
|
||
If the filter size derived from Impala internal estimates or from
|
||
the <codeph>RUNTIME_BLOOM_FILTER_SIZE</codeph> setting falls outside the size
|
||
range specified by these options, any too-small filter size is adjusted
|
||
to the minimum, and any too-large filter size is adjusted to the maximum.
|
||
See <xref href="impala_runtime_filter_min_size.xml#runtime_filter_min_size"/>
|
||
and <xref href="impala_runtime_filter_max_size.xml#runtime_filter_max_size"/>
|
||
for details about these options.
|
||
</p>
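<p>
As an illustration of the clamping behavior, the following sketch sets the bounds in bytes.
The specific values are examples only:
</p>
<codeblock>-- Lower bound of 1 MiB and upper bound of 16 MiB for each Bloom filter.
SET RUNTIME_FILTER_MIN_SIZE=1048576;
SET RUNTIME_FILTER_MAX_SIZE=16777216;
-- With these settings, a filter estimated at 64 KiB is sized at 1 MiB,
-- and a filter estimated at 64 MiB is sized at 16 MiB.
</codeblock>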
|
||
</li>
|
||
<li rev="IMPALA-2956">
|
||
<p>
|
||
Runtime filter propagation now applies to all the
|
||
operands of <codeph>UNION</codeph> and <codeph>UNION ALL</codeph>
|
||
operators.
|
||
</p>
|
||
</li>
|
||
<li rev="IMPALA-3077">
|
||
<p>
|
||
Runtime filters can now be produced during join queries even
|
||
when the join processing activates the spill-to-disk mechanism.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
See <xref href="impala_runtime_filtering.xml#runtime_filtering"/> for
|
||
general information about the runtime filtering feature.
|
||
</li>
|
||
<!-- Have to look closer at resource management / admission control to see if
|
||
there are any ripple effects from this default change. -->
|
||
<li>
|
||
<p rev="IMPALA-3199">
|
||
Admission control and dynamic resource pools are enabled by default.
|
||
See <xref href="impala_admission.xml#admission_control"/> for details
|
||
about admission control.
|
||
</p>
|
||
</li>
|
||
<!-- Below here are features that are pretty well taken care of already;
|
||
some of them didn't need much if any doc in the first place. -->
|
||
<li>
|
||
<p rev="IMPALA-3369">
|
||
Impala can now manually set column statistics,
|
||
using the <codeph>ALTER TABLE</codeph> statement with a
|
||
<codeph>SET COLUMN STATS</codeph> clause.
|
||
See <xref href="impala_perf_stats.xml#perf_column_stats_manual"/> for details.
|
||
</p>
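<p>
A minimal sketch, assuming a hypothetical table <codeph>customers</codeph> with a column
<codeph>country</codeph>. The statistic values are supplied as quoted strings:
</p>
<codeblock>-- Record 200 distinct values and no NULLs for the country column.
ALTER TABLE customers SET COLUMN STATS country ('numDVs'='200','numNulls'='0');
-- Confirm the new values.
SHOW COLUMN STATS customers;
</codeblock>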
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3490 IMPALA-3581 IMPALA-2686">
|
||
Impala can now write lightweight <q>minidump</q> files, rather
|
||
than large core files, to save diagnostic information when
|
||
any of the Impala-related daemons crash. This feature uses the
|
||
open source <codeph>breakpad</codeph> framework.
|
||
See <xref href="impala_breakpad.xml#breakpad"/> for details.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
New query options improve interoperability with Parquet files:
|
||
<ul>
|
||
<li>
|
||
<p rev="IMPALA-2835">
|
||
The <codeph>PARQUET_FALLBACK_SCHEMA_RESOLUTION</codeph> query option
|
||
lets Impala locate columns within Parquet files based on
|
||
column name rather than ordinal position.
|
||
This enhancement improves interoperability with applications
|
||
that write Parquet files with a different order or subset of
|
||
columns than are used in the Impala table.
|
||
See <xref href="impala_parquet_fallback_schema_resolution.xml#parquet_fallback_schema_resolution"/>
|
||
for details.
|
||
</p>
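<p>
For example, to match columns by name for the rest of the session, when querying a
hypothetical table <codeph>external_events</codeph> whose Parquet files were written by
another application:
</p>
<codeblock>-- Resolve Parquet columns by name instead of by ordinal position.
SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;
SELECT event_id, event_time FROM external_events LIMIT 10;
</codeblock>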
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2069">
|
||
The <codeph>PARQUET_ANNOTATE_STRINGS_UTF8</codeph> query option
|
||
makes Impala include the <codeph>UTF-8</codeph> annotation
|
||
metadata for <codeph>STRING</codeph>, <codeph>CHAR</codeph>,
|
||
and <codeph>VARCHAR</codeph> columns in Parquet files created
|
||
by <codeph>INSERT</codeph> or <codeph>CREATE TABLE AS SELECT</codeph>
|
||
statements.
|
||
See <xref href="impala_parquet_annotate_strings_utf8.xml#parquet_annotate_strings_utf8"/>
|
||
for details.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
See <xref href="impala_parquet.xml#parquet"/> for general information about working
|
||
with Parquet files.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Improvements to security and reduction in overhead for secure clusters:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p rev="IMPALA-1928">
|
||
Overall performance improvements for secure clusters.
|
||
(TPC-H queries on a secure cluster were benchmarked
|
||
at roughly 3x as fast as the previous release.)
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2660">
|
||
Impala now recognizes the <codeph>auth_to_local</codeph> setting,
|
||
specified through the HDFS configuration setting
|
||
<codeph>hadoop.security.auth_to_local</codeph>.
|
||
This feature is disabled by default; to enable it,
|
||
specify <codeph>--load_auth_to_local_rules=true</codeph>
|
||
in the <cmdname>impalad</cmdname> configuration settings.
|
||
See <xref href="impala_kerberos.xml#auth_to_local"/> for details.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2599">
|
||
Timing improvements in the mechanism for the <cmdname>impalad</cmdname>
|
||
daemon to acquire Kerberos tickets. This feature spreads out the overhead
|
||
on the KDC during Impala startup, especially for large clusters.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3554">
|
||
For Kerberized clusters, the Catalog service now uses
|
||
the Kerberos principal instead of the operating system user that runs
|
||
the <cmdname>catalogd</cmdname> daemon.
|
||
This eliminates the requirement to configure a <codeph>hadoop.user.group.static.mapping.overrides</codeph>
|
||
setting to put the OS user into the Sentry administrative group, on clusters where the principal
|
||
and the OS user name for this user are different.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3286">
|
||
Overall performance improvements for join queries, by using a prefetching mechanism
|
||
while building the in-memory hash table to evaluate join predicates.
|
||
See <xref href="impala_prefetch_mode.xml#prefetch_mode"/> for the query option
|
||
to control this optimization.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3397">
|
||
The <cmdname>impala-shell</cmdname> interpreter has a new command,
|
||
<codeph>SOURCE</codeph>, that lets you run a set of SQL statements
|
||
or other <cmdname>impala-shell</cmdname> commands stored in a file.
|
||
You can run additional <codeph>SOURCE</codeph> commands from inside
|
||
a file, to set up flexible sequences of statements for use cases
|
||
such as schema setup, ETL, or reporting.
|
||
See <xref href="impala_shell_commands.xml#shell_commands"/> for details
|
||
and <xref href="impala_shell_running_commands.xml#shell_running_commands"/>
|
||
for examples.
|
||
</p>
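<p>
Inside an <cmdname>impala-shell</cmdname> session, usage might look like the following
sketch. The file paths are hypothetical:
</p>
<codeblock>-- Run every statement stored in the file, in order.
SOURCE /tmp/setup_schema.sql;
-- The sourced file can itself contain further SOURCE commands, for example:
-- SOURCE /tmp/load_dimension_tables.sql;
</codeblock>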
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1772">
|
||
The <codeph>millisecond()</codeph> built-in function lets you extract
|
||
the fractional seconds part of a <codeph>TIMESTAMP</codeph> value.
|
||
See <xref href="impala_datetime_functions.xml#datetime_functions"/> for details.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3092">
|
||
If an Avro table is created without column definitions in the
|
||
<codeph>CREATE TABLE</codeph> statement, and columns are later
|
||
added through <codeph>ALTER TABLE</codeph>, the resulting
|
||
table is now queryable. Missing values from the newly added
|
||
columns now default to <codeph>NULL</codeph>.
|
||
See <xref href="impala_avro.xml#avro"/> for general details about
|
||
working with Avro files.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
The mechanism for interpreting <codeph>DECIMAL</codeph> literals is
|
||
improved, no longer going through an intermediate conversion step
|
||
to <codeph>DOUBLE</codeph>:
|
||
<ul>
|
||
<li>
|
||
<p rev="IMPALA-3163">
|
||
Casting a <codeph>DECIMAL</codeph> value to <codeph>TIMESTAMP</codeph>
produces a more precise
value for the <codeph>TIMESTAMP</codeph> than formerly.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3439">
|
||
Certain function calls involving <codeph>DECIMAL</codeph> literals
|
||
now succeed, when formerly they failed due to lack of a function
|
||
signature with a <codeph>DOUBLE</codeph> argument.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="">
|
||
Faster runtime performance for <codeph>DECIMAL</codeph> constant
|
||
values, through improved native code generation for all combinations
|
||
of precision and scale.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
See <xref href="impala_decimal.xml#decimal"/> for details about the <codeph>DECIMAL</codeph> type.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3155">
|
||
Improved type accuracy for <codeph>CASE</codeph> return values.
|
||
If all <codeph>WHEN</codeph> clauses of the <codeph>CASE</codeph>
|
||
expression are of <codeph>CHAR</codeph> type, the final result
|
||
is also <codeph>CHAR</codeph> instead of being converted to
|
||
<codeph>STRING</codeph>.
|
||
See <xref href="impala_conditional_functions.xml#conditional_functions"/>
|
||
for details about the <codeph>CASE</codeph> function.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3232">
|
||
Uncorrelated queries using the <codeph>NOT EXISTS</codeph> operator
|
||
are now supported. Formerly, the <codeph>NOT EXISTS</codeph>
|
||
operator was only available for correlated subqueries.
|
||
</p>
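<p>
For example, in the following sketch with hypothetical tables, the subquery does not refer
to any column of the outer query, so it is uncorrelated:
</p>
<codeblock>-- Returns rows from orders only if maintenance_windows has no active row.
SELECT order_id FROM orders
  WHERE NOT EXISTS (SELECT 1 FROM maintenance_windows WHERE active = true);
</codeblock>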
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2736">
|
||
Improved performance for reading Parquet files.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3375">
|
||
Improved performance for <term>top-N</term> queries, that is,
|
||
those including both <codeph>ORDER BY</codeph> and
|
||
<codeph>LIMIT</codeph> clauses.
|
||
</p>
|
||
</li>
|
||
<!-- JIRA still in open state as of 5.8 / 2.6, commenting out.
|
||
<li>
|
||
<p rev="IMPALA-3471">
|
||
A top-N query can now also activate the spill-to-disk mechanism if
|
||
a host runs low on memory while evaluating it. For example, using
|
||
large <codeph>LIMIT</codeph> and/or <codeph>OFFSET</codeph> clauses
|
||
adds some memory overhead that could cause spilling.
|
||
</p>
|
||
</li>
|
||
-->
|
||
<li>
|
||
<p rev="IMPALA-1740">
|
||
Impala optionally skips an arbitrary number of header lines from text input
|
||
files on HDFS based on the <codeph>skip.header.line.count</codeph> value
|
||
in the <codeph>TBLPROPERTIES</codeph> field of the table metadata.
|
||
See <xref href="impala_txtfile.xml#text_data_files"/> for details.
|
||
</p>
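<p>
As a minimal sketch of how this property might be applied (the table name
<codeph>header_demo</codeph> and its columns are hypothetical, chosen only for illustration):
</p>
<codeblock>-- Hypothetical text table whose data files each begin with one header line.
CREATE TABLE header_demo (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  TBLPROPERTIES ('skip.header.line.count'='1');

-- The property can also be set or changed on an existing text table.
ALTER TABLE header_demo SET TBLPROPERTIES ('skip.header.line.count'='2');
</codeblock>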
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2336">
|
||
Trailing comments are now allowed in queries processed by
|
||
the <cmdname>impala-shell</cmdname> options <codeph>-q</codeph>
|
||
and <codeph>-f</codeph>.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2844">
|
||
Impala can run <codeph>COUNT</codeph> queries for RCFile tables
|
||
that include complex type columns.
|
||
See <xref href="impala_complex_types.xml#complex_types"/> for
|
||
general information about working with complex types,
|
||
and <xref href="impala_array.xml#array"/>,
|
||
<xref href="impala_map.xml#map"/>, and <xref href="impala_struct.xml#struct"/>
|
||
for syntax details of each type.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
</concept>
|
||
|
||
<!-- All 2.5.x new features go under here -->
|
||
|
||
<concept rev="2.5.0" id="new_features_250">
|
||
|
||
<title>New Features in <keyword keyref="impala25_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<ul>
|
||
<li><!-- Spec: https://docs.google.com/document/d/1ambtYJ1t05iITCVIrN6N1A-e7PZBSetBPgjy8SLzJrA/edit#heading=h.vcftzwlpn845 -->
|
||
<p rev="IMPALA-2552 IMPALA-3054">
|
||
Dynamic partition pruning. When a query refers to a partition key column in a <codeph>WHERE</codeph>
|
||
clause, and the exact set of column values is not known until the query is executed,
|
||
Impala evaluates the predicate and skips the I/O for entire partitions that are not needed.
|
||
For example, if a table was partitioned by year, Impala would apply this technique to a query
|
||
such as <codeph>SELECT c1 FROM partitioned_table WHERE year = (SELECT MAX(year) FROM other_table)</codeph>.
|
||
<ph audience="standalone">See <xref href="impala_partitioning.xml#dynamic_partition_pruning"/> for details.</ph>
|
||
</p>
|
||
<p>
|
||
The dynamic partition pruning optimization technique lets Impala avoid reading
|
||
data files from partitions that are not part of the result set, even when
|
||
that determination cannot be made in advance. This technique is especially valuable
|
||
when performing join queries involving partitioned tables. For example, if a join
|
||
query includes an <codeph>ON</codeph> clause and a <codeph>WHERE</codeph> clause
|
||
that refer to the same columns, the query can find the set of column values that
|
||
match the <codeph>WHERE</codeph> clause, and only scan the associated partitions
|
||
when evaluating the <codeph>ON</codeph> clause.
|
||
</p>
|
||
<p>
|
||
Dynamic partition pruning is controlled by the same settings as the runtime filtering feature.
|
||
By default, this feature is enabled at a medium level, because the maximum setting can use
|
||
slightly more memory for queries than in previous releases.
|
||
To fully enable this feature, set the query option <codeph>RUNTIME_FILTER_MODE=GLOBAL</codeph>.
|
||
</p>
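<p>
For illustration only, a session that fully enables the feature before running the query
shown earlier might look like the following. The table and column names are the same
placeholder names used in that query.
</p>
<codeblock>-- Raise the filtering level from the default for this session.
SET RUNTIME_FILTER_MODE=GLOBAL;

-- The set of matching YEAR values is only known at run time,
-- so partitions for other years can be skipped during execution.
SELECT c1 FROM partitioned_table
  WHERE year = (SELECT MAX(year) FROM other_table);
</codeblock>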
|
||
</li>
|
||
<li><!-- Spec: https://docs.google.com/document/d/1ambtYJ1t05iITCVIrN6N1A-e7PZBSetBPgjy8SLzJrA/edit#heading=h.vcftzwlpn845 -->
|
||
<p rev="IMPALA-2419 IMPALA-3001 IMPALA-3008 IMPALA-3039 IMPALA-3046 IMPALA-3054">
|
||
Runtime filtering. This is a wide-ranging set of optimizations that are especially valuable for join queries.
|
||
Using the same technique as with dynamic partition pruning,
|
||
Impala uses the predicates from <codeph>WHERE</codeph> and <codeph>ON</codeph> clauses
|
||
to determine which subset of column values from one of the joined tables could possibly be part of the
|
||
result set. Impala sends a compact representation of the filter condition to the hosts in the cluster,
|
||
instead of the full set of values or the entire table.
|
||
<ph audience="PDF">See <xref href="impala_runtime_filtering.xml#runtime_filtering"/> for details.</ph>
|
||
</p>
|
||
<p>
|
||
By default, this feature is enabled at a medium level, because the maximum setting can use
|
||
slightly more memory for queries than in previous releases.
|
||
To fully enable this feature, set the query option <codeph>RUNTIME_FILTER_MODE=GLOBAL</codeph>.
|
||
<ph audience="PDF">See <xref href="impala_runtime_filter_mode.xml#runtime_filter_mode"/> for details.</ph>
|
||
</p>
|
||
<p>
|
||
This feature involves some new query options:
|
||
<xref audience="standalone" href="impala_runtime_filter_mode.xml">RUNTIME_FILTER_MODE</xref><codeph audience="integrated">RUNTIME_FILTER_MODE</codeph>,
|
||
<xref audience="standalone" href="impala_max_num_runtime_filters.xml">MAX_NUM_RUNTIME_FILTERS</xref><codeph audience="integrated">MAX_NUM_RUNTIME_FILTERS</codeph>,
|
||
<xref audience="standalone" href="impala_runtime_bloom_filter_size.xml">RUNTIME_BLOOM_FILTER_SIZE</xref><codeph audience="integrated">RUNTIME_BLOOM_FILTER_SIZE</codeph>,
|
||
<xref audience="standalone" href="impala_runtime_filter_wait_time_ms.xml">RUNTIME_FILTER_WAIT_TIME_MS</xref><codeph audience="integrated">RUNTIME_FILTER_WAIT_TIME_MS</codeph>,
|
||
and <xref audience="standalone" href="impala_disable_row_runtime_filtering.xml">DISABLE_ROW_RUNTIME_FILTERING</xref><codeph audience="integrated">DISABLE_ROW_RUNTIME_FILTERING</codeph>.
|
||
<ph audience="PDF">See
|
||
<xref href="impala_runtime_filter_mode.xml#runtime_filter_mode">RUNTIME_FILTER_MODE</xref>,
|
||
<xref href="impala_max_num_runtime_filters.xml#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS</xref>,
|
||
<xref href="impala_runtime_bloom_filter_size.xml#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE</xref>,
|
||
<xref href="impala_runtime_filter_wait_time_ms.xml#runtime_filter_wait_time_ms">RUNTIME_FILTER_WAIT_TIME_MS</xref>, and
|
||
<xref href="impala_disable_row_runtime_filtering.xml#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING</xref>
|
||
for details.
|
||
</ph>
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2696">
|
||
More efficient use of the HDFS caching feature, to avoid
|
||
hotspots and bottlenecks that could occur if heavily used
|
||
cached data blocks were always processed by the same host.
|
||
By default, Impala now randomizes which host processes each cached
|
||
HDFS data block, when cached replicas are available on multiple hosts.
|
||
(Remember to use the <codeph>WITH REPLICATION</codeph> clause with the
|
||
<codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> statement
|
||
when enabling HDFS caching for a table or partition, to cache the same
|
||
data blocks across multiple hosts.)
|
||
The new query option <codeph>SCHEDULE_RANDOM_REPLICA</codeph>
|
||
<!-- and <codeph>REPLICA_PREFERENCE</codeph> -->
|
||
lets you fine-tune the interaction with HDFS caching even more.
|
||
<ph audience="PDF">See <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/> for details.</ph>
|
||
</p>
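<p>
A rough sketch of the statements involved, assuming a hypothetical table named
<codeph>sales</codeph> and an HDFS cache pool named <codeph>pool_name</codeph> that an
administrator has already created:
</p>
<codeblock>-- Cache the table's data blocks on three hosts (pool name is an assumption).
ALTER TABLE sales SET CACHED IN 'pool_name' WITH REPLICATION = 3;

-- Adjust how replicas are chosen for scheduling in the current session.
SET SCHEDULE_RANDOM_REPLICA=true;
</codeblock>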
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2641">
|
||
The <codeph>TRUNCATE TABLE</codeph> statement now accepts an <codeph>IF EXISTS</codeph>
|
||
clause, making <codeph>TRUNCATE TABLE</codeph> easier to use in setup or ETL scripts where the table might or
|
||
might not exist.
|
||
<ph audience="PDF">See <xref href="impala_truncate_table.xml#truncate_table"/> for details.</ph>
|
||
</p>
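<p>
For example, a setup script might include a statement like the following, where
<codeph>staging_data</codeph> is a hypothetical table name:
</p>
<codeblock>-- Succeeds whether or not the table exists yet.
TRUNCATE TABLE IF EXISTS staging_data;
</codeblock>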
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2681 IMPALA-2688 IMPALA-2749">
|
||
Improved performance and reliability for the <codeph>DECIMAL</codeph> data type:
|
||
<ul>
|
||
<li>
|
||
<p rev="IMPALA-2681">
|
||
Using <codeph>DECIMAL</codeph> values in a <codeph>GROUP BY</codeph> clause now
|
||
triggers the native code generation optimization, speeding up queries that
|
||
group by values such as prices.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2688">
|
||
Checking for overflow in <codeph>DECIMAL</codeph>
|
||
multiplication is now substantially faster, making <codeph>DECIMAL</codeph>
|
||
a more practical data type in some use cases where formerly <codeph>DECIMAL</codeph>
|
||
was much slower than <codeph>FLOAT</codeph> or <codeph>DOUBLE</codeph>.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2749">
|
||
Multiplying a mixture of <codeph>DECIMAL</codeph>
|
||
and <codeph>FLOAT</codeph> or <codeph>DOUBLE</codeph> values now returns a
<codeph>DOUBLE</codeph> result rather than a <codeph>DECIMAL</codeph>. This change avoids
|
||
some cases where an intermediate value would underflow or overflow and become
|
||
<codeph>NULL</codeph> unexpectedly.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
<ph audience="PDF">See <xref href="impala_decimal.xml"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2382">
|
||
For UDFs written in Java, or Hive UDFs reused for Impala,
|
||
Impala now allows parameters and return values to be primitive types.
|
||
Formerly, these were required to be one of the <q>Writable</q>
|
||
object types.
|
||
<ph audience="PDF">See <xref href="impala_udf.xml#udfs_hive"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1588"><!-- This is from 2015, so perhaps it's really in an earlier release. -->
|
||
Performance improvements for HDFS I/O. Impala now caches HDFS file handles to avoid the
|
||
overhead of repeatedly opening the same file.
|
||
</p>
|
||
</li>
|
||
|
||
<!-- Kudu didn't make it into 2.5 / 5.7 release, so no DELETE or UPDATE statement. -->
|
||
<li>
|
||
<p><!-- Is there a JIRA for that one? Alex? -->
|
||
Performance improvements for queries involving nested complex types.
|
||
Certain basic query types, such as counting the elements of a complex column,
|
||
now use an optimized code path.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p rev="IMPALA-3044 IMPALA-2538 IMPALA-1168">
|
||
Improvements to the memory reservation mechanism for the Impala
|
||
admission control feature. You can specify more settings, such
|
||
as the timeout period and maximum aggregate memory used, for each
|
||
resource pool instead of globally for the Impala instance. The
|
||
default limit for concurrent queries (the <uicontrol>max requests</uicontrol>
|
||
setting) is now unlimited instead of 200.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p rev="IMPALA-1755">
|
||
Performance improvements related to code generation.
|
||
Even in queries where code generation is not performed
|
||
for some phases of execution (such as reading data from
|
||
Parquet tables), Impala can still use code generation in
|
||
other parts of the query, such as evaluating
|
||
functions in the <codeph>WHERE</codeph> clause.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1305">
|
||
Performance improvements for queries using aggregation functions
|
||
on high-cardinality columns.
|
||
Formerly, Impala could do unnecessary extra work to produce intermediate
|
||
results for operations such as <codeph>DISTINCT</codeph> or <codeph>GROUP BY</codeph>
|
||
on columns that were unique or had few duplicate values.
|
||
Now, Impala decides at run time whether it is more efficient to
|
||
do an initial aggregation phase and pass along a smaller set of intermediate data,
|
||
or to pass raw intermediate data to the next phase of query processing to be aggregated there.
|
||
This feature is known as <term>streaming pre-aggregation</term>.
|
||
In case of performance regression, this feature can be turned off
|
||
using the <codeph>DISABLE_STREAMING_PREAGGREGATIONS</codeph> query option.
|
||
<ph audience="PDF">See <xref href="impala_disable_streaming_preaggregations.xml#disable_streaming_preaggregations"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Spill-to-disk feature now always recommended. In earlier releases, the spill-to-disk feature
|
||
could be turned off using a pair of configuration settings,
|
||
<codeph>enable_partitioned_aggregation=false</codeph> and
|
||
<codeph>enable_partitioned_hash_join=false</codeph>.
|
||
The latest improvements in the spill-to-disk mechanism, and related features that
|
||
interact with it, make this feature robust enough that disabling it is
|
||
no longer needed or supported. In particular, some new features in <keyword keyref="impala25_full"/>
|
||
and higher do not work when the spill-to-disk feature is disabled.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1067">
|
||
Improvements to scripting capability for the <cmdname>impala-shell</cmdname> command,
|
||
through user-specified substitution variables that can appear in statements processed
|
||
by <cmdname>impala-shell</cmdname>:
|
||
</p>
|
||
<ul>
|
||
<li rev="IMPALA-2179">
|
||
<p>
|
||
The <codeph>--var</codeph> command-line option lets you pass key-value pairs to
|
||
<cmdname>impala-shell</cmdname>. The shell can substitute the values
|
||
into queries before executing them, where the query text contains the notation
|
||
<codeph>${var:<varname>varname</varname>}</codeph>. For example, you might prepare a SQL file
|
||
containing a set of DDL statements and queries containing variables for
|
||
database and table names, and then pass the applicable names as part of the
|
||
<codeph>impala-shell -f <varname>filename</varname></codeph> command.
|
||
<ph audience="PDF">See <xref href="impala_shell_running_commands.xml#shell_running_commands"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
<li rev="IMPALA-2180">
|
||
<p>
|
||
The <codeph>SET</codeph> and <codeph>UNSET</codeph> commands within the
|
||
<cmdname>impala-shell</cmdname> interpreter now work with user-specified
|
||
substitution variables, as well as the built-in query options.
|
||
The two kinds of variables are listed separately in the <codeph>SET</codeph> output.
|
||
As with variables defined by the <codeph>--var</codeph> command-line option,
|
||
you refer to the user-specified substitution variables in queries by using
|
||
the notation <codeph>${var:<varname>varname</varname>}</codeph>
|
||
in the query text. Because the substitution variables are processed by
|
||
<cmdname>impala-shell</cmdname> instead of the <cmdname>impalad</cmdname>
|
||
backend, you cannot define your own substitution variables through the
|
||
<codeph>SET</codeph> statement in a JDBC or ODBC application.
|
||
<ph audience="PDF">See <xref href="impala_set.xml#set"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
</ul>
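<p>
The two mechanisms above might be combined roughly as follows. This is a sketch under
stated assumptions: the script name <filepath>setup.sql</filepath> and the variable names
<codeph>db_name</codeph> and <codeph>tbl_name</codeph> are hypothetical.
</p>
<codeblock>-- Contents of a hypothetical script, setup.sql, that could be run with:
--   impala-shell --var=db_name=analytics --var=tbl_name=events -f setup.sql
USE ${var:db_name};
SELECT COUNT(*) FROM ${var:tbl_name};

-- Within an interactive impala-shell session, the same kind of
-- variable can be defined, used, and removed:
set var:tbl_name=events;
SELECT COUNT(*) FROM ${var:tbl_name};
unset var:tbl_name;
</codeblock>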
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1599">
|
||
Performance improvements for query startup. Impala better parallelizes certain work
|
||
when coordinating plan distribution between <cmdname>impalad</cmdname> instances, which improves
|
||
startup time for queries involving tables with many partitions on large clusters,
|
||
or complicated queries with many plan fragments.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2560">
|
||
Performance and scalability improvements for tables with many partitions.
|
||
The memory requirements on the coordinator node are reduced, making it substantially
|
||
faster and less resource-intensive
|
||
to do joins involving several tables with thousands of partitions each.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-3095">
|
||
Whitelisting for access to internal APIs. For applications that need direct access
|
||
to Impala APIs, without going through the HiveServer2 or Beeswax interfaces, you can
|
||
specify a list of Kerberos users who are allowed to call those APIs. By default, the
|
||
<codeph>impala</codeph> and <codeph>hdfs</codeph> users are the only ones authorized
|
||
for this kind of access.
|
||
Any users not explicitly authorized through the <codeph>internal_principals_whitelist</codeph>
|
||
configuration setting are blocked from accessing the APIs. This setting applies to all the
|
||
Impala-related daemons, although currently it is primarily used for HDFS to control the
|
||
behavior of the catalog server.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="">
|
||
Improvements to Impala integration and usability for Hue. (The code changes
|
||
are actually on the Hue side.)
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p rev="">
|
||
The list of tables now refreshes dynamically.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1787">
|
||
Usability improvements for case-insensitive queries.
|
||
You can now use the operators <codeph>ILIKE</codeph> and <codeph>IREGEXP</codeph>
|
||
to perform case-insensitive wildcard matches or regular expression matches,
|
||
rather than explicitly converting column values with <codeph>UPPER</codeph>
|
||
or <codeph>LOWER</codeph>.
|
||
<ph audience="PDF">See <xref href="impala_operators.xml#ilike"/> and <xref href="impala_operators.xml#iregexp"/> for details.</ph>
|
||
</p>
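<p>
As a brief illustration, assuming a hypothetical table <codeph>customers</codeph> with a
<codeph>STRING</codeph> column <codeph>name</codeph>:
</p>
<codeblock>-- Case-insensitive wildcard match.
SELECT name FROM customers WHERE name ILIKE 'mc%';

-- Case-insensitive regular expression match.
SELECT name FROM customers WHERE name IREGEXP '^mc';

-- The earlier workaround that these operators replace.
SELECT name FROM customers WHERE LOWER(name) LIKE 'mc%';
</codeblock>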
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1480">
|
||
Performance and reliability improvements for DDL and insert operations on partitioned tables with a large
|
||
number of partitions. Impala only re-evaluates metadata for partitions that are affected by
|
||
a DDL operation, not all partitions in the table. While a DDL or insert statement is in progress,
|
||
other Impala statements that attempt to modify metadata for the same table wait until the first one
|
||
finishes.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2867">
|
||
Reliability improvements for the <codeph>LOAD DATA</codeph> statement.
|
||
Previously, this statement would fail if the source HDFS directory
|
||
contained any subdirectories at all. Now, the statement ignores
|
||
any hidden subdirectories, for example <filepath>_impala_insert_staging</filepath>.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2147">
|
||
A new operator, <codeph>IS [NOT] DISTINCT FROM</codeph>, lets you compare values
|
||
and always get a <codeph>true</codeph> or <codeph>false</codeph> result,
|
||
even if one or both of the values are <codeph>NULL</codeph>.
|
||
The <codeph>IS NOT DISTINCT FROM</codeph> operator, or its equivalent
|
||
<codeph>&lt;=&gt;</codeph> notation, improves the efficiency of join queries that
|
||
treat key values that are <codeph>NULL</codeph> in both tables as equal.
|
||
<ph audience="PDF">See <xref href="impala_operators.xml#is_distinct_from"/> for details.</ph>
|
||
</p>
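<p>
The following sketch shows the <codeph>NULL</codeph> handling described above. The tables
<codeph>t1</codeph> and <codeph>t2</codeph> and their columns are hypothetical.
</p>
<codeblock>SELECT NULL IS DISTINCT FROM NULL;   -- false
SELECT 1 IS NOT DISTINCT FROM NULL;  -- false
SELECT NULL &lt;=&gt; NULL;                -- true

-- Join that treats NULL key values in both tables as equal.
SELECT t1.id, t2.val
  FROM t1 JOIN t2 ON t1.k &lt;=&gt; t2.k;
</codeblock>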
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1934">
|
||
Security enhancements for the <cmdname>impala-shell</cmdname> command.
|
||
A new option, <codeph>--ldap_password_cmd</codeph>, lets you specify
|
||
a command to retrieve the LDAP password. The resulting password is
|
||
then used to authenticate the <cmdname>impala-shell</cmdname> command
|
||
with the LDAP server.
|
||
<ph audience="PDF">See <xref href="impala_shell_options.xml"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
The <codeph>CREATE TABLE AS SELECT</codeph> statement now accepts a
|
||
<codeph>PARTITIONED BY</codeph> clause, which lets you create a
|
||
partitioned table and insert data into it with a single statement.
|
||
<ph audience="PDF">See <xref href="impala_create_table.xml#create_table"/> for details.</ph>
|
||
</p>
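<p>
A minimal sketch, assuming a hypothetical unpartitioned source table
<codeph>raw_sales</codeph>. The partition key columns are named without types and come
last in the <codeph>SELECT</codeph> list:
</p>
<codeblock>-- Creates a table partitioned by YEAR and populates it in one step.
CREATE TABLE sales_by_year PARTITIONED BY (year)
  AS SELECT id, amount, year FROM raw_sales;
</codeblock>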
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1748">
|
||
User-defined functions (UDFs and UDAFs) written in C++ now persist automatically
|
||
when the <cmdname>catalogd</cmdname> daemon is restarted. You no longer
|
||
have to run the <codeph>CREATE FUNCTION</codeph> statements again after a restart.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2843">
|
||
User-defined functions (UDFs) written in Java can now persist
|
||
when the <cmdname>catalogd</cmdname> daemon is restarted, and can be shared
|
||
transparently between Impala and Hive. You must do a one-time operation to recreate these
|
||
UDFs using the new <codeph>CREATE FUNCTION</codeph> syntax, without a signature for arguments
|
||
or the return value. Afterwards, you no longer have to run the <codeph>CREATE FUNCTION</codeph>
|
||
statements again after a restart.
|
||
Although Impala does not have visibility into the UDFs that implement the
|
||
Hive built-in functions, user-created Hive UDFs are now automatically available
|
||
for calling through Impala.
|
||
<ph audience="PDF">See <xref href="impala_create_function.xml#create_function"/> for details.</ph>
|
||
</p>
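<p>
The signature-free form might look like the following sketch. The function name, JAR path,
and class name are placeholders, not real artifacts.
</p>
<codeblock>-- No argument or return types are listed; they are taken from the Java class.
CREATE FUNCTION my_lower
  LOCATION '/user/impala/udfs/example-udfs.jar'
  SYMBOL='org.example.udf.MyLower';
</codeblock>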
|
||
</li>
|
||
<li>
|
||
<!-- Listed as fixed in 2.6.0. Is this item inappropriate or did it actually come from a different JIRA? -->
|
||
<p rev="IMPALA-2728">
|
||
Reliability enhancements for memory management. Some aggregation and join queries
|
||
that formerly might have failed with an out-of-memory error due to memory contention
can now succeed using the spill-to-disk mechanism.
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<!-- Same blurb is under Incompatible Changes. Turn into a conref. -->
|
||
<p rev="IMPALA-2070">
|
||
The <codeph>SHOW DATABASES</codeph> statement now returns two columns rather than one.
|
||
The second column includes the associated comment string, if any, for each database.
|
||
Adjust any application code that examines the list of databases and assumes the
|
||
result set contains only a single column.
|
||
<ph audience="PDF">See <xref href="impala_show.xml#show_databases"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2499">
|
||
A new optimization speeds up aggregation operations that involve only the partition key
|
||
columns of partitioned tables. For example, a query such as <codeph>SELECT COUNT(DISTINCT k), MIN(k), MAX(k) FROM t1</codeph>
|
||
can avoid reading any data files if <codeph>T1</codeph> is a partitioned table and <codeph>K</codeph>
|
||
is one of the partition key columns. Because this technique can produce different results in cases
|
||
where HDFS files in a partition are manually deleted or are empty, you must enable the optimization
|
||
by setting the query option <codeph>OPTIMIZE_PARTITION_KEY_SCANS</codeph>.
|
||
<ph audience="PDF">See <xref href="impala_optimize_partition_key_scans.xml"/> for details.</ph>
|
||
</p>
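<p>
For illustration, using the same placeholder table <codeph>t1</codeph> and partition key
column <codeph>k</codeph> as in the query above:
</p>
<codeblock>-- Off by default; enable it only when empty or manually deleted
-- partition files are not a concern.
SET OPTIMIZE_PARTITION_KEY_SCANS=1;
SELECT COUNT(DISTINCT k), MIN(k), MAX(k) FROM t1;
</codeblock>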
|
||
</li>
|
||
<li audience="hidden"><!-- All the other undocumented query options are not really new features for this release, so hiding this whole bullet. -->
|
||
<p>
|
||
Other new query options:
|
||
</p>
|
||
<ul>
|
||
<li audience="hidden"><!-- Actually from a long way back, just never documented. Not sure if appropriate to keep internal-only or expose. -->
|
||
<codeph>DISABLE_OUTERMOST_TOPN</codeph>
|
||
</li>
|
||
<li audience="hidden"><!-- Actually from a long way back, just never documented. Not sure if appropriate to keep internal-only or expose. -->
|
||
<codeph>RM_INITIAL_MEM</codeph>
|
||
</li>
|
||
<li audience="hidden"><!-- Seems to be related to writing sequence files, a capability not externalized at this time. -->
|
||
<codeph>SEQ_COMPRESSION_MODE</codeph>
|
||
</li>
|
||
<li audience="hidden"><!-- Actually, was only used for working around one JIRA. Being deprecated now in Impala 2.3 via IMPALA-2963. -->
|
||
<codeph>DISABLE_CACHED_READS</codeph>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-2196">
|
||
The <codeph>DESCRIBE</codeph> statement can now display metadata about a database, using the
|
||
syntax <codeph>DESCRIBE DATABASE <varname>db_name</varname></codeph>.
|
||
<ph audience="PDF">See <xref href="impala_describe.xml#describe"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
<li>
|
||
<p rev="IMPALA-1477">
|
||
The <codeph>uuid()</codeph> built-in function generates an
|
||
alphanumeric value that you can use as a guaranteed unique identifier.
|
||
The uniqueness applies even across tables, for cases where an ascending
|
||
numeric sequence is not suitable.
|
||
<ph audience="PDF">See <xref href="impala_misc_functions.xml#misc_functions"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
</concept>
|
||
|
||
<!-- All 2.4.x new features go under here -->
|
||
|
||
<concept rev="2.4.0" id="new_features_240">
|
||
|
||
<title>New Features in <keyword keyref="impala24_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
Impala can be used on the DSSD D5 Storage Appliance.
|
||
From a user perspective, the Impala features are the same as in <keyword keyref="impala23_full"/>.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
</concept>
|
||
|
||
<!-- All 2.3.x subsections go under here -->
|
||
|
||
<!-- Actually for 2.3 / 5.5, let's get away from doing a separate subhead for each maintenance release,
|
||
because in the normal course of events there will be nothing to add here until 5.6. If something new
|
||
needs to get noted, just add a new bullet with wording to indicate which 5.5.x release it applies to. -->
|
||
|
||
<concept rev="2.3.0" id="new_features_230">
|
||
|
||
<title>New Features in <keyword keyref="impala23_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
The following are the major new features in Impala 2.3.x. This major release
|
||
contains improvements to SQL syntax (particularly new support for complex types), performance,
|
||
manageability, and security.
|
||
</p>
|
||
|
||
<ul>
|
||
|
||
<li>
|
||
<p>
|
||
Complex data types: <codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, and <codeph>MAP</codeph>. These
|
||
types can encode multiple named fields, positional items, or key-value pairs within a single column.
|
||
You can combine these types to produce nested types with arbitrarily deep nesting,
|
||
such as an <codeph>ARRAY</codeph> of <codeph>STRUCT</codeph> values,
|
||
a <codeph>MAP</codeph> where each key-value pair is an <codeph>ARRAY</codeph> of other <codeph>MAP</codeph> values,
|
||
and so on. Currently, complex data types are only supported for the Parquet file format.
|
||
<ph audience="PDF">See <xref href="impala_complex_types.xml#complex_types"/> for usage details and <xref href="impala_array.xml#array"/>, <xref href="impala_struct.xml#struct"/>, and <xref href="impala_map.xml#map"/> for syntax.</ph>
|
||
</p>
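<p>
A short sketch of how these types might be declared and queried, assuming a hypothetical
Parquet table named <codeph>contacts</codeph>:
</p>
<codeblock>CREATE TABLE contacts (
  id BIGINT,
  name STRUCT&lt;first: STRING, last: STRING&gt;,
  phones ARRAY&lt;STRING&gt;,
  attrs MAP&lt;STRING, STRING&gt;
) STORED AS PARQUET;

-- STRUCT fields use dot notation; ARRAY elements are exposed
-- through the ITEM pseudocolumn by joining against the column.
SELECT c.id, c.name.last, p.item AS phone
  FROM contacts c, c.phones p;

-- MAP entries are exposed through the KEY and VALUE pseudocolumns.
SELECT c.id, a.key, a.value
  FROM contacts c, c.attrs a;
</codeblock>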
|
||
</li>
|
||
|
||
<li rev="collevelauth">
|
||
<p>
|
||
Column-level authorization lets you define access to particular columns within a table,
|
||
rather than the entire table. This feature lets you reduce the reliance on creating views to
|
||
set up authorization schemes for subsets of information.
|
||
See <xref keyref="sg_hive_sql"/> for background details, and
|
||
<xref href="impala_grant.xml#grant"/> and <xref href="impala_revoke.xml#revoke"/> for Impala-specific syntax.
|
||
</p>
|
||
</li>
|
||
|
||
<li rev="IMPALA-1139">
|
||
<p>
|
||
The <codeph>TRUNCATE TABLE</codeph> statement removes all the data from a table without removing the table itself.
|
||
<ph audience="PDF">See <xref href="impala_truncate_table.xml#truncate_table"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li id="IMPALA-2015">
|
||
<p>
|
||
Nested loop join queries. Some join queries that formerly required equality comparisons can now use
|
||
operators such as <codeph>&lt;</codeph> or <codeph>&gt;=</codeph>. This same join mechanism is used
|
||
internally to optimize queries that retrieve values from complex type columns.
|
||
<ph audience="PDF">See <xref href="impala_joins.xml#joins"/> for details about Impala join queries.</ph>
|
||
</p>
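<p>
For example, a range join such as the following sketch has no equality comparison in its
<codeph>ON</codeph> clause. The tables <codeph>employees</codeph> and
<codeph>salary_bands</codeph> are hypothetical.
</p>
<codeblock>SELECT e.id, b.band_name
  FROM employees e JOIN salary_bands b
  ON e.salary &gt;= b.low AND e.salary &lt; b.high;
</codeblock>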
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Reduced memory usage and improved performance and robustness for the spill-to-disk feature.
|
||
<ph audience="PDF">See <xref href="impala_scalability.xml#spill_to_disk"/> for details about this feature.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li rev="IMPALA-1881">
|
||
<p>
|
||
Performance improvements for querying Parquet data files containing multiple row groups
|
||
and multiple data blocks:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p> For files written by Hive, SparkSQL, and other Parquet MR writers
|
||
and spanning multiple HDFS blocks, Impala now scans the extra
|
||
data blocks locally when possible, rather than using remote
|
||
reads. </p>
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Impala queries benefit from the improved alignment of row groups with HDFS blocks for Parquet
|
||
files written by Hive, MapReduce, and other components. (Impala itself never writes
|
||
multiblock Parquet files, so the alignment change does not apply to Parquet files produced by Impala.)
|
||
These Parquet writers now add padding to Parquet files that they write to align row groups with HDFS blocks.
|
||
The <codeph>parquet.writer.max-padding</codeph> setting specifies the maximum number of bytes, by default
|
||
8 megabytes, that can be added to the file between row groups to fill the gap at the end of one block
|
||
so that the next row group starts at the beginning of the next block.
|
||
If the gap is larger than this size, the writer attempts to fit another entire row group in the remaining space.
|
||
Include this setting in the <filepath>hive-site</filepath> configuration file to influence Parquet files written by Hive,
|
||
or the <filepath>hdfs-site</filepath> configuration file to influence Parquet files written by all non-Impala components.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
<p audience="PDF">
|
||
See <xref href="impala_parquet.xml#parquet"/> for instructions about using Parquet data files
|
||
with Impala.
|
||
</p>
|
||
</li>
|
||
|
||
<li id="IMPALA-1660">
|
||
<p>
|
||
Many new built-in scalar functions, for convenience and enhanced portability of SQL that uses common industry extensions.
|
||
</p>
|
||
|
||
<p rev="IMPALA-1771">
|
||
Math functions<ph audience="PDF"> (see <xref href="impala_math_functions.xml#math_functions"/> for details)</ph>:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<codeph>ATAN2</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>COSH</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>COT</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>DCEIL</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>DEXP</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>DFLOOR</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>DLOG10</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>DPOW</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>DROUND</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>DSQRT</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>DTRUNC</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>FACTORIAL</codeph>, and corresponding <codeph>!</codeph> operator
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>FPOW</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>RADIANS</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>RANDOM</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>SINH</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>TANH</codeph>
|
||
</li>
|
||
</ul>
|
||
|
||
<p>
|
||
String functions<ph audience="PDF"> (see <xref href="impala_string_functions.xml#string_functions"/> for details)</ph>:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<codeph>BTRIM</codeph>
|
||
</li>
|
||
<li>
|
||
<codeph>CHR</codeph>
|
||
</li>
|
||
<li>
|
||
<codeph>REGEXP_LIKE</codeph>
|
||
</li>
|
||
<li>
|
||
<codeph>SPLIT_PART</codeph>
|
||
</li>
|
||
</ul>
|
||
|
||
<p>
|
||
Date and time functions<ph audience="PDF"> (see <xref href="impala_datetime_functions.xml#datetime_functions"/> for details)</ph>:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<codeph>INT_MONTHS_BETWEEN</codeph>
|
||
</li>
|
||
<li>
|
||
<codeph>MONTHS_BETWEEN</codeph>
|
||
</li>
|
||
<li>
|
||
<codeph>TIMEOFDAY</codeph>
|
||
</li>
|
||
<li>
|
||
<codeph>TIMESTAMP_CMP</codeph>
|
||
</li>
|
||
</ul>
|
||
|
||
<p>
|
||
Bit manipulation functions<ph audience="PDF"> (see <xref href="impala_bit_functions.xml#bit_functions"/> for details)</ph>:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<codeph>BITAND</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>BITNOT</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>BITOR</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>BITXOR</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>COUNTSET</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>GETBIT</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>ROTATELEFT</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>ROTATERIGHT</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>SETBIT</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>SHIFTLEFT</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>SHIFTRIGHT</codeph>
|
||
</li>
|
||
</ul>
|
||
<p>
|
||
Type conversion functions<ph audience="PDF"> (see <xref href="impala_conversion_functions.xml#conversion_functions"/> for details)</ph>:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<codeph>TYPEOF</codeph>
|
||
</li>
|
||
</ul>
|
||
<p>
|
||
The <codeph>effective_user()</codeph> function<ph audience="PDF"> (see <xref href="impala_misc_functions.xml#misc_functions"/> for details)</ph>.
|
||
</p>
|
||
</li>
|
||
|
||
<li id="IMPALA-2081">
|
||
<p>
|
||
New built-in analytic functions: <codeph>PERCENT_RANK</codeph>, <codeph>NTILE</codeph>,
|
||
<codeph>CUME_DIST</codeph>.
|
||
<ph audience="PDF">See <xref href="impala_analytic_functions.xml#analytic_functions"/> for details.</ph>
|
||
</p>
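<p>
A minimal sketch of the three functions used together, assuming a hypothetical table
<codeph>exam_results</codeph> with <codeph>name</codeph> and <codeph>score</codeph> columns:
</p>
<codeblock>SELECT name, score,
       percent_rank() OVER (ORDER BY score) AS pct_rank,
       cume_dist() OVER (ORDER BY score) AS cumulative_dist,
       ntile(4) OVER (ORDER BY score) AS quartile
  FROM exam_results;
</codeblock>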
|
||
</li>
|
||
|
||
<li id="IMPALA-595">
|
||
<p>
|
||
The <codeph>DROP DATABASE</codeph> statement now works for a non-empty database.
|
||
When you specify the optional <codeph>CASCADE</codeph> clause, any tables in the
|
||
database are dropped before the database itself is removed.
|
||
<ph audience="PDF">See <xref href="impala_drop_database.xml#drop_database"/> for details.</ph>
|
||
</p>
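<p>
For example, assuming a hypothetical database named <codeph>scratch_db</codeph> that still contains tables:
</p>
<codeblock>-- Drops every table in the database, then the database itself.
DROP DATABASE IF EXISTS scratch_db CASCADE;
</codeblock>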
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <codeph>DROP TABLE</codeph> and <codeph>ALTER TABLE DROP PARTITION</codeph> statements have a new optional keyword, <codeph>PURGE</codeph>.
|
||
This keyword causes Impala to immediately remove the relevant HDFS data files rather than sending them to the HDFS trashcan.
|
||
This feature can help to avoid out-of-space errors on storage devices, and to avoid files being left behind in case of
|
||
a problem with the HDFS trashcan, such as the trashcan not being configured or being in a different HDFS encryption zone
|
||
than the data files.
|
||
<ph audience="PDF">See <xref href="impala_drop_table.xml#drop_table"/> and <xref href="impala_alter_table.xml#alter_table"/> for syntax.</ph>
|
||
</p>
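<p>
A short sketch of both forms, using hypothetical table names and a hypothetical partition key column
<codeph>year</codeph>:
</p>
<codeblock>-- Remove the table and delete its data files immediately, bypassing the HDFS trashcan.
DROP TABLE IF EXISTS staging_events PURGE;

-- Remove one partition and delete its data files immediately.
ALTER TABLE sales DROP IF EXISTS PARTITION (year = 2014) PURGE;
</codeblock>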
|
||
</li>
|
||
|
||
<li id="IMPALA-80">
|
||
<p>
|
||
The <cmdname>impala-shell</cmdname> command has a new feature for live progress reporting. This feature
|
||
is enabled through the <codeph>--live_progress</codeph> and <codeph>--live_summary</codeph>
|
||
command-line options, or during a session through the <codeph>LIVE_SUMMARY</codeph> and
|
||
<codeph>LIVE_PROGRESS</codeph> query options.
|
||
<ph audience="PDF">See <xref href="impala_live_progress.xml#live_progress"/> and <xref href="impala_live_summary.xml#live_summary"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <cmdname>impala-shell</cmdname> command also now displays a random <q>tip of the day</q> when it starts.
|
||
</p>
|
||
</li>
|
||
|
||
<li id="IMPALA-1413">
|
||
<p>
|
||
The <cmdname>impala-shell</cmdname> option <codeph>-f</codeph> now recognizes a special filename
|
||
<codeph>-</codeph> to accept input from stdin.
|
||
<ph audience="PDF">See <xref href="impala_shell_options.xml#shell_options"/> for details about the options for running <cmdname>impala-shell</cmdname> in non-interactive mode.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li id="IMPALA-1963">
|
||
<p>
|
||
Format strings for the <codeph>unix_timestamp()</codeph> function can now include numeric timezone offsets.
|
||
<ph audience="PDF">See <xref href="impala_datetime_functions.xml#datetime_functions"/> for details.</ph>
|
||
</p>
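<p>
The following sketch shows one way such a format string might look. The literal timestamp and the exact
offset notation in the format string are assumptions for illustration; check the function reference for
the precise rules.
</p>
<codeblock>-- The trailing -07:00 in the value is matched by -hh:mm in the format string.
SELECT unix_timestamp('2015-05-15 12:00:00-07:00', 'yyyy-MM-dd HH:mm:ss-hh:mm');
</codeblock>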
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Impala can now run a specified command to obtain the password to decrypt a private-key PEM file,
|
||
rather than having the private-key file be unencrypted on disk.
|
||
<ph audience="PDF">See <xref href="impala_ssl.xml#ssl"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li id="IMPALA-859">
|
||
<p>
|
||
Impala components now can use SSL for more of their internal communication. SSL is used for
|
||
communication between all three Impala-related daemons when the configuration option
|
||
<codeph>ssl_server_certificate</codeph> is enabled. SSL is used for communication with client
|
||
applications when the configuration option <codeph>ssl_client_ca_certificate</codeph> is enabled.
|
||
<ph audience="PDF">See <xref href="impala_ssl.xml#ssl"/> for details.</ph>
|
||
</p>
|
||
<p>
|
||
Currently, you can use either server-to-server TLS/SSL encryption or Kerberos authentication, but not both.
|
||
This limitation is tracked by the issue
|
||
<xref keyref="IMPALA-2598">IMPALA-2598</xref>.
|
||
</p>
|
||
</li>
|
||
|
||
<li id="IMPALA-1829">
|
||
<p>
|
||
Improved flexibility for intermediate data types in user-defined aggregate functions (UDAFs).
|
||
<ph audience="PDF">See <xref href="impala_udf.xml#udafs"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
</ul>
|
||
|
||
<p>
|
||
In <keyword keyref="impala232"/>, the bug fix for <xref keyref="IMPALA-2598">IMPALA-2598</xref>
|
||
removes the restriction on using both Kerberos and SSL for internal communication between Impala components.
|
||
</p>
|
||
|
||
<!-- End of new feature list for 2.3 / 5.5. -->
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<!-- All 2.2.x subsections go under here -->
|
||
|
||
<concept rev="2.2.0" id="new_features_220">
|
||
|
||
<title>New Features in <keyword keyref="impala22_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
The following are the major new features in <keyword keyref="impala22_full"/>. This release
|
||
contains improvements to performance, manageability, security, and SQL syntax.
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
Several improvements to date and time features improve interoperability with Hive and other
|
||
database systems, provide more flexibility for handling time zones, and future-proof the handling of
|
||
<codeph>TIMESTAMP</codeph> values:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
The <codeph>WITH REPLICATION</codeph> clause for the <codeph>CREATE TABLE</codeph> and
|
||
<codeph>ALTER TABLE</codeph> statements lets you control the replication factor for
|
||
HDFS caching for a specific table or partition. By default, each cached block is
|
||
only present on a single host, which can lead to CPU contention if the same host
|
||
processes each cached block. Increasing the replication factor lets Impala choose
|
||
different hosts to process different cached blocks, to better distribute the CPU load.
|
||
</p>
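<p>
A minimal sketch of the clause in both statements, assuming an HDFS cache pool named
<codeph>example_pool</codeph> already exists and using hypothetical table names:
</p>
<codeblock>-- Cache a new table with three cached replicas of each block.
CREATE TABLE cached_lookup (id INT, label STRING)
  CACHED IN 'example_pool' WITH REPLICATION = 3;

-- Cache a single partition of an existing table with two replicas.
ALTER TABLE sales PARTITION (year = 2014)
  SET CACHED IN 'example_pool' WITH REPLICATION = 2;
</codeblock>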
|
||
</li>
|
||
<li>
|
||
<p>
|
||
Startup flags for the <cmdname>impalad</cmdname> daemon enable a higher level of compatibility with
|
||
<codeph>TIMESTAMP</codeph> values written by Hive, and more flexibility for working with date and
|
||
time data using the local time zone instead of UTC. To enable these features, set the
|
||
<cmdname>impalad</cmdname> startup flags
|
||
<codeph>-use_local_tz_for_unix_timestamp_conversions=true</codeph> and
|
||
<codeph>-convert_legacy_hive_parquet_utc_timestamps=true</codeph>.
|
||
</p>
|
||
|
||
<p>
|
||
The <codeph>-use_local_tz_for_unix_timestamp_conversions</codeph> setting controls how the
|
||
<codeph>unix_timestamp()</codeph>, <codeph>from_unixtime()</codeph>, and <codeph>now()</codeph>
|
||
functions handle time zones. By default (when this setting is turned off), Impala considers all
|
||
<codeph>TIMESTAMP</codeph> values to be in the UTC time zone when converting to or from Unix time
|
||
values. When this setting is enabled, Impala treats <codeph>TIMESTAMP</codeph> values passed to or
|
||
returned from these functions as being in the local time zone. When this setting is enabled, take
|
||
particular care that all hosts in the cluster have the same timezone settings, to avoid
|
||
inconsistent results depending on which host reads or writes <codeph>TIMESTAMP</codeph> data.
|
||
</p>
|
||
|
||
<p>
|
||
The <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting causes Impala to convert
|
||
<codeph>TIMESTAMP</codeph> values to the local time zone when it reads them from Parquet files
|
||
written by Hive. This setting only applies to data using the Parquet file format, where Impala can
|
||
use metadata in the files to reliably determine that the files were written by Hive. If in the
|
||
future Hive changes the way it writes <codeph>TIMESTAMP</codeph> data in Parquet, Impala will
|
||
automatically handle that new <codeph>TIMESTAMP</codeph> encoding.
|
||
</p>
|
||
|
||
<p>
|
||
See <xref href="impala_timestamp.xml#timestamp"/> for details about time zone handling and the
|
||
configuration options for Impala / Hive compatibility with Parquet format.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p conref="../shared/impala_common.xml#common/y2k38" />
|
||
|
||
<p>
|
||
See <xref href="impala_datetime_functions.xml#datetime_functions"/> for the current function
|
||
signatures.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <codeph>SHOW FILES</codeph> statement lets you view the names and sizes of the files that make up
|
||
an entire table or a specific partition. See <xref href="impala_show.xml#show_files"/> for details.
|
||
</p>
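<p>
For example, with a hypothetical table <codeph>sales</codeph> partitioned by <codeph>year</codeph>:
</p>
<codeblock>-- List the data files for the whole table.
SHOW FILES IN sales;

-- List the data files for one partition only.
SHOW FILES IN sales PARTITION (year = 2014);
</codeblock>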
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Impala can now run queries against Parquet data containing columns with complex or nested types, as
|
||
long as the query only refers to columns with scalar types.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Performance improvements for queries that include <codeph>IN()</codeph> operators and involve
|
||
partitioned tables.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<!-- Same text for this item in impala_fixed_issues.xml. Could turn into a conref. -->
|
||
<p>
|
||
The new <codeph>-max_log_files</codeph> configuration option specifies how many log files to keep at
|
||
each severity level. The default value is 10, meaning that Impala preserves the latest 10 log files for
|
||
each severity level (<codeph>INFO</codeph>, <codeph>WARNING</codeph>, and <codeph>ERROR</codeph>) for
|
||
each Impala-related daemon (<cmdname>impalad</cmdname>, <cmdname>statestored</cmdname>, and
|
||
<cmdname>catalogd</cmdname>). Impala checks to see if any old logs need to be removed based on the
|
||
interval specified in the <codeph>logbufsecs</codeph> setting, every 5 seconds by default. See
|
||
<xref href="impala_logging.xml#logs_rotate"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Redaction of sensitive data from Impala log files. This feature protects details such as credit card
|
||
numbers or tax IDs from administrators who see the text of SQL statements in the course of monitoring
|
||
and troubleshooting a Hadoop cluster. See <xref href="impala_logging.xml#redaction"/> for background
|
||
information for Impala users, and <xref keyref="sg_redaction"/> for usage details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Lineage information is available for data created or queried by Impala. This feature lets you track who
|
||
has accessed data through Impala SQL statements, down to the level of specific columns, and how data
|
||
has been propagated between tables. See <xref href="impala_lineage.xml#lineage"/> for background
|
||
information for Impala users, and <xref keyref="datamgmt_impala_lineage_log"/> for usage details and
|
||
how to interpret the lineage information.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Impala tables and partitions can now be located on the Amazon Simple Storage Service (S3) filesystem,
|
||
for convenience in cases where data is already located in S3 and you prefer to query it in-place.
|
||
Queries might have lower performance than when the data files reside on HDFS, because Impala uses some
|
||
HDFS-specific optimizations. Impala can query data in S3, but cannot write to S3. Therefore, statements
|
||
such as <codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph> are not available when the destination
|
||
table or partition is in S3. See <xref href="impala_s3.xml#s3"/> for details.
|
||
</p>
|
||
|
||
<note conref="../shared/impala_common.xml#common/s3_caveat" />
|
||
</li>
|
||
|
||
<li>
|
||
<!-- Only want the link out of the release notes to appear for HTML
|
||
(N.B. audience="PDF" means hide from PDF), and only in the HTML for the
|
||
integrated build where the topic is available for link resolution. -->
|
||
<p>
|
||
Improved support for HDFS encryption. The <codeph>LOAD DATA</codeph> statement now works when the
|
||
source directory and destination table are in different encryption zones. See
|
||
<xref keyref="cdh_sg_component_kms"/> for details about using HDFS encryption with
|
||
Impala.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Additional arithmetic function <codeph>mod()</codeph>. See
|
||
<xref href="impala_math_functions.xml#math_functions"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Flexibility to interpret <codeph>TIMESTAMP</codeph> values using the UTC time zone (the traditional
|
||
Impala behavior) or using the local time zone (for compatibility with <codeph>TIMESTAMP</codeph> values
|
||
produced by Hive).
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Enhanced support for ETL using tools such as Flume. Impala ignores temporary files typically produced
|
||
by these tools (filenames with suffixes <codeph>.copying</codeph> and <codeph>.tmp</codeph>).
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The CPU requirement for Impala, which had become more restrictive in Impala 2.0.x and 2.1.x, has now
|
||
been relaxed.
|
||
</p>
|
||
|
||
<p conref="../shared/impala_common.xml#common/cpu_prereq" />
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Enhanced support for <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> types in the <codeph>COMPUTE
|
||
STATS</codeph> statement.
|
||
</p>
|
||
</li>
|
||
|
||
<li rev="">
|
||
<p>
|
||
The amount of memory required during setup for <q>spill to disk</q> operations is greatly reduced. This
|
||
enhancement reduces the chance of a memory-intensive join or aggregation query failing with an
|
||
out-of-memory error.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Several new conditional functions provide enhanced compatibility when porting code that uses industry
|
||
extensions. The new functions are: <codeph>isfalse()</codeph>, <codeph>isnotfalse()</codeph>,
|
||
<codeph>isnottrue()</codeph>, <codeph>istrue()</codeph>, <codeph>nonnullvalue()</codeph>, and
|
||
<codeph>nullvalue()</codeph>. See <xref href="impala_conditional_functions.xml#conditional_functions"/>
|
||
for details.
|
||
</p>
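<p>
A brief sketch of how these functions might be used, assuming a hypothetical table
<codeph>accounts</codeph> with a nullable <codeph>BOOLEAN</codeph> column <codeph>is_active</codeph>:
</p>
<codeblock>SELECT id,
       istrue(is_active)    AS definitely_active,
       isnottrue(is_active) AS inactive_or_unknown,
       nullvalue(is_active) AS flag_missing
  FROM accounts;
</codeblock>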
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The Impala debug web UI now can display a visual representation of the query plan. On the
|
||
<uicontrol>/queries</uicontrol> tab, select <uicontrol>Details</uicontrol> for a particular query. The
|
||
<uicontrol>Details</uicontrol> page includes a <uicontrol>Plan</uicontrol> tab with a plan diagram that
|
||
you can zoom in or out (using scroll gestures through mouse wheel or trackpad).
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
|
||
<!-- End of new feature list for 5.4. -->
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<!-- All 2.1.x subsections go under here -->
|
||
|
||
<concept rev="2.1.0" id="new_features_210">
|
||
|
||
<title>New Features in <keyword keyref="impala21_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
This release contains the following enhancements to query performance and system scalability:
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
Impala can now collect statistics for individual partitions in a partitioned table, rather than
|
||
processing the entire table for each <codeph>COMPUTE STATS</codeph> statement. This feature is known as
|
||
incremental statistics, and is controlled by the <codeph>COMPUTE INCREMENTAL STATS</codeph> syntax.
|
||
(You can still use the original <codeph>COMPUTE STATS</codeph> statement for nonpartitioned tables or
|
||
partitioned tables that are unchanging or whose contents are entirely replaced all at once.) See
|
||
<xref href="impala_compute_stats.xml#compute_stats"/> and
|
||
<xref href="impala_perf_stats.xml#perf_stats"/> for details.
|
||
</p>
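<p>
A minimal sketch, assuming a hypothetical table <codeph>sales</codeph> partitioned by
<codeph>year</codeph>:
</p>
<codeblock>-- Gather statistics only for partitions that do not have incremental stats yet.
COMPUTE INCREMENTAL STATS sales;

-- Gather statistics for one specific partition.
COMPUTE INCREMENTAL STATS sales PARTITION (year = 2014);
</codeblock>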
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
An optimization for small queries lets Impala run queries that process very few rows without the
|
||
unnecessary overhead of parallelizing and generating native code. Reducing this overhead lets Impala
|
||
clear small queries quickly, keeping YARN resources and admission control slots available for
|
||
data-intensive queries. The row-count threshold below which a query is treated as <q>small</q> is controlled by the
|
||
<codeph>EXEC_SINGLE_NODE_ROWS_THRESHOLD</codeph> query option. See
|
||
<xref href="impala_exec_single_node_rows_threshold.xml#exec_single_node_rows_threshold"/> for details.
|
||
</p>
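<p>
For example, the threshold might be adjusted for a session as follows. The value shown is an arbitrary
illustration, not a recommendation.
</p>
<codeblock>-- Treat queries that scan at most 500 rows as small.
SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=500;
</codeblock>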
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
An enhancement to the statestore component lets it transmit heartbeat information independently of
|
||
broadcasting metadata updates. This optimization improves reliability of health checking on large
|
||
clusters with many tables and partitions.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The memory requirement for querying gzip-compressed text is reduced. Now Impala decompresses the data
|
||
as it is read, rather than reading the entire gzipped file and decompressing it in memory.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<!-- All 2.0.x subsections go under here -->
|
||
|
||
<concept rev="2.0.0" id="new_features_200">
|
||
|
||
<title>New Features in <keyword keyref="impala20_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
The following are the major new features in <keyword keyref="impala20_full"/>. This major release
|
||
contains improvements to performance, scalability, security, and SQL syntax.
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
Queries with joins or aggregation functions involving high volumes of data can now use temporary work
|
||
areas on disk, reducing the chance of failure due to out-of-memory errors. When the required memory for
|
||
the intermediate result set exceeds the amount available on a particular node, the query automatically
|
||
uses a temporary work area on disk. This <q>spill to disk</q> mechanism is similar to the <codeph>ORDER
|
||
BY</codeph> improvement from Impala 1.4. For details, see
|
||
<xref href="impala_scalability.xml#spill_to_disk"/>.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Subquery enhancements:
|
||
<ul>
|
||
<li>
|
||
Subqueries are now allowed in the <codeph>WHERE</codeph> clause, for example with the
|
||
<codeph>IN</codeph> operator.
|
||
</li>
|
||
|
||
<li>
|
||
The <codeph>EXISTS</codeph> and <codeph>NOT EXISTS</codeph> operators are available. They are
|
||
always used in conjunction with subqueries.
|
||
</li>
|
||
|
||
<li>
|
||
The <codeph>IN</codeph> and <codeph>NOT IN</codeph> operators can now operate on the result set from
|
||
a subquery, not just a hardcoded list of values.
|
||
</li>
|
||
|
||
<li>
|
||
Uncorrelated subqueries let you compare against one or more values for equality,
|
||
<codeph>IN</codeph>, and <codeph>EXISTS</codeph> comparisons. For example, you might use
|
||
<codeph>WHERE</codeph> clauses such as <codeph>WHERE <varname>column</varname> = (SELECT
|
||
MAX(<varname>some_other_column</varname>) FROM <varname>table</varname>)</codeph> or <codeph>WHERE
|
||
<varname>column</varname> IN (SELECT <varname>some_other_column</varname> FROM
|
||
<varname>table</varname> WHERE <varname>conditions</varname>)</codeph>.
|
||
</li>
|
||
|
||
<li>
|
||
Correlated subqueries let you cross-reference values from the outer query block and the subquery.
|
||
</li>
|
||
|
||
<li>
|
||
Scalar subqueries let you substitute the result of single-value aggregate functions such as
|
||
<codeph>MAX()</codeph>, <codeph>MIN()</codeph>, <codeph>COUNT()</codeph>, or
|
||
<codeph>AVG()</codeph>, where you would normally use a numeric value in a <codeph>WHERE</codeph>
|
||
clause.
|
||
</li>
|
||
</ul>
|
||
</p>
|
||
|
||
<p>
|
||
For details about subqueries, see <xref href="impala_subqueries.xml#subqueries"/>. For information about
|
||
new and improved operators, see <xref href="impala_operators.xml#exists"/> and
|
||
<xref href="impala_operators.xml#in"/>.
|
||
</p>
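<p>
The subquery forms described above can be sketched as follows, assuming hypothetical
<codeph>customers</codeph> and <codeph>orders</codeph> tables:
</p>
<codeblock>-- IN with an uncorrelated subquery instead of a hardcoded list of values.
SELECT name FROM customers
 WHERE id IN (SELECT customer_id FROM orders WHERE total &gt; 100);

-- EXISTS with a correlated subquery that refers to the outer query block.
SELECT c.name FROM customers c
 WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

-- Scalar subquery in place of a numeric value.
SELECT id, total FROM orders
 WHERE total = (SELECT max(total) FROM orders);
</codeblock>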
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Analytic functions such as <codeph>RANK()</codeph>, <codeph>LAG()</codeph>, <codeph>LEAD()</codeph>,
|
||
and <codeph>FIRST_VALUE()</codeph> let you analyze sequences of rows with flexible ordering and
|
||
grouping. Existing aggregate functions such as <codeph>MAX()</codeph>, <codeph>SUM()</codeph>, and
|
||
<codeph>COUNT()</codeph> can also be used in an analytic context. See
|
||
<xref href="impala_analytic_functions.xml#analytic_functions"/> for details. See
|
||
<xref href="impala_aggregate_functions.xml#aggregate_functions"/> for enhancements to existing
|
||
aggregate functions.
|
||
</p>
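<p>
A minimal sketch that combines analytic functions with an aggregate function used in an analytic context,
assuming a hypothetical <codeph>sales</codeph> table with <codeph>region</codeph>,
<codeph>sale_date</codeph>, and <codeph>amount</codeph> columns:
</p>
<codeblock>SELECT region, sale_date, amount,
       rank() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank,
       lag(amount, 1) OVER (PARTITION BY region ORDER BY sale_date) AS previous_amount,
       sum(amount) OVER (PARTITION BY region) AS region_total
  FROM sales;
</codeblock>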
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
New data types provide greater compatibility with source code from traditional database systems:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<codeph>VARCHAR</codeph> is like the <codeph>STRING</codeph> data type, but with a maximum length.
|
||
See <xref href="impala_varchar.xml#varchar"/> for details.
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>CHAR</codeph> is like the <codeph>STRING</codeph> data type, but with a precise length. Short
|
||
values are padded with spaces on the right. See <xref href="impala_char.xml#char"/> for details.
|
||
</li>
|
||
|
||
<li audience="hidden">
|
||
<!-- This feature will be undocumented in Impala 2.0, probably ready for prime time in 2.1. -->
|
||
<codeph>DATE</codeph>. See <xref href="impala_date.xml#date"/> for details.
|
||
</li>
|
||
</ul>
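<p>
For example, a table definition using both types might look like the following. The table and column
names are hypothetical.
</p>
<codeblock>CREATE TABLE contacts (
  id INT,
  state_code CHAR(2),       -- fixed length; shorter values are padded with spaces
  nickname VARCHAR(20)      -- variable length, up to 20 characters
);
</codeblock>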
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Security enhancements:
|
||
<ul>
|
||
<li>
|
||
Formerly, Impala was restricted to using either Kerberos or LDAP / Active Directory authentication
|
||
within a cluster. Now, Impala can freely accept either kind of authentication request, allowing you
|
||
to set up some hosts with Kerberos authentication and others with LDAP or Active Directory. See
|
||
<xref href="impala_mixed_security.xml#mixed_security"/> for details.
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>GRANT</codeph> statement. See <xref href="impala_grant.xml#grant"/> for details.
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>REVOKE</codeph> statement. See <xref href="impala_revoke.xml#revoke"/> for details.
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>CREATE ROLE</codeph> statement. See <xref href="impala_create_role.xml#create_role"/> for
|
||
details.
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>DROP ROLE</codeph> statement. See <xref href="impala_drop_role.xml#drop_role"/> for
|
||
details.
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>SHOW ROLES</codeph> and <codeph>SHOW ROLE GRANT</codeph> statements. See
|
||
<xref href="impala_show.xml#show"/> for details.
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
To complement the HDFS encryption feature, a new Impala configuration option,
|
||
<codeph>--disk_spill_encryption</codeph>, protects sensitive data from being observed or tampered
|
||
with when temporarily stored on disk.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
</p>
|
||
|
||
<p>
|
||
The new security-related SQL statements work along with the Sentry authorization framework. See
|
||
<xref keyref="authorization"/> for details.
|
||
</p>
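<p>
A rough sketch of how the role-related statements fit together, assuming Sentry authorization is enabled
and using hypothetical role, group, and database names:
</p>
<codeblock>CREATE ROLE analyst_role;
GRANT ROLE analyst_role TO GROUP analysts;
GRANT SELECT ON DATABASE sales_db TO ROLE analyst_role;

-- Inspect the roles and the groups they are granted to.
SHOW ROLES;
SHOW ROLE GRANT GROUP analysts;

-- Undo the privilege and remove the role.
REVOKE SELECT ON DATABASE sales_db FROM ROLE analyst_role;
DROP ROLE analyst_role;
</codeblock>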
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Impala can now read text files compressed by gzip, bzip2, or Snappy. These files do not
|
||
require any special table settings to work in an Impala text table. Impala recognizes the compression
|
||
type automatically based on file extensions of <codeph>.gz</codeph>, <codeph>.bz2</codeph>, and
|
||
<codeph>.snappy</codeph> respectively. These types of compressed text files are intended for
|
||
convenience with existing ETL pipelines. Their non-splittable nature means they are not optimal for
|
||
high-performance parallel queries. See <xref href="impala_txtfile.xml#gzip"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Query hints can now use comment notation, <codeph>/* +<varname>hint_name</varname> */</codeph> or
|
||
<codeph>-- +<varname>hint_name</varname></codeph>, at the same places in the query where the hints
|
||
enclosed by <codeph>[ ]</codeph> are recognized. This enhancement makes it easier to reuse Impala
|
||
queries on other database systems. See <xref href="impala_hints.xml#hints"/> for details.
|
||
</p>
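<p>
The two comment notations might be used as follows in a join query. The table names are hypothetical, and
the choice of hints is only for illustration.
</p>
<codeblock>-- Block comment notation.
SELECT STRAIGHT_JOIN c.name, o.total
  FROM customers c JOIN /* +BROADCAST */ orders o ON c.id = o.customer_id;

-- Line comment notation; the rest of the clause continues on the next line.
SELECT STRAIGHT_JOIN c.name, o.total
  FROM customers c JOIN -- +SHUFFLE
  orders o ON c.id = o.customer_id;
</codeblock>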
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
A new query option, <codeph>QUERY_TIMEOUT_S</codeph>, lets you specify a timeout period in seconds for
|
||
individual queries.
|
||
</p>
|
||
|
||
<p>
|
||
The behavior of the <codeph>--idle_query_timeout</codeph> configuration option is extended. If no
<codeph>QUERY_TIMEOUT_S</codeph> query option is in effect, <codeph>--idle_query_timeout</codeph> works
the same as before, setting the timeout interval. When the <codeph>QUERY_TIMEOUT_S</codeph> query option
|
||
is specified, its maximum value is capped by the value of the <codeph>--idle_query_timeout</codeph>
|
||
option.
|
||
</p>
|
||
|
||
<p>
|
||
That is, the system administrator sets the default and maximum timeout through the
|
||
<codeph>--idle_query_timeout</codeph> startup option, and then individual users or applications can set
|
||
a lower timeout value if desired through the <codeph>QUERY_TIMEOUT_S</codeph> query option. See
|
||
<xref href="impala_timeouts.xml#timeouts"/> and
|
||
<xref href="impala_query_timeout_s.xml#query_timeout_s"/> for details.
|
||
</p>
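<p>
For example, a user or application might request a shorter timeout for its own session. The value shown
is arbitrary.
</p>
<codeblock>-- Time out this session's queries after 60 seconds, subject to the
-- cap described above for the --idle_query_timeout startup option.
SET QUERY_TIMEOUT_S=60;
</codeblock>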
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
New functions <codeph>VAR_SAMP()</codeph> and <codeph>VAR_POP()</codeph> are aliases for the existing
|
||
<codeph>VARIANCE_SAMP()</codeph> and <codeph>VARIANCE_POP()</codeph> functions.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
A new date and time function, <codeph>DATE_PART()</codeph>, provides similar functionality to
|
||
<codeph>EXTRACT()</codeph>. You can also call the <codeph>EXTRACT()</codeph> function using the SQL-99
|
||
syntax, <codeph>EXTRACT(<varname>unit</varname> FROM <varname>timestamp</varname>)</codeph>. These
|
||
enhancements simplify the porting process for date-related code from other systems. See
|
||
<xref href="impala_datetime_functions.xml#datetime_functions"/> for details.
|
||
</p>
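<p>
The following sketch extracts the same field through each notation, using <codeph>now()</codeph> so that
no table is needed:
</p>
<codeblock>SELECT date_part('year', now())  AS year_via_date_part,
       extract(year FROM now()) AS year_via_sql99_syntax,
       extract(now(), 'year')   AS year_via_function_syntax;
</codeblock>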
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
New approximation features provide a fast way to get results when absolute precision is not required:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
The <codeph>APPX_COUNT_DISTINCT</codeph> query option lets Impala rewrite
|
||
<codeph>COUNT(DISTINCT)</codeph> calls to use <codeph>NDV()</codeph> instead, which speeds up the
|
||
operation and allows multiple <codeph>COUNT(DISTINCT)</codeph> operations in a single query. See
|
||
<xref href="impala_appx_count_distinct.xml#appx_count_distinct"/> for details.
|
||
</li>
|
||
<li>
The <codeph>APPX_MEDIAN()</codeph> aggregate function produces an estimate for the median value of a
column by using sampling. See <xref href="impala_appx_median.xml#appx_median"/> for details.
</li>
</ul>
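<p>
A minimal sketch of both approximation features, assuming a hypothetical <codeph>orders</codeph> table:
</p>
<codeblock>-- With the option enabled, each COUNT(DISTINCT) is rewritten to use NDV(),
-- so more than one can appear in the same query.
SET APPX_COUNT_DISTINCT=true;
SELECT count(DISTINCT customer_id), count(DISTINCT product_id) FROM orders;

-- Estimated median of a numeric column.
SELECT appx_median(total) FROM orders;
</codeblock>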
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Impala now supports a <codeph>DECODE()</codeph> function. This function works as a shorthand for a
|
||
<codeph>CASE</codeph> expression, and improves compatibility with SQL code containing vendor
|
||
extensions. See <xref href="impala_conditional_functions.xml#conditional_functions"/> for details.
|
||
</p>
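<p>
For example, the two queries below are intended to be roughly equivalent for non-<codeph>NULL</codeph>
inputs. The table <codeph>tickets</codeph> and its column <codeph>status_code</codeph> are hypothetical.
</p>
<codeblock>SELECT decode(status_code, 1, 'open', 2, 'closed', 'unknown') AS status_label
  FROM tickets;

SELECT CASE status_code WHEN 1 THEN 'open' WHEN 2 THEN 'closed' ELSE 'unknown' END AS status_label
  FROM tickets;
</codeblock>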
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <codeph>STDDEV()</codeph>, <codeph>STDDEV_POP()</codeph>, <codeph>STDDEV_SAMP()</codeph>,
|
||
<codeph>VARIANCE()</codeph>, <codeph>VARIANCE_POP()</codeph>, <codeph>VARIANCE_SAMP()</codeph>, and
|
||
<codeph>NDV()</codeph> aggregate functions now all return <codeph>DOUBLE</codeph> results rather than
|
||
<codeph>STRING</codeph>. Formerly, you were required to <codeph>CAST()</codeph> the result to a numeric
|
||
type before using it in arithmetic operations.
|
||
</p>
|
||
</li>
|
||
|
||
<li id="parquet_block_size">
|
||
<p>
|
||
The default settings for Parquet block size, and the associated <codeph>PARQUET_FILE_SIZE</codeph>
|
||
query option, are changed. Now, Impala writes Parquet files with a size of 256 MB and an HDFS block
|
||
size of 256 MB. Previously, Impala attempted to write Parquet files with a size of 1 GB and an HDFS
|
||
block size of 1 GB. In practice, Impala used a conservative estimate of the disk space needed for each
|
||
Parquet block, leading to files that were typically 512 MB anyway. Thus, this change will make the file
|
||
size more accurate if you specify a value for the <codeph>PARQUET_FILE_SIZE</codeph> query option. It
|
||
also reduces the amount of memory reserved during <codeph>INSERT</codeph> into Parquet tables,
|
||
potentially avoiding out-of-memory errors and improving scalability when inserting data into Parquet
|
||
tables.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Anti-joins are now supported, expressed using the <codeph>LEFT ANTI JOIN</codeph> and <codeph>RIGHT
|
||
ANTI JOIN</codeph> clauses.
|
||
<!-- Maybe RIGHT SEMI JOIN is new too? -->
|
||
<!-- Make following statement true in the context of RIGHT ANTI JOIN. -->
|
||
These clauses return results from one table that have no match in the other table. You might use this
|
||
type of join in the same sorts of use cases as the <codeph>NOT EXISTS</codeph> and <codeph>NOT
|
||
IN</codeph> operators. See <xref href="impala_joins.xml#joins"/> for details.
|
||
</p>
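<p>
A short sketch, assuming hypothetical <codeph>customers</codeph> and <codeph>orders</codeph> tables, that
finds customers with no orders:
</p>
<codeblock>SELECT c.id, c.name
  FROM customers c LEFT ANTI JOIN orders o ON c.id = o.customer_id;
</codeblock>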
|
||
</li>
|
||
|
||
<li audience="hidden">
|
||
<!-- This feature will be undocumented in Impala 2.0, probably ready for prime time in 2.1. -->
|
||
<p>
|
||
Improved file format support. Impala can now write to Avro, compressed text, SequenceFile, and RCFile
|
||
tables using the <codeph>INSERT</codeph> or <codeph>CREATE TABLE AS SELECT</codeph> statements. See
|
||
<xref href="impala_file_formats.xml#file_formats"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <codeph>SET</codeph> command in <cmdname>impala-shell</cmdname> has been promoted to a real SQL
|
||
statement. You can now set query options such as <codeph>PARQUET_FILE_SIZE</codeph>,
|
||
<codeph>MEM_LIMIT</codeph>, and <codeph>SYNC_DDL</codeph> within JDBC, ODBC, or any other kind of
|
||
application that submits SQL without going through the <cmdname>impala-shell</cmdname> interpreter. See
|
||
<xref href="impala_set.xml#set"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <cmdname>impala-shell</cmdname> interpreter now reads settings from an optional configuration file,
|
||
named <filepath>$HOME/.impalarc</filepath> by default. See
|
||
<xref href="impala_shell_options.xml#shell_config_file"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li audience="hidden">
|
||
<!-- This feature will be undocumented in Impala 2.0, probably ready for prime time in 2.1. -->
|
||
<p>
|
||
The <codeph>COMPUTE STATS</codeph> statement can now gather statistics for newly added partitions
|
||
rather than the entire table. This feature is known as <term>incremental statistics</term>. See
|
||
<xref href="impala_compute_stats.xml#compute_stats"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The library used for regular expression parsing has changed from Boost to Google RE2. This
|
||
implementation change adds support for non-greedy matches using the <codeph>.*?</codeph> notation. This
|
||
and other changes in the way regular expressions are interpreted mean you might need to re-test
|
||
queries that use functions such as <codeph>regexp_extract()</codeph> or
|
||
<codeph>regexp_replace()</codeph>, or operators such as <codeph>REGEXP</codeph> or
|
||
<codeph>RLIKE</codeph>. See <xref href="impala_incompatible_changes.xml#incompatible_changes"/> for
|
||
those details.
|
||
</p>
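<p>
The difference between greedy and non-greedy matching can be seen with a small test such as the
following, which uses a literal string chosen for illustration:
</p>
<codeblock>-- Greedy: .* extends the match through the last 'f' in the string.
SELECT regexp_extract('abcdefabcdef', 'a.*f', 0);

-- Non-greedy: .*? stops the match at the first 'f'.
SELECT regexp_extract('abcdefabcdef', 'a.*?f', 0);
</codeblock>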
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept rev="1.4.0" id="new_features_140">
|
||
|
||
<title>New Features in <keyword keyref="impala14_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
The following are the major new features in <keyword keyref="impala14_full"/>:
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
The <codeph>DECIMAL</codeph> data type lets you store fixed-precision values, for working with currency
|
||
or other fractional values where it is important to represent values exactly and avoid rounding errors.
|
||
This feature includes enhancements to built-in functions, numeric literals, and arithmetic expressions.
|
||
<ph audience="PDF">See <xref href="impala_decimal.xml#decimal"/> for details.</ph>
|
||
</p>
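<p>
A minimal sketch of the type in a table definition, an arithmetic expression, and a cast. The table and
column names are hypothetical.
</p>
<codeblock>CREATE TABLE invoice_lines (item STRING, unit_price DECIMAL(9,2), quantity INT);

-- Arithmetic on DECIMAL values keeps exact fractional digits.
SELECT item, unit_price * quantity AS line_total FROM invoice_lines;

SELECT CAST(19.99 AS DECIMAL(9,2));
</codeblock>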
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Where the underlying HDFS support exists, Impala can take advantage of the HDFS caching feature to <q>pin</q> entire tables or
|
||
individual partitions in memory, to speed up queries on frequently accessed data and reduce the CPU
|
||
overhead of memory-to-memory copying. When HDFS files are cached in memory, Impala can read the cached
|
||
data without any disk reads, and without making an additional copy of the data in memory. Other Hadoop
|
||
components that read the same data files also experience a performance benefit.
|
||
</p>
|
||
|
||
<p audience="PDF">
|
||
For background information about HDFS caching, see
|
||
<xref keyref="setup_hdfs_caching"/>. For performance information about using this feature with Impala, see
|
||
<xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/>. For the <codeph>SET CACHED</codeph> and
|
||
<codeph>SET UNCACHED</codeph> clauses that let you control cached table data through DDL statements,
|
||
see <xref href="impala_create_table.xml#create_table"/> and
|
||
<xref href="impala_alter_table.xml#alter_table"/>.
|
||
</p>
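        <p>
          The caching clauses might be used roughly as follows. This sketch assumes that an HDFS administrator
          has already created a cache pool; the pool name <codeph>pool_name</codeph> and the table names are
          placeholders.
        </p>
<codeblock>-- Cache a whole table at creation time (assumes cache pool 'pool_name' exists).
CREATE TABLE cached_lookup (id INT, val STRING) CACHED IN 'pool_name';
-- Cache a single partition of an existing (hypothetical) partitioned table.
ALTER TABLE sales_by_year PARTITION (year=2014) SET CACHED IN 'pool_name';
-- Stop caching that partition.
ALTER TABLE sales_by_year PARTITION (year=2014) SET UNCACHED;</codeblock>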
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Impala can now use Sentry-based authorization based either on the original policy file, or on rules
|
||
defined by <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements issued through Hive.
|
||
See <xref keyref="authorization"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
For interoperability with Parquet files created through other Hadoop components, such as Pig or
|
||
MapReduce jobs, you can create an Impala table that automatically sets up the column definitions based
|
||
on the layout of an existing Parquet data file. <ph audience="PDF">See
|
||
<xref href="impala_create_table.xml#create_table"/> for the syntax, and
|
||
<xref href="impala_parquet.xml#parquet_ddl"/> for usage information.</ph>
|
||
</p>
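        <p>
          For illustration, a table definition might be derived from an existing data file like this. The HDFS
          path and table name are hypothetical.
        </p>
<codeblock>-- Column names and types are taken from the Parquet file (hypothetical path).
CREATE EXTERNAL TABLE ingest_existing_files
  LIKE PARQUET '/user/etl/destination/datafile1.parquet'
  STORED AS PARQUET
  LOCATION '/user/etl/destination';
-- Confirm the derived column definitions.
DESCRIBE ingest_existing_files;</codeblock>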
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
<codeph>ORDER BY</codeph> queries no longer require a <codeph>LIMIT</codeph> clause. If the size of the
|
||
result set to be sorted exceeds the memory available to Impala, Impala uses a temporary work space on
|
||
disk to perform the sort operation. <ph audience="PDF">See <xref href="impala_order_by.xml#order_by"/>
|
||
for details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
LDAP connections can be secured through either SSL or TLS. <ph audience="PDF">See
|
||
<xref href="impala_ldap.xml#ldap"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The following new built-in scalar and aggregate functions are available:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
A new built-in function, <codeph>EXTRACT()</codeph>, returns one date or time field from a
|
||
<codeph>TIMESTAMP</codeph> value. <ph audience="PDF">See
|
||
<xref href="impala_datetime_functions.xml#datetime_functions"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
A new built-in function, <codeph>TRUNC()</codeph>, truncates date/time values to a particular
|
||
granularity, such as year, month, day, hour, and so on. <ph audience="PDF">See
|
||
<xref href="impala_datetime_functions.xml#datetime_functions"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
              A new built-in function, <codeph>ADD_MONTHS()</codeph>, is an alias for the existing
|
||
<codeph>MONTHS_ADD()</codeph> function. <ph audience="PDF">See
|
||
<xref href="impala_datetime_functions.xml#datetime_functions"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
A new built-in function, <codeph>ROUND()</codeph>, rounds <codeph>DECIMAL</codeph> values to a
|
||
specified number of fractional digits. <ph audience="PDF">See
|
||
<xref href="impala_math_functions.xml#math_functions"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Several built-in aggregate functions for computing properties for statistical distributions:
|
||
<codeph>STDDEV()</codeph>, <codeph>STDDEV_SAMP()</codeph>, <codeph>STDDEV_POP()</codeph>,
|
||
<codeph>VARIANCE()</codeph>, <codeph>VARIANCE_SAMP()</codeph>, and <codeph>VARIANCE_POP()</codeph>.
|
||
<ph audience="PDF">See <xref href="impala_stddev.xml#stddev"/> and
|
||
<xref href="impala_variance.xml#variance"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Several new built-in functions, such as <codeph>MAX_INT()</codeph>,
|
||
<codeph>MIN_SMALLINT()</codeph>, and so on, let you conveniently check whether data values are in
|
||
an expected range. You might be able to switch a column to a smaller type, saving memory during
|
||
processing. <ph audience="PDF">See <xref href="impala_math_functions.xml#math_functions"/> for
|
||
details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
New built-in functions, <codeph>IS_INF()</codeph> and <codeph>IS_NAN()</codeph>, check for the
|
||
special values infinity and <q>not a number</q>. These values could be specified as
|
||
<codeph>inf</codeph> or <codeph>nan</codeph> in text data files, or be produced by certain
|
||
arithmetic expressions. <ph audience="PDF">See
|
||
<xref href="impala_math_functions.xml#math_functions"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
</ul>
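        <p>
          The functions listed above might be called as in the following sketch. The table
          <codeph>t1</codeph> and its columns are hypothetical.
        </p>
<codeblock>-- One field of a TIMESTAMP value.
SELECT EXTRACT(NOW(), 'year');
-- Truncate to the start of the current month.
SELECT TRUNC(NOW(), 'MM');
-- Alias for MONTHS_ADD().
SELECT ADD_MONTHS(NOW(), 2);
-- Round a DECIMAL value to 2 fractional digits.
SELECT ROUND(CAST(3.14159 AS DECIMAL(9,5)), 2);
-- Statistical aggregates over a hypothetical numeric column.
SELECT STDDEV(c1), VARIANCE(c1) FROM t1;
-- Count values too large to fit in a SMALLINT column.
SELECT COUNT(*) FROM t1 WHERE c1 &gt; MAX_SMALLINT();
-- Check a hypothetical floating-point column for special values.
SELECT IS_NAN(d1), IS_INF(d1) FROM t1;</codeblock>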
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <codeph>SHOW PARTITIONS</codeph> statement displays information about the structure of a
|
||
partitioned table. <ph audience="PDF">See <xref href="impala_show.xml#show"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li audience="hidden">
|
||
<!-- Not documenting for 1.4. Revisit in a future release. -->
|
||
<p>
|
||
Data sources. <ph audience="PDF">See <xref href="impala_data_sources.xml#data_sources"/> for
|
||
details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
New configuration options for the <cmdname>impalad</cmdname> daemon let you specify initial memory
|
||
usage for all queries. The initial resource requests handled by Llama and YARN can be expanded later if
|
||
needed, avoiding unnecessary over-allocation and reducing the chance of out-of-memory conditions.
|
||
<ph audience="PDF">See <xref href="impala_resource_management.xml#resource_management"/> for
|
||
details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
        <p>
          The Impala <codeph>CREATE TABLE</codeph> statement now has a <codeph>STORED AS AVRO</codeph> clause,
          allowing you to create Avro tables through Impala. <ph audience="PDF">See
          <xref href="impala_avro.xml#avro"/> for details and examples.</ph>
        </p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
New <cmdname>impalad</cmdname> configuration options let you fine-tune the calculations Impala makes to
|
||
estimate resource requirements for each query. These options can help avoid problems due to
|
||
          overconsumption caused by too-low estimates, or underutilization caused by too-high estimates.
|
||
<ph audience="PDF">See <xref href="impala_resource_management.xml#resource_management"/> for
|
||
details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
A new <codeph>SUMMARY</codeph> command in the <cmdname>impala-shell</cmdname> interpreter provides a
|
||
high-level summary of the work performed at each stage of the explain plan. The summary is also
|
||
included in output from the <codeph>PROFILE</codeph> command. <ph audience="PDF">See
|
||
<xref href="impala_shell_commands.xml#shell_commands"/> and
|
||
<xref href="impala_explain_plan.xml#perf_summary"/> for details.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Performance improvements for the <codeph>COMPUTE STATS</codeph> statement:
|
||
</p>
|
||
<ul>
|
||
<!-- This particular change has been pushed out to a later release. -->
|
||
|
||
<li audience="hidden">
|
||
Certain simple aggregation operations (with no <codeph>GROUP BY</codeph> step) are multi-threaded if
|
||
spare cores are available.
|
||
</li>
|
||
|
||
<li>
|
||
            The <codeph>NDV</codeph> function runs faster through native code generation.
|
||
</li>
|
||
|
||
<li>
|
||
Because the <codeph>NULL</codeph> count is not currently used by the Impala query planner, in Impala
|
||
1.4.0 and higher, <codeph>COMPUTE STATS</codeph> does not count the <codeph>NULL</codeph> values for
|
||
each column. (The <codeph>#Nulls</codeph> field of the stats table is left as -1, signifying that the
|
||
value is unknown.)
|
||
</li>
|
||
</ul>
|
||
<p audience="PDF">
|
||
See <xref href="impala_compute_stats.xml#compute_stats"/> for general details about the <codeph>COMPUTE
|
||
STATS</codeph> statement, and <xref href="impala_perf_stats.xml#perf_stats"/> for how to use the
|
||
statistics to improve query performance.
|
||
</p>
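        <p>
          A minimal sketch of gathering and inspecting statistics, using a hypothetical table
          <codeph>t1</codeph>:
        </p>
<codeblock>-- Gather table and column statistics in one step.
COMPUTE STATS t1;
-- Inspect the results. In Impala 1.4.0 and higher, the #Nulls column
-- of the column stats output is left as -1 (unknown).
SHOW TABLE STATS t1;
SHOW COLUMN STATS t1;</codeblock>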
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
          Performance improvements for partition pruning. This feature reduces the time spent in query planning
|
||
for partitioned tables with thousands of partitions. Previously, Impala typically queried tables with
|
||
          up to approximately 3000 partitions. With the performance improvement in partition pruning, Impala
|
||
can comfortably handle tables with tens of thousands of partitions. <ph audience="PDF">See
|
||
<xref href="impala_partitioning.xml#partition_pruning"/> for information about partition pruning.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The documentation provides additional guidance for planning tasks. <ph audience="PDF">See
|
||
<xref href="impala_planning.xml#planning"/>.</ph>
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <cmdname>impala-shell</cmdname> interpreter now supports UTF-8 characters for input and output. You
|
||
can control whether <cmdname>impala-shell</cmdname> ignores invalid Unicode code points through the
|
||
          <codeph>--strict_unicode</codeph> option. (This option is removed in Impala 2.0.)
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept rev="1.3.2" id="new_features_132">
|
||
|
||
<title>New Features in <keyword keyref="impala132"/></title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
No new features. This point release is exclusively a bug fix release for the IMPALA-1019 issue related to
|
||
HDFS caching.
|
||
</p>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept rev="1.3.1" id="new_features_131">
|
||
|
||
<title>New Features in Impala 1.3.1</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
This point release is primarily a vehicle to deliver bug fixes. Any new features are minor changes
|
||
resulting from fixes for performance, reliability, or usability issues.
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
A new <cmdname>impalad</cmdname> startup option, <codeph>--insert_inherit_permissions</codeph>, causes
|
||
Impala <codeph>INSERT</codeph> statements to create each new partition with the same HDFS permissions
|
||
as its parent directory. By default, <codeph>INSERT</codeph> statements create directories for new
|
||
partitions using default HDFS permissions. See <xref href="impala_insert.xml#insert"/> for examples of
|
||
<codeph>INSERT</codeph> statements for partitioned tables.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <codeph>SHOW FUNCTIONS</codeph> statement now displays the return type of each function, in
|
||
addition to the types of its arguments. See <xref href="impala_show.xml#show"/> for examples.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
You can now specify the clause <codeph>FIELDS TERMINATED BY '\0'</codeph> with a <codeph>CREATE
|
||
TABLE</codeph> statement to use text data files that use ASCII 0 (<codeph>nul</codeph>) characters as a
|
||
delimiter. See <xref href="impala_txtfile.xml#txtfile"/> for details.
|
||
</p>
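        <p>
          For example, a text table with ASCII 0 delimiters might be declared as follows. The table and column
          names are hypothetical.
        </p>
<codeblock>-- Fields in the underlying text files are separated by ASCII 0 (nul) characters.
CREATE TABLE nul_separated (id STRING, s STRING, n INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\0'
  STORED AS TEXTFILE;</codeblock>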
|
||
</li>
|
||
|
||
<li>
|
||
<p conref="../shared/impala_common.xml#common/regexp_matching" />
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept rev="1.3.0" id="new_features_130">
|
||
|
||
<title>New Features in <keyword keyref="impala13_full"/></title>
|
||
|
||
<conbody>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
The admission control feature lets you control and prioritize the volume and resource consumption of
|
||
concurrent queries. This mechanism reduces spikes in resource usage, helping Impala to run alongside
|
||
other kinds of workloads on a busy cluster. It also provides more user-friendly conflict resolution
|
||
when multiple memory-intensive queries are submitted concurrently, avoiding resource contention that
|
||
formerly resulted in out-of-memory errors. See <xref href="impala_admission.xml#admission_control"/>
|
||
for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Enhanced <codeph>EXPLAIN</codeph> plans provide more detail in an easier-to-read format. Now there are
|
||
four levels of verbosity: the <codeph>EXPLAIN_LEVEL</codeph> option can be set from 0 (most concise) to
|
||
3 (most verbose). See <xref href="impala_explain.xml#explain"/> for syntax and
|
||
<xref href="impala_explain_plan.xml#explain_plan"/> for usage information.
|
||
</p>
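        <p>
          In <cmdname>impala-shell</cmdname>, the verbosity level might be changed and tested like this. The
          query and the table <codeph>t1</codeph> are placeholders.
        </p>
<codeblock>-- Most verbose plan output.
SET EXPLAIN_LEVEL=3;
EXPLAIN SELECT COUNT(*) FROM t1;
-- Most concise plan output.
SET EXPLAIN_LEVEL=0;
EXPLAIN SELECT COUNT(*) FROM t1;</codeblock>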
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <codeph>TIMESTAMP</codeph> data type accepts more kinds of input string formats through the
|
||
<codeph>UNIX_TIMESTAMP</codeph> function, and produces more varieties of string formats through the
|
||
<codeph>FROM_UNIXTIME</codeph> function. The documentation now also lists more functions for date
|
||
arithmetic, used for adding and subtracting <codeph>INTERVAL</codeph> expressions from
|
||
<codeph>TIMESTAMP</codeph> values. See <xref href="impala_datetime_functions.xml#datetime_functions"/>
|
||
for details.
|
||
</p>
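        <p>
          The following sketch shows the general shape of these calls. The literal values and format strings
          are arbitrary examples, not values taken from the Impala documentation.
        </p>
<codeblock>-- Parse a string in a non-default format into seconds since the epoch.
SELECT UNIX_TIMESTAMP('2014/03/15 10:30:00', 'yyyy/MM/dd HH:mm:ss');
-- Format an epoch value as a string in a chosen layout.
SELECT FROM_UNIXTIME(1392394861, 'yyyy-MM-dd HH:mm:ss');
-- Date arithmetic with INTERVAL expressions.
SELECT NOW() + INTERVAL 3 DAYS;
SELECT DATE_SUB(NOW(), INTERVAL 2 WEEKS);</codeblock>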
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
New conditional functions, <codeph>NULLIF()</codeph>, <codeph>NULLIFZERO()</codeph>, and
|
||
<codeph>ZEROIFNULL()</codeph>, simplify porting SQL containing vendor extensions to Impala. See
|
||
<xref href="impala_conditional_functions.xml#conditional_functions"/> for details.
|
||
</p>
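        <p>
          A short sketch of the three functions, using a hypothetical table <codeph>t1</codeph> with numeric
          columns <codeph>c1</codeph> and <codeph>c2</codeph>:
        </p>
<codeblock>-- NULL when c1 equals c2, otherwise c1.
SELECT NULLIF(c1, c2) FROM t1;
-- NULL when c1 is 0, otherwise c1.
SELECT NULLIFZERO(c1) FROM t1;
-- 0 when c1 is NULL, otherwise c1.
SELECT ZEROIFNULL(c1) FROM t1;</codeblock>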
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
New utility function, <codeph>CURRENT_DATABASE()</codeph>. See
|
||
<xref href="impala_misc_functions.xml#misc_functions"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Integration with the YARN resource management framework. This
|
||
feature makes use of the underlying YARN service, plus an additional service (Llama) that coordinates
|
||
requests to YARN for Impala resources, so that the Impala query only proceeds when all requested
|
||
resources are available. See <xref href="impala_resource_management.xml#resource_management"/> for full
|
||
details.
|
||
</p>
|
||
|
||
<p>
|
||
On the Impala side, this feature involves some new startup options for the <cmdname>impalad</cmdname>
|
||
daemon:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<codeph>-enable_rm</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>-llama_host</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>-llama_port</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>-llama_callback_port</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>-cgroup_hierarchy_path</codeph>
|
||
</li>
|
||
</ul>
|
||
<p>
|
||
For details of these startup options, see <xref href="impala_config_options.xml#config_options"/>.
|
||
</p>
|
||
|
||
<p>
|
||
This feature also involves several new or changed query options that you can set through the
|
||
<cmdname>impala-shell</cmdname> interpreter and apply within a specific session:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<codeph>MEM_LIMIT</codeph>: the function of this existing option changes when Impala resource
|
||
management is enabled.
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>REQUEST_POOL</codeph>: a new option. (Renamed to <codeph>RESOURCE_POOL</codeph> in Impala
|
||
1.3.0.)
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>V_CPU_CORES</codeph>: a new option.
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>RESERVATION_REQUEST_TIMEOUT</codeph>: a new option.
|
||
</li>
|
||
</ul>
|
||
<p>
|
||
For details of these query options, see <xref href="impala_resource_management.xml#rm_query_options"/>.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept rev="1.2.4" id="new_features_124">
|
||
|
||
<title>New Features in Impala 1.2.4</title>
|
||
|
||
<conbody>
|
||
|
||
<note>
|
||
Impala 1.2.4 is primarily a bug fix release for Impala 1.2.3, plus some performance
|
||
enhancements for the catalog server to minimize startup and DDL wait times for Impala deployments with
|
||
large numbers of databases, tables, and partitions.
|
||
</note>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
On Impala startup, the metadata loading and synchronization mechanism has been improved and optimized,
|
||
to give more responsiveness when starting Impala on a system with a large number of databases, tables,
|
||
or partitions. The initial metadata loading happens in the background, allowing queries to be run
|
||
before the entire process is finished. When a query refers to a table whose metadata is not yet loaded,
|
||
the query waits until the metadata for that table is loaded, and the load operation for that table is
|
||
prioritized to happen first.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Formerly, if you created a new table in Hive, you had to issue the <codeph>INVALIDATE METADATA</codeph>
|
||
          statement (with no table name), which was an expensive operation that reloaded metadata for all tables.
|
||
Impala did not recognize the name of the Hive-created table, so you could not do <codeph>INVALIDATE
|
||
METADATA <varname>new_table</varname></codeph> to get the metadata for just that one table. Now, when
|
||
you issue <codeph>INVALIDATE METADATA <varname>table_name</varname></codeph>, Impala checks to see if
|
||
that name represents a table created in Hive, and if so recognizes the new table and loads the metadata
|
||
for it. Additionally, if the new table is in a database that was newly created in Hive, Impala also
|
||
recognizes the new database.
|
||
</p>
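        <p>
          For example, after a table is created through Hive, the Impala side of the workflow might look like
          this. The database and table names are hypothetical.
        </p>
<codeblock>-- new_db.new_table was just created in Hive; load metadata for that table only.
INVALIDATE METADATA new_db.new_table;
-- The table is now visible to Impala queries.
SELECT COUNT(*) FROM new_db.new_table;</codeblock>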
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
If you issue <codeph>INVALIDATE METADATA <varname>table_name</varname></codeph> and the table has been
|
||
dropped through Hive, Impala will recognize that the table no longer exists.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
New startup options let you control the parallelism of the metadata loading during startup for the
|
||
<cmdname>catalogd</cmdname> daemon:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
<codeph>--load_catalog_in_background</codeph> makes Impala load and cache metadata using background
|
||
threads after startup. It is <codeph>true</codeph> by default. Previously, a system with a large
|
||
number of databases, tables, or partitions could be unresponsive or even time out during startup.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
<codeph>--num_metadata_loading_threads</codeph> determines how much parallelism Impala devotes to
|
||
loading metadata in the background. The default is 16. You might increase this value for systems
|
||
with huge numbers of databases, tables, or partitions. You might lower this value for busy systems
|
||
that are CPU-constrained due to jobs from components other than Impala.
|
||
</p>
|
||
</li>
|
||
</ul>
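        <p>
          As a rough sketch, the options might be passed on the <cmdname>catalogd</cmdname> command line as
          shown here. The thread count of 32 is an arbitrary illustration, and in practice the flags are
          usually set through whatever configuration mechanism your deployment uses to start the daemon.
        </p>
<codeblock># Illustrative only: keep background loading on and raise the thread count.
catalogd --load_catalog_in_background=true --num_metadata_loading_threads=32</codeblock>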
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept rev="1.2.3" id="new_features_123">
|
||
|
||
<title>New Features in Impala 1.2.3</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
Impala 1.2.3 contains exactly the same feature set as Impala 1.2.2. Its only difference is one additional
|
||
fix for compatibility with Parquet files generated outside of Impala by components such as Hive, Pig, or
|
||
MapReduce. If you are upgrading from Impala 1.2.1 or earlier, see
|
||
<xref href="impala_new_features.xml#new_features_122"/> for the latest added features.
|
||
</p>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept rev="1.2.2" id="new_features_122">
|
||
|
||
<title>New Features in Impala 1.2.2</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
Impala 1.2.2 includes new features for performance, security, and flexibility. The major enhancements over
|
||
1.2.1 are performance related, primarily for join queries.
|
||
</p>
|
||
|
||
<p>
|
||
New user-visible features include:
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
          Join order optimizations. This feature automatically distributes and parallelizes the
|
||
work for a join query to minimize disk I/O and network traffic. The automatic optimization reduces the
|
||
need to use query hints or to rewrite join queries with the tables in a specific order based on size or
|
||
cardinality. The new <codeph>COMPUTE STATS</codeph> statement gathers statistical information about
|
||
each table that is crucial for enabling the join optimizations. See
|
||
<xref href="impala_perf_joins.xml#perf_joins"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
<codeph>COMPUTE STATS</codeph> statement to collect both table statistics and column statistics with a
|
||
single statement. Intended to be more comprehensive, efficient, and reliable than the corresponding
|
||
Hive <codeph>ANALYZE TABLE</codeph> statement, which collects statistics in multiple phases through
|
||
MapReduce jobs. These statistics are important for query planning for join queries, queries on
|
||
partitioned tables, and other types of data-intensive operations. For optimal planning of join queries,
|
||
you need to collect statistics for each table involved in the join. See
|
||
<xref href="impala_compute_stats.xml#compute_stats"/> for details.
|
||
</p>
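        <p>
          A minimal sketch of preparing two hypothetical tables for a join query:
        </p>
<codeblock>-- Gather statistics for every table involved in the join.
COMPUTE STATS big_table;
COMPUTE STATS small_table;
-- With statistics in place, the planner can choose the join order itself.
SELECT b.id, s.label
  FROM small_table s JOIN big_table b ON b.id = s.id;</codeblock>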
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Reordering of tables in a join query can be overridden by the <codeph>STRAIGHT_JOIN</codeph> operator,
|
||
allowing you to fine-tune the planning of the join query if necessary, by using the original technique
|
||
of ordering the joined tables in descending order of size. See
|
||
<xref href="impala_perf_joins.xml#straight_join"/> for details.
|
||
</p>
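        <p>
          For illustration, with hypothetical tables listed from largest to smallest:
        </p>
<codeblock>-- STRAIGHT_JOIN keeps the tables in the order written in the query.
SELECT STRAIGHT_JOIN b.id, m.category, s.label
  FROM big_table b
  JOIN medium_table m ON b.id = m.id
  JOIN small_table s ON m.id = s.id;</codeblock>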
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <codeph>CROSS JOIN</codeph> clause in the
|
||
<codeph><xref href="impala_select.xml#select">SELECT</xref></codeph> statement to allow Cartesian
|
||
products in queries, that is, joins without an equality comparison between columns in both tables.
|
||
Because such queries must be carefully checked to avoid accidental overconsumption of memory, you must
|
||
use the <codeph>CROSS JOIN</codeph> operator to explicitly select this kind of join. See
|
||
<xref href="impala_tutorial.xml#tut_cross_join"/> for examples.
|
||
</p>
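        <p>
          A small sketch with hypothetical tables; every row of one table is paired with every row of the
          other, so the result size is the product of the two row counts:
        </p>
<codeblock>-- Explicit Cartesian product; a WHERE clause can still filter the combinations.
SELECT c.color_name, s.size_name
  FROM colors c CROSS JOIN sizes s
  WHERE s.size_name != 'XXL';</codeblock>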
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <codeph>ALTER TABLE</codeph> statement has new clauses that let you fine-tune table statistics. You
|
||
can use this technique as a less-expensive way to update specific statistics, in case the statistics
|
||
become stale, or to experiment with the effects of different data distributions on query planning.
|
||
</p>
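        <p>
          One way this might look is setting the row count through a table property. The table name and value
          are hypothetical, and the exact property names to use should be confirmed against the
          <codeph>ALTER TABLE</codeph> documentation for your release.
        </p>
<codeblock>-- Manually record an approximate row count instead of rescanning the table.
ALTER TABLE t1 SET TBLPROPERTIES ('numRows'='1000000');
-- Check the value the planner will see.
SHOW TABLE STATS t1;</codeblock>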
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
LDAP username/password authentication in JDBC/ODBC. See <xref href="impala_ldap.xml#ldap"/> for
|
||
details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
<xref href="impala_string_functions.xml#string_functions/group_concat">GROUP_CONCAT()</xref> aggregate
|
||
function to concatenate column values across all rows of a result set.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <codeph>INSERT</codeph> statement now accepts hints, <codeph>[SHUFFLE]</codeph> and
|
||
<codeph>[NOSHUFFLE]</codeph>, to influence the way work is redistributed during
|
||
<codeph>INSERT...SELECT</codeph> operations. The hints are primarily useful for inserting into
|
||
partitioned Parquet tables, where using the <codeph>[SHUFFLE]</codeph> hint can avoid problems due to
|
||
memory consumption and simultaneous open files in HDFS, by collecting all the new data for each
|
||
partition on a specific node.
|
||
</p>
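        <p>
          A sketch of the hint placement, using hypothetical source and destination tables:
        </p>
<codeblock>-- The hint goes immediately before the SELECT portion of the statement.
INSERT INTO sales_parquet PARTITION (year) [SHUFFLE]
  SELECT id, amount, year FROM sales_staging;</codeblock>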
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Several built-in functions and operators are now overloaded for more numeric data types, to reduce the
|
||
requirement to use <codeph>CAST()</codeph> for type coercion in <codeph>INSERT</codeph> statements. For
|
||
example, the expression <codeph>2+2</codeph> in an <codeph>INSERT</codeph> statement formerly produced
|
||
a <codeph>BIGINT</codeph> result, requiring a <codeph>CAST()</codeph> to be stored in an
|
||
<codeph>INT</codeph> variable. Now, addition, subtraction, and multiplication only produce a result
|
||
that is one step <q>bigger</q> than their arguments, and numeric and conditional functions can return
|
||
<codeph>SMALLINT</codeph>, <codeph>FLOAT</codeph>, and other smaller types rather than always
|
||
<codeph>BIGINT</codeph> or <codeph>DOUBLE</codeph>.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
New <codeph>fnv_hash()</codeph> built-in function for constructing hashed values. See
|
||
<xref href="impala_math_functions.xml#math_functions"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The clause <codeph>STORED AS PARQUET</codeph> is accepted as an equivalent for <codeph>STORED AS
|
||
PARQUETFILE</codeph>. This more concise form is recommended for new code.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
|
||
<p>
|
||
Because Impala 1.2.2 builds on a number of features introduced in 1.2.1, if you are upgrading from an older
|
||
1.1.x release straight to 1.2.2, also review <xref href="impala_new_features.xml#new_features_121"/> to see
|
||
features such as the <codeph>SHOW TABLE STATS</codeph> and <codeph>SHOW COLUMN STATS</codeph> statements,
|
||
and user-defined functions (UDFs).
|
||
</p>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept rev="1.2" id="new_features_121">
|
||
|
||
<title>New Features in Impala 1.2.1</title>
|
||
|
||
<conbody>
|
||
|
||
<note>
|
||
The Impala 1.2.1 feature set is a superset of features in the Impala 1.2.0 beta, with the
|
||
exception of resource management, which relies on resource management infrastructure in the
|
||
underlying Hadoop distribution.
|
||
</note>
|
||
|
||
<p>
|
||
Impala 1.2.1 includes new features for security, performance, and flexibility.
|
||
</p>
|
||
|
||
<p>
|
||
New user-visible features include:
|
||
</p>
|
||
|
||
<ul>
|
||
<li rev="1.2.1">
|
||
<p>
|
||
<codeph>SHOW TABLE STATS <varname>table_name</varname></codeph> and <codeph>SHOW COLUMN STATS
|
||
<varname>table_name</varname></codeph> statements, to verify that statistics are available and to see
|
||
the values used during query planning.
|
||
</p>
|
||
</li>
|
||
|
||
<li rev="1.2.1">
|
||
<p>
|
||
<codeph>CREATE TABLE AS SELECT</codeph> syntax, to create a new table and transfer data into it in a
|
||
single operation.
|
||
</p>
|
||
</li>
|
||
|
||
<li rev="1.2.1">
|
||
<p>
|
||
<codeph>OFFSET</codeph> clause, for use with the <codeph>ORDER BY</codeph> and <codeph>LIMIT</codeph>
|
||
clauses to produce <q>paged</q> result sets such as items 1-10, then 11-20, and so on.
|
||
</p>
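        <p>
          For example, successive pages of a hypothetical <codeph>items</codeph> table might be retrieved like
          this:
        </p>
<codeblock>-- Items 1-10.
SELECT id, name FROM items ORDER BY id LIMIT 10 OFFSET 0;
-- Items 11-20.
SELECT id, name FROM items ORDER BY id LIMIT 10 OFFSET 10;</codeblock>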
|
||
</li>
|
||
|
||
<li rev="1.2.1">
|
||
<p>
|
||
<codeph>NULLS FIRST</codeph> and <codeph>NULLS LAST</codeph> clauses to ensure consistent placement of
|
||
<codeph>NULL</codeph> values in <codeph>ORDER BY</codeph> queries.
|
||
</p>
|
||
</li>
|
||
|
||
<li rev="1.2.1">
|
||
<p>
|
||
New <xref href="impala_functions.xml#builtins">built-in functions</xref>: <codeph>least()</codeph>,
|
||
<codeph>greatest()</codeph>, <codeph>initcap()</codeph>.
|
||
</p>
|
||
</li>
|
||
|
||
<li rev="1.2.1">
|
||
<p>
|
||
New aggregate function: <codeph>ndv()</codeph>, a fast alternative to <codeph>COUNT(DISTINCT
|
||
<varname>col</varname>)</codeph> returning an approximate result.
|
||
</p>
|
||
</li>
|
||
|
||
<li rev="1.2.1">
|
||
<p>
|
||
The <codeph>LIMIT</codeph> clause can now accept a numeric expression as an argument, rather than only
|
||
a literal constant.
|
||
</p>
|
||
</li>
|
||
|
||
<li rev="1.2.1">
|
||
<p>
|
||
The <codeph>SHOW CREATE TABLE</codeph> statement displays the end result of all the <codeph>CREATE
|
||
TABLE</codeph> and <codeph>ALTER TABLE</codeph> statements for a particular table. You can use the
|
||
output to produce a simplified setup script for a schema.
|
||
</p>
|
||
</li>
|
||
|
||
<li rev="1.2.1">
|
||
<p>
|
||
The <codeph>--idle_query_timeout</codeph> and <codeph>--idle_session_timeout</codeph> options for
|
||
<cmdname>impalad</cmdname> control the time intervals after which idle queries are cancelled, and idle
|
||
sessions expire. See <xref href="impala_timeouts.xml#timeouts"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
User-defined functions (UDFs). This feature lets you transform data in very flexible ways, which is
|
||
important when using Impala as part of an ETL or ELT pipeline. Prior to Impala 1.2, using UDFs required
|
||
switching into Hive. Impala 1.2 can run scalar UDFs and user-defined aggregate functions (UDAs). Impala
|
||
can run high-performance functions written in C++, or you can reuse existing Hive functions written in
|
||
Java.
|
||
</p>
|
||
|
||
<p>
|
||
You create UDFs through the <codeph>CREATE FUNCTION</codeph> statement and drop them through the
|
||
<codeph>DROP FUNCTION</codeph> statement. See <xref href="impala_udf.xml#udfs"/> for instructions about
|
||
coding, building, and deploying UDFs, and <xref href="impala_create_function.xml#create_function"/> and
|
||
<xref href="impala_drop_function.xml#drop_function"/> for related SQL syntax.
|
||
</p>
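        <p>
          The statements might be used roughly as follows for a C++ scalar UDF. The function name, HDFS path,
          and symbol are hypothetical placeholders for your own library.
        </p>
<codeblock>-- Register a function from a shared library stored in HDFS (hypothetical path and symbol).
CREATE FUNCTION my_lower(STRING) RETURNS STRING
  LOCATION '/user/impala/udfs/libudfsample.so' SYMBOL='MyLower';
-- Call it like a built-in function.
SELECT my_lower(name) FROM t1;
-- Remove it; the argument types identify which signature to drop.
DROP FUNCTION my_lower(STRING);</codeblock>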
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
A new service automatically propagates changes to table data and metadata made by one Impala node,
|
||
sending the new or updated metadata to all the other Impala nodes. The automatic synchronization
|
||
mechanism eliminates the need to use the <codeph>INVALIDATE METADATA</codeph> and
|
||
<codeph>REFRESH</codeph> statements after issuing Impala statements such as <codeph>CREATE
|
||
TABLE</codeph>, <codeph>ALTER TABLE</codeph>, <codeph>DROP TABLE</codeph>, <codeph>INSERT</codeph>, and
|
||
<codeph>LOAD DATA</codeph>.
|
||
</p>
|
||
|
||
<p>
|
||
For even more precise synchronization, you can enable the
|
||
<codeph><xref href="impala_sync_ddl.xml#sync_ddl">SYNC_DDL</xref></codeph> query option before issuing
|
||
a DDL, <codeph>INSERT</codeph>, or <codeph>LOAD DATA</codeph> statement. This option causes the
|
||
statement to wait, returning only after the catalog service has broadcast the applicable changes to all
|
||
Impala nodes in the cluster.
|
||
</p>
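        <p>
          In <cmdname>impala-shell</cmdname>, the option might be used as in this sketch. The table names are
          hypothetical.
        </p>
<codeblock>-- Make DDL and INSERT statements wait until all nodes have the new metadata.
SET SYNC_DDL=1;
CREATE TABLE synced_table (x INT);
INSERT INTO synced_table SELECT c1 FROM t1;
-- Restore the default behavior.
SET SYNC_DDL=0;</codeblock>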
|
||
|
||
<note>
|
||
<p>
|
||
Because the catalog service only monitors operations performed through Impala, <codeph>INVALIDATE
|
||
METADATA</codeph> and <codeph>REFRESH</codeph> are still needed on the Impala side after creating new
|
||
tables or loading data through the Hive shell or by manipulating data files directly in HDFS. Because
|
||
the catalog service broadcasts the result of the <codeph>REFRESH</codeph> and <codeph>INVALIDATE
|
||
METADATA</codeph> statements to all Impala nodes, when you do need to use those statements, you can
|
||
do so a single time rather than on every Impala node.
|
||
</p>
|
||
</note>
|
||
|
||
<p>
|
||
This service is implemented by the <cmdname>catalogd</cmdname> daemon. See
|
||
<xref href="impala_components.xml#intro_catalogd"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> statements have new clauses
|
||
<codeph>TBLPROPERTIES</codeph> and <codeph>WITH SERDEPROPERTIES</codeph>. The
|
||
<codeph>TBLPROPERTIES</codeph> clause lets you associate arbitrary items of metadata with a particular
|
||
table as key-value pairs. The <codeph>WITH SERDEPROPERTIES</codeph> clause lets you specify the
|
||
serializer/deserializer (SerDes) classes that read and write data for a table; although Impala does not
|
||
make use of these properties, sometimes particular values are needed for Hive compatibility. See
|
||
<xref href="impala_create_table.xml#create_table"/> and
|
||
<xref href="impala_alter_table.xml#alter_table"/> for details.
|
||
</p>
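        <p>
          A brief sketch of both clauses. The table name, property keys, and values are arbitrary examples,
          not settings that Impala itself requires.
        </p>
<codeblock>-- Attach arbitrary key-value metadata to a table.
CREATE TABLE staging_events (id BIGINT, payload STRING)
  TBLPROPERTIES ('creator'='etl_team', 'purpose'='staging');
-- Change a table property later.
ALTER TABLE staging_events SET TBLPROPERTIES ('purpose'='archive');
-- Set a SerDe property, for example when Hive expects a particular value.
ALTER TABLE staging_events SET SERDEPROPERTIES ('serialization.format'='1');</codeblock>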
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Delegation support lets you authorize certain OS users associated with applications (for example,
|
||
<codeph>hue</codeph>), to submit requests using the credentials of other users.
|
||
See <xref href="impala_delegation.xml#delegation"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Enhancements to <codeph>EXPLAIN</codeph> output. In particular, when you enable the new
|
||
<codeph>EXPLAIN_LEVEL</codeph> query option, the <codeph>EXPLAIN</codeph> and <codeph>PROFILE</codeph>
|
||
statements produce more verbose output showing estimated resource requirements and whether table and
|
||
column statistics are available for the applicable tables and columns. See
|
||
<xref href="impala_explain.xml#explain"/> for details.
|
||
</p>
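<p>
For example, a session might turn on the more detailed output and then examine a query plan. This is
only a sketch: the table <codeph>t1</codeph> is hypothetical, and the set of values accepted by
<codeph>EXPLAIN_LEVEL</codeph> can differ between releases.
</p>
<codeblock>-- Request the more detailed plan output for this session.
SET EXPLAIN_LEVEL=verbose;

-- The plan now includes resource estimates and notes about missing statistics.
EXPLAIN SELECT COUNT(*) FROM t1;
</codeblock>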
|
||
</li>
|
||
|
||
<li rev="1.2.1">
|
||
<p>
|
||
<codeph>SHOW CREATE TABLE</codeph> summarizes the effects of the original <codeph>CREATE TABLE</codeph>
|
||
statement and any subsequent <codeph>ALTER TABLE</codeph> statements, giving you a <codeph>CREATE
|
||
TABLE</codeph> statement that will re-create the current structure and layout for a table.
|
||
</p>
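<p>
A brief illustration, assuming a hypothetical table named <codeph>t1</codeph> that has been altered
since it was created:
</p>
<codeblock>-- Returns a single CREATE TABLE statement reflecting the table's current definition.
SHOW CREATE TABLE t1;
</codeblock>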
|
||
</li>
|
||
|
||
<li rev="1.2.1">
|
||
<p>
|
||
The <codeph>LIMIT</codeph> clause for queries now accepts an arithmetic expression, in addition to
|
||
numeric literals.
|
||
</p>
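<p>
For instance, assuming a hypothetical table <codeph>t1</codeph> with an <codeph>id</codeph> column,
the following query computes its row limit from an expression:
</p>
<codeblock>-- The limit is the result of the expression, in this case 10 rows.
SELECT id FROM t1 ORDER BY id LIMIT 5 * 2;
</codeblock>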
|
||
</li>
|
||
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept rev="1.2" id="new_features_120">
|
||
|
||
<title>New Features in Impala 1.2.0 (Beta)</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
The Impala 1.2.0 beta includes new features for security, performance, and flexibility.
|
||
</p>
|
||
|
||
<p>
|
||
New user-visible features include:
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
User-defined functions (UDFs). This feature lets you transform data in very flexible ways, which is
|
||
important when using Impala as part of an ETL or ELT pipeline. Prior to Impala 1.2, using UDFs required
|
||
switching into Hive. Impala 1.2 can run scalar UDFs and user-defined aggregate functions (UDAs). Impala
|
||
can run high-performance functions written in C++, or you can reuse existing Hive functions written in
|
||
Java.
|
||
</p>
|
||
|
||
<p>
|
||
You create UDFs through the <codeph>CREATE FUNCTION</codeph> statement and drop them through the
|
||
<codeph>DROP FUNCTION</codeph> statement. See <xref href="impala_udf.xml#udfs"/> for instructions about
|
||
coding, building, and deploying UDFs, and <xref href="impala_create_function.xml#create_function"/> and
|
||
<xref href="impala_drop_function.xml#drop_function"/> for related SQL syntax.
|
||
</p>
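<p>
The following sketch shows the general shape of registering, calling, and removing a C++ scalar UDF.
The function name, HDFS path, library name, and symbol are all hypothetical; substitute the values for
your own compiled library.
</p>
<codeblock>-- Register a function that takes a STRING and returns a STRING.
-- The .so file must already be copied to the given (hypothetical) HDFS location.
CREATE FUNCTION my_lower(STRING) RETURNS STRING
  LOCATION '/user/impala/udfs/libudfsample.so' SYMBOL='MyLower';

-- Call it like a built-in function.
SELECT my_lower(s) FROM t1 LIMIT 5;

-- Remove it; the argument types identify which signature to drop.
DROP FUNCTION my_lower(STRING);
</codeblock>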
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
A new service automatically propagates changes to table data and metadata made by one Impala node,
|
||
sending the new or updated metadata to all the other Impala nodes. The automatic synchronization
|
||
mechanism eliminates the need to use the <codeph>INVALIDATE METADATA</codeph> and
|
||
<codeph>REFRESH</codeph> statements after issuing Impala statements such as <codeph>CREATE
|
||
TABLE</codeph>, <codeph>ALTER TABLE</codeph>, <codeph>DROP TABLE</codeph>, <codeph>INSERT</codeph>, and
|
||
<codeph>LOAD DATA</codeph>.
|
||
</p>
|
||
|
||
<note>
|
||
<p>
|
||
Because this service only monitors operations performed through Impala, <codeph>INVALIDATE
|
||
METADATA</codeph> and <codeph>REFRESH</codeph> are still needed on the Impala side after creating new
|
||
tables or loading data through the Hive shell or by manipulating data files directly in HDFS. Because
|
||
the catalog service broadcasts the result of the <codeph>REFRESH</codeph> and <codeph>INVALIDATE
|
||
METADATA</codeph> statements to all Impala nodes, when you do need to use those statements, you can
|
||
do so a single time rather than on every Impala node.
|
||
</p>
|
||
</note>
|
||
|
||
<p>
|
||
This service is implemented by the <cmdname>catalogd</cmdname> daemon. See
|
||
<xref href="impala_components.xml#intro_catalogd"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Integration with the YARN resource management framework. This
|
||
feature makes use of the underlying YARN service, plus an additional service (Llama) that coordinates
|
||
requests to YARN for Impala resources, so that the Impala query only proceeds when all requested
|
||
resources are available. See <xref href="impala_resource_management.xml#resource_management"/> for full
|
||
details.
|
||
</p>
|
||
|
||
<p>
|
||
On the Impala side, this feature involves some new startup options for the <cmdname>impalad</cmdname>
|
||
daemon:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<codeph>-enable_rm</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>-llama_host</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>-llama_port</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>-llama_callback_port</codeph>
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>-cgroup_hierarchy_path</codeph>
|
||
</li>
|
||
</ul>
|
||
<p>
|
||
For details of these startup options, see <xref href="impala_config_options.xml#config_options"/>.
|
||
</p>
|
||
|
||
<p>
|
||
This feature also involves several new or changed query options that you can set through the
|
||
<cmdname>impala-shell</cmdname> interpreter and apply within a specific session:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<codeph>MEM_LIMIT</codeph>: the function of this existing option changes when Impala resource
|
||
management is enabled.
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>YARN_POOL</codeph>: a new option. (Renamed to <codeph>RESOURCE_POOL</codeph> in Impala
|
||
1.3.0.)
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>V_CPU_CORES</codeph>: a new option.
|
||
</li>
|
||
|
||
<li>
|
||
<codeph>RESERVATION_REQUEST_TIMEOUT</codeph>: a new option.
|
||
</li>
|
||
</ul>
|
||
<p>
|
||
For details of these query options, see <xref href="impala_resource_management.xml#rm_query_options"/>.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
<codeph>CREATE TABLE ... AS SELECT</codeph> syntax, to create a table and copy data into it in a single
|
||
operation. See <xref href="impala_create_table.xml#create_table"/> for details.
|
||
</p>
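<p>
A short example, using hypothetical table and column names:
</p>
<codeblock>-- Creates t1_recent with column definitions derived from the query,
-- then copies the matching rows into it.
CREATE TABLE t1_recent AS
  SELECT id, s FROM t1 WHERE id &gt; 100;
</codeblock>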
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
The <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> statements have a new
|
||
<codeph>TBLPROPERTIES</codeph> clause that lets you associate arbitrary items of metadata with a
|
||
particular table as key-value pairs. See <xref href="impala_create_table.xml#create_table"/> and
|
||
<xref href="impala_alter_table.xml#alter_table"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Delegation support lets you authorize certain OS users associated with applications (for example,
|
||
<codeph>hue</codeph>) to submit requests using the credentials of other users.
|
||
See <xref href="impala_delegation.xml#delegation"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Enhancements to <codeph>EXPLAIN</codeph> output. In particular, when you enable the new
|
||
<codeph>EXPLAIN_LEVEL</codeph> query option, the <codeph>EXPLAIN</codeph> and <codeph>PROFILE</codeph>
|
||
statements produce more verbose output showing estimated resource requirements and whether table and
|
||
column statistics are available for the applicable tables and columns. See
|
||
<xref href="impala_explain.xml#explain"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept id="new_features_111">
|
||
|
||
<title>New Features in Impala 1.1.1</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
Impala 1.1.1 includes new features for security and stability.
|
||
</p>
|
||
|
||
<p>
|
||
New user-visible features include:
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
Additional security feature: auditing. New startup options for <cmdname>impalad</cmdname> let you capture
|
||
information about Impala queries that succeed or are blocked due to insufficient privileges. For details,
|
||
see <xref href="impala_security.xml#security"/>.
|
||
</li>
|
||
|
||
<li>
|
||
Parquet data files generated by Impala 1.1.1 are now compatible with the Parquet support in Hive. See
|
||
<xref href="impala_incompatible_changes.xml#incompatible_changes"/> for the procedure to update older
|
||
Impala-created Parquet files to be compatible with the Hive Parquet support.
|
||
</li>
|
||
|
||
<li>
|
||
Additional improvements to stability and resource utilization for Impala queries.
|
||
</li>
|
||
|
||
<li>
|
||
Additional enhancements for compatibility with existing file formats.
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept id="new_features_11">
|
||
|
||
<title>New Features in Impala 1.1</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
Impala 1.1 includes new features for security, performance, and usability.
|
||
</p>
|
||
|
||
<p>
|
||
New user-visible features include:
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
Extensive new security features, built on top of the Sentry open source project. Impala now supports
|
||
fine-grained authorization based on roles. A policy file determines which privileges on which schema
|
||
objects (servers, databases, tables, and HDFS paths) are available to users based on their membership in
|
||
groups. By assigning privileges for views, you can control access to table data at the column level. For
|
||
details, see <xref href="impala_security.xml#security"/>.
|
||
</li>
|
||
|
||
<li>
|
||
Impala can now create, alter, drop, and query views. Views provide a flexible way to set up simple
|
||
aliases for complex queries; hide query details from applications and users; and simplify maintenance as
|
||
you rename or reorganize databases, tables, and columns. See the overview section
|
||
<xref href="impala_views.xml#views"/> and the statements
|
||
<xref href="impala_create_view.xml#create_view"/>, <xref href="impala_alter_view.xml#alter_view"/>, and
|
||
<xref href="impala_drop_view.xml#drop_view"/>.
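<p>
As an illustrative sketch (the view, table, and column names are hypothetical), the life cycle of a
view might look like this:
</p>
<codeblock>-- Give a short alias to a longer query.
CREATE VIEW v1 AS SELECT id, s FROM t1 WHERE id &gt; 100;

-- Query the view like a table.
SELECT COUNT(*) FROM v1;

-- Change the underlying query without changing the name that applications use.
ALTER VIEW v1 AS SELECT id, s FROM t1 WHERE id &gt; 200;

DROP VIEW v1;
</codeblock>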
|
||
</li>
|
||
|
||
<li>
|
||
Performance is improved through a number of automatic optimizations. Resource consumption is also reduced
|
||
for Impala queries. These improvements apply broadly across all kinds of workloads and file formats. The
|
||
major areas of performance enhancement include:
|
||
<ul>
|
||
<li>
|
||
Improved disk and thread scheduling, which applies to all queries.
|
||
</li>
|
||
|
||
<li>
|
||
Improved hash join and aggregation performance, which applies to queries with large build tables or a
|
||
large number of groups.
|
||
</li>
|
||
|
||
<li>
|
||
Dictionary encoding with Parquet, which applies to Parquet tables with short string columns.
|
||
</li>
|
||
|
||
<li>
|
||
Improved performance on systems with SSDs, which applies to all queries and file formats.
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
|
||
<li>
|
||
Some new built-in functions are implemented:
|
||
<xref href="impala_string_functions.xml#string_functions/translate">translate()</xref> to substitute
|
||
characters within strings, and
|
||
<!-- IMPALA-418 -->
|
||
<xref href="impala_misc_functions.xml#misc_functions/user">user()</xref> to check the login ID of the
|
||
connected user.
|
||
<!-- IMPALA-??? -->
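<p>
For example, both functions can be called from a query with no <codeph>FROM</codeph> clause. The
argument values here are arbitrary:
</p>
<codeblock>-- Replace each 'l' with 'x' and each 'o' with 'y' in the first argument.
SELECT translate('hello world', 'lo', 'xy');

-- Show the login ID of the user connected to this session.
SELECT user();
</codeblock>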
|
||
</li>
|
||
|
||
<li>
|
||
The new <codeph>WITH</codeph> clause for <codeph>SELECT</codeph> statements lets you simplify complicated
|
||
queries in a way similar to creating a view. The effects of the <codeph>WITH</codeph> clause only last
|
||
for the duration of one query, unlike views, which are persistent schema objects that can be used by
|
||
multiple sessions or applications. See <xref href="impala_with.xml#with"/>.
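<p>
A minimal sketch, using hypothetical table and column names:
</p>
<codeblock>-- The name "big" exists only while this one statement runs.
WITH big AS (SELECT id, s FROM t1 WHERE id &gt; 100)
SELECT s, COUNT(*) FROM big GROUP BY s;
</codeblock>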
|
||
</li>
|
||
|
||
<li>
|
||
An enhancement to the <codeph>DESCRIBE</codeph> statement, <codeph>DESCRIBE FORMATTED
|
||
<varname>table_name</varname></codeph>, displays more detailed information about the table. This
|
||
information includes the file format, location, delimiter, ownership, whether the table is external or internal, creation and
|
||
access times, and partitions. The information is returned as a result set that can be interpreted and
|
||
used by a management or monitoring application. See <xref href="impala_describe.xml#describe"/>.
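<p>
For comparison, using a hypothetical table named <codeph>t1</codeph>:
</p>
<codeblock>-- Basic form: column names and types only.
DESCRIBE t1;

-- Extended form: also reports details such as location, file format, and table type.
DESCRIBE FORMATTED t1;
</codeblock>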
|
||
</li>
|
||
|
||
<li>
|
||
You can now insert values into a subset of the columns of a table, with the other columns being left as all
|
||
<codeph>NULL</codeph> values. Or you can specify the columns in any order in the destination table,
|
||
rather than having to match the order of the corresponding columns in the source query or <codeph>VALUES</codeph>
|
||
clause. This feature is known as <q>column permutation</q>. See <xref href="impala_insert.xml#insert"/>.
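<p>
The following sketch assumes a hypothetical table <codeph>t1</codeph> with columns
<codeph>c1</codeph>, <codeph>c2</codeph>, and <codeph>c3</codeph>, and a source table
<codeph>t2</codeph> with columns <codeph>x</codeph>, <codeph>y</codeph>, and <codeph>z</codeph>:
</p>
<codeblock>-- Insert only some columns; the unlisted column c3 is set to NULL.
INSERT INTO t1 (c1, c2) SELECT x, y FROM t2;

-- List the destination columns in a different order than the table definition.
-- Here z goes into c3, x into c1, and y into c2.
INSERT INTO t1 (c3, c1, c2) SELECT z, x, y FROM t2;
</codeblock>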
|
||
</li>
|
||
|
||
<li>
|
||
The new <codeph>LOAD DATA</codeph> statement lets you load data into a table directly from an HDFS data
|
||
file. This technique lets you minimize the number of steps in your ETL process, and provides more
|
||
flexibility. For example, you can bring data into an Impala table in one step. Formerly, you might have
|
||
created an external table where the data files are not entirely under your control, or copied the data
|
||
files to Impala data directories manually, or loaded the original data into one table and then used the
|
||
<codeph>INSERT</codeph> statement to copy it to a new table with a different file format, partitioning
|
||
scheme, and so on. See <xref href="impala_load_data.xml#load_data"/>.
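<p>
A minimal sketch of the one-step approach. The HDFS path and table name are hypothetical:
</p>
<codeblock>-- Moves the file from its current HDFS location into the data directory of t1.
LOAD DATA INPATH '/user/etl/staging/sales.csv' INTO TABLE t1;
</codeblock>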
|
||
</li>
|
||
|
||
<li>
|
||
Improvements to Impala-HBase integration:
|
||
<ul>
|
||
<li>
|
||
New query options for HBase performance:
|
||
<codeph><xref href="impala_hbase_cache_blocks.xml#hbase_cache_blocks">HBASE_CACHE_BLOCKS</xref></codeph>
|
||
and <codeph><xref href="impala_hbase_caching.xml#hbase_caching">HBASE_CACHING</xref></codeph>.
|
||
</li>
|
||
|
||
<li>
|
||
Support for binary data types in HBase tables. See <xref href="impala_hbase.xml#hbase_types"/> for
|
||
details.
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
|
||
<li>
|
||
You can issue <codeph>REFRESH</codeph> as a SQL statement through any of the programming interfaces that
|
||
Impala supports. <codeph>REFRESH</codeph> formerly had to be issued as a command through the
|
||
<cmdname>impala-shell</cmdname> interpreter, and was not available through a JDBC or ODBC API call. As
|
||
part of this change, the functionality of the <codeph>REFRESH</codeph> statement is divided between two
|
||
statements. In Impala 1.1, <codeph>REFRESH</codeph> requires a table name argument and immediately
|
||
reloads the metadata; the new <codeph>INVALIDATE METADATA</codeph> statement works the same as the Impala
|
||
1.0 <codeph>REFRESH</codeph> did: the table name argument is optional, and the metadata for one or all
|
||
tables is marked as stale, but not actually reloaded until the table is queried. When you create a new
|
||
table in the Hive shell or through a different Impala node, you must enter <codeph>INVALIDATE
|
||
METADATA</codeph> with no table parameter before you can see the new table in
|
||
<cmdname>impala-shell</cmdname>. See <xref href="impala_refresh.xml#refresh"/> and
|
||
<xref href="impala_invalidate_metadata.xml#invalidate_metadata"/>.
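<p>
The division of labor between the two statements can be sketched as follows, using a hypothetical
table named <codeph>t1</codeph>:
</p>
<codeblock>-- Data files were added to an existing table outside of Impala:
-- reload the metadata for that one table immediately.
REFRESH t1;

-- A table was created in the Hive shell or through a different Impala node:
-- mark all metadata as stale so that the new table becomes visible.
INVALIDATE METADATA;
SHOW TABLES;
</codeblock>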
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept id="new_features_101">
|
||
|
||
<title>New Features in Impala 1.0.1</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
New user-visible features include:
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
The <codeph>VALUES</codeph> clause lets you <codeph>INSERT</codeph> one or more rows using literals,
|
||
function return values, or other expressions. For performance and scalability, you should still use
|
||
<codeph>INSERT ... SELECT</codeph> for bringing large quantities of data into an Impala table. The
|
||
<codeph>VALUES</codeph> clause is a convenient way to set up small tables, particularly for initial
|
||
testing of SQL features that do not require large amounts of data. See
|
||
<xref href="impala_insert.xml#values"/> for details.
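<p>
For example, a small test table might be set up as follows. The table name, column names, and values
are hypothetical:
</p>
<codeblock>CREATE TABLE small_test (id INT, s STRING);

-- Each parenthesized group is one row; values can be literals,
-- function return values, or other expressions.
INSERT INTO small_test VALUES (1, 'one'), (2, concat('t', 'wo')), (1 + 2, 'three');
</codeblock>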
|
||
</li>
|
||
|
||
<li>
|
||
The <codeph>-B</codeph> and <codeph>-o</codeph> options of the <codeph>impala-shell</codeph> command can
|
||
turn query results into delimited text files and store them in an output file. The plain text results are
|
||
useful as input to other Hadoop components or Unix tools. In benchmark tests, it is also faster to
|
||
produce plain rather than pretty-printed results, and write to a file rather than to the screen, giving a
|
||
more accurate picture of the actual query time.
|
||
</li>
|
||
|
||
<li>
|
||
Several bug fixes. See <xref href="impala_fixed_issues.xml#fixed_issues_101"/> for details.
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept id="new_features_10">
|
||
|
||
<title>New Features in Impala 1.0</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
This version has multiple performance improvements and adds the following functionality:
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
Several bug fixes. See <xref href="impala_fixed_issues.xml#fixed_issues_10"/>.
|
||
</li>
|
||
|
||
<li>
|
||
<codeph><xref href="impala_alter_table.xml#alter_table">ALTER TABLE</xref></codeph> statement.
|
||
</li>
|
||
|
||
<li>
|
||
<xref href="impala_hints.xml#hints">Hints</xref> to allow specifying a particular join strategy.
|
||
</li>
|
||
|
||
<li>
|
||
<codeph><xref href="impala_refresh.xml#refresh">REFRESH</xref></codeph> for a single table.
|
||
</li>
|
||
|
||
<li>
|
||
Dynamic resource management, allowing high concurrency for Impala queries.
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept id="new_features_07">
|
||
|
||
<title>New Features in Version 0.7 of the Impala Beta Release</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
This version has multiple performance improvements and adds the following functionality:
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
Several bug fixes. See <xref href="impala_fixed_issues.xml#fixed_issues_07"/>.
|
||
</li>
|
||
|
||
<li>
|
||
Support for the Parquet file format. For more information on file formats, see
|
||
<xref href="impala_file_formats.xml#file_formats"/>.
|
||
</li>
|
||
|
||
<li>
|
||
Added support for Avro.
|
||
</li>
|
||
|
||
<li>
|
||
Support for memory limits. For more information, see the example on modifying memory limits in
|
||
<xref href="impala_config_options.xml#config_options"/>.
|
||
</li>
|
||
|
||
<li>
|
||
Bigger and faster joins through the addition of partitioned joins to the already supported broadcast
|
||
joins.
|
||
</li>
|
||
|
||
<li>
|
||
Fully distributed aggregations.
|
||
</li>
|
||
|
||
<li>
|
||
Fully distributed top-n computation.
|
||
</li>
|
||
|
||
<li>
|
||
Support for creating and altering tables.
|
||
</li>
|
||
|
||
<li>
|
||
Support for GROUP BY with floats and doubles.
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept id="new_features_06">
|
||
|
||
<title>New Features in Version 0.6 of the Impala Beta Release</title>
|
||
|
||
<conbody>
|
||
|
||
<ul>
|
||
<li>
|
||
Several bug fixes. See <xref href="impala_fixed_issues.xml#fixed_issues_06"/>.
|
||
</li>
|
||
|
||
<li>
|
||
Added support for Impala on SUSE and Debian/Ubuntu. Impala is now supported on:
|
||
<ul>
|
||
<li>
|
||
RHEL 5.7/6.2 and CentOS 5.7/6.2
|
||
</li>
|
||
|
||
<li>
|
||
SUSE 11 with Service Pack 1 or higher
|
||
</li>
|
||
|
||
<li>
|
||
Ubuntu 10.04/12.04 and Debian 6.03
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
|
||
<li>
|
||
Support for the RCFile file format. For more information on file formats, see
|
||
<xref href="impala_file_formats.xml#file_formats">Understanding File Formats</xref>.
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept id="new_features_05">
|
||
|
||
<title>New Features in Version 0.5 of the Impala Beta Release</title>
|
||
|
||
<conbody>
|
||
|
||
<ul>
|
||
<li>
|
||
Several bug fixes. See <xref href="impala_fixed_issues.xml#fixed_issues_05"/>.
|
||
</li>
|
||
|
||
<li>
|
||
Added support for a JDBC driver that allows you to access Impala from a Java client. To use this feature,
|
||
follow the instructions in <xref href="impala_jdbc.xml#impala_jdbc"/> to install the JDBC
|
||
driver JARs on the client machine and modify the <codeph>CLASSPATH</codeph> on the client to include the
|
||
JARs.
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept id="new_features_04">

<title>New Features in Version 0.4 of the Impala Beta Release</title>

<conbody>

<ul>
<li>
Several bug fixes. See <xref href="impala_fixed_issues.xml#fixed_issues_04"/>.
</li>

<li>
Added support for Impala on RHEL 5.7/CentOS 5.7. Impala is now supported on RHEL 5.7/6.2 and CentOS 5.7/6.2.
</li>

<li>
The Impala debug webserver can now serve static files from
<codeph>${IMPALA_HOME}/www</codeph>. This can be disabled by setting
<codeph>--enable_webserver_doc_root=false</codeph> on the command line. As a result, Impala now uses the
Twitter Bootstrap library to style its debug webpages, and the <codeph>/queries</codeph> page now tracks
the last 25 queries run by each Impala daemon.
</li>

<li>
Additional metrics are available on the Impala debug webpages.
</li>
</ul>

</conbody>

</concept>
|
||
|
||
<concept id="new_features_03">

<title>New Features in Version 0.3 of the Impala Beta Release</title>

<conbody>

<ul>
<li>
Several bug fixes. See <xref href="impala_fixed_issues.xml#fixed_issues_03"/>.
</li>

<li>
The <codeph>state-store-service</codeph> binary has been renamed <codeph>statestored</codeph>.
</li>

<li>
The location of the Impala configuration files has changed from the <codeph>/usr/lib/impala/conf</codeph>
directory to the <codeph>/etc/impala/conf</codeph> directory.
</li>
</ul>

</conbody>

</concept>
|
||
|
||
<concept id="new_features_02">

<title>New Features in Version 0.2 of the Impala Beta Release</title>

<conbody>

<ul>
<li>
Several bug fixes. See <xref href="impala_fixed_issues.xml#fixed_issues_02"/>.
</li>

<li>
<b>Added Default Query Options.</b> Default query options override all default QueryOption values when
starting <codeph>impalad</codeph>. The format is:
<codeblock>-default_query_options='key=value;key=value'</codeblock>
</li>
</ul>

</conbody>

</concept>
|
||
|
||
</concept>
|