IMPALA-9029: [DOCS] Impala 3.4 Release Notes

-Added broadcast_bytes_limit query option

Change-Id: I4385749de35f8379ecf6566fe515ed500b42d6cc
Reviewed-on: http://gerrit.cloudera.org:8080/14863
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
(cherry picked from commit 955868f88a)
This commit is contained in:
Alex Rodoni
2019-12-06 11:51:17 -08:00
committed by Joe McDonnell
parent 3a0515a1a6
commit f0e09a55ab
5 changed files with 223 additions and 224 deletions


@@ -1519,8 +1519,7 @@ alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENE
or <codeph>-f</codeph> options are used.
</p>
<p id="live_progress_live_summary_asciinema">To see how the <codeph>LIVE_PROGRESS</codeph> and <codeph>LIVE_SUMMARY</codeph> query
options work in real time, see
<xref href="https://asciinema.org/a/1rv7qippo0fe7h5k1b6k4nexk" scope="external" format="html">this
animated demo</xref>.


@@ -50,6 +50,26 @@ under the License.
</p>
<p outputclass="toc inpage"/>
</conbody>
<concept id="incompatible_changes_340x">
<title>Incompatible Changes Introduced in Impala 3.4.x</title>
<conbody>
<p> For the full list of issues closed in this release, including any that
introduce behavior changes or incompatibilities, see the <xref
keyref="changelog_34">changelog for <keyword keyref="impala34"
/></xref>. <ul>
<li>Previously, to optimize query performance, the Impala planner
            used the value of the <codeph>fs.s3a.block.size</codeph> startup
            flag when calculating the split size on non-block-based stores
            such as S3 and ADLS. Starting in this release, the Impala planner
            uses the <codeph>PARQUET_OBJECT_STORE_SPLIT_SIZE</codeph> query
            option to get the split size specific to the Parquet file
            format.<p>For Parquet files, the
            <codeph>fs.s3a.block.size</codeph> startup flag is no
            longer used.</p><p>The default value of the
            <codeph>PARQUET_OBJECT_STORE_SPLIT_SIZE</codeph> query option is
            256 MB.</p></li>
</ul></p>
</conbody>
</concept>
<concept id="incompatible_changes_330x">
<title>Incompatible Changes Introduced in Impala 3.3.x</title>
<conbody>


@@ -259,162 +259,6 @@ under the License.
</concept>
<!--IMPALA-7585 and IMPALA-7298 are fixed. Should be removed from here?-->
<concept id="IMPALA-7585" audience="hidden">
<title>Impala user not added to /etc/passwd when LDAP is enabled</title>
<conbody>
<p>
When using Impala with LDAP enabled, a user may hit the following:
</p>
<pre>Not authorized: Client connection negotiation failed: client connection to 127.0.0.1:27000: SASL(-1): generic failure: All-whitespace username.</pre>
<p>
The following sequence can lead to the <codeph>impala</codeph> user not being created
in <codeph>/etc/passwd</codeph> on some machines on the cluster.
<ul>
<li>
Time 1: The <codeph>impala</codeph> user is not in LDAP. Impala was installed on
machine 1, and the user <codeph>impala</codeph> is created in
<codeph>/etc/passwd</codeph>.
</li>
<li>
Time 2: The <codeph>impala</codeph> user is added to LDAP.
</li>
<li>
Time 3: A new machine is added to the cluster. When adding Impala service to this
new machine, adding the <codeph>impala</codeph> user will fail as it already
exists in LDAP.
</li>
</ul>
</p>
<p>
The consequence is that the <codeph>impala</codeph> user doesn't exist in
<codeph>/etc/passwd</codeph> on the new machine, leading to the error above.
</p>
<p>
          <b>Workaround</b>: Manually edit <codeph>/etc/passwd</codeph> to add the
          <codeph>impala</codeph> user.
<p>
<b>Apache Issue:</b> <xref keyref="IMPALA-7585">IMPALA-7585</xref>
</p>
<p>
<b>Affected Versions:</b> Impala 2.12, Impala 3.0
</p>
<p>
<b>Fixed Version:</b> Impala 3.1
</p>
</conbody>
</concept>
<concept id="IMPALA-7298" audience="hidden">
<title>Kerberos authentication fails with the reverse DNS lookup disabled</title>
<conbody>
<p>
          Kerberos authentication does not function correctly if <codeph>rdns = false</codeph>
          is configured in <codeph>krb5.conf</codeph>. With <codeph>rdns =
          false</codeph>, principal matching in Impala fails because Kerberos
          receives a SPN (Service Principal Name) containing an IP address, while Impala expects a
          principal containing a FQDN.
</p>
<p>
You may hit the following error:
</p>
<pre>WARNINGS: TransmitData() to X.X.X.X:27000 failed: Not authorized: Client connection negotiation failed: client connection to X.X.X.X:27000: Server impala/X.X.X.X@VPC.CLOUDERA.COM not found in Kerberos database
</pre>
<p>
<b>Apache Issue:</b> <xref keyref="IMPALA-7298">IMPALA-7298</xref>
</p>
<p>
<b>Affected Versions:</b> Impala 2.12.0 and 3.0
</p>
<p>
<b>Workaround:</b> Set the following flags in <codeph>krb5.conf</codeph>:
<ul>
<li>
<codeph>dns_canonicalize_hostname = true</codeph>
</li>
<li>
<codeph>rdns = true</codeph>
</li>
</ul>
</p>
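As a sketch, the workaround settings would sit in the <codeph>[libdefaults]</codeph> stanza of <codeph>krb5.conf</codeph>; the realm below is illustrative, and only the two flags come from the workaround:

```ini
# Illustrative krb5.conf fragment; only the two flags below are the workaround
[libdefaults]
  default_realm = EXAMPLE.COM
  dns_canonicalize_hostname = true
  rdns = true
```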
<p>
<b>Fixed Versions:</b> Impala 3.1
</p>
</conbody>
</concept>
<!--kudu2198 is fixed-->
<concept id="KUDU-2198" audience="hidden">
<title>System-wide auth-to-local mapping not applied correctly to Kudu service account</title>
<conbody>
<p>
          Due to the system-wide <codeph>auth_to_local</codeph> mapping, the principal may be mapped
          to some local name.
</p>
<p>
          When running with Kerberos enabled, you may hit the following error message, where
          <varname>&lt;random-string></varname> is a random string that does not match the
          primary in the Kerberos principal.
</p>
<pre>WARNINGS: TransmitData() to X.X.X.X:27000 failed: Remote error: Not authorized: {username='&lt;random-string>', principal='impala/redacted'} is not allowed to access DataStreamService
</pre>
<p>
<b>Workaround</b>: Start Impala with the
<codeph>--use_system_auth_to_local=false</codeph> flag to ignore the system-wide
<codeph>auth_to_local</codeph> mappings configured in <codeph>/etc/krb5.conf</codeph>.
</p>
<p>
<b>Apache Issue:</b> <xref keyref="IMPALA-8154">IMPALA-8154</xref>
</p>
<p>
<b>Affected Versions:</b> Impala 2.12, Impala 3.0 / Kudu 1.6
</p>
<p>
<b>Fixed Versions:</b> Impala 3.2
</p>
</conbody>
</concept>
</concept>
<concept id="known_issues_resources">
@@ -722,25 +566,6 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</concept>
<concept id="IMP-175" audience="hidden">
        <title>Deviation from Hive behavior: Out of range float/double values are returned as the maximum allowed value of the type (Hive returns NULL)</title>
<conbody>
<p>
          Impala behavior differs from Hive with respect to out-of-range float/double values.
          Out-of-range values are returned as the maximum allowed value of the type (Hive returns NULL).
</p>
<p>
<b>Workaround:</b> None
</p>
</conbody>
</concept>
<concept id="flume_writeformat_text">
<title>Configuration needed for Flume to be compatible with Impala</title>
@@ -837,6 +662,24 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</conbody>
</concept>
<concept id="IMPALA-8953">
<title>Tables and databases sharing same name can cause query
failures</title>
<conbody>
          <p>A table and a database that share the same name can cause a query
            failure if the table is not readable by Impala, for example, if the table
            was created in Hive in the Open CSV Serde format. The following
            exception is returned:</p>
<codeblock>CAUSED BY: TableLoadingException: Unrecognized table type for table</codeblock>
<p>
<b>Apache Issue:</b>
<xref keyref="IMPALA-8953">IMPALA-8953</xref>
</p>
<p>
<b>Workaround:</b> Do not create databases and tables with the same
names.</p>
</conbody>
</concept>
</concept>
@@ -852,22 +695,6 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</p>
</conbody>
<!--IMPALA8376 fixed the issue below.-->
<concept id="IMPALA-8829" audience="hidden">
<title>Unable to Correctly Parse the Terabyte Unit</title>
<conbody>
<p>Impala does not support parsing strings that contain "TB" when used
as a unit for terabytes. The flags related to memory limits may be
affected, such as the flags for scratch space and data cache.</p>
          <p><b>Workaround:</b> Use other supported units, such as GB or MB, to
            specify values.</p>
<p><b>Affected Versions:</b> All versions</p>
<p>
<b>Apache Issue:</b>
<xref keyref="IMPALA-8829">IMPALA-8829</xref>
</p>
</conbody>
</concept>
<concept id="IMPALA-4551">
@@ -989,33 +816,6 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</concept>
<!--Fixed in 3.2-->
<concept id="IMPALA-941" rev="IMPALA-941" audience="hidden">
<title>Impala Parser issue when using fully qualified table names that start with a number</title>
<conbody>
<p>
A fully qualified table name starting with a number could cause a parsing error. In a
name such as <codeph>db.571_market</codeph>, the decimal point followed by digits is
interpreted as a floating-point number.
</p>
<p>
<b>Apache Issue:</b> <xref keyref="IMPALA-941">IMPALA-941</xref>
</p>
<p>
<b>Workaround:</b> Surround each part of the fully qualified name with backticks
(<codeph>``</codeph>).
</p>
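For example, using the qualified name from the description, quoting each part keeps the parser from reading <codeph>.571</codeph> as a number:

```sql
-- Backticks around each part of the qualified name avoid the parsing error
SELECT * FROM `db`.`571_market`;
```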
</conbody>
</concept>
<concept id="IMPALA-532" rev="IMPALA-532">
<title>Impala should tolerate bad locale settings</title>


@@ -45,6 +45,185 @@ under the License.
<p outputclass="toc inpage"/>
</conbody>
<concept rev="3.2.0" id="new_features_34">
<title>New Features in <keyword keyref="impala34"/></title>
<conbody>
<p> The following sections describe the noteworthy improvements made in
<keyword keyref="impala34"/>. </p>
<p> For the full list of issues closed in this release, see the <xref
keyref="changelog_34">changelog for <keyword keyref="impala34"
/></xref>. </p>
<section id="section_cw4_nmw_pjb">
<title>Support for Hive Insert-Only Transactional Tables</title>
          <p>Impala added support for truncating insert-only transactional
            tables. </p>
<p>By default, Impala creates an insert-only transactional table when
you issue the <codeph>CREATE TABLE</codeph> statement.</p>
          <p>Use Hive compaction to compact small files, which improves the
            performance and scalability of metadata in transactional tables.</p>
<p>See <xref href="impala_transactions.xml#transactions"/> for more
information.</p>
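A minimal sketch, assuming a configuration where <codeph>CREATE TABLE</codeph> defaults to an insert-only transactional table as described above; the table name and schema are illustrative:

```sql
-- Creates an insert-only transactional table under the default settings
CREATE TABLE events (id BIGINT, payload STRING);

-- Truncating insert-only transactional tables is now supported
TRUNCATE TABLE events;
```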
</section>
<section id="impala-8656">
<title>Server-side Spooling of Query Results</title>
<p>You can use the <codeph>SPOOL_QUERY_RESULTS</codeph> query option to
control how query results are returned to the client.</p>
          <p>By default, when a client fetches a set of query results, subsequent
            sets of results are fetched in batches until all the result rows are
            produced. If a client issues a query without fetching all the results,
            the query fragments continue to hold on to resources until the
            query is canceled and unregistered, potentially tying up resources and
            causing other queries to wait in admission control.</p>
<p>When the query result spooling feature is enabled, the result sets of
queries are eagerly fetched and buffered until they are read by the
client, and resources are freed up for other queries.</p>
<p>See <xref href="impala_query_results_spooling.xml#data_sink"/> for
the new feature and the query options.</p>
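Enabling the feature for a session takes a single query option; see the linked topic for the related buffer-sizing options:

```sql
-- Buffer query results server-side until the client fetches them
SET SPOOL_QUERY_RESULTS=TRUE;
```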
</section>
<section id="impala-8584">
<title>Cookie-based Authentication</title>
<p>Starting in this version, Impala supports cookies for authentication
when clients connect via HiveServer2 over HTTP. </p>
          <p>You can use the <codeph>--max_cookie_lifetime_s</codeph> startup flag
            to:</p>
<ul>
<li>Disable the use of cookies</li>
            <li>Control how long generated cookies remain valid</li>
</ul>
<p>See <xref href="impala_client.xml#intro_client"/> for more
information.</p>
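An illustrative invocation of the flag; the one-hour value is an assumption, not a documented default:

```shell
# Cookies generated for HTTP clients stay valid for 3600 seconds
impalad --max_cookie_lifetime_s=3600
```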
</section>
<section id="section_hw4_nmw_pjb">
<title>Object Ownership Support</title>
          <p>Object ownership for tables, views, and databases is enabled by
            default in Impala. When you create a database, a table, or a view,
            you, as the owner of that object, implicitly have privileges on the
            object. The privileges that owners have are specified in Ranger on the
            special user, <codeph>{OWNER}</codeph>. </p>
          <p>The <codeph>{OWNER}</codeph> user must be defined in Ranger for the
            object ownership privileges to work in Impala.</p>
<p>See <xref href="impala_authorization.xml#authorization"/> for
details.</p>
</section>
<section id="impala-8752">
<title>New Built-in Functions for Fuzzy Matching of Strings</title>
          <p>Use the new Jaro or Jaro-Winkler functions to perform fuzzy matches
            on relatively short strings, for example, to scrub user inputs of names
            against the records in the database.</p>
<ul>
<li><codeph>JARO_DISTANCE</codeph>, <codeph>JARO_DST</codeph></li>
<li><codeph>JARO_SIMILARITY</codeph>, <codeph>JARO_SIM</codeph></li>
<li><codeph>JARO_WINKLER_DISTANCE</codeph>,
<codeph>JW_DST</codeph></li>
<li><codeph>JARO_WINKLER_SIMILARITY</codeph>,
<codeph>JW_SIM</codeph></li>
</ul>
<p>See <xref href="impala_string_functions.xml#string_functions"/> for
details.</p>
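A hedged example of fuzzy matching with one of the new functions; the table, column, and 0.9 threshold are illustrative:

```sql
-- Find stored names closely matching a misspelled user input
SELECT name
FROM customers
WHERE JARO_WINKLER_SIMILARITY(name, 'Jon Smiht') > 0.9;
```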
</section>
<section id="impala-8376">
<title>Capacity Quota for Scratch Disks</title>
<p>When configuring scratch space for intermediate files used in large
sorts, joins, aggregations, or analytic function operations, use the
<codeph>scratch_dirs</codeph> startup flag to optionally specify a
capacity quota per scratch directory, e.g.,
<codeph>scratch_dirs=/dir1:5MB,/dir2</codeph>.</p>
<p>See <xref href="impala_file_formats.xml#file_formats"/> for
details.</p>
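An illustrative invocation using the quota syntax from this section; the directory paths and sizes are assumptions:

```shell
# Cap scratch usage at 25GB in the first directory; the second is unlimited
impalad --scratch_dirs=/data1/impala-scratch:25GB,/data2/impala-scratch
```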
</section>
<section id="impala-8913">
<title>Query Option for Disabling HBase Row Estimation</title>
<p>During query plan generation, Impala samples underlying HBase tables
to estimate row count and row size, but the sampling process can
negatively impact the planning time. To alleviate the issue, when the
HBase table stats do not change much in a short time, disable the
sampling with the <codeph>DISABLE_HBASE_NUM_ROWS_ESTIMATE</codeph>
query option so that the Impala planner falls back to using Hive
Metastore (HMS) table stats instead. </p>
<p>See <xref
href="impala_disable_hbase_num_rows_estimate.xml#disable_hbase_num_rows_estimate"
/>.</p>
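Disabling the sampling for a session is a one-line query option:

```sql
-- Planner falls back to HMS table stats instead of sampling the HBase table
SET DISABLE_HBASE_NUM_ROWS_ESTIMATE=TRUE;
```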
</section>
<section id="impala-8942">
<title>Query Option for Controlling Size of Parquet Splits on Non-block
Stores</title>
          <p>Previously, to optimize query performance, the Impala planner used
            the value of the <codeph>fs.s3a.block.size</codeph> startup flag when
            calculating the split size on non-block-based stores such as S3 and
            ADLS. Starting in this release, the Impala planner uses the
            <codeph>PARQUET_OBJECT_STORE_SPLIT_SIZE</codeph> query option to get
            the split size specific to the Parquet file format. </p>
          <p>For Parquet files, the <codeph>fs.s3a.block.size</codeph> startup
            flag is no longer used.</p>
          <p>The default value of the
            <codeph>PARQUET_OBJECT_STORE_SPLIT_SIZE</codeph> query option is 256
            MB.</p>
<p>See <xref href="impala_s3.xml#s3"/> for tuning Impala query
performance for S3.</p>
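A sketch of overriding the default, assuming the option takes a value in bytes:

```sql
-- Use 128 MB Parquet splits on object stores for subsequent queries
SET PARQUET_OBJECT_STORE_SPLIT_SIZE=134217728;
```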
</section>
<section id="impala-5149">
<title>Query Profile Exported to JSON</title>
<p>On the Query Details page of Impala Daemon Web UI, you have a new
option, in addition to the existing Thrift and Text formats, to export
the query profile output in the JSON format.</p>
<p>See <xref href="impala_webui.xml#webui"/> for generating JSON query
profile outputs in Web UI.</p>
</section>
<section id="section_rnb_ny4_yjb">
<title>DATE Data Type Supported in Avro Tables</title>
<p>You can now use the <codeph>DATE</codeph> data type to query date
values from Avro tables.</p>
<p>See <xref href="impala_avro.xml#avro"/> for details.</p>
</section>
<section>
<title>Primary Key and Foreign Key Constraints</title>
          <p>This release adds support for primary and foreign key constraints.
            In this release, the constraints are advisory only, intended for use
            in estimating cardinality during query planning in a future release;
            Impala makes no attempt to enforce them. See <xref
            href="impala_create_table.xml"/> for details. </p>
</section>
<section>
<title>Enhanced External Kudu Table</title>
          <p>By default, HMS implicitly translates internal Kudu tables to external
            Kudu tables with the <codeph>external.table.purge</codeph> property set
            to <codeph>true</codeph>. These tables behave similarly to internal
            tables. You can also explicitly create such external Kudu tables. See
            <xref href="impala_create_table.xml"/> for details.</p>
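A hedged sketch of explicitly creating such an external Kudu table; the table name and schema are illustrative:

```sql
CREATE EXTERNAL TABLE kudu_events (
  id BIGINT PRIMARY KEY,
  payload STRING
)
STORED AS KUDU
TBLPROPERTIES ('external.table.purge'='true');
```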
</section>
<section>
<title>Ranger Column Masking</title>
<p>This release supports Ranger column masking, which hides sensitive columnar
data in Impala query output. For example, you can define a policy that reveals
only the first or last four characters of column data. Column masking is enabled
by default. See <xref href="impala_authorization.xml#sec_ranger_col_masking"/>
for details.</p>
</section>
<section>
<title>BROADCAST_BYTES_LIMIT query option</title>
          <p>You can set a default limit on the size of the broadcast input for a
            join. Such a limit can prevent performance problems caused by
            broadcasting an overly large input.</p>
<!--Add link to details after file is published.-->
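A sketch of setting the limit, assuming the option takes a value in bytes:

```sql
-- Fall back to a partitioned join when the broadcast input would exceed ~1 GB
SET BROADCAST_BYTES_LIMIT=1073741824;
```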
</section>
<section>
<title>Experimental Support for Apache Hudi</title>
<p>In this release, you can use Read Optimized Queries on Hudi tables. See
<xref href="impala_hudi.xml"/> for details. </p>
</section>
<section>
<title>ORC Reads Enabled by Default</title>
          <p>The stability and performance of reading ORC files in Impala have been
            improved. Consequently, ORC reads are now enabled in Impala by default. To
            disable them, set <codeph>--enable_orc_scanner</codeph> to
            <codeph>false</codeph> when starting the cluster. See <xref href="impala_orc.xml"/> for
            details.</p>
</section>
<section>
<title>Support for ZSTD and DEFLATE</title>
<p>This release supports ZSTD and DEFLATE compression codecs for text files. See
<xref href="impala_txtfile.xml#gzip"/> for details.</p>
</section>
</conbody>
</concept>
<concept rev="3.2.0" id="new_features_33">
<title>New Features in <keyword keyref="impala33"/></title>
<conbody>
@@ -231,9 +410,9 @@ under the License.
<title>Default File Format Changed to Parquet</title>
<p>When you create a table, the default format for that table data is
now Parquet.</p>
<p>For backward compatibility, you can use the
<codeph>DEFAULT_FILE_FORMAT</codeph> query option to set the default
file format to the previous default, text, or other formats.</p>
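Restoring the previous default for a session is a sketch like:

```sql
-- New tables in this session default to text instead of Parquet
SET DEFAULT_FILE_FORMAT=TEXT;
```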
</section>
<section id="section_m1h_mnf_t3b">
<title>Built-in Function to Process JSON Objects</title>


@@ -120,7 +120,8 @@ under the License.
details.
</p>
<p rev="2.0.0">
You can also use text data compressed in the bzip2, deflate, gzip, Snappy, or
zstd formats. Because these compressed formats are not <q>splittable</q> in the way that LZO
is, there is less opportunity for Impala to parallelize queries on them. Therefore, use
these types of compressed data only for convenience if that is the format in which you