IMPALA-9029: [DOCS] Impala 3.4 Release Notes

-Added broadcast_bytes_limit query option

Change-Id: I4385749de35f8379ecf6566fe515ed500b42d6cc
Reviewed-on: http://gerrit.cloudera.org:8080/14863
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
(cherry picked from commit 955868f88a)
This commit is contained in:
Alex Rodoni
2019-12-06 11:51:17 -08:00
committed by Joe McDonnell
parent 3a0515a1a6
commit f0e09a55ab
5 changed files with 223 additions and 224 deletions


@@ -1519,8 +1519,7 @@ alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENE
or <codeph>-f</codeph> options are used.
</p>
<p id="live_progress_live_summary_asciinema">To see how the <codeph>LIVE_PROGRESS</codeph> and <codeph>LIVE_SUMMARY</codeph> query
options work in real time, see
<xref href="https://asciinema.org/a/1rv7qippo0fe7h5k1b6k4nexk" scope="external" format="html">this
animated demo</xref>.


@@ -50,6 +50,26 @@ under the License.
</p>
<p outputclass="toc inpage"/>
</conbody>
<concept id="incompatible_changes_340x">
<title>Incompatible Changes Introduced in Impala 3.4.x</title>
<conbody>
<p> For the full list of issues closed in this release, including any that
introduce behavior changes or incompatibilities, see the <xref
keyref="changelog_34">changelog for <keyword keyref="impala34"
/></xref>. <ul>
<li>Previously, to optimize query performance, the Impala planner
            used the value of the <codeph>fs.s3a.block.size</codeph> startup
            flag when calculating the split size on non-block-based stores
            such as S3 and ADLS. Starting in this release, the Impala planner
            uses the <codeph>PARQUET_OBJECT_STORE_SPLIT_SIZE</codeph> query
            option to get the split size specific to the Parquet file
            format.<p>For Parquet files, the
            <codeph>fs.s3a.block.size</codeph> startup flag is no
            longer used.</p><p>The default value of the
            <codeph>PARQUET_OBJECT_STORE_SPLIT_SIZE</codeph> query option is
            256 MB.</p></li>
</ul></p>
</conbody>
</concept>
<concept id="incompatible_changes_330x">
<title>Incompatible Changes Introduced in Impala 3.3.x</title>
<conbody>


@@ -259,162 +259,6 @@ under the License.
</concept>
<!--IMPALA-7585 and IMPALA-7298 are fixed. Should be removed from here?-->
<concept id="IMPALA-7585" audience="hidden">
<title>Impala user not added to /etc/passwd when LDAP is enabled</title>
<conbody>
<p>
When using Impala with LDAP enabled, a user may hit the following:
</p>
<pre>Not authorized: Client connection negotiation failed: client connection to 127.0.0.1:27000: SASL(-1): generic failure: All-whitespace username.</pre>
<p>
The following sequence can lead to the <codeph>impala</codeph> user not being created
in <codeph>/etc/passwd</codeph> on some machines on the cluster.
<ul>
<li>
Time 1: The <codeph>impala</codeph> user is not in LDAP. Impala was installed on
machine 1, and the user <codeph>impala</codeph> is created in
<codeph>/etc/passwd</codeph>.
</li>
<li>
Time 2: The <codeph>impala</codeph> user is added to LDAP.
</li>
<li>
Time 3: A new machine is added to the cluster. When adding Impala service to this
new machine, adding the <codeph>impala</codeph> user will fail as it already
exists in LDAP.
</li>
</ul>
</p>
<p>
The consequence is that the <codeph>impala</codeph> user doesn't exist in
<codeph>/etc/passwd</codeph> on the new machine, leading to the error above.
</p>
<p>
          <b>Workaround</b>: Manually edit <codeph>/etc/passwd</codeph> to add the
          <codeph>impala</codeph> user.
<p>
<b>Apache Issue:</b> <xref keyref="IMPALA-7585">IMPALA-7585</xref>
</p>
<p>
<b>Affected Versions:</b> Impala 2.12, Impala 3.0
</p>
<p>
<b>Fixed Version:</b> Impala 3.1
</p>
</conbody>
</concept>
<concept id="IMPALA-7298" audience="hidden">
<title>Kerberos authentication fails with the reverse DNS lookup disabled</title>
<conbody>
<p>
          Kerberos authentication does not function correctly if <codeph>rdns = false</codeph>
          is configured in <codeph>krb5.conf</codeph>. With <codeph>rdns =
          false</codeph>, principal matching in Impala fails because Kerberos
          receives a SPN (Service Principal Name) containing an IP address, while Impala expects a
          principal containing a FQDN.
</p>
<p>
You may hit the following error:
</p>
<pre>WARNINGS: TransmitData() to X.X.X.X:27000 failed: Not authorized: Client connection negotiation failed: client connection to X.X.X.X:27000: Server impala/X.X.X.X@VPC.CLOUDERA.COM not found in Kerberos database
</pre>
<p>
<b>Apache Issue:</b> <xref keyref="IMPALA-7298">IMPALA-7298</xref>
</p>
<p>
<b>Affected Versions:</b> Impala 2.12.0 and 3.0
</p>
<p>
<b>Workaround:</b> Set the following flags in <codeph>krb5.conf</codeph>:
<ul>
<li>
<codeph>dns_canonicalize_hostname = true</codeph>
</li>
<li>
<codeph>rdns = true</codeph>
</li>
</ul>
</p>
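As a sketch, the workaround settings would sit in the <codeph>[libdefaults]</codeph> stanza of <codeph>krb5.conf</codeph>; the realm below is illustrative, and only the two flags come from the workaround:

```ini
# Illustrative krb5.conf fragment; only the two flags below are the workaround
[libdefaults]
  default_realm = EXAMPLE.COM
  dns_canonicalize_hostname = true
  rdns = true
```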
<p>
<b>Fixed Versions:</b> Impala 3.1
</p>
</conbody>
</concept>
<!--kudu2198 is fixed-->
<concept id="KUDU-2198" audience="hidden">
<title>System-wide auth-to-local mapping not applied correctly to Kudu service account</title>
<conbody>
<p>
          Due to the system-wide <codeph>auth_to_local</codeph> mapping, the principal may be mapped
          to some local name.
</p>
<p>
          When running with Kerberos enabled, you may hit the following error message, where
          <varname>&lt;random-string></varname> is a random string that does not match the
          primary in the Kerberos principal.
</p>
<pre>WARNINGS: TransmitData() to X.X.X.X:27000 failed: Remote error: Not authorized: {username='&lt;random-string>', principal='impala/redacted'} is not allowed to access DataStreamService
</pre>
<p>
<b>Workaround</b>: Start Impala with the
<codeph>--use_system_auth_to_local=false</codeph> flag to ignore the system-wide
<codeph>auth_to_local</codeph> mappings configured in <codeph>/etc/krb5.conf</codeph>.
</p>
<p>
<b>Apache Issue:</b> <xref keyref="IMPALA-8154">IMPALA-8154</xref>
</p>
<p>
<b>Affected Versions:</b> Impala 2.12, Impala 3.0 / Kudu 1.6
</p>
<p>
<b>Fixed Versions:</b> Impala 3.2
</p>
</conbody>
</concept>
</concept>
<concept id="known_issues_resources">
@@ -722,25 +566,6 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</concept>
<concept id="IMP-175" audience="hidden">
        <title>Deviation from Hive behavior: Out of range float/double values are returned as the maximum allowed value of the type (Hive returns NULL)</title>
<conbody>
<p>
          Impala behavior differs from Hive with respect to out-of-range float/double values.
          Out-of-range values are returned as the maximum allowed value of the type (Hive returns NULL).
</p>
<p>
<b>Workaround:</b> None
</p>
</conbody>
</concept>
<concept id="flume_writeformat_text">
<title>Configuration needed for Flume to be compatible with Impala</title>
@@ -837,6 +662,24 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</conbody>
</concept>
<concept id="IMPALA-8953">
<title>Tables and databases sharing same name can cause query
failures</title>
<conbody>
          <p>A table and a database that share the same name can cause a query
            failure if the table is not readable by Impala, for example, if the table
            was created in Hive in the Open CSV Serde format. The following
            exception is returned:</p>
<codeblock>CAUSED BY: TableLoadingException: Unrecognized table type for table</codeblock>
<p>
<b>Apache Issue:</b>
<xref keyref="IMPALA-8953">IMPALA-8953</xref>
</p>
<p>
<b>Workaround:</b> Do not create databases and tables with the same
names.</p>
</conbody>
</concept>
</concept>
@@ -852,22 +695,6 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</p>
</conbody>
<!--IMPALA8376 fixed the issue below.-->
<concept id="IMPALA-8829" audience="hidden">
<title>Unable to Correctly Parse the Terabyte Unit</title>
<conbody>
<p>Impala does not support parsing strings that contain "TB" when used
as a unit for terabytes. The flags related to memory limits may be
affected, such as the flags for scratch space and data cache.</p>
          <p><b>Workaround:</b> Use other supported units, such as GB or MB, to
            specify values.</p>
<p><b>Affected Versions:</b> All versions</p>
<p>
<b>Apache Issue:</b>
<xref keyref="IMPALA-8829">IMPALA-8829</xref>
</p>
</conbody>
</concept>
<concept id="IMPALA-4551">
@@ -989,33 +816,6 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</concept>
<!--Fixed in 3.2-->
<concept id="IMPALA-941" rev="IMPALA-941" audience="hidden">
<title>Impala Parser issue when using fully qualified table names that start with a number</title>
<conbody>
<p>
A fully qualified table name starting with a number could cause a parsing error. In a
name such as <codeph>db.571_market</codeph>, the decimal point followed by digits is
interpreted as a floating-point number.
</p>
<p>
<b>Apache Issue:</b> <xref keyref="IMPALA-941">IMPALA-941</xref>
</p>
<p>
<b>Workaround:</b> Surround each part of the fully qualified name with backticks
(<codeph>``</codeph>).
</p>
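For example, using the qualified name from the description, quoting each part keeps the parser from reading <codeph>.571</codeph> as a number:

```sql
-- Backticks around each part of the qualified name avoid the parsing error
SELECT * FROM `db`.`571_market`;
```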
</conbody>
</concept>
<concept id="IMPALA-532" rev="IMPALA-532">
<title>Impala should tolerate bad locale settings</title>


@@ -45,6 +45,185 @@ under the License.
<p outputclass="toc inpage"/>
</conbody>
<concept rev="3.2.0" id="new_features_34">
<title>New Features in <keyword keyref="impala34"/></title>
<conbody>
<p> The following sections describe the noteworthy improvements made in
<keyword keyref="impala34"/>. </p>
<p> For the full list of issues closed in this release, see the <xref
keyref="changelog_34">changelog for <keyword keyref="impala34"
/></xref>. </p>
<section id="section_cw4_nmw_pjb">
<title>Support for Hive Insert-Only Transactional Tables</title>
          <p>Impala added support for truncating insert-only transactional
            tables. </p>
<p>By default, Impala creates an insert-only transactional table when
you issue the <codeph>CREATE TABLE</codeph> statement.</p>
          <p>Use Hive compaction to compact small files, which improves the
            performance and scalability of metadata in transactional tables.</p>
<p>See <xref href="impala_transactions.xml#transactions"/> for more
information.</p>
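A minimal sketch, assuming a configuration where <codeph>CREATE TABLE</codeph> defaults to an insert-only transactional table as described above; the table name and schema are illustrative:

```sql
-- Creates an insert-only transactional table under the default settings
CREATE TABLE events (id BIGINT, payload STRING);

-- Truncating insert-only transactional tables is now supported
TRUNCATE TABLE events;
```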
</section>
<section id="impala-8656">
<title>Server-side Spooling of Query Results</title>
<p>You can use the <codeph>SPOOL_QUERY_RESULTS</codeph> query option to
control how query results are returned to the client.</p>
          <p>By default, when a client fetches a set of query results, subsequent
            sets of results are fetched in batches until all the result rows are
            produced. If a client issues a query without fetching all the results,
            the query fragments continue to hold on to resources until the
            query is canceled and unregistered, potentially tying up resources and
            causing other queries to wait in admission control.</p>
<p>When the query result spooling feature is enabled, the result sets of
queries are eagerly fetched and buffered until they are read by the
client, and resources are freed up for other queries.</p>
<p>See <xref href="impala_query_results_spooling.xml#data_sink"/> for
the new feature and the query options.</p>
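Enabling the feature for a session takes a single query option; see the linked topic for the related buffer-sizing options:

```sql
-- Buffer query results server-side until the client fetches them
SET SPOOL_QUERY_RESULTS=TRUE;
```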
</section>
<section id="impala-8584">
<title>Cookie-based Authentication</title>
<p>Starting in this version, Impala supports cookies for authentication
when clients connect via HiveServer2 over HTTP. </p>
          <p>You can use the <codeph>--max_cookie_lifetime_s</codeph> startup flag
            to:</p>
<ul>
<li>Disable the use of cookies</li>
            <li>Control how long generated cookies remain valid</li>
</ul>
<p>See <xref href="impala_client.xml#intro_client"/> for more
information.</p>
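An illustrative invocation of the flag; the one-hour value is an assumption, not a documented default:

```shell
# Cookies generated for HTTP clients stay valid for 3600 seconds
impalad --max_cookie_lifetime_s=3600
```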
</section>
<section id="section_hw4_nmw_pjb">
<title>Object Ownership Support</title>
          <p>Object ownership for tables, views, and databases is enabled by
            default in Impala. When you create a database, a table, or a view,
            you, as the owner of that object, implicitly have privileges on the
            object. The privileges that owners have are specified in Ranger on the
            special user, <codeph>{OWNER}</codeph>. </p>
          <p>The <codeph>{OWNER}</codeph> user must be defined in Ranger for the
            object ownership privileges to work in Impala.</p>
<p>See <xref href="impala_authorization.xml#authorization"/> for
details.</p>
</section>
<section id="impala-8752">
<title>New Built-in Functions for Fuzzy Matching of Strings</title>
          <p>Use the new Jaro or Jaro-Winkler functions to perform fuzzy matches
            on relatively short strings, for example, to scrub user inputs of names
            against the records in the database.</p>
<ul>
<li><codeph>JARO_DISTANCE</codeph>, <codeph>JARO_DST</codeph></li>
<li><codeph>JARO_SIMILARITY</codeph>, <codeph>JARO_SIM</codeph></li>
<li><codeph>JARO_WINKLER_DISTANCE</codeph>,
<codeph>JW_DST</codeph></li>
<li><codeph>JARO_WINKLER_SIMILARITY</codeph>,
<codeph>JW_SIM</codeph></li>
</ul>
<p>See <xref href="impala_string_functions.xml#string_functions"/> for
details.</p>
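A hedged example of fuzzy matching with one of the new functions; the table, column, and 0.9 threshold are illustrative:

```sql
-- Find stored names closely matching a misspelled user input
SELECT name
FROM customers
WHERE JARO_WINKLER_SIMILARITY(name, 'Jon Smiht') > 0.9;
```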
</section>
<section id="impala-8376">
<title>Capacity Quota for Scratch Disks</title>
<p>When configuring scratch space for intermediate files used in large
sorts, joins, aggregations, or analytic function operations, use the
<codeph>scratch_dirs</codeph> startup flag to optionally specify a
capacity quota per scratch directory, e.g.,
<codeph>scratch_dirs=/dir1:5MB,/dir2</codeph>.</p>
<p>See <xref href="impala_file_formats.xml#file_formats"/> for
details.</p>
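An illustrative invocation using the quota syntax from this section; the directory paths and sizes are assumptions:

```shell
# Cap scratch usage at 25GB in the first directory; the second is unlimited
impalad --scratch_dirs=/data1/impala-scratch:25GB,/data2/impala-scratch
```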
</section>
<section id="impala-8913">
<title>Query Option for Disabling HBase Row Estimation</title>
<p>During query plan generation, Impala samples underlying HBase tables
to estimate row count and row size, but the sampling process can
negatively impact the planning time. To alleviate the issue, when the
HBase table stats do not change much in a short time, disable the
sampling with the <codeph>DISABLE_HBASE_NUM_ROWS_ESTIMATE</codeph>
query option so that the Impala planner falls back to using Hive
Metastore (HMS) table stats instead. </p>
<p>See <xref
href="impala_disable_hbase_num_rows_estimate.xml#disable_hbase_num_rows_estimate"
/>.</p>
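Disabling the sampling for a session is a one-line query option:

```sql
-- Planner falls back to HMS table stats instead of sampling the HBase table
SET DISABLE_HBASE_NUM_ROWS_ESTIMATE=TRUE;
```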
</section>
<section id="impala-8942">
<title>Query Option for Controlling Size of Parquet Splits on Non-block
Stores</title>
          <p>Previously, to optimize query performance, the Impala planner used
            the value of the <codeph>fs.s3a.block.size</codeph> startup flag when
            calculating the split size on non-block-based stores such as S3 and
            ADLS. Starting in this release, the Impala planner uses the
            <codeph>PARQUET_OBJECT_STORE_SPLIT_SIZE</codeph> query option to get
            the split size specific to the Parquet file format. </p>
          <p>For Parquet files, the <codeph>fs.s3a.block.size</codeph> startup
            flag is no longer used.</p>
          <p>The default value of the
            <codeph>PARQUET_OBJECT_STORE_SPLIT_SIZE</codeph> query option is 256
            MB.</p>
<p>See <xref href="impala_s3.xml#s3"/> for tuning Impala query
performance for S3.</p>
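A sketch of overriding the default, assuming the option takes a value in bytes:

```sql
-- Use 128 MB Parquet splits on object stores for subsequent queries
SET PARQUET_OBJECT_STORE_SPLIT_SIZE=134217728;
```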
</section>
<section id="impala-5149">
<title>Query Profile Exported to JSON</title>
<p>On the Query Details page of Impala Daemon Web UI, you have a new
option, in addition to the existing Thrift and Text formats, to export
the query profile output in the JSON format.</p>
<p>See <xref href="impala_webui.xml#webui"/> for generating JSON query
profile outputs in Web UI.</p>
</section>
<section id="section_rnb_ny4_yjb">
<title>DATE Data Type Supported in Avro Tables</title>
<p>You can now use the <codeph>DATE</codeph> data type to query date
values from Avro tables.</p>
<p>See <xref href="impala_avro.xml#avro"/> for details.</p>
</section>
<section>
<title>Primary Key and Foreign Key Constraints</title>
          <p>This release adds support for primary and foreign key constraints.
            In this release, the constraints are advisory only, intended for use
            in estimating cardinality during query planning in a future release;
            Impala makes no attempt to enforce them. See <xref
            href="impala_create_table.xml"/> for details. </p>
</section>
<section>
<title>Enhanced External Kudu Table</title>
          <p>By default, HMS implicitly translates internal Kudu tables to external
            Kudu tables with the <codeph>external.table.purge</codeph> property set
            to <codeph>true</codeph>. These tables behave similarly to internal
            tables. You can also explicitly create such external Kudu tables. See
            <xref href="impala_create_table.xml"/> for details.</p>
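A hedged sketch of explicitly creating such an external Kudu table; the table name and schema are illustrative:

```sql
CREATE EXTERNAL TABLE kudu_events (
  id BIGINT PRIMARY KEY,
  payload STRING
)
STORED AS KUDU
TBLPROPERTIES ('external.table.purge'='true');
```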
</section>
<section>
<title>Ranger Column Masking</title>
<p>This release supports Ranger column masking, which hides sensitive columnar
data in Impala query output. For example, you can define a policy that reveals
only the first or last four characters of column data. Column masking is enabled
by default. See <xref href="impala_authorization.xml#sec_ranger_col_masking"/>
for details.</p>
</section>
<section>
<title>BROADCAST_BYTES_LIMIT query option</title>
          <p>You can set a default limit on the size of the broadcast input for a
            join. Such a limit can prevent performance problems caused by
            broadcasting an overly large input.</p>
<!--Add link to details after file is published.-->
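A sketch of setting the limit, assuming the option takes a value in bytes:

```sql
-- Fall back to a partitioned join when the broadcast input would exceed ~1 GB
SET BROADCAST_BYTES_LIMIT=1073741824;
```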
</section>
<section>
<title>Experimental Support for Apache Hudi</title>
<p>In this release, you can use Read Optimized Queries on Hudi tables. See
<xref href="impala_hudi.xml"/> for details. </p>
</section>
<section>
<title>ORC Reads Enabled by Default</title>
          <p>The stability and performance of reading ORC files in Impala have been
            improved. Consequently, ORC reads are now enabled in Impala by default. To
            disable them, set <codeph>--enable_orc_scanner</codeph> to
            <codeph>false</codeph> when starting the cluster. See <xref href="impala_orc.xml"/> for
            details.</p>
</section>
<section>
<title>Support for ZSTD and DEFLATE</title>
<p>This release supports ZSTD and DEFLATE compression codecs for text files. See
<xref href="impala_txtfile.xml#gzip"/> for details.</p>
</section>
</conbody>
</concept>
<concept rev="3.2.0" id="new_features_33">
<title>New Features in <keyword keyref="impala33"/></title>
<conbody>
@@ -231,9 +410,9 @@ under the License.
<title>Default File Format Changed to Parquet</title>
<p>When you create a table, the default format for that table data is
now Parquet.</p>
<p>For backward compatibility, you can use the
<codeph>DEFAULT_FILE_FORMAT</codeph> query option to set the default
file format to the previous default, text, or other formats.</p>
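Restoring the previous default for a session is a sketch like:

```sql
-- New tables in this session default to text instead of Parquet
SET DEFAULT_FILE_FORMAT=TEXT;
```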
</section>
<section id="section_m1h_mnf_t3b">
<title>Built-in Function to Process JSON Objects</title>


@@ -120,7 +120,8 @@ under the License.
details.
</p>
<p rev="2.0.0">
You can also use text data compressed in the bzip2, deflate, gzip, Snappy, or
zstd formats. Because these compressed formats are not <q>splittable</q> in the way that LZO
is, there is less opportunity for Impala to parallelize queries on them. Therefore, use
these types of compressed data only for convenience if that is the format in which you