<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="impala_kudu" rev="kudu">

<title id="kudu">Using Impala to Query Kudu Tables</title>

<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Kudu"/>
<data name="Category" value="Querying"/>
<data name="Category" value="Data Analysts"/>
<data name="Category" value="Developers"/>
</metadata>
</prolog>

<conbody>

<p>
<indexterm audience="hidden">Kudu</indexterm>
You can use Impala to query tables stored by Apache Kudu. This capability
allows convenient access to a storage system that is tuned for different kinds of
workloads than the default with Impala.
</p>

<p>
By default, Impala tables are stored on HDFS using data files with various file formats.
HDFS files are ideal for bulk loads (append operations) and queries using full-table scans,
but do not support in-place updates or deletes. Kudu is an alternative storage engine used
by Impala which can do both in-place updates (for mixed read/write workloads) and fast scans
(for data-warehouse/analytic operations). Using Kudu tables with Impala can simplify the
ETL pipeline by avoiding extra steps to segregate and reorganize newly arrived data.
</p>

<p>
Certain Impala SQL statements and clauses, such as <codeph>DELETE</codeph>,
<codeph>UPDATE</codeph>, <codeph>UPSERT</codeph>, and <codeph>PRIMARY KEY</codeph>, work
only with Kudu tables. Other statements and clauses, such as <codeph>LOAD DATA</codeph>,
<codeph>TRUNCATE TABLE</codeph>, and <codeph>INSERT OVERWRITE</codeph>, are not applicable
to Kudu tables.
</p>

<p outputclass="toc inpage"/>

</conbody>

<concept id="kudu_benefits">

<title>Benefits of Using Kudu Tables with Impala</title>

<conbody>

<p>
The combination of Kudu and Impala works best for tables where scan performance is
important, but data arrives continuously, in small batches, or needs to be updated
without being completely replaced. HDFS-backed tables can require substantial overhead
to replace or reorganize data files as new data arrives. Impala can perform efficient
lookups and scans within Kudu tables, and Impala can also perform update or
delete operations efficiently. You can also use the Kudu Java, C++, and Python APIs to
do ingestion or transformation operations outside of Impala, and Impala can query the
current data at any time.
</p>

</conbody>

</concept>

<concept id="kudu_config">

<title>Configuring Impala for Use with Kudu</title>

<conbody>

<p>
The <codeph>-kudu_master_hosts</codeph> configuration property must be set correctly
for the <cmdname>impalad</cmdname> daemon so that <codeph>CREATE TABLE ... STORED AS
KUDU</codeph> statements can connect to the appropriate Kudu server. Typically, the
required value for this setting is <codeph><varname>kudu_host</varname>:7051</codeph>.
In a high-availability Kudu deployment, specify the names of multiple Kudu hosts separated by commas.
</p>
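
<p>
For example, the flag might look like the following among the
<cmdname>impalad</cmdname> startup options. (This is a sketch; the host names are
hypothetical placeholders, and the second form shows a high-availability deployment
with three Kudu masters.)
</p>

<codeblock>
-kudu_master_hosts=kudu-master.example.com:7051
-kudu_master_hosts=master1.example.com:7051,master2.example.com:7051,master3.example.com:7051
</codeblock>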

<p>
If the <codeph>-kudu_master_hosts</codeph> configuration property is not set, you can
still associate the appropriate value for each table by specifying a
<codeph>TBLPROPERTIES('kudu.master_addresses')</codeph> clause in the <codeph>CREATE TABLE</codeph> statement or
changing the <codeph>TBLPROPERTIES('kudu.master_addresses')</codeph> value with an <codeph>ALTER TABLE</codeph>
statement.
</p>
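
<p>
For example, the following statements show one way these clauses might be used
(a sketch; the table name and host names are hypothetical placeholders):
</p>

<codeblock>
-- Point the table at a specific Kudu deployment at creation time.
CREATE TABLE clause_demo
(
  id BIGINT PRIMARY KEY,
  s STRING
)
PARTITION BY HASH(id) PARTITIONS 2
STORED AS KUDU
TBLPROPERTIES ('kudu.master_addresses' = 'master1.example.com:7051,master2.example.com:7051');

-- Change the associated Kudu masters later.
ALTER TABLE clause_demo
SET TBLPROPERTIES ('kudu.master_addresses' = 'master3.example.com:7051');
</codeblock>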

</conbody>

<concept id="kudu_topology">

<title>Cluster Topology for Kudu Tables</title>

<conbody>

<p>
With HDFS-backed tables, you are typically concerned with the number of DataNodes in
the cluster, how many and how large HDFS data files are read during a query, and
therefore the amount of work performed by each DataNode and the network communication
to combine intermediate results and produce the final result set.
</p>

<p>
With Kudu tables, the topology considerations are different, because:
</p>

<ul>
<li>
<p>
The underlying storage is managed and organized by Kudu, not represented as HDFS
data files.
</p>
</li>

<li>
<p>
Kudu handles some of the underlying mechanics of partitioning the data. You can specify
the partitioning scheme with combinations of hash and range partitioning, so that you can
decide how much effort to expend to manage the partitions as new data arrives. For example,
you can construct partitions that apply to date ranges rather than a separate partition for each
day or each hour.
</p>
</li>

<li>
<p>
Data is physically divided based on units of storage called <term>tablets</term>.
Tablets are stored by <term>tablet servers</term>. Each tablet server can store
multiple tablets, and each tablet is replicated across multiple tablet servers,
managed automatically by Kudu. Where practical, co-locate the tablet servers on
the same hosts as the Impala daemons, although that is not required.
</p>
</li>
</ul>

</conbody>

</concept>

</concept>

<concept id="kudu_replication_factor">
<title>Kudu Replication Factor</title>
<conbody>
<p>
By default, Kudu tables created through Impala use a tablet
replication factor of 3. To change the replication factor for a Kudu
table, specify the replication factor using <codeph>TBLPROPERTIES
('kudu.num_tablet_replicas' = '<i>n</i>')</codeph> in the <keyword
keyref="create_table"/> statement.
</p>

<p>
The number of replicas for a Kudu table must be odd.
</p>
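
<p>
For example, the following sketch creates a table with 5 tablet replicas instead
of the default 3 (the table definition itself is a hypothetical placeholder):
</p>

<codeblock>
CREATE TABLE replicated_demo
(
  id BIGINT PRIMARY KEY,
  s STRING
)
PARTITION BY HASH(id) PARTITIONS 2
STORED AS KUDU
TBLPROPERTIES ('kudu.num_tablet_replicas' = '5');
</codeblock>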

<p>
Altering the <codeph>kudu.num_tablet_replicas</codeph> property after
table creation currently has no effect.
</p>
</conbody>
</concept>

<concept id="kudu_ddl">

<title>Impala DDL Enhancements for Kudu Tables (CREATE TABLE and ALTER TABLE)</title>

<prolog>
<metadata>
<data name="Category" value="DDL"/>
</metadata>
</prolog>

<conbody>

<p>
You can use the Impala <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph>
statements to create and fine-tune the characteristics of Kudu tables. Because Kudu
tables have features and properties that do not apply to other kinds of Impala tables,
familiarize yourself with Kudu-related concepts and syntax first.
For the general syntax of the <codeph>CREATE TABLE</codeph>
statement for Kudu tables, see <xref keyref="create_table"/>.
</p>

<p outputclass="toc inpage"/>

</conbody>
<concept id="non_unique_primary_key">
<title>Non-unique Primary Keys for Kudu Tables</title>
<conbody>
<p>Kudu allows you to create a table with a non-unique primary key. To guarantee the
uniqueness of the primary key, Kudu appends a system-generated auto-incrementing column
to the non-unique primary key columns. This column is named 'auto_incrementing_id' and has
type bigint; it is generated only by the system and cannot be explicitly created by the
user. The auto_incrementing_id value is unique within each partition/tablet: in every
partition/tablet, the column starts from one and increments monotonically. The value is
assigned to the column automatically during insertion.</p>
</conbody>
</concept>
<concept id="create">
<title>Create a Kudu Table with a non-unique PRIMARY KEY</title>
<conbody>
<p>The following example shows creating a table with a non-unique PRIMARY KEY.</p>
<codeblock>
CREATE TABLE kudu_tbl1
(
  id INT NON UNIQUE PRIMARY KEY,
  name STRING
)
PARTITION BY HASH (id) PARTITIONS 3 STORED AS KUDU;</codeblock>
<p>The effective PRIMARY KEY in the above case will be {id, auto_incrementing_id}.</p>
<note>The "auto_incrementing_id" column cannot be added, removed, or renamed with ALTER TABLE
statements.</note>
</conbody>
</concept>
<concept id="verify">
<title>Verify the PRIMARY KEY is non-unique</title>
<conbody>
<p>You can check that the primary key you created is non-unique by running the following
DESCRIBE statement. The "key_unique" property shows whether the primary key is unique. The
system-generated "auto_incrementing_id" column is shown in the output as part of the
non-unique primary key.</p>
<codeblock>
describe kudu_tbl1;
+----------------------+--------+---------+-------------+------------+----------+---------------+---------------+---------------------+------------+
| name                 | type   | comment | primary_key | key_unique | nullable | default_value | encoding      | compression         | block_size |
+----------------------+--------+---------+-------------+------------+----------+---------------+---------------+---------------------+------------+
| id                   | int    |         | true        | false      | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
| auto_incrementing_id | bigint |         | true        | false      | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
| name                 | string |         | false       |            | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+----------------------+--------+---------+-------------+------------+----------+---------------+---------------+---------------------+------------+
Fetched 3 row(s) in 4.72s
</codeblock>
</conbody>
</concept>
<concept id="auto_incrementing_col">
<title>Query Auto Incrementing Column</title>
<conbody>
<p>When you query a table with a SELECT statement, the system-generated auto-incrementing
column is not displayed unless the column is explicitly specified in the select
list.</p>
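<p>
For example, using the <codeph>kudu_tbl1</codeph> table created earlier, a sketch of
the difference between the two forms (the <codeph>auto_incrementing_id</codeph> values
returned depend on insertion order and tablet assignment):
</p>
<codeblock>
-- The system-generated column is not included in SELECT * output.
SELECT * FROM kudu_tbl1;

-- Naming the column explicitly makes it appear in the result set.
SELECT id, auto_incrementing_id, name FROM kudu_tbl1;
</codeblock>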
</conbody>
</concept>
<concept id="no_primary_key">
<title>Create a Kudu table without a PRIMARY KEY attribute</title>
<conbody>
<p>You can create a Kudu table without specifying a PRIMARY KEY attribute, or without
specifying a partition key, because each is optional on its own; however, you cannot omit
both. If you do not specify the PRIMARY KEY attribute, the partition key columns can be
promoted to a non-unique primary key. This is possible only if those columns are the
beginning columns of the table.</p>
<p>In the following example, 'a' and 'b' are promoted to a non-unique primary key, and the
'auto_incrementing_id' column is added by the Kudu engine. 'a', 'b', and
'auto_incrementing_id' form the effective unique composite primary key.</p>
<example>
<codeblock>
CREATE TABLE auto_table
(
  a BIGINT,
  b STRING
)
PARTITION BY HASH(a, b) PARTITIONS 2 STORED AS KUDU;
</codeblock>
<p>The effective primary key in this case would be {a, b, auto_incrementing_id}.</p>
</example>
</conbody>
</concept>
<concept id="limitations">
<title>Limitations</title>
<conbody>
<ul>
<li>The UPSERT operation is not supported for Kudu tables with a non-unique primary key.
Running an UPSERT statement against a Kudu table with a non-unique primary key fails with
an error.</li>
<li>Because the auto-generated key for each row is assigned only after the row's data is
generated and the row lands in the tablet, you cannot use the auto-incrementing column in
the partition key.</li>
</ul>
</conbody>
</concept>

<concept id="kudu_primary_key">

<title>Primary Key Columns for Kudu Tables</title>

<conbody>

<p>
Kudu tables introduce the notion of primary keys to Impala for the first time. The
primary key is made up of one or more columns, whose values are combined and used as a
lookup key during queries. The primary key can be non-unique; in that case, the
uniqueness of the stored rows is guaranteed by appending a system-generated
auto-incrementing column to the non-unique primary key columns. The tuple represented
by these columns cannot contain any NULL values, and can never be updated once
inserted. For a Kudu table, all the partition key columns must come from the set of
primary key columns.
</p>

<p>
The primary key has both physical and logical aspects:
</p>

<ul>
<li>
<p>
On the physical side, it is used to map the data values to particular tablets for fast retrieval.
Because the tuples formed by the primary key values are unique, the primary key columns are typically
highly selective.
</p>
</li>
<li>
<p>
On the logical side, if the primary key is non-unique, you can insert rows with
duplicate key values using an INSERT statement; the data saved in the Kudu table for
each row is made unique by the system-generated auto-incrementing column, so duplicate
key values do not cause insertion failures. However, if an INSERT operation into a
table with a non-unique primary key fails part way through, all rows except the rows
with write errors are added into the table. The duplicated rows are added with
different values for the auto-incrementing column, as the sketch after this list
illustrates.
</p>
</li>
</ul>
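
<p>
For example, a minimal sketch using the <codeph>kudu_tbl1</codeph> table defined
earlier (the exact <codeph>auto_incrementing_id</codeph> values depend on insertion
order):
</p>

<codeblock>
-- Both rows share the key value 1. Neither insert fails, because each
-- stored row is made unique by the auto_incrementing_id column.
INSERT INTO kudu_tbl1 VALUES (1, 'first'), (1, 'second');

-- Both rows are returned, with distinct auto_incrementing_id values.
SELECT id, auto_incrementing_id, name FROM kudu_tbl1 WHERE id = 1;
</codeblock>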

<note>
<p>
Impala only allows <codeph>PRIMARY KEY</codeph> clauses and <codeph>NOT NULL</codeph>
constraints on columns for Kudu tables. These constraints are enforced on the Kudu side.
</p>
</note>

</conbody>

</concept>

<concept id="kudu_column_attributes" rev="IMPALA-3726">

<title>Kudu-Specific Column Attributes for CREATE TABLE</title>

<conbody>

<p>
For the general syntax of the <codeph>CREATE TABLE</codeph>
statement for Kudu tables, see <xref keyref="create_table"/>.
The following sections provide more detail for some of the
Kudu-specific keywords you can use in column definitions.
</p>

<p>
The column list in a <codeph>CREATE TABLE</codeph> statement can include the following
attributes, which only apply to Kudu tables:
</p>

<codeblock>
[NON UNIQUE] PRIMARY KEY
| [NOT] NULL
| ENCODING <varname>codec</varname>
| COMPRESSION <varname>algorithm</varname>
| DEFAULT <varname>constant_expression</varname>
| BLOCK_SIZE <varname>number</varname>
</codeblock>

<p outputclass="toc inpage">
See the following sections for details about each column attribute.
</p>

</conbody>

<concept id="kudu_primary_key_attribute">

<title>PRIMARY KEY Attribute</title>

<conbody>

<p>
The primary key for a Kudu table is a column, or set of columns, that uniquely
identifies every row. The primary key value is also used as the natural sort order
for the values from the table. The primary key value for each row is based on the
combination of values for the columns.
</p>

<p>Because all of the primary key columns must have non-null values, specifying a column
in the PRIMARY KEY or NON UNIQUE PRIMARY KEY clause implicitly adds the NOT NULL
attribute to that column.</p>

<p>
The primary key columns must be the first ones specified in the <codeph>CREATE
TABLE</codeph> statement. For a single-column primary key, you can include a
<codeph>PRIMARY KEY</codeph> attribute inline with the column definition. For a
multi-column primary key, you include a <codeph>PRIMARY KEY (<varname>c1</varname>,
<varname>c2</varname>, ...)</codeph> clause as a separate entry at the end of the
column list.
</p>

<p>
You can specify the <codeph>PRIMARY KEY</codeph> attribute either inline in a single
column definition, or as a separate clause at the end of the column list. The last
two examples show the same two forms with a non-unique primary key:
</p>

<codeblock>
CREATE TABLE pk_inline
(
  col1 BIGINT PRIMARY KEY,
  col2 STRING,
  col3 BOOLEAN
) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;

CREATE TABLE pk_at_end
(
  col1 BIGINT,
  col2 STRING,
  col3 BOOLEAN,
  PRIMARY KEY (col1)
) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;

CREATE TABLE non_unique_pk_inline
(
  col1 BIGINT NON UNIQUE PRIMARY KEY,
  col2 STRING,
  col3 BOOLEAN
) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;

CREATE TABLE non_unique_pk_at_end
(
  col1 BIGINT,
  col2 STRING,
  col3 BOOLEAN,
  NON UNIQUE PRIMARY KEY (col1)
) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;
</codeblock>

<p>
When the primary key is a single column, these two forms are equivalent. If the
primary key consists of more than one column, you must specify the primary key using
a separate entry in the column list:
</p>

<codeblock>
CREATE TABLE pk_multiple_columns
(
  col1 BIGINT,
  col2 STRING,
  col3 BOOLEAN,
  <b>PRIMARY KEY (col1, col2)</b>
) PARTITION BY HASH(col2) PARTITIONS 2 STORED AS KUDU;
</codeblock>

<p>
The <codeph>SHOW CREATE TABLE</codeph> statement always represents the
<codeph>PRIMARY KEY</codeph> specification as a separate item in the column list:
</p>

<codeblock>
CREATE TABLE inline_pk_rewritten (id BIGINT <b>PRIMARY KEY</b>, s STRING)
  PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;

SHOW CREATE TABLE inline_pk_rewritten;
+------------------------------------------------------------------------------+
| result                                                                       |
+------------------------------------------------------------------------------+
| CREATE TABLE user.inline_pk_rewritten (                                      |
|   id BIGINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
|   s STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,      |
|   <b>PRIMARY KEY (id)</b>                                                           |
| )                                                                            |
| PARTITION BY HASH (id) PARTITIONS 2                                          |
| STORED AS KUDU                                                               |
| TBLPROPERTIES ('kudu.master_addresses'='host.example.com')                   |
+------------------------------------------------------------------------------+
</codeblock>

<p>
The notion of a primary key only applies to Kudu tables. Every Kudu table requires a
primary key, which consists of one or more columns. You must either specify the
primary key columns first in the column list, or specify a partition key consisting
of the beginning columns of the table (in which case those columns are promoted to a
non-unique primary key).
</p>

<p>
The contents of the primary key columns cannot be changed by an
<codeph>UPDATE</codeph> or <codeph>UPSERT</codeph> statement. Including too many
columns in the primary key (more than 5 or 6) can also reduce the performance of
write operations. Therefore, pick the most selective and most frequently
tested non-null columns for the primary key specification.
If a column must always have a value, but that value
might change later, leave it out of the primary key and use a <codeph>NOT
NULL</codeph> clause for that column instead. If an existing row has an
incorrect or outdated key column value, delete the old row and insert an entirely
new row with the correct primary key.
</p>

</conbody>

</concept>

<concept id="kudu_not_null_attribute">

<title>NULL | NOT NULL Attribute</title>

<conbody>

<p>
For Kudu tables, you can specify which columns can contain nulls or not. This
constraint offers an extra level of consistency enforcement for Kudu tables. If an
application requires a field to always be specified, include a <codeph>NOT
NULL</codeph> clause in the corresponding column definition, and Kudu prevents rows
from being inserted with a <codeph>NULL</codeph> in that column.
</p>

<p>
For example, a table containing geographic information might require the latitude
and longitude coordinates to always be specified. Other attributes might be allowed
to be <codeph>NULL</codeph>. For example, a location might not have a designated
place name, its altitude might be unimportant, and its population might be initially
unknown, to be filled in later.
</p>

<p conref="../shared/impala_common.xml#common/pk_implies_not_null"/>

<p>
For non-Kudu tables, Impala allows any column to contain <codeph>NULL</codeph>
values, because it is not practical to enforce a <q>not null</q> constraint on HDFS
data files that could be prepared using external tools and ETL processes.
</p>

<codeblock>
CREATE TABLE required_columns
(
  id BIGINT PRIMARY KEY,
  latitude DOUBLE NOT NULL,
  longitude DOUBLE NOT NULL,
  place_name STRING,
  altitude DOUBLE,
  population BIGINT
) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
</codeblock>

<p>
During performance optimization, Kudu can use the knowledge that nulls are not
allowed to skip certain checks on each input row, speeding up queries and join
operations. Therefore, specify <codeph>NOT NULL</codeph> constraints when
appropriate.
</p>

<p>
The <codeph>NULL</codeph> clause is the default condition for all columns that are not
part of the primary key. You can omit it, or specify it to clarify that you have made a
conscious design decision to allow nulls in a column.
</p>

<p>
Because primary key columns cannot contain any <codeph>NULL</codeph> values, the
<codeph>NOT NULL</codeph> clause is not required for the primary key columns,
but you might still specify it to make your code self-describing.
</p>

</conbody>

</concept>

<concept id="kudu_default_attribute">

<title>DEFAULT Attribute</title>

<conbody>

<p>
You can specify a default value for columns in Kudu tables. The default value can be
any constant expression, for example, a combination of literal values, arithmetic
and string operations. It cannot contain references to columns or non-deterministic
function calls.
</p>

<p>
The following example shows different kinds of expressions for the
<codeph>DEFAULT</codeph> clause. The requirement to use a constant value means that
you can fill in a placeholder value such as <codeph>NULL</codeph>, empty string,
0, -1, <codeph>'N/A'</codeph> and so on, but you cannot reference functions or
column names. Therefore, you cannot use <codeph>DEFAULT</codeph> to do things such as
automatically making an uppercase copy of a string value, storing Boolean values based
on tests of other columns, or adding or subtracting one from another column representing
a sequence number.
</p>

<codeblock>
CREATE TABLE default_vals
(
  id BIGINT PRIMARY KEY,
  name STRING NOT NULL DEFAULT 'unknown',
  address STRING DEFAULT upper('no fixed address'),
  age INT DEFAULT -1,
  earthling BOOLEAN DEFAULT TRUE,
  planet_of_origin STRING DEFAULT 'Earth',
  optional_col STRING DEFAULT NULL
) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
</codeblock>

<note>
<p>
When designing an entirely new schema, prefer to use <codeph>NULL</codeph> as the
placeholder for any unknown or missing values, because that is the universal convention
among database systems. Null values can be stored efficiently, and easily checked with the
<codeph>IS NULL</codeph> or <codeph>IS NOT NULL</codeph> operators. The <codeph>DEFAULT</codeph>
attribute is appropriate when ingesting data that already has an established convention for
representing unknown or missing values, or where the vast majority of rows have some common
non-null value.
</p>
</note>

</conbody>

</concept>

<concept id="kudu_encoding_attribute">

<title>ENCODING Attribute</title>

<conbody>

<p>
Each column in a Kudu table can optionally use an encoding, a low-overhead form of
compression that reduces the size on disk, then requires additional CPU cycles to
reconstruct the original values during queries. Typically, highly compressible data
benefits from the reduced I/O to read the data back from disk.
</p>

<p>
The encoding keywords that Impala recognizes are:

<ul>
<li>
<p>
<codeph>AUTO_ENCODING</codeph>: use the default encoding based
on the column type, which is bitshuffle for numeric type
columns and dictionary for string type columns.
</p>
</li>
<li>
<p>
<codeph>PLAIN_ENCODING</codeph>: leave the value in its original binary format.
</p>
</li>
<!-- GROUP_VARINT is internal use only, not documenting that although it shows up
in parser error messages. -->
<li>
<p>
<codeph>RLE</codeph>: compress repeated values (when sorted in primary key
order) by including a count.
</p>
</li>
<li>
<p>
<codeph>DICT_ENCODING</codeph>: when the number of different string values is
low, replace the original string with a numeric ID.
</p>
</li>
<li>
<p>
<codeph>BIT_SHUFFLE</codeph>: rearrange the bits of the values to efficiently
compress sequences of values that are identical or vary only slightly based
on primary key order. The resulting encoded data is also compressed with LZ4.
</p>
</li>
<li>
<p>
<codeph>PREFIX_ENCODING</codeph>: compress common prefixes in string values; mainly for use internally within Kudu.
</p>
</li>
</ul>
</p>

<!--
UNKNOWN, AUTO_ENCODING, PLAIN_ENCODING, PREFIX_ENCODING, GROUP_VARINT, RLE, DICT_ENCODING, BIT_SHUFFLE

No joy trying keywords UNKNOWN, or GROUP_VARINT with TINYINT and BIGINT.
-->

<p>
The following example shows the Impala keywords representing the encoding types.
(The Impala keywords match the symbolic names used within Kudu.)
For usage guidelines on the different kinds of encoding, see
<xref href="https://kudu.apache.org/docs/schema_design.html" scope="external" format="html">the Kudu documentation</xref>.
The <codeph>DESCRIBE</codeph> output shows how the encoding is reported after
the table is created, and that omitting the encoding (in this case, for the
<codeph>ID</codeph> column) is the same as specifying <codeph>AUTO_ENCODING</codeph>.
</p>

<codeblock>
CREATE TABLE various_encodings
(
  id BIGINT PRIMARY KEY,
  c1 BIGINT ENCODING PLAIN_ENCODING,
  c2 BIGINT ENCODING AUTO_ENCODING,
  c3 TINYINT ENCODING BIT_SHUFFLE,
  c4 DOUBLE ENCODING BIT_SHUFFLE,
  c5 BOOLEAN ENCODING RLE,
  c6 STRING ENCODING DICT_ENCODING,
  c7 STRING ENCODING PREFIX_ENCODING
) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;

-- Some columns are omitted from the output for readability.
describe various_encodings;
+------+---------+-------------+----------+-----------------+
| name | type    | primary_key | nullable | encoding        |
+------+---------+-------------+----------+-----------------+
| id   | bigint  | true        | false    | AUTO_ENCODING   |
| c1   | bigint  | false       | true     | PLAIN_ENCODING  |
| c2   | bigint  | false       | true     | AUTO_ENCODING   |
| c3   | tinyint | false       | true     | BIT_SHUFFLE     |
| c4   | double  | false       | true     | BIT_SHUFFLE     |
| c5   | boolean | false       | true     | RLE             |
| c6   | string  | false       | true     | DICT_ENCODING   |
| c7   | string  | false       | true     | PREFIX_ENCODING |
+------+---------+-------------+----------+-----------------+
</codeblock>

</conbody>

</concept>

<concept id="kudu_compression_attribute">

<title>COMPRESSION Attribute</title>

<conbody>

<p>
You can specify a compression algorithm to use for each column in a Kudu table. This
attribute imposes more CPU overhead when retrieving the values than the
<codeph>ENCODING</codeph> attribute does. Therefore, use it primarily for columns with
long strings that do not benefit much from the less-expensive <codeph>ENCODING</codeph>
attribute.
</p>

<p>
The choices for <codeph>COMPRESSION</codeph> are <codeph>LZ4</codeph>,
<codeph>SNAPPY</codeph>, and <codeph>ZLIB</codeph>.
</p>

<note>
<p>
Columns that use the <codeph>BIT_SHUFFLE</codeph> encoding are already compressed
using <codeph>LZ4</codeph>, and so typically do not need any additional
<codeph>COMPRESSION</codeph> attribute.
</p>
</note>

<p>
The following example shows design considerations for several
<codeph>STRING</codeph> columns with different distribution characteristics, leading
to choices for both the <codeph>ENCODING</codeph> and <codeph>COMPRESSION</codeph>
attributes. The <codeph>user_id</codeph> values come from a specific set of strings,
therefore this column is a good candidate for dictionary encoding. The
<codeph>post_id</codeph> column contains an ascending sequence of integers, where
several leading bits are likely to be all zeroes, therefore this column is a good
candidate for bitshuffle encoding. The <codeph>body</codeph>
column and the corresponding columns for translated versions tend to be long unique
strings that are not practical to use with any of the encoding schemes, therefore
they employ the <codeph>COMPRESSION</codeph> attribute instead. The ideal compression
codec in each case would require some experimentation to determine how much space
savings it provided and how much CPU overhead it added, based on real-world data.
</p>

<codeblock>
CREATE TABLE blog_posts
(
  user_id STRING ENCODING DICT_ENCODING,
  post_id BIGINT ENCODING BIT_SHUFFLE,
  subject STRING ENCODING PLAIN_ENCODING,
  body STRING COMPRESSION LZ4,
  spanish_translation STRING COMPRESSION SNAPPY,
  esperanto_translation STRING COMPRESSION ZLIB,
  PRIMARY KEY (user_id, post_id)
) PARTITION BY HASH(user_id, post_id) PARTITIONS 2 STORED AS KUDU;
</codeblock>

</conbody>

</concept>

<concept id="kudu_block_size_attribute">

<title>BLOCK_SIZE Attribute</title>

<conbody>

<p>
Although Kudu does not use HDFS files internally, and thus is not affected by
the HDFS block size, it does have an underlying unit of I/O called the
<term>block size</term>. The <codeph>BLOCK_SIZE</codeph> attribute lets you set the
block size for any column.
</p>

<p>
The block size attribute is a relatively advanced feature. Refer to
<xref href="https://kudu.apache.org/docs/index.html" scope="external" format="html">the Kudu documentation</xref>
for usage details.
</p>

<!-- Commenting out this example for the time being.
<codeblock>
CREATE TABLE performance_for_benchmark_xyz
(
  id BIGINT PRIMARY KEY,
  col1 BIGINT BLOCK_SIZE 4096,
  col2 STRING BLOCK_SIZE 16384,
  col3 SMALLINT BLOCK_SIZE 2048
) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
</codeblock>
-->

</conbody>

</concept>

<concept id="kudu_partitioning">

<title>Partitioning for Kudu Tables</title>

<conbody>

<p>
Kudu tables use special mechanisms to distribute data among the underlying
tablet servers. Although we refer to such tables as partitioned tables, they are
distinguished from traditional Impala partitioned tables by use of different clauses
on the <codeph>CREATE TABLE</codeph> statement. Kudu tables use
<codeph>PARTITION BY</codeph>, <codeph>HASH</codeph>, <codeph>RANGE</codeph>, and
range specification clauses rather than the <codeph>PARTITIONED BY</codeph> clause
for HDFS-backed tables, which specifies only a column name and creates a new partition for each
different value.
</p>

<p>
For background information and architectural details about the Kudu partitioning
mechanism, see
<xref href="https://kudu.apache.org/kudu.pdf" scope="external" format="html">the Kudu white paper, section 3.2</xref>.
</p>

<!-- Hiding but leaving in place for the moment, in case the white paper discussion isn't enough.
<p>
With Kudu tables, all of the columns involved in these clauses must be primary key
columns. These clauses let you specify different ways to divide the data for each
column, or even for different value ranges within a column. This flexibility lets you
avoid problems with uneven distribution of data, where the partitioning scheme for
HDFS tables might result in some partitions being much larger than others. By setting
up an effective partitioning scheme for a Kudu table, you can ensure that the work for
a query can be parallelized evenly across the hosts in a cluster.
</p>
-->

<note>
<p>
The Impala DDL syntax for Kudu tables is different than in early Kudu versions,
which used an experimental fork of the Impala code. For example, the
<codeph>DISTRIBUTE BY</codeph> clause is now <codeph>PARTITION BY</codeph>, the
<codeph>INTO <varname>n</varname> BUCKETS</codeph> clause is now
<codeph>PARTITIONS <varname>n</varname></codeph>, and the range partitioning syntax
is reworked to replace the <codeph>SPLIT ROWS</codeph> clause with more expressive
syntax involving comparison operators.
</p>
</note>

<p outputclass="toc inpage"/>

</conbody>

<concept id="kudu_hash_partitioning">
<title>Hash Partitioning</title>
<conbody>

<p>
Hash partitioning is the simplest type of partitioning for Kudu tables.
For hash-partitioned Kudu tables, inserted rows are divided up between a fixed number
of <q>buckets</q> by applying a hash function to the values of the columns specified
in the <codeph>HASH</codeph> clause.
Hashing ensures that rows with similar values are evenly distributed, instead of
clumping together all in the same bucket. Spreading new rows across the buckets this
way lets insertion operations work in parallel across multiple tablet servers.
Separating the hashed values can impose additional overhead on queries, where
queries with range-based predicates might have to read multiple tablets to retrieve
all the relevant values.
</p>

<codeblock>
-- 1M rows with 50 hash partitions = approximately 20,000 rows per partition.
-- The values in each partition are not sequential, but rather based on a hash function.
-- Rows 1, 99999, and 123456 might be in the same partition.
create table million_rows (id string primary key, s string)
  partition by hash(id) partitions 50
  stored as kudu;

-- Because the ID values are unique, we expect the rows to be roughly
-- evenly distributed between the buckets in the destination table.
INSERT INTO million_rows SELECT * FROM billion_rows ORDER BY id LIMIT 1e6;
</codeblock>

<note>
<p>
The largest number of buckets that you can create with a <codeph>PARTITIONS</codeph>
clause varies depending on the number of tablet servers in the cluster, while the smallest is 2.
For simplicity, some of the simple <codeph>CREATE TABLE</codeph> statements throughout this section
use <codeph>PARTITIONS 2</codeph> to illustrate the minimum requirements for a Kudu table.
For large tables, prefer to use roughly 10 partitions per server in the cluster.
</p>
</note>

</conbody>
</concept>

<concept id="kudu_range_partitioning">
<title>Range Partitioning</title>
<conbody>

<p>
Range partitioning lets you specify partitioning precisely, based on single values or ranges
of values within one or more columns. You add one or more <codeph>RANGE</codeph> clauses to the
<codeph>CREATE TABLE</codeph> statement, following the <codeph>PARTITION BY</codeph>
clause.
</p>

<p>
Range-partitioned Kudu tables use one or more range clauses, which include a
combination of constant expressions, <codeph>VALUE</codeph> or <codeph>VALUES</codeph>
keywords, and comparison operators. (This syntax replaces the <codeph>SPLIT
ROWS</codeph> clause used with early Kudu versions.)
For the full syntax, see <xref keyref="create_table"/>.
</p>

<codeblock><![CDATA[
-- 50 buckets, all for IDs beginning with a lowercase letter.
-- Having only a single range enforces the allowed range of values
-- but does not add any extra parallelism.
create table million_rows_one_range (id string primary key, s string)
  partition by hash(id) partitions 50,
  range (partition 'a' <= values < '{')
  stored as kudu;

-- 50 buckets for IDs beginning with a lowercase letter
-- plus 50 buckets for IDs beginning with an uppercase letter.
-- Total number of buckets = number in the PARTITIONS clause x number of ranges.
-- We are still enforcing constraints on the primary key values
-- allowed in the table, and the 2 ranges provide better parallelism
-- as rows are inserted or the table is scanned.
create table million_rows_two_ranges (id string primary key, s string)
  partition by hash(id) partitions 50,
  range (partition 'a' <= values < '{', partition 'A' <= values < '[')
  stored as kudu;

-- Same as previous table, with an extra range covering the single key value '00000'.
create table million_rows_three_ranges (id string primary key, s string)
  partition by hash(id) partitions 50,
  range (partition 'a' <= values < '{', partition 'A' <= values < '[', partition value = '00000')
  stored as kudu;

-- The range partitioning can be displayed with a SHOW command in impala-shell.
show range partitions million_rows_three_ranges;
+---------------------+
| RANGE (id)          |
+---------------------+
| VALUE = "00000"     |
| "A" <= VALUES < "[" |
| "a" <= VALUES < "{" |
+---------------------+
]]>
</codeblock>

<note>
<p>
When defining ranges, be careful to avoid <q>fencepost errors</q> where values at the
extreme ends might be included or omitted by accident. For example, in the tables defined
in the preceding code listings, the range <codeph><![CDATA["a" <= VALUES < "{"]]></codeph> ensures that
any values starting with <codeph>z</codeph>, such as <codeph>za</codeph> or <codeph>zzz</codeph>
or <codeph>zzz-ZZZ</codeph>, are all included, by using a less-than operator for the smallest
value after all the values starting with <codeph>z</codeph>.
</p>
</note>

<p>
For range-partitioned Kudu tables, an appropriate range must exist before a data value can be created in the table.
Any <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or <codeph>UPSERT</codeph> statements fail if they try to
create column values that fall outside the specified ranges. The error checking for ranges is performed on the
Kudu side; Impala passes the specified range information to Kudu, and passes back any error or warning if the
ranges are not valid. (A nonsensical range specification causes an error for a DDL statement, but only a warning
for a DML statement.)
</p>

<p>
Ranges can be non-contiguous:
</p>

<codeblock><![CDATA[
partition by range (year) (partition 1885 <= values <= 1889, partition 1893 <= values <= 1897)

partition by range (letter_grade) (partition value = 'A', partition value = 'B',
  partition value = 'C', partition value = 'D', partition value = 'F')
]]>
</codeblock>

<p>
The <codeph>ALTER TABLE</codeph> statement with the <codeph>ADD RANGE PARTITION</codeph> or
<codeph>DROP RANGE PARTITION</codeph> clauses can be used to add or remove ranges from an
existing Kudu table.
</p>

<codeblock><![CDATA[
ALTER TABLE foo ADD RANGE PARTITION 30 <= VALUES < 50;
ALTER TABLE foo DROP RANGE PARTITION 1 <= VALUES < 5;
]]>
</codeblock>

<p>
When a range is added, the new range must not overlap with any of the previous ranges;
that is, it can only fill in gaps between the existing ranges.
</p>

<codeblock><![CDATA[
alter table test_scores add range partition value = 'E';

alter table year_ranges add range partition 1890 <= values < 1893;
]]>
</codeblock>

<p>
When a range is removed, all the associated rows in the table are deleted. (This
is true whether the table is internal or external.)
</p>

<codeblock><![CDATA[
alter table test_scores drop range partition value = 'E';

alter table year_ranges drop range partition 1890 <= values < 1893;
]]>
</codeblock>

<p>
Kudu tables can also use a combination of hash and range partitioning.
</p>

<codeblock><![CDATA[
partition by hash (school) partitions 10,
  range (letter_grade) (partition value = 'A', partition value = 'B',
    partition value = 'C', partition value = 'D', partition value = 'F')
]]>
</codeblock>

</conbody>
</concept>

<concept id="kudu_partitioning_misc">
<title>Working with Partitioning in Kudu Tables</title>
<conbody>

<p>
To see the current partitioning scheme for a Kudu table, you can use the <codeph>SHOW
CREATE TABLE</codeph> statement or the <codeph>SHOW PARTITIONS</codeph> statement. The
<codeph>CREATE TABLE</codeph> syntax displayed by this statement includes all the
hash and range clauses that reflect the original table structure plus any
subsequent <codeph>ALTER TABLE</codeph> statements that changed the table structure.
</p>

<p>
To see the underlying buckets and partitions for a Kudu table, use the
<codeph>SHOW TABLE STATS</codeph> or <codeph>SHOW PARTITIONS</codeph> statement.
</p>
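
<p>
For example, using one of the tables from the preceding section:
</p>

<codeblock>
-- Partitioning clauses as part of the full table definition.
SHOW CREATE TABLE million_rows_three_ranges;

-- Range partitions only.
SHOW RANGE PARTITIONS million_rows_three_ranges;

-- Underlying buckets and partitions.
SHOW TABLE STATS million_rows_three_ranges;
</codeblock>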

</conbody>
</concept>

</concept>

<concept id="kudu_timestamps">

<title>Handling Date, Time, or Timestamp Data with Kudu</title>

<conbody>

<p conref="../shared/impala_common.xml#common/kudu_timestamp_details"/>

<codeblock rev="2.9.0 IMPALA-5137"><![CDATA[-- Make a table representing a date/time value as TIMESTAMP.
-- The strings representing the partition bounds are automatically
-- cast to TIMESTAMP values.
create table native_timestamp(id bigint, when_exactly timestamp, event string, primary key (id, when_exactly))
  partition by hash (id) partitions 20,
  range (when_exactly)
  (
    partition '2015-01-01' <= values < '2016-01-01',
    partition '2016-01-01' <= values < '2017-01-01',
    partition '2017-01-01' <= values < '2018-01-01'
  )
  stored as kudu;

insert into native_timestamp values (12345, now(), 'Working on doc examples');

select * from native_timestamp;
+-------+-------------------------------+-------------------------+
| id    | when_exactly                  | event                   |
+-------+-------------------------------+-------------------------+
| 12345 | 2017-05-31 16:27:42.667542000 | Working on doc examples |
+-------+-------------------------------+-------------------------+
]]>
</codeblock>

<p>
Because Kudu tables have some performance overhead to convert <codeph>TIMESTAMP</codeph>
columns to the Impala 96-bit internal representation, for performance-critical
applications you might store date/time information as the number
of seconds, milliseconds, or microseconds since the Unix epoch date of January 1,
1970. Specify the column as <codeph>BIGINT</codeph> in the Impala <codeph>CREATE
TABLE</codeph> statement, corresponding to an 8-byte integer (an
<codeph>int64</codeph>) in the underlying Kudu table. Then use Impala date/time
conversion functions as necessary to produce a numeric, <codeph>TIMESTAMP</codeph>,
or <codeph>STRING</codeph> value depending on the context.
</p>

<p>
For example, the <codeph>unix_timestamp()</codeph> function returns an integer result
representing the number of seconds past the epoch. The <codeph>now()</codeph> function
produces a <codeph>TIMESTAMP</codeph> representing the current date and time, which can
be passed as an argument to <codeph>unix_timestamp()</codeph>. And string literals
representing dates and date/times can be cast to <codeph>TIMESTAMP</codeph>, and from there
converted to numeric values. The following examples show how you might store a date/time
column as <codeph>BIGINT</codeph> in a Kudu table, but still use string literals and
<codeph>TIMESTAMP</codeph> values for convenience.
</p>

<codeblock><![CDATA[
-- now() returns a TIMESTAMP and shows the format for string literals you can cast to TIMESTAMP.
select now();
+-------------------------------+
| now()                         |
+-------------------------------+
| 2017-01-25 23:50:10.132385000 |
+-------------------------------+

-- unix_timestamp() accepts either a TIMESTAMP or an equivalent string literal.
select unix_timestamp(now());
+------------------+
| unix_timestamp() |
+------------------+
| 1485386670       |
+------------------+

select unix_timestamp('2017-01-01');
+------------------------------+
| unix_timestamp('2017-01-01') |
+------------------------------+
| 1483228800                   |
+------------------------------+

-- Make a table representing a date/time value as BIGINT.
-- Construct 1 range partition and 20 associated hash partitions for each year.
-- Use date/time conversion functions to express the ranges as human-readable dates.
create table time_series(id bigint, when_exactly bigint, event string, primary key (id, when_exactly))
  partition by hash (id) partitions 20,
  range (when_exactly)
  (
    partition unix_timestamp('2015-01-01') <= values < unix_timestamp('2016-01-01'),
    partition unix_timestamp('2016-01-01') <= values < unix_timestamp('2017-01-01'),
    partition unix_timestamp('2017-01-01') <= values < unix_timestamp('2018-01-01')
  )
  stored as kudu;

-- On insert, we can transform a human-readable date/time into a numeric value.
insert into time_series values (12345, unix_timestamp('2017-01-25 23:24:56'), 'Working on doc examples');

-- On retrieval, we can examine the numeric date/time value or turn it back into a string for readability.
select id, when_exactly, from_unixtime(when_exactly) as 'human-readable date/time', event
  from time_series order by when_exactly limit 100;
+-------+--------------+--------------------------+-------------------------+
| id    | when_exactly | human-readable date/time | event                   |
+-------+--------------+--------------------------+-------------------------+
| 12345 | 1485386696   | 2017-01-25 23:24:56      | Working on doc examples |
+-------+--------------+--------------------------+-------------------------+
]]>
</codeblock>

<note>
<p>
If you do high-precision arithmetic involving numeric date/time values, for example
when dividing millisecond values by 1000 or microsecond values by 1 million, always
cast the integer numerator to a <codeph>DECIMAL</codeph> with sufficient precision
and scale to avoid any rounding or loss of precision.
</p>
</note>

<codeblock><![CDATA[
-- 1 million and 1 microseconds = 1.000001 seconds.
select microseconds,
  cast (microseconds as decimal(20,7)) / 1e6 as fractional_seconds
  from table_with_microsecond_column;
+--------------+----------------------+
| microseconds | fractional_seconds   |
+--------------+----------------------+
| 1000001      | 1.000001000000000000 |
+--------------+----------------------+
]]>
</codeblock>

</conbody>

</concept>

<concept id="kudu_metadata">

<title>How Impala Handles Kudu Metadata</title>

<conbody>
<note>This section only applies to Kudu services that are not
integrated with the Hive Metastore (HMS).</note>
<p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
<p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
<p>
Because Kudu manages the metadata for its own tables separately from
the metastore database, there is a table name stored in the metastore
database for Impala to use, and a table name on the Kudu side, and
these names can be modified independently through <codeph>ALTER
TABLE</codeph> statements.
</p>
<p>
To avoid potential name conflicts, the prefix
<codeph>impala::</codeph> and the Impala database name are encoded
into the underlying Kudu table name:
</p>
<codeblock><![CDATA[
create database some_database;
use some_database;

create table table_name_demo (x int primary key, y int)
  partition by hash (x) partitions 2 stored as kudu;

describe formatted table_name_demo;
...
kudu.table_name  | impala::some_database.table_name_demo
]]>
</codeblock>
<p>
See <xref keyref="kudu_tables"/> for examples of how to change the
name of the Impala table in the metastore database, the name of the
underlying Kudu table, or both.
</p>
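<p>
For example, a sketch of the two kinds of renaming, using the table created above
(the new names are hypothetical, and the exact semantics depend on whether the table
is internal or external; see the reference above for details):
</p>
<codeblock>
-- Change the table name that Impala records in the metastore database.
ALTER TABLE table_name_demo RENAME TO table_name_demo_2;

-- Change the kudu.table_name property that identifies the Kudu-side table.
ALTER TABLE table_name_demo_2
SET TBLPROPERTIES ('kudu.table_name' = 'impala::some_database.new_kudu_name');
</codeblock>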
</conbody>

</concept>

</concept>
<concept id="kudu_hms">
<title>Working with Kudu Integrated with Hive Metastore</title>
<conbody>
<p>Starting from Kudu 1.10 and Impala 3.3, Impala supports Kudu services
integrated with the Hive Metastore (HMS). See <xref
href="https://kudu.apache.org/docs/hive_metastore.html#hive_metastore"
format="html" scope="external">the HMS integration
documentation</xref> for more details on Kudu's Hive Metastore
integration.</p>
<p>The following are some of the changes you need to consider when working
with Kudu services integrated with the HMS.<ul>
<li>When Kudu is integrated with the Hive Metastore, Impala must be
configured to use the same HMS as Kudu.</li>
<li>Since there may be no one-to-one mapping between Kudu tables and
external tables, only internal tables are automatically
synchronized.</li>
<li>When you create a table in Kudu, Kudu will create an HMS entry for
that table with the internal table type.</li>
<li>When the Kudu service is integrated with the HMS, internal table
entries will be created automatically in the HMS when tables are
created in Kudu without Impala. To access these tables through
Impala, run the <codeph>INVALIDATE METADATA</codeph> statement so Impala
picks up the latest metadata, as sketched after this list.</li>
</ul></p>
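<p>
For example, after a table named <codeph>logs</codeph> (a hypothetical name) is
created directly in Kudu, a sketch of making it visible to Impala:
</p>
<codeblock>
INVALIDATE METADATA some_database.logs;

-- The table can then be queried through Impala.
SELECT COUNT(*) FROM some_database.logs;
</codeblock>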
</conbody>
</concept>

<concept id="kudu_etl">

<title>Loading Data into Kudu Tables</title>

<conbody>

<p>
Kudu tables are well-suited to use cases where data arrives continuously, in small or
moderate volumes. To bring data into Kudu tables, use the Impala <codeph>INSERT</codeph>
and <codeph>UPSERT</codeph> statements. The <codeph>LOAD DATA</codeph> statement does
not apply to Kudu tables.
</p>

<p>
Because Kudu manages its own storage layer that is optimized for smaller block sizes than
HDFS, and performs its own housekeeping to keep data evenly distributed, it is not
subject to the <q>many small files</q> issue and does not need explicit reorganization
and compaction as the data grows over time. The partitions within a Kudu table can be
specified to cover a variety of possible data distributions, instead of hardcoding a new
partition for each new day, hour, and so on, which can lead to inefficient,
hard-to-scale, and hard-to-manage partition schemes with HDFS tables.
</p>

<p>
Your strategy for performing ETL or bulk updates on Kudu tables should take into account
the limitations on consistency for DML operations.
</p>

<p>
Make <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, and <codeph>UPSERT</codeph>
operations <term>idempotent</term>: that is, able to be applied multiple times and still
produce an identical result.
</p>

<p>
If a bulk operation is in danger of exceeding capacity limits due to timeouts or high
memory usage, split it into a series of smaller operations.
</p>
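
<p>
For example, a sketch of splitting one large <codeph>INSERT ... SELECT</codeph> into
smaller operations by primary key range (the table and column names here are
hypothetical):
</p>

<codeblock>
-- Instead of one statement covering the whole source table...
INSERT INTO kudu_events SELECT * FROM staged_events;

-- ...run a series of smaller batches.
INSERT INTO kudu_events SELECT * FROM staged_events WHERE id BETWEEN 0 AND 999999;
INSERT INTO kudu_events SELECT * FROM staged_events WHERE id BETWEEN 1000000 AND 1999999;
</codeblock>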

<p>
Avoid running concurrent ETL operations where the end results depend on precise
ordering. In particular, do not rely on an <codeph>INSERT ... SELECT</codeph> statement
that selects from the same table into which it is inserting, unless you include extra
conditions in the <codeph>WHERE</codeph> clause to avoid reading the newly inserted rows
within the same statement.
</p>

<p>
Because relationships between tables cannot be enforced by Impala and Kudu, and cannot
be committed or rolled back together, do not expect transactional semantics for
multi-table operations.
</p>

</conbody>

</concept>
|
||
|
||
<concept id="kudu_dml">
|
||
|
||
<title>Impala DML Support for Kudu Tables (INSERT, UPDATE, DELETE, UPSERT)</title>
|
||
|
||
<prolog>
|
||
<metadata>
|
||
<data name="Category" value="DML"/>
|
||
</metadata>
|
||
</prolog>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
Impala supports certain DML statements for Kudu tables only. The <codeph>UPDATE</codeph>
|
||
and <codeph>DELETE</codeph> statements let you modify data within Kudu tables without
|
||
rewriting substantial amounts of table data. The <codeph>UPSERT</codeph> statement acts
|
||
as a combination of <codeph>INSERT</codeph> and <codeph>UPDATE</codeph>, inserting rows
|
||
where the primary key does not already exist, and updating the non-primary key columns
|
||
where the primary key does already exist in the table.
|
||
</p>
|
||
|
||
<p>
|
||
The <codeph>INSERT</codeph> statement for Kudu tables honors the unique and <codeph>NOT
|
||
NULL</codeph> requirements for the primary key columns.
|
||
</p>
|
||
|
||
      <p>
        Because Impala and Kudu do not support multi-statement transactions, the effects of any
        <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or <codeph>DELETE</codeph> statement
        are immediately visible. For example, you cannot do a sequence of
        <codeph>UPDATE</codeph> statements and only make the changes visible after all the
        statements are finished. Also, if a DML statement fails partway through, any rows that
        were already inserted, deleted, or changed remain in the table; there is no rollback
        mechanism to undo the changes. (For <codeph>INSERT</codeph> and <codeph>CREATE TABLE AS
        SELECT</codeph> statements, the multi-row transaction feature described below provides
        statement-level atomicity.)
      </p>
      <p>
        In particular, an <codeph>INSERT ... SELECT</codeph> statement that refers to the table
        being inserted into might insert more rows than expected, because the
        <codeph>SELECT</codeph> part of the statement sees some of the new rows being inserted
        and processes them again.
      </p>
      <note>
        <p>
          The <codeph>LOAD DATA</codeph> statement, which involves manipulation of HDFS data files,
          does not apply to Kudu tables.
        </p>
      </note>
      <p conref="../shared/impala_common.xml#common/kudu_hints"/>

    </conbody>

  </concept>
<concept id="multi_rows_transaction">
|
||
<title>Multi-row Transactions for Kudu Tables</title>
|
||
<conbody>
|
||
<p> When you use Impala to query Kudu tables, you can insert multiple rows into a Kudu table
|
||
in a single transaction. This broader transactional support between Kudu and Impala is
|
||
available to you at a query level and at a session level.</p></conbody>
|
||
</concept>
|
||
<concept id="using_multi_row_transaction">
|
||
<title>Using Multi-row Transaction Capability</title>
|
||
<conbody>
|
||
<p>You can control this multi-row transaction feature by using the following query option. You
|
||
may set this option at per-query or per-session level. When the option is enabled for a
|
||
session, Impala will open one Kudu transaction for each INSERT or CTAS statement.</p>
|
||
<codeblock>set ENABLE_KUDU_TRANSACTION=true</codeblock>
|
||
      <p>
        The following example shows how to insert three rows into a table in a single
        transaction.
      </p>

      <p><b>Example:</b></p>

      <p><ol>
        <li>Create the table <codeph>kudu_test_tbl_1</codeph>.
          <codeblock>create table kudu_test_tbl_1 (a int primary key, b string) partition by hash(a) partitions 8 stored as kudu;</codeblock></li>
        <li>Enable the multi-row transaction feature at the query level.
          <codeblock>set ENABLE_KUDU_TRANSACTION=true;</codeblock></li>
        <li>Insert three rows into the newly created table in a single transaction.
          <codeblock>insert into kudu_test_tbl_1 values (0, 'a'), (1, 'b'), (2, 'c');</codeblock></li>
        <li>Verify the number of rows in the table.
          <codeblock>select count(*) from kudu_test_tbl_1;</codeblock></li>
      </ol></p>
      <p><b>Note:</b></p>

      <p>
        If you insert multiple rows with duplicate primary keys into a table, the transaction
        is aborted. To ignore duplicate-key conflicts during the transaction, start the Impala
        daemons with the flag <codeph>--kudu_ignore_conflicts_in_transaction=true</codeph>.
        This flag is set to false by default. Note that it takes effect only if the flag
        <codeph>--kudu_ignore_conflicts</codeph> is set to true, which is its default.
      </p>
      <p>
        When you enable the <codeph>ENABLE_KUDU_TRANSACTION</codeph> option, each Impala
        statement is executed within a newly opened transaction. If the statement executes
        successfully, the Impala coordinator commits the transaction. If Kudu returns an
        error, Impala aborts the transaction.
      </p>

      <p>This applies to the following statements:</p>

      <p><ul>
        <li>INSERT</li>
        <li>CREATE TABLE AS SELECT</li>
      </ul></p>

    </conbody>

  </concept>
<concept id="advantages">
|
||
<title>Advantages of Using This Capability</title>
|
||
<conbody>
|
||
<p>You can now easily build and manage Kudu applications, especially when Impala is used to
|
||
interact with the data in the Kudu table. With multi-row transaction, you can atomically
|
||
ingest large number of rows into a Kudu table with INSERT-SELECT or CTAS statement.</p></conbody>
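      <p>
        A minimal sketch, assuming a hypothetical source table <codeph>staged_events</codeph>:
        either every selected row is ingested into the new Kudu table, or none are.
      </p>

      <codeblock>set ENABLE_KUDU_TRANSACTION=true;
-- The CTAS runs in a single Kudu transaction.
create table events_by_id primary key (event_id)
  partition by hash (event_id) partitions 8
  stored as kudu
  as select event_id, status from staged_events;</codeblock>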
    </conbody>

  </concept>
<concept id="limitation">
|
||
<title>Limitation</title>
|
||
<conbody>
|
||
<p>INSERT and CTAS statements are supported for Kudu tables in the context of a multi-row
|
||
transaction, but UPDATE/UPSERT/DELETE statements are not supported in multi-row transaction
|
||
as of now.</p></conbody>
|
||
</concept>
|
||
|
||
<concept id="kudu_consistency">
|
||
|
||
<title>Consistency Considerations for Kudu Tables</title>
|
||
|
||
<conbody>
|
||
|
||
<p>Kudu tables have consistency characteristics such as uniqueness, controlled by the primary
|
||
key columns, and non-nullable columns. The emphasis for consistency is on preventing
|
||
duplicate or incomplete data from being stored in a table. </p>
|
||
|
||
      <p>
        Currently, Kudu does not enforce strong consistency for order of operations, or for
        data that is read while a write operation is in progress. If multi-row transactions
        are enabled, the insertion of multiple rows in a single INSERT statement is atomic:
        either all rows are inserted or none are. If multi-row transactions are not enabled,
        changes are applied atomically to each row, not as a single unit to all rows affected
        by a multi-row DML statement.
      </p>
      <p>
        When multi-row transactions are not enabled and some rows are rejected during a DML
        operation because of duplicate primary key values, <codeph>NOT NULL</codeph>
        constraint violations, and so on, the statement succeeds with a warning. Impala still
        inserts, deletes, or updates the other rows that are not affected by the constraint
        violation.
      </p>
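      <p>
        A minimal sketch, assuming the <codeph>kudu_test_tbl_1</codeph> table from the earlier
        example, which already contains a row with key 0: the first tuple is rejected with a
        warning while the second is still inserted, so the statement reports one row affected
        rather than two.
      </p>

      <codeblock>set ENABLE_KUDU_TRANSACTION=false;
insert into kudu_test_tbl_1 values (0, 'duplicate'), (3, 'd');</codeblock>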
      <p>
        Consequently, the number of rows affected by a DML operation on a Kudu table might be
        different than you expect.
      </p>
      <p>
        Because there is no strong consistency guarantee for data that is inserted with
        separate INSERT statements, deleted, or updated across multiple tables simultaneously,
        consider denormalizing the data where practical. That is, if you run separate
        <codeph>INSERT</codeph> statements to insert related rows into two different tables, one
        <codeph>INSERT</codeph> might fail while the other succeeds, leaving the data in an
        inconsistent state. Even if both inserts succeed, a join query might happen during the
        interval between the completion of the first and second statements, and the query would
        encounter incomplete, inconsistent data. Denormalizing the data into a single wide table
        can reduce the possibility of inconsistency due to multi-table operations.
      </p>
      <p>
        Information about the number of rows affected by a DML operation is reported in
        <cmdname>impala-shell</cmdname> output, and in the <codeph>PROFILE</codeph> output, but
        is not currently reported to HiveServer2 clients such as JDBC or ODBC applications.
      </p>

    </conbody>

  </concept>
<concept id="kudu_security">
|
||
|
||
<title>Security Considerations for Kudu Tables</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
Security for Kudu tables involves:
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
Ranger authorization.
|
||
</p>
|
||
<p conref="../shared/impala_common.xml#common/kudu_sentry_limitations"/>
|
||
</li>
|
||
|
||
<li rev="2.9.0">
|
||
<p>
|
||
Kerberos authentication. See <xref keyref="kudu_security"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li rev="2.9.0">
|
||
<p>
|
||
TLS encryption. See <xref keyref="kudu_security"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Lineage tracking.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Auditing.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Redaction of sensitive information from log files.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept id="kudu_performance">
|
||
|
||
<title>Impala Query Performance for Kudu Tables</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
For queries involving Kudu tables, Impala can delegate much of the work of filtering the
|
||
result set to Kudu, avoiding some of the I/O involved in full table scans of tables
|
||
containing HDFS data files. This type of optimization is especially effective for
|
||
partitioned Kudu tables, where the Impala query <codeph>WHERE</codeph> clause refers to
|
||
one or more primary key columns that are also used as partition key columns. For
|
||
example, if a partitioned Kudu table uses a <codeph>HASH</codeph> clause for
|
||
<codeph>col1</codeph> and a <codeph>RANGE</codeph> clause for <codeph>col2</codeph>, a
|
||
query using a clause such as <codeph>WHERE col1 IN (1,2,3) AND col2 > 100</codeph>
|
||
can determine exactly which tablet servers contain relevant data, and therefore
|
||
parallelize the query very efficiently.
|
||
</p>
|
||
|
||
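      <p>
        A minimal sketch of such a table, using hypothetical column and table names: the query
        after the DDL touches only the tablets that can contain matching
        <codeph>col1</codeph> and <codeph>col2</codeph> values.
      </p>

      <codeblock>create table metrics (
  col1 bigint,
  col2 bigint,
  value double,
  primary key (col1, col2)
)
partition by hash (col1) partitions 16,
  range (col2) (
    partition values &lt; 100,
    partition 100 &lt;= values &lt; 1000,
    partition 1000 &lt;= values
  )
stored as kudu;

-- Prunes hash buckets via col1 and range partitions via col2.
select count(*) from metrics where col1 in (1, 2, 3) and col2 > 100;</codeblock>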
<p rev="2.11.0 IMPALA-4252">
|
||
In <keyword keyref="impala211_full"/> and higher, Impala can push down additional
|
||
information to optimize join queries involving Kudu tables. If the join clause
|
||
contains predicates of the form
|
||
<codeph><varname>column</varname> = <varname>expression</varname></codeph>,
|
||
after Impala constructs a hash table of possible matching values for the
|
||
join columns from the bigger table (either an HDFS table or a Kudu table), Impala
|
||
can <q>push down</q> the minimum and maximum matching column values to Kudu,
|
||
so that Kudu can more efficiently locate matching rows in the second (smaller) table.
|
||
These min/max filters are affected by the <codeph>RUNTIME_FILTER_MODE</codeph>,
|
||
<codeph>RUNTIME_FILTER_WAIT_TIME_MS</codeph>, and <codeph>DISABLE_ROW_RUNTIME_FILTERING</codeph>
|
||
query options; the min/max filters are not affected by the
|
||
<codeph>RUNTIME_BLOOM_FILTER_SIZE</codeph>, <codeph>RUNTIME_FILTER_MIN_SIZE</codeph>,
|
||
<codeph>RUNTIME_FILTER_MAX_SIZE</codeph>, and <codeph>MAX_NUM_RUNTIME_FILTERS</codeph>
|
||
query options.
|
||
</p>
|
||
|
||
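      <p>
        As a sketch, with hypothetical table and column names, you might give the runtime
        filters more time to be computed and propagated before the Kudu scans start:
      </p>

      <codeblock>set RUNTIME_FILTER_MODE=GLOBAL;
set RUNTIME_FILTER_WAIT_TIME_MS=10000;
-- Min/max values of t2.id from the build side can be pushed down to
-- the Kudu scan of t1.
select count(*) from kudu_tbl t1 join big_hdfs_tbl t2 on t1.id = t2.id;</codeblock>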
      <p>
        See <xref keyref="explain"/> for examples of evaluating the effectiveness of
        the predicate pushdown for a specific query against a Kudu table.
      </p>

      <p conref="../shared/impala_common.xml#common/tablesample_caveat"/>

      <!-- Hide until subtopics are ready to display. -->
      <p outputclass="toc inpage" audience="hidden"/>

    </conbody>
<concept id="kudu_vs_parquet" audience="hidden">
|
||
<!-- To do: if there is enough real-world experience in future to have a
|
||
substantive discussion of this subject, revisit this topic and
|
||
consider unhiding it. -->
|
||
|
||
<title>How Kudu Works with Column-Oriented Operations</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
For immutable data, Impala is often used with Parquet tables due to the efficiency of
|
||
the column-oriented Parquet layout. This section describes how Kudu stores and
|
||
retrieves columnar data, to help you understand performance and storage considerations
|
||
of Kudu tables as compared with Parquet tables.
|
||
</p>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
<concept id="kudu_memory" audience="hidden">
|
||
<!-- To do: if there is enough real-world experience in future to have a
|
||
substantive discussion of this subject, revisit this topic and
|
||
consider unhiding it. -->
|
||
|
||
<title>Memory Usage for Operations on Kudu Tables</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
The Apache Kudu architecture, topology, and data storage techniques result in
|
||
different patterns of memory usage for Impala statements than with HDFS-backed tables.
|
||
</p>
|
||
|
||
</conbody>
|
||
|
||
</concept>
|
||
|
||
</concept>
|
||
|
||
</concept>
|