mirror of https://github.com/apache/impala.git, synced 2025-12-19 09:58:28 -05:00
[DOCS] Major update to Impala + Kudu page
Upgrade with details of latest syntax. Fine-tune discussion of PK and other Kudu notions.

The impala_kudu diff looks larger than actual changes to the page, because subtopics got moved around and promoted/demoted (which changes the indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.
ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic. Mainly a brief intro then a link to delegate details to the main Kudu page, which already has a partitioning subtopic.

Include changes to reserved words. Entirely from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files. (Up to the beginning of the "Partitioning" section in impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Added walkthrough of renaming internal/external tables.
Split out Kudu CREATE TABLE syntax from other file formats.
Correct info about CTAS for Kudu tables.
Add examples of basic Kudu, external Kudu, and Kudu CTAS.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Reviewed-on: http://gerrit.cloudera.org:8080/5649
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
committed by Impala Public Jenkins
parent aee5457a55
commit 661921b205
@@ -10285,6 +10285,7 @@ https://issues.cloudera.org/secure/IssueNavigator.jspa?reset=true&jqlQuery=p
|
||||
<keydef keys="impala25"><topicmeta><keywords><keyword>Impala 2.5</keyword></keywords></topicmeta></keydef>
|
||||
<keydef keys="impala24"><topicmeta><keywords><keyword>Impala 2.4</keyword></keywords></topicmeta></keydef>
|
||||
<keydef keys="impala23"><topicmeta><keywords><keyword>Impala 2.3</keyword></keywords></topicmeta></keydef>
|
||||
<keydef keys="impala223"><topicmeta><keywords><keyword>Impala 2.2.3</keyword></keywords></topicmeta></keydef>
|
||||
<keydef keys="impala22"><topicmeta><keywords><keyword>Impala 2.2</keyword></keywords></topicmeta></keydef>
|
||||
<keydef keys="impala21"><topicmeta><keywords><keyword>Impala 2.1</keyword></keywords></topicmeta></keydef>
|
||||
<keydef keys="impala20"><topicmeta><keywords><keyword>Impala 2.0</keyword></keywords></topicmeta></keydef>
|
||||
@@ -10298,6 +10299,7 @@ https://issues.cloudera.org/secure/IssueNavigator.jspa?reset=true&jqlQuery=p
|
||||
<keydef keys="impala25_full"><topicmeta><keywords><keyword>Impala 2.5</keyword></keywords></topicmeta></keydef>
|
||||
<keydef keys="impala24_full"><topicmeta><keywords><keyword>Impala 2.4</keyword></keywords></topicmeta></keydef>
|
||||
<keydef keys="impala23_full"><topicmeta><keywords><keyword>Impala 2.3</keyword></keywords></topicmeta></keydef>
|
||||
<keydef keys="impala223_full"><topicmeta><keywords><keyword>Impala 2.2.3</keyword></keywords></topicmeta></keydef>
|
||||
<keydef keys="impala22_full"><topicmeta><keywords><keyword>Impala 2.2</keyword></keywords></topicmeta></keydef>
|
||||
<keydef keys="impala21_full"><topicmeta><keywords><keyword>Impala 2.1</keyword></keywords></topicmeta></keydef>
|
||||
<keydef keys="impala20_full"><topicmeta><keywords><keyword>Impala 2.0</keyword></keywords></topicmeta></keydef>
|
||||
|
||||
@@ -3730,6 +3730,58 @@ sudo pip-python install ssl</codeblock>
|
||||
NULL</codeph> attribute to that column.
|
||||
</p>
|
||||
|
||||
<p id="kudu_metadata_intro" rev="kudu">
|
||||
Much of the metadata for Kudu tables is handled by the underlying
|
||||
storage layer. Kudu tables have less reliance on the metastore
|
||||
database, and require less metadata caching on the Impala side.
|
||||
For example, information about partitions in Kudu tables is managed
|
||||
by Kudu, and Impala does not cache any block locality metadata
|
||||
for Kudu tables.
|
||||
</p>
|
||||
|
||||
<p id="kudu_metadata_details" rev="kudu">
|
||||
The <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph>
|
||||
statements are needed less frequently for Kudu tables than for
|
||||
HDFS-backed tables. Neither statement is needed when data is
|
||||
added to, removed from, or updated in a Kudu table, even if the changes
|
||||
are made directly to Kudu through a client program using the Kudu API.
|
||||
Run <codeph>REFRESH <varname>table_name</varname></codeph> or
|
||||
<codeph>INVALIDATE METADATA <varname>table_name</varname></codeph>
|
||||
for a Kudu table only after making a change to the Kudu table schema,
|
||||
such as adding or dropping a column, by a mechanism other than
|
||||
Impala.
|
||||
</p>
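<p rev="kudu">
A minimal sketch of the case described above, assuming a hypothetical table
<codeph>kudu_db.events</codeph> whose schema was changed outside Impala, for example
by a client program that added a column through the Kudu API. The database and table
names are illustrative only:
</p>
<codeblock rev="kudu">
-- Hypothetical table name. Run one of these only after an out-of-band
-- schema change; neither is needed after ordinary data changes.
REFRESH kudu_db.events;

-- Alternative: discard the cached metadata for the table so that
-- Impala reloads it the next time the table is referenced.
INVALIDATE METADATA kudu_db.events;
</codeblock>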
|
||||
|
||||
<p id="kudu_internal_external_tables">
|
||||
The distinction between internal and external tables has some special
|
||||
details for Kudu tables. Tables created entirely through Impala are
|
||||
internal tables. The table name as represented within Kudu includes
|
||||
notation such as an <codeph>impala::</codeph> prefix and the Impala
|
||||
database name. External Kudu tables are those created by a non-Impala
|
||||
mechanism, such as a user application calling the Kudu APIs. For
|
||||
these tables, the <codeph>CREATE EXTERNAL TABLE</codeph> syntax lets
|
||||
you establish a mapping from Impala to the existing Kudu table:
|
||||
<codeblock>
|
||||
CREATE EXTERNAL TABLE impala_name STORED AS KUDU
|
||||
TBLPROPERTIES('kudu.table_name' = 'original_kudu_name');
|
||||
</codeblock>
|
||||
External Kudu tables differ in one important way from other external
|
||||
tables: adding or dropping a column or range partition changes the
|
||||
data in the underlying Kudu table, in contrast to an HDFS-backed
|
||||
external table where existing data files are left untouched.
|
||||
</p>
|
||||
|
||||
<p id="kudu_sentry_limitations" rev="IMPALA-4000">
|
||||
Access to Kudu tables must be granted to and revoked from roles as usual.
|
||||
Only users with <codeph>ALL</codeph> privileges on <codeph>SERVER</codeph> can create external Kudu tables.
|
||||
Currently, access to a Kudu table is <q>all or nothing</q>:
|
||||
enforced at the table level rather than the column level, and applying to all
|
||||
SQL operations rather than individual statements such as <codeph>INSERT</codeph>.
|
||||
Because non-SQL APIs can access Kudu data without going through Sentry
|
||||
authorization, currently the Sentry support is considered preliminary
|
||||
and subject to change.
|
||||
</p>
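<p rev="IMPALA-4000">
As a hedged illustration of table-level access, the following sketch assumes a
Sentry-enabled cluster and uses hypothetical names throughout: a role
<codeph>analyst_role</codeph>, an operating system group <codeph>analysts</codeph>,
and a Kudu table <codeph>kudu_db.events</codeph>:
</p>
<codeblock rev="IMPALA-4000">
-- All names below are hypothetical.
CREATE ROLE analyst_role;
GRANT ROLE analyst_role TO GROUP analysts;

-- Access is granted for the whole table and for all SQL operations,
-- matching the "all or nothing" behavior described above.
GRANT ALL ON TABLE kudu_db.events TO ROLE analyst_role;

-- Revoking removes access to the table as a whole.
REVOKE ALL ON TABLE kudu_db.events FROM ROLE analyst_role;
</codeblock>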
|
||||
|
||||
</section>
|
||||
|
||||
</conbody>
|
||||
|
||||
@@ -34,6 +34,7 @@ under the License.
|
||||
<data name="Category" value="S3"/>
|
||||
<data name="Category" value="Developers"/>
|
||||
<data name="Category" value="Data Analysts"/>
|
||||
<data name="Category" value="Kudu"/>
|
||||
</metadata>
|
||||
</prolog>
|
||||
|
||||
@@ -63,9 +64,11 @@ ALTER TABLE <varname>name</varname> REPLACE COLUMNS (<varname>col_spec</varname>
|
||||
ALTER TABLE <varname>name</varname> ADD [IF NOT EXISTS] PARTITION (<varname>partition_spec</varname>)
|
||||
<ph rev="IMPALA-4390">[<varname>location_spec</varname>]</ph>
|
||||
<ph rev="IMPALA-4390">[<varname>cache_spec</varname>]</ph>
|
||||
<ph rev="kudu">ALTER TABLE <varname>name</varname> ADD [IF NOT EXISTS] RANGE PARTITION <varname>kudu_partition_spec</varname></ph>
|
||||
|
||||
ALTER TABLE <varname>name</varname> DROP [IF EXISTS] PARTITION (<varname>partition_spec</varname>)
|
||||
<ph rev="2.3.0">[PURGE]</ph>
|
||||
<ph rev="kudu">ALTER TABLE <varname>name</varname> DROP [IF EXISTS] RANGE PARTITION <varname>kudu_partition_spec</varname></ph>
|
||||
|
||||
<ph rev="2.3.0 IMPALA-1568 CDH-36799">ALTER TABLE <varname>name</varname> RECOVER PARTITIONS</ph>
|
||||
|
||||
@@ -86,12 +89,18 @@ statsKey ::= numDVs | numNulls | avgSize | maxSize</ph>
|
||||
|
||||
<varname>col_spec</varname> ::= <varname>col_name</varname> <varname>type_name</varname>
|
||||
|
||||
<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph> | <ph rev="kudu"><varname>kudu_partition_spec</varname></ph>
|
||||
<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph>
|
||||
|
||||
<varname>simple_partition_spec</varname> ::= <varname>partition_col</varname>=<varname>constant_value</varname>
|
||||
|
||||
<ph rev="IMPALA-1654"><varname>complex_partition_spec</varname> ::= <varname>comparison_expression_on_partition_col</varname></ph>
|
||||
|
||||
<ph rev="kudu"><varname>kudu_partition_spec</varname> ::= <varname>constant</varname> <varname>range_operator</varname> VALUES <varname>range_operator</varname> <varname>constant</varname> | VALUE = <varname>constant</varname></ph>
|
||||
|
||||
<ph rev="IMPALA-4390">cache_spec ::= CACHED IN '<varname>pool_name</varname>' [WITH REPLICATION = <varname>integer</varname>] | UNCACHED</ph>
|
||||
|
||||
<ph rev="IMPALA-4390">location_spec ::= LOCATION '<varname>hdfs_path_of_directory</varname>'</ph>
|
||||
|
||||
<varname>table_properties</varname> ::= '<varname>name</varname>'='<varname>value</varname>'[, '<varname>name</varname>'='<varname>value</varname>' ...]
|
||||
|
||||
<varname>serde_properties</varname> ::= '<varname>name</varname>'='<varname>value</varname>'[, '<varname>name</varname>'='<varname>value</varname>' ...]
|
||||
@@ -896,6 +905,75 @@ alter table sales_data add partition (zipcode = cast(9021 * 10 as string));</cod
|
||||
require write and execute permissions for the associated partition directory.
|
||||
</p>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
|
||||
<p rev="kudu IMPALA-2890">
|
||||
Because of the extra constraints and features of Kudu tables, such as the <codeph>NOT NULL</codeph>
|
||||
and <codeph>DEFAULT</codeph> attributes for columns, <codeph>ALTER TABLE</codeph> has specific
|
||||
requirements related to Kudu tables:
|
||||
<ul>
|
||||
<li>
|
||||
<p>
|
||||
In an <codeph>ADD COLUMNS</codeph> operation, you can specify the <codeph>NULL</codeph>,
|
||||
<codeph>NOT NULL</codeph>, and <codeph>DEFAULT <varname>default_value</varname></codeph>
|
||||
column attributes.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
If you add a column with a <codeph>NOT NULL</codeph> attribute, it must also have a
|
||||
<codeph>DEFAULT</codeph> attribute, so the default value can be assigned to that
|
||||
column for all existing rows.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
The <codeph>DROP COLUMN</codeph> clause works the same for a Kudu table as for other
|
||||
kinds of tables.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
Although you can change the name of a column with the <codeph>CHANGE</codeph> clause,
|
||||
you cannot change the type of a column in a Kudu table.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
You cannot assign the <codeph>ENCODING</codeph>, <codeph>COMPRESSION</codeph>,
|
||||
or <codeph>BLOCK_SIZE</codeph> attributes when adding a column.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
You cannot change the default value, nullability, encoding, compression, or block size
|
||||
of existing columns in a Kudu table.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
You cannot use the <codeph>REPLACE COLUMNS</codeph> clause with a Kudu table.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
The <codeph>RENAME TO</codeph> clause for a Kudu table only affects the name stored in the
|
||||
metastore database that Impala uses to refer to the table. To change which underlying Kudu
|
||||
table is associated with an Impala table name, you must change the <codeph>TBLPROPERTIES</codeph>
|
||||
property of the table: <codeph>SET TBLPROPERTIES('kudu.table_name'='<varname>kudu_tbl_name</varname>')</codeph>.
|
||||
Doing so causes Kudu to change the name of the underlying Kudu table.
|
||||
</p>
|
||||
</li>
|
||||
</ul>
|
||||
</p>
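<p rev="kudu IMPALA-2890">
The requirements in the preceding list can be sketched as follows. This is an
illustration only, assuming a hypothetical Kudu table <codeph>kudu_orders</codeph>;
the column names and the Kudu table name <codeph>orders_v2</codeph> are also
hypothetical:
</p>
<codeblock rev="kudu IMPALA-2890">
-- Add a nullable column and a NOT NULL column. The NOT NULL column
-- carries a DEFAULT so that existing rows receive a value.
ALTER TABLE kudu_orders
  ADD COLUMNS (note STRING NULL, status STRING NOT NULL DEFAULT 'new');

-- Rename a column with CHANGE. The type (STRING) stays the same,
-- because the type of a Kudu column cannot be changed.
ALTER TABLE kudu_orders CHANGE note order_note STRING;

-- DROP COLUMN works as it does for other kinds of tables.
ALTER TABLE kudu_orders DROP COLUMN order_note;

-- RENAME TO changes only the name that Impala uses for the table.
ALTER TABLE kudu_orders RENAME TO kudu_orders_v2;

-- Changing the Kudu-side name is done through the table property.
ALTER TABLE kudu_orders_v2 SET TBLPROPERTIES('kudu.table_name' = 'orders_v2');
</codeblock>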
|
||||
|
||||
<p rev="kudu">
|
||||
Kudu tables all use an underlying partitioning mechanism. The partition syntax is different from that for non-Kudu
|
||||
tables. You can use the <codeph>ALTER TABLE</codeph> statement to add and drop <term>range partitions</term>
|
||||
from a Kudu table. Any new range must not overlap with any existing ranges. Dropping a range removes all the associated
|
||||
rows from the table. See <xref href="impala_kudu.xml#kudu_partitioning"/> for details.
|
||||
</p>
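<p rev="kudu">
A short sketch of adding and dropping range partitions, assuming a hypothetical Kudu
table <codeph>kudu_events</codeph> that is range-partitioned on an integer primary key
column holding a year value. The table name and the range boundaries are illustrative:
</p>
<codeblock rev="kudu">
-- Add a range that covers the single year 2017. It must not overlap
-- any range that already exists in the table.
ALTER TABLE kudu_events ADD RANGE PARTITION 2017 &lt;= VALUES &lt; 2018;

-- Single-value form of the partition spec.
ALTER TABLE kudu_events ADD IF NOT EXISTS RANGE PARTITION VALUE = 2020;

-- Dropping a range also removes every row that falls inside it.
ALTER TABLE kudu_events DROP IF EXISTS RANGE PARTITION 2010 &lt;= VALUES &lt; 2011;
</codeblock>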
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/related_info"/>
|
||||
|
||||
<p>
|
||||
|
||||
@@ -115,6 +115,9 @@ type ::= <varname>primitive_type</varname> | <varname>complex_type</varname>
|
||||
<li/>
|
||||
</ul>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
||||
|
||||
<note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/>
|
||||
|
||||
@@ -161,6 +161,9 @@ SELECT claim FROM assertions WHERE really = TRUE;
|
||||
|
||||
<!-- <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> -->
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_non_pk_data_type"/>
|
||||
|
||||
<!-- <p conref="../shared/impala_common.xml#common/related_info"/> -->
|
||||
|
||||
<p>
|
||||
|
||||
@@ -243,6 +243,9 @@ select concat('[',a,']') as a, concat('[',b,']') as b, concat('[',c,']') as c fr
|
||||
+------------------------+----------------------------------+--------------------------------------------+
|
||||
</codeblock>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
|
||||
|
||||
<p>
|
||||
|
||||
@@ -52,8 +52,7 @@ under the License.
|
||||
<codeblock rev="2.1.0">COMPUTE STATS [<varname>db_name</varname>.]<varname>table_name</varname>
|
||||
COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varname> [PARTITION (<varname>partition_spec</varname>)]
|
||||
|
||||
<!-- Is kudu_partition_spec applicable here? -->
|
||||
<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph> | <ph rev="kudu"><varname>kudu_partition_spec</varname></ph>
|
||||
<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph>
|
||||
|
||||
<varname>simple_partition_spec</varname> ::= <varname>partition_col</varname>=<varname>constant_value</varname>
|
||||
|
||||
@@ -523,6 +522,17 @@ show table stats item_partitioned;
|
||||
against the table.)
|
||||
</p>
|
||||
|
||||
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
|
||||
<p rev="IMPALA-2830">
|
||||
The <codeph>COMPUTE STATS</codeph> statement applies to Kudu tables.
|
||||
Impala does not compute the number of rows for each partition for
|
||||
Kudu tables. Therefore, you do not need to re-run the operation when
|
||||
you see -1 in the <codeph># Rows</codeph> column of the output from
|
||||
<codeph>SHOW TABLE STATS</codeph>. That column always shows -1 for
|
||||
all Kudu tables.
|
||||
</p>
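<p rev="IMPALA-2830">
For example, the following sketch runs the statement against a hypothetical Kudu table
named <codeph>kudu_events</codeph> and then checks the results. No output is shown
because it depends on the table:
</p>
<codeblock rev="IMPALA-2830">
-- Hypothetical table name.
COMPUTE STATS kudu_events;

-- As described above, expect -1 in the "# Rows" column for a Kudu table.
-- That value does not mean the statistics are missing.
SHOW TABLE STATS kudu_events;

-- Column-level statistics are reported separately.
SHOW COLUMN STATS kudu_events;
</codeblock>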
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/related_info"/>
|
||||
|
||||
<p>
|
||||
|
||||
File diff suppressed because it is too large
@@ -822,6 +822,9 @@ SELECT CAST(1000.5 AS DECIMAL);
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/column_stats_constant"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/related_info"/>
|
||||
|
||||
<p>
|
||||
|
||||
@@ -697,6 +697,91 @@ Returned 27 row(s) in 0.17s</codeblock>
|
||||
in an arbitrary HDFS directory based on its <codeph>LOCATION</codeph> attribute.)
|
||||
</p>
|
||||
|
||||
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
|
||||
<p rev="kudu">
|
||||
The information displayed for Kudu tables includes the additional attributes
|
||||
that are only applicable for Kudu tables:
|
||||
</p>
|
||||
<ul rev="kudu">
|
||||
<li>
|
||||
Whether or not the column is part of the primary key. Every Kudu table
|
||||
has a <codeph>true</codeph> value here for at least one column. There
|
||||
could be multiple <codeph>true</codeph> values, for tables with
|
||||
composite primary keys.
|
||||
</li>
|
||||
<li>
|
||||
Whether or not the column is nullable. Specified by the <codeph>NULL</codeph>
|
||||
or <codeph>NOT NULL</codeph> attributes on the <codeph>CREATE TABLE</codeph> statement.
|
||||
Columns that are part of the primary key are automatically non-nullable.
|
||||
</li>
|
||||
<li>
|
||||
The default value, if any, for the column. Specified by the <codeph>DEFAULT</codeph>
|
||||
attribute on the <codeph>CREATE TABLE</codeph> statement. If the default value is
|
||||
<codeph>NULL</codeph>, that is not indicated in this column. It is implied by
|
||||
<codeph>nullable</codeph> being true and no other default value specified.
|
||||
</li>
|
||||
<li>
|
||||
The encoding used for values in the column. Specified by the <codeph>ENCODING</codeph>
|
||||
attribute on the <codeph>CREATE TABLE</codeph> statement.
|
||||
</li>
|
||||
<li>
|
||||
The compression used for values in the column. Specified by the <codeph>COMPRESSION</codeph>
|
||||
attribute on the <codeph>CREATE TABLE</codeph> statement.
|
||||
</li>
|
||||
<li>
|
||||
The block size (in bytes) used for the underlying Kudu storage layer for the column.
|
||||
Specified by the <codeph>BLOCK_SIZE</codeph> attribute on the <codeph>CREATE TABLE</codeph>
|
||||
statement.
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p rev="kudu">
|
||||
The following example shows <codeph>DESCRIBE</codeph> output for a simple Kudu table, with
|
||||
a single-column primary key and all column attributes left with their default values:
|
||||
</p>
|
||||
|
||||
<codeblock rev="kudu">
|
||||
describe million_rows;
|
||||
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
|
||||
| name | type | comment | primary_key | nullable | default_value | encoding | compression | block_size |
|
||||
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
|
||||
| id | string | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
|
||||
| s | string | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
|
||||
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
|
||||
</codeblock>
|
||||
|
||||
<p rev="kudu">
|
||||
The following example shows <codeph>DESCRIBE</codeph> output for a Kudu table with a
|
||||
two-column primary key, and Kudu-specific attributes applied to some columns:
|
||||
</p>
|
||||
|
||||
<codeblock rev="kudu">
|
||||
create table kudu_describe_example
|
||||
(
|
||||
c1 int, c2 int,
|
||||
c3 string, c4 string not null, c5 string default 'n/a', c6 string default '',
|
||||
c7 bigint not null, c8 bigint null default null, c9 bigint default -1 encoding bit_shuffle,
|
||||
primary key(c1,c2)
|
||||
)
|
||||
partition by hash (c1, c2) partitions 10 stored as kudu;
|
||||
|
||||
describe kudu_describe_example;
|
||||
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
|
||||
| name | type | comment | primary_key | nullable | default_value | encoding | compression | block_size |
|
||||
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
|
||||
| c1 | int | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
|
||||
| c2 | int | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
|
||||
| c3 | string | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
|
||||
| c4 | string | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
|
||||
| c5 | string | | false | true | n/a | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
|
||||
| c6 | string | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
|
||||
| c7 | bigint | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
|
||||
| c8 | bigint | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
|
||||
| c9 | bigint | | false | true | -1 | BIT_SHUFFLE | DEFAULT_COMPRESSION | 0 |
|
||||
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
|
||||
</codeblock>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/related_info"/>
|
||||
|
||||
<p>
|
||||
|
||||
@@ -108,6 +108,9 @@ SELECT CAST(1000.5 AS DOUBLE);
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/float_double_decimal_caveat"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_non_pk_data_type"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/related_info"/>
|
||||
|
||||
<p>
|
||||
|
||||
@@ -155,6 +155,15 @@ drop table temporary.trivial;</codeblock>
|
||||
no particular permissions are needed for the associated HDFS files or directories.
|
||||
</p>
|
||||
|
||||
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p rev="kudu">
|
||||
Kudu tables can be managed or external, the same as with HDFS-based
|
||||
tables. For a managed table, the underlying Kudu table and its data
|
||||
are removed by <codeph>DROP TABLE</codeph>. For an external table,
|
||||
the underlying Kudu table and its data remain after a
|
||||
<codeph>DROP TABLE</codeph>.
|
||||
</p>
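<p rev="kudu">
The difference can be sketched as follows, assuming a hypothetical Kudu table named
<codeph>metrics</codeph> that was created outside Impala. All table and column names
are illustrative:
</p>
<codeblock rev="kudu">
-- External table: Impala holds only a mapping to the existing Kudu table.
CREATE EXTERNAL TABLE metrics_mapping STORED AS KUDU
  TBLPROPERTIES('kudu.table_name' = 'metrics');

-- Removes the mapping. The Kudu table 'metrics' and its data remain.
DROP TABLE metrics_mapping;

-- Managed table: created entirely through Impala.
CREATE TABLE managed_metrics
(
  id BIGINT,
  reading DOUBLE,
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 4 STORED AS KUDU;

-- Removes the underlying Kudu table and its data as well.
DROP TABLE managed_metrics;
</codeblock>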
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/related_info"/>
|
||||
|
||||
<p>
|
||||
|
||||
@@ -234,6 +234,42 @@ EXPLAIN_LEVEL set to extended
|
||||
if the source table is partitioned.)
|
||||
</p>
|
||||
|
||||
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p>
|
||||
The <codeph>EXPLAIN</codeph> statement displays equivalent plan
|
||||
information for queries against Kudu tables as for queries
|
||||
against HDFS-based tables.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
To see which predicates Impala can <q>push down</q> to Kudu for
|
||||
efficient evaluation, without transmitting unnecessary rows back
|
||||
to Impala, look for the <codeph>kudu predicates</codeph> item in
|
||||
the scan phase of the query. The label <codeph>kudu predicates</codeph>
|
||||
indicates a condition that can be evaluated efficiently on the Kudu
|
||||
side. The label <codeph>predicates</codeph> in a <codeph>SCAN KUDU</codeph>
|
||||
node indicates a condition that is evaluated by Impala.
|
||||
For example, in a table with primary key column <codeph>X</codeph>
|
||||
and non-primary key column <codeph>Y</codeph>, you can see that
|
||||
some operators in the <codeph>WHERE</codeph> clause are evaluated
|
||||
immediately by Kudu and others are evaluated later by Impala:
|
||||
<codeblock>
|
||||
EXPLAIN SELECT x, y FROM kudu_table WHERE
|
||||
x = 1 AND x NOT IN (2,3) AND y = 1
|
||||
AND x IS NOT NULL AND x > 0;
|
||||
+----------------
|
||||
| Explain String
|
||||
+----------------
|
||||
...
|
||||
| 00:SCAN KUDU [jrussell.hash_only]
|
||||
| predicates: x IS NOT NULL, x NOT IN (2, 3)
|
||||
| kudu predicates: x = 1, x > 0, y = 1
|
||||
</codeblock>
|
||||
Only binary predicates and <codeph>IN</codeph> predicates containing
|
||||
literal values that exactly match the types in the Kudu table, and do not
|
||||
require any casting, can be pushed to Kudu.
|
||||
</p>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/related_info"/>
|
||||
<p>
|
||||
<xref href="impala_select.xml#select"/>,
|
||||
|
||||
@@ -102,6 +102,9 @@ SELECT CAST(1000.5 AS FLOAT);
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/float_double_decimal_caveat"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_non_pk_data_type"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/related_info"/>
|
||||
|
||||
<p>
|
||||
|
||||
@@ -129,6 +129,9 @@ object_type ::= TABLE | DATABASE | SERVER | URI
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_sentry_limitations"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/related_info"/>
|
||||
|
||||
<p>
|
||||
|
||||
@@ -241,6 +241,11 @@ ERROR: AnalysisException: Database does not exist: new_db_from_hive
|
||||
<p conref="../shared/impala_common.xml#common/s3_metadata"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
|
||||
|
||||
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/related_info"/>
|
||||
<p>
|
||||
<xref href="impala_hadoop.xml#intro_metastore"/>,
|
||||
|
||||
File diff suppressed because it is too large
@@ -397,6 +397,24 @@ insert into t1 partition(x=NULL, y) select c1, c3 from some_other_table;</codeb
|
||||
<codeph>nullifzero()</codeph>, and <codeph>zeroifnull()</codeph>. See
|
||||
<xref href="impala_conditional_functions.xml#conditional_functions"/> for details.
|
||||
</p>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p rev="kudu">
|
||||
Columns in Kudu tables have an attribute that specifies whether or not they can contain
|
||||
<codeph>NULL</codeph> values. A column with a <codeph>NULL</codeph> attribute can contain
|
||||
nulls. A column with a <codeph>NOT NULL</codeph> attribute cannot contain any nulls, and
|
||||
an <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or <codeph>UPSERT</codeph> statement
|
||||
will skip any row that attempts to store a null in a column designated as <codeph>NOT NULL</codeph>.
|
||||
Kudu tables default to the <codeph>NULL</codeph> setting for each column, except columns that
|
||||
are part of the primary key.
|
||||
</p>
|
||||
<p rev="kudu">
|
||||
In addition to columns with the <codeph>NOT NULL</codeph> attribute, Kudu tables also have
|
||||
restrictions on <codeph>NULL</codeph> values in columns that are part of the primary key for
|
||||
a table. No column that is part of the primary key in a Kudu table can contain any
|
||||
<codeph>NULL</codeph> values.
|
||||
</p>
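<p rev="kudu">
A minimal sketch of these attributes, using a hypothetical table
<codeph>kudu_null_demo</codeph>. The comments restate the behavior described above
rather than showing captured output:
</p>
<codeblock rev="kudu">
CREATE TABLE kudu_null_demo
(
  id BIGINT,                     -- primary key column, so it cannot hold NULL
  required_col STRING NOT NULL,  -- explicitly non-nullable
  optional_col STRING NULL,      -- explicitly nullable (also the default)
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 2 STORED AS KUDU;

-- Accepted: the only NULL goes into a nullable column.
INSERT INTO kudu_null_demo VALUES (1, 'a', NULL);

-- This row tries to store NULL in a NOT NULL column, so according to
-- the description above it is skipped rather than stored.
INSERT INTO kudu_null_demo VALUES (2, NULL, 'b');
</codeblock>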
|
||||
|
||||
</conbody>
|
||||
</concept>
|
||||
</concept>
|
||||
|
||||
@@ -85,6 +85,9 @@ type ::= <varname>primitive_type</varname> | <varname>complex_type</varname>
|
||||
<li/>
|
||||
</ul>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
||||
|
||||
<note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/>
|
||||
|
||||
@@ -575,7 +575,7 @@ SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
|
||||
|
||||
</concept>
|
||||
|
||||
<concept rev="kudu" id="partition_kudu" audience="hidden">
|
||||
<concept rev="kudu 2.8.0" id="partition_kudu">
|
||||
|
||||
<title>Using Partitioning with Kudu Tables</title>
|
||||
|
||||
@@ -593,6 +593,12 @@ SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
|
||||
columns.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
See <xref href="impala_kudu.xml#kudu_partitioning"/> for
|
||||
details and examples of the partitioning techniques
|
||||
for Kudu tables.
|
||||
</p>
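<p rev="kudu 2.8.0">
As a brief orientation before following that link, here is a hedged sketch of a Kudu
table that combines hash and range partitioning. The table
<codeph>kudu_sales</codeph>, its columns, and the range boundaries are hypothetical:
</p>
<codeblock rev="kudu 2.8.0">
CREATE TABLE kudu_sales
(
  id BIGINT,
  year INT,
  amount DOUBLE,
  PRIMARY KEY (id, year)   -- partitioning columns come from the primary key
)
PARTITION BY HASH (id) PARTITIONS 4,
  RANGE (year)
  (
    PARTITION 2015 &lt;= VALUES &lt; 2016,
    PARTITION 2016 &lt;= VALUES &lt; 2017
  )
STORED AS KUDU;
</codeblock>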
|
||||
|
||||
</conbody>
|
||||
|
||||
</concept>
|
||||
|
||||
@@ -333,6 +333,11 @@ ERROR: AnalysisException: Items in partition spec must exactly match the partiti
|
||||
<p conref="../shared/impala_common.xml#common/s3_metadata"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
|
||||
|
||||
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/related_info"/>
|
||||
<p>
|
||||
<xref href="impala_hadoop.xml#intro_metastore"/>,
|
||||
|
||||
@@ -82,7 +82,9 @@ avro
|
||||
between
|
||||
bigint
|
||||
<ph rev="1.4.0">binary</ph>
|
||||
<ph rev="kudu">blocksize</ph>
|
||||
boolean
|
||||
<!-- <ph rev="kudu">buckets</ph> -->
|
||||
by
|
||||
<ph rev="1.4.0">cached</ph>
|
||||
<ph rev="2.3.0">cascade</ph>
|
||||
@@ -95,6 +97,7 @@ change
|
||||
column
|
||||
columns
|
||||
comment
|
||||
<ph rev="kudu">compression</ph>
|
||||
compute
|
||||
create
|
||||
cross
|
||||
@@ -105,15 +108,18 @@ databases
|
||||
date
|
||||
datetime
|
||||
decimal
|
||||
<ph rev="2.6.0">delete</ph>
|
||||
<ph rev="kudu">default</ph>
|
||||
<ph rev="kudu">delete</ph>
|
||||
delimited
|
||||
desc
|
||||
describe
|
||||
distinct
|
||||
<!-- <ph rev="kudu">distribute</ph> -->
|
||||
div
|
||||
double
|
||||
drop
|
||||
else
|
||||
<ph rev="kudu">encoding</ph>
|
||||
end
|
||||
escaped
|
||||
exists
|
||||
@@ -136,10 +142,10 @@ function
|
||||
functions
|
||||
<ph rev="2.1.0">grant</ph>
|
||||
group
|
||||
<ph rev="2.6.0">hash</ph>
|
||||
<ph rev="kudu">hash</ph>
|
||||
having
|
||||
if
|
||||
<ph rev="2.6.0">ignore</ph>
|
||||
<!-- <ph rev="kudu">ignore</ph> -->
|
||||
<ph rev="2.5.0">ilike</ph>
|
||||
in
|
||||
<ph rev="2.1.0">incremental</ph>
|
||||
@@ -210,6 +216,7 @@ serdeproperties
|
||||
set
|
||||
show
|
||||
smallint
|
||||
<!-- <ph rev="kudu">split</ph> -->
|
||||
stats
|
||||
stored
|
||||
straight_join
|
||||
@@ -229,8 +236,9 @@ true
|
||||
<ph rev="2.0.0">unbounded</ph>
|
||||
<ph rev="1.4.0">uncached</ph>
|
||||
union
|
||||
<ph rev="2.6.0">update</ph>
|
||||
<ph rev="kudu">update</ph>
|
||||
<ph rev="1.2.1">update_fn</ph>
|
||||
<ph rev="kudu">upsert</ph>
|
||||
use
|
||||
using
|
||||
values
|
||||
|
||||
@@ -108,6 +108,9 @@ object_type ::= TABLE | DATABASE | SERVER | URI
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
|
||||
|
||||
<p rev="2.8.0" conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_sentry_limitations"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/related_info"/>
|
||||
|
||||
<p>
|
||||
|
||||
@@ -28,6 +28,7 @@ under the License.
|
||||
<data name="Category" value="Developers"/>
|
||||
<data name="Category" value="Data Analysts"/>
|
||||
<data name="Category" value="Reports"/>
|
||||
<data name="Category" value="Kudu"/>
|
||||
</metadata>
|
||||
</prolog>
|
||||
|
||||
@@ -49,7 +50,8 @@ SHOW TABLES [IN <varname>database_name</varname>] [[LIKE] '<varname>pattern</var
|
||||
<ph rev="1.2.1">SHOW TABLE STATS [<varname>database_name</varname>.]<varname>table_name</varname></ph>
|
||||
<ph rev="1.2.1">SHOW COLUMN STATS [<varname>database_name</varname>.]<varname>table_name</varname></ph>
|
||||
<ph rev="1.4.0">SHOW PARTITIONS [<varname>database_name</varname>.]<varname>table_name</varname></ph>
|
||||
SHOW FILES IN [<varname>database_name</varname>.]<varname>table_name</varname> <ph rev="IMPALA-1654">[PARTITION (<varname>key_col_expression</varname> [, <varname>key_col_expression</varname>]</ph>]
|
||||
<ph rev="1.4.0">SHOW <ph rev="kudu">[RANGE]</ph> PARTITIONS [<varname>database_name</varname>.]<varname>table_name</varname></ph>
|
||||
SHOW FILES IN [<varname>database_name</varname>.]<varname>table_name</varname> <ph rev="IMPALA-1654">[PARTITION (<varname>key_col_expression</varname> [, <varname>key_col_expression</varname>])</ph>]
|
||||
|
||||
<ph rev="2.0.0">SHOW ROLES
|
||||
SHOW CURRENT ROLES
|
||||
@@ -129,7 +131,8 @@ show files in sample_table partition (month like 'J%');
|
||||
<note>
|
||||
This statement applies to tables and partitions stored on HDFS, or in the Amazon Simple Storage Service (S3).
|
||||
It does not apply to views.
|
||||
It does not apply to tables mapped onto HBase, because HBase does not use the same file-based storage layout.
|
||||
It does not apply to tables mapped onto HBase <ph rev="kudu">or Kudu</ph>,
|
||||
because those data management systems do not use the same file-based storage layout.
|
||||
</note>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
|
||||
@@ -742,6 +745,61 @@ show tables like '*dim*|t*';
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
|
||||
|
||||
<p rev="kudu">
|
||||
For Kudu tables:
|
||||
</p>
|
||||
|
||||
<ul rev="kudu">
|
||||
<li>
|
||||
<p>
|
||||
The column specifications include attributes such as <codeph>NULL</codeph>,
|
||||
<codeph>NOT NULL</codeph>, <codeph>ENCODING</codeph>, and <codeph>COMPRESSION</codeph>.
|
||||
If you do not specify those attributes in the original <codeph>CREATE TABLE</codeph> statement,
|
||||
the <codeph>SHOW CREATE TABLE</codeph> output displays the defaults that were used.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
The specifications of any <codeph>RANGE</codeph> clauses are not displayed in full.
|
||||
To see the definition of the range clauses for a Kudu table, use the <codeph>SHOW RANGE PARTITIONS</codeph> statement.
|
||||
</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>
|
||||
The <codeph>TBLPROPERTIES</codeph> output reflects the Kudu master address
|
||||
and the internal Kudu name associated with the Impala table.
|
||||
</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<codeblock rev="kudu">
|
||||
show create table numeric_grades_default_letter;
|
||||
+------------------------------------------------------------------------------------------------+
|
||||
| result |
|
||||
+------------------------------------------------------------------------------------------------+
|
||||
| CREATE TABLE user.numeric_grades_default_letter ( |
|
||||
| score TINYINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
|
||||
| letter_grade STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION DEFAULT '-', |
|
||||
| student STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
|
||||
| PRIMARY KEY (score) |
|
||||
| ) |
|
||||
| PARTITION BY <b>RANGE (score) (...)</b> |
|
||||
| STORED AS KUDU |
|
||||
| TBLPROPERTIES ('kudu.master_addresses'='vd0342.example.com:7051', |
|
||||
| 'kudu.table_name'='impala::USER.numeric_grades_default_letter') |
|
||||
+------------------------------------------------------------------------------------------------+
|
||||
|
||||
show range partitions numeric_grades_default_letter;
|
||||
+--------------------+
|
||||
| RANGE (score) |
|
||||
+--------------------+
|
||||
| 0 &lt;= VALUES &lt; 50 |
|
||||
| 50 &lt;= VALUES &lt; 65 |
|
||||
| 65 &lt;= VALUES &lt; 80 |
|
||||
| 80 &lt;= VALUES &lt; 100 |
|
||||
+--------------------+
|
||||
</codeblock>
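<p rev="kudu">
To illustrate how defaults appear in the output, the table shown above could have been created by a statement along the following lines.
This is a hypothetical reconstruction inferred from the <codeph>SHOW CREATE TABLE</codeph> and <codeph>SHOW RANGE PARTITIONS</codeph>
output, not necessarily the original statement. It specifies no <codeph>ENCODING</codeph> or <codeph>COMPRESSION</codeph>
attributes and no explicit nullability, so the attributes in the output would be the defaults that Impala and Kudu applied.
</p>

<codeblock rev="kudu"><![CDATA[
-- Hypothetical DDL, inferred from the output above.
-- No NULL / NOT NULL, ENCODING, or COMPRESSION attributes are given here,
-- so SHOW CREATE TABLE reports the defaults that were used.
-- The primary key column is implicitly NOT NULL.
create table numeric_grades_default_letter
(
  score tinyint,
  letter_grade string default '-',
  student string,
  primary key (score)
)
partition by range (score)
(
  partition 0 <= values < 50,
  partition 50 <= values < 65,
  partition 65 <= values < 80,
  partition 80 <= values < 100
)
stored as kudu;
]]>
</codeblock>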
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
||||
|
||||
<p>
|
||||
@@ -855,6 +913,39 @@ show create table show_create_table_demo;
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/show_security"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
|
||||
<p rev="kudu IMPALA-2830">
|
||||
Because Kudu tables do not have characteristics derived from HDFS, such
|
||||
as number of files, file format, and HDFS cache status, the output of
|
||||
<codeph>SHOW TABLE STATS</codeph> reflects different characteristics
|
||||
that apply to Kudu tables. If the Kudu table is created with the
|
||||
clause <codeph>PARTITIONS 20</codeph>, then the result set of
|
||||
<codeph>SHOW TABLE STATS</codeph> consists of 20 rows, each representing
|
||||
one of the numbered partitions. For example:
|
||||
</p>
|
||||
|
||||
<codeblock rev="kudu IMPALA-2830">
|
||||
show table stats kudu_table;
|
||||
+--------+-----------+----------+-----------------------+------------+
|
||||
| # Rows | Start Key | Stop Key | Leader Replica | # Replicas |
|
||||
+--------+-----------+----------+-----------------------+------------+
|
||||
| -1 | | 00000001 | host.example.com:7050 | 3 |
|
||||
| -1 | 00000001 | 00000002 | host.example.com:7050 | 3 |
|
||||
| -1 | 00000002 | 00000003 | host.example.com:7050 | 3 |
|
||||
| -1 | 00000003 | 00000004 | host.example.com:7050 | 3 |
|
||||
| -1 | 00000004 | 00000005 | host.example.com:7050 | 3 |
|
||||
...
|
||||
</codeblock>
|
||||
|
||||
<p rev="IMPALA-2830">
|
||||
Impala does not compute the number of rows for each partition for
|
||||
Kudu tables. Therefore, you do not need to re-run <codeph>COMPUTE STATS</codeph>
|
||||
when you see -1 in the <codeph># Rows</codeph> column of the output from
|
||||
<codeph>SHOW TABLE STATS</codeph>. That column always shows -1 for
|
||||
all Kudu tables.
|
||||
</p>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
||||
|
||||
<p>
|
||||
@@ -959,6 +1050,14 @@ show table stats store_sales;
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/show_security"/>
|
||||
|
||||
<p rev="kudu IMPALA-2830">
|
||||
The output for <codeph>SHOW COLUMN STATS</codeph> includes
|
||||
the relevant information for Kudu tables.
|
||||
The information for column statistics that originates in the
|
||||
underlying Kudu storage layer is also represented in the
|
||||
metastore database that Impala uses.
|
||||
</p>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
||||
|
||||
<p>
|
||||
@@ -1145,8 +1244,31 @@ show column stats store_sales;
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/show_security"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
|
||||
<p rev="kudu IMPALA-4403">
|
||||
The optional <codeph>RANGE</codeph> clause only applies to Kudu tables. It displays only the partitions
|
||||
defined by the <codeph>RANGE</codeph> clause of <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph>.
|
||||
</p>
|
||||
|
||||
<p rev="kudu IMPALA-4403">
|
||||
Although you can specify <codeph>&lt;</codeph> or
|
||||
<codeph>&lt;=</codeph> comparison operators when defining
|
||||
range partitions for Kudu tables, Kudu rewrites them if necessary
|
||||
to represent each range as
|
||||
<codeph><varname>low_bound</varname> &lt;= VALUES &lt; <varname>high_bound</varname></codeph>.
|
||||
This rewriting might involve incrementing one of the boundary values
|
||||
or appending a <codeph>\0</codeph> for string values, so that the
|
||||
partition covers the same range as originally specified.
|
||||
</p>
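<p rev="kudu IMPALA-4403">
As a minimal sketch of this rewriting for integer bounds, consider the following hypothetical table.
The table name, column names, and boundary values are assumptions chosen for illustration only.
The comments describe what the rule above implies rather than captured output.
</p>

<codeblock rev="kudu IMPALA-4403"><![CDATA[
-- Hypothetical table; names and bounds are for illustration only.
create table range_bounds_demo (id int primary key, s string)
  partition by range (id)
  (
    partition values <= 9,
    partition 10 <= values <= 19,
    partition 19 < values <= 29
  )
  stored as kudu;

-- Under the rule described above, each range is expected to be reported
-- in the form low_bound <= VALUES < high_bound. For example, the second
-- partition would be expected to appear as 10 <= VALUES < 20, with the
-- inclusive upper bound 19 incremented to the exclusive bound 20.
show range partitions range_bounds_demo;
]]>
</codeblock>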
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
||||
|
||||
<p>
|
||||
The following example shows the output for a Parquet, text, or other
|
||||
HDFS-backed table partitioned on the <codeph>YEAR</codeph> column:
|
||||
</p>
|
||||
|
||||
<codeblock rev="1.4.0">[localhost:21000] > show partitions census;
|
||||
+-------+-------+--------+------+---------+
|
||||
| year | #Rows | #Files | Size | Format |
|
||||
@@ -1160,6 +1282,53 @@ show column stats store_sales;
|
||||
| 2013 | 1 | 1 | 231B | PARQUET |
|
||||
| Total | 9 | 3 | 275B | |
|
||||
+-------+-------+--------+------+---------+
|
||||
</codeblock>
|
||||
|
||||
<p rev="kudu IMPALA-4403">
|
||||
The following example shows the output for a Kudu table
|
||||
using the hash partitioning mechanism. The number of
|
||||
rows in the result set corresponds to the value used
|
||||
in the <codeph>PARTITIONS <varname>N</varname></codeph>
|
||||
clause of <codeph>CREATE TABLE</codeph>.
|
||||
</p>
|
||||
|
||||
<codeblock rev="kudu IMPALA-4403"><![CDATA[
|
||||
show partitions million_rows_hash;
|
||||
|
||||
+--------+-----------+----------+-----------------------+--
|
||||
| # Rows | Start Key | Stop Key | Leader Replica | # Replicas
|
||||
+--------+-----------+----------+-----------------------+--
|
||||
| -1 | | 00000001 | n236.example.com:7050 | 3
|
||||
| -1 | 00000001 | 00000002 | n236.example.com:7050 | 3
|
||||
| -1 | 00000002 | 00000003 | n336.example.com:7050 | 3
|
||||
| -1 | 00000003 | 00000004 | n238.example.com:7050 | 3
|
||||
| -1 | 00000004 | 00000005 | n338.example.com:7050 | 3
|
||||
....
|
||||
| -1 | 0000002E | 0000002F | n240.example.com:7050 | 3
|
||||
| -1 | 0000002F | 00000030 | n336.example.com:7050 | 3
|
||||
| -1 | 00000030 | 00000031 | n240.example.com:7050 | 3
|
||||
| -1 | 00000031 | | n334.example.com:7050 | 3
|
||||
+--------+-----------+----------+-----------------------+--
|
||||
Fetched 50 row(s) in 0.05s
|
||||
]]>
|
||||
</codeblock>
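<p rev="kudu IMPALA-4403">
For context, a table producing 50 rows of output like the listing above could be defined roughly as follows.
This is a sketch only: the column definitions are assumptions, because the original
<codeph>CREATE TABLE</codeph> statement for <codeph>million_rows_hash</codeph> is not shown here.
</p>

<codeblock rev="kudu IMPALA-4403"><![CDATA[
-- Hypothetical DDL; the columns are assumed for illustration.
-- PARTITIONS 50 matches the 50 rows returned by SHOW PARTITIONS above.
create table million_rows_hash (id string primary key, s string)
  partition by hash (id) partitions 50
  stored as kudu;

show partitions million_rows_hash;
]]>
</codeblock>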
|
||||
|
||||
<p rev="kudu IMPALA-4403">
|
||||
The following example shows the output for a Kudu table
|
||||
using the range partitioning mechanism:
|
||||
</p>
|
||||
|
||||
<codeblock rev="kudu IMPALA-4403"><![CDATA[
|
||||
show range partitions million_rows_range;
|
||||
+-----------------------+
|
||||
| RANGE (id) |
|
||||
+-----------------------+
|
||||
| VALUES < "A" |
|
||||
| "A" <= VALUES < "[" |
|
||||
| "a" <= VALUES < "{" |
|
||||
| "{" <= VALUES < "~\0" |
|
||||
+-----------------------+
|
||||
]]>
|
||||
</codeblock>
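<p rev="kudu IMPALA-4403">
The range boundaries in the preceding output are consistent with a definition such as the following.
This is a hypothetical reconstruction: the column list and the exact comparison operators are assumptions.
In particular, the final <codeph>"~\0"</codeph> bound is what one would expect if the last range had been
declared with an inclusive upper bound of <codeph>'~'</codeph>.
</p>

<codeblock rev="kudu IMPALA-4403"><![CDATA[
-- Hypothetical DDL, inferred from the SHOW RANGE PARTITIONS output above.
create table million_rows_range (id string primary key, s string)
  partition by range (id)
  (
    partition values < 'A',
    partition 'A' <= values < '[',
    partition 'a' <= values < '{',
    -- Inclusive string upper bound; expected to be displayed as < "~\0".
    partition '{' <= values <= '~'
  )
  stored as kudu;

show range partitions million_rows_range;
]]>
</codeblock>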
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/permissions_blurb"/>
|
||||
|
||||
@@ -112,6 +112,9 @@ type ::= <varname>primitive_type</varname> | <varname>complex_type</varname>
|
||||
<li/>
|
||||
</ul>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
||||
|
||||
<note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/>
|
||||
|
||||
@@ -73,14 +73,16 @@ under the License.
|
||||
</ul>
|
||||
|
||||
<p rev="2.2.0">
|
||||
Impala tables can also represent data that is stored in HBase, or in the Amazon S3 filesystem (CDH 5.4.0 or higher),
|
||||
or on Isilon storage devices (CDH 5.4.3 or higher). See <xref href="impala_hbase.xml#impala_hbase"/>,
|
||||
Impala tables can also represent data that is stored in HBase, or in the Amazon S3 filesystem (<keyword keyref="impala22_full"/> or higher),
|
||||
or on Isilon storage devices (<keyword keyref="impala223_full"/> or higher). See <xref href="impala_hbase.xml#impala_hbase"/>,
|
||||
<xref href="impala_s3.xml#s3"/>, and <xref href="impala_isilon.xml#impala_isilon"/>
|
||||
for details about those special kinds of tables.
|
||||
</p>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/ignore_file_extensions"/>
|
||||
|
||||
<p outputclass="toc inpage"/>
|
||||
|
||||
<p>
|
||||
<b>Related statements:</b> <xref href="impala_create_table.xml#create_table"/>,
|
||||
<xref href="impala_drop_table.xml#drop_table"/>, <xref href="impala_alter_table.xml#alter_table"/>
|
||||
@@ -241,6 +243,7 @@ under the License.
|
||||
|
||||
<concept id="table_file_formats">
|
||||
<title>File Formats</title>
|
||||
|
||||
<conbody>
|
||||
<p>
|
||||
Each table has an associated file format, which determines how Impala interprets the
|
||||
@@ -273,4 +276,142 @@ under the License.
|
||||
</conbody>
|
||||
</concept>
|
||||
|
||||
<concept rev="kudu" id="kudu_tables">
|
||||
<title>Kudu Tables</title>
|
||||
<prolog>
|
||||
<metadata>
|
||||
<data name="Category" value="Kudu"/>
|
||||
</metadata>
|
||||
</prolog>
|
||||
|
||||
<conbody>
|
||||
<p>
|
||||
Tables stored in Apache Kudu are treated specially, because Kudu manages its data independently of HDFS files.
|
||||
Some information about the table is stored in the metastore database for use by Impala. Other table metadata is
|
||||
managed internally by Kudu.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
When you create a Kudu table through Impala, it is assigned an internal Kudu table name of the form
|
||||
<codeph>impala::<varname>db_name</varname>.<varname>table_name</varname></codeph>. You can see the Kudu-assigned name
|
||||
in the output of <codeph>DESCRIBE FORMATTED</codeph>, in the <codeph>kudu.table_name</codeph> field of the table properties.
|
||||
The Kudu-assigned name remains the same even if you use <codeph>ALTER TABLE</codeph> to rename the Impala table
|
||||
or move it to a different Impala database. If you issue the statement
|
||||
<codeph>ALTER TABLE <varname>impala_name</varname> SET TBLPROPERTIES('kudu.table_name' = '<varname>different_kudu_table_name</varname>')</codeph>,
|
||||
the effect depends on whether the Impala table was created with a regular <codeph>CREATE TABLE</codeph>
|
||||
statement (making it an internal, or managed, table) or with a
|
||||
<codeph>CREATE EXTERNAL TABLE</codeph> statement (making it an external table). Changing the <codeph>kudu.table_name</codeph>
|
||||
property of an internal table physically renames the underlying Kudu table to match the new name.
|
||||
Changing the <codeph>kudu.table_name</codeph> property of an external table switches which underlying Kudu table
|
||||
the Impala table refers to; the underlying Kudu table must already exist.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The following example shows what happens with both internal and external Kudu tables as the <codeph>kudu.table_name</codeph>
|
||||
property is changed. In practice, external tables are typically used to access underlying Kudu tables that were created
|
||||
outside of Impala, that is, through the Kudu API.
|
||||
</p>
|
||||
|
||||
<codeblock>
|
||||
-- This is an internal table that we will create and then rename.
|
||||
create table old_name (id bigint primary key, s string)
|
||||
partition by hash(id) partitions 2 stored as kudu;
|
||||
|
||||
-- Initially, the Kudu-side name is derived from the Impala name: impala::user.old_name.
|
||||
describe formatted old_name;
|
||||
...
|
||||
| Location: | hdfs://host.example.com:8020/path/user.db/old_name
|
||||
| Table Type: | MANAGED_TABLE | NULL
|
||||
| Table Parameters: | NULL | NULL
|
||||
| | DO_NOT_UPDATE_STATS | true
|
||||
| | kudu.master_addresses | vd0342.halxg.cloudera.com
|
||||
| | kudu.table_name | impala::user.old_name
|
||||
|
||||
-- ALTER TABLE RENAME TO changes the Impala name but not the underlying Kudu name.
|
||||
alter table old_name rename to new_name;
|
||||
|
||||
describe formatted new_name;
|
||||
| Location: | hdfs://host.example.com:8020/path/user.db/new_name
|
||||
| Table Type: | MANAGED_TABLE | NULL
|
||||
| Table Parameters: | NULL | NULL
|
||||
| | DO_NOT_UPDATE_STATS | true
|
||||
| | kudu.master_addresses | vd0342.halxg.cloudera.com
|
||||
| | kudu.table_name | impala::user.old_name
|
||||
|
||||
-- Setting TBLPROPERTIES changes the underlying Kudu name.
|
||||
alter table new_name
|
||||
set tblproperties('kudu.table_name' = 'impala::user.new_name');
|
||||
|
||||
describe formatted new_name;
|
||||
| Location: | hdfs://host.example.com:8020/path/user.db/new_name
|
||||
| Table Type: | MANAGED_TABLE | NULL
|
||||
| Table Parameters: | NULL | NULL
|
||||
| | DO_NOT_UPDATE_STATS | true
|
||||
| | kudu.master_addresses | vd0342.halxg.cloudera.com
|
||||
| | kudu.table_name | impala::user.new_name
|
||||
|
||||
-- Put some data in the table to demonstrate how external tables can map to
|
||||
-- different underlying Kudu tables.
|
||||
insert into new_name values (0, 'zero'), (1, 'one'), (2, 'two');
|
||||
|
||||
-- This external table points to the same underlying Kudu table, NEW_NAME,
|
||||
-- as we created above. No need to declare columns or other table aspects.
|
||||
create external table kudu_table_alias stored as kudu
|
||||
tblproperties('kudu.table_name' = 'impala::user.new_name');
|
||||
|
||||
-- The external table can fetch data from the NEW_NAME table that already
|
||||
-- existed and already had data.
|
||||
select * from kudu_table_alias limit 100;
|
||||
+----+------+
|
||||
| id | s |
|
||||
+----+------+
|
||||
| 1 | one |
|
||||
| 0 | zero |
|
||||
| 2 | two |
|
||||
+----+------+
|
||||
|
||||
-- We cannot re-point the external table at a different underlying Kudu table
|
||||
-- unless that other underlying Kudu table already exists.
|
||||
alter table kudu_table_alias
|
||||
set tblproperties('kudu.table_name' = 'impala::user.yet_another_name');
|
||||
ERROR:
|
||||
TableLoadingException: Error opening Kudu table 'impala::user.yet_another_name',
|
||||
Kudu error: The table does not exist: table_name: "impala::user.yet_another_name"
|
||||
|
||||
-- Once the underlying Kudu table exists, we can re-point the external table to it.
|
||||
create table yet_another_name (id bigint primary key, x int, y int, s string)
|
||||
partition by hash(id) partitions 2 stored as kudu;
|
||||
|
||||
alter table kudu_table_alias
|
||||
set tblproperties('kudu.table_name' = 'impala::user.yet_another_name');
|
||||
|
||||
-- Now no data is returned because this other table is empty.
|
||||
select * from kudu_table_alias limit 100;
|
||||
|
||||
-- The Impala table automatically recognizes the table schema of the new table,
|
||||
-- for example the extra X and Y columns not present in the original table.
|
||||
describe kudu_table_alias;
|
||||
+------+--------+---------+-------------+----------+...
|
||||
| name | type | comment | primary_key | nullable |...
|
||||
+------+--------+---------+-------------+----------+...
|
||||
| id | bigint | | true | false |...
|
||||
| x | int | | false | true |...
|
||||
| y | int | | false | true |...
|
||||
| s | string | | false | true |...
|
||||
+------+--------+---------+-------------+----------+...
|
||||
</codeblock>
|
||||
|
||||
<p>
|
||||
The <codeph>SHOW TABLE STATS</codeph> output for a Kudu table shows Kudu-specific details about the layout of the table.
|
||||
Instead of information about the number and sizes of files, the information is divided by the Kudu tablets.
|
||||
For each tablet, the output includes the fields
|
||||
<codeph># Rows</codeph> (although this number is not currently computed), <codeph>Start Key</codeph>, <codeph>Stop Key</codeph>, <codeph>Leader Replica</codeph>, and <codeph># Replicas</codeph>.
|
||||
The output of <codeph>SHOW COLUMN STATS</codeph>, illustrating the distribution of values within each column, is the same for Kudu tables
|
||||
as for HDFS-backed tables.
|
||||
</p>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_internal_external_tables"/>
|
||||
</conbody>
|
||||
</concept>
|
||||
|
||||
</concept>
|
||||
|
||||
@@ -436,6 +436,9 @@ insert into dates_and_times values
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/avro_no_timestamp"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/related_info"/>
|
||||
|
||||
<ul>
|
||||
|
||||
@@ -102,6 +102,9 @@ under the License.
|
||||
permission for all the files and directories that make up the table.
|
||||
</p>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_no_truncate_table"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
||||
|
||||
<p>
|
||||
|
||||
@@ -128,6 +128,9 @@ prefer to use an integer data type with sufficient range (<codeph>INT</codeph>,
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/column_stats_variable"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
||||
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
|
||||
|
||||
<p conref="../shared/impala_common.xml#common/blobs_are_strings"/>
|
||||
|
||||