[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Added walkthrough of renaming internal/external tables.

Split out Kudu CREATE TABLE syntax from other file formats.

Correct info about CTAS for Kudu tables.

Add examples of basic Kudu, external Kudu, and Kudu CTAS.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Reviewed-on: http://gerrit.cloudera.org:8080/5649
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This commit is contained in:
John Russell
2017-01-09 14:17:23 -08:00
committed by Impala Public Jenkins
parent aee5457a55
commit 661921b205
29 changed files with 2590 additions and 352 deletions

View File

@@ -10285,6 +10285,7 @@ https://issues.cloudera.org/secure/IssueNavigator.jspa?reset=true&amp;jqlQuery=p
<keydef keys="impala25"><topicmeta><keywords><keyword>Impala 2.5</keyword></keywords></topicmeta></keydef>
<keydef keys="impala24"><topicmeta><keywords><keyword>Impala 2.4</keyword></keywords></topicmeta></keydef>
<keydef keys="impala23"><topicmeta><keywords><keyword>Impala 2.3</keyword></keywords></topicmeta></keydef>
<keydef keys="impala223"><topicmeta><keywords><keyword>Impala 2.2.3</keyword></keywords></topicmeta></keydef>
<keydef keys="impala22"><topicmeta><keywords><keyword>Impala 2.2</keyword></keywords></topicmeta></keydef>
<keydef keys="impala21"><topicmeta><keywords><keyword>Impala 2.1</keyword></keywords></topicmeta></keydef>
<keydef keys="impala20"><topicmeta><keywords><keyword>Impala 2.0</keyword></keywords></topicmeta></keydef>
@@ -10298,6 +10299,7 @@ https://issues.cloudera.org/secure/IssueNavigator.jspa?reset=true&amp;jqlQuery=p
<keydef keys="impala25_full"><topicmeta><keywords><keyword>Impala 2.5</keyword></keywords></topicmeta></keydef>
<keydef keys="impala24_full"><topicmeta><keywords><keyword>Impala 2.4</keyword></keywords></topicmeta></keydef>
<keydef keys="impala23_full"><topicmeta><keywords><keyword>Impala 2.3</keyword></keywords></topicmeta></keydef>
<keydef keys="impala223_full"><topicmeta><keywords><keyword>Impala 2.2.3</keyword></keywords></topicmeta></keydef>
<keydef keys="impala22_full"><topicmeta><keywords><keyword>Impala 2.2</keyword></keywords></topicmeta></keydef>
<keydef keys="impala21_full"><topicmeta><keywords><keyword>Impala 2.1</keyword></keywords></topicmeta></keydef>
<keydef keys="impala20_full"><topicmeta><keywords><keyword>Impala 2.0</keyword></keywords></topicmeta></keydef>

View File

@@ -3730,6 +3730,58 @@ sudo pip-python install ssl</codeblock>
NULL</codeph> attribute to that column.
</p>
<p id="kudu_metadata_intro" rev="kudu">
Much of the metadata for Kudu tables is handled by the underlying
storage layer. Kudu tables have less reliance on the metastore
database, and require less metadata caching on the Impala side.
For example, information about partitions in Kudu tables is managed
by Kudu, and Impala does not cache any block locality metadata
for Kudu tables.
</p>
<p id="kudu_metadata_details" rev="kudu">
The <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph>
statements are needed less frequently for Kudu tables than for
HDFS-backed tables. Neither statement is needed when data is
added to, removed from, or updated in a Kudu table, even if the changes
are made directly to Kudu through a client program using the Kudu API.
Run <codeph>REFRESH <varname>table_name</varname></codeph> or
<codeph>INVALIDATE METADATA <varname>table_name</varname></codeph>
for a Kudu table only after making a change to the Kudu table schema,
such as adding or dropping a column, by a mechanism other than
Impala.
</p>
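<p rev="kudu">
A minimal sketch of the one case where a metadata statement is still needed,
assuming a hypothetical Kudu table named <codeph>kudu_events</codeph> whose schema
was changed through the Kudu API rather than through Impala:
</p>
<codeblock rev="kudu">
-- kudu_events is a hypothetical table name used only for illustration.
-- Not needed after rows are inserted, updated, or deleted, even by a Kudu client.
-- Run one of the following only after a column is added or dropped outside Impala.
REFRESH kudu_events;
INVALIDATE METADATA kudu_events;
</codeblock>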
<p id="kudu_internal_external_tables">
The distinction between internal and external tables has some special
details for Kudu tables. Tables created entirely through Impala are
internal tables. The table name as represented within Kudu includes
notation such as an <codeph>impala::</codeph> prefix and the Impala
database name. External Kudu tables are those created by a non-Impala
mechanism, such as a user application calling the Kudu APIs. For
these tables, the <codeph>CREATE EXTERNAL TABLE</codeph> syntax lets
you establish a mapping from Impala to the existing Kudu table:
<codeblock>
CREATE EXTERNAL TABLE impala_name STORED AS KUDU
TBLPROPERTIES('kudu.table_name' = 'original_kudu_name');
</codeblock>
External Kudu tables differ in one important way from other external
tables: adding or dropping a column or range partition changes the
data in the underlying Kudu table, in contrast to an HDFS-backed
external table where existing data files are left untouched.
</p>
<p id="kudu_sentry_limitations" rev="IMPALA-4000">
Access to Kudu tables must be granted to and revoked from roles as usual.
Only users with <codeph>ALL</codeph> privileges on <codeph>SERVER</codeph> can create external Kudu tables.
Currently, access to a Kudu table is <q>all or nothing</q>:
enforced at the table level rather than the column level, and applying to all
SQL operations rather than individual statements such as <codeph>INSERT</codeph>.
Because non-SQL APIs can access Kudu data without going through Sentry
authorization, currently the Sentry support is considered preliminary
and subject to change.
</p>
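<p rev="IMPALA-4000">
As a rough illustration of table-level access, the following sketch grants and then
revokes the <codeph>ALL</codeph> privilege on a Kudu table. The database, table, and
role names are hypothetical.
</p>
<codeblock rev="IMPALA-4000">
-- analytics_db.kudu_events and kudu_rw_role are hypothetical names.
-- Access is granted for the whole table; there is no column-level
-- or per-statement granularity for Kudu tables.
GRANT ALL ON TABLE analytics_db.kudu_events TO ROLE kudu_rw_role;
REVOKE ALL ON TABLE analytics_db.kudu_events FROM ROLE kudu_rw_role;
</codeblock>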
</section>
</conbody>

View File

@@ -34,6 +34,7 @@ under the License.
<data name="Category" value="S3"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
<data name="Category" value="Kudu"/>
</metadata>
</prolog>
@@ -63,9 +64,11 @@ ALTER TABLE <varname>name</varname> REPLACE COLUMNS (<varname>col_spec</varname>
ALTER TABLE <varname>name</varname> ADD [IF NOT EXISTS] PARTITION (<varname>partition_spec</varname>)
<ph rev="IMPALA-4390">[<varname>location_spec</varname>]</ph>
<ph rev="IMPALA-4390">[<varname>cache_spec</varname>]</ph>
<ph rev="kudu">ALTER TABLE <varname>name</varname> ADD [IF NOT EXISTS] RANGE PARTITION <varname>kudu_partition_spec</varname></ph>
ALTER TABLE <varname>name</varname> DROP [IF EXISTS] PARTITION (<varname>partition_spec</varname>)
<ph rev="2.3.0">[PURGE]</ph>
<ph rev="kudu">ALTER TABLE <varname>name</varname> DROP [IF EXISTS] RANGE PARTITION <varname>kudu_partition_spec</varname></ph>
<ph rev="2.3.0 IMPALA-1568 CDH-36799">ALTER TABLE <varname>name</varname> RECOVER PARTITIONS</ph>
@@ -86,12 +89,18 @@ statsKey ::= numDVs | numNulls | avgSize | maxSize</ph>
<varname>col_spec</varname> ::= <varname>col_name</varname> <varname>type_name</varname>
<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph> | <ph rev="kudu"><varname>kudu_partition_spec</varname></ph>
<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph>
<varname>simple_partition_spec</varname> ::= <varname>partition_col</varname>=<varname>constant_value</varname>
<ph rev="IMPALA-1654"><varname>complex_partition_spec</varname> ::= <varname>comparison_expression_on_partition_col</varname></ph>
<ph rev="kudu"><varname>kudu_partition_spec</varname> ::= <varname>constant</varname> <varname>range_operator</varname> VALUES <varname>range_operator</varname> <varname>constant</varname> | VALUE = <varname>constant</varname></ph>
<ph rev="IMPALA-4390">cache_spec ::= CACHED IN '<varname>pool_name</varname>' [WITH REPLICATION = <varname>integer</varname>] | UNCACHED</ph>
<ph rev="IMPALA-4390">location_spec ::= LOCATION '<varname>hdfs_path_of_directory</varname>'</ph>
<varname>table_properties</varname> ::= '<varname>name</varname>'='<varname>value</varname>'[, '<varname>name</varname>'='<varname>value</varname>' ...]
<varname>serde_properties</varname> ::= '<varname>name</varname>'='<varname>value</varname>'[, '<varname>name</varname>'='<varname>value</varname>' ...]
@@ -896,6 +905,75 @@ alter table sales_data add partition (zipcode = cast(9021 * 10 as string));</cod
require write and execute permissions for the associated partition directory.
</p>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p rev="kudu IMPALA-2890">
Because of the extra constraints and features of Kudu tables, such as the <codeph>NOT NULL</codeph>
and <codeph>DEFAULT</codeph> attributes for columns, <codeph>ALTER TABLE</codeph> has specific
requirements related to Kudu tables:
<ul>
<li>
<p>
In an <codeph>ADD COLUMNS</codeph> operation, you can specify the <codeph>NULL</codeph>,
<codeph>NOT NULL</codeph>, and <codeph>DEFAULT <varname>default_value</varname></codeph>
column attributes.
</p>
</li>
<li>
<p>
If you add a column with a <codeph>NOT NULL</codeph> attribute, it must also have a
<codeph>DEFAULT</codeph> attribute, so the default value can be assigned to that
column for all existing rows.
</p>
</li>
<li>
<p>
The <codeph>DROP COLUMN</codeph> clause works the same for a Kudu table as for other
kinds of tables.
</p>
</li>
<li>
<p>
Although you can change the name of a column with the <codeph>CHANGE</codeph> clause,
you cannot change the type of a column in a Kudu table.
</p>
</li>
<li>
<p>
You cannot assign the <codeph>ENCODING</codeph>, <codeph>COMPRESSION</codeph>,
or <codeph>BLOCK_SIZE</codeph> attributes when adding a column.
</p>
</li>
<li>
<p>
You cannot change the default value, nullability, encoding, compression, or block size
of existing columns in a Kudu table.
</p>
</li>
<li>
<p>
You cannot use the <codeph>REPLACE COLUMNS</codeph> clause with a Kudu table.
</p>
</li>
<li>
<p>
The <codeph>RENAME TO</codeph> clause for a Kudu table only affects the name stored in the
metastore database that Impala uses to refer to the table. To change which underlying Kudu
table is associated with an Impala table name, you must change the <codeph>TBLPROPERTIES</codeph>
property of the table: <codeph>SET TBLPROPERTIES('kudu.table_name'='<varname>kudu_tbl_name</varname>')</codeph>.
Doing so causes Kudu to change the name of the underlying Kudu table.
</p>
</li>
</ul>
</p>
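<p rev="kudu IMPALA-2890">
The requirements in the preceding list can be sketched as follows. The table name
<codeph>kudu_events</codeph> and the column names are hypothetical, chosen only to
illustrate each rule.
</p>
<codeblock rev="kudu IMPALA-2890">
-- A NOT NULL column needs a DEFAULT so that existing rows receive a value.
ALTER TABLE kudu_events ADD COLUMNS (status STRING NOT NULL DEFAULT 'unknown');

-- A nullable column can be added without a default.
ALTER TABLE kudu_events ADD COLUMNS (note STRING NULL);

-- CHANGE can rename a column, but the type must stay the same.
ALTER TABLE kudu_events CHANGE note remark STRING;

-- DROP COLUMN works as it does for other kinds of tables.
ALTER TABLE kudu_events DROP COLUMN remark;

-- RENAME TO changes the name that Impala uses for the table.
ALTER TABLE kudu_events RENAME TO kudu_events_v2;
</codeblock>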
<p rev="kudu">
All Kudu tables use an underlying partitioning mechanism. The partition syntax differs from the syntax for non-Kudu
tables. You can use the <codeph>ALTER TABLE</codeph> statement to add and drop <term>range partitions</term>
from a Kudu table. Any new range must not overlap with any existing ranges. Dropping a range removes all the associated
rows from the table. See <xref href="impala_kudu.xml#kudu_partitioning"/> for details.
</p>
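<p rev="kudu">
A brief sketch of adding and dropping range partitions, assuming a hypothetical table
<codeph>kudu_sales</codeph> that is range-partitioned on an integer year column.
The range specifications are written without enclosing parentheses.
</p>
<codeblock rev="kudu">
-- kudu_sales is a hypothetical table, range-partitioned on an INT column.
-- The new range must not overlap any existing range.
ALTER TABLE kudu_sales ADD RANGE PARTITION 2017 &lt;= VALUES &lt; 2018;

-- A single-value range uses the VALUE = constant form.
ALTER TABLE kudu_sales ADD IF NOT EXISTS RANGE PARTITION VALUE = 2018;

-- Dropping a range also removes every row stored in that range.
ALTER TABLE kudu_sales DROP IF EXISTS RANGE PARTITION 2015 &lt;= VALUES &lt; 2016;
</codeblock>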
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>

View File

@@ -115,6 +115,9 @@ type ::= <varname>primitive_type</varname> | <varname>complex_type</varname>
<li/>
</ul>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/>

View File

@@ -161,6 +161,9 @@ SELECT claim FROM assertions WHERE really = TRUE;
<!-- <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> -->
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_non_pk_data_type"/>
<!-- <p conref="../shared/impala_common.xml#common/related_info"/> -->
<p>

View File

@@ -243,6 +243,9 @@ select concat('[',a,']') as a, concat('[',b,']') as b, concat('[',c,']') as c fr
+------------------------+----------------------------------+--------------------------------------------+
</codeblock>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
<p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
<p>

View File

@@ -52,8 +52,7 @@ under the License.
<codeblock rev="2.1.0">COMPUTE STATS [<varname>db_name</varname>.]<varname>table_name</varname>
COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varname> [PARTITION (<varname>partition_spec</varname>)]
<!-- Is kudu_partition_spec applicable here? -->
<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph> | <ph rev="kudu"><varname>kudu_partition_spec</varname></ph>
<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph>
<varname>simple_partition_spec</varname> ::= <varname>partition_col</varname>=<varname>constant_value</varname>
@@ -523,6 +522,17 @@ show table stats item_partitioned;
against the table.)
</p>
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p rev="IMPALA-2830">
The <codeph>COMPUTE STATS</codeph> statement applies to Kudu tables.
Impala does not compute the number of rows for each partition for
Kudu tables. Therefore, you do not need to re-run the operation when
you see -1 in the <codeph># Rows</codeph> column of the output from
<codeph>SHOW TABLE STATS</codeph>. That column always shows -1 for
all Kudu tables.
</p>
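<p rev="IMPALA-2830">
For instance, assuming a hypothetical Kudu table named <codeph>kudu_events</codeph>,
the following sequence is enough. The -1 values afterward are expected and do not
call for another <codeph>COMPUTE STATS</codeph>.
</p>
<codeblock rev="IMPALA-2830">
-- kudu_events is a hypothetical table name.
COMPUTE STATS kudu_events;

-- The # Rows column shows -1 for each row of this output, as described above.
SHOW TABLE STATS kudu_events;
</codeblock>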
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>

File diff suppressed because it is too large

View File

@@ -822,6 +822,9 @@ SELECT CAST(1000.5 AS DECIMAL);
<p conref="../shared/impala_common.xml#common/column_stats_constant"/>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>

View File

@@ -697,6 +697,91 @@ Returned 27 row(s) in 0.17s</codeblock>
in an arbitrary HDFS directory based on its <codeph>LOCATION</codeph> attribute.)
</p>
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p rev="kudu">
The information displayed for Kudu tables includes additional attributes
that apply only to Kudu tables:
</p>
<ul rev="kudu">
<li>
Whether or not the column is part of the primary key. Every Kudu table
has a <codeph>true</codeph> value here for at least one column. There
could be multiple <codeph>true</codeph> values, for tables with
composite primary keys.
</li>
<li>
Whether or not the column is nullable. Specified by the <codeph>NULL</codeph>
or <codeph>NOT NULL</codeph> attributes on the <codeph>CREATE TABLE</codeph> statement.
Columns that are part of the primary key are automatically non-nullable.
</li>
<li>
The default value, if any, for the column. Specified by the <codeph>DEFAULT</codeph>
attribute on the <codeph>CREATE TABLE</codeph> statement. If the default value is
<codeph>NULL</codeph>, that is not indicated in this column. It is implied by
<codeph>nullable</codeph> being true and no other default value specified.
</li>
<li>
The encoding used for values in the column. Specified by the <codeph>ENCODING</codeph>
attribute on the <codeph>CREATE TABLE</codeph> statement.
</li>
<li>
The compression used for values in the column. Specified by the <codeph>COMPRESSION</codeph>
attribute on the <codeph>CREATE TABLE</codeph> statement.
</li>
<li>
The block size (in bytes) used for the underlying Kudu storage layer for the column.
Specified by the <codeph>BLOCK_SIZE</codeph> attribute on the <codeph>CREATE TABLE</codeph>
statement.
</li>
</ul>
<p rev="kudu">
The following example shows <codeph>DESCRIBE</codeph> output for a simple Kudu table, with
a single-column primary key and all column attributes left with their default values:
</p>
<codeblock rev="kudu">
describe million_rows;
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
| name | type | comment | primary_key | nullable | default_value | encoding | compression | block_size |
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
| id | string | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
| s | string | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
</codeblock>
<p rev="kudu">
The following example shows <codeph>DESCRIBE</codeph> output for a Kudu table with a
two-column primary key, and Kudu-specific attributes applied to some columns:
</p>
<codeblock rev="kudu">
create table kudu_describe_example
(
c1 int, c2 int,
c3 string, c4 string not null, c5 string default 'n/a', c6 string default '',
c7 bigint not null, c8 bigint null default null, c9 bigint default -1 encoding bit_shuffle,
primary key(c1,c2)
)
partition by hash (c1, c2) partitions 10 stored as kudu;
describe kudu_describe_example;
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
| name | type | comment | primary_key | nullable | default_value | encoding | compression | block_size |
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
| c1 | int | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
| c2 | int | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
| c3 | string | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
| c4 | string | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
| c5 | string | | false | true | n/a | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
| c6 | string | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
| c7 | bigint | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
| c8 | bigint | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
| c9 | bigint | | false | true | -1 | BIT_SHUFFLE | DEFAULT_COMPRESSION | 0 |
+------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
</codeblock>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>

View File

@@ -108,6 +108,9 @@ SELECT CAST(1000.5 AS DOUBLE);
<p conref="../shared/impala_common.xml#common/float_double_decimal_caveat"/>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_non_pk_data_type"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>

View File

@@ -155,6 +155,15 @@ drop table temporary.trivial;</codeblock>
no particular permissions are needed for the associated HDFS files or directories.
</p>
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p rev="kudu">
Kudu tables can be managed or external, the same as with HDFS-based
tables. For a managed table, the underlying Kudu table and its data
are removed by <codeph>DROP TABLE</codeph>. For an external table,
the underlying Kudu table and its data remain after a
<codeph>DROP TABLE</codeph>.
</p>
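<p rev="kudu">
The difference can be sketched with two hypothetical tables. All table names here,
including the underlying Kudu table name, are assumptions for illustration.
</p>
<codeblock rev="kudu">
-- Managed (internal) table: created entirely through Impala.
CREATE TABLE kudu_managed (id BIGINT PRIMARY KEY, s STRING)
  PARTITION BY HASH (id) PARTITIONS 2
  STORED AS KUDU;
-- Removes the underlying Kudu table and its data.
DROP TABLE kudu_managed;

-- External table: maps an Impala name onto a Kudu table that already exists.
CREATE EXTERNAL TABLE kudu_mapped STORED AS KUDU
  TBLPROPERTIES('kudu.table_name' = 'existing_kudu_table');
-- Removes only the Impala mapping; the Kudu table and its data remain.
DROP TABLE kudu_mapped;
</codeblock>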
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>

View File

@@ -234,6 +234,42 @@ EXPLAIN_LEVEL set to extended
if the source table is partitioned.)
</p>
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p>
The <codeph>EXPLAIN</codeph> statement displays the same kind of plan
information for queries against Kudu tables as it does for queries
against HDFS-based tables.
</p>
<p>
To see which predicates Impala can <q>push down</q> to Kudu for
efficient evaluation, without transmitting unnecessary rows back
to Impala, look for the <codeph>kudu predicates</codeph> item in
the scan phase of the query. The label <codeph>kudu predicates</codeph>
indicates a condition that can be evaluated efficiently on the Kudu
side. The label <codeph>predicates</codeph> in a <codeph>SCAN KUDU</codeph>
node indicates a condition that is evaluated by Impala.
For example, in a table with primary key column <codeph>X</codeph>
and non-primary key column <codeph>Y</codeph>, you can see that
some operators in the <codeph>WHERE</codeph> clause are evaluated
immediately by Kudu and others are evaluated later by Impala:
<codeblock>
EXPLAIN SELECT x,y from kudu_table WHERE
x = 1 AND x NOT IN (2,3) AND y = 1
AND x IS NOT NULL AND x > 0;
+----------------
| Explain String
+----------------
...
| 00:SCAN KUDU [jrussell.hash_only]
| predicates: x IS NOT NULL, x NOT IN (2, 3)
| kudu predicates: x = 1, x > 0, y = 1
</codeblock>
Only binary predicates and <codeph>IN</codeph> predicates containing
literal values that exactly match the types in the Kudu table, and do not
require any casting, can be pushed to Kudu.
</p>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
<xref href="impala_select.xml#select"/>,

View File

@@ -102,6 +102,9 @@ SELECT CAST(1000.5 AS FLOAT);
<p conref="../shared/impala_common.xml#common/float_double_decimal_caveat"/>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_non_pk_data_type"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>

View File

@@ -129,6 +129,9 @@ object_type ::= TABLE | DATABASE | SERVER | URI
<p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_sentry_limitations"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>

View File

@@ -241,6 +241,11 @@ ERROR: AnalysisException: Database does not exist: new_db_from_hive
<p conref="../shared/impala_common.xml#common/s3_metadata"/>
<p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
<p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
<xref href="impala_hadoop.xml#intro_metastore"/>,

File diff suppressed because it is too large

View File

@@ -397,6 +397,24 @@ insert into t1 partition(x=NULL, y) select c1, c3 from some_other_table;</codeb
<codeph>nullifzero()</codeph>, and <codeph>zeroifnull()</codeph>. See
<xref href="impala_conditional_functions.xml#conditional_functions"/> for details.
</p>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p rev="kudu">
Columns in Kudu tables have an attribute that specifies whether or not they can contain
<codeph>NULL</codeph> values. A column with a <codeph>NULL</codeph> attribute can contain
nulls. A column with a <codeph>NOT NULL</codeph> attribute cannot contain any nulls, and
an <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or <codeph>UPSERT</codeph> statement
will skip any row that attempts to store a null in a column designated as <codeph>NOT NULL</codeph>.
Kudu tables default to the <codeph>NULL</codeph> setting for each column, except columns that
are part of the primary key.
</p>
<p rev="kudu">
In addition to columns with the <codeph>NOT NULL</codeph> attribute, Kudu tables also have
restrictions on <codeph>NULL</codeph> values in columns that are part of the primary key for
a table. No column that is part of the primary key in a Kudu table can contain any
<codeph>NULL</codeph> values.
</p>
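<p rev="kudu">
A minimal sketch of these nullability rules, using a hypothetical table
<codeph>kudu_null_demo</codeph>. The comments restate the behavior described above
rather than captured output.
</p>
<codeblock rev="kudu">
-- kudu_null_demo and its columns are hypothetical names.
CREATE TABLE kudu_null_demo
(
  id BIGINT PRIMARY KEY,         -- primary key column: can never hold NULL
  required_col STRING NOT NULL,  -- explicitly non-nullable
  optional_col STRING NULL,      -- explicitly nullable
  other_col STRING               -- no attribute: defaults to nullable
)
PARTITION BY HASH (id) PARTITIONS 2
STORED AS KUDU;

-- Accepted: nulls appear only in nullable columns.
INSERT INTO kudu_null_demo VALUES (1, 'a', NULL, NULL);

-- This row is skipped because it puts a null in a NOT NULL column.
INSERT INTO kudu_null_demo VALUES (2, NULL, 'b', 'c');
</codeblock>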
</conbody>
</concept>
</concept>

View File

@@ -85,6 +85,9 @@ type ::= <varname>primitive_type</varname> | <varname>complex_type</varname>
<li/>
</ul>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/>

View File

@@ -575,7 +575,7 @@ SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
</concept>
<concept rev="kudu" id="partition_kudu" audience="hidden">
<concept rev="kudu 2.8.0" id="partition_kudu">
<title>Using Partitioning with Kudu Tables</title>
@@ -593,6 +593,12 @@ SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
columns.
</p>
<p>
See <xref href="impala_kudu.xml#kudu_partitioning"/> for
details and examples of the partitioning techniques
for Kudu tables.
</p>
</conbody>
</concept>

View File

@@ -333,6 +333,11 @@ ERROR: AnalysisException: Items in partition spec must exactly match the partiti
<p conref="../shared/impala_common.xml#common/s3_metadata"/>
<p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
<p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
<xref href="impala_hadoop.xml#intro_metastore"/>,

View File

@@ -82,7 +82,9 @@ avro
between
bigint
<ph rev="1.4.0">binary</ph>
<ph rev="kudu">blocksize</ph>
boolean
<!-- <ph rev="kudu">buckets</ph> -->
by
<ph rev="1.4.0">cached</ph>
<ph rev="2.3.0">cascade</ph>
@@ -95,6 +97,7 @@ change
column
columns
comment
<ph rev="kudu">compression</ph>
compute
create
cross
@@ -105,15 +108,18 @@ databases
date
datetime
decimal
<ph rev="2.6.0">delete</ph>
<ph rev="kudu">default</ph>
<ph rev="kudu">delete</ph>
delimited
desc
describe
distinct
<!-- <ph rev="kudu">distribute</ph> -->
div
double
drop
else
<ph rev="kudu">encoding</ph>
end
escaped
exists
@@ -136,10 +142,10 @@ function
functions
<ph rev="2.1.0">grant</ph>
group
<ph rev="2.6.0">hash</ph>
<ph rev="kudu">hash</ph>
having
if
<ph rev="2.6.0">ignore</ph>
<!-- <ph rev="kudu">ignore</ph> -->
<ph rev="2.5.0">ilike</ph>
in
<ph rev="2.1.0">incremental</ph>
@@ -210,6 +216,7 @@ serdeproperties
set
show
smallint
<!-- <ph rev="kudu">split</ph> -->
stats
stored
straight_join
@@ -229,8 +236,9 @@ true
<ph rev="2.0.0">unbounded</ph>
<ph rev="1.4.0">uncached</ph>
union
<ph rev="2.6.0">update</ph>
<ph rev="kudu">update</ph>
<ph rev="1.2.1">update_fn</ph>
<ph rev="kudu">upsert</ph>
use
using
values

View File

@@ -108,6 +108,9 @@ object_type ::= TABLE | DATABASE | SERVER | URI
<p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
<p rev="2.8.0" conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_sentry_limitations"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>

View File

@@ -28,6 +28,7 @@ under the License.
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
<data name="Category" value="Reports"/>
<data name="Category" value="Kudu"/>
</metadata>
</prolog>
@@ -49,7 +50,8 @@ SHOW TABLES [IN <varname>database_name</varname>] [[LIKE] '<varname>pattern</var
<ph rev="1.2.1">SHOW TABLE STATS [<varname>database_name</varname>.]<varname>table_name</varname></ph>
<ph rev="1.2.1">SHOW COLUMN STATS [<varname>database_name</varname>.]<varname>table_name</varname></ph>
<ph rev="1.4.0">SHOW PARTITIONS [<varname>database_name</varname>.]<varname>table_name</varname></ph>
SHOW FILES IN [<varname>database_name</varname>.]<varname>table_name</varname> <ph rev="IMPALA-1654">[PARTITION (<varname>key_col_expression</varname> [, <varname>key_col_expression</varname>]</ph>]
<ph rev="1.4.0">SHOW <ph rev="kudu">[RANGE]</ph> PARTITIONS [<varname>database_name</varname>.]<varname>table_name</varname></ph>
SHOW FILES IN [<varname>database_name</varname>.]<varname>table_name</varname> <ph rev="IMPALA-1654">[PARTITION (<varname>key_col_expression</varname> [, <varname>key_col_expression</varname>]</ph>]
<ph rev="2.0.0">SHOW ROLES
SHOW CURRENT ROLES
@@ -129,7 +131,8 @@ show files in sample_table partition (month like 'J%');
<note>
This statement applies to tables and partitions stored on HDFS, or in the Amazon Simple Storage Service (S3).
It does not apply to views.
It does not apply to tables mapped onto HBase, because HBase does not use the same file-based storage layout.
It does not apply to tables mapped onto HBase <ph rev="kudu">or Kudu</ph>,
because those data management systems do not use the same file-based storage layout.
</note>
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
@@ -742,6 +745,61 @@ show tables like '*dim*|t*';
<p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
<p rev="kudu">
For Kudu tables:
</p>
<ul rev="kudu">
<li>
<p>
The column specifications include attributes such as <codeph>NULL</codeph>,
<codeph>NOT NULL</codeph>, <codeph>ENCODING</codeph>, and <codeph>COMPRESSION</codeph>.
If you do not specify those attributes in the original <codeph>CREATE TABLE</codeph> statement,
the <codeph>SHOW CREATE TABLE</codeph> output displays the defaults that were used.
</p>
</li>
<li>
<p>
The specifications of any <codeph>RANGE</codeph> clauses are not displayed in full.
To see the definition of the range clauses for a Kudu table, use the <codeph>SHOW RANGE PARTITIONS</codeph> statement.
</p>
</li>
<li>
<p>
The <codeph>TBLPROPERTIES</codeph> output reflects the Kudu master address
and the internal Kudu name associated with the Impala table.
</p>
</li>
</ul>
<codeblock rev="kudu">
show CREATE TABLE numeric_grades_default_letter;
+------------------------------------------------------------------------------------------------+
| result |
+------------------------------------------------------------------------------------------------+
| CREATE TABLE user.numeric_grades_default_letter ( |
| score TINYINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
| letter_grade STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION DEFAULT '-', |
| student STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
| PRIMARY KEY (score) |
| ) |
| PARTITION BY <b>RANGE (score) (...)</b> |
| STORED AS KUDU |
| TBLPROPERTIES ('kudu.master_addresses'='vd0342.example.com:7051', |
| 'kudu.table_name'='impala::USER.numeric_grades_default_letter') |
+------------------------------------------------------------------------------------------------+
show range partitions numeric_grades_default_letter;
+--------------------+
| RANGE (score) |
+--------------------+
| 0 &lt;= VALUES &lt; 50 |
| 50 &lt;= VALUES &lt; 65 |
| 65 &lt;= VALUES &lt; 80 |
| 80 &lt;= VALUES &lt; 100 |
+--------------------+
</codeblock>
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<p>
@@ -855,6 +913,39 @@ show create table show_create_table_demo;
<p conref="../shared/impala_common.xml#common/show_security"/>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p rev="kudu IMPALA-2830">
Because Kudu tables do not have characteristics derived from HDFS, such
as number of files, file format, and HDFS cache status, the output of
<codeph>SHOW TABLE STATS</codeph> reflects different characteristics
that apply to Kudu tables. If the Kudu table is created with the
clause <codeph>PARTITIONS 20</codeph>, then the result set of
<codeph>SHOW TABLE STATS</codeph> consists of 20 rows, each representing
one of the numbered partitions. For example:
</p>
<codeblock rev="kudu IMPALA-2830">
show table stats kudu_table;
+--------+-----------+----------+-----------------------+------------+
| # Rows | Start Key | Stop Key | Leader Replica | # Replicas |
+--------+-----------+----------+-----------------------+------------+
| -1 | | 00000001 | host.example.com:7050 | 3 |
| -1 | 00000001 | 00000002 | host.example.com:7050 | 3 |
| -1 | 00000002 | 00000003 | host.example.com:7050 | 3 |
| -1 | 00000003 | 00000004 | host.example.com:7050 | 3 |
| -1 | 00000004 | 00000005 | host.example.com:7050 | 3 |
...
</codeblock>
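<p rev="kudu">
As a minimal sketch of the <codeph>PARTITIONS 20</codeph> case described above, the following
statements create a hash-partitioned Kudu table and then display its statistics. The table name
<codeph>kudu_table_20</codeph> and its columns are hypothetical, chosen only for illustration:
</p>
<codeblock rev="kudu">
-- Hypothetical table; only the PARTITIONS 20 clause matters here.
create table kudu_table_20 (id bigint primary key, s string)
  partition by hash(id) partitions 20 stored as kudu;

-- Expected to return one row per partition, 20 rows in total.
show table stats kudu_table_20;
</codeblock>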
<p rev="IMPALA-2830">
Impala does not compute per-partition row counts for Kudu tables.
Therefore, you do not need to re-run <codeph>COMPUTE STATS</codeph>
when you see -1 in the <codeph># Rows</codeph> column of the output from
<codeph>SHOW TABLE STATS</codeph>. That column always shows -1 for
all Kudu tables.
</p>
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<p>
@@ -959,6 +1050,14 @@ show table stats store_sales;
<p conref="../shared/impala_common.xml#common/show_security"/>
<p rev="kudu IMPALA-2830">
<codeph>SHOW COLUMN STATS</codeph> also works for Kudu tables,
and its output includes the relevant column information.
Column statistics that originate in the underlying Kudu storage
layer are also represented in the metastore database that
Impala uses.
</p>
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<p>
@@ -1145,8 +1244,31 @@ show column stats store_sales;
<p conref="../shared/impala_common.xml#common/show_security"/>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p rev="kudu IMPALA-4403">
The optional <codeph>RANGE</codeph> clause applies only to Kudu tables. When you include it, the statement displays only
the partitions defined by the <codeph>RANGE</codeph> clause of <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph>.
</p>
<p rev="kudu IMPALA-4403">
Although you can specify <codeph>&lt;</codeph> or
<codeph>&lt;=</codeph> comparison operators when defining
range partitions for Kudu tables, Kudu rewrites them if necessary
to represent each range as
<codeph><varname>low_bound</varname> &lt;= VALUES &lt; <varname>high_bound</varname></codeph>.
This rewriting might involve incrementing one of the boundary values
or appending a <codeph>\0</codeph> for string values, so that the
partition covers the same range as originally specified.
</p>
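<p rev="kudu">
The rewriting described above can be sketched with a hypothetical table. The table name, column names,
and boundary values are assumptions made for illustration. Following the rule above, the inclusive upper bounds in the
<codeph>CREATE TABLE</codeph> statement would be expected to appear in half-open form, along the lines
noted in the comments:
</p>
<codeblock rev="kudu">
-- Hypothetical table that uses &lt;= for the upper bound of each range.
create table range_rewrite_demo (score tinyint primary key, student string)
  partition by range (score)
  (
    partition 0 &lt;= values &lt;= 49,
    partition 50 &lt;= values &lt;= 64
  )
  stored as kudu;

-- With the upper bounds incremented, these ranges would be reported as
-- 0 &lt;= VALUES &lt; 50 and 50 &lt;= VALUES &lt; 65.
show range partitions range_rewrite_demo;
</codeblock>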
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<p>
The following example shows the output for a Parquet, text, or other
HDFS-backed table partitioned on the <codeph>YEAR</codeph> column:
</p>
<codeblock rev="1.4.0">[localhost:21000] &gt; show partitions census;
+-------+-------+--------+------+---------+
| year | #Rows | #Files | Size | Format |
@@ -1160,6 +1282,53 @@ show column stats store_sales;
| 2013 | 1 | 1 | 231B | PARQUET |
| Total | 9 | 3 | 275B | |
+-------+-------+--------+------+---------+
</codeblock>
<p rev="kudu IMPALA-4403">
The following example shows the output for a Kudu table
using the hash partitioning mechanism. The number of
rows in the result set corresponds to the value of
<varname>N</varname> in the <codeph>PARTITIONS <varname>N</varname></codeph>
clause of <codeph>CREATE TABLE</codeph>.
</p>
<codeblock rev="kudu IMPALA-4403"><![CDATA[
show partitions million_rows_hash;
+--------+-----------+----------+-----------------------+--
| # Rows | Start Key | Stop Key | Leader Replica | # Replicas
+--------+-----------+----------+-----------------------+--
| -1 | | 00000001 | n236.example.com:7050 | 3
| -1 | 00000001 | 00000002 | n236.example.com:7050 | 3
| -1 | 00000002 | 00000003 | n336.example.com:7050 | 3
| -1 | 00000003 | 00000004 | n238.example.com:7050 | 3
| -1 | 00000004 | 00000005 | n338.example.com:7050 | 3
....
| -1 | 0000002E | 0000002F | n240.example.com:7050 | 3
| -1 | 0000002F | 00000030 | n336.example.com:7050 | 3
| -1 | 00000030 | 00000031 | n240.example.com:7050 | 3
| -1 | 00000031 | | n334.example.com:7050 | 3
+--------+-----------+----------+-----------------------+--
Fetched 50 row(s) in 0.05s
]]>
</codeblock>
<p rev="kudu IMPALA-4403">
The following example shows the output for a Kudu table
using the range partitioning mechanism:
</p>
<codeblock rev="kudu IMPALA-4403"><![CDATA[
show range partitions million_rows_range;
+-----------------------+
| RANGE (id) |
+-----------------------+
| VALUES < "A" |
| "A" <= VALUES < "[" |
| "a" <= VALUES < "{" |
| "{" <= VALUES < "~\0" |
+-----------------------+
]]>
</codeblock>
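<p rev="kudu">
Because range partitions can be defined by <codeph>ALTER TABLE</codeph> as well as by
<codeph>CREATE TABLE</codeph>, the following sketch adds and drops a range partition and checks the
result each time. The table <codeph>range_alter_demo</codeph> and its boundary values are hypothetical:
</p>
<codeblock rev="kudu"><![CDATA[
-- Hypothetical table that starts with a single range partition.
create table range_alter_demo (id bigint primary key, s string)
  partition by range (id) (partition 0 <= values < 100)
  stored as kudu;

-- Add a second range, then list the ranges that are now defined.
alter table range_alter_demo add range partition 100 <= values < 200;
show range partitions range_alter_demo;

-- Drop the original range; only the added range should remain listed.
alter table range_alter_demo drop range partition 0 <= values < 100;
show range partitions range_alter_demo;
]]>
</codeblock>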
<p conref="../shared/impala_common.xml#common/permissions_blurb"/>


@@ -112,6 +112,9 @@ type ::= <varname>primitive_type</varname> | <varname>complex_type</varname>
<li/>
</ul>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/>


@@ -73,14 +73,16 @@ under the License.
</ul>
<p rev="2.2.0">
Impala tables can also represent data that is stored in HBase, or in the Amazon S3 filesystem (<keyword keyref="impala22_full"/> or higher),
or on Isilon storage devices (<keyword keyref="impala223_full"/> or higher). See <xref href="impala_hbase.xml#impala_hbase"/>,
<xref href="impala_s3.xml#s3"/>, and <xref href="impala_isilon.xml#impala_isilon"/>
for details about those special kinds of tables.
</p>
<p conref="../shared/impala_common.xml#common/ignore_file_extensions"/>
<p outputclass="toc inpage"/>
<p>
<b>Related statements:</b> <xref href="impala_create_table.xml#create_table"/>,
<xref href="impala_drop_table.xml#drop_table"/>, <xref href="impala_alter_table.xml#alter_table"/>
@@ -241,6 +243,7 @@ under the License.
<concept id="table_file_formats">
<title>File Formats</title>
<conbody>
<p>
Each table has an associated file format, which determines how Impala interprets the
@@ -273,4 +276,142 @@ under the License.
</conbody>
</concept>
<concept rev="kudu" id="kudu_tables">
<title>Kudu Tables</title>
<prolog>
<metadata>
<data name="Category" value="Kudu"/>
</metadata>
</prolog>
<conbody>
<p>
Tables stored in Apache Kudu are treated specially, because Kudu manages its data independently of HDFS files.
Some information about the table is stored in the metastore database for use by Impala. Other table metadata is
managed internally by Kudu.
</p>
<p>
When you create a Kudu table through Impala, it is assigned an internal Kudu table name of the form
<codeph>impala::<varname>db_name</varname>.<varname>table_name</varname></codeph>. You can see the Kudu-assigned name
in the output of <codeph>DESCRIBE FORMATTED</codeph>, in the <codeph>kudu.table_name</codeph> field of the table properties.
The Kudu-assigned name remains the same even if you use <codeph>ALTER TABLE</codeph> to rename the Impala table
or move it to a different Impala database. The effect of the statement
<codeph>ALTER TABLE <varname>impala_name</varname> SET TBLPROPERTIES('kudu.table_name' = '<varname>different_kudu_table_name</varname>')</codeph>
depends on whether the Impala table is an internal (managed) table, created with a regular <codeph>CREATE TABLE</codeph>
statement, or an external table, created with a <codeph>CREATE EXTERNAL TABLE</codeph> statement.
For an internal table, changing the <codeph>kudu.table_name</codeph>
property physically renames the underlying Kudu table to match the new name.
For an external table, changing the <codeph>kudu.table_name</codeph> property switches which underlying Kudu table
the Impala table refers to; that underlying Kudu table must already exist.
</p>
<p>
The following example shows what happens with both internal and external Kudu tables as the <codeph>kudu.table_name</codeph>
property is changed. In practice, external tables are typically used to access underlying Kudu tables that were created
outside of Impala, that is, through the Kudu API.
</p>
<codeblock>
-- This is an internal table that we will create and then rename.
create table old_name (id bigint primary key, s string)
partition by hash(id) partitions 2 stored as kudu;
-- Initially, the name OLD_NAME is the same on the Impala and Kudu sides.
describe formatted old_name;
...
| Location: | hdfs://host.example.com:8020/path/user.db/old_name
| Table Type: | MANAGED_TABLE | NULL
| Table Parameters: | NULL | NULL
| | DO_NOT_UPDATE_STATS | true
| | kudu.master_addresses | vd0342.halxg.cloudera.com
| | kudu.table_name | impala::user.old_name
-- ALTER TABLE RENAME TO changes the Impala name but not the underlying Kudu name.
alter table old_name rename to new_name;
describe formatted new_name;
| Location: | hdfs://host.example.com:8020/path/user.db/new_name
| Table Type: | MANAGED_TABLE | NULL
| Table Parameters: | NULL | NULL
| | DO_NOT_UPDATE_STATS | true
| | kudu.master_addresses | vd0342.halxg.cloudera.com
| | kudu.table_name | impala::user.old_name
-- Setting TBLPROPERTIES changes the underlying Kudu name.
alter table new_name
set tblproperties('kudu.table_name' = 'impala::user.new_name');
describe formatted new_name;
| Location: | hdfs://host.example.com:8020/path/user.db/new_name
| Table Type: | MANAGED_TABLE | NULL
| Table Parameters: | NULL | NULL
| | DO_NOT_UPDATE_STATS | true
| | kudu.master_addresses | vd0342.halxg.cloudera.com
| | kudu.table_name | impala::user.new_name
-- Put some data in the table to demonstrate how external tables can map to
-- different underlying Kudu tables.
insert into new_name values (0, 'zero'), (1, 'one'), (2, 'two');
-- This external table points to the same underlying Kudu table, NEW_NAME,
-- as we created above. No need to declare columns or other table aspects.
create external table kudu_table_alias stored as kudu
tblproperties('kudu.table_name' = 'impala::user.new_name');
-- The external table can fetch data from the NEW_NAME table that already
-- existed and already had data.
select * from kudu_table_alias limit 100;
+----+------+
| id | s |
+----+------+
| 1 | one |
| 0 | zero |
| 2 | two |
+----+------+
-- We cannot re-point the external table at a different underlying Kudu table
-- unless that other underlying Kudu table already exists.
alter table kudu_table_alias
set tblproperties('kudu.table_name' = 'impala::user.yet_another_name');
ERROR:
TableLoadingException: Error opening Kudu table 'impala::user.yet_another_name',
Kudu error: The table does not exist: table_name: "impala::user.yet_another_name"
-- Once the underlying Kudu table exists, we can re-point the external table to it.
create table yet_another_name (id bigint primary key, x int, y int, s string)
partition by hash(id) partitions 2 stored as kudu;
alter table kudu_table_alias
set tblproperties('kudu.table_name' = 'impala::user.yet_another_name');
-- Now no data is returned because this other table is empty.
select * from kudu_table_alias limit 100;
-- The Impala table automatically recognizes the table schema of the new table,
-- for example the extra X and Y columns not present in the original table.
describe kudu_table_alias;
+------+--------+---------+-------------+----------+...
| name | type | comment | primary_key | nullable |...
+------+--------+---------+-------------+----------+...
| id | bigint | | true | false |...
| x | int | | false | true |...
| y | int | | false | true |...
| s | string | | false | true |...
+------+--------+---------+-------------+----------+...
</codeblock>
<p>
The <codeph>SHOW TABLE STATS</codeph> output for a Kudu table shows Kudu-specific details about the layout of the table.
Instead of the number and sizes of files, the output is broken down by Kudu tablet.
For each tablet, the output includes the fields
<codeph># Rows</codeph> (although this number is not currently computed), <codeph>Start Key</codeph>, <codeph>Stop Key</codeph>, <codeph>Leader Replica</codeph>, and <codeph># Replicas</codeph>.
The output of <codeph>SHOW COLUMN STATS</codeph>, which illustrates the distribution of values within each column, is the same for Kudu tables
as for HDFS-backed tables.
</p>
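<p>
As a brief sketch, both statements could be run against the <codeph>new_name</codeph> table from the
preceding example. The row counts noted in the comments are expectations based on how that table was created,
not captured output:
</p>
<codeblock>
-- The table was created with PARTITIONS 2, so two tablet rows are expected.
show table stats new_name;

-- Same layout as for an HDFS-backed table: one row per column.
show column stats new_name;
</codeblock>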
<p conref="../shared/impala_common.xml#common/kudu_internal_external_tables"/>
</conbody>
</concept>
</concept>


@@ -436,6 +436,9 @@ insert into dates_and_times values
<p conref="../shared/impala_common.xml#common/avro_no_timestamp"/>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<ul>


@@ -102,6 +102,9 @@ under the License.
permission for all the files and directories that make up the table.
</p>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_no_truncate_table"/>
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<p>


@@ -128,6 +128,9 @@ prefer to use an integer data type with sufficient range (<codeph>INT</codeph>,
<p conref="../shared/impala_common.xml#common/column_stats_variable"/>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
<p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
<p conref="../shared/impala_common.xml#common/blobs_are_strings"/>