diff --git a/docs/topics/impala_alter_table.xml b/docs/topics/impala_alter_table.xml index 506531950..950ebc6f2 100644 --- a/docs/topics/impala_alter_table.xml +++ b/docs/topics/impala_alter_table.xml @@ -80,7 +80,11 @@ statsKey ::= numDVs | numNulls | avgSize | maxSize col_spec ::= col_name type_name -partition_spec ::= partition_col=constant_value +partition_spec ::= simple_partition_spec | complex_partition_spec | kudu_partition_spec + +simple_partition_spec ::= partition_col=constant_value + +complex_partition_spec ::= comparison_expression_on_partition_col table_properties ::= 'name'='value'[, 'name'='value' ...] @@ -124,6 +128,80 @@ statsKey ::= numDVs | numNulls | avgSize | maxSize an external table, the underlying data directory is not renamed or moved.

+

+ Dropping or altering multiple partitions: +

+ +

+ In and higher, + the expression for the partition clause with a DROP or SET + operation can include comparison operators such as <, IN, + or BETWEEN, and Boolean operators such as AND + and OR. +

+ +

+ For example, you might drop a group of partitions corresponding to a particular date + range after the data ages out: +

+ + + + +
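+ A minimal sketch of such statements, assuming a hypothetical table historical_data partitioned by an integer year column. The table and column names are illustrative only and do not come from this patch:

```sql
-- Hypothetical table: historical_data, partitioned by an INT column named year.
-- Drop every partition whose year value is earlier than 1995.
ALTER TABLE historical_data DROP PARTITION (year < 1995);

-- Comparison operators can be combined with Boolean operators.
ALTER TABLE historical_data DROP PARTITION (year >= 1995 AND year < 2000);
```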

+ For tables with multiple partition key columns, you can specify multiple + conditions separated by commas, and the operation applies only to the partitions + that match all the conditions (similar to using an AND clause): +

+ + + + +
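+ As a sketch of the comma-separated form, assuming the same hypothetical historical_data table is partitioned by both year and month (assumed names, for illustration only):

```sql
-- Hypothetical table: historical_data, partitioned by INT columns year and month.
-- Only partitions that satisfy BOTH conditions are dropped.
ALTER TABLE historical_data DROP PARTITION (year < 1996, month IN (1, 2, 3));
```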

+ This technique can also be used to change the file format of groups of partitions, + as part of an ETL pipeline that periodically consolidates and rewrites the underlying + data files in a different file format: +

+ + + + + +
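+ One way this might look, assuming a hypothetical table sales_data partitioned by year whose older data files have already been rewritten as Parquet by a separate ETL step (the table name, column name, and year value are assumptions):

```sql
-- Hypothetical table: sales_data, partitioned by an INT column named year.
-- SET FILEFORMAT changes only the partition metadata; it does not convert
-- existing data files, so run it after the files have been rewritten.
ALTER TABLE sales_data PARTITION (year < 2016) SET FILEFORMAT PARQUET;
```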

+ The extended syntax involving comparison operators and multiple partitions + applies to the SET FILEFORMAT, SET TBLPROPERTIES, + SET SERDEPROPERTIES, and SET [UN]CACHED clauses. + You can also use this syntax with the PARTITION clause + in the COMPUTE INCREMENTAL STATS statement, and with the + PARTITION clause of the SHOW FILES statement. + Some forms of ALTER TABLE still only apply to one partition + at a time: the SET LOCATION and ADD PARTITION + clauses. The PARTITION clauses in the LOAD DATA + and INSERT statements also only apply to one partition at a time. +

+

+ A DDL statement that applies to multiple partitions is considered successful + (resulting in no changes) even if no partitions match the conditions. + The results are the same as if the IF EXISTS clause were specified. +

+

+ The performance and scalability of this technique are similar to + issuing a sequence of single-partition ALTER TABLE + statements in quick succession. To minimize bottlenecks due to + communication with the metastore database, and to avoid making other + DDL operations on the same table wait, test the effects of + performing ALTER TABLE statements that affect + large numbers of partitions. +

+
+

diff --git a/docs/topics/impala_compute_stats.xml b/docs/topics/impala_compute_stats.xml index 8142da4f5..5a15c723b 100644 --- a/docs/topics/impala_compute_stats.xml +++ b/docs/topics/impala_compute_stats.xml @@ -52,7 +52,12 @@ under the License. COMPUTE STATS [db_name.]table_name COMPUTE INCREMENTAL STATS [db_name.]table_name [PARTITION (partition_spec)] -partition_spec ::= partition_col=constant_value + +partition_spec ::= simple_partition_spec | complex_partition_spec | kudu_partition_spec + +simple_partition_spec ::= partition_col=constant_value + +complex_partition_spec ::= comparison_expression_on_partition_col

@@ -108,6 +113,75 @@ COMPUTE INCREMENTAL STATS [db_name.]table_name +

+ Computing stats for groups of partitions: +

+ +

+ In and higher, you can run COMPUTE INCREMENTAL STATS + on multiple partitions, instead of the entire table or one partition at a time. You include + comparison operators other than = in the PARTITION clause, + and the COMPUTE INCREMENTAL STATS statement applies to all partitions that + match the comparison expression. +

+ +

+ For example, the INT_PARTITIONS table contains 4 partitions. + The following COMPUTE INCREMENTAL STATS statements affect some but not all + partitions, as indicated by the Updated n partition(s) + messages. The partitions that are affected depend on values in the partition key column X + that match the comparison expression in the PARTITION clause. +

+ + + +
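+ A sketch of what such statements could look like. The INT_PARTITIONS table and its partition key column X come from the text above; the specific comparison values are assumptions chosen for illustration, and no output is shown because the actual partition values are not given here:

```sql
-- Partition key values below (100, 150, 175, 200) are illustrative assumptions.
COMPUTE INCREMENTAL STATS int_partitions PARTITION (x < 100);
COMPUTE INCREMENTAL STATS int_partitions PARTITION (x IN (100, 150, 200));
COMPUTE INCREMENTAL STATS int_partitions PARTITION (x BETWEEN 100 AND 175);
```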

diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml index 72c9b12a9..a910ef000 100644 --- a/docs/topics/impala_known_issues.xml +++ b/docs/topics/impala_known_issues.xml @@ -248,6 +248,24 @@ https://issues.cloudera.org/browse/IMPALA-2144 - Don't have + + Use Hive Metastore bulk API for dropping multiple partitions. + +

+ The bulk partition dropping and setting feature of IMPALA-1654 is not as efficient + as it could be, because it currently does not use the Hive Metastore bulk API. +

+

Bug: IMPALA-4106

+

Severity: High

+

Workaround: Schedule ALTER TABLE operations that touch + many partitions for times when the table is not undergoing any other DDL operations, + and be prepared for the table to be locked for some time while the ALTER TABLE + is in progress. Test the performance of large-scale partition operations in a development + environment before trying them on tables in a production system. +

+ + + diff --git a/docs/topics/impala_show.xml b/docs/topics/impala_show.xml index c5a7740b5..84e9c0b47 100644 --- a/docs/topics/impala_show.xml +++ b/docs/topics/impala_show.xml @@ -49,7 +49,7 @@ SHOW TABLES [IN database_name] [[LIKE] 'patternSHOW TABLE STATS [database_name.]table_name SHOW COLUMN STATS [database_name.]table_name SHOW PARTITIONS [database_name.]table_name -SHOW FILES IN [database_name.]table_name [PARTITION (key_col=value [, key_col=value]] +SHOW FILES IN [database_name.]table_name [PARTITION (key_col_expression [, key_col_expression])] SHOW ROLES SHOW CURRENT ROLES @@ -113,6 +113,19 @@ show tables '*dim*|*fact*'; MB for megabytes, and GB for gigabytes.

+

+ In and higher, you can use general + expressions with operators such as <, IN, + LIKE, and BETWEEN in the PARTITION + clause, instead of only equality operators. For example: + + +
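+ A minimal sketch, assuming a hypothetical table sample_table partitioned by integer columns j and k (the table and column names are illustrative, not taken from this patch):

```sql
-- Hypothetical table: sample_table, partitioned by INT columns j and k.
-- List the data files for all partitions where j is less than 5.
SHOW FILES IN sample_table PARTITION (j < 5);

-- Multiple comma-separated expressions must all match.
SHOW FILES IN sample_table PARTITION (j IN (1, 2, 3), k BETWEEN 10 AND 20);
```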

+ This statement applies to tables and partitions stored on HDFS, or in the Amazon Simple Storage System (S3). It does not apply to views.