Files
impala/testdata/datasets/functional
Zoltan Borok-Nagy 296ed74d6f IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
This patch adds support to INSERT INTO identity-partitioned
Iceberg tables.

Identity-partitioned Iceberg tables are similar to regular
partitioned tables, they are even stored in the same directory
structure. The difference is that the data files still store
the partitioning columns.

The INSERT INTO syntax is similar to the syntax for non-partitioned
tables, i.e.:

INSERT INTO <iceberg_tbl> VALUES (<val1>, <val2>, <val3>, ...);
Or,
INSERT INTO <iceberg_tbl> SELECT <val1>, <val2>, ... FROM <source_tbl>
(please note that we don't use the PARTITION keyword)

The values must be in column order corresponding to the table schema.
Impala will automatically create/find the partitions based on the
Iceberg partition spec.

Partitioned Iceberg tables are stored as non-partitioned tables
in the Hive Metastore (similarly to partitioned Kudu tables). However,
the InsertStmt still generates the partition expressions for them.
These partition expressions are used to shuffle and sort the input
data so we don't end up writing too many files. The HdfsTableSink
also uses the partition expressions to write the data files with
the proper partition paths.

Iceberg is able to parse the partition paths to generate the
corresponding metadata for the partitions. This happens at the
end in IcebergCatalogOpExecutor.

Testing:
 * added planner test to verify shuffling and sorting
 * added negative tests for unsupported features like PARTITION clause
   and non-identity partition transforms
 * e2e tests with partitioned inserts

Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4
Reviewed-on: http://gerrit.cloudera.org:8080/16825
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-17 08:54:51 +00:00
..