IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Currently we have a DDL syntax for defining Iceberg partitions that
differs from SparkSQL:
https://iceberg.apache.org/spark-ddl/#partitioned-by

For example, Impala currently uses the following syntax:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)
STORED AS ICEBERG;

The same in Spark is:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
PARTITIONED BY (bucket(5, i), months(ts), years(d))

HIVE-25179 added the following syntax for Hive:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (bucket(5, i), months(ts), years(d))
STORED BY ICEBERG;

That is, Hive uses the same partition syntax as Spark, but adds the
keyword "SPEC".

This patch makes Impala use Hive's syntax, i.e. Impala will also
use the PARTITIONED BY SPEC clause together with the unified
partition transform syntax.
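
For illustration, the Impala example from the top of this message would
presumably read as follows under the new syntax. This is a sketch, not
taken from the patch: the singular transform names are an assumption
based on the spelling used in the updated test tables of this change
(year(ts), bucket(5, s)), and month(ts) is assumed by analogy.

```sql
-- Hypothetical rewrite of the earlier ice_t example with the new clause.
-- STORED AS ICEBERG is kept from the existing Impala syntax.
CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (bucket(5, i), month(ts), year(d))
STORED AS ICEBERG;
```

In the updated test tables, identity partitioning is expressed by naming
the column alone, e.g. PARTITIONED BY SPEC (i, j).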

Testing:
 * existing tests have been rewritten with the new syntax

Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Reviewed-on: http://gerrit.cloudera.org:8080/17575
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Zoltan Borok-Nagy
2021-06-10 10:39:52 +02:00
committed by Impala Public Jenkins
parent 9d46255739
commit d0749d59de
20 changed files with 181 additions and 180 deletions

@@ -3050,7 +3050,7 @@ functional
iceberg_int_partitioned
---- CREATE
CREATE TABLE IF NOT EXISTS {db_name}{db_suffix}.{table_name} (i INT, j INT, k INT)
-PARTITION BY SPEC (i identity, j identity)
+PARTITIONED BY SPEC (i, j)
STORED AS ICEBERG;
====
---- DATASET
@@ -3060,7 +3060,7 @@ iceberg_partition_transforms_zorder
---- CREATE
CREATE TABLE IF NOT EXISTS {db_name}{db_suffix}.{table_name}
(ts timestamp, s string, i int, j int)
-PARTITION BY SPEC (ts year, s bucket 5)
+PARTITIONED BY SPEC (year(ts), bucket(5, s))
SORT BY ZORDER (i, j)
STORED AS ICEBERG;
====