Currently we have a DDL syntax for defining Iceberg partitions that
differs from SparkSQL:
https://iceberg.apache.org/spark-ddl/#partitioned-by
For example, Impala currently uses the following syntax:
CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)
STORED AS ICEBERG;
The same in Spark is:
CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
PARTITIONED BY (bucket(5, i), months(ts), years(d))
HIVE-25179 added the following syntax for Hive:
CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (bucket(5, i), months(ts), years(d))
STORED BY ICEBERG;
That is, Hive uses the same syntax as Spark, but adds the keyword
"SPEC".
This patch makes Impala use Hive's syntax, i.e. Impala will also
use the PARTITIONED BY SPEC clause together with the unified
partition transform syntax.
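For illustration, the example table from above might be written as
follows in Impala after this patch. This is a sketch, not taken from
the patch itself: it assumes that Impala keeps its STORED AS ICEBERG
clause and accepts the same transform spellings as the Hive example.

```sql
-- Assumed post-patch Impala form of the ice_t example above.
-- The partition clause is copied from the Hive example; only the
-- storage clause (STORED AS ICEBERG) follows the Impala example.
CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (bucket(5, i), months(ts), years(d))
STORED AS ICEBERG;
```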
Testing:
* existing tests have been rewritten to use the new syntax
Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Reviewed-on: http://gerrit.cloudera.org:8080/17575
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for INSERT OVERWRITE statements for
Iceberg tables. We use Iceberg's ReplacePartitions interface
for this. This interface provides consistent behavior with
INSERT OVERWRITEs against regular tables. It's also consistent
with other engines' dynamic overwrites, e.g. Spark's.
INSERT OVERWRITE for partitioned tables replaces the partitions
affected by the INSERT, while keeping the other partitions
untouched.
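INSERT OVERWRITE is prohibited for tables that use the BUCKET
partition transform. Because rows are assigned to buckets by hash,
the set of overwritten partitions would depend on which buckets the
inserted rows happen to hash into, so the statement would
effectively overwrite table data at random.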
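The behavior described above can be sketched with a small example.
The table names ice_by_year and staging_t are hypothetical, and
ice_by_year is assumed to be an Iceberg table partitioned only by the
year of its date column d (no BUCKET transform).

```sql
-- Hypothetical setup: ice_by_year already holds rows for the years
-- 2019, 2020 and 2021, and staging_t contains rows for 2021 only.
INSERT OVERWRITE TABLE ice_by_year
SELECT i, s, ts, d FROM staging_t;

-- Intended effect, per the description above: the 2021 partition is
-- replaced by the rows from staging_t, while the 2019 and 2020
-- partitions are left untouched.
-- The same statement against a table whose partition spec uses the
-- BUCKET transform is expected to be rejected.
```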
Testing:
* added an e2e test
Change-Id: Idf4acfb54cf62a3f3b2e8db9d04044580151299c
Reviewed-on: http://gerrit.cloudera.org:8080/17012
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>