impala

jprdonnelly/impala

mirror of https://github.com/apache/impala.git synced 2026-02-02 15:00:38 -05:00

Commit Graph


Author

SHA1

Message

Date

Zoltan Borok-Nagy

d0749d59de

IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

Currently we have a DDL syntax for defining Iceberg partitions that
differs from SparkSQL:
https://iceberg.apache.org/spark-ddl/#partitioned-by

E.g. Impala is using the following syntax:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)
STORED AS ICEBERG;

The same in Spark is:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
PARTITIONED BY (bucket(5, i), months(ts), years(d))

HIVE-25179 added the following syntax for Hive:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (bucket(5, i), months(ts), years(d))
STORED BY ICEBERG;

I.e. the same syntax as Spark, but adding the keyword "SPEC".

This patch makes Impala use Hive's syntax, i.e. we will also
use the PARTITIONED BY SPEC clause + the unified partition
transform syntax.
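
As a rough illustration of the syntax change, the sketch below rewrites a partition list from the old column-first form into the function-style form shown in the Hive example above. The helper name convert_partition_spec is hypothetical, and the transform mapping (BUCKET, MONTH, YEAR) is inferred only from the examples in this commit message; it is not part of the patch and does not cover other transforms.

```python
# Mapping inferred only from the examples in the commit message (assumption);
# other transforms are deliberately left unsupported.
_TIME_TRANSFORMS = {"MONTH": "months", "YEAR": "years"}


def convert_partition_spec(old_spec: str) -> str:
    """Rewrite 'i BUCKET 5, ts MONTH' as 'bucket(5, i), months(ts)'."""
    new_fields = []
    for field in old_spec.split(","):
        tokens = field.split()
        if len(tokens) == 3 and tokens[1].upper() == "BUCKET":
            # Old form: <column> BUCKET <n>  ->  bucket(<n>, <column>)
            col, _, n = tokens
            new_fields.append(f"bucket({int(n)}, {col})")
        elif len(tokens) == 2 and tokens[1].upper() in _TIME_TRANSFORMS:
            # Old form: <column> MONTH  ->  months(<column>)
            col, transform = tokens
            new_fields.append(f"{_TIME_TRANSFORMS[transform.upper()]}({col})")
        else:
            raise ValueError(f"unsupported partition field: {field.strip()!r}")
    return ", ".join(new_fields)


if __name__ == "__main__":
    print(convert_partition_spec("i BUCKET 5, ts MONTH, d YEAR"))
```

The result would go inside the parentheses of a PARTITIONED BY SPEC (...) clause. Fields the sketch does not recognise raise an error rather than being guessed.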

Testing:
 * existing tests have been rewritten with the new syntax

Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Reviewed-on: http://gerrit.cloudera.org:8080/17575
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

2021-07-15 15:15:07 +00:00

Zoltan Borok-Nagy

a3f441193d

IMPALA-10223: Implement INSERT OVERWRITE for Iceberg tables

This patch adds support for INSERT OVERWRITE statements for
Iceberg tables. We use Iceberg's ReplacePartitions interface
for this. This interface provides consistent behavior with
INSERT OVERWRITEs against regular tables. It's also consistent
with other engines' dynamic overwrites, e.g. Spark.

INSERT OVERWRITE for partitioned tables replaces the partitions
affected by the INSERT, while keeping the other partitions
untouched.

INSERT OVERWRITE is prohibited for tables that use the BUCKET
partition transform because it would randomly overwrite table
data.
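
The overwrite semantics described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: ToyTable, partition_key and uses_bucket_transform are hypothetical names, and the code does not call Iceberg's ReplacePartitions interface or any Impala code.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of a partitioned table (hypothetical, not the Iceberg API)."""

    def __init__(self, partition_key, uses_bucket_transform=False):
        self.partition_key = partition_key  # function: row -> partition value
        self.uses_bucket_transform = uses_bucket_transform
        self.partitions = {}  # partition value -> list of rows

    def insert_overwrite(self, rows):
        # With a BUCKET transform the affected partitions depend on hash
        # values, so an overwrite would replace effectively arbitrary data.
        if self.uses_bucket_transform:
            raise ValueError("INSERT OVERWRITE is prohibited with BUCKET partitioning")
        incoming = defaultdict(list)
        for row in rows:
            incoming[self.partition_key(row)].append(row)
        # Replace only the partitions that received rows; keep the others.
        for part, new_rows in incoming.items():
            self.partitions[part] = new_rows


if __name__ == "__main__":
    table = ToyTable(partition_key=lambda row: row["year"])
    table.insert_overwrite([{"year": 2020, "v": 1}, {"year": 2021, "v": 2}])
    table.insert_overwrite([{"year": 2021, "v": 3}])
    print(table.partitions)
```

After the second call, the 2020 partition still holds its original row, while the 2021 partition contains only the newly inserted row.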

Testing
 * added e2e test

Change-Id: Idf4acfb54cf62a3f3b2e8db9d04044580151299c
Reviewed-on: http://gerrit.cloudera.org:8080/17012
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

2021-02-05 14:46:08 +00:00

2 Commits