IMPALA-14081: Support create/drop paimon table for impala

This patch mainly implements creating and dropping paimon tables
through impala.

Supported impala data types:
- BOOLEAN
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- STRING
- DECIMAL(P,S)
- TIMESTAMP
- CHAR(N)
- VARCHAR(N)
- BINARY
- DATE
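
The list mixes plain type names with parameterized forms (DECIMAL(P,S),
CHAR(N), VARCHAR(N)). The check below is a hypothetical Python helper, not
part of this patch. It only tests whether a column type string has one of
the listed shapes and does not mirror impala's actual type conversion code.
INT is accepted alongside INTEGER because the examples below use INT. Value
ranges (for example DECIMAL precision limits) are deliberately not checked.

```python
import re

# Plain type names from the supported list (INT added because the
# CREATE TABLE examples spell the integer type that way).
SIMPLE_TYPES = {
    "BOOLEAN", "TINYINT", "SMALLINT", "INT", "INTEGER", "BIGINT",
    "FLOAT", "DOUBLE", "STRING", "TIMESTAMP", "BINARY", "DATE",
}

# Parameterized forms from the list; only the shape is checked.
PARAM_PATTERNS = (
    re.compile(r"^DECIMAL\(\s*\d+\s*,\s*\d+\s*\)$"),  # DECIMAL(P,S)
    re.compile(r"^CHAR\(\s*\d+\s*\)$"),               # CHAR(N)
    re.compile(r"^VARCHAR\(\s*\d+\s*\)$"),            # VARCHAR(N)
)


def is_supported_paimon_column_type(type_str: str) -> bool:
    """Return True if type_str matches one of the listed type shapes."""
    t = type_str.strip().upper()
    if t in SIMPLE_TYPES:
        return True
    return any(p.match(t) for p in PARAM_PATTERNS)
```

Complex types such as ARRAY<INT> are rejected, which matches the
"Complex type query support" item still listed as WIP.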

Syntax for creating paimon table:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(
[col_name data_type, ...]
[PRIMARY KEY (col1, col2)]
)
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
STORED AS PAIMON
[LOCATION 'hdfs_path']
[TBLPROPERTIES (
'primary-key'='col1,col2',
'file.format' = 'orc/parquet',
'bucket' = '2',
'bucket-key' = 'col3'
)];
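
To make the grammar concrete, here is a sketch of a hypothetical Python
helper, not part of this patch, that renders a statement of that shape. It
could be used for test fixtures. It assumes that the PRIMARY KEY clause is
comma-separated from the column definitions, as in impala's Kudu DDL. It
does not validate any paimon-specific rules, such as how primary keys must
relate to partition or bucket keys.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def render_create_paimon_table(
    table: str,
    columns: Sequence[Tuple[str, str]] = (),
    primary_key: Sequence[str] = (),
    partitioned_by: Sequence[Tuple[str, str]] = (),
    location: Optional[str] = None,
    tblproperties: Optional[Dict[str, str]] = None,
    external: bool = False,
    if_not_exists: bool = False,
) -> str:
    """Render CREATE TABLE ... STORED AS PAIMON following the syntax above."""
    head: List[str] = ["CREATE"]
    if external:
        head.append("EXTERNAL")
    head.append("TABLE")
    if if_not_exists:
        head.append("IF NOT EXISTS")
    head.append(table)
    lines = [" ".join(head)]

    # Column definitions plus an optional PRIMARY KEY clause.
    body = [f"{name} {dtype}" for name, dtype in columns]
    if primary_key:
        body.append(f"PRIMARY KEY ({', '.join(primary_key)})")
    if body:
        lines.append("(" + ", ".join(body) + ")")

    if partitioned_by:
        cols = ", ".join(f"{n} {t}" for n, t in partitioned_by)
        lines.append(f"PARTITIONED BY ({cols})")
    lines.append("STORED AS PAIMON")
    if location:
        lines.append(f"LOCATION '{location}'")
    if tblproperties:
        props = ", ".join(f"'{k}'='{v}'" for k, v in tblproperties.items())
        lines.append(f"TBLPROPERTIES ({props})")
    return "\n".join(lines) + ";"
```

For example, a primary-key table with two buckets in parquet format would be
rendered with render_create_paimon_table("paimon_ratings",
[("userid", "INT"), ("movieid", "INT"), ("rating", "DOUBLE")],
primary_key=["userid", "movieid"],
tblproperties={"file.format": "parquet", "bucket": "2"}).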

Two types of paimon catalogs are supported.

(1) Create a table with the hive catalog:

CREATE TABLE paimon_hive_cat(userid INT,movieId INT)
STORED AS PAIMON;

(2) Create a table with the hadoop catalog:

CREATE [EXTERNAL] TABLE paimon_hadoop_cat
STORED AS PAIMON
TBLPROPERTIES('paimon.catalog'='hadoop',
'paimon.catalog_location'='/path/to/paimon_hadoop_catalog',
'paimon.table_identifier'='paimondb.paimontable');
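
Judging from example (1), the hive catalog needs no extra property. The
hadoop catalog is selected through the three 'paimon.*' table properties
in example (2). The sketch below is a hypothetical Python helper, not part
of this patch, that builds those properties and the matching statement. It
assumes 'paimon.table_identifier' must have the form '<database>.<table>',
as in the example value 'paimondb.paimontable'.

```python
from typing import Dict


def hadoop_catalog_properties(catalog_location: str,
                              table_identifier: str) -> Dict[str, str]:
    """Build the TBLPROPERTIES used for a hadoop-catalog paimon table."""
    # Assumption: identifier is exactly '<database>.<table>'.
    db, sep, tbl = table_identifier.partition(".")
    if not (sep and db and tbl) or "." in tbl:
        raise ValueError(f"expected 'db.table', got {table_identifier!r}")
    return {
        "paimon.catalog": "hadoop",
        "paimon.catalog_location": catalog_location,
        "paimon.table_identifier": table_identifier,
    }


def render_hadoop_catalog_table(table: str, catalog_location: str,
                                table_identifier: str,
                                external: bool = False) -> str:
    """Render a statement shaped like example (2): no column list."""
    props = hadoop_catalog_properties(catalog_location, table_identifier)
    prop_sql = ",\n".join(f"'{k}'='{v}'" for k, v in props.items())
    kind = "CREATE EXTERNAL TABLE" if external else "CREATE TABLE"
    return (f"{kind} {table}\nSTORED AS PAIMON\n"
            f"TBLPROPERTIES({prop_sql});")
```

With external=True and the values from example (2), the helper reproduces
that statement line for line.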

SHOW TABLE STATS, SHOW COLUMN STATS, SHOW PARTITIONS and SHOW FILES
statements are also supported.
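
A quick way to exercise these statements against a new table is to run them
through any Python DB-API cursor. The function below is a hypothetical
sketch, not part of this patch. It records per-statement failures instead of
aborting, because this message does not say how each statement behaves on
every paimon table (for example, SHOW PARTITIONS on an unpartitioned one).

```python
from typing import Dict, Tuple

# Statement templates for the SHOW commands listed above.
SHOW_TEMPLATES: Tuple[str, ...] = (
    "SHOW TABLE STATS {t}",
    "SHOW COLUMN STATS {t}",
    "SHOW PARTITIONS {t}",
    "SHOW FILES IN {t}",
)


def collect_show_output(cursor, table: str) -> Dict[str, object]:
    """Run each SHOW statement; map statement -> rows or the raised error."""
    results: Dict[str, object] = {}
    for template in SHOW_TEMPLATES:
        stmt = template.format(t=table)
        try:
            cursor.execute(stmt)
            results[stmt] = cursor.fetchall()
        except Exception as exc:  # keep going so every statement is tried
            results[stmt] = exc
    return results
```

One real source of such a cursor is impyla:
impala.dbapi.connect(host=..., port=21050).cursor(). Any object with
execute() and fetchall() works, which keeps the helper testable without a
running cluster.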

TODO:
    - Patches pending submission:
        - Query support for paimon data files.
        - Partition pruning and predicate push down.
        - Query support with time travel.
        - Query support for paimon meta tables.
    - WIP:
        - Complex type query support.
        - Virtual column support when querying
          paimon data tables.
        - Native paimon table scanner, instead of
          the JNI-based one.

Testing:
    - Add unit tests for paimon/impala type conversion.
    - Add unit tests to ToSqlTest.java.
    - Add unit tests to AnalyzeDDLTest.java.
    - Update default_file_format TestEnumCase in
      be/src/service/query-options-test.cc.
    - Update test case in
      testdata/workloads/functional-query/queries/QueryTest/set.test.
    - Add test cases in metadata/test_show_create_table.py.
    - Add custom test test_paimon.py.

Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef
Reviewed-on: http://gerrit.cloudera.org:8080/22914
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
jichen0919
2025-05-20 15:58:06 +08:00
committed by Riza Suminto
parent a41c5cbfdd
commit 826c8cf9b0
100 changed files with 4159 additions and 37 deletions

@@ -296,7 +296,7 @@ export IMPALA_DBCP2_VERSION=2.9.0
export IMPALA_DROPWIZARD_METRICS_VERSION=4.2.26
export IMPALA_AIRCOMPRESSOR_VERSION=0.27
export IMPALA_DATASKETCHES_VERSION=6.0.0
export IMPALA_PAIMON_VERSION=1.1.1
# When Impala is building docker images on Redhat-based distributions,
# it is useful to be able to customize the base image. Some users will
# want to use open source / free distributions like Centos/Rocky/Alma/etc.
@@ -1242,6 +1242,7 @@ echo "IMPALA_HUDI_VERSION = $IMPALA_HUDI_VERSION"
echo "IMPALA_KUDU_VERSION = $IMPALA_KUDU_VERSION"
echo "IMPALA_RANGER_VERSION = $IMPALA_RANGER_VERSION"
echo "IMPALA_ICEBERG_VERSION = $IMPALA_ICEBERG_VERSION"
echo "IMPALA_PAIMON_VERSION = $IMPALA_PAIMON_VERSION"
echo "IMPALA_COS_VERSION = $IMPALA_COS_VERSION"
echo "IMPALA_OBS_VERSION = $IMPALA_OBS_VERSION"
echo "IMPALA_SYSTEM_PYTHON2 = $IMPALA_SYSTEM_PYTHON2"

@@ -170,6 +170,7 @@ testdata/data/widerow.txt
testdata/data/local_tbl/00000.txt
testdata/data/hudi_parquet/*
testdata/data/iceberg_test/*
testdata/data/paimon_test/*
testdata/data/json_test/*
testdata/data/sfs_d2.txt
testdata/data/sfs_d4.txt