Files
impala/bin/rat_exclude_files.txt
jichen0919 826c8cf9b0 IMPALA-14081: Support create/drop paimon table for impala
This patch mainly implement the creation/drop of paimon table
through impala.

Supported impala data types:
- BOOLEAN
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- STRING
- DECIMAL(P,S)
- TIMESTAMP
- CHAR(N)
- VARCHAR(N)
- BINARY
- DATE

Syntax for creating paimon table:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(
[col_name data_type ,...]
[PRIMARY KEY (col1,col2)]
)
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
STORED AS PAIMON
[LOCATION 'hdfs_path']
[TBLPROPERTIES (
'primary-key'='col1,col2',
'file.format' = 'orc/parquet',
'bucket' = '2',
'bucket-key' = 'col3',
];

Two types of paimon catalogs are supported.

(1) Create table with hive catalog:

CREATE TABLE paimon_hive_cat(userid INT,movieId INT)
STORED AS PAIMON;

(2) Create table with hadoop catalog:

CREATE [EXTERNAL] TABLE paimon_hadoop_cat
STORED AS PAIMON
TBLPROPERTIES('paimon.catalog'='hadoop',
'paimon.catalog_location'='/path/to/paimon_hadoop_catalog',
'paimon.table_identifier'='paimondb.paimontable');

SHOW TABLE STAT/SHOW COLUMN STAT/SHOW PARTITIONS/SHOW FILES
statements are also supported.

TODO:
    - Patches pending submission:
        - Query support for paimon data files.
        - Partition pruning and predicate push down.
        - Query support with time travel.
        - Query support for paimon meta tables.
    - WIP:
        - Complex type query support.
        - Virtual Column query support for querying
          paimon data table.
        - Native paimon table scanner, instead of
          jni based.
Testing:
    - Add unit test for paimon impala type conversion.
    - Add unit test for ToSqlTest.java.
    - Add unit test for AnalyzeDDLTest.java.
    - Update default_file_format TestEnumCase in
      be/src/service/query-options-test.cc.
    - Update test case in
      testdata/workloads/functional-query/queries/QueryTest/set.test.
    - Add test cases in metadata/test_show_create_table.py.
    - Add custom test test_paimon.py.

Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef
Reviewed-on: http://gerrit.cloudera.org:8080/22914
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-09-10 21:24:49 +00:00

274 lines
8.9 KiB
Plaintext

# These are the globs that RAT ignores when doing a copyright
# audit. Comments start with "# ".
# http://www.apache.org/legal/src-headers.html: "A file without any degree of creativity
# in either its literal elements or its structure is not protected by copyright law;
# therefore, such a file does not require a license header."
.clang-format
.devcontainer/*
.gitattributes
.gitignore
*/.gitignore
*/rat_exclude_files.txt
be/src/testutil/htpasswd
be/src/testutil/*.key
bin/generate_junitxml.py
bin/junitxml_prune_notrun.py
tests/*/__init__.py
testdata/common/__init__.py
fe/src/test/resources/regionservers
shell/impala_shell/__init__.py
ssh_keys/id_rsa_impala
testdata/__init__.py
tests/__init__.py
bin/diagnostics/__init__.py
lib/python/impala_py_lib/__init__.py
lib/python/impala_py_lib/jenkins/__init__.py
shell/MANIFEST.in
shell/requirements.txt
testdata/cluster/node_templates/cdh7/etc/init.d/kms
testdata/authentication/*
bin/banned_py3k_warnings.txt
# See $IMPALA_HOME/LICENSE.txt
be/src/gutil/*
be/src/thirdparty/datasketches/*
be/src/thirdparty/llvm/*
be/src/thirdparty/murmurhash/*
be/src/thirdparty/mpfit/*
be/src/thirdparty/fast_double_parser/*
be/src/kudu/gutil
www/bootstrap/css/bootstrap*
www/bootstrap/js/bootstrap*
www/c3/*
www/Chart*
www/d3.v3.min.js
www/d3.v5.min.js
www/datatables/*
www/favicon.ico
www/highlight/*
www/icons/*
www/jquery/jquery-3.5.1.min.js
www/pako.min.js
tests/comparison/leopard/static/css/bootstrap*
tests/comparison/leopard/static/fonts/glyphicons-halflings*
tests/comparison/leopard/static/js/bootstrap*
tests/comparison/leopard/static/css/hljs.css
tests/comparison/leopard/static/js/highlight.pack.js
common/protobuf/kudu
be/src/kudu/util/array_view.h
be/src/kudu/util/cache-test.cc
be/src/kudu/util/cache.cc
be/src/kudu/util/cache.h
be/src/kudu/util/cloud/instance_detector-test.cc
be/src/kudu/util/coding.cc
be/src/kudu/util/coding.h
be/src/kudu/util/condition_variable.cc
be/src/kudu/util/condition_variable.h
be/src/kudu/util/debug/trace_event.h
be/src/kudu/util/debug/trace_event_impl.cc
be/src/kudu/util/debug/trace_event_impl.h
be/src/kudu/util/debug/trace_event_impl_constants.cc
be/src/kudu/util/debug/trace_event_synthetic_delay.cc
be/src/kudu/util/debug/trace_event_synthetic_delay.h
be/src/kudu/util/env.cc
be/src/kudu/util/env.h
be/src/kudu/util/env_posix.cc
be/src/kudu/util/nvm_cache.cc
be/src/kudu/util/random.h
be/src/kudu/util/slice.h
be/src/kudu/util/status-test.cc
be/src/kudu/util/status.cc
be/src/kudu/util/status.h
be/src/kudu/security/x509_check_host.h
be/src/kudu/security/x509_check_host.cc
be/src/util/cache/cache.h
be/src/util/cache/cache.cc
be/src/util/cache/cache-internal.h
be/src/util/cache/cache-test.h
be/src/util/cache/cache-test.cc
be/src/util/cache/rl-cache.cc
be/src/util/cache/rl-cache-test.cc
docs/css/dita-ot-doc.css
docs/shared/header.xml
# http://www.apache.org/legal/src-headers.html: "Short informational text files; for
# example README, INSTALL files. The expectation is that these files make it obvious which
# product they relate to."
be/src/testutil/certificates-info.txt
bin/README-RUNNING-BENCHMARKS
LOGS.md
README*.md
*/README
*/README.dox
*/README.txt
testdata/bin/README-BENCHMARK-TEST-GENERATION
testdata/bin/minicluster_lakekeeper/README.md
testdata/bin/otel-collector/README.md
testdata/scale_test_metadata/README.md
tests/comparison/ORACLE.txt
bin/distcc/README.md
tests/comparison/POSTGRES.txt
docs/README.md
docker/README.md
be/src/thirdparty/pcg-cpp-0.98/README.md
lib/python/README.md
lib/python/impala_py_lib/gdb/README.md
shell/README.md
bin/kerberos/README-kerberos.md
# http://www.apache.org/legal/src-headers.html: "Test data for which the addition of a
# source header would cause the tests to fail."
testdata/*.csv
testdata/*.sql
testdata/*.test
be/src/kudu/util/testdata/*.txt
be/src/testutil/*.pem
*.json
fe/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
fe/src/test/resources/*.xml
fe/src/test/resources/adschema.ldif
fe/src/test/resources/adusers.ldif
fe/src/test/resources/hbase-jaas-client.conf.template
fe/src/test/resources/hbase-jaas-server.conf.template
fe/src/test/resources/users.ldif
java/.mvn/maven.config
java/toolchains.xml.tmpl
testdata/AllTypesError/*.txt
testdata/AllTypesErrorNoNulls/*.txt
*.avsc
*.parq
*.parquet
testdata/charcodec/*
testdata/cluster/hive/*.diff
testdata/cluster/node_templates/cdh5/etc/hadoop/conf/*.xml.tmpl
testdata/cluster/node_templates/common/etc/kudu/*.conf.tmpl
testdata/cluster/node_templates/common/etc/hadoop/conf/*.xml.tmpl
testdata/cluster/ranger/setup/*.json.template
testdata/cluster/ranger/*.diff
testdata/data/binary_tbl/000000_0.txt
testdata/data/chars-formats.txt
testdata/data/chars-tiny.txt
testdata/data/parent_table.txt
testdata/data/parent_table_2.txt
testdata/data/child_table.txt
testdata/data/date_tbl/*.txt
testdata/data/date_tbl_error/*.txt
testdata/data/decimal-tiny.txt
testdata/data/decimal_tbl.txt
testdata/data/decimal_rtf_tiny_tbl.txt
testdata/data/decimal_rtf_tbl.txt
testdata/data/invalid_binary_data.txt
testdata/data/overflow.txt
testdata/data/text-comma-backslash-newline.txt
testdata/data/text-dollar-hash-pipe.txt
testdata/data/dateless_timestamps.txt
testdata/data/text_large_zstd.txt
testdata/data/text_large_zstd.zst
testdata/data/timestamp_at_dst_changes.txt
testdata/data/widerow.txt
testdata/data/local_tbl/00000.txt
testdata/data/hudi_parquet/*
testdata/data/iceberg_test/*
testdata/data/paimon_test/*
testdata/data/json_test/*
testdata/data/sfs_d2.txt
testdata/data/sfs_d4.txt
testdata/data/load_data_with_catalog_v1.txt
testdata/data/warmup_table_list.txt
testdata/data/warmup_test_config.txt
testdata/datasets/functional/functional_schema_template.sql
testdata/impala-profiles/README
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected.txt
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected.json
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected.pretty.json
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_extended.expected.pretty.json
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_default.expected.txt
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_extended.expected.txt
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_v2
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_v2.expected.json
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_v2_default.expected.txt
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_v2_extended.expected.txt
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_v2_extended.expected.pretty.json
testdata/hive_benchmark/grepTiny/part-00000
testdata/jceks/.gitkeep
testdata/jwt/*.json
testdata/jwt/jwt_expired
testdata/jwt/jwt_signed
testdata/jwt/jwt_signed_untrusted
testdata/jwt/oauth_client_secret
testdata/jwt/okta_oauth_payload_valid
testdata/jwt/okta_oauth_payload_invalid
testdata/tzdb/2017c.zip
testdata/tzdb/2017c-corrupt.zip
testdata/tzdb_tiny/*
tests/pytest.ini
tests/shell/bad_impalarc
tests/shell/good_impalarc
tests/shell/good_impalarc2
tests/shell/good_impalarc3
tests/shell/good_impalarc4
tests/shell/impalarc_with_error
tests/shell/impalarc_with_error2
tests/shell/impalarc_with_query_options
tests/shell/impalarc_with_warnings
tests/shell/impalarc_with_warnings2
tests/shell/shell.cmds
tests/shell/shell2.cmds
tests/shell/shell_case_sensitive.cmds
tests/shell/shell_case_sensitive2.cmds
tests/shell/shell_error.cmds
tests/shell/test_close_queries.sql
tests/shell/test_file_comments.sql
tests/shell/test_file_no_comments.sql
tests/shell/test_var_substitution.sql
# symlink to testdata/workloads/tpcds/queries
testdata/workloads/tpcds_partitioned/queries
# Generated by Apache-licensed software:
be/src/transport/config.h
# BSD 2-Clause license:
bin/summarize-pstacks
# BSD 3-clause license that RAT can't seem to identify:
cmake_modules/FindJNI.cmake
# http://www.apache.org/legal/resolved.html#category-a : Python Software Foundation
# License is allowed.
shell/legacy/pkg_resources.py
# Notices in Impala as required by ASF rules:
DISCLAIMER
LICENSE.txt
NOTICE.txt
# Notices in thirdparty sources included in the Impala repo and called out in /LICENSE.txt
be/src/thirdparty/roaring/LICENSE
be/src/thirdparty/squeasel/LICENSE
be/src/thirdparty/pcg-cpp-0.98/LICENSE.txt
be/src/thirdparty/xxhash/xxhash.h
be/src/thirdparty/xxhash/README.md
# http://www.apache.org/legal/src-headers.html: 'Snippet' files that are combined as form
# a larger file where the larger file would have duplicate licensing headers.
www/all_child_groups.tmpl
www/blacklisted_tooltip.txt
www/common-footer.tmpl
www/form-hidden-inputs.tmpl
www/quiescing_tooltip.txt
# GNU tar artifact
pax_global_header
# Files in binary image formats:
tests/comparison/leopard/static/favicon.ico
docs/images/howto_access_control.png
docs/images/howto_per_node_peak_memory_usage.png
docs/images/howto_show_histogram.png
docs/images/howto_static_server_pools_config.png
docs/images/impala_arch.jpeg
docs/images/support_send_diagnostic_data.png