mirror of
https://github.com/apache/impala.git
synced 2025-12-19 18:12:08 -05:00
IMPALA-14081: Support create/drop paimon table for impala
This patch implements creating and dropping Paimon tables
through Impala.
Supported impala data types:
- BOOLEAN
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- STRING
- DECIMAL(P,S)
- TIMESTAMP
- CHAR(N)
- VARCHAR(N)
- BINARY
- DATE
Syntax for creating paimon table:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(
[col_name data_type ,...]
[PRIMARY KEY (col1,col2)]
)
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
STORED AS PAIMON
[LOCATION 'hdfs_path']
[TBLPROPERTIES (
'primary-key'='col1,col2',
'file.format' = 'orc/parquet',
'bucket' = '2',
'bucket-key' = 'col3'
)];
Two types of paimon catalogs are supported.
(1) Create table with hive catalog:
CREATE TABLE paimon_hive_cat(userid INT,movieId INT)
STORED AS PAIMON;
(2) Create table with hadoop catalog:
CREATE [EXTERNAL] TABLE paimon_hadoop_cat
STORED AS PAIMON
TBLPROPERTIES('paimon.catalog'='hadoop',
'paimon.catalog_location'='/path/to/paimon_hadoop_catalog',
'paimon.table_identifier'='paimondb.paimontable');
The SHOW TABLE STATS, SHOW COLUMN STATS, SHOW PARTITIONS and SHOW FILES
statements are also supported.
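For example, a sketch using Impala's standard statement forms.
paimon_hive_cat is the table created above; paimon_part_tbl is a
hypothetical partitioned Paimon table used only for illustration:
SHOW TABLE STATS paimon_hive_cat;
SHOW COLUMN STATS paimon_hive_cat;
SHOW FILES IN paimon_hive_cat;
SHOW PARTITIONS paimon_part_tbl;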
TODO:
- Patches pending submission:
  - Query support for paimon data files.
  - Partition pruning and predicate push down.
  - Query support with time travel.
  - Query support for paimon meta tables.
- WIP:
  - Complex type query support.
  - Virtual Column query support for querying paimon data table.
  - Native paimon table scanner, instead of jni based.
Testing:
- Add unit test for paimon impala type conversion.
- Add unit test for ToSqlTest.java.
- Add unit test for AnalyzeDDLTest.java.
- Update default_file_format TestEnumCase in
be/src/service/query-options-test.cc.
- Update test case in
testdata/workloads/functional-query/queries/QueryTest/set.test.
- Add test cases in metadata/test_show_create_table.py.
- Add custom test test_paimon.py.
Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef
Reviewed-on: http://gerrit.cloudera.org:8080/22914
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
@@ -347,6 +347,7 @@ under the License.
|
||||
<topicref href="topics/impala_kudu.xml"/>
|
||||
<topicref href="topics/impala_hbase.xml"/>
|
||||
<topicref rev="4.1.0" href="topics/impala_iceberg.xml"/>
|
||||
<topicref rev="5.0.0" href="topics/impala_paimon.xml"/>
|
||||
<topicref href="topics/impala_s3.xml"/>
|
||||
<topicref rev="2.9.0" href="topics/impala_adls.xml"/>
|
||||
<topicref href="topics/impala_isilon.xml"/>
|
||||
|
||||
@@ -61,6 +61,10 @@ under the License.
|
||||
<topicmeta><linktext>the Apache Iceberg Puffin site</linktext></topicmeta>
|
||||
</keydef>
|
||||
|
||||
<keydef href="https://paimon.apache.org" scope="external" format="html" keys="upstream_paimon_site">
|
||||
<topicmeta><linktext>the Apache Paimon site</linktext></topicmeta>
|
||||
</keydef>
|
||||
|
||||
<keydef href="https://ozone.apache.org" scope="external" format="html" keys="upstream_ozone_site">
|
||||
<topicmeta><linktext>the Apache Ozone site</linktext></topicmeta>
|
||||
</keydef>
|
||||
|
||||
352
docs/topics/impala_paimon.xml
Normal file
@@ -0,0 +1,352 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one
|
||||
or more contributor license agreements. See the NOTICE file
|
||||
distributed with this work for additional information
|
||||
regarding copyright ownership. The ASF licenses this file
|
||||
to you under the Apache License, Version 2.0 (the
|
||||
"License"); you may not use this file except in compliance
|
||||
with the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing,
|
||||
software distributed under the License is distributed on an
|
||||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations
|
||||
under the License.
|
||||
-->
|
||||
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
|
||||
<concept id="impala_paimon">
|
||||
|
||||
<title id="paimon">Using Impala with Paimon Tables</title>
|
||||
<titlealts audience="PDF"><navtitle>Paimon Tables</navtitle></titlealts>
|
||||
<prolog>
|
||||
<metadata>
|
||||
<data name="Category" value="Impala"/>
|
||||
<data name="Category" value="Paimon"/>
|
||||
<data name="Category" value="Querying"/>
|
||||
<data name="Category" value="Data Analysts"/>
|
||||
<data name="Category" value="Developers"/>
|
||||
<data name="Category" value="Tables"/>
|
||||
</metadata>
|
||||
</prolog>
|
||||
|
||||
<conbody>
|
||||
|
||||
<p>
|
||||
<indexterm audience="hidden">Paimon</indexterm>
|
||||
Impala adds experimental support for Apache Paimon, an open table format for real-time lakehouse architectures.
With this functionality, you can access existing Paimon tables using SQL and perform
analytics over them. Impala currently supports the Hive catalog and the Hadoop catalog.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
For more information on Paimon, see <xref keyref="upstream_paimon_site"/>.
|
||||
</p>
|
||||
|
||||
<p outputclass="toc inpage"/>
|
||||
</conbody>
|
||||
|
||||
<concept id="paimon_features">
|
||||
<title>Overview of Paimon features</title>
|
||||
<prolog>
|
||||
<metadata>
|
||||
<data name="Category" value="Concepts"/>
|
||||
</metadata>
|
||||
</prolog>
|
||||
<conbody>
|
||||
<ul>
|
||||
<li>
|
||||
<b>Real time updates:</b>
|
||||
<ul>
|
||||
<li>
|
||||
Primary key tables support large-scale updates with very high update performance,
typically through Flink streaming.
|
||||
</li>
|
||||
<li>
|
||||
Supports configurable merge engines that control how records with the same key are combined:
deduplicate (keep the last row), partial-update, aggregation, or first-row.
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>
|
||||
<b>Data Lake Capabilities:</b>
|
||||
<ul>
|
||||
<li>
|
||||
Scalable metadata: supports petabyte-scale datasets and a large
number of partitions.
|
||||
</li>
|
||||
<li>
|
||||
Supports ACID transactions, time travel, and schema evolution.
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</conbody>
|
||||
</concept>
|
||||
|
||||
<concept id="paimon_create">
|
||||
|
||||
<title>Creating Paimon tables with Impala</title>
|
||||
<prolog>
|
||||
<metadata>
|
||||
<data name="Category" value="Concepts"/>
|
||||
</metadata>
|
||||
</prolog>
|
||||
|
||||
<conbody>
|
||||
<p>
|
||||
When you have an existing Paimon table that is not yet present in the Hive Metastore,
you can use the <codeph>CREATE EXTERNAL TABLE</codeph> command in Impala to add the table to the Hive
Metastore so that Impala can interact with it. Currently Impala supports
HadoopCatalog and HiveCatalog. If you have an existing table in HiveCatalog
and you are using the same Hive Metastore, no further action is needed.
|
||||
</p>
|
||||
<ul>
|
||||
<li>
|
||||
<b>HadoopCatalog</b>. A table in HadoopCatalog means that there is a catalog location
|
||||
in the file system under which Paimon tables are stored. Use the following command
|
||||
to add a table in a HadoopCatalog to Impala:
|
||||
<codeblock>
|
||||
CREATE EXTERNAL TABLE paimon_hadoop_cat
|
||||
STORED AS PAIMON
|
||||
TBLPROPERTIES('paimon.catalog'='hadoop',
|
||||
'paimon.catalog_location'='/path/to/paimon_hadoop_catalog',
|
||||
'paimon.table_identifier'='paimondb.paimontable');
|
||||
</codeblock>
|
||||
</li>
|
||||
<li>
|
||||
<b>HiveCatalog</b>. You can create a managed Paimon table in HMS as follows:
|
||||
<codeblock>
|
||||
CREATE TABLE paimon_hive_cat(userid INT,movieId INT)
|
||||
STORED AS PAIMON;
|
||||
</codeblock>
|
||||
</li>
|
||||
</ul>
|
||||
<p>
|
||||
<b>Syntax for creating Paimon tables</b>
|
||||
<codeblock>
|
||||
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
|
||||
(
|
||||
[col_name data_type ,...]
|
||||
[PRIMARY KEY (col1,col2)]
|
||||
)
|
||||
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
|
||||
STORED AS PAIMON
|
||||
[LOCATION 'hdfs_path']
|
||||
[TBLPROPERTIES (
|
||||
'primary-key'='col1,col2',
|
||||
'file.format' = 'orc/parquet',
|
||||
'bucket' = '2',
|
||||
'bucket-key' = 'col3'
)]
|
||||
</codeblock>
|
||||
</p>
|
||||
<ul>
|
||||
<li>
|
||||
<b>Create partitioned Paimon table example:</b>
|
||||
<codeblock>
|
||||
CREATE TABLE support_partitioned_by_table2 (
|
||||
user_id BIGINT COMMENT 'The user_id field',
|
||||
item_id BIGINT COMMENT 'The item_id field',
|
||||
behavior STRING COMMENT 'The behavior field'
|
||||
)
|
||||
PARTITIONED BY (
|
||||
dt STRING COMMENT 'The dt field',
|
||||
hh STRING COMMENT 'The hh field'
|
||||
)
|
||||
STORED AS PAIMON;
|
||||
</codeblock>
|
||||
</li>
|
||||
<li>
|
||||
<b>Create partitioned Paimon table example with primary key:</b>
|
||||
<codeblock>
|
||||
CREATE TABLE test_create_managed_part_pk_paimon_table (
|
||||
user_id BIGINT COMMENT 'The user_id field',
|
||||
item_id BIGINT COMMENT 'The item_id field',
|
||||
behavior STRING COMMENT 'The behavior field'
|
||||
)
|
||||
PARTITIONED BY (
|
||||
dt STRING COMMENT 'The dt field',
|
||||
hh STRING COMMENT 'The hh field'
|
||||
)
|
||||
STORED AS PAIMON
|
||||
TBLPROPERTIES (
|
||||
'primary-key'='user_id'
|
||||
);
|
||||
</codeblock>
|
||||
</li>
|
||||
<li>
|
||||
<b>Create bucketed Paimon table example:</b>
|
||||
<codeblock>
|
||||
CREATE TABLE test_create_managed_bucket_paimon_table (
|
||||
user_id BIGINT COMMENT 'The user_id field',
|
||||
item_id BIGINT COMMENT 'The item_id field',
|
||||
behavior STRING COMMENT 'The behavior field'
|
||||
)
|
||||
STORED AS PAIMON
|
||||
TBLPROPERTIES (
|
||||
'bucket' = '4',
|
||||
'bucket-key'='behavior'
|
||||
);
|
||||
</codeblock>
|
||||
</li>
|
||||
<li>
|
||||
<b>Create external Paimon table example with no column definitions:</b>
|
||||
<p>When creating an external table, you can omit the column definitions; Impala infers the schema from the underlying Paimon
table. For example:</p>
|
||||
<codeblock>
|
||||
CREATE EXTERNAL TABLE ext_paimon_table
|
||||
STORED AS PAIMON
|
||||
[LOCATION 'underlying_paimon_table_location'];
|
||||
</codeblock>
|
||||
</li>
|
||||
</ul>
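<p>
The inline <codeph>PRIMARY KEY</codeph> clause listed in the syntax is not used in the examples above.
The following is a minimal sketch, assuming the clause is accepted exactly as the syntax describes it.
The table name <codeph>paimon_pk_inline_example</codeph> and the choice of key columns are hypothetical:
</p>
<codeblock>
-- Hypothetical table: the primary key is declared inside the column list
-- rather than through the 'primary-key' table property.
CREATE TABLE IF NOT EXISTS paimon_pk_inline_example (
  user_id BIGINT COMMENT 'The user_id field',
  item_id BIGINT COMMENT 'The item_id field',
  behavior STRING COMMENT 'The behavior field',
  PRIMARY KEY (user_id, item_id)
)
STORED AS PAIMON
TBLPROPERTIES (
  'file.format' = 'parquet'
);
</codeblock>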
|
||||
</conbody>
|
||||
</concept>
|
||||
|
||||
<concept id="paimon_drop">
|
||||
<title>Dropping Paimon tables</title>
|
||||
<conbody>
|
||||
<p>
|
||||
Use the <codeph>DROP TABLE</codeph> statement to remove a Paimon table:
|
||||
<codeblock>
|
||||
DROP TABLE test_create_managed_bucket_paimon_table;
|
||||
</codeblock>
|
||||
</p>
|
||||
<p>
|
||||
When the <codeph>external.table.purge</codeph> table property is set to true, the
<codeph>DROP TABLE</codeph> statement also deletes the data files. Impala sets this property
to true when it creates the Paimon table via <codeph>CREATE TABLE</codeph>.
When <codeph>CREATE EXTERNAL TABLE</codeph> is used (the table already exists in some
catalog), <codeph>external.table.purge</codeph> is set to false, so
<codeph>DROP TABLE</codeph> removes only the table definition in HMS and leaves
the files in place.
|
||||
</p>
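<p>
To see which behavior applies before dropping a table, you can inspect its definition first.
This is a sketch, assuming that <codeph>SHOW CREATE TABLE</codeph> lists
<codeph>external.table.purge</codeph> among the table properties of a Paimon table:
</p>
<codeblock>
-- Inspect the table definition, including TBLPROPERTIES, before dropping.
SHOW CREATE TABLE test_create_managed_bucket_paimon_table;

-- IF EXISTS avoids an error if the table has already been removed.
DROP TABLE IF EXISTS test_create_managed_bucket_paimon_table;
</codeblock>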
|
||||
</conbody>
|
||||
</concept>
|
||||
|
||||
<concept id="paimon_types">
|
||||
<title>Supported Data Types for Paimon Columns</title>
|
||||
<conbody>
|
||||
|
||||
<p>
|
||||
You can get information about the supported Paimon data types in
|
||||
<xref href="https://paimon.apache.org/docs/1.1/concepts/data-types/" scope="external" format="html">
|
||||
the Paimon spec</xref>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The Paimon data types can be mapped to the following SQL types in Impala:
|
||||
<table rowsep="1" colsep="1" id="paimon_types_sql_types">
|
||||
<tgroup cols="2">
|
||||
<colspec colname="c1" colnum="1"/>
|
||||
<colspec colname="c2" colnum="2"/>
|
||||
<thead>
|
||||
<row>
|
||||
<entry>Paimon type</entry>
|
||||
<entry>SQL type in Impala</entry>
|
||||
</row>
|
||||
</thead>
|
||||
<tbody>
|
||||
<row>
|
||||
<entry>BOOLEAN</entry>
|
||||
<entry>BOOLEAN</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>TINYINT</entry>
|
||||
<entry>TINYINT</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>SMALLINT</entry>
|
||||
<entry>SMALLINT</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>INT</entry>
|
||||
<entry>INTEGER</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>BIGINT</entry>
|
||||
<entry>BIGINT</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>FLOAT</entry>
|
||||
<entry>FLOAT</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>DOUBLE</entry>
|
||||
<entry>DOUBLE</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>STRING</entry>
|
||||
<entry>STRING</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>DECIMAL(P,S)</entry>
|
||||
<entry>DECIMAL(P,S)</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>TIMESTAMP</entry>
|
||||
<entry>TIMESTAMP</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>TIMESTAMP(*WITH*TIMEZONE)</entry>
|
||||
<entry>Not Supported</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>CHAR(N)</entry>
|
||||
<entry>CHAR(N)</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>VARCHAR(N)</entry>
|
||||
<entry>VARCHAR(N)</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>BINARY(N)</entry>
|
||||
<entry>BINARY(N)</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>VARBINARY(N)</entry>
|
||||
<entry>BINARY(N)</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>DATE</entry>
|
||||
<entry>DATE</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>TIME</entry>
|
||||
<entry>Not Supported</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>Not Supported</entry>
|
||||
<entry>DATETIME</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>MULTISET&lt;t&gt;</entry>
|
||||
<entry>Not Supported</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>ARRAY&lt;t&gt;</entry>
|
||||
<entry>Not Supported For Now</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>MAP&lt;kt, vt&gt;</entry>
|
||||
<entry>Not Supported For Now</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>ROW&lt;n1 t1, n2 t2&gt;</entry>
|
||||
<entry>Not Supported For Now</entry>
|
||||
</row>
|
||||
</tbody>
|
||||
</tgroup>
|
||||
</table>
|
||||
</p>
|
||||
<p>
|
||||
Note: a type that has no counterpart in Paimon or in Impala is marked "Not Supported".
Types marked "Not Supported For Now" are planned to be supported later.
|
||||
</p>
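<p>
As an illustration of the mapping above, the following sketch declares one column for each
Impala type that has a Paimon counterpart. The table name <codeph>paimon_all_types_example</codeph>,
the column names, and the precision and length values are hypothetical:
</p>
<codeblock>
-- Hypothetical table covering the supported scalar types.
CREATE TABLE paimon_all_types_example (
  c_boolean   BOOLEAN,
  c_tinyint   TINYINT,
  c_smallint  SMALLINT,
  c_int       INT,
  c_bigint    BIGINT,
  c_float     FLOAT,
  c_double    DOUBLE,
  c_string    STRING,
  c_decimal   DECIMAL(10,2),
  c_timestamp TIMESTAMP,
  c_char      CHAR(10),
  c_varchar   VARCHAR(32),
  c_binary    BINARY,
  c_date      DATE
)
STORED AS PAIMON;
</codeblock>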
|
||||
</conbody>
|
||||
</concept>
|
||||
</concept>
|
||||
@@ -1588,6 +1588,13 @@ See the history, any recent changes, here:
|
||||
<entry>X<fn>Impala 4.0 and higher</fn></entry>
|
||||
<entry/>
|
||||
</row>
|
||||
<row>
|
||||
<entry><codeph>paimon</codeph></entry>
|
||||
<entry/>
|
||||
<entry/>
|
||||
<entry>X<fn>Impala 5.0 and higher</fn></entry>
|
||||
<entry/>
|
||||
</row>
|
||||
<row>
|
||||
<entry><codeph>identity</codeph></entry>
|
||||
<entry>X</entry>
|
||||
|
||||