mirror of
https://github.com/apache/impala.git
synced 2025-12-19 09:58:28 -05:00
This patch mainly implements creating and dropping Paimon tables
through Impala.
Supported impala data types:
- BOOLEAN
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- STRING
- DECIMAL(P,S)
- TIMESTAMP
- CHAR(N)
- VARCHAR(N)
- BINARY
- DATE
Syntax for creating paimon table:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(
[col_name data_type ,...]
[PRIMARY KEY (col1,col2)]
)
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
STORED AS PAIMON
[LOCATION 'hdfs_path']
[TBLPROPERTIES (
'primary-key'='col1,col2',
'file.format' = 'orc/parquet',
'bucket' = '2',
'bucket-key' = 'col3'
)];
Two types of paimon catalogs are supported.
(1) Create table with hive catalog:
CREATE TABLE paimon_hive_cat(userid INT,movieId INT)
STORED AS PAIMON;
(2) Create table with hadoop catalog:
CREATE [EXTERNAL] TABLE paimon_hadoop_cat
STORED AS PAIMON
TBLPROPERTIES('paimon.catalog'='hadoop',
'paimon.catalog_location'='/path/to/paimon_hadoop_catalog',
'paimon.table_identifier'='paimondb.paimontable');
SHOW TABLE STATS/SHOW COLUMN STATS/SHOW PARTITIONS/SHOW FILES
statements are also supported.
TODO:
- Patches pending submission:
- Query support for paimon data files.
- Partition pruning and predicate push down.
- Query support with time travel.
- Query support for paimon meta tables.
- WIP:
- Complex type query support.
- Virtual Column query support for querying
paimon data table.
- Native paimon table scanner, instead of
jni based.
Testing:
- Add unit test for paimon impala type conversion.
- Add unit test for ToSqlTest.java.
- Add unit test for AnalyzeDDLTest.java.
- Update default_file_format TestEnumCase in
be/src/service/query-options-test.cc.
- Update test case in
testdata/workloads/functional-query/queries/QueryTest/set.test.
- Add test cases in metadata/test_show_create_table.py.
- Add custom test test_paimon.py.
Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef
Reviewed-on: http://gerrit.cloudera.org:8080/22914
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
353 lines
11 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="impala_paimon">

  <title id="paimon">Using Impala with Paimon Tables</title>
  <titlealts audience="PDF"><navtitle>Paimon Tables</navtitle></titlealts>
  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="Paimon"/>
      <data name="Category" value="Querying"/>
      <data name="Category" value="Data Analysts"/>
      <data name="Category" value="Developers"/>
      <data name="Category" value="Tables"/>
    </metadata>
  </prolog>

  <conbody>

    <p>
      <indexterm audience="hidden">Paimon</indexterm>
      Impala has experimental support for Apache Paimon, an open table format for real-time
      lakehouse workloads. With this functionality, you can access existing Paimon tables
      using SQL and perform analytics over them. Impala currently supports Paimon tables
      in a Hive catalog or a Hadoop catalog.
    </p>

    <p>
      For more information on Paimon, see <xref keyref="upstream_paimon_site"/>.
    </p>

    <p outputclass="toc inpage"/>
  </conbody>

  <concept id="paimon_features">
    <title>Overview of Paimon features</title>
    <prolog>
      <metadata>
        <data name="Category" value="Concepts"/>
      </metadata>
    </prolog>
    <conbody>
      <ul>
        <li>
          <b>Real-time updates:</b>
          <ul>
            <li>
              Primary key tables support large-scale updates with high update performance,
              typically written through Flink streaming.
            </li>
            <li>
              Merge engines let you choose how records with the same primary key are combined:
              deduplicate to keep the last row, partial-update, aggregate records, or keep the first row.
            </li>
          </ul>
        </li>
        <li>
          <b>Data lake capabilities:</b>
          <ul>
            <li>
              Scalable metadata: supports petabyte-scale datasets and a large
              number of partitions.
            </li>
            <li>
              Supports ACID transactions, time travel, and schema evolution.
            </li>
          </ul>
        </li>
      </ul>
    </conbody>
  </concept>

  <concept id="paimon_create">

    <title>Creating Paimon tables with Impala</title>
    <prolog>
      <metadata>
        <data name="Category" value="Concepts"/>
      </metadata>
    </prolog>

    <conbody>
      <p>
        When you have an existing Paimon table that is not yet present in the Hive Metastore,
        you can use the <codeph>CREATE EXTERNAL TABLE</codeph> command in Impala to add the table to the Hive
        Metastore so that Impala can interact with it. Currently Impala supports
        HadoopCatalog and HiveCatalog. If you have an existing table in HiveCatalog,
        and you are using the same Hive Metastore, no further action is needed.
      </p>
      <ul>
        <li>
          <b>HadoopCatalog</b>. A table in HadoopCatalog means that there is a catalog location
          in the file system under which Paimon tables are stored. Use the following command
          to add a table in a HadoopCatalog to Impala:
<codeblock>
CREATE EXTERNAL TABLE paimon_hadoop_cat
STORED AS PAIMON
TBLPROPERTIES('paimon.catalog'='hadoop',
'paimon.catalog_location'='/path/to/paimon_hadoop_catalog',
'paimon.table_identifier'='paimondb.paimontable');
</codeblock>
        </li>
        <li>
          <b>HiveCatalog</b>. You can create a managed Paimon table in HMS as follows:
<codeblock>
CREATE TABLE paimon_hive_cat(userid INT,movieId INT)
STORED AS PAIMON;
</codeblock>
        </li>
      </ul>
      <p>
        <b>Syntax for creating Paimon tables</b>
<codeblock>
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(
  [col_name data_type, ...]
  [PRIMARY KEY (col1,col2)]
)
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
STORED AS PAIMON
[LOCATION 'hdfs_path']
[TBLPROPERTIES (
  'primary-key'='col1,col2',
  'file.format' = 'orc/parquet',
  'bucket' = '2',
  'bucket-key' = 'col3'
)]
</codeblock>
      </p>
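      <p>
        The syntax above also allows a <codeph>PRIMARY KEY</codeph> clause inside the column list
        instead of the <codeph>'primary-key'</codeph> table property. The following is a minimal sketch
        under that reading of the syntax: the table name <codeph>paimon_pk_example</codeph> and its columns
        are hypothetical, and the comma before <codeph>PRIMARY KEY</codeph> is an assumption based on how
        Impala separates items in a column list.
<codeblock>
-- Hypothetical table; names are for illustration only.
CREATE TABLE IF NOT EXISTS paimon_pk_example (
  user_id BIGINT,
  item_id BIGINT,
  behavior STRING,
  PRIMARY KEY (user_id, item_id)
)
STORED AS PAIMON
TBLPROPERTIES (
  'file.format' = 'parquet',
  'bucket' = '2'
);
</codeblock>
      </p>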
      <ul>
        <li>
          <b>Create a partitioned Paimon table:</b>
<codeblock>
CREATE TABLE support_partitioned_by_table2 (
  user_id BIGINT COMMENT 'The user_id field',
  item_id BIGINT COMMENT 'The item_id field',
  behavior STRING COMMENT 'The behavior field'
)
PARTITIONED BY (
  dt STRING COMMENT 'The dt field',
  hh STRING COMMENT 'The hh field'
)
STORED AS PAIMON;
</codeblock>
        </li>
        <li>
          <b>Create a partitioned Paimon table with a primary key:</b>
<codeblock>
CREATE TABLE test_create_managed_part_pk_paimon_table (
  user_id BIGINT COMMENT 'The user_id field',
  item_id BIGINT COMMENT 'The item_id field',
  behavior STRING COMMENT 'The behavior field'
)
PARTITIONED BY (
  dt STRING COMMENT 'The dt field',
  hh STRING COMMENT 'The hh field'
)
STORED AS PAIMON
TBLPROPERTIES (
  'primary-key'='user_id'
);
</codeblock>
        </li>
        <li>
          <b>Create a bucketed Paimon table:</b>
<codeblock>
CREATE TABLE test_create_managed_bucket_paimon_table (
  user_id BIGINT COMMENT 'The user_id field',
  item_id BIGINT COMMENT 'The item_id field',
  behavior STRING COMMENT 'The behavior field'
)
STORED AS PAIMON
TBLPROPERTIES (
  'bucket' = '4',
  'bucket-key'='behavior'
);
</codeblock>
        </li>
        <li>
          <b>Create an external Paimon table with no column definitions:</b>
          <p>When creating an external table, you can omit the column definitions. Impala infers the
            schema from the underlying Paimon table. For example:</p>
<codeblock>
CREATE EXTERNAL TABLE ext_paimon_table
STORED AS PAIMON
[LOCATION 'underlying_paimon_table_location'];
</codeblock>
        </li>
      </ul>
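      <p>
        The <codeph>SHOW TABLE STATS</codeph>, <codeph>SHOW COLUMN STATS</codeph>,
        <codeph>SHOW PARTITIONS</codeph>, and <codeph>SHOW FILES</codeph> statements are also supported
        for Paimon tables. The following sketch reuses the partitioned primary key table from the
        examples above. The output depends on the table contents and is not shown here.
<codeblock>
-- Inspect table-level and column-level statistics.
SHOW TABLE STATS test_create_managed_part_pk_paimon_table;
SHOW COLUMN STATS test_create_managed_part_pk_paimon_table;

-- List the partitions and the underlying data files.
SHOW PARTITIONS test_create_managed_part_pk_paimon_table;
SHOW FILES IN test_create_managed_part_pk_paimon_table;
</codeblock>
      </p>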
    </conbody>
  </concept>

  <concept id="paimon_drop">
    <title>Dropping Paimon tables</title>
    <conbody>
      <p>
        Use the <codeph>DROP TABLE</codeph> statement to remove a Paimon table:
<codeblock>
DROP TABLE test_create_managed_bucket_paimon_table;
</codeblock>
      </p>
      <p>
        When the <codeph>external.table.purge</codeph> table property is set to true, the
        <codeph>DROP TABLE</codeph> statement also deletes the data files. This property
        is set to true when Impala creates the Paimon table via <codeph>CREATE TABLE</codeph>.
        When <codeph>CREATE EXTERNAL TABLE</codeph> is used (the table already exists in some
        catalog), <codeph>external.table.purge</codeph> is set to false, so
        <codeph>DROP TABLE</codeph> removes only the table definition in HMS and leaves
        the files in place.
      </p>
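      <p>
        Before dropping a table, it can help to check which behavior applies. The following is a
        minimal sketch, assuming that <codeph>DESCRIBE FORMATTED</codeph> lists table properties for
        Paimon tables the same way it does for other Impala tables. That assumption is not confirmed
        by this document.
<codeblock>
-- Look for external.table.purge in the "Table Parameters" section of the output.
DESCRIBE FORMATTED test_create_managed_bucket_paimon_table;

-- If the property is true, this also deletes the data files.
DROP TABLE test_create_managed_bucket_paimon_table;
</codeblock>
      </p>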
    </conbody>
  </concept>

  <concept id="paimon_types">
    <title>Supported Data Types for Paimon Columns</title>
    <conbody>

      <p>
        For information about the Paimon data types, see
        <xref href="https://paimon.apache.org/docs/1.1/concepts/data-types/" scope="external" format="html">
        the Paimon spec</xref>.
      </p>

      <p>
        The Paimon data types map to the following SQL types in Impala:
        <table rowsep="1" colsep="1" id="paimon_types_sql_types">
          <tgroup cols="2">
            <colspec colname="c1" colnum="1"/>
            <colspec colname="c2" colnum="2"/>
            <thead>
              <row>
                <entry>Paimon type</entry>
                <entry>SQL type in Impala</entry>
              </row>
            </thead>
            <tbody>
              <row>
                <entry>BOOLEAN</entry>
                <entry>BOOLEAN</entry>
              </row>
              <row>
                <entry>TINYINT</entry>
                <entry>TINYINT</entry>
              </row>
              <row>
                <entry>SMALLINT</entry>
                <entry>SMALLINT</entry>
              </row>
              <row>
                <entry>INT</entry>
                <entry>INTEGER</entry>
              </row>
              <row>
                <entry>BIGINT</entry>
                <entry>BIGINT</entry>
              </row>
              <row>
                <entry>FLOAT</entry>
                <entry>FLOAT</entry>
              </row>
              <row>
                <entry>DOUBLE</entry>
                <entry>DOUBLE</entry>
              </row>
              <row>
                <entry>STRING</entry>
                <entry>STRING</entry>
              </row>
              <row>
                <entry>DECIMAL(P,S)</entry>
                <entry>DECIMAL(P,S)</entry>
              </row>
              <row>
                <entry>TIMESTAMP</entry>
                <entry>TIMESTAMP</entry>
              </row>
              <row>
                <entry>TIMESTAMP WITH LOCAL TIME ZONE</entry>
                <entry>Not Supported</entry>
              </row>
              <row>
                <entry>CHAR(N)</entry>
                <entry>CHAR(N)</entry>
              </row>
              <row>
                <entry>VARCHAR(N)</entry>
                <entry>VARCHAR(N)</entry>
              </row>
              <row>
                <entry>BINARY(N)</entry>
                <entry>BINARY</entry>
              </row>
              <row>
                <entry>VARBINARY(N)</entry>
                <entry>BINARY</entry>
              </row>
              <row>
                <entry>DATE</entry>
                <entry>DATE</entry>
              </row>
              <row>
                <entry>TIME</entry>
                <entry>Not Supported</entry>
              </row>
              <row>
                <entry>Not Supported</entry>
                <entry>DATETIME</entry>
              </row>
              <row>
                <entry>MULTISET&lt;t&gt;</entry>
                <entry>Not Supported</entry>
              </row>
              <row>
                <entry>ARRAY&lt;t&gt;</entry>
                <entry>Not Supported For Now</entry>
              </row>
              <row>
                <entry>MAP&lt;kt, vt&gt;</entry>
                <entry>Not Supported For Now</entry>
              </row>
              <row>
                <entry>ROW&lt;n1 t1,n2 t2&gt;</entry>
                <entry>Not Supported For Now</entry>
              </row>
            </tbody>
          </tgroup>
        </table>
      </p>
      <p>
        Note: A type that Paimon or Impala does not support is marked "Not Supported".
        Types marked "Not Supported For Now" will be supported later.
      </p>
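      <p>
        As an illustration of the mapping, the following sketch creates a Paimon table with one
        column for each supported Impala type from the table above. The table name
        <codeph>paimon_types_example</codeph>, the column names, and the precision and length values
        are hypothetical and chosen only for this example.
<codeblock>
-- Hypothetical table covering the supported scalar types.
CREATE TABLE paimon_types_example (
  c_boolean BOOLEAN,
  c_tinyint TINYINT,
  c_smallint SMALLINT,
  c_int INT,
  c_bigint BIGINT,
  c_float FLOAT,
  c_double DOUBLE,
  c_string STRING,
  c_decimal DECIMAL(10,2),
  c_timestamp TIMESTAMP,
  c_char CHAR(10),
  c_varchar VARCHAR(32),
  c_binary BINARY,
  c_date DATE
)
STORED AS PAIMON;
</codeblock>
      </p>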
    </conbody>
  </concept>
</concept>