With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.
Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU
Changes:
1) Remove the requirement to specify table properties such as key
columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
time of creation in Impala.
4) Disallow table properties that could conflict with an existing
table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
addresses. The flag is used as the default value for the table
property kudu_master_addresses but it can still be overriden
using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
wasn't implemented for Kudu tables and silently ignored. The Kudu
tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
Kudu) the existence of the other delegate and the use of delegates in
general has led to confusion. The Kudu delegate only exists to provide
functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
standard. When used at the column level, only one column can be
marked as a key. When used at the table level, multiple columns can
be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
The old "kudu.key_columns" table property is no longer accepted
though it is still used internally. "PRIMARY" is now a keyword.
The ident style declaration is used for "KEY" because it is also used
for nested map types.
10) For managed tables, infer a Kudu table name if none was given.
The table property "kudu.table_name" is optional for managed tables
and is required for external tables. If for a managed table a Kudu
table name is not provided, a table name will be generated based
on the HMS database and table name.
11) Use Kudu master as the source of truth for table metadata instead
of HMS when a table is loaded or refreshed. Table/column metadata
are cached in the catalog and are stored in HMS in order to be
able to use table and column statistics.
Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Reviewed-on: http://gerrit.cloudera.org:8080/4414
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins