impala

mirror of https://github.com/apache/impala.git synced 2026-01-04 09:00:56 -05:00

Author	SHA1	Message	Date
Matthew Jacobs	2dcbefc652	IMPALA-5338: Fix Kudu timestamp column default values While support for TIMESTAMP columns in Kudu tables has been committed (IMPALA-5137), it does not support TIMESTAMP column default values. This adds CREATE TABLE syntax for specifying the default values, but more importantly it fixes the loading of Kudu tables that may have had default values set on UNIXTIME_MICROS columns, e.g. if the table was created via the Python client. This involves fixing KuduColumn to hide the LiteralExpr representing the default value, because it will be a BIGINT if the column type is TIMESTAMP. The LiteralExpr is only needed to call toSql() and toStringValue(), so helper functions are added to KuduColumn to encapsulate the special logic for TIMESTAMP. TODO: Add support and tests for ALTER setting the default value (when IMPALA-4622 is committed). Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2 Reviewed-on: http://gerrit.cloudera.org:8080/6936 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2017-06-02 01:47:48 +00:00
Matthew Jacobs	6226e59702	IMPALA-5137: Support TIMESTAMPs in Kudu range predicate DDL Adds support in DDL for timestamps in Kudu range partition syntax. For convenience, strings can be specified with or without explicit casts to TIMESTAMP. E.g. `create table ts_ranges (ts timestamp primary key, i int) partition by range ( partition '2009-01-02 00:00:00' <= VALUES < '2009-01-03 00:00:00' ) stored as kudu`. Range bounds are converted to Kudu UNIXTIME_MICROS during analysis. Testing: Adds FE and EE tests. Change-Id: Iae409b6106c073b038940f0413ed9d5859daaeff Reviewed-on: http://gerrit.cloudera.org:8080/6849 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2017-05-19 00:41:46 +00:00
Matthew Jacobs	878fcf5a74	IMPALA-5111: Fix check when creating NOT NULL PK col in Kudu The fix for IMPALA-4616 broke the ability to explicitly declare a primary key column in a Kudu table as 'NOT NULL'. While this is the default, it should be possible to specify it. The precondition that was failing was fixed, and some tests were added/modified. Change-Id: I557eea7cd994d6a2ed38893d283d08107e78f789 Reviewed-on: http://gerrit.cloudera.org:8080/6465 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-24 21:22:50 +00:00
Dimitris Tsirogiannis	5ea1798661	IMPALA-4619: Allow NULL as default value in Kudu tables This commit fixes an issue where an error is thrown if the default value for a Kudu column is set to NULL. Change-Id: Ida27ce56f1dd7603485a69c680db3bcea6702aff Reviewed-on: http://gerrit.cloudera.org:8080/5405 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-12-08 04:53:38 +00:00
Dan Burkert	f83652c1da	Replace INTO N BUCKETS with PARTITIONS N in CREATE TABLE This commit also removes the `DISTRIBUTE`, `SPLIT`, and `BUCKETS` keywords, which were going to be newly released in Impala 2.6 but are now unused. Additionally, a few remaining uses of the `DISTRIBUTE BY` syntax have been switched to `PARTITION BY`. Change-Id: I32fdd5ef26c532f7a30220db52bdfbf228165922 Reviewed-on: http://gerrit.cloudera.org:8080/5382 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-12-07 07:31:16 +00:00
Matthew Jacobs	5188f879a7	IMPALA-4477: Bump Kudu version to latest master (60aa54e) Bumps the toolchain version to get a newer Kudu build. Also fixes test failures resulting from changes in Kudu. Notably, error strings have changed (IMPALA-4590) and the number of replicas must be odd (IMPALA-4589). Note: The toolchain binaries starting with this build are now using the toolchain binutils rather than the system binutils. Testing: private exhaustive build. Change-Id: If1912f058c240fbe82b06f77e31add7755289be1 Reviewed-on: http://gerrit.cloudera.org:8080/5369 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-12-07 05:11:13 +00:00
Dimitris Tsirogiannis	cba93f1ac3	IMPALA-4561: Replace DISTRIBUTE BY with PARTITION BY in CREATE TABLE Change-Id: I0e07c41eabb4c8cb95754cf04293cbd9e03d6ab2 Reviewed-on: http://gerrit.cloudera.org:8080/5317 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 10:41:53 +00:00
Thomas Tauber-Marshall	3833707dbd	IMPALA-4466: Improve Kudu CRUD test coverage The results in the test files were verified by hand. This patch also introduces a new test section 'DML_RESULTS', which takes the name of a table as a comment and the contents of the table as its body and then verifies that the body matches the actual contents of the table. This makes it easy to check that a DML operation has the desired effect on the contents of a table, rather than always having to add another test case that runs a select on the table. For now, this section cannot be used in a test along with the RESULTS or ERRORS sections. TODO: Refactor the DML test case handling (IMPALA-4471) Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6 Reviewed-on: http://gerrit.cloudera.org:8080/4953 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-11-17 02:54:30 +00:00
Dimitris Tsirogiannis	d802f321b2	IMPALA-3724: Support Kudu non-covering range partitions This commit adds support for non-covering range partitions in Kudu tables. The SPLIT ROWS clause is now deprecated and no longer supported. The following new syntax provides more flexibility in creating range partitions: it supports bounded and unbounded ranges as well as single-value partitions, and multi-column range partitions are supported as well. The new syntax is: `DISTRIBUTE BY RANGE (col_list) ( PARTITION lower_1 <[=] VALUES <[=] upper_1, PARTITION lower_2 <[=] VALUES <[=] upper_2, .... PARTITION lower_n <[=] VALUES <[=] upper_n, PARTITION VALUE = val_1, .... PARTITION VALUE = val_n )` Multi-column range partitions are specified as follows: `DISTRIBUTE BY RANGE (col1, col2,..., coln) ( PARTITION VALUE = (col1_val, col2_val, ..., coln_val), .... PARTITION VALUE = (col1_val, col2_val, ..., coln_val) )` Change-Id: I6799c01a37003f0f4c068d911a13e3f060110a06 Reviewed-on: http://gerrit.cloudera.org:8080/4856 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-11-04 22:02:22 +00:00
Matthew Jacobs	9b507b6ed6	IMPALA-4379: Fix and test Kudu table type checking Creating Kudu tables shouldn't allow types not supported by Kudu (e.g. VARCHAR/CHAR, DECIMAL, TIMESTAMP, collection types). The behavior is inconsistent: for some types it throws in the catalog, for VARCHAR/CHAR these become strings. This changes behavior so that all fail during analysis. Analysis tests were added. Similarly, external tables cannot contain Kudu types that Impala doesn't support (e.g. UNIXTIME_MICROS, BINARY). Tests were added to validate this behavior. Note that this required upgrading the python Kudu client. This also fixes a small corner case with ALTER TABLE: ALTER TABLE shouldn't allow Kudu tables to change the storage descriptor tblproperty, otherwise the table metadata gets in an inconsistent state. Tests were added for all of the above. Change-Id: I475273cbbf4110db8d0f78ddf9a56abfc6221e3e Reviewed-on: http://gerrit.cloudera.org:8080/4857 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2016-10-31 16:03:54 +00:00
Dimitris Tsirogiannis	041fa6d946	IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: `CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU` Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses, but it can still be overridden using TBLPROPERTIES. 7) Fix a post-merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and was silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu); the existence of the other delegate and the use of delegates in general had led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted, though it is still used internally. "PRIMARY" is now a keyword. The ident style declaration is used for "KEY" because it is also used for nested map types. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If a Kudu table name is not provided for a managed table, a table name will be generated based on the HMS database and table name. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Reviewed-on: http://gerrit.cloudera.org:8080/4414 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-10-21 10:52:25 +00:00

11 Commits
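IMPALA-5137 and IMPALA-5338 both depend on mapping an Impala TIMESTAMP to Kudu's UNIXTIME_MICROS (microseconds since the Unix epoch) and back: range bounds are converted during analysis, and a stored default value comes back as a BIGINT that must be rendered as a timestamp. The following is a minimal Python sketch of that mapping, not Impala's own code. It assumes timestamps are interpreted as UTC and written as 'YYYY-MM-DD HH:MM:SS[.ffffff]'; the function names `timestamp_to_unixtime_micros`, `unixtime_micros_to_timestamp`, and `in_range` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch in UTC; UNIXTIME_MICROS values count microseconds from this instant.
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_ONE_MICRO = timedelta(microseconds=1)


def timestamp_to_unixtime_micros(ts: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS[.ffffff]' to microseconds since the epoch.

    Assumption (hypothetical helper): the timestamp is interpreted as UTC.
    """
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in ts else "%Y-%m-%d %H:%M:%S"
    dt = datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc)
    # Dividing one timedelta by another is exact integer arithmetic (no floats),
    # and it also handles pre-1970 values, which come out negative.
    return (dt - _EPOCH) // _ONE_MICRO


def unixtime_micros_to_timestamp(micros: int) -> str:
    """Render a BIGINT UNIXTIME_MICROS value (e.g. a stored default) as a string.

    Assumption: fractional seconds are printed only when non-zero.
    """
    dt = _EPOCH + timedelta(microseconds=micros)
    fmt = "%Y-%m-%d %H:%M:%S.%f" if dt.microsecond else "%Y-%m-%d %H:%M:%S"
    return dt.strftime(fmt)


def in_range(ts: str, lower: str, upper: str) -> bool:
    """Check lower <= ts < upper by comparing the converted UNIXTIME_MICROS values."""
    value = timestamp_to_unixtime_micros(ts)
    return (timestamp_to_unixtime_micros(lower)
            <= value
            < timestamp_to_unixtime_micros(upper))


if __name__ == "__main__":
    # Bounds from the ts_ranges example in the IMPALA-5137 message.
    lower_us = timestamp_to_unixtime_micros("2009-01-02 00:00:00")
    upper_us = timestamp_to_unixtime_micros("2009-01-03 00:00:00")
    print(lower_us, upper_us)  # 1230854400000000 1230940800000000
    print(unixtime_micros_to_timestamp(lower_us))  # 2009-01-02 00:00:00
    print(in_range("2009-01-02 12:00:00",
                   "2009-01-02 00:00:00", "2009-01-03 00:00:00"))  # True
```

Comparing the converted integers rather than the strings mirrors the point of converting bounds up front: once analysis has produced UNIXTIME_MICROS values, range checks are plain integer comparisons.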