impala

mirror of https://github.com/apache/impala.git synced 2026-01-06 15:01:43 -05:00

Author	SHA1	Message	Date
Matthew Jacobs	24c77f194b	IMPALA-5137: Support pushing TIMESTAMP predicates to Kudu This change builds on the support for reading and writing TIMESTAMP columns to Kudu tables (see [1]), adding support for pushing TIMESTAMP predicates to Kudu for scans. Binary predicates and IN list predicates are supported. Testing: Added some planner and EE tests to validate the behavior. 1: https://gerrit.cloudera.org/#/c/6526/ Change-Id: I08b6c8354a408e7beb94c1a135c23722977246ea Reviewed-on: http://gerrit.cloudera.org:8080/6789 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2017-05-18 21:09:51 +00:00
Joe McDonnell	077c07eec7	IMPALA-4859: Push down IS NULL / IS NOT NULL to Kudu This detects IS NULL / IS NOT NULL and creates a Kudu predicate to push this to Kudu. For testing, there are planner tests to verify that the predicate is pushed to Kudu. There are also end-to-end tests for correctness. Change-Id: I9c96fec8d41f77222879c0ffdd6940b168e47e65 Reviewed-on: http://gerrit.cloudera.org:8080/5958 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-25 04:51:36 +00:00
Dan Burkert	f83652c1da	Replace INTO N BUCKETS with PARTITIONS N in CREATE TABLE This commit also removes the `DISTRIBUTE`, `SPLIT`, and `BUCKETS` keywords, which were going to be newly released in Impala 2.6 but are now unused. Additionally, a few remaining uses of the `DISTRIBUTE BY` syntax have been switched to `PARTITION BY`. Change-Id: I32fdd5ef26c532f7a30220db52bdfbf228165922 Reviewed-on: http://gerrit.cloudera.org:8080/5382 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-12-07 07:31:16 +00:00
Dimitris Tsirogiannis	cba93f1ac3	IMPALA-4561: Replace DISTRIBUTE BY with PARTITION BY in CREATE TABLE Change-Id: I0e07c41eabb4c8cb95754cf04293cbd9e03d6ab2 Reviewed-on: http://gerrit.cloudera.org:8080/5317 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-12-06 10:41:53 +00:00
Alex Behm	4918b20ac0	IMPALA-4408: Omit null bytes for Kudu scans with no nullable slots. Kudu does not allocate null bytes if all projected columns are non-nullable. Otherwise, Kudu allocates a null bit for all columns, even the non-nullable ones. The bug was that Impala's memory layout did not match the first requirement. Change-Id: I762ad9d5cc4198922ea4b5218c504fde355c49a5 Reviewed-on: http://gerrit.cloudera.org:8080/4892 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-01 01:47:30 +00:00
Dimitris Tsirogiannis	041fa6d946	IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overridden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and was silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu); the existence of the other delegate and the use of delegates in general have led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. The ident style declaration is used for "KEY" because it is also used for nested map types. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Reviewed-on: http://gerrit.cloudera.org:8080/4414 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-10-21 10:52:25 +00:00
Matthew Jacobs	d113205cee	IMPALA-3650: DISTRIBUTE BY required for managed Kudu tables As of Kudu 0.9, DISTRIBUTE BY is now required when creating a new Kudu table. Create table analysis, data loading, and tests are updated to reflect this. This also bumps the Kudu version to 0.10.0. Change-Id: Ieb15110b10b28ef6dd8ec136c2522b5f44dca43e Reviewed-on: http://gerrit.cloudera.org:8080/3987 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-08-19 02:14:39 +00:00
casey	8c224398bb	IMPALA-2635: Kudu scanner hangs on UNION A UNION is special because it may cause a scan node to be started without any scan ranges. The Kudu scanner didn't expect that scenario and would hang waiting for data from scanner threads that would never be started. The fix is to exit early when there are no scan ranges. Change-Id: Id53fb880ba23ee9bbcf3169598f97fa1a3285dd9 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/10044 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: jenkins	2016-01-28 21:49:39 -08:00
casey	84ce1e22af	IMPALA-2740: Kudu scanner - Reset tuple's null bits after filtering row The problem was, if a tuple was filtered, the bits indicating values are NULL were not reset and the tuple's memory was reused. So NULLs from consecutively filtered rows would accumulate in the tuple. The fix is to always reset the NULL bits (as it doesn't matter whether the row was filtered). Change-Id: Ib4d980980e02bf2c82dc229a8ed1ada16bb8174f Reviewed-on: http://gerrit.sjc.cloudera.com:8080/9958 Tested-by: jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>	2015-12-28 11:34:22 -08:00
Martin Grund	23ca2f01ad	Execution of Update This patch adds the backend implementation to the update. It reuses the Kudu table sink and simply changes the KuduWriteOperation type to Update. Change-Id: I31e524210b9401d4619ab0f892d9fb044b6dfdea Reviewed-on: http://gerrit.sjc.cloudera.com:8080/6999 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: jenkins	2015-08-08 07:02:37 -07:00
David Alves	ee8d830d7a	Frontend part of Kudu predicate pushdown This adds the frontend part of Kudu predicate pushdown. Namely, it goes through all the predicates that are assigned to the KuduScanNode and selects those that are pushable to Kudu (binary predicates: <=, >= and = that have a constant on one side and a slot ref on the other). Pushable predicates are then set on TKuduScanNode for the backend to transform into range predicates. Partition pruning is not handled at the moment due to limitations/bugs in the Kudu Java API. This adds a test that makes sure that predicates are pushed down when they match the pushable rules and are not when they don't. Change-Id: I8f86bb8b5f6667422df7080315045d69b61dba92 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/7042 Tested-by: jenkins Reviewed-by: David Alves <david.alves@cloudera.com>	2015-08-04 23:49:14 -07:00
David Alves	af1e1bea15	On Kudu scans, always build a schema with 0 key columns. We currently have a bug where SELECT queries with named columns only work if the key columns are declared first. This is because, on scans, we're passing a number of key columns equal to the number of key columns referred to by slot descriptors. The problem is that Kudu expects key columns to come first in the schema if the number of key columns is > 0, and we build a schema that matches the column order in the SlotDescriptors vector, which might not have key columns first. However, Kudu scans don't actually care about key column ordering _if_ the number of key columns is set to 0 (which is weird behavior, filed KUDU-852 for this). This patch just changes the built Kudu schema so that we always pass 0 key columns. It also adds an end-to-end test that makes sure a previously failing projection now works. Change-Id: I0826dabd87493a684cfc18058a4b5aa02f7f6cdc Reviewed-on: http://gerrit.sjc.cloudera.com:8080/7130 Tested-by: jenkins Reviewed-by: Daniel Hecht <dhecht@cloudera.com>	2015-07-13 14:56:03 -07:00
David Alves	84b2d60123	Enforce any limit set on backend KuduScanNode We were previously only incrementing the rows returned counter after a full batch was processed, missing the 'num_rows_returned_' on the scanner, which is actually used in ReachedLimit(). This caused us to return more rows than needed with a single-node plan. This patch fixes this and adds an update to 'rows_read_counter_'. Moreover, this patch adds a test that makes sure the limit is enforced. Change-Id: I31c76e67fd1acb7b2bb6d31de8904954e01f9da3 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/7046 Tested-by: jenkins Reviewed-by: David Alves <david.alves@cloudera.com>	2015-07-13 09:25:35 -07:00
David Alves	47bb63d249	If Kudu returns an empty string don't try to allocate buffer space In KuduScanner, when an empty string was returned we would try to allocate an empty buffer, correctly getting a NULL buffer back. However, we would interpret the NULL buffer as an inability to allocate memory, returning a MEM_LIMIT_EXCEEDED error. This patch special-cases the handling of empty strings so that we just accept the NULL buffer and don't return an error. Specifically, the following sequence of operations: INSERT INTO TABLE testtbl (id, name) VALUES (10, ""); SELECT * FROM testtbl; would fail with the aforementioned error and with this patch returns, correctly: +----+------+------+ | id | name | zip | +----+------+------+ ... | 10 | | NULL | +----+------+------+ Change-Id: I5eeee4b57ed3163b9c9888d694eba5dd4dd45bb5 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/7053 Tested-by: jenkins Reviewed-by: David Alves <david.alves@cloudera.com>	2015-07-13 09:25:08 -07:00

14 Commits
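
The pushdown rule stated in commit ee8d830d7a (binary predicates `<=`, `>=`, and `=` with a constant on one side and a slot ref on the other) can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not Impala's frontend code: the names `SlotRef`, `Constant`, `BinaryPredicate`, `is_pushable`, and `split_predicates` are hypothetical, and later commits in this log (IS NULL / IS NOT NULL, IN lists, TIMESTAMP) extend the rule beyond what is shown here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SlotRef:
    """Hypothetical stand-in for a reference to a table column."""
    column: str


@dataclass(frozen=True)
class Constant:
    """Hypothetical stand-in for a literal value."""
    value: object


@dataclass(frozen=True)
class BinaryPredicate:
    """Hypothetical stand-in for `left op right`."""
    op: str
    left: object
    right: object


# Operators named as pushable in the original commit message.
PUSHABLE_OPS = {"<=", ">=", "="}


def is_pushable(pred: object) -> bool:
    """True if pred is a binary predicate with a pushable operator,
    a slot ref on one side, and a constant on the other."""
    if not isinstance(pred, BinaryPredicate) or pred.op not in PUSHABLE_OPS:
        return False
    sides = (pred.left, pred.right)
    has_slot = any(isinstance(s, SlotRef) for s in sides)
    has_const = any(isinstance(s, Constant) for s in sides)
    return has_slot and has_const


def split_predicates(preds: List[object]) -> Tuple[List[object], List[object]]:
    """Partition predicates into (pushed to the storage layer, kept for
    evaluation by the scan node), preserving their original order."""
    pushed = [p for p in preds if is_pushable(p)]
    kept = [p for p in preds if not is_pushable(p)]
    return pushed, kept
```

For example, `id = 10` and `3 >= zip` would be selected for pushdown, while `id < 5` (operator not in the list) and `a = b` (no constant side) would stay with the scan node.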