impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 09:02:19 -05:00

Author	SHA1	Message	Date
Vlad Berindei	ece7fed421	IMPALA-2316: Add RESTRICT to DROP DATABASE Change-Id: Iffad73175b49160ae049911bd33c110a830f932b Reviewed-on: http://gerrit.cloudera.org:8080/796 Reviewed-by: Vlad Berindei <vlad.berindei@cloudera.com> Tested-by: Internal Jenkins	2015-09-11 20:37:27 +00:00
Alex Behm	e9e43488cf	IMPALA-2297: Handle collection types in ExprContext::GetValue(). Change-Id: I6af780791e392c0431efdf5a513e4b1cb60d14cf Reviewed-on: http://gerrit.cloudera.org:8080/749 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-10 17:46:21 +00:00
Alex Behm	deb9c6f8e6	Nested Types: Poor man's projection for collection-typed slots. Collection-typed slots are expensive to copy, e.g., during data exchanges or when writing into a buffered-tuple-stream. Even worse, such slots could be duplicated many times after unnesting in a subplan. To alleviate this problem, this patch implements a poor man's projection where collection-typed slots are set to NULL inside the SubplanNode that flattens them. The FE guarantees that the contents of an array-typed slot are never referenced outside of the single UnnestNode that accesses them, so when returning eos in UnnestNode::GetNext() we also set the unnested array slot to NULL to avoid those expensive copies in downstream exec nodes. The FE provides that guarantee by creating a new slot in the parent scan for every relative CollectionTableRef. For example, for a table 't' with a collection-typed column 'c', the following query would have two separate slots in the tuple of 't', one for 'c1' and one for 'c2': select * from t, t.c c1, t.c c2 Change-Id: I90e5b86463019c9ed810c299945c831c744ff563 Reviewed-on: http://gerrit.cloudera.org:8080/763 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-10 05:44:55 +00:00
Alex Behm	361da01152	Fail queries that require a SubplanNode when using legacy joins and aggs. We will not provide full nested types support if any of these options are set: --enable_partitioned_aggregation=false --enable_partitioned_hash_join=false Change-Id: I0f8607914faf9691d5f7b1a4327609fefba22e56 Reviewed-on: http://gerrit.cloudera.org:8080/792 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-09-10 04:50:31 +00:00
Henry Robinson	8809567e82	IMPALA-2290: Fix btrim() thread-safety. Because btrim() did not use THREAD_LOCAL for its state, invocations in multi-threaded contexts (i.e., when pushed to the scanner) would have threads trampling over each other's bitset used to check for trimmed characters. Testing: See the new test in expr.test: select count(*) from functional.alltypes where btrim(string_col, string_col) != "" should return 0, but returned > 0 with this bug. Change-Id: I595e25b1d4fb7c76b846fce837b4ec140f47d43c Reviewed-on: http://gerrit.cloudera.org:8080/748 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Henry Robinson <henry@cloudera.com>	2015-09-09 04:15:30 +00:00
Tim Armstrong	5ac55f24cc	IMPALA-2296: missing DeepCopy array support Implement Tuple-to-Tuple DeepCopy for collections. Add query test that uses the TOP-N node, which deep copies tuples in this way. Confirmed that the query test failed before this fix. Change-Id: I3fea860d8251038d7b5eb85c77973939abe9dbf8 Reviewed-on: http://gerrit.cloudera.org:8080/757 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-09-08 23:40:53 +00:00
Tim Armstrong	d73683b320	Fix nested types tpch test formatting Invalid test file format caused tpch tests to fail. Change-Id: Ibf523d071bb14db72689e39645fd1724897543c7 Reviewed-on: http://gerrit.cloudera.org:8080/766 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-08 21:57:52 +00:00
Alex Behm	662bc24c79	IMPALA-2100: Exclude explain header from expected results of test_partitioning.py. HDFS acknowledges writes when the first replica is written. As a result, the estimated memory requirements for an Impala query may vary depending on how many replicas existed at the time of table loading. This racy behavior caused a few tests to sometimes fail due to different actual and expected memory requirements. The fix is to exclude the explain header from the expected results. Change-Id: Ifb13de937a104a48960d35745df521de66596837 Reviewed-on: http://gerrit.cloudera.org:8080/762 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-08 19:57:55 +00:00
aacalfa	5e733e8d62	IMPALA-2190: Complete conversion functions between timestamp, unixtime, and string dates Change-Id: I48a446f19c7634477f175d0defa8779dd70a392f Reviewed-on: http://gerrit.cloudera.org:8080/654 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-09-07 07:07:20 +00:00
Dimitris Tsirogiannis	f647b36e58	IMPALA-2289: Properly set eos_ in the BlockingJoinNode when the probe side is exhausted. This commit fixes an issue where BlockingJoinNode would incorrectly set the eos_ flag to true when the probe side was exhausted, without considering the join mode being executed. This would cause the NestedLoopJoinNode to sometimes return wrong results when a right-outer, right-anti or full-outer join mode was used. This issue appeared in nested TPC-H Q22. Change-Id: I01f2118d4db3d8739201d5c3f475f5b7e328555a Reviewed-on: http://gerrit.cloudera.org:8080/753 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-06 05:29:22 +00:00
Alex Behm	d48ec4b8b3	IMPALA-2289: Properly handle AtCapacity() in SubplanNode. After this patch we get correct results for nested TPCH Q13. The bug: Since we were not properly handling AtCapacity() of the output batch in SubplanNode, we sometimes passed a row batch that was already at capacity into GetNext() on the second child of the SubplanNode. In this particular case, that batch was passed into the NestedLoopJoinNode, which may return incomplete results if the output batch is already at capacity (e.g., ProcessUnmatchedBuildRows() was not called). The fix is to return from SubplanNode::GetNext() if the output batch is at capacity due to resources being transferred to it from the input batch used to fetch from the first child. Change-Id: Ib97821e8457867dc0d00fd37149a3f0a75872297 Reviewed-on: http://gerrit.cloudera.org:8080/742 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-04 20:26:52 +00:00
Vlad Berindei	cfc3952a83	IMPALA-898: Support explicit column names in WITH-clause views. Example: WITH t(c1, c2) AS (SELECT int_col, bool_col FROM functional.alltypes) SELECT * FROM t This will create a local view with the 'int_col' and 'bool_col' columns labeled as 'c1' and 'c2'. If the number of labels is less than the number of columns, then the remaining columns in the local view will be labeled as the corresponding columns in the query statement. Therefore, this is also a valid query (only 'int_col' will be labeled as 'c1'): WITH t(c1) AS (SELECT int_col, bool_col FROM functional.alltypes) SELECT * FROM t Change-Id: Ie3a559ca9eaf95c6980c5695a49f02010c42899b Reviewed-on: http://gerrit.cloudera.org:8080/717 Reviewed-by: Vlad Berindei <vlad.berindei@cloudera.com> Tested-by: Internal Jenkins	2015-09-03 01:19:43 +00:00
Skye Wanderman-Milne	bcc73a36da	Nested types: read and materialize nested types in Parquet scanner This patch modifies the Parquet scanner to resolve nested schemas, and read and materialize collection types. The high-level modification is to create a CollectionColumnReader that recursively materializes map- and array-type slots. This patch also adds many tests, most of which query a new table called complextypestbl. This table contains hand-generated data that is meant to expose edge cases in the scanner. The tests mostly test the scanner, with a few tests of other functionality (e.g. array serialization). I ran a local benchmark comparing this scanner code to the original scanner code on an expanded version of tpch_parquet.lineitem with 48009720 rows. My benchmark involved selecting different numbers of columns with a single scanner thread, and I looked at the HDFS scan node time in the query profiles. This code introduces a 10%-20% regression in single-threaded scan time. Change-Id: Id27fb728934e8346444f61752c9278d8010e5f3a Reviewed-on: http://gerrit.cloudera.org:8080/576 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-02 19:23:54 +00:00
Dimitris Tsirogiannis	f6985772dc	IMPALA-2275: S3: authorization.test_grant_revoke failure due to stale grant_revoke_no_insert.test This commit updates the test file of grant/revoke statements running against S3 to include column-level privileges. Change-Id: Ia21595740fd37c88040d9a692444c6009591a188 Reviewed-on: http://gerrit.cloudera.org:8080/735 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-09-02 04:29:41 +00:00
Juan Yu	c66785be4a	IMPALA-2227: S3:query_test.test_queries.TestQueries.test_exprs failure Use select query instead of insert query to verify constant expression on partition column. Change-Id: I442111225e8df29bcc5fe89500d023559bb1c1fb Reviewed-on: http://gerrit.cloudera.org:8080/707 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-08-29 00:40:41 +00:00
Dimitris Tsirogiannis	fdb90ed753	CDH-23206: Impala support for column-level authorization (part 1) This commit adds partial support for column-level authorization in Impala using the Sentry Service. The following changes are included: * Added support for parsing and analyzing GRANT/REVOKE statements with column-level privileges. The supported syntax is: - GRANT SELECT (<col_names>) ON TABLE <table_name> TO [ROLE] <role_name> [WITH GRANT OPTION] - REVOKE [GRANT OPTION FROM] SELECT (<col_names>) ON TABLE <table_name> FROM [ROLE] <role_name> * Added support for storing column-level privileges in the Catalog Service and updating the Sentry Service when GRANT/REVOKE statements are executed. * Modified the SHOW GRANT ROLE statement to include information about column-level privileges. Subsequent patches will add support for enforcing column-level privileges in SQL queries and other statements. Change-Id: I0fd9daa92cc5147cb6f4b25eb9651aab8bf3049f Reviewed-on: http://gerrit.cloudera.org:8080/607 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-08-28 23:58:36 +00:00
Juan Yu	d42ecb310a	IMPALA-1756: Add test case for partition insert query Change-Id: I4879d8fe7221b551898fa9fa94076bb9b0804f06 Reviewed-on: http://gerrit.cloudera.org:8080/696 Reviewed-by: Juan Yu <jyu@cloudera.com> Tested-by: Internal Jenkins	2015-08-27 18:50:58 +00:00
Martin Grund	60c5140ea7	IMPALA-1983: Warn if table stats are potentially corrupt. When the `numRows` parameter stored in the table properties is erroneously set to 0 and a number of non-empty files are present, the table statistics are considered to be corrupt. To hint that there might be a problem, the explain statement will emit an additional warning if it detects potentially corrupt table stats, as in the following example: Estimated Per-Host Requirements: Memory=42.00MB VCores=1 WARNING: The following tables have potentially corrupt table and/or column statistics. compute_stats_db.corrupted 03:AGGREGATE [FINALIZE] | output: count:merge() | 02:EXCHANGE [UNPARTITIONED] | 01:AGGREGATE | output: count() | 00:SCAN HDFS [compute_stats_db.corrupted] partitions=1/2 files=1 size=24B In addition, the small query optimization is disabled for such queries. Change-Id: I0fa911f5132aa62195b854248663a94dcd8b14de Reviewed-on: http://gerrit.cloudera.org:8080/689 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: Internal Jenkins	2015-08-26 22:19:33 +00:00
Sailesh Mukil	1a9fc47295	IMPALA-2227: S3: query_test.test_queries.TestQueries.test_exprs failure The test file testdata/workloads/functional-query/queries/QueryTest/exprs.test had INSERT statements in it, which are not supported on S3. This commit gets rid of those statements and rewrites them with SELECT [...] FROM VALUES(...) so that the tests are compatible on S3. Change-Id: I25faacf9fae3780f627afee86dc8c1ede7f6e2a2 Reviewed-on: http://gerrit.cloudera.org:8080/670 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2015-08-26 00:36:51 +00:00
Vlad Berindei	452ebee59d	IMPALA-1906: PARQUET_FILE_SIZE query option overflows for values >= 2GB. The value of PARQUET_FILE_SIZE overflows when RoundUp() is called because this function returns an int32. Even with this change, this value will still overflow when calling the HDFS API since it is passed to hdfsOpenFile() as blocksize, which is an int32 parameter (see HDFS-8949). Changes: - Return an error if PARQUET_FILE_SIZE is set to a value greater than or equal to 2GB. - If PARQUET_FILE_SIZE is set in an Impala session to a value greater than or equal to 2GB, then every query will fail with an error message. - If PARQUET_FILE_SIZE is changed to a value greater than or equal to 2GB as an impalad argument, impalad will not start and log an error. - Ceil(), RoundUp(), RoundDown() return int64. Change-Id: Ie4f2551b72954e2a57db5594e4789e3f7434d578 Reviewed-on: http://gerrit.cloudera.org:8080/678 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Vlad Berindei <vlad.berindei@cloudera.com> Tested-by: Internal Jenkins	2015-08-25 23:28:13 +00:00
Alex Behm	6f0b255c5a	Address several shortcomings with respect to the usability of Avro tables. Addressed JIRAs: IMPALA-1947 and IMPALA-1813 New Feature: Adds support for creating an Avro table without an explicit Avro schema with the following syntax. CREATE TABLE <table_name> column_defs STORED AS AVRO Fixes and Improvements: This patch fixes and unifies the logic for reconciling differences between an Avro table's Avro Schema and its column definitions. This reconciliation logic is executed during Impala's CREATE TABLE and when loading a table's metadata. Impala generally performs the schema reconciliation during table creation, but Hive does not. In many cases, Hive's CREATE TABLE stores the original column definitions in the HMS (in the StorageDescriptor) instead of the reconciled column definitions. The reconciliation logic considers the field/column names and follows this conflict resolution policy which is similar to Hive's: Mismatched number of columns -> Prefer Avro columns. Mismatched name/type -> Prefer Avro column, except: A CHAR/VARCHAR column definition maps to an Avro STRING, and is preserved as a CHAR/VARCHAR in the reconciled schema. Behavior for TIMESTAMP: A TIMESTAMP column definition maps to an Avro STRING and is presented as a STRING in the reconciled schema, because Avro has no binary TIMESTAMP representation. As a result, no Avro table may have a TIMESTAMP column (existing behavior). Change-Id: I8457354568b6049b2dd2794b65fadc06e619d648 Reviewed-on: http://gerrit.cloudera.org:8080/550 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-25 09:52:18 +00:00
Taras Bobrovytsky	75691156be	IMPALA-2239: update misc.test to match the new .test file format Change-Id: Ia5b9925628b415c306f320ef186246179e38f73b Reviewed-on: http://gerrit.cloudera.org:8080/684 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-25 00:12:52 +00:00
Alex Behm	ae9fd52c51	IMPALA-2089: Retain eq predicates bound by grouping slots with complex grouping exprs. The bug: When enforcing slot equivalences at an aggregation node, we used to incorrectly assume that equivalences among grouping slots must have already been enforced below the aggregation (e.g., in a scan). This assumption is correct if the grouping slots are produced by simple SlotRef grouping exprs, because then there is certainly a value transfer between the grouping slot and another slot below the aggregation. However, for grouping slots with complex grouping exprs this assumption is not correct, and as a result, we would incorrectly remove eq predicates bound by grouping slots with complex grouping exprs because we assumed they were redundant. The fix is to enforce slot equivalences among grouping slots with complex grouping exprs as usual, and not assume that they have already been enforced below the agg. Change-Id: Idcd44acccb9326a35c9121025dc88c2c70c7c7c7 Reviewed-on: http://gerrit.cloudera.org:8080/656 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-23 04:43:37 +00:00
Alex Behm	14a8cadcf6	Nested Types: Pretty print complex types in DESCRIBE. The current DESCRIBE prints the column type as a single string without whitespace. As a result, the DESCRIBE output for tables with complex types is basically unreadable/unusable, e.g., from the Impala shell. This patch adds a prettyPrint() function to the FE Type and uses that for generating a nicely formatted DESCRIBE output. The output of DESCRIBE FORMATTED is intentionally not modified because exact Hive-compatibility has been and presumably continues to be very important to our users. Change-Id: Ida810facdffd970948b837b83a60f9ddcd95f44d Reviewed-on: http://gerrit.cloudera.org:8080/633 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-22 09:26:35 +00:00
Taras Bobrovytsky	b8b7930377	Add nested types support to Create Table Like File Add support for creating a table based on a parquet file which contains arrays, structs and/or maps. Change-Id: I56259d53a3d9b82f318228e864c783b48a03f9ae Reviewed-on: http://gerrit.cloudera.org:8080/582 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-22 01:46:26 +00:00
Vlad Berindei	e4c42fa8bf	IMPALA-595: Add CASCADE to DROP DATABASE and use it in cleanup_db Change-Id: Idfa5b6943bc797e10d542487c31b8f1b527d8c97 Reviewed-on: http://gerrit.cloudera.org:8080/635 Reviewed-by: Vlad Berindei <vlad.berindei@cloudera.com> Tested-by: Internal Jenkins	2015-08-20 03:34:31 +00:00
Skye Wanderman-Milne	7906ed44ac	IMPALA-2015: Add support for nested loop join Implement nested-loop join in Impala with support for multiple join modes, including inner, outer, semi and anti joins. Null-aware left anti-join is not currently supported. Summary of changes: Introduced the NestedLoopJoinNode class in the FE that represents the nested loop join. Common functionality between NestedLoopJoinNode and HashJoinNode (e.g. cardinality estimation) was moved to the JoinNode class. In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop join execution strategy. Change-Id: I238ec7dc0080f661847e5e1b84e30d61c3b0bb5c Reviewed-on: http://gerrit.cloudera.org:8080/652 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-08-19 08:40:14 +00:00
Tim Armstrong	5350d49f8c	IMPALA-1829: UDAs with different intermediate type Previously the frontend rejected UDAs with different intermediate and result type. The backend supports these, so this change enables support in the frontend and adds tests. This patch adds a test UDA function with different intermediate type and a simple end-to-end test that exercises it. It modifies an existing unused test UDA that used a currently unsupported intermediate type - BufferVal. Change-Id: I5675ec7f275ea698c24ea8e92de7f469a950df83 Reviewed-on: http://gerrit.cloudera.org:8080/655 Tested-by: Internal Jenkins Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2015-08-19 04:37:39 +00:00
Sailesh Mukil	1c46cab5c6	IMPALA-2084: SPLIT_PART and REGEXP_LIKE functions for Tableau pushdown Added the SPLIT_PART and the REGEXP_LIKE builtin functions and tests for both. REGEXP_LIKE has an optional third parameter which, if used, selects a different 'prepare' function (RegexpLikePrepare in like-predicate.cc) so that the appropriate options can be set in the RE2 library. Added a patch for the RE2 library so that the 'dot matches all' option is exposed via the RE2 class. Fixed a bug where proper cleanup was not guaranteed in certain edge cases when the function evaluated for the WHERE clause operates on constants. Change-Id: Ia2a8de9eeb2854100a2d949f612cfaba317c5a7b Reviewed-on: http://gerrit.cloudera.org:8080/501 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2015-08-18 09:07:34 +00:00
Alex Behm	f9d26fb896	IMPALA-2203: Set an InsertStmt's result exprs from the source statement's result exprs. This patch fixes an issue where incorrect results are produced by a CTAS or IAS that is fed from a QueryStmt that has outer-joined inline views with constants or conditionals in the select list. The regression was introduced in this commit: b8f642710ea9d311a7aca32611eaa7cac6cd86df Now that the final expression substitution with TupleIsNullPredicate() wrapping is performed in planning, the InsertStmt's result expressions should be taken from the feeding QueryStmt's result expressions, and not the QueryStmt's (already substituted) base table result expressions. Change-Id: Iae29683638df01f140d0f74976cca8ca9ba0852d Reviewed-on: http://gerrit.cloudera.org:8080/637 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-18 01:44:45 +00:00
Casey Ching	cf60967b7e	IMPALA-1675: Avoid overflow when adding large intervals to TIMESTAMPs It turns out there is a variety of cases where boost incorrectly adds intervals if the interval is at (or beyond) an edge case value. This change defines a max interval and returns NULL if the user supplies an interval beyond the max. Change-Id: I4fb6869be22ab06089b66eeffaea04b0c0880080 Reviewed-on: http://gerrit.cloudera.org:8080/492 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-08-16 12:09:24 +00:00
Christopher Channing	9ea5caf0ef	IMPALA-2199: Row count not set for empty partition when spec is used with compute incremental stats This patch resolves an issue where the row count is not set to 0 when a partition spec is used with 'compute incremental stats' on a partition that contains no data. The fix is to populate the partition 'expected list' in the frontend with the partition spec; the backend keeps track of which partitions had statistics generated. In the scenario where no statistics are generated for a partition, the backend will fall back to the 'expected list' to zero out the statistics. Change-Id: If4aac131dbe44e14a0477afa58e980da9e235d6b Reviewed-on: http://gerrit.cloudera.org:8080/627 Reviewed-by: Christopher Channing <cchanning@cloudera.com> Tested-by: Internal Jenkins	2015-08-13 09:38:30 +00:00
Dimitris Tsirogiannis	47c5ae405a	Revert "IMPALA-2015: Add support for nested loop join" This reverts commit 6837cdec7f6a7e1c7e8157e323f3ab68277689aa. Change-Id: I2fd6424c553a701fcbfd425b4486af7280820b23 Reviewed-on: http://gerrit.cloudera.org:8080/636 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-13 02:20:07 +00:00
Sailesh Mukil	8f11fbdd5c	IMPALA-2081: Add PERCENT_RANK, NTILE, CUME_DIST analytic window functions These functions are implemented as rewrites in the analysis stage. They are rewritten as different arithmetic expressions and make use of the existing analytic functions such as 'rank', 'count' and 'row_number' to compute the final results. TODO: IMPALA-2171: NTILE() currently takes only constant expressions. We need to modify it to take non-constant expressions as well in a future patch. Change-Id: I8773df8ceefff27ab66a41169dc4ac0927465191 Reviewed-on: http://gerrit.cloudera.org:8080/584 Tested-by: Internal Jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2015-08-07 04:57:37 +00:00
Skye Wanderman-Milne	f000758ca8	IMPALA-2015: Add support for nested loop join Implement nested-loop join in Impala with support for multiple join modes, including inner, outer, semi and anti joins. Null-aware left anti-join is not currently supported. Summary of changes: Introduced the NestedLoopJoinNode class in the FE that represents the nested loop join. Common functionality between NestedLoopJoinNode and HashJoinNode (e.g. cardinality estimation) was moved to the JoinNode class. In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop join execution strategy. Change-Id: Id65a1aae84335bba53f06339bdfa64a1b0be079e Reviewed-on: http://gerrit.cloudera.org:8080/457 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-08-07 02:47:32 +00:00
Alex Behm	480a56e3a0	IMPALA-1737: Substitute an InsertStmt's partition key exprs with the root node's smap. The bug was that we were not substituting the partition key exprs of an InsertStmt with the root plan node's output smap during single-node planning. Change-Id: I16eff4bab0b1d95c7f30fd89b14af2628d6f865f Reviewed-on: http://gerrit.cloudera.org:8080/580 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-03 19:31:51 +00:00
Alex Behm	c908ba1b7e	IMPALA-1136: Support loading Avro tables without an explicit Avro schema Hive allows creating Avro tables without an explicit Avro schema since 0.14.0. For such tables, the Avro schema is inferred from the column definitions, and not stored in the metadata at all (no Avro schema literal or Avro schema file). This patch adds support for loading the metadata of such tables, although Impala currently cannot create such tables (expect a follow-on patch). Change-Id: I9e66921ffbeff7ce6db9619bcfb30278b571cd95 Reviewed-on: http://gerrit.cloudera.org:8080/538 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-31 12:13:37 +00:00
Tim Armstrong	e151ebaa71	IMPALA-1001: Bit and byte manipulation functions Bit and byte functions for compatibility with Teradata: bitand, bitor, bitxor, bitnot, countset, getbit, setbit, shiftleft, shiftright, rotateleft, rotateright. Interfaces and behavior follow Teradata documentation. All bit* functions are compatible with DB2. Only bitand is compatible with Oracle. Change-Id: Idba3fb7beb029de493b602e6279aa68e32688df3	2015-07-28 08:11:01 -07:00
Sailesh Mukil	8a01527bad	IMPALA-2141: UnionNode::GetNext() doesn't check for query errors When a UDF with constant parameters in the select list calls SetError(), it does not fail the query. This is because UnionNode::GetNext() does not check for errors after UnionNode::EvalAndMaterializeExprs() evaluates the expression, which itself does not report the error. Change-Id: I8850cf1a603e320bb23f4a9a4d47600d14590f3a	2015-07-27 22:09:19 -07:00
Alex Behm	3ac341287c	IMPALA-2088: Fix planning of empty union operands with analytics. The check for ignoring empty union operands was simply misplaced. This misplacement resulted in empty union operands not being dropped if the containing UnionStmt had analytic functions. Change-Id: I3dad546c0c31a495e5f30d97c3e49465fcc2ebb3 Reviewed-on: http://gerrit.cloudera.org:8080/554 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-27 15:46:41 -07:00
Tim Armstrong	822cb8f5e2	IMPALA-1660: Netezza compatibility - factorial Implements suffix n! operator for factorial and factorial function. Slightly refactor operators in fe to share code between unary operators. Based partially on work by Arthur Peng <arthur.peng@intel.com>. Change-Id: I71b6c824c59fc5305f16b8c4457805126a1da93b Reviewed-on: http://gerrit.cloudera.org:8080/531 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-07-27 19:03:48 +00:00
Sailesh Mukil	c21c080a46	IMPALA-1756: Constant expressions not checked for errors, no state cleanup on exception. Changed the way the function context error message is returned. Also, changed the exception thrown in SingleNodePlanner from IllegalStateException to AnalysisException in case of an exception in registerConjuncts(). This commit follows from: `d497ba6cef` This is a new commit since the previous one was closed before making these changes. Change-Id: Ifa9b7c0884d76b6d7911d8cd80355a8ba13c4c18 Reviewed-on: http://gerrit.cloudera.org:8080/560 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-24 19:04:38 +00:00
Tim Armstrong	5990b43fe2	IMPALA-1898: Explicit aliases + ordinals analysis bug Analysis errors occurred with select queries that combined ordinals in the group by/order by clauses with select list aliases that had the same name as a column in one of the underlying tables. The root cause was a double substitution: e.g. the ordinal 1 in a GROUP BY clause was replaced with the corresponding select list expression, then a reference to column 'x' in an underlying table was replaced erroneously with the select list expression with alias 'x' Change-Id: I0f298290c58f18239e1ff83f0388d037c311f5fb Reviewed-on: http://gerrit.cloudera.org:8080/542 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2015-07-22 21:23:36 +00:00
Sailesh Mukil	6d7bb76e87	IMPALA-1756: Constant filter expressions are not checked for errors and state cleanup is not done before throwing exception. When a builtin has an error (in the constant case), it is checked for but the state cleanup isn't taken care of which results in a DCHECK. When a UDF has an error (in the constant case), the error does not propagate back up the stack due to a lack of error checking in ScalarFnCall::Open() after it calls GetConstVal(). Change-Id: Ib500c84a41df574690369f124044991ed8c82cc1 Reviewed-on: http://gerrit.cloudera.org:8080/537 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-07-21 04:01:39 +00:00
Casey Ching	a6d534682b	IMPALA-2086, IMPALA-2090: Avoid boost year/month interval logic Boost handles a couple of edge cases differently than other databases such as Postgres and MySQL when adding year/month intervals to timestamps. This change makes Impala consistent with the other databases. The performance difference was not noticeable (<5% if any). Change-Id: Icb02a06281b53753938cab88e0d28f20709fee06 Reviewed-on: http://gerrit.cloudera.org:8080/489 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-07-20 10:16:54 +00:00
Shant Hovsepian	6d87fe090c	Improve Hll estimate for small cardinalities. Based on Google's HyperLogLog++ paper. Uses a bias correcting interpolation as a sub algorithm for Hll estimates within a specific range. Change-Id: If4fe692b4308f6a57aea6167e9bc00db11eaaab9 Reviewed-on: http://gerrit.cloudera.org:8080/415 Tested-by: Internal Jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2015-07-16 19:38:17 +00:00
Ippokratis Pandis	7e9f8478e1	Removing duplicate query test Change-Id: Ia8b33ca2a2eadae288acea4bd2111a1a974bc484 Reviewed-on: http://gerrit.cloudera.org:8080/526 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-07-15 03:28:36 +00:00
Ippokratis Pandis	e99c68fe52	IMPALA-2130: Wrong verification of Parquet file version This patch corrects a mistake in the Parquet magic file number verification and adds a test about it. Note that with this patch Impala may fail to read Parquet files with wrong magic number that it used to read before. Change-Id: Iff31accda1e1d541946ef1f750e38886ce4cb8d5 Reviewed-on: http://gerrit.cloudera.org:8080/515 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-07-14 02:52:02 +00:00
Martin Grund	51aa077448	IMPALA-2133: Properly unescape string value for HBase filters This patch fixes the problem that the frontend would simply pass the escaped value to the backend as an HBase filter, not the unescaped one. Queries that include an escaped character now work as well. Change-Id: I96e544973b523f3ef1abdec86ea1ec5596d9bee9 Reviewed-on: http://gerrit.cloudera.org:8080/520 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-07-13 18:38:39 +00:00
Ippokratis Pandis	4951f895e7	Nested Types: Reset() for partitioned hash join node TODO: Need to modify Reset()'s functionality in case of NAAJs. Change-Id: I7d0ea0dabd0b3404957e228bbaa51781c5fc34c0 Reviewed-on: http://gerrit.cloudera.org:8080/490 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-07-08 01:51:09 +00:00


560 Commits
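IMPALA-2015 (nested loop join) and the IMPALA-2289 eos_ fix turn on one point: for right-outer, right-anti and full-outer joins, exhausting the probe side is not yet end-of-stream, because build rows that never matched still have to be emitted. The sketch below is a minimal, tuple-at-a-time illustration over in-memory ints; `JoinMode`, `Row` and `NestedLoopJoin` are hypothetical names, and Impala's NestedLoopJoinNode works on row batches rather than like this.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical join modes named after the SQL join types (not Impala's enum).
enum class JoinMode { kInner, kLeftOuter, kRightOuter, kFullOuter, kRightAnti };

// One output row; std::nullopt stands in for a NULL-padded side.
using Row = std::pair<std::optional<int>, std::optional<int>>;

std::vector<Row> NestedLoopJoin(const std::vector<int>& probe,
                                const std::vector<int>& build, JoinMode mode,
                                const std::function<bool(int, int)>& pred) {
  assert(pred);  // the join predicate must be set
  std::vector<Row> out;
  std::vector<bool> build_matched(build.size(), false);
  for (int p : probe) {
    bool probe_matched = false;
    for (std::size_t i = 0; i < build.size(); ++i) {
      if (!pred(p, build[i])) continue;
      probe_matched = true;
      build_matched[i] = true;
      // Right anti join returns only unmatched build rows, so matches are not emitted.
      if (mode != JoinMode::kRightAnti) out.emplace_back(p, build[i]);
    }
    if (!probe_matched &&
        (mode == JoinMode::kLeftOuter || mode == JoinMode::kFullOuter)) {
      out.emplace_back(p, std::nullopt);
    }
  }
  // The probe side is exhausted here. For these modes this is NOT end-of-stream:
  // build rows that never matched must still be returned.
  if (mode == JoinMode::kRightOuter || mode == JoinMode::kFullOuter ||
      mode == JoinMode::kRightAnti) {
    for (std::size_t i = 0; i < build.size(); ++i) {
      if (!build_matched[i]) out.emplace_back(std::nullopt, build[i]);
    }
  }
  return out;
}
```

With probe {1, 2, 3}, build {2, 3, 4} and an equality predicate, a right-outer join yields (2, 2), (3, 3) and (NULL, 4). Signaling end-of-output as soon as the probe loop finishes would drop the (NULL, 4) row, which is the kind of wrong result described for the right-outer, right-anti and full-outer modes.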
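IMPALA-898 describes how explicit WITH-clause column labels are applied: labels map positionally onto the query's columns, and any columns past the last label keep their names from the query statement. A minimal sketch of that naming rule, using a hypothetical helper `ApplyViewColumnLabels` (not Impala's analyzer code):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: computes the column names of a WITH-clause view such as
// `WITH t(c1, c2) AS (SELECT int_col, bool_col ...)`. Labels are applied by
// position; columns past the last label keep the name from the query statement.
std::vector<std::string> ApplyViewColumnLabels(
    const std::vector<std::string>& query_cols,
    const std::vector<std::string>& labels) {
  // Assumption: more labels than columns is treated as a caller error here;
  // the commit message does not describe that case.
  assert(labels.size() <= query_cols.size());
  std::vector<std::string> result = query_cols;
  for (std::size_t i = 0; i < labels.size(); ++i) result[i] = labels[i];
  return result;
}
```

For the commit's second example, `WITH t(c1) AS (SELECT int_col, bool_col FROM functional.alltypes)`, the sketch produces the column names c1 and bool_col.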
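The overflow in IMPALA-1906 is easy to reproduce arithmetically: rounding a value just below 2GB up to a 1MB multiple lands exactly on 2^31, which no longer fits in an int32. The sketch shows a 64-bit RoundUp and a hypothetical validation helper, `ParquetFileSizeFitsInt32`, that models only the ">= 2GB is rejected" rule from the commit; it is not Impala's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Rounds `value` up to the next multiple of `factor` in 64-bit arithmetic.
// Assumes value >= 0 and factor > 0. With an int32 result, values close to
// 2GB can round up past INT32_MAX and wrap around.
inline int64_t RoundUp(int64_t value, int64_t factor) {
  assert(value >= 0 && factor > 0);
  return ((value + factor - 1) / factor) * factor;
}

// Hypothetical check mirroring the IMPALA-1906 rule: reject PARQUET_FILE_SIZE
// values >= 2GB, since the size still reaches hdfsOpenFile() as a 32-bit
// block size (HDFS-8949). Only this upper bound is modeled.
inline bool ParquetFileSizeFitsInt32(int64_t bytes) {
  constexpr int64_t kTwoGB = int64_t{2} * 1024 * 1024 * 1024;
  return bytes < kTwoGB;
}
```

For example, RoundUp(2147483000, 1048576) is 2048 * 1048576 = 2147483648, one more than std::numeric_limits<int32_t>::max().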
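The IMPALA-2290 bug came from btrim()'s lookup bitset being shared between threads. The sketch below keeps the bitset on each call's stack, which gives the same isolation property; the actual fix moved the state to THREAD_LOCAL storage, so treat this `BTrim` as an illustration of the idea rather than Impala's implementation.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Trims every leading and trailing character of `str` that appears in `chars`.
// The lookup bitset is a local variable, so concurrent callers never share or
// overwrite each other's state.
std::string BTrim(std::string_view str, std::string_view chars) {
  std::bitset<256> trim_set;
  for (unsigned char c : chars) trim_set.set(c);
  std::size_t begin = 0;
  std::size_t end = str.size();
  while (begin < end && trim_set.test(static_cast<unsigned char>(str[begin]))) ++begin;
  while (end > begin && trim_set.test(static_cast<unsigned char>(str[end - 1]))) --end;
  return std::string(str.substr(begin, end - begin));
}
```

This mirrors the commit's regression query: btrim(s, s) is empty for every s, so counting rows where it is non-empty must give 0.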
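IMPALA-2081 implements PERCENT_RANK, NTILE and CUME_DIST as analysis-time rewrites in terms of rank, count and row_number. The sketch computes the standard SQL definitions directly over one already-sorted partition of ints, so the results can be checked; the helper names are hypothetical and these are not necessarily the exact expressions the rewrite produces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// rank() over an already-sorted partition: peers share the rank of their first row.
std::vector<int64_t> Rank(const std::vector<int>& sorted) {
  std::vector<int64_t> rank(sorted.size());
  for (std::size_t i = 0; i < sorted.size(); ++i) {
    rank[i] = (i > 0 && sorted[i] == sorted[i - 1]) ? rank[i - 1]
                                                     : static_cast<int64_t>(i) + 1;
  }
  return rank;
}

// percent_rank = (rank - 1) / (count - 1), defined as 0 for a one-row partition.
std::vector<double> PercentRank(const std::vector<int>& sorted) {
  const int64_t count = static_cast<int64_t>(sorted.size());
  std::vector<double> out;
  for (int64_t r : Rank(sorted)) {
    out.push_back(count == 1 ? 0.0
                             : static_cast<double>(r - 1) / static_cast<double>(count - 1));
  }
  return out;
}

// cume_dist = (rows whose value is <= the current row's value) / count.
// Walks backwards so every row sees the end position of its last peer.
std::vector<double> CumeDist(const std::vector<int>& sorted) {
  std::vector<double> out(sorted.size());
  std::size_t peers_end = sorted.size();
  for (std::size_t k = sorted.size(); k-- > 0;) {
    if (k + 1 < sorted.size() && sorted[k] != sorted[k + 1]) peers_end = k + 1;
    out[k] = static_cast<double>(peers_end) / static_cast<double>(sorted.size());
  }
  return out;
}

// ntile(n) by row position: rows are split into n buckets whose sizes differ
// by at most one, with the larger buckets first.
std::vector<int64_t> Ntile(int64_t num_rows, int64_t n) {
  assert(num_rows >= 0 && n > 0);
  const int64_t base = num_rows / n;
  const int64_t extra = num_rows % n;
  std::vector<int64_t> out;
  for (int64_t bucket = 1; bucket <= n; ++bucket) {
    const int64_t size = base + (bucket <= extra ? 1 : 0);
    for (int64_t k = 0; k < size; ++k) out.push_back(bucket);
  }
  return out;
}
```

For the partition {1, 2, 2, 3}, rank is {1, 2, 2, 4}, percent_rank is {0, 1/3, 1/3, 1} and cume_dist is {0.25, 0.75, 0.75, 1}. Ntile(10, 4) assigns buckets 1, 1, 1, 2, 2, 2, 3, 3, 4, 4. Here n is a plain parameter; per IMPALA-2171, Impala's NTILE() accepted only constant expressions at the time.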
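IMPALA-2084 adds the SPLIT_PART builtin without spelling out its semantics in the log. The sketch assumes the common behavior: a 1-based field index, and an empty string when the index is past the last field. How Impala handles a non-positive index or an empty delimiter is not described here, so the hypothetical `SplitPart` returns std::nullopt in those cases as a stand-in for an error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Returns the index-th (1-based) field of `source` split on `delim`, or "" if
// there are fewer fields. Assumption: index < 1 or an empty delimiter yields
// std::nullopt instead of a real error.
std::optional<std::string> SplitPart(std::string_view source,
                                     std::string_view delim, int64_t index) {
  if (index < 1 || delim.empty()) return std::nullopt;
  std::size_t start = 0;
  // Skip past the first (index - 1) delimiters.
  for (int64_t field = 1; field < index; ++field) {
    const std::size_t pos = source.find(delim, start);
    if (pos == std::string_view::npos) return std::string();
    start = pos + delim.size();
  }
  std::size_t end = source.find(delim, start);
  if (end == std::string_view::npos) end = source.size();
  return std::string(source.substr(start, end - start));
}
```

Multi-character delimiters work the same way: SplitPart("x--y--z", "--", 2) yields "y".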