impala

mirror of https://github.com/apache/impala.git synced 2026-01-04 18:00:57 -05:00

Author	SHA1	Message	Date
Michael Ho	b14ca6d09f	IMPALA-3645: Free probe expressions' local allocations in ConstructBuildSide() With the prefetching changes, the probe expressions' local allocations are no longer freed via QueryMaintenance() in PHJ. Instead, they are freed explicitly in GetNext() after an entire probe batch has been processed. Due to this change in how we handle local allocations of probe expressions, a DCHECK was added to verify that there is no local allocation from the probe expression in ProcessBuildInput(). Turns out that Expr::Open() called in ConstructBuildSide() on the probe expressions may have caused local allocations to occur for certain UDFs (e.g. extract()). This change handles the situation above by freeing local allocations of the probe expressions once before calling ProcessBuildInput() in ConstructBuildSide(). A new regression test is also added for this specific case. Change-Id: I2096ca3e2093c5ab0ecc0e7ca4cd1b5f3c1ed1ed Reviewed-on: http://gerrit.cloudera.org:8080/3253 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-06-02 09:32:54 -07:00
Jim Apple	1a3d7ffd4f	IMPALA-2147: Support IS [NOT] DISTINCT FROM and "<=>" predicates Enforces that the planner treats IS NOT DISTINCT FROM as eligible for hash joins, but does not find the minimum spanning tree of equivalences for use in optimizing query plans; this is left as future work. Change-Id: I62c5300b1fbd764796116f95efe36573eed4c8d0 Reviewed-on: http://gerrit.cloudera.org:8080/710 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-01-14 05:45:22 +00:00
Tim Armstrong	25e7454bc9	IMPALA-2319: correctly enforce NLJ limit In some cases in the NLJ node eos_ wasn't set even though the limit was reached. This prevented the limit from being handled correctly before returning rows to the caller of GetNext(). This could result in either too many rows being returned, or a crash when the row batch size was set to an invalid negative number. The fix is to always check for whether the limit was reached before returning from GetNext(). Change-Id: I660e774787870213ada9f2d3e6f10953d9937022 Reviewed-on: http://gerrit.cloudera.org:8080/797 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-09-11 22:45:39 +00:00
Skye Wanderman-Milne	7906ed44ac	IMPALA-2015: Add support for nested loop join Implement nested-loop join in Impala with support for multiple join modes, including inner, outer, semi and anti joins. Null-aware left anti-join is not currently supported. Summary of changes: Introduced the NestedLoopJoinNode class in the FE that represents the nested loop join. Common functionality between NestedLoopJoinNode and HashJoinNode (e.g. cardinality estimation) was moved to the JoinNode class. In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop join execution strategy. Change-Id: I238ec7dc0080f661847e5e1b84e30d61c3b0bb5c Reviewed-on: http://gerrit.cloudera.org:8080/652 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-08-19 08:40:14 +00:00
Dimitris Tsirogiannis	47c5ae405a	Revert "IMPALA-2015: Add support for nested loop join" This reverts commit 6837cdec7f6a7e1c7e8157e323f3ab68277689aa. Change-Id: I2fd6424c553a701fcbfd425b4486af7280820b23 Reviewed-on: http://gerrit.cloudera.org:8080/636 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-13 02:20:07 +00:00
Skye Wanderman-Milne	f000758ca8	IMPALA-2015: Add support for nested loop join Implement nested-loop join in Impala with support for multiple join modes, including inner, outer, semi and anti joins. Null-aware left anti-join is not currently supported. Summary of changes: Introduced the NestedLoopJoinNode class in the FE that represents the nested loop join. Common functionality between NestedLoopJoinNode and HashJoinNode (e.g. cardinality estimation) was moved to the JoinNode class. In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop join execution strategy. Change-Id: Id65a1aae84335bba53f06339bdfa64a1b0be079e Reviewed-on: http://gerrit.cloudera.org:8080/457 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-08-07 02:47:32 +00:00
ishaan	8369c3b13b	Remove explicit references to functional_hbase tables from .test files. Additionally, this patch disables the hbase/none test dimension if the TARGET_FILESYSTEM environment variable is set to either s3 or isilon. Change-Id: I63aecaa478d2ba9eb68de729e9640071359a2eeb Reviewed-on: http://gerrit.cloudera.org:8080/74 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-02-23 23:32:41 +00:00
Nong Li	a2e7b05bb1	IMPALA-1332: Fix memory leak for FULL OUTER/RIGHT OUTER joins. This can happen if not all rows are returned. Change-Id: I4d54641b71c44faa85a2138d16f9dda1052317b5 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4737 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-10-06 19:49:56 -07:00
Skye Wanderman-Milne	0db2181d97	IMPALA-1326: fix bug in BufferedTupleStream::GetTupleRow() Change-Id: If133a2041e0bae0c327fe83b114e36b9320784bb Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4658 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-10-06 15:12:32 -07:00
Skye Wanderman-Milne	3b7449a59b	Codegen PartitionedHashJoinNode This also reverts back to using CRC hash since FNV is not codegen'd yet. The perf is not as good as the original HJ in a microbenchmark; I haven't run a cluster run yet. Change-Id: Ie4dc983f31631fbc78720425a0e354dd1d3342a6 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4219 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-09-13 00:17:33 -07:00
Ippokratis Pandis	e21987e338	Bug fix in PHJ, addresses also IMPALA-1160 In PHJ, we have to reset hash_tbl_iterator_ before probing a new batch. Adds regression test for IMPALA-1160. Change-Id: I608280815de2c5c1e334b7d2b4a50b12bf1d9096 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3968 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3995	2014-08-22 01:51:34 -07:00
Alex Behm	7f51449869	Rename ANTI JOIN to LEFT ANTI JOIN for consistency with LEFT SEMI JOIN. Change-Id: I8171b2d44b45529fdbd040d5709aaeb9f13facfa Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3873 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-08-17 12:46:10 -07:00
Ippokratis Pandis	3ee273ae50	Adding support for {anti,left semi,left outer} joins to the partitioned hash join implementation. Adding the "anti join" keyword in the frontend and the corresponding backend paths for the partitioned hash join implementation. Adding some basic testing for this new join (the other types already have tests). Also, fixing a bug in the tuple stream when it was handling strings. Change-Id: Ied8cff96b2bca284a5f66f7d11df5c5b5ec789cc Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3805 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: jenkins	2014-08-17 12:24:17 -07:00
Alex Behm	22858ba7e1	IMPALA-1123: Add casts to the partition exprs of hash-partitioning senders. This patch ensures that all hash-partitioning senders to a hash-partitioned fragment hash on exprs of identical types. Casts are added as necessary. Otherwise, the hashes generated for identical partition values may differ among senders if the partition-expr types are not identical. The new logic is placed into PlanFragment.finalize() in order to avoid repeated re-casting of senders during plan generation, since every time a child fragment is absorbed into a partition-compatible parent we potentially need to add casts to all senders of that fragment again. Change-Id: Id9f581cc03127f64f0631d9b288fab4cd4dd8a82 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3689 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3708	2014-07-31 23:57:08 -07:00
Dimitris Tsirogiannis	5a6f53db16	Add partition pruning tests The following changes are included in this commit: 1. Modified the alltypesagg table to include an additional partition key that has nulls. 2. Added a number of tests in hdfs.test that exercise the partition pruning logic (see IMPALA-887). 3. Modified all the tests that are affected by the change in alltypesagg. Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236	2014-06-24 02:14:27 -07:00
Alex Behm	bf85225911	IMPALA-881: Tests for joins with union inputs. Change-Id: I4be6821ac3938345ca95c542d868c87512ff66da Reviewed-on: http://gerrit.ent.cloudera.com:8080/3229 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-06-23 15:38:06 -07:00
Dimitris Tsirogiannis	2d7a8b7c70	IMPALA-964: Full outer join on values() followed by group by hits a preconditions check This commit fixes IMPALA-964 where full outer join between two inline views followed by a group by (e.g. select 1 FROM (VALUES(1 x, 1 y)) a FULL OUTER JOIN (VALUES(1 x, 1 y)) b ON (a.x = b.y) GROUP BY a.x;) hits a preconditions check. This check evaluates if the numNodes (number of nodes for the purpose of resource estimation) variable is greater than or equal to zero and is triggered when we try to compute the resource estimates (number of distinct values) of a plan fragment. The following changes are included in this commit: 1. Modified the getNumDistinctValues function in PlanFragment class to consider the special case where the numNodes of a plan fragment is -1. 2. Added a test case in QueryTest/joins.test. Change-Id: I2962ed5079e174d0e76ad990ab84e1fb1a4607ef Reviewed-on: http://gerrit.ent.cloudera.com:8080/2466 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2514 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>	2014-05-11 19:30:38 -07:00
Alex Behm	dd0409e9d6	IMPALA-509: Minimal type promotion for arithmetic exprs. Change-Id: I576fe9baf3bae7d46ee08e29ececc4adda97e9df Reviewed-on: http://gerrit.ent.cloudera.com:8080/1078 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:54:30 -08:00
Matthew Jacobs	93368e20b1	Fix CROSS JOIN handling in join order optimization and add tests Cross joins should be handled like outer joins in the join order optimization in that the right table referenced by a cross join may not be reordered anywhere before tables referenced to the left of the cross join. If there are inner joins to the right of the cross join, those tables may be reordered before the cross join. E.g., if we have A JOIN B CROSS JOIN C JOIN D, then C must come after A and B, but D may be reordered to come before C. Also adds test cases for join order optimization and predicate propagation. Change-Id: I6b1022dd3e862efbff81e283b43284d846c8eca4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1096 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:29 -08:00
Matthew Jacobs	f327431a8e	IMPALA-171: Add CROSS JOIN Adds a CROSS JOIN (cartesian product). Common join code is moved to a new abstract base class, BlockingJoinNode. We must keep all build RowBatches in memory in order to iterate over them for every row from the left child. The TupleRowList provides a convenient way to iterate over all of the rows. A future change will address codegen for the CrossJoinNode. Change-Id: I5e0caa6fb4ec802a9c87e700f9dd6238cea8cdf2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/970 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:25 -08:00
Alex Behm	1497002013	Added SHOW TABLE/COLUMN STATS command. Fixed the following stats-related bugs: - Per-partition row count was not distributed properly via CatalogService - HBase column stats were not loaded and distributed properly Enhancements to test framework: - Allow regex specification of expected row or column values - Fixed expected results of some tests because the test framework did not catch that they were incorrect Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51 Reviewed-on: http://gerrit.ent.cloudera.com:8080/813 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:51 -08:00
ishaan	53cd9eadab	Treat HBase as a file format for functional tests Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922 Reviewed-on: http://gerrit.ent.cloudera.com:8080/102 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:36 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	ef48f65e76	Add test framework for running Impala query tests via Python This is the first set of changes required to start getting our functional test infrastructure moved from JUnit to Python. After investigating a number of options, I decided to go with a Python test executor named py.test (http://pytest.org/). It is very flexible, open source (MIT licensed), and will enable us to do some cool things like parallel test execution. As part of this change, we now use our "test vectors" for query test execution. This will be very nice because it means if you load the "core" dataset you know you will be able to run the "core" query tests (specified by --exploration_strategy when running the tests). You will see that now each combination of table format + query exec options is treated like an individual test case. This will make it much easier to debug exactly where something failed. These new tests can be run using the script at tests/run-tests.sh	2014-01-08 10:46:50 -08:00
Alan Choi	595edaa9d1	Disable all string to numeric and boolean implicit cast	2014-01-08 10:46:24 -08:00
Lenni Kuff	04edc8f534	Update benchmark tests to run against generic workload, data loading with scale factor, +more This change updates the run-benchmark script to enable it to target one or more workloads. Now benchmarks can be run like: ./run-benchmark --workloads=hive-benchmark,tpch We look up the workload in the workloads directory, then read the associated query .test files and start executing them. To ensure the queries are not duplicated between benchmark and query tests, I moved all existing queries (under fe/src/test/resources/*) to the workloads directory. You do NOT need to look through all the .test files, I've just moved them. The one new file is the 'hive-benchmark.test' which contains the hive benchmark queries. Also added support for generating schema for different scale factors as well as executing against these scale factors. For example, let's say we have a dataset with a scale factor called "SF1". We would first generate the schema using: ./generate_schema_statements --workload=<workload> --scale_factor="SF3" This will create tables with unique names from the other scale factors. Run the generated .sql file to load the data. Alternatively, the data can be loaded by running a new python script: ./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor] For example: load-data.sh -w tpch -e core -s SF3 Then run against this: ./run-benchmark --workloads=<workload> --scale_factor=SF3 This changeset also includes a few other minor tweaks to some of the test scripts. Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6	2014-01-08 10:44:22 -08:00
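
The IMPALA-2319 entry above describes a fix in which the limit is always checked before GetNext() returns. As a rough illustration of that idea only, and not Impala's actual C++ code, here is a minimal Python sketch. The function name `get_next_batches` and the list-of-lists batch representation are hypothetical, and a non-negative limit is assumed.

```python
from typing import Iterable, Iterator, List


def get_next_batches(batches: Iterable[List[int]], limit: int) -> Iterator[List[int]]:
    """Yield row batches while never returning more than `limit` rows in total.

    Hypothetical stand-in for a join node's GetNext() loop; assumes limit >= 0.
    """
    returned = 0
    for batch in batches:
        # Check the limit before handing anything back to the caller.
        if returned >= limit:
            return
        # Truncate the batch so the running total cannot exceed the limit.
        out = batch[: limit - returned]
        returned += len(out)
        yield out


if __name__ == "__main__":
    for rows in get_next_batches([[1, 2, 3], [4, 5, 6]], limit=4):
        print(rows)
```

Because the check runs on every pass through the loop, a batch is either truncated to fit or never produced, so the caller cannot receive extra rows or a negative row count.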

26 Commits
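
The IMPALA-2147 entry adds the IS [NOT] DISTINCT FROM and "<=>" predicates. For readers unfamiliar with them, the sketch below models standard SQL NULL-safe equality in Python, using `None` to stand in for NULL. The function names are illustrative assumptions and do not come from the Impala code base.

```python
from typing import Any, Optional


def sql_equals(a: Any, b: Any) -> Optional[bool]:
    """Ordinary SQL `=`: the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def is_not_distinct_from(a: Any, b: Any) -> bool:
    """NULL-safe equality (`IS NOT DISTINCT FROM`, `<=>`): never returns NULL."""
    if a is None or b is None:
        # Two NULLs compare as equal; a NULL and a non-NULL value do not.
        return a is None and b is None
    return a == b


def is_distinct_from(a: Any, b: Any) -> bool:
    """`IS DISTINCT FROM` is the negation of the NULL-safe equality above."""
    return not is_not_distinct_from(a, b)


if __name__ == "__main__":
    print(sql_equals(None, None))            # NULL under ordinary equality
    print(is_not_distinct_from(None, None))  # NULLs match under <=>
    print(is_distinct_from(1, None))
```

Since the NULL-safe predicate always yields a definite true or false, it can serve as an equality condition for matching join keys, which is consistent with the commit's note that the planner treats IS NOT DISTINCT FROM as eligible for hash joins.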