impala

mirror of https://github.com/apache/impala.git synced 2026-01-02 03:00:32 -05:00

Author	SHA1	Message	Date
Taras Bobrovytsky	46c3e43edb	IMPALA-3232: Allow not-exists uncorrelated subqueries Before this patch, correlated exists and not exists subqueries were rewritten as as left semi and anti joins respectively. Uncorrelated exists subqueries were rewritten as cross joins, and uncorrelated not-exists subqueries were not supported at all. This patch takes advantage of the nested loop join that was recently introduced, which allows us to rewrite both correlated and uncorrelated exists subqueries as left semi joins and both correlated and uncorrelated not-exists subqueries as anti joins. Change-Id: I52ae12f116d026190f3a2a7575cda855317d11e8 Reviewed-on: http://gerrit.cloudera.org:8080/2792 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 23:06:36 -07:00
Dimitris Tsirogiannis	ccf1f8f73f	IMPALA-2734: Correlated EXISTS subqueries with HAVING clause return wrong results This commit fixes an issue where wrong results are returned if an EXISTS subquery contains a HAVING clause and non-equality correlated binary predicates. This case does not have a valid rewrite as the HAVING clause needs to be applied after the correlated predicates have been evaluated. With this fix, we detect cases like this and throw an AnalysisException. Change-Id: I159f956e2b01f408601829b5d2afcf11d76bedcd Reviewed-on: http://gerrit.cloudera.org:8080/1927 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-02-04 01:06:03 +00:00
Dimitris Tsirogiannis	4eceeacf16	IMPALA-1550: Invalid rewrite when EXISTS subqueries contain aggregate functions This commit fixes an issue where a [NOT] EXISTS subquery that contains an aggregate function will sometimes be incorrectly rewritten into a join, thereby returning incorrect results. Change-Id: I18b211d76ee3de77d8061603ff5bb1fbceae2e60 Reviewed-on: http://gerrit.cloudera.org:8080/266 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-04-02 19:11:00 +00:00
Alex Behm	f696861c5c	Throw error on unrecognized test sections. Our .test file parser used to not abort tests when there is a malformed test/section. This patch changes that behavior to report an error and treat the test as failed. Quite a few tests were not well-formed, and were not executed as a result. This patch fixes those tests. Arguably, the test file parser should be more flexible in which places to accept comments, but this patch does not address that problem. Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-12-02 18:08:09 -08:00
casey	516d7483dd	IMPALA-1300: Allow subqueries in UNION operands This enables the existing subquery rewrite rules to rewrite UNION statements. UNION rewriting is easily done by simply calling the rewriter for each operand in the UNION. At least one TPC-DS query requires this functionality (IMPALA-1365). The more difficult case of a UNION within a subquery is still not supported. Change-Id: I7f83eed0eb8ae81565e629f09f6918a4ba86ee13 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4859 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: jenkins	2014-11-17 11:19:09 -08:00
Alex Behm	7b6ecbeea5	Fix exhaustive test run: Modify test to produce identical results on HBase. Change-Id: I7187f9aca63f61ea1686820b3cbec277240da191 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4866 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins	2014-11-17 11:19:01 -08:00
Nong Li	86aebc7f8f	IMPALA-1348: Fix NAAJ where the null partitions have streams with multiple blocks. Change-Id: I892f3435814bd4fcddeb496017dbb60704f13419 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4728 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-10-14 12:01:53 -07:00
Alex Behm	3e7de9f304	IMPALA-1318: Joins should not return semi-joined tuples. Change-Id: I93f5ddb8317af7794b5977e145805f9ff498d722 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4633 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-10-06 15:17:22 -07:00
Ippokratis Pandis	5c4486a2b2	Proper handling of NULL tuples by buffered-tuple-stream. Adding a bitstring at the head of each block in the TupleStream that indicates which tuples of the appended rows in the block are NULLs. When reading the stream, through GetNext() or GetTupleRow() calls, the NULL tuples are stitched back to their correct position. This fixes crashes in PHJ of bushy plans with NULLs on the build side(s) as well as similar crashes in PAGG and the analytic node. For example, it fixes IMPALA-1204, IMPALA-1223, and IMPALA-1249. Also, adds regression tests for IMPALA-1175, IMPALA-1204, IMPALA-1223, IMPALA-1249 and IMPALA-1306. Change-Id: I30ad0dbd4dfeabcda8fae444d1c6ec9291f38398 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4596 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: jenkins	2014-10-06 15:10:58 -07:00
Dimitris Tsirogiannis	5db0f877cb	Fix subqueries test for HBase Change-Id: I8d3c10d29a198135e87ab848ba206c2662166760 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4597 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins	2014-10-06 15:09:37 -07:00
Dimitris Tsirogiannis	b201c7a7d1	IMPALA-1299: Analytic should be allows in correlated EXISTS subquery With this commit we enable correlated and uncorrelated EXISTS subqueries with grouping and/or aggregation including analytic functions. Furthermore, we enable correlated EXISTS subqueries with a LIMIT clause. Change-Id: I36c33f80b152b7f175bf803cbe920ce1983d7162 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4583 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins	2014-10-06 15:09:25 -07:00
Dimitris Tsirogiannis	5046a47dc3	IMPALA-1297: Results of NOT IN may not be correct if subquery results in NULL This commit fixes a bug in the implementation of the null-aware anti join that resulted in wrong results being returned from NOT IN correlated subqueries in the presence of nulls. Change-Id: I6f2eb326ec7e40d80ec8da94ba33946b9ac9b115 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4506 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins	2014-09-26 12:02:47 -07:00
Dimitris Tsirogiannis	f21aed16fd	Bug fixes in null-aware anti-join This commit fixes issue IMPALA-1215 where NOT IN subqueries return wrong results in the presence of null values. Change-Id: I97e41c8df8ba864d0189595d670b3f0349fcad36 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4467 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-09-23 07:33:23 -07:00
Dimitris Tsirogiannis	3b5f1d3ab5	Rewrite NOT IN subqueries with a null-aware anti-join. This commit fixes the issue (IMPALA-1215) where NOT IN subqueries return wrong results in the presence of NULL values. The null-matching equality operator is introduced in the front-end and the NOT IN subqueries are rewritten using the null-aware anti-join operator. Change-Id: I5a323357025d77c2143db86e1057999ec8a371c0 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4391 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins	2014-09-20 16:13:49 -07:00
Dimitris Tsirogiannis	e1e874a77f	IMPALA-1212 Accept subquery as LHS or RHS of between operator This commit fixes the issue where an error was thrown if a subquery was used in either side of a between predicate. Between predicates with subqueries are replaced by their corresponding compound predicates during query rewrite. Change-Id: I4315a6e91c9306c6817bf6aa6bc1d0b586a1a067 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4246 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins	2014-09-13 00:17:36 -07:00
Dimitris Tsirogiannis	b670d98e40	IMPALA-1195: IllegalStateException in query with agg scalar subquery This commit fixes IMPALA-1195 in which an exception is thrown when a scalar subquery is in an IS NULL predicate. With this commit we also add support for scalar subqueries in functions and other exprs. Change-Id: Id995e77e6561a6450c4347706e4901fb3e236cfe Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4185 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>	2014-09-12 18:17:28 -07:00
Dimitris Tsirogiannis	2ab66c4ca2	Add support for uncorrelated EXISTS subqueries This commit adds support for uncorrelated EXISTS subqueries in Impala. Uncorrelated EXISTS subqueries are rewritten using a CROSS JOIN. Uncorrelated NOT EXISTS subqueries are not supported. Change-Id: I0003dcdc0fa5cc99931b9a9f4deddbcd42572490 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4140 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4186	2014-09-05 12:36:18 -07:00
Nong Li	8fbd5fe2c9	PHJ memory transfer fixes and misc bug fixes. Row batches contain auxiliary memory that can reside in tuple pools, io buffers and now tuple streams. Like the other resources, these need to attached to row batches and transfered up the operator tree to make sure the tuple ptrs are always valid. Fixed bug in BufferedTupleStream to not delete blocks on read if it is pinned. Fixed PHJ bug with row batch boundaries causing current_probe_row_ to be NULL. Change-Id: I4c66d9961a117bfe3ed577de6170e875ea1feee7 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3983 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4157	2014-09-03 20:12:24 -07:00
Dimitris Tsirogiannis	d9fa1a2e60	Fix issue where subqueries return wrong results in the presence of distinct This commit fixes two subquery issues: 1. During the rewrite of aggregate subqueries with count, a new select list is created for the outer select block to eliminate new visible tuples. However, the new select list was not initialized correctly, causing distinct clauses to not be preserved. 2. Pushing negation to operands during a query rewrite was causing a StackOverflowError when it was encountering predicates for which a negate function is not implemented. Consequently, it was using the negate function from the parent class causing it to recurse infinitely. Change-Id: I6f1b8090af40fa55b13661d637f9aaaa00dfcf5c Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4115 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4141	2014-09-03 12:25:59 -07:00
Dimitris Tsirogiannis	c2abcd6f3d	Query transformation of nested queries. This commit implements nested queries with [NOT] IN, [NOT] EXISTS and aggregate subquery predicates in Impala. The following cases are supported: 1. Correlated and uncorrelated [NOT] IN. 2. Correlated [NOT] EXISTS. 3. Correlated and uncorrelated aggregate subqueries. Change-Id: Ia3f4843c5f07d4e31ef3faedc58a15e623f91a5d Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3754 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4109	2014-08-29 15:35:21 -07:00
Alex Behm	bceeb834f3	IMPALA-677: Fix visibility of semi and anti-joined table references. Semi or anti-joined table references are now only visible inside the On-clause of the corresponding join. Change-Id: Id93e53ecdf2a74baf9736aa427fa7af15358ca27 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3789 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-08-17 12:45:45 -07:00
Dimitris Tsirogiannis	5a6f53db16	Add partition pruning tests The following changes are included in this commit: 1. Modified the alltypesagg table to include an additional partition key that has nulls. 2. Added a number of tests in hdfs.test that exercise the partition pruning logic (see IMPALA-887). 3. Modified all the tests that are affected by the change in alltypesagg. Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236	2014-06-24 02:14:27 -07:00
Nong Li	b0de4bbe40	IMPALA-812: Fix select node to properly transfer memory ownership. Change-Id: I83b6d085362726aa080077845d3bef71b184621c Reviewed-on: http://gerrit.ent.cloudera.com:8080/2076 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-03-25 18:38:55 -07:00
Henry Robinson	16af29ea5f	IMPALA-770: Fix crash in aggregation node with zero-width tuple The select exprs of an inline view may not always be materialised, yet the output tuple itself may be. This patch fixes a crash in this situation in the backend aggregation node which assumed its output tuple would always have at least one materialised slot. The cause was a couple of too-conservative DCHECKs that failed if the tuple was NULL. In fact, the code was robust to this possibility without the checks, so this bug didn't affect release builds of Impala. Change-Id: If0b90809d30fcd196f55197953392452d1ac9c4f Reviewed-on: http://gerrit.ent.cloudera.com:8080/1431 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins (cherry picked from commit 8c1c21b66c43e900760ace54d090305f32a85a1f) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1471 Tested-by: Henry Robinson <henry@cloudera.com>	2014-02-05 22:01:35 -08:00
Alex Behm	1497002013	Added SHOW TABLE/COLUMN STATS command. Fixed the following stats-related bugs: - Per-partition row count was not distributed properly via CatalogService - HBase column stats were not loaded and distributed properly Enhancements to test framework: - Allow regex specification of expected row or column values - Fixed expected results of some tests because the test framework did not catch that they were incorrect Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51 Reviewed-on: http://gerrit.ent.cloudera.com:8080/813 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:51 -08:00
ishaan	53cd9eadab	Treat HBase as a file format for functional tests Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922 Reviewed-on: http://gerrit.ent.cloudera.com:8080/102 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:36 -08:00
Alex Behm	937a44f9f8	IMPALA-68: Support Values() statement.	2014-01-08 10:50:31 -08:00
Alex Behm	21685d4f8f	Fixed a failed Preconditions check if a join predicate has constants.	2014-01-08 10:49:52 -08:00
Alex Behm	5db3f2cdf5	IMPALA-227: SELECT * on partitioned table returns columns in different order than Hive.	2014-01-08 10:49:48 -08:00
Alex Behm	805fa50d6f	IMPALA-67: Constant SELECT clauses do not work in subqueries.	2014-01-08 10:49:48 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	837f35eab3	Updated results for more query tests to reflect proper ordering + improved result updating	2014-01-08 10:46:53 -08:00
Lenni Kuff	ef48f65e76	Add test framework for running Impala query tests via Python This is the first set of changes required to start getting our functional test infrastructure moved from JUnit to Python. After investigating a number of option, I decided to go with a python test executor named py.test (http://pytest.org/). It is very flexible, open source (MIT licensed), and will enable us to do some cool things like parallel test execution. As part of this change, we now use our "test vectors" for query test execution. This will be very nice because it means if load the "core" dataset you know you will be able to run the "core" query tests (specified by --exploration_strategy when running the tests). You will see that now each combination of table format + query exec options is treated like an individual test case. this will make it much easier to debug exactly where something failed. These new tests can be run using the script at tests/run-tests.sh	2014-01-08 10:46:50 -08:00
Marcel Kornacker	2fda5d9b99	IMP-491 Fixes bug in Planner.createHashJoinFragment(), which didn't set the left child of the hj node to the output of the left child fragment. Also: row descriptor was set incorrectly (too wide; included tuples that weren't materialized) for roots of plan trees of non-root fragments if those fragments materialized an aggregate	2014-01-08 10:46:33 -08:00
Alan Choi	595edaa9d1	Disable all string to numeric and boolean implicit cast	2014-01-08 10:46:24 -08:00
Lenni Kuff	04edc8f534	Update benchmark tests to run against generic workload, data loading with scale factor, +more This change updates the run-benchmark script to enable it to target one or more workloads. Now benchmarks can be run like: ./run-benchmark --workloads=hive-benchmark,tpch We lookup the workload in the workloads directory, then read the associated query .test files and start executing them. To ensure the queries are not duplicated between benchmark and query tests, I moved all existing queries (under fe/src/test/resources/* to the workloads directory. You do NOT need to look through all the .test files, I've just moved them. The one new file is the 'hive-benchmark.test' which contains the hive benchmark queries. Also added support for generating schema for different scale factors as well as executing against these scale factors. For example, let's say we have a dataset with a scale factor called "SF1". We would first generate the schema using: ./generate_schema_statements --workload=<workload> --scale_factor="SF3" This will create tables with a unique names from the other scale factors. Run the generated .sql file to load the data. Alternatively, the data can loaded by running a new python script: ./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor] For example: load-data.sh -w tpch -e core -s SF3 Then run against this: ./run-benchmark --workloads=<workload> --scale_factor=SF3 This changeset also includes a few other minor tweaks to some of the test scripts. Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6	2014-01-08 10:44:22 -08:00

36 Commits