Commit Graph

21 Commits

Author SHA1 Message Date
Alex Behm
9a9886ee37 IMPALA-2950: Fully resolve exprs before wrapping with TupleIsNullPredicates.
The bug: In SingleNodePlanner.createInlineViewPlan() we need to wrap some
exprs with TupleIsNullPredicates to preserve correctness if the inline view
is outer joined. The bug was that we used to perform this wrapping on
the rhs of the inline view's smap, and not the final output smap after
those rhs exprs have been resolved against the physical output of the
inline view's plan root. As a result, the TupleIsNullWrapping did not
work correctly for deeply nested inline views with exprs that require
wrapping at various nesting levels.

The fix: Resolve the exprs against the physical output of the inline view's
plan root before performing the TupleIsNullPredicate wrapping.

Change-Id: I183bba6a36bf5e19a88687ed8c82977ae769ddf4
Reviewed-on: http://gerrit.cloudera.org:8080/2092
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-02-10 07:16:58 +00:00
Matthew Jacobs
851056489d IMPALA-2440: Fix old HJ full outer join with no rows
When a full outer join on the old (non-partitioned)
HashJoinNode, if any join fragment has 0 build rows and 0
probe rows an extra null row will be produced.

Change-Id: I75373edc4f6b3b0c23afba3c1fa363c613f23507
Reviewed-on: http://gerrit.cloudera.org:8080/1068
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:47 -07:00
Ippokratis Pandis
c3a7916812 IMPALA-2065: Workaround IMPALA-1619 in BufferedBlockMgr::ConsumeMemory()
If the build side of any partition of PHJ was very large we could end up
trying to Init() hash tables that are larger than 1GB. The result was
overflows (see IMPALA-1619) and eventually DCHECKS.
This patch returns false whenever we try to allocate memory in the
BufferedBlockMgr that it is larger than 1GB.

Change-Id: Id4590ea434bef4dca7dc3f137cfe7b638ae3d916
Reviewed-on: http://gerrit.cloudera.org:8080/465
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-06-27 01:17:50 +00:00
Dimitris Tsirogiannis
2c1f0a4942 IMPALA-1987: Fix TupleIsNullPredicate to return false if no tuples are
nullable.

This commit fixes the issue where an outer join returns wrong results if
the equi-join predicate contains a TupleIssNullPredicate expr.

Change-Id: I71f05479a442544d578c0d173e2a8412d7bbb3c4
Reviewed-on: http://gerrit.cloudera.org:8080/445
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-06-11 03:37:18 +00:00
Ippokratis Pandis
e36c436fa6 Adding tests with right joins and duplicates
Those tests were added as part of the new hash table implementation, as we didn't have
tests with right joins and duplicates (and other conjuncts) as well as aggregation
distinct queries with group bys on multiple columns. Adding them as a separate
patch, To improve testing coverage in the 2.2 release branch.

Change-Id: Id1b4f27fa6e587b2031635974ac9d2d39a1b015a
Reviewed-on: http://gerrit.cloudera.org:8080/193
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-03-11 16:39:40 -07:00
Dimitris Tsirogiannis
4b748ef5da IMPALA-1371: Predicate applied incorrectly when FULL OUTER JOIN is
present

This commit fixes the issue where predicates are not applied to the
correct query tree nodes when the query contains full outer joins. To
address this issue, we register information about the tuple ids that
are outer joined by full outer joins and use that information to guide
the assignment of predicates.

Change-Id: I854c05c159d86c0aaabfc12b7dd5c5982c5ece4b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5284
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
2014-11-23 21:36:31 -08:00
Ippokratis Pandis
5c4486a2b2 Proper handling of NULL tuples by buffered-tuple-stream.
Adding a bitstring at the head of each block in the TupleStream that indicates which
tuples of the appended rows in the block are NULLs. When reading the stream, through
GetNext() or GetTupleRow() calls, the NULL tuples are stitched back to their correct
position.

This fixes crashes in PHJ of bushy plans with NULLs on the build side(s) as well as
similar crashes in PAGG and the analytic node.
For example, it fixes IMPALA-1204, IMPALA-1223, and IMPALA-1249.
Also, adds regression tests for IMPALA-1175, IMPALA-1204, IMPALA-1223, IMPALA-1249
and IMPALA-1306.

Change-Id: I30ad0dbd4dfeabcda8fae444d1c6ec9291f38398
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4596
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
2014-10-06 15:10:58 -07:00
Ippokratis Pandis
946aa3089b Adding support in PHJ for right-{semi,anti} joins.
Changes needed for PHJ to support RIGHT {SEMI, ANTI} JOINs. Codegen works as well.
Basic parser tests and minimal (end-to-end) query tests.
Need to add analyzer tests and add more query tests.
Note that in the case of right-{semi,anti} and perhaps also on {right,full}-outer we
should not be broadcasting the build side.

Change-Id: I6854ee9e4640f809f0350229bcc00811fa474f07
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4288
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4369
2014-09-16 19:42:24 -07:00
Dimitris Tsirogiannis
5a6f53db16 Add partition pruning tests
The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887).
3. Modified all the tests that are affected by the change in alltypesagg.

Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
2014-06-24 02:14:27 -07:00
Alex Behm
a85dacafe8 IMPALA-904: Make TupleIsNullPredicate work on non-nullable tuples.
We wrap certain exprs substituted from outer-joined inline view in an expr that
evaluates to NULL if the underling tuple(s) are NULL. We do this for exprs that evaluate
to non-NULL values if their slots are NULL, i.e., we must then distinguish tuples that are
NULL from slots that are NULL (otherwise evaluating an expr against a tuple that is NULL
due to the outer join may incorrectly return a non-NULL value.)

The bug: Exprs referring to an outer-joined inline view may appear in various places
in the outer query block. For example, they could appear in an On-clause or be
placed into scans/aggregates due to predicate propagation. In such cases, the underlying
tuples may not be nullable yet because they only become nullable after the outer join.
We had a DCHECK in tuple-is-null-predicate.cc requiring the tuples to be nullable.
The fix: Remove the DCHECK. The fix is not elegant but practical. It would be rather
difficult to fix the inline view expr substitution such that a TupleIsNullPredicate
never references a non-nullable tuple, esp. due to predicate propagation.

Change-Id: I180f75f14173f356abfeec751e6b2d419378a9a7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2157
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-04-07 14:18:49 -07:00
Nong Li
1a55133f0a IMPALA-735. Fix codegen bug affecting outer joins.
Change-Id: I99ca45b558fb2ed694f261a22e7e91e59f1ad675
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1496
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-02-10 05:00:21 -08:00
Nong Li
e3fdef7839 Fix subexpr elimination IR rewriting.
Change-Id: Iabdcc1686951e71136a603ed30f9d16fb1c1ec46
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1056
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:22 -08:00
Alex Behm
1497002013 Added SHOW TABLE/COLUMN STATS command.
Fixed the following stats-related bugs:
- Per-partition row count was not distributed properly via CatalogService
- HBase column stats were not loaded and distributed properly

Enhancements to test framework:
- Allow regex specification of expected row or column values
- Fixed expected results of some tests because the test framework
  did not catch that they were incorrect

Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/813
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:51 -08:00
Alex Behm
9754f5bf52 IMPALA-504: Right and full outer joins do not return row with NULL value for rhs table.
Change-Id: Ia3f8d474fb30189b36fb587b2920d7b9b224ea71
Reviewed-on: http://gerrit.ent.cloudera.com:8080/129
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:03 -08:00
Alex Behm
861ba05989 IMPALA-197: Outer join on constant expressions returns incorrect results. 2014-01-08 10:50:09 -08:00
Alex Behm
abafcf81ff IMPALA-287: Full outer join is missing results. 2014-01-08 10:49:54 -08:00
Alex Behm
dbe3127383 IMPALA-285: Multiple outer joins with nesting crash impalad 2014-01-08 10:49:53 -08:00
Marcel Kornacker
7bf87a4b54 fix for IMPALA-90/IMPALA-221 2014-01-08 10:49:50 -08:00
Alan Choi
57c2f828e0 IMP-791 Fix full outer join hang
In full or right outer join, the hash-join-node does not release
the io buffer when calling get next, causing deadlock.
2014-01-08 10:48:58 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
option, I decided to go with a python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means if load the "core" dataset you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).

You will see that now each combination of table format + query exec options is
treated like an individual test case. this will make it much easier to debug
exactly where something failed.

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We lookup the workload in the workloads directory, then read the associated
query .test files and start executing them.

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/* to the workloads
directory. You do NOT need to look through all the .test files, I've just moved
them. The one new file is the 'hive-benchmark.test' which contains the hive
benchmark queries.

Also added support for generating schema for different scale factors as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF1". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables with a unique names from the other scale factors.

Run the generated .sql file to load the data. Alternatively, the data can loaded
by running a new python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: load-data.sh -w tpch -e core -s SF3

Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00