IMPALA-3065/IMPALA-3062: Restrict !empty() predicates to scan nodes.

The bug:
Evaluating !empty() predicates at non-scan nodes interacts
poorly with our BE projection of collection slots. For example,
rows could be incorrectly filtered if a !empty() predicate is
assigned to a plan node that comes after the unnest of the
collection, because that unnest also performs the projection.
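To make the failure mode concrete, here is a toy Java model. All names are hypothetical and it is not Impala's BE code. It assumes that the projection clears the collection slot in the parent tuple once the unnest has consumed it. Under that assumption, a !empty() check that runs on the joined rows after the unnest sees an empty slot and drops rows that should survive. The same check evaluated at the scan, before the unnest, behaves correctly.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the bug; all names are hypothetical and the projection
// behavior (clearing the parent's collection slot) is an assumption.
class EmptyPredicateDemo {
    // A parent tuple with a single collection slot.
    static final class ParentRow {
        final int id;
        List<Integer> coll;
        ParentRow(int id, List<Integer> coll) { this.id = id; this.coll = coll; }
    }

    // One output row of the unnest: the parent tuple plus one element.
    static final class JoinedRow {
        final ParentRow parent;
        final int item;
        JoinedRow(ParentRow parent, int item) { this.parent = parent; this.item = item; }
    }

    // Unnests p's collection, then "projects" the slot by clearing it,
    // since (by assumption) no later plan node should need it.
    static List<JoinedRow> unnestWithProjection(ParentRow p) {
        List<JoinedRow> out = new ArrayList<>();
        for (int item : p.coll) out.add(new JoinedRow(p, item));
        p.coll = new ArrayList<>();
        return out;
    }

    static boolean notEmpty(ParentRow p) { return !p.coll.isEmpty(); }

    // Correct placement: !empty() is evaluated at the scan, before the unnest.
    static int countWithPredicateAtScan(List<ParentRow> scan) {
        int n = 0;
        for (ParentRow p : scan) {
            if (notEmpty(p)) n += unnestWithProjection(p).size();
        }
        return n;
    }

    // Buggy placement: !empty() is evaluated on joined rows after the
    // unnest has already cleared the slot, so every row is filtered out.
    static int countWithPredicateAfterUnnest(List<ParentRow> scan) {
        int n = 0;
        for (ParentRow p : scan) {
            for (JoinedRow r : unnestWithProjection(p)) {
                if (notEmpty(r.parent)) n++;
            }
        }
        return n;
    }

    static List<ParentRow> sampleRows() {
        List<ParentRow> rows = new ArrayList<>();
        rows.add(new ParentRow(1, new ArrayList<>(List.of(10, 20))));
        rows.add(new ParentRow(2, new ArrayList<>()));
        rows.add(new ParentRow(3, new ArrayList<>(List.of(30))));
        return rows;
    }

    public static void main(String[] args) {
        System.out.println("predicate at scan:      " + countWithPredicateAtScan(sampleRows()));
        System.out.println("predicate after unnest: " + countWithPredicateAfterUnnest(sampleRows()));
    }
}
```

Running `main` prints 3 for the scan placement and 0 for the post-unnest placement. The second count is wrong because the predicate reads a slot the unnest has already projected away.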

The fix:
This patch reworks the generation of the !empty() predicates
introduced in IMPALA-2663 to make it correct. The predicates are
now generated only in cases where we can ensure that they will be
assigned to the parent scan and to no other plan node.

All of the following conditions must hold:
- collection table ref is relative and non-correlated
- collection table ref represents the rhs of an inner/cross/semi join
- collection table ref's parent tuple is not outer joined
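The three conditions can be read as a single guard. The Java sketch below encodes them over a hypothetical, flattened description of a collection table ref. The class, field, and enum names are illustrative and do not mirror Impala's frontend API, and the join kinds are simplified. It then checks the guard against refs shaped like the ones exercised by the new tests.

```java
// Hypothetical, flattened model of a collection table ref; these names are
// illustrative and do not mirror Impala's frontend classes.
class NotEmptyPredicateRules {
    // Simplified join kinds; left/right variants are collapsed for brevity.
    enum JoinKind { INNER, CROSS, SEMI, ANTI, OUTER }

    static final class CollectionRef {
        final boolean relative;          // path is relative to a parent table ref, e.g. c.c_orders
        final boolean correlated;        // assumed: path is rooted in an enclosing query block
        final JoinKind joinOp;           // join for which this ref is the right-hand side
        final boolean parentOuterJoined; // parent tuple is on the nullable side of an outer join

        CollectionRef(boolean relative, boolean correlated, JoinKind joinOp,
                      boolean parentOuterJoined) {
            this.relative = relative;
            this.correlated = correlated;
            this.joinOp = joinOp;
            this.parentOuterJoined = parentOuterJoined;
        }
    }

    // All three conditions must hold for a !empty() predicate to be generated.
    static boolean canGenerateNotEmpty(CollectionRef ref) {
        boolean relativeNonCorrelated = ref.relative && !ref.correlated;
        boolean innerCrossOrSemi = ref.joinOp == JoinKind.INNER
            || ref.joinOp == JoinKind.CROSS
            || ref.joinOp == JoinKind.SEMI;
        return relativeNonCorrelated && innerCrossOrSemi && !ref.parentOuterJoined;
    }

    public static void main(String[] args) {
        // customer c, c.c_orders o: qualifies.
        CollectionRef plain = new CollectionRef(true, false, JoinKind.INNER, false);
        // ... inner join c1.c_orders o, where c1 is outer joined (as in the new tests).
        CollectionRef outerParent = new CollectionRef(true, false, JoinKind.INNER, true);
        // customer c left outer join c.c_orders o: the ref itself is outer joined.
        CollectionRef outerRhs = new CollectionRef(true, false, JoinKind.OUTER, false);
        System.out.println(canGenerateNotEmpty(plain));       // true
        System.out.println(canGenerateNotEmpty(outerParent)); // false
        System.out.println(canGenerateNotEmpty(outerRhs));    // false
    }
}
```

Both new test queries fall into the `outerParent` case. There the guard refuses to generate the predicate, so it can never be assigned to a node above the unnest.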

Change-Id: Ie975ce139a103285c4e9f93c59ce1f1d2aa71767
Reviewed-on: http://gerrit.cloudera.org:8080/2399
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Silvius Rus <srus@cloudera.com>
Tested-by: Internal Jenkins
Alex Behm
2016-02-23 23:54:16 -08:00
committed by Harrison Sheinblatt
parent 6cdcdb12ff
commit 54a46e9459
6 changed files with 197 additions and 88 deletions


@@ -421,3 +421,32 @@ select id, pos from complextypestbl t1 full outer join t1.int_array t2
---- TYPES
bigint,bigint
====
---- QUERY
# IMPALA-3065/IMPALA-3062: Test a join on a nested collection whose
# parent tuple is outer joined. This test covers the case where the
# outer joined collection is on the probe side of the outer join.
# To reliably reproduce one of the problematic cases, we need
# > batch_size matches for at least one probe row.
select straight_join count(o.pos) from tpch_nested_parquet.customer c1
right outer join tpch_nested_parquet.customer c2
on c1.c_custkey % 2 = c2.c_custkey % 2
inner join c1.c_orders o
where c1.c_custkey < 10 and c2.c_custkey < 10000
---- RESULTS
329960
---- TYPES
bigint
====
---- QUERY
# IMPALA-3065/IMPALA-3062: Test a join on a nested collection whose
# parent tuple is outer joined. This test covers the case where the
# outer joined collection is on the build side of the outer join.
select count(a.pos) from complextypestbl t1
full outer join complextypestbl t2
on t1.id = t2.id
inner join t2.int_array a
---- RESULTS
10
---- TYPES
bigint
====