IMPALA-2015: Add support for nested loop join

Implement nested-loop join in Impala with support for multiple join
modes, including inner, outer, semi and anti joins. Null-aware left
anti-join is not currently supported.

Summary of changes:
Introduced the NestedLoopJoinNode class in the FE that represents the
nested-loop join. Common functionality between NestedLoopJoinNode and
HashJoinNode (e.g. cardinality estimation) was moved to the JoinNode class.
In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop
join execution strategy.
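
For readers unfamiliar with the strategy, the sketch below illustrates how a nested-loop join can cover several join modes with one pair of loops. It is a minimal illustration in Python, not the code added by this change: the function name, the mode strings and the tuple-based row representation are all assumptions made for the example.

```python
def nested_loop_join(left, right, predicate, mode="inner"):
    """Join two lists of row tuples by testing every (left, right) pair.

    Hypothetical helper for illustration only. Supported modes:
    "inner", "left_outer", "left_semi", "left_anti".
    """
    modes = ("inner", "left_outer", "left_semi", "left_anti")
    if mode not in modes:
        raise ValueError("unsupported join mode: %s" % mode)

    # Width used to pad unmatched left rows in a left outer join.
    right_width = len(right[0]) if right else 0
    out = []
    for l in left:
        matched = False
        for r in right:
            if predicate(l, r):
                matched = True
                if mode in ("inner", "left_outer"):
                    out.append(l + r)
                else:
                    # Semi and anti joins only need to know whether a match exists.
                    break
        if mode == "left_semi" and matched:
            out.append(l)
        elif mode == "left_anti" and not matched:
            out.append(l)
        elif mode == "left_outer" and not matched:
            out.append(l + (None,) * right_width)
    return out


left_rows = [(1, "a"), (2, "b"), (3, "c")]
right_rows = [(2, "x"), (3, "y"), (3, "z")]
same_key = lambda l, r: l[0] == r[0]

inner = nested_loop_join(left_rows, right_rows, same_key, "inner")
outer = nested_loop_join(left_rows, right_rows, same_key, "left_outer")
semi = nested_loop_join(left_rows, right_rows, same_key, "left_semi")
anti = nested_loop_join(left_rows, right_rows, same_key, "left_anti")
```

Because the predicate is an arbitrary function of both rows, the same loops also handle non-equality conditions such as l[0] < r[0], which a hash join cannot evaluate through a hash-table lookup.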

Change-Id: I238ec7dc0080f661847e5e1b84e30d61c3b0bb5c
Reviewed-on: http://gerrit.cloudera.org:8080/652
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Skye Wanderman-Milne
2015-06-23 21:10:47 -07:00
committed by Internal Jenkins
parent 5350d49f8c
commit 7906ed44ac
36 changed files with 1738 additions and 1084 deletions

@@ -47,6 +47,13 @@ class TestJoinQueries(ImpalaTestSuite):
     new_vector.get_value('exec_option')['batch_size'] = vector.get_value('batch_size')
     self.run_test_case('QueryTest/outer-joins', new_vector)
 
+  def test_single_node_nested_loop_joins(self, vector):
+    # Test the execution of nested-loops joins for join types that can only be
+    # executed in a single node (right [outer|semi|anti] and full outer joins).
+    new_vector = copy(vector)
+    new_vector.get_value('exec_option')['num_nodes'] = 1
+    self.run_test_case('QueryTest/single-node-nlj', new_vector)
+
 class TestTPCHJoinQueries(ImpalaTestSuite):
   # Uses the tpch dataset in order to have larger joins. Needed for example to test
   # the repartitioning codepaths.