Commit Graph

9 Commits

Author SHA1 Message Date
Michael Ho
2a31fbdbfa IMPALA-4180: Synchronize accesses to RuntimeState::reader_contexts_
HdfsScanNodeBase::Close() may add its outstanding DiskIO context to
RuntimeState::reader_contexts_ to be unregistered later when the
fragment is closed. In a plan fragment with multiple HDFS scan nodes,
it's possible for HdfsScanNodeBase::Close() to be called concurrently.
To allow safe concurrent accesses, this change adds a SpinLock to
synchronize accesses to 'reader_contexts_' in RuntimeState.

Change-Id: I911fda526a99514b12f88a3e9fb5952ea4fe1973
Reviewed-on: http://gerrit.cloudera.org:8080/4558
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-09-30 01:21:05 +00:00
Tim Armstrong
f613dcd02d Add functional and targeted perf tests for joins with empty builds
I wrote these tests for my IMPALA-3987 patch, but other issues block
that optimisations.  These tests exercise an interesting corner case
so I split them out into a separate patch.

The functional tests exercise every join mode for nested loop join and
hash join with an empty build side. The perf test exercises hash join
with an empty build side.

Testing:
Made sure the tests passed with both partitioned and non-partitioned
hash join implementations. Ran the targeted perf query through the
single node perf run script to make sure it worked.

Change-Id: I0a68cafec32011a47c569b254979601237e7f2a5
Reviewed-on: http://gerrit.cloudera.org:8080/4051
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-08-19 06:04:18 +00:00
Michael Ho
13007f9634 IMPALA-561: Allow multiple callbacks in a thread resource pool.
Previously, thread resource manager only supports a single callback
for each resource pool. The callback is invoked when a thread token
is available. This mostly works as scan node is the only consumer
and there is usually one scan node in a plan fragment. As shown in
IMPALA-3064 and IMPALA-561, it's possible to generate a plan fragment
with more than one scan nodes. In which case, one of the scan nodes
may be running with single thread and in debug builds, a DCHECK will
be hit.

This change fixes the problem by allowing more than one callbacks in
a given resource pool. The thread resource manager will go through all
the registered callbacks in round robin fashion.

This change also adds a missing thread token release call in
HdfsScanNode::ThreadTokenAvailableCb().

Change-Id: Iddfff1feef0b59d407994ad3bc560166acbfa623
Reviewed-on: http://gerrit.cloudera.org:8080/2430
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-03-10 23:16:29 +00:00
Tim Armstrong
db7519df24 IMPALA-2207: memory corruption on build side of NLJ
The NLJ node did not follow the expected protocol when need_to_return
is set on a row batch, which means that memory referenced by a rowbatch
can be freed or reused the next time GetNext() is called on the child.

This patch changes the NLJ node to follow the protocol by deep copying
all build side row batches when the need_to_return_ flag is set on the
row batches. This prevents the row batches from referencing memory that
may be freed or reused.

Reenable test that was disabled because of IMPALA-2332 since this was
the root cause.

Change-Id: Idcbb8df12c292b9e2b243e1cef5bdfc1366898d1
Reviewed-on: http://gerrit.cloudera.org:8080/810
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:32 -07:00
Tim Armstrong
14bd361143 Temporarily remove nested loop join with limit test
This caused failures on the non-partitioned agg/join tests and the ASAN
test. Remove the query that failed to unbreak the build. This query can
be readded when IMPALA-2207 is fixed (for the ASAN) test and the cause of
the other failure is diagnosed and fixed.

Change-Id: Idb1ca951e0de05aee3c1237392fff74ddd756ed7
2015-09-13 08:29:25 -07:00
Tim Armstrong
25e7454bc9 IMPALA-2319: correctly enforce NLJ limit
In some cases in the NLJ node eos_ wasn't set even though the limit was
reached. This prevented the limit from being handled correctly before
returning rows to the caller of GetNext(). This could result in either
too many rows being returned, or a crash when the row batch size was
set to an invalid negative number.

The fix is to always check for whether the limit was reached before
returning from GetNext().

Change-Id: I660e774787870213ada9f2d3e6f10953d9937022
Reviewed-on: http://gerrit.cloudera.org:8080/797
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-11 22:45:39 +00:00
Skye Wanderman-Milne
7906ed44ac IMPALA-2015: Add support for nested loop join
Implement nested-loop join in Impala with support for multiple join
modes, including inner, outer, semi and anti joins. Null-aware left
anti-join is not currently supported.

Summary of changes:
Introduced the NestedLoopJoinNode class in the FE that represents the nested
loop join. Common functionality between NestedLoopJoinNode and HashJoinNode
(e.g. cardinality estimation) was moved to the JoinNode class.
In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop
join execution strategy.

Change-Id: I238ec7dc0080f661847e5e1b84e30d61c3b0bb5c
Reviewed-on: http://gerrit.cloudera.org:8080/652
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-08-19 08:40:14 +00:00
Dimitris Tsirogiannis
47c5ae405a Revert "IMPALA-2015: Add support for nested loop join"
This reverts commit 6837cdec7f6a7e1c7e8157e323f3ab68277689aa.

Change-Id: I2fd6424c553a701fcbfd425b4486af7280820b23
Reviewed-on: http://gerrit.cloudera.org:8080/636
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-08-13 02:20:07 +00:00
Skye Wanderman-Milne
f000758ca8 IMPALA-2015: Add support for nested loop join
Implement nested-loop join in Impala with support for multiple join
modes, including inner, outer, semi and anti joins. Null-aware left
anti-join is not currently supported.

Summary of changes:
Introduced the NestedLoopJoinNode class in the FE that represents the nested
loop join. Common functionality between NestedLoopJoinNode and HashJoinNode
(e.g. cardinality estimation) was moved to the JoinNode class.
In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop
join execution strategy.

Change-Id: Id65a1aae84335bba53f06339bdfa64a1b0be079e
Reviewed-on: http://gerrit.cloudera.org:8080/457
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-08-07 02:47:32 +00:00