HdfsScanNodeBase::Close() may add its outstanding DiskIO context to
RuntimeState::reader_contexts_ to be unregistered later when the
fragment is closed. In a plan fragment with multiple HDFS scan nodes,
it's possible for HdfsScanNodeBase::Close() to be called concurrently.
To allow safe concurrent accesses, this change adds a SpinLock to
synchronize accesses to 'reader_contexts_' in RuntimeState.
Change-Id: I911fda526a99514b12f88a3e9fb5952ea4fe1973
Reviewed-on: http://gerrit.cloudera.org:8080/4558
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
I wrote these tests for my IMPALA-3987 patch, but other issues block
that optimisations. These tests exercise an interesting corner case
so I split them out into a separate patch.
The functional tests exercise every join mode for nested loop join and
hash join with an empty build side. The perf test exercises hash join
with an empty build side.
Testing:
Made sure the tests passed with both partitioned and non-partitioned
hash join implementations. Ran the targeted perf query through the
single node perf run script to make sure it worked.
Change-Id: I0a68cafec32011a47c569b254979601237e7f2a5
Reviewed-on: http://gerrit.cloudera.org:8080/4051
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Previously, thread resource manager only supports a single callback
for each resource pool. The callback is invoked when a thread token
is available. This mostly works as scan node is the only consumer
and there is usually one scan node in a plan fragment. As shown in
IMPALA-3064 and IMPALA-561, it's possible to generate a plan fragment
with more than one scan nodes. In which case, one of the scan nodes
may be running with single thread and in debug builds, a DCHECK will
be hit.
This change fixes the problem by allowing more than one callbacks in
a given resource pool. The thread resource manager will go through all
the registered callbacks in round robin fashion.
This change also adds a missing thread token release call in
HdfsScanNode::ThreadTokenAvailableCb().
Change-Id: Iddfff1feef0b59d407994ad3bc560166acbfa623
Reviewed-on: http://gerrit.cloudera.org:8080/2430
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
The NLJ node did not follow the expected protocol when need_to_return
is set on a row batch, which means that memory referenced by a rowbatch
can be freed or reused the next time GetNext() is called on the child.
This patch changes the NLJ node to follow the protocol by deep copying
all build side row batches when the need_to_return_ flag is set on the
row batches. This prevents the row batches from referencing memory that
may be freed or reused.
Reenable test that was disabled because of IMPALA-2332 since this was
the root cause.
Change-Id: Idcbb8df12c292b9e2b243e1cef5bdfc1366898d1
Reviewed-on: http://gerrit.cloudera.org:8080/810
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This caused failures on the non-partitioned agg/join tests and the ASAN
test. Remove the query that failed to unbreak the build. This query can
be readded when IMPALA-2207 is fixed (for the ASAN) test and the cause of
the other failure is diagnosed and fixed.
Change-Id: Idb1ca951e0de05aee3c1237392fff74ddd756ed7
In some cases in the NLJ node eos_ wasn't set even though the limit was
reached. This prevented the limit from being handled correctly before
returning rows to the caller of GetNext(). This could result in either
too many rows being returned, or a crash when the row batch size was
set to an invalid negative number.
The fix is to always check for whether the limit was reached before
returning from GetNext().
Change-Id: I660e774787870213ada9f2d3e6f10953d9937022
Reviewed-on: http://gerrit.cloudera.org:8080/797
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Implement nested-loop join in Impala with support for multiple join
modes, including inner, outer, semi and anti joins. Null-aware left
anti-join is not currently supported.
Summary of changes:
Introduced the NestedLoopJoinNode class in the FE that represents the nested
loop join. Common functionality between NestedLoopJoinNode and HashJoinNode
(e.g. cardinality estimation) was moved to the JoinNode class.
In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop
join execution strategy.
Change-Id: I238ec7dc0080f661847e5e1b84e30d61c3b0bb5c
Reviewed-on: http://gerrit.cloudera.org:8080/652
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Implement nested-loop join in Impala with support for multiple join
modes, including inner, outer, semi and anti joins. Null-aware left
anti-join is not currently supported.
Summary of changes:
Introduced the NestedLoopJoinNode class in the FE that represents the nested
loop join. Common functionality between NestedLoopJoinNode and HashJoinNode
(e.g. cardinality estimation) was moved to the JoinNode class.
In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop
join execution strategy.
Change-Id: Id65a1aae84335bba53f06339bdfa64a1b0be079e
Reviewed-on: http://gerrit.cloudera.org:8080/457
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins