Commit Graph

5 Commits

Author SHA1 Message Date
Alex Behm
e00130e39a IMPALA-2368: Prevent double Reset() with nested subplans.
ExecNode::Reset() is not idempotent. It always requires a preceding
call to ExecNode::Open(). The bug was that with nested subplans the
following situation could lead to calling Reset() on the same exec
node multiple times resulting in a DCHECK or crash depending on node
being reset. The scenario is illustrated as follows.

Example Plan:

01:subplan
  05:nested-loop join
     02:singular row src
  04:subplan
     ...
     arbitrary exec node tree
     ...
  03:unnest
00:scan

Sequence of calls leading to double Reset():

1. Node 04 calls Reset() on its child(1)
2. Node 04 returns from GetNext() without calling Open() on its child(1)
3. Node 01 calls Reset() on its child(1)

The problem was that SubplanNode::Reset() of node 04 used to call Reset()
on its child(1) even though child(1) had not been opened yet.

The fix is to only call Reset() on child(1) if it has been opened.

Change-Id: I1d6f5f893c9b773c05dc6cbfe99de9f74e47a888
Reviewed-on: http://gerrit.cloudera.org:8080/916
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-27 15:13:28 -07:00
Ippokratis Pandis
4951f895e7 Nested Types: Reset() for partitioned hash join node
TODO: Need to modify Reset()'s functionality in case of NAAJs.

Change-Id: I7d0ea0dabd0b3404957e228bbaa51781c5fc34c0
Reviewed-on: http://gerrit.cloudera.org:8080/490
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-07-08 01:51:09 +00:00
Alex Behm
a274cfd787 Nested Types: Fix self-joining of collection table refs.
When referencing the same path in multiple CollectionTableRefs
(e.g., self-join on a nested collection), we used to register only a
single SlotDescriptor in the root tuple descriptor and share it among
those multiple CollectionTableRefs.
A collection-typed SlotDescriptor has a single item tuple descriptor,
set to the tuple descriptor of the corresponding CollectionTableRef.
Therefore, sharing a single collection-typed SlotDescriptor among
multiple CollectionTableRefs with the same path does not work
(the item tuple desc was arbitrarily set to the last CollectionTableRef's
tuple desc).

In order to maintain our assumed 1:1 relationship between a table ref
and a tuple descriptor, the siple fix for now is to give each
CollectionTableRef a new slot in the root tuple descriptor,
regardless of its path.

We could conceivably allow more intelligent sharing of tuple descriptors
for nested collections, but that change is too invasive for now.

Change-Id: I2135d026191f51d1daa741455a7e1b0f6905af1e
Reviewed-on: http://gerrit.cloudera.org:8080/495
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-07-01 06:56:28 +00:00
Ippokratis Pandis
f2c483802f Nested Types: Reset() for partitioned aggregation node
Change-Id: Ia5b4b9b3a7b8e9acb1b614c979cccca615fe2fbe
Reviewed-on: http://gerrit.cloudera.org:8080/480
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-07-01 00:43:55 +00:00
Alex Behm
569e86a60b Nested Types: Change ExecNode::Reset() to only clear state and not tuple data.
This patch changes the ExecNode::Reset() to:

Status ExecNode::Reset(RuntimeState* state);

The new Reset() should only clear the internal state of an exec node
in preparation for another Open()/GetNext()*. Reset() should not clear
memory backing rows returned by a node in GetNext() because those rows
could still be in flight.

Subplan Memory Management:
To ensure that the memory backing rows produced by the subplan tree of
a SubplanExecNode remains valid for the lifetime a row batch, we intend
to use our conventional transfer mechanism. That is, the ownership of
memory that is no longer used by an exec node is transferred to an output
row batch in GetNext() at a "convenient" point, typically at eos or when
the memory usage exceeds some threshold. Note that exec nodes may choose
not to transfer memory at eos to amortize the cost of memory allocation
over multiple Reset()/Open()/GetNext()* cycles.

To show the main ideas, this patch fixes transferring of tuple data ownership
in several places and implements Reset() for the following nodes:
- AnalyticEvalNode
- BlockingJoinNode
- CrossJoinNode
- SelectNode
- SortNode
- TopNNode
- UnionNode

To make the transfer of ownership work for SortNode a row batch can now also
own a list of BufferdBlockMgr::Block*.

Also included are basic query tests that are not meant to be exhaustive.
The tests are disabled for now because we cannot run them without several
other code changes. I have manually run the test queries on a branch
that has all necessary changes.

Change-Id: I3ac94b8dd7c7eb48f2e639ea297b447fbf443185
Reviewed-on: http://gerrit.cloudera.org:8080/454
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-06-23 07:43:22 +00:00