When the probe side of a PHJ had to be spilled, we not only appended the probe row to
the tuple stream to be spilled, but also entered the regular processing loop with the
iterator set to End(). For left anti and left outer joins, the result was to
incorrectly output this row, since with the iterator at End() it appeared to have no
match.
This bug had a small perf impact for all spilling joins because we were doing an
unnecessary loop for each probe row we had to spill.
This patch solves the problem by immediately moving on to the next probe row if the
current row is spilled. Additionally, it fixes a bug in the block mgr where one code
path did not correctly count the number of pinned buffers.
It also adds tpch-q21 to the set of queries run in the spilling test.
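A minimal sketch of the probe-loop fix, assuming a toy left outer join over integer keys.
All names here (ProbeResult, ProbeLeftOuter, spilled_keys) are hypothetical and are not
taken from the actual operator code; the point is only the early `continue` for spilled
rows.

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a left outer join probe loop.
struct ProbeResult {
  // (probe key, matching build value or nullopt for a NULL-padded row)
  std::vector<std::pair<int, std::optional<int>>> output;
  // Probe rows deferred because their partition was spilled.
  std::vector<int> spilled;
};

// 'build' stands in for the in-memory hash tables, 'spilled_keys' for the keys
// that hash to a spilled partition (an assumption made to keep the sketch small).
ProbeResult ProbeLeftOuter(const std::vector<int>& probe_rows,
                           const std::unordered_map<int, int>& build,
                           const std::unordered_set<int>& spilled_keys) {
  ProbeResult result;
  for (int key : probe_rows) {
    if (spilled_keys.count(key) > 0) {
      result.spilled.push_back(key);
      // The fix: go straight to the next probe row. Falling through would treat
      // the row as unmatched and emit it with a NULL build side.
      continue;
    }
    auto it = build.find(key);
    if (it != build.end()) {
      result.output.emplace_back(key, it->second);
    } else {
      result.output.emplace_back(key, std::nullopt);  // genuine non-match
    }
  }
  return result;
}
```

Without the `continue`, the spilled row would be output once here and potentially again
when its spilled partition is processed later.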
Change-Id: I762f5c41fe468e4485a4b31dabe2e53f6b49ae24
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5313
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5334
Small buffers introduced an issue that is exacerbated by the large fanout. A stream can
be appended to indefinitely only once it has grabbed its initial io sized buffer. With
small buffers, we no longer grab that buffer at the beginning and, before this patch, it
was grabbed when the stream first needed it. This means that by the time one stream
needs its io sized buffer, another stream could have already grabbed it (meaning that
other stream has multiple buffers pinned).
This patch has all the streams grab an io sized buffer as soon as the first stream needs
one. This guarantees that every stream gets one buffer before any stream gets two.
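The allocation policy can be sketched roughly as follows. This is an illustrative model
only: StreamSet, NeedIoBuffer and the buffer counters are hypothetical names, and the
real streams get their buffers from the block mgr rather than from a simple counter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: each stream only tracks how many io sized buffers it holds.
struct Stream {
  int io_buffers = 0;
};

class StreamSet {
 public:
  StreamSet(std::size_t num_streams, int available_io_buffers)
      : streams_(num_streams), available_(available_io_buffers) {}

  // Called when stream 'idx' has outgrown its small buffers and needs an io sized
  // buffer. Returns false if the request cannot be satisfied.
  bool NeedIoBuffer(std::size_t idx) {
    if (!initial_buffers_handed_out_) {
      // First request from any stream: give every stream its initial io sized
      // buffer, so that all streams get 1 before any stream gets 2.
      const int needed = static_cast<int>(streams_.size());
      if (available_ < needed) return false;
      for (Stream& s : streams_) s.io_buffers = 1;
      available_ -= needed;
      initial_buffers_handed_out_ = true;
      return true;  // the requesting stream received its buffer as part of the batch
    }
    if (available_ == 0) return false;
    ++streams_.at(idx).io_buffers;
    --available_;
    return true;
  }

  int io_buffers(std::size_t idx) const { return streams_.at(idx).io_buffers; }

 private:
  std::vector<Stream> streams_;
  int available_;
  bool initial_buffers_handed_out_ = false;
};
```

With 4 streams and 5 buffers, the first request hands one buffer to each stream, and only
then can a single stream take the remaining buffer.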
Change-Id: I1be1219fc5f1fa3ceedd4d5e76ae056c8bb8ff3d
Update PA/PHJ to use small (< io sized) buffers initially. Without this we would
not be able to run at the QPS that we need, purely because of the buffering
requirements of these operators.
Change-Id: Ic8a777d147893567c9590fbab17f561eadb6ee19
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4623
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
This was always a TODO. We want memory to come from the block mgr and trigger spilling.
Change-Id: I07f1f79fbbb33068fb2df64510a80a9b008ef73d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4466
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
The previous code did not properly handle the case where spilling happens while
building the hash table (i.e. partitioning the build rows fit, but building the hash
table did not). This starved the probe partition, causing queries that should be able
to run to fail with a not enough buffers error.
Change-Id: I3a9a84e8800a72ed3ce6f5ab7ff03bc2d6eb7ad8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4403
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
This patch fixes two issues:
- Add an API to the buffered block mgr to allow an atomic Unpin and GetNewBlock. This
has the semantics of unpinning a block and giving its buffer to the new block. This
is necessary for the tuple stream to make sure another thread does not grab the
unpinned block's buffer in between.
- Fix buffer management when reading an unpinned stream. Before moving on to a new
block (and unpinning the current one), we need to make sure all the tuples returned
from the current block have been returned up the operator tree.
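A rough sketch of the atomic Unpin and GetNewBlock semantics from the first item,
assuming a toy block manager guarded by a single mutex. The class and the signature
below are hypothetical and do not reproduce the real block mgr API; writing the
unpinned block's data to disk is omitted.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical block: pinned means it currently owns an in-memory buffer.
struct Block {
  bool pinned = false;
  int buffer_id = -1;
};

class BlockMgr {
 public:
  explicit BlockMgr(int num_buffers) {
    for (int i = 0; i < num_buffers; ++i) free_buffers_.push_back(i);
  }

  // Returns a new pinned block, or nullptr if no buffer is available.
  // If 'unpin_block' is non-null, it is unpinned and its buffer is handed directly
  // to the new block while the lock is held, so the buffer never appears on the
  // free list and no other thread can take it in between.
  Block* GetNewBlock(Block* unpin_block = nullptr) {
    std::lock_guard<std::mutex> guard(lock_);
    int buffer_id = -1;
    if (unpin_block != nullptr) {
      assert(unpin_block->pinned);
      buffer_id = unpin_block->buffer_id;  // transfer, do not free
      unpin_block->pinned = false;
      unpin_block->buffer_id = -1;
    } else {
      if (free_buffers_.empty()) return nullptr;
      buffer_id = free_buffers_.back();
      free_buffers_.pop_back();
    }
    blocks_.push_back(std::make_unique<Block>());
    Block* block = blocks_.back().get();
    block->pinned = true;
    block->buffer_id = buffer_id;
    return block;
  }

 private:
  std::mutex lock_;
  std::vector<int> free_buffers_;
  std::vector<std::unique_ptr<Block>> blocks_;  // owns all blocks
};
```

Doing the two steps as separate calls would release the lock after the unpin, which is
exactly the window in which another thread could claim the freed buffer.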
Change-Id: I95ee58d1019dd971f6a7dc19ecafdfa54cdbf942
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4333
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>