Commit Graph

15 Commits

Author SHA1 Message Date
Ippokratis Pandis
d58aedff42 IMPALA-1820: Start with small pages for hash tables during repartitioning
The change of the PARTITION_FANOUT from 32 to 16 exposed a pathological case caused by
the lack of coordination across concurrently executing spilling nodes of the same query.
In particular, when we repartition a partition we try to initialize hash tables for the
new partitions, and each hash table needs a block (for the nodes). If no IO-sized blocks
were available, because they had been consumed by other nodes, we would get into a loop
of trying to repartition those smaller partitions that could not initialize their hash
tables. These additional repartitions would, among other things, need additional blocks
for the new streams. The partitions would end up being very small, yet we would still
fail the query when we reached the MAX_PARTITION_DEPTH limit, which was fixed at 4.

This patch fixes the problem by initializing the hash tables with small pages during
repartitioning. That is, the hash tables always first use a 64KB and then a 512KB block
for their nodes before switching to IO-sized blocks. This helps the partitioning
algorithm finish when we end up with partitions that fit in those small pages. The
performance may not be optimal, but the memory consumption is lower and the algorithm
terminates. For example, without this patch and with PARTITION_FANOUT == 16, running
TPC-H Q18 and Q20 needed 3.4GB and 3.1GB respectively. With this patch, TPC-H Q18 needs
~1GB and Q20 needs 975MB.
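
A minimal sketch of the small-page progression described above, assuming a hypothetical
HashTableBlockSizer helper and an io_block_size parameter; the actual Impala classes and
block-manager interfaces differ:

```cpp
#include <cstdint>

// Hypothetical helper: the first two blocks requested for hash table nodes are
// 64KB and 512KB; every block after that is IO-sized.
class HashTableBlockSizer {
 public:
  explicit HashTableBlockSizer(int64_t io_block_size) : io_block_size_(io_block_size) {}

  int64_t NextBlockSize() {
    switch (blocks_requested_++) {
      case 0: return 64 * 1024;        // first block: 64KB
      case 1: return 512 * 1024;       // second block: 512KB
      default: return io_block_size_;  // then fall back to IO-sized blocks
    }
  }

 private:
  int64_t io_block_size_;
  int blocks_requested_ = 0;
};
```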

This patch also removes the restriction that stopped repartitioning once we reached
4 levels of repartitioning. Instead, whenever we repartition we compare the size of
the input partition to the size of the largest new partition. If there is no reduction
in size we stop the algorithm; otherwise, we keep repartitioning. That should help in
cases of skew (e.g. due to bad hashing). There is a new MAX_PARTITION_DEPTH limit of 16,
which it is very unlikely we will ever hit.
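
A minimal sketch of the new stopping criterion, with hypothetical Partition and
ShouldKeepRepartitioning names; only the MAX_PARTITION_DEPTH value of 16 comes from the
commit message:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr int MAX_PARTITION_DEPTH = 16;  // new, much higher safety limit

struct Partition {
  int64_t num_rows;
  int level;  // how many times this data has already been repartitioned
};

// After repartitioning 'input' into 'children', decide whether to keep going:
// stop if the largest child is not smaller than the input (no reduction, e.g.
// due to skew or bad hashing) or if the depth limit has been reached.
bool ShouldKeepRepartitioning(const Partition& input,
                              const std::vector<Partition>& children) {
  if (input.level + 1 >= MAX_PARTITION_DEPTH) return false;
  int64_t largest = 0;
  for (const Partition& c : children) largest = std::max(largest, c.num_rows);
  return largest < input.num_rows;
}
```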

Change-Id: Ib33fece10585448bc2d07bb39d0535d78b168ccc
Reviewed-on: http://gerrit.cloudera.org:8080/119
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-02-28 00:42:04 +00:00
Alex Behm
f696861c5c Throw error on unrecognized test sections.
Our .test file parser used to not abort tests when there
was a malformed test/section. This patch changes that behavior
to report an error and treat the test as failed.

Quite a few tests were not well-formed, and were not executed
as a result. This patch fixes those tests.

Arguably, the test file parser should be more flexible about where
it accepts comments, but this patch does not address that problem.

Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-12-02 18:08:09 -08:00
Ippokratis Pandis
87502f829c IMPALA-1471: Bug in spilling of PHJ that was affecting left anti and outer joins.
In cases where we had to spill the probe side of PHJs, we were not only appending
the probe row to the tuple stream to be spilled, but also entering the regular
processing loop with the iterator set to End(). For left anti and left outer joins,
the result was that we incorrectly output this row, since it did not have a match.

This bug had a small perf impact on all spilling joins because we were making an
unnecessary pass through the loop for each probe row we had to spill.

This patch solves the problem by immediately moving to the next probe row if the
current row is spilled. Additionally, it fixes a bug in the block mgr where one
code path did not correctly count the number of pinned buffers.
It also adds tpch-q21 to the set of queries run in the spilling test.
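
A minimal sketch of the probe-side fix, using stand-in Row/Partition/TupleStream types
and a hash_fn parameter rather than the real Impala classes:

```cpp
#include <cstdint>
#include <vector>

// Stand-in types; the real Impala classes are far richer.
struct Row {};
struct TupleStream { void AddRow(const Row&) {} };
struct Partition {
  bool is_spilled = false;
  TupleStream probe_stream;
};

// If the row hashes to a spilled partition, append it to that partition's
// probe stream and move straight to the next row. Before the fix, execution
// fell through to the match loop with an End() iterator, which made LEFT ANTI
// and LEFT OUTER joins emit the row as if it had no match.
void ProcessProbeBatch(const std::vector<Row>& batch, std::vector<Partition>& partitions,
                       uint32_t (*hash_fn)(const Row&)) {
  for (const Row& row : batch) {
    Partition& p = partitions[hash_fn(row) % partitions.size()];
    if (p.is_spilled) {
      p.probe_stream.AddRow(row);
      continue;  // the fix: skip the regular processing loop for spilled rows
    }
    // ... regular in-memory probe against p's hash table ...
  }
}
```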

Change-Id: I762f5c41fe468e4485a4b31dabe2e53f6b49ae24
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5313
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5334
2014-11-20 02:21:14 -08:00
ishaan
23964c19af [CDH5] Fix bad merge in spilling.test
Change-Id: Ia6e30cf5916c737088d8cb969e0167b9d69a599e
2014-10-08 23:19:02 -07:00
Nong Li
de31fa8e21 Disable spilling tests that are too flaky.
Change-Id: I4ac877c3fa8297d873c67f219bb0c75f0001562d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4731
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-10-06 15:18:56 -07:00
Nong Li
e08ffde009 PA/PHJ: Increase fanout to 32 and fix interaction with small buffers.
Small buffers introduced an issue that is exacerbated by the large fanout. A stream can
only be appended to indefinitely once it has grabbed its initial IO-sized buffer. With
small buffers, we no longer grab that buffer at the beginning; before this patch, it was
grabbed when the stream first needed it. This means that by the time one stream needs
its IO-sized buffer, another stream could already have grabbed it (leaving that stream
pinned with multiple buffers).

This patch has all the streams grab an IO-sized buffer as soon as the first stream needs
one. This guarantees that all streams get 1 before any get 2.
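
A minimal sketch of that coordination rule, with a hypothetical Stream type and
SwitchToIoBuffer() call standing in for the real BufferedTupleStream logic:

```cpp
#include <vector>

struct Stream {
  bool has_io_buffer = false;
  void SwitchToIoBuffer() { has_io_buffer = true; }  // pin one IO-sized block
};

// The moment any one stream needs its IO-sized buffer, hand one to every
// stream, so each stream holds exactly one before any stream holds two.
void OnFirstIoBufferNeeded(std::vector<Stream>& streams) {
  for (Stream& s : streams) {
    if (!s.has_io_buffer) s.SwitchToIoBuffer();
  }
}
```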

Change-Id: I1be1219fc5f1fa3ceedd4d5e76ae056c8bb8ff3d
2014-10-06 15:16:16 -07:00
Nong Li
3e632ef6ad Reduce min PA/PHJ mem requirement.
Update PA/PHJ to use small (smaller than IO-sized) buffers initially. Without this,
we would not be able to run at the QPS that we need, simply due to the buffering
requirements of these operators.

Change-Id: Ic8a777d147893567c9590fbab17f561eadb6ee19
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4623
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-10-06 15:14:10 -07:00
ishaan
010cc22a2f [CDH5] Fix test spilling.
tpch in cdh5 does not have double columns. Also, remove round calls to test that we get
consistent results.

Change-Id: Ia45ef08644ed78b05a08c47422733ab38a26b508
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4595
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-09-26 22:57:02 -07:00
Nong Li
d5c948c351 Increase the mem limit for one of the spilling queries.
Change-Id: I9b52582b2ded82821ecc446762f07d7702dedabf
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4555
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-09-26 12:27:29 -07:00
Nong Li
f03b05ed50 Fix hash table buckets to allocate memory from the BlockMgr.
This was always a TODO. We want memory to come from the block mgr and trigger spilling.

Change-Id: I07f1f79fbbb33068fb2df64510a80a9b008ef73d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4466
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-09-26 12:26:09 -07:00
Matthew Jacobs
da5198e615 Add spilling test for an analytic fn
Change-Id: Ia93c71c9c2a01f7f04a81593d51f5ca565286b7d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4447
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-09-23 07:26:09 -07:00
Nong Li
8a661d0787 [CDH5] cherry pick conflicts.
Change-Id: Ic11237b7ead4a810b523d6b6095781efbc5bb66b
2014-09-20 19:41:42 -07:00
Nong Li
6b73eec02d PHJ: Fix block management when spilling.
The previous code did not handle well the case where spilling happens while building
the hash table (i.e. the partitioned build rows fit). This caused the probe partition
to be starved, making queries that should have been able to run fail with a
"not enough buffers" error.

Change-Id: I3a9a84e8800a72ed3ce6f5ab7ff03bc2d6eb7ad8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4403
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-09-20 16:12:21 -07:00
Skye Wanderman-Milne
2a449651da Use CRC hash for 0th partition level.
Change-Id: Ie845e0edb684f13421eea41327b1571b368db21a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4370
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-09-20 16:11:40 -07:00
ishaan
c4b4e010ff Buffered Tuple Stream fixes.
This patch fixes two issues:
  - Add an API to the buffered block mgr that allows an atomic Unpin and GetNewBlock. This
    has the semantics of unpinning a block and giving its buffer to the new block. It is
    necessary for the tuple stream to make sure another thread does not grab the
    unpinned block in between (see the sketch after this list).
  - Buffer management when reading an unpinned stream. Before moving on to a new block (and
    unpinning the current one), we need to make sure all the tuples returned from the
    current block have been returned up the operator tree.
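
A minimal sketch of the atomic Unpin-and-GetNewBlock idea from the first item; Block,
BlockMgr, and the method name are illustrative, not the real block manager API:

```cpp
#include <memory>
#include <mutex>

struct Block { bool pinned = false; };

class BlockMgr {
 public:
  // Unpin 'to_unpin' and hand its buffer to a freshly created block under a
  // single lock, so no other thread can grab the just-unpinned buffer in
  // between the two steps.
  std::unique_ptr<Block> UnpinAndGetNewBlock(Block* to_unpin) {
    std::lock_guard<std::mutex> l(lock_);
    to_unpin->pinned = false;                    // release the old block
    auto new_block = std::make_unique<Block>();
    new_block->pinned = true;                    // new block takes over the buffer
    return new_block;
  }

 private:
  std::mutex lock_;
};
```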

Change-Id: I95ee58d1019dd971f6a7dc19ecafdfa54cdbf942
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4333
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-09-20 16:05:11 -07:00