60 Commits

Author SHA1 Message Date
Dan Hecht
b318b82f06 Temporarily skip test_mem_usage_scaling.py on S3 until IMPALA-1863 is solved
Otherwise, the S3 job always hangs at this test and we lose coverage
of everything downstream.  I'm pretty sure IMPALA-1863 is not S3-related,
but we hit that bug on EC2/S3 for whatever reason.

Change-Id: I3f27413fdd53e57d11c08dbef1daac36a032f4a6
Reviewed-on: http://gerrit.cloudera.org:8080/210
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-03-11 16:39:40 -07:00
Ippokratis Pandis
a519020908 IMPALA-1836: PAGG::Partition::Close() may need to clean up hash tables w/o buckets
In the rare low-memory case where, during a hash table's initialization, we cannot consume
even 8KB to create the array of buckets, Init() would set num_buckets == 0 and return
false. But then, PAGG::Partition::Close() would try to clean up the partition's hash table
by iterating over its buckets, even though it had none. That would result in a DCHECK
failure in HashTable::Begin().

This patch fixes the problem by no longer DCHECK'ing in HashTable::Begin(). That function
now calls NextBucket(), which correctly returns End() if there are no buckets.
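
For illustration, a minimal self-contained sketch of that control flow (this is not
Impala's actual HashTable; the iterator and bucket representation are simplified
assumptions):

    #include <cassert>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    class HashTable {
     public:
      // Hypothetical iterator: an index into the bucket array; -1 means End().
      struct Iterator { int64_t bucket_idx; };

      explicit HashTable(int64_t num_buckets) : buckets_(num_buckets, false) {}

      Iterator End() const { return Iterator{-1}; }

      // Returns the first filled bucket at or after 'start', or End() if none.
      Iterator NextBucket(int64_t start) const {
        for (int64_t i = start; i < static_cast<int64_t>(buckets_.size()); ++i) {
          if (buckets_[i]) return Iterator{i};
        }
        return End();
      }

      // Old behavior (problematic): a DCHECK on num_buckets() > 0 here would fire
      // when bucket allocation had failed. New behavior: walk forward from bucket
      // 0, which naturally yields End() for a bucketless table.
      Iterator Begin() const { return NextBucket(0); }

      int64_t num_buckets() const { return static_cast<int64_t>(buckets_.size()); }

     private:
      std::vector<bool> buckets_;  // true = bucket is filled (stand-in for real buckets)
    };

    int main() {
      HashTable empty(0);  // Init() failed to allocate even one page of buckets.
      assert(empty.Begin().bucket_idx == empty.End().bucket_idx);
      std::cout << "Begin() on a bucketless table safely returns End()\n";
      return 0;
    }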

Change-Id: I9c5984de79fb5ef8b7f31e082ac0d0bfbf242e77
Reviewed-on: http://gerrit.cloudera.org:8080/135
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-03-04 10:15:50 +00:00
Ippokratis Pandis
e89cccb7b4 Adding more mem_limit tests for the TPC-H queries
Adding Q1-Q9, Q18, Q20 and Q21.

Change-Id: If4e545e3f64316665691d53770bdf0ca9d5059ff
Reviewed-on: http://gerrit.cloudera.org:8080/85
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-03-01 22:36:27 +00:00
Ippokratis Pandis
d58aedff42 IMPALA-1820: Start with small pages for hash tables during repartitioning
Changing PARTITION_FANOUT from 32 to 16 exposed a pathological case caused by the lack of
coordination across concurrently executing spilling nodes of the same query. When we
repartition a partition, we try to initialize hash tables for the new partitions, and each
hash table needs a block (for its nodes). If no IO-sized blocks were available, because
they had been consumed by other nodes, we would get into a loop trying to repartition the
smaller partitions that could not initialize their hash tables. Each additional
repartition, among other things, needed additional blocks for the new streams. These
partitions would end up being very small, yet we would still fail the query when we
reached the MAX_PARTITION_DEPTH limit, which was fixed at 4.

This patch fixes the problem by initializing hash tables with small pages during
repartitioning. That is, a hash table always first uses a 64KB and then a 512KB block for
its nodes before switching to IO-sized blocks. This lets the partitioning algorithm finish
when we end up with partitions that fit in those small pages. Performance may not be
optimal, but memory consumption is lower and the algorithm terminates. For example,
without this patch and with PARTITION_FANOUT == 16, TPC-H Q18 and Q20 needed 3.4GB and
3.1GB respectively to run; with this patch, Q18 needs ~1GB and Q20 needs 975MB.

This patch also removes the restriction that repartitioning stops after 4 levels. Instead,
whenever we repartition we compare the size of the input partition to the size of the
largest new partition: if there is no reduction in size we stop the algorithm, otherwise
we keep repartitioning. That should help in cases of skew (e.g. due to bad hashing). A new
MAX_PARTITION_DEPTH limit of 16 remains as a safety net; it is very unlikely we will ever
hit it.
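
For illustration, a hedged sketch of that termination rule (the constant name mirrors the
message above; everything else, including ShouldRepartition, is hypothetical rather than
the actual Impala code):

    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    constexpr int MAX_PARTITION_DEPTH = 16;  // hard safety limit, rarely hit

    // Decides whether to repartition again at 'level', given the size of the
    // input partition and the sizes produced by repartitioning it.
    bool ShouldRepartition(int level, int64_t input_size,
                           const std::vector<int64_t>& new_partition_sizes) {
      if (level >= MAX_PARTITION_DEPTH) return false;
      int64_t largest = 0;
      for (int64_t s : new_partition_sizes) largest = std::max(largest, s);
      // No reduction in size (e.g. every row hashed into one child): stop instead
      // of looping until an arbitrary fixed depth, as the old code did at depth 4.
      return largest < input_size;
    }

    int main() {
      // Skewed input: one child absorbed everything, so we stop.
      std::cout << ShouldRepartition(3, 1000, {1000, 0, 0, 0}) << "\n";     // prints 0
      // Healthy split: keep repartitioning.
      std::cout << ShouldRepartition(3, 1000, {400, 300, 200, 100}) << "\n";  // prints 1
      return 0;
    }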

Change-Id: Ib33fece10585448bc2d07bb39d0535d78b168ccc
Reviewed-on: http://gerrit.cloudera.org:8080/119
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-02-28 00:42:04 +00:00
Ippokratis Pandis
82aa2ccd1d IMPALA-1655: Probe filters were being attached even if partitions were spilled
PHJ can hit OOM while building a hash table. In that case it stops building the hash table
and tries to spill a partition. Unfortunately, if PHJ had determined that it should build
probe filters (i.e. the input is small enough), it would attach those incomplete probe
filters, ignoring the fact that some tuples had not been seen. The scanners would then use
the incomplete filters and produce wrong results, because too many tuples would be
filtered out.

This patch solves the problem by marking probe filters as unusable whenever a hash table
fails to build. It also adds memory-scaling tests for TPC-H Q9 and Q21 that
expose this problem.
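
For illustration, a simplified sketch of the fix (names such as can_add_probe_filters_ are
assumptions, not the actual PartitionedHashJoinNode code): the moment any partition fails
to build its hash table, probe filters are abandoned entirely, because a filter built from
only part of the build side would wrongly drop rows.

    #include <iostream>
    #include <vector>

    struct Partition {
      bool BuildHashTable() {
        // Pretend the build hits the memory limit and this partition must spill.
        return false;
      }
    };

    class HashJoin {
     public:
      // Probe filters are only considered when the build input is small enough.
      explicit HashJoin(bool small_input) : can_add_probe_filters_(small_input) {}

      void BuildHashTables(std::vector<Partition>& partitions) {
        for (Partition& p : partitions) {
          if (!p.BuildHashTable()) {
            // Old bug: incomplete filters were still attached in this case.
            can_add_probe_filters_ = false;  // fix: never publish partial filters
          }
        }
        if (can_add_probe_filters_) AttachProbeFilters();
      }

     private:
      void AttachProbeFilters() { std::cout << "attaching probe filters\n"; }
      bool can_add_probe_filters_;
    };

    int main() {
      std::vector<Partition> parts(4);
      HashJoin join(/*small_input=*/true);
      join.BuildHashTables(parts);  // prints nothing: filters were disabled
      return 0;
    }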

Change-Id: I58d97aa0fd3dce92f67482c301178877177df6dd
Reviewed-on: http://gerrit.cloudera.org:8080/99
Reviewed-by: Ippokratis Pandis <ippokratis@gmail.com>
Tested-by: Internal Jenkins
2015-02-24 20:40:26 +00:00
Ippokratis Pandis
8088afd56d IMPALA-1617: Bug fix to avoid crash in PHJ::BuildHashTables() on low memory
The partition spilling code of PHJ had an incorrect DCHECK that would fail under very low
memory, where we had enough memory to allocate buffers but not the hash table. In
particular, the spill code would DCHECK if the partition was still using small buffers,
which is wrong because partitions with small buffers can still spill.
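
For illustration, a minimal sketch of the corrected spill path (hypothetical names; not
the real spilling code):

    #include <iostream>

    struct Partition {
      bool using_small_buffers = true;  // still on 64KB/512KB blocks
      bool spilled = false;

      void Spill() {
        // Old (wrong): a DCHECK(!using_small_buffers) fired here under low memory.
        // Partitions still on small buffers are legitimate spill victims too.
        spilled = true;
      }
    };

    int main() {
      Partition p;   // low-memory case: never switched to IO-sized buffers
      p.Spill();     // no longer trips an assertion in a DCHECK-enabled build
      std::cout << "spilled=" << p.spilled << "\n";
      return 0;
    }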

This patch also adds a test that makes sure TPC-H Q4 (which contains a join and a
group by) either executes correctly or fails gracefully under a set of relatively small
mem limits. Currently, when compute stats has been run on both the lineitem and orders
tables, TPC-H Q4 needs a minimum of 150MB to run successfully.

Change-Id: I997718384832bd6bf469cd608e9f692b6d727c4f
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5752
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
2015-01-10 02:49:50 -08:00
Ippokratis Pandis
e764b29819 IMPALA-1584, IMPALA-1585: Bug fixes on Sorter to avoid crashes on low memory
This patch fixes a few problems in the Sorter code. In particular:

(a) The AddBatch() function was not checking the return value of the templated AddBatch<>()
function, and it would proceed to sort a run even though errors had occurred,
leading to crashes.
(b) The Sorter was trying to use the vector of var-len blocks even when it was empty
because the initial block allocation had failed.
(c) The constructor was calling a couple of block manager functions that could
fail, especially in low-memory situations.

This patch adds error checks and moves the constructor code that can fail into a separate
Init() function that must be called right after the Sorter object is constructed.
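
For illustration, a generic sketch of the two-phase construction pattern adopted here
(a hypothetical Sorter-like class and Status type, not Impala's actual Sorter):

    #include <iostream>
    #include <string>

    // Minimal stand-in for a Status/error type.
    struct Status {
      std::string msg;
      bool ok() const { return msg.empty(); }
      static Status OK() { return {}; }
      static Status MemLimitExceeded() { return {"memory limit exceeded"}; }
    };

    class Sorter {
     public:
      Sorter() = default;  // the constructor itself can no longer fail

      // All fallible setup lives here; must be called right after construction.
      Status Init(bool block_mgr_has_memory) {
        if (!block_mgr_has_memory) return Status::MemLimitExceeded();
        initialized_ = true;
        return Status::OK();
      }

      Status AddBatch() {
        if (!initialized_) return {"Init() was not called or failed"};
        return Status::OK();
      }

     private:
      bool initialized_ = false;
    };

    int main() {
      Sorter sorter;
      Status s = sorter.Init(/*block_mgr_has_memory=*/false);
      if (!s.ok()) {
        // The query fails cleanly with an error instead of crashing later.
        std::cout << "sorter init failed: " << s.msg << "\n";
      }
      return 0;
    }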

It also adds a new type of memory-scaling test that makes sure TPC-H Q1 either
executes correctly or fails gracefully under a set of relatively small mem limits.

Change-Id: Ia2846839472bde33fc37955474a0c3c574d1cce0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5526
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
2014-12-23 19:43:05 -08:00
Skye Wanderman-Milne
09dbd5dd9f IMPALA-1397: free local expr allocations in scanner threads
Change-Id: If42ab1258a7750fa506089751e3bb3cbd3e99911
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4939
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit d6d6b33653080ebc29c41b7b041e9e467613c36f)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4998
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-10-28 17:48:55 -07:00
Lenni Kuff
0ac0527643 Reduce test execution time by limiting long running tests to exhaustive exec strategy
I looked at the latest run from master and took the test suites that had long
execution times. This cleans up those test suites by either disabling them entirely
on 'core' or adding constraints to limit the number of test vectors. It shouldn't impact
nightly coverage, since we still run the same tests exhaustively.

Change-Id: I10c78c35155b00de0c36d9fc0923b2b1fc6b44de
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3119
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3125
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-06-18 16:18:17 -07:00
Nong Li
bb3feb675e Dynamically scale down mem usage in scanners and io mgr.
This patch scales down the amount of buffering in the io mgr and the number
of scanner threads when the query is close to its mem limit.
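
For illustration, a hedged sketch of the general idea (the scaling formula and names here
are assumptions, not the actual io mgr logic): the closer memory consumption is to the
limit, the fewer scanner threads are allowed.

    #include <algorithm>
    #include <cstdint>
    #include <iostream>

    // Returns how many scanner threads to run given current memory pressure.
    int NumScannerThreads(int max_threads, int64_t consumption, int64_t mem_limit) {
      if (mem_limit <= 0) return max_threads;  // no limit set
      double headroom = 1.0 - static_cast<double>(consumption) / mem_limit;
      // Scale down with shrinking headroom, but always keep one thread alive.
      int threads = static_cast<int>(max_threads * headroom);
      return std::max(1, std::min(max_threads, threads));
    }

    int main() {
      const int64_t limit = 1000 * 1024 * 1024;  // 1000MB query mem limit
      std::cout << NumScannerThreads(8, 100 * 1024 * 1024, limit) << "\n";  // prints 7
      std::cout << NumScannerThreads(8, 950 * 1024 * 1024, limit) << "\n";  // prints 1
      return 0;
    }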

Change-Id: I68ef247a68642939b98ec7c429dfd393b23a20d2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1906
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2417
2014-05-01 15:04:07 -07:00