Files
impala/testdata/workloads/functional-query/queries/QueryTest
Tim Armstrong 643c800d62 IMPALA-2473: reduce scanner memory usage
This patch reduces memory usage of scanners by adjusting how batch
capacity is checked and handled and by freeing unneeded memory.

Change RowBatch::AtCapacity(MemPool) so that batches with no rows
cannot hold onto an unbounded amount of memory. Instead, such
batches are passed up the operator tree so that the resources
can be freed.
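
As a rough illustration of the intended semantics (not the actual
RowBatch code), the sketch below applies the memory check regardless
of the row count. The struct, its field names, and the 8MB constant
are assumptions made for this example only.

```cpp
#include <cstdint>

// Illustrative limit; assumed here to mirror the 8MB target that the
// commit message mentions.
constexpr int64_t kMaxMemBytes = 8LL * 1024 * 1024;

// Hypothetical stand-in for a row batch. The names are assumptions
// for this sketch and do not come from Impala's RowBatch class.
struct BatchSketch {
  int num_rows = 0;
  int capacity = 1024;
  int64_t attached_bytes = 0;  // memory already attached to the batch

  // Returns true when the batch should be passed up the operator
  // tree. pool_bytes is the memory held by an auxiliary pool that
  // will be attached to this batch.
  bool AtCapacity(int64_t pool_bytes) const {
    if (num_rows >= capacity) return true;
    // The memory check is applied even when num_rows == 0, so an
    // empty batch cannot accumulate memory without bound.
    return attached_bytes + pool_bytes >= kMaxMemBytes;
  }
};
```

With this shape, a batch with zero rows but 8MB or more of attached
or pooled memory reports that it is at capacity, so the caller can
hand it upstream and the memory can be released.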

The Parquet scanner also only checked capacity every 1024 rows.
With large rows (e.g. nested collections), it could overrun the
intended 8MB limit. It also didn't include the MemPool usage
in its checks. After the change the scanner will produce smaller
batches if rows contain large nested collections or strings.
I benchmarked this with a scan of the nested TPC-H customers
tables. The row batch size decreased from ~16MB to ~8MB. If the
nested collections were larger this would be more drastic.
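
The effect of the check interval can be sketched with a toy batching
loop. This is not the Parquet scanner code: BatchBytes, its
parameters, and the row sizes used below are hypothetical and chosen
only to show why an infrequent check lets a batch overshoot the
memory limit.

```cpp
#include <cstdint>
#include <vector>

// Splits rows into batches and returns the bytes held by each batch.
// row_bytes[i] is the variable-length memory (strings, nested
// collections) needed by row i. The capacity check runs only every
// check_interval rows, so a batch can exceed mem_limit between checks.
std::vector<int64_t> BatchBytes(const std::vector<int64_t>& row_bytes,
                                int64_t mem_limit, int max_rows,
                                int check_interval) {
  std::vector<int64_t> batches;
  int64_t bytes = 0;
  int rows = 0;
  for (int64_t b : row_bytes) {
    bytes += b;
    ++rows;
    const bool check_now = rows % check_interval == 0;
    if (check_now && (rows >= max_rows || bytes >= mem_limit)) {
      batches.push_back(bytes);  // emit the batch and start a new one
      bytes = 0;
      rows = 0;
    }
  }
  if (rows > 0) batches.push_back(bytes);  // final partial batch
  return batches;
}
```

For example, with 2048 made-up rows of 16KB each, an 8MB limit, and
at most 1024 rows per batch, a check interval of 1024 yields two 16MB
batches, while a check interval of 1 yields four 8MB batches.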

Also pass batches that are at capacity up the tree even if no rows
passed the conjuncts in the DataSourceScanNode and Parquet scanner,
so that resources can be freed.

HdfsTableSink is modified to avoid the incorrect assumption that a batch
only has 0 rows at eos. It is also refactored to pass a related flag as
an argument to make the semantics clearer.

Two simple benchmarks (one column and many columns) show no change
in scanner performance:
 > set num_scanner_threads=1;
 > select count(l_orderkey) from biglineitem;
 > select count(l_orderkey), count(l_partkey), count(l_suppkey),
   count(l_returnflag), count(l_quantity), count(l_linenumber),
   count(l_extendedprice), count(l_linestatus), count(l_shipdate),
   count(l_commitdate) from biglineitem;

Change-Id: I3b79671ffd3af50a2dc20c643b06cc353ba13503
Reviewed-on: http://gerrit.cloudera.org:8080/1239
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-11-19 22:57:05 +00:00