impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 00:02:28 -05:00

Dan Hecht 84c4c2ce86 IMPALA-2480, IMPALA-2519: Don't force IO-buffer on probe side when spilling PHJ

This fixes a regression introduced with:
IMPALA-1621,2241,2271,2330,2352: Lazy switch to IO buffers to reduce
min mem needed for PAGG/PHJ

Prior to that change, as soon as any partition's stream overflowed
its small buffers, all partitions' streams would be switched
immediately to IO-buffers, which would be satisfied by the initial
buffer "reservation".

After that change, individual streams are switched to IO-buffers on
demand as they overflow their small buffers.  However, that change
also made it so that Partition::Spill() would eagerly switch that
partition's streams to IO-buffers, and fail the query if the buffer
is not available.  The buffer may not be available because the
reserved buffers may be in use by other partitions' streams.

We don't need to fail the query if the switch to IO-buffers in
Partition::Spill() fails.  Instead, we should just let the streams
switch on demand as they fill up the small buffers.  When that
happens, if the IO buffer is not available, then we already have a
mechanism to pick partitions to spill until we can get the IO-buffer
(in the worst case it means working our way back down to the initial
reservation).  See AppendRowStreamFull() and BuildHashTables().

The symptom of this regression was that some queries would fail at a
lower memory limit than before.

Also revert the max_block_mgr_memory values back to their originals.

Additional testing: loop custom_cluster/spilling.py.  We should also
remeasure minimum memory required by queries after this change.

Change-Id: I11add15540606d42cd64f2af99f4e96140ae8bb5
Reviewed-on: http://gerrit.cloudera.org:8080/1228
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins

2015-10-12 14:41:08 -07:00

Name                Last updated                Last commit
functional-planner  2015-10-09 16:47:46 -07:00  IMPALA-2495: make Expr::IsConstant() recurse on children
functional-query    2015-10-12 14:41:08 -07:00  IMPALA-2480, IMPALA-2519: Don't force IO-buffer on probe side when spilling PHJ
hive-benchmark      2014-01-08 10:48:45 -08:00  Refactor testing framework to generate Avro tables.
targeted-perf       2014-10-07 15:48:32 -07:00  Add targted-perf query that makes local expr allocations
targeted-stress     2014-10-06 15:09:13 -07:00  BufferedBlockMgr: bug fixes for stress.
tpcds               2015-09-23 10:38:58 -07:00  IMPALA-2364: Wrong DCHECK in PHJ::ProcessProbeBatch
tpcds-insert        2014-08-08 02:20:40 -07:00  [CDH5] Modified TPCDS schema and queries to match Impala TPCDS kit
tpch                2015-10-09 16:33:14 -07:00  IMPALA-2168: Do not try to access streams of repartitioned spilled partition in right-joins
tpch_nested         2015-07-08 02:54:20 +00:00  Fixes to Nested TPCH workload
README              2014-01-08 10:44:18 -08:00  Move functional data loading to new framework + initial changes for workload directory structure

README

This directory contains Impala test workloads. Each workload should follow this directory layout:

workloads/
   <data set name>/<data set name>_dimensions.csv  <- The test dimension file
   <data set name>/<data set name>_core.csv  <- A test vector file
   <data set name>/<data set name>_pairwise.csv
   <data set name>/<data set name>_exhaustive.csv
   <data set name>/queries/<query test>.test <- The queries for this workload
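
The layout above could be checked with a small script along these lines. This is only a sketch and is not part of the Impala test framework: the function name missing_workload_files is hypothetical, and treating all four CSV files plus at least one .test file as required is an assumption, since the README lists them without saying which are mandatory.

```python
from pathlib import Path

# File-name suffixes listed in the README layout. Treating every one of
# them as required is an assumption of this sketch.
EXPECTED_SUFFIXES = ("dimensions", "core", "pairwise", "exhaustive")


def missing_workload_files(workloads_dir, name):
    """Return the README-listed paths that are absent for workload `name`."""
    root = Path(workloads_dir) / name
    # Each CSV is expected at <name>/<name>_<suffix>.csv.
    missing = [
        f"{name}/{name}_{suffix}.csv"
        for suffix in EXPECTED_SUFFIXES
        if not (root / f"{name}_{suffix}.csv").is_file()
    ]
    # Query files are expected at <name>/queries/<query test>.test.
    queries = root / "queries"
    if not queries.is_dir() or not any(queries.glob("*.test")):
        missing.append(f"{name}/queries/<query test>.test")
    return missing
```

For example, missing_workload_files("workloads", "tpch") returns an empty list when the tpch directory contains every file named in the layout, and otherwise lists the relative paths that were not found.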