impala

jprdonnelly/impala

mirror of https://github.com/apache/impala.git synced 2026-01-28 09:03:52 -05:00

Commit Graph

Author SHA1 Message Date

Tim Armstrong

161cbe30ff

Revert IMPALA-4835 and dependent changes

Revert "IMPALA-6585: increase test_low_mem_limit_q21 limit"

This reverts commit 25bcb258df.

Revert "IMPALA-6588: don't add empty list of ranges in text scan"

This reverts commit d57fbec6f6.

Revert "IMPALA-4835: Part 3: switch I/O buffers to buffer pool"

This reverts commit 24b4ed0b29.

Revert "IMPALA-4835: Part 2: Allocate scan range buffers upfront"

This reverts commit 5699b59d0c.

Revert "IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation"

This reverts commit 65680dc421.

Change-Id: Ie5ca451cd96602886b0a8ecaa846957df0269cbb
Reviewed-on: http://gerrit.cloudera.org:8080/9480
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins

2018-03-03 04:22:12 +00:00

Tim Armstrong

24b4ed0b29

IMPALA-4835: Part 3: switch I/O buffers to buffer pool

This is the final patch to switch the Disk I/O manager to allocate all
buffers from the buffer pool and to reserve the buffers required for
a query upfront.

* The planner reserves enough memory to run a single scanner per
  scan node.
* The multi-threaded scan node must increase reservation before
  spinning up more threads.
* The scanner implementations must be careful to stay within their
  assigned reservation.
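
The second rule above can be illustrated with a minimal sketch. This is not Impala's code: the `Reservation` class, `try_increase`, and `count_runnable_scanner_threads` are hypothetical names, and the sketch assumes every scanner thread needs the same number of bytes and that the initial reservation already covers the first thread.

```python
class Reservation:
    """Hypothetical reservation tracker: bytes held so far, capped by a limit."""

    def __init__(self, limit, initial):
        if initial > limit:
            raise ValueError("initial reservation exceeds limit")
        self.limit = limit
        self.used = initial

    def try_increase(self, nbytes):
        """Grow the reservation by nbytes if it fits under the limit."""
        if self.used + nbytes > self.limit:
            return False
        self.used += nbytes
        return True


def count_runnable_scanner_threads(reservation, per_thread_bytes, max_threads):
    """Return how many scanner threads may run.

    The first thread is assumed to be covered by the initial reservation.
    Each extra thread is only counted after the reservation was increased
    for it, so the scan node never starts a thread it cannot back with memory.
    """
    threads = 1
    while threads < max_threads and reservation.try_increase(per_thread_bytes):
        threads += 1
    return threads


if __name__ == "__main__":
    r = Reservation(limit=100, initial=30)
    print(count_runnable_scanner_threads(r, per_thread_bytes=30, max_threads=8))
```

The point of the ordering is that a failed increase leaves the node with fewer threads rather than with threads that exceed their assigned reservation.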

The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.

Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal with this, the reservation in the frontend is based on a
heuristic involving the file size and # columns. The Parquet scanner
can then divvy up reservation to columns based on the size of column
data on disk.
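
As a rough illustration of dividing a fixed reservation between columns by on-disk size, here is a simple greedy sketch. It is an assumption for illustration only, not the algorithm in this patch: the function name `divide_reservation`, the `min_buffer` parameter, and the greedy rule are all hypothetical.

```python
def divide_reservation(column_sizes, total_reservation, min_buffer):
    """Split total_reservation bytes among columns (illustrative only).

    Every column first receives one min_buffer-sized buffer. Remaining
    reservation is then handed out one buffer at a time to the column
    with the most on-disk bytes not yet covered by its allocation.
    """
    n = len(column_sizes)
    if n == 0:
        return []
    if total_reservation < n * min_buffer:
        raise ValueError("reservation too small for one buffer per column")

    alloc = [min_buffer] * n
    remaining = total_reservation - n * min_buffer
    while remaining >= min_buffer:
        # Column whose data is least covered by its current allocation.
        idx = max(range(n), key=lambda i: column_sizes[i] - alloc[i])
        if column_sizes[idx] - alloc[idx] <= 0:
            break  # every column is already fully covered
        alloc[idx] += min_buffer
        remaining -= min_buffer
    return alloc


if __name__ == "__main__":
    print(divide_reservation([100, 1000, 10], total_reservation=80, min_buffer=10))
```

In this sketch a small column never gets more than it can use, and any reservation left over after all columns are covered simply stays unallocated.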

I adjusted how the 'mem_limit' is divided between buffer pool and non
buffer pool memory for low mem_limits to account for the increase in
buffer pool memory.

Testing:
* Added more planner tests to cover reservation calcs for scan node.
* Tested scanners for all file formats with the reservation denial debug
  action, to test behaviour when the scanners hit reservation limits.
* Updated memory and buffer pool limits for tests.
* Added unit tests for dividing reservation between columns in parquet,
  since the algorithm is non-trivial.

Perf:
I ran TPC-H and targeted perf locally comparing with master. Both
showed small improvements of a few percent and no regressions of
note. Cluster perf tests showed no significant change.

Change-Id: Ic09c6196b31e55b301df45cc56d0b72cfece6786
Reviewed-on: http://gerrit.cloudera.org:8080/8966
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins

2018-02-23 04:17:41 +00:00

Tim Armstrong

c4f903033c

IMPALA-3200: more buffer pool end-to-end tests

This adds most of the end-to-end tests described in the test plan.
See http://goo.gl/v3Strz.

* End-to-end test for disk spill encryption.
* Admission control test for the case when acquiring initial
  reservation fails.
* Initial reservation acquire failure test
* scratch_limit tests for Join, Agg, Sort, Analytic
* Memory usage scaling tests for Join, Agg, Sort, Analytic

Also splits out the slow sort queries in test_spilling and moves them
to exhaustive so the individual tests run faster and have better
parallelism.

Testing:
Ran all the core tests. Will do a full exhaustive run before
committing.

Change-Id: I554aa5ddfef4f8e75295596e720a14eee1afa17f
Reviewed-on: http://gerrit.cloudera.org:8080/7552
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins

2017-08-07 00:57:46 +00:00

3 Commits