3 Commits

Author SHA1 Message Date
Tim Armstrong
161cbe30ff Revert IMPALA-4835 and dependent changes
Revert "IMPALA-6585: increase test_low_mem_limit_q21 limit"

This reverts commit 25bcb258df.

Revert "IMPALA-6588: don't add empty list of ranges in text scan"

This reverts commit d57fbec6f6.

Revert "IMPALA-4835: Part 3: switch I/O buffers to buffer pool"

This reverts commit 24b4ed0b29.

Revert "IMPALA-4835: Part 2: Allocate scan range buffers upfront"

This reverts commit 5699b59d0c.

Revert "IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation"

This reverts commit 65680dc421.

Change-Id: Ie5ca451cd96602886b0a8ecaa846957df0269cbb
Reviewed-on: http://gerrit.cloudera.org:8080/9480
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-03 04:22:12 +00:00
Tim Armstrong
24b4ed0b29 IMPALA-4835: Part 3: switch I/O buffers to buffer pool
This is the final patch to switch the Disk I/O manager to allocate all
buffers from the buffer pool and to reserve the buffers required for
a query upfront.

* The planner reserves enough memory to run a single scanner per
  scan node.
* The multi-threaded scan node must increase reservation before
  spinning up more threads.
* The scanner implementations must be careful to stay within their
  assigned reservation.
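
The three rules above can be sketched roughly as follows. This is an
illustrative sketch only, not Impala's code: the Reservation class,
try_increase() and scanner_threads_to_start() are hypothetical names,
and the byte counts are made up.

```python
class Reservation:
    """Hypothetical reservation tracker (not Impala's API)."""

    def __init__(self, limit_bytes, reserved_bytes):
        self.limit_bytes = limit_bytes        # most this scan node may ever reserve
        self.reserved_bytes = reserved_bytes  # what the planner reserved upfront

    def try_increase(self, nbytes):
        """Grow the reservation by nbytes, or return False if over the limit."""
        if self.reserved_bytes + nbytes > self.limit_bytes:
            return False
        self.reserved_bytes += nbytes
        return True


def scanner_threads_to_start(reservation, per_scanner_bytes, max_threads):
    """Return how many scanner threads can run within the reservation.

    Assumes the planner's initial reservation already covers one scanner,
    so only the additional threads have to increase the reservation first.
    """
    threads = 1
    while threads < max_threads and reservation.try_increase(per_scanner_bytes):
        threads += 1
    return threads
```

With a limit of 40 bytes, 10 bytes reserved upfront and 10 bytes per
scanner, this sketch starts 4 threads and then stops because the next
increase would be denied.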

The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.

Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal with this, the reservation in the frontend is based on a
heuristic involving the file size and the number of columns. The
Parquet scanner can then divvy up reservation to columns based on the
size of column data on disk.
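
One way to picture dividing a reservation between columns is the
greedy sketch below. It is an assumption made for illustration, not
the algorithm in this patch: divide_reservation() and its parameters
are hypothetical, every column gets one minimum-sized buffer, and the
remaining buffers go one at a time to the column with the most on-disk
data not yet covered.

```python
def divide_reservation(column_sizes, total_bytes, min_buffer_bytes):
    """Split total_bytes between columns in whole buffers.

    column_sizes: on-disk size in bytes of each column's data.
    Returns a list with the bytes of reservation given to each column.
    """
    n = len(column_sizes)
    if total_bytes < n * min_buffer_bytes:
        raise ValueError("reservation too small for one buffer per column")

    # Every column needs at least one buffer to make progress.
    shares = [min_buffer_bytes] * n
    spare_buffers = (total_bytes - n * min_buffer_bytes) // min_buffer_bytes

    for _ in range(spare_buffers):
        # Bytes of each column not yet covered by its share.
        unmet = [size - share for size, share in zip(column_sizes, shares)]
        neediest = max(range(n), key=lambda i: unmet[i])
        if unmet[neediest] <= 0:
            break  # Every column is fully covered; leave the rest unused.
        shares[neediest] += min_buffer_bytes
    return shares
```

For example, divide_reservation([100, 10, 40], 80, 10) gives the
largest column most of the spare buffers, while small columns keep a
single buffer each.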

I adjusted how the 'mem_limit' is divided between buffer pool and non
buffer pool memory for low mem_limits to account for the increase in
buffer pool memory.
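
A rough sketch of such a split is shown below. The function name, the
80 percent fraction and the 64 MB floor are assumptions chosen for
illustration and are not taken from this patch. The idea is only that
the buffer pool gets a fixed fraction of 'mem_limit', except that a
minimum amount is always left for non buffer pool memory, which is
what matters at low limits.

```python
MB = 1024 * 1024


def buffer_pool_limit(mem_limit, fraction_percent=80, min_remaining=64 * MB):
    """Return the bytes of mem_limit the buffer pool may use.

    fraction_percent and min_remaining are illustrative values only.
    """
    by_fraction = mem_limit * fraction_percent // 100
    by_remaining = mem_limit - min_remaining
    # Take the smaller cap so non buffer pool memory always keeps
    # at least min_remaining bytes, and never go below zero.
    return max(0, min(by_fraction, by_remaining))
```

With these assumed values a 1000 MB limit leaves 800 MB for the buffer
pool, while a 100 MB limit leaves only 36 MB because the floor for non
buffer pool memory dominates.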

Testing:
* Added more planner tests to cover reservation calcs for scan node.
* Test scanners for all file formats with the reservation denial debug
  action, to test behaviour when the scanners hit reservation limits.
* Updated memory and buffer pool limits for tests.
* Added unit tests for dividing reservation between columns in parquet,
  since the algorithm is non-trivial.

Perf:
I ran TPC-H and targeted perf locally comparing with master. Both
showed small improvements of a few percent and no regressions of
note. Cluster perf tests showed no significant change.

Change-Id: Ic09c6196b31e55b301df45cc56d0b72cfece6786
Reviewed-on: http://gerrit.cloudera.org:8080/8966
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-02-23 04:17:41 +00:00
Tim Armstrong
c4f903033c IMPALA-3200: more buffer pool end-to-end tests
This adds most of the end-to-end tests described in the test plan.
See http://goo.gl/v3Strz.

* End-to-end test for disk spill encryption.
* Admission control test for the case when acquiring initial
  reservation fails.
* Initial reservation acquire failure test.
* scratch_limit tests for Join, Agg, Sort, Analytic
* Memory usage scaling tests for Join, Agg, Sort, Analytic

Also splits out the slow sort queries in test_spilling and moves them
to exhaustive so the individual tests run faster and have better
parallelism.

Testing:
Ran all the core tests. Will do a full exhaustive run before
committing.

Change-Id: I554aa5ddfef4f8e75295596e720a14eee1afa17f
Reviewed-on: http://gerrit.cloudera.org:8080/7552
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-07 00:57:46 +00:00