impala/testdata/workloads/functional-query/queries/QueryTest/scanner-reservation.test
Csaba Ringhofer 97dda2b27d IMPALA-6636: Use async IO in ORC scanner
This patch implements async IO in the ORC scanner. For each ORC stripe,
we begin by iterating over the column streams. If a column stream is
eligible for async IO, we create a ColumnRange, register a
ScannerContext::Stream for that ORC stream, and start the stream. We
modify HdfsOrcScanner::ScanRangeInputStream::read to check whether there
is a matching ColumnRange for the given offset and length. If so,
reading continues through HdfsOrcScanner::ColumnRange::read.
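
The dispatch described above can be sketched in Python as follows. This
is an illustration only, not the C++ implementation: the class layout,
the helper names (covers, find_column_range, read_sync) and the
containment rule used for "matching" are assumptions, since the patch
description only says that the offset and length must match a
ColumnRange.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Reader = Callable[[int, int], bytes]


@dataclass
class ColumnRange:
    """Hypothetical stand-in for HdfsOrcScanner::ColumnRange."""
    offset: int
    length: int
    read_async: Reader  # reads through the pre-started async stream

    def covers(self, offset: int, length: int) -> bool:
        # Assumption: a request matches when it lies entirely inside the range.
        return self.offset <= offset and offset + length <= self.offset + self.length


def find_column_range(ranges: List[ColumnRange], offset: int,
                      length: int) -> Optional[ColumnRange]:
    """Return the first registered range that covers the request, if any."""
    for column_range in ranges:
        if column_range.covers(offset, length):
            return column_range
    return None


def read(ranges: List[ColumnRange], offset: int, length: int,
         read_sync: Reader) -> bytes:
    """Route a read to a matching ColumnRange, else fall back to the sync path."""
    column_range = find_column_range(ranges, offset, length)
    if column_range is not None:
        return column_range.read_async(offset, length)
    return read_sync(offset, length)
```

A request that falls outside every registered range, or that straddles
the end of one, takes the synchronous path in this sketch.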

We reuse the existing async IO methods from the HdfsParquetScanner class
for initial memory allocation. To do so, we move related methods such as
DivideReservationBetweenColumns and ComputeIdealReservation up to the
HdfsColumnarScanner class.

The planner calculates the memory reservation differently for async
Parquet and async ORC. In async Parquet, the planner calculates the
column memory reservation and relies on the backend to divide it as
needed. In async ORC, the planner needs to split the column's memory
reservation based on the estimated number of streams for that column
type. For example, a string column with a 4MB memory estimate needs
to be split into four 1MB reservations because it might use dictionary
encoding with four streams (PRESENT, DATA, DICTIONARY_DATA, and LENGTH).
This splitting is required because each async IO stream needs to start
with an 8KB (min_buffer_size) initial memory reservation.
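
The string column example above works out as follows. This Python
sketch is not the planner code: the function name
split_column_reservation is hypothetical, and both the even split and
the clamp to min_buffer_size are assumptions made for illustration.

```python
MIN_BUFFER_SIZE = 8 * 1024  # 8KB, the min_buffer_size mentioned above

# Streams a dictionary-encoded string column might use, per the text above.
STRING_STREAMS = ["PRESENT", "DATA", "DICTIONARY_DATA", "LENGTH"]


def split_column_reservation(column_bytes: int, num_streams: int) -> list:
    """Split one column's memory estimate evenly across its streams.

    Assumption: every stream gets at least MIN_BUFFER_SIZE so that it
    can start its async IO.
    """
    if num_streams <= 0:
        raise ValueError("num_streams must be positive")
    per_stream = column_bytes // num_streams
    return [max(per_stream, MIN_BUFFER_SIZE)] * num_streams


if __name__ == "__main__":
    four_mb = 4 * 1024 * 1024
    for name, size in zip(STRING_STREAMS,
                          split_column_reservation(four_mb, len(STRING_STREAMS))):
        print(f"{name}: {size // 1024} KB")
```

With a 4MB estimate and four streams, each stream receives 1024 KB,
which matches the four 1MB reservations described above.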

To show the improvement from ORC async IO, we contrast the total time
and geomean (in milliseconds) of a full TPC-DS run at 10 TB scale on 19
executors, with varying ORC_ASYNC_READ and DISABLE_DATA_CACHE options as
follows:

+----------------------+------------------+------------------+
| Total time           | ORC_ASYNC_READ=0 | ORC_ASYNC_READ=1 |
+----------------------+------------------+------------------+
| DISABLE_DATA_CACHE=0 |          3511075 |          3484736 |
| DISABLE_DATA_CACHE=1 |          5243337 |          4370095 |
+----------------------+------------------+------------------+

+----------------------+------------------+------------------+
| Geomean              | ORC_ASYNC_READ=0 | ORC_ASYNC_READ=1 |
+----------------------+------------------+------------------+
| DISABLE_DATA_CACHE=0 |      12786.58042 |      12454.80365 |
| DISABLE_DATA_CACHE=1 |      23081.10888 |      16692.31512 |
+----------------------+------------------+------------------+
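
The relative improvement can be derived from the two tables with a short
Python calculation. The values are copied from the tables; the helper
names reduction_pct and summarize are only for illustration.

```python
# (ORC_ASYNC_READ=0, ORC_ASYNC_READ=1) in milliseconds, from the tables above.
RESULTS = {
    "total": {
        "DISABLE_DATA_CACHE=0": (3511075, 3484736),
        "DISABLE_DATA_CACHE=1": (5243337, 4370095),
    },
    "geomean": {
        "DISABLE_DATA_CACHE=0": (12786.58042, 12454.80365),
        "DISABLE_DATA_CACHE=1": (23081.10888, 16692.31512),
    },
}


def reduction_pct(sync_ms: float, async_ms: float) -> float:
    """Percentage of time saved with ORC_ASYNC_READ=1 relative to =0."""
    return 100.0 * (sync_ms - async_ms) / sync_ms


def summarize(results=RESULTS) -> dict:
    """Compute the reduction for every metric and cache setting."""
    return {
        metric: {cfg: reduction_pct(*pair) for cfg, pair in rows.items()}
        for metric, rows in results.items()
    }


if __name__ == "__main__":
    for metric, rows in summarize().items():
        for cfg, pct in rows.items():
            print(f"{metric:8s} {cfg}: {pct:.1f}% less time")
```

With the data cache enabled, the reduction is roughly 0.8% in total time
and 2.6% in geomean. With the data cache disabled, it is roughly 16.7%
and 27.7% respectively.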

Testing:
- Pass core tests.
- Pass core e2e tests with ORC_ASYNC_READ=1.

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Reviewed-on: http://gerrit.cloudera.org:8080/15370
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-02-09 21:20:23 +00:00

====
---- QUERY
# Scan moderately large file - scanner should try to increase reservation and succeed.
select count(*)
from tpch.customer
---- TYPES
BIGINT
---- RESULTS
150000
---- RUNTIME_PROFILE
row_regex:.*InitialRangeIdealReservation.*Avg: 24.00 MB.*Number of samples: 1.*
row_regex:.*InitialRangeActualReservation.*Avg: 24.00 MB.*Number of samples: 1.*
====
---- QUERY
# Scan moderately large file - scanner should try to increase reservation and fail.
set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
select count(*)
from tpch.customer
---- TYPES
BIGINT
---- RESULTS
150000
---- RUNTIME_PROFILE
row_regex:.*InitialRangeIdealReservation.*Avg: 24.00 MB.*Number of samples: 1.*
row_regex:.*InitialRangeActualReservation.*Avg: 8.00 MB.*Number of samples: 1.*
====
---- QUERY
# Scan large Parquet column - scanner should try to increase reservation and succeed.
select min(l_comment)
from tpch_parquet.lineitem
---- TYPES
STRING
---- RESULTS
' Tiresias '
---- RUNTIME_PROFILE
row_regex:.*InitialRangeIdealReservation.*Avg: 128.00 KB.*
row_regex:.*InitialRangeActualReservation.*Avg: 4.00 MB.*
row_regex:.*ColumnarScannerIdealReservation.*Avg: 24.00 MB.*
row_regex:.*ColumnarScannerActualReservation.*Avg: 24.00 MB.*
====
---- QUERY
# Scan large Parquet column - scanner should try to increase reservation and fail.
set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
select min(l_comment)
from tpch_parquet.lineitem
---- TYPES
STRING
---- RESULTS
' Tiresias '
---- RUNTIME_PROFILE
row_regex:.*InitialRangeIdealReservation.*Avg: 128.00 KB.*
row_regex:.*InitialRangeActualReservation.*Avg: 4.00 MB.*
row_regex:.*ColumnarScannerIdealReservation.*Avg: 24.00 MB.*
row_regex:.*ColumnarScannerActualReservation.*Avg: 4.00 MB.*
====
---- QUERY
# IMPALA-8742: Use ScanRange::bytes_to_read() instead of len(), it has an effect
# on the calculated ideal reservation.
select * from tpch_parquet.lineitem
where l_orderkey < 10;
---- RUNTIME_PROFILE
row_regex:.*ColumnarScannerIdealReservation.*Avg: [34]\.\d+ MB.*
====