Mirror of https://github.com/apache/impala.git (synced 2025-12-25 02:03:09 -05:00)
This patch implements async IO in the ORC scanner. For each ORC stripe, we begin by iterating over the column streams. If a column stream is eligible for async IO, we create a ColumnRange, register a ScannerContext::Stream for that ORC stream, and start the stream. We modify HdfsOrcScanner::ScanRangeInputStream::read to check whether there is a matching ColumnRange for the given offset and length. If so, the read continues through HdfsOrcScanner::ColumnRange::read.

We leverage the existing async IO methods from the HdfsParquetScanner class for initial memory allocation. We moved related methods such as DivideReservationBetweenColumns and ComputeIdealReservation up to the HdfsColumnarScanner class.

The planner calculates the memory reservation differently for async Parquet and async ORC. In async Parquet, the planner calculates the column memory reservation and relies on the backend to divide it as needed. In async ORC, the planner needs to split the column's memory reservation based on the estimated number of streams for that column type. For example, a string column with a 4MB memory estimate is split into four 1MB reservations because it might use dictionary encoding with four streams (PRESENT, DATA, DICTIONARY_DATA, and LENGTH). This splitting is required because each async IO stream needs to start with an 8KB (min_buffer_size) initial memory reservation.
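The per-stream split described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the planner's actual code: the function name, the stream-count table, the default of one stream for unlisted types, and the choice to floor each share at the 8KB minimum are all assumptions made for the example. Only the four-stream STRING case and the 8KB min_buffer_size come from the description.

```python
# 8KB initial reservation that each async IO stream needs (from the text).
MIN_BUFFER_SIZE = 8 * 1024

# Hypothetical table of estimated stream counts per column type.
# Only the STRING entry (PRESENT, DATA, DICTIONARY_DATA, LENGTH) is from the text.
ESTIMATED_STREAMS = {"STRING": 4}


def split_column_reservation(column_type, estimate_bytes, default_streams=1):
    """Split one column's memory estimate evenly across its estimated streams.

    Flooring each share at MIN_BUFFER_SIZE is an assumption of this sketch,
    as is default_streams for column types missing from the table.
    """
    streams = ESTIMATED_STREAMS.get(column_type, default_streams)
    per_stream = max(estimate_bytes // streams, MIN_BUFFER_SIZE)
    return [per_stream] * streams


# A 4MB string column estimate becomes four 1MB stream reservations.
parts = split_column_reservation("STRING", 4 * 1024 * 1024)
print(parts)
```

With these assumptions, a column whose estimate is too small to give every stream 8KB still gets 8KB per stream, so the total can exceed the original estimate.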
To show the improvement from ORC async IO, we compare the total time and geomean (in milliseconds) of a full TPC-DS run at 10 TB scale on 19 executors, with varying ORC_ASYNC_READ and DISABLE_DATA_CACHE options, as follows:

+----------------------+------------------+------------------+
| Total time           | ORC_ASYNC_READ=0 | ORC_ASYNC_READ=1 |
+----------------------+------------------+------------------+
| DISABLE_DATA_CACHE=0 |          3511075 |          3484736 |
| DISABLE_DATA_CACHE=1 |          5243337 |          4370095 |
+----------------------+------------------+------------------+

+----------------------+------------------+------------------+
| Geomean              | ORC_ASYNC_READ=0 | ORC_ASYNC_READ=1 |
+----------------------+------------------+------------------+
| DISABLE_DATA_CACHE=0 |      12786.58042 |      12454.80365 |
| DISABLE_DATA_CACHE=1 |      23081.10888 |      16692.31512 |
+----------------------+------------------+------------------+

Testing:
- Pass core tests.
- Pass core e2e tests with ORC_ASYNC_READ=1.

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Reviewed-on: http://gerrit.cloudera.org:8080/15370
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
64 lines
2.0 KiB
Plaintext
====
---- QUERY
# Scan moderately large file - scanner should try to increase reservation and succeed.
select count(*)
from tpch.customer
---- TYPES
BIGINT
---- RESULTS
150000
---- RUNTIME_PROFILE
row_regex:.*InitialRangeIdealReservation.*Avg: 24.00 MB.*Number of samples: 1.*
row_regex:.*InitialRangeActualReservation.*Avg: 24.00 MB.*Number of samples: 1.*
====
---- QUERY
# Scan moderately large file - scanner should try to increase reservation and fail.
set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
select count(*)
from tpch.customer
---- TYPES
BIGINT
---- RESULTS
150000
---- RUNTIME_PROFILE
row_regex:.*InitialRangeIdealReservation.*Avg: 24.00 MB.*Number of samples: 1.*
row_regex:.*InitialRangeActualReservation.*Avg: 8.00 MB.*Number of samples: 1.*
====
---- QUERY
# Scan large Parquet column - scanner should try to increase reservation and succeed.
select min(l_comment)
from tpch_parquet.lineitem
---- TYPES
STRING
---- RESULTS
' Tiresias '
---- RUNTIME_PROFILE
row_regex:.*InitialRangeIdealReservation.*Avg: 128.00 KB.*
row_regex:.*InitialRangeActualReservation.*Avg: 4.00 MB.*
row_regex:.*ColumnarScannerIdealReservation.*Avg: 24.00 MB.*
row_regex:.*ColumnarScannerActualReservation.*Avg: 24.00 MB.*
====
---- QUERY
# Scan large Parquet column - scanner should try to increase reservation and fail.
set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
select min(l_comment)
from tpch_parquet.lineitem
---- TYPES
STRING
---- RESULTS
' Tiresias '
---- RUNTIME_PROFILE
row_regex:.*InitialRangeIdealReservation.*Avg: 128.00 KB.*
row_regex:.*InitialRangeActualReservation.*Avg: 4.00 MB.*
row_regex:.*ColumnarScannerIdealReservation.*Avg: 24.00 MB.*
row_regex:.*ColumnarScannerActualReservation.*Avg: 4.00 MB.*
====
---- QUERY
# IMPALA-8742: Use ScanRange::bytes_to_read() instead of len(), it has an effect
# on the calculated ideal reservation.
select * from tpch_parquet.lineitem
where l_orderkey < 10;
---- RUNTIME_PROFILE
row_regex:.*ColumnarScannerIdealReservation.*Avg: [34].\d+ MB.*
====