Mirror of https://github.com/apache/impala.git
Synced 2025-12-19 18:12:08 -05:00
IMPALA-14619: Reset levels_readahead_ for late materialization
Previously, `BaseScalarColumnReader::levels_readahead_` was not reset when the reader did not do page filtering. If a query selected the last row containing a collection value in a row group, `levels_readahead_` would be set and would not be reset when advancing to the next row group without page filtering. As a result, trying to skip collection values at the start of the next row group would cause a check failure.

This patch fixes the failure by resetting `levels_readahead_` in `BaseScalarColumnReader::Reset()`, which is always called when advancing to the next row group.

`levels_readahead_` is also moved out of the "Members used for page filtering" section as the variable is also used in late materialization.

Testing:
- Added an E2E test for the fix.

Change-Id: Idac138ffe4e1a9260f9080a97a1090b467781d00
Reviewed-on: http://gerrit.cloudera.org:8080/23779
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Committed by: Impala Public Jenkins
Parent: 2ebdc05c1d
Commit: d54b75ccf1
@@ -1067,6 +1067,7 @@ Status BaseScalarColumnReader::Reset(const HdfsFileDesc& file_desc,
   pos_current_value_ = ParquetLevel::INVALID_POS;
   row_group_first_row_ = row_group_first_row;
   current_row_ = -1;
+  levels_readahead_ = false;
 
   vector<ScanRange::SubRange> sub_ranges;
   CreateSubRanges(&sub_ranges);
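
The effect of resetting the flag in `Reset()` can be illustrated with a minimal sketch. It is not Impala's code: `ToyReaderState`, `LastProcessedRow()`, the `reset_readahead` switch, and `LastProcessedRowAfterReset()` are hypothetical names introduced only for illustration, and the sketch does not reproduce the actual check that failed. It only shows how a read-ahead flag left over from the previous row group makes the row position inconsistent at the start of the next one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model of per-row-group reader state; not Impala's class.
struct ToyReaderState {
  int64_t current_row = -1;
  bool levels_readahead = false;

  // Models reading the levels of the next row ahead of its value.
  void NextLevels() {
    ++current_row;
    levels_readahead = true;
  }

  // With read-ahead, 'current_row' is the row to process next, so the last
  // processed row is the one before it.
  int64_t LastProcessedRow() const {
    return levels_readahead ? current_row - 1 : current_row;
  }

  // Models advancing to the next row group. 'reset_readahead' is an
  // illustrative switch that turns the fix on or off.
  void Reset(bool reset_readahead) {
    current_row = -1;
    if (reset_readahead) levels_readahead = false;
  }
};

// Reads ahead once in a row group, advances to the next row group, and
// returns the last processed row reported at the start of that row group.
int64_t LastProcessedRowAfterReset(bool reset_readahead) {
  ToyReaderState state;
  state.NextLevels();
  assert(state.levels_readahead);
  state.Reset(reset_readahead);
  return state.LastProcessedRow();
}
```

In this toy model, `LastProcessedRowAfterReset(true)` reports -1 (no row processed yet in the new row group), while `LastProcessedRowAfterReset(false)` reports -2, a position that cannot exist. Any skip computation based on that value would start from the wrong row.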
@@ -452,6 +452,19 @@ class BaseScalarColumnReader : public ParquetColumnReader {
   /// processed the first (zeroeth) row.
   int64_t current_row_ = -1;
 
+  /// This flag is needed for the proper tracking of the last processed row.
+  /// The batched and non-batched interfaces behave differently. E.g. when using the
+  /// batched interface you don't need to invoke NextLevels() in advance, while you need
+  /// to do that for the non-batched interface. In fact, the batched interface doesn't
+  /// call NextLevels() at all. It directly reads the levels then the corresponding value
+  /// in a loop. On the other hand, the non-batched interface (ReadValue()) expects that
+  /// the levels for the next value are already read via NextLevels(). And after reading
+  /// the value it calls NextLevels() to read the levels of the next value. Hence, the
+  /// levels are always read ahead in this case.
+  /// Returns true, if we read ahead def and rep levels. In this case 'current_row_'
+  /// points to the row we'll process next, not to the row we already processed.
+  bool levels_readahead_ = false;
+
   /////////////////////////////////////////
   /// BEGIN: Members used for page filtering
   /// They are not set when we don't filter out pages at all.
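
The difference between the two interfaces described in the comment can be sketched with a toy reader over a flat column (one value per row). This is an assumption-laden illustration rather than Impala's implementation: `ToyFlatReader`, `ReadBatch()`, `LastProcessedRow()`, and the `at_end_` flag are hypothetical, and real Parquet level decoding is replaced by a plain vector index.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy reader; one value per row, no real level decoding.
class ToyFlatReader {
 public:
  explicit ToyFlatReader(std::vector<int> values) : values_(std::move(values)) {}

  // Batched style: advances to a row and reads its value in the same loop
  // iteration, so nothing is read ahead. Returns the number of values read.
  int ReadBatch(int max_values, std::vector<int>* out) {
    assert(!levels_readahead_);  // the toy does not mix the two interfaces
    int num_read = 0;
    while (num_read < max_values && current_row_ + 1 < NumRows()) {
      ++current_row_;
      out->push_back(values_[current_row_]);
      ++num_read;
    }
    return num_read;
  }

  // Non-batched style: positions the reader on the next row before its
  // value is read. Returns false once the data is exhausted.
  bool NextLevels() {
    if (current_row_ + 1 >= NumRows()) {
      at_end_ = true;
      return false;
    }
    ++current_row_;
    levels_readahead_ = true;
    return true;
  }

  // Requires a prior successful NextLevels(). Reads the value of the
  // current row, then reads ahead to the next row.
  int ReadValue() {
    assert(levels_readahead_ && !at_end_);
    int value = values_[current_row_];
    NextLevels();
    return value;
  }

  // Reconciles the two interfaces: with read-ahead, 'current_row_' is the
  // row to process next rather than the row already processed.
  int64_t LastProcessedRow() const {
    if (at_end_) return current_row_;
    return levels_readahead_ ? current_row_ - 1 : current_row_;
  }

  int64_t current_row() const { return current_row_; }

 private:
  int64_t NumRows() const { return static_cast<int64_t>(values_.size()); }

  std::vector<int> values_;
  int64_t current_row_ = -1;
  bool levels_readahead_ = false;
  bool at_end_ = false;
};
```

After two values are read through either interface, the toy's `LastProcessedRow()` agrees on row 1, even though `current_row()` is 1 for the batched reader and 2 for the read-ahead reader. The flag is what lets the caller tell the two cases apart.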
@@ -475,19 +488,6 @@ class BaseScalarColumnReader : public ParquetColumnReader {
   /// rows and increment this field.
   int current_row_range_ = 0;
 
-  /// This flag is needed for the proper tracking of the last processed row.
-  /// The batched and non-batched interfaces behave differently. E.g. when using the
-  /// batched interface you don't need to invoke NextLevels() in advance, while you need
-  /// to do that for the non-batched interface. In fact, the batched interface doesn't
-  /// call NextLevels() at all. It directly reads the levels then the corresponding value
-  /// in a loop. On the other hand, the non-batched interface (ReadValue()) expects that
-  /// the levels for the next value are already read via NextLevels(). And after reading
-  /// the value it calls NextLevels() to read the levels of the next value. Hence, the
-  /// levels are always read ahead in this case.
-  /// Returns true, if we read ahead def and rep levels. In this case 'current_row_'
-  /// points to the row we'll process next, not to the row we already processed.
-  bool levels_readahead_ = false;
-
   /// END: Members used for page filtering
   /////////////////////////////////////////
 