IMPALA-14619: Reset levels_readahead_ for late materialization

Previously, `BaseScalarColumnReader::levels_readahead_` was not reset
when the reader did not do page filtering. If a query selected the last
row containing a collection value in a row group, `levels_readahead_`
was set and stayed set when the reader advanced to the next row group
without page filtering. As a result, trying to skip collection values at
the start of the next row group caused a check failure.

This patch fixes the failure by resetting `levels_readahead_` in
`BaseScalarColumnReader::Reset()`, which is always called when advancing
to the next row group.
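
The failure mode can be illustrated with a minimal, self-contained sketch.
This is a toy model, not Impala's reader: `ToyColumnReader`, `ReadRow()`,
`SkipRow()`, `SkipSucceedsInNextRowGroup()`, and the exact condition checked
in `SkipRow()` are hypothetical names and assumptions made for illustration.
Only the idea of clearing `levels_readahead_` in `Reset()` comes from this
patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; NOT Impala's BaseScalarColumnReader.
class ToyColumnReader {
 public:
  // 'reset_clears_readahead' toggles the (modeled) fix from this patch.
  explicit ToyColumnReader(bool reset_clears_readahead)
    : reset_clears_readahead_(reset_clears_readahead) {}

  // Points the reader at the repetition levels of a new row group.
  void Reset(const std::vector<int>* rep_levels) {
    rep_levels_ = rep_levels;
    pos_ = 0;
    if (reset_clears_readahead_) levels_readahead_ = false;
  }

  // Reads one top-level row and returns its number of collection values.
  // Finding the end of the row requires looking one level ahead, so the
  // readahead flag is left set afterwards.
  int ReadRow() {
    int num_values = 0;
    do {
      ++pos_;
      ++num_values;
    } while (pos_ < rep_levels_->size() && (*rep_levels_)[pos_] > 0);
    levels_readahead_ = true;
    return num_values;
  }

  // Skips one top-level row. Returns false in the state that this toy
  // treats as the check failure (an assumption): a readahead flag that is
  // still set at the very start of a row group.
  bool SkipRow() {
    if (pos_ == 0 && levels_readahead_) return false;
    ReadRow();
    return true;
  }

 private:
  const std::vector<int>* rep_levels_ = nullptr;
  std::size_t pos_ = 0;
  bool levels_readahead_ = false;
  const bool reset_clears_readahead_;
};

// Reads the last row of one row group, advances to the next row group, and
// reports whether skipping its first row succeeds.
bool SkipSucceedsInNextRowGroup(bool reset_clears_readahead) {
  const std::vector<int> group1 = {0, 1, 0, 1, 1};  // rows of 2 and 3 values
  const std::vector<int> group2 = {0, 1, 0};
  ToyColumnReader reader(reset_clears_readahead);
  reader.Reset(&group1);
  reader.ReadRow();
  reader.ReadRow();  // last row of the group; leaves the flag set
  reader.Reset(&group2);
  return reader.SkipRow();
}
```

In this toy, `SkipSucceedsInNextRowGroup(false)` models the old behavior, where
the stale flag survives the row group change, and
`SkipSucceedsInNextRowGroup(true)` models the behavior after the fix.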

`levels_readahead_` is also moved out of the "Members used for page
filtering" section as the variable is also used in late materialization.

Testing:
- Added an E2E test for the fix.

Change-Id: Idac138ffe4e1a9260f9080a97a1090b467781d00
Reviewed-on: http://gerrit.cloudera.org:8080/23779
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Xuebin Su
2025-12-11 17:18:59 +08:00
committed by Impala Public Jenkins
parent 2ebdc05c1d
commit d54b75ccf1
4 changed files with 29 additions and 13 deletions

@@ -40,3 +40,16 @@ select count(unnest(arr)) from nested_decimals n where d_38 = 1;
aggregation(SUM, NumPagesSkippedByLateMaterialization): 0
aggregation(SUM, NumTopLevelValuesSkipped): 17
====
---- QUERY
# Selects the last row in a row group and then skips the first row in the next
# row group.
select count(o_orderkey)
from customer_nested_multiblock_multipage t
left join t.c_orders
where cast(c_custkey as string) like '100';
---- RESULTS
20
---- RUNTIME_PROFILE
aggregation(SUM, NumPagesSkippedByLateMaterialization): 0
aggregation(SUM, NumTopLevelValuesSkipped): 299
====