IMPALA-9572: Fix DCHECK in nested Parquet scanning

The issue occurred when pages were skipped and a column inside a
collection was scanned without its position being needed. The
repetition level still has to be read in this case: skipped ranges
are expressed in top-level rows, so collection items need to know
which top-level row they belong to.
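
To illustrate why the repetition level is needed, here is a minimal
sketch, not Impala's actual code. It assumes the usual Parquet
convention that a repetition level of 0 marks the start of a new
top-level row. The function names and the inclusive [first, last]
range representation are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: maps each value of a nested column to the index of
// the top-level row it belongs to. Assumes rep_level == 0 starts a new row.
std::vector<int64_t> TopLevelRowOfEachValue(const std::vector<int>& rep_levels) {
  std::vector<int64_t> rows;
  rows.reserve(rep_levels.size());
  int64_t current_row = -1;
  for (int rep_level : rep_levels) {
    if (rep_level == 0) ++current_row;  // a new top-level row begins here
    rows.push_back(current_row);
  }
  return rows;
}

// Hypothetical helper: returns the indices of the values that survive when
// the given top-level row ranges (inclusive on both ends) are skipped.
std::vector<size_t> ValuesToKeep(
    const std::vector<int>& rep_levels,
    const std::vector<std::pair<int64_t, int64_t>>& skipped_row_ranges) {
  const std::vector<int64_t> rows = TopLevelRowOfEachValue(rep_levels);
  std::vector<size_t> kept;
  for (size_t i = 0; i < rows.size(); ++i) {
    bool skipped = false;
    for (const auto& range : skipped_row_ranges) {
      if (rows[i] >= range.first && rows[i] <= range.second) {
        skipped = true;
        break;
      }
    }
    if (!skipped) kept.push_back(i);
  }
  return kept;
}
```

Without the repetition levels there is no way to tell where one
top-level row ends and the next begins, so the skipped ranges could
not be applied to the collection items even when the item positions
themselves are not requested.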

A DCHECK in StrideWriter's constructor was hit in debug builds; in
release mode the code ran correctly. The DCHECK is moved to the
functions where the condition would actually cause problems.

Testing:
- added and ran a regression test

Change-Id: I5e8ef514ead71f732c73f910af7fd1aecd37bb81
Reviewed-on: http://gerrit.cloudera.org:8080/15598
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Csaba Ringhofer
2020-03-30 20:15:09 +02:00
committed by Impala Public Jenkins
parent 01691b998a
commit e8f604a213
3 changed files with 18 additions and 2 deletions

@@ -702,3 +702,12 @@ BIGINT, DECIMAL
---- RUNTIME_PROFILE
aggregation(SUM, NumStatsFilteredPages): 6
====
---- QUERY
# Regression test for IMPALA-9572.
select count(l_partkey) from tpch_nested_parquet.customer.c_orders.o_lineitems
where l_partkey < 10
---- RESULTS
263
---- TYPES
BIGINT
====