impala/testdata/workloads/functional-query/queries/QueryTest
Zoltan Borok-Nagy 26438d8e3e IMPALA-11414: Off-by-one error in Parquet late materialization
The PARQUET_LATE_MATERIALIZATION query option sets the minimum number
of consecutive filtered-out rows required before we skip materializing
the corresponding rows in the other Parquet columns.

For example, if PARQUET_LATE_MATERIALIZATION is 10 and a filtered column
contains at least 10 consecutive rows that don't pass the predicates, we
avoid materializing the corresponding rows in the other columns.

But due to an off-by-one error, only (PARQUET_LATE_MATERIALIZATION - 1)
consecutive filtered-out rows were actually required. This means that
with PARQUET_LATE_MATERIALIZATION set to 1, zero consecutive
filtered-out rows were enough, which leads to a crash or a DCHECK
failure. The bug is in the GetMicroBatches() algorithm, which produces
the micro batches based on the selected rows.
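
The threshold logic described above can be illustrated with a small
Python sketch. This is not Impala's GetMicroBatches() implementation
(which is C++); the function name get_micro_batches, its parameters, and
the inclusive (start, end) batch representation are assumptions made
purely for illustration. It shows where the comparison can slip by one:
the gap between two selected rows is i - last - 1, not i - last.

```python
def get_micro_batches(selected, min_skip):
    """Return inclusive (start, end) row ranges to materialize.

    Hypothetical sketch: `selected[i]` is True if row i passed the
    predicates; `min_skip` plays the role of
    PARQUET_LATE_MATERIALIZATION.
    """
    if min_skip < 1:
        # A threshold of 0 is meaningless, so reject it up front.
        raise ValueError("min_skip must be at least 1")

    batches = []
    start = last = None
    for i, keep in enumerate(selected):
        if not keep:
            continue
        if start is None:
            # First selected row opens the first batch.
            start = last = i
        elif i - last - 1 >= min_skip:
            # Exactly (i - last - 1) filtered-out rows lie between the
            # previous selected row and this one. Comparing (i - last)
            # instead would require one row fewer than intended and,
            # with min_skip == 1, would split on a gap of zero rows.
            batches.append((start, last))
            start = last = i
        else:
            # Gap too short to be worth skipping: extend the batch.
            last = i
    if start is not None:
        batches.append((start, last))
    return batches


rows = [True, False, False, True, True, False, False, False, True]
print(get_micro_batches(rows, 3))  # [(0, 4), (8, 8)]
print(get_micro_batches(rows, 1))  # [(0, 0), (3, 4), (8, 8)]
```

With a threshold of 3, the two-row gap after row 0 is materialized as
part of the first batch, while the three-row gap before row 8 is
skipped. With a threshold of 1, adjacent selected rows (3 and 4) still
share a batch because the gap between them is zero.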

Setting PARQUET_LATE_MATERIALIZATION to 0 doesn't make sense so it
shouldn't be allowed.

Testing
 * e2e test with PARQUET_LATE_MATERIALIZATION=1
 * e2e test for checking SET PARQUET_LATE_MATERIALIZATION=N

Change-Id: I38f95ad48c4ac8c1e06651565ab5c496283b29fa
Reviewed-on: http://gerrit.cloudera.org:8080/18700
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-07-11 19:25:02 +00:00