mirror of
https://github.com/apache/impala.git
synced 2026-01-31 18:00:17 -05:00
Currently, parquet row-groups can be pruned at run-time using min/max stats when predicates (in, binary) are specified for column scalar types. This patch extends pruning to nested types for the same class of predicates. A nested value is an instance of a nested type (struct, array, map). A nested value consists of other nested and scalar values (as declared by its type). Predicates that can be used for row-group pruning must be applied to nested scalar values. In addition, the parent of the nested scalar must also be required, that is, not empty. The latter requirement is conservative: some filters that could be used for pruning are not used for correctness reasons. Testing: - extended nested-types-parquet-stats e2e test cases. Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe Reviewed-on: http://gerrit.cloudera.org:8080/8480 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins
This directory contains Impala test workloads. The directory layout for the workloads should follow: workloads/ <data set name>/<data set name>_dimensions.csv <- The test dimension file <data set name>/<data set name>_core.csv <- A test vector file <data set name>/<data set name>_pairwise.csv <data set name>/<data set name>_exhaustive.csv <data set name>/queries/<query test>.test <- The queries for this workload