impala

mirror of https://github.com/apache/impala.git synced 2026-01-31 18:00:17 -05:00

Files

Vuk Ercegovac 21a96ed2e3 IMPALA-4985: use parquet stats of nested types for dynamic pruning

Currently, parquet row-groups can be pruned at run-time using
min/max stats when predicates (in, binary) are specified for
column scalar types. This patch extends pruning to nested types
for the same class of predicates. A nested value is an instance
of a nested type (struct, array, map). A nested value consists of
other nested and scalar values (as declared by its type).
Predicates that can be used for row-group pruning must be applied to
nested scalar values. In addition, the parent of the nested scalar
must also be required, that is, not empty. The latter requirement
is conservative: some filters that could be used for pruning are
not used for correctness reasons.

Testing:
- extended nested-types-parquet-stats e2e test cases.

Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Reviewed-on: http://gerrit.cloudera.org:8080/8480
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins

2017-11-22 22:00:16 +00:00

functional-planner

IMPALA-4985: use parquet stats of nested types for dynamic pruning

2017-11-22 22:00:16 +00:00

functional-query

IMPALA-4985: use parquet stats of nested types for dynamic pruning

2017-11-22 22:00:16 +00:00

hive-benchmark

Refactor testing framework to generate Avro tables.

2014-01-08 10:48:45 -08:00

perf-regression

IMPALA-3311: fix string data coming out of aggs in subplans

2016-05-12 23:06:36 -07:00

targeted-perf

IMPALA-4252: Min-max runtime filters for Kudu

2017-11-17 21:33:51 +00:00

targeted-stress

IMPALA-4674: Part 2: port backend exec to BufferPool

2017-08-05 01:03:02 +00:00

tpcds

IMPALA-6155: Allow tests to pass when ORDER BY does not cover the query.

2017-11-08 00:13:56 +00:00

tpcds-insert

[CDH5] Modified TPCDS schema and queries to match Impala TPCDS kit

2014-08-08 02:20:40 -07:00

tpch

IMPALA-5681: release reservation from blocking operators

2017-08-17 20:17:48 +00:00

tpch_nested

IMPALA-5617: Include full workload name in tpch_nested query filenames

2017-08-31 01:08:40 +00:00

README

Move functional data loading to new framework + initial changes for workload directory structure

2014-01-08 10:44:18 -08:00

README

This directory contains Impala test workloads. The directory layout for the workloads should follow:

workloads/
   <data set name>/<data set name>_dimensions.csv  <- The test dimension file
   <data set name>/<data set name>_core.csv  <- A test vector file
   <data set name>/<data set name>_pairwise.csv
   <data set name>/<data set name>_exhaustive.csv
   <data set name>/queries/<query test>.test <- The queries for this workload