impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 09:02:19 -05:00

Tianyi Wang b660bd652f IMPALA-4794: Grouping distinct agg plan robust to data skew

This patch changes the query plan for grouping distinct aggregations to
be more robust to data skew in the grouping expressions. The existing
plan partitions data between phase-1 and phase-2 by the grouping exprs.
Under this strategy the data skewness on the grouping exprs directly
impacts performance. The new plan partitions data by both the grouping
exprs and distinct agg exprs, then adds one more aggregation and
exchange node. The new plan is more robust to data skew but does more
work than the old plan.
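
The trade-off described above can be illustrated with a small simulation. This is only a sketch of the partitioning idea, not Impala's planner or executor: `toy_hash`, `partition`, `old_plan`, and `new_plan` are hypothetical names, the hash is a deliberately simple stand-in, and the local pre-aggregation before the first exchange is omitted for brevity. It computes `COUNT(DISTINCT d) ... GROUP BY g` over rows `(g, d)` in which one group is heavily skewed.

```python
from collections import Counter, defaultdict


def toy_hash(values):
    """Toy deterministic hash over a tuple of ints (illustrative only)."""
    h = 0
    for v in values:
        h = h * 31 + v
    return h


def partition(rows, key_fn, n):
    """Hash-partition rows into n buckets by key_fn(row)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[toy_hash(key_fn(row)) % n].append(row)
    return parts


def old_plan(rows, n):
    """Exchange on the grouping expr only, then count distinct per group."""
    parts = partition(rows, lambda r: (r[0],), n)
    result = {}
    for part in parts:
        distinct = defaultdict(set)
        for g, d in part:
            distinct[g].add(d)
        for g, values in distinct.items():
            result[g] = len(values)
    return result, [len(p) for p in parts]


def new_plan(rows, n):
    """Exchange on (grouping expr, distinct expr), then merge partial counts."""
    parts = partition(rows, lambda r: (r[0], r[1]), n)
    partial = []
    for part in parts:
        # Each distinct (g, d) pair lands in exactly one partition, so
        # per-partition distinct counts can be summed safely afterwards.
        counts = Counter(g for g, _ in set(part))
        partial.extend(counts.items())
    # Extra merge aggregation (stands in for the second exchange on g).
    merged = Counter()
    for g, c in partial:
        merged[g] += c
    return dict(merged), [len(p) for p in parts]


# One hot group (g = 0) with 1000 distinct values, plus four tiny groups.
rows = [(0, d) for d in range(1000)] + [(g, 0) for g in range(1, 5)]
old_result, old_loads = old_plan(rows, 4)
new_result, new_loads = new_plan(rows, 4)
print(old_loads, new_loads)
```

Both functions return the same counts per group. In this toy setup the hot group lands entirely on one partition under `old_plan`, while `new_plan` spreads it evenly at the cost of the extra merge step.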

Testing: Modified existing planner tests which already provide
sufficient coverage. The pattern is that the distinct agg exprs are
added to the first exchange node, followed by an additional merge agg
and exchange node.

Change-Id: I7bdada0e328b555900c7b7ff8aabc8eb15ae8fa9
Reviewed-on: http://gerrit.cloudera.org:8080/7643
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins

2017-08-16 23:20:22 +00:00

Name                Last updated                Last commit
functional-planner  2017-08-16 23:20:22 +00:00  IMPALA-4794: Grouping distinct agg plan robust to data skew
functional-query    2017-08-12 08:10:07 +00:00  IMPALA-4833: Compute precise per-host reservation size
hive-benchmark      2014-01-08 10:48:45 -08:00  Refactor testing framework to generate Avro tables.
perf-regression     2016-05-12 23:06:36 -07:00  IMPALA-3311: fix string data coming out of aggs in subplans
targeted-perf       2017-08-07 00:57:46 +00:00  IMPALA-3200: more buffer pool end-to-end tests
targeted-stress     2017-08-05 01:03:02 +00:00  IMPALA-4674: Part 2: port backend exec to BufferPool
tpcds               2017-05-27 05:19:53 +00:00  IMPALA-5376: Loads all TPC-DS tables
tpcds-insert        2014-08-08 02:20:40 -07:00  [CDH5] Modified TPCDS schema and queries to match Impala TPCDS kit
tpch                2017-08-05 01:03:02 +00:00  IMPALA-4674: Part 2: port backend exec to BufferPool
tpch_nested         2016-03-04 04:35:54 +00:00  Improve the SQL for nested TPCH-Q18.
README              2014-01-08 10:44:18 -08:00  Move functional data loading to new framework + initial changes for workload directory structure

README

This directory contains Impala test workloads. Each workload should follow this directory layout:

workloads/
   <data set name>/<data set name>_dimensions.csv  <- The test dimension file
   <data set name>/<data set name>_core.csv  <- A test vector file
   <data set name>/<data set name>_pairwise.csv
   <data set name>/<data set name>_exhaustive.csv
   <data set name>/queries/<query test>.test <- The queries for this workload
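
As a rough illustration of the layout above, the following sketch lists the paths a workload would be expected to contain and reports which are missing. The helper names `expected_paths` and `missing_paths` are hypothetical and not part of the Impala test framework; the sketch also assumes all four CSV files are required, which the README does not state explicitly.

```python
from pathlib import Path

# File suffixes taken from the layout above.
CSV_SUFFIXES = ("dimensions", "core", "pairwise", "exhaustive")


def expected_paths(workloads_dir, data_set, query_tests=()):
    """Return the paths the layout describes for one data set."""
    base = Path(workloads_dir) / data_set
    paths = [base / f"{data_set}_{suffix}.csv" for suffix in CSV_SUFFIXES]
    paths += [base / "queries" / f"{name}.test" for name in query_tests]
    return paths


def missing_paths(workloads_dir, data_set, query_tests=()):
    """Return the expected paths that do not exist as regular files."""
    return [
        p
        for p in expected_paths(workloads_dir, data_set, query_tests)
        if not p.is_file()
    ]


if __name__ == "__main__":
    # "tpch" and "tpch-q1" are example names chosen for illustration.
    for path in missing_paths("workloads", "tpch", ["tpch-q1"]):
        print(f"missing: {path}")
```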