impala

mirror of https://github.com/apache/impala.git synced 2026-01-10 09:00:16 -05:00


Tim Armstrong 63f5e8ec00 IMPALA-1270: add distinct aggregation to semi joins

When generating plans with left semi/anti joins (typically
resulting from subquery rewrites), the planner now
considers inserting a distinct aggregation on the inner
side of the join. The decision is based on whether that
aggregation would reduce the number of rows by more than
75%. This threshold is fairly conservative: the optimization
might also be beneficial for smaller reductions, but a high
threshold reduces the risk of plan regressions.
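The more-than-75% rule can be sketched in Python. This is not Impala's planner code (the planner is written in Java); `should_add_distinct_agg`, its parameters, and the treatment of missing statistics as `None` are hypothetical and only illustrate the threshold test. Skipping the rewrite when statistics are missing is an assumption of this sketch.

```python
from typing import Optional

# Threshold from the commit message: the aggregation must remove
# more than 75% of the inner side's rows.
MIN_ROW_REDUCTION = 0.75


def should_add_distinct_agg(inner_rows: Optional[int],
                            distinct_rows: Optional[int],
                            option_enabled: bool = True) -> bool:
    """Decide whether to put a distinct aggregation on the inner side
    of a left semi/anti join (hypothetical helper, not Impala's code).

    inner_rows:     estimated row count of the inner side
    distinct_rows:  estimated row count after the distinct aggregation
    option_enabled: stands in for ENABLE_DISTINCT_SEMI_JOIN_OPTIMIZATION
    """
    if not option_enabled:
        return False
    # Assumption: without statistics (None or negative), stay conservative.
    if inner_rows is None or distinct_rows is None:
        return False
    if inner_rows <= 0 or distinct_rows < 0:
        return False
    reduction = 1.0 - distinct_rows / inner_rows
    return reduction > MIN_ROW_REDUCTION
```

For example, 1,000,000 inner rows collapsing to 10,000 distinct keys (a 99% reduction) qualifies, while a reduction of exactly 75% does not, because the rule requires strictly more than 75%.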

The aggregation can reduce both the number of rows (by
removing duplicates) and the width of the rows (by
projecting out unneeded slots).
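Removing duplicates from the inner side is safe because a left semi join only checks whether at least one matching inner row exists, and a left anti join only checks that none exists; neither the number of matches nor the other inner columns affect the result. A minimal Python sketch over in-memory rows, with hypothetical helper names and illustrative data, and ignoring SQL NULL semantics (which this simplified set-membership join does not model):

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def distinct_projection(rows: Iterable[Row], cols: Sequence[str]) -> List[Tuple]:
    """Keep only the join columns and drop duplicate keys (first-seen order).

    This mimics the distinct aggregation on the inner side: fewer rows
    (duplicates removed) and narrower rows (other columns projected out).
    """
    seen = set()
    keys = []
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            keys.append(key)
    return keys


def semi_join(outer: Iterable[Row], inner_keys: Iterable[Tuple],
              outer_cols: Sequence[str], anti: bool = False) -> List[Row]:
    """Left semi join (anti=False) or left anti join (anti=True).

    Only the existence of a matching key is checked, so duplicate inner
    keys cannot change the result. NULL semantics are not modeled.
    """
    key_set = set(inner_keys)
    return [row for row in outer
            if (tuple(row[c] for c in outer_cols) in key_set) != anti]


# Illustrative data: orders placed by customers with at least one return.
orders = [{"order_id": 1, "cust": 10}, {"order_id": 2, "cust": 20},
          {"order_id": 3, "cust": 30}]
returns = [{"cust": 10, "reason": "damaged"}, {"cust": 10, "reason": "late"},
           {"cust": 30, "reason": "damaged"}]

all_keys = [(r["cust"],) for r in returns]           # 3 rows, duplicates kept
dedup_keys = distinct_projection(returns, ["cust"])  # 2 narrower rows

semi = semi_join(orders, dedup_keys, ["cust"])
anti = semi_join(orders, dedup_keys, ["cust"], anti=True)
assert semi == semi_join(orders, all_keys, ["cust"])
assert anti == semi_join(orders, all_keys, ["cust"], anti=True)
```

The two trailing assertions check that the deduplicated, narrower inner side produces the same semi and anti join results as the original inner side.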

The ENABLE_DISTINCT_SEMI_JOIN_OPTIMIZATION query option is
added to toggle the optimization.

Tests:
* Add positive and negative planner tests for various
  cases - including semi/anti joins, missing stats,
  broadcast/shuffle, different numbers of join predicates.
* Add some end-to-end tests to verify plans execute correctly.

Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Reviewed-on: http://gerrit.cloudera.org:8080/16180
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

2020-07-15 17:10:50 +00:00

Name                Last commit date            Last commit message
functional-planner  2020-07-15 17:10:50 +00:00  IMPALA-1270: add distinct aggregation to semi joins
functional-query    2020-07-15 17:10:50 +00:00  IMPALA-1270: add distinct aggregation to semi joins
perf-regression     2020-06-15 23:42:12 +00:00  IMPALA-9709: Remove Impala-lzo from the development environment
targeted-perf       2020-06-15 23:42:12 +00:00  IMPALA-9709: Remove Impala-lzo from the development environment
targeted-stress     2020-06-15 23:42:12 +00:00  IMPALA-9709: Remove Impala-lzo from the development environment
tpcds               2020-07-14 03:13:18 +00:00  IMPALA-9917: grouping() and grouping_id() support
tpcds-insert        2019-05-15 22:34:28 +00:00  IMPALA-4356,IMPALA-7331: codegen all ScalarExprs
tpcds-unmodified    2020-06-15 23:42:12 +00:00  IMPALA-9709: Remove Impala-lzo from the development environment
tpch                2020-06-15 23:42:12 +00:00  IMPALA-9709: Remove Impala-lzo from the development environment
tpch_nested         2020-06-17 06:54:50 +00:00  IMPALA-9604: Add TPCH-nested tests for column masking
README              2014-01-08 10:44:18 -08:00  Move functional data loading to new framework + initial changes for workload directory structure

README

This directory contains Impala test workloads. The directory layout for the workloads should follow:

workloads/
   <data set name>/<data set name>_dimensions.csv  <- The test dimension file
   <data set name>/<data set name>_core.csv  <- A test vector file
   <data set name>/<data set name>_pairwise.csv  <- A test vector file
   <data set name>/<data set name>_exhaustive.csv  <- A test vector file
   <data set name>/queries/<query test>.test <- The queries for this workload
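The layout above can be checked mechanically. The following Python sketch lists which expected files are missing for one data set; `expected_csv_files` and `missing_workload_files` are hypothetical helpers, not part of Impala's test framework, and treating an empty `queries/` directory as missing is an assumption (the README only says the layout "should" be followed).

```python
import tempfile
from pathlib import Path
from typing import List

# Suffixes taken from the layout above: one dimensions file plus the
# core, pairwise and exhaustive test vector files.
CSV_SUFFIXES = ("dimensions", "core", "pairwise", "exhaustive")


def expected_csv_files(workloads_dir: Path, data_set: str) -> List[Path]:
    """Paths of the CSV files the layout expects for one data set."""
    base = workloads_dir / data_set
    return [base / f"{data_set}_{suffix}.csv" for suffix in CSV_SUFFIXES]


def missing_workload_files(workloads_dir: Path, data_set: str) -> List[Path]:
    """Return expected CSV files (and the queries/ directory) that are absent."""
    missing = [p for p in expected_csv_files(workloads_dir, data_set)
               if not p.is_file()]
    queries = workloads_dir / data_set / "queries"
    # Assumption of this sketch: at least one *.test file is required.
    if not queries.is_dir() or not any(queries.glob("*.test")):
        missing.append(queries)
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "workloads"
        (root / "tpch" / "queries").mkdir(parents=True)
        for path in expected_csv_files(root, "tpch")[:-1]:  # skip exhaustive
            path.touch()
        (root / "tpch" / "queries" / "tpch-q1.test").touch()  # illustrative name
        print(missing_workload_files(root, "tpch"))
```

Running it as a script creates a throwaway directory tree that omits the exhaustive vector file, so that file is the one reported as missing.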
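The README does not describe the contents of the dimension and vector files. Assuming, purely for illustration, that a dimensions file lists one dimension per line as `name: value1,value2,...`, an exhaustive set of test vectors would be every combination of dimension values, which `itertools.product` yields directly. `parse_dimensions`, `exhaustive_vectors`, and the dimension names below are hypothetical.

```python
from itertools import product
from typing import Dict, List


def parse_dimensions(text: str) -> Dict[str, List[str]]:
    """Parse lines of the (assumed) form 'name: v1,v2,...' into a dict."""
    dims: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, values = line.partition(":")
        dims[name.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return dims


def exhaustive_vectors(dims: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Every combination of dimension values, one dict per test vector."""
    names = list(dims)
    return [dict(zip(names, combo))
            for combo in product(*(dims[name] for name in names))]


# Illustrative dimensions; real dimension names and values may differ.
dims = parse_dimensions("file_format: text,parquet\ncompression_codec: none,snap\n")
vectors = exhaustive_vectors(dims)
```

Two dimensions with two values each give four vectors. By the usual meaning of the term, a pairwise file would list a smaller set that still covers every pair of values across any two dimensions; generating that set is not shown here.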