impala

mirror of https://github.com/apache/impala.git synced 2025-12-26 14:02:53 -05:00

Tim Armstrong 94f7d12f87 IMPALA-7604: part 2: fixes for AggregationNode cardinality

* Use saturating arithmetic in Expr.getNumDistinctValues() to
  avoid overflows.
* Avoid double-adding with checkedAdd()
* Fix incorrect logic with multiple groups - each group cannot
  return more than the input rows, but with multiple groups
  it can add up to more than the input rows.
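The three fixes above can be illustrated with a small sketch. It is not Impala's planner code: the function names, the 64-bit bound, and the idea that a group's estimate is the product of its grouping-expression NDVs are assumptions made for illustration. It shows saturating arithmetic clamping at the maximum value instead of overflowing, each group being capped at the input row count, and the per-group estimates then being summed, so the total may exceed the input row count.

```python
# Hypothetical sketch, not the actual AggregationNode implementation.
LONG_MAX = 2**63 - 1  # upper bound of a signed 64-bit integer (assumed limit)


def saturating_add(a, b):
    """Add two non-negative ints, clamping at LONG_MAX instead of overflowing."""
    return min(a + b, LONG_MAX)


def saturating_multiply(a, b):
    """Multiply two non-negative ints, clamping at LONG_MAX instead of overflowing."""
    return min(a * b, LONG_MAX)


def estimate_agg_cardinality(input_rows, groups_ndvs):
    """Estimate output rows for an aggregation with several grouping sets.

    groups_ndvs holds one list of per-expression distinct-value counts per
    group (assumed input shape). Each group is capped at input_rows on its
    own; the sum over groups is allowed to exceed input_rows.
    """
    total = 0
    for ndvs in groups_ndvs:
        group_rows = 1
        for ndv in ndvs:
            group_rows = saturating_multiply(group_rows, ndv)
        # A single group cannot return more rows than it reads.
        group_rows = min(group_rows, input_rows)
        total = saturating_add(total, group_rows)
    return total
```

For example, with 100 input rows and two groups whose NDV products are 50 and 1000, the sketch returns 50 + 100 = 150, which is more than the input row count even though neither group exceeds it.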

Testing:
Updated planner tests from part 1 to reflect bugfixes.

Added targeted cardinality tests to verify behaviour
with and without stats.

Updated other planner tests that changed as a result of
this fix.

Ran exhaustive tests.

Change-Id: Ieed41d60c0e0dfeca64035e919cb8c28a054a9ab
Reviewed-on: http://gerrit.cloudera.org:8080/14132
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

2019-08-28 22:52:29 +00:00

functional-planner

IMPALA-7604: part 2: fixes for AggregationNode cardinality

2019-08-28 22:52:29 +00:00

functional-query

IMPALA-8793: Implement TRUNCATE for insert-only ACID tables

2019-08-27 18:51:55 +00:00

perf-regression

IMPALA-3311: fix string data coming out of aggs in subplans

2016-05-12 23:06:36 -07:00

targeted-perf

IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test

2018-11-13 23:25:02 +00:00

targeted-stress

IMPALA-4674: Part 2: port backend exec to BufferPool

2017-08-05 01:03:02 +00:00

tpcds

IMPALA-8207: Fix query loading for perf and stress tests

2019-02-19 22:31:17 +00:00

tpcds-insert

IMPALA-4356,IMPALA-7331: codegen all ScalarExprs

2019-05-15 22:34:28 +00:00

tpcds-unmodified

IMPALA-5946,IMPALA-5956: add TPC-DS q31,q59,q89

2018-11-02 22:11:03 +00:00

tpch

IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string

2019-02-28 20:20:14 +00:00

tpch_nested

IMPALA-6503: Support reading complex types from ORC

2019-03-08 04:39:08 +00:00

README

Move functional data loading to new framework + initial changes for workload directory structure

2014-01-08 10:44:18 -08:00

README

This directory contains Impala test workloads. Each workload should follow this directory layout:

workloads/
   <data set name>/<data set name>_dimensions.csv  <- The test dimension file
   <data set name>/<data set name>_core.csv  <- A test vector file
   <data set name>/<data set name>_pairwise.csv
   <data set name>/<data set name>_exhaustive.csv
   <data set name>/queries/<query test>.test <- The queries for this workload
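As a rough illustration of the layout above, the following sketch builds the expected file names for a given data set and reports which entries are missing on disk. The helper names are hypothetical and not part of the Impala test framework, and it assumes the four CSV suffixes and the queries/ directory shown above are the entries to check.

```python
import os

# Suffixes taken from the layout above: the dimension file plus the
# core, pairwise and exhaustive CSV files.
CSV_SUFFIXES = ("dimensions", "core", "pairwise", "exhaustive")


def expected_workload_files(workload):
    """Return the CSV paths, relative to workloads/, that the layout lists."""
    return [os.path.join(workload, "%s_%s.csv" % (workload, suffix))
            for suffix in CSV_SUFFIXES]


def missing_workload_entries(workloads_dir, workload):
    """Return layout entries that do not exist under workloads_dir.

    Hypothetical helper: checks the four CSV files and the queries/ directory.
    """
    missing = [path for path in expected_workload_files(workload)
               if not os.path.isfile(os.path.join(workloads_dir, path))]
    queries_dir = os.path.join(workload, "queries")
    if not os.path.isdir(os.path.join(workloads_dir, queries_dir)):
        missing.append(queries_dir)
    return missing
```

Calling `missing_workload_entries("workloads", "tpch")` would return an empty list when the tpch data set has all four CSV files and a queries/ directory.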