impala/testdata/workloads
Tim Wood f05bd241ea IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.
Main source for TPCDS query and result definitions: https://github.com/gregrahn/tpcds-kit.
TPC-DS v2.5.0 qualification queries from G. Rahn, Cloudera, Inc.
Data set constructed in mini-cluster using $IMPALA_HOME/buildall.sh -testdata....
This commit continues previous work on IMPALA-5376 in the ASF Impala repo
and the Cloudera Gerrit service.

This commit splits multi-query tests in the TPC-DS suite definition into one
query and result set per test file, as the test framework requires.  Names for
such files have -1, -2... inner suffixes.

The portion of the TPC-DS test suite in this commit passes with no failures,
as reflected by runs of
$IMPALA_HOME/tests/run-tests.py query_test/test_tpcds_queries.py ...

IMPALA-6007 addresses the TPC-DS cases that must be skipped (because we don't
support them or they flap) or marked as expected failures (xfail, because we
support them but they fail due to bugs).  These require some added tooling so
that non-Pytest frameworks like the stress test avoid attempting them until
they work.  Tests that flap are marked to skip, with a bug ID, since they
neither pass nor xfail reliably.

Expected result sets come from the TPC-DS kit.  Some TPC-DS test cases
in this commit have been modified in semantically neutral ways so as to pass
on Impala.

The tests/query_test/test_tpcds_queries.py driver file is authoritative for the
active/skip/xfail status of each case and a brief reason.  The following list
gives the current status of each affected case in this format:
--- test-name
deviation from TPC-DS spec
changes made

--- tpcds-q22a.test
RESULT MISMATCH in LSD (least significant digit) of AVG() values
FIXED, HAND_ROUNDED AVG() VALUES IN RESULT SET
--- tpcds-q26.test
RESULT MISMATCH in LSD of AVG() values
ABSENT, IMPALA-6087
--- tpcds-q28.test
RESULT MISMATCH in LSD of AVG() values
ABSENT, IMPALA-6087
--- tpcds-q30.test
UNRECOGNIZED CHARACTER
ABSENT, IMPALA-5961.
--- tpcds-q31.test
RESULT MISMATCH in LSD of DECIMAL values
ABSENT, IMPALA-5956.
--- tpcds-q35a.test
RESULT MISMATCH
ABSENT, IMPALA-5950.
--- tpcds-q36a.test
RESULT MISMATCH
ABSENT, IMPALA-4741
--- tpcds-q47.test
RESULT MISMATCH in LSD of DECIMAL values
ABSENT, IMPALA-6087
--- tpcds-q48.test
RESULT MISMATCH in scalar value
ABSENT, IMPALA-5950.
--- tpcds-q49.test
RESULT MISMATCH in LSD of DECIMAL values
ABSENT, IMPALA-5945
--- tpcds-q57.test
RESULT MISMATCH, excess scale in DECIMAL values
ABSENT, IMPALA-6087
--- tpcds-q58.test
RESULT MISMATCH in DECIMAL values
ABSENT, IMPALA-5946
--- tpcds-q59.test
RESULT MISMATCH, excess scale in DECIMAL values
ABSENT, IMPALA-6087
--- tpcds-q61.test
RESULT MISMATCH in DECIMAL value
FIXED. CAST RESULT QUOTIENT TO DECIMAL(15, 4), TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q63.test
RESULT MISMATCH, excess scale in DECIMAL values
ABSENT, IMPALA-6087
--- tpcds-q64.test
RESULT MISMATCH
ADDED ORDER BY COLUMNS.
--- tpcds-q66.test
RESULT MISMATCH
ABSENT, IMPALA-4741
--- tpcds-q77a.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q78.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q83.test
RESULT MISMATCH
ABSENT, IMPALA-5945.
--- tpcds-q85.test
MISSING TABLE "reason"
ABSENT, IMPALA-5960
--- tpcds-q86a.test
RESULT MISMATCH
FIXED. TAKE ACTUAL RESULT AS EXPECTED
--- tpcds-q89.test
RESULT MISMATCH, DECIMAL values flap
ABSENT, ADDED ROUND(2) TO 8th COLUMN, TAKE ACTUAL RESULTS AS EXPECTED, IMPALA-5956.
--- tpcds-q90.test
RESULT MISMATCH
ABSENT, IMPALA-5945.
--- tpcds-q93.test
MISSING TABLE "reason"
ABSENT, IMPALA-5960
--- tpcds-q98.test
RESULT MISMATCH
FIXED, ADDED ROUND() TO LAST COLUMN
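
The three-line entry format used in the list above (a "--- test-name" line, a deviation line, then a changes line) is regular enough to read mechanically. The following is a minimal sketch, not part of the commit: parse_status_list is a hypothetical helper name, and it assumes it is given only the list entries themselves, without the format legend or the rest of the message.

```python
def parse_status_list(text):
    """Parse '--- test-name' / deviation / changes triples into dicts.

    Hypothetical helper for illustration; it is not part of the Impala
    test framework.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("--- "):
            # A new entry starts at each '--- <test file name>' line.
            entries.append({"test": line[4:], "details": []})
        elif entries:
            # Lines after the name belong to the most recent entry.
            entries[-1]["details"].append(line)
    return [
        {
            "test": e["test"],
            "deviation": e["details"][0] if e["details"] else "",
            "changes": " ".join(e["details"][1:]),
        }
        for e in entries
    ]


def absent_tests(parsed):
    """Return names of tests whose 'changes' line begins with ABSENT."""
    return [e["test"] for e in parsed if e["changes"].startswith("ABSENT")]
```

Feeding the list to parse_status_list and then calling absent_tests gives the cases that are still tracked by an open bug ID rather than fixed in this commit. The driver file remains the authoritative source for status.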

Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66
Reviewed-on: http://gerrit.cloudera.org:8080/8102
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
2017-10-23 19:32:10 +00:00

This directory contains Impala test workloads. Each workload should follow this directory layout:

workloads/
   <data set name>/<data set name>_dimensions.csv  <- The test dimension file
   <data set name>/<data set name>_core.csv  <- A test vector file
   <data set name>/<data set name>_pairwise.csv
   <data set name>/<data set name>_exhaustive.csv
   <data set name>/queries/<query test>.test <- The queries for this workload
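
The layout above can be checked mechanically before a new workload is added. The following is a minimal sketch under stated assumptions: missing_workload_files is a hypothetical helper, not part of the Impala test framework, and it only checks that the files named in the layout exist, not that their contents are valid.

```python
from pathlib import Path

# File name suffixes expected for each data set, taken from the layout above.
EXPECTED_SUFFIXES = ("dimensions", "core", "pairwise", "exhaustive")


def missing_workload_files(workloads_dir, data_set):
    """Return the relative paths the layout expects but that are absent.

    Hypothetical helper for illustration only.
    """
    root = Path(workloads_dir) / data_set
    missing = [
        f"{data_set}/{data_set}_{suffix}.csv"
        for suffix in EXPECTED_SUFFIXES
        if not (root / f"{data_set}_{suffix}.csv").is_file()
    ]
    queries = root / "queries"
    # At least one .test file is expected under queries/.
    if not queries.is_dir() or not any(queries.glob("*.test")):
        missing.append(f"{data_set}/queries/<query test>.test")
    return missing
```

For example, missing_workload_files("testdata/workloads", "tpcds") returns an empty list when every file in the layout is present for the tpcds data set; the directory path here is an assumption about where the script is run from.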