mirror of
https://github.com/apache/impala.git
synced 2025-12-30 03:01:44 -05:00
Fix determinism in TPCH Q3, Q10, Q18 by adding another column to the
queries' ORDER BY to guarantee deterministic results.

With TPCH 10000 these queries were producing differing results across
stress test runs. Every result set was valid, but the LIMIT without a
more specific ORDER BY meant that several different result sets were
possible. With the added columns, the queries sort by a key that is
unique across all rows returned.

Testing:
- Repeated runs of these specific TPCH queries via:
  impala-py.test -k Q18 tests/query_test/test_tpch_queries.py
- Stress test on a 140-node cluster with TPCH 10000 loaded. Previously,
  the stress test would wrongly flag these queries as returning
  incorrect results.

Change-Id: If74d127fb57546b1948a34aa6d2e68cdc6880fae
Reviewed-on: http://gerrit.cloudera.org:8080/10351
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
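The ORDER BY/LIMIT ambiguity described in the commit message can be reproduced outside Impala. The following is a minimal sketch in Python, not the TPCH queries themselves: the rows, the column names (`order_key`, `revenue`) and the helper `top_n` are hypothetical. It emulates rows arriving in every possible order and shows that a LIMIT over a non-unique sort key admits several valid answers, while adding a unique tie-breaker column leaves exactly one.

```python
from itertools import permutations

# Hypothetical rows: (order_key, revenue). Three rows tie on revenue.
ROWS = [(1, 100.0), (2, 250.0), (3, 250.0), (4, 250.0), (5, 50.0)]


def top_n(rows, n, key):
    """Emulate ORDER BY <key> LIMIT n.

    sorted() is stable, so tied rows keep their arrival order, which
    stands in for whatever order a distributed scan happens to produce.
    """
    return tuple(sorted(rows, key=key)[:n])


# ORDER BY revenue DESC LIMIT 2: ties are broken by arrival order.
ambiguous = {top_n(p, 2, key=lambda r: -r[1]) for p in permutations(ROWS)}

# ORDER BY revenue DESC, order_key LIMIT 2: the sort key is now unique.
stable = {top_n(p, 2, key=lambda r: (-r[1], r[0])) for p in permutations(ROWS)}

print(len(ambiguous), "distinct results without a tie-breaker")
print(len(stable), "distinct result with a tie-breaker")
```

Every result in `ambiguous` is a correct answer to the first query, which is why a test that compares against a single expected result set can report a false failure.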
This directory contains Impala test workloads. The directory layout for the
workloads should follow:

workloads/
   <data set name>/<data set name>_dimensions.csv  <- The test dimension file
   <data set name>/<data set name>_core.csv        <- A test vector file
   <data set name>/<data set name>_pairwise.csv
   <data set name>/<data set name>_exhaustive.csv
   <data set name>/queries/<query test>.test       <- The queries for this workload
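The layout above can be checked mechanically. The following is a minimal sketch, not part of the Impala test framework: the function name `missing_workload_files` is hypothetical, and it assumes that all four CSV files and at least one `.test` file are required for every data set, which the layout description suggests but does not state outright.

```python
from pathlib import Path

# File suffixes taken from the layout description.
CSV_SUFFIXES = ("dimensions", "core", "pairwise", "exhaustive")


def missing_workload_files(workloads_dir, data_set):
    """Return the expected paths (relative to workloads_dir) that are absent."""
    base = Path(workloads_dir)
    root = base / data_set
    missing = []
    for suffix in CSV_SUFFIXES:
        path = root / f"{data_set}_{suffix}.csv"
        if not path.is_file():
            missing.append(path.relative_to(base).as_posix())
    # Assumption: a workload needs at least one <query test>.test file.
    queries = root / "queries"
    if not (queries.is_dir() and any(queries.glob("*.test"))):
        missing.append(f"{data_set}/queries/*.test")
    return missing
```

For example, `missing_workload_files("workloads", "tpch")` returns an empty list when the `tpch` data set follows the layout, and otherwise lists what is missing.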