https://github.com/apache/impala.git
Problems with perf queries (run-workload.py):
- TPCH picks up stress-test-specific queries (TPCH-AGG1/2/3).
- TPCDS picks up queries that were intended only to validate that data
  was loaded properly and are not interesting from a perf perspective
  (TPCDS-COUNT-<table>).
- TPCDS picks up both decimal_v1 and decimal_v2 queries. This is mostly
  harmless, because only one of two queries with matching names gets
  run, but queries whose names differ between the two versions are run
  twice (TPCDS-Q39-1/2 vs. TPCDS-Q39.1/2).

Problems with stress queries (concurrent_select.py):
- TPCDS fails to pick up Q22A because it does not use the decimal_v2
  queries, even though decimal_v2 is now the default.

These problems were made worse by the fact that the two scripts had
different code paths for selecting queries, so changes made to one path
in the past were not always made to the other. This patch merges the
two paths to reduce code duplication and prevent these sorts of issues
in the future, and fixes the issues listed above.

One complication is that the stress test has historically used query
names of the form 'q1', whereas the perf test has used names of the
form 'TPCH-Q1'. This patch standardizes on the 'TPCH-Q1' form.

Testing:
- Added a test that checks that the perf tests pick up the expected
  number of queries.
- Manually ran the scripts and verified that the correct queries are
  selected.

Change-Id: Id1966d6ca8babdda07d47e089b75ba06d0318c0d
Reviewed-on: http://gerrit.cloudera.org:8080/12503
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
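For illustration only, here is a minimal Python sketch of a single shared selection path that applies the filters described above: it drops the stress-only and data-loading queries, keeps only one decimal version, and normalizes names to the 'TPCH-Q1' form. It assumes queries arrive as (name, decimal_version, sql) tuples. The function names, the tuple layout and the exclusion regexes are hypothetical and are not taken from run-workload.py or concurrent_select.py.

```python
import re

# Hypothetical exclusion patterns for query names that should not be run
# as perf/stress queries (assumed; the real Impala code may differ).
_EXCLUDED_PATTERNS = [
    re.compile(r"^TPCH-AGG\d+$"),      # stress-test-specific TPCH queries
    re.compile(r"^TPCDS-COUNT-\w+$"),  # data-loading sanity checks
]


def normalize_query_name(workload, name):
    """Map a stress-test style name such as 'q1' to the 'TPCH-Q1' form."""
    prefix = workload.upper() + "-"
    if name.startswith(prefix):
        return name  # already in the standardized form
    return prefix + name.upper()


def select_queries(workload, queries, decimal_version="v2"):
    """Return {standardized_name: sql} for the queries that should be run.

    `queries` is an iterable of (name, decimal_version_or_None, sql)
    tuples; None marks a query that does not depend on the decimal version
    (an assumed representation, for illustration only).
    """
    selected = {}
    for name, version, sql in queries:
        # Keep only one decimal version so that queries with mismatched
        # names (e.g. Q39-1 vs. Q39.1) are not both run.
        if version is not None and version != decimal_version:
            continue
        full_name = normalize_query_name(workload, name)
        if any(p.match(full_name) for p in _EXCLUDED_PATTERNS):
            continue
        # Two inputs collapsing to the same name signals a selection bug.
        if full_name in selected:
            raise ValueError("duplicate query name: " + full_name)
        selected[full_name] = sql
    return selected
```

Because both scripts would call the same function, a new filter added here applies to the perf and stress paths at once, which is the point of merging them.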