IMPALA-9890 (Part 1): Add more TPCDS queries to Impala's test suite

This patch adds the following 12 TPCDS queries to the class of
TestTpcdsDecimalV2Query: Q26, Q30, Q31, Q47, Q48, Q57, Q58, Q59, Q63,
Q83, Q85, and Q89. All the queries except for Q31 are added to the class
of TestTpcdsQuery as well because Impala returns one fewer row than
expected for TestTpcdsQuery::test_tpcds_q31(), which requires further
investigation.

To verify whether or not the returned result set from Impala for a given
query is correct, we compare the result set with that produced by the
HiveServer2 (HS2) in Impala's mini-cluster. We could execute SQL
statements in HS2 via Beeline, HS2's command line shell, which could be
launched by the following command.

beeline -u "jdbc:hive2://localhost:11050/default"

We note that among these 12 queries, the execution of Q31, Q58, and Q83
result in the error of "Counters limit exceeded" by TEZ. To work around
this problem, for these 3 queries we have to execute the following
statement before running them to increase the default number of
counters, which is set to 120.

set tez.counters.max=1200

On the other hand, the table of 'reason' is referenced by Q85. This
table was not referenced by any TPCDS query before this patch and thus
was not created. In this regard, in this patch we also modify
tpcds_schema_template.sql to create this additional table along with its
data.

Testing:
- Verified that this patch passes the exhaustive tests in the DEBUG
  build.

Change-Id: Ib5f260e75a3803aabe9ccef271ba94036f96e5cf
Reviewed-on: http://gerrit.cloudera.org:8080/16119
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit is contained in:
Fang-Yu Rao
2020-06-26 16:11:29 -07:00
committed by Impala Public Jenkins
parent 1fbca6d43b
commit 2a48f7dd98
26 changed files with 2622 additions and 1 deletions

View File

@@ -22,7 +22,7 @@ from datetime import datetime
# changed, and the stress test loses the ability to run the full set of queries. Set
# these constants and assert that when a workload is used, all the queries we expect to
# use are there.
EXPECTED_TPCDS_QUERIES_COUNT = 72
EXPECTED_TPCDS_QUERIES_COUNT = 84
EXPECTED_TPCH_NESTED_QUERIES_COUNT = 22
EXPECTED_TPCH_QUERIES_COUNT = 22
# Add the number of stress test specific queries, i.e. in files like '*-stress-*.test'