mirror of
https://github.com/apache/impala.git
synced 2026-01-22 09:01:58 -05:00
Cardinality is vital to understanding why a plan has the form it does, yet the planner normally emits cardinality information only for the detailed levels. Unfortunately, most query profiles we see are at the standard level without this information (except in the summary table), making it hard to understand what happened. This patch adds cardinality to the standard EXPLAIN output. It also changes the displayed cardinality value to be in abbreviated "metric" form: 1.23K instead of 1234, etc. Changing the DESCRIBE output has a huge impact on PlannerTest: all the "golden" test files must change. To avoid doing this twice, this patch also includes: IMPALA-7919: Add predicates line in plan output for partition key predicates This is also the time to also include: IMPALA-8022: Add cardinality checks to PlannerTest The comparison code was changed to allow a set of validators, one of which compares cardinality to ensure it is within 5% of the expected value. This should ensure we don't change estimates unintentionally. While many planner tests are concerned with cardinality, many others are not. Testing showed that the cardinality is actually unstable within tests. For such tests, added filters to ignore cardinality. The filter is enabled by default (for backward compatibility) but disabled (to allow cardinality verification) for the critical tests. Rebasing the tests was complicated by a bug in the error-matching code, so this patch also fixes: IMPALA-8023: Fix PlannerTest to handle error lines consistently Now, the error output written to the output "save results" file matches that expected in the "golden" file -- no more handling these specially. Testing: * Added cardinality verification. * Reran all FE tests. * Rebased all PlannerTest .test files. * Adjusted the metadata/test_explain.py test to handle the changed EXPLAIN output. Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986 Reviewed-on: http://gerrit.cloudera.org:8080/12136 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
42 lines
1.4 KiB
Plaintext
42 lines
1.4 KiB
Plaintext
====
|
|
---- QUERY
|
|
# Explain a simple hash join query.
|
|
explain
|
|
select *
|
|
from tpch.lineitem join tpch.orders on l_orderkey = o_orderkey;
|
|
---- RESULTS: VERIFY_IS_EQUAL
|
|
row_regex:.*Max Per-Host Resource Reservation: Memory=[0-9.]*MB Threads=[0-9]*.*
|
|
row_regex:.*Per-Host Resource Estimates: Memory=[0-9.]*MB.*
|
|
''
|
|
'PLAN-ROOT SINK'
|
|
'|'
|
|
'04:EXCHANGE [UNPARTITIONED]'
|
|
'|'
|
|
'02:HASH JOIN [INNER JOIN, BROADCAST]'
|
|
'| hash predicates: l_orderkey = o_orderkey'
|
|
'| runtime filters: RF000 <- o_orderkey'
|
|
row_regex:.*row-size=.* cardinality=.*
|
|
'|'
|
|
'|--03:EXCHANGE [BROADCAST]'
|
|
'| |'
|
|
'| 01:SCAN HDFS [tpch.orders]'
|
|
row_regex:.*partitions=1/1 files=1 size=.*
|
|
row_regex:.*row-size=.* cardinality=.*
|
|
'|'
|
|
'00:SCAN HDFS [tpch.lineitem]'
|
|
row_regex:.*partitions=1/1 files=1 size=.*
|
|
' runtime filters: RF000 -> l_orderkey'
|
|
row_regex:.*row-size=.* cardinality=.*
|
|
====
|
|
---- QUERY
|
|
# Tests the warning about missing table stats in the explain header.
|
|
explain select count(t1.int_col), avg(t2.float_col), sum(t3.bigint_col)
|
|
from functional_avro.alltypes t1
|
|
inner join functional_parquet.alltypessmall t2 on (t1.id = t2.id)
|
|
left outer join functional_avro.alltypes t3 on (t2.id = t3.id)
|
|
where t1.month = 1 and t2.year = 2009 and t3.bool_col = false
|
|
---- RESULTS: VERIFY_IS_SUBSET
|
|
'WARNING: The following tables are missing relevant table and/or column statistics.'
|
|
'functional_avro.alltypes, functional_parquet.alltypessmall'
|
|
====
|