impala/testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
Alex Behm 9f678a7426 IMPALA-5547: Rework FK/PK join detection.
Reworks the FK/PK join detection logic to:
- more accurately recognize many-to-many joins
- avoid dim/dim joins for multi-column PKs

The new detection logic maintains our existing philosophy of generally
assuming a FK/PK join, unless there is strong evidence to the
contrary, as follows.

For each set of simple equi-join conjuncts between two tables, we
compute the joint NDV of the right-hand side columns by
multiplication, and if the joint NDV is significantly smaller than
the right-hand side row count, then we are fairly confident that the
right-hand side is not a PK. Otherwise, we assume the set of conjuncts
could represent a FK/PK relationship.
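
The check described above can be sketched roughly as follows. This is an
illustrative sketch only, not the planner's actual code: the function name
`could_be_fk_pk`, the parameter `max_delta`, and its default of 0.05 are
assumptions introduced here to give "significantly smaller" a concrete
meaning.

```python
from math import prod


def could_be_fk_pk(rhs_ndvs, rhs_row_count, max_delta=0.05):
    """Return True if the conjuncts could represent a FK/PK relationship.

    rhs_ndvs: NDVs of the right-hand side columns in the equi-join conjuncts.
    rhs_row_count: row count of the right-hand side table.
    max_delta: hypothetical tolerance; a joint NDV more than this fraction
        below the row count is treated as "significantly smaller".
    """
    # Joint NDV of the right-hand side columns, computed by multiplication.
    joint_ndv = prod(rhs_ndvs)
    # A joint NDV well below the row count means the columns cannot
    # uniquely identify rows, so the right-hand side is likely not a PK.
    return joint_ndv >= rhs_row_count * (1.0 - max_delta)
```

For example, with a right-hand side of 1,500,000 rows, a single join column
whose NDV is also 1,500,000 (a made-up figure for illustration) keeps the
FK/PK assumption, while two columns with NDVs of 10 and 20 give a joint NDV
of 200 and are rejected.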

Extends the explain plan to include the outcome of the FK/PK detection
at EXPLAIN_LEVEL > STANDARD.

Performance testing:
1. Full TPC-DS run on 10TB:
   - Q10 improved by >100x
   - Q72 improved by >25x
   - Q17,Q26,Q29 improved by 2x
   - Q64 regressed by 10x
   - Total runtime: Improved by 2x
   - Geomean: Minor improvement
   The regression of Q64 is understood and we will try to address it
   in follow-on changes. The previous plan was better by accident and
   not because of superior logic.
2. Nightly TPC-H and TPC-DS runs:
   - No perf differences

Testing:
- The existing planner tests cover the changes.
- Code/hdfs run passed.

Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Reviewed-on: http://gerrit.cloudera.org:8080/7257
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-07-03 00:04:54 +00:00


====
---- QUERY
# Explain a simple hash join query.
explain
select *
from tpch.lineitem join tpch.orders on l_orderkey = o_orderkey;
---- RESULTS: VERIFY_IS_EQUAL
'Per-Host Resource Reservation: Memory=136.00MB'
'Per-Host Resource Estimates: Memory=388.41MB'
''
'F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1'
'PLAN-ROOT SINK'
'| mem-estimate=0B mem-reservation=0B'
'|'
'04:EXCHANGE [UNPARTITIONED]'
'| mem-estimate=0B mem-reservation=0B'
'| tuple-ids=0,1 row-size=454B cardinality=5757710'
'|'
'F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3'
'02:HASH JOIN [INNER JOIN, BROADCAST]'
'| hash predicates: l_orderkey = o_orderkey'
'| fk/pk conjuncts: l_orderkey = o_orderkey'
'| runtime filters: RF000 <- o_orderkey'
'| mem-estimate=300.41MB mem-reservation=136.00MB'
'| tuple-ids=0,1 row-size=454B cardinality=5757710'
'|'
'|--03:EXCHANGE [BROADCAST]'
'| | mem-estimate=0B mem-reservation=0B'
'| | tuple-ids=1 row-size=191B cardinality=1500000'
'| |'
'| F01:PLAN FRAGMENT [RANDOM] hosts=2 instances=2'
'| 01:SCAN HDFS [tpch.orders, RANDOM]'
row_regex:.*partitions=1/1 files=1 size=.*
'| stats-rows=1500000 extrapolated-rows=disabled'
'| table stats: rows=1500000 size=162.56MB'
'| column stats: all'
'| mem-estimate=88.00MB mem-reservation=0B'
'| tuple-ids=1 row-size=191B cardinality=1500000'
'|'
'00:SCAN HDFS [tpch.lineitem, RANDOM]'
row_regex:.*partitions=1/1 files=1 size=.*
' runtime filters: RF000 -> l_orderkey'
' stats-rows=6001215 extrapolated-rows=disabled'
' table stats: rows=6001215 size=718.94MB'
' column stats: all'
' mem-estimate=88.00MB mem-reservation=0B'
' tuple-ids=0 row-size=263B cardinality=6001215'
====
---- QUERY
# Tests the warning about missing table stats in the explain header.
explain select count(t1.int_col), avg(t2.float_col), sum(t3.bigint_col)
from functional_avro.alltypes t1
inner join functional_parquet.alltypessmall t2 on (t1.id = t2.id)
left outer join functional_avro.alltypes t3 on (t2.id = t3.id)
where t1.month = 1 and t2.year = 2009 and t3.bool_col = false
---- RESULTS: VERIFY_IS_SUBSET
'Per-Host Resource Estimates: Memory=4.03GB'
'WARNING: The following tables are missing relevant table and/or column statistics.'
'functional_avro.alltypes, functional_parquet.alltypessmall'
====