mirror of
https://github.com/apache/impala.git
synced 2026-01-03 06:00:52 -05:00
Compute the minimum buffer requirement for spilling nodes and per-host estimates for the entire plan tree. This builds on top of the existing resource estimation code, which computes the sets of plan nodes that can execute concurrently. This is cleaned up so that the process of producing resource requirements is clearer. It also removes the unused VCore estimates. Fixes various bugs and other issues: * computeCosts() was not called for unpartitioned fragments, so the per-operator memory estimate was not visible. * Nested loop join was not treated as a blocking join. * The TODO comment about union was misleading * Fix the computation for mt_dop > 1 by distinguishing per-instance and per-host estimates. * Always generate an estimate instead of unpredictably returning -1/"unavailable" in many circumstances - there was little rhyme or reason to when this happened. * Remove the special "trivial plan" estimates. With the rest of the cleanup we generate estimates <= 10MB for those trivial plans through the normal code path. I left one bug (IMPALA-4862) unfixed because it is subtle, will affect estimates for many plans and will be easier to review once we have the test infra in place. Testing: Added basic planner tests for resource requirements in both the MT and non-MT cases. Re-enabled the explain_level tests, which appears to be the only coverage for many of these estimates. Removed the complex and brittle test cases and replaced with a couple of much simpler end-to-end tests. Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37 Reviewed-on: http://gerrit.cloudera.org:8080/5847 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins
39 lines
1.2 KiB
Plaintext
39 lines
1.2 KiB
Plaintext
====
|
|
---- QUERY
|
|
# Explain a simple hash join query.
|
|
explain
|
|
select *
|
|
from tpch.lineitem join tpch.orders on l_orderkey = o_orderkey;
|
|
---- RESULTS: VERIFY_IS_EQUAL
|
|
'Per-Host Resource Reservation: Memory=136.00MB'
|
|
'Per-Host Resource Estimates: Memory=388.41MB'
|
|
''
|
|
'PLAN-ROOT SINK'
|
|
'|'
|
|
'04:EXCHANGE [UNPARTITIONED]'
|
|
'|'
|
|
'02:HASH JOIN [INNER JOIN, BROADCAST]'
|
|
'| hash predicates: l_orderkey = o_orderkey'
|
|
'| runtime filters: RF000 <- o_orderkey'
|
|
'|'
|
|
'|--03:EXCHANGE [BROADCAST]'
|
|
'| |'
|
|
'| 01:SCAN HDFS [tpch.orders]'
|
|
row_regex:.*partitions=1/1 files=1 size=.*
|
|
'|'
|
|
'00:SCAN HDFS [tpch.lineitem]'
|
|
row_regex:.*partitions=1/1 files=1 size=.*
|
|
' runtime filters: RF000 -> l_orderkey'
|
|
====
|
|
---- QUERY
|
|
# Tests the warning about missing table stats in the explain header.
|
|
explain select count(t1.int_col), avg(t2.float_col), sum(t3.bigint_col)
|
|
from functional_avro.alltypes t1
|
|
inner join functional_parquet.alltypessmall t2 on (t1.id = t2.id)
|
|
left outer join functional_avro.alltypes t3 on (t2.id = t3.id)
|
|
where t1.month = 1 and t2.year = 2009 and t3.bool_col = false
|
|
---- RESULTS: VERIFY_IS_SUBSET
|
|
'WARNING: The following tables are missing relevant table and/or column statistics.'
|
|
'functional_avro.alltypes, functional_parquet.alltypessmall'
|
|
====
|