Files
impala/testdata/workloads/functional-query/queries/QueryTest/spilling.test
Tim Armstrong 0bb056e525 IMPALA-4224: execute separate join builds fragments
This enables parallel plans with the join build in a
separate fragment and fixes all of the ensuing fallout.
After this change, mt_dop plans with joins have separate
build fragments. There is still a 1:1 relationship between
join nodes and builders, so the builders are only accessed
by the join node's thread after it is handed off. This lets
us defer the work required to make PhjBuilder and NljBuilder
safe to be shared between nodes.

Planner changes:
* Combined the parallel and distributed planning code paths.
* Misc fixes to generate reasonable thrift structures in the
  query exec requests, i.e. containing the right nodes.
* Fixes to resource calculations for the separate build plans.
** Calculate separate join/build resource consumption.
** Simplified the resource estimation by calculating resource
   consumption for each fragment separately, and assuming that
   all fragments hit their peak resource consumption at the
   same time. IMPALA-9255 is the follow-on to make the resource
   estimation more accurate.

Scheduler changes:
* Various fixes to handle multiple TPlanExecInfos correctly,
  which are generated by the planner for the different cohorts.
* Add logic to colocate build fragments with parent fragments.

Runtime filter changes:
* Build sinks now produce runtime filters, which required
  planner and coordinator fixes to handle.

DataSink changes:
* Close the input plan tree before calling FlushFinal() to release
  resources. This depends on Send() not holding onto references
  to input batches, which was true except for NljBuilder. This
  invariant is documented.

Join builder changes:
* Add a common base class for PhjBuilder and NljBuilder with
  functions to handle synchronisation with the join node.
* Close plan tree earlier in FragmentInstanceState::Exec()
  so that peak resource requirements are lower.
* The NLJ always copies input batches, so that it can close
  its input tree.

JoinNode changes:
* Join node blocks waiting for build-side to be ready,
  then eventually signals that it's done, allowing the builder
  to be cleaned up.
* NLJ and PHJ nodes handle both the integrated builder and
  the external builder. There is a 1:1 relationship between
  the node and the builder, so we don't deal with thread safety
  yet.
* Buffer reservations are transferred between the builder and join
  node when running with the separate builder. This is not really
  necessary right now, since it is all single-threaded, but will
  be important for the shared broadcast.
  - The builder transfers memory for probe buffers to the join node
    at the end of each build phase.
  - At end of each probe phase, reservation needs to be handed back
    to builder (or released).

ExecSummary changes:
* The summary logic was modified to handle connecting fragments
  via join builds. The logic is an extension of what was used
  for exchanges.

Testing:
* Enable --unlock_mt_dop for end-to-end tests
* Migrate some tests to run as part of end-to-end tests instead of
  custom cluster.
* Add mt_dop dimension to various end-to-end tests to provide
  coverage of join queries, spill-to-disk and cancellation.
* Ran a single node TPC-H and TPC-DS stress test with mt_dop=0
  and mt_dop=4.

Perf:
* Ran TPC-H scale factor 30 locally with mt_dop=0. No significant
  change.

Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58
Reviewed-on: http://gerrit.cloudera.org:8080/14859
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-20 01:51:54 +00:00

311 lines
14 KiB
Plaintext

====
---- QUERY
set buffer_pool_limit=215m;
select count(l1.l_tax)
from
lineitem l1,
lineitem l2,
lineitem l3
where
l1.l_tax < 0.01 and
l2.l_tax < 0.04 and
l1.l_orderkey = l2.l_orderkey and
l1.l_orderkey = l3.l_orderkey and
l1.l_comment = l3.l_comment and
l1.l_shipdate = l3.l_shipdate
---- RESULTS
1846743
---- TYPES
BIGINT
---- RUNTIME_PROFILE
# Verify that at least one of the joins was spilled.
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
====
---- QUERY
set buffer_pool_limit=50m;
select max(t1.total_count), max(t1.l_shipinstruct), max(t1.l_comment) from
(select l_shipinstruct, l_comment, count(*) over () total_count from lineitem) t1
---- RESULTS
6001215,'TAKE BACK RETURN','zzle? slyly final platelets sleep quickly. '
---- TYPES
BIGINT, STRING, STRING
---- RUNTIME_PROFILE
# Verify that the analytic spilled
row_regex: .*PeakUnpinnedBytes: [1-9][0-9]*.*
====
---- QUERY
# Run this query with very low memory, but enough not to spill.
set buffer_pool_limit=20m;
select a.int_col, count(*)
from functional.alltypessmall a, functional.alltypessmall b, functional.alltypessmall c
where a.id = b.id and b.id = c.id group by a.int_col
---- RESULTS
0,12
1,12
2,12
3,12
4,12
5,8
6,8
7,8
8,8
9,8
---- TYPES
INT, BIGINT
---- RUNTIME_PROFILE
# This query is not meant to spill.
row_regex: .*SpilledPartitions: 0 .*
====
---- QUERY: TPCH-Q21
# Adding TPCH-Q21 in the spilling test to check for IMPALA-1471 (spilling left anti
# and left outer joins were returning wrong results).
# Q21 - Suppliers Who Kept Orders Waiting Query
set buffer_pool_limit=100m;
select
s_name,
count(*) as numwait
from
supplier,
lineitem l1 join [BROADCAST]
orders,
nation
where
s_suppkey = l1.l_suppkey
and o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and exists (
select
*
from
lineitem l2
where
l2.l_orderkey = l1.l_orderkey
and l2.l_suppkey <> l1.l_suppkey
)
and not exists (
select
*
from
lineitem l3
where
l3.l_orderkey = l1.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
and s_nationkey = n_nationkey
and n_name = 'SAUDI ARABIA'
group by
s_name
order by
numwait desc,
s_name
limit 100
---- RESULTS
'Supplier#000002829',20
'Supplier#000005808',18
'Supplier#000000262',17
'Supplier#000000496',17
'Supplier#000002160',17
'Supplier#000002301',17
'Supplier#000002540',17
'Supplier#000003063',17
'Supplier#000005178',17
'Supplier#000008331',17
'Supplier#000002005',16
'Supplier#000002095',16
'Supplier#000005799',16
'Supplier#000005842',16
'Supplier#000006450',16
'Supplier#000006939',16
'Supplier#000009200',16
'Supplier#000009727',16
'Supplier#000000486',15
'Supplier#000000565',15
'Supplier#000001046',15
'Supplier#000001047',15
'Supplier#000001161',15
'Supplier#000001336',15
'Supplier#000001435',15
'Supplier#000003075',15
'Supplier#000003335',15
'Supplier#000005649',15
'Supplier#000006027',15
'Supplier#000006795',15
'Supplier#000006800',15
'Supplier#000006824',15
'Supplier#000007131',15
'Supplier#000007382',15
'Supplier#000008913',15
'Supplier#000009787',15
'Supplier#000000633',14
'Supplier#000001960',14
'Supplier#000002323',14
'Supplier#000002490',14
'Supplier#000002993',14
'Supplier#000003101',14
'Supplier#000004489',14
'Supplier#000005435',14
'Supplier#000005583',14
'Supplier#000005774',14
'Supplier#000007579',14
'Supplier#000008180',14
'Supplier#000008695',14
'Supplier#000009224',14
'Supplier#000000357',13
'Supplier#000000436',13
'Supplier#000000610',13
'Supplier#000000788',13
'Supplier#000000889',13
'Supplier#000001062',13
'Supplier#000001498',13
'Supplier#000002056',13
'Supplier#000002312',13
'Supplier#000002344',13
'Supplier#000002596',13
'Supplier#000002615',13
'Supplier#000002978',13
'Supplier#000003048',13
'Supplier#000003234',13
'Supplier#000003727',13
'Supplier#000003806',13
'Supplier#000004472',13
'Supplier#000005236',13
'Supplier#000005906',13
'Supplier#000006241',13
'Supplier#000006326',13
'Supplier#000006384',13
'Supplier#000006394',13
'Supplier#000006624',13
'Supplier#000006629',13
'Supplier#000006682',13
'Supplier#000006737',13
'Supplier#000006825',13
'Supplier#000007021',13
'Supplier#000007417',13
'Supplier#000007497',13
'Supplier#000007602',13
'Supplier#000008134',13
'Supplier#000008234',13
'Supplier#000009435',13
'Supplier#000009436',13
'Supplier#000009564',13
'Supplier#000009896',13
'Supplier#000000379',12
'Supplier#000000673',12
'Supplier#000000762',12
'Supplier#000000811',12
'Supplier#000000821',12
'Supplier#000001337',12
'Supplier#000001916',12
'Supplier#000001925',12
'Supplier#000002039',12
'Supplier#000002357',12
'Supplier#000002483',12
---- TYPES
string, bigint
---- RUNTIME_PROFILE
# Verify that at least one of the joins was spilled.
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
====
---- QUERY
# IMPALA-1346/IMPALA-1546: fix sorter memory management so that it can complete
# successfully when in same pipeline as a spilling join.
set buffer_pool_limit=170m;
set disable_outermost_topn=1;
select * from lineitem
inner join orders on l_orderkey = o_orderkey
order by l_linenumber, l_suppkey, l_partkey, l_orderkey
limit 20
---- RESULTS
4567296,2500,1,1,48.00,67320.00,0.06,0.05,'N','O','1997-05-15','1997-05-19','1997-05-27','DELIVER IN PERSON','REG AIR','ccounts cajole quickly ',4567296,100420,'O',113399.69,'1997-02-21','1-URGENT','Clerk#000000779',0,'ously ironic instructions. pa'
5427587,2500,1,1,40.00,56100.00,0.05,0.07,'N','O','1997-04-01','1997-05-22','1997-04-29','TAKE BACK RETURN','RAIL','oxes wake even theodolites: bold requests',5427587,110356,'O',182983.43,'1997-03-10','1-URGENT','Clerk#000000279',0,'osits wake along the ca'
3834597,7500,1,1,3.00,4222.50,0.03,0.01,'N','O','1998-07-03','1998-05-12','1998-07-04','COLLECT COD','AIR','sly final instructions boost about',3834597,29839,'O',147632.36,'1998-03-23','1-URGENT','Clerk#000000284',0,'le carefully blithel'
4888512,10000,1,1,14.00,12740.00,0.03,0.01,'A','F','1993-02-07','1992-12-29','1993-02-19','NONE','SHIP','al braids. unusual, silent sentiments c',4888512,93742,'F',12481.37,'1992-11-21','4-NOT SPECIFIED','Clerk#000000591',0,' requests hinder blithely. closely ironic theodolites cajole among the car'
693345,12497,1,1,14.00,19732.86,0.03,0.07,'N','O','1998-10-12','1998-08-18','1998-10-23','NONE','FOB','. blithely',693345,49472,'O',111825.01,'1998-07-06','4-NOT SPECIFIED','Clerk#000000132',0,'es wake regularly furiously pending orbits. quickly even requests according t'
710784,12497,1,1,11.00,15504.39,0.02,0.07,'R','F','1992-08-21','1992-08-11','1992-09-05','TAKE BACK RETURN','SHIP','haggle furiously special accounts? final th',710784,82009,'F',91959.19,'1992-06-20','1-URGENT','Clerk#000000001',0,'rve quickly after the express theodolites. furiou'
1240672,12497,1,1,41.00,57789.09,0.08,0.05,'N','O','1998-09-18','1998-08-04','1998-09-24','NONE','RAIL','o the furiously unusual requests sleep alo',1240672,59153,'O',55824.25,'1998-05-28','1-URGENT','Clerk#000000657',0,'ular pinto beans are above the furiously regular accounts: furiously even foxe'
2024230,12497,1,1,16.00,22551.84,0.05,0.00,'A','F','1992-04-10','1992-04-14','1992-04-13','NONE','RAIL','pecial theodolites wake slyly. care',2024230,145955,'F',186715.47,'1992-02-22','5-LOW','Clerk#000000904',0,'uses. accounts are furiously. fluffily regular ideas haggle b'
3039715,12497,1,1,14.00,19732.86,0.01,0.02,'R','F','1993-03-24','1993-02-12','1993-03-28','TAKE BACK RETURN','AIR','onic requests. furiously spe',3039715,27274,'F',80817.94,'1992-11-23','2-HIGH','Clerk#000000766',0,'lently regular packages believe slyly around the regular, regula'
3105635,12497,1,1,16.00,22551.84,0.03,0.07,'R','F','1993-07-30','1993-08-28','1993-08-19','TAKE BACK RETURN','SHIP','nal, ironic theodolites solve carefully',3105635,23665,'F',256886.23,'1993-06-22','5-LOW','Clerk#000000029',0,'otes-- blithely special accounts cajole slyly furiously silent dugout'
3764485,12497,1,1,22.00,31008.78,0.05,0.08,'R','F','1993-02-07','1993-03-18','1993-02-10','COLLECT COD','AIR',' pinto beans nag',3764485,19375,'F',107449.65,'1992-12-26','5-LOW','Clerk#000000364',0,' carefully regular foxes bo'
4767393,12497,1,1,3.00,4228.47,0.09,0.06,'A','F','1992-09-15','1992-10-02','1992-09-17','DELIVER IN PERSON','FOB',' asymptotes. blithely',4767393,22522,'F',163054.32,'1992-08-10','2-HIGH','Clerk#000000174',0,'unusual foxes after the furiously regular multipliers detect ca'
1955428,14998,1,1,47.00,89910.53,0.07,0.02,'R','F','1993-01-25','1993-04-02','1993-01-31','DELIVER IN PERSON','MAIL','thely regular frays',1955428,41801,'F',134123.84,'1993-01-08','4-NOT SPECIFIED','Clerk#000000422',0,'ld deposits are slyly. quickly bold ideas bo'
4197636,14998,1,1,6.00,11477.94,0.03,0.01,'A','F','1993-11-28','1993-11-27','1993-12-03','TAKE BACK RETURN','SHIP','ly. slyly final ex',4197636,77494,'F',283248.52,'1993-10-18','1-URGENT','Clerk#000000087',0,'final asymptotes? packages doze against the ironic packages. ironic, bold dec'
4899558,17499,1,1,12.00,16997.88,0.00,0.08,'A','F','1992-08-06','1992-08-09','1992-08-14','COLLECT COD','RAIL','ourts would sleep fluffily express accou',4899558,111973,'F',300951.70,'1992-05-21','1-URGENT','Clerk#000000826',0,'eodolites wake ironic sentiments. r'
4725760,20000,1,1,48.00,44160.00,0.02,0.05,'R','F','1992-03-06','1992-03-04','1992-03-20','NONE','SHIP','liers. slyly regular request',4725760,45176,'F',202199.68,'1992-01-31','4-NOT SPECIFIED','Clerk#000000853',0,'riously regular accounts. ironic, bold requests was slyly; slyly regula'
49444,22494,1,1,22.00,31162.78,0.06,0.08,'N','O','1997-03-03','1997-04-19','1997-03-25','TAKE BACK RETURN','MAIL','l requests among the blithely fin',49444,11725,'O',273794.80,'1997-01-23','2-HIGH','Clerk#000000814',0,'s. quickly bold packages integrate. furio'
396839,24996,1,1,9.00,17288.91,0.03,0.07,'N','O','1996-06-19','1996-04-27','1996-07-13','COLLECT COD','MAIL',' accounts ',396839,63389,'O',193790.34,'1996-03-11','5-LOW','Clerk#000000436',0,'eans. instructions are quickly--'
1109894,24996,1,1,22.00,42261.78,0.01,0.05,'A','F','1995-01-10','1995-03-01','1995-01-21','TAKE BACK RETURN','RAIL','nusual requests wake. qu',1109894,35489,'F',78070.57,'1994-12-23','2-HIGH','Clerk#000000524',0,' bold deposits engage fluffily among the s'
5825859,24996,1,1,5.00,9604.95,0.02,0.00,'A','F','1995-05-05','1995-03-17','1995-05-21','TAKE BACK RETURN','MAIL','y special a',5825859,13285,'F',90407.81,'1995-01-15','3-MEDIUM','Clerk#000000818',0,'ajole. quickly ironic theodolites '
---- TYPES
BIGINT,BIGINT,BIGINT,INT,DECIMAL,DECIMAL,DECIMAL,DECIMAL,STRING,STRING,STRING,STRING,STRING,STRING,STRING,STRING,BIGINT,BIGINT,STRING,DECIMAL,STRING,STRING,STRING,INT,STRING
---- RUNTIME_PROFILE
# Verify that the sort and join actually spilled
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
row_regex: .*TotalMergesPerformed: .* \([1-9][0-9]*\)
====
---- QUERY
# IMPALA-5173: spilling hash join feeding into right side of nested loop join.
# Equivalent to:
# select *
# from lineitem
# where l1.l_quantity = 31.0 and l1.l_tax = 0.03 and l1.l_orderkey <= 100000
# order by l_orderkey, l_partkey, l_suppkey, l_linenumber
# limit 5
set buffer_pool_limit=177m;
set num_nodes=1;
select straight_join l.*
from
(select *
from tpch_parquet.orders limit 1) o,
(select l2.*
from tpch_parquet.lineitem l1
inner join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey
and l1.l_partkey = l2.l_partkey
and l1.l_suppkey = l2.l_suppkey and l1.l_linenumber = l2.l_linenumber
where
# Include a selective post-join predicate so that the RHS of the nested loop join
# doesn't consume too much memory.
(l1.l_quantity != l2.l_quantity or l1.l_quantity = 31.0 and l1.l_tax = 0.03)
# Reduce the data size to get the test to execute quicker
and l1.l_orderkey <= 100000) l
order by l_orderkey, l_partkey, l_suppkey, l_linenumber
limit 5;
---- TYPES
bigint,bigint,bigint,int,decimal,decimal,decimal,decimal,string,string,string,string,string,string,string,string
---- RESULTS
288,50641,8157,1,31.00,49340.84,0.00,0.03,'N','O','1997-03-17','1997-04-28','1997-04-06','TAKE BACK RETURN','AIR','instructions wa'
418,18552,1054,1,31.00,45587.05,0.00,0.03,'N','F','1995-06-05','1995-06-18','1995-06-26','COLLECT COD','FOB','final theodolites. fluffil'
482,61141,6154,3,31.00,34166.34,0.04,0.03,'N','O','1996-06-01','1996-05-06','1996-06-17','NONE','MAIL',' blithe pin'
1382,156162,6163,5,31.00,37762.96,0.07,0.03,'R','F','1993-10-26','1993-10-15','1993-11-09','TAKE BACK RETURN','FOB','hely regular dependencies. f'
1509,186349,3904,6,31.00,44495.54,0.04,0.03,'A','F','1993-07-14','1993-08-21','1993-08-06','COLLECT COD','SHIP','ic deposits cajole carefully. quickly bold '
====
---- QUERY
# Test spilling aggregation when grouping by nondeterministic expression
set buffer_pool_limit=79m;
set num_nodes=1;
select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_comment
from tpch_parquet.lineitem
group by 1, 2, 3, 4, 5, random()
limit 5
---- RUNTIME_PROFILE
row_regex: .*Query State: FINISHED.*
row_regex: .*Query Status: OK.*
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
====
---- QUERY
# Test spilling join with many duplicates in join key. We don't expect this to succeed
# with a memory constraint: see IMPALA-4857. Limit size of probe so that query doesn't
# bog down executing an exploding join.
# The additional "order by" and "limit" clauses make sure that a successful
# query does not too much data to the client.
set buffer_pool_limit=167m;
select straight_join *
from lineitem l1 join lineitem l2 on l1.l_linenumber = l2.l_linenumber
where l1.l_orderkey < 100000
order by l1.l_orderkey desc, l1.l_linenumber desc limit 10
---- CATCH
Repartitioning did not reduce the size of a spilled partition
====