mirror of
https://github.com/apache/impala.git
synced 2026-01-28 09:03:52 -05:00
Planner changes:
---------------
The planner now identifies predicates that can be converted into
limits in a partitioned or unpartitioned top-n with the following
method:
* Push down predicates that reference analytic tuple into inline view.
These will be evaluated after the analytic plan for the inline
SelectStmt is generated.
* Identify predicates that reference the analytic tuple and could
be converted to limits.
* If they can be applied to the last sort group of the analytic
plan, and the windows are all compatible, then the lowest
limit gets converted into a limit in the top N.
* Otherwise generate a select node with the conjuncts. We add
logic to merge SELECT nodes to avoid generating duplicates
from inside and outside the inline view.
* The pushed predicate is still added to the SELECT node
because it is necessary for correctness for predicates
like '=' to filter additional rows and also the limit
pushdown optimization looks for analytic predicates
there, so retaining all predicates simplifies that.
The selectivity of the predicate is adjusted so that
cardinality estimates remain accurate.
The optimization can be disabled by setting
ANALYTIC_RANK_PUSHDOWN_THRESHOLD=0. By default it is
only enabled for limits of 1000 or less, because the
in-memory Top-N may perform significantly worse than
a full sort for large heaps (since updating the heap
for every input row ends up being more expensive than
doing a traditional sort). We could probably optimize
this more with better tuning so that it can gracefully
fall back to doing the full sort at runtime.
rank() and row_number() are handled. rank() needs support in
the TopN node to include ties for the last place, which is
also added in this patch.
If predicates are trivially false, we generate empty nodes.
This interacts with the limit pushdwon optimization. The limit
pushdown optimization is applied after the partitioned top-n
is generated, and can sometimes result in more optimal plans,
so it is generalized to handle pushing into partitioned top-n
nodes.
Backend changes:
---------------
The top-n node in the backend is augmented to handle
the partitioned case, for which we use a std::map and a
comparator based on the partition exprs. The partitioned
top-n node has a soft limit of 64MB on the size of the
in-memory heaps and can spill with use of an embedded Sorter.
The current implementation tries to evict heaps that are
less effective at filtering rows.
Limitations:
-----------
There are several possible extensions to this that we did not do:
* dense_rank() is not supported because it would require additional
backend support - IMPALA-10014.
* ntile() is not supported because it would require additional
backend support - IMPALA-10174.
* Only one predicate per analytic is pushed.
* Redundant rank()/row_number() predicates are not merged,
only the lowest is chosen.
* Lower bounds are not converted into OFFSET.
* The analytic operator cannot be eliminated even if the analytic
expression was only used in the predicate.
* This doesn't push predicates into UNION - IMPALA-10013
* Always false predicates don't result in empty plan - IMPALA-10015
Tests:
-----
* Planner tests - added tests that exercise the interesting code
paths added in planning.
- Predicate ordering in SELECT nodes changed in a couple of cases
because some predicates were pushed into the inline views.
* Modified SORT targeted perf tests to avoid conversion to Top-N
* Added targeted perf test for partitioned top-n.
* End-to-end tests
- Unpartitioned Top-N end-to-end tests
- Basic partitioning and duplicate handling tests on functional
- Similar basic tests on larger inputs from TPC-DS and with
larger partition counts.
- I inspected the results and also ran the same tests with
analytic_rank_pushdown_threshold=0 to confirm that the
results were the same as with the full sort.
- Fallback to spilling sort.
Perf:
-----
Added a targeted benchmark that goes from ~2s to ~1s with
mt_dop=8 on TPC-H 30 on my desktop.
Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5
Reviewed-on: http://gerrit.cloudera.org:8080/16242
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
435 lines
19 KiB
Plaintext
435 lines
19 KiB
Plaintext
====
|
|
---- QUERY
|
|
set buffer_pool_limit=215m;
|
|
select count(l1.l_tax)
|
|
from
|
|
lineitem l1,
|
|
lineitem l2,
|
|
lineitem l3
|
|
where
|
|
l1.l_tax < 0.01 and
|
|
l2.l_tax < 0.04 and
|
|
l1.l_orderkey = l2.l_orderkey and
|
|
l1.l_orderkey = l3.l_orderkey and
|
|
l1.l_comment = l3.l_comment and
|
|
l1.l_shipdate = l3.l_shipdate
|
|
---- RESULTS
|
|
1846743
|
|
---- TYPES
|
|
BIGINT
|
|
---- RUNTIME_PROFILE
|
|
# Verify that at least one of the joins was spilled.
|
|
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
|
|
====
|
|
---- QUERY
|
|
set buffer_pool_limit=50m;
|
|
select max(t1.total_count), max(t1.l_shipinstruct), max(t1.l_comment) from
|
|
(select l_shipinstruct, l_comment, count(*) over () total_count from lineitem) t1
|
|
---- RESULTS
|
|
6001215,'TAKE BACK RETURN','zzle? slyly final platelets sleep quickly. '
|
|
---- TYPES
|
|
BIGINT, STRING, STRING
|
|
---- RUNTIME_PROFILE
|
|
# Verify that the analytic spilled
|
|
row_regex: .*PeakUnpinnedBytes: [1-9][0-9]*.*
|
|
====
|
|
---- QUERY
|
|
# Run this query with very low memory, but enough not to spill.
|
|
set buffer_pool_limit=20m;
|
|
select a.int_col, count(*)
|
|
from functional.alltypessmall a, functional.alltypessmall b, functional.alltypessmall c
|
|
where a.id = b.id and b.id = c.id group by a.int_col
|
|
---- RESULTS
|
|
0,12
|
|
1,12
|
|
2,12
|
|
3,12
|
|
4,12
|
|
5,8
|
|
6,8
|
|
7,8
|
|
8,8
|
|
9,8
|
|
---- TYPES
|
|
INT, BIGINT
|
|
---- RUNTIME_PROFILE
|
|
# This query is not meant to spill.
|
|
row_regex: .*SpilledPartitions: 0 .*
|
|
====
|
|
---- QUERY: TPCH-Q21
|
|
# Adding TPCH-Q21 in the spilling test to check for IMPALA-1471 (spilling left anti
|
|
# and left outer joins were returning wrong results).
|
|
# Q21 - Suppliers Who Kept Orders Waiting Query
|
|
set buffer_pool_limit=100m;
|
|
select
|
|
s_name,
|
|
count(*) as numwait
|
|
from
|
|
supplier,
|
|
lineitem l1 join [BROADCAST]
|
|
orders,
|
|
nation
|
|
where
|
|
s_suppkey = l1.l_suppkey
|
|
and o_orderkey = l1.l_orderkey
|
|
and o_orderstatus = 'F'
|
|
and l1.l_receiptdate > l1.l_commitdate
|
|
and exists (
|
|
select
|
|
*
|
|
from
|
|
lineitem l2
|
|
where
|
|
l2.l_orderkey = l1.l_orderkey
|
|
and l2.l_suppkey <> l1.l_suppkey
|
|
)
|
|
and not exists (
|
|
select
|
|
*
|
|
from
|
|
lineitem l3
|
|
where
|
|
l3.l_orderkey = l1.l_orderkey
|
|
and l3.l_suppkey <> l1.l_suppkey
|
|
and l3.l_receiptdate > l3.l_commitdate
|
|
)
|
|
and s_nationkey = n_nationkey
|
|
and n_name = 'SAUDI ARABIA'
|
|
group by
|
|
s_name
|
|
order by
|
|
numwait desc,
|
|
s_name
|
|
limit 100
|
|
---- RESULTS
|
|
'Supplier#000002829',20
|
|
'Supplier#000005808',18
|
|
'Supplier#000000262',17
|
|
'Supplier#000000496',17
|
|
'Supplier#000002160',17
|
|
'Supplier#000002301',17
|
|
'Supplier#000002540',17
|
|
'Supplier#000003063',17
|
|
'Supplier#000005178',17
|
|
'Supplier#000008331',17
|
|
'Supplier#000002005',16
|
|
'Supplier#000002095',16
|
|
'Supplier#000005799',16
|
|
'Supplier#000005842',16
|
|
'Supplier#000006450',16
|
|
'Supplier#000006939',16
|
|
'Supplier#000009200',16
|
|
'Supplier#000009727',16
|
|
'Supplier#000000486',15
|
|
'Supplier#000000565',15
|
|
'Supplier#000001046',15
|
|
'Supplier#000001047',15
|
|
'Supplier#000001161',15
|
|
'Supplier#000001336',15
|
|
'Supplier#000001435',15
|
|
'Supplier#000003075',15
|
|
'Supplier#000003335',15
|
|
'Supplier#000005649',15
|
|
'Supplier#000006027',15
|
|
'Supplier#000006795',15
|
|
'Supplier#000006800',15
|
|
'Supplier#000006824',15
|
|
'Supplier#000007131',15
|
|
'Supplier#000007382',15
|
|
'Supplier#000008913',15
|
|
'Supplier#000009787',15
|
|
'Supplier#000000633',14
|
|
'Supplier#000001960',14
|
|
'Supplier#000002323',14
|
|
'Supplier#000002490',14
|
|
'Supplier#000002993',14
|
|
'Supplier#000003101',14
|
|
'Supplier#000004489',14
|
|
'Supplier#000005435',14
|
|
'Supplier#000005583',14
|
|
'Supplier#000005774',14
|
|
'Supplier#000007579',14
|
|
'Supplier#000008180',14
|
|
'Supplier#000008695',14
|
|
'Supplier#000009224',14
|
|
'Supplier#000000357',13
|
|
'Supplier#000000436',13
|
|
'Supplier#000000610',13
|
|
'Supplier#000000788',13
|
|
'Supplier#000000889',13
|
|
'Supplier#000001062',13
|
|
'Supplier#000001498',13
|
|
'Supplier#000002056',13
|
|
'Supplier#000002312',13
|
|
'Supplier#000002344',13
|
|
'Supplier#000002596',13
|
|
'Supplier#000002615',13
|
|
'Supplier#000002978',13
|
|
'Supplier#000003048',13
|
|
'Supplier#000003234',13
|
|
'Supplier#000003727',13
|
|
'Supplier#000003806',13
|
|
'Supplier#000004472',13
|
|
'Supplier#000005236',13
|
|
'Supplier#000005906',13
|
|
'Supplier#000006241',13
|
|
'Supplier#000006326',13
|
|
'Supplier#000006384',13
|
|
'Supplier#000006394',13
|
|
'Supplier#000006624',13
|
|
'Supplier#000006629',13
|
|
'Supplier#000006682',13
|
|
'Supplier#000006737',13
|
|
'Supplier#000006825',13
|
|
'Supplier#000007021',13
|
|
'Supplier#000007417',13
|
|
'Supplier#000007497',13
|
|
'Supplier#000007602',13
|
|
'Supplier#000008134',13
|
|
'Supplier#000008234',13
|
|
'Supplier#000009435',13
|
|
'Supplier#000009436',13
|
|
'Supplier#000009564',13
|
|
'Supplier#000009896',13
|
|
'Supplier#000000379',12
|
|
'Supplier#000000673',12
|
|
'Supplier#000000762',12
|
|
'Supplier#000000811',12
|
|
'Supplier#000000821',12
|
|
'Supplier#000001337',12
|
|
'Supplier#000001916',12
|
|
'Supplier#000001925',12
|
|
'Supplier#000002039',12
|
|
'Supplier#000002357',12
|
|
'Supplier#000002483',12
|
|
---- TYPES
|
|
string, bigint
|
|
---- RUNTIME_PROFILE
|
|
# Verify that at least one of the joins was spilled.
|
|
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
|
|
====
|
|
---- QUERY
|
|
# IMPALA-1346/IMPALA-1546: fix sorter memory management so that it can complete
|
|
# successfully when in same pipeline as a spilling join.
|
|
set buffer_pool_limit=170m;
|
|
set disable_outermost_topn=1;
|
|
select * from lineitem
|
|
inner join orders on l_orderkey = o_orderkey
|
|
order by l_linenumber, l_suppkey, l_partkey, l_orderkey
|
|
limit 20
|
|
---- RESULTS
|
|
4567296,2500,1,1,48.00,67320.00,0.06,0.05,'N','O','1997-05-15','1997-05-19','1997-05-27','DELIVER IN PERSON','REG AIR','ccounts cajole quickly ',4567296,100420,'O',113399.69,'1997-02-21','1-URGENT','Clerk#000000779',0,'ously ironic instructions. pa'
|
|
5427587,2500,1,1,40.00,56100.00,0.05,0.07,'N','O','1997-04-01','1997-05-22','1997-04-29','TAKE BACK RETURN','RAIL','oxes wake even theodolites: bold requests',5427587,110356,'O',182983.43,'1997-03-10','1-URGENT','Clerk#000000279',0,'osits wake along the ca'
|
|
3834597,7500,1,1,3.00,4222.50,0.03,0.01,'N','O','1998-07-03','1998-05-12','1998-07-04','COLLECT COD','AIR','sly final instructions boost about',3834597,29839,'O',147632.36,'1998-03-23','1-URGENT','Clerk#000000284',0,'le carefully blithel'
|
|
4888512,10000,1,1,14.00,12740.00,0.03,0.01,'A','F','1993-02-07','1992-12-29','1993-02-19','NONE','SHIP','al braids. unusual, silent sentiments c',4888512,93742,'F',12481.37,'1992-11-21','4-NOT SPECIFIED','Clerk#000000591',0,' requests hinder blithely. closely ironic theodolites cajole among the car'
|
|
693345,12497,1,1,14.00,19732.86,0.03,0.07,'N','O','1998-10-12','1998-08-18','1998-10-23','NONE','FOB','. blithely',693345,49472,'O',111825.01,'1998-07-06','4-NOT SPECIFIED','Clerk#000000132',0,'es wake regularly furiously pending orbits. quickly even requests according t'
|
|
710784,12497,1,1,11.00,15504.39,0.02,0.07,'R','F','1992-08-21','1992-08-11','1992-09-05','TAKE BACK RETURN','SHIP','haggle furiously special accounts? final th',710784,82009,'F',91959.19,'1992-06-20','1-URGENT','Clerk#000000001',0,'rve quickly after the express theodolites. furiou'
|
|
1240672,12497,1,1,41.00,57789.09,0.08,0.05,'N','O','1998-09-18','1998-08-04','1998-09-24','NONE','RAIL','o the furiously unusual requests sleep alo',1240672,59153,'O',55824.25,'1998-05-28','1-URGENT','Clerk#000000657',0,'ular pinto beans are above the furiously regular accounts: furiously even foxe'
|
|
2024230,12497,1,1,16.00,22551.84,0.05,0.00,'A','F','1992-04-10','1992-04-14','1992-04-13','NONE','RAIL','pecial theodolites wake slyly. care',2024230,145955,'F',186715.47,'1992-02-22','5-LOW','Clerk#000000904',0,'uses. accounts are furiously. fluffily regular ideas haggle b'
|
|
3039715,12497,1,1,14.00,19732.86,0.01,0.02,'R','F','1993-03-24','1993-02-12','1993-03-28','TAKE BACK RETURN','AIR','onic requests. furiously spe',3039715,27274,'F',80817.94,'1992-11-23','2-HIGH','Clerk#000000766',0,'lently regular packages believe slyly around the regular, regula'
|
|
3105635,12497,1,1,16.00,22551.84,0.03,0.07,'R','F','1993-07-30','1993-08-28','1993-08-19','TAKE BACK RETURN','SHIP','nal, ironic theodolites solve carefully',3105635,23665,'F',256886.23,'1993-06-22','5-LOW','Clerk#000000029',0,'otes-- blithely special accounts cajole slyly furiously silent dugout'
|
|
3764485,12497,1,1,22.00,31008.78,0.05,0.08,'R','F','1993-02-07','1993-03-18','1993-02-10','COLLECT COD','AIR',' pinto beans nag',3764485,19375,'F',107449.65,'1992-12-26','5-LOW','Clerk#000000364',0,' carefully regular foxes bo'
|
|
4767393,12497,1,1,3.00,4228.47,0.09,0.06,'A','F','1992-09-15','1992-10-02','1992-09-17','DELIVER IN PERSON','FOB',' asymptotes. blithely',4767393,22522,'F',163054.32,'1992-08-10','2-HIGH','Clerk#000000174',0,'unusual foxes after the furiously regular multipliers detect ca'
|
|
1955428,14998,1,1,47.00,89910.53,0.07,0.02,'R','F','1993-01-25','1993-04-02','1993-01-31','DELIVER IN PERSON','MAIL','thely regular frays',1955428,41801,'F',134123.84,'1993-01-08','4-NOT SPECIFIED','Clerk#000000422',0,'ld deposits are slyly. quickly bold ideas bo'
|
|
4197636,14998,1,1,6.00,11477.94,0.03,0.01,'A','F','1993-11-28','1993-11-27','1993-12-03','TAKE BACK RETURN','SHIP','ly. slyly final ex',4197636,77494,'F',283248.52,'1993-10-18','1-URGENT','Clerk#000000087',0,'final asymptotes? packages doze against the ironic packages. ironic, bold dec'
|
|
4899558,17499,1,1,12.00,16997.88,0.00,0.08,'A','F','1992-08-06','1992-08-09','1992-08-14','COLLECT COD','RAIL','ourts would sleep fluffily express accou',4899558,111973,'F',300951.70,'1992-05-21','1-URGENT','Clerk#000000826',0,'eodolites wake ironic sentiments. r'
|
|
4725760,20000,1,1,48.00,44160.00,0.02,0.05,'R','F','1992-03-06','1992-03-04','1992-03-20','NONE','SHIP','liers. slyly regular request',4725760,45176,'F',202199.68,'1992-01-31','4-NOT SPECIFIED','Clerk#000000853',0,'riously regular accounts. ironic, bold requests was slyly; slyly regula'
|
|
49444,22494,1,1,22.00,31162.78,0.06,0.08,'N','O','1997-03-03','1997-04-19','1997-03-25','TAKE BACK RETURN','MAIL','l requests among the blithely fin',49444,11725,'O',273794.80,'1997-01-23','2-HIGH','Clerk#000000814',0,'s. quickly bold packages integrate. furio'
|
|
396839,24996,1,1,9.00,17288.91,0.03,0.07,'N','O','1996-06-19','1996-04-27','1996-07-13','COLLECT COD','MAIL',' accounts ',396839,63389,'O',193790.34,'1996-03-11','5-LOW','Clerk#000000436',0,'eans. instructions are quickly--'
|
|
1109894,24996,1,1,22.00,42261.78,0.01,0.05,'A','F','1995-01-10','1995-03-01','1995-01-21','TAKE BACK RETURN','RAIL','nusual requests wake. qu',1109894,35489,'F',78070.57,'1994-12-23','2-HIGH','Clerk#000000524',0,' bold deposits engage fluffily among the s'
|
|
5825859,24996,1,1,5.00,9604.95,0.02,0.00,'A','F','1995-05-05','1995-03-17','1995-05-21','TAKE BACK RETURN','MAIL','y special a',5825859,13285,'F',90407.81,'1995-01-15','3-MEDIUM','Clerk#000000818',0,'ajole. quickly ironic theodolites '
|
|
---- TYPES
|
|
BIGINT,BIGINT,BIGINT,INT,DECIMAL,DECIMAL,DECIMAL,DECIMAL,STRING,STRING,STRING,STRING,STRING,STRING,STRING,STRING,BIGINT,BIGINT,STRING,DECIMAL,STRING,STRING,STRING,INT,STRING
|
|
---- RUNTIME_PROFILE
|
|
# Verify that the sort and join actually spilled
|
|
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
|
|
row_regex: .*TotalMergesPerformed: .* \([1-9][0-9]*\)
|
|
====
|
|
---- QUERY
|
|
# IMPALA-5173: spilling hash join feeding into right side of nested loop join.
|
|
# Equivalent to:
|
|
# select *
|
|
# from lineitem
|
|
# where l1.l_quantity = 31.0 and l1.l_tax = 0.03 and l1.l_orderkey <= 100000
|
|
# order by l_orderkey, l_partkey, l_suppkey, l_linenumber
|
|
# limit 5
|
|
set buffer_pool_limit=177m;
|
|
set num_nodes=1;
|
|
select straight_join l.*
|
|
from
|
|
(select *
|
|
from tpch_parquet.orders limit 1) o,
|
|
(select l2.*
|
|
from tpch_parquet.lineitem l1
|
|
inner join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey
|
|
and l1.l_partkey = l2.l_partkey
|
|
and l1.l_suppkey = l2.l_suppkey and l1.l_linenumber = l2.l_linenumber
|
|
where
|
|
# Include a selective post-join predicate so that the RHS of the nested loop join
|
|
# doesn't consume too much memory.
|
|
(l1.l_quantity != l2.l_quantity or l1.l_quantity = 31.0 and l1.l_tax = 0.03)
|
|
# Reduce the data size to get the test to execute quicker
|
|
and l1.l_orderkey <= 100000) l
|
|
order by l_orderkey, l_partkey, l_suppkey, l_linenumber
|
|
limit 5;
|
|
---- TYPES
|
|
bigint,bigint,bigint,int,decimal,decimal,decimal,decimal,string,string,string,string,string,string,string,string
|
|
---- RESULTS
|
|
288,50641,8157,1,31.00,49340.84,0.00,0.03,'N','O','1997-03-17','1997-04-28','1997-04-06','TAKE BACK RETURN','AIR','instructions wa'
|
|
418,18552,1054,1,31.00,45587.05,0.00,0.03,'N','F','1995-06-05','1995-06-18','1995-06-26','COLLECT COD','FOB','final theodolites. fluffil'
|
|
482,61141,6154,3,31.00,34166.34,0.04,0.03,'N','O','1996-06-01','1996-05-06','1996-06-17','NONE','MAIL',' blithe pin'
|
|
1382,156162,6163,5,31.00,37762.96,0.07,0.03,'R','F','1993-10-26','1993-10-15','1993-11-09','TAKE BACK RETURN','FOB','hely regular dependencies. f'
|
|
1509,186349,3904,6,31.00,44495.54,0.04,0.03,'A','F','1993-07-14','1993-08-21','1993-08-06','COLLECT COD','SHIP','ic deposits cajole carefully. quickly bold '
|
|
====
|
|
---- QUERY
|
|
# Test spilling aggregation when grouping by nondeterministic expression
|
|
set buffer_pool_limit=79m;
|
|
set num_nodes=1;
|
|
select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_comment
|
|
from tpch_parquet.lineitem
|
|
group by 1, 2, 3, 4, 5, random()
|
|
limit 5
|
|
---- RUNTIME_PROFILE
|
|
row_regex: .*Query State: FINISHED.*
|
|
row_regex: .*Query Status: OK.*
|
|
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
|
|
====
|
|
---- QUERY
|
|
# Test spilling join with many duplicates in join key. We don't expect this to succeed
|
|
# with a memory constraint: see IMPALA-4857. Limit size of probe so that query doesn't
|
|
# bog down executing an exploding join.
|
|
# The additional "order by" and "limit" clauses make sure that a successful
|
|
# query does not too much data to the client.
|
|
set buffer_pool_limit=167m;
|
|
select straight_join *
|
|
from lineitem l1 join lineitem l2 on l1.l_linenumber = l2.l_linenumber
|
|
where l1.l_orderkey < 100000
|
|
order by l1.l_orderkey desc, l1.l_linenumber desc limit 10
|
|
---- CATCH
|
|
Repartitioning did not reduce the size of a spilled partition
|
|
====
|
|
---- QUERY
|
|
# IMPALA-9725: incorrect results for spilling joins with very wide join keys.
|
|
# This test is crafted to always return 0 rows, but with the previous bug
|
|
# would sometimes return rows while spilling. The query needs to have a join
|
|
# key that is more than 256 bytes wide to trigger the bug.
|
|
set buffer_pool_limit=110m;
|
|
set runtime_filter_mode=0;
|
|
SELECT straight_join o_orderkey
|
|
FROM (
|
|
SELECT *
|
|
FROM orders
|
|
JOIN customer ON o_custkey = c_custkey
|
|
JOIN nation ON c_nationkey = n_nationkey
|
|
JOIN region ON n_regionkey = r_regionkey
|
|
WHERE o_orderkey < 500000) o1
|
|
LEFT ANTI JOIN /*+broadcast*/ (
|
|
SELECT *
|
|
FROM orders
|
|
JOIN customer ON o_custkey = c_custkey
|
|
JOIN nation ON c_nationkey = n_nationkey
|
|
JOIN region ON n_regionkey = r_regionkey
|
|
WHERE o_orderkey < 500000) o2 ON o1.o_orderkey = o2.o_orderkey
|
|
AND o1.o_custkey = o2.o_custkey
|
|
AND o1.o_orderstatus = o2.o_orderstatus
|
|
AND o1.o_totalprice = o2.o_totalprice
|
|
AND o1.o_orderdate = o2.o_orderdate
|
|
AND o1.o_orderpriority = o2.o_orderpriority
|
|
AND o1.o_clerk = o2.o_clerk
|
|
AND o1.o_shippriority = o2.o_shippriority
|
|
AND o1.o_comment = o2.o_comment
|
|
AND o1.c_custkey = o2.c_custkey
|
|
AND o1.c_name = o2.c_name
|
|
AND o1.c_address = o2.c_address
|
|
AND o1.c_nationkey = o2.c_nationkey
|
|
AND o1.c_phone = o2.c_phone
|
|
AND o1.c_acctbal = o2.c_acctbal
|
|
AND o1.c_mktsegment = o2.c_mktsegment
|
|
AND o1.n_nationkey = o2.n_nationkey
|
|
AND o1.n_name = o2.n_name
|
|
AND o1.n_regionkey = o2.n_regionkey
|
|
AND o1.n_comment = o2.n_comment
|
|
AND o1.r_name = o2.r_name
|
|
AND o1.r_comment = o2.r_comment
|
|
AND fnv_hash(o1.n_name) = fnv_hash(o2.n_name)
|
|
AND fnv_hash(o1.r_name) = fnv_hash(o2.r_name)
|
|
AND fnv_hash(o1.o_orderstatus) = fnv_hash(o2.o_orderstatus)
|
|
AND fnv_hash(o1.o_shippriority) = fnv_hash(o2.o_shippriority)
|
|
AND fnv_hash(o1.o_orderdate) = fnv_hash(o2.o_orderdate)
|
|
AND fnv_hash(o1.o_orderpriority) = fnv_hash(o2.o_orderpriority)
|
|
AND fnv_hash(o1.o_clerk) = fnv_hash(o2.o_clerk)
|
|
ORDER BY o_orderkey
|
|
---- RESULTS
|
|
---- TYPES
|
|
BIGINT
|
|
---- RUNTIME_PROFILE
|
|
# Verify that at least one of the joins was spilled.
|
|
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
|
|
====
|
|
---- QUERY
|
|
# Spilling partitioned top-N.
|
|
# Find suppliers with two or more duplicate l_extendedprice values tied for last place.
|
|
# This has a relatively high partition count because it partitions by suppliers, so
|
|
# it ends up falling back to doing a full, spilling, sort.
|
|
set buffer_pool_limit=100m;
|
|
SELECT s_name, min(l_quantity), min(l_extendedprice), min(l_discount), min(l_tax),
|
|
count(*)
|
|
FROM (SELECT s_name, lineitem.*, Rank()
|
|
OVER(PARTITION BY s_name
|
|
ORDER BY l_extendedprice) AS rank
|
|
FROM lineitem join supplier on l_suppkey = s_suppkey) a
|
|
WHERE rank < 500
|
|
group by s_name
|
|
having count(*) > 500
|
|
order by s_name
|
|
---- RESULTS
|
|
'Supplier#000001076',1.00,989.07,0.00,0.00,501
|
|
'Supplier#000001645',1.00,1048.14,0.00,0.00,501
|
|
'Supplier#000002145',1.00,1046.14,0.00,0.00,501
|
|
'Supplier#000002197',1.00,1103.19,0.00,0.00,501
|
|
'Supplier#000002269',1.00,1220.26,0.00,0.00,501
|
|
'Supplier#000002435',1.00,1365.42,0.00,0.00,501
|
|
'Supplier#000002644',1.00,1076.13,0.00,0.00,501
|
|
'Supplier#000002741',1.00,1145.24,0.00,0.00,501
|
|
'Supplier#000002827',1.00,1287.30,0.00,0.00,501
|
|
'Supplier#000002986',1.00,1397.48,0.00,0.00,501
|
|
'Supplier#000003304',1.00,1236.30,0.00,0.00,501
|
|
'Supplier#000003631',1.00,1049.12,0.00,0.00,501
|
|
'Supplier#000003646',1.00,1058.14,0.00,0.00,501
|
|
'Supplier#000003695',1.00,1104.19,0.00,0.00,501
|
|
'Supplier#000003811',1.00,1256.30,0.00,0.00,501
|
|
'Supplier#000004068',1.00,981.06,0.00,0.00,501
|
|
'Supplier#000004131',1.00,1054.13,0.00,0.00,501
|
|
'Supplier#000004371',1.00,1284.37,0.00,0.00,501
|
|
'Supplier#000004442',1.00,1395.44,0.00,0.00,501
|
|
'Supplier#000004443',1.00,1346.44,0.00,0.00,501
|
|
'Supplier#000004913',1.00,1368.39,0.00,0.00,501
|
|
'Supplier#000004924',1.00,1330.42,0.00,0.00,501
|
|
'Supplier#000004926',1.00,1426.41,0.00,0.00,501
|
|
'Supplier#000004931',1.00,1368.42,0.00,0.00,501
|
|
'Supplier#000004941',1.00,1368.43,0.00,0.00,501
|
|
'Supplier#000005166',1.00,1080.16,0.00,0.00,501
|
|
'Supplier#000005400',1.00,1299.39,0.00,0.00,501
|
|
'Supplier#000005468',1.00,1367.46,0.00,0.00,501
|
|
'Supplier#000005852',1.00,1254.35,0.00,0.00,501
|
|
'Supplier#000006397',1.00,1342.39,0.00,0.00,501
|
|
'Supplier#000007646',1.00,1045.14,0.00,0.00,501
|
|
'Supplier#000007830',1.00,1234.32,0.00,0.00,501
|
|
'Supplier#000008157',1.00,1144.15,0.00,0.00,501
|
|
'Supplier#000008432',1.00,1414.41,0.00,0.00,501
|
|
'Supplier#000008720',1.00,1143.21,0.00,0.00,501
|
|
'Supplier#000008787',1.00,1192.28,0.00,0.00,501
|
|
'Supplier#000008851',1.00,1251.35,0.00,0.00,501
|
|
'Supplier#000008859',1.00,1259.35,0.00,0.00,501
|
|
'Supplier#000009255',1.00,1163.25,0.00,0.00,501
|
|
'Supplier#000009675',1.00,1139.14,0.00,0.00,501
|
|
'Supplier#000009847',1.00,1262.34,0.00,0.00,501
|
|
'Supplier#000009883',1.00,1298.37,0.00,0.00,501
|
|
---- TYPES
|
|
STRING,DECIMAL,DECIMAL,DECIMAL,DECIMAL,BIGINT
|
|
---- RUNTIME_PROFILE
|
|
# Verify that at least one of the Top-N operators spilled.
|
|
row_regex: .*InMemoryHeapsEvicted: .* \([1-9][0-9]*\)
|
|
#row_regex: .*SpilledRuns: .* \([1-9][0-9]*\)
|
|
====
|