Files
impala/testdata/workloads/functional-query/queries/QueryTest/spilling.test
Tim Armstrong fb5dc9eb48 IMPALA-4835: switch I/O buffers to buffer pool
This is the following squashed patches that were reverted.

I will fix the known issues with some follow-on patches.

======================================================================
IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation

In preparation for switching the I/O mgr to the buffer pool, this
removes and cleans up a lot of code so that the switchover patch starts
from a cleaner slate.

* Remove the free buffer cache (which will be replaced by buffer pool's
  own caching).
* Make memory limit exceeded error checking synchronous (in anticipation
  of having to propagate buffer pool errors synchronously).
* Simplify error propagation - remove the (ineffectual) code that
  enqueued BufferDescriptors containing error statuses.
* Document locking scheme better in a few places, make it part of the
  function signature when it seemed reasonable.
* Move ReturnBuffer() to ScanRange, because it is intrinsically
  connected with the lifecycle of a scan range.
* Separate external ReturnBuffer() and internal CleanUpBuffer()
  interfaces - previously callers of ReturnBuffer() were fudging
  the num_buffers_in_reader accounting to make the external interface work.
* Eliminate redundant state in ScanRange: 'eosr_returned_' and
  'is_cancelled_'.
* Clarify the logic around calling Close() for the last
  BufferDescriptor.
  -> There appeared to be an implicit assumption that buffers would be
     freed in the order they were returned from the scan range, so that
     the "eos" buffer was returned last. Instead just count the number
     of outstanding buffers to detect the last one.
  -> Touching the is_cancelled_ field without holding a lock was hard to
     reason about - violated locking rules and it was unclear that it
     was race-free.
* Remove DiskIoMgr::Read() to simplify the interface. It is trivial to
  inline at the callsites.

This will probably regress performance somewhat because of the cache
removal, so my plan is to merge it around the same time as switching
the I/O mgr to allocate from the buffer pool. I'm keeping the patches
separate to make reviewing easier.

Testing:
* Ran exhaustive tests
* Ran the disk-io-mgr-stress-test overnight

======================================================================
IMPALA-4835: Part 2: Allocate scan range buffers upfront

This change is a step towards reserving memory for buffers from the
buffer pool and constraining per-scanner memory requirements. This
change restructures the DiskIoMgr code so that each ScanRange operates
with a fixed set of buffers that are allocated upfront and recycled as
the I/O mgr works through the ScanRange.

One major change is that ScanRanges get blocked when a buffer is not
available and get unblocked when a client returns a buffer via
ReturnBuffer(). I was able to remove the logic to maintain the
blocked_ranges_ list by instead adding a separate set with all ranges
that are active.

There is also some miscellaneous cleanup included - e.g. reducing the
amount of code devoted to maintaining counters and metrics.

One tricky part of the existing code was the it called
IssueInitialRanges() with empty lists of files and depended on
DiskIoMgr::AddScanRanges() to not check for cancellation in that case.
See IMPALA-6564/IMPALA-6588. I changed the logic to not try to issue
ranges for empty lists of files.

I plan to merge this along with the actual buffer pool switch, but
separated it out to allow review of the DiskIoMgr changes separate from
other aspects of the buffer pool switchover.

Testing:
* Ran core and exhaustive tests.

======================================================================
IMPALA-4835: Part 3: switch I/O buffers to buffer pool

This is the final patch to switch the Disk I/O manager to allocate all
buffer from the buffer pool and to reserve the buffers required for
a query upfront.

* The planner reserves enough memory to run a single scanner per
  scan node.
* The multi-threaded scan node must increase reservation before
  spinning up more threads.
* The scanner implementations must be careful to stay within their
  assigned reservation.

The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.

Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal with this, the reservation in the frontend is based on a
heuristic involving the file size and # columns. The Parquet scanner
can then divvy up reservation to columns based on the size of column
data on disk.

I adjusted how the 'mem_limit' is divided between buffer pool and non
buffer pool memory for low mem_limits to account for the increase in
buffer pool memory.

Testing:
* Added more planner tests to cover reservation calcs for scan node.
* Test scanners for all file formats with the reservation denial debug
  action, to test behaviour when the scanners hit reservation limits.
* Updated memory and buffer pool limits for tests.
* Added unit tests for dividing reservation between columns in parquet,
  since the algorithm is non-trivial.

Perf:
I ran TPC-H and targeted perf locally comparing with master. Both
showed small improvements of a few percent and no regressions of
note. Cluster perf tests showed no significant change.

Change-Id: I3ef471dc0746f0ab93b572c34024fc7343161f00
Reviewed-on: http://gerrit.cloudera.org:8080/9679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-04-28 23:41:39 +00:00

306 lines
13 KiB
Plaintext

====
---- QUERY
set buffer_pool_limit=215m;
select count(l1.l_tax)
from
lineitem l1,
lineitem l2,
lineitem l3
where
l1.l_tax < 0.01 and
l2.l_tax < 0.04 and
l1.l_orderkey = l2.l_orderkey and
l1.l_orderkey = l3.l_orderkey and
l1.l_comment = l3.l_comment and
l1.l_shipdate = l3.l_shipdate
---- RESULTS
1846743
---- TYPES
BIGINT
---- RUNTIME_PROFILE
# Verify that at least one of the joins was spilled.
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
====
---- QUERY
set buffer_pool_limit=50m;
select max(t1.total_count), max(t1.l_shipinstruct), max(t1.l_comment) from
(select l_shipinstruct, l_comment, count(*) over () total_count from lineitem) t1
---- RESULTS
6001215,'TAKE BACK RETURN','zzle? slyly final platelets sleep quickly. '
---- TYPES
BIGINT, STRING, STRING
---- RUNTIME_PROFILE
# Verify that the analytic spilled
row_regex: .*PeakUnpinnedBytes: [1-9][0-9]*.*
====
---- QUERY
# Run this query with very low memory, but enough not to spill.
set buffer_pool_limit=20m;
select a.int_col, count(*)
from functional.alltypessmall a, functional.alltypessmall b, functional.alltypessmall c
where a.id = b.id and b.id = c.id group by a.int_col
---- RESULTS
0,12
1,12
2,12
3,12
4,12
5,8
6,8
7,8
8,8
9,8
---- TYPES
INT, BIGINT
---- RUNTIME_PROFILE
# This query is not meant to spill.
row_regex: .*SpilledPartitions: 0 .*
====
---- QUERY: TPCH-Q22
# Adding TPCH-Q21 in the spilling test to check for IMPALA-1471 (spilling left anti
# and left outer joins were returning wrong results).
# Q21 - Suppliers Who Kept Orders Waiting Query
set buffer_pool_limit=200m;
select
s_name,
count(*) as numwait
from
supplier,
lineitem l1 join [BROADCAST]
orders,
nation
where
s_suppkey = l1.l_suppkey
and o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and exists (
select
*
from
lineitem l2
where
l2.l_orderkey = l1.l_orderkey
and l2.l_suppkey <> l1.l_suppkey
)
and not exists (
select
*
from
lineitem l3
where
l3.l_orderkey = l1.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
and s_nationkey = n_nationkey
and n_name = 'SAUDI ARABIA'
group by
s_name
order by
numwait desc,
s_name
limit 100
---- RESULTS
'Supplier#000002829',20
'Supplier#000005808',18
'Supplier#000000262',17
'Supplier#000000496',17
'Supplier#000002160',17
'Supplier#000002301',17
'Supplier#000002540',17
'Supplier#000003063',17
'Supplier#000005178',17
'Supplier#000008331',17
'Supplier#000002005',16
'Supplier#000002095',16
'Supplier#000005799',16
'Supplier#000005842',16
'Supplier#000006450',16
'Supplier#000006939',16
'Supplier#000009200',16
'Supplier#000009727',16
'Supplier#000000486',15
'Supplier#000000565',15
'Supplier#000001046',15
'Supplier#000001047',15
'Supplier#000001161',15
'Supplier#000001336',15
'Supplier#000001435',15
'Supplier#000003075',15
'Supplier#000003335',15
'Supplier#000005649',15
'Supplier#000006027',15
'Supplier#000006795',15
'Supplier#000006800',15
'Supplier#000006824',15
'Supplier#000007131',15
'Supplier#000007382',15
'Supplier#000008913',15
'Supplier#000009787',15
'Supplier#000000633',14
'Supplier#000001960',14
'Supplier#000002323',14
'Supplier#000002490',14
'Supplier#000002993',14
'Supplier#000003101',14
'Supplier#000004489',14
'Supplier#000005435',14
'Supplier#000005583',14
'Supplier#000005774',14
'Supplier#000007579',14
'Supplier#000008180',14
'Supplier#000008695',14
'Supplier#000009224',14
'Supplier#000000357',13
'Supplier#000000436',13
'Supplier#000000610',13
'Supplier#000000788',13
'Supplier#000000889',13
'Supplier#000001062',13
'Supplier#000001498',13
'Supplier#000002056',13
'Supplier#000002312',13
'Supplier#000002344',13
'Supplier#000002596',13
'Supplier#000002615',13
'Supplier#000002978',13
'Supplier#000003048',13
'Supplier#000003234',13
'Supplier#000003727',13
'Supplier#000003806',13
'Supplier#000004472',13
'Supplier#000005236',13
'Supplier#000005906',13
'Supplier#000006241',13
'Supplier#000006326',13
'Supplier#000006384',13
'Supplier#000006394',13
'Supplier#000006624',13
'Supplier#000006629',13
'Supplier#000006682',13
'Supplier#000006737',13
'Supplier#000006825',13
'Supplier#000007021',13
'Supplier#000007417',13
'Supplier#000007497',13
'Supplier#000007602',13
'Supplier#000008134',13
'Supplier#000008234',13
'Supplier#000009435',13
'Supplier#000009436',13
'Supplier#000009564',13
'Supplier#000009896',13
'Supplier#000000379',12
'Supplier#000000673',12
'Supplier#000000762',12
'Supplier#000000811',12
'Supplier#000000821',12
'Supplier#000001337',12
'Supplier#000001916',12
'Supplier#000001925',12
'Supplier#000002039',12
'Supplier#000002357',12
'Supplier#000002483',12
---- TYPES
string, bigint
---- RUNTIME_PROFILE
# Verify that at least one of the joins was spilled.
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
====
---- QUERY
# IMPALA-1346/IMPALA-1546: fix sorter memory management so that it can complete
# successfully when in same pipeline as a spilling join.
set buffer_pool_limit=170m;
set disable_outermost_topn=1;
select * from lineitem
inner join orders on l_orderkey = o_orderkey
order by l_linenumber, l_suppkey, l_partkey, l_orderkey
limit 20
---- RESULTS
4567296,2500,1,1,48.00,67320.00,0.06,0.05,'N','O','1997-05-15','1997-05-19','1997-05-27','DELIVER IN PERSON','REG AIR','ccounts cajole quickly ',4567296,100420,'O',113399.69,'1997-02-21','1-URGENT','Clerk#000000779',0,'ously ironic instructions. pa'
5427587,2500,1,1,40.00,56100.00,0.05,0.07,'N','O','1997-04-01','1997-05-22','1997-04-29','TAKE BACK RETURN','RAIL','oxes wake even theodolites: bold requests',5427587,110356,'O',182983.43,'1997-03-10','1-URGENT','Clerk#000000279',0,'osits wake along the ca'
3834597,7500,1,1,3.00,4222.50,0.03,0.01,'N','O','1998-07-03','1998-05-12','1998-07-04','COLLECT COD','AIR','sly final instructions boost about',3834597,29839,'O',147632.36,'1998-03-23','1-URGENT','Clerk#000000284',0,'le carefully blithel'
4888512,10000,1,1,14.00,12740.00,0.03,0.01,'A','F','1993-02-07','1992-12-29','1993-02-19','NONE','SHIP','al braids. unusual, silent sentiments c',4888512,93742,'F',12481.37,'1992-11-21','4-NOT SPECIFIED','Clerk#000000591',0,' requests hinder blithely. closely ironic theodolites cajole among the car'
693345,12497,1,1,14.00,19732.86,0.03,0.07,'N','O','1998-10-12','1998-08-18','1998-10-23','NONE','FOB','. blithely',693345,49472,'O',111825.01,'1998-07-06','4-NOT SPECIFIED','Clerk#000000132',0,'es wake regularly furiously pending orbits. quickly even requests according t'
710784,12497,1,1,11.00,15504.39,0.02,0.07,'R','F','1992-08-21','1992-08-11','1992-09-05','TAKE BACK RETURN','SHIP','haggle furiously special accounts? final th',710784,82009,'F',91959.19,'1992-06-20','1-URGENT','Clerk#000000001',0,'rve quickly after the express theodolites. furiou'
1240672,12497,1,1,41.00,57789.09,0.08,0.05,'N','O','1998-09-18','1998-08-04','1998-09-24','NONE','RAIL','o the furiously unusual requests sleep alo',1240672,59153,'O',55824.25,'1998-05-28','1-URGENT','Clerk#000000657',0,'ular pinto beans are above the furiously regular accounts: furiously even foxe'
2024230,12497,1,1,16.00,22551.84,0.05,0.00,'A','F','1992-04-10','1992-04-14','1992-04-13','NONE','RAIL','pecial theodolites wake slyly. care',2024230,145955,'F',186715.47,'1992-02-22','5-LOW','Clerk#000000904',0,'uses. accounts are furiously. fluffily regular ideas haggle b'
3039715,12497,1,1,14.00,19732.86,0.01,0.02,'R','F','1993-03-24','1993-02-12','1993-03-28','TAKE BACK RETURN','AIR','onic requests. furiously spe',3039715,27274,'F',80817.94,'1992-11-23','2-HIGH','Clerk#000000766',0,'lently regular packages believe slyly around the regular, regula'
3105635,12497,1,1,16.00,22551.84,0.03,0.07,'R','F','1993-07-30','1993-08-28','1993-08-19','TAKE BACK RETURN','SHIP','nal, ironic theodolites solve carefully',3105635,23665,'F',256886.23,'1993-06-22','5-LOW','Clerk#000000029',0,'otes-- blithely special accounts cajole slyly furiously silent dugout'
3764485,12497,1,1,22.00,31008.78,0.05,0.08,'R','F','1993-02-07','1993-03-18','1993-02-10','COLLECT COD','AIR',' pinto beans nag',3764485,19375,'F',107449.65,'1992-12-26','5-LOW','Clerk#000000364',0,' carefully regular foxes bo'
4767393,12497,1,1,3.00,4228.47,0.09,0.06,'A','F','1992-09-15','1992-10-02','1992-09-17','DELIVER IN PERSON','FOB',' asymptotes. blithely',4767393,22522,'F',163054.32,'1992-08-10','2-HIGH','Clerk#000000174',0,'unusual foxes after the furiously regular multipliers detect ca'
1955428,14998,1,1,47.00,89910.53,0.07,0.02,'R','F','1993-01-25','1993-04-02','1993-01-31','DELIVER IN PERSON','MAIL','thely regular frays',1955428,41801,'F',134123.84,'1993-01-08','4-NOT SPECIFIED','Clerk#000000422',0,'ld deposits are slyly. quickly bold ideas bo'
4197636,14998,1,1,6.00,11477.94,0.03,0.01,'A','F','1993-11-28','1993-11-27','1993-12-03','TAKE BACK RETURN','SHIP','ly. slyly final ex',4197636,77494,'F',283248.52,'1993-10-18','1-URGENT','Clerk#000000087',0,'final asymptotes? packages doze against the ironic packages. ironic, bold dec'
4899558,17499,1,1,12.00,16997.88,0.00,0.08,'A','F','1992-08-06','1992-08-09','1992-08-14','COLLECT COD','RAIL','ourts would sleep fluffily express accou',4899558,111973,'F',300951.70,'1992-05-21','1-URGENT','Clerk#000000826',0,'eodolites wake ironic sentiments. r'
4725760,20000,1,1,48.00,44160.00,0.02,0.05,'R','F','1992-03-06','1992-03-04','1992-03-20','NONE','SHIP','liers. slyly regular request',4725760,45176,'F',202199.68,'1992-01-31','4-NOT SPECIFIED','Clerk#000000853',0,'riously regular accounts. ironic, bold requests was slyly; slyly regula'
49444,22494,1,1,22.00,31162.78,0.06,0.08,'N','O','1997-03-03','1997-04-19','1997-03-25','TAKE BACK RETURN','MAIL','l requests among the blithely fin',49444,11725,'O',273794.80,'1997-01-23','2-HIGH','Clerk#000000814',0,'s. quickly bold packages integrate. furio'
396839,24996,1,1,9.00,17288.91,0.03,0.07,'N','O','1996-06-19','1996-04-27','1996-07-13','COLLECT COD','MAIL',' accounts ',396839,63389,'O',193790.34,'1996-03-11','5-LOW','Clerk#000000436',0,'eans. instructions are quickly--'
1109894,24996,1,1,22.00,42261.78,0.01,0.05,'A','F','1995-01-10','1995-03-01','1995-01-21','TAKE BACK RETURN','RAIL','nusual requests wake. qu',1109894,35489,'F',78070.57,'1994-12-23','2-HIGH','Clerk#000000524',0,' bold deposits engage fluffily among the s'
5825859,24996,1,1,5.00,9604.95,0.02,0.00,'A','F','1995-05-05','1995-03-17','1995-05-21','TAKE BACK RETURN','MAIL','y special a',5825859,13285,'F',90407.81,'1995-01-15','3-MEDIUM','Clerk#000000818',0,'ajole. quickly ironic theodolites '
---- TYPES
BIGINT,BIGINT,BIGINT,INT,DECIMAL,DECIMAL,DECIMAL,DECIMAL,STRING,STRING,STRING,STRING,STRING,STRING,STRING,STRING,BIGINT,BIGINT,STRING,DECIMAL,STRING,STRING,STRING,INT,STRING
---- RUNTIME_PROFILE
# Verify that the sort and join actually spilled
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
row_regex: .*TotalMergesPerformed: .* \([1-9][0-9]*\)
====
---- QUERY
# IMPALA-5173: spilling hash join feeding into right side of nested loop join.
# Equivalent to:
# select *
# from lineitem
# where l1.l_quantity = 31.0 and l1.l_tax = 0.03 and l1.l_orderkey <= 100000
# order by l_orderkey, l_partkey, l_suppkey, l_linenumber
# limit 5
set buffer_pool_limit=177m;
set num_nodes=1;
select straight_join l.*
from
(select *
from tpch_parquet.orders limit 1) o,
(select l2.*
from tpch_parquet.lineitem l1
inner join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey
and l1.l_partkey = l2.l_partkey
and l1.l_suppkey = l2.l_suppkey and l1.l_linenumber = l2.l_linenumber
where
# Include a selective post-join predicate so that the RHS of the nested loop join
# doesn't consume too much memory.
(l1.l_quantity != l2.l_quantity or l1.l_quantity = 31.0 and l1.l_tax = 0.03)
# Reduce the data size to get the test to execute quicker
and l1.l_orderkey <= 100000) l
order by l_orderkey, l_partkey, l_suppkey, l_linenumber
limit 5;
---- TYPES
bigint,bigint,bigint,int,decimal,decimal,decimal,decimal,string,string,string,string,string,string,string,string
---- RESULTS
288,50641,8157,1,31.00,49340.84,0.00,0.03,'N','O','1997-03-17','1997-04-28','1997-04-06','TAKE BACK RETURN','AIR','instructions wa'
418,18552,1054,1,31.00,45587.05,0.00,0.03,'N','F','1995-06-05','1995-06-18','1995-06-26','COLLECT COD','FOB','final theodolites. fluffil'
482,61141,6154,3,31.00,34166.34,0.04,0.03,'N','O','1996-06-01','1996-05-06','1996-06-17','NONE','MAIL',' blithe pin'
1382,156162,6163,5,31.00,37762.96,0.07,0.03,'R','F','1993-10-26','1993-10-15','1993-11-09','TAKE BACK RETURN','FOB','hely regular dependencies. f'
1509,186349,3904,6,31.00,44495.54,0.04,0.03,'A','F','1993-07-14','1993-08-21','1993-08-06','COLLECT COD','SHIP','ic deposits cajole carefully. quickly bold '
====
---- QUERY
# Test spilling aggregation when grouping by nondeterministic expression
set buffer_pool_limit=79m;
set num_nodes=1;
select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_comment
from tpch_parquet.lineitem
group by 1, 2, 3, 4, 5, random()
limit 5
---- RUNTIME_PROFILE
row_regex: .*Query State: FINISHED.*
row_regex: .*Query Status: OK.*
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
====
---- QUERY
# Test spilling join with many duplicates in join key. We don't expect this to succeed
# with a memory constraint: see IMPALA-4857.
set buffer_pool_limit=167m;
select *
from lineitem l1 join lineitem l2 on l1.l_linenumber = l2.l_linenumber
---- CATCH
Repartitioning did not reduce the size of a spilled partition
====