Files
impala/testdata/workloads/functional-query/queries/QueryTest/spilling.test
Zoltan Borok-Nagy ae848a6cef IMPALA-12373: Small String Optimization for StringValue
This patch implements the Small String Optimization (SSO) for
StringValue objects. This is a well-known optimization in the C++
world that is used by the majority of various string implementations
(STL string, boost string, Folly string, etc.)

The old layout of the StringValue was:
  char* ptr;  // 8 bytes
  int len;    // 4 bytes

We also add the __packed__ attribute to the StringValue class which
means there is no padding between 'ptr' and 'len', neither after 'len'.
I.e. StringValue objects take 12 bytes. This means with SSO we can use
11 bytes to store small strings. In this case the last byte is used to
store the length.

Small string layout:
  char[11] buf;
  unsigned char len;

We also need an indicator bit (which tells whether the long
representation or the small representation is active) in the last byte
that is the same bit of LONG_STRING.len and SMALL_STRING.len. On
little-endian architectures this is the most significant bit (MSB) of
both LONG_STRING.len and SMALL_STRING.len. On big endian architectures
this would be the least significant bit (LSB) of both LONG_STRING.len
and SMALL_STRING.len. Since currently impala can only be built on
little endian architectures, this patch only adds code for such
platforms. Moreover, systems that use big endian usually support little
endian as well.

This patch adds SmallableString which implements the above on an
on-demand basis. I.e. all string objects start with the long
representation, then the string object can be explicitly asked to try
smallify itself. This is because I didn't want to introduce too much
change in behavior. This way we can try smallify only at certain points
(e.g. DeepCopy()), and we can also smallify all strings in a tuple at
once. The latter means if we've done that for a tuple, subsequent
smallifications can return on the first small string that is encountered
(because we can assume that all other string slots are also smallified).

Benefits:
 * lower memory and CPU cache consumption
 * smaller serialization buffers to compress
 * less data to send over the network
 * less data to spill to disk

Measurements:
I used TPCH(30) with the following query:

  select * from lineitem a, lineitem b
  where a.l_orderkey = b.l_orderkey and
        a.l_orderkey * b.l_orderkey < 1000

The above query generates significant network traffic and does spilling.
The query selects all 16 columns out of which 6.5 columns contain small
strings. I.e. this kind of data is a good candidate for this
optimization but also not unrealistic.

This improves the following numbers:
Total query time:             5m16s    --> 3m40s
Total CPU time:               12m9s    --> 10m17s
Bytes sent over the network:  54.17 GB --> 41.76 GB
Data had to be spilled:       14.66 GB --> 9.42 GB

On the standard benchmarks I measured the followings:

TPC-H:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(42) | parquet / none / none | 3.57    | -3.06%     | 2.40       | -1.34%         |
+----------+-----------------------+---------+------------+------------+----------------+

TPC-DS:
+-----------+-----------------------+---------+------------+------------+----------------+
| Workload  | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-----------+-----------------------+---------+------------+------------+----------------+
| TPCDS(30) | parquet / none / none | 2.09    | -0.92%     | 0.76       | -1.17%         |
+-----------+-----------------------+---------+------------+------------+----------------+

Testing:
 * There was query in 'spilling.test' that used to spill, but now it
   doesn't (at least in upstream GVO, in other environments the test
   passes). I cannot lower the buffer_pool_limit under the min
   reservation, so cannot make it spill without IMPALA-12549 that will
   update the estimations and min reservations.
 * Other tests that wanted to spill were modified to work on larger
   data sets
 * Added few backend tests in string-value-test
 * Added new complex types tests that have deeply nested small/long
   strings
 * Existing tests pass

Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3
Reviewed-on: http://gerrit.cloudera.org:8080/20496
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-11-21 16:42:23 +00:00

454 lines
19 KiB
Plaintext

====
---- QUERY
set buffer_pool_limit=110m;
select count(l1.l_tax)
from
lineitem l1,
lineitem l2,
lineitem l3
where
l1.l_tax < 0.01 and
l2.l_tax < 0.04 and
l1.l_orderkey = l2.l_orderkey and
l1.l_orderkey = l3.l_orderkey and
l1.l_comment = l3.l_comment and
l1.l_shipdate = l3.l_shipdate
---- RESULTS
1846743
---- TYPES
BIGINT
---- RUNTIME_PROFILE
# Verify that at least one of the joins was spilled.
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
====
---- QUERY
set buffer_pool_limit=50m;
select max(t1.total_count), max(t1.l_shipinstruct), max(t1.l_comment) from
(select l_shipinstruct, l_comment, count(*) over () total_count from lineitem) t1
---- RESULTS
6001215,'TAKE BACK RETURN','zzle? slyly final platelets sleep quickly. '
---- TYPES
BIGINT, STRING, STRING
---- RUNTIME_PROFILE
# Verify that the analytic spilled
row_regex: .*PeakUnpinnedBytes: [1-9][0-9]*.*
====
---- QUERY
# Run this query with very low memory, but enough not to spill.
set buffer_pool_limit=20m;
select a.int_col, count(*)
from functional.alltypessmall a, functional.alltypessmall b, functional.alltypessmall c
where a.id = b.id and b.id = c.id group by a.int_col
---- RESULTS
0,12
1,12
2,12
3,12
4,12
5,8
6,8
7,8
8,8
9,8
---- TYPES
INT, BIGINT
---- RUNTIME_PROFILE
# This query is not meant to spill.
row_regex: .*SpilledPartitions: 0 .*
====
---- QUERY: TPCH-Q21
# Adding TPCH-Q21 in the spilling test to check for IMPALA-1471 (spilling left anti
# and left outer joins were returning wrong results).
# Q21 - Suppliers Who Kept Orders Waiting Query
set buffer_pool_limit=90m;
select
s_name,
count(*) as numwait
from
supplier,
lineitem l1 join [BROADCAST]
orders,
nation
where
s_suppkey = l1.l_suppkey
and o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and exists (
select
*
from
lineitem l2
where
l2.l_orderkey = l1.l_orderkey
and l2.l_suppkey <> l1.l_suppkey
)
and not exists (
select
*
from
lineitem l3
where
l3.l_orderkey = l1.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
and s_nationkey = n_nationkey
and n_name = 'SAUDI ARABIA'
group by
s_name
order by
numwait desc,
s_name
limit 100
---- RESULTS
'Supplier#000002829',20
'Supplier#000005808',18
'Supplier#000000262',17
'Supplier#000000496',17
'Supplier#000002160',17
'Supplier#000002301',17
'Supplier#000002540',17
'Supplier#000003063',17
'Supplier#000005178',17
'Supplier#000008331',17
'Supplier#000002005',16
'Supplier#000002095',16
'Supplier#000005799',16
'Supplier#000005842',16
'Supplier#000006450',16
'Supplier#000006939',16
'Supplier#000009200',16
'Supplier#000009727',16
'Supplier#000000486',15
'Supplier#000000565',15
'Supplier#000001046',15
'Supplier#000001047',15
'Supplier#000001161',15
'Supplier#000001336',15
'Supplier#000001435',15
'Supplier#000003075',15
'Supplier#000003335',15
'Supplier#000005649',15
'Supplier#000006027',15
'Supplier#000006795',15
'Supplier#000006800',15
'Supplier#000006824',15
'Supplier#000007131',15
'Supplier#000007382',15
'Supplier#000008913',15
'Supplier#000009787',15
'Supplier#000000633',14
'Supplier#000001960',14
'Supplier#000002323',14
'Supplier#000002490',14
'Supplier#000002993',14
'Supplier#000003101',14
'Supplier#000004489',14
'Supplier#000005435',14
'Supplier#000005583',14
'Supplier#000005774',14
'Supplier#000007579',14
'Supplier#000008180',14
'Supplier#000008695',14
'Supplier#000009224',14
'Supplier#000000357',13
'Supplier#000000436',13
'Supplier#000000610',13
'Supplier#000000788',13
'Supplier#000000889',13
'Supplier#000001062',13
'Supplier#000001498',13
'Supplier#000002056',13
'Supplier#000002312',13
'Supplier#000002344',13
'Supplier#000002596',13
'Supplier#000002615',13
'Supplier#000002978',13
'Supplier#000003048',13
'Supplier#000003234',13
'Supplier#000003727',13
'Supplier#000003806',13
'Supplier#000004472',13
'Supplier#000005236',13
'Supplier#000005906',13
'Supplier#000006241',13
'Supplier#000006326',13
'Supplier#000006384',13
'Supplier#000006394',13
'Supplier#000006624',13
'Supplier#000006629',13
'Supplier#000006682',13
'Supplier#000006737',13
'Supplier#000006825',13
'Supplier#000007021',13
'Supplier#000007417',13
'Supplier#000007497',13
'Supplier#000007602',13
'Supplier#000008134',13
'Supplier#000008234',13
'Supplier#000009435',13
'Supplier#000009436',13
'Supplier#000009564',13
'Supplier#000009896',13
'Supplier#000000379',12
'Supplier#000000673',12
'Supplier#000000762',12
'Supplier#000000811',12
'Supplier#000000821',12
'Supplier#000001337',12
'Supplier#000001916',12
'Supplier#000001925',12
'Supplier#000002039',12
'Supplier#000002357',12
'Supplier#000002483',12
---- TYPES
string, bigint
---- RUNTIME_PROFILE
# Verify that at least one of the joins was spilled.
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
====
---- QUERY
# IMPALA-1346/IMPALA-1546: fix sorter memory management so that it can complete
# successfully when in same pipeline as a spilling join.
set buffer_pool_limit=170m;
set disable_outermost_topn=1;
select * from lineitem
inner join orders on l_orderkey = o_orderkey
order by l_linenumber, l_suppkey, l_partkey, l_orderkey
limit 20
---- RESULTS
4567296,2500,1,1,48.00,67320.00,0.06,0.05,'N','O','1997-05-15','1997-05-19','1997-05-27','DELIVER IN PERSON','REG AIR','ccounts cajole quickly ',4567296,100420,'O',113399.69,'1997-02-21','1-URGENT','Clerk#000000779',0,'ously ironic instructions. pa'
5427587,2500,1,1,40.00,56100.00,0.05,0.07,'N','O','1997-04-01','1997-05-22','1997-04-29','TAKE BACK RETURN','RAIL','oxes wake even theodolites: bold requests',5427587,110356,'O',182983.43,'1997-03-10','1-URGENT','Clerk#000000279',0,'osits wake along the ca'
3834597,7500,1,1,3.00,4222.50,0.03,0.01,'N','O','1998-07-03','1998-05-12','1998-07-04','COLLECT COD','AIR','sly final instructions boost about',3834597,29839,'O',147632.36,'1998-03-23','1-URGENT','Clerk#000000284',0,'le carefully blithel'
4888512,10000,1,1,14.00,12740.00,0.03,0.01,'A','F','1993-02-07','1992-12-29','1993-02-19','NONE','SHIP','al braids. unusual, silent sentiments c',4888512,93742,'F',12481.37,'1992-11-21','4-NOT SPECIFIED','Clerk#000000591',0,' requests hinder blithely. closely ironic theodolites cajole among the car'
693345,12497,1,1,14.00,19732.86,0.03,0.07,'N','O','1998-10-12','1998-08-18','1998-10-23','NONE','FOB','. blithely',693345,49472,'O',111825.01,'1998-07-06','4-NOT SPECIFIED','Clerk#000000132',0,'es wake regularly furiously pending orbits. quickly even requests according t'
710784,12497,1,1,11.00,15504.39,0.02,0.07,'R','F','1992-08-21','1992-08-11','1992-09-05','TAKE BACK RETURN','SHIP','haggle furiously special accounts? final th',710784,82009,'F',91959.19,'1992-06-20','1-URGENT','Clerk#000000001',0,'rve quickly after the express theodolites. furiou'
1240672,12497,1,1,41.00,57789.09,0.08,0.05,'N','O','1998-09-18','1998-08-04','1998-09-24','NONE','RAIL','o the furiously unusual requests sleep alo',1240672,59153,'O',55824.25,'1998-05-28','1-URGENT','Clerk#000000657',0,'ular pinto beans are above the furiously regular accounts: furiously even foxe'
2024230,12497,1,1,16.00,22551.84,0.05,0.00,'A','F','1992-04-10','1992-04-14','1992-04-13','NONE','RAIL','pecial theodolites wake slyly. care',2024230,145955,'F',186715.47,'1992-02-22','5-LOW','Clerk#000000904',0,'uses. accounts are furiously. fluffily regular ideas haggle b'
3039715,12497,1,1,14.00,19732.86,0.01,0.02,'R','F','1993-03-24','1993-02-12','1993-03-28','TAKE BACK RETURN','AIR','onic requests. furiously spe',3039715,27274,'F',80817.94,'1992-11-23','2-HIGH','Clerk#000000766',0,'lently regular packages believe slyly around the regular, regula'
3105635,12497,1,1,16.00,22551.84,0.03,0.07,'R','F','1993-07-30','1993-08-28','1993-08-19','TAKE BACK RETURN','SHIP','nal, ironic theodolites solve carefully',3105635,23665,'F',256886.23,'1993-06-22','5-LOW','Clerk#000000029',0,'otes-- blithely special accounts cajole slyly furiously silent dugout'
3764485,12497,1,1,22.00,31008.78,0.05,0.08,'R','F','1993-02-07','1993-03-18','1993-02-10','COLLECT COD','AIR',' pinto beans nag',3764485,19375,'F',107449.65,'1992-12-26','5-LOW','Clerk#000000364',0,' carefully regular foxes bo'
4767393,12497,1,1,3.00,4228.47,0.09,0.06,'A','F','1992-09-15','1992-10-02','1992-09-17','DELIVER IN PERSON','FOB',' asymptotes. blithely',4767393,22522,'F',163054.32,'1992-08-10','2-HIGH','Clerk#000000174',0,'unusual foxes after the furiously regular multipliers detect ca'
1955428,14998,1,1,47.00,89910.53,0.07,0.02,'R','F','1993-01-25','1993-04-02','1993-01-31','DELIVER IN PERSON','MAIL','thely regular frays',1955428,41801,'F',134123.84,'1993-01-08','4-NOT SPECIFIED','Clerk#000000422',0,'ld deposits are slyly. quickly bold ideas bo'
4197636,14998,1,1,6.00,11477.94,0.03,0.01,'A','F','1993-11-28','1993-11-27','1993-12-03','TAKE BACK RETURN','SHIP','ly. slyly final ex',4197636,77494,'F',283248.52,'1993-10-18','1-URGENT','Clerk#000000087',0,'final asymptotes? packages doze against the ironic packages. ironic, bold dec'
4899558,17499,1,1,12.00,16997.88,0.00,0.08,'A','F','1992-08-06','1992-08-09','1992-08-14','COLLECT COD','RAIL','ourts would sleep fluffily express accou',4899558,111973,'F',300951.70,'1992-05-21','1-URGENT','Clerk#000000826',0,'eodolites wake ironic sentiments. r'
4725760,20000,1,1,48.00,44160.00,0.02,0.05,'R','F','1992-03-06','1992-03-04','1992-03-20','NONE','SHIP','liers. slyly regular request',4725760,45176,'F',202199.68,'1992-01-31','4-NOT SPECIFIED','Clerk#000000853',0,'riously regular accounts. ironic, bold requests was slyly; slyly regula'
49444,22494,1,1,22.00,31162.78,0.06,0.08,'N','O','1997-03-03','1997-04-19','1997-03-25','TAKE BACK RETURN','MAIL','l requests among the blithely fin',49444,11725,'O',273794.80,'1997-01-23','2-HIGH','Clerk#000000814',0,'s. quickly bold packages integrate. furio'
396839,24996,1,1,9.00,17288.91,0.03,0.07,'N','O','1996-06-19','1996-04-27','1996-07-13','COLLECT COD','MAIL',' accounts ',396839,63389,'O',193790.34,'1996-03-11','5-LOW','Clerk#000000436',0,'eans. instructions are quickly--'
1109894,24996,1,1,22.00,42261.78,0.01,0.05,'A','F','1995-01-10','1995-03-01','1995-01-21','TAKE BACK RETURN','RAIL','nusual requests wake. qu',1109894,35489,'F',78070.57,'1994-12-23','2-HIGH','Clerk#000000524',0,' bold deposits engage fluffily among the s'
5825859,24996,1,1,5.00,9604.95,0.02,0.00,'A','F','1995-05-05','1995-03-17','1995-05-21','TAKE BACK RETURN','MAIL','y special a',5825859,13285,'F',90407.81,'1995-01-15','3-MEDIUM','Clerk#000000818',0,'ajole. quickly ironic theodolites '
---- TYPES
BIGINT,BIGINT,BIGINT,INT,DECIMAL,DECIMAL,DECIMAL,DECIMAL,STRING,STRING,STRING,STRING,STRING,STRING,STRING,STRING,BIGINT,BIGINT,STRING,DECIMAL,STRING,STRING,STRING,INT,STRING
---- RUNTIME_PROFILE
# Verify that the sort and join actually spilled
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
row_regex: .*TotalMergesPerformed: .* \([1-9][0-9]*\)
====
---- QUERY
# IMPALA-5173: spilling hash join feeding into right side of nested loop join.
# Equivalent to:
# select *
# from lineitem
# where l1.l_quantity = 31.0 and l1.l_tax = 0.03 and l1.l_orderkey <= 100000
# order by l_orderkey, l_partkey, l_suppkey, l_linenumber
# limit 5
set buffer_pool_limit=177m;
set num_nodes=1;
select straight_join l.*
from
(select *
from tpch_parquet.orders limit 1) o,
(select l2.*
from tpch_parquet.lineitem l1
inner join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey
and l1.l_partkey = l2.l_partkey
and l1.l_suppkey = l2.l_suppkey and l1.l_linenumber = l2.l_linenumber
where
# Include a selective post-join predicate so that the RHS of the nested loop join
# doesn't consume too much memory.
(l1.l_quantity != l2.l_quantity or l1.l_quantity = 31.0 and l1.l_tax = 0.03)
# Reduce the data size to get the test to execute quicker
and l1.l_orderkey <= 100000) l
order by l_orderkey, l_partkey, l_suppkey, l_linenumber
limit 5;
---- TYPES
bigint,bigint,bigint,int,decimal,decimal,decimal,decimal,string,string,string,string,string,string,string,string
---- RESULTS
288,50641,8157,1,31.00,49340.84,0.00,0.03,'N','O','1997-03-17','1997-04-28','1997-04-06','TAKE BACK RETURN','AIR','instructions wa'
418,18552,1054,1,31.00,45587.05,0.00,0.03,'N','F','1995-06-05','1995-06-18','1995-06-26','COLLECT COD','FOB','final theodolites. fluffil'
482,61141,6154,3,31.00,34166.34,0.04,0.03,'N','O','1996-06-01','1996-05-06','1996-06-17','NONE','MAIL',' blithe pin'
1382,156162,6163,5,31.00,37762.96,0.07,0.03,'R','F','1993-10-26','1993-10-15','1993-11-09','TAKE BACK RETURN','FOB','hely regular dependencies. f'
1509,186349,3904,6,31.00,44495.54,0.04,0.03,'A','F','1993-07-14','1993-08-21','1993-08-06','COLLECT COD','SHIP','ic deposits cajole carefully. quickly bold '
====
---- QUERY
# Test spilling aggregation when grouping by nondeterministic expression
set buffer_pool_limit=79m;
set num_nodes=1;
select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_comment
from tpch_parquet.lineitem
group by 1, 2, 3, 4, 5, random()
limit 50000000
---- RUNTIME_PROFILE
row_regex: .*Query State: FINISHED.*
row_regex: .*Query Status: OK.*
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
====
---- QUERY
# Test spilling join with many duplicates in join key. We don't expect this to succeed
# with a memory constraint: see IMPALA-4857. Limit size of probe so that query doesn't
# bog down executing an exploding join.
# The additional "order by" and "limit" clauses make sure that a successful
# query does not too much data to the client.
set buffer_pool_limit=167m;
select straight_join *
from lineitem l1 join lineitem l2 on l1.l_linenumber = l2.l_linenumber
where l1.l_orderkey < 100000
order by l1.l_orderkey desc, l1.l_linenumber desc limit 10
---- CATCH
Repartitioning did not reduce the size of a spilled partition
====
---- QUERY
# IMPALA-9725: incorrect results for spilling joins with very wide join keys.
# This test is crafted to always return 0 rows, but with the previous bug
# would sometimes return rows while spilling. The query needs to have a join
# key that is more than 256 bytes wide to trigger the bug.
# Need to lower buffer_pool_limit just slightly above the Max Per-Host Resource
# Reservation (104.61MB) for the join to spill consistently.
set buffer_pool_limit=105m;
set runtime_filter_mode=0;
SELECT straight_join o_orderkey
FROM (
SELECT *
FROM orders
JOIN customer ON o_custkey = c_custkey
JOIN nation ON c_nationkey = n_nationkey
JOIN region ON n_regionkey = r_regionkey
WHERE o_orderkey < 500000) o1
LEFT ANTI JOIN /*+broadcast*/ (
SELECT *
FROM orders
JOIN customer ON o_custkey = c_custkey
JOIN nation ON c_nationkey = n_nationkey
JOIN region ON n_regionkey = r_regionkey
WHERE o_orderkey < 500000) o2 ON o1.o_orderkey = o2.o_orderkey
AND o1.o_custkey = o2.o_custkey
AND o1.o_orderstatus = o2.o_orderstatus
AND o1.o_totalprice = o2.o_totalprice
AND o1.o_orderdate = o2.o_orderdate
AND o1.o_orderpriority = o2.o_orderpriority
AND o1.o_clerk = o2.o_clerk
AND o1.o_shippriority = o2.o_shippriority
AND o1.o_comment = o2.o_comment
AND o1.c_custkey = o2.c_custkey
AND o1.c_name = o2.c_name
AND o1.c_address = o2.c_address
AND o1.c_nationkey = o2.c_nationkey
AND o1.c_phone = o2.c_phone
AND o1.c_acctbal = o2.c_acctbal
AND o1.c_mktsegment = o2.c_mktsegment
AND o1.n_nationkey = o2.n_nationkey
AND o1.n_name = o2.n_name
AND o1.n_regionkey = o2.n_regionkey
AND o1.n_comment = o2.n_comment
AND o1.r_name = o2.r_name
AND o1.r_comment = o2.r_comment
AND fnv_hash(o1.n_name) = fnv_hash(o2.n_name)
AND fnv_hash(o1.r_name) = fnv_hash(o2.r_name)
AND fnv_hash(o1.o_orderstatus) = fnv_hash(o2.o_orderstatus)
AND fnv_hash(o1.o_shippriority) = fnv_hash(o2.o_shippriority)
AND fnv_hash(o1.o_orderdate) = fnv_hash(o2.o_orderdate)
AND fnv_hash(o1.o_orderpriority) = fnv_hash(o2.o_orderpriority)
AND fnv_hash(o1.o_clerk) = fnv_hash(o2.o_clerk)
ORDER BY o_orderkey
---- RESULTS
---- TYPES
BIGINT
---- RUNTIME_PROFILE
# Verify that at least one of the joins was spilled.
# TODO(IMPALA-12549): Make this query spill again after the estimations and
# min reservations are adjusted.
# row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
====
---- QUERY
# Spilling partitioned top-N.
# Find suppliers with two or more duplicate l_extendedprice values tied for last place.
# This has a relatively high partition count because it partitions by suppliers, so
# it ends up falling back to doing a full, spilling, sort.
set buffer_pool_limit=100m;
SELECT s_name, min(l_quantity), min(l_extendedprice), min(l_discount), min(l_tax),
count(*)
FROM (SELECT s_name, lineitem.*, Rank()
OVER(PARTITION BY s_name
ORDER BY l_extendedprice) AS rank
FROM lineitem join supplier on l_suppkey = s_suppkey) a
WHERE rank < 500
group by s_name
having count(*) > 500
order by s_name
---- RESULTS
'Supplier#000001076',1.00,989.07,0.00,0.00,501
'Supplier#000001645',1.00,1048.14,0.00,0.00,501
'Supplier#000002145',1.00,1046.14,0.00,0.00,501
'Supplier#000002197',1.00,1103.19,0.00,0.00,501
'Supplier#000002269',1.00,1220.26,0.00,0.00,501
'Supplier#000002435',1.00,1365.42,0.00,0.00,501
'Supplier#000002644',1.00,1076.13,0.00,0.00,501
'Supplier#000002741',1.00,1145.24,0.00,0.00,501
'Supplier#000002827',1.00,1287.30,0.00,0.00,501
'Supplier#000002986',1.00,1397.48,0.00,0.00,501
'Supplier#000003304',1.00,1236.30,0.00,0.00,501
'Supplier#000003631',1.00,1049.12,0.00,0.00,501
'Supplier#000003646',1.00,1058.14,0.00,0.00,501
'Supplier#000003695',1.00,1104.19,0.00,0.00,501
'Supplier#000003811',1.00,1256.30,0.00,0.00,501
'Supplier#000004068',1.00,981.06,0.00,0.00,501
'Supplier#000004131',1.00,1054.13,0.00,0.00,501
'Supplier#000004371',1.00,1284.37,0.00,0.00,501
'Supplier#000004442',1.00,1395.44,0.00,0.00,501
'Supplier#000004443',1.00,1346.44,0.00,0.00,501
'Supplier#000004913',1.00,1368.39,0.00,0.00,501
'Supplier#000004924',1.00,1330.42,0.00,0.00,501
'Supplier#000004926',1.00,1426.41,0.00,0.00,501
'Supplier#000004931',1.00,1368.42,0.00,0.00,501
'Supplier#000004941',1.00,1368.43,0.00,0.00,501
'Supplier#000005166',1.00,1080.16,0.00,0.00,501
'Supplier#000005400',1.00,1299.39,0.00,0.00,501
'Supplier#000005468',1.00,1367.46,0.00,0.00,501
'Supplier#000005852',1.00,1254.35,0.00,0.00,501
'Supplier#000006397',1.00,1342.39,0.00,0.00,501
'Supplier#000007646',1.00,1045.14,0.00,0.00,501
'Supplier#000007830',1.00,1234.32,0.00,0.00,501
'Supplier#000008157',1.00,1144.15,0.00,0.00,501
'Supplier#000008432',1.00,1414.41,0.00,0.00,501
'Supplier#000008720',1.00,1143.21,0.00,0.00,501
'Supplier#000008787',1.00,1192.28,0.00,0.00,501
'Supplier#000008851',1.00,1251.35,0.00,0.00,501
'Supplier#000008859',1.00,1259.35,0.00,0.00,501
'Supplier#000009255',1.00,1163.25,0.00,0.00,501
'Supplier#000009675',1.00,1139.14,0.00,0.00,501
'Supplier#000009847',1.00,1262.34,0.00,0.00,501
'Supplier#000009883',1.00,1298.37,0.00,0.00,501
---- TYPES
STRING,DECIMAL,DECIMAL,DECIMAL,DECIMAL,BIGINT
---- RUNTIME_PROFILE
# Verify that at least one of the Top-N operators spilled.
row_regex: .*InMemoryHeapsEvicted: .* \([1-9][0-9]*\)
#row_regex: .*SpilledRuns: .* \([1-9][0-9]*\)
====
---- QUERY
# Test spilling an agg with a LIMIT; see IMPALA-2581
set buffer_pool_limit=136m;
select c.c2 from
(select distinct (a.id*10000 + b.id) c1, a.int_col c2
from functional.alltypes a, functional.alltypes b limit 3500000) c join /* +SHUFFLE */ functional.alltypes d on c.c2 = d.int_col
group by c.c2
limit 5
---- TYPES
BIGINT
---- RUNTIME_PROFILE
# Verify that spilling was activated.
row_regex: .*SpilledPartitions: .* \([1-9][0-9]*\)
row_regex: .*FastLimitCheckExceededRows: [0-9]+
====