impala/testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
Amogh Margoor 2040b2621f IMPALA-7635: Reducing HashTable size by packing its buckets efficiently.
The HashTable implementation in Impala consists of a contiguous array
of Buckets. Each Bucket contains either data or a pointer to a
linked list of duplicate entries, whose nodes are DuplicateNodes.
These are the structures of Bucket and DuplicateNode:

  struct DuplicateNode {
    bool matched;
    DuplicateNode* next;
    HtData htdata;
  };

  struct Bucket {
    bool filled;
    bool matched;
    bool hasDuplicates;
    uint32_t hash;
    union {
      HtData htdata;
      DuplicateNode* duplicates;
    } bucketData;
  };

The size of Bucket is currently 16 bytes and the size of DuplicateNode
is 24 bytes. If we remove the booleans from both structs, the size of
Bucket would reduce to 12 bytes and DuplicateNode to 16 bytes.
One way to remove the booleans is to fold them into the pointers
that are already part of the structs. Pointers store addresses, and on
architectures like x86 and ARM the linear address is only 48 bits
long. With level 5 paging Intel is planning to expand it to 57 bits,
which means we can use the most significant 7 bits, i.e., bits 58 to
64, to store these booleans. This patch reduces the size of Bucket
and DuplicateNode by implementing this folding. However, there is
another requirement: the size of Bucket must be a power of 2, and
the number of buckets in the hash table must also be a power of 2.
These requirements exist for the following reasons:
1. The memory allocator allocates memory in powers of 2 to avoid
   internal fragmentation. Hence, num of buckets * sizeof(Bucket)
   should be a power of 2.
2. The number of buckets being a power of 2 enables a faster modulo
   operation, i.e., instead of the slow modulo (hash % N), the faster
   (hash & (N-1)) can be used.
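
The second reason can be illustrated with a small sketch. The helper names
IsPowerOfTwo and BucketIndex are hypothetical and are not taken from the
Impala source; the sketch only shows that masking matches the modulo when
the bucket count is a power of 2.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if n is a non-zero power of two.
// Hypothetical helper for illustration only.
constexpr bool IsPowerOfTwo(uint64_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// Maps a hash value to a bucket index.
// Requires num_buckets to be a power of two; in that case
// (hash & (num_buckets - 1)) equals (hash % num_buckets),
// but avoids an integer division.
inline uint64_t BucketIndex(uint32_t hash, uint64_t num_buckets) {
  assert(IsPowerOfTwo(num_buckets));
  return hash & (num_buckets - 1);
}
```

The mask only works because N-1 has all low bits set when N is a power
of 2; for any other N the two expressions generally differ.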

Due to this, the 4-byte 'hash' field is removed from Bucket and
stored separately in a new array, hash_array_, in HashTable.
This ensures sizeof(Bucket) is 8, which is a power of 2.
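
A minimal sketch of this split layout follows. PackedBucket and
SplitHashTable are hypothetical stand-ins, not the classes from the patch;
only the member name hash_array_ comes from the description above. The
sketch assumes the packed bucket is a single 64-bit word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the packed bucket: one 64-bit word that
// would hold the pointer (or data) together with the folded flag bits.
struct PackedBucket {
  uint64_t data = 0;
};
static_assert(sizeof(PackedBucket) == 8, "bucket size must be a power of 2");

// Illustrative table that keeps the hashes in a parallel array
// instead of inside each bucket.
class SplitHashTable {
 public:
  explicit SplitHashTable(size_t num_buckets)
      : buckets_(num_buckets), hash_array_(num_buckets, 0) {}

  // Stores the payload and its hash at the same index in both arrays.
  void Set(size_t idx, uint32_t hash, uint64_t data) {
    buckets_[idx].data = data;
    hash_array_[idx] = hash;
  }

  uint32_t HashAt(size_t idx) const { return hash_array_[idx]; }
  uint64_t DataAt(size_t idx) const { return buckets_[idx].data; }

  // Bytes used by each of the two arrays.
  size_t BucketBytes() const { return buckets_.size() * sizeof(PackedBucket); }
  size_t HashBytes() const { return hash_array_.size() * sizeof(uint32_t); }

 private:
  std::vector<PackedBucket> buckets_;
  std::vector<uint32_t> hash_array_;
};
```

Under this layout each bucket costs 8 + 4 = 12 bytes across the two
arrays, compared with 16 bytes for the original Bucket struct, while both
arrays keep power-of-2 sizes when the bucket count is a power of 2.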

New Classes:
------------
As a part of this patch, TaggedPointer is introduced, a template
class that stores a pointer and a 7-bit tag together in a 64-bit integer.
This structure owns the pointer and takes care
of allocation and deallocation of the object being pointed to.
However, derived classes can opt out of ownership of the object
and let the client manage it. Its derived classes for Bucket and
DuplicateNode do so. These classes are TaggedBucketData and
TaggedDuplicateNode.
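
The folding idea can be sketched as below. This is a simplified,
non-owning illustration and not the TaggedPointer class from the patch:
the name TaggedPtrSketch and all of its methods are hypothetical, and it
assumes a 64-bit platform where user-space addresses fit in the low 57
bits.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(void*) == 8, "sketch assumes 64-bit pointers");

// Hypothetical non-owning sketch: the low 57 bits hold the address and
// the most significant 7 bits hold boolean flags.
template <typename T>
class TaggedPtrSketch {
 public:
  static constexpr int kPtrBits = 57;
  static constexpr uint64_t kPtrMask = (uint64_t{1} << kPtrBits) - 1;

  // Stores the address while preserving the current tag bits.
  void SetPtr(T* p) {
    uint64_t raw = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(p));
    assert((raw & ~kPtrMask) == 0);  // assumption: address fits in 57 bits
    data_ = (data_ & ~kPtrMask) | raw;
  }

  // Returns the address with the tag bits masked off.
  T* GetPtr() const {
    return reinterpret_cast<T*>(static_cast<uintptr_t>(data_ & kPtrMask));
  }

  // bit must be in [0, 7); tag bit 0 is stored in bit 57 of the word.
  void SetTagBit(int bit) { data_ |= uint64_t{1} << (kPtrBits + bit); }
  void ClearTagBit(int bit) { data_ &= ~(uint64_t{1} << (kPtrBits + bit)); }
  bool IsTagBitSet(int bit) const { return (data_ >> (kPtrBits + bit)) & 1; }

 private:
  uint64_t data_ = 0;
};
static_assert(sizeof(TaggedPtrSketch<int>) == 8, "must fit in one word");
```

With such a wrapper, flags like 'matched' or 'hasDuplicates' could each
map to one tag bit, so the struct needs no separate bool members. The
pointer must always be read through GetPtr(), since dereferencing the raw
word with tag bits set would yield an invalid address.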

Benchmark:
----------
As a part of this patch a new micro benchmark for HashTable has
been introduced, which measures:
1. Runtime for building the hash table and probing it.
2. Memory consumed after building the table.
This helps in measuring the impact of changes to the HashTable's
data structure and algorithm.
The benchmark showed a 25-30% reduction in memory consumed and no
significant difference in performance (0.91X-1.2X).

Other Benchmarks:
1. Billion row Synthetic benchmark on single node, single daemon:
   a. 2-3% improvement in Join GEOMEAN for Probe benchmark.
   b. 17% and 21% reduction in PeakMemoryUsage and
      CumulativeBytes allocated respectively
2. TPCH-42: 0-1.5% improvement in GEOMEAN runtime

Change-Id: I72912ae9353b0d567a976ca712d2d193e035df9b
Reviewed-on: http://gerrit.cloudera.org:8080/17592
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-08-25 20:05:47 +00:00


# Join with tiny build side - should use smallest possible buffers.
select straight_join *
from tpch_parquet.customer
inner join tpch_parquet.nation on c_nationkey = n_nationkey
---- DISTRIBUTEDPLAN
Max Per-Host Resource Reservation: Memory=22.97MB Threads=5
Per-Host Resource Estimates: Memory=100MB
Analyzed query: SELECT /* +straight_join */ * FROM tpch_parquet.customer INNER
JOIN tpch_parquet.nation ON c_nationkey = n_nationkey
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=57.09MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: tpch_parquet.customer.c_custkey, tpch_parquet.customer.c_name, tpch_parquet.customer.c_address, tpch_parquet.customer.c_nationkey, tpch_parquet.customer.c_phone, tpch_parquet.customer.c_acctbal, tpch_parquet.customer.c_mktsegment, tpch_parquet.customer.c_comment, tpch_parquet.nation.n_nationkey, tpch_parquet.nation.n_name, tpch_parquet.nation.n_regionkey, tpch_parquet.nation.n_comment
| mem-estimate=46.77MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=10.33MB mem-reservation=0B thread-reservation=0
| tuple-ids=0,1 row-size=327B cardinality=150.00K
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
Per-Host Resources: mem-estimate=26.95MB mem-reservation=18.94MB thread-reservation=2 runtime-filters-memory=1.00MB
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: c_nationkey = n_nationkey
| fk/pk conjuncts: c_nationkey = n_nationkey
| runtime filters: RF000[bloom] <- n_nationkey
| mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
| tuple-ids=0,1 row-size=327B cardinality=150.00K
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--03:EXCHANGE [BROADCAST]
| | mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=109B cardinality=25
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
| Per-Host Resources: mem-estimate=16.00MB mem-reservation=32.00KB thread-reservation=2
| 01:SCAN HDFS [tpch_parquet.nation, RANDOM]
| HDFS partitions=1/1 files=1 size=3.04KB
| stored statistics:
| table: rows=25 size=3.04KB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=25
| mem-estimate=16.00MB mem-reservation=32.00KB thread-reservation=1
| tuple-ids=1 row-size=109B cardinality=25
| in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch_parquet.customer, RANDOM]
HDFS partitions=1/1 files=1 size=12.34MB
runtime filters: RF000[bloom] -> c_nationkey
stored statistics:
table: rows=150.00K size=12.34MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=150.00K
mem-estimate=24.00MB mem-reservation=16.00MB thread-reservation=1
tuple-ids=0 row-size=218B cardinality=150.00K
in pipelines: 00(GETNEXT)
---- PARALLELPLANS
Max Per-Host Resource Reservation: Memory=25.91MB Threads=4
Per-Host Resource Estimates: Memory=103MB
Analyzed query: SELECT /* +straight_join */ * FROM tpch_parquet.customer INNER
JOIN tpch_parquet.nation ON c_nationkey = n_nationkey
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=57.09MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: tpch_parquet.customer.c_custkey, tpch_parquet.customer.c_name, tpch_parquet.customer.c_address, tpch_parquet.customer.c_nationkey, tpch_parquet.customer.c_phone, tpch_parquet.customer.c_acctbal, tpch_parquet.customer.c_mktsegment, tpch_parquet.customer.c_comment, tpch_parquet.nation.n_nationkey, tpch_parquet.nation.n_name, tpch_parquet.nation.n_regionkey, tpch_parquet.nation.n_comment
| mem-estimate=46.77MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=10.33MB mem-reservation=0B thread-reservation=0
| tuple-ids=0,1 row-size=327B cardinality=150.00K
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB thread-reservation=0 runtime-filters-memory=1.00MB
Per-Instance Resources: mem-estimate=24.00MB mem-reservation=16.00MB thread-reservation=1
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash-table-id=00
| hash predicates: c_nationkey = n_nationkey
| fk/pk conjuncts: c_nationkey = n_nationkey
| mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
| tuple-ids=0,1 row-size=327B cardinality=150.00K
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--F03:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
| | Per-Instance Resources: mem-estimate=4.89MB mem-reservation=4.88MB thread-reservation=1 runtime-filters-memory=1.00MB
| JOIN BUILD
| | join-table-id=00 plan-id=01 cohort-id=01
| | build expressions: n_nationkey
| | runtime filters: RF000[bloom] <- n_nationkey
| | mem-estimate=3.88MB mem-reservation=3.88MB spill-buffer=64.00KB thread-reservation=0
| |
| 03:EXCHANGE [BROADCAST]
| | mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=109B cardinality=25
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=16.00MB mem-reservation=32.00KB thread-reservation=1
| 01:SCAN HDFS [tpch_parquet.nation, RANDOM]
| HDFS partitions=1/1 files=1 size=3.04KB
| stored statistics:
| table: rows=25 size=3.04KB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=25
| mem-estimate=16.00MB mem-reservation=32.00KB thread-reservation=0
| tuple-ids=1 row-size=109B cardinality=25
| in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch_parquet.customer, RANDOM]
HDFS partitions=1/1 files=1 size=12.34MB
runtime filters: RF000[bloom] -> c_nationkey
stored statistics:
table: rows=150.00K size=12.34MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=150.00K
mem-estimate=24.00MB mem-reservation=16.00MB thread-reservation=0
tuple-ids=0 row-size=218B cardinality=150.00K
in pipelines: 00(GETNEXT)
====
# Join with large build side - should use default-sized buffers.
select straight_join *
from tpch_parquet.lineitem
left join tpch_parquet.orders on l_orderkey = o_orderkey
---- DISTRIBUTEDPLAN
Max Per-Host Resource Reservation: Memory=102.00MB Threads=5
Per-Host Resource Estimates: Memory=534MB
Analyzed query: SELECT /* +straight_join */ * FROM tpch_parquet.lineitem LEFT
OUTER JOIN tpch_parquet.orders ON l_orderkey = o_orderkey
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=111.20MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: tpch_parquet.lineitem.l_orderkey, tpch_parquet.lineitem.l_partkey, tpch_parquet.lineitem.l_suppkey, tpch_parquet.lineitem.l_linenumber, tpch_parquet.lineitem.l_quantity, tpch_parquet.lineitem.l_extendedprice, tpch_parquet.lineitem.l_discount, tpch_parquet.lineitem.l_tax, tpch_parquet.lineitem.l_returnflag, tpch_parquet.lineitem.l_linestatus, tpch_parquet.lineitem.l_shipdate, tpch_parquet.lineitem.l_commitdate, tpch_parquet.lineitem.l_receiptdate, tpch_parquet.lineitem.l_shipinstruct, tpch_parquet.lineitem.l_shipmode, tpch_parquet.lineitem.l_comment, tpch_parquet.orders.o_orderkey, tpch_parquet.orders.o_custkey, tpch_parquet.orders.o_orderstatus, tpch_parquet.orders.o_totalprice, tpch_parquet.orders.o_orderdate, tpch_parquet.orders.o_orderpriority, tpch_parquet.orders.o_clerk, tpch_parquet.orders.o_shippriority, tpch_parquet.orders.o_comment
| mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=11.20MB mem-reservation=0B thread-reservation=0
| tuple-ids=0,1N row-size=402B cardinality=6.00M
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Resources: mem-estimate=382.84MB mem-reservation=74.00MB thread-reservation=2
02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
| hash predicates: l_orderkey = o_orderkey
| fk/pk conjuncts: l_orderkey = o_orderkey
| mem-estimate=292.49MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=0,1N row-size=402B cardinality=6.00M
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--03:EXCHANGE [BROADCAST]
| | mem-estimate=10.34MB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=171B cardinality=1.50M
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
| Per-Host Resources: mem-estimate=40.00MB mem-reservation=24.00MB thread-reservation=2
| 01:SCAN HDFS [tpch_parquet.orders, RANDOM]
| HDFS partitions=1/1 files=2 size=54.21MB
| stored statistics:
| table: rows=1.50M size=54.21MB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=1.18M
| mem-estimate=40.00MB mem-reservation=24.00MB thread-reservation=1
| tuple-ids=1 row-size=171B cardinality=1.50M
| in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch_parquet.lineitem, RANDOM]
HDFS partitions=1/1 files=3 size=193.98MB
stored statistics:
table: rows=6.00M size=193.98MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=2.14M
mem-estimate=80.00MB mem-reservation=40.00MB thread-reservation=1
tuple-ids=0 row-size=231B cardinality=6.00M
in pipelines: 00(GETNEXT)
---- PARALLELPLANS
Max Per-Host Resource Reservation: Memory=136.00MB Threads=4
Per-Host Resource Estimates: Memory=534MB
Analyzed query: SELECT /* +straight_join */ * FROM tpch_parquet.lineitem LEFT
OUTER JOIN tpch_parquet.orders ON l_orderkey = o_orderkey
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=111.20MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: tpch_parquet.lineitem.l_orderkey, tpch_parquet.lineitem.l_partkey, tpch_parquet.lineitem.l_suppkey, tpch_parquet.lineitem.l_linenumber, tpch_parquet.lineitem.l_quantity, tpch_parquet.lineitem.l_extendedprice, tpch_parquet.lineitem.l_discount, tpch_parquet.lineitem.l_tax, tpch_parquet.lineitem.l_returnflag, tpch_parquet.lineitem.l_linestatus, tpch_parquet.lineitem.l_shipdate, tpch_parquet.lineitem.l_commitdate, tpch_parquet.lineitem.l_receiptdate, tpch_parquet.lineitem.l_shipinstruct, tpch_parquet.lineitem.l_shipmode, tpch_parquet.lineitem.l_comment, tpch_parquet.orders.o_orderkey, tpch_parquet.orders.o_custkey, tpch_parquet.orders.o_orderstatus, tpch_parquet.orders.o_totalprice, tpch_parquet.orders.o_orderdate, tpch_parquet.orders.o_orderpriority, tpch_parquet.orders.o_clerk, tpch_parquet.orders.o_shippriority, tpch_parquet.orders.o_comment
| mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=11.20MB mem-reservation=0B thread-reservation=0
| tuple-ids=0,1N row-size=402B cardinality=6.00M
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Instance Resources: mem-estimate=80.00MB mem-reservation=40.00MB thread-reservation=1
02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
| hash-table-id=00
| hash predicates: l_orderkey = o_orderkey
| fk/pk conjuncts: l_orderkey = o_orderkey
| mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
| tuple-ids=0,1N row-size=402B cardinality=6.00M
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--F03:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
| | Per-Instance Resources: mem-estimate=302.84MB mem-reservation=68.00MB thread-reservation=1
| JOIN BUILD
| | join-table-id=00 plan-id=01 cohort-id=01
| | build expressions: o_orderkey
| | mem-estimate=292.49MB mem-reservation=68.00MB spill-buffer=2.00MB thread-reservation=0
| |
| 03:EXCHANGE [BROADCAST]
| | mem-estimate=10.34MB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=171B cardinality=1.50M
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
| Per-Instance Resources: mem-estimate=40.00MB mem-reservation=24.00MB thread-reservation=1
| 01:SCAN HDFS [tpch_parquet.orders, RANDOM]
| HDFS partitions=1/1 files=2 size=54.21MB
| stored statistics:
| table: rows=1.50M size=54.21MB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=1.18M
| mem-estimate=40.00MB mem-reservation=24.00MB thread-reservation=0
| tuple-ids=1 row-size=171B cardinality=1.50M
| in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch_parquet.lineitem, RANDOM]
HDFS partitions=1/1 files=3 size=193.98MB
stored statistics:
table: rows=6.00M size=193.98MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=2.14M
mem-estimate=80.00MB mem-reservation=40.00MB thread-reservation=0
tuple-ids=0 row-size=231B cardinality=6.00M
in pipelines: 00(GETNEXT)
====
# Shuffle join with mid-sized input.
select straight_join *
from tpch_parquet.orders
join /*+shuffle*/ tpch_parquet.customer on o_custkey = c_custkey
---- DISTRIBUTEDPLAN
Max Per-Host Resource Reservation: Memory=80.00MB Threads=6
Per-Host Resource Estimates: Memory=231MB
Analyzed query: SELECT /* +straight_join */ * FROM tpch_parquet.orders INNER
JOIN /* +shuffle */ tpch_parquet.customer ON o_custkey = c_custkey
F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=110.77MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: tpch_parquet.orders.o_orderkey, tpch_parquet.orders.o_custkey, tpch_parquet.orders.o_orderstatus, tpch_parquet.orders.o_totalprice, tpch_parquet.orders.o_orderdate, tpch_parquet.orders.o_orderpriority, tpch_parquet.orders.o_clerk, tpch_parquet.orders.o_shippriority, tpch_parquet.orders.o_comment, tpch_parquet.customer.c_custkey, tpch_parquet.customer.c_name, tpch_parquet.customer.c_address, tpch_parquet.customer.c_nationkey, tpch_parquet.customer.c_phone, tpch_parquet.customer.c_acctbal, tpch_parquet.customer.c_mktsegment, tpch_parquet.customer.c_comment
| mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
05:EXCHANGE [UNPARTITIONED]
| mem-estimate=10.77MB mem-reservation=0B thread-reservation=0
| tuple-ids=0,1 row-size=388B cardinality=1.50M
| in pipelines: 00(GETNEXT)
|
F02:PLAN FRAGMENT [HASH(o_custkey)] hosts=2 instances=2
Per-Host Resources: mem-estimate=55.56MB mem-reservation=35.00MB thread-reservation=1 runtime-filters-memory=1.00MB
02:HASH JOIN [INNER JOIN, PARTITIONED]
| hash predicates: o_custkey = c_custkey
| fk/pk conjuncts: o_custkey = c_custkey
| runtime filters: RF000[bloom] <- c_custkey
| mem-estimate=34.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=0,1 row-size=388B cardinality=1.50M
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--04:EXCHANGE [HASH(c_custkey)]
| | mem-estimate=10.22MB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=218B cardinality=150.00K
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
| Per-Host Resources: mem-estimate=24.00MB mem-reservation=16.00MB thread-reservation=2
| 01:SCAN HDFS [tpch_parquet.customer, RANDOM]
| HDFS partitions=1/1 files=1 size=12.34MB
| stored statistics:
| table: rows=150.00K size=12.34MB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=150.00K
| mem-estimate=24.00MB mem-reservation=16.00MB thread-reservation=1
| tuple-ids=1 row-size=218B cardinality=150.00K
| in pipelines: 01(GETNEXT)
|
03:EXCHANGE [HASH(o_custkey)]
| mem-estimate=10.34MB mem-reservation=0B thread-reservation=0
| tuple-ids=0 row-size=171B cardinality=1.50M
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=41.00MB mem-reservation=25.00MB thread-reservation=2 runtime-filters-memory=1.00MB
00:SCAN HDFS [tpch_parquet.orders, RANDOM]
HDFS partitions=1/1 files=2 size=54.21MB
runtime filters: RF000[bloom] -> o_custkey
stored statistics:
table: rows=1.50M size=54.21MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=1.18M
mem-estimate=40.00MB mem-reservation=24.00MB thread-reservation=1
tuple-ids=0 row-size=171B cardinality=1.50M
in pipelines: 00(GETNEXT)
---- PARALLELPLANS
Max Per-Host Resource Reservation: Memory=80.00MB Threads=5
Per-Host Resource Estimates: Memory=231MB
Analyzed query: SELECT /* +straight_join */ * FROM tpch_parquet.orders INNER
JOIN /* +shuffle */ tpch_parquet.customer ON o_custkey = c_custkey
F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=110.77MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: tpch_parquet.orders.o_orderkey, tpch_parquet.orders.o_custkey, tpch_parquet.orders.o_orderstatus, tpch_parquet.orders.o_totalprice, tpch_parquet.orders.o_orderdate, tpch_parquet.orders.o_orderpriority, tpch_parquet.orders.o_clerk, tpch_parquet.orders.o_shippriority, tpch_parquet.orders.o_comment, tpch_parquet.customer.c_custkey, tpch_parquet.customer.c_name, tpch_parquet.customer.c_address, tpch_parquet.customer.c_nationkey, tpch_parquet.customer.c_phone, tpch_parquet.customer.c_acctbal, tpch_parquet.customer.c_mktsegment, tpch_parquet.customer.c_comment
| mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
05:EXCHANGE [UNPARTITIONED]
| mem-estimate=10.77MB mem-reservation=0B thread-reservation=0
| tuple-ids=0,1 row-size=388B cardinality=1.50M
| in pipelines: 00(GETNEXT)
|
F02:PLAN FRAGMENT [HASH(o_custkey)] hosts=2 instances=2
Per-Instance Resources: mem-estimate=10.34MB mem-reservation=0B thread-reservation=1
02:HASH JOIN [INNER JOIN, PARTITIONED]
| hash-table-id=00
| hash predicates: o_custkey = c_custkey
| fk/pk conjuncts: o_custkey = c_custkey
| mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
| tuple-ids=0,1 row-size=388B cardinality=1.50M
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--F04:PLAN FRAGMENT [HASH(o_custkey)] hosts=2 instances=2
| | Per-Instance Resources: mem-estimate=45.22MB mem-reservation=35.00MB thread-reservation=1 runtime-filters-memory=1.00MB
| JOIN BUILD
| | join-table-id=00 plan-id=01 cohort-id=01
| | build expressions: c_custkey
| | runtime filters: RF000[bloom] <- c_custkey
| | mem-estimate=34.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| |
| 04:EXCHANGE [HASH(c_custkey)]
| | mem-estimate=10.22MB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=218B cardinality=150.00K
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=24.00MB mem-reservation=16.00MB thread-reservation=1
| 01:SCAN HDFS [tpch_parquet.customer, RANDOM]
| HDFS partitions=1/1 files=1 size=12.34MB
| stored statistics:
| table: rows=150.00K size=12.34MB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=150.00K
| mem-estimate=24.00MB mem-reservation=16.00MB thread-reservation=0
| tuple-ids=1 row-size=218B cardinality=150.00K
| in pipelines: 01(GETNEXT)
|
03:EXCHANGE [HASH(o_custkey)]
| mem-estimate=10.34MB mem-reservation=0B thread-reservation=0
| tuple-ids=0 row-size=171B cardinality=1.50M
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB thread-reservation=0 runtime-filters-memory=1.00MB
Per-Instance Resources: mem-estimate=40.00MB mem-reservation=24.00MB thread-reservation=1
00:SCAN HDFS [tpch_parquet.orders, RANDOM]
HDFS partitions=1/1 files=2 size=54.21MB
runtime filters: RF000[bloom] -> o_custkey
stored statistics:
table: rows=1.50M size=54.21MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=1.18M
mem-estimate=40.00MB mem-reservation=24.00MB thread-reservation=0
tuple-ids=0 row-size=171B cardinality=1.50M
in pipelines: 00(GETNEXT)
====
# Broadcast join with mid-sized input - should use larger buffers than shuffle join.
select straight_join *
from tpch_parquet.orders
join /*+broadcast*/ tpch_parquet.customer on o_custkey = c_custkey
---- DISTRIBUTEDPLAN
Max Per-Host Resource Reservation: Memory=79.00MB Threads=5
Per-Host Resource Estimates: Memory=220MB
Analyzed query: SELECT /* +straight_join */ * FROM tpch_parquet.orders INNER
JOIN /* +broadcast */ tpch_parquet.customer ON o_custkey = c_custkey
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=110.77MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: tpch_parquet.orders.o_orderkey, tpch_parquet.orders.o_custkey, tpch_parquet.orders.o_orderstatus, tpch_parquet.orders.o_totalprice, tpch_parquet.orders.o_orderdate, tpch_parquet.orders.o_orderpriority, tpch_parquet.orders.o_clerk, tpch_parquet.orders.o_shippriority, tpch_parquet.orders.o_comment, tpch_parquet.customer.c_custkey, tpch_parquet.customer.c_name, tpch_parquet.customer.c_address, tpch_parquet.customer.c_nationkey, tpch_parquet.customer.c_phone, tpch_parquet.customer.c_acctbal, tpch_parquet.customer.c_mktsegment, tpch_parquet.customer.c_comment
| mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=10.77MB mem-reservation=0B thread-reservation=0
| tuple-ids=0,1 row-size=388B cardinality=1.50M
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=85.34MB mem-reservation=59.00MB thread-reservation=2 runtime-filters-memory=1.00MB
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: o_custkey = c_custkey
| fk/pk conjuncts: o_custkey = c_custkey
| runtime filters: RF000[bloom] <- c_custkey
| mem-estimate=34.12MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=0,1 row-size=388B cardinality=1.50M
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--03:EXCHANGE [BROADCAST]
| | mem-estimate=10.22MB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=218B cardinality=150.00K
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
| Per-Host Resources: mem-estimate=24.00MB mem-reservation=16.00MB thread-reservation=2
| 01:SCAN HDFS [tpch_parquet.customer, RANDOM]
| HDFS partitions=1/1 files=1 size=12.34MB
| stored statistics:
| table: rows=150.00K size=12.34MB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=150.00K
| mem-estimate=24.00MB mem-reservation=16.00MB thread-reservation=1
| tuple-ids=1 row-size=218B cardinality=150.00K
| in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch_parquet.orders, RANDOM]
HDFS partitions=1/1 files=2 size=54.21MB
runtime filters: RF000[bloom] -> o_custkey
stored statistics:
table: rows=1.50M size=54.21MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=1.18M
mem-estimate=40.00MB mem-reservation=24.00MB thread-reservation=1
tuple-ids=0 row-size=171B cardinality=1.50M
in pipelines: 00(GETNEXT)
---- PARALLELPLANS
Max Per-Host Resource Reservation: Memory=114.00MB Threads=4
Per-Host Resource Estimates: Memory=255MB
Analyzed query: SELECT /* +straight_join */ * FROM tpch_parquet.orders INNER
JOIN /* +broadcast */ tpch_parquet.customer ON o_custkey = c_custkey
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=110.77MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: tpch_parquet.orders.o_orderkey, tpch_parquet.orders.o_custkey, tpch_parquet.orders.o_orderstatus, tpch_parquet.orders.o_totalprice, tpch_parquet.orders.o_orderdate, tpch_parquet.orders.o_orderpriority, tpch_parquet.orders.o_clerk, tpch_parquet.orders.o_shippriority, tpch_parquet.orders.o_comment, tpch_parquet.customer.c_custkey, tpch_parquet.customer.c_name, tpch_parquet.customer.c_address, tpch_parquet.customer.c_nationkey, tpch_parquet.customer.c_phone, tpch_parquet.customer.c_acctbal, tpch_parquet.customer.c_mktsegment, tpch_parquet.customer.c_comment
| mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=10.77MB mem-reservation=0B thread-reservation=0
| tuple-ids=0,1 row-size=388B cardinality=1.50M
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB thread-reservation=0 runtime-filters-memory=1.00MB
Per-Instance Resources: mem-estimate=40.00MB mem-reservation=24.00MB thread-reservation=1
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash-table-id=00
| hash predicates: o_custkey = c_custkey
| fk/pk conjuncts: o_custkey = c_custkey
| mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
| tuple-ids=0,1 row-size=388B cardinality=1.50M
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--F03:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
| | Per-Instance Resources: mem-estimate=79.22MB mem-reservation=69.00MB thread-reservation=1 runtime-filters-memory=1.00MB
| JOIN BUILD
| | join-table-id=00 plan-id=01 cohort-id=01
| | build expressions: c_custkey
| | runtime filters: RF000[bloom] <- c_custkey
| | mem-estimate=68.00MB mem-reservation=68.00MB spill-buffer=2.00MB thread-reservation=0
| |
| 03:EXCHANGE [BROADCAST]
| | mem-estimate=10.22MB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=218B cardinality=150.00K
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=24.00MB mem-reservation=16.00MB thread-reservation=1
| 01:SCAN HDFS [tpch_parquet.customer, RANDOM]
| HDFS partitions=1/1 files=1 size=12.34MB
| stored statistics:
| table: rows=150.00K size=12.34MB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=150.00K
| mem-estimate=24.00MB mem-reservation=16.00MB thread-reservation=0
| tuple-ids=1 row-size=218B cardinality=150.00K
| in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch_parquet.orders, RANDOM]
HDFS partitions=1/1 files=2 size=54.21MB
runtime filters: RF000[bloom] -> o_custkey
stored statistics:
table: rows=1.50M size=54.21MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=1.18M
mem-estimate=40.00MB mem-reservation=24.00MB thread-reservation=0
tuple-ids=0 row-size=171B cardinality=1.50M
in pipelines: 00(GETNEXT)
====
# Join with no stats for right input - should use default buffers.
select straight_join *
from functional_parquet.alltypes
left join functional_parquet.alltypestiny on alltypes.id = alltypestiny.id
---- DISTRIBUTEDPLAN
Max Per-Host Resource Reservation: Memory=38.17MB Threads=5
Per-Host Resource Estimates: Memory=2.04GB
WARNING: The following tables are missing relevant table and/or column statistics.
functional_parquet.alltypes, functional_parquet.alltypestiny
Analyzed query: SELECT /* +straight_join */ * FROM functional_parquet.alltypes
LEFT OUTER JOIN functional_parquet.alltypestiny ON alltypes.id = alltypestiny.id
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=10.49MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: functional_parquet.alltypes.id, functional_parquet.alltypes.bool_col, functional_parquet.alltypes.tinyint_col, functional_parquet.alltypes.smallint_col, functional_parquet.alltypes.int_col, functional_parquet.alltypes.bigint_col, functional_parquet.alltypes.float_col, functional_parquet.alltypes.double_col, functional_parquet.alltypes.date_string_col, functional_parquet.alltypes.string_col, functional_parquet.alltypes.timestamp_col, functional_parquet.alltypes.year, functional_parquet.alltypes.month, functional_parquet.alltypestiny.id, functional_parquet.alltypestiny.bool_col, functional_parquet.alltypestiny.tinyint_col, functional_parquet.alltypestiny.smallint_col, functional_parquet.alltypestiny.int_col, functional_parquet.alltypestiny.bigint_col, functional_parquet.alltypestiny.float_col, functional_parquet.alltypestiny.double_col, functional_parquet.alltypestiny.date_string_col, functional_parquet.alltypestiny.string_col, functional_parquet.alltypestiny.timestamp_col, functional_parquet.alltypestiny.year, functional_parquet.alltypestiny.month
| mem-estimate=10.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=503.95KB mem-reservation=0B thread-reservation=0
| tuple-ids=0,1N row-size=160B cardinality=unavailable
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Resources: mem-estimate=2.02GB mem-reservation=34.09MB thread-reservation=2
02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
| hash predicates: alltypes.id = alltypestiny.id
| fk/pk conjuncts: assumed fk/pk
| mem-estimate=2.00GB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=0,1N row-size=160B cardinality=unavailable
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--03:EXCHANGE [BROADCAST]
| | mem-estimate=251.92KB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=80B cardinality=unavailable
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
| Per-Host Resources: mem-estimate=16.00MB mem-reservation=88.00KB thread-reservation=2
| 01:SCAN HDFS [functional_parquet.alltypestiny, RANDOM]
| HDFS partitions=4/4 files=4 size=11.92KB
| stored statistics:
| table: rows=unavailable size=unavailable
| partitions: 0/4 rows=unavailable
| columns missing stats: id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, string_col, timestamp_col
| extrapolated-rows=disabled max-scan-range-rows=unavailable
| mem-estimate=16.00MB mem-reservation=88.00KB thread-reservation=1
| tuple-ids=1 row-size=80B cardinality=unavailable
| in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [functional_parquet.alltypes, RANDOM]
HDFS partitions=24/24 files=24 size=202.07KB
stored statistics:
table: rows=unavailable size=unavailable
partitions: 0/24 rows=unavailable
columns missing stats: id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, string_col, timestamp_col
extrapolated-rows=disabled max-scan-range-rows=unavailable
mem-estimate=16.00MB mem-reservation=88.00KB thread-reservation=1
tuple-ids=0 row-size=80B cardinality=unavailable
in pipelines: 00(GETNEXT)
---- PARALLELPLANS
Max Per-Host Resource Reservation: Memory=72.34MB Threads=6
Per-Host Resource Estimates: Memory=2.07GB
WARNING: The following tables are missing relevant table and/or column statistics.
functional_parquet.alltypestiny
Analyzed query: SELECT /* +straight_join */ * FROM functional_parquet.alltypes
LEFT OUTER JOIN functional_parquet.alltypestiny ON alltypes.id = alltypestiny.id
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=10.98MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: functional_parquet.alltypes.id, functional_parquet.alltypes.bool_col, functional_parquet.alltypes.tinyint_col, functional_parquet.alltypes.smallint_col, functional_parquet.alltypes.int_col, functional_parquet.alltypes.bigint_col, functional_parquet.alltypes.float_col, functional_parquet.alltypes.double_col, functional_parquet.alltypes.date_string_col, functional_parquet.alltypes.string_col, functional_parquet.alltypes.timestamp_col, functional_parquet.alltypes.year, functional_parquet.alltypes.month, functional_parquet.alltypestiny.id, functional_parquet.alltypestiny.bool_col, functional_parquet.alltypestiny.tinyint_col, functional_parquet.alltypestiny.smallint_col, functional_parquet.alltypestiny.int_col, functional_parquet.alltypestiny.bigint_col, functional_parquet.alltypestiny.float_col, functional_parquet.alltypestiny.double_col, functional_parquet.alltypestiny.date_string_col, functional_parquet.alltypestiny.string_col, functional_parquet.alltypestiny.timestamp_col, functional_parquet.alltypestiny.year, functional_parquet.alltypestiny.month
| mem-estimate=10.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=1007.95KB mem-reservation=0B thread-reservation=0
| tuple-ids=0,1N row-size=160B cardinality=unavailable
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=6
Per-Instance Resources: mem-estimate=16.00MB mem-reservation=88.00KB thread-reservation=1
02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
| hash-table-id=00
| hash predicates: alltypes.id = alltypestiny.id
| fk/pk conjuncts: assumed fk/pk
| mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
| tuple-ids=0,1N row-size=160B cardinality=unavailable
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--F03:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
| | Per-Instance Resources: mem-estimate=2.00GB mem-reservation=68.00MB thread-reservation=1
| JOIN BUILD
| | join-table-id=00 plan-id=01 cohort-id=01
| | build expressions: alltypestiny.id
| | mem-estimate=2.00GB mem-reservation=68.00MB spill-buffer=2.00MB thread-reservation=0
| |
| 03:EXCHANGE [BROADCAST]
| | mem-estimate=335.92KB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=80B cardinality=unavailable
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=3 instances=4
| Per-Instance Resources: mem-estimate=16.00MB mem-reservation=88.00KB thread-reservation=1
| 01:SCAN HDFS [functional_parquet.alltypestiny, RANDOM]
| HDFS partitions=4/4 files=4 size=11.92KB
| stored statistics:
| table: rows=unavailable size=unavailable
| partitions: 0/4 rows=unavailable
| columns missing stats: id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, string_col, timestamp_col
| extrapolated-rows=disabled max-scan-range-rows=unavailable
| mem-estimate=16.00MB mem-reservation=88.00KB thread-reservation=0
| tuple-ids=1 row-size=80B cardinality=unavailable
| in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [functional_parquet.alltypes, RANDOM]
HDFS partitions=24/24 files=24 size=202.07KB
stored statistics:
table: rows=unavailable size=unavailable
partitions: 0/24 rows=unavailable
columns missing stats: id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, string_col, timestamp_col
extrapolated-rows=disabled max-scan-range-rows=unavailable
mem-estimate=16.00MB mem-reservation=88.00KB thread-reservation=0
tuple-ids=0 row-size=80B cardinality=unavailable
in pipelines: 00(GETNEXT)
====
# Low NDV aggregation - should scale down buffers to minimum.
select c_nationkey, avg(c_acctbal)
from tpch_parquet.customer
group by c_nationkey
---- DISTRIBUTEDPLAN
Max Per-Host Resource Reservation: Memory=9.94MB Threads=4
Per-Host Resource Estimates: Memory=48MB
Analyzed query: SELECT c_nationkey, avg(c_acctbal) FROM tpch_parquet.customer
GROUP BY c_nationkey
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: c_nationkey, avg(c_acctbal)
| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
| tuple-ids=2 row-size=10B cardinality=25
| in pipelines: 03(GETNEXT)
|
F01:PLAN FRAGMENT [HASH(c_nationkey)] hosts=1 instances=1
Per-Host Resources: mem-estimate=10.02MB mem-reservation=1.94MB thread-reservation=1
03:AGGREGATE [FINALIZE]
| output: avg:merge(c_acctbal)
| group by: c_nationkey
| mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
| tuple-ids=2 row-size=10B cardinality=25
| in pipelines: 03(GETNEXT), 00(OPEN)
|
02:EXCHANGE [HASH(c_nationkey)]
| mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
| tuple-ids=1 row-size=10B cardinality=25
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
Per-Host Resources: mem-estimate=34.00MB mem-reservation=4.00MB thread-reservation=2
01:AGGREGATE [STREAMING]
| output: avg(c_acctbal)
| group by: c_nationkey
| mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB thread-reservation=0
| tuple-ids=1 row-size=10B cardinality=25
| in pipelines: 00(GETNEXT)
|
00:SCAN HDFS [tpch_parquet.customer, RANDOM]
HDFS partitions=1/1 files=1 size=12.34MB
stored statistics:
table: rows=150.00K size=12.34MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=150.00K
mem-estimate=24.00MB mem-reservation=2.00MB thread-reservation=1
tuple-ids=0 row-size=10B cardinality=150.00K
in pipelines: 00(GETNEXT)
---- PARALLELPLANS
Max Per-Host Resource Reservation: Memory=9.94MB Threads=3
Per-Host Resource Estimates: Memory=48MB
Analyzed query: SELECT c_nationkey, avg(c_acctbal) FROM tpch_parquet.customer
GROUP BY c_nationkey
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: c_nationkey, avg(c_acctbal)
| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
| tuple-ids=2 row-size=10B cardinality=25
| in pipelines: 03(GETNEXT)
|
F01:PLAN FRAGMENT [HASH(c_nationkey)] hosts=1 instances=1
Per-Instance Resources: mem-estimate=10.02MB mem-reservation=1.94MB thread-reservation=1
03:AGGREGATE [FINALIZE]
| output: avg:merge(c_acctbal)
| group by: c_nationkey
| mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
| tuple-ids=2 row-size=10B cardinality=25
| in pipelines: 03(GETNEXT), 00(OPEN)
|
02:EXCHANGE [HASH(c_nationkey)]
| mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
| tuple-ids=1 row-size=10B cardinality=25
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
Per-Instance Resources: mem-estimate=34.00MB mem-reservation=4.00MB thread-reservation=1
01:AGGREGATE [STREAMING]
| output: avg(c_acctbal)
| group by: c_nationkey
| mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB thread-reservation=0
| tuple-ids=1 row-size=10B cardinality=25
| in pipelines: 00(GETNEXT)
|
00:SCAN HDFS [tpch_parquet.customer, RANDOM]
HDFS partitions=1/1 files=1 size=12.34MB
stored statistics:
table: rows=150.00K size=12.34MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=150.00K
mem-estimate=24.00MB mem-reservation=2.00MB thread-reservation=0
tuple-ids=0 row-size=10B cardinality=150.00K
in pipelines: 00(GETNEXT)
====
# Mid NDV aggregation - should scale down buffers to intermediate size.
select straight_join l_orderkey, o_orderstatus, count(*)
from tpch_parquet.lineitem
join tpch_parquet.orders on o_orderkey = l_orderkey
group by 1, 2
having count(*) = 1
---- DISTRIBUTEDPLAN
Max Per-Host Resource Reservation: Memory=120.00MB Threads=7
Per-Host Resource Estimates: Memory=414MB
Analyzed query: SELECT /* +straight_join */ l_orderkey, o_orderstatus, count(*)
FROM tpch_parquet.lineitem INNER JOIN tpch_parquet.orders ON o_orderkey =
l_orderkey GROUP BY l_orderkey, o_orderstatus HAVING count(*) = CAST(1 AS
BIGINT)
F04:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=110.10MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: l_orderkey, o_orderstatus, count(*)
| mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
08:EXCHANGE [UNPARTITIONED]
| mem-estimate=10.10MB mem-reservation=0B thread-reservation=0
| tuple-ids=2 row-size=29B cardinality=4.69M
| in pipelines: 07(GETNEXT)
|
F03:PLAN FRAGMENT [HASH(l_orderkey,o_orderstatus)] hosts=3 instances=3
Per-Host Resources: mem-estimate=71.23MB mem-reservation=34.00MB thread-reservation=1
07:AGGREGATE [FINALIZE]
| output: count:merge(*)
| group by: l_orderkey, o_orderstatus
| having: count(*) = CAST(1 AS BIGINT)
| mem-estimate=61.13MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=2 row-size=29B cardinality=4.69M
| in pipelines: 07(GETNEXT), 00(OPEN)
|
06:EXCHANGE [HASH(l_orderkey,o_orderstatus)]
| mem-estimate=10.10MB mem-reservation=0B thread-reservation=0
| tuple-ids=2 row-size=29B cardinality=4.69M
| in pipelines: 00(GETNEXT)
|
F02:PLAN FRAGMENT [HASH(l_orderkey)] hosts=3 instances=3
Per-Host Resources: mem-estimate=111.37MB mem-reservation=69.00MB thread-reservation=1 runtime-filters-memory=1.00MB
03:AGGREGATE [STREAMING]
| output: count(*)
| group by: l_orderkey, o_orderstatus
| mem-estimate=56.28MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=2 row-size=29B cardinality=4.69M
| in pipelines: 00(GETNEXT)
|
02:HASH JOIN [INNER JOIN, PARTITIONED]
| hash predicates: l_orderkey = o_orderkey
| fk/pk conjuncts: l_orderkey = o_orderkey
| runtime filters: RF000[bloom] <- o_orderkey
| mem-estimate=34.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=0,1 row-size=29B cardinality=5.76M
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--05:EXCHANGE [HASH(o_orderkey)]
| | mem-estimate=10.05MB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=21B cardinality=1.50M
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
| Per-Host Resources: mem-estimate=40.00MB mem-reservation=8.00MB thread-reservation=2
| 01:SCAN HDFS [tpch_parquet.orders, RANDOM]
| HDFS partitions=1/1 files=2 size=54.21MB
| stored statistics:
| table: rows=1.50M size=54.21MB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=1.18M
| mem-estimate=40.00MB mem-reservation=8.00MB thread-reservation=1
| tuple-ids=1 row-size=21B cardinality=1.50M
| in pipelines: 01(GETNEXT)
|
04:EXCHANGE [HASH(l_orderkey)]
| mem-estimate=10.04MB mem-reservation=0B thread-reservation=0
| tuple-ids=0 row-size=8B cardinality=6.00M
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Resources: mem-estimate=81.00MB mem-reservation=5.00MB thread-reservation=2 runtime-filters-memory=1.00MB
00:SCAN HDFS [tpch_parquet.lineitem, RANDOM]
HDFS partitions=1/1 files=3 size=193.98MB
runtime filters: RF000[bloom] -> l_orderkey
stored statistics:
table: rows=6.00M size=193.98MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=2.14M
mem-estimate=80.00MB mem-reservation=4.00MB thread-reservation=1
tuple-ids=0 row-size=8B cardinality=6.00M
in pipelines: 00(GETNEXT)
---- PARALLELPLANS
Max Per-Host Resource Reservation: Memory=120.00MB Threads=6
Per-Host Resource Estimates: Memory=414MB
Analyzed query: SELECT /* +straight_join */ l_orderkey, o_orderstatus, count(*)
FROM tpch_parquet.lineitem INNER JOIN tpch_parquet.orders ON o_orderkey =
l_orderkey GROUP BY l_orderkey, o_orderstatus HAVING count(*) = CAST(1 AS
BIGINT)
F04:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=110.10MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: l_orderkey, o_orderstatus, count(*)
| mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
08:EXCHANGE [UNPARTITIONED]
| mem-estimate=10.10MB mem-reservation=0B thread-reservation=0
| tuple-ids=2 row-size=29B cardinality=4.69M
| in pipelines: 07(GETNEXT)
|
F03:PLAN FRAGMENT [HASH(l_orderkey,o_orderstatus)] hosts=3 instances=3
Per-Instance Resources: mem-estimate=71.23MB mem-reservation=34.00MB thread-reservation=1
07:AGGREGATE [FINALIZE]
| output: count:merge(*)
| group by: l_orderkey, o_orderstatus
| having: count(*) = CAST(1 AS BIGINT)
| mem-estimate=61.13MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=2 row-size=29B cardinality=4.69M
| in pipelines: 07(GETNEXT), 00(OPEN)
|
06:EXCHANGE [HASH(l_orderkey,o_orderstatus)]
| mem-estimate=10.10MB mem-reservation=0B thread-reservation=0
| tuple-ids=2 row-size=29B cardinality=4.69M
| in pipelines: 00(GETNEXT)
|
F02:PLAN FRAGMENT [HASH(l_orderkey)] hosts=3 instances=3
Per-Instance Resources: mem-estimate=66.32MB mem-reservation=34.00MB thread-reservation=1
03:AGGREGATE [STREAMING]
| output: count(*)
| group by: l_orderkey, o_orderstatus
| mem-estimate=56.28MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=2 row-size=29B cardinality=4.69M
| in pipelines: 00(GETNEXT)
|
02:HASH JOIN [INNER JOIN, PARTITIONED]
| hash-table-id=00
| hash predicates: l_orderkey = o_orderkey
| fk/pk conjuncts: l_orderkey = o_orderkey
| mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
| tuple-ids=0,1 row-size=29B cardinality=5.76M
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--F05:PLAN FRAGMENT [HASH(l_orderkey)] hosts=3 instances=3
| | Per-Instance Resources: mem-estimate=45.05MB mem-reservation=35.00MB thread-reservation=1 runtime-filters-memory=1.00MB
| JOIN BUILD
| | join-table-id=00 plan-id=01 cohort-id=01
| | build expressions: o_orderkey
| | runtime filters: RF000[bloom] <- o_orderkey
| | mem-estimate=34.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| |
| 05:EXCHANGE [HASH(o_orderkey)]
| | mem-estimate=10.05MB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=21B cardinality=1.50M
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
| Per-Instance Resources: mem-estimate=40.00MB mem-reservation=8.00MB thread-reservation=1
| 01:SCAN HDFS [tpch_parquet.orders, RANDOM]
| HDFS partitions=1/1 files=2 size=54.21MB
| stored statistics:
| table: rows=1.50M size=54.21MB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=1.18M
| mem-estimate=40.00MB mem-reservation=8.00MB thread-reservation=0
| tuple-ids=1 row-size=21B cardinality=1.50M
| in pipelines: 01(GETNEXT)
|
04:EXCHANGE [HASH(l_orderkey)]
| mem-estimate=10.04MB mem-reservation=0B thread-reservation=0
| tuple-ids=0 row-size=8B cardinality=6.00M
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB thread-reservation=0 runtime-filters-memory=1.00MB
Per-Instance Resources: mem-estimate=80.00MB mem-reservation=4.00MB thread-reservation=1
00:SCAN HDFS [tpch_parquet.lineitem, RANDOM]
HDFS partitions=1/1 files=3 size=193.98MB
runtime filters: RF000[bloom] -> l_orderkey
stored statistics:
table: rows=6.00M size=193.98MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=2.14M
mem-estimate=80.00MB mem-reservation=4.00MB thread-reservation=0
tuple-ids=0 row-size=8B cardinality=6.00M
in pipelines: 00(GETNEXT)
====
# High NDV aggregation - should use default buffer size.
select distinct *
from tpch_parquet.lineitem
---- DISTRIBUTEDPLAN
Max Per-Host Resource Reservation: Memory=112.00MB Threads=4
Per-Host Resource Estimates: Memory=1012MB
Analyzed query: SELECT DISTINCT * FROM tpch_parquet.lineitem
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=110.69MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: tpch_parquet.lineitem.l_orderkey, tpch_parquet.lineitem.l_partkey, tpch_parquet.lineitem.l_suppkey, tpch_parquet.lineitem.l_linenumber, tpch_parquet.lineitem.l_quantity, tpch_parquet.lineitem.l_extendedprice, tpch_parquet.lineitem.l_discount, tpch_parquet.lineitem.l_tax, tpch_parquet.lineitem.l_returnflag, tpch_parquet.lineitem.l_linestatus, tpch_parquet.lineitem.l_shipdate, tpch_parquet.lineitem.l_commitdate, tpch_parquet.lineitem.l_receiptdate, tpch_parquet.lineitem.l_shipinstruct, tpch_parquet.lineitem.l_shipmode, tpch_parquet.lineitem.l_comment
| mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=10.69MB mem-reservation=0B thread-reservation=0
| tuple-ids=1 row-size=231B cardinality=6.00M
| in pipelines: 03(GETNEXT)
|
F01:PLAN FRAGMENT [HASH(tpch_parquet.lineitem.l_orderkey,tpch_parquet.lineitem.l_partkey,tpch_parquet.lineitem.l_suppkey,tpch_parquet.lineitem.l_linenumber,tpch_parquet.lineitem.l_quantity,tpch_parquet.lineitem.l_extendedprice,tpch_parquet.lineitem.l_discount,tpch_parquet.lineitem.l_tax,tpch_parquet.lineitem.l_returnflag,tpch_parquet.lineitem.l_linestatus,tpch_parquet.lineitem.l_shipdate,tpch_parquet.lineitem.l_commitdate,tpch_parquet.lineitem.l_receiptdate,tpch_parquet.lineitem.l_shipinstruct,tpch_parquet.lineitem.l_shipmode,tpch_parquet.lineitem.l_comment)] hosts=3 instances=3
Per-Host Resources: mem-estimate=473.84MB mem-reservation=34.00MB thread-reservation=1
03:AGGREGATE [FINALIZE]
| group by: tpch_parquet.lineitem.l_orderkey, tpch_parquet.lineitem.l_partkey, tpch_parquet.lineitem.l_suppkey, tpch_parquet.lineitem.l_linenumber, tpch_parquet.lineitem.l_quantity, tpch_parquet.lineitem.l_extendedprice, tpch_parquet.lineitem.l_discount, tpch_parquet.lineitem.l_tax, tpch_parquet.lineitem.l_returnflag, tpch_parquet.lineitem.l_linestatus, tpch_parquet.lineitem.l_shipdate, tpch_parquet.lineitem.l_commitdate, tpch_parquet.lineitem.l_receiptdate, tpch_parquet.lineitem.l_shipinstruct, tpch_parquet.lineitem.l_shipmode, tpch_parquet.lineitem.l_comment
| mem-estimate=463.16MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=1 row-size=231B cardinality=6.00M
| in pipelines: 03(GETNEXT), 00(OPEN)
|
02:EXCHANGE [HASH(tpch_parquet.lineitem.l_orderkey,tpch_parquet.lineitem.l_partkey,tpch_parquet.lineitem.l_suppkey,tpch_parquet.lineitem.l_linenumber,tpch_parquet.lineitem.l_quantity,tpch_parquet.lineitem.l_extendedprice,tpch_parquet.lineitem.l_discount,tpch_parquet.lineitem.l_tax,tpch_parquet.lineitem.l_returnflag,tpch_parquet.lineitem.l_linestatus,tpch_parquet.lineitem.l_shipdate,tpch_parquet.lineitem.l_commitdate,tpch_parquet.lineitem.l_receiptdate,tpch_parquet.lineitem.l_shipinstruct,tpch_parquet.lineitem.l_shipmode,tpch_parquet.lineitem.l_comment)]
| mem-estimate=10.69MB mem-reservation=0B thread-reservation=0
| tuple-ids=1 row-size=231B cardinality=6.00M
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Resources: mem-estimate=427.37MB mem-reservation=74.00MB thread-reservation=2
01:AGGREGATE [STREAMING]
| group by: tpch_parquet.lineitem.l_orderkey, tpch_parquet.lineitem.l_partkey, tpch_parquet.lineitem.l_suppkey, tpch_parquet.lineitem.l_linenumber, tpch_parquet.lineitem.l_quantity, tpch_parquet.lineitem.l_extendedprice, tpch_parquet.lineitem.l_discount, tpch_parquet.lineitem.l_tax, tpch_parquet.lineitem.l_returnflag, tpch_parquet.lineitem.l_linestatus, tpch_parquet.lineitem.l_shipdate, tpch_parquet.lineitem.l_commitdate, tpch_parquet.lineitem.l_receiptdate, tpch_parquet.lineitem.l_shipinstruct, tpch_parquet.lineitem.l_shipmode, tpch_parquet.lineitem.l_comment
| mem-estimate=347.37MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=1 row-size=231B cardinality=6.00M
| in pipelines: 00(GETNEXT)
|
00:SCAN HDFS [tpch_parquet.lineitem, RANDOM]
HDFS partitions=1/1 files=3 size=193.98MB
stored statistics:
table: rows=6.00M size=193.98MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=2.14M
mem-estimate=80.00MB mem-reservation=40.00MB thread-reservation=1
tuple-ids=0 row-size=231B cardinality=6.00M
in pipelines: 00(GETNEXT)
---- PARALLELPLANS
Max Per-Host Resource Reservation: Memory=112.00MB Threads=3
Per-Host Resource Estimates: Memory=1012MB
Analyzed query: SELECT DISTINCT * FROM tpch_parquet.lineitem
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=110.69MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: tpch_parquet.lineitem.l_orderkey, tpch_parquet.lineitem.l_partkey, tpch_parquet.lineitem.l_suppkey, tpch_parquet.lineitem.l_linenumber, tpch_parquet.lineitem.l_quantity, tpch_parquet.lineitem.l_extendedprice, tpch_parquet.lineitem.l_discount, tpch_parquet.lineitem.l_tax, tpch_parquet.lineitem.l_returnflag, tpch_parquet.lineitem.l_linestatus, tpch_parquet.lineitem.l_shipdate, tpch_parquet.lineitem.l_commitdate, tpch_parquet.lineitem.l_receiptdate, tpch_parquet.lineitem.l_shipinstruct, tpch_parquet.lineitem.l_shipmode, tpch_parquet.lineitem.l_comment
| mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=10.69MB mem-reservation=0B thread-reservation=0
| tuple-ids=1 row-size=231B cardinality=6.00M
| in pipelines: 03(GETNEXT)
|
F01:PLAN FRAGMENT [HASH(tpch_parquet.lineitem.l_orderkey,tpch_parquet.lineitem.l_partkey,tpch_parquet.lineitem.l_suppkey,tpch_parquet.lineitem.l_linenumber,tpch_parquet.lineitem.l_quantity,tpch_parquet.lineitem.l_extendedprice,tpch_parquet.lineitem.l_discount,tpch_parquet.lineitem.l_tax,tpch_parquet.lineitem.l_returnflag,tpch_parquet.lineitem.l_linestatus,tpch_parquet.lineitem.l_shipdate,tpch_parquet.lineitem.l_commitdate,tpch_parquet.lineitem.l_receiptdate,tpch_parquet.lineitem.l_shipinstruct,tpch_parquet.lineitem.l_shipmode,tpch_parquet.lineitem.l_comment)] hosts=3 instances=3
Per-Instance Resources: mem-estimate=473.84MB mem-reservation=34.00MB thread-reservation=1
03:AGGREGATE [FINALIZE]
| group by: tpch_parquet.lineitem.l_orderkey, tpch_parquet.lineitem.l_partkey, tpch_parquet.lineitem.l_suppkey, tpch_parquet.lineitem.l_linenumber, tpch_parquet.lineitem.l_quantity, tpch_parquet.lineitem.l_extendedprice, tpch_parquet.lineitem.l_discount, tpch_parquet.lineitem.l_tax, tpch_parquet.lineitem.l_returnflag, tpch_parquet.lineitem.l_linestatus, tpch_parquet.lineitem.l_shipdate, tpch_parquet.lineitem.l_commitdate, tpch_parquet.lineitem.l_receiptdate, tpch_parquet.lineitem.l_shipinstruct, tpch_parquet.lineitem.l_shipmode, tpch_parquet.lineitem.l_comment
| mem-estimate=463.16MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=1 row-size=231B cardinality=6.00M
| in pipelines: 03(GETNEXT), 00(OPEN)
|
02:EXCHANGE [HASH(tpch_parquet.lineitem.l_orderkey,tpch_parquet.lineitem.l_partkey,tpch_parquet.lineitem.l_suppkey,tpch_parquet.lineitem.l_linenumber,tpch_parquet.lineitem.l_quantity,tpch_parquet.lineitem.l_extendedprice,tpch_parquet.lineitem.l_discount,tpch_parquet.lineitem.l_tax,tpch_parquet.lineitem.l_returnflag,tpch_parquet.lineitem.l_linestatus,tpch_parquet.lineitem.l_shipdate,tpch_parquet.lineitem.l_commitdate,tpch_parquet.lineitem.l_receiptdate,tpch_parquet.lineitem.l_shipinstruct,tpch_parquet.lineitem.l_shipmode,tpch_parquet.lineitem.l_comment)]
| mem-estimate=10.69MB mem-reservation=0B thread-reservation=0
| tuple-ids=1 row-size=231B cardinality=6.00M
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Instance Resources: mem-estimate=427.37MB mem-reservation=74.00MB thread-reservation=1
01:AGGREGATE [STREAMING]
| group by: tpch_parquet.lineitem.l_orderkey, tpch_parquet.lineitem.l_partkey, tpch_parquet.lineitem.l_suppkey, tpch_parquet.lineitem.l_linenumber, tpch_parquet.lineitem.l_quantity, tpch_parquet.lineitem.l_extendedprice, tpch_parquet.lineitem.l_discount, tpch_parquet.lineitem.l_tax, tpch_parquet.lineitem.l_returnflag, tpch_parquet.lineitem.l_linestatus, tpch_parquet.lineitem.l_shipdate, tpch_parquet.lineitem.l_commitdate, tpch_parquet.lineitem.l_receiptdate, tpch_parquet.lineitem.l_shipinstruct, tpch_parquet.lineitem.l_shipmode, tpch_parquet.lineitem.l_comment
| mem-estimate=347.37MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=1 row-size=231B cardinality=6.00M
| in pipelines: 00(GETNEXT)
|
00:SCAN HDFS [tpch_parquet.lineitem, RANDOM]
HDFS partitions=1/1 files=3 size=193.98MB
stored statistics:
table: rows=6.00M size=193.98MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=2.14M
mem-estimate=80.00MB mem-reservation=40.00MB thread-reservation=0
tuple-ids=0 row-size=231B cardinality=6.00M
in pipelines: 00(GETNEXT)
====
# Aggregation with unknown input - should use default buffer size.
select string_col, count(*)
from functional_parquet.alltypestiny
group by string_col
---- DISTRIBUTEDPLAN
Max Per-Host Resource Reservation: Memory=72.01MB Threads=4
Per-Host Resource Estimates: Memory=282MB
WARNING: The following tables are missing relevant table and/or column statistics.
functional_parquet.alltypestiny
Analyzed query: SELECT string_col, count(*) FROM functional_parquet.alltypestiny
GROUP BY string_col
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=10.07MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: string_col, count(*)
| mem-estimate=10.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=71.99KB mem-reservation=0B thread-reservation=0
| tuple-ids=1 row-size=20B cardinality=unavailable
| in pipelines: 03(GETNEXT)
|
F01:PLAN FRAGMENT [HASH(string_col)] hosts=3 instances=3
Per-Host Resources: mem-estimate=128.07MB mem-reservation=34.00MB thread-reservation=1
03:AGGREGATE [FINALIZE]
| output: count:merge(*)
| group by: string_col
| mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=1 row-size=20B cardinality=unavailable
| in pipelines: 03(GETNEXT), 00(OPEN)
|
02:EXCHANGE [HASH(string_col)]
| mem-estimate=71.99KB mem-reservation=0B thread-reservation=0
| tuple-ids=1 row-size=20B cardinality=unavailable
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Resources: mem-estimate=144.00MB mem-reservation=34.01MB thread-reservation=2
01:AGGREGATE [STREAMING]
| output: count(*)
| group by: string_col
| mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=1 row-size=20B cardinality=unavailable
| in pipelines: 00(GETNEXT)
|
00:SCAN HDFS [functional_parquet.alltypestiny, RANDOM]
HDFS partitions=4/4 files=4 size=11.92KB
stored statistics:
table: rows=unavailable size=unavailable
partitions: 0/4 rows=unavailable
columns: unavailable
extrapolated-rows=disabled max-scan-range-rows=unavailable
mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1
tuple-ids=0 row-size=12B cardinality=unavailable
in pipelines: 00(GETNEXT)
---- PARALLELPLANS
Max Per-Host Resource Reservation: Memory=140.02MB Threads=5
Per-Host Resource Estimates: Memory=554MB
WARNING: The following tables are missing relevant table and/or column statistics.
functional_parquet.alltypestiny
Analyzed query: SELECT string_col, count(*) FROM functional_parquet.alltypestiny
GROUP BY string_col
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=10.09MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: string_col, count(*)
| mem-estimate=10.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=95.99KB mem-reservation=0B thread-reservation=0
| tuple-ids=1 row-size=20B cardinality=unavailable
| in pipelines: 03(GETNEXT)
|
F01:PLAN FRAGMENT [HASH(string_col)] hosts=3 instances=4
Per-Instance Resources: mem-estimate=128.09MB mem-reservation=34.00MB thread-reservation=1
03:AGGREGATE [FINALIZE]
| output: count:merge(*)
| group by: string_col
| mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=1 row-size=20B cardinality=unavailable
| in pipelines: 03(GETNEXT), 00(OPEN)
|
02:EXCHANGE [HASH(string_col)]
| mem-estimate=95.99KB mem-reservation=0B thread-reservation=0
| tuple-ids=1 row-size=20B cardinality=unavailable
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=4
Per-Instance Resources: mem-estimate=144.00MB mem-reservation=34.01MB thread-reservation=1
01:AGGREGATE [STREAMING]
| output: count(*)
| group by: string_col
| mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=1 row-size=20B cardinality=unavailable
| in pipelines: 00(GETNEXT)
|
00:SCAN HDFS [functional_parquet.alltypestiny, RANDOM]
HDFS partitions=4/4 files=4 size=11.92KB
stored statistics:
table: rows=unavailable size=unavailable
partitions: 0/4 rows=unavailable
columns: unavailable
extrapolated-rows=disabled max-scan-range-rows=unavailable
mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=0
tuple-ids=0 row-size=12B cardinality=unavailable
in pipelines: 00(GETNEXT)
====