IMPALA-13902: Calcite planner: Implement is_spool_query_results

The is_spool_query_results query option is now supported in Calcite. The
returnAtMostOneRow method is now implemented to support this.
PlanRootSink is refactored: query option sanitization is extracted from
PlanRootSink.computeResourceProfile() into a new method,
sanitizeSpoolingOptions(), and the bulk of the memory bounding
calculation is extracted into a new class, SpoolingMemoryBound.
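
For illustration only, a minimal Java sketch of the at-most-one-row idea.
It assumes, based on the plan diffs and the calcite.test cases in this
change, that spooling is turned off when a query can return at most one
row. The class SpoolingSketch, its Options type, and the boolean parameter
are hypothetical; only the method name sanitizeSpoolingOptions comes from
this change, and its real signature is not shown here.

```java
// Hypothetical sketch: names other than sanitizeSpoolingOptions are
// illustrative and are not taken from the Impala code base.
public final class SpoolingSketch {

  /** Stand-in for the query options relevant to result spooling. */
  public static final class Options {
    public final boolean spoolQueryResults;

    public Options(boolean spoolQueryResults) {
      this.spoolQueryResults = spoolQueryResults;
    }
  }

  /**
   * Returns new options in which spooling stays enabled only if it was
   * requested and the plan may return more than one row (assumed rule).
   */
  public static Options sanitizeSpoolingOptions(
      Options requested, boolean returnsAtMostOneRow) {
    boolean spool = requested.spoolQueryResults && !returnsAtMostOneRow;
    return new Options(spool);
  }

  public static void main(String[] args) {
    Options requested = new Options(true);
    // A single-row aggregate such as count(*): spooling is dropped.
    System.out.println(
        sanitizeSpoolingOptions(requested, true).spoolQueryResults);
    // A query that may return many rows: spooling is kept.
    System.out.println(
        sanitizeSpoolingOptions(requested, false).spoolQueryResults);
  }
}
```

Under this assumption a sink without spooling needs no spill buffer, which
would be consistent with the PLAN-ROOT SINK lines in the updated plans
showing mem-reservation=0B.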

Added "sleep" to ImpalaOperatorTable.java since some EE tests related to
result spooling call the sleep() function. Changed ImpalaPlanRel to
extend the RelNode interface.

A sanity test has been added to calcite.test, but the bulk of the
testing will be done through the Impala test framework when it is
enabled.

Testing:
- Pass FE tests PlannerTest#testResultSpooling, TpcdsCpuCostPlannerTest,
  and all java tests under calcite-planner project.
- Pass query_test/test_result_spooling.py and
  custom_cluster/test_result_spooling.py.

Co-authored-by: Riza Suminto

Change-Id: I5b9bf49e2874ee12de212b892bd898c296774c6f
Reviewed-on: http://gerrit.cloudera.org:8080/23562
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Steve Carlin
2025-08-29 08:18:02 -07:00
committed by Impala Public Jenkins
parent 898e03e9d5
commit bc99705252
14 changed files with 327 additions and 168 deletions

@@ -55,10 +55,10 @@ Max Per-Host Resource Reservation: Memory=72.52MB Threads=1
Per-Host Resource Estimates: Memory=269MB
F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=268.97MB mem-reservation=72.52MB thread-reservation=1 runtime-filters-memory=6.00MB
-| max-parallelism=1 segment-costs=[1440342244, 4]
+| max-parallelism=1 segment-costs=[1440342244, 0]
PLAN-ROOT SINK
| output exprs: avg(tpcds_partitioned_parquet_snap.store_sales.ss_quantity), avg(tpcds_partitioned_parquet_snap.store_sales.ss_ext_sales_price), avg(tpcds_partitioned_parquet_snap.store_sales.ss_ext_wholesale_cost), sum(tpcds_partitioned_parquet_snap.store_sales.ss_ext_wholesale_cost)
-| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 cost=4
+| mem-estimate=0B mem-reservation=0B thread-reservation=0 cost=0
|
11:AGGREGATE [FINALIZE]
| output: avg(CAST(tpcds_partitioned_parquet_snap.store_sales.ss_quantity AS BIGINT)), avg(tpcds_partitioned_parquet_snap.store_sales.ss_ext_sales_price), avg(tpcds_partitioned_parquet_snap.store_sales.ss_ext_wholesale_cost), sum(tpcds_partitioned_parquet_snap.store_sales.ss_ext_wholesale_cost)
@@ -184,14 +184,14 @@ PLAN-ROOT SINK
tuple-ids=0 row-size=40B cardinality=176.68M(filtered from 863.99M) cost=1216284982
in pipelines: 00(GETNEXT)
---- DISTRIBUTEDPLAN
-Max Per-Host Resource Reservation: Memory=404.45MB Threads=24
-Per-Host Resource Estimates: Memory=677MB
+Max Per-Host Resource Reservation: Memory=400.45MB Threads=24
+Per-Host Resource Estimates: Memory=673MB
F07:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
-| Per-Instance Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1
-| max-parallelism=1 segment-costs=[68, 4] cpu-comparison-result=130 [max(1 (self) vs 130 (sum children))]
+| Per-Instance Resources: mem-estimate=68.03KB mem-reservation=0B thread-reservation=1
+| max-parallelism=1 segment-costs=[68, 0] cpu-comparison-result=130 [max(1 (self) vs 130 (sum children))]
PLAN-ROOT SINK
| output exprs: avg(tpcds_partitioned_parquet_snap.store_sales.ss_quantity), avg(tpcds_partitioned_parquet_snap.store_sales.ss_ext_sales_price), avg(tpcds_partitioned_parquet_snap.store_sales.ss_ext_wholesale_cost), sum(tpcds_partitioned_parquet_snap.store_sales.ss_ext_wholesale_cost)
-| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 cost=4
+| mem-estimate=0B mem-reservation=0B thread-reservation=0 cost=0
|
19:AGGREGATE [FINALIZE]
| output: avg:merge(tpcds_partitioned_parquet_snap.store_sales.ss_quantity), avg:merge(tpcds_partitioned_parquet_snap.store_sales.ss_ext_sales_price), avg:merge(tpcds_partitioned_parquet_snap.store_sales.ss_ext_wholesale_cost), sum:merge(tpcds_partitioned_parquet_snap.store_sales.ss_ext_wholesale_cost)
@@ -425,14 +425,14 @@ max-parallelism=150 segment-costs=[1456086510]
tuple-ids=0 row-size=40B cardinality=176.68M(filtered from 863.99M) cost=1216284982
in pipelines: 00(GETNEXT)
---- PARALLELPLANS
-Max Per-Host Resource Reservation: Memory=404.45MB Threads=24
-Per-Host Resource Estimates: Memory=677MB
+Max Per-Host Resource Reservation: Memory=400.45MB Threads=24
+Per-Host Resource Estimates: Memory=673MB
F07:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
-| Per-Instance Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1
-| max-parallelism=1 segment-costs=[68, 4] cpu-comparison-result=130 [max(1 (self) vs 130 (sum children))]
+| Per-Instance Resources: mem-estimate=68.03KB mem-reservation=0B thread-reservation=1
+| max-parallelism=1 segment-costs=[68, 0] cpu-comparison-result=130 [max(1 (self) vs 130 (sum children))]
PLAN-ROOT SINK
| output exprs: avg(tpcds_partitioned_parquet_snap.store_sales.ss_quantity), avg(tpcds_partitioned_parquet_snap.store_sales.ss_ext_sales_price), avg(tpcds_partitioned_parquet_snap.store_sales.ss_ext_wholesale_cost), sum(tpcds_partitioned_parquet_snap.store_sales.ss_ext_wholesale_cost)
-| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 cost=4
+| mem-estimate=0B mem-reservation=0B thread-reservation=0 cost=0
|
19:AGGREGATE [FINALIZE]
| output: avg:merge(tpcds_partitioned_parquet_snap.store_sales.ss_quantity), avg:merge(tpcds_partitioned_parquet_snap.store_sales.ss_ext_sales_price), avg:merge(tpcds_partitioned_parquet_snap.store_sales.ss_ext_wholesale_cost), sum:merge(tpcds_partitioned_parquet_snap.store_sales.ss_ext_wholesale_cost)

@@ -71,10 +71,10 @@ Max Per-Host Resource Reservation: Memory=69.52MB Threads=1
Per-Host Resource Estimates: Memory=250MB
F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=250.04MB mem-reservation=69.52MB thread-reservation=1 runtime-filters-memory=5.00MB
-| max-parallelism=1 segment-costs=[1126444413, 1]
+| max-parallelism=1 segment-costs=[1126444413, 0]
PLAN-ROOT SINK
| output exprs: sum(tpcds_partitioned_parquet_snap.store_sales.ss_quantity)
-| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 cost=1
+| mem-estimate=0B mem-reservation=0B thread-reservation=0 cost=0
|
09:AGGREGATE [FINALIZE]
| output: sum(CAST(tpcds_partitioned_parquet_snap.store_sales.ss_quantity AS BIGINT))
@@ -179,14 +179,14 @@ PLAN-ROOT SINK
tuple-ids=0 row-size=28B cardinality=176.68M(filtered from 863.99M) cost=910976956
in pipelines: 00(GETNEXT)
---- DISTRIBUTEDPLAN
-Max Per-Host Resource Reservation: Memory=379.14MB Threads=22
-Per-Host Resource Estimates: Memory=622MB
+Max Per-Host Resource Reservation: Memory=375.14MB Threads=22
+Per-Host Resource Estimates: Memory=618MB
F06:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
-| Per-Instance Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1
-| max-parallelism=1 segment-costs=[25, 1] cpu-comparison-result=130 [max(1 (self) vs 130 (sum children))]
+| Per-Instance Resources: mem-estimate=32.00KB mem-reservation=0B thread-reservation=1
+| max-parallelism=1 segment-costs=[25, 0] cpu-comparison-result=130 [max(1 (self) vs 130 (sum children))]
PLAN-ROOT SINK
| output exprs: sum(tpcds_partitioned_parquet_snap.store_sales.ss_quantity)
-| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 cost=1
+| mem-estimate=0B mem-reservation=0B thread-reservation=0 cost=0
|
16:AGGREGATE [FINALIZE]
| output: sum:merge(tpcds_partitioned_parquet_snap.store_sales.ss_quantity)
@@ -382,14 +382,14 @@ max-parallelism=130 segment-costs=[1221102531]
tuple-ids=0 row-size=28B cardinality=176.68M(filtered from 863.99M) cost=910976956
in pipelines: 00(GETNEXT)
---- PARALLELPLANS
-Max Per-Host Resource Reservation: Memory=379.14MB Threads=22
-Per-Host Resource Estimates: Memory=622MB
+Max Per-Host Resource Reservation: Memory=375.14MB Threads=22
+Per-Host Resource Estimates: Memory=618MB
F06:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
-| Per-Instance Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1
-| max-parallelism=1 segment-costs=[25, 1] cpu-comparison-result=130 [max(1 (self) vs 130 (sum children))]
+| Per-Instance Resources: mem-estimate=32.00KB mem-reservation=0B thread-reservation=1
+| max-parallelism=1 segment-costs=[25, 0] cpu-comparison-result=130 [max(1 (self) vs 130 (sum children))]
PLAN-ROOT SINK
| output exprs: sum(tpcds_partitioned_parquet_snap.store_sales.ss_quantity)
-| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 cost=1
+| mem-estimate=0B mem-reservation=0B thread-reservation=0 cost=0
|
16:AGGREGATE [FINALIZE]
| output: sum:merge(tpcds_partitioned_parquet_snap.store_sales.ss_quantity)

@@ -27,10 +27,10 @@ Max Per-Host Resource Reservation: Memory=192.94MB Threads=1
Per-Host Resource Estimates: Memory=52.18GB
F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=52.18GB mem-reservation=192.94MB thread-reservation=1 runtime-filters-memory=51.00MB
-| max-parallelism=1 segment-costs=[54918503750, 27478414018, 72455093642, 13789590829, 19351054919, 22241751, 1]
+| max-parallelism=1 segment-costs=[54918503750, 27478414018, 72455093642, 13789590829, 19351054919, 22241751, 0]
PLAN-ROOT SINK
| output exprs: count()
-| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 cost=1
+| mem-estimate=0B mem-reservation=0B thread-reservation=0 cost=0
|
22:AGGREGATE [FINALIZE]
| output: count()
@@ -238,11 +238,11 @@ PLAN-ROOT SINK
Max Per-Host Resource Reservation: Memory=4.62GB Threads=85
Per-Host Resource Estimates: Memory=59.85GB
F16:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
-| Per-Instance Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1
-| max-parallelism=1 segment-costs=[25, 1] cpu-comparison-result=360 [max(1 (self) vs 360 (sum children))]
+| Per-Instance Resources: mem-estimate=184.84KB mem-reservation=0B thread-reservation=1
+| max-parallelism=1 segment-costs=[25, 0] cpu-comparison-result=360 [max(1 (self) vs 360 (sum children))]
PLAN-ROOT SINK
| output exprs: count()
-| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 cost=1
+| mem-estimate=0B mem-reservation=0B thread-reservation=0 cost=0
|
40:AGGREGATE [FINALIZE]
| output: count:merge()
@@ -641,11 +641,11 @@ max-parallelism=1824 segment-costs=[54881581784, 24500370827] cpu-comparison-res
Max Per-Host Resource Reservation: Memory=4.62GB Threads=85
Per-Host Resource Estimates: Memory=59.85GB
F16:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
-| Per-Instance Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1
-| max-parallelism=1 segment-costs=[25, 1] cpu-comparison-result=360 [max(1 (self) vs 360 (sum children))]
+| Per-Instance Resources: mem-estimate=184.84KB mem-reservation=0B thread-reservation=1
+| max-parallelism=1 segment-costs=[25, 0] cpu-comparison-result=360 [max(1 (self) vs 360 (sum children))]
PLAN-ROOT SINK
| output exprs: count()
-| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 cost=1
+| mem-estimate=0B mem-reservation=0B thread-reservation=0 cost=0
|
40:AGGREGATE [FINALIZE]
| output: count:merge()

@@ -71,7 +71,7 @@ Per-Instance Resources: mem-estimate=16.00MB mem-reservation=32.00KB thread-rese
in pipelines: 00(GETNEXT)
====
# Validate that the maximum memory reservation for PLAN-ROOT SINK is bounded by
-# MAX_PINNED_RESULT_SPOOLING_MEMORY.
+# MAX_SPILLED_RESULT_SPOOLING_MEM.
select * from tpch.lineitem order by l_orderkey
---- DISTRIBUTEDPLAN
Max Per-Host Resource Reservation: Memory=24.00MB Threads=3

@@ -1128,3 +1128,13 @@ NaN,Inf
---- TYPES
DOUBLE, FLOAT
====
+---- QUERY
+select count(*) from functional.alltypestiny;
+---- RUNTIME_PROFILE
+row_regex: .*SPOOL_QUERY_RESULTS=0.*
+====
+---- QUERY
+select * from (values(0));
+---- RUNTIME_PROFILE
+row_regex: .*SPOOL_QUERY_RESULTS=0.*
+====
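
As a rough illustration of what these two test cases check, here is a
standalone Java sketch. It assumes that a row_regex entry passes when at
least one line of the runtime profile fully matches the pattern. The class
name, the helper method, and the sample profile text are made up for the
example and are not part of the Impala test framework.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a row_regex style check; not Impala code.
public final class RowRegexSketch {

  /** Returns true if at least one line of the profile fully matches. */
  public static boolean anyLineMatches(String profile, String regex) {
    Pattern pattern = Pattern.compile(regex);
    for (String line : profile.split("\n")) {
      if (pattern.matcher(line).matches()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Invented profile excerpt, used only to exercise the helper.
    String profile =
        "Query Options (set by planner): SPOOL_QUERY_RESULTS=0\n"
        + "Query State: FINISHED";
    System.out.println(
        anyLineMatches(profile, ".*SPOOL_QUERY_RESULTS=0.*"));
  }
}
```

The leading and trailing `.*` in the pattern matter under this reading:
with a full-line match, the option string may appear anywhere in the line.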