IMPALA-14006: Bound max_instances in CreateInputCollocatedInstances

IMPALA-11604 (part 2) changed how many instances
Scheduler::CreateInputCollocatedInstances creates. This works when the
left child fragment of a parent fragment is distributed across nodes.
However, if the left child fragment is limited to a single node (the
case of an UNPARTITIONED fragment), the scheduler might over-parallelize
the parent fragment by scheduling too many instances on that single
node.

This patch attempts to mitigate the issue in two ways. First, it adds
bounding logic in PlanFragment.traverseEffectiveParallelism() to lower
parallelism further if the left (probe) side of the child fragment is
not well distributed across nodes.

Second, it adds TQueryExecRequest.max_parallelism_per_node to relay
information from Analyzer.getMaxParallelismPerNode() to the scheduler.
With this information, the scheduler can do additional sanity checks to
prevent Scheduler::CreateInputCollocatedInstances from
over-parallelizing a fragment. Note that this sanity check can also cap
the MAX_FS_WRITERS option under a similar scenario.
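
The scheduler-side sanity check described above can be pictured roughly
as follows. This is a minimal sketch, not the code in scheduler.cc: the
function BoundInstancesPerNode, the constant kUnsetMaxParallelism, and
the use of a negative sentinel to stand in for an unset optional Thrift
field are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sentinel standing in for "max_parallelism_per_node is unset".
// The real field is an optional i32 in TQueryExecRequest.
constexpr int32_t kUnsetMaxParallelism = -1;

// Hypothetical helper (not an Impala function): caps the number of fragment
// instances that would be placed on one node.
//   requested_instances      - instances the scheduler would otherwise create
//   max_parallelism_per_node - cap relayed from the planner, or the sentinel
int BoundInstancesPerNode(int requested_instances,
                          int32_t max_parallelism_per_node) {
  assert(requested_instances >= 1);
  // Without a positive cap there is nothing to enforce.
  if (max_parallelism_per_node <= 0) return requested_instances;
  return std::min<int>(requested_instances, max_parallelism_per_node);
}
```

Under these assumptions, a request for 16 instances on one node with a
cap of 4 is reduced to 4, while a request already below the cap, or one
made with no cap set, is left unchanged.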

Added a ScalingVerdict enum and TRACE-level logging of it to show the
scaling decision steps.

Testing:
- Add a planner test and an e2e test that exercise the corner case under
  the COMPUTE_PROCESSING_COST=1 option.
- Manually comment out the bounding logic in
  traverseEffectiveParallelism() and confirm that the scheduler's sanity
  check still enforces the bound.

Change-Id: I65223b820c9fd6e4267d57297b1466d4e56829b3
Reviewed-on: http://gerrit.cloudera.org:8080/22840
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Riza Suminto
2025-04-28 12:24:30 -07:00
committed by Impala Public Jenkins
parent c0c6cc9df4
commit 3210ec58c5
8 changed files with 250 additions and 35 deletions

@@ -1118,5 +1118,9 @@ struct TQueryExecRequest {
// The unbounded version of cores_required. Used by Frontend to do executor group-set
// assignment for the query. Should either be unset or set with positive value.
18: optional i32 cores_required_unbounded
// Propagated value from Analyzer.getMaxParallelismPerNode().
// Used by scheduler.cc as sanity check during scheduling.
19: optional i32 max_parallelism_per_node
}