Before this change, the per-host reservation size was computed
by the Planner. However, scheduling happens after planning,
so the Planner must assume that all fragments run on all
hosts, and the reservation size is likely much larger than
it needs to be.
This moves the computation of the per-host reservation size
to the BE where it can be computed more precisely. This also
includes a number of plan/profile changes.
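The difference can be sketched as follows (invented names, not
Impala's actual classes): before, the Planner had to sum over every
fragment; after, the backend sums only the fragment instances that
scheduling actually placed on each host.

    import java.util.*;

    // Illustrative sketch only; names do not match Impala's classes.
    class ReservationSketch {
      static long plannerEstimate(Map<String, Long> fragmentReservations) {
        // Before scheduling, the Planner must assume every fragment can
        // run on every host, so the per-host value sums all fragments.
        long perHost = 0;
        for (long bytes : fragmentReservations.values()) perHost += bytes;
        return perHost;
      }

      static long backendPerHost(Map<String, Long> fragmentReservations,
                                 List<String> fragmentsOnThisHost) {
        // After scheduling, the backend knows which fragment instances
        // were placed on this host and can sum only those.
        long perHost = 0;
        for (String fid : fragmentsOnThisHost) perHost += fragmentReservations.get(fid);
        return perHost;
      }

      public static void main(String[] args) {
        Map<String, Long> res =
            Map.of("scan", 8L << 20, "agg", 32L << 20, "join", 64L << 20);
        System.out.println(plannerEstimate(res));  // 104MB: all fragments, all hosts
        System.out.println(backendPerHost(res, List.of("scan", "agg")));  // 40MB
      }
    }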
Change-Id: Idbcd1e9b1be14edc4017b4907e83f9d56059fbac
Reviewed-on: http://gerrit.cloudera.org:8080/7630
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This moves away from the PipelinedPlanNodeSet approach of enumerating
sets of concurrently-executing nodes because unions would force
creating many overlapping sets of nodes. The new approach computes
the peak resources during Open() and the peak resources between Open()
and Close() (i.e. while calling GetNext()) bottom-up for each plan node
in a fragment. The fragment resources are then combined to produce the
query resources.
The basic assumptions for the new resource estimates are:
* resources are acquired during or after the first call to Open()
and released in Close().
* Blocking nodes call Open() on their child before acquiring
their own resources (this required some backend changes).
* Blocking nodes call Close() on their children before returning
from Open().
* The peak resource consumption of the query is the sum of the
independent fragments (except for the parallel join build plans
where we can assume there will be synchronisation). This is
conservative but we don't synchronise fragment Open() and Close()
across exchanges, so we can't make stronger assumptions in general.
Also compute the sum of minimum reservations. This will be useful
in the backend to determine exactly when all of the initial
reservations have been claimed from a shared pool.
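To make the bottom-up computation concrete, here is a much-simplified
sketch of the recursion under the assumptions above. Each node has a
single 'resources' number (the real planner tracks memory and
reservation separately) and all class and method names are invented
for illustration, not the actual planner API.

    import java.util.*;

    class PlanNodeSketch {
      long resources;    // peak resources this node itself consumes
      boolean blocking;  // e.g. sort/agg: drains its child inside Open()
      List<PlanNodeSketch> children = new ArrayList<>();

      PlanNodeSketch(long resources, boolean blocking, PlanNodeSketch... kids) {
        this.resources = resources;
        this.blocking = blocking;
        children.addAll(Arrays.asList(kids));
      }

      // Peak consumption of this subtree while Open() is running. In this
      // simplification a streaming node's own resources are attributed
      // entirely to the GetNext() phase.
      long peakDuringOpen() {
        long peak = 0;
        for (PlanNodeSketch c : children) peak = Math.max(peak, c.peakDuringOpen());
        if (blocking) {
          // A blocking node opens its children before acquiring its own
          // resources, then drains and closes them inside Open(), so its
          // own consumption overlaps the children's GetNext() phase.
          long childrenAlive = 0;
          for (PlanNodeSketch c : children) childrenAlive += c.peakBetweenOpenAndClose();
          peak = Math.max(peak, resources + childrenAlive);
        }
        return peak;
      }

      // Peak consumption between Open() returning and Close().
      long peakBetweenOpenAndClose() {
        long total = resources;
        if (!blocking) {
          // Streaming children stay open while this node produces rows;
          // a blocking node's children were already closed in Open().
          for (PlanNodeSketch c : children) total += c.peakBetweenOpenAndClose();
        }
        return total;
      }

      // Conservative query-wide estimate: sum the per-fragment peaks,
      // since fragment Open()/Close() are not synchronised across
      // exchanges.
      static long queryPeak(List<PlanNodeSketch> fragmentRoots) {
        long total = 0;
        for (PlanNodeSketch root : fragmentRoots) {
          total += Math.max(root.peakDuringOpen(), root.peakBetweenOpenAndClose());
        }
        return total;
      }
    }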
Testing:
* Updated planner tests to reflect behavioural changes.
* Added extra resource requirement planner tests for unions, subplans,
pipelines of blocking operators, and bushy join plans.
* Added single-node plans to resource-requirements tests. These have
more complex plan trees inside a single fragment, which is useful
for testing the peak resource requirement logic.
Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Reviewed-on: http://gerrit.cloudera.org:8080/7223
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This is similar to the single-node execution optimisation, but applies
to slightly larger queries that should run in a distributed manner but
won't benefit from codegen.
This adds a new query option disable_codegen_rows_threshold that
defaults to 50,000. If fewer than this number of rows are processed
by a plan node per impalad, the cost of codegen almost certainly
outweighs the benefit.
Using rows processed as a threshold is justified by a simple
model that assumes the cost of codegen and execution per row for
the same operation are proportional. E.g. if x is the complexity
of the operation, n is the number of rows processed, C is a
constant factor giving the cost of codegen, and Ec/Ei are constant
factors giving the per-row cost of codegen'd and interpreted
execution, then the cost of the codegen'd operator is C * x + Ec * x * n
and the cost of the interpreted operator is Ei * x * n. Rearranging
means that interpretation is cheaper if n < C / (Ei - Ec), i.e. that
(at least with the simplified model) it makes sense to choose
interpretation or codegen based on a constant threshold. The
model also implies that it is somewhat safer to choose codegen
because the additional cost of codegen is O(1) but the additional
cost of interpretation is O(n).
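For concreteness, here is the model with made-up constants. None of
these values are measured Impala costs; C is chosen so the break-even
lands near the observed cutover below.

    class CodegenModelSketch {
      public static void main(String[] args) {
        double x = 1.0;   // complexity of the operation
        double C = 3e5;   // one-time cost of running codegen
        double Ec = 1.0;  // per-row cost of codegen'd execution
        double Ei = 3.0;  // per-row cost of interpreted execution

        // Interpretation is cheaper when n < C / (Ei - Ec); note that x
        // cancels, so the threshold really is a constant row count.
        System.out.println("break-even n = " + (C / (Ei - Ec)));  // 150000

        for (long n : new long[] {50_000, 150_000, 500_000}) {
          double codegen = C * x + Ec * x * n;
          double interp = Ei * x * n;
          System.out.printf("n=%d: codegen=%.0f interpreted=%.0f -> %s%n",
              n, codegen, interp, interp < codegen ? "interpret" : "codegen");
        }
      }
    }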
I ran some experiments with TPC-H Q1, varying the input table size, to
determine the cut-over point at which codegen becomes beneficial.
The cutover was around 150k rows per node for both text and parquet.
At 50k rows per node disabling codegen was very beneficial - around
0.12s versus 0.24s. To be somewhat conservative I set the default
threshold to 50k rows. On more complex queries, e.g. TPC-H Q10, the
cutover tends to be higher because there are plan nodes that process
many fewer than the max rows.
Fix a couple of minor issues in the frontend - the numNodes_
calculation could return 0 for Kudu, and the single node optimization
didn't correctly handle a scan node with conjuncts, a limit, and
missing stats (it considered the cardinality estimate still valid).
Testing:
Updated e2e tests that set disable_codegen to also set
disable_codegen_rows_threshold to 0, so that those tests still run
both with and without codegen.
Added an e2e test to make sure that the optimisation is applied in
the backend.
Added planner tests for various cases where codegen should and shouldn't
be disabled.
Perf:
Added a targeted perf test for a join+agg over a small input, which
benefits from this change.
Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Reviewed-on: http://gerrit.cloudera.org:8080/7153
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The main idea of this patch is to use table stats to
extrapolate the row counts for new/modified partitions.
Existing behavior:
- Partitions that lack the row count stat are ignored
when estimating the cardinality of HDFS scans. Such
partitions effectively have an estimated row count
of zero.
- We always use the row count stats for partitions that
have one. The row count may be inaccurate if data in
such partitions has changed significantly.
Summary of changes:
- Enhance COMPUTE STATS to also store the total number
of file bytes in the table.
- Use the table-level row count and file bytes stats
  to estimate the number of rows in a scan (see the
  sketch after this list).
- A new impalad startup flag is added to enable/disable
the extrapolation behavior. The feature is disabled by
default. Note that even with the feature disabled,
COMPUTE STATS stores the file bytes so you can enable
the feature without having to run COMPUTE STATS again.
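As a rough sketch of the extrapolation arithmetic (invented names,
not the actual Impala implementation): COMPUTE STATS stores the row
count and total file bytes, which give a bytes-per-row ratio that the
scan estimate applies to the table's current file bytes.

    class ExtrapolationSketch {
      static long extrapolateNumRows(long statsNumRows, long statsFileBytes,
                                     long currentFileBytes) {
        if (statsNumRows <= 0 || statsFileBytes <= 0) return -1;  // no usable stats
        double bytesPerRow = (double) statsFileBytes / statsNumRows;
        return Math.round(currentFileBytes / bytesPerRow);
      }

      public static void main(String[] args) {
        // COMPUTE STATS saw 1M rows in 256MB; the table has since grown
        // to 384MB, so new/modified partitions are covered by the estimate.
        System.out.println(extrapolateNumRows(1_000_000, 256L << 20, 384L << 20));
        // -> 1500000
      }
    }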
Testing:
- Added new FE unit test
- Added new EE test
Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Reviewed-on: http://gerrit.cloudera.org:8080/6840
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins