4 Commits

Author SHA1 Message Date
Joe McDonnell
ca356a8df5 IMPALA-13437 (part 2): Implement cost-based tuple cache placement
This changes the default behavior of the tuple cache to consider
cost when placing the TupleCacheNodes. It tries to pick the best
locations within a budget. First, it eliminates unprofitable locations
via a threshold. Next, it ranks the remaining locations by their
profitability. Finally, it picks the best locations in rank order
until it reaches the budget.

The threshold is based on the ratio of processing cost for regular
execution versus the processing cost for reading from the cache.
If the ratio is below the threshold, the location is eliminated.
The threshold is specified by the tuple_cache_required_cost_reduction_factor
query option. This defaults to 3.0, which means that the cost of
reading from the cache must be less than 1/3 the cost of computing
the value normally. A higher value makes this more restrictive
about caching locations, which pushes in the direction of lower
overhead.

The ranking is based on the cost reduction per byte. This is given
by the formula:
 (regular processing cost - cost to read from cache) / estimated serialized size
This prefers locations with small results or high reduction in cost.

The budget is based on the estimated serialized size per node. This
limits the total caching that a query will do. A higher value allows more
caching, which can increase the overhead on the first run of a query. A lower
value is less aggressive and can limit the overhead at the expense of less
caching. This uses a per-node limit as the limit should scale based on the
size of the executor group as each executor brings extra capacity. The budget
is specified by the tuple_cache_budget_bytes_per_executor.

The old behavior to place the tuple cache at all eligible locations is
still available via the tuple_cache_placement_policy query option. The
default is the cost_based policy described above, but the old behavior
is available via the all_eligible policy. This is useful for correctness
testing (and the existing tuple cache test cases).

This changes the explain plan output:
 - The hash trace is only enabled at VERBOSE level. This means that the regular
   profile will not contain the hash trace, as the regular profile uses EXTENDED.
 - This adds additional information at VERBOSE to display the cost information
   for each plan node. This can help trace why a particular location was
   not picked.

Testing:
 - This adds a TPC-DS planner test with tuple caching enabled (based on the
   existing TpcdsCpuCostPlannerTest)
 - This modifies existing tests to adapt to changes in the explain plan output

Change-Id: Ifc6e7b95621a7937d892511dc879bf7c8da07cdc
Reviewed-on: http://gerrit.cloudera.org:8080/23219
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-18 21:02:51 +00:00
Joe McDonnell
14597c7e2f IMPALA-13964: Fix test_tuple_cache_tpc_queries.py flakiness
This makes two changes to deflake test_tuple_cache_tpc_queries.py.
First, it increases the runtime filter wait time from 60 seconds to
600 seconds. The correctness verification slows down the path
that produces the runtime filter. The slowdown is dependent on
the speed of storage, so this can get very slow on test machines.

Second, this skips correctness checking for locations that are just
after streaming aggregations. Streaming aggregations can produce
variable output that the correctness checking can't handle.
For example a grouping aggregation computing a sum might have
a preaggregation produce either (A: 3) or (A: 2), (A: 1) or
(A: 1), (A: 1), (A: 1). The finalization sees these as equivalent.
This marks the nodes as variable starting with the preaggregation
and clears the mark at the finalize stage.

When skipping correctness checking, the tuple cache node does not
hit the cache normally. This guarantees that its children will run
and go through correctness checking.

Testing:
 - Ran test_tuple_cache_tpc_queries.py locally
 - Added a frontend test for this specific case

Change-Id: If5e1be287bdb489a89aea3b2d7bec416220feb9a
Reviewed-on: http://gerrit.cloudera.org:8080/23010
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-06-12 15:48:32 +00:00
Yida Wu
0767ae065a IMPALA-12908: (Addendum) use RUNTIME_FILTER_WAIT_TIME_MS for tuple cache TPC testing
When runtime filters arrive after tuple caching has occurred, they
can't filter the cached results. This can lead to larger tuple caching
result sets than expected, causing correctness check failures in TPC
tests.

While other solutions may exist, extending RUNTIME_FILTER_WAIT_TIME_MS
is a simple fix by ensuring runtime filters are applied before tuple
caching.

Also set the query option enable_tuple_cache_verification to false
by default, as the filter arrival time may affect the correctness
check. To avoid flaky tests, change to use a more conservative
approach and only enable the correctness check when explicitly
specified by the testcase.

Tests:
Verified TPC tests pass correctness checks with increased runtime
filter wait time.

Change-Id: Ie70a87344c436ce8e2073575df5c5bf762ef562d
Reviewed-on: http://gerrit.cloudera.org:8080/21898
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-15 23:38:03 +00:00
Yida Wu
f2a09b6dda IMPALA-12907: Add testcases for TPC-H/TPC-DS queries with tuple caching
Added testcases to run TPC-H and TPC-DS queries twice with tuple
caching to verify that Impala won't crash and ensure the
correctness of the results.

Testcases allows mt_dop to be 0 or 4.

Also, added the environment varibles of tuple cache to
run-all-tests.sh and added skipif to test_tuple_cache_tpc_queries.py
to skip if not tuple cache enabled.

Tests:
Ran the tests in the build with tuple cache enabled.

Change-Id: I967372744d8dda25cbe372aefec04faec5a76847
Reviewed-on: http://gerrit.cloudera.org:8080/21628
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-08-16 07:30:26 +00:00