21 Commits

Joe McDonnell
775f73f03e IMPALA-14462: Fix tie-breaking for sorting scan ranges oldest to newest
TestTupleCacheFullCluster.test_scan_range_distributed is flaky on S3
builds. The addition of a single file is changing scheduling significantly
even with scan ranges sorted oldest to newest. This is because modification
times on S3 have a granularity of one second. Multiple files have the
same modification time, and the fix for IMPALA-13548 did not properly
break ties for sorting.

This adds logic to break ties for files with the same modification
time. It compares the path (absolute path or relative path + partition)
as well as the offset within the file. These should be enough to break
all conceivable ties, as it is not possible to have two scan ranges with
the same file at the same offset. In debug builds, this does additional
validation to make sure that when a != b, comp(a, b) != comp(b, a).
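
For illustration, here is a minimal sketch of the tie-breaking comparison;
the struct and field names are hypothetical, not Impala's actual scan range
types:

  // Hypothetical sketch of comparing scan ranges oldest to newest with
  // tie-breaking on path and offset.
  #include <cassert>
  #include <cstdint>
  #include <string>
  #include <tuple>

  struct ScanRangeInfo {
    int64_t mtime;     // file modification time (1s granularity on S3)
    std::string path;  // absolute path, or relative path + partition
    int64_t offset;    // offset of the scan range within the file
  };

  // Returns true if a should be scheduled before b (older first).
  bool ScanRangeOlder(const ScanRangeInfo& a, const ScanRangeInfo& b) {
    // Compare modification time first, then break ties on path and offset.
    // Two scan ranges can never share both the same file and the same
    // offset, so this yields a strict total order.
    return std::tie(a.mtime, a.path, a.offset) <
           std::tie(b.mtime, b.path, b.offset);
  }

  int main() {
    ScanRangeInfo a{1000, "/warehouse/t/p=1/f0.parq", 0};
    ScanRangeInfo b{1000, "/warehouse/t/p=1/f1.parq", 0};
    // Sanity check in the spirit of the debug validation described above.
    assert(ScanRangeOlder(a, b) != ScanRangeOlder(b, a));
    return 0;
  }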

The test requires that adding a single file to the table changes exactly
one cache key. If that final file has the same modification time as
an existing file, scheduling may still mix up the files and change more
than one cache key, even with tie-breaking. This adds a sleep just before
generating the final file to guarantee that it gets a newer modification
time.

Testing:
 - Ran TestTupleCacheFullCluster.test_scan_range_distributed for 15
   iterations on S3

Change-Id: I3f2e40d3f975ee370c659939da0374675a28cd38
Reviewed-on: http://gerrit.cloudera.org:8080/23458
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-09-25 23:30:28 +00:00
Joe McDonnell
e05d92cb3d IMPALA-13548: Schedule scan ranges oldest to newest for tuple caching
Scheduling does not sort scan ranges by modification time. When a new
file is added to a table, its order in the list of scan ranges is
not based on modification time. Instead, it is based on which partition
it belongs to and what its filename is. A new file that is added early
in the list of scan ranges can cause cascading differences in scheduling.
For tuple caching, this means that multiple runtime cache keys could
change due to adding a single file.

To minimize that disruption, this adds the ability to sort the scan
ranges by modification time and schedule scan ranges oldest to newest.
This enables it for scan nodes that feed into tuple cache nodes
(similar to deterministic scan range assignment).

Testing:
 - Modified TestTupleCacheFullCluster::test_scan_range_distributed
   to have stricter checks about how many cache keys change after
   an insert (only one should change)
 - Modified TupleCacheTest#testDeterministicScheduling to verify that
   oldest to newest scheduling is also enabled.

Change-Id: Ia4108c7a00c6acf8bbfc036b2b76e7c02ae44d47
Reviewed-on: http://gerrit.cloudera.org:8080/23228
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-22 20:51:46 +00:00
Joe McDonnell
ca356a8df5 IMPALA-13437 (part 2): Implement cost-based tuple cache placement
This changes the default behavior of the tuple cache to consider
cost when placing the TupleCacheNodes. It tries to pick the best
locations within a budget. First, it eliminates unprofitable locations
via a threshold. Next, it ranks the remaining locations by their
profitability. Finally, it picks the best locations in rank order
until it reaches the budget.

The threshold is based on the ratio of processing cost for regular
execution versus the processing cost for reading from the cache.
If the ratio is below the threshold, the location is eliminated.
The threshold is specified by the tuple_cache_required_cost_reduction_factor
query option. This defaults to 3.0, which means that the cost of
reading from the cache must be less than 1/3 the cost of computing
the value normally. A higher value makes this more restrictive
about caching locations, which pushes in the direction of lower
overhead.

The ranking is based on the cost reduction per byte. This is given
by the formula:
 (regular processing cost - cost to read from cache) / estimated serialized size
This prefers locations with small results or high reduction in cost.

The budget is based on the estimated serialized size per node. This
limits the total caching that a query will do. A higher value allows more
caching, which can increase the overhead on the first run of a query. A lower
value is less aggressive and can limit the overhead at the expense of less
caching. This uses a per-node limit because the limit should scale with the
size of the executor group: each executor brings extra capacity. The budget
is specified by the tuple_cache_budget_bytes_per_executor query option.
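
As a rough illustration of the threshold / rank / budget selection, here is
a sketch; the types, field names, and structure below are hypothetical, not
the planner's actual code:

  #include <algorithm>
  #include <cstdint>
  #include <vector>

  struct Candidate {
    int64_t regular_cost;      // processing cost of normal execution
    int64_t cache_read_cost;   // processing cost of reading from the cache
    int64_t serialized_bytes;  // estimated serialized result size
  };

  std::vector<Candidate> PickLocations(std::vector<Candidate> cands,
                                       double required_reduction_factor,
                                       int64_t budget_bytes) {
    // 1. Threshold: drop locations whose cost ratio is below the factor.
    cands.erase(std::remove_if(cands.begin(), cands.end(),
        [&](const Candidate& c) {
          return c.regular_cost <
                 required_reduction_factor * c.cache_read_cost;
        }), cands.end());
    // 2. Rank by cost reduction per serialized byte, best first.
    std::sort(cands.begin(), cands.end(),
        [](const Candidate& a, const Candidate& b) {
          double ra = double(a.regular_cost - a.cache_read_cost) /
                      a.serialized_bytes;
          double rb = double(b.regular_cost - b.cache_read_cost) /
                      b.serialized_bytes;
          return ra > rb;
        });
    // 3. Budget: take the best locations until the size budget is used up.
    std::vector<Candidate> picked;
    int64_t used = 0;
    for (const Candidate& c : cands) {
      if (used + c.serialized_bytes > budget_bytes) break;
      used += c.serialized_bytes;
      picked.push_back(c);
    }
    return picked;
  }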

The old behavior to place the tuple cache at all eligible locations is
still available via the tuple_cache_placement_policy query option. The
default is the cost_based policy described above, but the old behavior
is available via the all_eligible policy. This is useful for correctness
testing (and the existing tuple cache test cases).

This changes the explain plan output:
 - The hash trace is only enabled at VERBOSE level. This means that the regular
   profile will not contain the hash trace, as the regular profile uses EXTENDED.
 - This adds additional information at VERBOSE to display the cost information
   for each plan node. This can help trace why a particular location was
   not picked.

Testing:
 - This adds a TPC-DS planner test with tuple caching enabled (based on the
   existing TpcdsCpuCostPlannerTest)
 - This modifies existing tests to adapt to changes in the explain plan output

Change-Id: Ifc6e7b95621a7937d892511dc879bf7c8da07cdc
Reviewed-on: http://gerrit.cloudera.org:8080/23219
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-18 21:02:51 +00:00
Riza Suminto
1cead45114 IMPALA-13947: Test local catalog mode by default
Local catalog mode has been the default and works well in downstream
Impala for over 5 years. This patch turns on local catalog mode by
default (--catalog_topic_mode=minimal and --use_local_catalog=true) as
the preferred mode going forward.

Implemented LocalCatalog.setIsReady() to facilitate using local catalog
mode for FE tests. Some FE tests fail due to behavior differences in
local catalog mode, such as IMPALA-7539. This is probably OK since Impala
now largely hands over FileSystem permission checks to Apache Ranger.

The following custom cluster tests are pinned to evaluate under legacy
catalog mode because their behavior changed in local catalog mode:

TestCalcitePlanner.test_calcite_frontend
TestCoordinators.test_executor_only_lib_cache
TestMetadataReplicas
TestTupleCacheCluster
TestWorkloadManagementSQLDetailsCalcite.test_tpcds_8_decimal

In TestHBaseHmsColumnOrder.test_hbase_hms_column_order, set the
--use_hms_column_order_for_hbase_tables=true flag for both impalad and
catalogd to get a consistent column order in either local or legacy
catalog mode.

Changed TestCatalogRpcErrors.test_register_subscriber_rpc_error
assertions to be more fine-grained by matching individual query ids.

Moved most of the test methods from TestRangerLegacyCatalog to
TestRangerLocalCatalog, except for some that do need to run in legacy
catalog mode. Also renamed TestRangerLocalCatalog to
TestRangerDefaultCatalog. The table ownership issue in local catalog mode
remains unresolved (see IMPALA-8937).

Testing:
Pass exhaustive tests.

Change-Id: Ie303e294972d12b98f8354bf6bbc6d0cb920060f
Reviewed-on: http://gerrit.cloudera.org:8080/23080
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-06 21:42:24 +00:00
Joe McDonnell
07a5716773 IMPALA-13892: Add support for printing STRUCTs
Tuple cache correctness verification is failing as the code in
debug-util.cc used for printing the text version of tuples does
not support printing structs. It hits a DCHECK and kills Impala.

This adds support for printing structs to debug-util.cc, fixing
tuple cache correctness verification for complex types. To print
structs correctly, each slot needs to know its field name. The
ColumnType has this information, but it requires a field idx to
look up the name. This is the last index in the absolute path for
this slot. However, the materialized path can be truncated to
remove some indices at the end. Since we need that information to
resolve the field name, this adds the struct field idx to the
TSlotDescriptor to pass it to the backend.
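
A toy sketch of the field-name lookup idea follows; these types are
illustrative only, not Impala's actual descriptors:

  #include <iostream>
  #include <string>
  #include <vector>

  struct StructTypeSketch {
    std::vector<std::string> field_names;
  };

  struct SlotSketch {
    int field_idx;      // last index of the slot's absolute path
    std::string value;  // slot value already rendered as text
  };

  void PrintStruct(const StructTypeSketch& type,
                   const std::vector<SlotSketch>& slots, std::ostream& os) {
    os << "{";
    for (size_t i = 0; i < slots.size(); ++i) {
      if (i > 0) os << ",";
      // The field name comes from the struct type, keyed by the field idx.
      os << type.field_names.at(slots[i].field_idx) << ":" << slots[i].value;
    }
    os << "}";
  }

  int main() {
    StructTypeSketch addr{{"street", "city", "zip"}};
    std::vector<SlotSketch> slots = {{1, "Sunnyvale"}, {2, "94087"}};
    PrintStruct(addr, slots, std::cout);  // prints {city:Sunnyvale,zip:94087}
    std::cout << "\n";
    return 0;
  }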

This also adds a counter to the profile to track when correctness
verification is on. This is useful for testing.

Testing:
 - Added a custom cluster test using nested types with
   correctness verification
 - Examined some of the text files

Change-Id: Ib9479754c2766a9dd6483ba065e26a4d3a22e7e9
Reviewed-on: http://gerrit.cloudera.org:8080/23075
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-07-23 16:15:30 +00:00
Joe McDonnell
78a27c56fe IMPALA-13898: Incorporate partition information into tuple cache keys
Currently, the tuple cache keys do not include partition
information in either the planner key or the fragment instance
key. However, the partition actually is important to correctness.

First, there are settings defined on the table and partition that
can impact the results. For example, for processing text files,
the separator, escape character, etc. are specified at the table
level. This impacts the rows produced from a given file. There
are other such settings stored at the partition level (e.g.
the JSON binary format).

Second, it is possible to have two partitions pointed at the same
filesystem location. For example, scale_db.num_partitions_1234_blocks_per_partition_1
is a table that has all partitions pointing to the same
location. In that case, the cache can't tell the partitions
apart based on the files alone. This is an exotic configuration.
Incorporating an identifier of the partition (e.g. the partition
keys/values) allows the cache to tell the difference.

To fix this, we incorporate partition information into the
key. At planning time, when incorporating the scan range information,
we also incorporate information about the associated partitions.
This moves the code to HdfsScanNode and changes it to iterate over
the partitions, hashing both the partition information and the scan
ranges. At runtime, the TupleCacheNode looks up the partition
associated with a scan node and hashes the additional information
on the HdfsPartitionDescriptor.
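
As a hedged sketch of folding per-partition settings and identity into the
key (the structures and the hash below are stand-ins, not Impala's real
descriptors or hash function):

  #include <cstdint>
  #include <functional>
  #include <string>
  #include <vector>

  struct PartitionInfoSketch {
    std::string partition_keys_values;  // e.g. "p=1/q=foo"
    std::string location;
    char field_delim;
    char escape_char;
    std::vector<std::string> file_names;  // stand-in for the scan ranges
  };

  void HashCombine(size_t* seed, size_t v) {
    *seed ^= v + 0x9e3779b97f4a7c15ULL + (*seed << 6) + (*seed >> 2);
  }

  size_t HashPartitions(const std::vector<PartitionInfoSketch>& parts) {
    std::hash<std::string> hs;
    size_t key = 0;
    for (const PartitionInfoSketch& p : parts) {
      // Partition-level settings and identity affect the rows produced from
      // a file, so they are hashed even when two partitions share the same
      // filesystem location.
      HashCombine(&key, hs(p.partition_keys_values));
      HashCombine(&key, hs(p.location));
      HashCombine(&key, std::hash<char>{}(p.field_delim));
      HashCombine(&key, std::hash<char>{}(p.escape_char));
      for (const std::string& f : p.file_names) HashCombine(&key, hs(f));
    }
    return key;
  }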

This includes some test-only changes to make it possible to run the
TestBinaryType::test_json_binary_format test case with tuple caching.
ImpalaTestSuite::_get_table_location() (used by clone_table()) now
detects a fully-qualified table name and extracts the database from it.
It only uses the test vector to calculate the database if the table is
not fully qualified. This allows a test to clone a table without
needing to manipulate its test vector to match the right database. This
also changes _get_table_location() so that it does not switch into the
database. This required reworking test_scanners_fuzz.py to use absolute
paths for queries. It turns out that some tests in test_scanners_fuzz.py
were running in the wrong database and running against uncorrupted
tables. After this is corrected, some tests can crash Impala. This
xfails those tests until this can be fixed (tracked by IMPALA-14219).

Testing:
 - Added a frontend test in TupleCacheTest for a table with
   multiple partitions pointed at the same place.
 - Added custom cluster tests testing both issues

Change-Id: I3a7109fcf8a30bf915bb566f7d642f8037793a8c
Reviewed-on: http://gerrit.cloudera.org:8080/23074
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-07-17 01:07:44 +00:00
Joe McDonnell
7b25a7b070 IMPALA-13887: Incorporate column/field information into cache key
The correctness verification for the tuple cache found an issue
with TestParquet::test_resolution_by_name(). The test creates a
table, selects, alters the table to change a column name, and
selects again. With parquet_fallback_schema_resolution=NAME, the
column names determine behavior. The tuple cache key did not
include the column names, so it was producing an incorrect result
after changing the column name.

This change adds information about the column / field name to the
TSlotDescriptor so that it is incorporated into the tuple cache key.
This is only needed when producing the tuple cache key, so it is
omitted for other cases.

Testing:
 - Ran TestParquet::test_resolution_by_name() with correctness
   verification
 - Added custom cluster test that runs the test_resolution_by_name()
   test case with tuple caching. This fails without this change.

Change-Id: Iebfa777452daf66851b86383651d35e1b0a5f262
Reviewed-on: http://gerrit.cloudera.org:8080/23073
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-27 18:13:19 +00:00
Michael Smith
f222574f04 IMPALA-13660: Support caching broadcast hash joins
This extends tuple caching to be able to cache above joins. As part
of this, ExchangeNodes are now eligible for caching in the broadcast and
directed cases. This does not yet support partitioned exchanges. Since
an exchange passes data from all nodes, this incorporates all the
scan range information when passing through an exchange.

For joins with a separate build side, a cache hit above the join
means that a probe-side thread will never arrive. If the builder
is not notified, it will wait for that thread to arrive and extend
the latency of the query significantly. This adds code to notify
the builder when a thread will never participate in the probe
phase.
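
Conceptually, the notification could look like the sketch below; this is
not Impala's JoinBuilder API, just an illustration of the synchronization:

  #include <condition_variable>
  #include <mutex>

  class BuildSideSketch {
   public:
    explicit BuildSideSketch(int expected_probe_threads)
        : remaining_(expected_probe_threads) {}

    // Called by a probe-side thread that got a cache hit above the join and
    // will therefore never open the probe side.
    void ProbeThreadWillNotArrive() {
      std::lock_guard<std::mutex> l(lock_);
      --remaining_;
      cv_.notify_all();
    }

    // Called by probe threads that do arrive; same effect on the count here.
    void ProbeThreadArrived() { ProbeThreadWillNotArrive(); }

    // The builder blocks here until every expected probe thread has either
    // arrived or declared that it never will.
    void WaitForProbeThreads() {
      std::unique_lock<std::mutex> l(lock_);
      cv_.wait(l, [this]() { return remaining_ == 0; });
    }

   private:
    std::mutex lock_;
    std::condition_variable cv_;
    int remaining_;
  };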

Testing:
 - Added test cases to TestTupleCache, including with distributed
   plans.
 - Added test cases to test_tuple_cache.py to verify behavior when
   updating the build side table and the timing of a cache hit.
 - Performance tests with TPC-DS at scale

Change-Id: Ic61462702b43175c593b34e8c3a14b9cfe85c29e
Reviewed-on: http://gerrit.cloudera.org:8080/22371
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-10 05:53:44 +00:00
Joe McDonnell
7aa4d50484 IMPALA-13882: Fix Iceberg v2 deletes with tuple caching
A variety of Iceberg statements (including v2 deletes) rely
on getting information from the scan node child of the
delete node. Since tuple caching can insert a TupleCacheNode
above that scan, the logic is currently failing, because
it doesn't know how to bypass the TupleCacheNode and get
to the scan node below.

This modifies the logic in multiple places to detect a
TupleCacheNode and go past it to get the scan node
below it.

Testing:
 - Added a basic Iceberg test with v2 deletes to both the
   frontend tests and the custom cluster tests

Change-Id: I162e738c4e4449a536701a740272aaac56ce8fd8
Reviewed-on: http://gerrit.cloudera.org:8080/22666
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-10 00:20:24 +00:00
Csaba Ringhofer
f98b697c7b IMPALA-13929: Make 'functional-query' the default workload in tests
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.

All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.

The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 on why workload affected
exploration_strategy. An example of an affected test is
TestCatalogHMSFailures, which was skipped in both core and exhaustive
runs before this change.

get_workload() functions that return a different workload than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking these tests is out of scope in this patch.

Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-08 07:12:55 +00:00
Michael Smith
4a645105f9 IMPALA-13658: Enable tuple caching aggregates
Enables tuple caching on aggregates directly above scan nodes. Caching
aggregates requires that their children are also eligible for caching,
so this excludes aggregates above an exchange, union, or hash join.
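
A toy sketch of the eligibility rule described above; the node kinds and
structure are illustrative, not the planner's actual classes:

  #include <memory>
  #include <vector>

  enum class NodeKind { kScan, kAggregate, kExchange, kUnion, kHashJoin };

  struct PlanNodeSketch {
    NodeKind kind;
    std::vector<std::shared_ptr<PlanNodeSketch>> children;
  };

  bool CacheEligible(const PlanNodeSketch& node) {
    switch (node.kind) {
      case NodeKind::kScan:
        return true;
      case NodeKind::kAggregate:
        // An aggregate is eligible only if all of its inputs are, so an
        // aggregate above an exchange, union, or hash join is excluded.
        for (const auto& c : node.children) {
          if (!CacheEligible(*c)) return false;
        }
        return true;
      default:
        return false;  // exchange, union, hash join: not eligible here
    }
  }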

Testing:
- Adds Planner tests for different aggregate cases to confirm they have
  stable tuple cache keys and are valid for caching.
- Adds custom cluster tests that cached aggregates are used, and can be
  re-used in slightly different statements.

Change-Id: I9bd13c2813c90d23eb3a70f98068fdcdab97a885
Reviewed-on: http://gerrit.cloudera.org:8080/22322
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-22 23:26:21 +00:00
Michael Smith
2085edbe1c IMPALA-13503: CustomClusterTestSuite for whole class
Allow using CustomClusterTestSuite with a single cluster for the whole
class. This speeds up tests by letting us group together multiple test
cases on the same cluster configuration and only starting the cluster
once.

Updates tuple cache tests as an example of how this can be used. Reduces
test_tuple_cache execution time from 100s to 60s.

Change-Id: I7a08694edcf8cc340d89a0fb33beb8229163b356
Reviewed-on: http://gerrit.cloudera.org:8080/22006
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-18 23:57:39 +00:00
Yida Wu
493d563590 IMPALA-13510: Unset the environment variable for tuple cache tests
The test_cache_disabled test case would fail in the tuple cache
build because the build enables the tuple cache using the
environment variables, while the test case requires the tuple
cache to remain disabled.

This patch unsets the related environment variable TUPLE_CACHE_DIR
before running the tuple-cache-related tests to ensure the tests
use their own starting flags.

Tests:
Passed the test_tuple_cache.py under the tuple cache build.

Change-Id: I2b551e533c7c69d5b29ed6ad6af90be57f53c937
Reviewed-on: http://gerrit.cloudera.org:8080/22018
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-07 10:24:19 +00:00
Joe McDonnell
1267fde57b IMPALA-13497: Add TupleCacheBytesWritten/Read to the profile
This adds counters for the number of bytes written / read
from the tuple cache. This gives visibility into whether
certain locations have enormous result sizes. This will be
used to tune the placement of tuple cache nodes.

Tests:
 - Added checks of the TupleCacheBytesWritten/Read counters
   to existing tests in test_tuple_cache.py

Change-Id: Ib5c9249049d8d46116a65929896832d02c2d9f1f
Reviewed-on: http://gerrit.cloudera.org:8080/21991
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-30 21:50:32 +00:00
Yida Wu
47b638e667 IMPALA-13188: Add test that compute stats does not result in a different tuple cache key
The patch introduces a new test, TestTupleCacheComputeStats, to
verify that compute stats does not change the tuple cache key.
The test creates a simple table with one row, runs an explain
on a basic query, then inserts more rows, computes the stats,
and reruns the same explain query. It compares the two results
to ensure that the cache keys are identical in the planning
phase.

Tests:
Passed the test.

Change-Id: I918232f0af3a6ab8c32823da4dba8f8cd31369d0
Reviewed-on: http://gerrit.cloudera.org:8080/21917
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-22 02:20:34 +00:00
Yida Wu
f5a8771a98 IMPALA-13411: Fix DCHECK fires for scan nodes that produce zero-length tuples
Removed the DCHECK assertion that tuple_data_len must be greater
than zero in tuple-file-writer.cc and tuple-file-reader.cc,
because in certain cases, such as count(*), tuple_data_len can
be zero, as no column data is returned and only the row count
matters.

Tests:
Adds TestTupleCacheCountStar for the regression test.

Change-Id: I264b537f0eb678b65081e90c21726198f254513d
Reviewed-on: http://gerrit.cloudera.org:8080/21953
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-17 23:25:40 +00:00
Michael Smith
b6b953b48e IMPALA-13186: Tag query option scope for tuple cache
Constructs a hash of non-default query options that are relevant to
query results; by default, query options are included in the hash.
Passes this hash to the frontend for inclusion in the tuple cache key
on plan leaf nodes (which will be included in parent key calculation).

Modifies MurmurHash3 to be re-entrant so the backend can construct a
hash incrementally. This is slightly slower but more memory efficient
than accumulating all hash inputs in a contiguous array first.
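
As an illustration of hashing non-default options incrementally (FNV-1a
below is only a stand-in for the re-entrant MurmurHash3, and the option
names and exempt set are hypothetical):

  #include <cstdint>
  #include <map>
  #include <set>
  #include <string>

  struct IncrementalHasher {
    uint64_t state = 1469598103934665603ULL;  // FNV offset basis
    void Add(const std::string& s) {
      for (unsigned char c : s) {
        state ^= c;
        state *= 1099511628211ULL;  // FNV prime
      }
    }
  };

  uint64_t HashQueryOptions(
      const std::map<std::string, std::string>& non_default_opts,
      const std::set<std::string>& exempt) {
    IncrementalHasher h;
    for (const auto& [name, value] : non_default_opts) {
      // Options marked exempt (ones that cannot affect results) are skipped.
      if (exempt.count(name)) continue;
      h.Add(name);
      h.Add(value);
    }
    return h.state;
  }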

Adds TUPLE_CACHE_EXEMPT_QUERY_OPT_FN to mark query options that can be
ignored when calculating a tuple cache hash.

Adds startup flag 'tuple_cache_exempt_query_options' as a safety valve
for query options that might be important to exempt that we missed.

Removes duplicate printing logic for query options from child-query.cc
in favor of re-using TQueryOptionsToMap, which does the same thing.

Cleans up query-options.cc helpers so they're static and reduces
duplicate printing logic.

Adds test that different values for a relevant query option use
different cache entries. Adds startup flag
'tuple_cache_ignore_query_options' to omit query options for testing
certain tuple cache failure modes, where we need to use debug actions.

Change-Id: I1f4802ad9548749cd43df8848b6f46dca3739ae7
Reviewed-on: http://gerrit.cloudera.org:8080/21698
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-09 17:21:10 +00:00
Michael Smith
d81c4db5e1 IMPALA-13185: Include runtime filter source in key
Incorporates the build-side PlanNode of a runtime filter in the tuple
cache key to avoid re-using intermediate results that were generated
using a runtime filter on the same target but different selection
criteria (build-side conjuncts).

We currently don't support caching ExchangeNode, but a common scenario
is a runtime filter produced by a HashJoin, with an Exchange on the
build side. This change looks through the first ExchangeNode when
considering the cache key and eligibility of the build-side source for a
runtime filter.

Testing shows all tests now passing for test_tuple_cache_tpc_queries
except those that hit "TupleCacheNode does not enforce limits itself and
cannot have a limit set."

Adds planner tests covering some scenarios where runtime filters are
expected to match or differ, and custom cluster tests for multi-node
testing.

Change-Id: I0077964be5acdb588d76251a6a39e57a0f42bb5a
Reviewed-on: http://gerrit.cloudera.org:8080/21729
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-10-04 17:10:29 +00:00
Michael Smith
6642b75efc IMPALA-13402: Clean up test_tuple_cache dimensions
Uses exec_option_dimension to specify exec options, and avoids starting
a cluster when the test would just be skipped.

Uses other standard helpers to replace custom methods that were less
flexible.

Change-Id: Ib241f1f1cfaf918dffaddd5aeef3884c70e0a3fb
Reviewed-on: http://gerrit.cloudera.org:8080/21859
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-01 21:32:40 +00:00
Joe McDonnell
831585c2f5 IMPALA-12906: Incorporate scan range information into the tuple cache key
This change is accomplishing two things:
1. It incorporates scan range information into the tuple
   cache key.
2. It reintroduces deterministic scheduling as an option
   for mt_dop and uses it for HdfsScanNodes that feed
   into a TupleCacheNode.

The combination of these two things solves several problems:
1. When a table is modified, the list of scan ranges will
   change, and this will naturally change the cache keys.
2. When accessing a partitioned table, two queries may have
   different predicates on the partition columns. Since the
   predicates can be satisfied via partition pruning, they are
   not included at runtime. This means that two queries
   may have identical compile-time keys with only the scan
   ranges being different due to different partition pruning.
3. Each fragment instance processes different scan ranges, so
   each will have a unique cache key.

To incorporate scan range information, this introduces a new
per-fragment-instance cache key. At compile time, the planner
now keeps track of which HdfsScanNodes feed into a TupleCacheNode.
This is passed over to the runtime as a list of plan node ids
that contain scan ranges. At runtime, the fragment instance
walks through the list of plan node ids and hashes any scan ranges
associated with them. This hash is the per-fragment-instance
cache key. The combination of the compile-time cache key produced
by the planner and the per-fragment-instance cache key is a unique
identifier of the result.
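
A hedged sketch of combining the compile-time key with an instance-specific
hash of the assigned scan ranges; the types and names are illustrative, not
Impala's actual runtime structures:

  #include <cstdint>
  #include <functional>
  #include <map>
  #include <string>
  #include <vector>

  struct ScanRangeSketch {
    std::string path;
    int64_t offset;
    int64_t length;
    int64_t mtime;
  };

  std::string FragmentInstanceKey(
      const std::string& compile_time_key,
      const std::vector<int>& cached_scan_node_ids,
      const std::map<int, std::vector<ScanRangeSketch>>& ranges_by_node_id) {
    size_t h = 0;
    auto combine = [&h](size_t v) {
      h ^= v + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
    };
    // Walk the scan nodes the planner flagged as feeding a TupleCacheNode
    // and hash the scan ranges this instance was assigned for each of them.
    for (int node_id : cached_scan_node_ids) {
      auto it = ranges_by_node_id.find(node_id);
      if (it == ranges_by_node_id.end()) continue;
      for (const ScanRangeSketch& r : it->second) {
        combine(std::hash<std::string>{}(r.path));
        combine(std::hash<int64_t>{}(r.offset));
        combine(std::hash<int64_t>{}(r.length));
        combine(std::hash<int64_t>{}(r.mtime));
      }
    }
    // The full cache key is the compile-time key plus the per-instance part.
    return compile_time_key + "-" + std::to_string(h);
  }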

Deterministic scheduling for mt_dop was removed via IMPALA-9655
with the introduction of the shared queue. This revives some of
the pre-IMPALA-9655 scheduling logic as an option. Since the
TupleCacheNode knows which HdfsScanNodes feed into it, the
TupleCacheNode turns on deterministic scheduling for all of those
HdfsScanNodes. Since this only applies to HdfsScanNodes that feed
into a TupleCacheNode, it means that any HdfsScanNode that doesn't
feed into a TupleCacheNode continues using the current algorithm.
The pre-IMPALA-9655 code is modified to make it more deterministic
about how it assigns scan ranges to instances.

Testing:
 - Added custom cluster tests for the scan range information
   including modifying a table, selecting from a partitioned
   table, and verifying that fragment instances have unique
   keys
 - Added basic frontend test to verify that deterministic scheduling
   gets set on the HdfsScanNodes that feed into the TupleCacheNode.
 - Restored the pre-IMPALA-9655 backend test to cover the LPT code

Change-Id: Ibe298fff0f644ce931a2aa934ebb98f69aab9d34
Reviewed-on: http://gerrit.cloudera.org:8080/21541
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
2024-09-04 16:31:31 +00:00
Michael Smith
6121c4f7d6 IMPALA-12905: Disk-based tuple caching
This implements on-disk caching for the tuple cache. The
TupleCacheNode uses the TupleFileWriter and TupleFileReader
to write and read back tuples from local files. The file format
uses RowBatch's standard serialization used for KRPC data streams.

The TupleCacheMgr is the daemon-level structure that coordinates
the state machine for cache entries, including eviction. When a
writer is adding an entry, it inserts an IN_PROGRESS entry before
starting to write data. This does not count towards cache capacity,
because the total size is not known yet. This IN_PROGRESS entry
prevents other writers from concurrently writing the same entry.
If the write is successful, the entry transitions to the COMPLETE
state and updates the total size of the entry. If the write is
unsuccessful and a new execution might succeed, then the entry is
removed. If the write is unsuccessful and won't succeed later
(e.g. if the total size of the entry exceeds the max size of an
entry), then it transitions to the TOMBSTONE state. TOMBSTONE
entries avoid the overhead of trying to write entries that are
too large.

Given these states, when a TupleCacheNode is doing its initial
Lookup() call, one of three things can happen:
 1. It can find a COMPLETE entry and read it.
 2. It can find an IN_PROGRESS/TOMBSTONE entry, which means it
    cannot read or write the entry.
 3. It finds no entry and inserts its own IN_PROGRESS entry
    to start a write.
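
An illustrative sketch of the entry states and the three Lookup() outcomes;
this is not the TupleCacheMgr API, just the state logic:

  #include <string>
  #include <unordered_map>

  enum class EntryState { IN_PROGRESS, COMPLETE, TOMBSTONE };
  enum class LookupAction { READ_ENTRY, SKIP_CACHE, START_WRITE };

  class CacheSketch {
   public:
    LookupAction Lookup(const std::string& key) {
      auto it = entries_.find(key);
      if (it == entries_.end()) {
        // 3. No entry: insert our own IN_PROGRESS entry and start a write.
        entries_[key] = EntryState::IN_PROGRESS;
        return LookupAction::START_WRITE;
      }
      switch (it->second) {
        case EntryState::COMPLETE:
          return LookupAction::READ_ENTRY;  // 1. Read the cached result.
        case EntryState::IN_PROGRESS:
        case EntryState::TOMBSTONE:
          return LookupAction::SKIP_CACHE;  // 2. Can neither read nor write.
      }
      return LookupAction::SKIP_CACHE;
    }

    // A successful write becomes COMPLETE; a failure that might succeed on a
    // later run removes the entry; an oversized entry becomes a TOMBSTONE.
    void WriteComplete(const std::string& key) {
      entries_[key] = EntryState::COMPLETE;
    }
    void WriteFailed(const std::string& key) { entries_.erase(key); }
    void WriteTooLarge(const std::string& key) {
      entries_[key] = EntryState::TOMBSTONE;
    }

   private:
    std::unordered_map<std::string, EntryState> entries_;
  };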

The tuple cache is configured using the tuple_cache parameter,
which is a combination of the cache directory and the capacity,
similar to the data_cache parameter. For example, /data/0:100GB
uses directory /data/0 for the cache with a total capacity of
100GB. This currently supports a single directory, but it can
be expanded to multiple directories later if needed. The cache
eviction policy can be specified via the tuple_cache_eviction_policy
parameter, which currently supports LRU or LIRS. The tuple_cache
parameter cannot be specified if allow_tuple_caching=false.
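
A small sketch of splitting a <directory>:<capacity> value such as
/data/0:100GB; this is purely illustrative, not Impala's flag parsing:

  #include <iostream>
  #include <string>

  struct TupleCacheConfigSketch {
    std::string directory;
    std::string capacity;  // e.g. "100GB", left unparsed in this sketch
  };

  TupleCacheConfigSketch ParseTupleCacheFlag(const std::string& value) {
    // Split on the last ':' so the directory portion keeps any earlier ':'.
    size_t pos = value.rfind(':');
    if (pos == std::string::npos) return {value, ""};
    return {value.substr(0, pos), value.substr(pos + 1)};
  }

  int main() {
    TupleCacheConfigSketch cfg = ParseTupleCacheFlag("/data/0:100GB");
    std::cout << cfg.directory << " capacity=" << cfg.capacity << "\n";
    return 0;
  }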

This contains contributions from Michael Smith, Yida Wu,
and Joe McDonnell.

Testing:
 - This adds basic custom cluster tests for the tuple cache.

Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
Reviewed-on: http://gerrit.cloudera.org:8080/21171
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
2024-04-10 03:11:49 +00:00