mirror of
https://github.com/apache/impala.git
synced 2025-12-19 18:12:08 -05:00
IMPALA-12908: Add correctness check for tuple cache
This patch adds an automated correctness check for the tuple cache. Its purpose is to verify the correctness of the tuple cache by comparing cache entries that share the same key across different queries.

The feature consists of two main components: cache dumping and runtime correctness validation.

During the cache dumping phase, if a tuple cache is detected, we retrieve the cache from the global cache and dump it as a reference file to a subdirectory of the specified debug dumping directory. The subdirectory is named after the cache key. Data from the child is also read and dumped to a separate file in the same directory. We expect these two files to be identical, assuming the results are deterministic. For non-deterministic cases such as TOP-N, we may later detect them and exclude them from dumping.

The cache data is converted to a human-readable text format, row by row, before dumping. This makes later investigation and analysis easier.

Verification starts by comparing the entire content of the files that share the same key. If the content matches, verification succeeds. If it does not match, we enter a slower mode that compares the rows individually. In the slow mode, we build a hash map from the reference cache file, then iterate over the current cache file row by row and check that every row exists in the hash map. The hash map also keeps a counter per row to handle duplicated rows.

Once verification is complete, both files are removed if no discrepancies are found. If discrepancies are detected, the files are kept and given a '.bad' suffix.

New startup flag:
Added a startup flag tuple_cache_debug_dump_dir that specifies the directory for dumping the result caches. If tuple_cache_debug_dump_dir is empty, the feature is disabled.

New query option:
Added a query option enable_tuple_cache_verification to enable or disable tuple cache verification. Default is true. Only valid when tuple_cache_debug_dump_dir is specified.

Tests:
Ran the test case test_tuple_cache_tpc_queries and caught known inconsistencies.

Change-Id: Ied074e274ebf99fb57e3ee41a13148725775b77c
Reviewed-on: http://gerrit.cloudera.org:8080/21754
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
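
The two-stage verification described in the commit message can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the patch (which lives in the Impala backend): the names rows_match and verify_dump are hypothetical, rows are assumed to be one per text line, and the sketch assumes that rows left over in the reference also count as a mismatch, which the commit message does not state explicitly.

```python
from collections import Counter
from pathlib import Path


def rows_match(reference_rows, current_rows):
    """Slow path: compare two row sequences as multisets, ignoring order."""
    # Hash map from row to occurrence count, built from the reference dump.
    # The count is what lets duplicated rows be handled correctly.
    remaining = Counter(reference_rows)
    for row in current_rows:
        if remaining[row] == 0:
            # Row is absent from the reference, or appears too many times.
            return False
        remaining[row] -= 1
    # Assumption of this sketch: leftover reference rows are a mismatch too.
    return all(count == 0 for count in remaining.values())


def verify_dump(reference_file, current_file):
    """Compare two dump files; remove them on success, mark them '.bad' otherwise."""
    ref_path, cur_path = Path(reference_file), Path(current_file)
    ref_bytes, cur_bytes = ref_path.read_bytes(), cur_path.read_bytes()

    # Fast path: the whole file content is identical.
    ok = ref_bytes == cur_bytes
    if not ok:
        # Slow path: same rows, possibly in a different order.
        ok = rows_match(ref_bytes.splitlines(), cur_bytes.splitlines())

    if ok:
        ref_path.unlink()
        cur_path.unlink()
    else:
        # Keep both files for investigation, with a '.bad' suffix.
        for path in (ref_path, cur_path):
            path.rename(path.with_name(path.name + ".bad"))
    return ok
```

The per-row counter matters because a plain set lookup would accept a current file that contains a row more often than the reference does.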
@@ -489,7 +489,9 @@ error_codes = (
    "Subscriber '$0' has incompatible protocol version V$1 conflicting with statestored's "
    "version V$2"),
 
-  ("JDBC_CONFIGURATION_ERROR", 159, "Error in JDBC table configuration: $0.")
+  ("JDBC_CONFIGURATION_ERROR", 159, "Error in JDBC table configuration: $0."),
+
+  ("TUPLE_CACHE_INCONSISTENCY", 160, "Inconsistent tuple cache found: $0.")
 )
 
 import sys
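
The new TUPLE_CACHE_INCONSISTENCY entry has the same (name, code, message) shape as the JDBC_CONFIGURATION_ERROR entry before it, with $0 as a positional placeholder in the message. As a rough illustration only, and not Impala's actual formatting code, filling such a placeholder might look like the sketch below; format_error is a hypothetical helper and the argument passed to it is made up.

```python
import re

# The entry added by this commit, copied from the diff.
TUPLE_CACHE_INCONSISTENCY = (
    "TUPLE_CACHE_INCONSISTENCY", 160, "Inconsistent tuple cache found: $0.")


def format_error(entry, *args):
    """Hypothetical helper: replace $0, $1, ... with positional arguments."""
    _name, _code, template = entry
    # Each "$N" is replaced by the N-th argument, converted to a string.
    return re.sub(r"\$(\d+)", lambda m: str(args[int(m.group(1))]), template)


print(format_error(TUPLE_CACHE_INCONSISTENCY, "example detail"))
```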