IMPALA-12908: Add correctness check for tuple cache

This patch adds an automated correctness check for the tuple
cache. It verifies the correctness of the tuple cache by
comparing cache entries that share the same key across
different queries.

The feature consists of two main components: cache dumping and
runtime correctness validation.

During the cache dumping phase, if a tuple cache is detected,
we retrieve the cache from the global cache and dump it as a
reference file into a subdirectory of the specified debug
dumping directory. The subdirectory is named after the cache
key. Data read from the child is also dumped to a separate
file in the same directory. We expect these two files to be
identical, assuming the results are deterministic.
Non-deterministic cases such as TOP-N may be detected and
excluded from dumping in a later change. Before dumping, the
cache data is converted row by row into a human-readable text
format, which makes later investigation and analysis easier.
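
As a rough illustration of the dump layout described above, the
following Python sketch writes rows as text into a subdirectory
named after the cache key. It is not the code in this patch: the
helper names, the file name argument, and the comma-separated row
format are all assumptions made for the example.

```python
import os


def row_to_text(row):
    # Hypothetical text format: comma-separated columns, NULL for None.
    return ",".join("NULL" if value is None else str(value) for value in row)


def dump_rows(dump_dir, cache_key, file_name, rows):
    # The subdirectory is named after the cache key, so the reference file
    # and the file built from the child's data end up side by side.
    subdir = os.path.join(dump_dir, cache_key)
    os.makedirs(subdir, exist_ok=True)
    path = os.path.join(subdir, file_name)
    with open(path, "w", encoding="utf-8") as out:
        for row in rows:
            out.write(row_to_text(row) + "\n")
    return path
```

Writing one row per line keeps the later row-by-row comparison
simple, since each line can be treated as a single comparable unit.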

Verification starts by comparing the entire content of the
files that share the same key. If the content matches,
verification is considered successful. If the content doesn't
match, we fall back to a slower mode that compares rows
individually: we build a hash map from the reference cache
file, then iterate over the current cache file row by row and
check that every row exists in the hash map. Each hash map
entry carries a counter to handle duplicated rows. Once
verification is complete, both files are removed if no
discrepancies are found. If discrepancies are detected, the
files are kept and renamed with a '.bad' suffix.
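
The two-stage comparison can be sketched in Python as follows. This
is a minimal sketch under stated assumptions, not the implementation
in this patch: the function names are hypothetical, each line of a
file is treated as one row, and the final check that no reference
rows are left over is an assumption that goes beyond what the
description above states.

```python
import filecmp
import os
from collections import Counter


def files_match(ref_path, cur_path):
    # Fast path: compare the full file contents byte for byte.
    if filecmp.cmp(ref_path, cur_path, shallow=False):
        return True
    # Slow path: count each reference row, then consume the counts while
    # iterating over the current file so duplicated rows are handled.
    with open(ref_path, encoding="utf-8") as ref:
        remaining = Counter(ref.read().splitlines())
    with open(cur_path, encoding="utf-8") as cur:
        for row in cur.read().splitlines():
            if remaining[row] == 0:
                return False  # Row is missing or appears too many times.
            remaining[row] -= 1
    # Assumption: rows left over in the reference also count as a mismatch.
    return all(count == 0 for count in remaining.values())


def verify_and_clean_up(ref_path, cur_path):
    ok = files_match(ref_path, cur_path)
    for path in (ref_path, cur_path):
        if ok:
            os.remove(path)  # No discrepancy: drop both files.
        else:
            os.rename(path, path + ".bad")  # Keep the evidence.
    return ok
```

The slow path makes the check insensitive to row order, so two files
that contain the same rows in a different order still pass.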

New startup flags and query options:
Added a startup flag tuple_cache_debug_dump_dir for specifying
the directory for dumping the result caches. If
tuple_cache_debug_dump_dir is empty, the feature is disabled.

Added a query option enable_tuple_cache_verification to enable
or disable the tuple cache verification. The default is true.
The option only takes effect when tuple_cache_debug_dump_dir
is specified.
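
How the two settings interact can be summarized with a small Python
sketch. The function is hypothetical and only restates the rules
above. It does not reflect how the flag and the query option are
actually read in the code.

```python
def verification_enabled(tuple_cache_debug_dump_dir,
                         enable_tuple_cache_verification=True):
    # An empty dump directory disables the whole feature, so the query
    # option has no effect in that case.
    if not tuple_cache_debug_dump_dir:
        return False
    return enable_tuple_cache_verification
```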

Tests:
Ran the testcase test_tuple_cache_tpc_queries and caught known
inconsistencies.

Change-Id: Ied074e274ebf99fb57e3ee41a13148725775b77c
Reviewed-on: http://gerrit.cloudera.org:8080/21754
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Yida Wu
2023-05-11 13:34:46 -07:00
committed by Joe McDonnell
parent 4c582fc55b
commit f11172a4a2
21 changed files with 1033 additions and 12 deletions


@@ -489,7 +489,9 @@ error_codes = (
 "Subscriber '$0' has incompatible protocol version V$1 conflicting with statestored's "
 "version V$2"),
-("JDBC_CONFIGURATION_ERROR", 159, "Error in JDBC table configuration: $0.")
+("JDBC_CONFIGURATION_ERROR", 159, "Error in JDBC table configuration: $0."),
+("TUPLE_CACHE_INCONSISTENCY", 160, "Inconsistent tuple cache found: $0.")
 )
 import sys