IMPALA-12827: Fix failures in processing AbortTxnEvent when the aborted write id has been cleaned up

HdfsTable tracks the ValidWriteIdList from HMS. When the table is
reloaded, the ValidWriteIdList is updated to the latest state. An
ABORT_TXN event that lags behind can refer to aborted write ids
that have already been cleaned up by the HMS housekeeping thread. Such
write ids appear in the cached ValidWriteIdList neither as open nor as
aborted write ids. This hits a Precondition check and fails the event
processing.

This patch relaxes the check to allow this case. It also adds more
logging around write id handling.
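
The case described above can be illustrated with a small toy model. This is
only a sketch under stated assumptions, not Impala's actual code: the class
CachedWriteIds, its fields, and the strict flag are hypothetical names made
up for illustration. It shows how a strict check fails on a write id that is
neither open nor aborted in the cache, and how a relaxed check tolerates it.

```python
from dataclasses import dataclass, field


@dataclass
class CachedWriteIds:
    """Toy stand-in for a cached write id list (hypothetical, not Impala's class)."""
    open_ids: set = field(default_factory=set)
    aborted_ids: set = field(default_factory=set)

    def mark_aborted(self, write_id: int, strict: bool = False) -> bool:
        """Apply an abort for write_id; return True if the cache changed."""
        if write_id in self.open_ids:
            # Normal case: an open write id becomes aborted.
            self.open_ids.remove(write_id)
            self.aborted_ids.add(write_id)
            return True
        if write_id in self.aborted_ids:
            # Already recorded as aborted, nothing to do.
            return False
        # Neither open nor aborted. In this sketch we assume the id was
        # cleaned up before the cache was refreshed.
        if strict:
            # Models the old behavior: the check fails event processing.
            raise ValueError(f"write id {write_id} is neither open nor aborted")
        # Models the relaxed behavior: skip the stale abort.
        return False
```

With strict=True, a lagging abort for a cleaned-up id raises; with the
default, it is skipped and the cache stays unchanged.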

Tests
 - Add a custom-cluster test that starts Hive with the housekeeping
   thread turned on and verifies that such an ABORT_TXN event is
   processed correctly.

Change-Id: I93b6f684d6e4b94961d804a0c022029249873681
Reviewed-on: http://gerrit.cloudera.org:8080/21071
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
stiga-huang
2024-02-26 09:50:13 +08:00
committed by Impala Public Jenkins
parent 97cd30c607
commit 5250cc14b6
8 changed files with 84 additions and 11 deletions

@@ -151,6 +151,13 @@ mkdir -p hive-site-events-cleanup
rm -f hive-site-events-cleanup/hive-site.xml
ln -s "${CONFIG_DIR}/hive-site_events_cleanup.xml" hive-site-events-cleanup/hive-site.xml
export HIVE_VARIANT=housekeeping_on
$IMPALA_HOME/bin/generate_xml_config.py hive-site.xml.py hive-site_housekeeping_on.xml
mkdir -p hive-site-housekeeping-on
rm -f hive-site-housekeeping-on/hive-site.xml
ln -s "${CONFIG_DIR}/hive-site_housekeeping_on.xml" \
hive-site-housekeeping-on/hive-site.xml
export HIVE_VARIANT=ranger_auth
HIVE_RANGER_CONF_DIR=hive-site-ranger-auth
$IMPALA_HOME/bin/generate_xml_config.py hive-site.xml.py hive-site_ranger_auth.xml