IMPALA-14283: Invalidate the cache when served by a new catalogd

Before this patch, the coordinator only invalidates its catalog cache
when it observes a catalog service ID change in DDL/DML responses or
statestore catalog updates. This is sufficient in legacy catalog mode,
since these are the only ways the coordinator gets metadata from
catalogd. In local catalog mode, however, the coordinator also sends
getPartialCatalogObject requests to fetch metadata from catalogd. If
such a request is served by a new catalogd (e.g. due to HA failover),
the coordinator should invalidate its catalog cache. Otherwise, catalog
versions from the old and new catalogd can overlap on the same table
and stale metadata can be reused unintentionally.
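
A minimal sketch of the idea, not the actual CatalogdMetaProvider code:
the class name ServiceIdAwareCache, its methods, and the use of
java.util.UUID in place of TUniqueId are all assumptions made for
illustration. It shows why a (table, version) match alone is not enough
after a failover, and how comparing the service ID of each response
avoids reusing stale entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified stand-in for a coordinator-side metadata cache.
class ServiceIdAwareCache {
  // ID of the catalogd instance that produced the cached entries
  // (null until the first response is seen).
  private UUID knownServiceId;
  // Table name -> catalog version of the cached metadata.
  private final Map<String, Long> cachedVersions = new HashMap<>();

  /**
   * Records the metadata version carried by a response. If the response
   * comes from a different catalogd than the cached entries, the whole
   * cache is dropped first. Returns true if the cache was invalidated.
   */
  synchronized boolean onResponse(UUID responseServiceId, String table,
      long version) {
    boolean invalidated = false;
    if (knownServiceId != null && !knownServiceId.equals(responseServiceId)) {
      // Versions issued by the old catalogd are not comparable with
      // versions issued by the new one, so nothing can be kept.
      cachedVersions.clear();
      invalidated = true;
    }
    knownServiceId = responseServiceId;
    cachedVersions.put(table, version);
    return invalidated;
  }

  synchronized Long cachedVersion(String table) {
    return cachedVersions.get(table);
  }

  synchronized int size() {
    return cachedVersions.size();
  }
}
```

In this sketch, a response for "db.t1" at version 10 from a new
service ID clears an existing entry for "db.t2" even though the version
numbers look consistent with what was cached before.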

To avoid adding lock contention on this path, catalogServiceIdLock_ in
CatalogdMetaProvider is refactored to be a ReentrantReadWriteLock. Most
usages of it only need the read lock.
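
The locking pattern can be sketched as follows. This is an illustration
under assumptions, not the patch itself: ServiceIdGuard, isCurrent() and
witness() are hypothetical names, and UUID stands in for TUniqueId.
Readers share the read lock; the write lock is only taken on the rare
path where the service ID actually changes. Since
ReentrantReadWriteLock does not support upgrading a read lock to a
write lock, the read lock is released before the write lock is
acquired and the condition is rechecked.

```java
import java.util.UUID;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical holder for the last known catalog service ID.
class ServiceIdGuard {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private UUID serviceId;
  private int invalidations = 0;

  ServiceIdGuard(UUID initial) {
    serviceId = initial;
  }

  /** Common path: only the shared read lock is needed. */
  boolean isCurrent(UUID observed) {
    lock.readLock().lock();
    try {
      return serviceId.equals(observed);
    } finally {
      lock.readLock().unlock();
    }
  }

  /**
   * Rare path: take the write lock, recheck, then update.
   * Returns true if this call observed a change and recorded it.
   */
  boolean witness(UUID observed) {
    if (isCurrent(observed)) return false;
    lock.writeLock().lock();
    try {
      // Another thread may have handled the change in the meantime.
      if (serviceId.equals(observed)) return false;
      serviceId = observed;
      invalidations++;  // A real cache would be invalidated here.
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  int invalidations() {
    lock.readLock().lock();
    try {
      return invalidations;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```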

This patch also adds the catalog service ID to the profile.

Tests:
 - Ran test_warmed_up_metadata_failover_catchup 50 times.
 - Ran FE tests: CatalogdMetaProviderTest and LocalCatalogTest.
 - Ran CORE tests

Change-Id: I751e43f5d594497a521313579defc5b179dc06ce
Reviewed-on: http://gerrit.cloudera.org:8080/23236
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
stiga-huang
2025-08-02 10:03:57 +08:00
committed by Quanlong Huang
parent 447c016ae1
commit aec7380b75
6 changed files with 90 additions and 35 deletions

@@ -622,7 +622,8 @@ enum CatalogLookupStatus {
   // TODO: Fix partition lookup logic to not do it with IDs.
   PARTITION_NOT_FOUND,
   DATA_SOURCE_NOT_FOUND,
-  VERSION_MISMATCH
+  VERSION_MISMATCH,
+  CATALOG_SERVICE_CHANGED
 }
 
 // RPC response for GetPartialCatalogObject.
@@ -646,6 +647,9 @@ struct TGetPartialCatalogObjectResponse {
   // Loaded time in catalogd corresponding to 'object_version_number'.
   9: optional i64 object_loaded_time_ms
+
+  // The CatalogService service ID this result came from.
+  10: optional Types.TUniqueId catalog_service_id
 }