IMPALA-11402: Add limit on files fetched by a single getPartialCatalogObject request

getPartialCatalogObject is a catalogd RPC used by local catalog mode
coordinators to fetch metadata on-demand from catalogd.
For a table with a huge number (e.g. 6M) of files, catalogd might hit
an OOM error by exceeding the JVM array limit when serializing the
response of a getPartialCatalogObject request for all partitions (thus
all files).

This patch adds a new flag, catalog_partial_fetch_max_files, to define
the max number of file descriptors allowed in a response of
getPartialCatalogObject. Catalogd truncates the response at the
partition level when it's too big and returns only a subset of the
requested partitions. The coordinator should send new requests to fetch
the remaining partitions. Note that table metadata can change between
the requests. The coordinator detects catalog version changes and
throws an InconsistentMetadataFetchException so that the planner
replans the query. This is an existing mechanism for other kinds of
table metadata.
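
The truncate-and-refetch flow described above can be sketched as follows.
This is an illustrative Python model, not the actual catalogd (Java) or
coordinator code: serve_partial_fetch, fetch_all_partitions and
InconsistentMetadataFetchError are hypothetical names, and the choices to
return partitions in request order and to always return at least one
partition are assumptions made so that the loop terminates.

```python
from dataclasses import dataclass


class InconsistentMetadataFetchError(Exception):
    """Stand-in for InconsistentMetadataFetchException (hypothetical name)."""


@dataclass
class PartialResponse:
    catalog_version: int
    partitions: dict  # partition id -> list of file descriptors


def serve_partial_fetch(table, version, requested_ids, max_files):
    """Model of the server side: return whole partitions until the file
    budget (catalog_partial_fetch_max_files) would be exceeded."""
    returned = {}
    num_files = 0
    for pid in requested_ids:
        files = table[pid]
        # Assumption: always return at least one partition, even if it alone
        # exceeds the budget, so the caller always makes progress.
        if returned and num_files + len(files) > max_files:
            break
        returned[pid] = files
        num_files += len(files)
    return PartialResponse(version, returned)


def fetch_all_partitions(get_table_and_version, requested_ids, max_files):
    """Model of the coordinator side: keep requesting the remaining
    partitions and fail if the catalog version changes between requests."""
    remaining = list(requested_ids)
    result = {}
    expected_version = None
    while remaining:
        table, version = get_table_and_version()
        resp = serve_partial_fetch(table, version, remaining, max_files)
        if expected_version is None:
            expected_version = resp.catalog_version
        elif resp.catalog_version != expected_version:
            # The planner is expected to catch this and replan the query.
            raise InconsistentMetadataFetchError(
                f"version changed: {expected_version} -> {resp.catalog_version}")
        result.update(resp.partitions)
        remaining = [pid for pid in remaining if pid not in resp.partitions]
    return result
```

With a three-partition table and a budget of 2 files, this model needs
three round trips to collect every partition, because truncation happens
on partition boundaries rather than on individual files.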

Here are the measured byte array size and duration of a single
response, by number of files in that response:
 * 1000000: 371.71MB, 1s487ms
 * 2000000: 744.51MB, 4s035ms
 * 3000000: 1.09GB, 6s643ms
 * 4000000: 1.46GB, duration not measured due to GC pauses
 * 5000000: 1.82GB, duration not measured due to GC pauses
 * 6000000: >2GB (hit OOM)
Choose 1000000 as the default value for now. We can tune it in the
future.
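
As a rough cross-check of these numbers, the per-file cost and the file
count at which a single byte array would hit the JVM limit can be
estimated as below. This is a back-of-the-envelope sketch that assumes
"MB" means 2^20 bytes, that response size scales linearly with file
count, and that the array limit is about 2^31 - 1 bytes (the actual VM
limit is slightly lower).

```python
MIB = 1024 * 1024
# Java arrays are indexed by int, so a byte[] cannot exceed 2^31 - 1 bytes.
JVM_MAX_ARRAY_BYTES = 2**31 - 1

# Measurements from this commit message: number of files -> response size in MB.
measured_mb = {1_000_000: 371.71, 2_000_000: 744.51}

# Serialized bytes per file descriptor for each measurement.
bytes_per_file = {n: mb * MIB / n for n, mb in measured_mb.items()}
avg_bytes_per_file = sum(bytes_per_file.values()) / len(bytes_per_file)

# Estimated number of files at which one response reaches the array limit.
est_max_files = int(JVM_MAX_ARRAY_BYTES / avg_bytes_per_file)

# Safety factor of the chosen default (1000000) against that estimate.
headroom = est_max_files / 1_000_000

print(f"~{avg_bytes_per_file:.0f} bytes per file")
print(f"array limit reached near {est_max_files} files")
print(f"default leaves ~{headroom:.1f}x headroom")
```

Under these assumptions the estimate lands between 5M and 6M files,
which agrees with the observation that 5000000 files still serialized
while 6000000 hit OOM.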

Tests:
 - Added custom-cluster test
 - Ran e2e tests in local-catalog mode with
   catalog_partial_fetch_max_files=1000 so that the new code paths are
   exercised.

Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
Reviewed-on: http://gerrit.cloudera.org:8080/22559
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit is contained in:
stiga-huang
2025-02-25 15:27:55 +08:00
committed by Impala Public Jenkins
parent b62de19c12
commit 4ddacac14f
8 changed files with 155 additions and 24 deletions

@@ -323,4 +323,6 @@ struct TBackendGflags {
145: required bool catalogd_deployed
146: required string catalog_config_dir
147: required i32 catalog_partial_fetch_max_files
}