IMPALA-13725: Add Iceberg table repair functionalities

Users sometimes delete files directly from storage without
going through the Iceberg API, e.g. to remove old partitions.
This corrupts the table: the metadata still references the
deleted files, so queries that try to read them fail.

This change introduces a repair statement that removes the
dangling references to missing files from the metadata.
Note that the table cannot be repaired if delete files are
missing, because Iceberg's DeleteFiles API, which is used to
execute the operation, can remove only data files.
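
The repair logic described above can be sketched roughly as follows.
This is a minimal, self-contained illustration and not the code from
this change: the FileRef class, the findDanglingDataFiles helper and
the example paths are hypothetical names introduced only for the
sketch, and storage is modeled as a plain set of paths. In a real
implementation the resulting paths would presumably be passed to
Iceberg's DeleteFiles operation, as noted in the comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RepairSketch {
    /** Hypothetical stand-in for a file reference found in table metadata. */
    static final class FileRef {
        final String path;
        final boolean isDeleteFile;

        FileRef(String path, boolean isDeleteFile) {
            this.path = path;
            this.isDeleteFile = isDeleteFile;
        }
    }

    /**
     * Returns the paths of data files that the metadata references but that
     * are absent from storage. Throws if a missing file is a delete file,
     * since only data files can be removed by the operation.
     */
    static List<String> findDanglingDataFiles(List<FileRef> referenced,
                                              Set<String> existingPaths) {
        List<String> dangling = new ArrayList<>();
        for (FileRef ref : referenced) {
            if (existingPaths.contains(ref.path)) continue;
            if (ref.isDeleteFile) {
                throw new IllegalStateException(
                    "Cannot repair: missing delete file " + ref.path);
            }
            dangling.add(ref.path);
        }
        // With Iceberg, each dangling path could then be dropped with
        // table.newDelete().deleteFile(path) followed by commit().
        return dangling;
    }

    public static void main(String[] args) {
        List<FileRef> referenced = Arrays.asList(
            new FileRef("/t/data/a.parquet", false),
            new FileRef("/t/data/b.parquet", false));
        // Only a.parquet still exists; b.parquet was removed from storage.
        Set<String> existing = new HashSet<>(Arrays.asList("/t/data/a.parquet"));
        System.out.println(findDanglingDataFiles(referenced, existing));
        // prints [/t/data/b.parquet]
    }
}
```

Failing fast on a missing delete file mirrors the limitation stated
above: the sketch refuses to produce a partial repair it could not
commit.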

Testing:
 - E2E
   - HDFS
   - S3, Ozone
 - analysis

Change-Id: I514403acaa3b8c0a7b2581d676b82474d846d38e
Reviewed-on: http://gerrit.cloudera.org:8080/23512
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Noemi Pap-Takacs
2025-10-07 16:45:31 +02:00
parent 2ac5a24dc0
commit fdad9d3204
8 changed files with 283 additions and 10 deletions

@@ -449,6 +449,10 @@ struct TAlterTableExecuteRemoveOrphanFilesParams {
1: required i64 older_than_millis
}
// Parameters for ALTER TABLE EXECUTE REPAIR_METADATA operations.
struct TAlterTableExecuteRepairMetadataParams {
}
// Parameters for ALTER TABLE EXECUTE ... operations.
struct TAlterTableExecuteParams {
// Parameters for ALTER TABLE EXECUTE EXPIRE_SNAPSHOTS
@@ -459,6 +463,9 @@ struct TAlterTableExecuteParams {
// Parameters for ALTER TABLE EXECUTE REMOVE_ORPHAN_FILES
3: optional TAlterTableExecuteRemoveOrphanFilesParams remove_orphan_files_params
// Parameters for ALTER TABLE EXECUTE REPAIR_METADATA
4: optional TAlterTableExecuteRepairMetadataParams repair_metadata_params
}
// Parameters for all ALTER TABLE commands.