IMPALA-14224: Cleanup subdirectories in TRUNCATE

If an external table contains data files in subdirectories, and
recursive listing is enabled, Impala considers the files in the
subdirectories as part of the table. However, currently INSERT OVERWRITE
and TRUNCATE do not always delete these files, leading to data
corruption.

This change addresses TRUNCATE only.

Currently TRUNCATE can run in two different ways:
 - if the table is being replicated, the HMS API is used
 - otherwise catalogd deletes the files itself.
The two methods differ in two respects:
 - calling HMS generates an ALTER_TABLE event
 - calling HMS deletes recursively, while catalogd only
   deletes files directly in the partition/table directory.
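
The second difference can be illustrated on a local directory tree. This
is only a sketch of the two deletion semantics, not Impala or HMS code;
the helper names (delete_direct_files, delete_files_recursively,
make_table_dir, remaining_files) and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def delete_direct_files(table_dir: Path) -> None:
    """Non-recursive: delete only files directly in table_dir."""
    for entry in list(table_dir.iterdir()):
        if entry.is_file():
            entry.unlink()


def delete_files_recursively(table_dir: Path) -> None:
    """Recursive: delete every file under table_dir, at any depth."""
    for entry in list(table_dir.rglob("*")):
        if entry.is_file():
            entry.unlink()


def remaining_files(table_dir: Path) -> list:
    """List surviving files as paths relative to table_dir."""
    return sorted(p.relative_to(table_dir).as_posix()
                  for p in table_dir.rglob("*") if p.is_file())


def make_table_dir(root: Path) -> Path:
    """Create a hypothetical table directory with one nested data file."""
    table_dir = root / "tbl"
    (table_dir / "sub").mkdir(parents=True)
    (table_dir / "a.parq").write_text("x")
    (table_dir / "sub" / "b.parq").write_text("y")
    return table_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tbl = make_table_dir(Path(tmp))
        delete_direct_files(tbl)
        # The nested file survives, which is the leftover data that
        # recursive listing would still treat as part of the table.
        print(remaining_files(tbl))
```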

This commit introduces the '--truncate_external_tables_with_hms' startup
flag, with default value 'true'. If this flag is set to true, Impala
always uses the HMS API for TRUNCATE operations.

Note that HMS always deletes stats on TRUNCATE, so setting the
DELETE_STATS_IN_TRUNCATE query option to false is not supported when
'--truncate_external_tables_with_hms' is set to true. In that case an
exception is thrown.
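
The resulting decision logic can be sketched as follows. This is an
illustrative model based only on the description above, not the actual
catalogd code: the function name, its parameters, the "hms"/"catalogd"
return values and the use of ValueError are all assumptions. The
combination of a replicated table with DELETE_STATS_IN_TRUNCATE=false
while the flag is off is not described here, so the sketch does not
treat it specially.

```python
def choose_truncate_path(truncate_with_hms: bool,
                         table_is_replicated: bool,
                         delete_stats_in_truncate: bool = True) -> str:
    """Return "hms" or "catalogd" for a TRUNCATE (hypothetical model)."""
    if truncate_with_hms:
        # HMS always deletes stats, so a request to keep them
        # cannot be honored and is rejected.
        if not delete_stats_in_truncate:
            raise ValueError(
                "DELETE_STATS_IN_TRUNCATE=false is not supported when "
                "--truncate_external_tables_with_hms is true")
        return "hms"
    # Flag off: the pre-existing behavior, HMS only for replicated tables.
    return "hms" if table_is_replicated else "catalogd"


if __name__ == "__main__":
    print(choose_truncate_path(True, False))
    print(choose_truncate_path(False, False))
```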

Testing:
 - Extended the tests in test_recursive_listing.py::TestRecursiveListing
   to include TRUNCATE
 - Moved tests with DELETE_STATS_IN_TRUNCATE=0 from truncate-table.test
   to truncate-table-no-delete-stats.test, which is run in a new custom
   cluster test (custom_cluster/test_no_delete_stats_in_truncate.py).

Change-Id: Ic0fcc6cf1eca8a0bcf2f93dbb61240da05e35519
Reviewed-on: http://gerrit.cloudera.org:8080/23166
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Csaba Ringhofer
2025-07-01 18:23:49 +02:00
committed by Impala Public Jenkins
parent c446291ccf
commit 9f12714d1c
11 changed files with 214 additions and 120 deletions

@@ -349,4 +349,6 @@ struct TBackendGflags {
158: required string warmup_tables_config_file
159: required bool keeps_warmup_tables_loaded
160: required bool truncate_external_tables_with_hms
}