mirror of
https://github.com/apache/impala.git
synced 2025-12-25 02:03:09 -05:00
IMPALA-12860: Invoke validateDataFilesExist for RowDelta operations
We must invoke validateDataFilesExist for RowDelta operations (DELETE/UPDATE/MERGE). Without this a concurrent RewriteFiles (compaction) and RowDelta can corrupt a table.

IcebergBufferedDeleteSink now also collects the filenames of the data files that are referenced in the position delete files. It adds them to the DML exec state which is then collected by the Coordinator. The Coordinator passes the file paths to CatalogD which executes Iceberg's RowDelta operation and now invokes validateDataFilesExist() with the file paths. Additionally it also invokes validateDeletedFiles().

This patch set also resolves IMPALA-12640 which is about replacing IcebergDeleteSink with IcebergBufferedDeleteSink, as from now on we use the buffered version for all DML operations that write position delete files.

Testing:
 * adds new stress test with DELETE + UPDATE + OPTIMIZE

Change-Id: I4869eb863ff0afe8f691ccf2fd681a92d36b405c
Reviewed-on: http://gerrit.cloudera.org:8080/21099
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
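The race described in the commit message can be illustrated with a small toy model. This is only a sketch: `ToyTable`, `rewrite_files`, `row_delta`, and `ValidationError` are hypothetical names invented for the illustration and are not Iceberg's or Impala's API. Real Iceberg validation works against the snapshots committed since the operation started; the model collapses that into a check against the current set of live data files.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for a failed commit validation."""


class ToyTable:
    """Toy model: a set of live data files plus committed position delete files."""

    def __init__(self, data_files):
        self.data_files = set(data_files)
        self.delete_files = []  # list of (delete_file_name, referenced_data_files)

    def rewrite_files(self, old_files, new_files):
        """Model of a compaction: replace old data files with rewritten ones."""
        self.data_files -= set(old_files)
        self.data_files |= set(new_files)

    def row_delta(self, delete_file, referenced_data_files, validate=True):
        """Commit a position delete file that points into referenced_data_files."""
        if validate:
            # Assumed behaviour of the check: every referenced data file
            # must still be live at commit time.
            missing = sorted(set(referenced_data_files) - self.data_files)
            if missing:
                raise ValidationError(
                    f"referenced data files no longer exist: {missing}")
        self.delete_files.append((delete_file, frozenset(referenced_data_files)))

    def dangling_deletes(self):
        """Names of delete files that reference at least one removed data file."""
        return [name for name, refs in self.delete_files
                if not refs <= self.data_files]


table = ToyTable(["a.parquet", "b.parquet"])

# A DELETE has planned position deletes against a.parquet,
# but a compaction commits first and removes that file.
table.rewrite_files(["a.parquet", "b.parquet"], ["c.parquet"])

# Without validation the delete file is committed but points at nothing,
# so the rows it was meant to remove survive in c.parquet.
table.row_delta("d1.parquet", ["a.parquet"], validate=False)
print(table.dangling_deletes())

# With validation the conflicting commit is rejected instead.
try:
    table.row_delta("d2.parquet", ["a.parquet"])
except ValidationError as e:
    print("rejected:", e)
```

In this model the only way to make the check possible is for the writer to report which data files its position deletes reference, which is the role the commit message assigns to IcebergBufferedDeleteSink.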
@@ -109,6 +109,10 @@ message DmlExecStatusPB {
   // root's key in an unpartitioned table being ROOT_PARTITION_KEY.
   // The target table name is recorded in the corresponding TQueryExecRequest
   map<string, DmlPartitionStatusPB> per_partition_status = 1;
+
+  // In case of Iceberg modify statements it contains the data files referenced
+  // by position delete records.
+  repeated string data_files_referenced_by_position_deletes = 2;
 }
 
 // Error message exchange format
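
The commit message says the Coordinator collects these file names from the DML exec state before passing them to CatalogD. A minimal sketch of that collection step follows, assuming each backend report can be viewed as a dictionary keyed by the protobuf field name. The function name `merge_referenced_data_files`, the dictionary shape, and the de-duplication are all assumptions made for the illustration, not Impala's actual code.

```python
def merge_referenced_data_files(backend_reports):
    """Union the per-backend file lists into one sorted, de-duplicated list.

    Each report is a dict standing in for a DmlExecStatusPB message
    (hypothetical representation).
    """
    merged = set()
    for report in backend_reports:
        merged.update(
            report.get("data_files_referenced_by_position_deletes", []))
    return sorted(merged)


reports = [
    {"data_files_referenced_by_position_deletes": ["b.parquet", "a.parquet"]},
    {"data_files_referenced_by_position_deletes": ["a.parquet"]},
    {},  # a backend that wrote no position deletes
]
print(merge_referenced_data_files(reports))
```

Using a set here reflects that several backends may write position deletes against the same data file, so the merged list only needs each path once.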