impala/testdata/workloads/functional-query/queries/QueryTest
Gergely Fürnstáhl d0fe4c604f IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
The IcebergDeleteNode and IcebergDeleteBuild classes are based on their
PartitionedHashJoin counterparts. Only the actual "join" part of the
node is optimized; the rest is kept very similar so that features of
PartitionedHashJoin (partitioning, spilling) can be integrated if needed.

ICEBERG_DELETE_JOIN is added as a join operator that is used only by
IcebergDeleteNode.

IcebergDeleteBuild processes the data from the relevant delete files and
stores them in a {file_path: ordered row id vector} hash map.
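
A minimal sketch of such a map, assuming the delete rows arrive as
(file path, row position) pairs. DeleteMap and BuildDeleteMap are
hypothetical names used for illustration only, not the classes added by
this change:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the build-side structure:
// data file path -> ordered positions of deleted rows in that file.
using DeleteMap = std::unordered_map<std::string, std::vector<int64_t>>;

// Groups (file path, row position) pairs by file path, then sorts and
// de-duplicates each vector so the probe side can scan or binary search it.
DeleteMap BuildDeleteMap(
    const std::vector<std::pair<std::string, int64_t>>& delete_rows) {
  DeleteMap result;
  for (const auto& row : delete_rows) {
    result[row.first].push_back(row.second);
  }
  for (auto& entry : result) {
    std::vector<int64_t>& ids = entry.second;
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
  }
  return result;
}
```

Sorting once on the build side is what makes both the sequential scan
and the binary search on the probe side possible.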

IcebergDeleteNode tracks the file currently being processed and advances
through its row id vector in parallel with the probe batch to check
whether a row is deleted. When that is not possible, it hashes the probe
row's file path and uses binary search to find the closest row id.
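
The probing idea can be sketched roughly as follows. This is an
illustration under assumptions, not the IcebergDeleteNode code:
DeleteMap, DeleteProbe, IsDeleted and Seek are hypothetical names, and
the sketch re-seeks with a binary search whenever the file changes or
the probe position moves backwards:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical build-side structure: file path -> sorted deleted positions.
using DeleteMap = std::unordered_map<std::string, std::vector<int64_t>>;

// Hypothetical probe helper. It keeps a cursor into the current file's
// row id vector and advances it as probe positions increase.
class DeleteProbe {
 public:
  // The map must outlive the probe; only a reference is stored.
  explicit DeleteProbe(const DeleteMap& deletes) : deletes_(deletes) {}

  bool IsDeleted(const std::string& file_path, int64_t pos) {
    if (!initialized_ || file_path != current_file_) {
      // New file: hash lookup of the path, then binary search to position.
      initialized_ = true;
      current_file_ = file_path;
      auto it = deletes_.find(file_path);
      ids_ = (it == deletes_.end()) ? nullptr : &it->second;
      Seek(pos);
    } else if (pos < last_pos_) {
      // Same file, but the position went backwards: re-seek.
      Seek(pos);
    }
    last_pos_ = pos;
    if (ids_ == nullptr) return false;  // No deletes for this file.
    // Common case: step the cursor forward alongside the probe rows.
    while (cursor_ < ids_->size() && (*ids_)[cursor_] < pos) ++cursor_;
    return cursor_ < ids_->size() && (*ids_)[cursor_] == pos;
  }

 private:
  // Moves the cursor to the first deleted position >= pos.
  void Seek(int64_t pos) {
    cursor_ = 0;
    if (ids_ == nullptr) return;
    cursor_ = static_cast<std::size_t>(
        std::lower_bound(ids_->begin(), ids_->end(), pos) - ids_->begin());
  }

  const DeleteMap& deletes_;
  const std::vector<int64_t>* ids_ = nullptr;
  std::string current_file_;
  int64_t last_pos_ = -1;
  std::size_t cursor_ = 0;
  bool initialized_ = false;
};
```

When probe rows arrive ordered by file and position, each check costs a
few cursor steps; the hash lookup and binary search are only paid when
that ordering is broken.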

Testing:
  - Duplicated the related planner tests to run with both the new
    operator and hash join
  - Added a dimension to the e2e tests to run with both the new operator
    and hash join
  - Added new multiblock tests to verify the assumptions the new
    operator relies on to optimize probing
  - Added a new test with BATCH_SIZE=2 to verify in/out batch handling
    with the new operator

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Reviewed-on: http://gerrit.cloudera.org:8080/19850
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-05 20:32:23 +00:00