IMPALA-13887: Incorporate column/field information into cache key

The correctness verification for the tuple cache found an issue with TestParquet::test_resolution_by_name(). The test creates a table, selects, alters the table to change a column name, and selects again. With parquet_fallback_schema_resolution=NAME, the column names determine behavior. The tuple cache key did not include the column names, so it was producing an incorrect result after changing the column name.

This change adds information about the column / field name to the TSlotDescriptor so that it is incorporated into the tuple cache key. This is only needed when producing the tuple cache key, so it is omitted for other cases.

Testing:
- Ran TestParquet::test_resolution_by_name() with correctness verification
- Added custom cluster test that runs the test_resolution_by_name() test case with tuple caching. This fails without this change.

Change-Id: Iebfa777452daf66851b86383651d35e1b0a5f262
Reviewed-on: http://gerrit.cloudera.org:8080/23073
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
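
The failure mode described above can be illustrated with a minimal sketch. It is not Impala's implementation: SlotDesc, cache_key, and the SHA-256 hashing scheme are hypothetical stand-ins, and only the idea of folding an optional path string into the key comes from this change.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SlotDesc:
    """Hypothetical stand-in for TSlotDescriptor; only two fields are modeled."""
    slot_idx: int
    path: Optional[str] = None  # column / field names, set only for cache keys


def cache_key(slots: Sequence[SlotDesc], include_path: bool) -> str:
    """Hash slot descriptors into a hex digest (illustrative, not Impala's scheme)."""
    h = hashlib.sha256()
    for s in slots:
        h.update(f"slot={s.slot_idx};".encode())
        if include_path and s.path is not None:
            h.update(f"path={s.path};".encode())
    return h.hexdigest()


# The same slot before and after an ALTER TABLE that renames the column.
before_rename = [SlotDesc(slot_idx=0, path="a")]
after_rename = [SlotDesc(slot_idx=0, path="b")]

# Without the path, the rename is invisible and both queries share one key.
assert cache_key(before_rename, False) == cache_key(after_rename, False)
# With the path, the rename produces a different key, so no stale entry is hit.
assert cache_key(before_rename, True) != cache_key(after_rename, True)
```

In this sketch the key changes only when the path is part of the hashed input, which mirrors why the path is set just for cache key generation and left unset elsewhere.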
2025-12-19 18:12:08 -05:00 · 2025-06-15 10:05:56 -07:00
parent 50f01352aa
commit 7b25a7b070
4 changed files with 37 additions and 0 deletions
--- a/common/thrift/Descriptors.thrift
+++ b/common/thrift/Descriptors.thrift
@@ -52,6 +52,12 @@ struct TSlotDescriptor {
  9: required i32 slotIdx
  10: required CatalogObjects.TVirtualColumnType virtual_col_type =
      CatalogObjects.TVirtualColumnType.NONE
+  // The path includes column / field names materialized by a scan. This is set for
+  // producing the tuple cache key, because the names of columns / fields determine
+  // behavior when resolving Parquet columns/fields by name. This information is
+  // provided by other structures for the executor, so it only needs to be set for
+  // the tuple cache.
+  11: optional string path
 }

 struct TColumnDescriptor {