impala/testdata/datasets/functional
stiga-huang 77d80aeda6 IMPALA-11812: Deduplicate column schema in hmsPartitions
Lists of HMS Partitions are created in many catalogd workloads, e.g.
table loading and bulk partition alteration by ComputeStats or
AlterTableRecoverPartitions. Currently, each hmsPartition holds its own
list of column schemas, i.e. a List<FieldSchema>. This results in a
large number of FieldSchema instances when the table is wide and many
partitions need to be loaded or operated on. Though the strings of
column names and comments are interned, the FieldSchema objects can
still occupy the majority of the heap. See the histogram in the JIRA
description.

In practice, the hmsPartition instances of a table can share the
table-level column schema, since Impala doesn't respect the
partition-level schema.

This patch replaces the column list in the StorageDescriptor of each
hmsPartition with the table-level column list to remove the
duplication. It also adds progress logs to batch HMS operations and
avoids misleading logs when event-processor is disabled.
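
The idea can be illustrated with a minimal, self-contained sketch. It is
not the code from this patch: FieldSchema, StorageDescriptor and
Partition below are simplified stand-ins for the HMS Thrift classes, and
shareTableColumns and distinctColumnLists are hypothetical helper names
chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class SchemaDedupSketch {
  // Simplified stand-ins for the HMS Thrift classes (assumption: only the
  // members needed for this illustration are modeled).
  static class FieldSchema {
    final String name;
    final String type;
    FieldSchema(String name, String type) { this.name = name; this.type = type; }
  }

  static class StorageDescriptor {
    private List<FieldSchema> cols;
    StorageDescriptor(List<FieldSchema> cols) { this.cols = cols; }
    List<FieldSchema> getCols() { return cols; }
    void setCols(List<FieldSchema> cols) { this.cols = cols; }
  }

  static class Partition {
    private final StorageDescriptor sd;
    Partition(StorageDescriptor sd) { this.sd = sd; }
    StorageDescriptor getSd() { return sd; }
  }

  // Hypothetical helper: point every partition at the single table-level
  // column list so the per-partition copies become garbage.
  static void shareTableColumns(List<FieldSchema> tableCols, List<Partition> parts) {
    for (Partition p : parts) {
      if (p.getSd() != null) p.getSd().setCols(tableCols);
    }
  }

  // Counts distinct column-list objects by identity, not by equals().
  static int distinctColumnLists(List<Partition> parts) {
    Set<List<FieldSchema>> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    for (Partition p : parts) seen.add(p.getSd().getCols());
    return seen.size();
  }

  // Builds a fresh column list, as if deserialized separately per partition.
  static List<FieldSchema> makeCols(int n) {
    List<FieldSchema> cols = new ArrayList<>();
    for (int i = 0; i < n; i++) cols.add(new FieldSchema("c" + i, "string"));
    return cols;
  }

  public static void main(String[] args) {
    List<Partition> parts = new ArrayList<>();
    for (int i = 0; i < 4; i++) {
      parts.add(new Partition(new StorageDescriptor(makeCols(3))));
    }
    System.out.println("distinct lists before: " + distinctColumnLists(parts));
    shareTableColumns(makeCols(3), parts);
    System.out.println("distinct lists after: " + distinctColumnLists(parts));
  }
}
```

With N partitions and C columns, the number of live FieldSchema objects
drops from roughly N * C to C, which is where the heap savings on wide
tables would come from.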

Tests:
- Ran exhaustive tests
- Added tests for wide-table operations that hit OOM errors without
  this fix.

Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d
Reviewed-on: http://gerrit.cloudera.org:8080/19391
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-01-01 04:38:36 +00:00