mirror of
https://github.com/apache/impala.git
synced 2025-12-19 18:12:08 -05:00
IMPALA-14261: Take 'impala.computeStatsSnapshotId' into account when deciding between Puffin and HMS stats
Since IMPALA-13609, Impala writes snapshot information for each column on COMPUTE STATS for Iceberg tables (see there for why it is useful), but this information has so far been ignored.

After this change, snapshot information is used when deciding which of HMS and Puffin NDV stats should be used (i.e. which is more recent).

This change also modifies the IcebergUtil.ComputeStatsSnapshotPropertyConverter class: previously Iceberg fieldIds were stored as Long, but now they are stored as Integer, in accordance with the Iceberg spec.

Documentation:
 - updated the docs about Puffin stats in docs/topics/impala_iceberg.xml

Testing:
 - modified existing tests to fit the new decision mechanism

Change-Id: I95a5b152dd504e94dea368a107d412e33f67930c
Reviewed-on: http://gerrit.cloudera.org:8080/23251
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Daniel Becker <daniel.becker@cloudera.com>
committed by Daniel Becker
parent a68f716458
commit 19c12e0e06
@@ -896,10 +896,11 @@ ORDER BY made_current_at;
 come from different snapshots.
 </p>
 <p>
-In case there are both HMS and Puffin stats for a column, the more recent one will
-be used - for HMS stats we use the 'impala.lastComputeStatsTime' table property, and
-for Puffin stats we use the snapshot timestamp to determine which one is more
-recent.
+In case there are both HMS and Puffin NDV stats for a column, the more recent one
+will be used. For HMS stats we use the 'impala.computeStatsSnapshotId' table
+property which stores, for each column, the snapshot for which HMS stats were
+calculated. We compare this with the snapshot of the Puffin stats to decide which
+is more recent.
 </p>
 <p>
 Reading Puffin stats is disabled by default; set the "--enable_reading_puffin_stats"
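
Note on the hunk above: the per-column decision it documents can be illustrated with a minimal Python sketch. This is not Impala's implementation (the frontend is Java); the function name, the dictionary-based inputs, the timestamp lookup used to order snapshots, and the tie-breaking rule are all assumptions made for illustration. The only facts taken from the change are that HMS stats carry a per-column snapshot ID (from 'impala.computeStatsSnapshotId'), that Puffin stats belong to a snapshot, and that the more recent of the two is used.

```python
from typing import Dict, Optional


def choose_ndv_source(
    field_id: int,
    hms_snapshot_by_field: Dict[int, int],
    puffin_snapshot_by_field: Dict[int, int],
    snapshot_timestamp_ms: Dict[int, int],
) -> Optional[str]:
    """Pick the NDV source for one column: "HMS", "PUFFIN", or None.

    Hypothetical inputs (not Impala's data structures):
      hms_snapshot_by_field    -- field ID -> snapshot ID the HMS stats were
                                  computed for
      puffin_snapshot_by_field -- field ID -> snapshot ID of the Puffin stats
      snapshot_timestamp_ms    -- snapshot ID -> commit timestamp
    """
    hms_snap = hms_snapshot_by_field.get(field_id)
    puffin_snap = puffin_snapshot_by_field.get(field_id)

    # Only one source (or neither) has stats for this column.
    if hms_snap is None and puffin_snap is None:
        return None
    if puffin_snap is None:
        return "HMS"
    if hms_snap is None:
        return "PUFFIN"

    # Assumption: recency is decided by snapshot timestamps, since Iceberg
    # snapshot IDs are not guaranteed to be ordered. Unknown snapshots are
    # treated as oldest.
    hms_ts = snapshot_timestamp_ms.get(hms_snap, -1)
    puffin_ts = snapshot_timestamp_ms.get(puffin_snap, -1)

    # Assumption: on a tie, prefer HMS stats.
    return "PUFFIN" if puffin_ts > hms_ts else "HMS"


# Example: column 1 has HMS stats from an older snapshot than its Puffin stats.
snapshots = {100: 1_000, 200: 2_000}
source = choose_ndv_source(1, {1: 100}, {1: 200}, snapshots)
```

Keying the maps by integer field ID mirrors the commit's note that Iceberg fieldIds are now stored as Integer rather than Long.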