IMPALA-13588: Update Puffin reading doc after IMPALA-13370

IMPALA-13370 added support for reading Puffin NDV stats from the
metadata.json if the "ndv" property is available. This change updates
the docs accordingly.

Change-Id: I95f5454d736ffb3a2c043f9b490c62976ccd0c2a
Reviewed-on: http://gerrit.cloudera.org:8080/22140
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com>
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
Daniel Becker
2024-11-28 12:20:15 +01:00
committed by Peter Rozsa
parent 907c1738a0
commit b49f45eacb

@@ -879,6 +879,12 @@ ORDER BY made_current_at;
values in the HMS may be stale.
</p>
<p>
Some engines, e.g. Trino, also write the NDV as a property (with key "ndv") in the
"statistics" section of the metadata.json file for each blob, in addition to the
Puffin file. If such a property is present for a blob, Impala will read the value
from the metadata.json file instead of the Puffin file to reduce file I/O.
</p>
<p>
Note that it is currently not possible to drop Puffin stats from Impala.
For this reason, it is possible to disable reading Puffin stats in two ways:
<ul>