mirror of
https://github.com/apache/impala.git
synced 2025-12-19 18:12:08 -05:00
IMPALA-13410: Document reading Puffin files
IMPALA-13247 introduced support for reading Puffin files belonging to the current snapshot. This change documents it.

Change-Id: Ib2975a67aadd948d9451f44a1c884349161c19d2
Reviewed-on: http://gerrit.cloudera.org:8080/21870
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
@@ -57,6 +57,10 @@ under the License.
<topicmeta><linktext>the Apache Iceberg site</linktext></topicmeta>
</keydef>

<keydef href="https://iceberg.apache.org/puffin-spec" scope="external" format="html" keys="upstream_iceberg_puffin_site">
<topicmeta><linktext>the Apache Iceberg Puffin site</linktext></topicmeta>
</keydef>

<keydef href="https://ozone.apache.org" scope="external" format="html" keys="upstream_ozone_site">
<topicmeta><linktext>the Apache Ozone site</linktext></topicmeta>
</keydef>
@@ -857,6 +857,45 @@ ORDER BY made_current_at;
</conbody>
</concept>

<concept id="iceberg_puffin_stats">
<title>Iceberg Puffin statistics</title>
<conbody>
<p>
Impala supports reading NDV (Number of Distinct Values) statistics from Puffin files.
For the Puffin specification, see <xref keyref="upstream_iceberg_puffin_site"/>.
</p>
<p>
Impala reads Puffin stats only when they are available for the current snapshot.
Puffin files or blobs that were written for snapshots other than the current one
are ignored. This behaviour differs from how Impala treats HMS stats, where
older stats can also be used; see <xref keyref="perf_stats"/> for details.
Because this may be unintuitive for users, reading Puffin stats is disabled by default;
set the <codeph>--disable_reading_puffin_stats</codeph> startup flag to false to enable it.
</p>
<p>
When Puffin stats reading is enabled, the NDV values read from Puffin files take
precedence over NDV values stored in the HMS. This is because Impala only reads Puffin
stats for the current snapshot, so these values are always up to date, while the
values in the HMS may be stale.
</p>
<p>
Note that it is currently not possible to drop Puffin stats from Impala.
For this reason, reading Puffin stats can be disabled in two ways:
<ul>
<li>Globally, with the aforementioned
<codeph>disable_reading_puffin_stats</codeph> startup flag: when it is set
to true, Impala never reads Puffin stats.</li>
<li>For specific tables, by setting the
<codeph>impala.iceberg_disable_reading_puffin_stats</codeph> table property
to "true".</li>
</ul>
</p>
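The rules described so far (reading is off by default, only the current snapshot counts, Puffin values take precedence over HMS values, and a table property can opt a table out) can be summarised in a small illustrative model. This is a sketch only and not Impala's implementation: the PuffinNdv class, the effective_ndv function and its parameter names are hypothetical, and only the flag and table property named in the comments come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PuffinNdv:
    """Hypothetical record: an NDV value and the snapshot it was written for."""
    snapshot_id: int
    ndv: int


def effective_ndv(
    hms_ndv: Optional[int],
    puffin: Optional[PuffinNdv],
    current_snapshot_id: int,
    # Models the disable_reading_puffin_stats startup flag (reading is off by default).
    disable_reading_puffin_stats: bool = True,
    # Models the impala.iceberg_disable_reading_puffin_stats table property.
    table_disables_puffin: bool = False,
) -> Optional[int]:
    """Return the NDV that would be used under the documented rules."""
    puffin_enabled = not disable_reading_puffin_stats and not table_disables_puffin
    if (
        puffin_enabled
        and puffin is not None
        and puffin.snapshot_id == current_snapshot_id
    ):
        # Puffin stats for the current snapshot take precedence over HMS stats.
        return puffin.ndv
    # Otherwise fall back to whatever the HMS has, which may be stale or absent.
    return hms_ndv


if __name__ == "__main__":
    stats = PuffinNdv(snapshot_id=42, ndv=1000)
    print(effective_ndv(500, stats, 42))
    print(effective_ndv(500, stats, 42, disable_reading_puffin_stats=False))
```

In this model the first call falls back to the HMS value because the flag defaults to true, while the second call uses the Puffin value because reading is enabled and the snapshot IDs match.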
<p>
Note that Impala does not yet support writing Puffin statistics files.
</p>
</conbody>
</concept>

<concept id="iceberg_table_cloning">
<title>Cloning Iceberg tables (LIKE clause)</title>
<conbody>