IMPALA-13000: Document OPTIMIZE TABLE

Document OPTIMIZE TABLE syntax and behaviour.

Testing:
 - built docs locally

Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Reviewed-on: http://gerrit.cloudera.org:8080/21320
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
This commit is contained in:
Noemi Pap-Takacs
2024-04-17 15:04:54 +02:00
committed by Zoltan Borok-Nagy
parent 93278cccf0
commit 9b05a205fe

@@ -546,6 +546,53 @@ UPDATE ice_t SET ice_t.k = o.k, ice_t.j = o.j, FROM ice_t, other_table o where i
</conbody>
</concept>
<concept id="iceberg_optimize_table">
<title>Optimizing (Compacting) Iceberg tables</title>
<conbody>
<p>
Frequent updates and row-level modifications on Iceberg tables can produce many small
data files and delete files, which must be merged on read.
This degrades read performance over time.
Use the following statement to compact the table and optimize it for reading.
<codeblock>
OPTIMIZE TABLE [<varname>db_name</varname>.]<varname>table_name</varname>;
</codeblock>
</p>
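<p>
As a minimal illustration, assuming a hypothetical Iceberg table named
<codeph>ice_db.ice_t</codeph>, the statement could be issued like this:
<codeblock>
-- ice_db.ice_t is a placeholder name used only for this example.
OPTIMIZE TABLE ice_db.ice_t;
</codeblock>
</p>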
<p>
The current implementation of the <codeph>OPTIMIZE TABLE</codeph> statement rewrites
the entire table, executing the following tasks:
<ul>
<li>compact small files</li>
<li>merge delete and update deltas</li>
<li>rewrite all files, converting them to the latest table schema</li>
<li>rewrite all partitions according to the latest partition spec</li>
</ul>
</p>
<p>
To execute table optimization:
<ul>
<li>The user needs ALL privileges on the table.</li>
<li>The table can contain files in any format that Impala can read, but <codeph>write.format.default</codeph>
has to be <codeph>parquet</codeph>.</li>
<li>The table cannot contain complex types.</li>
</ul>
</p>
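<p>
The first two prerequisites might be satisfied along the following lines. This is only a
sketch: the table <codeph>ice_db.ice_t</codeph> and the user <codeph>etl_user</codeph> are
hypothetical, and the exact <codeph>GRANT</codeph> form depends on how authorization is
configured in your cluster.
<codeblock>
-- Give a hypothetical user ALL privileges on the example table.
GRANT ALL ON TABLE ice_db.ice_t TO USER etl_user;

-- Make Parquet the default write format of the example table.
ALTER TABLE ice_db.ice_t SET TBLPROPERTIES ('write.format.default'='parquet');
</codeblock>
</p>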
<p>
When a table is optimized, a new snapshot is created. The old table state is still
accessible by time travel to previous snapshots, because the original data and
delete files that were rewritten are not physically removed.
</p>
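<p>
For example, the earlier state could be inspected roughly as follows. The table name and
the snapshot ID are placeholders. Take the real snapshot ID from the output of
<codeph>DESCRIBE HISTORY</codeph>.
<codeblock>
-- List the snapshots of the table, including the one created by OPTIMIZE TABLE.
DESCRIBE HISTORY ice_db.ice_t;

-- Query the table as of an earlier snapshot; 1234567890 is a placeholder ID.
SELECT * FROM ice_db.ice_t FOR SYSTEM_VERSION AS OF 1234567890;
</codeblock>
</p>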
<p>
Because the current implementation of <codeph>OPTIMIZE TABLE</codeph> rewrites
the entire table, the operation can take a long time to complete
on large tables.
</p>
</conbody>
</concept>
<concept id="iceberg_time_travel">
<title>Time travel for Iceberg tables</title>
<conbody>