impala

mirror of https://github.com/apache/impala.git synced 2026-02-02 06:00:36 -05:00

Noemi Pap-Takacs 2d3289027c IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

If an Iceberg table is frequently updated/written to in small batches,
a lot of small files are created. This decreases read performance.
Similarly, frequent row-level deletes contribute to this problem
by creating delete files, which have to be merged on read.

So far, INSERT OVERWRITE (rewriting the table with itself) has been used
to compact Iceberg tables.
However, it comes with some restrictions:
- The table should not have multiple partition specs/partition evolution.
- The table should not contain complex types.

The OPTIMIZE statement offers a new syntax and a solution limited to
Iceberg tables to enhance read performance for subsequent operations.
See IMPALA-12293 for details.

Syntax: OPTIMIZE TABLE <table_name>;

This first patch introduces the new syntax, temporarily as an alias
for INSERT OVERWRITE.
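
To make the alias relationship concrete, here is a minimal Python sketch that builds both statement strings for a given table. The function names and the `SELECT *` form of the rewrite are assumptions for illustration; the commit message only says the table is rewritten "with itself" and does not show the statement Impala generates internally.

```python
import re

# Hypothetical helper: accept only simple [db.]table identifiers so the
# generated SQL cannot be altered by the table name.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def _check_table_name(table):
    if not _IDENT.fullmatch(table):
        raise ValueError(f"not a simple table identifier: {table!r}")


def optimize_statement(table):
    """Build the new syntax from the commit message: OPTIMIZE TABLE <table_name>."""
    _check_table_name(table)
    return f"OPTIMIZE TABLE {table}"


def insert_overwrite_statement(table):
    """Build a self-rewrite with INSERT OVERWRITE.

    Assumption: SELECT * from the same table stands in for "rewriting the
    table with itself"; the actual internal rewrite may differ.
    """
    _check_table_name(table)
    return f"INSERT OVERWRITE TABLE {table} SELECT * FROM {table}"


if __name__ == "__main__":
    # "db.ice_tbl" is a made-up table name.
    print(optimize_statement("db.ice_tbl"))
    print(insert_overwrite_statement("db.ice_tbl"))
```

In this first patch the two statements are expected to have the same effect on an Iceberg table, subject to the restrictions listed above.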

Note that executing OPTIMIZE TABLE requires ALL privileges.

Testing:
 - negative tests
 - FE planner test
 - Ranger test
 - E2E tests

Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Reviewed-on: http://gerrit.cloudera.org:8080/20405
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

2023-09-28 15:31:20 +00:00

functional-planner  IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE  2023-09-28 15:31:20 +00:00
functional-query    IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE  2023-09-28 15:31:20 +00:00
perf-regression     IMPALA-9709: Remove Impala-lzo from the development environment    2020-06-15 23:42:12 +00:00
targeted-perf       IMPALA-11937: Fix wrong GROUP BY ordinal in PERF_AGG-Q10           2023-03-20 12:48:03 +00:00
targeted-stress     IMPALA-9709: Remove Impala-lzo from the development environment    2020-06-15 23:42:12 +00:00
tpcds               IMPALA-10798: Initial support for reading JSON files               2023-09-05 16:55:41 +00:00
tpcds-insert        IMPALA-10384: Make partition names consistent between BE and FE    2020-12-11 19:51:28 +00:00
tpcds-unmodified    IMPALA-9709: Remove Impala-lzo from the development environment    2020-06-15 23:42:12 +00:00
tpch                IMPALA-10798: Initial support for reading JSON files               2023-09-05 16:55:41 +00:00
tpch_nested         IMPALA-9604: Add TPCH-nested tests for column masking              2020-06-17 06:54:50 +00:00
README              Move functional data loading to new framework + initial changes for workload directory structure  2014-01-08 10:44:18 -08:00

README

This directory contains Impala test workloads. Each workload should follow this directory layout:

workloads/
   <data set name>/<data set name>_dimensions.csv  <- The test dimension file
   <data set name>/<data set name>_core.csv        <- A test vector file
   <data set name>/<data set name>_pairwise.csv    <- A test vector file
   <data set name>/<data set name>_exhaustive.csv  <- A test vector file
   <data set name>/queries/<query test>.test       <- The queries for this workload
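
The layout above can be sketched as a small Python helper that lists the paths expected for one workload. This is only an illustration of the naming convention: the function name, the `VECTOR_SUFFIXES` constant, and the example query test name are hypothetical and not part of the Impala test framework.

```python
from pathlib import PurePosixPath

# Suffixes taken from the file names in the layout above.
VECTOR_SUFFIXES = ("core", "pairwise", "exhaustive")


def expected_workload_files(data_set, query_tests=()):
    """Return the relative paths the layout prescribes for one data set.

    data_set:    the <data set name>, e.g. "tpch"
    query_tests: names of <query test> files, without the .test extension
    """
    root = PurePosixPath("workloads") / data_set
    # One dimension file per data set.
    paths = [root / f"{data_set}_dimensions.csv"]
    # One test vector file per suffix.
    paths += [root / f"{data_set}_{suffix}.csv" for suffix in VECTOR_SUFFIXES]
    # Query files live in the queries/ subdirectory.
    paths += [root / "queries" / f"{name}.test" for name in query_tests]
    return [str(p) for p in paths]


if __name__ == "__main__":
    # "tpch-q1" is a made-up query test name used only for illustration.
    for path in expected_workload_files("tpch", ["tpch-q1"]):
        print(path)
```

A helper like this could be used to check that a new workload directory contains every file the convention expects before adding tests to it.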