diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index c050ff04c..682ae6491 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -157,6 +157,7 @@ under the License.
+
@@ -304,7 +305,6 @@ under the License.
-
diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index e82dc8a0d..37aea6a32 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -550,7 +550,7 @@ UPDATE ice_t SET ice_t.k = o.k, ice_t.j = o.j, FROM ice_t, other_table o where i
  • Updating a partitioning column with a non-constant expression via the UPDATE FROM statement is not allowed.
- The upcoming MERGE statement will not have this limitation.
+ To work around this limitation, use a MERGE statement instead.
@@ -560,6 +560,34 @@ UPDATE ice_t SET ice_t.k = o.k, ice_t.j = o.j, FROM ice_t, other_table o where i
+
+ Merging data into Iceberg tables
+
+

    + Impala can execute MERGE statements against Iceberg tables, for example:
+
+MERGE INTO ice_t USING source ON ice_t.a = source.id WHEN NOT MATCHED THEN INSERT VALUES(id, source.column1);
+MERGE INTO ice_t USING source ON ice_t.a = source.id WHEN MATCHED THEN DELETE;
+MERGE INTO ice_t USING source ON ice_t.a = source.id WHEN MATCHED THEN UPDATE SET b = source.b;
+MERGE INTO ice_t USING source ON ice_t.a = source.id
+  WHEN MATCHED AND ice_t.a < 100 THEN UPDATE SET b = source.b
+  WHEN MATCHED THEN DELETE
+  WHEN NOT MATCHED THEN INSERT VALUES(id, source.column1);
+

    +

    + The limitations of the UPDATE statement also apply to the MERGE statement. In addition, the MERGE + statement has the following limitations: +

      +
    • Subqueries in the source statement must be simple queries, because internal query rewriting is not supported for them.
    • +
    +

    +

    + More information about the MERGE statement can be found at . +

    +
    +
    + Loading data into Iceberg tables
diff --git a/docs/topics/impala_merge.xml b/docs/topics/impala_merge.xml
new file mode 100644
index 000000000..1afd47147
--- /dev/null
+++ b/docs/topics/impala_merge.xml
@@ -0,0 +1,113 @@
+ MERGE Statement
+ MERGE

    + MERGE statement + The MERGE statement enables conditional updates, deletes, and inserts, based on the result of a join + between a target and a source table. This operation is useful for applying data changes from transactional systems to + analytic data warehouses by merging data from two tables with similar structures. +

    + +

    + The MERGE statement supports multiple WHEN clauses, and each clause specifies one action: UPDATE, + DELETE, or INSERT. Which clause applies depends on whether a source row has a matching target row under the + ON condition, and on each clause's optional AND condition. +

    +

    + 
MERGE INTO target_table [AS target_alias]
USING source_expr [AS source_alias]
ON search_condition
[WHEN MATCHED [AND search_condition] THEN
  UPDATE SET column1 = expression1, column2 = expression2, ... ]
[WHEN MATCHED [AND search_condition] THEN DELETE]
[WHEN NOT MATCHED [AND search_condition] THEN
  INSERT [(column1, column2, ...)] VALUES (expression1, expression2, ...)]
+

    + The WHEN MATCHED clause is executed if a row from the source table matches a row in the target table, + based on the ON condition. Within this clause, you can either UPDATE specific + columns or DELETE the matched rows. Multiple WHEN MATCHED clauses can be provided, + each with a different condition. +

    + +

    + The WHEN NOT MATCHED clause is executed if a row from the source table has no matching row in the + target table. This clause typically inserts new rows into the target table. +

    + +
      +
    • UPDATE: Updates specified columns of the target table for matching rows. Both source and target + fields can be used in the update expressions.
    • +
    • DELETE: Deletes the matching rows from the target table.
    • +
    • INSERT: Inserts new rows into the target table when no match is found, using values from the source table.
    • +
    + +

    + The ON clause defines the join condition between the target table and source expression, typically based + on primary key or unique identifier columns. For each joined row, the MERGE operation evaluates the WHEN clauses + in the order they are written, applies the action of the first clause whose condition holds, and skips the remaining clauses. +
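The first-match rule can be modeled in a few lines of Python. This is a conceptual sketch, not Impala's implementation. Tables are lists of dicts, and the ON condition is assumed to be an equality between one target column and one source column. Keys are assumed to be non-NULL and unique per source row. The names `merge`, `MatchedClause`, and `NotMatchedClause` are hypothetical. The usage at the bottom models a statement with the same clauses as the last `ice_t` example in the Iceberg topic.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict


@dataclass
class MatchedClause:
    """Hypothetical model of WHEN MATCHED [AND condition] THEN UPDATE/DELETE."""
    action: str  # "update" or "delete"
    condition: Optional[Callable[[Row, Row], bool]] = None  # None means no AND condition
    set_values: Optional[Callable[[Row, Row], Row]] = None  # UPDATE SET values from (target, source)


@dataclass
class NotMatchedClause:
    """Hypothetical model of WHEN NOT MATCHED [AND condition] THEN INSERT."""
    values: Callable[[Row], Row]  # builds the inserted row from the source row
    condition: Optional[Callable[[Row], bool]] = None


def merge(target, source, target_key, source_key, matched, not_matched):
    """Return the new target rows; assumes non-NULL keys, unique per source row."""
    source_by_key = {s[source_key]: s for s in source}
    result = []
    for t in target:
        s = source_by_key.get(t[target_key])
        if s is None:
            result.append(t)  # no source row matches: target row is kept unchanged
            continue
        for clause in matched:  # clauses are tried in order; the first true one wins
            if clause.condition is None or clause.condition(t, s):
                if clause.action == "update":
                    result.append({**t, **clause.set_values(t, s)})
                # for "delete", the row is simply not copied to the result
                break
        else:
            result.append(t)  # matched, but no WHEN MATCHED clause applied
    target_keys = {t[target_key] for t in target}
    for s in source:
        if s[source_key] in target_keys:
            continue  # matched source rows were handled above
        for clause in not_matched:
            if clause.condition is None or clause.condition(s):
                result.append(clause.values(s))
                break
    return result


# Models:
#   WHEN MATCHED AND ice_t.a < 100 THEN UPDATE SET b = source.b
#   WHEN MATCHED THEN DELETE
#   WHEN NOT MATCHED THEN INSERT VALUES(id, source.column1)
target = [{"a": 1, "b": "old1"}, {"a": 150, "b": "old150"}, {"a": 7, "b": "keep"}]
source = [
    {"id": 1, "b": "new1", "column1": "c1"},
    {"id": 150, "b": "new150", "column1": "c150"},
    {"id": 300, "b": "new300", "column1": "c300"},
]
result = merge(
    target, source, "a", "id",
    matched=[
        MatchedClause("update", lambda t, s: t["a"] < 100, lambda t, s: {"b": s["b"]}),
        MatchedClause("delete"),
    ],
    not_matched=[NotMatchedClause(lambda s: {"a": s["id"], "b": s["column1"]})],
)
print(result)
```

Row 150 matches but fails `a < 100`, so it falls through to the unconditional DELETE clause. Row 7 has no source partner and is left alone.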

    + + +

    + + +MERGE INTO customers AS c +USING updates AS u +ON u.customer_id = c.customer_id +WHEN MATCHED AND c.status != 'inactive' THEN + UPDATE SET c.name = u.name, c.email = u.email +WHEN MATCHED THEN DELETE +WHEN NOT MATCHED THEN + INSERT (customer_id, name, email, status) VALUES (u.customer_id, u.name, u.email, 'active'); +

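    + In this example, the MERGE operation updates the name and email of matched customers whose status is not 'inactive', + deletes all other matched customers (those whose status is 'inactive' or NULL, because c.status != 'inactive' is not true + for a NULL status), and inserts customers from the source table that have no match in the target table, with status 'active'. +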
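To make the customers example concrete, here is a small Python model of the same three clauses applied to sample rows. It is a sketch under stated assumptions, not Impala code. The helper `merge_customers` and the sample data are hypothetical, and `customer_id` is assumed to be non-NULL and unique within `updates`. Under standard SQL NULL semantics, `c.status != 'inactive'` is not true when `status` is NULL. A matched customer with a NULL status therefore falls through to the unconditional DELETE clause, as customer 3 does below.

```python
def merge_customers(customers, updates):
    """Hypothetical model of the customers MERGE example.

    Rows are dicts; customer_id is assumed non-NULL and unique within `updates`.
    """
    updates_by_id = {u["customer_id"]: u for u in updates}
    result = []
    for c in customers:
        u = updates_by_id.get(c["customer_id"])
        if u is None:
            result.append(c)  # no matching update: customer is unchanged
        elif c["status"] is not None and c["status"] != "inactive":
            # WHEN MATCHED AND c.status != 'inactive' THEN UPDATE SET name, email.
            # In SQL, NULL != 'inactive' is NULL (not true), hence the None check.
            result.append({**c, "name": u["name"], "email": u["email"]})
        # Otherwise WHEN MATCHED THEN DELETE applies: the row is not copied.
    existing_ids = {c["customer_id"] for c in customers}
    for u in updates:
        if u["customer_id"] not in existing_ids:
            # WHEN NOT MATCHED THEN INSERT (..., 'active')
            result.append({"customer_id": u["customer_id"], "name": u["name"],
                           "email": u["email"], "status": "active"})
    return result


customers = [
    {"customer_id": 1, "name": "Ann", "email": "ann@old.example", "status": "active"},
    {"customer_id": 2, "name": "Bob", "email": "bob@old.example", "status": "inactive"},
    {"customer_id": 3, "name": "Cy", "email": "cy@old.example", "status": None},
    {"customer_id": 4, "name": "Di", "email": "di@old.example", "status": "active"},
]
updates = [
    {"customer_id": 1, "name": "Ann B", "email": "ann@new.example"},
    {"customer_id": 2, "name": "Bob", "email": "bob@new.example"},
    {"customer_id": 3, "name": "Cy", "email": "cy@new.example"},
    {"customer_id": 5, "name": "Eve", "email": "eve@new.example"},
]
result = merge_customers(customers, updates)
for row in result:
    print(row)
```

Customer 4 is absent from `updates`, so no clause touches it. Customers 2 and 3 are matched, fail the UPDATE condition, and are deleted.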

    +

    + The MERGE statement is only supported for Iceberg tables. +

    +

    + For Iceberg tables, this operation generally uses a full outer join with the STRAIGHT_JOIN hint + to combine the target and source datasets. +
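As a rough illustration of why a full outer join fits MERGE, the Python sketch below produces the three kinds of rows such a join yields. It also labels which WHEN clauses could apply to each kind. This is not Impala's planner code. The helpers `full_outer_join` and `classify` are hypothetical, keys are assumed to be non-NULL, and the classification assumes only the WHEN MATCHED and WHEN NOT MATCHED clauses described in this topic.

```python
def full_outer_join(target, source, target_key, source_key):
    """Yield (target_row, source_row) pairs of an equi-join; the missing side is None.

    Keys are assumed to be non-NULL.
    """
    source_by_key = {}
    for s in source:
        source_by_key.setdefault(s[source_key], []).append(s)
    matched_keys = set()
    for t in target:
        partners = source_by_key.get(t[target_key], [])
        if partners:
            matched_keys.add(t[target_key])
            for s in partners:
                yield t, s
        else:
            yield t, None  # target row without a source partner
    for key, rows in source_by_key.items():
        if key not in matched_keys:
            for s in rows:
                yield None, s  # source row without a target partner


def classify(t, s):
    """Which kind of WHEN clause can apply to a joined pair."""
    if t is not None and s is not None:
        return "matched"  # candidate for WHEN MATCHED clauses
    if s is not None:
        return "not matched"  # candidate for WHEN NOT MATCHED clauses
    return "target only"  # no clause applies; the target row stays unchanged


target = [{"a": 1}, {"a": 2}]
source = [{"id": 2}, {"id": 3}]
pairs = list(full_outer_join(target, source, "a", "id"))
labels = [classify(t, s) for t, s in pairs]
print(labels)
```

A single join pass produces matched pairs and unmatched source rows together, so both kinds of WHEN clauses can be evaluated over one combined dataset.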

    +
    +