impala/docs/topics/impala_dml.xml
Jim Apple d484d2f684 Add Apache license header to files in doc directory
This now gives a clean RAT check with bin/check-rat-report.py, which
is one way for the Impala community to check compliance with ASF rules
on intellectual property.

Change-Id: I2ad06435f84a65ba126759e42a18fdaf52cd7036
Reviewed-on: http://gerrit.cloudera.org:8080/5232
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
Reviewed-by: John Russell <jrussell@cloudera.com>
2016-12-02 23:54:32 +00:00


<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="dml">
<title>DML Statements</title>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="SQL"/>
<data name="Category" value="DML"/>
<data name="Category" value="Data Analysts"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Tables"/>
<data name="Category" value="ETL"/>
<data name="Category" value="Ingest"/>
</metadata>
</prolog>
<conbody>
<p>
DML refers to <q>Data Manipulation Language</q>, the subset of SQL statements that modify the data stored in
tables. Because Impala focuses on query performance and leverages the append-only nature of HDFS storage,
Impala currently supports only a small set of DML statements:
</p>
<ul>
<li>
<xref keyref="delete"/>. Works for Kudu tables only.
</li>
<li>
<xref keyref="insert"/>.
</li>
<li>
<xref keyref="load_data"/>. Does not apply to HBase or Kudu tables.
</li>
<li>
<xref keyref="update"/>. Works for Kudu tables only.
</li>
<li>
<xref keyref="upsert"/>. Works for Kudu tables only.
</li>
</ul>
<p>
<codeph>INSERT</codeph> in Impala is primarily optimized for inserting large volumes of data in a single
statement, to make effective use of the multi-megabyte HDFS blocks. This is the way in Impala to create new
data files. If you intend to insert one or a few rows at a time, for example with the
<codeph>INSERT ... VALUES</codeph> syntax, that technique is much more efficient for Impala tables stored in
HBase than for HDFS-backed tables. See <xref href="impala_hbase.xml#impala_hbase"/> for details.
</p>
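<p>
As a rough sketch of the difference, the following statements contrast a bulk insert with a single-row insert.
The table names <codeph>sales_parquet</codeph> and <codeph>sales_staging</codeph> and their columns are
hypothetical, chosen only for illustration:
</p>
<codeblock>-- Bulk insert: one statement copies many rows, so Impala can write large data files.
INSERT INTO sales_parquet
  SELECT id, customer, amount FROM sales_staging;

-- Single-row insert: valid syntax, but on an HDFS-backed table each such
-- statement produces its own small data file.
INSERT INTO sales_parquet VALUES (1001, 'Acme', 49.95);
</codeblock>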
<p>
<codeph>LOAD DATA</codeph> moves existing data files into the directory for an Impala table, making them
immediately available for Impala queries. This is one way in Impala to work with data files produced by other
Hadoop components. (<codeph>CREATE EXTERNAL TABLE</codeph> is the alternative; with external tables,
you can query existing data files while the files remain in their original location.)
</p>
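<p>
The two approaches might look like the following sketch. The HDFS paths, table names, and column definitions
are hypothetical, and the external table assumes comma-delimited text files:
</p>
<codeblock>-- Move files that are already in HDFS into the data directory of an existing table.
-- The files are moved, not copied, so they no longer exist at the source path.
LOAD DATA INPATH '/user/etl/staging/sales' INTO TABLE sales_text;

-- Alternative: leave the files where they are and point an external table at them.
CREATE EXTERNAL TABLE sales_external (id BIGINT, customer STRING, amount DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/user/etl/shared/sales';
</codeblock>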
<p>
In <keyword keyref="impala28_full"/> and higher, Impala supports the <codeph>UPDATE</codeph>, <codeph>DELETE</codeph>,
and <codeph>UPSERT</codeph> statements for Kudu tables.
For HDFS or S3 tables, to simulate the effects of an <codeph>UPDATE</codeph> or <codeph>DELETE</codeph> statement
in other database systems, typically you use <codeph>INSERT</codeph> or <codeph>CREATE TABLE AS SELECT</codeph> to copy data
from one table to another, filtering out or changing the appropriate rows during the copy operation.
</p>
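<p>
The following sketch illustrates both cases. The tables <codeph>orders_kudu</codeph>,
<codeph>orders_hdfs</codeph>, and <codeph>orders_cleaned</codeph> and their columns are hypothetical:
</p>
<codeblock>-- Kudu table: rows can be changed in place.
UPDATE orders_kudu SET status = 'shipped' WHERE id = 1001;
DELETE FROM orders_kudu WHERE status = 'cancelled';
UPSERT INTO orders_kudu VALUES (1002, 'Acme', 'new');

-- HDFS or S3 table: copy the data to a new table instead.
-- The WHERE clause leaves out the unwanted rows (it also excludes rows whose
-- status is NULL), and the CASE expression changes a value during the copy.
CREATE TABLE orders_cleaned AS
  SELECT id, customer,
         CASE WHEN id = 1001 THEN 'shipped' ELSE status END AS status
  FROM orders_hdfs
  WHERE status != 'cancelled';
</codeblock>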
<p>
You can also achieve a result similar to <codeph>UPDATE</codeph> by using Impala tables stored in HBase.
When you insert a row into an HBase table, and the table
already contains a row with the same value for the key column, the older row is hidden, which is effectively
the same as a single-row <codeph>UPDATE</codeph>.
</p>
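<p>
A minimal sketch of this behavior, assuming a hypothetical Impala table <codeph>hbase_orders</codeph> that is
already mapped to an HBase table and uses <codeph>id</codeph> as its key column:
</p>
<codeblock>-- First version of the row.
INSERT INTO hbase_orders VALUES ('1001', 'Acme', 'new');

-- Same key value: the earlier row is hidden by this one.
INSERT INTO hbase_orders VALUES ('1001', 'Acme', 'shipped');

-- Only the most recent row for the key is visible to the query.
SELECT id, customer, status FROM hbase_orders WHERE id = '1001';
</codeblock>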
<p rev="2.6.0">
In <keyword keyref="impala26_full"/> and higher, Impala can perform DML operations on tables or partitions
stored in the Amazon S3 filesystem. See <xref href="impala_s3.xml#s3"/> for details.
</p>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
The other major classifications of SQL statements are data definition language (see
<xref href="impala_ddl.xml#ddl"/>) and queries (see <xref href="impala_select.xml#select"/>).
</p>
</conbody>
</concept>