mirror of
https://github.com/apache/impala.git
synced 2025-12-19 09:58:28 -05:00
Currently we just support REFRESH on the whole table or a specific partition: REFRESH [db_name.]table_name [PARTITION (key_col1=val1 [, key_col2=val2...])] If users want to refresh multiple partitions, they have to submit multiple statements each for a single partition. This has some drawbacks: - It requires holding the table write lock inside catalogd multiple times, which increase lock contention with other read/write operations on the same table, e.g. getPartialCatalogObject requests from coordinators. - Catalog version of the table will be increased multiple times. Coordinators in local catalog mode is more likely to see different versions between their getPartialCatalogObject requests so have to retry planning to resolve InconsistentMetadataFetchException. - Partitions are reloaded in sequence. They should be reloaded in parallel like we do in refreshing the whole table. This patch extends the syntax to refresh multiple partitions in one statement: REFRESH [db_name.]table_name [PARTITION (key_col1=val1 [, key_col2=val2...]) [PARTITION (key_col1=val3 [, key_col2=val4...])...]] Example: REFRESH foo PARTITION(p=0) PARTITION(p=1) PARTITION(p=2); TResetMetadataRequest is extended to have a list of partition specs for this. If the list has only one item, we still use the existing logic of reloading a specific partition. If the list has more than one item, partitions will be reloaded in parallel. This is implemented in CatalogServiceCatalog#reloadTable(). Previously it always invokes HdfsTable#load() with partitionsToUpdate=null. Now the parameter is set when TResetMetadataRequest has the partition list. HMS notification events in RELOAD type will be fired for each partition if enable_reload_events is turned on. Once HIVE-28967 is resolved, we can fire a single event for multiple partitions. Updated docs in impala_refresh.xml. Tests: - Added FE and e2e tests Change-Id: Ie5b0deeaf23129ed6e1ba2817f54291d7f63d04e Reviewed-on: http://gerrit.cloudera.org:8080/22938 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
235 lines
8.2 KiB
XML
235 lines
8.2 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!--
|
|
Licensed to the Apache Software Foundation (ASF) under one
|
|
or more contributor license agreements. See the NOTICE file
|
|
distributed with this work for additional information
|
|
regarding copyright ownership. The ASF licenses this file
|
|
to you under the Apache License, Version 2.0 (the
|
|
"License"); you may not use this file except in compliance
|
|
with the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing,
|
|
software distributed under the License is distributed on an
|
|
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
|
KIND, either express or implied. See the License for the
|
|
specific language governing permissions and limitations
|
|
under the License.
|
|
-->
|
|
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
|
|
<concept id="refresh">
|
|
|
|
<title>REFRESH Statement</title>
|
|
|
|
<titlealts audience="PDF">
|
|
|
|
<navtitle>REFRESH</navtitle>
|
|
|
|
</titlealts>
|
|
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Impala"/>
|
|
<data name="Category" value="SQL"/>
|
|
<data name="Category" value="DDL"/>
|
|
<data name="Category" value="Tables"/>
|
|
<data name="Category" value="Hive"/>
|
|
<data name="Category" value="Metastore"/>
|
|
<data name="Category" value="ETL"/>
|
|
<data name="Category" value="Ingest"/>
|
|
<data name="Category" value="Developers"/>
|
|
<data name="Category" value="Data Analysts"/>
|
|
</metadata>
|
|
</prolog>
|
|
|
|
<conbody>
|
|
|
|
<p>
|
|
The <codeph>REFRESH</codeph> statement reloads the metadata for the table from the
|
|
metastore database and does an incremental reload of the file and block metadata from the
|
|
HDFS NameNode. <codeph>REFRESH</codeph> is used to avoid inconsistencies between Impala
|
|
and external metadata sources, namely Hive Metastore (HMS) and NameNodes.
|
|
</p>
|
|
|
|
<p> The <codeph>REFRESH</codeph> statement is only required if you load data
|
|
from outside of Impala. Updated metadata, as a result of running
|
|
<codeph>REFRESH</codeph>, is broadcast to all Impala coordinators. </p>
|
|
|
|
<p>
|
|
See <xref href="impala_hadoop.xml#intro_metastore"/> for the information about the way
|
|
Impala uses metadata and how it shares the same metastore database as Hive.
|
|
</p>
|
|
|
|
<p>
|
|
Once issued, the <codeph>REFRESH</codeph> statement cannot be cancelled.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>
|
|
|
|
<codeblock rev="IMPALA-1683">REFRESH [<varname>db_name</varname>.]<varname>table_name</varname>
|
|
[PARTITION (<varname>key_col1</varname>=<varname>val1</varname> [, <varname>key_col2</varname>=<varname>val2</varname>...])
|
|
[PARTITION (<varname>key_col1</varname>=<varname>val3</varname> [, <varname>key_col2</varname>=<varname>val4</varname>...])...]</codeblock>
|
|
|
|
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
|
|
|
|
<p>
|
|
The table name is a required parameter, and the table must already exist and be known to
|
|
Impala.
|
|
</p>
|
|
|
|
<p>
|
|
Only the metadata for the specified table is reloaded.
|
|
</p>
|
|
|
|
<p>
|
|
Use the <codeph>REFRESH</codeph> statement to load the latest metastore metadata for a
|
|
particular table after one of the following scenarios happens outside of Impala:
|
|
</p>
|
|
|
|
<ul>
|
|
<li> Deleting, adding, or modifying files. <p> For example, after loading
|
|
new data files into the HDFS data directory for the table, appending
|
|
to an existing HDFS file, inserting data from Hive via
|
|
<codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph>. </p>
|
|
</li>
|
|
|
|
<li>
|
|
Deleting, adding, or modifying partitions.
|
|
<p>
|
|
For example, after issuing <codeph>ALTER TABLE</codeph> or other table-modifying SQL
|
|
statement in Hive
|
|
</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<note rev="2.3.0">
|
|
<p rev="2.3.0">
|
|
In <keyword keyref="impala23_full"/> and higher, the <codeph>ALTER TABLE
|
|
<varname>table_name</varname> RECOVER PARTITIONS</codeph> statement is a faster
|
|
alternative to <codeph>REFRESH</codeph> when you are only adding new partition
|
|
directories through Hive or manual HDFS operations. See
|
|
<xref
|
|
href="impala_alter_table.xml#alter_table"/> for details.
|
|
</p>
|
|
</note>
|
|
|
|
<p conref="../shared/impala_common.xml#common/refresh_vs_invalidate"/>
|
|
|
|
<p rev="IMPALA-1683">
|
|
<b>Refreshing specific partitions:</b>
|
|
</p>
|
|
|
|
<p rev="IMPALA-1683">
|
|
In <keyword keyref="impala27_full"/> and higher, the <codeph>REFRESH</codeph> statement
|
|
can apply to a single partition at a time, rather than the whole table. Include the
|
|
optional <codeph>PARTITION (<varname>partition_spec</varname>)</codeph> clause and specify
|
|
values for each of the partition key columns.
|
|
</p>
|
|
|
|
<p>
|
|
In <keyword keyref="impala50_full"/> and higher, the <codeph>REFRESH</codeph> statement
|
|
can apply to multiple partitions at a time, rather than a single partition. Use the
|
|
optional <codeph>PARTITION (<varname>partition_spec</varname>)</codeph> clause for each
|
|
each of the partition.
|
|
</p>
|
|
|
|
<p>
|
|
The following rules apply:
|
|
<ul>
|
|
<li>
|
|
The <codeph>PARTITION</codeph> clause of the <codeph>REFRESH</codeph> statement must
|
|
include all the partition key columns.
|
|
</li>
|
|
|
|
<li>
|
|
The order of the partition key columns does not have to match the column order in the
|
|
table.
|
|
</li>
|
|
|
|
<li>
|
|
Specifying a nonexistent partition does not cause an error.
|
|
</li>
|
|
|
|
<li>
|
|
The partition can be one that Impala created and is already aware of, or a new
|
|
partition created through Hive.
|
|
</li>
|
|
</ul>
|
|
</p>
|
|
|
|
<p rev="IMPALA-1683">
|
|
The following examples demonstrates the above rules.
|
|
</p>
|
|
|
|
<codeblock rev="IMPALA-1683"><![CDATA[
|
|
-- Partition doesn't exist.
|
|
refresh p2 partition (y=0, z=3);
|
|
refresh p2 partition (y=0, z=-1)
|
|
|
|
-- Key columns specified in a different order than the table definition.
|
|
refresh p2 partition (z=1, y=0)
|
|
|
|
-- Incomplete partition spec causes an error.
|
|
refresh p2 partition (y=0)
|
|
ERROR: AnalysisException: Items in partition spec must exactly match the partition columns in the table definition: default.p2 (1 vs 2)
|
|
|
|
-- Refresh multiple partitions.
|
|
refresh p2 partition (y=0, z=3) partition (y=1, z=0) partition (y=1, z=2);
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
For examples of using <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph>
|
|
with a combination of Impala and Hive operations, see
|
|
<xref href="impala_tutorial.xml#tutorial_impala_hive"/>.
|
|
</p>
|
|
|
|
<p>
|
|
<b>Related impala-shell options:</b>
|
|
</p>
|
|
|
|
<p rev="1.1">
|
|
Due to the expense of reloading the metadata for all tables, the
|
|
<cmdname>impala-shell</cmdname> <codeph>-r</codeph> option is not recommended.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/permissions_blurb"/>
|
|
|
|
<p rev="IMPALA-1683">
|
|
All HDFS and Ranger permissions and privilege requirements are the same whether you
|
|
refresh the entire table or a single partition.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/hdfs_blurb"/>
|
|
|
|
<p>
|
|
The <codeph>REFRESH</codeph> statement checks HDFS permissions of the underlying data
|
|
files and directories, caching this information so that a statement can be cancelled
|
|
immediately if for example the <codeph>impala</codeph> user does not have permission to
|
|
write to the data directory for the table. Impala reports any lack of write permissions as
|
|
an <codeph>INFO</codeph> message in the log file.
|
|
</p>
|
|
|
|
<p>
|
|
If you change HDFS permissions to make data readable or writeable by the Impala user,
|
|
issue another <codeph>REFRESH</codeph> to make Impala aware of the change.
|
|
</p>
|
|
|
|
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/related_info"/>
|
|
|
|
<p>
|
|
<xref href="impala_hadoop.xml#intro_metastore"/>,
|
|
<xref href="impala_invalidate_metadata.xml#invalidate_metadata"/>
|
|
</p>
|
|
|
|
</conbody>
|
|
|
|
</concept>
|