<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="intro_components">
|
|
|
|
<title>Components of the Impala Server</title>
|
|
<titlealts audience="PDF"><navtitle>Components</navtitle></titlealts>
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Impala"/>
|
|
<data name="Category" value="Concepts"/>
|
|
<data name="Category" value="Administrators"/>
|
|
<data name="Category" value="Developers"/>
|
|
<data name="Category" value="Data Analysts"/>
|
|
</metadata>
|
|
</prolog>
|
|
|
|
<conbody>
|
|
|
|
<p> The Impala server is a distributed, massively parallel processing (MPP)
|
|
database engine. It consists of different daemon processes that run on
|
|
specific hosts within your cluster. </p>
|
|
|
|
<p outputclass="toc inpage"/>
|
|
</conbody>
|
|
|
|
<concept id="intro_impalad">
|
|
|
|
<title>The Impala Daemon</title>
|
|
|
|
<conbody>
|
|
|
|
<p> The core Impala component is the Impala daemon, physically represented
|
|
by the <codeph>impalad</codeph> process. A few of the key functions that
|
|
an Impala daemon performs are:<ul>
|
|
<li>Reads and writes to data files.</li>
|
|
<li>Accepts queries transmitted from the <codeph>impala-shell</codeph>
|
|
command, Hue, JDBC, or ODBC.</li>
|
|
<li>Parallelizes the queries and distributes work across the
|
|
cluster.</li>
|
|
<li>Transmits intermediate query results back to the central
|
|
coordinator. </li>
|
|
</ul></p>
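
      <p> For example, a query can be submitted to any Impala daemon through the
        <codeph>impala-shell</codeph> command; that daemon then coordinates the
        work for the statement. The following is a minimal sketch, where the
        hostname and table name are placeholders for values from your own
        cluster: </p>

      <codeblock>$ impala-shell -i impala-host-1.example.com -q 'SELECT COUNT(*) FROM web_logs'</codeblock>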

      <p>Impala daemons can be deployed in one of the following ways:<ul>
          <li>HDFS and Impala are co-located, and each Impala daemon runs on the
            same host as a DataNode.</li>
          <li>Impala is deployed separately in a compute cluster and reads
            remotely from HDFS, S3, ADLS, etc.</li>
        </ul></p>

      <p> The Impala daemons are in constant communication with the StateStore,
        to confirm which daemons are healthy and can accept new work. </p>

      <p rev="1.2"> They also receive broadcast messages from the
        <cmdname>catalogd</cmdname> daemon (introduced in Impala 1.2) whenever
        any Impala daemon in the cluster creates, alters, or drops any type of
        object, or when an <codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph>
        statement is processed through Impala. This background communication
        minimizes the need for <codeph>REFRESH</codeph> or <codeph>INVALIDATE
        METADATA</codeph> statements that were needed to coordinate metadata
        across Impala daemons prior to Impala 1.2. </p>

      <p rev="2.9.0 IMPALA-3807 IMPALA-5147 IMPALA-5503">
        In <keyword keyref="impala29_full"/> and higher, you can control which hosts act as query coordinators
        and which act as query executors, to improve scalability for highly concurrent workloads on large clusters.
        See <xref keyref="scalability_coordinator"/> for details.
      </p>
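
      <p> As a rough sketch of a dedicated-coordinator layout (the
        <codeph>-is_coordinator</codeph> and <codeph>-is_executor</codeph>
        startup flags shown here may vary by release; see the topic referenced
        above for the authoritative options), a small number of hosts might run
        coordinator-only daemons while the rest run executor-only daemons: </p>

      <codeblock>$ impalad -is_executor=false ...     # coordinator-only daemon
$ impalad -is_coordinator=false ...  # executor-only daemon</codeblock>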

      <p>
        <b>Related information:</b> <xref href="impala_config_options.xml#config_options"/>,
        <xref href="impala_processes.xml#processes"/>, <xref href="impala_timeouts.xml#impalad_timeout"/>,
        <xref href="impala_ports.xml#ports"/>, <xref href="impala_proxy.xml#proxy"/>
      </p>
    </conbody>
  </concept>
<concept id="intro_statestore">
|
|
|
|
<title>The Impala Statestore</title>
|
|
|
|
<conbody>
|
|
|
|
<p> The Impala component known as the StateStore checks on the health of
|
|
all Impala daemons in a cluster, and continuously relays its findings to
|
|
each of those daemons. It is physically represented by a daemon process
|
|
named <codeph>statestored</codeph>. You only need such a process on one
|
|
host in a cluster. If an Impala daemon goes offline due to hardware
|
|
failure, network error, software issue, or other reason, the StateStore
|
|
informs all the other Impala daemons so that future queries can avoid
|
|
making requests to the unreachable Impala daemon. </p>
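
      <p> Each Impala daemon is told at startup where to find that single
        StateStore process. A minimal sketch, with a placeholder hostname,
        might look like this: </p>

      <codeblock>$ statestored ...                                            # on statestore-host.example.com
$ impalad -state_store_host=statestore-host.example.com ...  # on every Impala daemon host</codeblock>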

      <p> Because the StateStore's purpose is to help when things go wrong and
        to broadcast metadata to coordinators, it is not always critical to the
        normal operation of an Impala cluster. If the StateStore is not running
        or becomes unreachable, the Impala daemons continue running and
        distributing work among themselves as usual when working with the data
        known to Impala. The cluster just becomes less robust if other Impala
        daemons fail, and metadata becomes less consistent as it changes while
        the StateStore is offline. When the StateStore comes back online, it
        re-establishes communication with the Impala daemons and resumes its
        monitoring and broadcasting functions. </p>

      <p> If you issue a DDL statement while the StateStore is down, queries
        that access the new object created by that statement will fail. </p>

      <p conref="../shared/impala_common.xml#common/statestored_catalogd_ha_blurb"/>

      <p>
        <b>Related information:</b>
      </p>

      <p>
        <xref href="impala_scalability.xml#statestore_scalability"/>,
        <xref href="impala_config_options.xml#config_options"/>, <xref href="impala_processes.xml#processes"/>,
        <xref href="impala_timeouts.xml#statestore_timeout"/>, <xref href="impala_ports.xml#ports"/>
      </p>
    </conbody>
  </concept>
<concept rev="1.2" id="intro_catalogd">
|
|
|
|
<title>The Impala Catalog Service</title>
|
|
|
|
<conbody>
|
|
|
|
<p> The Impala component known as the Catalog Service relays the metadata
|
|
changes from Impala SQL statements to all the Impala daemons in a
|
|
cluster. It is physically represented by a daemon process named
|
|
<codeph>catalogd</codeph>. You only need such a process on one host in
|
|
a cluster. Because the requests are passed through the StateStore
|
|
daemon, it makes sense to run the <cmdname>statestored</cmdname> and
|
|
<cmdname>catalogd</cmdname> services on the same host. </p>
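
      <p> In such a layout, each Impala daemon points at that shared
        administrative host for both services. The startup flags below are a
        sketch only; the hostname is a placeholder: </p>

      <codeblock>$ impalad -state_store_host=mgmt-host.example.com -catalog_service_host=mgmt-host.example.com ...</codeblock>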

      <p> The catalog service avoids the need to issue <codeph>REFRESH</codeph>
        and <codeph>INVALIDATE METADATA</codeph> statements when the metadata
        changes are performed by statements issued through Impala. When you
        create a table, load data, and so on through Hive, you do need to issue
        <codeph>REFRESH</codeph> or <codeph>INVALIDATE METADATA</codeph> on an
        Impala daemon before executing a query there. </p>
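
      <p> For example, after changes made outside Impala, a sketch of the
        statements to run on a single Impala daemon (the table names are
        placeholders) is: </p>

      <codeblock>-- Table created through Hive: make it visible to Impala.
INVALIDATE METADATA new_table_from_hive;

-- Data files added to an existing table outside Impala: pick up the new files.
REFRESH existing_table;</codeblock>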

      <p>
        This feature touches a number of aspects of Impala:
      </p>

      <ul id="catalogd_xrefs">
        <li>
          <p>
            See <xref href="impala_install.xml#install"/>, <xref href="impala_upgrading.xml#upgrading"/>, and
            <xref href="impala_processes.xml#processes"/> for usage information for the
            <cmdname>catalogd</cmdname> daemon.
          </p>
        </li>

        <li>
          <p> The <codeph>REFRESH</codeph> and <codeph>INVALIDATE
            METADATA</codeph> statements are not needed when the
            <codeph>CREATE TABLE</codeph>, <codeph>INSERT</codeph>, or other
            table-changing or data-changing operation is performed through
            Impala. These statements are still needed if such operations are
            done through Hive or by manipulating data files directly in HDFS,
            but in those cases the statements only need to be issued on one
            Impala daemon rather than on all daemons. See <xref
            href="impala_refresh.xml#refresh"/> and <xref
            href="impala_invalidate_metadata.xml#invalidate_metadata"/> for
            the latest usage information for those statements. </p>
        </li>
      </ul>

      <p conref="../shared/impala_common.xml#common/load_catalog_in_background"/>

      <p conref="../shared/impala_common.xml#common/statestored_catalogd_ha_blurb"/>

      <note>
        <p conref="../shared/impala_common.xml#common/catalog_server_124"/>
      </note>

      <p>
        <b>Related information:</b> <xref href="impala_config_options.xml#config_options"/>,
        <xref href="impala_processes.xml#processes"/>, <xref href="impala_ports.xml#ports"/>
      </p>
    </conbody>
  </concept>
</concept>