mirror of
https://github.com/apache/impala.git
synced 2025-12-30 03:01:44 -05:00
For this change to land in master, the audience="hidden" code review needs to be completed first. Otherwise, the doc build would still work but the audience="hidden" content would be visible rather than hidden as desired. Some work happening in parallel might introduce additional instances of audience="Cloudera". I suggest addressing those in a followup CR so this global change can land quickly. Since the changes apply across so many different files, but are so narrow in scope, I suggest that the way to validate (check that no extraneous changes were introduced accidentally) is to diff just the changed lines: git diff -U0 HEAD^ HEAD In patch set 2, I updated other topics marked audience="Cloudera" by CRs that were pushed in the meantime. Change-Id: Ic93d89da77e1f51bbf548a522d98d0c4e2fb31c8 Reviewed-on: http://gerrit.cloudera.org:8080/5613 Reviewed-by: John Russell <jrussell@cloudera.com> Tested-by: Impala Public Jenkins
125 lines
5.1 KiB
XML
125 lines
5.1 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!--
|
|
Licensed to the Apache Software Foundation (ASF) under one
|
|
or more contributor license agreements. See the NOTICE file
|
|
distributed with this work for additional information
|
|
regarding copyright ownership. The ASF licenses this file
|
|
to you under the Apache License, Version 2.0 (the
|
|
"License"); you may not use this file except in compliance
|
|
with the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing,
|
|
software distributed under the License is distributed on an
|
|
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
|
KIND, either express or implied. See the License for the
|
|
specific language governing permissions and limitations
|
|
under the License.
|
|
-->
|
|
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
|
|
<concept id="install">
|
|
|
|
<title><ph audience="standalone">Installing Impala</ph><ph audience="integrated">Impala Installation</ph></title>
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Impala"/>
|
|
<data name="Category" value="Installing"/>
|
|
<data name="Category" value="Administrators"/>
|
|
</metadata>
|
|
</prolog>
|
|
|
|
<conbody>
|
|
|
|
<p>
|
|
<indexterm audience="hidden">installation</indexterm>
|
|
<indexterm audience="hidden">pseudo-distributed cluster</indexterm>
|
|
<indexterm audience="hidden">cluster</indexterm>
|
|
<indexterm audience="hidden">DataNodes</indexterm>
|
|
<indexterm audience="hidden">NameNode</indexterm>
|
|
<indexterm audience="hidden">Cloudera Manager</indexterm>
|
|
<indexterm audience="hidden">impalad</indexterm>
|
|
<indexterm audience="hidden">impala-shell</indexterm>
|
|
<indexterm audience="hidden">statestored</indexterm>
|
|
Impala is an open-source add-on to the Cloudera Enterprise Core that returns rapid responses to
|
|
queries.
|
|
</p>
|
|
|
|
<note>
|
|
<p>
|
|
Under CDH 5, Impala is included as part of the CDH installation and no separate steps are needed.
|
|
<ph audience="standalone">Therefore, the instruction steps in this section apply to CDH 4 only.</ph>
|
|
</p>
|
|
</note>
|
|
|
|
<p outputclass="toc inpage"/>
|
|
</conbody>
|
|
|
|
<concept id="install_details">
|
|
|
|
<title>What is Included in an Impala Installation</title>
|
|
|
|
<conbody>
|
|
|
|
<p>
|
|
Impala is made up of a set of components that can be installed on multiple nodes throughout your cluster.
|
|
The key installation step for performance is to install the <cmdname>impalad</cmdname> daemon (which does
|
|
most of the query processing work) on <i>all</i> DataNodes in the cluster.
|
|
</p>
|
|
|
|
<p>
|
|
The Impala package installs these binaries:
|
|
</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>
|
|
<cmdname>impalad</cmdname> - The Impala daemon. Plans and executes queries against HDFS, HBase, <ph rev="2.2.0">and Amazon S3 data</ph>.
|
|
<xref href="impala_processes.xml#processes">Run one impalad process</xref> on each node in the cluster
|
|
that has a DataNode.
|
|
</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>
|
|
<cmdname>statestored</cmdname> - Name service that tracks location and status of all
|
|
<codeph>impalad</codeph> instances in the cluster. <xref href="impala_processes.xml#processes">Run one
|
|
instance of this daemon</xref> on a node in your cluster. Most production deployments run this daemon
|
|
on the namenode.
|
|
</p>
|
|
</li>
|
|
|
|
<li rev="1.2">
|
|
<p>
|
|
<cmdname>catalogd</cmdname> - Metadata coordination service that broadcasts changes from Impala DDL and
|
|
DML statements to all affected Impala nodes, so that new tables, newly loaded data, and so on are
|
|
immediately visible to queries submitted through any Impala node.
|
|
<!-- Consider removing this when 1.2 gets far in the past. -->
|
|
(Prior to Impala 1.2, you had to run the <codeph>REFRESH</codeph> or <codeph>INVALIDATE
|
|
METADATA</codeph> statement on each node to synchronize changed metadata. Now those statements are only
|
|
required if you perform the DDL or DML through an external mechanism such as Hive <ph rev="2.2.0">or by uploading
|
|
data to the Amazon S3 filesystem</ph>.)
|
|
<xref href="impala_processes.xml#processes">Run one instance of this daemon</xref> on a node in your cluster,
|
|
preferably on the same host as the <codeph>statestored</codeph> daemon.
|
|
</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>
|
|
<cmdname>impala-shell</cmdname> - <xref href="impala_impala_shell.xml#impala_shell">Command-line
|
|
interface</xref> for issuing queries to the Impala daemon. You install this on one or more hosts
|
|
anywhere on your network, not necessarily DataNodes or even within the same cluster as Impala. It can
|
|
connect remotely to any instance of the Impala daemon.
|
|
</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>
|
|
Before doing the installation, ensure that you have all necessary prerequisites. See
|
|
<xref href="impala_prereqs.xml#prereqs"/> for details.
|
|
</p>
|
|
</conbody>
|
|
</concept>
|
|
|
|
</concept>
|