mirror of
https://github.com/apache/impala.git
synced 2026-01-07 09:02:19 -05:00
For this change to land in master, the audience="hidden" code review needs to be completed first. Otherwise, the doc build would still work but the audience="hidden" content would be visible rather than hidden as desired. Some work happening in parallel might introduce additional instances of audience="Cloudera". I suggest addressing those in a followup CR so this global change can land quickly. Since the changes apply across so many different files, but are so narrow in scope, I suggest that the way to validate (check that no extraneous changes were introduced accidentally) is to diff just the changed lines: git diff -U0 HEAD^ HEAD In patch set 2, I updated other topics marked audience="Cloudera" by CRs that were pushed in the meantime. Change-Id: Ic93d89da77e1f51bbf548a522d98d0c4e2fb31c8 Reviewed-on: http://gerrit.cloudera.org:8080/5613 Reviewed-by: John Russell <jrussell@cloudera.com> Tested-by: Impala Public Jenkins
198 lines
9.1 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="config_performance">

<title>Post-Installation Configuration for Impala</title>
<prolog>
<metadata>
<data name="Category" value="Performance"/>
<data name="Category" value="Impala"/>
<data name="Category" value="Configuring"/>
<data name="Category" value="Administrators"/>
</metadata>
</prolog>

<conbody>

<p id="p_24">
This section describes the mandatory and recommended configuration settings for Impala. If Impala is
installed using Cloudera Manager, some of these configurations are completed automatically; you must still
configure short-circuit reads manually. If you installed Impala without Cloudera Manager, or if you want to
customize your environment, consider making the changes described in this topic.
</p>

<p>
<!-- Could conref this paragraph from ciiu_install.xml. -->
In some cases, depending on the level of Impala, CDH, and Cloudera Manager, you might need to add particular
component configuration details in one of the free-form fields on the Impala configuration pages within
Cloudera Manager. <ph conref="../shared/impala_common.xml#common/safety_valve"/>
</p>

<ul>
<li>
You must enable short-circuit reads, whether or not Impala was installed through Cloudera Manager. This
setting goes in the Impala configuration settings, not the Hadoop-wide settings.
</li>

<li>
If you installed Impala in an environment that is not managed by Cloudera Manager, you must enable block
location tracking, and you can optionally enable native checksumming for optimal performance.
</li>

<li>
If you deployed Impala using Cloudera Manager, see
<xref href="impala_perf_testing.xml#performance_testing"/> to confirm proper configuration.
</li>
</ul>

<section id="section_fhq_wyv_ls">
<title>Mandatory: Short-Circuit Reads</title>
<p> Enabling short-circuit reads allows Impala to read local data directly
from the file system. This removes the need to communicate through the
DataNodes, improving performance. This setting also minimizes the number
of additional copies of data. Short-circuit reads require
<codeph>libhadoop.so</codeph>
<!-- This link went stale. Not obvious how to keep it in sync with whatever Hadoop CDH is using behind the scenes. So hide the link for now. -->
<!-- (the <xref href="http://hadoop.apache.org/docs/r0.19.1/native_libraries.html" scope="external" format="html">Hadoop Native Library</xref>) -->
(the Hadoop Native Library) to be accessible to both the server and the
client. <codeph>libhadoop.so</codeph> is not available if you have
installed from a tarball. You must install from an
<codeph>.rpm</codeph>, <codeph>.deb</codeph>, or parcel to use
short-circuit local reads. <note> If you use Cloudera Manager, you can
enable short-circuit reads through a checkbox in the user interface,
and that setting takes effect for Impala as well. </note>
</p>
<p>
<b>To configure DataNodes for short-circuit reads:</b>
</p>
<ol id="ol_qlq_wyv_ls">
<li id="copy_config_files"> Copy the client
<codeph>core-site.xml</codeph> and <codeph>hdfs-site.xml</codeph>
configuration files from the Hadoop configuration directory to the
Impala configuration directory. The default Impala configuration
location is <codeph>/etc/impala/conf</codeph>. </li>
<li>
<indexterm audience="hidden"
>dfs.client.read.shortcircuit</indexterm>
<indexterm audience="hidden">dfs.domain.socket.path</indexterm>
<indexterm audience="hidden"
>dfs.client.file-block-storage-locations.timeout.millis</indexterm>
On all Impala nodes, configure the following properties in <!-- Exact timing is unclear, since we say farther down to copy /etc/hadoop/conf/hdfs-site.xml to /etc/impala/conf.
Which wouldn't work if we already modified the Impala version of the file here. Not to mention that this
doesn't take the CM interface into account, where these /etc files might not exist in those locations. -->
<!-- <codeph>/etc/impala/conf/hdfs-site.xml</codeph> as shown: -->
Impala's copy of <codeph>hdfs-site.xml</codeph> as shown: <codeblock>&lt;property&gt;
    &lt;name&gt;dfs.client.read.shortcircuit&lt;/name&gt;
    &lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;

&lt;property&gt;
    &lt;name&gt;dfs.domain.socket.path&lt;/name&gt;
    &lt;value&gt;/var/run/hdfs-sockets/dn&lt;/value&gt;
&lt;/property&gt;

&lt;property&gt;
    &lt;name&gt;dfs.client.file-block-storage-locations.timeout.millis&lt;/name&gt;
    &lt;value&gt;10000&lt;/value&gt;
&lt;/property&gt;</codeblock>
<!-- Former socket.path value: <value>/var/run/hadoop-hdfs/dn._PORT</value> -->
<!--
<note>
The text <codeph>_PORT</codeph> appears just as shown; you do not need to
substitute a number.
</note>
-->
</li>
<li>
<p> If <codeph>/var/run/hadoop-hdfs/</codeph> is group-writable, make
sure its group is <codeph>root</codeph>. </p>
<note> If you are also going to enable block location tracking, you
can skip copying configuration files and restarting DataNodes for now and go
straight to <xref href="#config_performance/block_location_tracking"
>Mandatory: Block Location Tracking</xref>.
Short-circuit reads and block location tracking require
the same process of copying files and restarting services, so you
can complete that process once, after you have made all
configuration changes. Whether you copy files and restart services
now or while configuring block location tracking, short-circuit
reads are not enabled until you complete those final steps. </note>
</li>
<li id="restart_all_datanodes"> After applying these changes, restart
all DataNodes. </li>
</ol>
</section>

<section id="block_location_tracking">

<title>Mandatory: Block Location Tracking</title>

<p>
Enabling block location metadata allows Impala to know which disk each data block is located on, allowing
better utilization of the underlying disks. Impala will not start unless this setting is enabled.
</p>

<p>
<b>To enable block location tracking:</b>
</p>

<ol>
<li>
For each DataNode, add the following to the <codeph>hdfs-site.xml</codeph> file:
<codeblock>&lt;property&gt;
    &lt;name&gt;dfs.datanode.hdfs-blocks-metadata.enabled&lt;/name&gt;
    &lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;</codeblock>
</li>

<li conref="#config_performance/copy_config_files"/>

<li conref="#config_performance/restart_all_datanodes"/>
</ol>
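<p>
The following fragment is a minimal sketch, not a definitive configuration. It assumes that the
short-circuit read properties and the block location tracking property all end up in the same
<codeph>hdfs-site.xml</codeph> file, and it shows them inside the
<codeph>&lt;configuration&gt;</codeph> root element that Hadoop configuration files use. The property
names and values are the ones given earlier in this topic; the socket path and timeout are examples
that you might need to adjust for your environment.
</p>
<codeblock>&lt;?xml version="1.0"?&gt;
&lt;configuration&gt;

  &lt;!-- Short-circuit reads (see Mandatory: Short-Circuit Reads) --&gt;
  &lt;property&gt;
    &lt;name&gt;dfs.client.read.shortcircuit&lt;/name&gt;
    &lt;value&gt;true&lt;/value&gt;
  &lt;/property&gt;
  &lt;property&gt;
    &lt;name&gt;dfs.domain.socket.path&lt;/name&gt;
    &lt;value&gt;/var/run/hdfs-sockets/dn&lt;/value&gt;
  &lt;/property&gt;
  &lt;property&gt;
    &lt;name&gt;dfs.client.file-block-storage-locations.timeout.millis&lt;/name&gt;
    &lt;value&gt;10000&lt;/value&gt;
  &lt;/property&gt;

  &lt;!-- Block location tracking (this section) --&gt;
  &lt;property&gt;
    &lt;name&gt;dfs.datanode.hdfs-blocks-metadata.enabled&lt;/name&gt;
    &lt;value&gt;true&lt;/value&gt;
  &lt;/property&gt;

  &lt;!-- Any other properties already present in the file stay as they are. --&gt;

&lt;/configuration&gt;</codeblock>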
</section>

<section id="native_checksumming">

<title>Optional: Native Checksumming</title>

<p>
Enabling native checksumming causes Impala to use an optimized native library for computing checksums, if
that library is available.
</p>

<p id="p_29">
<b>To enable native checksumming:</b>
</p>

<p>
If you installed CDH from packages, the native checksumming library is installed and set up correctly. In
such a case, no additional steps are required. Conversely, if you installed by other means, such as with
tarballs, native checksumming may not be available due to missing shared objects. Finding the message
"<codeph>Unable to load native-hadoop library for your platform... using builtin-java classes where applicable</codeph>"
in the Impala logs indicates native checksumming may be unavailable. To enable native
checksumming, you must build and install <codeph>libhadoop.so</codeph> (the
<!-- Another instance of stale link. -->
<!-- <xref href="http://hadoop.apache.org/docs/r0.19.1/native_libraries.html" scope="external" format="html">Hadoop Native Library</xref>). -->
Hadoop Native Library).
</p>
</section>
</conbody>
</concept>