Files
impala/docs/topics/impala_config_performance.xml
John Russell 8377b9949c Global search/replace: audience="Cloudera" -> audience="hidden".
For this change to land in master, the audience="hidden" code review
needs to be completed first. Otherwise, the doc build would still work
but the audience="hidden" content would be visible rather than hidden as
desired.

Some work happening in parallel might introduce additional instances of
audience="Cloudera". I suggest addressing those in a followup CR so this
global change can land quickly.

Since the changes apply across so many different files, but are so
narrow in scope, I suggest that the way to validate (check that no
extraneous changes were introduced accidentally) is to diff just the
changed lines:

git diff -U0 HEAD^ HEAD

In patch set 2, I updated other topics marked audience="Cloudera"
by CRs that were pushed in the meantime.

Change-Id: Ic93d89da77e1f51bbf548a522d98d0c4e2fb31c8
Reviewed-on: http://gerrit.cloudera.org:8080/5613
Reviewed-by: John Russell <jrussell@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-18 19:31:57 +00:00

198 lines
9.1 KiB
XML
Raw Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="config_performance">
<title>Post-Installation Configuration for Impala</title>
<prolog>
<metadata>
<data name="Category" value="Performance"/>
<data name="Category" value="Impala"/>
<data name="Category" value="Configuring"/>
<data name="Category" value="Administrators"/>
</metadata>
</prolog>
<conbody>
<p id="p_24">
This section describes the mandatory and recommended configuration settings for Impala. If Impala is
installed using Cloudera Manager, some of these configurations are completed automatically; you must still
configure short-circuit reads manually. If you installed Impala without Cloudera Manager, or if you want to
customize your environment, consider making the changes described in this topic.
</p>
<p>
<!-- Could conref this paragraph from ciiu_install.xml. -->
In some cases, depending on the level of Impala, CDH, and Cloudera Manager, you might need to add particular
component configuration details in one of the free-form fields on the Impala configuration pages within
Cloudera Manager. <ph conref="../shared/impala_common.xml#common/safety_valve"/>
</p>
<ul>
<li>
You must enable short-circuit reads, whether or not Impala was installed through Cloudera Manager. This
setting goes in the Impala configuration settings, not the Hadoop-wide settings.
</li>
<li>
If you installed Impala in an environment that is not managed by Cloudera Manager, you must enable block
location tracking, and you can optionally enable native checksumming for optimal performance.
</li>
<li>
If you deployed Impala using Cloudera Manager see
<xref href="impala_perf_testing.xml#performance_testing"/> to confirm proper configuration.
</li>
</ul>
<section id="section_fhq_wyv_ls">
<title>Mandatory: Short-Circuit Reads</title>
<p> Enabling short-circuit reads allows Impala to read local data directly
from the file system. This removes the need to communicate through the
DataNodes, improving performance. This setting also minimizes the number
of additional copies of data. Short-circuit reads requires
<codeph>libhadoop.so</codeph>
<!-- This link went stale. Not obvious how to keep it in sync with whatever Hadoop CDH is using behind the scenes. So hide the link for now. -->
<!-- (the <xref href="http://hadoop.apache.org/docs/r0.19.1/native_libraries.html" scope="external" format="html">Hadoop Native Library</xref>) -->
(the Hadoop Native Library) to be accessible to both the server and the
client. <codeph>libhadoop.so</codeph> is not available if you have
installed from a tarball. You must install from an
<codeph>.rpm</codeph>, <codeph>.deb</codeph>, or parcel to use
short-circuit local reads. <note> If you use Cloudera Manager, you can
enable short-circuit reads through a checkbox in the user interface
and that setting takes effect for Impala as well. </note>
</p>
<p>
<b>To configure DataNodes for short-circuit reads:</b>
</p>
<ol id="ol_qlq_wyv_ls">
<li id="copy_config_files"> Copy the client
<codeph>core-site.xml</codeph> and <codeph>hdfs-site.xml</codeph>
configuration files from the Hadoop configuration directory to the
Impala configuration directory. The default Impala configuration
location is <codeph>/etc/impala/conf</codeph>. </li>
<li>
<indexterm audience="hidden"
>dfs.client.read.shortcircuit</indexterm>
<indexterm audience="hidden">dfs.domain.socket.path</indexterm>
<indexterm audience="hidden"
>dfs.client.file-block-storage-locations.timeout.millis</indexterm>
On all Impala nodes, configure the following properties in <!-- Exact timing is unclear, since we say farther down to copy /etc/hadoop/conf/hdfs-site.xml to /etc/impala/conf.
Which wouldn't work if we already modified the Impala version of the file here. Not to mention that this
doesn't take the CM interface into account, where these /etc files might not exist in those locations. -->
<!-- <codeph>/etc/impala/conf/hdfs-site.xml</codeph> as shown: -->
Impala's copy of <codeph>hdfs-site.xml</codeph> as shown: <codeblock>&lt;property&gt;
&lt;name&gt;dfs.client.read.shortcircuit&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;dfs.domain.socket.path&lt;/name&gt;
&lt;value&gt;/var/run/hdfs-sockets/dn&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;dfs.client.file-block-storage-locations.timeout.millis&lt;/name&gt;
&lt;value&gt;10000&lt;/value&gt;
&lt;/property&gt;</codeblock>
<!-- Former socket.path value: &lt;value&gt;/var/run/hadoop-hdfs/dn._PORT&lt;/value&gt; -->
<!--
<note>
The text <codeph>_PORT</codeph> appears just as shown; you do not need to
substitute a number.
</note>
-->
</li>
<li>
<p> If <codeph>/var/run/hadoop-hdfs/</codeph> is group-writable, make
sure its group is <codeph>root</codeph>. </p>
<note> If you are also going to enable block location tracking, you
can skip copying configuration files and restarting DataNodes and go
straight to <xref href="#config_performance/block_location_tracking"
>Optional: Block Location Tracking</xref>.
Configuring short-circuit reads and block location tracking require
the same process of copying files and restarting services, so you
can complete that process once when you have completed all
configuration changes. Whether you copy files and restart services
now or during configuring block location tracking, short-circuit
reads are not enabled until you complete those final steps. </note>
</li>
<li id="restart_all_datanodes"> After applying these changes, restart
all DataNodes. </li>
</ol>
</section>
<section id="block_location_tracking">
<title>Mandatory: Block Location Tracking</title>
<p>
Enabling block location metadata allows Impala to know which disk data blocks are located on, allowing
better utilization of the underlying disks. Impala will not start unless this setting is enabled.
</p>
<p>
<b>To enable block location tracking:</b>
</p>
<ol>
<li>
For each DataNode, adding the following to the <codeph>hdfs-site.xml</codeph> file:
<codeblock>&lt;property&gt;
&lt;name&gt;dfs.datanode.hdfs-blocks-metadata.enabled&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt; </codeblock>
</li>
<li conref="#config_performance/copy_config_files"/>
<li conref="#config_performance/restart_all_datanodes"/>
</ol>
</section>
<section id="native_checksumming">
<title>Optional: Native Checksumming</title>
<p>
Enabling native checksumming causes Impala to use an optimized native library for computing checksums, if
that library is available.
</p>
<p id="p_29">
<b>To enable native checksumming:</b>
</p>
<p>
If you installed CDH from packages, the native checksumming library is installed and setup correctly. In
such a case, no additional steps are required. Conversely, if you installed by other means, such as with
tarballs, native checksumming may not be available due to missing shared objects. Finding the message
"<codeph>Unable to load native-hadoop library for your platform... using builtin-java classes where
applicable</codeph>" in the Impala logs indicates native checksumming may be unavailable. To enable native
checksumming, you must build and install <codeph>libhadoop.so</codeph> (the
<!-- Another instance of stale link. -->
<!-- <xref href="http://hadoop.apache.org/docs/r0.19.1/native_libraries.html" scope="external" format="html">Hadoop Native Library</xref>). -->
Hadoop Native Library).
</p>
</section>
</conbody>
</concept>