<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="troubleshooting">

  <title>Troubleshooting Impala</title>
  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="Troubleshooting"/>
      <data name="Category" value="Administrators"/>
      <data name="Category" value="Developers"/>
      <data name="Category" value="Data Analysts"/>
    </metadata>
  </prolog>

  <conbody>

    <p>
      <indexterm audience="hidden">troubleshooting</indexterm>
      Troubleshooting for Impala requires being able to diagnose and debug problems
      with performance, network connectivity, out-of-memory conditions, disk space usage,
      and crash or hang conditions in any of the Impala-related daemons.
    </p>

    <p outputclass="toc inpage" audience="PDF">
      The following sections describe the general troubleshooting procedures to diagnose
      different kinds of problems:
    </p>

  </conbody>

  <concept id="trouble_sql">

    <title>Troubleshooting Impala SQL Syntax Issues</title>

    <conbody>

      <p>
        In general, if queries issued against Impala fail, you can try running these same queries against Hive.
      </p>

      <ul>
        <li>
          If a query fails against both Impala and Hive, it is likely that there is a problem with your query or
          other elements of your <keyword keyref="distro"/> environment:
          <ul>
            <li>
              Review the <xref href="impala_langref.xml#langref">Language Reference</xref> to ensure your query is
              valid.
            </li>
            <li>
              Check <xref href="impala_reserved_words.xml#reserved_words"/> to see if any database, table,
              column, or other object names in your query conflict with Impala reserved words.
              Quote those names with backticks (<codeph>``</codeph>) if so.
            </li>
            <li>
              Check <xref href="impala_functions.xml#builtins"/> to confirm whether Impala supports all the
              built-in functions being used by your query, and whether argument and return types are the
              same as you expect.
            </li>
            <li>
              Review the <xref href="impala_logging.xml#logs_debug">contents of the Impala logs</xref> for any
              information that may be useful in identifying the source of the problem.
            </li>
          </ul>
        </li>
        <li>
          If a query fails against Impala but not Hive, it is likely that there is a problem with your Impala
          installation.
        </li>
      </ul>
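The two-engine comparison above can be scripted. A minimal sketch, assuming `impala-shell` and Beeline are on the PATH; the hostnames, port, and query are hypothetical placeholders to replace with your own:

```shell
# Run the same statement through Impala and Hive and compare the outcomes.
# QUERY, impalad-host, and hive-host are hypothetical placeholders.
QUERY='SELECT COUNT(*) FROM my_db.my_table'
impala-shell -i impalad-host -q "$QUERY"
beeline -u 'jdbc:hive2://hive-host:10000' -e "$QUERY"
```

If both commands fail, start with the query and environment checks above; if only the `impala-shell` command fails, focus on the Impala installation.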
    </conbody>
  </concept>

  <concept id="IMPALA-5605">
    <title>Troubleshooting Crashes Caused by Memory Resource Limit</title>
    <conbody>
      <p>Under very high concurrency, Impala can exhaust various operating
        system resources, causing serious errors. Errors similar to the
        following may indicate operating system resource exhaustion:</p>
      <codeblock>F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
terminate called after throwing an instance of 'boost::exception_detail::clone_impl&lt;boost::exception_detail::error_info_injector&lt;boost::thread_resource_error> >'</codeblock>
      <p>The KRPC implementation in Impala 2.12 / 3.0 greatly reduces thread
        counts and the chances of hitting a resource limit.</p>
      <p>If you still get an error similar to the above in Impala 3.0 and
        higher, try increasing the <codeph>max_map_count</codeph> OS virtual
        memory parameter. <codeph>max_map_count</codeph> defines the maximum
        number of memory map areas that a process can use. Run the following
        command on each host running an <codeph>impalad</codeph> daemon to
        increase <codeph>max_map_count</codeph> to 8 million map areas.</p>
      <codeblock outputclass="cdoc-input">echo 8000000 > /proc/sys/vm/max_map_count</codeblock>
      <p>To make the above setting durable, refer to your OS documentation. For
        example, on RHEL 6.x:<ol>
          <li>Add the following line to
            <codeph>/etc/sysctl.conf</codeph>:<codeblock>vm.max_map_count=8000000</codeblock></li>
          <li>Run the following
            command:<codeblock outputclass="cdoc-input">sysctl -p</codeblock></li>
        </ol></p>
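To confirm the new limit is in effect, you can read it back from `/proc`, and also count how many map areas a running `impalad` is actually using. A sketch, assuming a Linux host with `pgrep` available:

```shell
# Read the current max_map_count limit (Linux only).
cat /proc/sys/vm/max_map_count
# Count the memory map areas used by a running impalad, if one is present.
pid=$(pgrep -o impalad) && wc -l < "/proc/$pid/maps" || echo "no impalad process found"
```

If the per-process count is approaching the limit, raising `max_map_count` as shown above is the appropriate remedy.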
    </conbody>
  </concept>

  <concept id="trouble_io">
    <title>Troubleshooting I/O Capacity Problems</title>
    <conbody>
      <p>
        Impala queries are typically I/O-intensive. If there is an I/O problem
        with storage devices, or with HDFS itself, Impala queries could show
        slow response times with no obvious cause on the Impala side. Slow I/O
        on even a single Impala daemon could result in an overall slowdown,
        because queries involving clauses such as <codeph>ORDER BY</codeph>,
        <codeph>GROUP BY</codeph>, or <codeph>JOIN</codeph> do not start
        returning results until all executor Impala daemons have finished their
        work.
      </p>
      <p>
        To test whether the Linux I/O system itself is performing as expected,
        run Linux commands like the following on each host where an Impala
        daemon is running:
      </p>
      <codeblock>
$ sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0
vm.drop_caches = 3
vm.drop_caches = 0
$ sudo dd if=/dev/sda bs=1M of=/dev/null count=1k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.60373 s, 192 MB/s
$ sudo dd if=/dev/sdb bs=1M of=/dev/null count=1k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.51145 s, 195 MB/s
$ sudo dd if=/dev/sdc bs=1M of=/dev/null count=1k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.58096 s, 192 MB/s
$ sudo dd if=/dev/sdd bs=1M of=/dev/null count=1k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.43924 s, 197 MB/s
      </codeblock>
      <p>
        On modern hardware, a throughput rate of less than 100 MB/s typically indicates
        a performance issue with the storage device. Correct the hardware problem before
        continuing with Impala tuning or benchmarking.
      </p>
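The dd commands above read raw block devices and therefore require root. As an unprivileged sanity check of the same technique, you can time a sequential read of a scratch file instead (the file name and sizes here are arbitrary):

```shell
# Write a 64 MB scratch file, then time a sequential read of it with dd.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null
dd if="$f" of=/dev/null bs=1M 2>&1 | tail -n 1   # throughput summary line
rm -f "$f"
```

Because the file was just written, this mostly measures page-cache throughput, which is why the device tests above drop the caches first; treat the result as an upper bound rather than a device measurement.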
    </conbody>
  </concept>

  <concept id="trouble_cookbook">

    <title>Impala Troubleshooting Quick Reference</title>

    <conbody>

      <p>
        The following table lists common problems and potential solutions.
      </p>

      <table>
        <tgroup cols="3">
          <colspec colname="1" colwidth="10*"/>
          <colspec colname="2" colwidth="30*"/>
          <colspec colname="3" colwidth="30*"/>
          <thead>
            <row>
              <entry>
                Symptom
              </entry>
              <entry>
                Explanation
              </entry>
              <entry>
                Recommendation
              </entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>
                Impala takes a long time to start.
              </entry>
              <entry>
                Impala instances with large numbers of tables, partitions, or data files take longer to start
                because the metadata for these objects is broadcast to all <cmdname>impalad</cmdname> nodes and
                cached.
              </entry>
              <entry>
                Adjust timeout and synchronicity settings.
              </entry>
            </row>
            <row>
              <entry>
                <p>
                  Joins fail to complete.
                </p>
              </entry>
              <entry>
                <p>
                  There may be insufficient memory. During a join, data from the second and subsequent
                  joined tables is loaded into memory. If Impala chooses an inefficient join order or
                  join mechanism, the query could exceed the total memory available.
                </p>
              </entry>
              <entry>
                <p>
                  Start by gathering statistics with the <codeph>COMPUTE STATS</codeph> statement for each table
                  involved in the join. Consider specifying the <codeph>[SHUFFLE]</codeph> hint so that data from
                  the joined tables is split up between nodes rather than broadcast to each node. If tuning at the
                  SQL level is not sufficient, add more memory to your system or join smaller data sets.
                </p>
              </entry>
            </row>
            <row>
              <entry>
                <p>
                  Queries return incorrect results.
                </p>
              </entry>
              <entry>
                <p>
                  Impala metadata may be outdated after changes are performed in Hive.
                </p>
              </entry>
              <entry>
                <p>
                  Where possible, use the appropriate Impala statement (<codeph>INSERT</codeph>, <codeph>LOAD
                  DATA</codeph>, <codeph>CREATE TABLE</codeph>, <codeph>ALTER TABLE</codeph>, <codeph>COMPUTE
                  STATS</codeph>, and so on) rather than switching back and forth between Impala and Hive. Impala
                  automatically broadcasts the results of DDL and DML operations to all Impala nodes in the
                  cluster, but does not automatically recognize when such changes are made through Hive. After
                  inserting data, adding a partition, or other operation in Hive, refresh the metadata for the
                  table as described in <xref href="impala_refresh.xml#refresh"/>.
                </p>
              </entry>
            </row>
            <row>
              <entry>
                <p>
                  Queries are slow to return results.
                </p>
              </entry>
              <entry>
                <p>
                  Some <codeph>impalad</codeph> instances may not have started. Using a browser, connect to the
                  host running the Impala state store. Connect using an address of the form
                  <codeph>http://<varname>hostname</varname>:<varname>port</varname>/metrics</codeph>.
                </p>
                <p>
                  <note>
                    Replace <varname>hostname</varname> and <varname>port</varname> with the hostname and web
                    server port of your Impala state store host machine. The default port is 25010.
                  </note>
                  The number of <codeph>impalad</codeph> instances listed should match the expected number of
                  <codeph>impalad</codeph> instances installed in the cluster. There should also be one
                  <codeph>impalad</codeph> instance installed on each DataNode.
                </p>
              </entry>
              <entry>
                <p>
                  Ensure Impala is installed on all DataNodes. Start any <codeph>impalad</codeph> instances that
                  are not running.
                </p>
              </entry>
            </row>
            <row>
              <entry>
                <p>
                  Queries are slow to return results.
                </p>
              </entry>
              <entry>
                <p>
                  Impala may not be configured to use native checksumming. Native checksumming uses
                  machine-specific instructions to compute checksums over HDFS data very quickly. Review Impala
                  logs. If you find instances of "<codeph>INFO util.NativeCodeLoader: Unable to load
                  native-hadoop library</codeph>" messages, native checksumming is not enabled.
                </p>
              </entry>
              <entry>
                <p>
                  Ensure Impala is configured to use native checksumming as described in
                  <xref href="impala_config_performance.xml#config_performance"/>.
                </p>
              </entry>
            </row>
            <row>
              <entry>
                <p>
                  Queries are slow to return results.
                </p>
              </entry>
              <entry>
                <p>
                  Impala may not be configured to use data locality tracking.
                </p>
              </entry>
              <entry>
                <p>
                  Test Impala for data locality tracking and make configuration changes as necessary. Information
                  on this process can be found in <xref href="impala_config_performance.xml#config_performance"/>.
                </p>
              </entry>
            </row>
            <row>
              <entry>
                <p>
                  Attempts to complete Impala tasks such as executing INSERT-SELECT actions fail. The Impala logs
                  include errors noting that files could not be opened due to permission denied.
                </p>
              </entry>
              <entry>
                <p>
                  This can be the result of permissions issues. For example, you could use the Hive shell as the
                  hive user to create a table. After creating this table, you could attempt to complete some
                  action, such as an INSERT-SELECT on the table. Because the table was created using one user and
                  the INSERT-SELECT is attempted by another, this action may fail due to permissions issues.
                </p>
              </entry>
              <entry>
                <p>
                  In general, ensure the Impala user has sufficient permissions. In the preceding example, ensure
                  the Impala user has sufficient permissions to the table that the Hive user created.
                </p>
              </entry>
            </row>
            <row rev="IMP-1210">
              <entry>
                <p>
                  Impala fails to start up, with the <cmdname>impalad</cmdname> logs referring to errors connecting
                  to the statestore service and attempts to re-register.
                </p>
              </entry>
              <entry>
                <p>
                  A large number of databases, tables, partitions, and so on can require metadata synchronization,
                  particularly on startup, that takes longer than the default timeout for the statestore service.
                </p>
              </entry>
              <entry>
                <p>
                  Configure the statestore timeout value and possibly other settings related to the frequency of
                  statestore updates and metadata loading. See
                  <xref href="impala_timeouts.xml#statestore_timeout"/> and
                  <xref href="impala_scalability.xml#statestore_scalability"/>.
                </p>
              </entry>
            </row>
          </tbody>
        </tgroup>
      </table>

      <p audience="hidden">
        Some or all of these settings might also be useful.
        <codeblock>NUM_SCANNER_THREADS: 0
ABORT_ON_DEFAULT_LIMIT_EXCEEDED: 0
MAX_IO_BUFFERS: 0
DEFAULT_ORDER_BY_LIMIT: -1
BATCH_SIZE: 0
NUM_NODES: 0
DISABLE_CODEGEN: 0
MAX_ERRORS: 0
ABORT_ON_ERROR: 0
MAX_SCAN_RANGE_LENGTH: 0
ALLOW_UNSUPPORTED_FORMATS: 0
SUPPORT_START_OVER: false
DEBUG_ACTION:
MEM_LIMIT: 0
</codeblock>
      </p>
    </conbody>
  </concept>

  <concept audience="hidden" id="core_dumps">

    <title>Enabling Core Dumps for Impala</title>

    <conbody>

      <p>
        Fill in details, then unhide.
      </p>

      <p>
        From <xref href="impala_config_options.xml#config_options"/>:
      </p>

      <codeblock>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</codeblock>

      <note conref="../shared/impala_common.xml#common/core_dump_considerations"/>

    </conbody>
  </concept>

  <concept audience="hidden" id="io_throughput">
    <title>Verifying I/O Throughput</title>
    <conbody>
      <p>
        Optimal Impala query performance depends on being able to perform I/O across multiple storage devices
        in parallel, with the data transferred at or close to the maximum throughput for each device.
        If a hardware or configuration issue causes a reduction in I/O throughput, even if the problem only
        affects a subset of storage devices, you might experience
        slow query performance that cannot be improved by using regular SQL tuning techniques.
      </p>
      <p>
        As a general guideline, expect each commodity storage device (for example, a standard rotational
        hard drive) to be able to transfer approximately 100 MB per second. If you see persistent slow query
        performance, examine the Impala logs to check the scan rates being reported for each disk.
      </p>

      <codeblock>
<![CDATA[
Useful test for I/O throughput of hardware.

Symptoms:
* Queries running slow
* Scan rate of I/O in Impala logs shows noticeably less than the expected I/O rate for each disk (a typical commodity disk should provide ~100 MB/s)

Actions:
* Validate disk reads from the OS to confirm there is no issue at the hardware or OS level
* Validate disk reads through HDFS to see if the issue is in the HDFS configuration

Specifics:
Testing Linux and hardware I/O:
# First run:
sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0

# Then run:
sudo dd if=/dev/sda bs=1M of=/dev/null count=1k \
  & sudo dd if=/dev/sdb bs=1M of=/dev/null count=1k \
  & sudo dd if=/dev/sdc bs=1M of=/dev/null count=1k \
  & sudo dd if=/dev/sdd bs=1M of=/dev/null count=1k & wait

Testing HDFS I/O:
# You can use TestDFSIO. It is documented here: http://answers.oreilly.com/topic/460-how-to-benchmark-a-hadoop-cluster/
# You can also use sar, dd, and iostat for monitoring the disks.

# write 10 files of 1000 MB each
hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

# run the read benchmark
hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

# clean up the data
hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -clean
]]>
      </codeblock>
    </conbody>
  </concept>

</concept>