mirror of
https://github.com/apache/impala.git
synced 2025-12-25 02:03:09 -05:00
Updated all the associated topics. Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d Reviewed-on: http://gerrit.cloudera.org:8080/17469 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
374 lines
17 KiB
XML
374 lines
17 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!--
|
|
Licensed to the Apache Software Foundation (ASF) under one
|
|
or more contributor license agreements. See the NOTICE file
|
|
distributed with this work for additional information
|
|
regarding copyright ownership. The ASF licenses this file
|
|
to you under the Apache License, Version 2.0 (the
|
|
"License"); you may not use this file except in compliance
|
|
with the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing,
|
|
software distributed under the License is distributed on an
|
|
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
|
KIND, either express or implied. See the License for the
|
|
specific language governing permissions and limitations
|
|
under the License.
|
|
-->
|
|
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
|
|
<concept id="logging">
|
|
|
|
<title>Using Impala Logging</title>
|
|
|
|
<titlealts audience="PDF">
|
|
|
|
<navtitle>Logging</navtitle>
|
|
|
|
</titlealts>
|
|
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Impala"/>
|
|
<data name="Category" value="Logs"/>
|
|
<data name="Category" value="Troubleshooting"/>
|
|
<data name="Category" value="Administrators"/>
|
|
<data name="Category" value="Developers"/>
|
|
<data name="Category" value="Data Analysts"/>
|
|
</metadata>
|
|
</prolog>
|
|
|
|
<conbody>
|
|
|
|
<p> The Impala logs record information about: </p>
|
|
|
|
<ul>
|
|
<li> Any errors Impala encountered. If Impala experienced a serious error during startup, you
|
|
must diagnose and troubleshoot that problem before you can do anything further with Impala. </li>
|
|
|
|
<li> How Impala is configured. </li>
|
|
|
|
<li> Jobs Impala has completed. </li>
|
|
</ul>
|
|
|
|
<note>
|
|
<p> Formerly, the logs contained the query profile for each query, showing low-level details
|
|
of how the work is distributed among nodes and how intermediate and final results are
|
|
transmitted across the network. To save space, those query profiles are now stored in
|
|
zlib-compressed files in <filepath>/var/log/impala/profiles</filepath>. You can access them
|
|
through the Impala web user interface. For example, at
|
|
<codeph>http://<varname>impalad-node-hostname</varname>:25000/queries</codeph>, each
|
|
query is followed by a <codeph>Profile</codeph> link leading to a page showing extensive
|
|
analytical data for the query execution. </p>
|
|
<p rev="1.1.1"> The auditing feature introduced in Impala 1.1.1 produces a separate set of
|
|
audit log files when enabled. See <xref href="impala_auditing.xml#auditing"/> for details. </p>
|
|
<p rev="2.9.0 IMPALA-4431"> In <keyword keyref="impala29_full"/> and higher, you can control
|
|
how many audit event log files are kept on each host through the
|
|
<codeph>‑‑max_audit_event_log_files</codeph> startup option for the
|
|
<cmdname>impalad</cmdname> daemon, similar to the
|
|
<codeph>‑‑max_log_files</codeph> option for regular log files. </p>
|
|
<p rev="2.2.0"> The lineage feature introduced in Impala 2.2.0 produces a separate lineage log
|
|
file when enabled. See <xref href="impala_lineage.xml#lineage"/> for details. </p>
|
|
</note>
|
|
|
|
<p outputclass="toc inpage"/>
|
|
|
|
</conbody>
|
|
|
|
<concept id="logs_details">
|
|
|
|
<title>Locations and Names of Impala Log Files</title>
|
|
|
|
<conbody>
|
|
|
|
<ul>
|
|
<li> By default, the log files are under the directory <filepath>/var/log/impala</filepath>.
|
|
To change log file locations, modify the defaults file described in <xref
|
|
href="impala_processes.xml#processes"/>. </li>
|
|
|
|
<li> The significant files for the <codeph>impalad</codeph> process are
|
|
<filepath>impalad.INFO</filepath>, <filepath>impalad.WARNING</filepath>, and
|
|
<filepath>impalad.ERROR</filepath>. You might also see a file
|
|
<filepath>impalad.FATAL</filepath>, although this is only present in rare conditions. </li>
|
|
|
|
<li> The significant files for the <codeph>statestored</codeph> process are
|
|
<filepath>statestored.INFO</filepath>, <filepath>statestored.WARNING</filepath>, and
|
|
<filepath>statestored.ERROR</filepath>. You might also see a file
|
|
<filepath>statestored.FATAL</filepath>, although this is only present in rare
|
|
conditions. </li>
|
|
|
|
<li rev="1.2"> The significant files for the <codeph>catalogd</codeph> process are
|
|
<filepath>catalogd.INFO</filepath>, <filepath>catalogd.WARNING</filepath>, and
|
|
<filepath>catalogd.ERROR</filepath>. You might also see a file
|
|
<filepath>catalogd.FATAL</filepath>, although this is only present in rare conditions. </li>
|
|
|
|
<li> Examine the <codeph>.INFO</codeph> files to see configuration settings for the
|
|
processes. </li>
|
|
|
|
<li> Examine the <codeph>.WARNING</codeph> files to see all kinds of problem information,
|
|
including such things as suboptimal settings and also serious runtime errors. </li>
|
|
|
|
<li> Examine the <codeph>.ERROR</codeph> and/or <codeph>.FATAL</codeph> files to see only
|
|
the most serious errors, if the processes crash, or queries fail to complete. These
|
|
messages are also in the <codeph>.WARNING</codeph> file. </li>
|
|
|
|
<li> A new set of log files is produced each time the associated daemon is restarted. These
|
|
log files have long names including a timestamp. The <codeph>.INFO</codeph>,
|
|
<codeph>.WARNING</codeph>, and <codeph>.ERROR</codeph> files are physically represented
|
|
as symbolic links to the latest applicable log files. </li>
|
|
</ul>
|
|
|
|
<p> Impala stores information using the <codeph>glog_v</codeph> logging system. You will see
|
|
some messages referring to C++ file names. Logging is affected by: </p>
|
|
|
|
<ul>
|
|
<li> The <codeph>GLOG_v</codeph> environment variable specifies which types of messages are
|
|
logged. See <xref href="#log_levels"/> for details. </li>
|
|
|
|
<li> The <codeph>‑‑logbuflevel</codeph> startup flag for the
|
|
<cmdname>impalad</cmdname> daemon specifies how often the log information is written to
|
|
disk. The default is 0, meaning that the log is immediately flushed to disk when Impala
|
|
outputs an important messages such as a warning or an error, but less important messages
|
|
such as informational ones are buffered in memory rather than being flushed to disk
|
|
immediately. </li>
|
|
</ul>
|
|
|
|
</conbody>
|
|
|
|
</concept>
|
|
|
|
<concept id="logs_rotate">
|
|
|
|
<title>Rotating Impala Logs</title>
|
|
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Disk Storage"/>
|
|
</metadata>
|
|
</prolog>
|
|
|
|
<conbody>
|
|
|
|
<p> Impala periodically switches the physical files representing the current log files, after
|
|
which it is safe to remove the old files if they are no longer needed. </p>
|
|
|
|
<p> Impala can automatically remove older unneeded log files, a feature known as <term>log
|
|
rotation</term>. </p>
|
|
|
|
<p> In Impala 2.2 and higher, the <codeph>‑‑max_log_files</codeph> configuration
|
|
option specifies how many log files to keep at each severity level (<codeph>INFO</codeph>,
|
|
<codeph>WARNING</codeph>, <codeph>ERROR</codeph>, and <codeph>FATAL</codeph>). You can
|
|
specify an appropriate setting for each Impala-related daemon (<cmdname>impalad</cmdname>,
|
|
<cmdname>statestored</cmdname>, and <cmdname>catalogd</cmdname>). <ul>
|
|
<li> A value of 0 preserves all log files, in which case you would set up set up manual
|
|
log rotation using your Linux tool or technique of choice. </li>
|
|
|
|
<li> A value of 1 preserves only the very latest log file. </li>
|
|
|
|
<li> The default value is 10. </li>
|
|
</ul>
|
|
</p>
|
|
|
|
<p> Impala checks to see if any old logs need to be removed based on the interval specified in
|
|
the <codeph>‑‑logbufsecs</codeph> setting, every 5 seconds by default. </p>
|
|
|
|
<p> For some log levels, Impala logs are first temporarily buffered in memory and only written
|
|
to disk periodically. The <codeph>‑‑logbufsecs</codeph> setting controls the
|
|
maximum time that log messages are buffered for. For example, with the default value of 5
|
|
seconds, there may be up to a 5 second delay before a logged message shows up in the log
|
|
file. </p>
|
|
|
|
<p> It is not recommended that you set <codeph>‑‑logbufsecs</codeph> to 0 as the
|
|
setting makes the Impala daemon to spin in the thread that tries to delete old log files. </p>
|
|
|
|
</conbody>
|
|
|
|
</concept>
|
|
|
|
<concept id="dynamic_log_levels">
|
|
<title>Changing Log Levels Dynamically</title>
|
|
<conbody>
|
|
<p>For debugging purposes you may be adjusting the logging configuration for Catalog and
|
|
impalad servers. This required restarting the services. Impala supports adjusting the log
|
|
levels dynamically without the need to restart the server. There is a
|
|
<codeph>/log_level</codeph> tab in the debug page of all Impala servers. You can query the
|
|
<codeph>log4j</codeph> log level of <codeph>root</codeph> or
|
|
<codeph>org.apache.impala</codeph> by using the <codeph>Get Java Log Level</codeph>
|
|
button. Also you can change the <codeph>vlog/log4j</codeph> levels to any supported levels
|
|
of logging. You can select the log level using the <codeph>LOG LEVEL</codeph> drop down box.
|
|
You also have an option to restore the log levels to their original configuration by using
|
|
the <codeph>RESET</codeph> button.</p>
|
|
<p>Here is the format of a Glog:</p>
|
|
<codeblock>${level}${month}${day} HH:MM:SS.${us} ${thread-id} ${source-file}:${line}] ${query-id}] ${message}</codeblock>
|
|
<p>where</p>
|
|
<ul id="ul_yqb_ynv_plb">
|
|
<li>${level} — Log Levels; displays the levels as <codeph>I</codeph> for
|
|
<codeph>INFO</codeph>, <codeph>W</codeph> for <codeph>WARNING</codeph>,
|
|
<codeph>E</codeph> for <codeph>ERROR</codeph>, <codeph>F</codeph> for
|
|
<codeph>FATAL.</codeph>
|
|
</li>
|
|
<li>${month}${day} — Month and Date. </li>
|
|
<li>HH:MM:SS — Hours, Minutes, Seconds. </li>
|
|
<li>${us} — Microseconds. </li>
|
|
<li>${thread-id} — TID of the thread. </li>
|
|
<li>${source-file}:${line}] — File name and line number. </li>
|
|
<li>${query-id}] — An unique id for each and every query that is run in Impala. </li>
|
|
<li>${message} — Actual log message.</li>
|
|
</ul>
|
|
</conbody>
|
|
</concept>
|
|
|
|
<concept id="logs_debug">
|
|
|
|
<title>Reviewing Impala Logs</title>
|
|
|
|
<conbody>
|
|
|
|
<p> By default, the Impala log is stored at <codeph>/var/log/impalad/</codeph>. The most
|
|
comprehensive log, showing informational, warning, and error messages, is in the file name
|
|
<filepath>impalad.INFO</filepath>. View log file contents by using the web interface or by
|
|
examining the contents of the log file. (When you examine the logs through the file system,
|
|
you can troubleshoot problems by reading the <filepath>impalad.WARNING</filepath> and/or
|
|
<filepath>impalad.ERROR</filepath> files, which contain the subsets of messages indicating
|
|
potential problems.) </p>
|
|
|
|
<note>
|
|
<p> The web interface limits the amount of logging information displayed. To view every log
|
|
entry, access the log files directly through the file system. </p>
|
|
</note>
|
|
|
|
<p> You can view the contents of the <codeph>impalad.INFO</codeph> log file in the file
|
|
system. With the default configuration settings, the start of the log file appears as
|
|
follows: </p>
|
|
|
|
<codeblock>[user@example impalad]$ pwd
|
|
/var/log/impalad
|
|
[user@example impalad]$ more impalad.INFO
|
|
Log file created at: 2013/01/07 08:42:12
|
|
Running on machine: impala.example.com
|
|
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
|
|
I0107 08:42:12.292155 14876 daemon.cc:34] impalad version 0.4 RELEASE (build 9d7fadca0461ab40b9e9df8cdb47107ec6b27cff)
|
|
Built on Fri, 21 Dec 2012 12:55:19 PST
|
|
I0107 08:42:12.292484 14876 daemon.cc:35] Using hostname: impala.example.com
|
|
I0107 08:42:12.292706 14876 logging.cc:76] Flags (see also /varz are on debug webserver):
|
|
--dump_ir=false
|
|
--module_output=
|
|
--be_port=22000
|
|
--classpath=
|
|
--hostname=impala.example.com</codeblock>
|
|
|
|
<note> The preceding example shows only a small part of the log file. Impala log files are
|
|
often several megabytes in size. </note>
|
|
|
|
</conbody>
|
|
|
|
</concept>
|
|
|
|
<concept id="log_format">
|
|
|
|
<title>Understanding Impala Log Contents</title>
|
|
|
|
<conbody>
|
|
|
|
<p> The logs store information about Impala startup options. This information appears once for
|
|
each time Impala is started and may include: </p>
|
|
|
|
<ul>
|
|
<li> Machine name. </li>
|
|
|
|
<li> Impala version number. </li>
|
|
|
|
<li> Flags used to start Impala. </li>
|
|
|
|
<li> CPU information. </li>
|
|
|
|
<li> The number of available disks. </li>
|
|
</ul>
|
|
|
|
</conbody>
|
|
|
|
</concept>
|
|
|
|
<concept id="log_levels">
|
|
|
|
<title>Setting Logging Levels</title>
|
|
|
|
<conbody>
|
|
|
|
<p> Impala uses the GLOG system, which supports three logging levels. You can adjust logging
|
|
levels by exporting variable settings. To change logging settings manually, use a command
|
|
similar to the following on each node before starting <codeph>impalad</codeph>: </p>
|
|
|
|
<codeblock>export GLOG_v=1</codeblock>
|
|
|
|
<note> For performance reasons, do not enable the most verbose logging level of 3 unless there
|
|
is no other alternative for troubleshooting. </note>
|
|
|
|
<p> For more information on how to configure GLOG, including how to set variable logging
|
|
levels for different system components, see <xref keyref="glog.html">documentation for the
|
|
glog project on github</xref>. </p>
|
|
|
|
<section id="loglevels_details">
|
|
<title>Understanding What is Logged at Different Logging Levels</title>
|
|
<p> As logging levels increase, the categories of information logged are cumulative. For
|
|
example, GLOG_v=2 records everything GLOG_v=1 records, as well as additional information. </p>
|
|
<p> Increasing logging levels imposes performance overhead and increases log size. Where
|
|
practical, use GLOG_v=1 for most cases: this level has minimal performance impact but
|
|
still captures useful troubleshooting information. </p>
|
|
<p> Additional information logged at each level is as follows: </p>
|
|
<ul>
|
|
<li> GLOG_v=1 - The default level. Logs information about each connection and query that
|
|
is initiated to an <codeph>impalad</codeph> instance, including runtime profiles. </li>
|
|
|
|
<li> GLOG_v=2 - Everything from the previous level plus information for each RPC
|
|
initiated. This level also records query execution progress information, including
|
|
details on each file that is read. </li>
|
|
|
|
<li> GLOG_v=3 - Everything from the previous level plus logging of every row that is read.
|
|
This level is only applicable for the most serious troubleshooting and tuning scenarios,
|
|
because it can produce exceptionally large and detailed log files, potentially leading
|
|
to its own set of performance and capacity problems. </li>
|
|
</ul>
|
|
</section>
|
|
|
|
</conbody>
|
|
|
|
</concept>
|
|
|
|
<concept id="redaction" rev="2.2.0">
|
|
|
|
<title>Redacting Sensitive Information from Impala Log Files</title>
|
|
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Redaction"/>
|
|
</metadata>
|
|
</prolog>
|
|
|
|
<conbody>
|
|
|
|
<p>
|
|
<term>Log redaction</term> is a security feature that prevents sensitive information from
|
|
being displayed in locations used by administrators for monitoring and troubleshooting, such
|
|
as log files and the Impala debug web user interface. You configure regular expressions that
|
|
match sensitive types of information processed by your system, such as credit card numbers
|
|
or tax IDs, and literals matching these patterns are obfuscated wherever they would normally
|
|
be recorded in log files or displayed in administration or debugging user interfaces. </p>
|
|
|
|
<p> In a security context, the log redaction feature is complementary to the Ranger
|
|
authorization framework. Ranger prevents unauthorized users from being able to directly
|
|
access table data. Redaction prevents administrators or support personnel from seeing the
|
|
smaller amounts of sensitive or personally identifying information (PII) that might appear
|
|
in queries issued by those authorized users. </p>
|
|
|
|
<p> See <xref keyref="sg_redaction"/> for details about how to enable this feature and set up
|
|
the regular expressions to detect and redact sensitive information within SQL statement
|
|
text. </p>
|
|
|
|
</conbody>
|
|
|
|
</concept>
|
|
|
|
</concept>
|