mirror of
https://github.com/apache/impala.git
synced 2025-12-19 09:58:28 -05:00
Change-Id: I809e716e66558db02e6401bd218b3dd2de49864c Reviewed-on: http://gerrit.cloudera.org:8080/14575 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
290 lines
10 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="auditing">

  <title>Auditing Impala Operations</title>

  <titlealts audience="PDF">

    <navtitle>Auditing</navtitle>

  </titlealts>

  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="Auditing"/>
      <data name="Category" value="Governance"/>
      <data name="Category" value="Navigator"/>
      <data name="Category" value="Security"/>
      <data name="Category" value="Administrators"/>
    </metadata>
  </prolog>

  <conbody>

    <p>
      To monitor how Impala data is being used within your organization, to verify that your
      Impala authorization and authentication policies are effective, and to detect attempts
      at intrusion or unauthorized access to Impala data, you can use the auditing feature in
      Impala 1.2.1 and higher:
    </p>

    <ul>
      <li>
        Enable auditing by including the option
        <codeph>‑‑audit_event_log_dir=<varname>directory_path</varname></codeph> in
        your <cmdname>impalad</cmdname> startup options. The log directory must be a local
        directory on the server, not an HDFS directory.
      </li>

      <li>
        Decide how many queries will be represented in each audit event log file. By default,
        Impala starts a new audit event log file every 5000 queries. To specify a different
        number, <ph audience="standalone"
        >include the option
        <codeph>‑‑max_audit_event_log_file_size=<varname>number_of_queries</varname></codeph>
        in the <cmdname>impalad</cmdname> startup options</ph>.
      </li>

      <li rev="2.9.0 IMPALA-4431">
        In <keyword keyref="impala29_full"/> and higher, you can control how many audit event
        log files are kept on each host. Specify the option
        <codeph>‑‑max_audit_event_log_files=<varname>number_of_log_files</varname></codeph>
        in the <cmdname>impalad</cmdname> startup options. Once the limit is reached, older
        files are rotated out using the same mechanism as for other Impala log files. The
        default value for this setting is 0, representing an unlimited number of audit event log
        files.
      </li>

      <li>
        Use a cluster manager with governance capabilities to filter, visualize, and produce
        reports based on the audit logs collected from all the hosts in the cluster.
      </li>
    </ul>
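
    <p>
      The file-size and file-count settings in the list above can be modeled with a short
      Python sketch. This is a toy in-memory model of the documented rules, not Impala's
      implementation: the class name <codeph>AuditLogRotator</codeph>, the file names, and the
      small limits used in the usage lines are all hypothetical, chosen only to make the
      behavior easy to follow.
    </p>

    <codeblock>
```python
from collections import deque


class AuditLogRotator:
    """Toy model of the rotation rules described above (not Impala's code)."""

    def __init__(self, max_queries_per_file=5000, max_files=0):
        self.max_queries_per_file = max_queries_per_file
        self.max_files = max_files  # 0 means keep every file
        self.files = deque()        # oldest first; each entry is [name, query_count]
        self.removed = []           # names of files that were rotated out
        self._next_index = 0

    def _start_new_file(self):
        name = "audit_log_{}".format(self._next_index)  # hypothetical file name
        self._next_index += 1
        self.files.append([name, 0])
        # Rotate out the oldest files once the configured limit is exceeded.
        while self.max_files > 0 and len(self.files) > self.max_files:
            self.removed.append(self.files.popleft()[0])

    def record_query(self):
        # Start a new file when there is none yet or the current one is full.
        if not self.files or self.files[-1][1] >= self.max_queries_per_file:
            self._start_new_file()
        self.files[-1][1] += 1


# Usage: 10 queries, 3 queries per file, keep at most 2 files.
rotator = AuditLogRotator(max_queries_per_file=3, max_files=2)
for _ in range(10):
    rotator.record_query()
print([name for name, _ in rotator.files])
print(rotator.removed)
```
    </codeblock>

    <p>
      With these illustrative limits, the model ends up keeping the two newest files and
      rotating out the two oldest. With <codeph>max_files=0</codeph>, nothing is ever removed,
      which mirrors the documented default of an unlimited number of files.
    </p>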

    <p outputclass="toc inpage"/>

  </conbody>

  <concept id="auditing_performance">

    <title>Durability and Performance Considerations for Impala Auditing</title>

    <prolog>
      <metadata>
        <data name="Category" value="Performance"/>
      </metadata>
    </prolog>

    <conbody>

      <p>
        The auditing feature imposes performance overhead only while auditing is enabled.
      </p>

      <p>
        Because any Impala host can process a query, enable auditing on all hosts where the
        <ph audience="standalone"><cmdname>impalad</cmdname> daemon</ph>
        <ph audience="integrated">Impala Daemon role</ph> runs. Each host stores its own log
        files, in a directory in the local filesystem. The log data is periodically flushed to
        disk (through an <codeph>fsync()</codeph> system call) to avoid loss of audit data in
        case of a crash.
      </p>

      <p>
        The runtime overhead of auditing applies to whichever host serves as the coordinator for
        the query, that is, the host you connect to when you issue the query. This might be the
        same host for all queries, or different applications or users might connect to and issue
        queries through different hosts.
      </p>

      <p>
        To avoid excessive I/O overhead on busy coordinator hosts, Impala syncs the audit log
        data (using the <codeph>fsync()</codeph> system call) periodically rather than after
        every query. Currently, the <codeph>fsync()</codeph> calls are issued at a fixed
        interval, every 5 seconds.
      </p>
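
      <p>
        The trade-off described above, syncing on an interval instead of after every record,
        can be illustrated with a minimal Python sketch. It is not Impala's code: the class
        name <codeph>PeriodicSyncWriter</codeph> and the injectable clock are hypothetical, and
        this sketch checks the elapsed time on each write, which is only one of several ways
        a periodic sync could be scheduled.
      </p>

      <codeblock>
```python
import os
import tempfile
import time


class PeriodicSyncWriter:
    """Appends lines to a file and calls fsync() at most once per interval."""

    def __init__(self, path, sync_interval_s=5.0, clock=time.monotonic):
        self._file = open(path, "a", encoding="utf-8")
        self._interval = sync_interval_s
        self._clock = clock          # injectable so the demo needs no real waiting
        self._last_sync = clock()
        self.sync_count = 0

    def write_event(self, line):
        self._file.write(line + "\n")
        # Sync only when the interval has elapsed, not after every record.
        if self._clock() - self._last_sync >= self._interval:
            self.sync()

    def sync(self):
        self._file.flush()             # push Python's buffer to the OS
        os.fsync(self._file.fileno())  # ask the OS to write the data to disk
        self._last_sync = self._clock()
        self.sync_count += 1

    def close(self):
        self.sync()                    # final sync so no buffered record is lost
        self._file.close()


# Usage with a fake clock: one record per simulated second for 12 seconds.
fake_now = [0.0]
path = os.path.join(tempfile.mkdtemp(), "audit.log")
writer = PeriodicSyncWriter(path, sync_interval_s=5.0, clock=lambda: fake_now[0])
for second in range(12):
    fake_now[0] = float(second)
    writer.write_event('{"event": %d}' % second)
syncs_before_close = writer.sync_count
writer.close()
print(syncs_before_close)
```
      </codeblock>

      <p>
        In this sketch, records written since the last sync are the ones at risk if the
        process crashes, which is the durability cost paid for the lower I/O overhead.
      </p>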

      <p>
        By default, Impala avoids losing any audit log data in the case of an error during a
        logging operation (such as a disk full error), by immediately shutting down
        <cmdname audience="standalone"
        >impalad</cmdname><ph audience="integrated">the
        Impala Daemon role</ph> on the host where the auditing problem occurred.
        <ph
        audience="standalone">You can override this setting by specifying the
        option <codeph>‑‑abort_on_failed_audit_event=false</codeph> in the
        <cmdname>impalad</cmdname> startup options.</ph>
      </p>

    </conbody>

  </concept>

  <concept id="auditing_format">

    <title>Format of the Audit Log Files</title>

    <prolog>
      <metadata>
        <data name="Category" value="Logs"/>
      </metadata>
    </prolog>

    <conbody>

      <p>
        The audit log files represent the query information in JSON format, one query per line.
        Typically, rather than looking at the log files themselves, you should use
        cluster-management software to consolidate the log data from all Impala hosts and filter
        and visualize the results in useful ways. (If you do examine the raw log data, you might
        run the files through a JSON pretty-printer first.)
      </p>
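
      <p>
        As a rough illustration of the pretty-printing step mentioned above, the following
        Python sketch reformats one-record-per-line JSON using only the standard
        <codeph>json</codeph> module. The function name and the sample record are hypothetical;
        the keys in the sample are placeholders, not the actual Impala audit schema.
      </p>

      <codeblock>
```python
import json


def pretty_print_audit_lines(lines):
    """Return one indented JSON document per non-empty input line."""
    pretty = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        pretty.append(json.dumps(json.loads(line), indent=2, sort_keys=True))
    return pretty


# Hypothetical record; the key names are illustrative only.
sample = ['{"user": "alice", "sql_statement": "SELECT 1", "catalog_objects": []}']
for doc in pretty_print_audit_lines(sample):
    print(doc)
```
      </codeblock>

      <p>
        For a real log file, the same function could be applied to the lines of an opened
        file object, because iterating over a text file yields one line at a time.
      </p>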

      <p>
        All the information about schema objects accessed by the query is encoded in a single
        nested record on the same line. For example, the audit log for an <codeph>INSERT ...
        SELECT</codeph> statement records that a select operation occurs on the source table and
        an insert operation occurs on the destination table. The audit log for a query against a
        view records the base table accessed by the view, or multiple base tables in the case of
        a view that includes a join query. Every Impala operation that corresponds to a SQL
        statement is recorded in the audit logs, whether the operation succeeds or fails. Impala
        records more information for a successful operation than for a failed one, because an
        unauthorized query is stopped immediately, before all the query planning is completed.
      </p>

<!-- Opportunity to conref at the phrase level here... the content of this paragraph is the same as part
     of a list bullet earlier on. -->

      <p>
        The information logged for each query includes:
      </p>

      <ul>
        <li>
          Client session state:
          <ul>
            <li>
              Session ID
            </li>

            <li>
              User name
            </li>

            <li>
              Network address of the client connection
            </li>
          </ul>
        </li>

        <li>
          SQL statement details:
          <ul>
            <li>
              Query ID
            </li>

            <li>
              Statement Type - DML, DDL, and so on
            </li>

            <li>
              SQL statement text
            </li>

            <li>
              Execution start time, in local time
            </li>

            <li>
              Execution Status - Details on any errors that were encountered
            </li>

            <li>
              Target Catalog Objects:
              <ul>
                <li>
                  Object Type - Table, View, or Database
                </li>

                <li>
                  Fully qualified object name
                </li>

                <li>
                  Privilege - How the object is being used (<codeph>SELECT</codeph>,
                  <codeph>INSERT</codeph>, <codeph>CREATE</codeph>, and so on)
                </li>
              </ul>
            </li>
          </ul>
        </li>
      </ul>
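
      <p>
        To make the nesting in the list above more concrete, the following Python sketch
        builds two made-up records and counts how each user touched each catalog object. Every
        key name here (<codeph>user</codeph>, <codeph>catalog_objects</codeph>,
        <codeph>privilege</codeph>, and so on) is an assumption chosen to mirror the list;
        check an actual log file for the real key names before relying on them.
      </p>

      <codeblock>
```python
import json
from collections import Counter


def count_privileges_by_user(lines):
    """Count (user, privilege, object name) combinations across audit records."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        # Each record carries a nested list of the catalog objects it touched.
        for obj in record.get("catalog_objects", []):
            counts[(record["user"], obj["privilege"], obj["name"])] += 1
    return counts


# Hypothetical records mirroring the field list above; key names are assumed.
records = [
    json.dumps({
        "session_id": "s1", "user": "alice", "network_address": "10.0.0.5:41234",
        "query_id": "q1", "statement_type": "DML",
        "sql_statement": "INSERT INTO db.t2 SELECT * FROM db.t1",
        "start_time": "2025-01-01 12:00:00", "status": "",
        "catalog_objects": [
            {"object_type": "TABLE", "name": "db.t1", "privilege": "SELECT"},
            {"object_type": "TABLE", "name": "db.t2", "privilege": "INSERT"},
        ],
    }),
    json.dumps({
        "session_id": "s1", "user": "alice", "network_address": "10.0.0.5:41234",
        "query_id": "q2", "statement_type": "QUERY",
        "sql_statement": "SELECT count(*) FROM db.t1",
        "start_time": "2025-01-01 12:00:05", "status": "",
        "catalog_objects": [
            {"object_type": "TABLE", "name": "db.t1", "privilege": "SELECT"},
        ],
    }),
]
for key, n in sorted(count_privileges_by_user(records).items()):
    print(key, n)
```
      </codeblock>

      <p>
        The first made-up record follows the <codeph>INSERT ... SELECT</codeph> case described
        earlier: one statement produces a select entry for the source table and an insert
        entry for the destination table.
      </p>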

<!-- Delegating actual examples to doc for visualization tool for the moment.
      <p>
        Here is an excerpt from a sample audit log file:
      </p>
      <codeblock></codeblock>
-->

    </conbody>

  </concept>

  <concept id="auditing_exceptions">

    <title>Which Operations Are Audited</title>

    <conbody>

      <p> The following types of SQL operations are recorded in the audit
        log:</p>

      <ul>
        <li> Queries that are prevented due to lack of authorization. </li>
        <li> Queries that Impala can parse and analyze to determine that they
          are authorized. The audit data is recorded immediately after Impala
          finishes its analysis, before the query is actually executed. </li>
        <li> Queries whose results are available to be fetched by clients.</li>
        <li>Finished DDL operations.</li>
      </ul>

      <p> The audit log does not contain entries for queries that could not be
        parsed and analyzed. For example, a query that fails due to a syntax
        error is not recorded in the audit log. </p>
      <p>The audit log does not contain queries that fail due to a reference to
        a table that does not exist. </p>

      <p> Certain statements in the <cmdname>impala-shell</cmdname> interpreter,
        such as <codeph>CONNECT</codeph>, <codeph rev="1.4.0">SUMMARY</codeph>,
        <codeph>PROFILE</codeph>, <codeph>SET</codeph>, and
        <codeph>QUIT</codeph>, do not correspond to actual SQL queries, and
        these statements are not recorded in the audit log. </p>

    </conbody>

  </concept>

</concept>