mirror of
https://github.com/apache/impala.git
synced 2025-12-23 21:08:39 -05:00
Change-Id: I0ad5794d8a9b62cc7d01d023f56e700dc018f24b Reviewed-on: http://gerrit.cloudera.org:8080/12732 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Paul Rogers <progers@cloudera.com>
263 lines
13 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="explain">

  <title>EXPLAIN Statement</title>
  <titlealts audience="PDF"><navtitle>EXPLAIN</navtitle></titlealts>
  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="SQL"/>
      <data name="Category" value="Querying"/>
      <data name="Category" value="Reports"/>
      <data name="Category" value="Planning"/>
      <data name="Category" value="Performance"/>
      <data name="Category" value="Troubleshooting"/>
      <data name="Category" value="Administrators"/>
      <data name="Category" value="Developers"/>
      <data name="Category" value="Data Analysts"/>
    </metadata>
  </prolog>

  <conbody>

    <p> Returns the execution plan for a statement, showing the low-level
      mechanisms that Impala will use to read the data, divide the work among
      nodes in the cluster, and transmit intermediate and final results across
      the network. Use <codeph>EXPLAIN</codeph> followed by a complete
      <codeph>SELECT</codeph>, <codeph>INSERT</codeph>, or
      <codeph>CREATE TABLE AS SELECT</codeph> statement. </p>

    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>

<codeblock>EXPLAIN { <varname>select_query</varname> | <varname>ctas_stmt</varname> | <varname>insert_stmt</varname> }
</codeblock>

    <p>
      The <varname>select_query</varname> is a <codeph>SELECT</codeph> statement, optionally prefixed by a
      <codeph>WITH</codeph> clause. See <xref href="impala_select.xml#select"/> for details.
    </p>

    <p>
      The <varname>insert_stmt</varname> is an <codeph>INSERT</codeph> statement that inserts into or overwrites an
      existing table. It can use either the <codeph>INSERT ... SELECT</codeph> or
      <codeph>INSERT ... VALUES</codeph> syntax. See <xref href="impala_insert.xml#insert"/> for details.
    </p>

    <p>
      The <varname>ctas_stmt</varname> is a <codeph>CREATE TABLE</codeph> statement using the
      <codeph>AS SELECT</codeph> clause, typically abbreviated as a <q>CTAS</q> operation. See
      <xref href="impala_create_table.xml#create_table"/> for details.
    </p>
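
    <p>
      As an illustrative sketch of the three statement forms, each of the following statements requests a
      plan without running the underlying statement. The table names <codeph>sales</codeph>,
      <codeph>sales_summary</codeph>, and <codeph>sales_copy</codeph>, and their columns, are hypothetical
      and chosen only for this illustration.
    </p>

<codeblock>-- Hypothetical tables and columns, for illustration only.

-- select_query, here with the optional WITH clause:
EXPLAIN WITH recent AS (SELECT * FROM sales WHERE year = 2018)
  SELECT region, COUNT(*) FROM recent GROUP BY region;

-- insert_stmt, using the INSERT ... SELECT syntax:
EXPLAIN INSERT INTO sales_summary
  SELECT region, COUNT(*) FROM sales GROUP BY region;

-- ctas_stmt:
EXPLAIN CREATE TABLE sales_copy AS
  SELECT * FROM sales;
</codeblock>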

    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>

    <p>
      You can interpret the output to judge whether the query is performing efficiently, and adjust the query
      or the schema if not. For example, you might change the tests in the <codeph>WHERE</codeph> clause, add
      hints to make join operations more efficient, introduce subqueries, change the order of tables in a join, add
      or change partitioning for a table, collect table and column statistics with the
      <codeph>COMPUTE STATS</codeph> statement, or take other performance tuning steps.
    </p>

    <p>
      The <codeph>EXPLAIN</codeph> output reminds you if table or column statistics are missing from any table
      involved in the query. These statistics are important for optimizing queries involving large tables or
      multi-table joins. See <xref href="impala_compute_stats.xml#compute_stats"/> for how to gather statistics,
      and <xref href="impala_perf_stats.xml#perf_stats"/> for how to use this information for query tuning.
    </p>

    <p conref="../shared/impala_common.xml#common/explain_interpret"/>

    <p>
      If you come from a traditional database background and are not familiar with data warehousing, keep in mind
      that Impala is optimized for full table scans across very large tables. The structure and distribution of
      this data are typically not suitable for the kind of indexing and single-row lookups that are common in OLTP
      environments. Seeing a query scan an entire large table is common, and is not necessarily an indication of
      an inefficient query. However, if you can reduce the volume of scanned data by orders of magnitude, for
      example by using a query that reads only certain partitions within a partitioned table, then you might be
      able to optimize a query so that it executes in seconds rather than minutes.
    </p>
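    <p>
      A minimal sketch of checking for partition pruning, assuming a hypothetical table
      <codeph>web_logs</codeph> partitioned by <codeph>year</codeph> and <codeph>month</codeph>. In the plan
      for each statement, compare the <codeph>partitions=</codeph> item of the scan node (the same item that
      appears in the sample output later in this topic) to see how many partitions the query reads.
    </p>

<codeblock>-- Hypothetical table and columns, for illustration only.

-- No filter on the partition key columns, so every partition is a candidate for scanning.
EXPLAIN SELECT COUNT(*) FROM web_logs;

-- Filters on the partition key columns can restrict the scan to the matching partitions.
EXPLAIN SELECT COUNT(*) FROM web_logs WHERE year = 2018 AND month = 12;
</codeblock>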
    <p> The <codeph>EXPLAIN</codeph> output becomes more accurate and informative
      as statistics are gathered by the <codeph>COMPUTE STATS</codeph>
      statement. Initially, the information about data size and distribution,
      such as the number of rows or number of distinct values for each column,
      is marked "<codeph>unavailable</codeph>". The
      <codeph>COMPUTE STATS</codeph> statement performs the analysis, so a subsequent
      <codeph>EXPLAIN</codeph> statement has additional information to use in
      deciding how to optimize the distributed query. </p>
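
    <p>
      The before-and-after sequence can be sketched as follows, assuming a hypothetical table
      <codeph>t1</codeph> with a column <codeph>c1</codeph> and no statistics gathered yet:
    </p>

<codeblock>-- Hypothetical table and column, for illustration only.

-- Plan produced before any statistics exist.
EXPLAIN SELECT c1, COUNT(*) FROM t1 GROUP BY c1;

-- Gather table and column statistics.
COMPUTE STATS t1;

-- Plan produced again, now with statistics available to the planner.
EXPLAIN SELECT c1, COUNT(*) FROM t1 GROUP BY c1;
</codeblock>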

    <p>
      For more information and examples to help you interpret <codeph>EXPLAIN</codeph> output, see
      <xref href="impala_explain_plan.xml#perf_explain"/>.
    </p>

    <p rev="1.2">
      <b>Extended EXPLAIN output:</b>
    </p>

    <p rev="1.2"> For performance tuning of complex queries, and capacity
      planning (such as using the admission control and resource management
      features), you can enable more detailed and informative output for the
      <codeph>EXPLAIN</codeph> statement. In the
      <cmdname>impala-shell</cmdname> interpreter, issue the command
      <codeph>SET EXPLAIN_LEVEL=<varname>level</varname></codeph>, where
      <varname>level</varname> is <codeph>MINIMAL</codeph>,
      <codeph>STANDARD</codeph>, <codeph>EXTENDED</codeph>, or
      <codeph>VERBOSE</codeph>. </p>
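
    <p>
      For example, a session might raise the level before explaining a query and lower it again afterward.
      This is a minimal sketch; the table <codeph>t1</codeph> is hypothetical.
    </p>

<codeblock>-- Hypothetical table, for illustration only.
SET EXPLAIN_LEVEL=EXTENDED;
EXPLAIN SELECT COUNT(*) FROM t1;

-- Return to the STANDARD level.
SET EXPLAIN_LEVEL=STANDARD;
</codeblock>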

    <p rev="1.2"> When extended <codeph>EXPLAIN</codeph> output is enabled,
      <codeph>EXPLAIN</codeph> statements print information about estimated
      memory requirements, minimum number of virtual cores, and so on. </p>
    <p>Starting in <keyword keyref="impala32_full"/>, if the
      <codeph>EXPLAIN_LEVEL</codeph> option is set to
      <codeph>EXTENDED</codeph> or <codeph>VERBOSE</codeph>, the output
      contains the following additional information:</p>
    <ul>
      <li>The analyzed query, in the output header. <p>The analyzed query may
          have been rewritten to include various optimizations and implicit
          casts. See the example below.</p></li>
      <li>The predicates in the plan output include the same implicit casts,
        and literals are printed with a cast to show the type.</li>
    </ul>

    <p>
      See <xref href="impala_explain_level.xml#explain_level"/> for details and examples.
    </p>

    <p conref="../shared/impala_common.xml#common/example_blurb"/>

    <p> This example shows how the standard <codeph>EXPLAIN</codeph> output,
      read from bottom to top, moves from the lowest (physical) level to the
      higher (logical) levels. </p>
    <ol>
      <li>The query begins by scanning a certain amount of data; each node
        performs an aggregation operation (evaluating <codeph>COUNT(*)</codeph>)
        on some subset of data that is local to that node.</li>
      <li>The intermediate results are transmitted back to the coordinator node
        (labelled here as the <codeph>EXCHANGE</codeph> node).</li>
      <li>Lastly, the intermediate results are summed to display the final
        result.</li>
    </ol>

<codeblock id="explain_plan_simple">[impalad-host:21000] > EXPLAIN SELECT COUNT(*) FROM customer_address;
+----------------------------------------------------------+
| Explain String                                           |
+----------------------------------------------------------+
| ...                                                      |
|                                                          |
| 03:AGGREGATE [FINALIZE]                                  |
| |  output: sum(count(*))                                 |
| |                                                        |
| 02:EXCHANGE [UNPARTITIONED]                              |
| |                                                        |
| 01:AGGREGATE                                             |
| |  output: count(*)                                      |
| |                                                        |
| 00:SCAN HDFS [default.customer_address]                  |
|    partitions=1/1 size=5.25MB                            |
+----------------------------------------------------------+
</codeblock>
    <p>The following example shows an extended <codeph>EXPLAIN</codeph> output.
      Note that the analyzed query was rewritten to include:</p>
    <ul>
      <li>The 'constant folding' optimization, which simplified the expression
        in the original query, '<codeph>1000 / 100</codeph>' to
        '<codeph>10</codeph>'.</li>
      <li>The implicit casts in the <codeph>WHERE</codeph> clause.</li>
    </ul>
<codeblock>EXPLAIN SELECT * FROM functional_kudu.alltypestiny WHERE bigint_col &lt; 1000 / 100;
+----------------------------------------------------------+
| Explain String                                           |
+----------------------------------------------------------+
| ...
| Analyzed query: SELECT * FROM functional_kudu.alltypestiny WHERE CAST(bigint_col AS DOUBLE) &lt; CAST(10 AS DOUBLE)
| ...
| 00:SCAN KUDU [functional_kudu.alltypestiny]
|    predicates: CAST(bigint_col AS DOUBLE) &lt; CAST(10 AS DOUBLE)
...</codeblock>

    <p conref="../shared/impala_common.xml#common/security_blurb"/>
    <p conref="../shared/impala_common.xml#common/redaction_yes"/>

    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>

    <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
    <p rev="">
      <!-- Doublecheck these details. Does EXPLAIN really need any permissions? -->
      The user ID that the <cmdname>impalad</cmdname> daemon runs under,
      typically the <codeph>impala</codeph> user, must have read
      and execute permissions for all applicable directories in all source tables
      for the query that is being explained.
      (A <codeph>SELECT</codeph> operation could read files from multiple different HDFS directories
      if the source table is partitioned.)
    </p>

    <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
    <p>
      The <codeph>EXPLAIN</codeph> statement displays the same kind of plan
      information for queries against Kudu tables as for queries
      against HDFS-based tables.
    </p>

    <p>
      To see which predicates Impala can <q>push down</q> to Kudu for
      efficient evaluation, without transmitting unnecessary rows back
      to Impala, look for the <codeph>kudu predicates</codeph> item in
      the scan phase of the query. The label <codeph>kudu predicates</codeph>
      indicates a condition that can be evaluated efficiently on the Kudu
      side. The label <codeph>predicates</codeph> in a <codeph>SCAN KUDU</codeph>
      node indicates a condition that is evaluated by Impala.
      For example, in a table with primary key column <codeph>X</codeph>
      and non-primary key column <codeph>Y</codeph>, you can see that
      some operators in the <codeph>WHERE</codeph> clause are evaluated
      immediately by Kudu and others are evaluated later by Impala:
    </p>

<codeblock rev="2.9.0 IMPALA-4859">
EXPLAIN SELECT x,y FROM kudu_table WHERE
  x = 1 AND y NOT IN (2,3) AND z = 1
  AND a IS NOT NULL AND b > 0 AND length(s) > 5;
+----------------
| Explain String
+----------------
...
| 00:SCAN KUDU [kudu_table]
|    predicates: y NOT IN (2, 3), length(s) > 5
|    kudu predicates: a IS NOT NULL, b > 0, x = 1, z = 1
</codeblock>

    <p rev="2.9.0 IMPALA-4859">
      Only the following predicates can be pushed to Kudu: binary predicates,
      <codeph>IS NULL</codeph> and <codeph>IS NOT NULL</codeph>
      (in <keyword keyref="impala29"/> and higher), and <codeph>IN</codeph> predicates
      containing literal values that exactly match the types in the Kudu table and do not
      require any casting.
    </p>

    <p conref="../shared/impala_common.xml#common/related_info"/>
    <p>
      <xref href="impala_select.xml#select"/>,
      <xref href="impala_insert.xml#insert"/>,
      <xref href="impala_create_table.xml#create_table"/>,
      <xref href="impala_explain_plan.xml#explain_plan"/>
    </p>

  </conbody>
</concept>