mirror of
https://github.com/apache/impala.git
synced 2025-12-30 03:01:44 -05:00
For this change to land in master, the audience="hidden" code review needs to be completed first. Otherwise, the doc build would still work but the audience="hidden" content would be visible rather than hidden as desired. Some work happening in parallel might introduce additional instances of audience="Cloudera". I suggest addressing those in a followup CR so this global change can land quickly. Since the changes apply across so many different files, but are so narrow in scope, I suggest that the way to validate (check that no extraneous changes were introduced accidentally) is to diff just the changed lines: git diff -U0 HEAD^ HEAD In patch set 2, I updated other topics marked audience="Cloudera" by CRs that were pushed in the meantime. Change-Id: Ic93d89da77e1f51bbf548a522d98d0c4e2fb31c8 Reviewed-on: http://gerrit.cloudera.org:8080/5613 Reviewed-by: John Russell <jrussell@cloudera.com> Tested-by: Impala Public Jenkins
369 lines
21 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="1.2" id="explain_level">

  <title>EXPLAIN_LEVEL Query Option</title>
  <titlealts audience="PDF"><navtitle>EXPLAIN_LEVEL</navtitle></titlealts>
  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="Impala Query Options"/>
      <data name="Category" value="Troubleshooting"/>
      <data name="Category" value="Querying"/>
      <data name="Category" value="Performance"/>
      <data name="Category" value="Reports"/>
      <data name="Category" value="Administrators"/>
      <data name="Category" value="Developers"/>
      <data name="Category" value="Data Analysts"/>
    </metadata>
  </prolog>

  <conbody>

    <p>
      <indexterm audience="hidden">EXPLAIN_LEVEL query option</indexterm>
      Controls the amount of detail provided in the output of the <codeph>EXPLAIN</codeph> statement. The basic
      output can help you identify high-level performance issues such as scanning a higher volume of data or more
      partitions than you expect. The higher levels of detail show how intermediate results flow between nodes and
      how different SQL operations such as <codeph>ORDER BY</codeph>, <codeph>GROUP BY</codeph>, joins, and
      <codeph>WHERE</codeph> clauses are implemented within a distributed query.
    </p>

    <p>
      <b>Type:</b> <codeph>STRING</codeph> or <codeph>INT</codeph>
    </p>

    <p>
      <b>Default:</b> <codeph>1</codeph>
    </p>

    <p>
      <b>Arguments:</b>
    </p>

    <p>
      The allowed range of numeric values for this option is 0 to 3:
    </p>

    <ul>
      <li>
        <codeph>0</codeph> or <codeph>MINIMAL</codeph>: A bare-bones list, one line per operation. Primarily useful
        for checking the join order in very long queries where the regular <codeph>EXPLAIN</codeph> output is too
        long to read easily.
      </li>

      <li>
        <codeph>1</codeph> or <codeph>STANDARD</codeph>: The default level of detail, showing the logical way that
        work is split up for the distributed query.
      </li>

      <li>
        <codeph>2</codeph> or <codeph>EXTENDED</codeph>: Includes additional detail about how the query planner
        uses statistics in its decision-making process, so that you can understand how a query could be tuned by
        gathering statistics, using query hints, adding or removing predicates, and so on.
      </li>

      <li>
        <codeph>3</codeph> or <codeph>VERBOSE</codeph>: The maximum level of detail, showing how work is split up
        within each node into <q>query fragments</q> that are connected in a pipeline. This extra detail is
        primarily useful for low-level performance testing and tuning within Impala itself, rather than for
        rewriting the SQL code at the user level.
      </li>
    </ul>
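
    <p>
      As a brief sketch of the two equivalent forms in the list above, each numeric level can be swapped for its
      mnemonic in a <codeph>SET</codeph> statement. The pairings come straight from the list. The uppercase
      spelling of the mnemonics simply mirrors the list and is not a statement about case sensitivity.
    </p>

<codeblock>[localhost:21000] > set explain_level=0;
[localhost:21000] > set explain_level=MINIMAL;
[localhost:21000] > set explain_level=1;
[localhost:21000] > set explain_level=STANDARD;
[localhost:21000] > set explain_level=2;
[localhost:21000] > set explain_level=EXTENDED;
[localhost:21000] > set explain_level=3;
[localhost:21000] > set explain_level=VERBOSE;
</codeblock>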

    <note>
      Prior to Impala 1.3, the allowed argument range for <codeph>EXPLAIN_LEVEL</codeph> was 0 to 1: level 0 had
      the mnemonic <codeph>NORMAL</codeph>, and level 1 was <codeph>VERBOSE</codeph>. In Impala 1.3 and higher,
      <codeph>NORMAL</codeph> is not a valid mnemonic value, and <codeph>VERBOSE</codeph> still applies to the
      highest level of detail but now corresponds to level 3. You might need to adjust the values if you have any
      older <codeph>impala-shell</codeph> script files that set the <codeph>EXPLAIN_LEVEL</codeph> query option.
    </note>
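
    <p>
      For instance, an older script that asked for the most detailed plan by number would need an adjustment along
      these lines. The script fragment is hypothetical and only illustrates the mapping described in the note:
    </p>

<codeblock>-- Hypothetical line from a script written before Impala 1.3,
-- when level 1 was the most detailed setting:
set explain_level=1;

-- In Impala 1.3 and higher, the most detailed plan is requested
-- with either of these equivalent settings:
set explain_level=3;
set explain_level=VERBOSE;
</codeblock>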

    <p>
      The extended information from level 2 or 3 is especially useful during performance tuning, when you need to
      confirm whether the work for the query is distributed the way you expect. This check matters most for
      resource-intensive operations such as join queries against large tables, queries against tables with
      large numbers of partitions, and insert operations for Parquet tables. The extended information also helps
      you check estimated resource usage when you use the admission control or resource management features
      explained in <xref href="impala_resource_management.xml#resource_management"/>. See
      <xref href="impala_explain.xml#explain"/> for the syntax of the <codeph>EXPLAIN</codeph> statement, and
      <xref href="impala_explain_plan.xml#perf_explain"/> for details about how to use the extended information.
    </p>

    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>

    <p>
      Read the <codeph>EXPLAIN</codeph> output from bottom to top. The lowest lines represent the
      initial work of the query (scanning data files), the lines in the middle represent calculations done on each
      node and how intermediate results are transmitted from one node to another, and the topmost lines represent
      the final results being sent back to the coordinator node.
    </p>

    <p>
      The numbers in the left column are generated internally during the initial planning phase and do not
      represent the actual order of operations, so it is not significant if they appear out of order in the
      <codeph>EXPLAIN</codeph> output.
    </p>

    <p>
      At all <codeph>EXPLAIN</codeph> levels, the plan contains a warning if any tables in the query are missing
      statistics. Use the <codeph>COMPUTE STATS</codeph> statement to gather statistics for each table and suppress
      this warning. See <xref href="impala_perf_stats.xml#perf_stats"/> for details about how the statistics help
      query performance.
    </p>

    <p>
      The <codeph>PROFILE</codeph> command in <cmdname>impala-shell</cmdname> always starts with an explain plan
      showing full detail, the same as with <codeph>EXPLAIN_LEVEL=3</codeph>. <ph rev="1.4.0">After the explain
      plan comes the executive summary, the same output as produced by the <codeph>SUMMARY</codeph> command in
      <cmdname>impala-shell</cmdname>.</ph>
    </p>
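
    <p>
      A minimal sketch of that sequence in <cmdname>impala-shell</cmdname>, using the <codeph>t1</codeph> table
      from the examples on this page. Both commands report on the most recently run query. The output is omitted
      here because it depends on the cluster and the Impala release:
    </p>

<codeblock>[localhost:21000] > select count(*) from t1;
[localhost:21000] > summary;
[localhost:21000] > profile;
</codeblock>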

    <p conref="../shared/impala_common.xml#common/example_blurb"/>

    <p>
      These examples use a trivial, empty table to illustrate how the essential aspects of query planning are shown
      in <codeph>EXPLAIN</codeph> output:
    </p>

<codeblock>[localhost:21000] > create table t1 (x int, s string);
[localhost:21000] > set explain_level=1;
[localhost:21000] > explain select count(*) from t1;
+------------------------------------------------------------------------+
| Explain String                                                         |
+------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=10.00MB VCores=1               |
| WARNING: The following tables are missing relevant table and/or column |
| statistics.                                                            |
| explain_plan.t1                                                        |
|                                                                        |
| 03:AGGREGATE [MERGE FINALIZE]                                          |
| |  output: sum(count(*))                                               |
| |                                                                      |
| 02:EXCHANGE [PARTITION=UNPARTITIONED]                                  |
| |                                                                      |
| 01:AGGREGATE                                                           |
| |  output: count(*)                                                    |
| |                                                                      |
| 00:SCAN HDFS [explain_plan.t1]                                         |
|    partitions=1/1 size=0B                                              |
+------------------------------------------------------------------------+
[localhost:21000] > explain select * from t1;
+------------------------------------------------------------------------+
| Explain String                                                         |
+------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
| WARNING: The following tables are missing relevant table and/or column |
| statistics.                                                            |
| explain_plan.t1                                                        |
|                                                                        |
| 01:EXCHANGE [PARTITION=UNPARTITIONED]                                  |
| |                                                                      |
| 00:SCAN HDFS [explain_plan.t1]                                         |
|    partitions=1/1 size=0B                                              |
+------------------------------------------------------------------------+
[localhost:21000] > set explain_level=2;
[localhost:21000] > explain select * from t1;
+------------------------------------------------------------------------+
| Explain String                                                         |
+------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
| WARNING: The following tables are missing relevant table and/or column |
| statistics.                                                            |
| explain_plan.t1                                                        |
|                                                                        |
| 01:EXCHANGE [PARTITION=UNPARTITIONED]                                  |
| |  hosts=0 per-host-mem=unavailable                                    |
| |  tuple-ids=0 row-size=19B cardinality=unavailable                    |
| |                                                                      |
| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM]                       |
|    partitions=1/1 size=0B                                              |
|    table stats: unavailable                                            |
|    column stats: unavailable                                           |
|    hosts=0 per-host-mem=0B                                             |
|    tuple-ids=0 row-size=19B cardinality=unavailable                    |
+------------------------------------------------------------------------+
[localhost:21000] > set explain_level=3;
[localhost:21000] > explain select * from t1;
+------------------------------------------------------------------------+
| Explain String                                                         |
+------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
<b>| WARNING: The following tables are missing relevant table and/or column |</b>
<b>| statistics.                                                            |</b>
<b>| explain_plan.t1                                                        |</b>
|                                                                        |
| F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED]                            |
|   01:EXCHANGE [PARTITION=UNPARTITIONED]                                |
|      hosts=0 per-host-mem=unavailable                                  |
|      tuple-ids=0 row-size=19B cardinality=unavailable                  |
|                                                                        |
| F00:PLAN FRAGMENT [PARTITION=RANDOM]                                   |
|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] |
|   00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM]                     |
|      partitions=1/1 size=0B                                            |
<b>|      table stats: unavailable                                          |</b>
<b>|      column stats: unavailable                                         |</b>
|      hosts=0 per-host-mem=0B                                           |
|      tuple-ids=0 row-size=19B cardinality=unavailable                  |
+------------------------------------------------------------------------+
</codeblock>
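
    <p>
      As the warning message demonstrates, most of the information that Impala needs for efficient query
      planning, and that you need to understand the performance characteristics of the query, is available only
      after you run the <codeph>COMPUTE STATS</codeph> statement for the table. In the following output, which
      still uses <codeph>EXPLAIN_LEVEL=3</codeph> from the previous example, the warning is gone and the
      <codeph>table stats</codeph> and <codeph>column stats</codeph> lines report actual values:
    </p>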

<codeblock>[localhost:21000] > compute stats t1;
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 2 column(s). |
+-----------------------------------------+
[localhost:21000] > explain select * from t1;
+------------------------------------------------------------------------+
| Explain String                                                         |
+------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
|                                                                        |
| F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED]                            |
|   01:EXCHANGE [PARTITION=UNPARTITIONED]                                |
|      hosts=0 per-host-mem=unavailable                                  |
|      tuple-ids=0 row-size=20B cardinality=0                            |
|                                                                        |
| F00:PLAN FRAGMENT [PARTITION=RANDOM]                                   |
|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] |
|   00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM]                     |
|      partitions=1/1 size=0B                                            |
<b>|      table stats: 0 rows total                                         |</b>
<b>|      column stats: all                                                 |</b>
|      hosts=0 per-host-mem=0B                                           |
|      tuple-ids=0 row-size=20B cardinality=0                            |
+------------------------------------------------------------------------+
</codeblock>

    <p>
      Joins and other complicated, multi-part queries are the ones where you most commonly need to examine the
      <codeph>EXPLAIN</codeph> output and customize the amount of detail in the output. This example shows the
      default <codeph>EXPLAIN</codeph> output for a three-way join query, then the equivalent output with a
      <codeph>[SHUFFLE]</codeph> hint to change the join mechanism between the first two tables from a broadcast
      join to a shuffle join, which the plan labels <codeph>PARTITIONED</codeph>.
    </p>

<codeblock>[localhost:21000] > set explain_level=1;
[localhost:21000] > explain select one.*, two.*, three.* from t1 one, t1 two, t1 three where one.x = two.x and two.x = three.x;
+---------------------------------------------------------+
| Explain String                                          |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
|                                                         |
| 07:EXCHANGE [PARTITION=UNPARTITIONED]                   |
| |                                                       |
<b>| 04:HASH JOIN [INNER JOIN, BROADCAST]                    |</b>
| |  hash predicates: two.x = three.x                     |
| |                                                       |
<b>| |--06:EXCHANGE [BROADCAST]                              |</b>
| |  |                                                    |
| |  02:SCAN HDFS [explain_plan.t1 three]                 |
| |     partitions=1/1 size=0B                            |
| |                                                       |
<b>| 03:HASH JOIN [INNER JOIN, BROADCAST]                    |</b>
| |  hash predicates: one.x = two.x                       |
| |                                                       |
<b>| |--05:EXCHANGE [BROADCAST]                              |</b>
| |  |                                                    |
| |  01:SCAN HDFS [explain_plan.t1 two]                   |
| |     partitions=1/1 size=0B                            |
| |                                                       |
| 00:SCAN HDFS [explain_plan.t1 one]                      |
|    partitions=1/1 size=0B                               |
+---------------------------------------------------------+
[localhost:21000] > explain select one.*, two.*, three.*
                  > from t1 one join [shuffle] t1 two join t1 three
                  > where one.x = two.x and two.x = three.x;
+---------------------------------------------------------+
| Explain String                                          |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
|                                                         |
| 08:EXCHANGE [PARTITION=UNPARTITIONED]                   |
| |                                                       |
<b>| 04:HASH JOIN [INNER JOIN, BROADCAST]                    |</b>
| |  hash predicates: two.x = three.x                     |
| |                                                       |
<b>| |--07:EXCHANGE [BROADCAST]                              |</b>
| |  |                                                    |
| |  02:SCAN HDFS [explain_plan.t1 three]                 |
| |     partitions=1/1 size=0B                            |
| |                                                       |
<b>| 03:HASH JOIN [INNER JOIN, PARTITIONED]                  |</b>
| |  hash predicates: one.x = two.x                       |
| |                                                       |
<b>| |--06:EXCHANGE [PARTITION=HASH(two.x)]                  |</b>
| |  |                                                    |
| |  01:SCAN HDFS [explain_plan.t1 two]                   |
| |     partitions=1/1 size=0B                            |
| |                                                       |
<b>| 05:EXCHANGE [PARTITION=HASH(one.x)]                     |</b>
| |                                                       |
| 00:SCAN HDFS [explain_plan.t1 one]                      |
|    partitions=1/1 size=0B                               |
+---------------------------------------------------------+
</codeblock>

    <p>
      For a join involving many different tables, the default <codeph>EXPLAIN</codeph> output might stretch over
      several pages, and the only details you care about might be the join order and the mechanism (broadcast or
      shuffle) for joining each pair of tables. In that case, you might set <codeph>EXPLAIN_LEVEL</codeph> to its
      lowest value of 0, to focus on just the join order and join mechanism for each stage. The following example
      shows that the rows from the first and second joined tables are hashed and divided among the nodes of the
      cluster for further filtering, and that the entire contents of the third table are then broadcast to all
      nodes for the final stage of join processing.
    </p>

<codeblock>[localhost:21000] > set explain_level=0;
[localhost:21000] > explain select one.*, two.*, three.*
                  > from t1 one join [shuffle] t1 two join t1 three
                  > where one.x = two.x and two.x = three.x;
+---------------------------------------------------------+
| Explain String                                          |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
|                                                         |
| 08:EXCHANGE [PARTITION=UNPARTITIONED]                   |
<b>| 04:HASH JOIN [INNER JOIN, BROADCAST]                    |</b>
<b>| |--07:EXCHANGE [BROADCAST]                              |</b>
| |  02:SCAN HDFS [explain_plan.t1 three]                 |
<b>| 03:HASH JOIN [INNER JOIN, PARTITIONED]                  |</b>
<b>| |--06:EXCHANGE [PARTITION=HASH(two.x)]                  |</b>
| |  01:SCAN HDFS [explain_plan.t1 two]                   |
<b>| 05:EXCHANGE [PARTITION=HASH(one.x)]                     |</b>
| 00:SCAN HDFS [explain_plan.t1 one]                      |
+---------------------------------------------------------+
</codeblock>

<!-- Consider adding a related info section to collect the xrefs earlier on this page. -->

  </conbody>
</concept>