The recent documentation formatting changes introduced the navigation panel on the left. However, due to the length of the query options navigation title these could overlap with the documentation paragraphs. This commit removes the underscores from the navigation titles of the query options, so browsers can break them into multiple lines. Additionally, the "SET" and "Query Options for the SET Statement" pages are merged to save some more space for the query option navigation titles. Testing: - Built the documentation and tested manually Change-Id: Icec787d7a2af848aaaff65be2ecf311a5ce8fe7f Reviewed-on: http://gerrit.cloudera.org:8080/20556 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Jason Fehr <jfehr@cloudera.com> Reviewed-by: Peter Rozsa <prozsa@cloudera.com> Reviewed-by: Tamas Mate <tmater@apache.org>
372 lines
21 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="1.2" id="explain_level">

  <title>EXPLAIN_LEVEL Query Option</title>
  <titlealts audience="PDF"><navtitle>EXPLAIN LEVEL</navtitle></titlealts>
  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="Impala Query Options"/>
      <data name="Category" value="Troubleshooting"/>
      <data name="Category" value="Querying"/>
      <data name="Category" value="Performance"/>
      <data name="Category" value="Reports"/>
      <data name="Category" value="Administrators"/>
      <data name="Category" value="Developers"/>
      <data name="Category" value="Data Analysts"/>
    </metadata>
  </prolog>

  <conbody>

    <p>
      Controls the amount of detail provided in the output of the
      <codeph>EXPLAIN</codeph> statement. The basic output can help you
      identify high-level performance issues such as scanning a higher volume of
      data or more partitions than you expect. The higher levels of detail show
      how intermediate results flow between nodes and how different SQL
      operations such as <codeph>ORDER BY</codeph>, <codeph>GROUP BY</codeph>,
      joins, and <codeph>WHERE</codeph> clauses are implemented within a
      distributed query.
    </p>

    <p>
      <b>Type:</b> <codeph>STRING</codeph> or <codeph>INT</codeph>
    </p>

    <p>
      <b>Default:</b> <codeph>1</codeph>
    </p>

    <p>
      <b>Arguments:</b>
    </p>

    <p>
      The allowed range of numeric values for this option is 0 to 3:
    </p>

    <ul>
      <li>
        <codeph>0</codeph> or <codeph>MINIMAL</codeph>: A bare-bones list, one line per operation. Primarily useful
        for checking the join order in very long queries where the regular <codeph>EXPLAIN</codeph> output is too
        long to read easily.
      </li>

      <li>
        <codeph>1</codeph> or <codeph>STANDARD</codeph>: The default level of detail, showing the logical way that
        work is split up for the distributed query.
      </li>

      <li>
        <codeph>2</codeph> or <codeph>EXTENDED</codeph>: Includes additional
        detail about how the query planner uses statistics in its
        decision-making process. This detail helps you understand how a query
        could be tuned by gathering statistics, using query hints, adding or
        removing predicates, and so on. In <keyword keyref="impala32_full"/>
        and higher, the output also includes the analyzed query with the cast
        information in the output header, and the implicit cast information in
        the Predicate section.
      </li>

      <li>
        <codeph>3</codeph> or <codeph>VERBOSE</codeph>: The maximum level of detail, showing how work is split up
        within each node into <q>query fragments</q> that are connected in a pipeline. This extra detail is
        primarily useful for low-level performance testing and tuning within Impala itself, rather than for
        rewriting the SQL code at the user level.
      </li>
    </ul>
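
    <p>
      As a minimal sketch of the list above, either form of the value can be used in
      <cmdname>impala-shell</cmdname>. Each pair of statements below is intended to select the same level, on the
      assumption that the mnemonic names are accepted as spelled in the list:
    </p>

<codeblock>-- Numeric and mnemonic forms of the lowest level of detail.
set explain_level=0;
set explain_level=minimal;

-- Numeric and mnemonic forms of the level that adds planner statistics detail.
set explain_level=2;
set explain_level=extended;
</codeblock>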

    <note>
      Prior to Impala 1.3, the allowed argument range for <codeph>EXPLAIN_LEVEL</codeph> was 0 to 1: level 0 had
      the mnemonic <codeph>NORMAL</codeph>, and level 1 was <codeph>VERBOSE</codeph>. In Impala 1.3 and higher,
      <codeph>NORMAL</codeph> is not a valid mnemonic value, and <codeph>VERBOSE</codeph> still applies to the
      highest level of detail but now corresponds to level 3. You might need to adjust the values if you have any
      older <codeph>impala-shell</codeph> script files that set the <codeph>EXPLAIN_LEVEL</codeph> query option.
    </note>
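
    <p>
      For example, a script written for a release earlier than Impala 1.3 might need a change along the following
      lines. This is an illustrative sketch rather than an excerpt from a real script, and the choice of
      <codeph>STANDARD</codeph> as the replacement for <codeph>NORMAL</codeph> is an assumption about the level of
      detail the script author intended:
    </p>

<codeblock>-- Before Impala 1.3, level 1 was the most detailed level (VERBOSE).
-- set explain_level=1;

-- In Impala 1.3 and higher, level 1 is STANDARD, so request level 3
-- to keep the most detailed output.
set explain_level=3;

-- NORMAL is no longer a valid mnemonic. STANDARD is used here as an
-- assumed replacement; pick whichever level suits the script.
-- set explain_level=normal;
set explain_level=standard;
</codeblock>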

    <p>
      Changing the value of this option controls the amount of detail in the output of the <codeph>EXPLAIN</codeph>
      statement. The extended information from level 2 or 3 is especially useful during performance tuning, when
      you need to confirm whether the work for the query is distributed the way you expect. This check matters most
      for resource-intensive operations such as join queries against large tables, queries against tables with
      large numbers of partitions, and insert operations for Parquet tables. The extended information also helps
      you check estimated resource usage when you use the admission control or resource management features
      explained in <xref href="impala_resource_management.xml#resource_management"/>. See
      <xref href="impala_explain.xml#explain"/> for the syntax of the <codeph>EXPLAIN</codeph> statement, and
      <xref href="impala_explain_plan.xml#perf_explain"/> for details about how to use the extended information.
    </p>

    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>

    <p>
      As always, read the <codeph>EXPLAIN</codeph> output from bottom to top. The lowest lines represent the
      initial work of the query (scanning data files), the lines in the middle represent calculations done on each
      node and how intermediate results are transmitted from one node to another, and the topmost lines represent
      the final results being sent back to the coordinator node.
    </p>

    <p>
      The numbers in the left column are generated internally during the initial planning phase and do not
      represent the actual order of operations, so it is not significant if they appear out of order in the
      <codeph>EXPLAIN</codeph> output.
    </p>

    <p>
      At all <codeph>EXPLAIN</codeph> levels, the plan contains a warning if any tables in the query are missing
      statistics. Use the <codeph>COMPUTE STATS</codeph> statement to gather statistics for each table and suppress
      this warning. See <xref href="impala_perf_stats.xml#perf_stats"/> for details about how the statistics help
      query performance.
    </p>
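
    <p>
      One way to confirm that statistics are in place before re-running <codeph>EXPLAIN</codeph> is sketched below.
      The table name <codeph>t1</codeph> is borrowed from the examples later in this topic, and the output is
      omitted because it depends on the contents of the table:
    </p>

<codeblock>-- Gather table and column statistics.
compute stats t1;

-- Check that the row count and the column statistics are populated.
show table stats t1;
show column stats t1;

-- The plan should no longer carry the missing-statistics warning.
explain select * from t1;
</codeblock>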

    <p>
      The <codeph>PROFILE</codeph> command in <cmdname>impala-shell</cmdname> always starts with an explain plan
      showing full detail, the same as with <codeph>EXPLAIN_LEVEL=3</codeph>. <ph rev="1.4.0">After the explain
      plan comes the executive summary, the same output as produced by the <codeph>SUMMARY</codeph> command in
      <cmdname>impala-shell</cmdname>.</ph>
    </p>
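
    <p>
      A typical sequence in <cmdname>impala-shell</cmdname> might look like the following sketch. The query is only
      a placeholder, and the output is omitted because it varies with the cluster and the data:
    </p>

<codeblock>[localhost:21000] > select count(*) from t1;
[localhost:21000] > summary;
[localhost:21000] > profile;
</codeblock>

    <p>
      Both commands report on the most recently run query in the session, so issue them immediately after the
      query you want to examine.
    </p>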

    <p conref="../shared/impala_common.xml#common/example_blurb"/>

    <p>
      These examples use a trivial, empty table to illustrate how the essential aspects of query planning are shown
      in <codeph>EXPLAIN</codeph> output:
    </p>

<codeblock>[localhost:21000] > create table t1 (x int, s string);
[localhost:21000] > set explain_level=1;
[localhost:21000] > explain select count(*) from t1;
+------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=10.00MB VCores=1 |
| WARNING: The following tables are missing relevant table and/or column |
| statistics. |
| explain_plan.t1 |
| |
| 03:AGGREGATE [MERGE FINALIZE] |
| | output: sum(count(*)) |
| | |
| 02:EXCHANGE [PARTITION=UNPARTITIONED] |
| | |
| 01:AGGREGATE |
| | output: count(*) |
| | |
| 00:SCAN HDFS [explain_plan.t1] |
| partitions=1/1 size=0B |
+------------------------------------------------------------------------+
[localhost:21000] > explain select * from t1;
+------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
| WARNING: The following tables are missing relevant table and/or column |
| statistics. |
| explain_plan.t1 |
| |
| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
| | |
| 00:SCAN HDFS [explain_plan.t1] |
| partitions=1/1 size=0B |
+------------------------------------------------------------------------+
[localhost:21000] > set explain_level=2;
[localhost:21000] > explain select * from t1;
+------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
| WARNING: The following tables are missing relevant table and/or column |
| statistics. |
| explain_plan.t1 |
| |
| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
| | hosts=0 per-host-mem=unavailable |
| | tuple-ids=0 row-size=19B cardinality=unavailable |
| | |
| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] |
| partitions=1/1 size=0B |
| table stats: unavailable |
| column stats: unavailable |
| hosts=0 per-host-mem=0B |
| tuple-ids=0 row-size=19B cardinality=unavailable |
+------------------------------------------------------------------------+
[localhost:21000] > set explain_level=3;
[localhost:21000] > explain select * from t1;
+------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
<b>| WARNING: The following tables are missing relevant table and/or column |</b>
<b>| statistics. |</b>
<b>| explain_plan.t1 |</b>
| |
| F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED] |
| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
| hosts=0 per-host-mem=unavailable |
| tuple-ids=0 row-size=19B cardinality=unavailable |
| |
| F00:PLAN FRAGMENT [PARTITION=RANDOM] |
| DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] |
| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] |
| partitions=1/1 size=0B |
<b>| table stats: unavailable |</b>
<b>| column stats: unavailable |</b>
| hosts=0 per-host-mem=0B |
| tuple-ids=0 row-size=19B cardinality=unavailable |
+------------------------------------------------------------------------+
</codeblock>

    <p>
      As the warning message demonstrates, most of the information needed for Impala to do efficient query
      planning, and for you to understand the performance characteristics of the query, requires running the
      <codeph>COMPUTE STATS</codeph> statement for the table:
    </p>

<codeblock>[localhost:21000] > compute stats t1;
+-----------------------------------------+
| summary |
+-----------------------------------------+
| Updated 1 partition(s) and 2 column(s). |
+-----------------------------------------+
[localhost:21000] > explain select * from t1;
+------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
| |
| F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED] |
| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
| hosts=0 per-host-mem=unavailable |
| tuple-ids=0 row-size=20B cardinality=0 |
| |
| F00:PLAN FRAGMENT [PARTITION=RANDOM] |
| DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] |
| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] |
| partitions=1/1 size=0B |
<b>| table stats: 0 rows total |</b>
<b>| column stats: all |</b>
| hosts=0 per-host-mem=0B |
| tuple-ids=0 row-size=20B cardinality=0 |
+------------------------------------------------------------------------+
</codeblock>

    <p>
      Joins and other complicated, multi-part queries are the ones where you most commonly need to examine the
      <codeph>EXPLAIN</codeph> output and customize the amount of detail in the output. This example shows the
      default <codeph>EXPLAIN</codeph> output for a three-way join query, then the equivalent output with a
      <codeph>[SHUFFLE]</codeph> hint to change the join mechanism between the first two tables from a broadcast
      join to a shuffle join.
    </p>

<codeblock>[localhost:21000] > set explain_level=1;
[localhost:21000] > explain select one.*, two.*, three.* from t1 one, t1 two, t1 three where one.x = two.x and two.x = three.x;
+---------------------------------------------------------+
| Explain String |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
| |
| 07:EXCHANGE [PARTITION=UNPARTITIONED] |
| | |
<b>| 04:HASH JOIN [INNER JOIN, BROADCAST] |</b>
| | hash predicates: two.x = three.x |
| | |
<b>| |--06:EXCHANGE [BROADCAST] |</b>
| | | |
| | 02:SCAN HDFS [explain_plan.t1 three] |
| | partitions=1/1 size=0B |
| | |
<b>| 03:HASH JOIN [INNER JOIN, BROADCAST] |</b>
| | hash predicates: one.x = two.x |
| | |
<b>| |--05:EXCHANGE [BROADCAST] |</b>
| | | |
| | 01:SCAN HDFS [explain_plan.t1 two] |
| | partitions=1/1 size=0B |
| | |
| 00:SCAN HDFS [explain_plan.t1 one] |
| partitions=1/1 size=0B |
+---------------------------------------------------------+
[localhost:21000] > explain select one.*, two.*, three.*
                  > from t1 one join [shuffle] t1 two join t1 three
                  > where one.x = two.x and two.x = three.x;
+---------------------------------------------------------+
| Explain String |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
| |
| 08:EXCHANGE [PARTITION=UNPARTITIONED] |
| | |
<b>| 04:HASH JOIN [INNER JOIN, BROADCAST] |</b>
| | hash predicates: two.x = three.x |
| | |
<b>| |--07:EXCHANGE [BROADCAST] |</b>
| | | |
| | 02:SCAN HDFS [explain_plan.t1 three] |
| | partitions=1/1 size=0B |
| | |
<b>| 03:HASH JOIN [INNER JOIN, PARTITIONED] |</b>
| | hash predicates: one.x = two.x |
| | |
<b>| |--06:EXCHANGE [PARTITION=HASH(two.x)] |</b>
| | | |
| | 01:SCAN HDFS [explain_plan.t1 two] |
| | partitions=1/1 size=0B |
| | |
<b>| 05:EXCHANGE [PARTITION=HASH(one.x)] |</b>
| | |
| 00:SCAN HDFS [explain_plan.t1 one] |
| partitions=1/1 size=0B |
+---------------------------------------------------------+
</codeblock>

    <p>
      For a join involving many different tables, the default <codeph>EXPLAIN</codeph> output might stretch over
      several pages, and the only details you care about might be the join order and the mechanism (broadcast or
      shuffle) for joining each pair of tables. In that case, you might set <codeph>EXPLAIN_LEVEL</codeph> to its
      lowest value of 0, to focus on just the join order and join mechanism for each stage. The following example
      shows how the rows from the first and second joined tables are hashed and divided among the nodes of the
      cluster for further filtering; then the entire contents of the third table are broadcast to all nodes for the
      final stage of join processing.
    </p>

<codeblock>[localhost:21000] > set explain_level=0;
[localhost:21000] > explain select one.*, two.*, three.*
                  > from t1 one join [shuffle] t1 two join t1 three
                  > where one.x = two.x and two.x = three.x;
+---------------------------------------------------------+
| Explain String |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
| |
| 08:EXCHANGE [PARTITION=UNPARTITIONED] |
<b>| 04:HASH JOIN [INNER JOIN, BROADCAST] |</b>
<b>| |--07:EXCHANGE [BROADCAST] |</b>
| | 02:SCAN HDFS [explain_plan.t1 three] |
<b>| 03:HASH JOIN [INNER JOIN, PARTITIONED] |</b>
<b>| |--06:EXCHANGE [PARTITION=HASH(two.x)] |</b>
| | 01:SCAN HDFS [explain_plan.t1 two] |
<b>| 05:EXCHANGE [PARTITION=HASH(one.x)] |</b>
| 00:SCAN HDFS [explain_plan.t1 one] |
+---------------------------------------------------------+
</codeblock>

<!-- Consider adding a related info section to collect the xrefs earlier on this page. -->

  </conbody>
</concept>