mirror of
https://github.com/apache/impala.git
synced 2025-12-19 09:58:28 -05:00
The MT_DOP documentation was outdated stating that MT_DOP values greater than zero are not supported for DML statements. However, IMPALA-10351 introduced this feature and now DML statements do not produce an error if MT_DOP is set to a non-zero value.
Change-Id: Id34ccdaa8e1738756f4f12f7074e9f076b9209b4
Reviewed-on: http://gerrit.cloudera.org:8080/21846
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
217 lines
7.1 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="mt_dop">

<title>MT_DOP Query Option</title>
<titlealts audience="PDF"><navtitle>MT DOP</navtitle></titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Impala Query Options"/>
<data name="Category" value="Querying"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>

<conbody>

<p>
<indexterm audience="hidden">MT_DOP query option</indexterm>
Sets the degree of intra-node parallelism used for certain operations that
can benefit from multithreaded execution. You can specify values
higher than zero to find the ideal balance of response time,
memory usage, and CPU usage during statement processing.
</p>

<note>
<p>
The Impala execution engine is being revamped incrementally to add
more parallelism within a single host for certain statements and
kinds of operations. The setting <codeph>MT_DOP=0</codeph> uses the
<q>old</q> code path with limited intra-node parallelism.
</p>

<p>
Currently, <codeph>MT_DOP</codeph> support varies by statement type:
</p>
<ul>
<li>
<p>
<codeph>COMPUTE [INCREMENTAL] STATS</codeph>. Impala automatically sets
<codeph>MT_DOP=4</codeph> for <codeph>COMPUTE STATS</codeph> and
<codeph>COMPUTE INCREMENTAL STATS</codeph> statements on Parquet tables.
</p>
</li>
<li>
<p>
<codeph>SELECT</codeph> statements. <codeph>MT_DOP</codeph> is 0 by default
for <codeph>SELECT</codeph> statements but can be set to a value greater
than 0 to control intra-node parallelism. This may be useful to tune
query performance and in particular to reduce execution time of
long-running, CPU-intensive queries.
</p>
</li>
<li>
<p>
In <keyword keyref="impala34"/> and earlier, not all <codeph>SELECT</codeph>
statements support setting <codeph>MT_DOP</codeph>. Specifically, only
scan and aggregation operators, and
local joins that do not need data exchanges (such as for nested types) are
supported. Other <codeph>SELECT</codeph> statements produce an error if
<codeph>MT_DOP</codeph> is set to a non-zero value.
</p>
</li>
</ul>

</note>

<p conref="../shared/impala_common.xml#common/type_integer"/>
<p conref="../shared/impala_common.xml#common/default_0"/>
<p>
Because <codeph>COMPUTE STATS</codeph> and <codeph>COMPUTE INCREMENTAL STATS</codeph>
statements for Parquet tables benefit substantially from extra intra-node
parallelism, Impala automatically sets <codeph>MT_DOP=4</codeph> when computing stats
for Parquet tables.
</p>
<p>
<b>Range:</b> 0 to 64
</p>

<p conref="../shared/impala_common.xml#common/example_blurb"/>

<note>
<p>
Any timing figures in the following examples are on a small, lightly loaded development cluster.
Your mileage may vary. Speedups depend on many factors, including the number of rows, columns, and
partitions within each table.
</p>
</note>

<p>
The following example shows how to run a <codeph>COMPUTE STATS</codeph>
statement against a Parquet table with or without an explicit <codeph>MT_DOP</codeph>
setting:
</p>

<codeblock><![CDATA[
-- Explicitly setting MT_DOP to 0 selects the old code path.
set mt_dop = 0;
MT_DOP set to 0

-- The analysis for the billion rows is distributed among hosts,
-- but uses only a single core on each host.
compute stats billion_rows_parquet;
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 2 column(s). |
+-----------------------------------------+

drop stats billion_rows_parquet;

-- Using 4 logical processors per host is faster.
set mt_dop = 4;
MT_DOP set to 4

compute stats billion_rows_parquet;
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 2 column(s). |
+-----------------------------------------+

drop stats billion_rows_parquet;

-- Unsetting the option reverts it to its default,
-- which for COMPUTE STATS on a Parquet table is 4,
-- so again it uses the fast path.
unset MT_DOP;
Unsetting option MT_DOP

compute stats billion_rows_parquet;
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 2 column(s). |
+-----------------------------------------+
]]>
</codeblock>

<p>
The following example shows the effects of setting <codeph>MT_DOP</codeph>
for a query on a Parquet table:
</p>

<codeblock><![CDATA[
set mt_dop = 0;
MT_DOP set to 0

-- COUNT(DISTINCT) for a unique column is CPU-intensive.
select count(distinct id) from billion_rows_parquet;
+--------------------+
| count(distinct id) |
+--------------------+
| 1000000000         |
+--------------------+
Fetched 1 row(s) in 67.20s

set mt_dop = 16;
MT_DOP set to 16

-- Introducing more intra-node parallelism for the aggregation
-- speeds things up, and potentially reduces memory overhead by
-- reducing the number of scanner threads.
select count(distinct id) from billion_rows_parquet;
+--------------------+
| count(distinct id) |
+--------------------+
| 1000000000         |
+--------------------+
Fetched 1 row(s) in 17.19s
]]>
</codeblock>

<p>
In releases that predate IMPALA-10351, DML statements were not compatible with
non-zero <codeph>MT_DOP</codeph> settings and produced the error shown in the
following example. Releases that include that change run DML statements with a
non-zero <codeph>MT_DOP</codeph> without this error.
</p>

<codeblock><![CDATA[
set mt_dop=1;
MT_DOP set to 1

insert into a1
select * from a2;
ERROR: NotImplementedException: MT_DOP not supported for DML statements.
]]>
</codeblock>
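
<p>
As a minimal sketch of the newer behavior, assuming a release that includes
IMPALA-10351 and reusing the placeholder tables <codeph>a1</codeph> and
<codeph>a2</codeph> from the preceding example, the same <codeph>INSERT</codeph>
statement can run with a non-zero <codeph>MT_DOP</codeph>. The value 4 is an
arbitrary choice, and the output is omitted because it depends on the data:
</p>

<codeblock><![CDATA[
-- Assumption: this release includes IMPALA-10351, so DML accepts MT_DOP > 0.
set mt_dop=4;

-- a1 and a2 are placeholder tables with compatible schemas.
insert into a1
select * from a2;
]]>
</codeblock>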

<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
<xref keyref="compute_stats"/>,
<xref keyref="aggregate_functions"/>
</p>

</conbody>
</concept>