mirror of
https://github.com/apache/impala.git
synced 2025-12-19 09:58:28 -05:00
This patch improves REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message by saying the specific configuration that must be adjusted such that the query can pass the Admission Control. New fields 'per_backend_mem_to_admit_source' and 'coord_backend_mem_to_admit_source' of type MemLimitSourcePB are added into QuerySchedulePB. These fields explain what limiting factor drives final numbers at 'per_backend_mem_to_admit' and 'coord_backend_mem_to_admit' respectively. In turn, Admission Control will use this information to compose a more informative error message that the user can act upon. The new error message pattern also explicitly mentions "Per Host Min Memory Reservation" as a place to look at to investigate memory reservations scheduled for each backend node. Updated documentation with examples of query rejection by Admission Control and how to read the error message. Testing: - Add BE tests at admission-controller-test.cc - Adjust and pass affected EE tests Change-Id: I1ef7fb7e7a194b2036c2948639a06c392590bf66 Reviewed-on: http://gerrit.cloudera.org:8080/21436 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
252 lines
12 KiB
XML
252 lines
12 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!--
|
|
Licensed to the Apache Software Foundation (ASF) under one
|
|
or more contributor license agreements. See the NOTICE file
|
|
distributed with this work for additional information
|
|
regarding copyright ownership. The ASF licenses this file
|
|
to you under the Apache License, Version 2.0 (the
|
|
"License"); you may not use this file except in compliance
|
|
with the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing,
|
|
software distributed under the License is distributed on an
|
|
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
|
KIND, either express or implied. See the License for the
|
|
specific language governing permissions and limitations
|
|
under the License.
|
|
-->
|
|
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
|
|
<concept id="mem_limit">
|
|
<title>MEM_LIMIT Query Option</title>
|
|
<titlealts audience="PDF">
|
|
<navtitle>MEM LIMIT</navtitle>
|
|
</titlealts>
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Impala"/>
|
|
<data name="Category" value="Impala Query Options"/>
|
|
<data name="Category" value="Scalability"/>
|
|
<data name="Category" value="Memory"/>
|
|
<data name="Category" value="Troubleshooting"/>
|
|
<data name="Category" value="Developers"/>
|
|
<data name="Category" value="Data Analysts"/>
|
|
</metadata>
|
|
</prolog>
|
|
<conbody>
|
|
<p>
|
|
The MEM_LIMIT query option defines the maximum amount of memory a query can allocate on
|
|
each node. The total memory that can be used by a query is the <codeph>MEM_LIMIT</codeph>
|
|
times the number of nodes.
|
|
</p>
|
|
<p rev="">
|
|
There are two levels of memory limit for Impala. The
|
|
<codeph>‑‑mem_limit</codeph> startup option sets an overall limit for the
|
|
<cmdname>impalad</cmdname> process (which handles multiple queries concurrently). That
|
|
process memory limit can be expressed either as a percentage of RAM available to the
|
|
process such as <codeph>‑‑mem_limit=70%</codeph> or as a fixed amount of
|
|
memory, such as <codeph>100gb</codeph>. The memory available to the process is based on
|
|
the host's physical memory and, since Impala 3.2, memory limits from Linux Control Groups.
|
|
E.g. if an <cmdname>impalad</cmdname> process is running in a Docker container on a host
|
|
with 100GB of memory, the memory available is 100GB or the Docker container's memory
|
|
limit, whichever is less.
|
|
</p>
|
|
<p rev="">
|
|
The <codeph>MEM_LIMIT</codeph> query option, which you set through
|
|
<cmdname>impala-shell</cmdname> or the <codeph>SET</codeph> statement in a JDBC or ODBC
|
|
application, applies to each individual query. The <codeph>MEM_LIMIT</codeph> query option
|
|
is usually expressed as a fixed size such as <codeph>10gb</codeph>, and must always be
|
|
less than the <cmdname>impalad</cmdname> memory limit.
|
|
</p>
|
|
<p rev="">
|
|
If query processing approaches the specified memory limit on any node, either the
|
|
per-query limit or the impalad limit, then the SQL operations will start to reduce
|
|
their memory consumption, for example by writing the temporary data to disk (known as spilling to disk).
|
|
The result is a query that completes successfully, rather than failing with an out-of-memory error.
|
|
The tradeoff is decreased performance due to the extra disk I/O to write the temporary data and
|
|
read it back in. The slowdown could potentially be significant. Thus, while this feature improves
|
|
reliability, you should optimize your queries, system parameters, and hardware configuration to
|
|
make this spilling a rare occurrence.
|
|
</p>
|
|
<p>
|
|
<b>Type:</b> numeric
|
|
</p>
|
|
<p rev="">
|
|
<b>Units:</b> A numeric argument represents memory size in bytes; you can also use a
|
|
suffix of <codeph>m</codeph> or <codeph>mb</codeph> for megabytes, or more commonly
|
|
<codeph>g</codeph> or <codeph>gb</codeph> for gigabytes. If you specify a value with
|
|
unrecognized formats, subsequent queries fail with an error.
|
|
</p>
|
|
<p rev="">
|
|
<b>Default:</b> 0 (unlimited)
|
|
</p>
|
|
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
|
|
<p rev="">
|
|
The <codeph>MEM_LIMIT</codeph> setting is primarily useful for production workloads.
|
|
Impala's Admission Controller can be configured to automatically assign memory limits to
|
|
queries and limit memory consumption of resource pools. See <xref href="impala_admission.xml#admission_concurrency"/>
|
|
and <xref href="impala_admission.xml#admission_memory"/> for more information on configuring
|
|
the resource usage through admission control.
|
|
</p>
|
|
<p rev="">
|
|
Use the output of the <codeph>SUMMARY</codeph> command in <cmdname>impala-shell</cmdname>
|
|
to get a report of memory used for each phase of your most heavyweight queries on each
|
|
node, and then set a <codeph>MEM_LIMIT</codeph> somewhat higher than that. See
|
|
<xref href="impala_explain_plan.xml#perf_summary"/> for usage information about the
|
|
<codeph>SUMMARY</codeph> command.
|
|
</p>
|
|
<p conref="../shared/impala_common.xml#common/example_blurb" rev=""/>
|
|
<p rev="">
|
|
The following examples show how to set the <codeph>MEM_LIMIT</codeph> query option using a
|
|
fixed number of bytes, or suffixes representing gigabytes or megabytes.
|
|
</p>
|
|
<codeblock rev="">
|
|
[localhost:21000] > set mem_limit=3000000000;
|
|
MEM_LIMIT set to 3000000000
|
|
[localhost:21000] > select 5;
|
|
Query: select 5
|
|
+---+
|
|
| 5 |
|
|
+---+
|
|
| 5 |
|
|
+---+
|
|
|
|
[localhost:21000] > set mem_limit=3g;
|
|
MEM_LIMIT set to 3g
|
|
[localhost:21000] > select 5;
|
|
Query: select 5
|
|
+---+
|
|
| 5 |
|
|
+---+
|
|
| 5 |
|
|
+---+
|
|
|
|
[localhost:21000] > set mem_limit=3gb;
|
|
MEM_LIMIT set to 3gb
|
|
[localhost:21000] > select 5;
|
|
+---+
|
|
| 5 |
|
|
+---+
|
|
| 5 |
|
|
+---+
|
|
|
|
[localhost:21000] > set mem_limit=3m;
|
|
MEM_LIMIT set to 3m
|
|
[localhost:21000] > select 5;
|
|
+---+
|
|
| 5 |
|
|
+---+
|
|
| 5 |
|
|
+---+
|
|
[localhost:21000] > set mem_limit=3mb;
|
|
MEM_LIMIT set to 3mb
|
|
[localhost:21000] > select 5;
|
|
+---+
|
|
| 5 |
|
|
+---+
|
|
</codeblock>
|
|
<p rev="">
|
|
The following examples show how unrecognized <codeph>MEM_LIMIT</codeph> values lead to
|
|
errors for subsequent queries.
|
|
</p>
|
|
<codeblock rev="">
|
|
[localhost:21000] > set mem_limit=3pb;
|
|
MEM_LIMIT set to 3pb
|
|
[localhost:21000] > select 5;
|
|
ERROR: Failed to parse query memory limit from '3pb'.
|
|
|
|
[localhost:21000] > set mem_limit=xyz;
|
|
MEM_LIMIT set to xyz
|
|
[localhost:21000] > select 5;
|
|
Query: select 5
|
|
ERROR: Failed to parse query memory limit from 'xyz'.
|
|
</codeblock>
|
|
<p rev="">
|
|
The following examples shows the automatic query cancellation when the
|
|
<codeph>MEM_LIMIT</codeph> value is exceeded on any host involved in the Impala query.
|
|
First it runs a successful query and checks the largest amount of memory used on any node
|
|
for any stage of the query. Then it sets an artificially low <codeph>MEM_LIMIT</codeph>
|
|
setting so that the same query cannot run.
|
|
</p>
|
|
<codeblock rev="">
|
|
[localhost:21000] > select count(*) from customer;
|
|
Query: select count(*) from customer
|
|
+----------+
|
|
| count(*) |
|
|
+----------+
|
|
| 150000 |
|
|
+----------+
|
|
|
|
[localhost:21000] > select count(distinct c_name) from customer;
|
|
Query: select count(distinct c_name) from customer
|
|
+------------------------+
|
|
| count(distinct c_name) |
|
|
+------------------------+
|
|
| 150000 |
|
|
+------------------------+
|
|
|
|
[localhost:21000] > summary;
|
|
+--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+
|
|
| Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
|
|
+--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+
|
|
| 06:AGGREGATE | 1 | 1 | 230.00ms | 230.00ms | 1 | 1 | 16.00 KB | -1 B | FINALIZE |
|
|
| 05:EXCHANGE | 1 | 1 | 43.44us | 43.44us | 1 | 1 | 0 B | -1 B | UNPARTITIONED |
|
|
| 02:AGGREGATE | 1 | 1 | 227.14ms | 227.14ms | 1 | 1 | 12.00 KB | 10.00 MB | |
|
|
| 04:AGGREGATE | 1 | 1 | 126.27ms | 126.27ms | 150.00K | 150.00K | 15.17 MB | 10.00 MB | |
|
|
| 03:EXCHANGE | 1 | 1 | 44.07ms | 44.07ms | 150.00K | 150.00K | 0 B | 0 B | HASH(c_name) |
|
|
<b>| 01:AGGREGATE | 1 | 1 | 361.94ms | 361.94ms | 150.00K | 150.00K | 23.04 MB | 10.00 MB | |</b>
|
|
| 00:SCAN HDFS | 1 | 1 | 43.64ms | 43.64ms | 150.00K | 150.00K | 24.19 MB | 64.00 MB | tpch.customer |
|
|
+--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+
|
|
|
|
[localhost:21000] > set mem_limit=15mb;
|
|
MEM_LIMIT set to 15mb
|
|
[localhost:21000] > select count(distinct c_name) from customer;
|
|
Query: select count(distinct c_name) from customer
|
|
ERROR:
|
|
Rejected query from pool default-pool: minimum memory reservation is greater than memory available to the query
|
|
for buffer reservations. Memory reservation needed given the current plan: 38.00 MB. Adjust MEM_LIMIT option
|
|
for the query to allow the query memory limit to be at least 70.00 MB. Note that changing the memory limit may
|
|
also change the plan. See 'Per Host Min Memory Reservation' in the query profile for more information about the
|
|
per-node memory requirements.</codeblock>
|
|
</conbody>
|
|
<concept id="mem_limit_executors">
|
|
<title>MEM_LIMIT_EXECUTORS Query Option</title>
|
|
<conbody>
|
|
<note>This is an advanced query option. Setting this query option is not recommended
|
|
unless specifically advised.</note>
|
|
<p>The existing <codeph>MEM_LIMIT</codeph> query option applies to all impala coordinators and
|
|
executors. This means that the same amount of memory gets reserved but coordinators
|
|
typically just do the job of coordinating the query and thus do not necessarily need all the
|
|
estimated memory. Blocking the estimated memory on coordinators blocks the memory to be used
|
|
for other queries.</p>
|
|
<p>The new <codeph>MEM_LIMIT_EXECUTORS</codeph> query option functions similarly to the
|
|
<codeph>MEM_LIMIT</codeph> option but sets the query memory limit only on executors. This
|
|
new option addresses the issue related to <codeph>MEM_LIMIT</codeph> and is recommended in
|
|
scenarios where the query needs much higher memory on executors compared with
|
|
coordinators.</p>
|
|
<p>Note that the <codeph>MEM_LIMIT_EXECUTORS</codeph> option does not work with
|
|
<codeph>MEM_LIMIT</codeph>. If you set both, only <codeph>MEM_LIMIT</codeph> applies.</p>
|
|
</conbody>
|
|
</concept>
|
|
<concept id="mem_limit_coordinators">
|
|
<title>MEM_LIMIT_COORDINATORS Query Option</title>
|
|
<conbody>
|
|
<note>This is an advanced query option. Setting this query option is not recommended
|
|
unless specifically advised.</note>
|
|
<p>The existing <codeph>MEM_LIMIT</codeph> query option applies to all impala coordinators and
|
|
executors. This means that the same amount of memory gets reserved but coordinators
|
|
typically just do the job of coordinating the query and thus do not necessarily need all the
|
|
estimated memory. Blocking the estimated memory on coordinators blocks the memory to be used
|
|
for other queries.</p>
|
|
<p>The new <codeph>MEM_LIMIT_COORDINATORS</codeph> query option functions similarly to the
|
|
<codeph>MEM_LIMIT</codeph> option but sets the query memory limit only on coordinators. This
|
|
new option addresses the issue related to <codeph>MEM_LIMIT</codeph> and is recommended in
|
|
scenarios where the query needs higher or lower memory on coordinators compared to the planner
|
|
estimates.</p>
|
|
<p>Note that the <codeph>MEM_LIMIT_COORDINATORS</codeph> option does not work with
|
|
<codeph>MEM_LIMIT</codeph>. If you set both, only <codeph>MEM_LIMIT</codeph> applies.</p>
|
|
</conbody>
|
|
</concept>
|
|
</concept>
|