<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="1.3.0" id="admission_control">

  <title>Admission Control and Query Queuing</title>
  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="Querying"/>
      <data name="Category" value="Admission Control"/>
      <data name="Category" value="Resource Management"/>
    </metadata>
  </prolog>

  <conbody>

    <p id="admission_control_intro"> Admission control is an Impala feature that
      imposes limits on concurrent SQL queries, to avoid resource usage spikes
      and out-of-memory conditions on busy clusters. The admission control
      feature lets you set an upper limit on the number of concurrent Impala
      queries and on the memory used by those queries. Any additional queries
      are queued until the earlier ones finish, rather than being cancelled or
      running slowly and causing contention. As other queries finish, the queued
      queries are allowed to proceed. </p>

    <p rev="2.5.0"> In <keyword keyref="impala25_full"/> and higher, you can
      specify these limits and thresholds for each pool rather than globally.
      That way, you can balance the resource usage and throughput among steady
      well-defined workloads, rare resource-intensive queries, and ad-hoc
      exploratory queries. </p>

    <p> In addition to the threshold values for currently executing queries, you
      can place limits on the maximum number of queries that are queued
      (waiting) and a limit on the amount of time they might wait before
      returning with an error. These queue settings let you ensure that queries
      do not wait indefinitely so that you can detect and correct
      <q>starvation</q> scenarios. </p>

    <p> Queries, DML statements, and some DDL statements, including
      <codeph>CREATE TABLE AS SELECT</codeph> and <codeph>COMPUTE
      STATS</codeph>, are affected by admission control. </p>

    <p> On a busy cluster, you might find there is an optimal number of Impala
      queries that run concurrently. For example, when the I/O capacity is fully
      utilized by I/O-intensive queries, you might not find any throughput
      benefit in running more concurrent queries. By allowing some queries to
      run at full speed while others wait, rather than having all queries
      contend for resources and run slowly, admission control can result in
      higher overall throughput. </p>

    <p> For another example, consider a memory-bound workload such as many large
      joins or aggregation queries. Each such query could briefly use many
      gigabytes of memory to process intermediate results. Because Impala by
      default cancels queries that exceed the specified memory limit, running
      multiple large-scale queries at once might require re-running some queries
      that are cancelled. In this case, admission control improves the
      reliability and stability of the overall workload by only allowing as many
      concurrent queries as the overall memory of the cluster can accommodate. </p>

    <p outputclass="toc inpage"/>
  </conbody>

  <concept id="admission_concurrency">
    <title>Concurrent Queries and Admission Control</title>
    <conbody>
      <p> One way to limit resource usage through admission control is to set an
        upper limit on the number of concurrent queries. This is the initial
        technique you might use when you do not have extensive information about
        memory usage for your workload. The settings can be specified separately
        for each dynamic resource pool. </p>
      <dl>
        <dlentry>
          <dt> Max Running Queries </dt>
          <dd><p>The maximum number of queries that can run concurrently in
            this pool. The default value is unlimited for Impala 2.5 or
            higher. (optional)</p> Any queries for this pool that exceed
            <uicontrol>Max Running Queries</uicontrol> are added to the
            admission control queue until other queries finish. You can use
            <uicontrol>Max Running Queries</uicontrol> in the early stages of
            resource management, when you do not have extensive data about
            query memory usage, to determine if the cluster performs better
            overall if throttling is applied to Impala queries. <p> For a
            workload with many small queries, you typically specify a high
            value for this setting, or leave the default setting of
            <q>unlimited</q>. For a workload with expensive queries, where
            some number of concurrent queries saturate the memory, I/O, CPU,
            or network capacity of the cluster, set the value low enough that
            the cluster resources are not overcommitted for Impala.
            </p><p>Once you have enabled memory-based admission control using
            other pool settings, you can still use <uicontrol>Max Running
            Queries</uicontrol> as a safeguard. If queries exceed either the
            total estimated memory or the maximum number of concurrent
            queries, they are added to the queue. </p><p>If <uicontrol>Max
            Running Queries Multiple</uicontrol> is set, the <uicontrol>Max
            Running Queries</uicontrol> setting is ignored.</p>
          </dd>
        </dlentry>
        <dlentry>
          <dt>Max Running Queries Multiple</dt>
          <dd>This floating point number is multiplied by the current total
            number of executors at runtime to give the maximum number of
            concurrently running queries allowed in the pool. The effect of this
            setting scales with the number of executors in the resource
            pool.<p>This calculation is rounded up to the nearest integer, so
            the result will always be at least one. </p><p>If set to zero or a
            negative number, the setting is ignored.</p></dd>
        </dlentry>
        <dlentry>
          <dt> Max Queued Queries </dt>
          <dd> The maximum number of queries that can be queued in this pool. The
            default value is 200 for Impala 2.1 or higher and 50 for previous
            versions of Impala. (optional)<p>If <uicontrol>Max Queued Queries
            Multiple</uicontrol> is set, the <uicontrol>Max Queued
            Queries</uicontrol> setting is ignored.</p></dd>
        </dlentry>
        <dlentry>
          <dt>Max Queued Queries Multiple</dt>
          <dd>This floating point number is multiplied by the current total
            number of executors at runtime to give the maximum number of queries
            that can be queued in the pool. The effect of this setting scales
            with the number of executors in the resource pool.<p>This
            calculation is rounded up to the nearest integer, so the result
            will always be at least one. </p><p>If set to zero or a negative
            number, the setting is ignored.</p></dd>
        </dlentry>
        <dlentry>
          <dt> Queue Timeout </dt>
          <dd> The amount of time, in milliseconds, that a query waits in the
            admission control queue for this pool before being canceled. The
            default value is 60,000 milliseconds. <p>In the following cases,
            <uicontrol>Queue Timeout</uicontrol> is not significant, and you
            can specify a high value to avoid canceling queries
            unexpectedly:<ul id="ul_kzr_rbg_gw">
              <li>In a low-concurrency workload where few or no queries are
                queued</li>
              <li>In an environment without a strict SLA, where it does not
                matter if queries occasionally take longer than usual because
                they are held in admission control</li>
            </ul>You might also need to increase the value to use Impala with
            some business intelligence tools that have their own timeout
            intervals for queries. </p><p>In a high-concurrency workload,
            especially for queries with a tight SLA, long wait times in
            admission control can cause a serious problem. For example, if a
            query needs to run in 10 seconds, and you have tuned it so that it
            runs in 8 seconds, it violates its SLA if it waits in the
            admission control queue longer than 2 seconds. In a case like
            this, set a low timeout value and monitor how many queries are
            cancelled because of timeouts. This technique helps you to
            discover capacity, tuning, and scaling problems early, and helps
            avoid wasting resources by running expensive queries that have
            already missed their SLA. </p><p> If you identify some queries
            that can have a high timeout value, and others that benefit from a
            low timeout value, you can create separate pools with different
            values for this setting. </p>
          </dd>
        </dlentry>
      </dl>
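      <p> As a rough sketch of how these pool limits look when configured
        directly in configuration files rather than through a management UI,
        the per-pool throttling settings are expressed as properties in
        <codeph>llama-site.xml</codeph>. The pool name
        <codeph>root.default</codeph> and the values below are illustrative
        only, and the exact property names can vary by release: </p>
      <codeblock>&lt;property&gt;
  &lt;name&gt;llama.am.throttling.maximum.placed.reservations.root.default&lt;/name&gt;
  &lt;value&gt;10&lt;/value&gt;    &lt;!-- Max Running Queries --&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;llama.am.throttling.maximum.queued.reservations.root.default&lt;/name&gt;
  &lt;value&gt;50&lt;/value&gt;    &lt;!-- Max Queued Queries --&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;impala.admission-control.pool-queue-timeout-ms.root.default&lt;/name&gt;
  &lt;value&gt;60000&lt;/value&gt; &lt;!-- Queue Timeout, in milliseconds --&gt;
&lt;/property&gt;</codeblock>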
      <p> You can combine these settings with the memory-based approach
        described in <xref href="impala_admission.xml#admission_memory"/>. If
        either the maximum number of concurrent queries or their expected
        memory usage is exceeded, subsequent queries are queued until the
        concurrent workload falls below the threshold again. </p>
    </conbody>
  </concept>

  <concept id="admission_memory">
    <title>Memory Limits and Admission Control</title>
    <conbody>
      <p>
        Each dynamic resource pool can have an upper limit on the cluster-wide memory used by queries executing in that pool.
        This is the technique to use once you have a stable workload with well-understood memory requirements.
      </p>
      <p>Use the following settings to manage memory-based admission
        control.</p>
      <dl>
        <dlentry>
          <dt>Max Memory</dt>
          <dd>
            <p>
              The maximum amount of aggregate memory available across the
              cluster to all queries executing in this pool. This should be a
              portion of the aggregate configured memory for Impala daemons,
              which is shown in the settings dialog next to this option for
              convenience. Setting this to a non-zero value enables memory-based
              admission control.
            </p>
            <p>
              Impala determines the expected maximum memory used by all
              queries in the pool and holds back any further queries that would
              result in Max Memory being exceeded.
            </p>
            <p>
              You set Max Memory in the <codeph>fair-scheduler.xml</codeph> file
              with the <codeph>maxResources</codeph> tag. For example:
              <codeph>&lt;maxResources&gt;2500 mb&lt;/maxResources&gt;</codeph>
            </p>
            <p>
              If you specify Max Memory, you should specify the amount of
              memory to allocate to each query in this pool. You can do this in
              two ways:
            </p>
            <ul>
              <li>By setting Maximum Query Memory Limit and Minimum Query Memory
                Limit. This is preferred in <keyword keyref="impala31_full"/>
                and greater and gives Impala flexibility to set aside more
                memory for queries that are expected to be memory-hungry.</li>
              <li>By setting Default Query Memory Limit to the exact amount of
                memory that Impala should set aside for queries in that
                pool.</li>
            </ul>
            <p>
              Note that in the following cases, Impala relies entirely on
              memory estimates to determine how much memory to set aside for
              each query. This is not recommended because it can result in
              queries not running or being starved for memory if the estimates
              are inaccurate, and it can affect other queries running on the
              same node.
              <ul>
                <li>Max Memory, Maximum Query Memory Limit, and Minimum Query
                  Memory Limit are not set, and the <codeph>MEM_LIMIT</codeph>
                  query option is not set for the query.</li>
                <li>Default Query Memory Limit is set to 0, and the
                  <codeph>MEM_LIMIT</codeph> query option is not set for the
                  query.</li>
              </ul>
            </p>
            <p>If <uicontrol>Max Memory Multiple</uicontrol> is set, the
              <uicontrol>Max Memory</uicontrol> setting is ignored.</p>
          </dd>
        </dlentry>
        <dlentry>
          <dt>Max Memory Multiple</dt>
          <dd> This number of bytes is multiplied by the current total number of
            executors at runtime to give the maximum memory available across the
            cluster for the pool. The effect of this setting scales with the
            number of executors in the resource pool.<p>If set to zero or a
            negative number, the setting is ignored.</p></dd>
        </dlentry>
        <dlentry>
          <dt>Minimum Query Memory Limit and Maximum Query Memory Limit</dt>
          <dd>
            <p>These two options determine the minimum and maximum per-host
              memory limit that will be chosen by Impala Admission control for
              queries in this resource pool. If set, Impala Admission Control
              will choose a memory limit between the minimum and maximum values
              based on the per-host memory estimate for the query. The memory
              limit chosen determines the amount of memory that Impala Admission
              control will set aside for this query on each host that the query
              is running on. The aggregate memory across all of the hosts that
              the query is running on is counted against the pool’s Max
              Memory.</p>
            <p>Minimum Query Memory Limit must be less than or equal to Maximum
              Query Memory Limit and Max Memory.</p>
            <p>A user can override Impala’s choice of memory limit by setting
              the <codeph>MEM_LIMIT</codeph> query option. If the Clamp
              MEM_LIMIT Query Option setting is set to <codeph>TRUE</codeph> and
              the user sets <codeph>MEM_LIMIT</codeph> to a value that is
              outside of the range specified by these two options, then the
              effective memory limit will be either the minimum or maximum,
              depending on whether <codeph>MEM_LIMIT</codeph> is lower than or
              higher than the range.</p>
            <p>For example, assume a resource pool with the following parameters
              set: <ul>
                <li>Minimum Query Memory Limit = 2GB</li>
                <li>Maximum Query Memory Limit = 10GB</li>
              </ul>If a user tries to submit a query with the
              <codeph>MEM_LIMIT</codeph> query option set to 14 GB, the
              following would happen:<ul>
                <li>If Clamp MEM_LIMIT Query Option = true, the admission
                  controller would override <codeph>MEM_LIMIT</codeph> with 10
                  GB and attempt admission using that value.</li>
                <li>If Clamp MEM_LIMIT Query Option = false, the admission
                  controller will retain the <codeph>MEM_LIMIT</codeph> of 14 GB
                  set by the user and will attempt admission using that
                  value.</li>
              </ul></p>
          </dd>
        </dlentry>
        <dlentry>
          <dt>Default Query Memory Limit</dt>
          <dd>The default memory limit applied to queries executing in this pool
            when no explicit <codeph>MEM_LIMIT</codeph> query option is set. The
            memory limit chosen determines the amount of memory that Impala
            Admission control will set aside for this query on each host that
            the query is running on. The aggregate memory across all of the
            hosts that the query is running on is counted against the pool’s Max
            Memory. This option is deprecated in <keyword keyref="impala31_full"
            /> and higher and is replaced by Maximum Query Memory Limit and
            Minimum Query Memory Limit. Do not set this if either Maximum Query
            Memory Limit or Minimum Query Memory Limit is set.</dd>
        </dlentry>
      </dl>
      <dl>
        <dlentry>
          <dt> Clamp MEM_LIMIT Query Option</dt>
          <dd>If this field is not selected, the <codeph>MEM_LIMIT</codeph>
            query option is not bounded by the <b>Maximum Query Memory
            Limit</b> and the <b>Minimum Query Memory Limit</b> values
            specified for this resource pool. By default, this field is selected
            in Impala 3.1 and higher. The field is disabled if neither
            <b>Minimum Query Memory Limit</b> nor <b>Maximum Query Memory
            Limit</b> is set.</dd>
        </dlentry>
      </dl>
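      <p> Putting the memory-based settings together, the following hedged
        example shows how a pool might be defined in configuration files. The
        pool name, the values, and the exact property names are illustrative
        and can vary by release. Max Memory comes from
        <codeph>maxResources</codeph> in <codeph>fair-scheduler.xml</codeph>,
        while the per-query limits are properties in
        <codeph>llama-site.xml</codeph>: </p>
      <codeblock>&lt;!-- fair-scheduler.xml --&gt;
&lt;queue name="default"&gt;
  &lt;maxResources&gt;40000 mb&lt;/maxResources&gt;  &lt;!-- Max Memory for the pool --&gt;
&lt;/queue&gt;

&lt;!-- llama-site.xml --&gt;
&lt;property&gt;
  &lt;name&gt;impala.admission-control.min-query-mem-limit.root.default&lt;/name&gt;
  &lt;value&gt;2147483648&lt;/value&gt;   &lt;!-- Minimum Query Memory Limit: 2 GB --&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;impala.admission-control.max-query-mem-limit.root.default&lt;/name&gt;
  &lt;value&gt;10737418240&lt;/value&gt;  &lt;!-- Maximum Query Memory Limit: 10 GB --&gt;
&lt;/property&gt;</codeblock>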
      <p
        conref="../shared/impala_common.xml#common/admission_control_mem_limit_interaction"/>
      <p>
        You can combine the memory-based settings with the upper limit on concurrent queries described in
        <xref href="impala_admission.xml#admission_concurrency"/>. If either the maximum number of
        concurrent queries or their expected memory usage is exceeded, subsequent queries
        are queued until the concurrent workload falls below the threshold again.
      </p>
    </conbody>
  </concept>
  <concept id="set_per_query_memory_limits">
    <title>Setting Per-query Memory Limits</title>
    <conbody>
      <p>Use per-query memory limits to prevent queries from consuming excessive
        memory resources that impact other queries. We recommend that you set
        the query memory limits whenever possible.</p>
      <p>If you set the <b>Max Memory</b> for a resource pool, Impala attempts
        to throttle queries if there is not enough memory to run them within the
        specified resources.</p>
      <p>Only use admission control with maximum memory resources if you can
        ensure there are query memory limits. Set the pool <b>Maximum Query
        Memory Limit</b> to be certain. You can override this setting with the
        <codeph>MEM_LIMIT</codeph> query option, if necessary.</p>
      <p>Typically, you set query memory limits using the <codeph>set
        MEM_LIMIT=Xg;</codeph> query option. When you find the right value for
        your business case, memory-based admission control works well. The
        potential downside is that queries that attempt to use more memory might
        perform poorly or even be cancelled.</p>
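      <p> For example, in an <cmdname>impala-shell</cmdname> session (the
        host name, limit value, and table names here are illustrative): </p>
      <codeblock>[impalad-host:21000] &gt; SET MEM_LIMIT=2g;
[impalad-host:21000] &gt; SELECT COUNT(*) FROM big_table b JOIN dim_table d USING (id);</codeblock>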
    </conbody>
  </concept>

  <concept id="admission_yarn">

    <title>How Impala Admission Control Relates to Other Resource Management Tools</title>
    <prolog>
      <metadata>
        <data name="Category" value="Concepts"/>
      </metadata>
    </prolog>

    <conbody>

      <p>
        The admission control feature is similar in some ways to the YARN resource management framework. These features
        can be used separately or together. This section describes some similarities and differences, to help you
        decide which combination of resource management features to use for Impala.
      </p>

      <p>
        Admission control is a lightweight, decentralized system that is suitable for workloads consisting
        primarily of Impala queries and other SQL statements. It sets <q>soft</q> limits that smooth out Impala
        memory usage during times of heavy load, rather than taking an all-or-nothing approach that cancels jobs
        that are too resource-intensive.
      </p>

      <p>
        Because the admission control system does not interact with other Hadoop workloads such as MapReduce jobs, you
        might use YARN with static service pools on clusters where resources are shared between
        Impala and other Hadoop components. This configuration is recommended when using Impala in a
        <term>multitenant</term> cluster. Devote a percentage of cluster resources to Impala, and allocate another
        percentage for MapReduce and other batch-style workloads. Let admission control handle the concurrency and
        memory usage for the Impala work within the cluster, and let YARN manage the work for other components within the
        cluster. In this scenario, Impala's resources are not managed by YARN.
      </p>

      <p>
        The Impala admission control feature uses the same configuration mechanism as the YARN resource manager to map users to
        pools and authenticate them.
      </p>

      <p>
        Although the Impala admission control feature uses a <codeph>fair-scheduler.xml</codeph> configuration file
        behind the scenes, this file does not depend on which scheduler is used for YARN. You still use this file
        even when YARN is using the capacity scheduler.
      </p>

    </conbody>
  </concept>

  <concept id="admission_architecture">

    <title>How Impala Schedules and Enforces Limits on Concurrent Queries</title>
    <prolog>
      <metadata>
        <data name="Category" value="Concepts"/>
        <data name="Category" value="Scheduling"/>
      </metadata>
    </prolog>

    <conbody>

      <p>
        The admission control system is decentralized, embedded in each Impala daemon and communicating through the
        statestore mechanism. Although the limits you set for memory usage and number of concurrent queries apply
        cluster-wide, each Impala daemon makes its own decisions about whether to allow each query to run
        immediately or to queue it for a less-busy time. These decisions are fast, meaning the admission control
        mechanism is low-overhead, but might be imprecise during times of heavy load across many coordinators. There could be times
        when more queries are queued (in aggregate across the cluster) than the specified limit, or when the number of admitted
        queries exceeds the expected number. Thus, you typically err on the
        high side for the size of the queue, because there is not a big penalty for having a large number of queued
        queries, and you typically err on the low side for configuring memory resources, to leave some headroom in case more
        queries are admitted than expected, without running out of memory and having queries cancelled as a result.
      </p>

      <p>
        To avoid a large backlog of queued requests, you can set an upper limit on the size of the queue for
        queries that are queued. When the number of queued queries exceeds this limit, further queries are
        cancelled rather than being queued. You can also configure a timeout period per pool, after which queued queries are
        cancelled, to avoid indefinite waits. If a cluster reaches this state where queries are cancelled due to
        too many concurrent requests or long waits for query execution to begin, that is a signal for an
        administrator to take action, either by provisioning more resources, scheduling work on the cluster to
        smooth out the load, or by doing <xref href="impala_performance.xml#performance">Impala performance
        tuning</xref> to enable higher throughput.
      </p>
    </conbody>
  </concept>

  <concept id="admission_jdbc_odbc">

    <title>How Admission Control works with Impala Clients (JDBC, ODBC, HiveServer2)</title>
    <prolog>
      <metadata>
        <data name="Category" value="JDBC"/>
        <data name="Category" value="ODBC"/>
        <data name="Category" value="HiveServer2"/>
        <data name="Category" value="Concepts"/>
      </metadata>
    </prolog>

    <conbody>

      <p>
        Most aspects of admission control work transparently with client interfaces such as JDBC and ODBC:
      </p>

      <ul>
        <li>
          If a SQL statement is put into a queue rather than running immediately, the API call blocks until the
          statement is dequeued and begins execution. At that point, the client program can request to fetch
          results, which might also block until results become available.
        </li>

        <li>
          If a SQL statement is cancelled because it has been queued for too long or because it exceeded the memory
          limit during execution, the error is returned to the client program with a descriptive error message.
        </li>

      </ul>

      <p> In Impala 2.0 and higher, you can submit a SQL
        <codeph>SET</codeph> statement from the client application to change
        the <codeph>REQUEST_POOL</codeph> query option. This option lets you
        submit queries to different resource pools, as described in <xref
        href="impala_request_pool.xml#request_pool"/>. </p>
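      <p> For example, a client application could issue the following
        statements through JDBC or ODBC before submitting its workload; the
        pool name <codeph>root.adhoc</codeph> is illustrative: </p>
      <codeblock>SET REQUEST_POOL=root.adhoc;
SELECT c_name FROM customer WHERE c_custkey = 42;</codeblock>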
      <p>
        At any time, the set of queued queries could include queries submitted through multiple different Impala
        daemon hosts. All the queries submitted through a particular host will be executed in order, so a
        <codeph>CREATE TABLE</codeph> followed by an <codeph>INSERT</codeph> on the same table would succeed.
        Queries submitted through different hosts are not guaranteed to be executed in the order they were
        received. Therefore, if you are using load-balancing or other round-robin scheduling where different
        statements are submitted through different hosts, set up all table structures ahead of time so that the
        statements controlled by the queuing system are primarily queries, where order is not significant. Or, if a
        sequence of statements needs to happen in strict order (such as an <codeph>INSERT</codeph> followed by a
        <codeph>SELECT</codeph>), submit all those statements through a single session, while connected to the same
        Impala daemon host.
      </p>

      <p>
        Admission control has the following limitations or special behavior when used with JDBC or ODBC
        applications:
      </p>

      <ul>
        <li>
          The other resource-related query options,
          <codeph>RESERVATION_REQUEST_TIMEOUT</codeph> and <codeph>V_CPU_CORES</codeph>, are no longer used. Those query options only
          applied to using Impala with Llama, which is no longer supported.
        </li>
      </ul>
    </conbody>
  </concept>

  <concept id="admission_schema_config">
    <title>SQL and Schema Considerations for Admission Control</title>
    <conbody>
      <p>
        When queries complete quickly and are tuned for optimal memory usage, there is less chance of
        performance or capacity problems during times of heavy load. Before setting up admission control,
        tune your Impala queries to ensure that the query plans are efficient and the memory estimates
        are accurate. Understanding the nature of your workload, and which queries are the most
        resource-intensive, helps you to plan how to divide the queries into different pools and
        decide what limits to define for each pool.
      </p>
      <p>
        For large tables, especially those involved in join queries, keep their statistics up to date
        after loading substantial amounts of new data or adding new partitions.
        Use the <codeph>COMPUTE STATS</codeph> statement for unpartitioned tables, and
        <codeph>COMPUTE INCREMENTAL STATS</codeph> for partitioned tables.
      </p>
      <p>
        When you use dynamic resource pools with a <uicontrol>Max Memory</uicontrol> setting enabled,
        you typically override the memory estimates that Impala makes based on the statistics from the
        <codeph>COMPUTE STATS</codeph> statement.
        You either set the <codeph>MEM_LIMIT</codeph> query option within a particular session to
        set an upper memory limit for queries within that session, or a default <codeph>MEM_LIMIT</codeph>
        setting for all queries processed by the <cmdname>impalad</cmdname> instance, or
        a default <codeph>MEM_LIMIT</codeph> setting for all queries assigned to a particular
        dynamic resource pool. By designating a consistent memory limit for a set of similar queries
        that use the same resource pool, you avoid unnecessary query queuing or out-of-memory conditions
        that can arise during high-concurrency workloads when memory estimates for some queries are inaccurate.
      </p>
      <p>
        Follow other steps from <xref href="impala_performance.xml#performance"/> to tune your queries.
      </p>
    </conbody>
  </concept>
  <concept id="admission_guidelines">
    <title>Guidelines for Using Admission Control</title>
    <prolog>
      <metadata>
        <data name="Category" value="Planning"/>
        <data name="Category" value="Guidelines"/>
        <data name="Category" value="Best Practices"/>
      </metadata>
    </prolog>
    <conbody>
      <p> The limits imposed by admission control are <q>soft</q> limits,
        managed in a decentralized way. Each Impala coordinator node makes its
        own decisions about whether to allow queries to run immediately or to
        queue them. These decisions rely on information passed back and forth
        between nodes by the StateStore service. If a sudden surge in requests
        causes more queries than anticipated to run concurrently, then the
        throughput could decrease due to queries spilling to disk or contending
        for resources. Or queries could be cancelled if they exceed the
        <codeph>MEM_LIMIT</codeph> setting while running. </p>
      <p> In <cmdname>impala-shell</cmdname>, you can also specify which
        resource pool to direct queries to by setting the
        <codeph>REQUEST_POOL</codeph> query option. </p>
      <p> To see how admission control works for particular queries, examine the
        profile output or the summary output for the query. <ul>
          <li>Profile<p>The information is available through the
            <codeph>PROFILE</codeph> statement in
            <cmdname>impala-shell</cmdname> immediately after running a
            query in the shell, on the <uicontrol>queries</uicontrol> page of
            the Impala debug web UI, or in the Impala log file (basic
            information at log level 1, more detailed information at log level
            2). </p><p>The profile output contains details about the admission
            decision, such as whether the query was queued or not and which
            resource pool it was assigned to. It also includes the estimated
            and actual memory usage for the query, so you can fine-tune the
            configuration for the memory limits of the resource pools.
            </p></li>
          <li>Summary<p>Starting in <keyword keyref="impala31"/>, the
            information is available in <cmdname>impala-shell</cmdname> when
            the <codeph>LIVE_PROGRESS</codeph> or
            <codeph>LIVE_SUMMARY</codeph> query option is set to
            <codeph>TRUE</codeph>.</p><p>You can also start an
            <codeph>impala-shell</codeph> session with the
            <codeph>--live_progress</codeph> or
            <codeph>--live_summary</codeph> flags to monitor all queries in
            that <codeph>impala-shell</codeph> session.</p><p>The summary
            output includes the queuing status: whether the query was queued
            and, if so, the latest reason for queuing.</p></li>
        </ul></p>
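      <p> For example, the following <cmdname>impala-shell</cmdname> sequence
        runs a query with the live summary enabled and then retrieves its
        profile; the host and table names are illustrative: </p>
      <codeblock>[impalad-host:21000] &gt; SET LIVE_SUMMARY=TRUE;
[impalad-host:21000] &gt; SELECT COUNT(*) FROM web_logs;
[impalad-host:21000] &gt; PROFILE;</codeblock>
      <p> In the profile output, check the admission result, the resource pool
        the query was assigned to, and the estimated versus actual memory
        usage. </p>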
      <p> For details about all the Fair Scheduler configuration settings, see
        <xref keyref="FairScheduler">Fair Scheduler Configuration</xref>, in
        particular the tags such as <codeph>&lt;queue&gt;</codeph> and
        <codeph>&lt;aclSubmitApps&gt;</codeph> to map users and groups to
        particular resource pools (queues). </p>
    </conbody>
  </concept>
</concept>