Mirror of https://github.com/apache/impala.git, synced 2025-12-30 03:01:44 -05:00
[DOCS] Clarification on admission control and DDL statements
Removed the confusing example and paragraphs.

Change-Id: I2e3e82bd34e88e7a13de1864aeb97f01023bc715
Reviewed-on: http://gerrit.cloudera.org:8080/10829
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Committed by Impala Public Jenkins
Parent: 83448f1c41
Commit: 6f52ce10e3
@@ -50,6 +50,11 @@ under the License.
before returning with an error. These queue settings let you ensure that queries do
not wait indefinitely, so that you can detect and correct <q>starvation</q> scenarios.
</p>
<p>
Queries, DML statements, and some DDL statements, including
<codeph>CREATE TABLE AS SELECT</codeph> and <codeph>COMPUTE
STATS</codeph>, are affected by admission control.
</p>
<p>
Enable this feature if your cluster is
underutilized at some times and overutilized at others. Overutilization is indicated by performance
@@ -765,38 +770,42 @@ impala.admission-control.pool-queue-timeout-ms.<varname>queue_name</varname></ph
<!-- End Config -->

<concept id="admission_guidelines">

<title>Guidelines for Using Admission Control</title>
<prolog>
<metadata>
<data name="Category" value="Planning"/>
<data name="Category" value="Guidelines"/>
<data name="Category" value="Best Practices"/>
</metadata>
</prolog>

<conbody>

<p>
To see how admission control works for particular queries, examine the profile output for the query. This
information is available through the <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname>
immediately after running a query in the shell, on the <uicontrol>queries</uicontrol> page of the Impala
debug web UI, or in the Impala log file (basic information at log level 1, more detailed information at log
level 2). The profile output contains details about the admission decision, such as whether the query was
queued or not and which resource pool it was assigned to. It also includes the estimated and actual memory
usage for the query, so you can fine-tune the configuration for the memory limits of the resource pools.
</p>
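<p>
As an illustration only, the following <cmdname>impala-shell</cmdname> session runs a query and then
displays its profile with the <codeph>PROFILE</codeph> command mentioned above. The table name
<codeph>sample_table</codeph> is hypothetical. Look in the profile output for the resource pool the
query was assigned to and whether it was queued; the exact field labels may vary between Impala releases.
</p>
<codeblock>-- sample_table is a hypothetical table; substitute any table you can query.
select count(*) from sample_table;
-- Display the runtime profile of the most recent query in this session,
-- which includes the admission control decision and memory estimates.
profile;
</codeblock>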

<p>
Remember that the limits imposed by admission control are <q>soft</q> limits.
The decentralized nature of this mechanism means that each Impala node makes its own decisions about whether
to allow queries to run immediately or to queue them. These decisions rely on information passed back and forth
between nodes by the statestore service. If a sudden surge in requests causes more queries than anticipated to run
concurrently, then throughput could decrease due to queries spilling to disk or contending for resources;
or queries could be cancelled if they exceed the <codeph>MEM_LIMIT</codeph> setting while running.
</p>

<!--
<title>Guidelines for Using Admission Control</title>
<prolog>
<metadata>
<data name="Category" value="Planning"/>
<data name="Category" value="Guidelines"/>
<data name="Category" value="Best Practices"/>
</metadata>
</prolog>
<conbody>
<p>
To see how admission control works for particular queries, examine
the profile output for the query. This information is available
through the <codeph>PROFILE</codeph> statement in
<cmdname>impala-shell</cmdname> immediately after running a query in
the shell, on the <uicontrol>queries</uicontrol> page of the Impala
debug web UI, or in the Impala log file (basic information at log
level 1, more detailed information at log level 2). The profile output
contains details about the admission decision, such as whether the
query was queued or not and which resource pool it was assigned to. It
also includes the estimated and actual memory usage for the query, so
you can fine-tune the configuration for the memory limits of the
resource pools.
</p>
<p>
Remember that the limits imposed by admission control are
<q>soft</q> limits. The decentralized nature of this mechanism means
that each Impala node makes its own decisions about whether to allow
queries to run immediately or to queue them. These decisions rely on
information passed back and forth between nodes by the statestore
service. If a sudden surge in requests causes more queries than
anticipated to run concurrently, then throughput could decrease due to
queries spilling to disk or contending for resources; or queries could
be cancelled if they exceed the <codeph>MEM_LIMIT</codeph> setting
while running.
</p>
<!--
<p>
If you have trouble getting a query to run because its estimated memory usage is too high, you can override
the estimate by setting the <codeph>MEM_LIMIT</codeph> query option in <cmdname>impala-shell</cmdname>,
@@ -806,58 +815,25 @@ impala.admission-control.pool-queue-timeout-ms.<varname>queue_name</varname></ph
pre-allocated by the query.
</p>
-->

<p>
In <cmdname>impala-shell</cmdname>, you can also specify which resource pool to direct queries to by
setting the <codeph>REQUEST_POOL</codeph> query option.
</p>

<p>
The statements affected by the admission control feature are primarily queries, but also include statements
that write data such as <codeph>INSERT</codeph> and <codeph>CREATE TABLE AS SELECT</codeph>. Most write
operations in Impala are not resource-intensive, but inserting into a Parquet table can require substantial
memory due to buffering intermediate data before writing out each Parquet data block. See
<xref href="impala_parquet.xml#parquet_etl"/> for instructions about inserting data efficiently into
Parquet tables.
</p>

<p>
Although admission control does not scrutinize memory usage for other kinds of DDL statements, if a query
is queued due to a limit on concurrent queries or memory usage, subsequent statements in the same session
are also queued so that they are processed in the correct order:
</p>

<codeblock>-- This query could be queued to avoid out-of-memory at times of heavy load.
select * from huge_table join enormous_table using (id);
-- If so, this subsequent statement in the same session is also queued
-- until the previous statement completes.
drop table huge_table;
</codeblock>

<p>
If you set up different resource pools for different users and groups, consider reusing any classifications
you developed for use with Sentry security. See <xref href="impala_authorization.xml#authorization"/> for details.
</p>

<p>
For details about all the Fair Scheduler configuration settings, see
<xref keyref="FairScheduler">Fair Scheduler Configuration</xref>, in particular the tags such as <codeph>&lt;queue&gt;</codeph> and
<codeph>&lt;aclSubmitApps&gt;</codeph> to map users and groups to particular resource pools (queues).
</p>

<!-- Wait a sec. We say admission control doesn't use RESERVATION_REQUEST_TIMEOUT at all.
What's the real story here? Matt did refer to some timeout option that was
available through the shell but not the DB-centric APIs.
<p>
Because you cannot override query options such as
<codeph>RESERVATION_REQUEST_TIMEOUT</codeph>
in a JDBC or ODBC application, consider configuring timeout periods
on the application side to cancel queries that take
too long due to being queued during times of high load.
</p>
-->
</conbody>
</concept>
<p>
In <cmdname>impala-shell</cmdname>, you can also specify which
resource pool to direct queries to by setting the
<codeph>REQUEST_POOL</codeph> query option.
</p>
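<p>
A minimal sketch of the <codeph>REQUEST_POOL</codeph> query option described above, assuming a pool named
<codeph>root.etl</codeph> already exists in your admission control configuration. The pool name and both table
names are hypothetical; substitute names that exist on your cluster.
</p>
<codeblock>-- root.etl is a hypothetical pool name; it must match a pool you have configured.
set request_pool=root.etl;
-- Statements that follow in this session are submitted to that pool.
-- sales_parquet and sales_staging are hypothetical tables.
insert into sales_parquet select * from sales_staging;
</codeblock>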
<p>
If you set up different resource pools for different users and
groups, consider reusing any classifications you developed for use
with Sentry security. See <xref
href="impala_authorization.xml#authorization"/> for details.
</p>
<p>
For details about all the Fair Scheduler configuration settings, see
<xref keyref="FairScheduler">Fair Scheduler Configuration</xref>, in
particular the tags such as <codeph>&lt;queue&gt;</codeph> and
<codeph>&lt;aclSubmitApps&gt;</codeph> to map users and groups to
particular resource pools (queues).
</p>
</conbody>
</concept>
</concept>
</concept>