[DOCS] Clarification on admission control and DDL statements

Removed the confusing example and paragraphs.

Change-Id: I2e3e82bd34e88e7a13de1864aeb97f01023bc715
Reviewed-on: http://gerrit.cloudera.org:8080/10829
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Alex Rodoni
2018-06-26 14:30:38 -07:00
committed by Impala Public Jenkins
parent 83448f1c41
commit 6f52ce10e3


@@ -50,6 +50,11 @@ under the License.
before returning with an error. These queue settings let you ensure that queries do
not wait indefinitely, so that you can detect and correct <q>starvation</q> scenarios.
</p>
<p>
Queries, DML statements, and some DDL statements, including
<codeph>CREATE TABLE AS SELECT</codeph> and <codeph>COMPUTE
STATS</codeph>, are affected by admission control.
</p>
<p>
Enable this feature if your cluster is
underutilized at some times and overutilized at others. Overutilization is indicated by performance
@@ -765,38 +770,42 @@ impala.admission-control.pool-queue-timeout-ms.<varname>queue_name</varname></ph
<!-- End Config -->
<concept id="admission_guidelines">
<title>Guidelines for Using Admission Control</title>
<prolog>
<metadata>
<data name="Category" value="Planning"/>
<data name="Category" value="Guidelines"/>
<data name="Category" value="Best Practices"/>
</metadata>
</prolog>
<conbody>
<p>
To see how admission control works for particular queries, examine the profile output for the query. This
information is available through the <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname>
immediately after running a query in the shell, on the <uicontrol>queries</uicontrol> page of the Impala
debug web UI, or in the Impala log file (basic information at log level 1, more detailed information at log
level 2). The profile output contains details about the admission decision, such as whether the query was
queued or not and which resource pool it was assigned to. It also includes the estimated and actual memory
usage for the query, so you can fine-tune the configuration for the memory limits of the resource pools.
</p>
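<p>
For example, a minimal <cmdname>impala-shell</cmdname> session for checking the admission decision might look
like the following sketch, where <codeph>sample_table</codeph> is a hypothetical table name:
</p>
<codeblock>-- sample_table is a hypothetical table; substitute one of your own.
select count(*) from sample_table;
-- Display the profile of the query that just ran. Look in the output for the
-- admission decision and the resource pool the query was assigned to.
profile;
</codeblock>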
<p>
Remember that the limits imposed by admission control are <q>soft</q> limits.
The decentralized nature of this mechanism means that each Impala node makes its own decisions about whether
to allow queries to run immediately or to queue them. These decisions rely on information passed back and forth
between nodes by the statestore service. If a sudden surge in requests causes more queries than anticipated to run
concurrently, throughput could decrease because queries spill to disk or contend for resources,
or queries could be cancelled if they exceed the <codeph>MEM_LIMIT</codeph> setting while running.
</p>
<!--
<title>Guidelines for Using Admission Control</title>
<prolog>
<metadata>
<data name="Category" value="Planning"/>
<data name="Category" value="Guidelines"/>
<data name="Category" value="Best Practices"/>
</metadata>
</prolog>
<conbody>
<p>
To see how admission control works for particular queries, examine
the profile output for the query. This information is available
through the <codeph>PROFILE</codeph> statement in
<cmdname>impala-shell</cmdname> immediately after running a query in
the shell, on the <uicontrol>queries</uicontrol> page of the Impala
debug web UI, or in the Impala log file (basic information at log
level 1, more detailed information at log level 2). The profile output
contains details about the admission decision, such as whether the
query was queued or not and which resource pool it was assigned to. It
also includes the estimated and actual memory usage for the query, so
you can fine-tune the configuration for the memory limits of the
resource pools.
</p>
<p>
Remember that the limits imposed by admission control are
<q>soft</q> limits. The decentralized nature of this mechanism means
that each Impala node makes its own decisions about whether to allow
queries to run immediately or to queue them. These decisions rely on
information passed back and forth between nodes by the statestore
service. If a sudden surge in requests causes more queries than
anticipated to run concurrently, throughput could decrease because
queries spill to disk or contend for resources, or queries could
be cancelled if they exceed the <codeph>MEM_LIMIT</codeph> setting
while running.
</p>
<!--
<p>
If you have trouble getting a query to run because its estimated memory usage is too high, you can override
the estimate by setting the <codeph>MEM_LIMIT</codeph> query option in <cmdname>impala-shell</cmdname>,
@@ -806,58 +815,25 @@ impala.admission-control.pool-queue-timeout-ms.<varname>queue_name</varname></ph
pre-allocated by the query.
</p>
-->
<p>
In <cmdname>impala-shell</cmdname>, you can also specify which resource pool to direct queries to by
setting the <codeph>REQUEST_POOL</codeph> query option.
</p>
<p>
The statements affected by the admission control feature are primarily queries, but also include statements
that write data such as <codeph>INSERT</codeph> and <codeph>CREATE TABLE AS SELECT</codeph>. Most write
operations in Impala are not resource-intensive, but inserting into a Parquet table can require substantial
memory due to buffering intermediate data before writing out each Parquet data block. See
<xref href="impala_parquet.xml#parquet_etl"/> for instructions about inserting data efficiently into
Parquet tables.
</p>
<p>
Although admission control does not scrutinize memory usage for other kinds of DDL statements, if a query
is queued due to a limit on concurrent queries or memory usage, subsequent statements in the same session
are also queued so that they are processed in the correct order:
</p>
<codeblock>-- This query could be queued to avoid out-of-memory at times of heavy load.
select * from huge_table join enormous_table using (id);
-- If so, this subsequent statement in the same session is also queued
-- until the previous statement completes.
drop table huge_table;
</codeblock>
<p>
If you set up different resource pools for different users and groups, consider reusing any classifications
you developed for use with Sentry security. See <xref href="impala_authorization.xml#authorization"/> for details.
</p>
<p>
For details about all the Fair Scheduler configuration settings, see
<xref keyref="FairScheduler">Fair Scheduler Configuration</xref>, in particular the tags such as <codeph>&lt;queue&gt;</codeph> and
<codeph>&lt;aclSubmitApps&gt;</codeph> to map users and groups to particular resource pools (queues).
</p>
<!-- Wait a sec. We say admission control doesn't use RESERVATION_REQUEST_TIMEOUT at all.
What's the real story here? Matt did refer to some timeout option that was
available through the shell but not the DB-centric APIs.
<p>
Because you cannot override query options such as
<codeph>RESERVATION_REQUEST_TIMEOUT</codeph>
in a JDBC or ODBC application, consider configuring timeout periods
on the application side to cancel queries that take
too long due to being queued during times of high load.
</p>
-->
</conbody>
</concept>
<p>
In <cmdname>impala-shell</cmdname>, you can also specify which
resource pool to direct queries to by setting the
<codeph>REQUEST_POOL</codeph> query option.
</p>
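<p>
As a sketch, assuming a resource pool named <codeph>root.analysts</codeph> (a hypothetical name) is defined
in your configuration, the following <cmdname>impala-shell</cmdname> statements direct subsequent queries in
the session to that pool. The table <codeph>sample_table</codeph> is also hypothetical.
</p>
<codeblock>-- root.analysts is a hypothetical pool name; use a pool from your own configuration.
set request_pool=root.analysts;
-- Queries issued later in this session are submitted to the root.analysts pool.
-- sample_table is a hypothetical table name.
select count(*) from sample_table;
</codeblock>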
<p>
If you set up different resource pools for different users and
groups, consider reusing any classifications you developed for use
with Sentry security. See <xref
href="impala_authorization.xml#authorization"/> for details.
</p>
<p>
For details about all the Fair Scheduler configuration settings, see
<xref keyref="FairScheduler">Fair Scheduler Configuration</xref>, in
particular the tags such as <codeph>&lt;queue&gt;</codeph> and
<codeph>&lt;aclSubmitApps&gt;</codeph> to map users and groups to
particular resource pools (queues).
</p>
</conbody>
</concept>
</concept>
</concept>