mirror of
https://github.com/apache/impala.git
synced 2025-12-19 09:58:28 -05:00
Change-Id: I78bfceb225d25078c54c1ed8f88ca250ef42dafe Reviewed-on: http://gerrit.cloudera.org:8080/14314 Reviewed-by: Sahil Takiar <stakiar@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
132 lines
6.5 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="data_sink">
  <title>Spooling Impala Query Results</title>
  <conbody>
    <p>In Impala, you can control how query results are materialized and
      returned to clients such as impala-shell, Hue, and JDBC
      applications.</p>
    <ul>
      <li>When query result spooling is disabled, Impala relies on clients to
        fetch results, and each fetch triggers the generation of more result
        row batches until all the result rows have been produced. If a client
        issues a query without fetching all the results, the query fragments
        continue to consume resources until the query is cancelled and
        unregistered, potentially tying up resources and causing other
        queries to wait for an extended period of time in admission
        control.<p>In this mode, Impala materializes rows on demand: rows are
          created only when the client requests them.</p></li>
      <li>When query result spooling is enabled, the result sets of queries
        are eagerly fetched and spooled in the spooling location, either in
        memory or on disk.<p>Once all result rows have been fetched and
          stored in the spooling location, the resources are freed up.
          Incoming client fetches can then get the data from the spooled
          results.</p></li>
    </ul>
    <p>Result spooling is turned off by default, but can be enabled via the
      <codeph>SPOOL_QUERY_RESULTS</codeph> query option.</p>
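    <p>As a minimal illustration, a session might turn on result spooling as
      in the following sketch before running a query. The table name
      <codeph>sample_table</codeph> is hypothetical and stands in for any
      table in your environment.</p>
    <codeblock>-- Enable result spooling for the current session.
SET SPOOL_QUERY_RESULTS=TRUE;

-- Hypothetical query; with spooling enabled, its results are spooled
-- rather than produced only as the client fetches them.
SELECT * FROM sample_table;</codeblock>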
    <section id="section_av4_hsy_2jb">
      <title>Admission Control and Result Spooling</title>
      <p>Query result spooling collects and stores query results in memory
        that is controlled by admission control. Use the following query
        options to calibrate how much memory to use and when to spill to
        disk.<dl>
          <dlentry>
            <dt>MAX_RESULT_SPOOLING_MEM</dt>
            <dd>
              <p>The maximum amount of memory used when spooling query
                results. If this value is exceeded when spooling results, all
                of the memory used for spooled results will most likely be
                spilled to disk. The default is 100 MB.</p>
            </dd>
          </dlentry>
          <dlentry>
            <dt>MAX_SPILLED_RESULT_SPOOLING_MEM</dt>
            <dd>
              <p>The maximum amount of memory that can be spilled to disk when
                spooling query results. Must be greater than or equal to
                <codeph>MAX_RESULT_SPOOLING_MEM</codeph>. If this value is
                exceeded, the coordinator fragment blocks until the client
                has consumed enough rows to free up more memory. The default
                is 1 GB.</p>
            </dd>
          </dlentry>
        </dl></p>
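      <p>For example, assuming a session that should keep at most 50 MB of
        spooled results in memory and allow up to 512 MB to be spilled to
        disk, the options could be set as in the following sketch. The values
        are illustrative only, not recommendations, and are given in
        bytes.</p>
      <codeblock>SET SPOOL_QUERY_RESULTS=TRUE;

-- 50 MB of in-memory spooling (52428800 bytes); illustrative value.
SET MAX_RESULT_SPOOLING_MEM=52428800;

-- 512 MB that can be spilled to disk (536870912 bytes); illustrative value.
-- This must be at least as large as MAX_RESULT_SPOOLING_MEM.
SET MAX_SPILLED_RESULT_SPOOLING_MEM=536870912;</codeblock>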
    </section>
    <section id="section_oh2_fsy_2jb">
      <title>Fetch Timeout</title>
      <p>Resources for a query are released when the query completes its
        execution. To prevent clients from waiting indefinitely for query
        results, use the <codeph>FETCH_ROWS_TIMEOUT_MS</codeph> query option to
        set the timeout that applies when clients fetch rows. The timeout
        applies whether query result spooling is enabled or disabled:<ul>
          <li>When result spooling is disabled (<codeph>SPOOL_QUERY_RESULTS =
              FALSE</codeph>), the timeout controls how long a client waits for
            a single row batch to be produced by the coordinator.</li>
          <li>When result spooling is enabled (<codeph>SPOOL_QUERY_RESULTS =
              TRUE</codeph>), a client can fetch multiple row batches at a time,
            so this timeout controls the total time a client waits for row
            batches to be produced.</li>
        </ul></p>
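      <p>A minimal sketch, assuming that a 30-second limit suits the client.
        The value is illustrative and, as the option name indicates, is
        specified in milliseconds.</p>
      <codeblock>-- Limit how long a fetch request waits for rows to 30 seconds (30000 ms).
SET FETCH_ROWS_TIMEOUT_MS=30000;</codeblock>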
    </section>
    <section id="section_ahm_bsy_2jb">
      <title>Explain Plans</title>
      <p>Below is the part of the <codeph>EXPLAIN</codeph> plan output that
        relates to result spooling.<codeblock>F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0</codeblock><ul>
          <li>The <codeph>mem-estimate</codeph> for the <codeph>PLAN-ROOT
              SINK</codeph> is an estimate of the amount of memory needed to
            spool all the rows returned by the query.</li>
          <li>The <codeph>mem-reservation</codeph> reflects the number and
            size of the buffers necessary to spool the query results. By
            default, the read and write buffers are 2 MB each, which is why
            the default reservation is 4 MB.</li>
        </ul></p>
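      <p>Output similar to the above might be obtained with a sequence such
        as the following sketch. It assumes that per-fragment resource
        estimates are displayed at <codeph>EXPLAIN_LEVEL=2</codeph>, and the
        table name <codeph>sample_table</codeph> is hypothetical.</p>
      <codeblock>SET SPOOL_QUERY_RESULTS=TRUE;

-- Assumption: this explain level includes resource estimates in the plan.
SET EXPLAIN_LEVEL=2;

-- Hypothetical query; look for the PLAN-ROOT SINK entry in the output.
EXPLAIN SELECT * FROM sample_table;</codeblock>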
    </section>
    <section id="section_ovl_ksy_2jb">
      <title>PlanRootSink</title>
      <p dir="ltr">In Impala, the <codeph>PlanRootSink</codeph> class controls
        the passing of batches of rows to the clients and acts as a queue of
        rows to be sent to clients.</p>
      <p>
        <ul>
          <li>
            <p>When result spooling is disabled, a single batch of rows is
              sent to the <codeph>PlanRootSink</codeph>, and the client must
              consume that batch before another one can be sent.</p>
          </li>
          <li>
            <p>When result spooling is enabled, multiple batches of rows can be
              sent to the <codeph>PlanRootSink</codeph>, and multiple batches
              can be consumed by the client.</p>
          </li>
        </ul>
      </p>
    </section>
    <section>
      <p><b>Related information:</b>
        <xref href="impala_max_result_spooling_mem.xml#MAX_RESULT_SPOOLING_MEM"
        />, <xref
          href="impala_max_spilled_result_spooling_mem.xml#MAX_SPILLED_RESULT_SPOOLING_MEM"
        />, <xref href="impala_spool_query_results.xml#SPOOL_QUERY_RESULTS"
        /></p>
    </section>
  </conbody>
</concept>