mirror of
https://github.com/apache/impala.git
synced 2026-02-03 09:00:39 -05:00
This adds basic documentation about enabling the intermediate results caching feature. Tests: - Built PDF, asf-site-html, and plain-html Change-Id: I2e08c91a694f1d333bb903b105623fb73efc3a2e Reviewed-on: http://gerrit.cloudera.org:8080/23846 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
88 lines
4.0 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="intermediate_results_cache">

  <title>Intermediate Results Cache</title>

  <conbody>

    <p>
      By default, Impala executes every query from scratch, computing
      intermediate results in several stages to produce the final results.
      These intermediate results are discarded at the end of query execution,
      so the computation must be repeated for each new run of the query, even
      if none of the underlying data has changed. Caching intermediate results
      can reduce latency for repetitive work while also freeing up
      resources for other queries.
    </p>

    <p>
      The intermediate results cache is enabled through the following settings:
      <ul>
        <li>
          The <codeph>--allow_tuple_caching</codeph> startup flag gates
          the intermediate results caching feature. It must be set to true on coordinators
          and executors to allow the use of the intermediate results cache, but it does
          not enable the cache by itself.
        </li>
        <li>
          The <codeph>--tuple_cache</codeph> startup flag specifies the storage
          directory and quota for the intermediate results cache on coordinators and
          executors. The flag is set to a directory name followed by a <codeph>:</codeph>
          and a capacity for that directory. For example:
          <codeblock>--tuple_cache=/data/cache:20GB</codeblock>
          This setting uses the <codeph>/data/cache</codeph> directory and allows the
          cache to consume up to 20GB in that directory. The directory must exist in the
          local filesystem of each Impala daemon, or Impala will fail to start.
        </li>
        <li>
          The <codeph>enable_tuple_caching</codeph> query option determines whether a
          query uses the intermediate results cache. To use the feature, this option must be
          set to true either in the session or through <codeph>default_query_options</codeph>.
        </li>
      </ul>
      All three of these settings must be specified to use the intermediate results cache.
      By default, none of them is set, so the feature is disabled.
    </p>
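
    <p>
      As an illustration only, the three settings might be combined as shown in this
      sketch. The directory and capacity are the sample values from the
      <codeph>--tuple_cache</codeph> description, not recommendations, and passing the
      query option through <codeph>--default_query_options</codeph> is just one of the
      two ways to set it. These flags would be supplied to each coordinator and executor:
    </p>
<codeblock>--allow_tuple_caching=true
--tuple_cache=/data/cache:20GB
--default_query_options=enable_tuple_caching=true</codeblock>
    <p>
      Alternatively, assuming the two startup flags are already in place, the query
      option can be turned on for a single session instead:
    </p>
<codeblock>SET enable_tuple_caching=true;</codeblock>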

    <p>
      The cache key incorporates information about everything that can affect the
      query results, including the state of the base tables and any query options.
      When any of these inputs changes, the key changes and a new cache entry is created.
      For example, if new data is ingested into a base table, the key changes, so
      entries built from the old data are no longer matched. This means that an
      administrator does not need to manually refresh or invalidate cache entries.
    </p>

    <p>
      When the cache reaches its quota, cache entries are evicted to make space for new
      entries. The eviction policy is specified by the
      <codeph>--tuple_cache_eviction_policy</codeph> startup flag. Currently, the cache
      supports the following eviction policies:
      <ul>
        <li>LRU (Least Recently Used), the default</li>
        <li>LIRS (Low Inter-reference Recency Set)</li>
      </ul>
      LIRS is a scan-resistant policy with low performance overhead.
    </p>
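
    <p>
      For example, assuming the flag takes a policy name spelled as in the list above
      (an assumption, because the accepted values are not stated here), selecting LIRS
      might look like this:
    </p>
<codeblock>--tuple_cache_eviction_policy=LIRS</codeblock>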

  </conbody>

</concept>