<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="intermediate_results_cache">

  <title>Intermediate Results Cache</title>

  <conbody>

    <p>
      Without caching, Impala executes every query from scratch, computing
      intermediate results in several stages to produce the final results.
      These intermediate results are discarded at the end of query execution,
      so the computation must be repeated each time the query runs, even
      if none of the underlying data has changed. Caching intermediate results
      can reduce latency for repetitive work and free up resources for other
      queries.
    </p>

    <p>
      The intermediate results cache is controlled by the following settings:
      <ul>
        <li>
          <codeph>--allow_tuple_caching</codeph> is a startup flag that gates
          the intermediate results caching feature. It must be set to true on coordinators
          and executors to allow the use of the intermediate results cache, but it does
          not enable the cache by itself.
        </li>
        <li>
          The <codeph>--tuple_cache</codeph> startup flag specifies the storage
          directory and quota for the intermediate results cache on coordinators and
          executors. The flag is set to a directory name followed by a <codeph>:</codeph>
          and a capacity for that directory. For example:
          <codeblock>--tuple_cache=/data/cache:20GB</codeblock>
          This setting uses the <codeph>/data/cache</codeph> directory and allows the
          cache to consume up to 20GB in that directory. The directory must exist in the
          local filesystem of each Impala Daemon, or Impala will fail to start.
        </li>
        <li>
          The <codeph>enable_tuple_caching</codeph> query option determines whether a
          query uses the intermediate results cache. To use the feature, set this option
          to true in the session or through <codeph>default_query_options</codeph>.
        </li>
      </ul>
      All three settings must be specified to use the intermediate results cache.
      By default, each of them leaves the feature disabled.
    </p>
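
    <p>
      As an illustrative sketch, the three settings might be combined as follows. The
      directory and quota reuse the example values above and are placeholders, not
      recommendations. The startup flags are set on every coordinator and executor:
      <codeblock>--allow_tuple_caching=true
--tuple_cache=/data/cache:20GB</codeblock>
      A session can then opt in with the query option:
      <codeblock>SET enable_tuple_caching=true;</codeblock>
    </p>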

    <p>
      The cache key incorporates everything that can affect the query results,
      including the state of the base tables and any relevant query options.
      A change to any of these inputs produces a different key and therefore a new
      cache entry. For example, if new data is ingested into a base table, the key
      changes and existing entries for the old data are no longer matched. As a result,
      an administrator does not need to manually refresh or invalidate cache entries.
    </p>

    <p>
      When the cache reaches the quota, cache entries are evicted to make space for new
      entries. The cache eviction policy can be specified by the
      <codeph>--tuple_cache_eviction_policy</codeph> startup flag. Currently, the cache
      supports the following cache eviction policies:
      <ul>
        <li>LRU (Least Recently Used), the default</li>
        <li>LIRS (Low Inter-reference Recency Set)</li>
      </ul>
      LIRS is a scan-resistant policy with low performance overhead.
    </p>
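
    <p>
      For example, assuming the flag takes the policy name exactly as listed above,
      selecting LIRS at startup might look like this:
      <codeblock>--tuple_cache_eviction_policy=LIRS</codeblock>
    </p>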
  </conbody>
</concept>