<?xml version="1.0" encoding="UTF-8"?><!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="scalability">

<title>Scalability Considerations for Impala</title>
<titlealts audience="PDF"><navtitle>Scalability Considerations</navtitle></titlealts>
<prolog>
<metadata>
<data name="Category" value="Performance"/>
<data name="Category" value="Impala"/>
<data name="Category" value="Planning"/>
<data name="Category" value="Querying"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Memory"/>
<data name="Category" value="Scalability"/>
<!-- Using domain knowledge about Impala, sizing, etc. to decide what to mark as 'Proof of Concept'. -->
<data name="Category" value="Proof of Concept"/>
</metadata>
</prolog>

<conbody>

<p>
This section explains how the size of your cluster and the volume of data influence SQL performance and
schema design for Impala tables. Typically, adding more cluster capacity reduces problems due to memory
limits or disk throughput. On the other hand, larger clusters are more likely to have other kinds of
scalability issues, such as a single slow node that causes performance problems for queries.
</p>

<p outputclass="toc inpage"/>

<p conref="../shared/impala_common.xml#common/cookbook_blurb"/>

</conbody>

<concept audience="hidden" id="scalability_memory">

<title>Overview and Guidelines for Impala Memory Usage</title>
<prolog>
<metadata>
<data name="Category" value="Memory"/>
<data name="Category" value="Concepts"/>
<data name="Category" value="Best Practices"/>
<data name="Category" value="Guidelines"/>
</metadata>
</prolog>

<conbody>

<!--
Outline adapted from Alan Choi's "best practices" and/or "performance cookbook" papers.
-->

<codeblock>Memory Usage – the Basics
* Memory is used by:
  * Hash join – RHS tables after decompression, filtering and projection
  * Group by – proportional to the #groups
  * Parquet writer buffer – 1GB per partition
  * IO buffer (shared across queries)
  * Metadata cache (no more than 1GB typically)
  * Memory held and reused by later queries
* Impala releases memory from time to time starting in 1.4.

Memory Usage – Estimating Memory Usage
* Use Explain Plan
  * Requires statistics! Mem estimate without stats is meaningless.
  * Reports per-host memory requirement for this cluster size.
  * Re-run if you’ve re-sized the cluster!
[image of explain plan]

Memory Usage – Estimating Memory Usage
* EXPLAIN’s memory estimate issues
  * Can be way off – much higher or much lower.
  * group by’s estimate can be particularly off – when there’s a large number of group by columns.
    * Mem estimate = NDV of group by column 1 * NDV of group by column 2 * ... NDV of group by column n
* Ignore EXPLAIN’s estimate if it’s too high!
* Do your own estimate for group by:
  * GROUP BY mem usage = (total number of groups * size of each row) + (total number of groups * size of each row) / num nodes

Memory Usage – Finding Actual Memory Usage
* Search for “Per Node Peak Memory Usage” in the profile.
  This is accurate. Use it for production capacity planning.

Memory Usage – Actual Memory Usage
* For complex queries, how do I know which part of my query is using too much memory?
  * Use the ExecSummary from the query profile!
- But is that "Peak Mem" number aggregate or per-node?
[image of executive summary]

Memory Usage – Hitting Mem-limit
* Top causes (in order) of hitting mem-limit even when running a single query:
  1. Lack of statistics
  2. Lots of joins within a single query
  3. Big-table joining big-table
  4. Gigantic group by

Memory Usage – Hitting Mem-limit
Lack of stats
* Wrong join order, wrong join strategy, wrong insert strategy
* Explain Plan tells you that!
[image of explain plan]
* Fix: Compute Stats table

Memory Usage – Hitting Mem-limit
Lots of joins within a single query
* select...from fact, dim1, dim2,dim3,...dimN where ...
* Each dim tbl can fit in memory, but not all of them together
* As of Impala 1.4, Impala might choose the wrong plan – BROADCAST
FIX 1: use shuffle hint
  select ... from fact join [shuffle] dim1 on ... join dim2 [shuffle] ...
FIX 2: pre-join the dim tables (if possible)
- How about an example to illustrate that technique?
* Fewer joins => better perf!

Memory Usage: Hitting Mem-limit
Big-table joining big-table
* Big-table (after decompression, filtering, and projection) is a table that is bigger than total cluster memory size.
* Impala 2.0 will do this (via disk-based join). Consider using Hive for now.
* (Advanced) For a simple query, you can try this advanced workaround – per-partition join
  * Requires the partition key be part of the join key
  select ... from BigTbl_A a join BigTbl_B b where a.part_key = b.part_key and a.part_key in (1,2,3)
  union all
  select ... from BigTbl_A a join BigTbl_B b where a.part_key = b.part_key and a.part_key in (4,5,6)

Memory Usage: Hitting Mem-limit
Gigantic group by
* The total number of distinct groups is huge, such as group by userid.
* Impala 2.0 will do this (via disk-based agg). Consider using Hive for now.
- Is this one of the cases where people were unhappy we recommended Hive?
* (Advanced) For a simple query, you can try this advanced workaround – per-partition agg
  * Requires the partition key be part of the group by
  select part_key, col1, col2, ...agg(..) from tbl where
  part_key in (1,2,3) group by part_key, col1, col2, ...
  union all
  select part_key, col1, col2, ...agg(..) from tbl where
  part_key in (4,5,6) group by part_key, col1, col2, ...

Memory Usage: Additional Notes
* Use explain plan for estimate; use profile for accurate measure
* Data skew can cause uneven memory usage
* Review previous common issues on out-of-memory
* Note: Even with disk-based joins, you'll want to review these steps to speed up queries and use memory more efficiently
</codeblock>
</conbody>
</concept>

<concept id="scalability_catalog">

<title>Impact of Many Tables or Partitions on Impala Catalog Performance and Memory Usage</title>

<conbody>

<p audience="hidden">
Details to fill in in future: Impact of <q>load catalog in background</q> option.
Changing timeouts.
</p>

<p>
Because Hadoop I/O is optimized for reading and writing large files, Impala is optimized for tables
containing relatively few, large data files. Schemas containing thousands of tables, or tables containing
thousands of partitions, can encounter performance issues during startup or during DDL operations such as
<codeph>ALTER TABLE</codeph> statements.
</p>

<note type="important" rev="TSB-168">
<p>
Because of a change in the default heap size for the <cmdname>catalogd</cmdname> daemon in
<keyword keyref="impala25_full"/> and higher, the following procedure to increase the <cmdname>catalogd</cmdname>
memory limit might be required following an upgrade to <keyword keyref="impala25_full"/> even if not
needed previously.
</p>
</note>

<p conref="../shared/impala_common.xml#common/increase_catalogd_heap_size"/>
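<p>
As a rough illustration of the kind of change involved (a sketch only; the exact mechanism depends on how
your cluster is managed, and the procedure referenced above takes precedence): because
<cmdname>catalogd</cmdname> runs an embedded JVM, on a package-based installation you would typically raise
the heap by supplying a larger <codeph>-Xmx</codeph> value, for example through the
<codeph>JAVA_TOOL_OPTIONS</codeph> environment variable in <filepath>/etc/default/impala</filepath>.
The 8 GB figure below is purely illustrative.
</p>
<codeblock>export JAVA_TOOL_OPTIONS="-Xmx8g"</codeblock>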

</conbody>
</concept>

<concept rev="2.1.0" id="statestore_scalability">

<title>Scalability Considerations for the Impala Statestore</title>

<conbody>

<p>
Before <keyword keyref="impala21_full"/>, the statestore sent only one kind of message to its subscribers. This message contained all
updates for any topics that a subscriber had subscribed to. It also served to let subscribers know that the
statestore had not failed, and conversely the statestore used the success of sending a heartbeat to a
subscriber to decide whether or not the subscriber had failed.
</p>

<p>
Combining topic updates and failure detection in a single message led to bottlenecks in clusters with large
numbers of tables, partitions, and HDFS data blocks. When the statestore was overloaded with metadata
updates to transmit, heartbeat messages were sent less frequently, sometimes causing subscribers to time
out their connection with the statestore. Increasing the subscriber timeout and decreasing the frequency of
statestore heartbeats worked around the problem, but reduced responsiveness when the statestore failed or
restarted.
</p>

<p>
As of <keyword keyref="impala21_full"/>, the statestore now sends topic updates and heartbeats in separate messages. This allows the
statestore to send and receive a steady stream of lightweight heartbeats, and removes the requirement to
send topic updates according to a fixed schedule, reducing statestore network overhead.
</p>

<p>
The statestore now has the following relevant configuration flags for the <cmdname>statestored</cmdname>
daemon:
</p>

<dl>
<dlentry id="statestore_num_update_threads">

<dt>
<codeph>-statestore_num_update_threads</codeph>
</dt>

<dd>
The number of threads inside the statestore dedicated to sending topic updates. You should not
typically need to change this value.
<p>
<b>Default:</b> 10
</p>
</dd>

</dlentry>

<dlentry id="statestore_update_frequency_ms">

<dt>
<codeph>-statestore_update_frequency_ms</codeph>
</dt>

<dd>
The frequency, in milliseconds, with which the statestore tries to send topic updates to each
subscriber. This is a best-effort value; if the statestore is unable to meet this frequency, it sends
topic updates as fast as it can. You should not typically need to change this value.
<p>
<b>Default:</b> 2000
</p>
</dd>

</dlentry>

<dlentry id="statestore_num_heartbeat_threads">

<dt>
<codeph>-statestore_num_heartbeat_threads</codeph>
</dt>

<dd>
The number of threads inside the statestore dedicated to sending heartbeats. You should not typically
need to change this value.
<p>
<b>Default:</b> 10
</p>
</dd>

</dlentry>

<dlentry id="statestore_heartbeat_frequency_ms">

<dt>
<codeph>-statestore_heartbeat_frequency_ms</codeph>
</dt>

<dd>
The frequency, in milliseconds, with which the statestore tries to send heartbeats to each subscriber.
This value should be good for large catalogs and clusters up to approximately 150 nodes. Beyond that,
you might need to increase this value to make the interval longer between heartbeat messages.
<p>
<b>Default:</b> 1000 (one heartbeat message every second)
</p>
</dd>

</dlentry>
</dl>
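<p>
For illustration, the following sketch shows how these flags might be passed to the
<cmdname>statestored</cmdname> daemon. It assumes a package-based installation where startup flags are read
from the <codeph>IMPALA_STATE_STORE_ARGS</codeph> setting in <filepath>/etc/default/impala</filepath>;
the values shown are simply the defaults listed above.
</p>
<codeblock>IMPALA_STATE_STORE_ARGS=" \
    -statestore_num_update_threads=10 \
    -statestore_update_frequency_ms=2000 \
    -statestore_num_heartbeat_threads=10 \
    -statestore_heartbeat_frequency_ms=1000"
</codeblock>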

<p>
If it takes a very long time for a cluster to start up, and <cmdname>impala-shell</cmdname> consistently
displays <codeph>This Impala daemon is not ready to accept user requests</codeph>, the statestore might be
taking too long to send the entire catalog topic to the cluster. In this case, consider adding
<codeph>--load_catalog_in_background=false</codeph> to your catalog service configuration. This setting
stops the statestore from loading the entire catalog into memory at cluster startup. Instead, metadata for
each table is loaded when the table is accessed for the first time.
</p>
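<p>
For example, under the same assumption of a package-based installation that reads
<filepath>/etc/default/impala</filepath>, the option could be added to the <cmdname>catalogd</cmdname>
startup arguments as follows (a sketch only):
</p>
<codeblock>IMPALA_CATALOG_ARGS=" \
    -load_catalog_in_background=false"
</codeblock>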
</conbody>
</concept>

<concept id="scalability_coordinator" rev="2.9.0 IMPALA-3807 IMPALA-5147 IMPALA-5503">

<title>Controlling which Hosts are Coordinators and Executors</title>

<conbody>

<p>
By default, each host in the cluster that runs the <cmdname>impalad</cmdname>
daemon can act as the coordinator for an Impala query, execute the fragments
of the execution plan for the query, or both. During highly concurrent
workloads for large-scale queries, especially on large clusters, the dual
roles can cause scalability issues:
</p>

<ul>
<li>
<p>
The extra work required for a host to act as the coordinator could interfere
with its capacity to perform other work for the earlier phases of the query.
For example, the coordinator can experience significant network and CPU overhead
during queries containing a large number of query fragments. Each coordinator
caches metadata for all table partitions and data files, which can be substantial
and contend with memory needed to process joins, aggregations, and other operations
performed by query executors.
</p>
</li>
<li>
<p>
Having a large number of hosts act as coordinators can cause unnecessary network
overhead, or even timeout errors, as each of those hosts communicates with the
<cmdname>statestored</cmdname> daemon for metadata updates.
</p>
</li>
<li>
<p>
The <q>soft limits</q> imposed by the admission control feature are more likely
to be exceeded when there are a large number of heavily loaded hosts acting as
coordinators.
</p>
</li>
</ul>

<p>
If such scalability bottlenecks occur, you can explicitly specify that certain
hosts act as query coordinators, but not executors for query fragments.
These hosts do not participate in I/O-intensive operations such as scans,
or in CPU-intensive operations such as aggregations.
</p>

<p>
Then, you specify that the
other hosts act as executors but not coordinators. These hosts do not communicate
with the <cmdname>statestored</cmdname> daemon or process the final result sets
from queries. You cannot connect to these hosts through clients such as
<cmdname>impala-shell</cmdname> or business intelligence tools.
</p>

<p>
This feature is available in <keyword keyref="impala29_full"/> and higher.
</p>

<p>
To use this feature, you specify one of the following startup flags for the
<cmdname>impalad</cmdname> daemon on each host:
</p>

<ul>
<li>
<p>
<codeph>is_executor=false</codeph> for each host that
does not act as an executor for Impala queries.
These hosts act exclusively as query coordinators.
This setting typically applies to a relatively small number of
hosts, because the most common topology is to have nearly all
DataNodes doing work for query execution.
</p>
</li>
<li>
<p>
<codeph>is_coordinator=false</codeph> for each host that
does not act as a coordinator for Impala queries.
These hosts act exclusively as executors.
The number of hosts with this setting typically increases
as the cluster grows larger and handles more table partitions,
data files, and concurrent queries. As the overhead for query
coordination increases, it becomes more important to centralize
that work on dedicated hosts.
</p>
</li>
</ul>

<p>
By default, both of these settings are enabled for each <codeph>impalad</codeph>
instance, allowing all such hosts to act as both executors and coordinators.
</p>

<p>
For example, on a 100-node cluster, you might specify <codeph>is_executor=false</codeph>
for 10 hosts, to dedicate those hosts as query coordinators. Then specify
<codeph>is_coordinator=false</codeph> for the remaining 90 hosts. All explicit or
load-balanced connections must go to the 10 hosts acting as coordinators. These hosts
perform the network communication to keep metadata up-to-date and route query results
to the appropriate clients. The remaining 90 hosts perform the intensive I/O, CPU, and
memory operations that make up the bulk of the work for each query. If a bottleneck or
other performance issue arises on a specific host, you can narrow down the cause more
easily because each host is dedicated to specific operations within the overall
Impala workload.
</p>
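<p>
The following sketch shows one way the startup flags for this example topology might look. It assumes the
flags are passed through the <codeph>IMPALA_SERVER_ARGS</codeph> setting in
<filepath>/etc/default/impala</filepath>; adapt the mechanism to however your cluster manages the
<cmdname>impalad</cmdname> startup options.
</p>
<codeblock># On each of the 10 dedicated coordinator hosts:
IMPALA_SERVER_ARGS=" \
    -is_coordinator=true \
    -is_executor=false"

# On each of the 90 dedicated executor hosts:
IMPALA_SERVER_ARGS=" \
    -is_coordinator=false \
    -is_executor=true"
</codeblock>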

</conbody>
</concept>

<concept id="scalability_buffer_pool" rev="2.10.0 IMPALA-3200">
<title>Effect of Buffer Pool on Memory Usage (<keyword keyref="impala210"/> and higher)</title>
<conbody>
<p>
The buffer pool feature, available in <keyword keyref="impala210"/> and higher, changes the
way Impala allocates memory during a query. Most of the memory needed is reserved at the
beginning of the query, avoiding cases where a query might run for a long time before failing
with an out-of-memory error. The actual memory estimates and memory buffers are typically
smaller than before, so that more queries can run concurrently or process larger volumes
of data than previously.
</p>
<p>
The buffer pool feature includes some query options that you can fine-tune:
<xref keyref="buffer_pool_limit"/>,
<xref keyref="default_spillable_buffer_size"/>,
<xref keyref="max_row_size"/>, and
<xref keyref="min_spillable_buffer_size"/>.
</p>
<p>
Most of the effects of the buffer pool are transparent to you as an Impala user.
Memory use during spilling is now steadier and more predictable, instead of
increasing rapidly as more data is spilled to disk. The main change from a user
perspective is the need to increase the <codeph>MAX_ROW_SIZE</codeph> query option
setting when querying tables with columns containing long strings, many columns,
or other combinations of factors that produce very large rows. If Impala encounters
rows that are too large to process with the default query option settings, the query
fails with an error message suggesting to increase the <codeph>MAX_ROW_SIZE</codeph>
setting.
</p>
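<p>
For example, a session that queries a table with unusually wide rows might raise the limit before running
the statement (the table and column names here are hypothetical, and 1 MB is just an illustrative value):
</p>
<codeblock>set max_row_size=1mb;
select customer_id, max(length(session_notes)) from wide_sessions group by customer_id;
</codeblock>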
</conbody>
</concept>

<concept audience="hidden" id="scalability_cluster_size">

<title>Scalability Considerations for Impala Cluster Size and Topology</title>

<conbody>

<p>
</p>
</conbody>
</concept>

<concept audience="hidden" id="concurrent_connections">

<title>Scaling the Number of Concurrent Connections</title>

<conbody>

<p></p>
</conbody>
</concept>

<concept rev="2.0.0" id="spill_to_disk">

<title>SQL Operations that Spill to Disk</title>

<conbody>

<p>
Certain memory-intensive operations write temporary data to disk (known as <term>spilling</term> to disk)
when Impala is close to exceeding its memory limit on a particular host.
</p>

<p>
The result is a query that completes successfully, rather than failing with an out-of-memory error. The
tradeoff is decreased performance due to the extra disk I/O to write the temporary data and read it back
in. The slowdown could potentially be significant. Thus, while this feature improves reliability,
you should optimize your queries, system parameters, and hardware configuration to make this spilling a rare occurrence.
</p>

<note rev="2.10.0 IMPALA-3200">
<p>
In <keyword keyref="impala210"/> and higher, also see <xref keyref="scalability_buffer_pool"/> for
changes to Impala memory allocation that might change the details of which queries spill to disk,
and how much memory and disk space is involved in the spilling operation.
</p>
</note>

<p>
<b>What kinds of queries might spill to disk:</b>
</p>

<p>
Several SQL clauses and constructs require memory allocations that could activate the spilling mechanism:
</p>
<ul>
<li>
<p>
When a query uses a <codeph>GROUP BY</codeph> clause for columns
with millions or billions of distinct values, Impala keeps a
similar number of temporary results in memory, to accumulate the
aggregate results for each value in the group.
</p>
</li>
<li>
<p>
When large tables are joined together, Impala keeps the values of
the join columns from one table in memory, to compare them to
incoming values from the other table.
</p>
</li>
<li>
<p>
When a large result set is sorted by the <codeph>ORDER BY</codeph>
clause, each node sorts its portion of the result set in memory.
</p>
</li>
<li>
<p>
The <codeph>DISTINCT</codeph> and <codeph>UNION</codeph> operators
build in-memory data structures to represent all values found so
far, to eliminate duplicates as the query progresses.
</p>
</li>
<!-- JIRA still in open state as of 5.8 / 2.6, commenting out.
<li>
<p rev="IMPALA-3471">
In <keyword keyref="impala26_full"/> and higher, <term>top-N</term> queries (those with
<codeph>ORDER BY</codeph> and <codeph>LIMIT</codeph> clauses) can also spill.
Impala allocates enough memory to hold as many rows as specified by the <codeph>LIMIT</codeph>
clause, plus enough memory to hold as many rows as specified by any <codeph>OFFSET</codeph> clause.
</p>
</li>
-->
</ul>

<p conref="../shared/impala_common.xml#common/spill_to_disk_vs_dynamic_partition_pruning"/>

<p>
<b>How Impala handles scratch disk space for spilling:</b>
</p>

<p rev="obwl" conref="../shared/impala_common.xml#common/order_by_scratch_dir"/>

<p>
<b>Memory usage for SQL operators:</b>
</p>

<p rev="2.10.0 IMPALA-3200">
In <keyword keyref="impala210_full"/> and higher, the way SQL operators such as <codeph>GROUP BY</codeph>,
<codeph>DISTINCT</codeph>, and joins transition between using additional memory or activating the
spill-to-disk feature is changed. The memory required to spill to disk is reserved up front, and you can
examine it in the <codeph>EXPLAIN</codeph> plan when the <codeph>EXPLAIN_LEVEL</codeph> query option is
set to 2 or higher.
</p>
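<p>
For example, the following <cmdname>impala-shell</cmdname> session (using a hypothetical table) shows how
you might display those per-operator memory reservations in the plan:
</p>
<codeblock>set explain_level=2;
explain select c_nationkey, count(*) from customer group by c_nationkey;
</codeblock>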

<p>
The infrastructure of the spilling feature affects the way the affected SQL operators, such as
<codeph>GROUP BY</codeph>, <codeph>DISTINCT</codeph>, and joins, use memory.
On each host that participates in the query, each such operator in a query requires memory
to store rows of data and other data structures. Impala reserves a certain amount of memory
up front for each operator that supports spill-to-disk that is sufficient to execute the
operator. If an operator accumulates more data than can fit in the reserved memory, it
can either reserve more memory to continue processing data in memory or start spilling
data to temporary scratch files on disk. Thus, operators with spill-to-disk support
can adapt to different memory constraints by using however much memory is available
to speed up execution, yet tolerate low memory conditions by spilling data to disk.
</p>

<p>
The amount of data depends on the portion of the data being handled by that host, and thus
the operator may end up consuming different amounts of memory on different hosts.
</p>

<!--
<p>
The infrastructure of the spilling feature affects the way the affected SQL operators, such as
<codeph>GROUP BY</codeph>, <codeph>DISTINCT</codeph>, and joins, use memory.
On each host that participates in the query, each such operator in a query accumulates memory
while building the data structure to process the aggregation or join operation. The amount
of memory used depends on the portion of the data being handled by that host, and thus might
be different from one host to another. When the amount of memory being used for the operator
on a particular host reaches a threshold amount, Impala reserves an additional memory buffer
to use as a work area in case that operator causes the query to exceed the memory limit for
that host. After allocating the memory buffer, the memory used by that operator remains
essentially stable or grows only slowly, until the point where the memory limit is reached
and the query begins writing temporary data to disk.
</p>

<p rev="2.2.0">
Prior to Impala 2.2, the extra memory buffer for an operator that might spill to disk
was allocated when the data structure used by the applicable SQL operator reaches 16 MB in size,
and the memory buffer itself was 512 MB. In Impala 2.2, these values are halved: the threshold value
is 8 MB and the memory buffer is 256 MB. <ph rev="2.3.0">In <keyword keyref="impala23_full"/> and higher, the memory for the buffer
is allocated in pieces, only as needed, to avoid sudden large jumps in memory usage.</ph> A query that uses
multiple such operators might allocate multiple such memory buffers, as the size of the data structure
for each operator crosses the threshold on a particular host.
</p>

<p>
Therefore, a query that processes a relatively small amount of data on each host would likely
never reach the threshold for any operator, and would never allocate any extra memory buffers. A query
that did process millions of groups, distinct values, join keys, and so on might cross the threshold,
causing its memory requirement to rise suddenly and then flatten out. The larger the cluster, less data is processed
on any particular host, thus reducing the chance of requiring the extra memory allocation.
</p>
-->

<p>
<b>Added in:</b> This feature was added to the <codeph>ORDER BY</codeph> clause in Impala 1.4.
This feature was extended to cover join queries, aggregation functions, and analytic
functions in Impala 2.0. The size of the memory work area required by
each operator that spills was reduced from 512 megabytes to 256 megabytes in Impala 2.2.
<ph rev="2.10.0 IMPALA-3200">The spilling mechanism was reworked to take advantage of the
Impala buffer pool feature and be more predictable and stable in <keyword keyref="impala210_full"/>.</ph>
</p>

<p>
<b>Avoiding queries that spill to disk:</b>
</p>

<p>
Because the extra I/O can impose significant performance overhead on these types of queries, try to avoid
this situation by using the following steps:
</p>

<ol>
<li>
Detect how often queries spill to disk, and how much temporary data is written. Refer to the following
sources:
<ul>
<li>
The output of the <codeph>PROFILE</codeph> command in the <cmdname>impala-shell</cmdname>
interpreter. This data shows the memory usage for each host and in total across the cluster. The
<codeph>WriteIoBytes</codeph> counter reports how much data was written to disk for each operator
during the query. (In <keyword keyref="impala29_full"/>, the counter was named
<codeph>ScratchBytesWritten</codeph>; in <keyword keyref="impala28_full"/> and earlier, it was named
<codeph>BytesWritten</codeph>.)
</li>

<li>
The <uicontrol>Queries</uicontrol> tab in the Impala debug web user interface. Select the query to
examine and click the corresponding <uicontrol>Profile</uicontrol> link. This data breaks down the
memory usage for a single host within the cluster, the host whose web interface you are connected to.
</li>
</ul>
</li>

<li>
Use one or more techniques to reduce the possibility of the queries spilling to disk:
<ul>
<li>
Increase the Impala memory limit if practical, for example, if you can increase the available memory
by more than the amount of temporary data written to disk on a particular node. Remember that in
Impala 2.0 and later, you can issue <codeph>SET MEM_LIMIT</codeph> as a SQL statement, which lets you
fine-tune the memory usage for queries from JDBC and ODBC applications. (A combined sketch appears
after this list.)
</li>

<li>
Increase the number of nodes in the cluster, to increase the aggregate memory available to Impala and
reduce the amount of memory required on each node.
</li>

<li>
Increase the overall memory capacity of each DataNode at the hardware level.
</li>

<li>
On a cluster with resources shared between Impala and other Hadoop components, use resource
management features to allocate more memory for Impala. See
<xref href="impala_resource_management.xml#resource_management"/> for details.
</li>

<li>
If the memory pressure is due to running many concurrent queries rather than a few memory-intensive
ones, consider using the Impala admission control feature to lower the limit on the number of
concurrent queries. By spacing out the most resource-intensive queries, you can avoid spikes in
memory usage and improve overall response times. See
<xref href="impala_admission.xml#admission_control"/> for details.
</li>

<li>
Tune the queries with the highest memory requirements, using one or more of the following techniques:
<ul>
<li>
Run the <codeph>COMPUTE STATS</codeph> statement for all tables involved in large-scale joins and
aggregation queries.
</li>

<li>
Minimize your use of <codeph>STRING</codeph> columns in join columns. Prefer numeric values
instead.
</li>

<li>
Examine the <codeph>EXPLAIN</codeph> plan to understand the execution strategy being used for the
most resource-intensive queries. See <xref href="impala_explain_plan.xml#perf_explain"/> for
details.
</li>

<li>
If Impala still chooses a suboptimal execution strategy even with statistics available, or if it
is impractical to keep the statistics up to date for huge or rapidly changing tables, add hints
to the most resource-intensive queries to select the right execution strategy. See
<xref href="impala_hints.xml#hints"/> for details.
</li>
</ul>
</li>

<li>
If your queries experience substantial performance overhead due to spilling, enable the
<codeph>DISABLE_UNSAFE_SPILLS</codeph> query option. This option prevents queries whose memory usage
is likely to be exorbitant from spilling to disk. See
<xref href="impala_disable_unsafe_spills.xml#disable_unsafe_spills"/> for details. As you tune
problematic queries using the preceding steps, fewer and fewer will be cancelled by this option
setting.
</li>
</ul>
</li>
</ol>
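<p>
As a brief illustration of the tuning techniques above (the table names and the memory limit value are
hypothetical), a session might gather statistics on the tables involved in a heavy join and then raise the
per-query memory limit before re-running the query:
</p>
<codeblock>compute stats sales_fact;
compute stats customer_dim;

set mem_limit=4gb;
select c.region, sum(s.amount)
  from sales_fact s join customer_dim c using (customer_id)
 group by c.region;
</codeblock>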

<p>
<b>Testing performance implications of spilling to disk:</b>
</p>

<p>
To artificially provoke spilling, to test this feature and understand the performance implications, use a
test environment with a memory limit of at least 2 GB. Issue the <codeph>SET</codeph> command with no
arguments to check the current setting for the <codeph>MEM_LIMIT</codeph> query option. Set the query
option <codeph>DISABLE_UNSAFE_SPILLS=true</codeph>. This option limits the spill-to-disk feature to prevent
runaway disk usage from queries that are known in advance to be suboptimal. Within
<cmdname>impala-shell</cmdname>, run a query that you expect to be memory-intensive, based on the criteria
explained earlier. A self-join of a large table is a good candidate:
</p>

<codeblock>select count(*) from big_table a join big_table b using (column_with_many_values);
</codeblock>

<p>
Issue the <codeph>PROFILE</codeph> command to get a detailed breakdown of the memory usage on each node
during the query.
<!--
The crucial part of the profile output concerning memory is the <codeph>BlockMgr</codeph>
portion. For example, this profile shows that the query did not quite exceed the memory limit.
-->
</p>

<!-- Commenting out because now stale due to changes from the buffer pool (IMPALA-3200).
To do: Revisit these details later if indicated by user feedback.

<codeblock>BlockMgr:
- BlockWritesIssued: 1
- BlockWritesOutstanding: 0
- BlocksCreated: 24
- BlocksRecycled: 1
- BufferedPins: 0
- MaxBlockSize: 8.00 MB (8388608)
<b>- MemoryLimit: 200.00 MB (209715200)</b>
<b>- PeakMemoryUsage: 192.22 MB (201555968)</b>
- TotalBufferWaitTime: 0ns
- TotalEncryptionTime: 0ns
- TotalIntegrityCheckTime: 0ns
- TotalReadBlockTime: 0ns
</codeblock>

<p>
In this case, because the memory limit was already below any recommended value, I increased the volume of
data for the query rather than reducing the memory limit any further.
</p>
-->

<p>
Set the <codeph>MEM_LIMIT</codeph> query option to a value that is smaller than the peak memory usage
reported in the profile output. Now try the memory-intensive query again.
</p>
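<p>
For example, if the profile for the self-join above reported a per-node peak of roughly 1.9 GB (an
illustrative figure), you might constrain the query as follows:
</p>
<codeblock>set mem_limit=1gb;
select count(*) from big_table a join big_table b using (column_with_many_values);
</codeblock>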

<p>
Check if the query fails with a message like the following:
</p>

<codeblock>WARNINGS: Spilling has been disabled for plans that do not have stats and are not hinted
to prevent potentially bad plans from using too many cluster resources. Compute stats on
these tables, hint the plan or disable this behavior via query options to enable spilling.
</codeblock>

<p>
If so, the query could have consumed substantial temporary disk space, slowing down so much that it would
not complete in any reasonable time. Rather than rely on the spill-to-disk feature in this case, issue the
<codeph>COMPUTE STATS</codeph> statement for the table or tables in your sample query. Then run the query
again, check the peak memory usage again in the <codeph>PROFILE</codeph> output, and adjust the memory
limit again if necessary to be lower than the peak memory usage.
</p>

<p>
At this point, you have a query that is memory-intensive, but Impala can optimize it efficiently so that
the memory usage is not exorbitant. You have set an artificial constraint through the
<codeph>MEM_LIMIT</codeph> option so that the query would normally fail with an out-of-memory error. But
the automatic spill-to-disk feature means that the query should actually succeed, at the expense of some
extra disk I/O to read and write temporary work data.
</p>

<p>
Try the query again, and confirm that it succeeds. Examine the <codeph>PROFILE</codeph> output again. This
time, look for lines of this form:
</p>

<codeblock>- SpilledPartitions: <varname>N</varname>
</codeblock>

<p>
If you see any such lines with <varname>N</varname> greater than 0, that indicates the query would have
failed in Impala releases prior to 2.0, but now it succeeded because of the spill-to-disk feature. Examine
the total time taken by the <codeph>AGGREGATION_NODE</codeph> or other query fragments containing non-zero
<codeph>SpilledPartitions</codeph> values. Compare the times to similar fragments that did not spill, for
example in the <codeph>PROFILE</codeph> output when the same query is run with a higher memory limit. This
gives you an idea of the performance penalty of the spill operation for a particular query with a
particular memory limit. If you make the memory limit just a little lower than the peak memory usage, the
query only needs to write a small amount of temporary data to disk. The lower you set the memory limit, the
more temporary data is written and the slower the query becomes.
</p>

<p>
Now repeat this procedure for actual queries used in your environment. Use the
<codeph>DISABLE_UNSAFE_SPILLS</codeph> setting to identify cases where queries used more memory than
necessary due to lack of statistics on the relevant tables and columns, and issue <codeph>COMPUTE
STATS</codeph> where necessary.
</p>

<p>
<b>When to use DISABLE_UNSAFE_SPILLS:</b>
</p>

<p>
You might wonder, why not leave <codeph>DISABLE_UNSAFE_SPILLS</codeph> turned on all the time. Whether and
how frequently to use this option depends on your system environment and workload.
</p>

<p>
<codeph>DISABLE_UNSAFE_SPILLS</codeph> is suitable for an environment with ad hoc queries whose performance
characteristics and memory usage are not known in advance. It prevents <q>worst-case scenario</q> queries
that use large amounts of memory unnecessarily. Thus, you might turn this option on within a session while
developing new SQL code, even though it is turned off for existing applications.
</p>

<p>
Organizations where table and column statistics are generally up-to-date might leave this option turned on
all the time, again to avoid worst-case scenarios for untested queries or if a problem in the ETL pipeline
results in a table with no statistics. Turning on <codeph>DISABLE_UNSAFE_SPILLS</codeph> lets you <q>fail
fast</q> in this case and immediately gather statistics or tune the problematic queries.
</p>

<p>
Some organizations might leave this option turned off. For example, you might have tables large enough that
the <codeph>COMPUTE STATS</codeph> takes substantial time to run, making it impractical to re-run after
loading new data. If you have examined the <codeph>EXPLAIN</codeph> plans of your queries and know that
they are operating efficiently, you might leave <codeph>DISABLE_UNSAFE_SPILLS</codeph> turned off. In that
case, you know that any queries that spill will not go overboard with their memory consumption.
</p>

</conbody>
</concept>

<concept id="complex_query">
<title>Limits on Query Size and Complexity</title>
<conbody>
<p>
There are hardcoded limits on the maximum size and complexity of queries.
Currently, the maximum number of expressions in a query is 2000.
You might exceed the limits with large or deeply nested queries
produced by business intelligence tools or other query generators.
</p>
<p>
If you have the ability to customize such queries or the query generation
logic that produces them, replace sequences of repetitive expressions
with single operators such as <codeph>IN</codeph> or <codeph>BETWEEN</codeph>
that can represent multiple values or ranges.
For example, instead of a large number of <codeph>OR</codeph> clauses:
</p>
<codeblock>WHERE val = 1 OR val = 2 OR val = 6 OR val = 100 ...
</codeblock>
<p>
use a single <codeph>IN</codeph> clause:
</p>
<codeblock>WHERE val IN (1,2,6,100,...)</codeblock>
</conbody>
</concept>

<concept id="scalability_io">
<title>Scalability Considerations for Impala I/O</title>
<conbody>
<p>
Impala parallelizes its I/O operations aggressively;
therefore, the more disks you can attach to each host, the better.
Impala retrieves data from disk so quickly, using
bulk read operations on large blocks, that most queries
are CPU-bound rather than I/O-bound.
</p>
<p>
Because the kind of sequential scanning typically done by
Impala queries does not benefit much from the random-access
capabilities of SSDs, spinning disks typically provide
the most cost-effective kind of storage for Impala data,
with little or no performance penalty as compared to SSDs.
</p>
<p>
Resource management features such as YARN, Llama, and admission control
typically constrain the amount of memory, CPU, or overall number of
queries in a high-concurrency environment.
Currently, there is no throttling mechanism for Impala I/O.
</p>
</conbody>
</concept>

<concept id="big_tables">
<title>Scalability Considerations for Table Layout</title>
<conbody>
<p>
Due to the overhead of retrieving and updating table metadata
in the metastore database, try to limit the number of columns
in a table to a maximum of approximately 2000.
Although Impala can handle wider tables than this, the metastore overhead
can become significant, leading to query performance that is slower
than expected based on the actual data volume.
</p>
<p>
To minimize overhead related to the metastore database and Impala query planning,
try to limit the number of partitions for any partitioned table to a few tens of thousands.
</p>
<p rev="IMPALA-5309">
If the volume of data within a table makes it impractical to run exploratory
queries, consider using the <codeph>TABLESAMPLE</codeph> clause to limit query processing
to only a percentage of data within the table. This technique reduces the overhead
for query startup, I/O to read the data, and the amount of network, CPU, and memory
needed to process intermediate results during the query. See <xref keyref="tablesample"/>
for details.
</p>
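<p>
For example, an exploratory query might sample roughly 10 percent of the data files of a large table
(the table and column names here are hypothetical):
</p>
<codeblock>select count(*), min(event_ts), max(event_ts)
  from huge_events tablesample system(10);
</codeblock>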
</conbody>
</concept>

<concept rev="" id="kerberos_overhead_cluster_size">
<title>Kerberos-Related Network Overhead for Large Clusters</title>
<conbody>
<p>
When Impala starts up, or after each <codeph>kinit</codeph> refresh, Impala sends a number of
simultaneous requests to the KDC. For a cluster with 100 hosts, the KDC might be able to process
all the requests within roughly 5 seconds. For a cluster with 1000 hosts, the time to process
the requests would be roughly 500 seconds. Impala also makes a number of DNS requests at the same
time as these Kerberos-related requests.
</p>
<p>
While these authentication requests are being processed, any submitted Impala queries will fail.
During this period, the KDC and DNS may be slow to respond to requests from components other than Impala,
so other secure services might be affected temporarily.
</p>

<p>
To reduce the frequency of the <codeph>kinit</codeph> renewal that initiates
a new set of authentication requests, increase the <codeph>kerberos_reinit_interval</codeph>
configuration setting for the <cmdname>impalad</cmdname> daemons. Currently, the default is 60 minutes.
Consider using a higher value such as 360 (6 hours).
</p>
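<p>
For example, assuming the <cmdname>impalad</cmdname> startup flags are passed through the
<codeph>IMPALA_SERVER_ARGS</codeph> setting in <filepath>/etc/default/impala</filepath> (adapt as needed
for your cluster management tooling), the interval could be raised like this:
</p>
<codeblock>IMPALA_SERVER_ARGS=" \
    ...existing flags... \
    -kerberos_reinit_interval=360"
</codeblock>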

</conbody>
</concept>

<concept rev="IMPALA-2294" id="kerberos_overhead_memory_usage">
<title>Kerberos-Related Memory Overhead for Large Clusters</title>
<conbody>
<p conref="../shared/impala_common.xml#common/vm_overcommit_memory_intro"/>
<p conref="../shared/impala_common.xml#common/vm_overcommit_memory_start" conrefend="vm_overcommit_memory_end"/>
</conbody>
</concept>

<concept id="scalability_hotspots" rev="2.5.0 IMPALA-2696">
<title>Avoiding CPU Hotspots for HDFS Cached Data</title>
<conbody>
<p>
You can use the HDFS caching feature, described in <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/>,
with Impala to reduce I/O and memory-to-memory copying for frequently accessed tables or partitions.
</p>
<p>
In the early days of this feature, you might have found that enabling HDFS caching
resulted in little or no performance improvement, because it could result in
<q>hotspots</q>: instead of the I/O to read the table data being parallelized across
the cluster, the I/O was reduced but the CPU load to process the data blocks
might be concentrated on a single host.
</p>
<p>
To avoid hotspots, include the <codeph>WITH REPLICATION</codeph> clause with the
<codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> statements for tables that use HDFS caching.
This clause allows more than one host to cache the relevant data blocks, so the CPU load
can be shared, reducing the load on any one host.
See <xref href="impala_create_table.xml#create_table"/> and <xref href="impala_alter_table.xml#alter_table"/>
for details.
</p>
<p>
Hotspots with high CPU load for HDFS cached data could still arise in some cases, due to
the way that Impala schedules the work of processing data blocks on different hosts.
In <keyword keyref="impala25_full"/> and higher, scheduling improvements mean that the work for
HDFS cached data is divided better among all the hosts that have cached replicas
for a particular data block. When more than one host has a cached replica for a data block,
Impala assigns the work of processing that block to whichever host has done the least work
(in terms of number of bytes read) for the current query. If hotspots persist even with this
load-based scheduling algorithm, you can enable the query option <codeph>SCHEDULE_RANDOM_REPLICA=TRUE</codeph>
to further distribute the CPU load. This setting causes Impala to randomly pick a host to process a cached
data block if the scheduling algorithm encounters a tie when deciding which host has done the
least work.
</p>
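<p>
For example (with a hypothetical cache pool and table), you might cache a heavily queried table with
additional cached replicas, and enable the tie-breaking query option if hotspots persist:
</p>
<codeblock>alter table busy_lookup set cached in 'four_gig_pool' with replication = 3;

set schedule_random_replica=true;
</codeblock>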
</conbody>
</concept>

<concept id="scalability_file_handle_cache" rev="2.10.0 IMPALA-4623">
<title>Scalability Considerations for NameNode Traffic with File Handle Caching</title>
<conbody>
<p>
One scalability aspect that affects heavily loaded clusters is the load on the HDFS
NameNode, from looking up the details as each HDFS file is opened. Impala queries
often access many different HDFS files, for example if a query does a full table scan
on a table with thousands of partitions, each partition containing multiple data files.
Accessing each column of a Parquet file also involves a separate <q>open</q> call,
further increasing the load on the NameNode. High NameNode overhead can add startup time
(that is, increase latency) to Impala queries, and reduce overall throughput for non-Impala
workloads that also require accessing HDFS files.
</p>
<p>
In <keyword keyref="impala210_full"/> and higher, you can reduce NameNode overhead by enabling
a caching feature for HDFS file handles. Data files that are accessed by different queries,
or even multiple times within the same query, can be accessed without a new <q>open</q>
call and without fetching the file details again from the NameNode.
</p>
<p>
Because this feature only involves HDFS data files, it does not apply to non-HDFS tables,
such as Kudu or HBase tables, or tables that store their data on cloud services such as
S3 or ADLS. Any read operations that perform remote reads also skip the cached file handles.
</p>
<p>
This feature is turned off by default. To enable it, set the configuration option
<codeph>max_cached_file_handles</codeph> to a non-zero value for each <cmdname>impalad</cmdname>
daemon. Consider an initial starting value of 20,000, and adjust upward if NameNode
overhead is still significant, or downward if it is more important to reduce the extra memory usage
on each host. Each cache entry consumes 6 KB, meaning that caching 20,000 file handles requires
up to 120 MB on each DataNode. The exact memory usage varies depending on how many file handles
have actually been cached; memory is freed as file handles are evicted from the cache.
</p>
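<p>
For example, assuming startup flags are supplied through <codeph>IMPALA_SERVER_ARGS</codeph> in
<filepath>/etc/default/impala</filepath>, the cache could be enabled with the starting value suggested
above:
</p>
<codeblock>IMPALA_SERVER_ARGS=" \
    -max_cached_file_handles=20000"
</codeblock>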
<p>
If a manual HDFS operation moves a file to the HDFS Trashcan while the file handle is cached,
Impala still accesses the contents of that file. This is a change from prior behavior. Previously,
accessing a file that was in the trashcan would cause an error. This behavior only applies to
non-Impala methods of removing HDFS files, not the Impala mechanisms such as <codeph>TRUNCATE TABLE</codeph>
or <codeph>DROP TABLE</codeph>.
</p>
<p>
If files are removed, replaced, or appended by HDFS operations outside of Impala, the way to bring the
file information up to date is to run the <codeph>REFRESH</codeph> statement on the table.
</p>
<p>
File handle cache entries are evicted as the cache fills up, or based on a timeout period
when they have not been accessed for some time.
</p>
<p>
To evaluate the effectiveness of file handle caching for a particular workload, issue the
<codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> or examine query
profiles in the Impala web UI. Look for the ratio of <codeph>CachedFileHandlesHitCount</codeph>
(ideally, should be high) to <codeph>CachedFileHandlesMissCount</codeph> (ideally, should be low).
Before starting any evaluation, run some representative queries to <q>warm up</q> the cache,
because the first time each data file is accessed is always recorded as a cache miss.
To see metrics about file handle caching for each <cmdname>impalad</cmdname> instance,
examine the <uicontrol>/metrics</uicontrol> page in the Impala web UI, in particular the fields
<uicontrol>impala-server.io.mgr.cached-file-handles-miss-count</uicontrol>,
<uicontrol>impala-server.io.mgr.cached-file-handles-hit-count</uicontrol>, and
<uicontrol>impala-server.io.mgr.num-cached-file-handles</uicontrol>.
</p>
</conbody>
</concept>

</concept>