<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="5.4.3" id="impala_isilon">

  <title>Using Impala with Isilon Storage</title>

  <titlealts audience="PDF">

    <navtitle>Isilon Storage</navtitle>

  </titlealts>

  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="Isilon"/>
      <data name="Category" value="Disk Storage"/>
      <data name="Category" value="Administrators"/>
      <data name="Category" value="Developers"/>
      <data name="Category" value="Data Analysts"/>
    </metadata>
  </prolog>

  <conbody>

    <p>
      <indexterm audience="hidden">Isilon</indexterm>
      You can use Impala to query data files that reside on EMC Isilon storage devices, rather
      than in HDFS. This capability allows convenient query access to a storage system where you
      might already be managing large volumes of data. The combination of the Impala query
      engine and Isilon storage is certified on <keyword keyref="impala224"/> or higher.
    </p>

    <p conref="../shared/impala_common.xml#common/isilon_block_size_caveat"/>

    <p>
      The typical use case for Impala and Isilon together is to use Isilon for the default
      filesystem, replacing HDFS entirely. In this configuration, when you create a database,
      table, or partition, the data always resides on Isilon storage and you do not need to
      specify any special <codeph>LOCATION</codeph> attribute. If you do specify a
      <codeph>LOCATION</codeph> attribute, its value refers to a path within the Isilon
      filesystem. For example:
    </p>

    <codeblock>-- If the default filesystem is Isilon, all Impala data resides there
-- and all Impala databases and tables are located there.
CREATE TABLE t1 (x INT, s STRING);

-- You can specify LOCATION for database, table, or partition,
-- using values from the Isilon filesystem.
CREATE DATABASE d1 LOCATION '/some/path/on/isilon/server/d1.db';
CREATE TABLE d1.t2 (a TINYINT, b BOOLEAN);
</codeblock>
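
    <p>
      As a further sketch, the following statements specify <codeph>LOCATION</codeph> at the
      table and partition levels as well. The table <codeph>d1.t3</codeph>, its
      <codeph>year</codeph> partition key column, and both paths are hypothetical; as with the
      database example above, each path is interpreted within the Isilon filesystem.
    </p>

    <codeblock>-- Hypothetical table-level LOCATION on the Isilon filesystem.
CREATE TABLE d1.t3 (id BIGINT, msg STRING)
  PARTITIONED BY (year INT)
  LOCATION '/some/path/on/isilon/server/d1.db/t3';

-- Hypothetical partition-level LOCATION, also on the Isilon filesystem.
ALTER TABLE d1.t3 ADD PARTITION (year=2019)
  LOCATION '/some/path/on/isilon/server/archive/t3/year=2019';
</codeblock>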

    <p>
      Impala can write to, delete, and rename data files and database, table, and partition
      directories on Isilon storage. Therefore, Impala statements such as <codeph>CREATE
      TABLE</codeph>, <codeph>DROP TABLE</codeph>, <codeph>CREATE DATABASE</codeph>,
      <codeph>DROP DATABASE</codeph>, <codeph>ALTER TABLE</codeph>, and <codeph>INSERT</codeph>
      work the same way with Isilon storage as with HDFS.
    </p>
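
    <p>
      As a sketch that reuses the tables from the earlier example, each of the following
      statements writes, renames, or removes files or directories on Isilon storage, just as it
      would in HDFS. The new table name <codeph>t2_old</codeph> and the inserted values are
      hypothetical, and the comments assume internal (managed) tables.
    </p>

    <codeblock>-- Writes a new data file into the directory for table t1.
INSERT INTO t1 VALUES (1, 'one'), (2, 'two');

-- Renames the table; for an internal table, its data directory is renamed too.
ALTER TABLE d1.t2 RENAME TO d1.t2_old;

-- Drops the table; for an internal table, Impala also removes its data files.
DROP TABLE d1.t2_old;

-- Drops the database along with any tables that remain in it.
DROP DATABASE d1 CASCADE;
</codeblock>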

    <p>
      When a query approaches its memory limit and activates the Impala spill-to-disk
      feature, Impala by default writes all the temporary data to a local (not Isilon) storage
      device. Because the I/O bandwidth for the temporary data depends on the number of local
      disks, and clusters using Isilon storage might not have as many local disks attached,
      pay special attention to any queries that spill to disk on Isilon-enabled clusters.
      Where practical, tune the queries or allocate extra memory for Impala to avoid spilling.
      Although you can specify an Isilon storage device as the destination for the temporary
      spill-to-disk data, that configuration is not recommended because the data must then be
      transferred in both directions over remote I/O.
    </p>
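
    <p>
      One way to allocate extra memory to a particular query is the <codeph>MEM_LIMIT</codeph>
      query option. The following sketch raises the limit before a memory-intensive
      aggregation so that the query is less likely to spill to the local scratch disks. The
      <codeph>8gb</codeph> value and the query itself are hypothetical; the appropriate value
      depends on the memory available on each host.
    </p>

    <codeblock>-- Hypothetical per-query memory limit; size it to the memory on each host.
SET MEM_LIMIT=8gb;

-- Hypothetical memory-intensive aggregation against table t1 from the earlier example.
SELECT s, COUNT(DISTINCT x) FROM t1 GROUP BY s;
</codeblock>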

    <p>
      When tuning Impala queries on HDFS, you typically try to avoid any remote reads. When the
      data resides on Isilon storage, all the I/O consists of remote reads, so do not be alarmed
      when you see nonzero values for remote read measurements in query profile output. The
      benefit of the Impala and Isilon integration is primarily the convenience of not having
      to move or copy large volumes of data into HDFS, rather than raw query performance. You
      can increase the performance of Impala I/O for Isilon systems by increasing the value of
      the <codeph>‑‑num_remote_hdfs_io_threads</codeph> startup option for the
      <cmdname>impalad</cmdname> daemon.
    </p>
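
    <p>
      A minimal sketch of setting this option on the <cmdname>impalad</cmdname> command line
      follows. The value <codeph>64</codeph> is a hypothetical starting point rather than a
      recommendation, and the other startup options a real deployment needs are omitted.
      Because it is a startup option, a new value takes effect only after the daemon restarts;
      compare query profiles before and after the change to confirm that it helps.
    </p>

    <codeblock>impalad --num_remote_hdfs_io_threads=64
</codeblock>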

    <!-- <p outputclass="toc inpage"/> -->

  </conbody>

</concept>