mirror of
https://github.com/apache/impala.git
synced 2025-12-19 09:58:28 -05:00
Updates documentation to include examples with service identifier. Also fixes inconsistent use of ASCII quotes for example text, highlighting code and variable names, and normalizes descriptions between S3/HDFS/Ozone. Removes "priority" from remote descriptions as it is optional and does nothing. Change-Id: I624a607bda33ab47100e1540ff1d66c8d19a7329 Reviewed-on: http://gerrit.cloudera.org:8080/19504 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>
494 lines
25 KiB
XML
494 lines
25 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
||
<!--
|
||
Licensed to the Apache Software Foundation (ASF) under one
|
||
or more contributor license agreements. See the NOTICE file
|
||
distributed with this work for additional information
|
||
regarding copyright ownership. The ASF licenses this file
|
||
to you under the Apache License, Version 2.0 (the
|
||
"License"); you may not use this file except in compliance
|
||
with the License. You may obtain a copy of the License at
|
||
|
||
http://www.apache.org/licenses/LICENSE-2.0
|
||
|
||
Unless required by applicable law or agreed to in writing,
|
||
software distributed under the License is distributed on an
|
||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||
KIND, either express or implied. See the License for the
|
||
specific language governing permissions and limitations
|
||
under the License.
|
||
-->
|
||
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
|
||
<concept id="disk_space">
|
||
|
||
<title>Managing Disk Space for Impala Data</title>
|
||
|
||
<titlealts audience="PDF">
|
||
|
||
<navtitle>Managing Disk Space</navtitle>
|
||
|
||
</titlealts>
|
||
|
||
<prolog>
|
||
<metadata>
|
||
<data name="Category" value="Impala"/>
|
||
<data name="Category" value="Disk Storage"/>
|
||
<data name="Category" value="Administrators"/>
|
||
<data name="Category" value="Developers"/>
|
||
<data name="Category" value="Data Analysts"/>
|
||
<data name="Category" value="Tables"/>
|
||
<data name="Category" value="Compression"/>
|
||
</metadata>
|
||
</prolog>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
Although Impala typically works with many large files in an HDFS storage system with
|
||
plenty of capacity, there are times when you might perform some file cleanup to reclaim
|
||
space, or advise developers on techniques to minimize space consumption and file
|
||
duplication.
|
||
</p>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>
|
||
Use compact binary file formats where practical. Numeric and time-based data in
|
||
particular can be stored in more compact form in binary data files. Depending on the
|
||
file format, various compression and encoding features can reduce file size even
|
||
further. You can specify the <codeph>STORED AS</codeph> clause as part of the
|
||
<codeph>CREATE TABLE</codeph> statement, or <codeph>ALTER TABLE</codeph> with the
|
||
<codeph>SET FILEFORMAT</codeph> clause for an existing table or partition within a
|
||
partitioned table. See <xref
|
||
href="impala_file_formats.xml#file_formats"/>
|
||
for details about file formats, especially <xref href="impala_parquet.xml#parquet"/>.
|
||
See <xref href="impala_create_table.xml#create_table"/> and
|
||
<xref
|
||
href="impala_alter_table.xml#alter_table"/> for syntax details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
You manage underlying data files differently depending on whether the corresponding
|
||
Impala table is defined as an
|
||
<xref
|
||
href="impala_tables.xml#internal_tables">internal</xref> or
|
||
<xref
|
||
href="impala_tables.xml#external_tables">external</xref> table:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
Use the <codeph>DESCRIBE FORMATTED</codeph> statement to check if a particular table
|
||
is internal (managed by Impala) or external, and to see the physical location of the
|
||
data files in HDFS. See <xref
|
||
href="impala_describe.xml#describe"/>
|
||
for details.
|
||
</li>
|
||
|
||
<li>
|
||
For Impala-managed (<q>internal</q>) tables, use <codeph>DROP TABLE</codeph>
|
||
statements to remove data files. See
|
||
<xref
|
||
href="impala_drop_table.xml#drop_table"/> for details.
|
||
</li>
|
||
|
||
<li>
|
||
For tables not managed by Impala (<q>external</q> tables), use appropriate
|
||
HDFS-related commands such as <codeph>hadoop fs</codeph>, <codeph>hdfs dfs</codeph>,
|
||
or <codeph>distcp</codeph>, to create, move, copy, or delete files within HDFS
|
||
directories that are accessible by the <codeph>impala</codeph> user. Issue a
|
||
<codeph>REFRESH <varname>table_name</varname></codeph> statement after adding or
|
||
removing any files from the data directory of an external table. See
|
||
<xref href="impala_refresh.xml#refresh"/> for details.
|
||
</li>
|
||
|
||
<li>
|
||
Use external tables to reference HDFS data files in their original location. With
|
||
this technique, you avoid copying the files, and you can map more than one Impala
|
||
table to the same set of data files. When you drop the Impala table, the data files
|
||
are left undisturbed. See <xref href="impala_tables.xml#external_tables"/> for
|
||
details.
|
||
</li>
|
||
|
||
<li>
|
||
Use the <codeph>LOAD DATA</codeph> statement to move HDFS files into the data
|
||
directory for an Impala table from inside Impala, without the need to specify the
|
||
HDFS path of the destination directory. This technique works for both internal and
|
||
external tables. See <xref href="impala_load_data.xml#load_data"/> for details.
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Make sure that the HDFS trashcan is configured correctly. When you remove files from
|
||
HDFS, the space might not be reclaimed for use by other files until sometime later,
|
||
when the trashcan is emptied. See <xref href="impala_drop_table.xml#drop_table"/> for
|
||
details. See <xref href="impala_prereqs.xml#prereqs_account"/> for permissions needed
|
||
for the HDFS trashcan to operate correctly.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Drop all tables in a database before dropping the database itself. See
|
||
<xref href="impala_drop_database.xml#drop_database"/> for details.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
<p>
|
||
Clean up temporary files after failed <codeph>INSERT</codeph> statements. If an
|
||
<codeph>INSERT</codeph> statement encounters an error, and you see a directory named
|
||
<filepath>.impala_insert_staging</filepath> or
|
||
<filepath>_impala_insert_staging</filepath> left behind in the data directory for the
|
||
table, it might contain temporary data files taking up space in HDFS. You might be
|
||
able to salvage these data files, for example if they are complete but could not be
|
||
moved into place due to a permission error. Or, you might delete those files through
|
||
commands such as <codeph>hadoop fs</codeph> or <codeph>hdfs dfs</codeph>, to reclaim
|
||
space before re-trying the <codeph>INSERT</codeph>. Issue <codeph>DESCRIBE FORMATTED
|
||
<varname>table_name</varname></codeph> to see the HDFS path where you can check for
|
||
temporary files.
|
||
</p>
|
||
</li>
|
||
|
||
<li rev="2.2.0">
|
||
<p>
|
||
If you use the Amazon Simple Storage Service (S3) as a place to offload data to reduce
|
||
the volume of local storage, Impala 2.2.0 and higher can query the data directly from
|
||
S3. See <xref
|
||
href="impala_s3.xml#s3"/> for details.
|
||
</p>
|
||
</li>
|
||
</ul>
|
||
|
||
<section id="section_vrg_fjb_3jb">
|
||
|
||
<title>Configuring Scratch Space for Spilling to Disk</title>Impala uses intermediate files during large
|
||
sort, join, aggregation, or analytic function operations The files are
|
||
removed when the operation finishes. You can specify locations of the
|
||
intermediate files by starting the <cmdname>impalad</cmdname> daemon with
|
||
the <codeph>--scratch_dirs="<varname>path_to_directory</varname>"</codeph>
|
||
configuration option. By default, intermediate files are stored in the
|
||
directory <filepath>/tmp/impala-scratch</filepath>.<p
|
||
id="order_by_scratch_dir">
|
||
<ul>
|
||
<li>
|
||
You can specify a single directory or a comma-separated list of directories.
|
||
</li>
|
||
|
||
<li>
|
||
You can specify an optional a capacity quota per scratch directory using the colon
|
||
(:) as the delimiter.
|
||
<p>
|
||
The capacity quota of <codeph>-1</codeph> or <codeph>0</codeph> is the same as no
|
||
quota for the directory.
|
||
</p>
|
||
</li>
|
||
|
||
<li>
|
||
The scratch directories must be on the local filesystem, not in HDFS.
|
||
</li>
|
||
|
||
<li>
|
||
You might specify different directory paths for different hosts, depending on the
|
||
capacity and speed of the available storage devices.
|
||
</li>
|
||
</ul>
|
||
</p>
|
||
|
||
<p>
|
||
If there is less than 1 GB free on the filesystem where that directory resides, Impala
|
||
still runs, but writes a warning message to its log.
|
||
</p>
|
||
|
||
<p>
|
||
Impala successfully starts (with a warning written to the log) if it cannot create or
|
||
read and write files in one of the scratch directories.
|
||
</p>
|
||
|
||
<p>
|
||
The following are examples for specifying scratch directories.
|
||
<table frame="all" rowsep="1" colsep="1"
|
||
id="table_a4d_myg_3jb">
|
||
<tgroup cols="2" align="left">
|
||
<colspec colname="c1" colnum="1"/>
|
||
<colspec colname="c2" colnum="2"/>
|
||
<thead>
|
||
<row>
|
||
<entry>
|
||
Config option
|
||
</entry>
|
||
<entry>
|
||
Description
|
||
</entry>
|
||
</row>
|
||
</thead>
|
||
<tbody>
|
||
<row>
|
||
<entry>
|
||
<codeph>--scratch_dirs=/dir1,/dir2</codeph>
|
||
</entry>
|
||
<entry>
|
||
Use /dir1 and /dir2 as scratch directories with no capacity quota.
|
||
</entry>
|
||
</row>
|
||
<row>
|
||
<entry>
|
||
<codeph>--scratch_dirs=/dir1,/dir2:25G</codeph>
|
||
</entry>
|
||
<entry>
|
||
Use /dir1 and /dir2 as scratch directories with no capacity quota on /dir1 and
|
||
the 25GB quota on /dir2.
|
||
</entry>
|
||
</row>
|
||
<row>
|
||
<entry>
|
||
<codeph>--scratch_dirs=/dir1:5MB,/dir2</codeph>
|
||
</entry>
|
||
<entry>
|
||
Use /dir1 and /dir2 as scratch directories with the capacity quota of 5MB on
|
||
/dir1 and no quota on /dir2.
|
||
</entry>
|
||
</row>
|
||
<row>
|
||
<entry>
|
||
<codeph>--scratch_dirs=/dir1:-1,/dir2:0</codeph>
|
||
</entry>
|
||
<entry>
|
||
Use /dir1 and /dir2 as scratch directories with no capacity quota.
|
||
</entry>
|
||
</row>
|
||
</tbody>
|
||
</tgroup>
|
||
</table>
|
||
</p>
|
||
|
||
<p>
|
||
Allocation from a scratch directory will fail if the specified limit for the directory
|
||
is exceeded.
|
||
</p>
|
||
|
||
<p>
|
||
If Impala encounters an error reading or writing files in a scratch directory during a
|
||
query, Impala logs the error, and the query fails.
|
||
</p>
|
||
|
||
</section>
|
||
<section>
|
||
<title>Priority Based Scratch Directory Selection</title>
|
||
<p>The location of the intermediate files are configured by starting the impalad daemon with
|
||
the flag <codeph>--scratch_dirs="path_to_directory"</codeph>. Currently this startup flag uses the configured
|
||
scratch directories in a round robin fashion. Automatic selection of scratch directories in
|
||
a round robin fashion may not always be ideal in every situation since these directories
|
||
could come from different classes of storage system volumes having different performance
|
||
characteristics (SSD vs HDD, local storage vs network attached storage, etc.). To optimize
|
||
your workload, you have an option to configure the priority of the scratch directories based
|
||
on your storage system configuration.</p>
|
||
<p>The scratch directories will be selected for spilling based on how you configure the
|
||
priorities of the directories and if you provide the same priority for multiple directories
|
||
then the directories will be selected in a round robin fashion.</p>
|
||
<p>The valid formats for specifying the priority directories are as shown here:
|
||
<codeblock><varname>dir-path</varname>:<varname>limit</varname>:<varname>priority</varname>
|
||
<varname>dir-path</varname>::<varname>priority</varname>
|
||
</codeblock></p>
|
||
<p>Example:</p>
|
||
<p>
|
||
<codeblock>/dir1:200GB:0
|
||
/dir1::0
|
||
</codeblock>
|
||
</p>
|
||
<p>The following formats use the default priority:
|
||
<codeblock>/dir1
|
||
/dir1:200GB
|
||
/dir1:200GB:
|
||
</codeblock>
|
||
</p>
|
||
<p>In the example below, dir1 will be used as a spill victim until it is full and then dir2, dir3,
|
||
and dir4 will be used in a round robin fashion.</p>
|
||
<p>
|
||
<codeblock>--scratch_dirs="/dir1:200GB:0, /dir2:1024GB:1, /dir3:1024GB:1, /dir4:1024GB:1"
|
||
</codeblock>
|
||
</p>
|
||
</section>
|
||
<section>
|
||
<title>Increasing Scratch Capacity</title>
|
||
<p> You can compress the data spilled to disk to increase the effective scratch capacity. You
|
||
typically more than double capacity using compression and reduce spilling to disk. Use the
|
||
--disk_spill_compression_codec and –-disk_spill_punch_holes startup options. The
|
||
--disk_spill_compression_codec takes any value supported by the COMPRESSION_CODEC query
|
||
option. The value is not case-sensitive. A value of <codeph>ZSTD</codeph> or
|
||
<codeph>LZ4</codeph> is recommended (default is NONE).</p>
|
||
<p>For example:</p>
|
||
<codeblock>--disk_spill_compression_codec=LZ4
|
||
--disk_spill_punch_holes=true
|
||
</codeblock>
|
||
<p>
|
||
If you set <codeph>--disk_spill_compression_codec</codeph> to a value other than <codeph>NONE</codeph>, you must set <codeph>--disk_spill_punch_holes</codeph> to true.
|
||
</p>
|
||
<p>
|
||
The hole punching feature supported by many filesystems is used to reclaim space in scratch files during execution
|
||
of a query that spills to disk. This results in lower scratch space requirements in many cases, especially when
|
||
combined with disk spill compression. When this option is not enabled, scratch space is still recycled by a query,
|
||
but less effectively in many cases.
|
||
</p>
|
||
<p> You can specify a compression level for <codeph>ZSTD</codeph> only. For example: </p>
|
||
<codeblock>--disk_spill_compression_codec=ZSTD:10
|
||
--disk_spill_punch_holes=true
|
||
</codeblock>
|
||
<p> Compression levels from 1 up to 22 (default 3) are supported for <codeph>ZSTD</codeph>.
|
||
The lower the compression level, the faster the speed at the cost of compression ratio.</p>
|
||
</section>
|
||
<section>
|
||
<title>Configure Impala Daemon to spill to S3</title>
|
||
<p>Impala occasionally needs to use persistent storage for writing intermediate files during
|
||
large sorts, joins, aggregations, or analytic function operations. If your workload results
|
||
in large volumes of intermediate data being written, it is recommended to configure the
|
||
heavy spilling queries to use a remote storage location rather than the local one. The
|
||
advantage of using remote storage for scratch space is that it is elastic and can handle
|
||
any amount of spilling.</p>
|
||
<p><b>Before you begin</b></p>
|
||
<p>Identify the URL for an S3 bucket to which you want your new Impala to write the temporary
|
||
data. If you use the S3 bucket that is associated with the environment, navigate to the S3
|
||
bucket and copy the URL. If you want to use an external S3 bucket, you must first configure
|
||
your environment to use the external S3 bucket with the correct read/write permissions.</p>
|
||
<p><b>Configuring the Start-up Option in Impala daemon</b></p>
|
||
<p>You can use the Impalad start option scratch_dirs to specify the locations of the
|
||
intermediate files. The format of the option is:</p>
|
||
<codeblock>--scratch_dirs="<varname>remote_dir</varname>, <varname>local_buffer_dir</varname> (,<varname>local_dir</varname>…)"</codeblock>
|
||
<p>where <varname>local_buffer_dir</varname> and <varname>local_dir</varname> conform to the
|
||
earlier descriptions for scratch directories.</p>
|
||
<p>With the option specified above:</p>
|
||
<ul>
|
||
<li>You can specify only one remote directory. When you configure a remote directory, you
|
||
must specify a local buffer directory as the buffer. However you can use multiple local
|
||
directories with the remote directory. If you specify multiple local directories, the
|
||
first local directory would be used as the local buffer directory.</li>
|
||
<li>If you configure both remote and local directories, the remote directory is only used
|
||
when the local directories are fully utilized.</li>
|
||
<li>The size of a remote intermediate file could affect the query performance, and the
|
||
value can be set by <codeph>--remote_tmp_file_size=<varname>size</varname></codeph> in
|
||
the start-up option. The default size of a remote intermediate file is 16MB while the
|
||
maximum is 512MB.</li>
|
||
</ul>
|
||
<p><b>Examples</b></p>
|
||
<ul>
|
||
<li>A remote scratch dir with a local buffer dir, file size 64MB.
|
||
<codeblock>--scratch_dirs=s3a://remote_dir,/local_buffer_dir --remote_tmp_file_size=64M</codeblock></li>
|
||
<li>A remote scratch dir with a local buffer dir limited to 256MB, and one local dir
|
||
limited to 10GB.
|
||
<codeblock>--scratch_dirs=s3a://remote_dir,/local_buffer_dir:256M,/local_dir:10G</codeblock></li>
|
||
<li>A remote scratch dir with a local buffer dir, and multiple prioritized local dirs.
|
||
<codeblock>--scratch_dirs=s3a://remote_dir,/local_buffer_dir,/local_dir_1:5G:1,/local_dir_2:5G:2</codeblock></li>
|
||
</ul>
|
||
</section>
|
||
<section>
|
||
<title>Configure Impala Daemon to spill to HDFS</title>
|
||
<p>Impala occasionally needs to use persistent storage for writing intermediate files during
|
||
large sorts, joins, aggregations, or analytic function operations. If your workload results
|
||
in large volumes of intermediate data being written, it is recommended to configure the
|
||
heavy spilling queries to use a remote storage location rather than the local one. The
|
||
advantage of using remote storage for scratch space is that it is elastic and can handle
|
||
any amount of spilling.</p>
|
||
<p><b>Before you begin</b></p>
|
||
<ul>
|
||
<li>Identify the HDFS scratch directory where you want your new Impala to write the
|
||
temporary data.</li>
|
||
<li>Identify the IP address, host name, or service identifier of HDFS.</li>
|
||
<li>Identify the port number of the HDFS NameNode (if not-default).</li>
|
||
<li>Configure Impala to write temporary data to disk during query processing.</li>
|
||
</ul>
|
||
<p><b>Configuring the Start-up Option in Impala daemon</b></p>
|
||
<p>You can use the Impalad start option <codeph>scratch_dirs</codeph> to specify the
|
||
locations of the intermediate files.</p>
|
||
<p>Use the following format for this start up option:</p>
|
||
<codeblock>--scratch_dirs="hdfs://<varname>authority</varname>/<varname>path</varname>(:<varname>max_bytes</varname>), <varname>local_buffer_dir</varname> (,<varname>local_dir</varname>…)"</codeblock>
|
||
<ul>
|
||
<li>Where <codeph>hdfs://<varname>authority</varname>/<varname>path</varname></codeph> is
|
||
the remote directory.</li>
|
||
<li><varname>authority</varname> may include <codeph>ip_address</codeph> or
|
||
<codeph>hostname</codeph> and <codeph>port</codeph>, or <codeph>service_id</codeph>.</li>
|
||
<li><varname>max_bytes</varname> is optional.</li>
|
||
</ul>
|
||
<p>Using the above format:</p>
|
||
<ul>
|
||
<li>You can specify only one remote directory. When you configure a remote directory, you
|
||
must specify a local buffer directory as the buffer. However you can use multiple local
|
||
directories with the remote directory. If you specify multiple local directories, the
|
||
first local directory would be used as the local buffer directory.</li>
|
||
<li>If you configure both remote and local directories, the remote directory is only used
|
||
when the local directories are fully utilized.</li>
|
||
<li>The size of a remote intermediate file could affect the query performance, and the
|
||
value can be set by <codeph>--remote_tmp_file_size=<varname>size</varname></codeph> in
|
||
the start-up option. The default size of a remote intermediate file is 16MB while the
|
||
maximum is 512MB.</li>
|
||
</ul>
|
||
<p><b>Examples</b></p>
|
||
<ul>
|
||
<li>A HDFS scratch dir with one local buffer dir, file size 64MB. The space of HDFS scratch
|
||
dir is limited to 300G.
|
||
<codeblock>--scratch_dirs=hdfs://10.0.0.49:20500/tmp:300G,/local_buffer_dir --remote_tmp_file_size=64M</codeblock></li>
|
||
<li>A HDFS scratch dir with one local buffer dir limited to 512MB, and one local dir
|
||
limited to 10GB. The space of HDFS scratch dir is limited to 300G. The HDFS NameNode uses
|
||
its default port (8020).
|
||
<codeblock>--scratch_dirs=hdfs://hdfsnn/tmp:300G,/local_buffer_dir:512M,/local_dir:10G</codeblock></li>
|
||
<li>A HDFS scratch dir with one local buffer dir, and multiple prioritized local dirs. The
|
||
space of HDFS scratch dir is unlimited. The HDFS service identifier is <codeph>hdfs1</codeph>.
|
||
<codeblock>--scratch_dirs=hdfs://hdfs1/tmp,/local_buffer_dir,/local_dir_1:5G:1,/local_dir_2:5G:2</codeblock></li>
|
||
</ul>
|
||
<p>Even though max_bytes is optional, it is highly recommended to configure for spilling to
|
||
HDFS because the HDFS cluster space is limited.</p>
|
||
</section>
|
||
<section>
|
||
<title>Configure Impala Daemon to spill to Ozone</title>
|
||
<p><b>Before you begin</b></p>
|
||
<ul>
|
||
<li>Identify the Ozone scratch directory where you want your new Impala to write the
|
||
temporary data.</li>
|
||
<li>Identify the IP address, host name, or service identifier of Ozone.</li>
|
||
<li>Identify the port number of the Ozone Manager (if not-default).</li>
|
||
</ul>
|
||
<p><b>Configuring the Start-up Option in Impala daemon</b></p>
|
||
<p>You can use the Impalad start option <codeph>scratch_dirs</codeph> to specify the locations of the
|
||
intermediate files.</p>
|
||
<codeblock>--scratch_dirs="ofs://<varname>authority</varname>/<varname>path</varname>(:<varname>max_bytes</varname>), <varname>local_buffer_dir</varname> (,<varname>local_dir</varname>…)"</codeblock>
|
||
<ul>
|
||
<li>Where <codeph>ofs://<varname>authority</varname>/<varname>path</varname></codeph> is
|
||
the remote directory.</li>
|
||
<li><codeph>authority</codeph> may include <codeph>ip_address</codeph> or
|
||
<codeph>hostname</codeph> and <codeph>port</codeph>, or <codeph>service_id</codeph>.</li>
|
||
<li><codeph>max_bytes</codeph> is optional.</li>
|
||
</ul>
|
||
<p>Using the above format:</p>
|
||
<ul>
|
||
<li>You can specify only one remote directory. When you configure a remote directory, you
|
||
must specify a local buffer directory as the buffer. However you can use multiple local
|
||
directories with the remote directory. If you specify multiple local directories, the
|
||
first local directory would be used as the local buffer directory.</li>
|
||
<li>If you configure both remote and local directories, the remote directory is only used
|
||
when the local directories are fully utilized.</li>
|
||
<li>The size of a remote intermediate file could affect the query performance, and the
|
||
value can be set by <codeph>--remote_tmp_file_size=<varname>size</varname></codeph> in
|
||
the start-up option. The default size of a remote intermediate file is 16MB while the
|
||
maximum is 512MB.</li>
|
||
</ul>
|
||
<p><b>Examples</b></p>
|
||
<ul>
|
||
<li>An Ozone scratch dir with one local buffer dir, file size 64MB. The space of Ozone
|
||
scratch dir is limited to 300G.
|
||
<codeblock>--scratch_dirs=ofs://10.0.0.49:29000/tmp:300G,/local_buffer_dir --remote_tmp_file_size=64M</codeblock></li>
|
||
<li>An Ozone scratch dir with one local buffer dir limited to 512MB, and one local dir
|
||
limited to 10GB. The space of Ozone scratch dir is limited to 300G. The Ozone Manager
|
||
uses its default port (9862).
|
||
<codeblock>--scratch_dirs=ofs://ozonemgr/tmp:300G,/local_buffer_dir:512M,/local_dir:10G</codeblock></li>
|
||
<li>An Ozone scratch dir with one local buffer dir, and multiple prioritized local dirs. The
|
||
space of Ozone scratch dir is unlimited. The Ozone service identifier is <codeph>ozone1</codeph>.
|
||
<codeblock>--scratch_dirs=ofs://ozone1/tmp,/local_buffer_dir,/local_dir_1:5G:1,/local_dir_2:5G:2</codeblock></li>
|
||
</ul>
|
||
<p>Even though max_bytes is optional, it is highly recommended to configure for spilling to
|
||
Ozone because the Ozone cluster space is limited.</p>
|
||
</section>
|
||
</conbody>
|
||
|
||
</concept>
|