<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="1.2.2" id="compute_stats">

<title>COMPUTE STATS Statement</title>
<titlealts audience="PDF"><navtitle>COMPUTE STATS</navtitle></titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Performance"/>
<data name="Category" value="Scalability"/>
<data name="Category" value="ETL"/>
<data name="Category" value="Ingest"/>
<data name="Category" value="SQL"/>
<data name="Category" value="Tables"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>

<conbody>
<p>
<indexterm audience="hidden">COMPUTE STATS statement</indexterm> The
<codeph>COMPUTE STATS</codeph> statement gathers information about the volume and distribution
of data in a table and all its associated columns and partitions. The
information is stored in the metastore database and used by Impala to
help optimize queries. For example, if Impala can determine whether a table
is large or small, or whether it has many or few distinct values, it can organize and
parallelize the work appropriately for a join query or insert operation.
For details about the kinds of information gathered by this statement, see
<xref href="impala_perf_stats.xml#perf_stats"/>.
</p>
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>

<codeblock rev="2.1.0"><ph rev="2.12.0 IMPALA-5310">COMPUTE STATS [<varname>db_name</varname>.]<varname>table_name</varname> [ ( <varname>column_list</varname> ) ] [TABLESAMPLE SYSTEM(<varname>percentage</varname>) [REPEATABLE(<varname>seed</varname>)]]</ph>

<varname>column_list</varname> ::= <varname>column_name</varname> [ , <varname>column_name</varname>, ... ]

COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varname> [PARTITION (<varname>partition_spec</varname>)] [ ( <varname>column_list</varname> ) ]

<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph>

<varname>simple_partition_spec</varname> ::= <varname>partition_col</varname>=<varname>constant_value</varname>

<ph rev="IMPALA-1654"><varname>complex_partition_spec</varname> ::= <varname>comparison_expression_on_partition_col</varname></ph>
</codeblock>
<p conref="../shared/impala_common.xml#common/incremental_partition_spec"/>

<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>

<p>
Originally, Impala relied on users to run the Hive <codeph>ANALYZE
TABLE</codeph> statement, but that method of gathering statistics proved
unreliable and difficult to use. The Impala <codeph>COMPUTE STATS</codeph>
statement was built to improve the reliability and user-friendliness of
this operation. <codeph>COMPUTE STATS</codeph> does not require any setup
steps or special configuration. You run a single Impala
<codeph>COMPUTE STATS</codeph> statement to gather both table and column
statistics, rather than separate Hive <codeph>ANALYZE TABLE</codeph>
statements for each kind of statistics.
</p>
<p rev="impala-3562">
In both <codeph>COMPUTE STATS</codeph> and <codeph>COMPUTE INCREMENTAL STATS</codeph>
statements, you can restrict the columns for which column-level statistics are computed
by specifying an optional comma-separated list of columns, as shown in the syntax above.
</p>

<p rev="impala-3562">
If no column list is given, the <codeph>COMPUTE STATS</codeph> statement
computes column-level statistics for all columns of the table. This can add
unnecessary work for columns whose statistics are never used by
queries. The extra cost is especially high for very wide tables and for large
string columns that queries do not need.
</p>
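<p>
As a sketch, consider a hypothetical wide table <codeph>web_events</codeph> whose queries only
join and filter on a handful of columns. Restricting column statistics to those columns skips
the work of analyzing the remaining ones. The table and column names are illustrative only.
</p>

<codeblock>-- Table-level stats plus column stats only for the listed columns.
COMPUTE STATS web_events (user_id, event_type, event_date);

-- Without a column list, every column of the table is analyzed.
COMPUTE STATS web_events;
</codeblock>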
<p rev="impala-3562">
<codeph>COMPUTE STATS</codeph> returns an error when a specified column
cannot be analyzed. Examples include a column that does not exist, a column of a type
that <codeph>COMPUTE STATS</codeph> does not support (such as a complex type), and
a partitioning column.
</p>
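<p>
As an illustration, assume a hypothetical table <codeph>sales</codeph> partitioned by
<codeph>year</codeph>, with a complex-type column <codeph>line_items</codeph> of type
<codeph>ARRAY</codeph>. Each of the following statements is rejected for one of the reasons
listed above. The error text is omitted here because it can vary between releases.
</p>

<codeblock>-- The column does not exist in the table.
COMPUTE STATS sales (no_such_column);

-- The column has a complex type, which COMPUTE STATS does not support.
COMPUTE STATS sales (line_items);

-- The column is a partitioning column.
COMPUTE STATS sales (year);
</codeblock>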
<p rev="impala-3562">
If you specify an empty column list, <codeph>COMPUTE STATS</codeph> does not
analyze any columns.
</p>
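<p>
A minimal sketch of the empty column list, using a hypothetical table <codeph>sales</codeph>.
This form may be useful when only table-level statistics, such as row counts, need refreshing.
</p>

<codeblock>-- Empty column list: no column-level statistics are computed.
COMPUTE STATS sales ();
</codeblock>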
<p rev="2.12.0 IMPALA-5310">
In <keyword keyref="impala212_full"/> and
higher, an optional <codeph>TABLESAMPLE</codeph> clause after the table
reference (and after the column list, if any) specifies that the <codeph>COMPUTE STATS</codeph>
operation only processes a specified percentage of the table data. For
tables that are so large that a full <codeph>COMPUTE STATS</codeph>
operation is impractical, you can use <codeph>COMPUTE STATS</codeph> with
a <codeph>TABLESAMPLE</codeph> clause to extrapolate statistics from a
sample of the table data. See <keyword keyref="perf_stats"/> for details about the
experimental stats extrapolation and sampling features.
</p>
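<p>
The following sketch shows the <codeph>TABLESAMPLE</codeph> clause on a hypothetical large
table <codeph>huge_fact</codeph>. The percentage and seed values are arbitrary. With the
<codeph>REPEATABLE</codeph> clause, rerunning the statement should select the same sample, provided
the table data has not changed.
</p>

<codeblock>-- Process roughly 10 percent of the table data and extrapolate statistics from it.
COMPUTE STATS huge_fact TABLESAMPLE SYSTEM(10);

-- Same, with a fixed seed so that the sample is reproducible.
COMPUTE STATS huge_fact TABLESAMPLE SYSTEM(10) REPEATABLE(42);
</codeblock>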
<p rev="2.1.0">
The <codeph>COMPUTE INCREMENTAL STATS</codeph> variation is a shortcut for partitioned tables that works on a
subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables
with many partitions, where a full <codeph>COMPUTE STATS</codeph> operation takes too long to be practical
each time a partition is added or dropped. See <xref href="impala_perf_stats.xml#perf_stats_incremental"/>
for full usage details.
</p>
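<p>
A typical pattern is sketched below with a hypothetical table <codeph>sales</codeph> partitioned by
<codeph>year</codeph> and <codeph>month</codeph>. Instead of rescanning the whole table, you
compute incremental statistics only for a partition that was just loaded.
</p>

<codeblock>-- A new partition has just been added and populated.
ALTER TABLE sales ADD PARTITION (year=2024, month=6);
-- (Load data into the new partition here.)

-- Gather statistics for just that partition.
COMPUTE INCREMENTAL STATS sales PARTITION (year=2024, month=6);

-- Alternatively, process every partition that does not yet have incremental stats.
COMPUTE INCREMENTAL STATS sales;
</codeblock>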
<note type="important">
<p conref="../shared/impala_common.xml#common/cs_or_cis"/>
<p conref="../shared/impala_common.xml#common/incremental_stats_after_full"/>
<p conref="../shared/impala_common.xml#common/incremental_stats_caveats"/>
</note>

<p>
<codeph>COMPUTE INCREMENTAL STATS</codeph> only applies to partitioned tables. If you use the
<codeph>INCREMENTAL</codeph> clause for an unpartitioned table, Impala automatically uses the original
<codeph>COMPUTE STATS</codeph> statement. Such tables display <codeph>false</codeph> under the
<codeph>Incremental stats</codeph> column of the <codeph>SHOW TABLE STATS</codeph> output.
</p>
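<p>
For instance, the following two statements gather the same statistics for a hypothetical unpartitioned
table <codeph>lookup_codes</codeph>. Afterward, <codeph>SHOW TABLE STATS</codeph> reports
<codeph>false</codeph> in the <codeph>Incremental stats</codeph> column.
</p>

<codeblock>-- On an unpartitioned table, INCREMENTAL falls back to a regular COMPUTE STATS.
COMPUTE INCREMENTAL STATS lookup_codes;
-- Equivalent to:
COMPUTE STATS lookup_codes;

SHOW TABLE STATS lookup_codes;
</codeblock>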
<note>
<p>
Because many of the most performance-critical and resource-intensive
operations rely on table and column statistics to construct accurate and
efficient plans, <codeph>COMPUTE STATS</codeph> is an important step at
the end of your ETL process. Run <codeph>COMPUTE STATS</codeph> on all
tables as your first step when tuning slow queries or
troubleshooting out-of-memory conditions:
<ul>
<li>
Accurate statistics help Impala construct an efficient query plan
for join queries, improving performance and reducing memory usage.
</li>
<li>
Accurate statistics help Impala distribute the work effectively
for insert operations into Parquet tables, improving performance and
reducing memory usage.
</li>
<li rev="1.3.0">
Accurate statistics help Impala estimate the memory
required for each query, which is important when you use resource
management features, such as admission control and the YARN resource
management framework. The statistics help Impala achieve high
concurrency and full utilization of available memory, and avoid
contention with workloads from other Hadoop components.
</li>
<li rev="IMPALA-4572">
In <keyword keyref="impala28_full"/> and
higher, when you run the <codeph>COMPUTE STATS</codeph> or
<codeph>COMPUTE INCREMENTAL STATS</codeph> statement against a
Parquet table, Impala automatically applies the query option setting
<codeph>MT_DOP=4</codeph> to increase the amount of intra-node
parallelism during this CPU-intensive operation. See <xref
keyref="mt_dop"/> for details about what this query option does
and how to use it with CPU-intensive <codeph>SELECT</codeph>
statements.
</li>
</ul>
</p>
</note>
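<p>
The following sketch shows <codeph>COMPUTE STATS</codeph> run at the end of an ETL step. It assumes
hypothetical tables <codeph>staging_orders</codeph> and <codeph>orders_parquet</codeph> with
compatible column layouts.
</p>

<codeblock>-- Load or transform the data.
INSERT OVERWRITE TABLE orders_parquet SELECT * FROM staging_orders;

-- Refresh statistics so that later queries are planned with accurate
-- row counts and numbers of distinct values.
COMPUTE STATS orders_parquet;
</codeblock>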
<p rev="IMPALA-1654">
<b>Computing stats for groups of partitions:</b>
</p>

<p rev="IMPALA-1654">
In <keyword keyref="impala28_full"/> and higher, you can run <codeph>COMPUTE INCREMENTAL STATS</codeph>
on multiple partitions, instead of the entire table or one partition at a time. You include
comparison operators other than <codeph>=</codeph> in the <codeph>PARTITION</codeph> clause,
and the <codeph>COMPUTE INCREMENTAL STATS</codeph> statement applies to all partitions that
match the comparison expression.
</p>

<p rev="IMPALA-1654">
For example, the <codeph>INT_PARTITIONS</codeph> table contains 4 partitions.
The following <codeph>COMPUTE INCREMENTAL STATS</codeph> statements affect some but not all
partitions, as indicated by the <codeph>Updated <varname>n</varname> partition(s)</codeph>
messages. The partitions that are affected depend on values in the partition key column <codeph>X</codeph>
that match the comparison expression in the <codeph>PARTITION</codeph> clause.
</p>
<codeblock rev="IMPALA-1654"><![CDATA[
show partitions int_partitions;
+-------+-------+--------+------+--------------+-------------------+---------+...
| x     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format  |...
+-------+-------+--------+------+--------------+-------------------+---------+...
| 99    | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | PARQUET |...
| 120   | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT    |...
| 150   | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT    |...
| 200   | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT    |...
| Total | -1    | 0      | 0B   | 0B           |                   |         |...
+-------+-------+--------+------+--------------+-------------------+---------+...

compute incremental stats int_partitions partition (x < 100);
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 1 column(s). |
+-----------------------------------------+

compute incremental stats int_partitions partition (x in (100, 150, 200));
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 2 partition(s) and 1 column(s). |
+-----------------------------------------+

compute incremental stats int_partitions partition (x between 100 and 175);
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 2 partition(s) and 1 column(s). |
+-----------------------------------------+

compute incremental stats int_partitions partition (x in (100, 150, 200) or x < 100);
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 3 partition(s) and 1 column(s). |
+-----------------------------------------+

compute incremental stats int_partitions partition (x != 150);
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 3 partition(s) and 1 column(s). |
+-----------------------------------------+
]]>
</codeblock>
<p conref="../shared/impala_common.xml#common/complex_types_blurb"/>

<p rev="2.3.0">
Currently, the statistics created by the <codeph>COMPUTE STATS</codeph> statement do not include
information about complex type columns. The column stats metrics for complex columns are always shown
as -1. For queries involving complex type columns, Impala uses
heuristics to estimate the data distribution within such columns.
</p>

<p conref="../shared/impala_common.xml#common/hbase_blurb"/>

<p>
<codeph>COMPUTE STATS</codeph> also works for HBase tables. The statistics gathered for HBase tables are
somewhat different from those gathered for HDFS-backed tables, but that metadata is still used for optimization when HBase
tables are involved in join queries.
</p>
<p conref="../shared/impala_common.xml#common/s3_blurb"/>

<p rev="2.2.0">
<codeph>COMPUTE STATS</codeph> also works for tables where data resides in the Amazon Simple Storage Service (S3).
See <xref href="impala_s3.xml#s3"/> for details.
</p>

<p conref="../shared/impala_common.xml#common/performance_blurb"/>

<p>
The statistics collected by <codeph>COMPUTE STATS</codeph> are used to optimize join queries,
<codeph>INSERT</codeph> operations into Parquet tables, and other resource-intensive kinds of SQL statements.
See <xref href="impala_perf_stats.xml#perf_stats"/> for details.
</p>

<p>
For large tables, the <codeph>COMPUTE STATS</codeph> statement itself might take a long time, and you
might need to tune its performance. The <codeph>COMPUTE STATS</codeph> statement does not work with the
<codeph>EXPLAIN</codeph> statement or the <codeph>SUMMARY</codeph> command in <cmdname>impala-shell</cmdname>.
You can use the <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> to examine timing information
for the statement as a whole. If a basic <codeph>COMPUTE STATS</codeph> statement takes a long time for a
partitioned table, consider switching to the <codeph>COMPUTE INCREMENTAL STATS</codeph> syntax so that only
newly added partitions are analyzed each time.
</p>
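<p>
For example, you might time a <codeph>COMPUTE STATS</codeph> statement on a hypothetical table
<codeph>big_table</codeph> in <cmdname>impala-shell</cmdname>. Issue the <codeph>PROFILE</codeph>
command immediately afterward; it displays the runtime profile of the most recent query.
</p>

<codeblock>[localhost:21000] > compute stats big_table;
[localhost:21000] > profile;
</codeblock>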
<p conref="../shared/impala_common.xml#common/example_blurb"/>

<p>
This example shows two tables, <codeph>T1</codeph> and <codeph>T2</codeph>, with a small number of distinct
values linked by a parent-child relationship between <codeph>T1.ID</codeph> and <codeph>T2.PARENT</codeph>.
<codeph>T1</codeph> is tiny, while <codeph>T2</codeph> has approximately 100K rows. Initially, the statistics
include physical measurements such as the number of files, the total size, and size measurements for
fixed-length columns such as those of type <codeph>INT</codeph>. Unknown values are represented by -1. After
running <codeph>COMPUTE STATS</codeph> for each table, much more information is available through the
<codeph>SHOW TABLE STATS</codeph> and <codeph>SHOW COLUMN STATS</codeph> statements. If you were running a join query involving both of these tables, you
would need statistics for both tables to get the most effective optimization for the query.
</p>
<!-- Note: chopped off any excess characters at position 87 and after,
to avoid weird wrapping in PDF.
Applies to any subsequent examples with output from SHOW ... STATS too. -->

<codeblock>[localhost:21000] > show table stats t1;
Query: show table stats t1
+-------+--------+------+--------+
| #Rows | #Files | Size | Format |
+-------+--------+------+--------+
| -1    | 1      | 33B  | TEXT   |
+-------+--------+------+--------+
Returned 1 row(s) in 0.02s
[localhost:21000] > show table stats t2;
Query: show table stats t2
+-------+--------+----------+--------+
| #Rows | #Files | Size     | Format |
+-------+--------+----------+--------+
| -1    | 28     | 960.00KB | TEXT   |
+-------+--------+----------+--------+
Returned 1 row(s) in 0.01s
[localhost:21000] > show column stats t1;
Query: show column stats t1
+--------+--------+------------------+--------+----------+----------+
| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
+--------+--------+------------------+--------+----------+----------+
| id     | INT    | -1               | -1     | 4        | 4        |
| s      | STRING | -1               | -1     | -1       | -1       |
+--------+--------+------------------+--------+----------+----------+
Returned 2 row(s) in 1.71s
[localhost:21000] > show column stats t2;
Query: show column stats t2
+--------+--------+------------------+--------+----------+----------+
| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
+--------+--------+------------------+--------+----------+----------+
| parent | INT    | -1               | -1     | 4        | 4        |
| s      | STRING | -1               | -1     | -1       | -1       |
+--------+--------+------------------+--------+----------+----------+
Returned 2 row(s) in 0.01s
[localhost:21000] > compute stats t1;
Query: compute stats t1
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 2 column(s). |
+-----------------------------------------+
Returned 1 row(s) in 5.30s
[localhost:21000] > show table stats t1;
Query: show table stats t1
+-------+--------+------+--------+
| #Rows | #Files | Size | Format |
+-------+--------+------+--------+
| 3     | 1      | 33B  | TEXT   |
+-------+--------+------+--------+
Returned 1 row(s) in 0.01s
[localhost:21000] > show column stats t1;
Query: show column stats t1
+--------+--------+------------------+--------+----------+----------+
| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
+--------+--------+------------------+--------+----------+----------+
| id     | INT    | 3                | -1     | 4        | 4        |
| s      | STRING | 3                | -1     | -1       | -1       |
+--------+--------+------------------+--------+----------+----------+
Returned 2 row(s) in 0.02s
[localhost:21000] > compute stats t2;
Query: compute stats t2
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 2 column(s). |
+-----------------------------------------+
Returned 1 row(s) in 5.70s
[localhost:21000] > show table stats t2;
Query: show table stats t2
+-------+--------+----------+--------+
| #Rows | #Files | Size     | Format |
+-------+--------+----------+--------+
| 98304 | 1      | 960.00KB | TEXT   |
+-------+--------+----------+--------+
Returned 1 row(s) in 0.03s
[localhost:21000] > show column stats t2;
Query: show column stats t2
+--------+--------+------------------+--------+----------+----------+
| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
+--------+--------+------------------+--------+----------+----------+
| parent | INT    | 3                | -1     | 4        | 4        |
| s      | STRING | 6                | -1     | 14       | 9.3      |
+--------+--------+------------------+--------+----------+----------+
Returned 2 row(s) in 0.01s</codeblock>
<p rev="2.1.0">
The following example shows how to use the <codeph>INCREMENTAL</codeph> clause, available in Impala 2.1.0 and
higher. The <codeph>COMPUTE INCREMENTAL STATS</codeph> syntax lets you collect statistics for newly added or
changed partitions, without rescanning the entire table.
</p>

<codeblock conref="../shared/impala_common.xml#common/compute_stats_walkthrough"/>
<p conref="../shared/impala_common.xml#common/file_format_blurb"/>

<p>
The <codeph>COMPUTE STATS</codeph> statement works with tables created with any of the file formats supported
by Impala. See <xref href="impala_file_formats.xml#file_formats"/> for details about working with the
different file formats. The following considerations apply to <codeph>COMPUTE STATS</codeph> depending on the
file format of the table.
</p>

<p>
The <codeph>COMPUTE STATS</codeph> statement works with text tables with no restrictions. These tables can be
created through either Impala or Hive.
</p>

<p>
The <codeph>COMPUTE STATS</codeph> statement works with Parquet tables. These tables can be created through
either Impala or Hive.
</p>

<p>
The <codeph>COMPUTE STATS</codeph> statement works with Avro tables without restriction in <keyword keyref="impala22_full"/>
and higher. In earlier releases, <codeph>COMPUTE STATS</codeph> worked only for Avro tables created through Hive,
and required the <codeph>CREATE TABLE</codeph> statement to use SQL-style column names and types rather than an
Avro-style schema specification.
</p>

<p>
The <codeph>COMPUTE STATS</codeph> statement works with RCFile tables with no restrictions. These tables can
be created through either Impala or Hive.
</p>

<p>
The <codeph>COMPUTE STATS</codeph> statement works with SequenceFile tables with no restrictions. These
tables can be created through either Impala or Hive.
</p>

<p>
The <codeph>COMPUTE STATS</codeph> statement works with partitioned tables, whether all the partitions use
the same file format, or some partitions are defined through <codeph>ALTER TABLE</codeph> to use different
file formats.
</p>
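<p>
A minimal sketch of the mixed-format case follows. It assumes a hypothetical table <codeph>logs</codeph>,
partitioned by <codeph>year</codeph>, whose 2020 partition already holds Parquet data files.
</p>

<codeblock>-- Mark one partition as using a different file format from the rest of the table.
-- This changes only metadata; the partition's data files must already be in that format.
ALTER TABLE logs PARTITION (year=2020) SET FILEFORMAT PARQUET;

-- COMPUTE STATS still covers all partitions, whatever their file format.
COMPUTE STATS logs;
</codeblock>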
<p conref="../shared/impala_common.xml#common/ddl_blurb"/>

<p conref="../shared/impala_common.xml#common/cancel_blurb_maybe"/>

<p conref="../shared/impala_common.xml#common/restrictions_blurb"/>

<note conref="../shared/impala_common.xml#common/compute_stats_nulls"/>

<p conref="../shared/impala_common.xml#common/internals_blurb"/>
<p>
Behind the scenes, the <codeph>COMPUTE STATS</codeph> statement
executes two statements: one to count the rows of each partition
in the table (or the entire table if unpartitioned) through the
<codeph>COUNT(*)</codeph> function,
and another to count the approximate number of distinct values
in each column through the <codeph>NDV()</codeph> function.
You might see these queries in your monitoring and diagnostic displays.
The same factors that affect the performance, scalability, and
execution of other queries (such as parallel execution, memory usage,
admission control, and timeouts) also apply to the queries run by the
<codeph>COMPUTE STATS</codeph> statement.
</p>
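<p>
The following sketch pictures the two kinds of queries for a hypothetical table <codeph>sales</codeph>
partitioned by <codeph>year</codeph>, with columns <codeph>id</codeph> and <codeph>amount</codeph>.
It is an approximation for illustration only. The SQL that Impala actually generates is not shown
here and can differ between releases.
</p>

<codeblock>-- Row counts for each partition.
SELECT year, COUNT(*) FROM sales GROUP BY year;

-- Approximate number of distinct values in each column.
SELECT NDV(id), NDV(amount) FROM sales;
</codeblock>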
<p conref="../shared/impala_common.xml#common/permissions_blurb"/>
<p rev="">
The user ID that the <cmdname>impalad</cmdname> daemon runs under,
typically the <codeph>impala</codeph> user, must have read
permission for all affected files in the source directory.
For <codeph>COMPUTE STATS</codeph>, this means all files in the table,
whether it is unpartitioned or partitioned. For <codeph>COMPUTE INCREMENTAL STATS</codeph>,
it means all files in the partitions that do not yet have incremental stats.
It must also have read and execute permissions for all
relevant directories holding the data files.
(Essentially, <codeph>COMPUTE STATS</codeph> requires the
same permissions as the underlying <codeph>SELECT</codeph> queries it runs
against the table.)
</p>
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>

<p rev="IMPALA-2830">
The <codeph>COMPUTE STATS</codeph> statement applies to Kudu tables.
Impala computes only the number of rows for the whole Kudu table;
partition-level row counts are not available.
</p>
<p conref="../shared/impala_common.xml#common/related_info"/>

<p>
<xref href="impala_drop_stats.xml#drop_stats"/>, <xref href="impala_show.xml#show_table_stats"/>,
<xref href="impala_show.xml#show_column_stats"/>, <xref href="impala_perf_stats.xml#perf_stats"/>
</p>
</conbody>
</concept>