mirror of
https://github.com/apache/impala.git
synced 2025-12-30 03:01:44 -05:00
For this change to land in master, the audience="hidden" code review needs to be completed first. Otherwise, the doc build would still work but the audience="hidden" content would be visible rather than hidden as desired. Some work happening in parallel might introduce additional instances of audience="Cloudera". I suggest addressing those in a followup CR so this global change can land quickly. Since the changes apply across so many different files, but are so narrow in scope, I suggest that the way to validate (check that no extraneous changes were introduced accidentally) is to diff just the changed lines: git diff -U0 HEAD^ HEAD In patch set 2, I updated other topics marked audience="Cloudera" by CRs that were pushed in the meantime. Change-Id: Ic93d89da77e1f51bbf548a522d98d0c4e2fb31c8 Reviewed-on: http://gerrit.cloudera.org:8080/5613 Reviewed-by: John Russell <jrussell@cloudera.com> Tested-by: Impala Public Jenkins
836 lines
37 KiB
XML
836 lines
37 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!--
|
|
Licensed to the Apache Software Foundation (ASF) under one
|
|
or more contributor license agreements. See the NOTICE file
|
|
distributed with this work for additional information
|
|
regarding copyright ownership. The ASF licenses this file
|
|
to you under the Apache License, Version 2.0 (the
|
|
"License"); you may not use this file except in compliance
|
|
with the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing,
|
|
software distributed under the License is distributed on an
|
|
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
|
KIND, either express or implied. See the License for the
|
|
specific language governing permissions and limitations
|
|
under the License.
|
|
-->
|
|
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
|
|
<concept rev="1.4.0" id="decimal">
|
|
|
|
<title>DECIMAL Data Type (<keyword keyref="impala14"/> or higher only)</title>
|
|
<titlealts audience="PDF"><navtitle>DECIMAL</navtitle></titlealts>
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Impala"/>
|
|
<data name="Category" value="Impala Data Types"/>
|
|
<data name="Category" value="SQL"/>
|
|
<data name="Category" value="Data Analysts"/>
|
|
<data name="Category" value="Developers"/>
|
|
<data name="Category" value="Schemas"/>
|
|
</metadata>
|
|
</prolog>
|
|
|
|
<conbody>
|
|
|
|
<p>
|
|
A numeric data type with fixed scale and precision, used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER
|
|
TABLE</codeph> statements. Suitable for financial and other arithmetic calculations where the imprecise
|
|
representation and rounding behavior of <codeph>FLOAT</codeph> and <codeph>DOUBLE</codeph> make those types
|
|
impractical.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>
|
|
|
|
<p>
|
|
In the column definition of a <codeph>CREATE TABLE</codeph> statement:
|
|
</p>
|
|
|
|
<codeblock><varname>column_name</varname> DECIMAL[(<varname>precision</varname>[,<varname>scale</varname>])]</codeblock>
|
|
|
|
<p>
|
|
<codeph>DECIMAL</codeph> with no precision or scale values is equivalent to <codeph>DECIMAL(9,0)</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
<b>Precision and Scale:</b>
|
|
</p>
|
|
|
|
<p>
|
|
<varname>precision</varname> represents the total number of digits that can be represented by the column,
|
|
regardless of the location of the decimal point. This value must be between 1 and 38. For example,
|
|
representing integer values up to 9999, and floating-point values up to 99.99, both require a precision of 4.
|
|
You can also represent corresponding negative values, without any change in the precision. For example, the
|
|
range -9999 to 9999 still only requires a precision of 4.
|
|
</p>
|
|
|
|
<p>
|
|
<varname>scale</varname> represents the number of fractional digits. This value must be less than or equal to
|
|
<varname>precision</varname>. A scale of 0 produces integral values, with no fractional part. If precision
|
|
and scale are equal, all the digits come after the decimal point, making all the values between 0 and
|
|
0.999... or 0 and -0.999...
|
|
</p>
|
|
|
|
<p>
|
|
When <varname>precision</varname> and <varname>scale</varname> are omitted, a <codeph>DECIMAL</codeph> value
|
|
is treated as <codeph>DECIMAL(9,0)</codeph>, that is, an integer value ranging from
|
|
<codeph>-999,999,999</codeph> to <codeph>999,999,999</codeph>. This is the largest <codeph>DECIMAL</codeph>
|
|
value that can still be represented in 4 bytes. If precision is specified but scale is omitted, Impala uses a
|
|
value of zero for the scale.
|
|
</p>
|
|
|
|
<p>
|
|
Both <varname>precision</varname> and <varname>scale</varname> must be specified as integer literals, not any
|
|
other kind of constant expressions.
|
|
</p>
|
|
|
|
<p>
|
|
To check the precision or scale for arbitrary values, you can call the
|
|
<xref href="impala_math_functions.xml#math_functions"><codeph>precision()</codeph> and
|
|
<codeph>scale()</codeph> built-in functions</xref>. For example, you might use these values to figure out how
|
|
many characters are required for various fields in a report, or to understand the rounding characteristics of
|
|
a formula as applied to a particular <codeph>DECIMAL</codeph> column.
|
|
</p>
|
|
|
|
<p>
|
|
<b>Range:</b>
|
|
</p>
|
|
|
|
<p>
|
|
The maximum precision value is 38. Thus, the largest integral value is represented by
|
|
<codeph>DECIMAL(38,0)</codeph> (999... with 9 repeated 38 times). The most precise fractional value (between
|
|
0 and 1, or 0 and -1) is represented by <codeph>DECIMAL(38,38)</codeph>, with 38 digits to the right of the
|
|
decimal point. The value closest to 0 would be .0000...1 (37 zeros and the final 1). The value closest to 1
|
|
would be .999... (9 repeated 38 times).
|
|
</p>
|
|
|
|
<p>
|
|
For a given precision and scale, the range of <codeph>DECIMAL</codeph> values is the same in the positive and
|
|
negative directions. For example, <codeph>DECIMAL(4,2)</codeph> can represent from -99.99 to 99.99. This is
|
|
different from other integral numeric types where the positive and negative bounds differ slightly.
|
|
</p>
|
|
|
|
<p>
|
|
When you use <codeph>DECIMAL</codeph> values in arithmetic expressions, the precision and scale of the result
|
|
value are determined as follows:
|
|
</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>
|
|
For addition and subtraction, the precision and scale are based on the maximum possible result, that is,
|
|
if all the digits of the input values were 9s and the absolute values were added together.
|
|
</p>
|
|
<!-- Seems like buggy output from this first query, so hiding the example for the time being. -->
|
|
<codeblock audience="hidden"><![CDATA[[localhost:21000] > select 50000.5 + 12.444, precision(50000.5 + 12.444), scale(50000.5 + 12.444);
|
|
+------------------+-----------------------------+-------------------------+
|
|
| 50000.5 + 12.444 | precision(50000.5 + 12.444) | scale(50000.5 + 12.444) |
|
|
+------------------+-----------------------------+-------------------------+
|
|
| 50012.944 | 9 | 3 |
|
|
+------------------+-----------------------------+-------------------------+
|
|
[localhost:21000] > select 99999.9 + 99.999, precision(99999.9 + 99.999), scale(99999.9 + 99.999);
|
|
+------------------+-----------------------------+-------------------------+
|
|
| 99999.9 + 99.999 | precision(99999.9 + 99.999) | scale(99999.9 + 99.999) |
|
|
+------------------+-----------------------------+-------------------------+
|
|
| 100099.899 | 9 | 3 |
|
|
+------------------+-----------------------------+-------------------------+
|
|
]]>
|
|
</codeblock>
|
|
</li>
|
|
|
|
<li>
|
|
<p>
|
|
For multiplication, the precision is the sum of the precisions of the input values. The scale is the sum
|
|
of the scales of the input values.
|
|
</p>
|
|
</li>
|
|
|
|
<!-- Need to add some specifics to discussion of division. Details here: http://blogs.msdn.com/b/sqlprogrammability/archive/2006/03/29/564110.aspx -->
|
|
|
|
<li>
|
|
<p>
|
|
For division, Impala sets the precision and scale to values large enough to represent the whole and
|
|
fractional parts of the result.
|
|
</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>
|
|
For <codeph>UNION</codeph>, the scale is the larger of the scales of the input values, and the precision
|
|
is increased if necessary to accommodate any additional fractional digits. If the same input value has
|
|
the largest precision and the largest scale, the result value has the same precision and scale. If one
|
|
value has a larger precision but smaller scale, the scale of the result value is increased. For example,
|
|
<codeph>DECIMAL(20,2) UNION DECIMAL(8,6)</codeph> produces a result of type
|
|
<codeph>DECIMAL(24,6)</codeph>. The extra 4 fractional digits of scale (6-2) are accommodated by
|
|
extending the precision by the same amount (20+4).
|
|
</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>
|
|
To doublecheck, you can always call the <codeph>PRECISION()</codeph> and <codeph>SCALE()</codeph>
|
|
functions on the results of an arithmetic expression to see the relevant values, or use a <codeph>CREATE
|
|
TABLE AS SELECT</codeph> statement to define a column based on the return type of the expression.
|
|
</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
|
|
|
|
<ul>
|
|
<li>
|
|
Using the <codeph>DECIMAL</codeph> type is only supported under <keyword keyref="impala14_full"/> and higher.
|
|
</li>
|
|
|
|
<li>
|
|
Use the <codeph>DECIMAL</codeph> data type in Impala for applications where you used the
|
|
<codeph>NUMBER</codeph> data type in Oracle. The Impala <codeph>DECIMAL</codeph> type does not support the
|
|
Oracle idioms of <codeph>*</codeph> for scale or negative values for precision.
|
|
</li>
|
|
</ul>
|
|
|
|
<p>
|
|
<b>Conversions and casting:</b>
|
|
</p>
|
|
|
|
<p>
|
|
<ph conref="../shared/impala_common.xml#common/cast_int_to_timestamp"/>
|
|
</p>
|
|
|
|
<p>
|
|
Impala automatically converts between <codeph>DECIMAL</codeph> and other numeric types where possible. A
|
|
<codeph>DECIMAL</codeph> with zero scale is converted to or from the smallest appropriate integral type. A
|
|
<codeph>DECIMAL</codeph> with a fractional part is automatically converted to or from the smallest
|
|
appropriate floating-point type. If the destination type does not have sufficient precision or scale to hold
|
|
all possible values of the source type, Impala raises an error and does not convert the value.
|
|
</p>
|
|
|
|
<p>
|
|
For example, these statements show how expressions of <codeph>DECIMAL</codeph> and other types are reconciled
|
|
to the same type in the context of <codeph>UNION</codeph> queries and <codeph>INSERT</codeph> statements:
|
|
</p>
|
|
|
|
<codeblock><![CDATA[[localhost:21000] > select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
|
|
+----------------+
|
|
| x |
|
|
+----------------+
|
|
| 1.5000 |
|
|
| 1.0000 |
|
|
+----------------+
|
|
[localhost:21000] > create table int_vs_decimal as select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
|
|
+-------------------+
|
|
| summary |
|
|
+-------------------+
|
|
| Inserted 2 row(s) |
|
|
+-------------------+
|
|
[localhost:21000] > desc int_vs_decimal;
|
|
+------+---------------+---------+
|
|
| name | type | comment |
|
|
+------+---------------+---------+
|
|
| x | decimal(14,4) | |
|
|
+------+---------------+---------+
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
To avoid potential conversion errors, you can use <codeph>CAST()</codeph> to convert <codeph>DECIMAL</codeph>
|
|
values to <codeph>FLOAT</codeph>, <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, <codeph>INT</codeph>,
|
|
<codeph>BIGINT</codeph>, <codeph>STRING</codeph>, <codeph>TIMESTAMP</codeph>, or <codeph>BOOLEAN</codeph>.
|
|
You can use exponential notation in <codeph>DECIMAL</codeph> literals or when casting from
|
|
<codeph>STRING</codeph>, for example <codeph>1.0e6</codeph> to represent one million.
|
|
</p>
|
|
|
|
<p>
|
|
If you cast a value with more fractional digits than the scale of the destination type, any extra fractional
|
|
digits are truncated (not rounded). Casting a value to a target type with not enough precision produces a
|
|
result of <codeph>NULL</codeph> and displays a runtime warning.
|
|
</p>
|
|
|
|
<codeblock><![CDATA[[localhost:21000] > select cast(1.239 as decimal(3,2));
|
|
+-----------------------------+
|
|
| cast(1.239 as decimal(3,2)) |
|
|
+-----------------------------+
|
|
| 1.23 |
|
|
+-----------------------------+
|
|
[localhost:21000] > select cast(1234 as decimal(3));
|
|
+----------------------------+
|
|
| cast(1234 as decimal(3,0)) |
|
|
+----------------------------+
|
|
| NULL |
|
|
+----------------------------+
|
|
WARNINGS: Expression overflowed, returning NULL
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
When you specify integer literals, for example in <codeph>INSERT ... VALUES</codeph> statements or arithmetic
|
|
expressions, those numbers are interpreted as the smallest applicable integer type. You must use
|
|
<codeph>CAST()</codeph> calls for some combinations of integer literals and <codeph>DECIMAL</codeph>
|
|
precision. For example, <codeph>INT</codeph> has a maximum value that is 10 digits long,
|
|
<codeph>TINYINT</codeph> has a maximum value that is 3 digits long, and so on. If you specify a value such as
|
|
123456 to go into a <codeph>DECIMAL</codeph> column, Impala checks if the column has enough precision to
|
|
represent the largest value of that integer type, and raises an error if not. Therefore, use an expression
|
|
like <codeph>CAST(123456 TO DECIMAL(9,0))</codeph> for <codeph>DECIMAL</codeph> columns with precision 9 or
|
|
less, <codeph>CAST(50 TO DECIMAL(2,0))</codeph> for <codeph>DECIMAL</codeph> columns with precision 2 or
|
|
less, and so on. For <codeph>DECIMAL</codeph> columns with precision 10 or greater, Impala automatically
|
|
interprets the value as the correct <codeph>DECIMAL</codeph> type; however, because
|
|
<codeph>DECIMAL(10)</codeph> requires 8 bytes of storage while <codeph>DECIMAL(9)</codeph> requires only 4
|
|
bytes, only use precision of 10 or higher when actually needed.
|
|
</p>
|
|
|
|
<codeblock><![CDATA[[localhost:21000] > create table decimals_9_0 (x decimal);
|
|
[localhost:21000] > insert into decimals_9_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);
|
|
ERROR: AnalysisException: Possible loss of precision for target table 'decimal_testing.decimals_9_0'.
|
|
Expression '1' (type: INT) would need to be cast to DECIMAL(9,0) for column 'x'
|
|
[localhost:21000] > insert into decimals_9_0 values (cast(1 as decimal)), (cast(2 as decimal)), (cast(4 as decimal)), (cast(8 as decimal)), (cast(16 as decimal)), (cast(1024 as decimal)), (cast(32768 as decimal)), (cast(65536 as decimal)), (cast(1000000 as decimal));
|
|
|
|
[localhost:21000] > create table decimals_10_0 (x decimal(10,0));
|
|
[localhost:21000] > insert into decimals_10_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);
|
|
]]>
|
|
</codeblock>
|
|
|
|
<p>
|
|
Be aware that in memory and for binary file formats such as Parquet or Avro, <codeph>DECIMAL(10)</codeph> or
|
|
higher consumes 8 bytes while <codeph>DECIMAL(9)</codeph> (the default for <codeph>DECIMAL</codeph>) or lower
|
|
consumes 4 bytes. Therefore, to conserve space in large tables, use the smallest-precision
|
|
<codeph>DECIMAL</codeph> type that is appropriate and <codeph>CAST()</codeph> literal values where necessary,
|
|
rather than declaring <codeph>DECIMAL</codeph> columns with high precision for convenience.
|
|
</p>
|
|
|
|
<p>
|
|
To represent a very large or precise <codeph>DECIMAL</codeph> value as a literal, for example one that
|
|
contains more digits than can be represented by a <codeph>BIGINT</codeph> literal, use a quoted string or a
|
|
floating-point value for the number, and <codeph>CAST()</codeph> to the desired <codeph>DECIMAL</codeph>
|
|
type:
|
|
</p>
|
|
|
|
<codeblock>insert into decimals_38_5 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000),
|
|
(cast("999999999999999999999999999999" as decimal(38,5))),
|
|
(cast(999999999999999999999999999999. as decimal(38,5)));
|
|
</codeblock>
|
|
|
|
<ul>
|
|
<li>
|
|
<p> The result of the <codeph>SUM()</codeph> aggregate function on
|
|
<codeph>DECIMAL</codeph> values is promoted to a precision of 38,
|
|
with the same precision as the underlying column. Thus, the result can
|
|
represent the largest possible value at that particular precision. </p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>
|
|
<codeph>STRING</codeph> columns, literals, or expressions can be converted to <codeph>DECIMAL</codeph> as
|
|
long as the overall number of digits and digits to the right of the decimal point fit within the
|
|
specified precision and scale for the declared <codeph>DECIMAL</codeph> type. By default, a
|
|
<codeph>DECIMAL</codeph> value with no specified scale or precision can hold a maximum of 9 digits of an
|
|
integer value. If there are more digits in the string value than are allowed by the
|
|
<codeph>DECIMAL</codeph> scale and precision, the result is <codeph>NULL</codeph>.
|
|
</p>
|
|
<p>
|
|
The following examples demonstrate how <codeph>STRING</codeph> values with integer and fractional parts
|
|
are represented when converted to <codeph>DECIMAL</codeph>. If the scale is 0, the number is treated
|
|
as an integer value with a maximum of <varname>precision</varname> digits. If the precision is greater than
|
|
0, the scale must be increased to account for the digits both to the left and right of the decimal point.
|
|
As the precision increases, output values are printed with additional trailing zeros after the decimal
|
|
point if needed. Any trailing zeros after the decimal point in the <codeph>STRING</codeph> value must fit
|
|
within the number of digits specified by the precision.
|
|
</p>
|
|
<codeblock><![CDATA[[localhost:21000] > select cast('100' as decimal); -- Small integer value fits within 9 digits of scale.
|
|
+-----------------------------+
|
|
| cast('100' as decimal(9,0)) |
|
|
+-----------------------------+
|
|
| 100 |
|
|
+-----------------------------+
|
|
[localhost:21000] > select cast('100' as decimal(3,0)); -- Small integer value fits within 3 digits of scale.
|
|
+-----------------------------+
|
|
| cast('100' as decimal(3,0)) |
|
|
+-----------------------------+
|
|
| 100 |
|
|
+-----------------------------+
|
|
[localhost:21000] > select cast('100' as decimal(2,0)); -- 2 digits of scale is not enough!
|
|
+-----------------------------+
|
|
| cast('100' as decimal(2,0)) |
|
|
+-----------------------------+
|
|
| NULL |
|
|
+-----------------------------+
|
|
[localhost:21000] > select cast('100' as decimal(3,1)); -- (3,1) = 2 digits left of the decimal point, 1 to the right. Not enough.
|
|
+-----------------------------+
|
|
| cast('100' as decimal(3,1)) |
|
|
+-----------------------------+
|
|
| NULL |
|
|
+-----------------------------+
|
|
[localhost:21000] > select cast('100' as decimal(4,1)); -- 4 digits total, 1 to the right of the decimal point.
|
|
+-----------------------------+
|
|
| cast('100' as decimal(4,1)) |
|
|
+-----------------------------+
|
|
| 100.0 |
|
|
+-----------------------------+
|
|
[localhost:21000] > select cast('98.6' as decimal(3,1)); -- (3,1) can hold a 3 digit number with 1 fractional digit.
|
|
+------------------------------+
|
|
| cast('98.6' as decimal(3,1)) |
|
|
+------------------------------+
|
|
| 98.6 |
|
|
+------------------------------+
|
|
[localhost:21000] > select cast('98.6' as decimal(15,1)); -- Larger scale allows bigger numbers but still only 1 fractional digit.
|
|
+-------------------------------+
|
|
| cast('98.6' as decimal(15,1)) |
|
|
+-------------------------------+
|
|
| 98.6 |
|
|
+-------------------------------+
|
|
[localhost:21000] > select cast('98.6' as decimal(15,5)); -- Larger precision allows more fractional digits, outputs trailing zeros.
|
|
+-------------------------------+
|
|
| cast('98.6' as decimal(15,5)) |
|
|
+-------------------------------+
|
|
| 98.60000 |
|
|
+-------------------------------+
|
|
[localhost:21000] > select cast('98.60000' as decimal(15,1)); -- Trailing zeros in the string must fit within 'scale' digits (1 in this case).
|
|
+-----------------------------------+
|
|
| cast('98.60000' as decimal(15,1)) |
|
|
+-----------------------------------+
|
|
| NULL |
|
|
+-----------------------------------+
|
|
]]>
|
|
</codeblock>
|
|
</li>
|
|
|
|
<li>
|
|
Most built-in arithmetic functions such as <codeph>SIN()</codeph> and <codeph>COS()</codeph> continue to
|
|
accept only <codeph>DOUBLE</codeph> values because they are so commonly used in scientific context for
|
|
calculations of IEEE 954-compliant values. The built-in functions that accept and return
|
|
<codeph>DECIMAL</codeph> are:
|
|
<!-- List from Skye: positive, negative, least, greatest, fnv_hash, if, nullif, zeroifnull, isnull, coalesce -->
|
|
<!-- Nong had already told me about abs, ceil, floor, round, truncate -->
|
|
<ul>
|
|
<li>
|
|
<codeph>ABS()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>CEIL()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>COALESCE()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>FLOOR()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>FNV_HASH()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>GREATEST()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>IF()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>ISNULL()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>LEAST()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>NEGATIVE()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>NULLIF()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>POSITIVE()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>PRECISION()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>ROUND()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>SCALE()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>TRUNCATE()</codeph>
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>ZEROIFNULL()</codeph>
|
|
</li>
|
|
</ul>
|
|
See <xref href="impala_functions.xml#builtins"/> for details.
|
|
</li>
|
|
|
|
<li>
|
|
<p>
|
|
<codeph>BIGINT</codeph>, <codeph>INT</codeph>, <codeph>SMALLINT</codeph>, and <codeph>TINYINT</codeph>
|
|
values can all be cast to <codeph>DECIMAL</codeph>. The number of digits to the left of the decimal point
|
|
in the <codeph>DECIMAL</codeph> type must be sufficient to hold the largest value of the corresponding
|
|
integer type. Note that integer literals are treated as the smallest appropriate integer type, meaning
|
|
there is sometimes a range of values that require one more digit of <codeph>DECIMAL</codeph> scale than
|
|
you might expect. For integer values, the precision of the <codeph>DECIMAL</codeph> type can be zero; if
|
|
the precision is greater than zero, remember to increase the scale value by an equivalent amount to hold
|
|
the required number of digits to the left of the decimal point.
|
|
</p>
|
|
<p>
|
|
The following examples show how different integer types are converted to <codeph>DECIMAL</codeph>.
|
|
</p>
|
|
<!-- According to Nong, it's a bug that so many integer digits can be converted to a DECIMAL
|
|
value with small (s,p) spec. So expect to re-do this example. -->
|
|
<codeblock><![CDATA[[localhost:21000] > select cast(1 as decimal(1,0));
|
|
+-------------------------+
|
|
| cast(1 as decimal(1,0)) |
|
|
+-------------------------+
|
|
| 1 |
|
|
+-------------------------+
|
|
[localhost:21000] > select cast(9 as decimal(1,0));
|
|
+-------------------------+
|
|
| cast(9 as decimal(1,0)) |
|
|
+-------------------------+
|
|
| 9 |
|
|
+-------------------------+
|
|
[localhost:21000] > select cast(10 as decimal(1,0));
|
|
+--------------------------+
|
|
| cast(10 as decimal(1,0)) |
|
|
+--------------------------+
|
|
| 10 |
|
|
+--------------------------+
|
|
[localhost:21000] > select cast(10 as decimal(1,1));
|
|
+--------------------------+
|
|
| cast(10 as decimal(1,1)) |
|
|
+--------------------------+
|
|
| 10.0 |
|
|
+--------------------------+
|
|
[localhost:21000] > select cast(100 as decimal(1,1));
|
|
+---------------------------+
|
|
| cast(100 as decimal(1,1)) |
|
|
+---------------------------+
|
|
| 100.0 |
|
|
+---------------------------+
|
|
[localhost:21000] > select cast(1000 as decimal(1,1));
|
|
+----------------------------+
|
|
| cast(1000 as decimal(1,1)) |
|
|
+----------------------------+
|
|
| 1000.0 |
|
|
+----------------------------+
|
|
]]>
|
|
</codeblock>
|
|
</li>
|
|
|
|
<li>
|
|
<p>
|
|
When a <codeph>DECIMAL</codeph> value is converted to any of the integer types, any fractional part is
|
|
truncated (that is, rounded towards zero):
|
|
</p>
|
|
<codeblock><![CDATA[[localhost:21000] > create table num_dec_days (x decimal(4,1));
|
|
[localhost:21000] > insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1)));
|
|
[localhost:21000] > insert into num_dec_days values (cast(0.1 as decimal(4,1))), (cast(.9 as decimal(4,1))), (cast(9.1 as decimal(4,1))), (cast(9.9 as decimal(4,1)));
|
|
[localhost:21000] > select cast(x as int) from num_dec_days;
|
|
+----------------+
|
|
| cast(x as int) |
|
|
+----------------+
|
|
| 1 |
|
|
| 2 |
|
|
| 4 |
|
|
| 0 |
|
|
| 0 |
|
|
| 9 |
|
|
| 9 |
|
|
+----------------+
|
|
]]>
|
|
</codeblock>
|
|
</li>
|
|
|
|
<li>
|
|
<p>
|
|
You cannot directly cast <codeph>TIMESTAMP</codeph> or <codeph>BOOLEAN</codeph> values to or from
|
|
<codeph>DECIMAL</codeph> values. You can turn a <codeph>DECIMAL</codeph> value into a time-related
|
|
representation using a two-step process, by converting it to an integer value and then using that result
|
|
in a call to a date and time function such as <codeph>from_unixtime()</codeph>.
|
|
</p>
|
|
<codeblock><![CDATA[[localhost:21000] > select from_unixtime(cast(cast(1000.0 as decimal) as bigint));
|
|
+-------------------------------------------------------------+
|
|
| from_unixtime(cast(cast(1000.0 as decimal(9,0)) as bigint)) |
|
|
+-------------------------------------------------------------+
|
|
| 1970-01-01 00:16:40 |
|
|
+-------------------------------------------------------------+
|
|
[localhost:21000] > select now() + interval cast(x as int) days from num_dec_days; -- x is a DECIMAL column.
|
|
|
|
[localhost:21000] > create table num_dec_days (x decimal(4,1));
|
|
[localhost:21000] > insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1)));
|
|
[localhost:21000] > select now() + interval cast(x as int) days from num_dec_days; -- The 4.5 value is truncated to 4 and becomes '4 days'.
|
|
+--------------------------------------+
|
|
| now() + interval cast(x as int) days |
|
|
+--------------------------------------+
|
|
| 2014-05-13 23:11:55.163284000 |
|
|
| 2014-05-14 23:11:55.163284000 |
|
|
| 2014-05-16 23:11:55.163284000 |
|
|
+--------------------------------------+
|
|
]]>
|
|
</codeblock>
|
|
</li>
|
|
|
|
<li>
|
|
<p>
|
|
Because values in <codeph>INSERT</codeph> statements are checked rigorously for type compatibility, be
|
|
prepared to use <codeph>CAST()</codeph> function calls around literals, column references, or other
|
|
expressions that you are inserting into a <codeph>DECIMAL</codeph> column.
|
|
</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p conref="../shared/impala_common.xml#common/null_bad_numeric_cast"/>
|
|
|
|
<p>
|
|
<b>DECIMAL differences from integer and floating-point types:</b>
|
|
</p>
|
|
|
|
<p>
|
|
With the <codeph>DECIMAL</codeph> type, you are concerned with the number of overall digits of a number
|
|
rather than powers of 2 (as in <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, and so on). Therefore,
|
|
the limits with integral values of <codeph>DECIMAL</codeph> types fall around 99, 999, 9999, and so on rather
|
|
than 32767, 65535, 2
|
|
<sup>32</sup>
|
|
-1, and so on. For fractional values, you do not need to account for imprecise representation of the
|
|
fractional part according to the IEEE-954 standard (as in <codeph>FLOAT</codeph> and
|
|
<codeph>DOUBLE</codeph>). Therefore, when you insert a fractional value into a <codeph>DECIMAL</codeph>
|
|
column, you can compare, sum, query, <codeph>GROUP BY</codeph>, and so on that column and get back the
|
|
original values rather than some <q>close but not identical</q> value.
|
|
</p>
|
|
|
|
<p>
|
|
<codeph>FLOAT</codeph> and <codeph>DOUBLE</codeph> can cause problems or unexpected behavior due to inability
|
|
to precisely represent certain fractional values, for example dollar and cents values for currency. You might
|
|
find output values slightly different than you inserted, equality tests that do not match precisely, or
|
|
unexpected values for <codeph>GROUP BY</codeph> columns. <codeph>DECIMAL</codeph> can help reduce unexpected
|
|
behavior and rounding errors, at the expense of some performance overhead for assignments and comparisons.
|
|
</p>
|
|
|
|
<p>
|
|
<b>Literals and expressions:</b>
|
|
<ul>
|
|
<li>
|
|
<p>
|
|
When you use an integer literal such as <codeph>1</codeph> or <codeph>999</codeph> in a SQL statement,
|
|
depending on the context, Impala will treat it as either the smallest appropriate
|
|
<codeph>DECIMAL</codeph> type, or the smallest integer type (<codeph>TINYINT</codeph>,
|
|
<codeph>SMALLINT</codeph>, <codeph>INT</codeph>, or <codeph>BIGINT</codeph>). To minimize memory usage,
|
|
Impala prefers to treat the literal as the smallest appropriate integer type.
|
|
</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>
|
|
When you use a floating-point literal such as <codeph>1.1</codeph> or <codeph>999.44</codeph> in a SQL
|
|
statement, depending on the context, Impala will treat it as either the smallest appropriate
|
|
<codeph>DECIMAL</codeph> type, or the smallest floating-point type (<codeph>FLOAT</codeph> or
|
|
<codeph>DOUBLE</codeph>). To avoid loss of accuracy, Impala prefers to treat the literal as a
|
|
<codeph>DECIMAL</codeph>.
|
|
</p>
|
|
</li>
|
|
</ul>
|
|
</p>
|
|
|
|
<p>
|
|
<b>Storage considerations:</b>
|
|
</p>
|
|
|
|
<ul>
|
|
<li>
|
|
Only the precision determines the storage size for <codeph>DECIMAL</codeph> values; the scale setting has
|
|
no effect on the storage size.
|
|
</li>
|
|
|
|
<li>
|
|
Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these text-based file formats,
|
|
leading zeros are not stored, but trailing zeros are stored. In these tables, each <codeph>DECIMAL</codeph>
|
|
value takes up as many bytes as there are digits in the value, plus an extra byte if the decimal point is
|
|
present and an extra byte for negative values. Once the values are loaded into memory, they are represented
|
|
in 4, 8, or 16 bytes as described in the following list items. The on-disk representation varies depending
|
|
on the file format of the table.
|
|
</li>
|
|
|
|
<!-- Next couple of points can be conref'ed with identical list bullets farther down under File Format Considerations. -->
|
|
|
|
<li>
|
|
Parquet and Avro tables use binary formats, In these tables, Impala stores each value in as few bytes as
|
|
possible
|
|
<!-- 4, 8, or 16 bytes -->
|
|
depending on the precision specified for the <codeph>DECIMAL</codeph> column.
|
|
<ul>
|
|
<li>
|
|
In memory, <codeph>DECIMAL</codeph> values with precision of 9 or less are stored in 4 bytes.
|
|
</li>
|
|
|
|
<li>
|
|
In memory, <codeph>DECIMAL</codeph> values with precision of 10 through 18 are stored in 8 bytes.
|
|
</li>
|
|
|
|
<li>
|
|
In memory, <codeph>DECIMAL</codeph> values with precision greater than 18 are stored in 16 bytes.
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
|
|
<p conref="../shared/impala_common.xml#common/file_format_blurb"/>
|
|
|
|
<ul>
|
|
<li>
|
|
The <codeph>DECIMAL</codeph> data type can be stored in any of the file formats supported by Impala, as
|
|
described in <xref href="impala_file_formats.xml#file_formats"/>. Impala only writes to tables that use the
|
|
Parquet and text formats, so those formats are the focus for file format compatibility.
|
|
</li>
|
|
|
|
<li>
|
|
Impala can query Avro, RCFile, or SequenceFile tables containing <codeph>DECIMAL</codeph> columns, created
|
|
by other Hadoop components, on CDH 5 only.
|
|
</li>
|
|
|
|
<li>
|
|
You can use <codeph>DECIMAL</codeph> columns in Impala tables that are mapped to HBase tables. Impala can
|
|
query and insert into such tables.
|
|
</li>
|
|
|
|
<li>
|
|
Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these tables, each
|
|
<codeph>DECIMAL</codeph> value takes up as many bytes as there are digits in the value, plus an extra byte
|
|
if the decimal point is present. The binary format of Parquet or Avro files offers more compact storage for
|
|
<codeph>DECIMAL</codeph> columns.
|
|
</li>
|
|
|
|
<li>
|
|
Parquet and Avro tables use binary formats, In these tables, Impala stores each value in 4, 8, or 16 bytes
|
|
depending on the precision specified for the <codeph>DECIMAL</codeph> column.
|
|
</li>
|
|
|
|
<li>
|
|
Parquet files containing <codeph>DECIMAL</codeph> columns are not expected to be readable under CDH 4. See
|
|
the <b>Compatibility</b> section for details.
|
|
</li>
|
|
</ul>
|
|
|
|
<p>
|
|
<b>UDF considerations:</b> When writing a C++ UDF, use the <codeph>DecimalVal</codeph> data type defined in
|
|
<filepath>/usr/include/impala_udf/udf.h</filepath>.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/partitioning_blurb"/>
|
|
|
|
<p>
|
|
You can use a <codeph>DECIMAL</codeph> column as a partition key. Doing so provides a better match between
|
|
the partition key values and the HDFS directory names than using a <codeph>DOUBLE</codeph> or
|
|
<codeph>FLOAT</codeph> partitioning column:
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/schema_evolution_blurb"/>
|
|
|
|
<ul>
|
|
<li>
|
|
For text-based formats (text, RCFile, and SequenceFile tables), you can issue an <codeph>ALTER TABLE ...
|
|
REPLACE COLUMNS</codeph> statement to change the precision and scale of an existing
|
|
<codeph>DECIMAL</codeph> column. As long as the values in the column fit within the new precision and
|
|
scale, they are returned correctly by a query. Any values that do not fit within the new precision and
|
|
scale are returned as <codeph>NULL</codeph>, and Impala reports the conversion error. Leading zeros do not
|
|
count against the precision value, but trailing zeros after the decimal point do.
|
|
<codeblock><![CDATA[[localhost:21000] > create table text_decimals (x string);
|
|
[localhost:21000] > insert into text_decimals values ("1"), ("2"), ("99.99"), ("1.234"), ("000001"), ("1.000000000");
|
|
[localhost:21000] > select * from text_decimals;
|
|
+-------------+
|
|
| x |
|
|
+-------------+
|
|
| 1 |
|
|
| 2 |
|
|
| 99.99 |
|
|
| 1.234 |
|
|
| 000001 |
|
|
| 1.000000000 |
|
|
+-------------+
|
|
[localhost:21000] > alter table text_decimals replace columns (x decimal(4,2));
|
|
[localhost:21000] > select * from text_decimals;
|
|
+-------+
|
|
| x |
|
|
+-------+
|
|
| 1.00 |
|
|
| 2.00 |
|
|
| 99.99 |
|
|
| NULL |
|
|
| 1.00 |
|
|
| NULL |
|
|
+-------+
|
|
ERRORS:
|
|
Backend 0:Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.234)
|
|
file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/634d4bd3aa0
|
|
e8420-b4b13bab7f1be787_56794587_data.0
|
|
record: 1.234
|
|
Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.000000000)
|
|
file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/cd40dc68e20
|
|
c565a-cc4bd86c724c96ba_311873428_data.0
|
|
record: 1.000000000
|
|
]]>
|
|
</codeblock>
|
|
</li>
|
|
|
|
<li>
|
|
For binary formats (Parquet and Avro tables), although an <codeph>ALTER TABLE ... REPLACE COLUMNS</codeph>
|
|
statement that changes the precision or scale of a <codeph>DECIMAL</codeph> column succeeds, any subsequent
|
|
attempt to query the changed column results in a fatal error. (The other columns can still be queried
|
|
successfully.) This is because the metadata about the columns is stored in the data files themselves, and
|
|
<codeph>ALTER TABLE</codeph> does not actually make any updates to the data files. If the metadata in the
|
|
data files disagrees with the metadata in the metastore database, Impala cancels the query.
|
|
</li>
|
|
</ul>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
|
|
|
<codeblock>CREATE TABLE t1 (x DECIMAL, y DECIMAL(5,2), z DECIMAL(25,0));
|
|
INSERT INTO t1 VALUES (5, 99.44, 123456), (300, 6.7, 999999999);
|
|
SELECT x+y, ROUND(y,1), z/98.6 FROM t1;
|
|
SELECT CAST(1000.5 AS DECIMAL);
|
|
</codeblock>
|
|
|
|
<p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/decimal_no_stats"/>
|
|
|
|
<!-- <p conref="../shared/impala_common.xml#common/partitioning_good"/> -->
|
|
|
|
<p conref="../shared/impala_common.xml#common/hbase_ok"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/parquet_ok"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/text_bulky"/>
|
|
|
|
<!-- <p conref="../shared/impala_common.xml#common/compatibility_blurb"/> -->
|
|
|
|
<!-- <p conref="../shared/impala_common.xml#common/internals_blurb"/> -->
|
|
|
|
<!-- <p conref="../shared/impala_common.xml#common/added_in_20"/> -->
|
|
|
|
<p conref="../shared/impala_common.xml#common/column_stats_constant"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/related_info"/>
|
|
|
|
<p>
|
|
<xref href="impala_literals.xml#numeric_literals"/>, <xref href="impala_tinyint.xml#tinyint"/>,
|
|
<xref href="impala_smallint.xml#smallint"/>, <xref href="impala_int.xml#int"/>,
|
|
<xref href="impala_bigint.xml#bigint"/>, <xref href="impala_decimal.xml#decimal"/>,
|
|
<xref href="impala_math_functions.xml#math_functions"/> (especially <codeph>PRECISION()</codeph> and
|
|
<codeph>SCALE()</codeph>)
|
|
</p>
|
|
</conbody>
|
|
</concept>
|