impala/docs/topics/impala_decimal.xml

<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="1.4.0" id="decimal">

  <title>DECIMAL Data Type (<keyword keyref="impala14"/> or higher only)</title>
  <titlealts audience="PDF"><navtitle>DECIMAL</navtitle></titlealts>
  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="Impala Data Types"/>
      <data name="Category" value="SQL"/>
      <data name="Category" value="Data Analysts"/>
      <data name="Category" value="Developers"/>
      <data name="Category" value="Schemas"/>
    </metadata>
  </prolog>

  <conbody>

    <p>
      A numeric data type with fixed scale and precision, used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER
      TABLE</codeph> statements. Suitable for financial and other arithmetic calculations where the imprecise
      representation and rounding behavior of <codeph>FLOAT</codeph> and <codeph>DOUBLE</codeph> make those types
      impractical.
    </p>

    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>

    <p>
      In the column definition of a <codeph>CREATE TABLE</codeph> statement:
    </p>

<codeblock><varname>column_name</varname> DECIMAL[(<varname>precision</varname>[,<varname>scale</varname>])]</codeblock>

    <p>
      <codeph>DECIMAL</codeph> with no precision or scale values is equivalent to <codeph>DECIMAL(9,0)</codeph>.
    </p>

    <p>
      <b>Precision and Scale:</b>
    </p>

    <p>
      <varname>precision</varname> represents the total number of digits that can be represented by the column,
      regardless of the location of the decimal point. This value must be between 1 and 38. For example,
      representing integer values up to 9999, and floating-point values up to 99.99, both require a precision of 4.
      You can also represent corresponding negative values, without any change in the precision. For example, the
      range -9999 to 9999 still only requires a precision of 4.
    </p>

    <p>
      <varname>scale</varname> represents the number of fractional digits. This value must be less than or equal to
      <varname>precision</varname>. A scale of 0 produces integral values, with no fractional part. If precision
      and scale are equal, all the digits come after the decimal point, making all the values between 0 and
      0.999... or 0 and -0.999...
    </p>

    <p>
      When <varname>precision</varname> and <varname>scale</varname> are omitted, a <codeph>DECIMAL</codeph> value
      is treated as <codeph>DECIMAL(9,0)</codeph>, that is, an integer value ranging from
      <codeph>-999,999,999</codeph> to <codeph>999,999,999</codeph>. This is the largest <codeph>DECIMAL</codeph>
      value that can still be represented in 4 bytes. If precision is specified but scale is omitted, Impala uses a
      value of zero for the scale.
    </p>

    <p>
      Both <varname>precision</varname> and <varname>scale</varname> must be specified as integer literals, not any
      other kind of constant expressions.
    </p>

    <p>
      To check the precision or scale for arbitrary values, you can call the
      <xref href="impala_math_functions.xml#math_functions"><codeph>precision()</codeph> and
      <codeph>scale()</codeph> built-in functions</xref>. For example, you might use these values to figure out how
      many characters are required for various fields in a report, or to understand the rounding characteristics of
      a formula as applied to a particular <codeph>DECIMAL</codeph> column.
    </p>

    <p>
      <b>Range:</b>
    </p>

    <p>
      The maximum precision value is 38. Thus, the largest integral value is represented by
      <codeph>DECIMAL(38,0)</codeph> (999... with 9 repeated 38 times). The most precise fractional value (between
      0 and 1, or 0 and -1) is represented by <codeph>DECIMAL(38,38)</codeph>, with 38 digits to the right of the
      decimal point. The value closest to 0 would be .0000...1 (37 zeros and the final 1). The value closest to 1
      would be .999... (9 repeated 38 times).
    </p>

    <p>
      For a given precision and scale, the range of <codeph>DECIMAL</codeph> values is the same in the positive and
      negative directions. For example, <codeph>DECIMAL(4,2)</codeph> can represent from -99.99 to 99.99. This is
      different from other integral numeric types where the positive and negative bounds differ slightly.
    </p>

    <p>
      When you use <codeph>DECIMAL</codeph> values in arithmetic expressions, the precision and scale of the result
      value are determined as follows:
    </p>

    <ul>
      <li>
        <p>
          For addition and subtraction, the precision and scale are based on the maximum possible result, that is,
          if all the digits of the input values were 9s and the absolute values were added together.
        </p>
<!-- Seems like buggy output from this first query, so hiding the example for the time being. -->
<codeblock audience="hidden"><![CDATA[[localhost:21000] > select 50000.5 + 12.444, precision(50000.5 + 12.444), scale(50000.5 + 12.444);
+------------------+-----------------------------+-------------------------+
| 50000.5 + 12.444 | precision(50000.5 + 12.444) | scale(50000.5 + 12.444) |
+------------------+-----------------------------+-------------------------+
| 50012.944        | 9                           | 3                       |
+------------------+-----------------------------+-------------------------+
[localhost:21000] > select 99999.9 + 99.999, precision(99999.9 + 99.999), scale(99999.9 + 99.999);
+------------------+-----------------------------+-------------------------+
| 99999.9 + 99.999 | precision(99999.9 + 99.999) | scale(99999.9 + 99.999) |
+------------------+-----------------------------+-------------------------+
| 100099.899       | 9                           | 3                       |
+------------------+-----------------------------+-------------------------+
]]>
</codeblock>
      </li>

      <li>
        <p>
          For multiplication, the precision is the sum of the precisions of the input values. The scale is the sum
          of the scales of the input values.
        </p>
      </li>

<!-- Need to add some specifics to discussion of division. Details here: http://blogs.msdn.com/b/sqlprogrammability/archive/2006/03/29/564110.aspx -->

      <li>
        <p>
          For division, Impala sets the precision and scale to values large enough to represent the whole and
          fractional parts of the result.
        </p>
      </li>

      <li>
        <p>
          For <codeph>UNION</codeph>, the scale is the larger of the scales of the input values, and the precision
          is increased if necessary to accommodate any additional fractional digits. If the same input value has
          the largest precision and the largest scale, the result value has the same precision and scale. If one
          value has a larger precision but smaller scale, the scale of the result value is increased. For example,
          <codeph>DECIMAL(20,2) UNION DECIMAL(8,6)</codeph> produces a result of type
          <codeph>DECIMAL(24,6)</codeph>. The extra 4 fractional digits of scale (6-2) are accommodated by
          extending the precision by the same amount (20+4).
        </p>
      </li>

      <li>
        <p>
          To doublecheck, you can always call the <codeph>PRECISION()</codeph> and <codeph>SCALE()</codeph>
          functions on the results of an arithmetic expression to see the relevant values, or use a <codeph>CREATE
          TABLE AS SELECT</codeph> statement to define a column based on the return type of the expression.
        </p>
      </li>
    </ul>

    <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>

    <ul>
      <li>
        Using the <codeph>DECIMAL</codeph> type is only supported under <keyword keyref="impala14_full"/> and higher.
      </li>

      <li>
        Use the <codeph>DECIMAL</codeph> data type in Impala for applications where you used the
        <codeph>NUMBER</codeph> data type in Oracle. The Impala <codeph>DECIMAL</codeph> type does not support the
        Oracle idioms of <codeph>*</codeph> for scale or negative values for precision.
      </li>
    </ul>

    <p>
      <b>Conversions and casting:</b>
    </p>

    <p>
      <ph conref="../shared/impala_common.xml#common/cast_int_to_timestamp"/>
    </p>

    <p>
      Impala automatically converts between <codeph>DECIMAL</codeph> and other numeric types where possible. A
      <codeph>DECIMAL</codeph> with zero scale is converted to or from the smallest appropriate integral type. A
      <codeph>DECIMAL</codeph> with a fractional part is automatically converted to or from the smallest
      appropriate floating-point type. If the destination type does not have sufficient precision or scale to hold
      all possible values of the source type, Impala raises an error and does not convert the value.
    </p>

    <p>
      For example, these statements show how expressions of <codeph>DECIMAL</codeph> and other types are reconciled
      to the same type in the context of <codeph>UNION</codeph> queries and <codeph>INSERT</codeph> statements:
    </p>

<codeblock><![CDATA[[localhost:21000] > select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
+----------------+
| x              |
+----------------+
| 1.5000         |
| 1.0000         |
+----------------+
[localhost:21000] > create table int_vs_decimal as select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
+-------------------+
| summary           |
+-------------------+
| Inserted 2 row(s) |
+-------------------+
[localhost:21000] > desc int_vs_decimal;
+------+---------------+---------+
| name | type          | comment |
+------+---------------+---------+
| x    | decimal(14,4) |         |
+------+---------------+---------+
]]>
</codeblock>

    <p>
      To avoid potential conversion errors, you can use <codeph>CAST()</codeph> to convert <codeph>DECIMAL</codeph>
      values to <codeph>FLOAT</codeph>, <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, <codeph>INT</codeph>,
      <codeph>BIGINT</codeph>, <codeph>STRING</codeph>, <codeph>TIMESTAMP</codeph>, or <codeph>BOOLEAN</codeph>.
      You can use exponential notation in <codeph>DECIMAL</codeph> literals or when casting from
      <codeph>STRING</codeph>, for example <codeph>1.0e6</codeph> to represent one million.
    </p>

    <p>
      If you cast a value with more fractional digits than the scale of the destination type, any extra fractional
      digits are truncated (not rounded). Casting a value to a target type with not enough precision produces a
      result of <codeph>NULL</codeph> and displays a runtime warning.
    </p>

<codeblock><![CDATA[[localhost:21000] > select cast(1.239 as decimal(3,2));
+-----------------------------+
| cast(1.239 as decimal(3,2)) |
+-----------------------------+
| 1.23                        |
+-----------------------------+
[localhost:21000] > select cast(1234 as decimal(3));
+----------------------------+
| cast(1234 as decimal(3,0)) |
+----------------------------+
| NULL                       |
+----------------------------+
WARNINGS: Expression overflowed, returning NULL
]]>
</codeblock>

    <p>
      When you specify integer literals, for example in <codeph>INSERT ... VALUES</codeph> statements or arithmetic
      expressions, those numbers are interpreted as the smallest applicable integer type. You must use
      <codeph>CAST()</codeph> calls for some combinations of integer literals and <codeph>DECIMAL</codeph>
      precision. For example, <codeph>INT</codeph> has a maximum value that is 10 digits long,
      <codeph>TINYINT</codeph> has a maximum value that is 3 digits long, and so on. If you specify a value such as
      123456 to go into a <codeph>DECIMAL</codeph> column, Impala checks if the column has enough precision to
      represent the largest value of that integer type, and raises an error if not. Therefore, use an expression
      like <codeph>CAST(123456 TO DECIMAL(9,0))</codeph> for <codeph>DECIMAL</codeph> columns with precision 9 or
      less, <codeph>CAST(50 TO DECIMAL(2,0))</codeph> for <codeph>DECIMAL</codeph> columns with precision 2 or
      less, and so on. For <codeph>DECIMAL</codeph> columns with precision 10 or greater, Impala automatically
      interprets the value as the correct <codeph>DECIMAL</codeph> type; however, because
      <codeph>DECIMAL(10)</codeph> requires 8 bytes of storage while <codeph>DECIMAL(9)</codeph> requires only 4
      bytes, only use precision of 10 or higher when actually needed.
    </p>

<codeblock><![CDATA[[localhost:21000] > create table decimals_9_0 (x decimal);
[localhost:21000] > insert into decimals_9_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);
ERROR: AnalysisException: Possible loss of precision for target table 'decimal_testing.decimals_9_0'.
Expression '1' (type: INT) would need to be cast to DECIMAL(9,0) for column 'x'
[localhost:21000] > insert into decimals_9_0 values (cast(1 as decimal)), (cast(2 as decimal)), (cast(4 as decimal)), (cast(8 as decimal)), (cast(16 as decimal)), (cast(1024 as decimal)), (cast(32768 as decimal)), (cast(65536 as decimal)), (cast(1000000 as decimal));

[localhost:21000] > create table decimals_10_0 (x decimal(10,0));
[localhost:21000] > insert into decimals_10_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);
]]>
</codeblock>

    <p>
      Be aware that in memory and for binary file formats such as Parquet or Avro, <codeph>DECIMAL(10)</codeph> or
      higher consumes 8 bytes while <codeph>DECIMAL(9)</codeph> (the default for <codeph>DECIMAL</codeph>) or lower
      consumes 4 bytes. Therefore, to conserve space in large tables, use the smallest-precision
      <codeph>DECIMAL</codeph> type that is appropriate and <codeph>CAST()</codeph> literal values where necessary,
      rather than declaring <codeph>DECIMAL</codeph> columns with high precision for convenience.
    </p>

    <p>
      To represent a very large or precise <codeph>DECIMAL</codeph> value as a literal, for example one that
      contains more digits than can be represented by a <codeph>BIGINT</codeph> literal, use a quoted string or a
      floating-point value for the number, and <codeph>CAST()</codeph> to the desired <codeph>DECIMAL</codeph>
      type:
    </p>

<codeblock>insert into decimals_38_5 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000),
  (cast("999999999999999999999999999999" as decimal(38,5))),
  (cast(999999999999999999999999999999. as decimal(38,5)));
</codeblock>

    <ul>
      <li>
        <p> The result of the <codeph>SUM()</codeph> aggregate function on
            <codeph>DECIMAL</codeph> values is promoted to a precision of 38,
          with the same precision as the underlying column. Thus, the result can
          represent the largest possible value at that particular precision. </p>
      </li>

      <li>
        <p>
          <codeph>STRING</codeph> columns, literals, or expressions can be converted to <codeph>DECIMAL</codeph> as
          long as the overall number of digits and digits to the right of the decimal point fit within the
          specified precision and scale for the declared <codeph>DECIMAL</codeph> type. By default, a
          <codeph>DECIMAL</codeph> value with no specified scale or precision can hold a maximum of 9 digits of an
          integer value. If there are more digits in the string value than are allowed by the
          <codeph>DECIMAL</codeph> scale and precision, the result is <codeph>NULL</codeph>.
        </p>
        <p>
          The following examples demonstrate how <codeph>STRING</codeph> values with integer and fractional parts
          are represented when converted to <codeph>DECIMAL</codeph>. If the scale is 0, the number is treated
          as an integer value with a maximum of <varname>precision</varname> digits. If the precision is greater than
          0, the scale must be increased to account for the digits both to the left and right of the decimal point.
          As the precision increases, output values are printed with additional trailing zeros after the decimal
          point if needed. Any trailing zeros after the decimal point in the <codeph>STRING</codeph> value must fit
          within the number of digits specified by the precision.
        </p>
<codeblock><![CDATA[[localhost:21000] > select cast('100' as decimal); -- Small integer value fits within 9 digits of scale.
+-----------------------------+
| cast('100' as decimal(9,0)) |
+-----------------------------+
| 100                         |
+-----------------------------+
[localhost:21000] > select cast('100' as decimal(3,0)); -- Small integer value fits within 3 digits of scale.
+-----------------------------+
| cast('100' as decimal(3,0)) |
+-----------------------------+
| 100                         |
+-----------------------------+
[localhost:21000] > select cast('100' as decimal(2,0)); -- 2 digits of scale is not enough!
+-----------------------------+
| cast('100' as decimal(2,0)) |
+-----------------------------+
| NULL                        |
+-----------------------------+
[localhost:21000] > select cast('100' as decimal(3,1)); -- (3,1) = 2 digits left of the decimal point, 1 to the right. Not enough.
+-----------------------------+
| cast('100' as decimal(3,1)) |
+-----------------------------+
| NULL                        |
+-----------------------------+
[localhost:21000] > select cast('100' as decimal(4,1)); -- 4 digits total, 1 to the right of the decimal point.
+-----------------------------+
| cast('100' as decimal(4,1)) |
+-----------------------------+
| 100.0                       |
+-----------------------------+
[localhost:21000] > select cast('98.6' as decimal(3,1)); -- (3,1) can hold a 3 digit number with 1 fractional digit.
+------------------------------+
| cast('98.6' as decimal(3,1)) |
+------------------------------+
| 98.6                         |
+------------------------------+
[localhost:21000] > select cast('98.6' as decimal(15,1)); -- Larger scale allows bigger numbers but still only 1 fractional digit.
+-------------------------------+
| cast('98.6' as decimal(15,1)) |
+-------------------------------+
| 98.6                          |
+-------------------------------+
[localhost:21000] > select cast('98.6' as decimal(15,5)); -- Larger precision allows more fractional digits, outputs trailing zeros.
+-------------------------------+
| cast('98.6' as decimal(15,5)) |
+-------------------------------+
| 98.60000                      |
+-------------------------------+
[localhost:21000] > select cast('98.60000' as decimal(15,1)); -- Trailing zeros in the string must fit within 'scale' digits (1 in this case).
+-----------------------------------+
| cast('98.60000' as decimal(15,1)) |
+-----------------------------------+
| NULL                              |
+-----------------------------------+
]]>
</codeblock>
      </li>

      <li>
        Most built-in arithmetic functions such as <codeph>SIN()</codeph> and <codeph>COS()</codeph> continue to
        accept only <codeph>DOUBLE</codeph> values because they are so commonly used in scientific context for
        calculations of IEEE 954-compliant values. The built-in functions that accept and return
        <codeph>DECIMAL</codeph> are:
<!-- List from Skye: positive, negative, least, greatest, fnv_hash, if, nullif, zeroifnull, isnull, coalesce -->
<!-- Nong had already told me about abs, ceil, floor, round, truncate -->
        <ul>
          <li>
            <codeph>ABS()</codeph>
          </li>

          <li>
            <codeph>CEIL()</codeph>
          </li>

          <li>
            <codeph>COALESCE()</codeph>
          </li>

          <li>
            <codeph>FLOOR()</codeph>
          </li>

          <li>
            <codeph>FNV_HASH()</codeph>
          </li>

          <li>
            <codeph>GREATEST()</codeph>
          </li>

          <li>
            <codeph>IF()</codeph>
          </li>

          <li>
            <codeph>ISNULL()</codeph>
          </li>

          <li>
            <codeph>LEAST()</codeph>
          </li>

          <li>
            <codeph>NEGATIVE()</codeph>
          </li>

          <li>
            <codeph>NULLIF()</codeph>
          </li>

          <li>
            <codeph>POSITIVE()</codeph>
          </li>

          <li>
            <codeph>PRECISION()</codeph>
          </li>

          <li>
            <codeph>ROUND()</codeph>
          </li>

          <li>
            <codeph>SCALE()</codeph>
          </li>

          <li>
            <codeph>TRUNCATE()</codeph>
          </li>

          <li>
            <codeph>ZEROIFNULL()</codeph>
          </li>
        </ul>
        See <xref href="impala_functions.xml#builtins"/> for details.
      </li>

      <li>
        <p>
          <codeph>BIGINT</codeph>, <codeph>INT</codeph>, <codeph>SMALLINT</codeph>, and <codeph>TINYINT</codeph>
          values can all be cast to <codeph>DECIMAL</codeph>. The number of digits to the left of the decimal point
          in the <codeph>DECIMAL</codeph> type must be sufficient to hold the largest value of the corresponding
          integer type. Note that integer literals are treated as the smallest appropriate integer type, meaning
          there is sometimes a range of values that require one more digit of <codeph>DECIMAL</codeph> scale than
          you might expect. For integer values, the precision of the <codeph>DECIMAL</codeph> type can be zero; if
          the precision is greater than zero, remember to increase the scale value by an equivalent amount to hold
          the required number of digits to the left of the decimal point.
        </p>
        <p>
          The following examples show how different integer types are converted to <codeph>DECIMAL</codeph>.
        </p>
<!-- According to Nong, it's a bug that so many integer digits can be converted to a DECIMAL
     value with small (s,p) spec. So expect to re-do this example. -->
<codeblock><![CDATA[[localhost:21000] > select cast(1 as decimal(1,0));
+-------------------------+
| cast(1 as decimal(1,0)) |
+-------------------------+
| 1                       |
+-------------------------+
[localhost:21000] > select cast(9 as decimal(1,0));
+-------------------------+
| cast(9 as decimal(1,0)) |
+-------------------------+
| 9                       |
+-------------------------+
[localhost:21000] > select cast(10 as decimal(1,0));
+--------------------------+
| cast(10 as decimal(1,0)) |
+--------------------------+
| 10                       |
+--------------------------+
[localhost:21000] > select cast(10 as decimal(1,1));
+--------------------------+
| cast(10 as decimal(1,1)) |
+--------------------------+
| 10.0                     |
+--------------------------+
[localhost:21000] > select cast(100 as decimal(1,1));
+---------------------------+
| cast(100 as decimal(1,1)) |
+---------------------------+
| 100.0                     |
+---------------------------+
[localhost:21000] > select cast(1000 as decimal(1,1));
+----------------------------+
| cast(1000 as decimal(1,1)) |
+----------------------------+
| 1000.0                     |
+----------------------------+
]]>
</codeblock>
      </li>

      <li>
        <p>
          When a <codeph>DECIMAL</codeph> value is converted to any of the integer types, any fractional part is
          truncated (that is, rounded towards zero):
        </p>
<codeblock><![CDATA[[localhost:21000] > create table num_dec_days (x decimal(4,1));
[localhost:21000] > insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1)));
[localhost:21000] > insert into num_dec_days values (cast(0.1 as decimal(4,1))), (cast(.9 as decimal(4,1))), (cast(9.1 as decimal(4,1))), (cast(9.9 as decimal(4,1)));
[localhost:21000] > select cast(x as int) from num_dec_days;
+----------------+
| cast(x as int) |
+----------------+
| 1              |
| 2              |
| 4              |
| 0              |
| 0              |
| 9              |
| 9              |
+----------------+
]]>
</codeblock>
      </li>

      <li>
        <p>
          You cannot directly cast <codeph>TIMESTAMP</codeph> or <codeph>BOOLEAN</codeph> values to or from
          <codeph>DECIMAL</codeph> values. You can turn a <codeph>DECIMAL</codeph> value into a time-related
          representation using a two-step process, by converting it to an integer value and then using that result
          in a call to a date and time function such as <codeph>from_unixtime()</codeph>.
        </p>
<codeblock><![CDATA[[localhost:21000] > select from_unixtime(cast(cast(1000.0 as decimal) as bigint));
+-------------------------------------------------------------+
| from_unixtime(cast(cast(1000.0 as decimal(9,0)) as bigint)) |
+-------------------------------------------------------------+
| 1970-01-01 00:16:40                                         |
+-------------------------------------------------------------+
[localhost:21000] > select now() + interval cast(x as int) days from num_dec_days; -- x is a DECIMAL column.

[localhost:21000] > create table num_dec_days (x decimal(4,1));
[localhost:21000] > insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1)));
[localhost:21000] > select now() + interval cast(x as int) days from num_dec_days; -- The 4.5 value is truncated to 4 and becomes '4 days'.
+--------------------------------------+
| now() + interval cast(x as int) days |
+--------------------------------------+
| 2014-05-13 23:11:55.163284000        |
| 2014-05-14 23:11:55.163284000        |
| 2014-05-16 23:11:55.163284000        |
+--------------------------------------+
]]>
</codeblock>
      </li>

      <li>
        <p>
          Because values in <codeph>INSERT</codeph> statements are checked rigorously for type compatibility, be
          prepared to use <codeph>CAST()</codeph> function calls around literals, column references, or other
          expressions that you are inserting into a <codeph>DECIMAL</codeph> column.
        </p>
      </li>
    </ul>

    <p conref="../shared/impala_common.xml#common/null_bad_numeric_cast"/>

    <p>
      <b>DECIMAL differences from integer and floating-point types:</b>
    </p>

    <p>
      With the <codeph>DECIMAL</codeph> type, you are concerned with the number of overall digits of a number
      rather than powers of 2 (as in <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, and so on). Therefore,
      the limits with integral values of <codeph>DECIMAL</codeph> types fall around 99, 999, 9999, and so on rather
      than 32767, 65535, 2
      <sup>32</sup>
      -1, and so on. For fractional values, you do not need to account for imprecise representation of the
      fractional part according to the IEEE-954 standard (as in <codeph>FLOAT</codeph> and
      <codeph>DOUBLE</codeph>). Therefore, when you insert a fractional value into a <codeph>DECIMAL</codeph>
      column, you can compare, sum, query, <codeph>GROUP BY</codeph>, and so on that column and get back the
      original values rather than some <q>close but not identical</q> value.
    </p>

    <p>
      <codeph>FLOAT</codeph> and <codeph>DOUBLE</codeph> can cause problems or unexpected behavior due to inability
      to precisely represent certain fractional values, for example dollar and cents values for currency. You might
      find output values slightly different than you inserted, equality tests that do not match precisely, or
      unexpected values for <codeph>GROUP BY</codeph> columns. <codeph>DECIMAL</codeph> can help reduce unexpected
      behavior and rounding errors, at the expense of some performance overhead for assignments and comparisons.
    </p>

    <p>
      <b>Literals and expressions:</b>
      <ul>
        <li>
          <p>
            When you use an integer literal such as <codeph>1</codeph> or <codeph>999</codeph> in a SQL statement,
            depending on the context, Impala will treat it as either the smallest appropriate
            <codeph>DECIMAL</codeph> type, or the smallest integer type (<codeph>TINYINT</codeph>,
            <codeph>SMALLINT</codeph>, <codeph>INT</codeph>, or <codeph>BIGINT</codeph>). To minimize memory usage,
            Impala prefers to treat the literal as the smallest appropriate integer type.
          </p>
        </li>

        <li>
          <p>
            When you use a floating-point literal such as <codeph>1.1</codeph> or <codeph>999.44</codeph> in a SQL
            statement, depending on the context, Impala will treat it as either the smallest appropriate
            <codeph>DECIMAL</codeph> type, or the smallest floating-point type (<codeph>FLOAT</codeph> or
            <codeph>DOUBLE</codeph>). To avoid loss of accuracy, Impala prefers to treat the literal as a
            <codeph>DECIMAL</codeph>.
          </p>
        </li>
      </ul>
    </p>

    <p>
      <b>Storage considerations:</b>
    </p>

    <ul>
      <li>
        Only the precision determines the storage size for <codeph>DECIMAL</codeph> values; the scale setting has
        no effect on the storage size.
      </li>

      <li>
        Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these text-based file formats,
        leading zeros are not stored, but trailing zeros are stored. In these tables, each <codeph>DECIMAL</codeph>
        value takes up as many bytes as there are digits in the value, plus an extra byte if the decimal point is
        present and an extra byte for negative values. Once the values are loaded into memory, they are represented
        in 4, 8, or 16 bytes as described in the following list items. The on-disk representation varies depending
        on the file format of the table.
      </li>

<!-- Next couple of points can be conref'ed with identical list bullets farther down under File Format Considerations. -->

      <li>
        Parquet and Avro tables use binary formats, In these tables, Impala stores each value in as few bytes as
        possible
<!-- 4, 8, or 16 bytes -->
        depending on the precision specified for the <codeph>DECIMAL</codeph> column.
        <ul>
          <li>
            In memory, <codeph>DECIMAL</codeph> values with precision of 9 or less are stored in 4 bytes.
          </li>

          <li>
            In memory, <codeph>DECIMAL</codeph> values with precision of 10 through 18 are stored in 8 bytes.
          </li>

          <li>
            In memory, <codeph>DECIMAL</codeph> values with precision greater than 18 are stored in 16 bytes.
          </li>
        </ul>
      </li>
    </ul>

    <p conref="../shared/impala_common.xml#common/file_format_blurb"/>

    <ul>
      <li>
        The <codeph>DECIMAL</codeph> data type can be stored in any of the file formats supported by Impala, as
        described in <xref href="impala_file_formats.xml#file_formats"/>. Impala only writes to tables that use the
        Parquet and text formats, so those formats are the focus for file format compatibility.
      </li>

      <li>
        Impala can query Avro, RCFile, or SequenceFile tables containing <codeph>DECIMAL</codeph> columns, created
        by other Hadoop components, on CDH 5 only.
      </li>

      <li>
        You can use <codeph>DECIMAL</codeph> columns in Impala tables that are mapped to HBase tables. Impala can
        query and insert into such tables.
      </li>

      <li>
        Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these tables, each
        <codeph>DECIMAL</codeph> value takes up as many bytes as there are digits in the value, plus an extra byte
        if the decimal point is present. The binary format of Parquet or Avro files offers more compact storage for
        <codeph>DECIMAL</codeph> columns.
      </li>

      <li>
        Parquet and Avro tables use binary formats, In these tables, Impala stores each value in 4, 8, or 16 bytes
        depending on the precision specified for the <codeph>DECIMAL</codeph> column.
      </li>

      <li>
        Parquet files containing <codeph>DECIMAL</codeph> columns are not expected to be readable under CDH 4. See
        the <b>Compatibility</b> section for details.
      </li>
    </ul>

    <p>
      <b>UDF considerations:</b> When writing a C++ UDF, use the <codeph>DecimalVal</codeph> data type defined in
      <filepath>/usr/include/impala_udf/udf.h</filepath>.
    </p>

    <p conref="../shared/impala_common.xml#common/partitioning_blurb"/>

    <p>
      You can use a <codeph>DECIMAL</codeph> column as a partition key. Doing so provides a better match between
      the partition key values and the HDFS directory names than using a <codeph>DOUBLE</codeph> or
      <codeph>FLOAT</codeph> partitioning column:
    </p>

    <p conref="../shared/impala_common.xml#common/schema_evolution_blurb"/>

    <ul>
      <li>
        For text-based formats (text, RCFile, and SequenceFile tables), you can issue an <codeph>ALTER TABLE ...
        REPLACE COLUMNS</codeph> statement to change the precision and scale of an existing
        <codeph>DECIMAL</codeph> column. As long as the values in the column fit within the new precision and
        scale, they are returned correctly by a query. Any values that do not fit within the new precision and
        scale are returned as <codeph>NULL</codeph>, and Impala reports the conversion error. Leading zeros do not
        count against the precision value, but trailing zeros after the decimal point do.
<codeblock><![CDATA[[localhost:21000] > create table text_decimals (x string);
[localhost:21000] > insert into text_decimals values ("1"), ("2"), ("99.99"), ("1.234"), ("000001"), ("1.000000000");
[localhost:21000] > select * from text_decimals;
+-------------+
| x           |
+-------------+
| 1           |
| 2           |
| 99.99       |
| 1.234       |
| 000001      |
| 1.000000000 |
+-------------+
[localhost:21000] > alter table text_decimals replace columns (x decimal(4,2));
[localhost:21000] > select * from text_decimals;
+-------+
| x     |
+-------+
| 1.00  |
| 2.00  |
| 99.99 |
| NULL  |
| 1.00  |
| NULL  |
+-------+
ERRORS:
Backend 0:Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.234)
file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/634d4bd3aa0
e8420-b4b13bab7f1be787_56794587_data.0
record: 1.234
Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.000000000)
file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/cd40dc68e20
c565a-cc4bd86c724c96ba_311873428_data.0
record: 1.000000000
]]>
</codeblock>
      </li>

      <li>
        For binary formats (Parquet and Avro tables), although an <codeph>ALTER TABLE ... REPLACE COLUMNS</codeph>
        statement that changes the precision or scale of a <codeph>DECIMAL</codeph> column succeeds, any subsequent
        attempt to query the changed column results in a fatal error. (The other columns can still be queried
        successfully.) This is because the metadata about the columns is stored in the data files themselves, and
        <codeph>ALTER TABLE</codeph> does not actually make any updates to the data files. If the metadata in the
        data files disagrees with the metadata in the metastore database, Impala cancels the query.
      </li>
    </ul>

    <p conref="../shared/impala_common.xml#common/example_blurb"/>

<codeblock>CREATE TABLE t1 (x DECIMAL, y DECIMAL(5,2), z DECIMAL(25,0));
INSERT INTO t1 VALUES (5, 99.44, 123456), (300, 6.7, 999999999);
SELECT x+y, ROUND(y,1), z/98.6 FROM t1;
SELECT CAST(1000.5 AS DECIMAL);
</codeblock>

    <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>

    <p conref="../shared/impala_common.xml#common/decimal_no_stats"/>

<!-- <p conref="../shared/impala_common.xml#common/partitioning_good"/> -->

    <p conref="../shared/impala_common.xml#common/hbase_ok"/>

    <p conref="../shared/impala_common.xml#common/parquet_ok"/>

    <p conref="../shared/impala_common.xml#common/text_bulky"/>

<!-- <p conref="../shared/impala_common.xml#common/compatibility_blurb"/> -->

<!-- <p conref="../shared/impala_common.xml#common/internals_blurb"/> -->

<!-- <p conref="../shared/impala_common.xml#common/added_in_20"/> -->

    <p conref="../shared/impala_common.xml#common/column_stats_constant"/>

    <p conref="../shared/impala_common.xml#common/related_info"/>

    <p>
      <xref href="impala_literals.xml#numeric_literals"/>, <xref href="impala_tinyint.xml#tinyint"/>,
      <xref href="impala_smallint.xml#smallint"/>, <xref href="impala_int.xml#int"/>,
      <xref href="impala_bigint.xml#bigint"/>, <xref href="impala_decimal.xml#decimal"/>,
      <xref href="impala_math_functions.xml#math_functions"/> (especially <codeph>PRECISION()</codeph> and
      <codeph>SCALE()</codeph>)
    </p>
  </conbody>
</concept>