When converting a decimal to a double, we incorrectly used the powf()
function in the backend, which returns a float instead of a double.
This caused us to lose precision.
We fix the problem by replacing the powf() function with a pow()
function, which returns a double.
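The precision loss is easy to reproduce outside the backend. The sketch below is hypothetical: `DecimalToDoubleWithPowf()` and `DecimalToDoubleWithPow()` are illustrative names, not Impala's code, and a decimal is assumed to be passed as its unscaled integer value plus a scale. A float has a 24-bit significand, so 10^11 cannot be stored exactly, while a double represents every power of ten up to 10^22 exactly.

```cpp
#include <cmath>
#include <cstdint>
#include <math.h>

// Hypothetical helpers: convert a decimal given as (unscaled value, scale)
// to a double. For example, unscaled 123456789012345 with scale 11 is
// 1234.56789012345.

// Buggy variant: powf() computes 10^scale in single precision. 10^11 is not
// exactly representable as a float, so the divisor is already wrong before
// the (double) division happens.
double DecimalToDoubleWithPowf(int64_t unscaled, int scale) {
  return static_cast<double>(unscaled) / powf(10.0f, static_cast<float>(scale));
}

// Fixed variant: pow() computes 10^scale in double precision, where powers of
// ten up to 10^22 are exactly representable.
double DecimalToDoubleWithPow(int64_t unscaled, int scale) {
  return static_cast<double>(unscaled) / std::pow(10.0, static_cast<double>(scale));
}
```

With the unscaled value 123456789012345 and scale 11, the `pow()` variant returns the double closest to 1234.56789012345, while the `powf()` variant goes wrong after roughly eight significant digits.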
Testing:
- Added an EE test.
Change-Id: I9bf81d039e5037f22c64a32b328832235aafe9e3
Reviewed-on: http://gerrit.cloudera.org:8080/8547
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Before this patch, decimal operations would never produce an error.
Division and modulo by zero would result in a NULL.
In this patch, we change this behavior so that we raise an error
instead of returning a NULL. We also modify the format of the decimal
expr tests to include an error field.
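A minimal sketch of the behavior change, using modulo as the example. It is hypothetical: `DecimalModOld()` and `DecimalModNew()` are illustrative names, decimals are modeled as unscaled 64-bit integers that share a scale (so 5.50 % 2.00 becomes 550 % 200 = 150, i.e. 1.50), and a C++ exception stands in for however the backend actually reports a query error.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical model: a decimal is its unscaled 64-bit integer value, and
// both operands share the same scale, so the integer remainder of the
// unscaled values is the unscaled remainder of the decimals.

// Old behavior: a zero divisor produces NULL, modeled as std::nullopt.
std::optional<int64_t> DecimalModOld(int64_t x, int64_t y) {
  if (y == 0) return std::nullopt;
  return x % y;
}

// New behavior: a zero divisor raises an error instead of returning NULL.
// The exception is a stand-in for the engine's real error reporting.
int64_t DecimalModNew(int64_t x, int64_t y) {
  if (y == 0) throw std::domain_error("decimal modulo by zero");
  return x % y;
}
```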
Testing:
- Added several expr and end-to-end tests.
Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
Reviewed-on: http://gerrit.cloudera.org:8080/8344
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Address rounding on divide and multiply when results are truncated.
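Both cases come down to dividing an unscaled integer and deciding what to do with the discarded remainder: directly for divide, and for multiply when the product has more fractional digits than the result scale allows, so the product must be divided by a power of ten. The sketch below is hypothetical (`DivideRounded()` is an illustrative name) and assumes round-half-away-from-zero as the rounding mode, where plain C++ integer division would truncate toward zero.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: divides two unscaled decimal values and rounds the
// quotient half away from zero. Precondition: divisor != 0.
int64_t DivideRounded(int64_t dividend, int64_t divisor) {
  int64_t quotient = dividend / divisor;   // truncated toward zero
  int64_t remainder = dividend % divisor;  // takes the sign of the dividend
  long long abs_rem = std::llabs(remainder);
  long long abs_div = std::llabs(divisor);
  // The discarded fraction is |remainder| / |divisor|; round away from zero
  // when it is at least one half. Comparing against abs_div - abs_rem avoids
  // overflowing 2 * abs_rem.
  if (abs_rem >= abs_div - abs_rem) {
    quotient += ((dividend < 0) != (divisor < 0)) ? -1 : 1;
  }
  return quotient;
}
```

For example, multiplying 1.25 (unscaled 125, scale 2) by 1.5 (unscaled 15, scale 1) gives 1875 at scale 3, i.e. 1.875. Rescaling to scale 2 with `DivideRounded(1875, 10)` yields 188 (1.88), whereas truncation would give 187 (1.87).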
Testing: Manually ran some divides that should overflow, then added
the results to the test. Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.
Initial perf results:
Multiply is totally uninteresting so far: all implementations
return the same values in the same time:
+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s
Divide shows no regression from the prior implementation with DECIMAL_V2 off:
+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before: Fetched 1 row(s) in 13.08s
after: Fetched 1 row(s) in 13.06s
And with DECIMAL_V2 on:
+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s
So the performance regression with DECIMAL_V2 on (16.06s vs. 13.06s,
about 23% slower) is not as bad as expected. Still, divide performance
could use some work.
Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Reviewed-on: http://gerrit.cloudera.org:8080/6132
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This change implements the DECIMAL_V2 behavior for AVG().
The differences from DECIMAL_V1 are:
1. The output type has a minimum scale of 6. This is similar
to MS SQL's behavior, which takes the max of 6 and the input
type's scale. We deviate from MS SQL in the output's precision,
which MS SQL always sets to 38; instead, we use the smallest
precision that can store the output. A key insight is that the
output of AVG() is no wider than the inputs, so the precision
only needs to be adjusted when the scale is augmented. Using a
smaller precision avoids potential loss of precision in
subsequent decimal operations (e.g. division) if AVG() is a
subexpression. Please note that the output type is different
from that of SUM()/COUNT(), as the latter can have a much
larger scale.
2. Due to the minimum of 6 decimal places in the output,
AVG() of decimal values whose whole-number part exceeds 32
digits (possible with types such as DECIMAL(38,4) or
DECIMAL(33,0)) will always overflow: the scale is augmented
to 6, which leaves only 38 - 6 = 32 whole-number digits.
As a result, certain decimal types which work with AVG() in
DECIMAL_V1 no longer work in DECIMAL_V2.
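The result-type rule in item 1 can be sketched as follows. This is a hypothetical illustration (`DecimalType` and `AvgResultType()` are not Impala's names), assuming the maximum decimal precision of 38 and the minimum scale of 6 described above.

```cpp
#include <algorithm>

// Hypothetical representation of a DECIMAL(precision, scale) type.
struct DecimalType {
  int precision;
  int scale;
};

constexpr int kMaxPrecision = 38;  // widest DECIMAL precision
constexpr int kMinAvgScale = 6;    // minimum output scale for AVG()

DecimalType AvgResultType(DecimalType in) {
  // The scale is raised to at least 6, never lowered.
  int scale = std::max(kMinAvgScale, in.scale);
  // The input's whole-number digits are kept, so the precision only grows by
  // the number of digits added to the scale, capped at 38.
  int whole_digits = in.precision - in.scale;
  int precision = std::min(kMaxPrecision, whole_digits + scale);
  return {precision, scale};
}
```

Under this rule DECIMAL(10,2) becomes DECIMAL(14,6), DECIMAL(10,8) is unchanged, and both DECIMAL(38,4) and DECIMAL(33,0) become DECIMAL(38,6), which leaves only 32 whole-number digits, matching item 2.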
Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Reviewed-on: http://gerrit.cloudera.org:8080/6038
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Implement the new DECIMAL return type rules for divide and modulo
expressions, active when the query option DECIMAL_V2=1 is set. See the
comment in the code for more details. The examples below show why the
new return type rules for divide are desirable.
For modulo, the return types under both versions are equivalent; the
rules are expressed differently only for consistency with how each
version handles precision fixups.
DECIMAL Version 1:
+-----------------------------------------------------+
| cast(1 as decimal(20,0)) / cast(3 as decimal(20,0)) |
+-----------------------------------------------------+
| 0                                                   |
+-----------------------------------------------------+
DECIMAL Version 2:
+-----------------------------------------------------+
| cast(1 as decimal(20,0)) / cast(3 as decimal(20,0)) |
+-----------------------------------------------------+
| 0.333333333333333333                                |
+-----------------------------------------------------+
DECIMAL Version 1:
+-------------------------------------------------------+
| cast(1 as decimal(6,0)) / cast(0.1 as decimal(38,38)) |
+-------------------------------------------------------+
| NULL                                                  |
+-------------------------------------------------------+
WARNINGS: UDF WARNING: Expression overflowed, returning NULL
DECIMAL Version 2:
+-------------------------------------------------------+
| cast(1 as decimal(6,0)) / cast(0.1 as decimal(38,38)) |
+-------------------------------------------------------+
| 10.000000                                             |
+-------------------------------------------------------+
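The commit defers the exact divide rule to the code comment, so the following is only a sketch under an assumption: it uses an MS SQL-style rule (scale = max(6, s1 + p2 + 1), precision = p1 - s1 + s2 + scale, then trading fractional digits for whole-number digits when the precision exceeds 38 while keeping at least min(scale, 6) of them). `DecimalType` and `DivideResultTypeV2()` are hypothetical names. This rule does reproduce the scales seen in the DECIMAL Version 2 outputs above: 18 fractional digits for DECIMAL(20,0) / DECIMAL(20,0) and 6 for DECIMAL(6,0) / DECIMAL(38,38).

```cpp
#include <algorithm>

// Hypothetical representation of a DECIMAL(precision, scale) type.
struct DecimalType {
  int precision;
  int scale;
};

constexpr int kMaxPrecision = 38;     // widest DECIMAL precision
constexpr int kMinAdjustedScale = 6;  // assumed floor for the result scale

// Assumed result type of DECIMAL(p1,s1) / DECIMAL(p2,s2) under DECIMAL_V2.
DecimalType DivideResultTypeV2(DecimalType lhs, DecimalType rhs) {
  int scale = std::max(kMinAdjustedScale, lhs.scale + rhs.precision + 1);
  int precision = lhs.precision - lhs.scale + rhs.scale + scale;
  if (precision > kMaxPrecision) {
    // Too wide: give up fractional digits so the whole-number part still
    // fits in 38 digits, but keep at least min(scale, 6) of them.
    int min_scale = std::min(scale, kMinAdjustedScale);
    scale = std::max(scale - (precision - kMaxPrecision), min_scale);
    precision = kMaxPrecision;
  }
  return {precision, scale};
}
```

Keeping a floor of six fractional digits is what turns the V1 results of 0 and NULL into 0.333333333333333333 and 10.000000.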
Change-Id: I83e7f7787edfa4b4bddc25945090542a0e90881b
Reviewed-on: http://gerrit.cloudera.org:8080/5952
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins