impala

mirror of https://github.com/apache/impala.git synced 2026-01-01 18:00:30 -05:00

Author	SHA1	Message	Date
Taras Bobrovytsky	575b5a20e6	IMPALA-5017: Error on decimal overflow Before this patch, decimal operations would either silently overflow (in the case of sum() and avg()), or produce a warning. In this patch, the behaviour is changed so that an error is produced in the case of overflow when DECIMAL_v2 is enabled. Decimal v1 behaviour is unchanged. We introduce overflow checks when computing sum() and avg(). This results in a ~30% performance regression when we are in decimal v2 mode compared to decimal v1. Benchmarks: Query: select sum(dec_38_19) from decimal_tbl Decimal v1: 11.57s Decimal v2: 16.58s Query: select avg(dec_38_19) from decimal_tbl Decimal v1: 12.08s Decimal v2: 17.08s The performance regression is not as bad if we are computing the sum or average of decimal column with a lower precision: Query: select sum(dec_9_5) from decimal_tbl Decimal v1: 11.06s Decimal v2: 13.08s Query: select avg(dec_9_5) from decimal_tbl Decimal v1: 11.56s Decimal v2: 13.57s Testing: - Added several end to end tests. - Updated Expr tests to check for error in case of overflow. Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 Reviewed-on: http://gerrit.cloudera.org:8080/8404 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-01 23:23:01 +00:00
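The overflow check that IMPALA-5017 describes for sum() can be sketched roughly as follows. This is a minimal Python illustration, not the backend's C++ code. The names `checked_sum` and `DecimalOverflowError`, and the choice to represent decimals as unscaled integers, are assumptions made for the example. Only the idea of raising an error instead of overflowing silently comes from the commit message.

```python
# Hypothetical sketch: sum unscaled decimal values and raise on overflow.
MAX_PRECISION = 38                    # maximum decimal precision (digits)
MAX_UNSCALED = 10**MAX_PRECISION - 1  # largest unscaled magnitude that fits


class DecimalOverflowError(ArithmeticError):
    """Hypothetical error type standing in for the query error."""


def checked_sum(unscaled_values):
    """Add unscaled integers, checking the running total after every step."""
    total = 0
    for value in unscaled_values:
        total += value
        # The per-row check is the extra work that the commit message
        # associates with the slowdown in decimal v2 mode.
        if abs(total) > MAX_UNSCALED:
            raise DecimalOverflowError("decimal sum overflowed")
    return total
```

The check runs once per input row, which is consistent with the benchmark figures above showing a cost that grows with the amount of data summed.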
Zoltan Borok-Nagy	4f11bed407	IMPALA-5936: operator '%' overflows on large decimals Suppose we have a large decimal number, which is greater than INT_MAX. We want to calculate the modulo of this number by 3: BIG_DECIMAL % 3 The result of this calculation can be 0, 1, or 2. This can fit into a decimal with precision 1. Such small decimals are stored in memory as int32_t in the backend. Let's call this int32_t the result type. The backend incorrectly assumed that it could also do the calculation using the result type. This assumption is true for multiplying or adding numbers, but not for modulo. Now the backend selects the biggest type of ['return type', '1st operand type', '2nd operand type'] to do the calculation. Change-Id: I2b06c8acd5aa490943e84013faf2eaac7c26ceb4 Reviewed-on: http://gerrit.cloudera.org:8080/8574 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-28 21:11:45 +00:00
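The failure mode in IMPALA-5936 can be reproduced in a small Python sketch. The helper names (`to_int32`, `storage_bytes`, `widest_bytes`) are hypothetical, and the precision thresholds for 4, 8, and 16 byte storage are an assumption about the backend layout. The commit message itself only states that small decimals use int32_t and that the widest of the three types is now chosen.

```python
# Hypothetical sketch: why modulo cannot be computed in the (narrow) result type.

def to_int32(value):
    """Wrap an integer to signed 32 bits, as a narrowing cast typically would."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def storage_bytes(precision):
    """Assumed storage width for a decimal of the given precision."""
    if precision <= 9:
        return 4
    if precision <= 18:
        return 8
    return 16


def widest_bytes(result_precision, lhs_precision, rhs_precision):
    """Pick the widest of the result type and both operand types."""
    return max(storage_bytes(p)
               for p in (result_precision, lhs_precision, rhs_precision))


big = 10_000_000_000                  # larger than INT_MAX, needs precision 11
narrow = to_int32(big) % to_int32(3)  # operand truncated before the modulo
wide = big % 3                        # computed in a type that holds the operand
```

With these values `wide` is 1 while `narrow` is 2, because the truncated operand is a different number. `widest_bytes(1, 11, 1)` returns 8, so the calculation would run in a 64-bit type even though the result fits in 32 bits. Only non-negative operands are used here, since Python and C++ disagree on the sign of `%` for negative values.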
Taras Bobrovytsky	d1b92c8b52	IMPALA-6183: Fix Decimal to Double conversion When converting a decimal to a double, we incorrectly used the powf() function in the backend, which returns a float instead of a double. This caused us to lose precision. We fix the problem by replacing the powf() function with a pow() function, which returns a double. Testing: - Added an EE test. Change-Id: I9bf81d039e5037f22c64a32b328832235aafe9e3 Reviewed-on: http://gerrit.cloudera.org:8080/8547 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-15 02:54:53 +00:00
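The precision loss described in IMPALA-6183 can be illustrated in Python by rounding the power of ten to single precision, which is what a `powf()` result amounts to. This is only a sketch: `as_float32`, `decimal_to_double_powf`, and `decimal_to_double_pow` are hypothetical names, and the division by a power of ten is an assumed shape for the conversion rather than the backend's actual code.

```python
# Hypothetical sketch: float-precision vs double-precision scale factor.
import struct


def as_float32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def decimal_to_double_powf(unscaled, scale):
    """Divide by a scale factor that was rounded to float, like powf()."""
    return unscaled / as_float32(10.0 ** scale)


def decimal_to_double_pow(unscaled, scale):
    """Divide by a double-precision scale factor, like pow()."""
    return unscaled / (10.0 ** scale)


# 123.45678901234 stored as an unscaled integer with scale 11.
lossy = decimal_to_double_powf(12345678901234, 11)
exact = decimal_to_double_pow(12345678901234, 11)
```

Powers of ten up to 10^10 are exactly representable as a 32-bit float, so the discrepancy only shows up for scales of 11 and above, where `lossy` and `exact` differ.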
Taras Bobrovytsky	5ebea0ec4d	IMPALA-5018: Error on decimal modulo or divide by zero Before this patch, decimal operations would never produce an error. Division and modulo by zero would result in a NULL. In this patch, we change this behavior so that we raise an error instead of returning a NULL. We also modify the format of the decimal expr tests to include an error field. Testing: - Added several expr and end to end tests. Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb Reviewed-on: http://gerrit.cloudera.org:8080/8344 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-25 00:44:34 +00:00
Zach Amsden	c87ab35af1	IMPALA-4813: Round on divide and multiply Address rounding on divide and multiply when results are truncated. Testing: Manually ran some divides that should overflow, then added the results to the test. Made the decimal-test use rounding behavior by default, and now the error margin of the test has decreased. Initial perf results: Multiply is totally uninteresting so far, all implementations return the same values in the same time: +-------------------------+-----------------------------------+ \| sum(l_quantity * l_tax) \| sum(l_extendedprice * l_discount) \| +-------------------------+-----------------------------------+ \| 61202493.3700 \| 114698450836.4234 \| +-------------------------+-----------------------------------+ Fetched 1 row(s) in 1.13s Divide shows no regression from prior with DECIMAL_V2 off: +-----------------------------+-----------------------------------+ \| sum(l_quantity / l_tax) \| sum(l_extendedprice / l_discount) \| +-----------------------------+-----------------------------------+ \| 46178777464.523809516381723 \| 61076151920731.010714279183910 \| +-----------------------------+-----------------------------------+ before: Fetched 1 row(s) in 13.08s after: Fetched 1 row(s) in 13.06s And with DECIMAL_V2 on: +-----------------------------+-----------------------------------+ \| sum(l_quantity / l_tax) \| sum(l_extendedprice / l_discount) \| +-----------------------------+-----------------------------------+ \| 46178777464.523809523847285 \| 61076151920731.010714285714202 \| +-----------------------------+-----------------------------------+ Fetched 1 row(s) in 16.06s So the performance regression is not as bad as expected. Still, divide performance could use some work. Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312 Reviewed-on: http://gerrit.cloudera.org:8080/6132 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-02 20:12:05 +00:00
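The difference between truncating and rounding a decimal quotient, which IMPALA-4813 addresses, can be sketched on unscaled integers. The function name `divide_unscaled` and its parameters are hypothetical. The rounding mode shown (half away from zero) is an assumption, since the commit message does not name the mode.

```python
# Hypothetical sketch: truncating vs rounding integer division for decimals.

def divide_unscaled(x, y, scale_up, round_result):
    """Divide unscaled integers after scaling the dividend by 10**scale_up.

    With round_result=False the extra digits are truncated; with
    round_result=True the quotient is rounded half away from zero
    (an assumed rounding mode).
    """
    if y == 0:
        raise ZeroDivisionError("decimal division by zero")
    numerator = x * 10**scale_up
    quotient, remainder = divmod(abs(numerator), abs(y))
    if round_result and 2 * remainder >= abs(y):
        quotient += 1
    negative = (numerator < 0) != (y < 0)
    return -quotient if negative else quotient


truncated = divide_unscaled(2, 3, 4, round_result=False)  # 2/3 at scale 4
rounded = divide_unscaled(2, 3, 4, round_result=True)
```

Here `truncated` is 6666 (0.6666) and `rounded` is 6667 (0.6667). The remainder comparison is the additional work on the divide path, which fits the commit's observation that divide is slower with DECIMAL_V2 on.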
Michael Ho	637cc3e447	IMPALA-4821: Update AVG() for DECIMAL_V2 This change implements the DECIMAL_V2's behavior for AVG(). The differences with DECIMAL_V1 are: 1. The output type has a minimum scale of 6. This is similar to MS SQL's behavior which takes the max of 6 and the input type's scale. We deviate from MS SQL in the output's precision which is always set to 38. We use the smallest precision which can store the output. A key insight is that the output of AVG() is no wider than the inputs. Precision only needs to be adjusted when the scale is augmented. Using a smaller precision avoids potential loss of precision in subsequent decimal operations (e.g. division) if AVG() is a subexpression. Please note that the output type is different from SUM()/COUNT() as the latter can have a much larger scale. 2. Due to a minimum of 6 decimal places for the output, AVG() for decimal values whose whole number part exceeds 32 decimal places (e.g. DECIMAL(38,4), DECIMAL(33,0)) will always overflow as the scale is augmented to 6. Certain decimal types which work with AVG() in DECIMAL_V1 no longer work in DECIMAL_V2. Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Reviewed-on: http://gerrit.cloudera.org:8080/6038 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-22 06:31:14 +00:00
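The AVG() output type rule in IMPALA-4821 can be written out as a short sketch. It follows the commit message: the scale is raised to at least 6, and the precision grows only by the number of digits added to the scale. The function name `avg_result_type` and the returned `fits` flag are hypothetical. How the backend reports a type that does not fit is not modeled here.

```python
# Hypothetical sketch of the DECIMAL_V2 AVG() output type described above.
MAX_PRECISION = 38  # maximum decimal precision
MIN_AVG_SCALE = 6   # minimum output scale for AVG() in DECIMAL_V2


def avg_result_type(precision, scale):
    """Return (needed_precision, output_scale, fits) for AVG() on the input type."""
    out_scale = max(MIN_AVG_SCALE, scale)
    # The average is no wider than the inputs, so precision only grows
    # by the digits that were added to the scale.
    needed_precision = precision + (out_scale - scale)
    fits = needed_precision <= MAX_PRECISION
    return needed_precision, out_scale, fits
```

For the two examples in the commit message, `avg_result_type(38, 4)` needs precision 40 and `avg_result_type(33, 0)` needs precision 39, so neither fits in 38 digits. `avg_result_type(32, 0)` yields precision 38 and scale 6, which is the widest whole-number part that still fits.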
Dan Hecht	a53eeb2068	IMPALA-4370: Divide and modulo result types for DECIMAL version V2 Implement the new DECIMAL return type rules for divide and modulo expressions, active when query option DECIMAL_V2=1. See the comment in the code for more details. A couple of examples that show why new return type rules for divide are desirable. For modulo, the return types are actually equivalent, though the rules are expressed differently to have consistency with how precision fixups are handled for each version. DECIMAL Version 1: +-------------------------------------------------------+ \| cast(1 as decimal(20,0)) / cast(3 as decimal(20,0)) \| +-----------------------------------------------------+ \| 0 \| +-------------------------------------------------------+ DECIMAL Version 2: +-------------------------------------------------------+ \| cast(1 as decimal(20,0)) / cast(3 as decimal(20,0)) \| +-----------------------------------------------------+ \| 0.333333333333333333 \| +-------------------------------------------------------+ DECIMAL Version 1: +-------------------------------------------------------+ \| cast(1 as decimal(6,0)) / cast(0.1 as decimal(38,38)) \| +-------------------------------------------------------+ \| NULL \| +-------------------------------------------------------+ WARNINGS: UDF WARNING: Expression overflowed, returning NULL DECIMAL Version 2: +-------------------------------------------------------+ \| cast(1 as decimal(6,0)) / cast(0.1 as decimal(38,38)) \| +-------------------------------------------------------+ \| 10.000000 \| +-------------------------------------------------------+ Change-Id: I83e7f7787edfa4b4bddc25945090542a0e90881b Reviewed-on: http://gerrit.cloudera.org:8080/5952 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-14 18:40:54 +00:00
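The commit message for IMPALA-4370 defers the exact divide rule to a comment in the code, so the following is only a plausible reconstruction, not the confirmed implementation. It assumes a rule in the style of MS SQL Server (scale = max(6, s1 + p2 + 1), precision = p1 - s1 + s2 + scale, with the scale reduced down to a minimum of 6 when the precision exceeds 38). The function name `divide_result_type` is hypothetical. The sketch is included because it reproduces the number of fractional digits in both DECIMAL Version 2 examples above.

```python
# Hypothetical reconstruction of a DECIMAL_V2-style divide result type.
MAX_PRECISION = 38
MIN_ADJUSTED_SCALE = 6


def divide_result_type(p1, s1, p2, s2):
    """Return (precision, scale) for decimal(p1,s1) / decimal(p2,s2)."""
    scale = max(MIN_ADJUSTED_SCALE, s1 + p2 + 1)
    precision = p1 - s1 + s2 + scale
    if precision > MAX_PRECISION:
        # Give up fractional digits first, but keep at least 6 of them
        # (assumed precision fixup).
        min_scale = min(scale, MIN_ADJUSTED_SCALE)
        delta = precision - MAX_PRECISION
        precision = MAX_PRECISION
        scale = max(scale - delta, min_scale)
    return precision, scale
```

Under these assumptions `divide_result_type(20, 0, 20, 0)` gives (38, 18), matching the 18 fractional digits of 0.333333333333333333, and `divide_result_type(6, 0, 38, 38)` gives (38, 6), matching 10.000000.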

7 Commits