impala

mirror of https://github.com/apache/impala.git synced 2026-01-04 18:00:57 -05:00

Author	SHA1	Message	Date
Michael Ho	637cc3e447	IMPALA-4821: Update AVG() for DECIMAL_V2 This change implements the DECIMAL_V2 behavior for AVG(). The differences from DECIMAL_V1 are: 1. The output type has a minimum scale of 6. This is similar to MS SQL's behavior, which takes the max of 6 and the input type's scale. We deviate from MS SQL in the output's precision, which MS SQL always sets to 38; we use the smallest precision that can store the output. A key insight is that the output of AVG() is no wider than the inputs, so precision only needs to be adjusted when the scale is augmented. Using a smaller precision avoids potential loss of precision in subsequent decimal operations (e.g. division) if AVG() is a subexpression. Please note that the output type is different from SUM()/COUNT(), as the latter can have a much larger scale. 2. Due to the minimum of 6 decimal places for the output, AVG() for decimal values whose whole number part exceeds 32 digits (e.g. DECIMAL(38,4), DECIMAL(33,0)) will always overflow, as the scale is augmented to 6. Certain decimal types that work with AVG() in DECIMAL_V1 no longer work in DECIMAL_V2. Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Reviewed-on: http://gerrit.cloudera.org:8080/6038 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-22 06:31:14 +00:00
Dan Hecht	a53eeb2068	IMPALA-4370: Divide and modulo result types for DECIMAL version V2 Implement the new DECIMAL return type rules for divide and modulo expressions, active when query option DECIMAL_V2=1. See the comment in the code for more details. Below are a couple of examples that show why new return type rules for divide are desirable. For modulo, the return types are actually equivalent, though the rules are expressed differently to be consistent with how precision fixups are handled for each version. DECIMAL Version 1: +-------------------------------------------------------+ | cast(1 as decimal(20,0)) / cast(3 as decimal(20,0)) | +-----------------------------------------------------+ | 0 | +-------------------------------------------------------+ DECIMAL Version 2: +-------------------------------------------------------+ | cast(1 as decimal(20,0)) / cast(3 as decimal(20,0)) | +-----------------------------------------------------+ | 0.333333333333333333 | +-------------------------------------------------------+ DECIMAL Version 1: +-------------------------------------------------------+ | cast(1 as decimal(6,0)) / cast(0.1 as decimal(38,38)) | +-------------------------------------------------------+ | NULL | +-------------------------------------------------------+ WARNINGS: UDF WARNING: Expression overflowed, returning NULL DECIMAL Version 2: +-------------------------------------------------------+ | cast(1 as decimal(6,0)) / cast(0.1 as decimal(38,38)) | +-------------------------------------------------------+ | 10.000000 | +-------------------------------------------------------+ Change-Id: I83e7f7787edfa4b4bddc25945090542a0e90881b Reviewed-on: http://gerrit.cloudera.org:8080/5952 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-14 18:40:54 +00:00
Michael Ho	f982c3f76e	IMPALA-2020, IMPALA-4809: Codegen support for DECIMAL_V2 Currently, codegen supports converting type attributes (e.g. decimal type's precision and scale, type's size) obtained via calls to FunctionContextImpl::GetFnAttr() (previously Expr::GetConstantInt()) in cross-compiled code to runtime constants. This change extends this support for the query option DECIMAL_V2. To test this change, this change also handles a subset of IMPALA-2020: casting between decimal values is updated to support rounding (instead of truncation) when decimal_v2 is true. This change also cleans up the existing code by moving the codegen logic Expr::InlineConstant() to the codegen module and the type related logic in Expr::GetConstantInt() to FunctionContextImpl. Change-Id: I2434d240f65b81389b8a8ba027f980a0e1d1f981 Reviewed-on: http://gerrit.cloudera.org:8080/5950 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-11 07:07:45 +00:00
Thomas Tauber-Marshall	343bdad866	IMPALA-3210: last/first_value() support for IGNORE NULLS Added support for the 'ignore nulls' keyword to the last_value and first_value analytic functions, e.g. 'last_value(col ignore nulls)', which returns the last value from the window that is not null, or null if all of the values in the window are null. We handle 'ignore nulls' in the FE in the same way that we handle 'distinct' - by adding isIgnoreNulls as a field in FunctionParams. To avoid affecting performance when 'ignore nulls' is not used, and to avoid having to special case 'ignore nulls' on the backend, this patch adds 'last_value_ignore_nulls' and 'first_value_ignore_nulls' builtin analytic functions that wrap 'last_value' and 'first_value' respectively. Change-Id: Ic27525e2237fb54318549d2674f1610884208e9b Reviewed-on: http://gerrit.cloudera.org:8080/3328 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Internal Jenkins	2016-07-18 08:28:09 -07:00
Skye Wanderman-Milne	039bd44fdf	IMPALA-2688: decimal codegen support in aggregations This patch implements codegen support for aggregations with decimal input and intermediate type. For the following benchmark query: SELECT l_discount, count(*) AS cnt FROM biglineitem GROUP BY l_discount HAVING cnt > 9999999999999 Query time went from 8.85s to 3.74s (2.4x faster). Change-Id: I25934fcd6324e5bf1fa6859496107bf2ec68b8d3 Reviewed-on: http://gerrit.cloudera.org:8080/2050 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2016-02-11 02:32:22 +00:00
Tim Armstrong	5bbe2fe23d	IMPALA-2469: decimal table not found in exhaustive build Decimal tables are not generated for all file formats. The analytic-fns test is run on some of these formats, so fails when it cannot find the decimal_tiny table. Move the test to the decimal test that handles this correctly. Change-Id: Ic23b21ed90496fcc9f2f84cfd3dd92899d00498b	2015-10-05 11:30:41 -07:00
Shant Hovsepian	6d87fe090c	Improve Hll estimate for small cardinalities. Based on Google's HyperLogLog++ paper. Uses bias-correcting interpolation as a sub-algorithm for Hll estimates within a specific range. Change-Id: If4fe692b4308f6a57aea6167e9bc00db11eaaab9 Reviewed-on: http://gerrit.cloudera.org:8080/415 Tested-by: Internal Jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2015-07-16 19:38:17 +00:00
Shant Hovsepian	69079411bf	Improve distinctpc/sa for small cardinalities. Improving the cardinality estimate for Flajolet and Martin's algorithm used in distinctpc and distinctpcsa. The estimate for small cardinalities is improved by providing a correction hinted at in the original paper. We use the correction constant 1.75 proposed by Scheuermann et al., DIALM-POMC '07 [Near-Optimal Compression of Probabilistic Counting Sketches for Networking Applications] Change-Id: I90410328a1a01a72601e7e95ae719fb8caf1587f Reviewed-on: http://gerrit.cloudera.org:8080/395 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-05-24 06:26:47 +00:00
Alex Behm	745e64a096	IMPALA-1837: Handle truncation when implicitly casting a literal to a decimal. Implicit casting to decimals allows truncating digits from the left of the decimal point (see TypesUtil). A literal that is implicitly cast to a decimal with truncation is wrapped into a CastExpr so the BE can evaluate it and report a warning. This behavior is consistent with casting/overflow of non-constant exprs that return decimal. IMPALA-1837: Without the CastExpr wrapping, such literals can exceed the max expected byte size sent to the BE in toThrift(). Change-Id: Icd7b8751b39b8031832eec04bd8eac7d7000ddf8 Reviewed-on: http://gerrit.cloudera.org:8080/195 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-03-11 19:58:58 -07:00
Matthew Jacobs	27209e4cb1	Fix exhaustive tests: move analytic fn tests using decimal_tbl to decimal.test Change-Id: Iaaa5bd59b27d2db2736874e96d38cb823f6e4a56 Reviewed-on: http://gerrit.cloudera.org:8080/147 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-03-05 03:05:49 +00:00
Nong Li	a1b2de9c95	Update distinctpc/pcsa to return bigint. Change-Id: Iac3414aa0151f52ba9ec028da152b09fc09af264 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4637 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-10-06 15:12:12 -07:00
Matthew Jacobs	86b9f8282f	Move aggregation tests on decimal tables to decimal.test Fixes test failures in exhaustive mode when aggregation tests are run on table formats that do not support decimal. Change-Id: Ic5dfb398575770cf318ffcc0ce3a20737bb2f5cd Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4636 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-10-06 15:11:58 -07:00
Matthew Jacobs	8a75e759cb	Move analytic fns test case for decimal to decimal.test Change-Id: Ic6e02484f47f9a9c47924850c8cf12daf8574c8c Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4449 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: jenkins	2014-09-23 07:26:32 -07:00
Alex Behm	0fb380961c	IMPALA-1187: Add appx_count_distinct query option to rewrite COUNT(DISTINCT) to NDV(). This patch also fixes IMPALA-1164: NDV() now returns a BIGINT (and not STRING). Change-Id: Ia2a3272204938579d61091ee4f7f2d1cbf38ed55 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4338 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-09-20 16:11:34 -07:00
Matthew Jacobs	0facf61296	Analytic Functions: BE support for ROWS windows with arbitrary start bounds Adds support in the BE AnalyticEvalNode for ROWS windows with arbitrary start bounds. If there is a start bound specified a sliding window must be maintained. As input rows are processed they are added to the window. As they expire from the window, they are 'removed' from the current intermediate state of the evaluators (stored in curr_tuple_) by calling AggFnEvaluator::Remove(). This is an initial implementation that keeps the tuples in the window in memory. We can improve this later by using the BufferedTupleStream with an Iterator interface supporting multiple readers. This also fixes IMPALA-1253: LAST_VALUE returns incorrect results Change-Id: Id5daf6c060ab4079bb8dacf2db8992985894a820 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4335 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-09-20 16:08:12 -07:00
Skye Wanderman-Milne	f8905ea485	Fix AVG codegen We weren't returning the right merge function for decimal in GetAvgFunction(). Someday these functions will be registered in the FE, as scalar functions are. Change-Id: I1153ef8570b0e78f0925b7d3d58ec3b0fbb2c589 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4336 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-09-20 16:02:47 -07:00
Alex Behm	bceeb834f3	IMPALA-677: Fix visibility of semi and anti-joined table references. Semi or anti-joined table references are now only visible inside the On-clause of the corresponding join. Change-Id: Id93e53ecdf2a74baf9736aa427fa7af15358ca27 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3789 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-08-17 12:45:45 -07:00
Nong Li	f0c7947558	IMPALA-1121: Fix joins on decimal columns with different precision/scale. Change-Id: Ibac69763e28ad33ef41d000b5dd74fc73c74b73a Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3726 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3739 Reviewed-by: Nong Li <nong@cloudera.com>	2014-08-04 01:45:40 -07:00
Nong Li	e7f7eab1b5	Missing reanalyze() in select stmt after substitution. Change-Id: I71203ebb02cf64e5bf259d2f6c5faf951f87f0d2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3144 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-06-19 02:52:10 -07:00
Nong Li	8f4dc0f2f0	IMPALA-974: Switch from FloatLiteral to DecimalLiteral. Float/Doubles are lossy so using those as the default literal type is problematic. Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-31 22:19:06 -07:00
Nong Li	87295a4e06	Decimal implementation. This patch implements decimal support for text based formats. Change-Id: I8e2c9e512ed149fe965216a72cb21fffd4f18e75 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1669 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2238 Tested-by: jenkins	2014-04-14 21:07:32 -07:00

21 Commits
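The DECIMAL_V2 AVG() result-type rule described in IMPALA-4821 (637cc3e447) can be sketched as follows. This is a minimal illustration written from the commit message alone, not Impala's code: the function names are hypothetical, and it assumes the output keeps the input's whole-number digits, raises the scale to at least 6, and clips precision at 38.

```python
# Hypothetical sketch of the DECIMAL_V2 AVG() result type (IMPALA-4821),
# derived from the commit message; not Impala's actual implementation.
MAX_PRECISION = 38
MIN_AVG_SCALE = 6


def avg_result_type(precision: int, scale: int) -> tuple:
    """Return (precision, scale) of AVG() over DECIMAL(precision, scale)."""
    whole_digits = precision - scale       # AVG() is no wider than its input
    out_scale = max(MIN_AVG_SCALE, scale)  # at least 6 fractional digits
    # Smallest precision that fits, clipped to the 38-digit maximum.
    out_precision = min(MAX_PRECISION, whole_digits + out_scale)
    return out_precision, out_scale


def avg_always_overflows(precision: int, scale: int) -> bool:
    """True when the input's whole part cannot fit beside the augmented scale."""
    out_precision, out_scale = avg_result_type(precision, scale)
    return precision - scale > out_precision - out_scale
```

Under these assumptions DECIMAL(10,2) averages into DECIMAL(14,6), while DECIMAL(38,4) and DECIMAL(33,0) leave only 32 whole digits in a 38-digit result and therefore overflow, matching the commit's examples.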
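IMPALA-2020/IMPALA-4809 (f982c3f76e) switch decimal-to-decimal casts from truncation to rounding when decimal_v2 is true. The sketch below illustrates the difference with Python's decimal module; it assumes round-half-away-from-zero (Python's ROUND_HALF_UP) for the V2 mode, and the helper name is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def cast_decimal_scale(value: Decimal, target_scale: int, decimal_v2: bool) -> Decimal:
    """Reduce `value` to `target_scale` fractional digits (hypothetical helper).

    V1 behavior: truncate toward zero. V2 behavior (assumed): round half
    away from zero, which is what Python's ROUND_HALF_UP does.
    """
    quantum = Decimal(1).scaleb(-target_scale)  # e.g. 1E-2 for scale 2
    mode = ROUND_HALF_UP if decimal_v2 else ROUND_DOWN
    return value.quantize(quantum, rounding=mode)
```

For example, casting 1.25 to one fractional digit gives 1.2 with truncation and 1.3 with rounding; -1.25 gives -1.2 and -1.3 respectively.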
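IMPALA-3210 (343bdad866) implements IGNORE NULLS by adding separate builtins that wrap last_value and first_value, so the plain functions stay unchanged. A rough Python analogy of that wrapping design, evaluated over a single materialized window, might look like this; the functions are illustrative stand-ins, not Impala's backend API.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def last_value(window: Sequence[Optional[T]]) -> Optional[T]:
    """Plain LAST_VALUE: the last row in the window, NULL or not."""
    return window[-1] if window else None


def first_value(window: Sequence[Optional[T]]) -> Optional[T]:
    """Plain FIRST_VALUE: the first row in the window, NULL or not."""
    return window[0] if window else None


def last_value_ignore_nulls(window: Sequence[Optional[T]]) -> Optional[T]:
    # Wrapper: skip NULLs, then delegate to the unmodified function,
    # so the common (no IGNORE NULLS) path pays nothing extra.
    return last_value([v for v in window if v is not None])


def first_value_ignore_nulls(window: Sequence[Optional[T]]) -> Optional[T]:
    return first_value([v for v in window if v is not None])
```

If every value in the window is NULL, the filtered list is empty and the wrappers return None, mirroring the "null if all of the values in the window are null" behavior in the commit message.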
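The HLL change (6d87fe090c) targets the small-cardinality range. For context, the sketch below shows the standard HyperLogLog raw estimate together with the classic linear-counting correction for small cardinalities. It deliberately omits the HyperLogLog++ bias-correcting interpolation the commit adds, because that step relies on empirically derived bias tables that are not reproduced here. The SHA-1-based hash and the class layout are illustrative choices, not Impala's.

```python
import hashlib
import math


def _hash64(item: str) -> int:
    """64-bit hash from the first 8 bytes of SHA-1 (illustrative choice)."""
    return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")


class HyperLogLog:
    def __init__(self, p: int = 10):
        if p < 7:
            raise ValueError("the alpha formula below assumes m >= 128")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        h = _hash64(item)
        idx = h >> (64 - self.p)               # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: use linear counting while the raw
        # estimate is small and some registers are still empty.
        if raw <= 2.5 * m and zeros > 0:
            return m * math.log(m / zeros)
        return raw
```

The raw estimator is known to be biased for small cardinalities; linear counting is the original paper's remedy, and HyperLogLog++ refines the middle range further with interpolated bias correction.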
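For the distinctpc/distinctpcsa change (69079411bf), the sketch below implements Flajolet-Martin probabilistic counting with stochastic averaging (PCSA). It assumes the corrected estimator takes the form (m / φ) * (2^R̄ - 2^(-κ·R̄)) with φ = 0.77351 and κ = 1.75, where R̄ is the mean index of the lowest unset bit across the m bitmaps. The class, the hash, and the 64 x 32-bit sizing are hypothetical choices for illustration.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin magic constant
KAPPA = 1.75   # small-cardinality correction constant (Scheuermann et al.)


def _hash64(item: str) -> int:
    return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")


class PCSA:
    def __init__(self, num_bitmaps: int = 64, bitmap_bits: int = 32):
        self.m = num_bitmaps
        self.bits = bitmap_bits
        self.bitmaps = [0] * num_bitmaps

    def add(self, item: str) -> None:
        h = _hash64(item)
        idx = h % self.m   # stochastic averaging: pick one bitmap
        rest = h // self.m
        # rho = number of trailing zero bits of `rest`
        rho = (rest & -rest).bit_length() - 1 if rest else self.bits - 1
        self.bitmaps[idx] |= 1 << min(rho, self.bits - 1)

    def estimate(self, corrected: bool = True) -> float:
        def lowest_zero(bitmap: int) -> int:
            r = 0
            while bitmap & (1 << r):
                r += 1
            return r

        avg = sum(lowest_zero(b) for b in self.bitmaps) / self.m
        value = 2 ** avg
        if corrected:
            # The subtracted term vanishes for large avg but cancels the
            # constant m / PHI bias when the sketch is (nearly) empty.
            value -= 2 ** (-KAPPA * avg)
        return self.m / PHI * value
```

With no input, the uncorrected form still reports m / φ (about 83 for 64 bitmaps), while the corrected form reports 0, which is the kind of small-cardinality error the commit addresses.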
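Commit 0facf61296 describes maintaining a sliding window for ROWS frames with a start bound: each new row is added to the intermediate state, and rows that fall out of the frame are "removed" from it, as AggFnEvaluator::Remove() does. The Python sketch below shows the same idea for SUM over ROWS BETWEEN n PRECEDING AND CURRENT ROW; SumEvaluator is a hypothetical stand-in for an evaluator with update/remove operations, and the in-memory deque mirrors the commit's note that window tuples are kept in memory.

```python
from collections import deque
from typing import List


class SumEvaluator:
    """Hypothetical evaluator holding intermediate state with update/remove."""

    def __init__(self) -> None:
        self.total = 0

    def update(self, v: int) -> None:
        self.total += v

    def remove(self, v: int) -> None:
        self.total -= v


def rows_window_sums(values: List[int], preceding: int) -> List[int]:
    """SUM(x) OVER (ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW)."""
    ev = SumEvaluator()
    window = deque()  # rows currently inside the frame
    out = []
    for v in values:
        window.append(v)
        ev.update(v)
        if len(window) > preceding + 1:
            # The oldest row slid out of the frame: undo its contribution
            # instead of recomputing the aggregate from scratch.
            ev.remove(window.popleft())
        out.append(ev.total)
    return out
```

This keeps each step O(1) for invertible aggregates like SUM or COUNT; aggregates without a cheap inverse (such as MIN/MAX) would need a different structure.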
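IMPALA-974 (8f4dc0f2f0) justifies switching the default literal type with "Float/Doubles are lossy". A quick Python demonstration of that lossiness, using binary doubles against the standard decimal module, is shown below; it illustrates the general problem rather than anything Impala-specific.

```python
from decimal import Decimal

# The literal 0.1 as a double is only the nearest binary fraction to 0.1;
# converting that double to Decimal exposes its exact stored value.
print(Decimal(0.1))

# Representation error accumulates through arithmetic on doubles...
print(0.1 + 0.2 == 0.3)  # False

# ...whereas decimal literals keep the exact base-10 value.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```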