18 Commits

Author SHA1 Message Date
Thomas Tauber-Marshall
343bdad866 IMPALA-3210: last/first_value() support for IGNORE NULLS
Added support for the 'ignore nulls' keyword to the last_value and
first_value analytic functions, e.g. 'last_value(col ignore nulls)',
which would return the last value from the window that is not null,
or null if all of the values in the window are null.
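The semantics described above can be sketched in a few lines. This is only an illustration in Python, not the Impala implementation: the window is modelled as a plain list, SQL NULL as None, and the function names simply mirror the builtins named in this commit.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def last_value_ignore_nulls(window: List[Optional[T]]) -> Optional[T]:
    """Return the last non-null value in the window, or None if all are null."""
    for value in reversed(window):
        if value is not None:
            return value
    return None


def first_value_ignore_nulls(window: List[Optional[T]]) -> Optional[T]:
    """Return the first non-null value in the window, or None if all are null."""
    for value in window:
        if value is not None:
            return value
    return None
```

For a window of [None, 1, None, 3, None], the first sketch returns 3 and the second returns 1; a window containing only nulls yields None from both.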

We handle 'ignore nulls' in the FE in the same way that we handle
'distinct' - by adding isIgnoreNulls as a field in FunctionParams.

To avoid affecting performance when 'ignore nulls' is not used, and
to avoid having to special case 'ignore nulls' on the backend, this
patch adds 'last_value_ignore_nulls' and 'first_value_ignore_nulls'
builtin analytic functions that wrap 'last_value' and 'first_value'
respectively.

Change-Id: Ic27525e2237fb54318549d2674f1610884208e9b
Reviewed-on: http://gerrit.cloudera.org:8080/3328
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Internal Jenkins
2016-07-18 08:28:09 -07:00
Skye Wanderman-Milne
039bd44fdf IMPALA-2688: decimal codegen support in aggregations
This patch implements codegen support for aggregations with decimal
input and intermediate type. For the following benchmark query:

SELECT l_discount, count(*) AS cnt
FROM biglineitem
GROUP BY l_discount
HAVING cnt > 9999999999999

Query time went from 8.85s to 3.74s (2.4x faster).

Change-Id: I25934fcd6324e5bf1fa6859496107bf2ec68b8d3
Reviewed-on: http://gerrit.cloudera.org:8080/2050
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
2016-02-11 02:32:22 +00:00
Tim Armstrong
5bbe2fe23d IMPALA-2469: decimal table not found in exhaustive build
Decimal tables are not generated for all file formats. The
analytic-fns test is run on some of these formats, so fails
when it cannot find the decimal_tiny table. Move the test to
the decimal test that handles this correctly.

Change-Id: Ic23b21ed90496fcc9f2f84cfd3dd92899d00498b
2015-10-05 11:30:41 -07:00
Shant Hovsepian
6d87fe090c Improve Hll estimate for small cardinalities.
Based on Google's HyperLogLog++ paper. Uses a bias-correcting
interpolation as a sub-algorithm for Hll estimates within a specific
range.
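As a rough sketch of what a bias-correcting interpolation can look like: the raw estimate is looked up in a table of (raw estimate, observed bias) points, the bias is interpolated between the two neighbouring points, and the result is subtracted. Everything here is an assumption for illustration: the function name, the use of plain linear interpolation, and the table passed in by the caller. The real tables are empirical and are not reproduced here, and the patch may interpolate differently.

```python
from bisect import bisect_left
from typing import Sequence


def corrected_estimate(raw: float,
                       raw_points: Sequence[float],
                       bias_points: Sequence[float]) -> float:
    """Subtract a linearly interpolated bias from a raw estimate.

    raw_points must be sorted ascending; bias_points[i] is the bias
    observed at raw_points[i]. Outside the table's range the raw
    estimate is returned unchanged.
    """
    if raw < raw_points[0] or raw > raw_points[-1]:
        return raw
    i = bisect_left(raw_points, raw)
    if raw_points[i] == raw:
        return raw - bias_points[i]
    # raw lies strictly between raw_points[i - 1] and raw_points[i].
    x0, x1 = raw_points[i - 1], raw_points[i]
    b0, b1 = bias_points[i - 1], bias_points[i]
    t = (raw - x0) / (x1 - x0)
    return raw - (b0 + t * (b1 - b0))
```

With a toy two-point table, raw_points=[10.0, 20.0] and bias_points=[2.0, 1.0], a raw estimate of 15.0 gets an interpolated bias of 1.5 and is corrected to 13.5, while 30.0 falls outside the range and is returned as is.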

Change-Id: If4fe692b4308f6a57aea6167e9bc00db11eaaab9
Reviewed-on: http://gerrit.cloudera.org:8080/415
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2015-07-16 19:38:17 +00:00
Shant Hovsepian
69079411bf Improve distinctpc/sa for small cardinalities.
Improve the cardinality estimate for Flajolet and Martin's algorithm
used in distinctpc and distinctpcsa. The estimate for small cardinalities
is improved by applying a correction hinted at in the original paper.

We use the correction constant 1.75 proposed by Scheuermann et al.,
DialM-POMC '07 [Near-Optimal Compression of Probabilistic Counting
Sketches for Networking Applications].
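A minimal sketch of how such a correction might be applied, assuming the corrected estimator has the form (m / phi) * (2^avg - 2^(-kappa * avg)), where m is the number of bitmaps, avg is the mean position of the lowest unset bit across them, phi is the Flajolet-Martin constant (about 0.77351) and kappa is 1.75. The function name and parameters are hypothetical, and the formula is an assumption about the cited paper's correction, not a copy of the code in this patch.

```python
PHI = 0.77351   # Flajolet-Martin constant (assumed value)
KAPPA = 1.75    # correction constant named in the commit message


def pcsa_estimate(avg_rank: float, num_bitmaps: int) -> float:
    """Estimate cardinality from the mean lowest-unset-bit position.

    The second term matters only when avg_rank is small; it drives the
    estimate to zero for an empty sketch and vanishes as avg_rank grows.
    """
    uncorrected = 2.0 ** avg_rank
    correction = 2.0 ** (-KAPPA * avg_rank)
    return num_bitmaps / PHI * (uncorrected - correction)
```

Under this assumed form, an empty sketch (avg_rank of 0) estimates exactly 0 rather than m / phi, and for large avg_rank the correction term is negligible, so the result converges to the uncorrected estimate.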

Change-Id: I90410328a1a01a72601e7e95ae719fb8caf1587f
Reviewed-on: http://gerrit.cloudera.org:8080/395
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-05-24 06:26:47 +00:00
Alex Behm
745e64a096 IMPALA-1837: Handle truncation when implicitly casting a literal to a decimal.
Implicit casting to decimals allows truncating digits from the left of the
decimal point (see TypesUtil). A literal that is implicitly cast to a decimal
with truncation is wrapped into a CastExpr so the BE can evaluate it and report
a warning. This behavior is consistent with casting/overflow of non-constant
exprs that return decimal.
IMPALA-1837: Without the CastExpr wrapping, such literals can exceed the max
expected byte size sent to the BE in toThrift().

Change-Id: Icd7b8751b39b8031832eec04bd8eac7d7000ddf8
Reviewed-on: http://gerrit.cloudera.org:8080/195
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-03-11 19:58:58 -07:00
Matthew Jacobs
27209e4cb1 Fix exhaustive tests: move analytic fn tests using decimal_tbl to decimal.test
Change-Id: Iaaa5bd59b27d2db2736874e96d38cb823f6e4a56
Reviewed-on: http://gerrit.cloudera.org:8080/147
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-03-05 03:05:49 +00:00
Nong Li
a1b2de9c95 Update distinctpc/pcsa to return bigint.
Change-Id: Iac3414aa0151f52ba9ec028da152b09fc09af264
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4637
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-10-06 15:12:12 -07:00
Matthew Jacobs
86b9f8282f Move aggregation tests on decimal tables to decimal.test
Fixes test failures in exhaustive mode when aggregation tests
are run on table formats that do not support decimal.

Change-Id: Ic5dfb398575770cf318ffcc0ce3a20737bb2f5cd
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4636
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-10-06 15:11:58 -07:00
Matthew Jacobs
8a75e759cb Move analytic fns test case for decimal to decimal.test
Change-Id: Ic6e02484f47f9a9c47924850c8cf12daf8574c8c
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4449
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
2014-09-23 07:26:32 -07:00
Alex Behm
0fb380961c IMPALA-1187: Add appx_count_distinct query option to rewrite COUNT(DISTINCT) to NDV().
This patch also fixes IMPALA-1164: NDV() now returns a BIGINT (and not STRING).

Change-Id: Ia2a3272204938579d61091ee4f7f2d1cbf38ed55
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4338
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-09-20 16:11:34 -07:00
Matthew Jacobs
0facf61296 Analytic Functions: BE support for ROWS windows with arbitrary start bounds
Adds support in the BE AnalyticEvalNode for ROWS windows with arbitrary
start bounds. If a start bound is specified, a sliding window must
be maintained. As input rows are processed, they are added to the window.
As they expire from the window, they are 'removed' from the current
intermediate state of the evaluators (stored in curr_tuple_) by calling
AggFnEvaluator::Remove(). This is an initial implementation that keeps
the tuples in the window in memory. We can improve this later by using
the BufferedTupleStream with an Iterator interface supporting multiple
readers.
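The add/remove pattern described above can be illustrated with a small sketch. It is not the BE code: the evaluator class, its method names, and the restriction to SUM over ROWS BETWEEN n PRECEDING AND CURRENT ROW are all assumptions chosen to keep the example short. As in the initial implementation described here, the window's rows are simply kept in memory.

```python
from collections import deque
from typing import Deque, List


class SumEvaluator:
    """Toy stand-in for an aggregate evaluator with an intermediate state."""

    def __init__(self) -> None:
        self.state = 0

    def add(self, value: int) -> None:
        self.state += value

    def remove(self, value: int) -> None:
        # Undo the contribution of a row that has left the window.
        self.state -= value


def rows_window_sum(values: List[int], preceding: int) -> List[int]:
    """SUM over ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW."""
    evaluator = SumEvaluator()
    window: Deque[int] = deque()  # rows currently inside the window
    results = []
    for value in values:
        window.append(value)
        evaluator.add(value)
        # Expire the oldest row once the window exceeds its size.
        if len(window) > preceding + 1:
            evaluator.remove(window.popleft())
        results.append(evaluator.state)
    return results
```

For the input [1, 2, 3, 4, 5] with one preceding row, the sketch returns [1, 3, 5, 7, 9]. Each row is added once and removed once, so the cost per row does not depend on the window size.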

This also fixes IMPALA-1253: LAST_VALUE returns incorrect results

Change-Id: Id5daf6c060ab4079bb8dacf2db8992985894a820
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4335
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-09-20 16:08:12 -07:00
Skye Wanderman-Milne
f8905ea485 Fix AVG codegen
We weren't returning the right merge function for decimal in
GetAvgFunction(). Someday the functions will be registered in the FE
like for scalar functions.

Change-Id: I1153ef8570b0e78f0925b7d3d58ec3b0fbb2c589
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4336
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-09-20 16:02:47 -07:00
Alex Behm
bceeb834f3 IMPALA-677: Fix visibility of semi and anti-joined table references.
Semi or anti-joined table references are now only visible inside the
On-clause of the corresponding join.

Change-Id: Id93e53ecdf2a74baf9736aa427fa7af15358ca27
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3789
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-08-17 12:45:45 -07:00
Nong Li
f0c7947558 IMPALA-1121: Fix joins on decimal columns with different precision/scale.
Change-Id: Ibac69763e28ad33ef41d000b5dd74fc73c74b73a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3726
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3739
Reviewed-by: Nong Li <nong@cloudera.com>
2014-08-04 01:45:40 -07:00
Nong Li
e7f7eab1b5 Missing reanalyze() in select stmt after substitution.
Change-Id: I71203ebb02cf64e5bf259d2f6c5faf951f87f0d2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3144
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-19 02:52:10 -07:00
Nong Li
8f4dc0f2f0 IMPALA-974: Switch from FloatLiteral to DecimalLiteral.
Floats and doubles are lossy, so using them as the default literal type
is problematic.
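The lossiness is easy to demonstrate outside Impala. The sketch below uses Python's standard decimal module purely as an analogy; the helper names are made up for the example, and nothing here reflects how DecimalLiteral is implemented.

```python
from decimal import Decimal
from typing import Iterable


def sum_as_float(literals: Iterable[str]) -> float:
    """Parse each literal as a binary double and add them up."""
    return sum(float(text) for text in literals)


def sum_as_decimal(literals: Iterable[str]) -> Decimal:
    """Parse each literal as an exact decimal and add them up."""
    return sum((Decimal(text) for text in literals), Decimal(0))
```

Summing the literals "0.1" and "0.2" as doubles gives a value that is not equal to 0.3, because neither literal has an exact binary representation. The decimal sum is exactly Decimal("0.3").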

Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-31 22:19:06 -07:00
Nong Li
87295a4e06 Decimal implementation.
This patch implements decimal support for text based formats.

Change-Id: I8e2c9e512ed149fe965216a72cb21fffd4f18e75
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1669
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2238
Tested-by: jenkins
2014-04-14 21:07:32 -07:00