Added support for the 'ignore nulls' keyword to the last_value and
first_value analytic functions, e.g. 'last_value(col ignore nulls)',
which returns the last value from the window that is not null,
or null if all of the values in the window are null.
We handle 'ignore nulls' in the FE in the same way that we handle
'distinct' - by adding isIgnoreNulls as a field in FunctionParams.
To avoid affecting performance when 'ignore nulls' is not used, and
to avoid having to special case 'ignore nulls' on the backend, this
patch adds 'last_value_ignore_nulls' and 'first_value_ignore_nulls'
builtin analytic functions that wrap 'last_value' and 'first_value'
respectively.
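A minimal sketch of the intended semantics, in Python for illustration
only. The function names mirror the new builtins, but this is not the
backend implementation; a window is assumed to be a plain list in which
None stands for SQL NULL.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def last_value_ignore_nulls(window: Sequence[Optional[T]]) -> Optional[T]:
    """Return the last non-null value in the window, or None if all are null."""
    for value in reversed(window):
        if value is not None:
            return value
    return None


def first_value_ignore_nulls(window: Sequence[Optional[T]]) -> Optional[T]:
    """Return the first non-null value in the window, or None if all are null."""
    for value in window:
        if value is not None:
            return value
    return None
```

For example, a window of [1, None, 3, None] yields 3 for the last value
and 1 for the first value, and a window containing only nulls yields null.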
Change-Id: Ic27525e2237fb54318549d2674f1610884208e9b
Reviewed-on: http://gerrit.cloudera.org:8080/3328
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Internal Jenkins
This patch implements codegen support for aggregations with decimal
input and intermediate type. For the following benchmark query:
SELECT l_discount, count(*) AS cnt
FROM biglineitem
GROUP BY l_discount
HAVING cnt > 9999999999999
Query time went from 8.85s to 3.74s (2.4x faster).
Change-Id: I25934fcd6324e5bf1fa6859496107bf2ec68b8d3
Reviewed-on: http://gerrit.cloudera.org:8080/2050
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
Decimal tables are not generated for all file formats. The
analytic-fns test is run on some of these formats, so it fails
when it cannot find the decimal_tiny table. Move the test case into
the decimal test, which handles this correctly.
Change-Id: Ic23b21ed90496fcc9f2f84cfd3dd92899d00498b
Based on Google's HyperLogLog++ paper. Uses a bias-correcting
interpolation as a sub-algorithm for HLL estimates within a specific
range.
Change-Id: If4fe692b4308f6a57aea6167e9bc00db11eaaab9
Reviewed-on: http://gerrit.cloudera.org:8080/415
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
Improves the cardinality estimate for Flajolet and Martin's algorithm
used in distinctpc and distinctpcsa. The estimate for small cardinalities
is improved by applying a correction hinted at in the original paper.
We use the correction constant 1.75 proposed by Scheuermann et al.,
DialM-POMC '07 [Near-Optimal Compression of Probabilistic Counting
Sketches for Networking Applications].
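The effect of the correction can be sketched as follows, in Python for
illustration only. This assumes the stochastic-averaging (PCSA) form of
the estimator, in which the result is scaled by the number of bitmaps,
and takes as input the index of the lowest unset bit of each bitmap. The
function and constant names are hypothetical, not the ones used in the
backend.

```python
from typing import Sequence, Tuple

PC_THETA = 0.77351  # Flajolet-Martin correction constant (phi)
KAPPA = 1.75        # small-cardinality constant from Scheuermann et al.


def pcsa_estimate(first_zero_positions: Sequence[int]) -> Tuple[float, float]:
    """Return (uncorrected, corrected) cardinality estimates.

    first_zero_positions[i] is the index of the lowest unset bit in
    bitmap i (0 for an empty bitmap).
    """
    m = len(first_zero_positions)
    avg = sum(first_zero_positions) / m
    # Classic estimate: grows as 2^avg and is non-zero even for empty bitmaps.
    uncorrected = m * 2.0 ** avg / PC_THETA
    # Corrected estimate: the subtracted term dominates only when avg is small.
    corrected = m * (2.0 ** avg - 2.0 ** (-KAPPA * avg)) / PC_THETA
    return uncorrected, corrected
```

With every bitmap empty, the uncorrected form still reports roughly
1.29 values per bitmap, while the corrected form reports 0. As the
average grows, the subtracted term vanishes and the two estimates
converge.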
Change-Id: I90410328a1a01a72601e7e95ae719fb8caf1587f
Reviewed-on: http://gerrit.cloudera.org:8080/395
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
Implicit casting to decimals allows truncating digits from the left of the
decimal point (see TypesUtil). A literal that is implicitly cast to a decimal
with truncation is wrapped into a CastExpr so the BE can evaluate it and report
a warning. This behavior is consistent with casting/overflow of non-constant
exprs that return decimal.
IMPALA-1837: Without the CastExpr wrapping, such literals can exceed the max
expected byte size sent to the BE in toThrift().
Change-Id: Icd7b8751b39b8031832eec04bd8eac7d7000ddf8
Reviewed-on: http://gerrit.cloudera.org:8080/195
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
Fixes test failures in exhaustive mode when aggregation tests
are run on table formats that do not support decimal.
Change-Id: Ic5dfb398575770cf318ffcc0ce3a20737bb2f5cd
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4636
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
This patch also fixes IMPALA-1164: NDV() now returns a BIGINT (and not STRING).
Change-Id: Ia2a3272204938579d61091ee4f7f2d1cbf38ed55
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4338
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Adds support in the BE AnalyticEvalNode for ROWS windows with arbitrary
start bounds. If a start bound is specified, a sliding window must
be maintained. As input rows are processed, they are added to the window.
As they expire from the window, they are 'removed' from the current
intermediate state of the evaluators (stored in curr_tuple_) by calling
AggFnEvaluator::Remove(). This is an initial implementation that keeps
the tuples in the window in memory. We can improve this later by using
the BufferedTupleStream with an Iterator interface supporting multiple
readers.
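The add/remove pattern described above can be sketched as follows, in
Python for illustration only. SumEvaluator is a hypothetical stand-in
for an evaluator with Add() and Remove(), and the in-memory deque
stands in for the tuples kept in the window; this is not the
AnalyticEvalNode code.

```python
from collections import deque
from typing import Deque, List, Sequence


class SumEvaluator:
    """Hypothetical evaluator whose intermediate state is a running sum."""

    def __init__(self) -> None:
        self.state = 0

    def add(self, value: int) -> None:
        self.state += value

    def remove(self, value: int) -> None:
        self.state -= value


def sliding_sum(rows: Sequence[int], preceding: int) -> List[int]:
    """SUM over ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW."""
    window: Deque[int] = deque()  # rows currently inside the window
    evaluator = SumEvaluator()
    results: List[int] = []
    for row in rows:
        # A newly processed row enters the window.
        window.append(row)
        evaluator.add(row)
        # Rows that fall before the start bound expire and are removed
        # from the intermediate state.
        while len(window) > preceding + 1:
            evaluator.remove(window.popleft())
        results.append(evaluator.state)
    return results
```

For rows 1 through 5 with a start bound of 2 PRECEDING, this produces
1, 3, 6, 9, 12. Each row is added once and removed at most once, so the
state never has to be recomputed from scratch.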
This also fixes IMPALA-1253: LAST_VALUE returns incorrect results.
Change-Id: Id5daf6c060ab4079bb8dacf2db8992985894a820
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4335
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
We weren't returning the right merge function for decimal in
GetAvgFunction(). Someday the functions will be registered in the FE
like for scalar functions.
Change-Id: I1153ef8570b0e78f0925b7d3d58ec3b0fbb2c589
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4336
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Semi or anti-joined table references are now only visible inside the
On-clause of the corresponding join.
Change-Id: Id93e53ecdf2a74baf9736aa427fa7af15358ca27
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3789
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Floats and doubles are lossy, so using them as the default literal type
is problematic.
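The lossiness is easy to reproduce outside the engine. The sketch below
uses Python's decimal module as a stand-in for an exact decimal type;
it illustrates the general problem with binary floating point, not the
frontend's literal handling.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly, so the
# sum of the first two does not compare equal to the third.
float_sum = 0.1 + 0.2
float_is_exact = float_sum == 0.3

# An exact decimal type preserves the digits that were written.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = decimal_sum == Decimal("0.3")
```

Here float_is_exact is False and decimal_is_exact is True.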
Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins