IMPALA-1292: Incorrect result in analytic SUM when ORDER BY column is null

AnalyticPlanner created a 'less than' predicate to check whether the
previous row was less than the current row. That is not the right test
for determining when rows in a RANGE window (the default window in this
case) share the same result value. Rows get the same result when their
order by exprs evaluate as equal or are both null, so it is simpler (and
more efficient) to use a predicate that checks exactly that. We already
create such predicates for detecting partition boundaries, so this is
a trivial change.
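
As a rough illustration of the peer-row rule described above, here is a
small Python sketch. It is not Impala's planner or backend code; every
function name in it is hypothetical, and NULL is modelled as None. It
assumes the rows are already sorted by the order by exprs, and the sample
rows are built to mirror the tinyint_col/id values in the regression test
added by this change.

```python
def sql_less_than(a, b):
    """SQL-style '<': the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a < b


def null_safe_equal(a, b):
    """True when both values are NULL, or both are non-NULL and equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def is_peer(prev_keys, cur_keys):
    """Rows are peers when every order by expr matches null-safely."""
    return all(null_safe_equal(p, c) for p, c in zip(prev_keys, cur_keys))


def running_sum(rows, key, value):
    """SUM over the default RANGE window; peers share one result."""
    out, total, i = [], 0, 0
    while i < len(rows):
        j = i
        # Fold the whole peer group into the total before emitting anything.
        while j < len(rows) and is_peer(key(rows[i]), key(rows[j])):
            total += value(rows[j])
            j += 1
        out.extend([total] * (j - i))
        i = j
    return out


# (tinyint_col, id) pairs in the sorted order the test expects:
# ten rows with tinyint_col = 1, then each NULL row appearing twice.
rows = [(1, i) for i in range(1, 100, 10)]
rows += [(None, i) for i in range(0, 100, 10) for _ in range(2)]
sums = running_sum(rows, key=lambda r: r, value=lambda r: r[1])
```

With these rows, sums runs from 1 up to 460 for the tinyint_col = 1 rows
and then yields 460, 460, 480, 480, and so on up to 1360, 1360 for the
NULL rows, because each pair of NULL rows with the same id forms one peer
group. Note that sql_less_than(None, None) returns None rather than a
boolean, which is why a less-than comparison alone cannot identify two
NULL-keyed rows as peers.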

When we support arbitrary RANGE window offsets we will likely want to
add similar predicates that compare two tuples plus/minus the offset,
but those will be simpler because there can be only one order by expr
when specifying RANGE offsets with PRECEDING/FOLLOWING.

Change-Id: I52ff6203686832852430e498eca6ad2cc2daee98
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4474
Tested-by: jenkins
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
Matthew Jacobs
2014-09-22 16:45:40 -07:00
committed by Nong Li
parent f9b60bce43
commit 28fc8ddf60
6 changed files with 87 additions and 101 deletions


@@ -824,4 +824,44 @@ select count(distinct t1.c1) from
11
---- TYPES
BIGINT
====
====
---- QUERY
# IMPALA-1292: Incorrect result in analytic SUM when ORDER BY column is null
select tinyint_col, id,
SUM(id) OVER (ORDER BY tinyint_col ASC, id ASC)
FROM alltypesagg
where (tinyint_col is NULL or tinyint_col < 2) and id < 100 order by 1, 2
---- RESULTS
1,1,1
1,11,12
1,21,33
1,31,64
1,41,105
1,51,156
1,61,217
1,71,288
1,81,369
1,91,460
NULL,0,460
NULL,0,460
NULL,10,480
NULL,10,480
NULL,20,520
NULL,20,520
NULL,30,580
NULL,30,580
NULL,40,660
NULL,40,660
NULL,50,760
NULL,50,760
NULL,60,880
NULL,60,880
NULL,70,1020
NULL,70,1020
NULL,80,1180
NULL,80,1180
NULL,90,1360
NULL,90,1360
---- TYPES
TINYINT, INT, BIGINT
====