IMPALA-862: count(x) may return null when a similar count(distinct x) is also used

COUNT(x) with no DISTINCT and no group-by expressions returns NULL on empty input
if other distinct aggs (e.g. COUNT(DISTINCT x)) are present.
This happens because the COUNT is transformed to SUM(COUNT()),
with the inner COUNT being evaluated WITH a group-by expression (e.g. x).
SUM over empty input returns NULL, but COUNT should return 0.
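
The NULL-versus-0 difference can be reproduced outside Impala. The sketch below is
only an illustration: it uses Python's sqlite3 module and a hypothetical empty table
`t`, not Impala or its planner, and it hand-writes the SUM(COUNT()) form that the
commit message describes.

```python
import sqlite3

# Hypothetical empty table; SQLite is used only to illustrate standard
# SQL aggregate semantics, not Impala's actual execution plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

# Plain COUNT with no group-by: one output row, value 0 on empty input.
direct = conn.execute("SELECT COUNT(x) FROM t").fetchone()[0]

# Hand-written two-phase form: the inner COUNT is grouped by x, so it
# produces no rows on empty input, and the outer SUM over no rows is NULL.
two_phase = conn.execute(
    "SELECT SUM(c) FROM (SELECT COUNT(x) AS c FROM t GROUP BY x) AS g"
).fetchone()[0]

print(direct, two_phase)
```

Here `direct` is 0 while `two_phase` is None (SQL NULL), which is the mismatch
reported in IMPALA-862.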

This patch fixes this by replacing COUNT with zeroifnull(COUNT) before AggregateInfo
is generated if there are distinct aggs and no group-bys. The logic in AggregateInfo
itself has not been modified.
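
The effect of the zeroifnull wrapper can be sketched the same way. This is an
assumption-laden illustration, not the patch itself: it uses Python's sqlite3 module,
a hypothetical table `t`, and COALESCE(..., 0) as a stand-in for Impala's zeroifnull,
applied to the hand-written SUM(COUNT()) form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")  # hypothetical table, initially empty


def rewritten_count(connection):
    """Two-phase COUNT(x) wrapped so that NULL becomes 0.

    COALESCE(..., 0) stands in for zeroifnull here (an assumption made
    for this sketch, since SQLite has no zeroifnull function).
    """
    row = connection.execute(
        "SELECT COALESCE(SUM(c), 0) "
        "FROM (SELECT COUNT(x) AS c FROM t GROUP BY x) AS g"
    ).fetchone()
    return row[0]


# Empty input: the wrapper turns the NULL from SUM into 0.
empty_result = rewritten_count(conn)

# Non-empty input: the wrapper leaves the count unchanged.
# COUNT(x) ignores the NULL row, so three values are counted.
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (1,), (2,), (None,)])
filled_result = rewritten_count(conn)

print(empty_result, filled_result)
```

The wrapper only changes the empty-input case; on non-empty input the result matches
a plain COUNT(x).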

Change-Id: I902e3fdd95767135b2f3fe423e8802ef57366af1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1921
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Srinath Shankar
2014-03-11 18:44:21 -07:00
committed by jenkins
parent ce40134ad0
commit 74a975c45b
11 changed files with 242 additions and 164 deletions


@@ -303,3 +303,27 @@ having count(bigint_col) > 100
bigint
---- RESULTS
====
---- QUERY
# Regression test for COUNT(ALL ) with no group-by and other distinct agg. IMPALA-862
select count(*), COUNT(distinct 1) from alltypesagg where false
---- RESULTS
0,0
---- TYPES
bigint, bigint
====
---- QUERY
# Regression test for COUNT(ALL ) with no group-by and other distinct agg. IMPALA-862
select count(tinyint_col), sum(distinct int_col) from alltypesagg
---- RESULTS
9000,499500
---- TYPES
bigint, bigint
====
---- QUERY
# Regression test for COUNT(ALL ) with no group-by and other distinct agg. IMPALA-862
select count(*), COUNT(distinct 1) from alltypesagg
---- RESULTS
10000,1
---- TYPES
bigint, bigint
====