impala/testdata/workloads/functional-query/queries/QueryTest/uda.test
Tim Armstrong 852e1bb728 IMPALA-3931: arbitrary fixed-size uda intermediate types
Make many builtin aggregate functions use fixed-length intermediate
types:
* avg()
* ndv()
* stddev(), variance(), etc
* distinctpc(), distinctpcsa()

sample(), appx_median(), histogram() and group_concat() actually
allocate var-len data so aren't changed.

This has some major benefits:
* Spill-to-disk works properly with these aggregations.
* Aggregations are more efficient because there is one less pointer
  indirection.
* Aggregations use less memory, because we don't need an extra 12-byte
  StringValue for the indirection.

Adds a special-purpose internal type FIXED_UDA_INTERMEDIATE. The type
is represented in the same way as CHAR - a fixed-size array of bytes,
stored inline in tuples. However, it is not user-visible and does
not support CHAR semantics, i.e. users can't declare tables, functions,
etc. with the type. The pointer and length are passed into aggregate
functions wrapped in a StringVal.
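
The inline layout described above can be illustrated with a small standalone
sketch. It is not Impala's implementation: BytesView, AvgState, Tuple and the
Avg* functions are hypothetical names invented here, with BytesView standing in
for the pointer-plus-length role that the text assigns to StringVal. It only
shows how an aggregate can read and write a fixed-size state stored directly in
the tuple bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a pointer-plus-length wrapper such as StringVal.
struct BytesView {
  uint8_t* ptr;
  int len;
};

// Fixed-size intermediate state for an average: running sum and row count.
struct AvgState {
  double sum;
  int64_t count;
};

// Hypothetical tuple layout: the intermediate bytes are stored inline, so
// there is no separate allocation and no pointer slot to follow.
struct Tuple {
  uint8_t avg_intermediate[sizeof(AvgState)];
};

BytesView IntermediateOf(Tuple* t) {
  return BytesView{t->avg_intermediate, static_cast<int>(sizeof(AvgState))};
}

// memcpy is used for loads and stores so the sketch does not depend on the
// alignment of the inline byte array.
AvgState Load(BytesView v) {
  assert(v.len == static_cast<int>(sizeof(AvgState)));
  AvgState s;
  std::memcpy(&s, v.ptr, sizeof(s));
  return s;
}

void Store(const AvgState& s, BytesView v) {
  assert(v.len == static_cast<int>(sizeof(AvgState)));
  std::memcpy(v.ptr, &s, sizeof(s));
}

void AvgInit(BytesView dst) { Store(AvgState{0.0, 0}, dst); }

void AvgUpdate(double input, BytesView dst) {
  AvgState s = Load(dst);
  s.sum += input;
  s.count += 1;
  Store(s, dst);
}

void AvgMerge(BytesView src, BytesView dst) {
  AvgState a = Load(src);
  AvgState b = Load(dst);
  b.sum += a.sum;
  b.count += a.count;
  Store(b, dst);
}

// The caller must ensure at least one row was aggregated; in SQL the empty
// case would produce NULL, which this sketch does not model.
double AvgFinalize(BytesView src) {
  AvgState s = Load(src);
  assert(s.count > 0);
  return s.sum / static_cast<double>(s.count);
}

// Aggregates the two halves of the input into separate tuples and merges
// them, mimicking a two-phase aggregation.
double AverageOf(const std::vector<double>& values) {
  Tuple a, b;
  AvgInit(IntermediateOf(&a));
  AvgInit(IntermediateOf(&b));
  for (std::size_t i = 0; i < values.size(); ++i) {
    AvgUpdate(values[i], IntermediateOf(i < values.size() / 2 ? &a : &b));
  }
  AvgMerge(IntermediateOf(&b), IntermediateOf(&a));
  return AvgFinalize(IntermediateOf(&a));
}
```

Calling AverageOf({1.0, 2.0, 3.0, 4.0}) runs init, update, merge and finalize
against the inline bytes only. Because the state has a fixed size and contains
no pointers, the tuple can be copied or written out as-is.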

Updates some internal codegen functions to work better with the new
type. E.g. store values directly into the result tuple instead of
via an intermediate stack allocation.

Testing:
This change only affects builtin aggregate functions, for which we
have test coverage already. If we were to allow wider use of this type,
it would need further testing.

Added an analyzer test to ensure we can't use the type for UDAs.

Added a regression test for spilling avg().

Added a regression test for UDA with CHAR intermediate hitting DCHECK.

Perf:
Ran TPC-H locally. TPC-H Q17, which has a high-cardinality AVG(),
improved dramatically.

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(60) | parquet / none / none | 18.44   | -17.54%    | 11.92      | -5.34%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TPCH(60) | TPCH-Q12 | parquet / none / none | 18.40  | 17.64       | +4.32%     |   0.77%   |   1.09%        | 1           | 5     |
| TPCH(60) | TPCH-Q22 | parquet / none / none | 7.07   | 6.90        | +2.36%     |   0.28%   |   0.30%        | 1           | 5     |
| TPCH(60) | TPCH-Q3  | parquet / none / none | 12.37  | 12.11       | +2.10%     |   0.18%   |   0.15%        | 1           | 5     |
| TPCH(60) | TPCH-Q7  | parquet / none / none | 42.48  | 42.09       | +0.93%     |   2.45%   |   0.80%        | 1           | 5     |
| TPCH(60) | TPCH-Q6  | parquet / none / none | 3.18   | 3.15        | +0.89%     |   0.67%   |   0.76%        | 1           | 5     |
| TPCH(60) | TPCH-Q19 | parquet / none / none | 7.24   | 7.20        | +0.50%     |   0.95%   |   0.67%        | 1           | 5     |
| TPCH(60) | TPCH-Q10 | parquet / none / none | 13.37  | 13.30       | +0.50%     |   0.48%   |   1.39%        | 1           | 5     |
| TPCH(60) | TPCH-Q5  | parquet / none / none | 7.47   | 7.44        | +0.36%     |   0.58%   |   0.54%        | 1           | 5     |
| TPCH(60) | TPCH-Q11 | parquet / none / none | 2.03   | 2.02        | +0.06%     |   0.26%   |   1.95%        | 1           | 5     |
| TPCH(60) | TPCH-Q4  | parquet / none / none | 5.48   | 5.50        | -0.27%     |   0.62%   |   1.12%        | 1           | 5     |
| TPCH(60) | TPCH-Q13 | parquet / none / none | 22.11  | 22.18       | -0.31%     |   0.18%   |   0.55%        | 1           | 5     |
| TPCH(60) | TPCH-Q15 | parquet / none / none | 8.45   | 8.48        | -0.32%     |   0.40%   |   0.47%        | 1           | 5     |
| TPCH(60) | TPCH-Q9  | parquet / none / none | 33.39  | 33.66       | -0.81%     |   0.75%   |   0.59%        | 1           | 5     |
| TPCH(60) | TPCH-Q21 | parquet / none / none | 71.34  | 72.07       | -1.01%     |   1.84%   |   1.79%        | 1           | 5     |
| TPCH(60) | TPCH-Q14 | parquet / none / none | 5.93   | 6.00        | -1.07%     |   0.15%   |   0.69%        | 1           | 5     |
| TPCH(60) | TPCH-Q20 | parquet / none / none | 5.72   | 5.79        | -1.09%     |   0.59%   |   0.51%        | 1           | 5     |
| TPCH(60) | TPCH-Q18 | parquet / none / none | 45.42  | 45.93       | -1.10%     |   1.42%   |   0.50%        | 1           | 5     |
| TPCH(60) | TPCH-Q2  | parquet / none / none | 4.81   | 4.89        | -1.52%     |   1.68%   |   1.01%        | 1           | 5     |
| TPCH(60) | TPCH-Q16 | parquet / none / none | 5.41   | 5.52        | -1.98%     |   0.66%   |   0.73%        | 1           | 5     |
| TPCH(60) | TPCH-Q1  | parquet / none / none | 27.58  | 29.13       | -5.34%     |   0.24%   |   1.51%        | 1           | 5     |
| TPCH(60) | TPCH-Q8  | parquet / none / none | 12.61  | 14.30       | -11.78%    |   6.20%   | * 15.28% *     | 1           | 5     |
| TPCH(60) | TPCH-Q17 | parquet / none / none | 43.74  | 126.58      | I -65.44%  |   1.34%   |   9.60%        | 1           | 5     |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
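
As a reading aid for the table above, the Delta(Avg) column appears to be the
relative change of Avg(s) against Base Avg(s). That formula is an assumption
inferred from the numbers, not taken from the benchmark tooling, and the helper
names below are hypothetical. Since the table shows rounded values, recomputed
deltas can differ from the printed ones in the last digit.

```cpp
#include <cmath>

// Assumed formula: relative change of the new average against the baseline,
// expressed in percent. Negative values mean the new run was faster.
double DeltaPercent(double avg_s, double base_avg_s) {
  return (avg_s - base_avg_s) / base_avg_s * 100.0;
}

// Tolerance-based comparison, needed because the table values are rounded.
bool RoughlyEqual(double a, double b, double tolerance) {
  return std::fabs(a - b) <= tolerance;
}
```

For TPCH-Q17, DeltaPercent(43.74, 126.58) comes out at roughly -65.44, which
matches the printed delta for that row.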

Change-Id: Ife90cf27989f98ffb5ef5c39f1e09ce92e8cb87c
Reviewed-on: http://gerrit.cloudera.org:8080/7526
Tested-by: Impala Public Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2017-08-17 03:12:48 +00:00


====
---- QUERY
select hll(int_col) from functional.alltypestiny;
---- RESULTS
'2'
---- TYPES
string
====
---- QUERY
select test_count(int_col) from functional.alltypestiny;
---- RESULTS
8
---- TYPES
bigint
====
---- QUERY
select test_count(int_col) from functional.alltypesagg;
---- RESULTS
10980
---- TYPES
bigint
====
---- QUERY
select sum_small_decimal(c3) from functional.decimal_tiny;
---- RESULTS
45.00
---- TYPES
decimal
====
---- QUERY
select trunc_sum(double_col),sum(double_col) from functional_parquet.alltypes where id < 5555;
---- RESULTS
252348,252348.5
---- TYPES
bigint,double
====
---- QUERY
select arg_is_const(int_col, 1) from functional_parquet.alltypes;
---- RESULTS
true
---- TYPES
boolean
====
---- QUERY
# Test with even number of input rows.
select toggle_null(id), count(*)
from functional_parquet.alltypesagg
---- RESULTS
NULL,11000
---- TYPES
int,bigint
====
---- QUERY
# Test with odd number of input rows.
select toggle_null(id), count(*)
from functional_parquet.alltypesagg
where id <= 9998
---- RESULTS
1,10999
---- TYPES
int,bigint
====
---- QUERY
# Test that input NULLs are passed to aggregate functions correctly.
select count_nulls(tinyint_col), count(*)
from functional.alltypesagg
---- RESULTS
2000,11000
---- TYPES
bigint,bigint
====
---- QUERY
# Test that all types are exposed via the FunctionContext correctly.
# This relies on asserts in the UDA function.
select agg_intermediate(int_col), count(*)
from functional.alltypesagg
---- RESULTS
NULL,11000
---- TYPES
bigint,bigint
====
---- QUERY
# Test that all types are exposed via the FunctionContext correctly.
# This relies on asserts in the UDA function.
select agg_decimal_intermediate(cast(d1 as decimal(2,1)), 2), count(*)
from functional.decimal_tbl
---- RESULTS
NULL,5
---- TYPES
decimal,bigint
====
---- QUERY
# Test that all types are exposed via the FunctionContext correctly.
# This includes distinct aggregate expression to test IMPALA-5251.
# It also relies on asserts in the UDA function.
select
agg_string_intermediate(cast(c1 as decimal(20,10)), 1000, "foobar"),
agg_decimal_intermediate(cast(c3 as decimal(2,1)), 2),
agg_intermediate(int_col),
avg(c2),
min(c3-c1),
max(c1+c3),
count(distinct int_col),
sum(distinct int_col)
from
functional.alltypesagg,
functional.decimal_tiny
---- RESULTS
100,NULL,NULL,160.49989,-10.0989,11.8989,999,499500
---- TYPES
decimal,decimal,bigint,decimal,decimal,decimal,bigint,bigint
====
---- QUERY
# Test that all types are exposed via the FunctionContext correctly.
# This includes distinct aggregate expression to test IMPALA-5251.
# It also relies on asserts in the UDA function.
select
agg_string_intermediate(cast(c1 as decimal(20,10)), 1000, "foobar"),
agg_decimal_intermediate(cast(c3 as decimal(2,1)), 2),
agg_intermediate(int_col),
ndv(c2),
sum(distinct c1)/count(distinct c1)
from
functional.alltypesagg,
functional.decimal_tiny
group by
year,month,day
---- RESULTS
100,NULL,NULL,99,5.4994
100,NULL,NULL,99,5.4994
100,NULL,NULL,99,5.4994
100,NULL,NULL,99,5.4994
100,NULL,NULL,99,5.4994
100,NULL,NULL,99,5.4994
100,NULL,NULL,99,5.4994
100,NULL,NULL,99,5.4994
100,NULL,NULL,99,5.4994
100,NULL,NULL,99,5.4994
100,NULL,NULL,99,5.4994
---- TYPES
decimal,decimal,bigint,bigint,decimal
====
---- QUERY
# Test that char intermediate works as expected. The function char_intermediate_sum()
# computes the sum with an intermediate int.
select year, month, day, char_intermediate_sum(int_col), sum(int_col)
from functional.alltypesagg
group by year, month, day
order by year, month, day
---- RESULTS
2010,1,1,499500,499500
2010,1,2,499500,499500
2010,1,3,499500,499500
2010,1,4,499500,499500
2010,1,5,499500,499500
2010,1,6,499500,499500
2010,1,7,499500,499500
2010,1,8,499500,499500
2010,1,9,499500,499500
2010,1,10,499500,499500
2010,1,NULL,495000,495000
---- TYPES
int,int,int,int,bigint
====