impala/testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
Abhishek Rawat 763acffb74 IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
Expression rewrites for VALUES() could result in a performance
regression, since there is virtually no benefit to a rewrite if the
expression will only ever be evaluated once. The overhead of rewrites
could be huge in some cases, especially if there are several constant
expressions. The regression also seems to increase non-linearly as the
number of columns increases. Similarly, there is no value in doing
codegen for such const expressions.

The rewriteExprs() method of the ValuesStmt class was overridden with an
empty function body. As a result, rewrites for VALUES() are a no-op.

Codegen was disabled for const expressions within a UNION node, if
the UNION node is not within a subplan. This applies to all UNION nodes
with const expressions (and not just limited to UNION nodes associated
with a VALUES clause).

The decision whether or not to enable codegen for const expressions in a
UNION is made in the planner when a UnionNode is initialized. A new
member 'is_codegen_disabled' was added to the thrift struct TExprNode
to communicate this decision to the backend. The optimizer should make
the decisions it can, so the planner seemed like the right place to
enable or disable codegen. The infrastructure is generic and could be
extended in the future to selectively disable codegen for any given
expression, if needed.
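
The planner-side rule described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Impala code (the planner is Java and the backend is C++). The classes `ExprNode` and `UnionNode`, the `in_subplan` field, and the `init()` method are hypothetical stand-ins; only the flag name `is_codegen_disabled` comes from the commit message.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExprNode:
    """Hypothetical stand-in for the thrift TExprNode."""
    sql: str
    # Flag name taken from the commit message; everything else is assumed.
    is_codegen_disabled: bool = False


@dataclass
class UnionNode:
    """Hypothetical stand-in for the planner's UnionNode."""
    const_exprs: List[ExprNode] = field(default_factory=list)
    in_subplan: bool = False

    def init(self) -> None:
        # Outside a subplan the const expressions are evaluated only once,
        # so the planner marks them as not worth codegen'ing.
        if not self.in_subplan:
            for expr in self.const_exprs:
                expr.is_codegen_disabled = True


# A top-level UNION: codegen is disabled for its const expressions.
top_level = UnionNode(const_exprs=[ExprNode("1"), ExprNode("2")])
top_level.init()

# A UNION inside a subplan: the flag is left untouched.
nested = UnionNode(const_exprs=[ExprNode("9.99")], in_subplan=True)
nested.init()
```

In this sketch the flag is set once at plan time and travels with each expression, so the backend only has to read it rather than re-derive the decision.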

Testing:
- Added a new e2e test case in tests/query_test/test_codegen.py, which
  tests the different scenarios involving UNION with const expressions.
- Passed exhaustive unit-tests.
- Ran manual tests to validate that the non-linear regression seen in
  the VALUES clause as the number of columns increases no longer occurs.
  Results below.

for i in 256 512 1024 2048 4096 8192 16384 32768;
do (echo 'VALUES ('; for x in $(seq $i);
do echo  "cast($x as string),"; done;
echo "NULL); profile;") |
time impala-shell.sh -f /dev/stdin |& grep Analysis; done

Base:
       - Analysis finished: 20.137ms (19.215ms)
       - Analysis finished: 46.275ms (44.597ms)
       - Analysis finished: 119.642ms (116.663ms)
       - Analysis finished: 361.195ms (355.856ms)
       - Analysis finished: 1s277ms (1s266ms)
       - Analysis finished: 5s664ms (5s640ms)
       - Analysis finished: 29s689ms (29s646ms)
       - Analysis finished: 2m (2m)

Test:
       - Analysis finished: 1.868ms (986.520us)
       - Analysis finished: 3.195ms (1.856ms)
       - Analysis finished: 7.332ms (3.484ms)
       - Analysis finished: 13.896ms (8.071ms)
       - Analysis finished: 31.015ms (18.963ms)
       - Analysis finished: 60.157ms (38.125ms)
       - Analysis finished: 113.694ms (67.642ms)
       - Analysis finished: 253.044ms (163.180ms)
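
One way to read the numbers above is to compare each measurement with the previous one, since the column count doubles at every step. The following Python sketch does that with the first (cumulative) timing from each line. The final Base value is printed only as "2m", so the 120000 ms used here is an approximation, and the helper name `growth_ratios` is introduced purely for illustration.

```python
# Analysis times from the results above, converted to milliseconds.
columns = [256, 512, 1024, 2048, 4096, 8192, 16384, 32768]
# The last Base entry is reported as "2m"; 120000 ms is an approximation.
base_ms = [20.137, 46.275, 119.642, 361.195, 1277.0, 5664.0, 29689.0, 120000.0]
test_ms = [1.868, 3.195, 7.332, 13.896, 31.015, 60.157, 113.694, 253.044]


def growth_ratios(times):
    """Return the ratio of each measurement to the one before it."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


base_growth = growth_ratios(base_ms)
test_growth = growth_ratios(test_ms)

for n, b, t in zip(columns[1:], base_growth, test_growth):
    print(f"{n:>6} columns: base x{b:.2f}, test x{t:.2f}")
```

With linear scaling each ratio would be close to 2. The Test ratios stay near 2 throughout, while the Base ratios climb past 5 by 16384 columns, which is consistent with the non-linear regression described in the commit message.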

Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Reviewed-on: http://gerrit.cloudera.org:8080/13645
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-12-16 05:51:27 +00:00

====
---- QUERY
# Test union with multiple legs each having const expressions.
# Expect codegen to be disabled for const expressions.
set DISABLE_CODEGEN_ROWS_THRESHOLD=1;
select 1,2,3 union all select 4,5,6 union all select 7,8,9 order by 1;
---- TYPES
tinyint,tinyint,tinyint
---- RESULTS
1,2,3
4,5,6
7,8,9
---- RUNTIME_PROFILE
00:UNION
constant-operands=3
#SORT_NODE
ExecOption: Codegen Enabled
#UNION_NODE
ExecOption: Codegen Enabled, Codegen Disabled for const scalar expressions
====
---- QUERY
# Test insert statement with values (translated into UNION with const expressions).
# Expect codegen to be disabled for const expressions.
set DISABLE_CODEGEN_ROWS_THRESHOLD=1;
drop table if exists test_values_codegen;
create table test_values_codegen (c1 int, c2 timestamp, c3 string);
insert into test_values_codegen(c1) values (CAST(1+ceil(2.5)*3 as tinyint));
---- RUNTIME_PROFILE
00:UNION
constant-operands=1
#UNION_NODE
ExecOption: Codegen Enabled, Codegen Disabled for const scalar expressions
====
---- QUERY
# Test insert statement with values having const scalar expressions.
# Expect codegen to be disabled for const expressions.
set DISABLE_CODEGEN_ROWS_THRESHOLD=1;
insert into test_values_codegen values
(1+1, '2015-04-09 14:07:46.580465000', base64encode('hello world')),
(CAST(1*2+2-5 as INT), CAST(1428421382 as timestamp),
regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',0));
---- RUNTIME_PROFILE
00:UNION
constant-operands=2
#UNION_NODE
ExecOption: Codegen Enabled, Codegen Disabled for const scalar expressions
====
---- QUERY
# Test the result of above inserts with codegen disabled.
select * from test_values_codegen order by c1;
---- TYPES
int, timestamp, string
---- RESULTS
-1,2015-04-07 15:43:02,'abcdef123ghi456'
2,2015-04-09 14:07:46.580465000,'aGVsbG8gd29ybGQ='
10,NULL,'NULL'
====
---- QUERY
# Test union with const expressions in a subplan.
# Expect codegen enabled.
select count(c.c_custkey), count(v.tot_price)
from tpch_nested_parquet.customer c, (
select sum(o_totalprice) tot_price from c.c_orders
union
select 9.99 tot_price) v;
---- TYPES
BIGINT, BIGINT
---- RESULTS
300000,249996
---- RUNTIME_PROFILE
01:SUBPLAN
| 03:UNION
| | constant-operands=1
#AGGREGATION_NODE (id=6)
ExecOption: Codegen Enabled
#UNION_NODE (id=3)
ExecOption: Codegen Enabled
#AGGREGATION_NODE (id=5)
ExecOption: Codegen Enabled
====