Commit Graph

3 Commits

Author SHA1 Message Date
stiga-huang
47309d14ca IMPALA-12204: Fix redundant codegen info added in subplan profiles
The SUBPLAN node will open its right child node many times in its
GetNext(), depending on how many rows are generated by its left child.
The right child of a SUBPLAN node is a subtree of operators. They should
not add codegen info to the profile in their Open() method, since it
will be invoked repeatedly.

Currently, DataSink and UnionNode have this issue. This patch fixes
them by adding the codegen info to the profile in Close() instead of
Open(), as was done in IMPALA-11200.
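
The effect described above can be illustrated with a toy model. This is
only a sketch, not Impala code: Profile, ToyNode, and run_subplan are
hypothetical names, and the real backend is C++.

```python
class Profile:
    """Toy stand-in for a runtime profile: just a list of info strings."""

    def __init__(self):
        self.info = []


class ToyNode:
    """Hypothetical operator in the right subtree of a subplan."""

    def __init__(self, profile, record_in_close):
        self.profile = profile
        self.record_in_close = record_in_close

    def open(self):
        # Old behaviour: the message is added on every Open() call.
        if not self.record_in_close:
            self.profile.info.append("Codegen Enabled")

    def close(self):
        # Patched behaviour: the message is added once, in Close().
        if self.record_in_close:
            self.profile.info.append("Codegen Enabled")


def run_subplan(left_rows, record_in_close):
    """Open the right child once per left row, then close it once."""
    profile = Profile()
    right_child = ToyNode(profile, record_in_close)
    for _ in range(left_rows):
        right_child.open()
    right_child.close()
    return profile.info
```

With three left rows, recording in open() leaves three identical
entries in the toy profile, while recording in close() leaves one.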

Tests:
 - Add e2e tests

Change-Id: I99a0a842df63a03c61024e2b77d5118ca63a2b2d
Reviewed-on: http://gerrit.cloudera.org:8080/20037
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2023-06-13 07:05:41 +00:00
Csaba Ringhofer
3843f7ff46 IMPALA-11200: Avoid redundant "Codegen enabled" messages in profile
Before this patch the message was added to the profile in Open(),
which can be called multiple times in subplans.

Moved it to Close(), which is only called once in the lifetime
of a Node/Aggregator.

A drawback of this is that this info won't be visible while the
Node is still active, but I don't think it is very useful info
for a still-running query.

Also added a new feature to test_result_verifier.py:
inside a RUNTIME_PROFILE section, row_regex can be negated with !,
so !row_regex [regex] means that the regex is not matched by any
line in the profile.
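
The negation semantics can be sketched as follows. This is not the
actual code in test_result_verifier.py: check_expectation is a
hypothetical helper, and matching each stripped line with re.match, as
well as accepting an optional colon after row_regex, are assumptions
made for the example.

```python
import re

# Accepts "row_regex <regex>" or "!row_regex <regex>", with an optional colon.
_EXPECTATION = re.compile(r"(!?)row_regex:?\s*(.*)")


def check_expectation(profile_text, expectation):
    """Return True if the profile satisfies a (possibly negated) row_regex."""
    parsed = _EXPECTATION.fullmatch(expectation.strip())
    if parsed is None:
        raise ValueError("not a row_regex expectation: %r" % expectation)
    negated = parsed.group(1) == "!"
    regex = re.compile(parsed.group(2))
    # Assumption: the regex is tried against each profile line separately.
    matched = any(regex.match(line.strip()) for line in profile_text.splitlines())
    # Plain form: some line must match. Negated form: no line may match.
    return not matched if negated else matched
```

A negated expectation therefore fails as soon as a single profile line
matches, which is what a regression test for a redundant message needs.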

Testing:
- added a regression test

Change-Id: Iad2e31900ee6d29385cc8adc6bbf067d91f6450f
Reviewed-on: http://gerrit.cloudera.org:8080/18385
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-04-13 12:31:36 +00:00
Abhishek Rawat
763acffb74 IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
Expression rewrites for VALUES() could result in a performance
regression, since a rewrite has virtually no benefit if the expression
will only ever be evaluated once. The overhead of rewrites in some cases
could be huge, especially if there are several constant expressions.
The regression also seems to increase non-linearly as the number of
columns increases. Similarly, there is no value in doing codegen for
such const expressions.

The rewriteExprs() method of the ValuesStmt class was overridden with an
empty function body. As a result, rewrites for VALUES() are a no-op.

Codegen was disabled for const expressions within a UNION node, if
the UNION node is not within a subplan. This applies to all UNION nodes
with const expressions (and not just limited to UNION nodes associated
with a VALUES clause).

The decision for whether or not to enable codegen for const expressions
in a UNION is made in the planner when a UnionNode is initialized. A new
member 'is_codegen_disabled' was added to the thrift struct TExprNode
for communicating this decision to the backend. The optimizer should
make the decisions it can, so it seemed like the right place to
disable/enable codegen. The infrastructure is generic and could be
extended in the future to selectively disable codegen for any given
expression, if needed.
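
The planner-side rule described above can be sketched as a small model.
This is not the actual planner code: ToyExprNode and init_union_node
are hypothetical names, and only the is_codegen_disabled flag comes
from the commit message.

```python
from dataclasses import dataclass


@dataclass
class ToyExprNode:
    """Hypothetical stand-in for an expression sent to the backend."""

    sql: str
    is_const: bool
    is_codegen_disabled: bool = False


def init_union_node(exprs, in_subplan):
    """Mark const expressions of a UNION as codegen-disabled.

    Codegen is disabled only when the expression is constant and the
    UNION node is not inside a subplan.
    """
    for expr in exprs:
        expr.is_codegen_disabled = expr.is_const and not in_subplan
    return exprs
```

The backend then only has to read the flag; it does not need to repeat
the planner's reasoning.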

Testing:
- Added a new e2e test case in tests/query_test/test_codegen.py, which
  tests the different scenarios involving UNION with const expressions.
- Passed exhaustive unit-tests.
- Ran manual tests to validate that the non-linear regression in the
  VALUES clause with an increasing number of columns is no longer seen.
  Results below.

for i in 256 512 1024 2048 4096 8192 16384 32768; do
  (echo 'VALUES ('
   for x in $(seq $i); do echo "cast($x as string),"; done
   echo "NULL); profile;") |
    time impala-shell.sh -f /dev/stdin |& grep Analysis
done
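
For readers who want to inspect the generated input, the statement the
loop above pipes into impala-shell.sh can be reproduced with a short
sketch. build_values_statement is a hypothetical helper, not part of
the patch.

```python
def build_values_statement(num_columns):
    """Build the same text the shell loop echoes for one column count."""
    lines = ["VALUES ("]
    # One cast per column, numbered from 1 like `seq $i`.
    lines.extend(f"cast({x} as string)," for x in range(1, num_columns + 1))
    # A trailing NULL closes the list; `profile;` prints the query profile.
    lines.append("NULL); profile;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_values_statement(3), end="")
```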

Base:
       - Analysis finished: 20.137ms (19.215ms)
       - Analysis finished: 46.275ms (44.597ms)
       - Analysis finished: 119.642ms (116.663ms)
       - Analysis finished: 361.195ms (355.856ms)
       - Analysis finished: 1s277ms (1s266ms)
       - Analysis finished: 5s664ms (5s640ms)
       - Analysis finished: 29s689ms (29s646ms)
       - Analysis finished: 2m (2m)

Test:
       - Analysis finished: 1.868ms (986.520us)
       - Analysis finished: 3.195ms (1.856ms)
       - Analysis finished: 7.332ms (3.484ms)
       - Analysis finished: 13.896ms (8.071ms)
       - Analysis finished: 31.015ms (18.963ms)
       - Analysis finished: 60.157ms (38.125ms)
       - Analysis finished: 113.694ms (67.642ms)
       - Analysis finished: 253.044ms (163.180ms)
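
One way to read the numbers above is to look at how much the analysis
time grows each time the column count doubles. The sketch below does
that for the first value on each line. to_ms and growth_factors are
hypothetical helpers, and the unit table is an assumption based only on
the formats that appear in the results ("2m" is also coarsely rounded,
so the last Base factor is approximate).

```python
import re

_UNIT_MS = {"h": 3_600_000.0, "m": 60_000.0, "s": 1000.0, "ms": 1.0, "us": 0.001}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|us|h|m|s)")

BASE = ["20.137ms", "46.275ms", "119.642ms", "361.195ms",
        "1s277ms", "5s664ms", "29s689ms", "2m"]
TEST = ["1.868ms", "3.195ms", "7.332ms", "13.896ms",
        "31.015ms", "60.157ms", "113.694ms", "253.044ms"]


def to_ms(text):
    """Convert a duration such as '1s277ms' or '986.520us' to milliseconds."""
    tokens = _TOKEN.findall(text)
    if "".join(number + unit for number, unit in tokens) != text:
        raise ValueError("unrecognised duration: %r" % text)
    return sum(float(number) * _UNIT_MS[unit] for number, unit in tokens)


def growth_factors(durations):
    """Ratio between consecutive measurements (one per column doubling)."""
    values = [to_ms(d) for d in durations]
    return [later / earlier for earlier, later in zip(values, values[1:])]


if __name__ == "__main__":
    for name, series in (("Base", BASE), ("Test", TEST)):
        factors = ", ".join(f"{f:.2f}" for f in growth_factors(series))
        print(f"{name}: {factors}")
```

A factor near 2 per doubling indicates roughly linear scaling. The Test
factors stay in that range, while the Base factors climb well above it.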

Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Reviewed-on: http://gerrit.cloudera.org:8080/13645
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-12-16 05:51:27 +00:00