impala/common/function-registry/impala_functions.py
Michael Smith 2243f331cb IMPALA-11274: CNF Rewrite causes a regress in join node performance
This patch defines a subset of all predicates that are common and
relatively inexpensive to compute. Such predicates must involve
columns, constants, simple math or cast functions only.

Examples of the subset of the predicates allowed:

  1. (a = 1 AND cast(b as int) = 2) OR (c = d AND e = f)
  2. a in ('1', '2', '3') OR ((b = 'abc') AND (c = d))
  3. (a between 1 and 100) OR ((b is null) AND (c = d))

Examples of the predicates not allowed:

  1. ((upper(a) != 'Y') AND b = 2) OR (c = d AND e = f)
  2. ((coalesce(CAST(a AS string), '') = '') AND b = 2) OR
     (c = d AND e = f)
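
A minimal Python sketch of such a check, for illustration only. The Expr
class, the node() helper and the CHEAP_KINDS set are hypothetical names,
and the set of allowed node kinds is inferred from the examples above
rather than taken from the patch itself:

```python
from dataclasses import dataclass, field
from typing import List

# Node kinds treated as cheap. This set is an assumption inferred from the
# examples above; it is not taken from Impala's source.
CHEAP_KINDS = {"column", "const", "cast", "arith", "cmp", "in",
               "between", "is_null", "and", "or"}


@dataclass
class Expr:
    """Hypothetical expression node: a kind plus child expressions."""
    kind: str
    children: List["Expr"] = field(default_factory=list)


def node(kind: str, *children: Expr) -> Expr:
    """Build an expression node from a kind and its children."""
    return Expr(kind, list(children))


def is_cheap(expr: Expr) -> bool:
    """True if expr and every descendant use only cheap node kinds."""
    return expr.kind in CHEAP_KINDS and all(is_cheap(c) for c in expr.children)


col, const = node("column"), node("const")

# (a = 1 AND cast(b as int) = 2) OR (c = d AND e = f)
allowed = node("or",
               node("and", node("cmp", col, const),
                    node("cmp", node("cast", col), const)),
               node("and", node("cmp", col, col), node("cmp", col, col)))

# ((upper(a) != 'Y') AND b = 2) OR (c = d AND e = f)
# upper() is modelled as a generic "func" node, which is not in CHEAP_KINDS.
not_allowed = node("or",
                   node("and", node("cmp", node("func", col), const),
                        node("cmp", col, const)),
                   node("and", node("cmp", col, col), node("cmp", col, col)))

print(is_cheap(allowed), is_cheap(not_allowed))
```

A single function call anywhere in the tree is enough to disqualify the
whole predicate in this sketch, which matches the two rejected examples.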

This patch further restricts conversion to conjunctive normal form
(CNF) to predicates in this subset. The aim is to reduce the run-time
evaluation overhead of CNFs, in which some of the predicates can be
duplicated.
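
To illustrate where the duplication comes from, here is a textbook
distribution of OR over AND in Python. It is a sketch, not Impala's
rewrite rule; to_cnf() and the tuple encoding are assumptions made for
the example:

```python
from itertools import product


def to_cnf(expr):
    """Convert a nested ("and"|"or", left, right) tuple to CNF.

    Atoms are plain strings. The result is a list of clauses, each clause
    being a list of atoms that are OR-ed together; clauses are AND-ed.
    """
    if isinstance(expr, str):
        return [[expr]]
    op, left, right = expr
    l, r = to_cnf(left), to_cnf(right)
    if op == "and":
        # AND simply concatenates the clause lists.
        return l + r
    if op == "or":
        # OR distributes: every left clause is paired with every right clause.
        return [lc + rc for lc, rc in product(l, r)]
    raise ValueError(f"unknown operator: {op}")


# (A AND B) OR (C AND D)
clauses = to_cnf(("or", ("and", "A", "B"), ("and", "C", "D")))
print(clauses)  # [['A', 'C'], ['A', 'D'], ['B', 'C'], ['B', 'D']]
```

Each of A, B, C and D appears in two clauses of the result, so a
predicate that is expensive to evaluate may be evaluated more than once
per row.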

Uses a cache in branching expressions to avoid visiting the entire
subtree on each call to applyRuleBottomUp. Skips cache complexity on
casts as they don't branch and are unlikely to be deeply nested.
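
The caching idea can be sketched in Python as follows. This is an
assumption-laden illustration, not the patch's Java code: Node,
is_cheap(), apply_bottom_up() and the BRANCHING set are hypothetical,
and the sketch assumes a subtree is not modified after its result has
been cached:

```python
from typing import Callable, List, Optional

LEAVES = {"column", "const"}
BRANCHING = {"and", "or", "cmp"}  # assumed set of multi-child node kinds


class Node:
    def __init__(self, kind: str, *children: "Node"):
        self.kind = kind
        self.children: List[Node] = list(children)
        self.cached_cheap: Optional[bool] = None  # used by branching nodes only
        self.computations = 0  # counts uncached evaluations, for the demo


def is_cheap(node: Node) -> bool:
    if node.kind in LEAVES:
        return True
    if node.kind == "cast":
        # One child and no branching: delegate without caching.
        return is_cheap(node.children[0])
    if node.kind in BRANCHING:
        if node.cached_cheap is None:
            node.computations += 1
            node.cached_cheap = all(is_cheap(c) for c in node.children)
        return node.cached_cheap
    return False  # anything else, such as a function call, counts as expensive


def apply_bottom_up(node: Node, rule: Callable[[Node], bool]) -> None:
    """Visit children first, then apply the rule to the node itself."""
    for child in node.children:
        apply_bottom_up(child, rule)
    rule(node)


left = Node("and",
            Node("cmp", Node("column"), Node("const")),
            Node("cmp", Node("cast", Node("column")), Node("const")))
root = Node("or", left, Node("cmp", Node("column"), Node("column")))

apply_bottom_up(root, is_cheap)
is_cheap(root)  # a repeated call is answered from the cache
print(root.computations, left.computations)
```

Because children are visited before their parent, each branching node
computes its result once and every later query reads the cached value
instead of walking the subtree again.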

Testing:
- New expression rewriter tests
- New planner tests

Change-Id: I326406c6b004fe31ec0e2a2f390a3845b8925aa9
Reviewed-on: http://gerrit.cloudera.org:8080/18458
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-05-25 05:37:17 +00:00

91 KiB