Files
impala/tests/query_test/test_codegen.py
Tim Armstrong d4648e87b4 IMPALA-4356,IMPALA-7331: codegen all ScalarExprs
Based on initial draft patch by Pooja Nilangekar.

Codegen'd expressions can be executed in two ways - either by
being called directly from a fully codegend function, or from
interpreted code via a function pointer (previously
ScalarFnCall::scalar_fn_wrapper_).

This change moves the function pointer from ScalarFnCall to its
base class ScalarExpr, so the full expr tree can be codegen'd, not
just the ScalarFnCall subtrees. The key refactoring and improvements
are:
* ScalarExpr::Get*Val() switches between interpreted and the codegen'd
  function pointer code paths in an inline function, avoiding a
  virtual function call to ScalarFnCal::Get*Val().
* Boilerplate logic is moved to ScalarExpr::GetCodegendComputeFn(),
  which calls a virtual function GetCodegenComputeFnImpl().
* ScalarFnCall's logic for deciding whether to interpret or codegen is
  better abstracted and exposed to ScalarExpr as IsInterpretable()
  and ShouldCodegen() methods.
* The ScalarExpr::codegend_compute_fn_ function pointer is only
  populated for expressions that are "codegen entry points". These
  include the roots of expr trees and non-root expressions
  where the parent expression calls Get*Val() from the
  pseudo-codegend GetCodegendComputeFnWrapper().
* ScalarFnCall is always initialised for interpreted execution.
  Otherwise the function pointer is needed for non-root expressions,
  e.g. to support ScalarExprEvaluator::GetConstantVal().
* Latent bugs/gaps for codegen of CollectionVal are fixed. CollectionVal
  is modified to use the StringVal memory layout to allow code sharing
  with StringVal. These fixes allowed simplification of
  IsNotEmptyPredicate codegen (from IMPALA-7657).

I chose to tackle two problems in one change - adding support for
generating codegen'd function pointers for all ScalarExprs, and adding
the "entry point" concept - to avoid a blow-up in the number of
codegen'd entry points that could lead to longer codegen times and/or
worse code because of inlining changes.

IMPALA-7331 (CHAR codegen support functions) is also fixed because
it was simpler to enable CHAR codegen within ScalarExpr than to carry
forward the exiting CHAR workarounds from ScalarFnCall. The
CHAR-specific codegen support required in the scalar expr subsystem is
very limited.  StringVal intermediates are used everywhere. Only
SlotRef actually operates on the different tuple layout, and the
required codegen support for SlotRef already exists for UDA
intermediates anyway.

Testing:
* Ran exhaustive tests.

Perf:
* Ran a basic insert benchmark, which went from 10.1s to 7.6s
  create table foo stored as parquet as
  select case when l_orderkey % 2 = 0 then 'aaa' else 'bbb' end
  from tpch30_parquet.lineitem;
* Ran a basic CHAR expr test:
  set num_nodes=1;
  set mt_dop=1;
  select count(*) from lineitem
  where cast(l_linestatus as CHAR(2)) = 'O ' and
        cast(l_returnflag as CHAR(2)) = 'N '
  The time spent in the scan went from 520ms to 220ms.
* Added perf regression test to tpcds-insert, similar to the manual
  benchmark.
* Ran single-node TPC-H with large and small scale factors, to estimate
  impact on execution perf and query startup time, respectively.

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 6.84    | -0.18%     | 4.49       | -0.31%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.58   | 2.47        |   +4.18%   |   1.29%   |   0.88%        | 5     |   +4.12%       | 2.31    | 5.81   |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 4.81   | 4.61        |   +4.33%   |   2.18%   |   2.15%        | 5     |   +3.91%       | 1.73    | 3.09   |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 26.45  | 26.16       |   +1.09%   |   0.37%   |   0.50%        | 5     |   +1.36%       | 2.02    | 3.94   |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 15.92  | 15.75       |   +1.09%   |   2.87%   |   1.65%        | 5     |   +0.88%       | 0.29    | 0.73   |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 2.38   | 2.35        |   +1.12%   |   1.64%   |   1.11%        | 5     |   +0.80%       | 1.15    | 1.26   |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 2.94   | 2.91        |   +1.13%   |   7.68%   |   5.37%        | 5     |   -0.34%       | -0.29   | 0.27   |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 18.10  | 18.02       |   +0.42%   |   2.70%   |   0.56%        | 5     |   +0.28%       | 0.29    | 0.34   |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 4.72   | 4.72        |   -0.04%   |   1.20%   |   1.65%        | 5     |   +0.05%       | 0.00    | -0.04  |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 3.92   | 3.93        |   -0.26%   |   1.08%   |   2.36%        | 5     |   +0.20%       | 0.58    | -0.23  |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.27   | 1.27        |   -0.28%   |   0.22%   |   0.88%        | 5     |   +0.09%       | 0.29    | -0.68  |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.64   | 2.65        |   -0.45%   |   1.65%   |   0.65%        | 5     |   -0.24%       | -0.58   | -0.57  |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 3.10   | 3.13        |   -0.76%   |   1.47%   |   1.12%        | 5     |   -0.21%       | -0.29   | -0.93  |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 1.20   | 1.21        |   -0.80%   |   2.26%   |   2.47%        | 5     |   -0.82%       | -1.15   | -0.53  |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 1.97   | 1.99        |   -1.37%   |   1.84%   |   3.21%        | 5     |   -0.47%       | -0.58   | -0.83  |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 11.53  | 11.63       |   -0.91%   |   0.46%   |   0.49%        | 5     |   -0.95%       | -2.02   | -3.08  |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 5.13   | 5.21        |   -1.51%   |   2.24%   |   4.05%        | 5     |   -0.94%       | -0.58   | -0.73  |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 3.61   | 3.66        |   -1.40%   |   0.66%   |   0.79%        | 5     |   -1.33%       | -1.73   | -3.05  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 19.42  | 19.71       |   -1.52%   |   1.34%   |   1.39%        | 5     |   -1.22%       | -1.44   | -1.76  |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 5.08   | 5.15        |   -1.49%   |   1.34%   |   0.73%        | 5     |   -1.35%       | -1.44   | -2.20  |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.42   | 3.49        |   -1.92%   |   0.93%   |   1.47%        | 5     |   -1.53%       | -1.15   | -2.49  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.15   | 1.19        |   -3.17%   |   2.27%   |   1.95%        | 5     |   -4.21%       | -1.15   | -2.41  |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 9.26   | 9.63        |   -3.85%   |   0.62%   |   0.59%        | 5     |   -3.78%       | -2.31   | -10.25 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+

Cluster Name: UNKNOWN
Lab Run Info: UNKNOWN
Impala Version:          impalad version 3.2.0-SNAPSHOT RELEASE ()
Baseline Impala Version: impalad version 3.2.0-SNAPSHOT RELEASE (2019-03-19)

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(2)  | parquet / none / none | 0.90    | -0.08%     | 0.80       | -0.05%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval  |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+
| TPCH(2)  | TPCH-Q18 | parquet / none / none | 1.22   | 1.19        |   +1.93%   |   3.81%   |   4.46%        | 20    |   +3.34%       | 1.62    | 1.46  |
| TPCH(2)  | TPCH-Q10 | parquet / none / none | 0.74   | 0.73        |   +1.97%   |   3.36%   |   2.94%        | 20    |   +0.97%       | 1.88    | 1.95  |
| TPCH(2)  | TPCH-Q11 | parquet / none / none | 0.49   | 0.48        |   +1.91%   |   6.19%   |   4.64%        | 20    |   +0.25%       | 0.95    | 1.09  |
| TPCH(2)  | TPCH-Q4  | parquet / none / none | 0.43   | 0.43        |   +1.99%   |   6.26%   |   5.86%        | 20    |   +0.15%       | 0.92    | 1.03  |
| TPCH(2)  | TPCH-Q15 | parquet / none / none | 0.50   | 0.49        |   +1.82%   |   7.32%   |   6.35%        | 20    |   +0.26%       | 1.01    | 0.83  |
| TPCH(2)  | TPCH-Q1  | parquet / none / none | 0.98   | 0.97        |   +0.79%   |   4.64%   |   2.73%        | 20    |   +0.36%       | 0.77    | 0.65  |
| TPCH(2)  | TPCH-Q19 | parquet / none / none | 0.83   | 0.83        |   +0.65%   |   3.33%   |   2.80%        | 20    |   +0.44%       | 2.18    | 0.67  |
| TPCH(2)  | TPCH-Q14 | parquet / none / none | 0.62   | 0.62        |   +0.97%   |   2.86%   |   1.00%        | 20    |   +0.04%       | 0.13    | 1.42  |
| TPCH(2)  | TPCH-Q3  | parquet / none / none | 0.88   | 0.87        |   +0.57%   |   2.17%   |   1.74%        | 20    |   +0.29%       | 1.15    | 0.92  |
| TPCH(2)  | TPCH-Q12 | parquet / none / none | 0.53   | 0.53        |   +0.27%   |   4.58%   |   5.78%        | 20    |   +0.46%       | 1.47    | 0.16  |
| TPCH(2)  | TPCH-Q17 | parquet / none / none | 0.72   | 0.72        |   +0.15%   |   3.64%   |   5.55%        | 20    |   +0.21%       | 0.86    | 0.10  |
| TPCH(2)  | TPCH-Q21 | parquet / none / none | 2.05   | 2.05        |   +0.21%   |   1.99%   |   2.37%        | 20    |   +0.01%       | 0.25    | 0.30  |
| TPCH(2)  | TPCH-Q5  | parquet / none / none | 1.28   | 1.27        |   +0.24%   |   1.61%   |   1.80%        | 20    |   -0.02%       | -0.57   | 0.44  |
| TPCH(2)  | TPCH-Q13 | parquet / none / none | 1.27   | 1.27        |   -0.34%   |   1.69%   |   1.83%        | 20    |   -0.20%       | -1.65   | -0.61 |
| TPCH(2)  | TPCH-Q7  | parquet / none / none | 1.72   | 1.73        |   -0.55%   |   2.40%   |   1.69%        | 20    |   -0.03%       | -0.42   | -0.83 |
| TPCH(2)  | TPCH-Q8  | parquet / none / none | 1.27   | 1.28        |   -0.68%   |   3.10%   |   3.89%        | 20    |   -0.06%       | -0.54   | -0.62 |
| TPCH(2)  | TPCH-Q6  | parquet / none / none | 0.36   | 0.36        |   -0.84%   |   0.79%   |   3.51%        | 20    |   -0.07%       | -0.36   | -1.04 |
| TPCH(2)  | TPCH-Q2  | parquet / none / none | 0.65   | 0.65        |   -1.17%   |   4.76%   |   5.99%        | 20    |   -0.05%       | -0.25   | -0.69 |
| TPCH(2)  | TPCH-Q9  | parquet / none / none | 1.59   | 1.62        |   -2.01%   |   1.45%   |   5.12%        | 20    |   -0.16%       | -1.24   | -1.69 |
| TPCH(2)  | TPCH-Q20 | parquet / none / none | 0.68   | 0.69        |   -1.73%   |   4.35%   |   4.43%        | 20    |   -0.49%       | -1.74   | -1.25 |
| TPCH(2)  | TPCH-Q22 | parquet / none / none | 0.38   | 0.40        |   -2.89%   |   7.42%   |   6.39%        | 20    |   -0.21%       | -0.66   | -1.34 |
| TPCH(2)  | TPCH-Q16 | parquet / none / none | 0.59   | 0.62        |   -4.01%   |   6.33%   |   5.83%        | 20    |   -4.72%       | -1.39   | -2.13 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+

Change-Id: I839d7a3a2f5e1309c33a1f66013ef11628c5dc11
Reviewed-on: http://gerrit.cloudera.org:8080/12797
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-15 22:34:28 +00:00

96 lines
4.8 KiB
Python

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# Tests end-to-end codegen behaviour.
from tests.common.impala_test_suite import ImpalaTestSuite
from tests.common.skip import SkipIf
from tests.common.test_dimensions import create_exec_option_dimension_from_dict
from tests.common.test_result_verifier import get_node_exec_options,\
assert_codegen_enabled
class TestCodegen(ImpalaTestSuite):
@classmethod
def get_workload(self):
return 'functional-query'
@classmethod
def add_test_dimensions(cls):
super(TestCodegen, cls).add_test_dimensions()
cls.ImpalaTestMatrix.add_dimension(create_exec_option_dimension_from_dict({
'exec_single_node_rows_threshold' : [0]}))
# No need to run this on all file formats. Run it on text/none, which has stats
# computed.
cls.ImpalaTestMatrix.add_constraint(
lambda v: v.get_value('table_format').file_format == 'text' and
v.get_value('table_format').compression_codec == 'none')
def test_disable_codegen(self, vector):
"""Test that codegen is enabled/disabled by the planner as expected."""
self.run_test_case('QueryTest/disable-codegen', vector)
def test_select_node_codegen(self, vector):
"""Test that select node is codegened"""
result = self.execute_query('select * from (select * from functional.alltypes '
'limit 1000000) t1 where int_col > 10 limit 10')
exec_options = get_node_exec_options(result.runtime_profile, 1)
# Make sure test fails if there are no exec options in the profile for the node
assert len(exec_options) > 0
assert_codegen_enabled(result.runtime_profile, [1])
def test_datastream_sender_codegen(self, vector):
"""Test the KrpcDataStreamSender's codegen logic"""
self.run_test_case('QueryTest/datastream-sender-codegen', vector)
def test_codegen_failure_for_char_type(self, vector):
"""IMPALA-7288: Regression tests for the codegen failure path when working with a
CHAR column type. Until IMPALA-3207 is completely fixed there are various paths where
we need to bail out of codegen."""
# Previously codegen for this join failed in HashTableCtx::CodegenEquals() because of
# missing ScalarFnCall codegen support, which was added in IMPALA-7331.
result = self.execute_query("select 1 from functional.chars_tiny t1, "
"functional.chars_tiny t2 "
"where t1.cs = cast(t2.cs as string)")
profile_str = str(result.runtime_profile)
assert "Probe Side Codegen Enabled" in profile_str, profile_str
assert "Build Side Codegen Enabled" in profile_str, profile_str
assert ("TextConverter::CodegenWriteSlot(): Char isn't supported for CodegenWriteSlot"
in profile_str), profile_str
# Codegen for this join fails because it is joining two CHAR exprs.
result = self.execute_query("select 1 from functional.chars_tiny t1, "
"functional.chars_tiny t2 "
"where t1.cs = t2.cs")
profile_str = str(result.runtime_profile)
assert ("Probe Side Codegen Disabled: HashTableCtx::CodegenHashRow(): CHAR NYI"
in profile_str), profile_str
assert ("Build Side Codegen Disabled: HashTableCtx::CodegenHashRow(): CHAR NYI"
in profile_str), profile_str
assert ("TextConverter::CodegenWriteSlot(): Char isn't supported for CodegenWriteSlot"
in profile_str), profile_str
# Previously codegen for this join failed in HashTableCtx::CodegenEvalRow() because of
# missing ScalarFnCall codegen support, which was added in IMPALA-7331.
result = self.execute_query("select 1 from functional.chars_tiny t1, "
"functional.chars_tiny t2 where t1.cs = "
"FROM_TIMESTAMP(cast(t2.cs as string), 'yyyyMMdd')")
profile_str = str(result.runtime_profile)
assert "Probe Side Codegen Enabled" in profile_str, profile_str
assert "Build Side Codegen Enabled" in profile_str, profile_str
assert ("TextConverter::CodegenWriteSlot(): Char isn't supported for CodegenWriteSlot"
in profile_str), profile_str