Commit Graph

11 Commits

Author SHA1 Message Date
Riza Suminto
95f353ac4a IMPALA-13507: Allow disabling glog buffering via with_args fixture
We have plenty of custom_cluster tests that assert against the content
of Impala daemon log files while the process is still running, using
assert_log_contains() and its wrappers. That method specifically
mentions disabling glog buffering ('-logbuflevel=-1'), but not all
custom_cluster tests do that. This often results in flaky tests that
are hard to triage and are often neglected if they do not run
frequently in the core exploration.

This patch adds a boolean param 'disable_log_buffering' to
CustomClusterTestSuite.with_args so that a test can declare its
intention to inspect log files in the live minicluster. If it is True,
the minicluster is started with '-logbuflevel=-1' for all daemons. If
it is False, a WARNING is logged on any call to assert_log_contains().
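
The effect of the new param can be sketched as follows. This is an
illustrative stand-in, not Impala's actual helper; the function name
and the '-v=1' placeholder arg are assumptions, but '-logbuflevel=-1'
is the real glog flag that disables log buffering:

```python
def build_daemon_args(disable_log_buffering):
    """Sketch: when a test declares that it will inspect live log
    files, every daemon gets '-logbuflevel=-1' added to its args."""
    args = ['-v=1']  # placeholder for the usual daemon startup args
    if disable_log_buffering:
        # At this level glog flushes every message immediately, so
        # assert_log_contains() sees up-to-date log files.
        args.append('-logbuflevel=-1')
    return args
```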

Several complex custom_cluster tests are left unchanged and print such
WARNING logs, including:
- TestQueryLive
- TestQueryLogTableBeeswax
- TestQueryLogOtherTable
- TestQueryLogTableHS2
- TestQueryLogTableAll
- TestQueryLogTableBufferPool
- TestStatestoreRpcErrors
- TestWorkloadManagementInitWait
- TestWorkloadManagementSQLDetails

This patch also fixes some small flake8 issues in the modified tests.

There is a sign of flakiness in test_query_live.py, where a test query
submitted to the coordinator fails because the sys.impala_query_live
table does not yet exist from the coordinator's perspective. This patch
modifies test_query_live.py to wait for a few seconds until
sys.impala_query_live is queryable.
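
The wait can be sketched as a generic poll-with-timeout loop; the
function name and the probe are illustrative (the actual change lives
in test_query_live.py):

```python
import time


def wait_until_queryable(run_query, timeout_s=30, interval_s=1):
    """Poll a caller-supplied probe (e.g. a 'select ... limit 0'
    against sys.impala_query_live) until it succeeds or the timeout
    expires. Returns True on success, False on timeout."""
    deadline = time.time() + timeout_s
    while True:
        try:
            run_query()
            return True
        except Exception:
            # Table not yet visible to the coordinator; retry.
            if time.time() >= deadline:
                return False
        time.sleep(interval_s)
```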

Testing:
- Pass custom_cluster tests in exhaustive exploration.

Change-Id: I56fb1746b8f3cea9f3db3514a86a526dffb44a61
Reviewed-on: http://gerrit.cloudera.org:8080/22015
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-05 04:49:05 +00:00
Yida Wu
f93bd98621 IMPALA-11805: Use llvm ObjectCache for codegen caching
Currently, we employ llvm::ExecutionEngine for codegen caching,
providing access to compiled functions within the cached engine.
However, the ExecutionEngine uses a large amount of memory, which far
exceeds our memory estimates and is very hard to predict.

This patch addresses this issue by using llvm::ObjectCache for codegen
caching. In our case, each execution engine has only one module; after
the module is compiled, its codegened functions are set on the
execution engine so that Impala can use them. During compilation of the
module, if an ObjectCache is set on the execution engine, the compiled
codegened functions are also written into the cache. This way, if we
keep the cache, then when the same module (fragment) is revisited we
can efficiently reuse the specific ObjectCache, loading pre-compiled
codegened functions and saving time.
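
The contract being relied on mirrors llvm::ObjectCache's two hooks,
notifyObjectCompiled() and getObject(). A language-neutral sketch (the
class and key names are illustrative, and real object code is a
compiled-machine-code buffer, not a string):

```python
class ObjectCacheSketch:
    """Sketch of the llvm::ObjectCache contract: the JIT calls
    notify_object_compiled() after compiling a module and get_object()
    before compiling, so a cache hit skips compilation entirely."""

    def __init__(self):
        self._store = {}

    def notify_object_compiled(self, module_key, object_code):
        # Called by the engine after compilation; persist the result.
        self._store[module_key] = object_code

    def get_object(self, module_key):
        # Called by the engine before compilation; a non-None return
        # means the engine loads this object instead of recompiling.
        return self._store.get(module_key)
```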

The TPC-H performance test indicates no significant regression
compared to the previous use of ExecutionEngine. Post-change, the
actual memory usage of each codegen cache entry is notably reduced.

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(1)  | parquet / none / none | 0.22    | -0.65%     | 0.20       | -0.75%         |
+----------+-----------------------+---------+------------+------------+----------------+
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval  |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| TPCH(1)  | TPCH-Q13 | parquet / none / none | 0.49   | 0.47        |   +2.80%   |   5.32%    |   5.07%        | 10    |   +1.22%       | 1.63    | 1.19  |
| TPCH(1)  | TPCH-Q4  | parquet / none / none | 0.16   | 0.16        |   +3.51%   |   1.32%    | * 10.38% *     | 10    |   +0.06%       | 0.49    | 1.06  |
| TPCH(1)  | TPCH-Q11 | parquet / none / none | 0.12   | 0.12        |   +1.39%   |   2.27%    |   2.24%        | 10    |   +1.50%       | 1.90    | 1.37  |
| TPCH(1)  | TPCH-Q19 | parquet / none / none | 0.21   | 0.21        |   +1.56%   | * 10.02% * | * 11.42% *     | 10    |   +1.18%       | 0.57    | 0.32  |
| TPCH(1)  | TPCH-Q18 | parquet / none / none | 0.27   | 0.27        |   +1.71%   |   6.46%    |   1.29%        | 10    |   -0.19%       | -1.19   | 0.81  |
| TPCH(1)  | TPCH-Q6  | parquet / none / none | 0.11   | 0.11        |   +0.79%   |   2.76%    |   2.15%        | 10    |   +0.10%       | 1.46    | 0.71  |
| TPCH(1)  | TPCH-Q3  | parquet / none / none | 0.26   | 0.26        |   +0.71%   |   6.63%    |   6.18%        | 10    |   +0.04%       | 0.49    | 0.25  |
| TPCH(1)  | TPCH-Q17 | parquet / none / none | 0.17   | 0.17        |   +0.41%   | * 14.66% * | * 13.01% *     | 10    |   +0.05%       | 0.40    | 0.07  |
| TPCH(1)  | TPCH-Q14 | parquet / none / none | 0.16   | 0.16        |   +0.19%   |   1.41%    |   1.39%        | 10    |   +0.25%       | 1.46    | 0.31  |
| TPCH(1)  | TPCH-Q20 | parquet / none / none | 0.17   | 0.17        |   +0.22%   |   1.70%    |   1.77%        | 10    |   -0.05%       | -0.40   | 0.28  |
| TPCH(1)  | TPCH-Q12 | parquet / none / none | 0.16   | 0.16        |   -0.27%   |   0.54%    |   1.46%        | 10    |   +0.14%       | 0.93    | -0.54 |
| TPCH(1)  | TPCH-Q22 | parquet / none / none | 0.11   | 0.11        |   -0.38%   |   0.81%    |   2.06%        | 10    |   +0.03%       | 0.22    | -0.54 |
| TPCH(1)  | TPCH-Q16 | parquet / none / none | 0.17   | 0.17        |   -0.38%   |   0.67%    |   1.58%        | 10    |   -0.01%       | -0.13   | -0.70 |
| TPCH(1)  | TPCH-Q8  | parquet / none / none | 0.27   | 0.27        |   -0.08%   |   1.24%    |   1.15%        | 10    |   -0.33%       | -1.37   | -0.15 |
| TPCH(1)  | TPCH-Q15 | parquet / none / none | 0.16   | 0.16        |   -1.18%   | * 16.61% * | * 10.25% *     | 10    |   +0.33%       | 0.40    | -0.19 |
| TPCH(1)  | TPCH-Q1  | parquet / none / none | 0.22   | 0.22        |   -1.67%   |   1.62%    |   7.45%        | 10    |   +0.43%       | 1.02    | -0.70 |
| TPCH(1)  | TPCH-Q5  | parquet / none / none | 0.22   | 0.22        |   -0.98%   |   0.22%    |   1.55%        | 10    |   -0.26%       | -2.16   | -1.97 |
| TPCH(1)  | TPCH-Q21 | parquet / none / none | 0.48   | 0.49        |   -1.18%   |   3.58%    |   4.40%        | 10    |   -0.25%       | -1.19   | -0.66 |
| TPCH(1)  | TPCH-Q10 | parquet / none / none | 0.26   | 0.26        |   -1.93%   |   7.84%    |   6.24%        | 10    |   -0.14%       | -0.13   | -0.62 |
| TPCH(1)  | TPCH-Q7  | parquet / none / none | 0.18   | 0.19        |   -3.31%   | * 11.47% * | * 12.47% *     | 10    |   -0.25%       | -1.72   | -0.63 |
| TPCH(1)  | TPCH-Q9  | parquet / none / none | 0.34   | 0.35        |   -5.22%   |   6.87%    | * 10.03% *     | 10    |   -2.15%       | -1.28   | -1.38 |
| TPCH(1)  | TPCH-Q2  | parquet / none / none | 0.16   | 0.18        |   -11.00%  | * 16.07% * |   3.84%        | 10    |   -0.90%       | -1.81   | -2.35 |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+

Since we no longer use the ExecutionEngine for caching, the
LlvmExecutionEngineWrapper class is removed. In its place, a new class,
CodeGenObjectCache, implements llvm::ObjectCache.

Testing:
Passed LlvmCodeGenCacheTest and custom_cluster/test_codegen_cache.py.

Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Reviewed-on: http://gerrit.cloudera.org:8080/20733
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-12-19 20:32:10 +00:00
Daniel Becker
db5f3b18e4 IMPALA-12306: (Part 2) Make codegen cache tests with symbol emitter more robust
The codegen cache tests that include having a symbol emitter (previously
TestCodegenCache.{test_codegen_cache_with_asm_module_dir,test_codegen_cache_with_perf_map})
introduced by IMPALA-12260 were added to ensure we don't produce a
use-after-free.

There are two problems with these tests:
  1. Setting the codegen cache size correctly in the tests has proved to
     be difficult because new commits and different build types (debug
     vs. release) have a huge effect on what sizes are appropriate. We
     have had many build failures because of this.

  2. Use-after-free is undefined behaviour and does not guarantee a
     crash but the tests rely on the crash to catch the bug described in
     IMPALA-12260.

This change solves the second problem. The tests added by IMPALA-12260
relied on a crash in the situation described there:
'LlvmCodeGen::symbol_emitter_' is registered as an event listener with
the current 'llvm::ExecutionEngine', then the engine is cached but the
'LlvmCodeGen' object, which owns the symbol emitter, is destroyed at the
end of the query. When the cached execution engine is destroyed later,
it frees any remaining object files and notifies the symbol emitter
about this, but the symbol emitter has already been destroyed so its
pointer is invalid (use-after-free).

However, we can't rely on the crash to detect the use-after-free because
1) the crash is not guaranteed to happen, use-after-free is undefined
   behaviour
2) the crash may happen well after the query has finished returning
   results.

This change solves the problem in the following way: in
'CodegenSymbolEmitter' we introduce a counter that is incremented in
NotifyObjectEmitted() and decremented in NotifyFreeingObject(). At the
time of the destruction of the 'CodegenSymbolEmitter', this counter
should be zero; if it is greater than zero, the LLVM execution engine
to which the 'CodegenSymbolEmitter' is subscribed is still alive and
will try to notify the symbol emitter when the object file is freed
(most likely when the execution engine itself is destroyed), leading to
a use-after-free.
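
The counter invariant can be sketched like this (class and method
names are illustrative analogues of the C++ ones above):

```python
class SymbolEmitterSketch:
    """Sketch of the destructor-time check: NotifyObjectEmitted and
    NotifyFreeingObject pair up, so a nonzero count at destruction
    means some execution engine still holds a reference to this
    emitter and a use-after-free would follow."""

    def __init__(self):
        self.live_objects = 0

    def notify_object_emitted(self):
        self.live_objects += 1

    def notify_freeing_object(self):
        self.live_objects -= 1

    def destroyed_safely(self):
        # True only when every emitted object has been freed, i.e.
        # no engine will call back into this emitter later.
        return self.live_objects == 0
```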

We also add a hidden startup flag,
'--codegen_symbol_emitter_log_successful_destruction_test_only'. When it
is set to true, 'CodegenSymbolEmitter' will log a message when it is
being destroyed correctly (i.e. when the counter is zero and
use-after-free will not happen). We use it in the tests - if we don't
have the expected message in the logs (after some timeout), the test
fails.

Testing:
 - modified the tests
   TestCodegenCache.{test_codegen_cache_with_asm_module_dir,test_codegen_cache_with_perf_map}
   so they reliably detect use-after-free.

Change-Id: I61b9b0de9c896f3de7eb1be7de33d822b1ab70d0
Reviewed-on: http://gerrit.cloudera.org:8080/20318
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-11-15 16:20:37 +00:00
Daniel Becker
3435baba67 IMPALA-12306: (Part 1) Make codegen cache tests with symbol emitter more robust
The codegen cache tests that include having a symbol emitter (previously
TestCodegenCache.{test_codegen_cache_with_asm_module_dir,test_codegen_cache_with_perf_map})
introduced by IMPALA-12260 were added to ensure we don't produce a
use-after-free.

There are two problems with these tests:
  1. Setting the codegen cache size correctly in the tests has proved to
     be difficult because new commits and different build types (debug
     vs. release) have a huge effect on what sizes are appropriate. We
     have had many build failures because of this.

  2. Use-after-free is undefined behaviour and does not guarantee a
     crash but the tests rely on the crash to catch the bug described in
     IMPALA-12260.

This commit solves the first problem. We use the
'--codegen_cache_entry_bytes_charge_overhead' startup flag to
artificially assign a higher size (memory charge) to the cache entries,
so that the real size, and therefore also changes in the real size, are
insignificant in comparison.

Change-Id: If801ae6d3d9f5286ed886b1d06c37a32bc1d2c54
Reviewed-on: http://gerrit.cloudera.org:8080/20304
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-08-09 12:28:52 +00:00
Daniel Becker
645abfc353 IMPALA-12269: Codegen cache false negative because of function names hash
Codegen cache entries (execution engines holding an LLVM code module)
are stored by keys derived from the unoptimised llvm modules: the key is
either the whole unoptimised module (normal mode) or its hash (optimal
mode). Because hash collisions are possible (in optimal mode), as an
extra precaution we also compare the hashes of the function names in the
current and the cached module. However, when assembling the function
name list we do not filter out duplicate function names, which may
result in cases where the unoptimised llvm modules are identical but the
function name hashes do not match.

Example:
First query:
  select int_col, tinyint_col
  from alltypessmall
  order by int_col desc
  limit 20;

Second query:
  select tinyint_col
  from alltypessmall
  order by int_col desc
  limit 20;

In the first query, there are two 'SlotRef' objects referencing
'tinyint_col' which want to codegen a 'GetSlotRef()' function. The
second invocation of 'SlotRef::GetCodegendComputeFnImpl()' checks the
already codegen'd functions, finds the function created by the first
invocation and returns that. The two 'SlotRef' objects will use the
same 'llvm::Function' and there will be only one copy of it in the
module, but both 'SlotRef's will call 'LlvmCodeGen::AddFunctionToJit()'
with this function in order for their respective function pointers to
be set after JIT-compilation.

'LlvmCodeGen::GetAllFunctionNames()' will return the names of all
functions with which 'LlvmCodeGen::AddFunctionToJit()' has been called,
including duplicates.

The second query generates the same unoptimised module as the first
query (for the corresponding fragment), but does not have a duplicated
'GetSlotRef()' function in its function name list, so the cached module
is rejected.

Note that this also results in the cached module being evicted when the
new module from the second query is inserted into the cache because the
new module will have the same key as the cached one (the modules are
identical).

This change fixes this problem by using a de-duplicated and sorted
function name list.
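
The fix can be sketched as follows: hashing a de-duplicated, sorted
name list makes the key insensitive to registration order and to
repeated AddFunctionToJit() calls (the function name and the use of
SHA-256 here are illustrative, not Impala's actual hash):

```python
import hashlib


def function_names_hash(names):
    """Hash a canonicalized function name list: duplicates removed,
    order fixed, so identical modules always produce identical keys."""
    canonical = '\0'.join(sorted(set(names)))
    return hashlib.sha256(canonical.encode()).hexdigest()
```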

Testing:
  - Added a test in test_codegen_cache.py that asserts that there is a
    cache hit and no eviction in the above example.

Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139
Reviewed-on: http://gerrit.cloudera.org:8080/20168
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-08-04 09:47:50 +00:00
Daniel Becker
66b701f806 IMPALA-12292: TestCodegenCache.{test_codegen_cache_with_asm_module_dir,test_codegen_cache_with_perf_map} fail in builds
The above codegen cache tests were introduced by IMPALA-12260. They run
two queries and the first query produces two codegen cache entries. The
tests aim to bring about the following scenario:

1. both codegen cache entries from the first query fit in the cache
AND
2. both entries from the first query are evicted during the second
   query.

The parameters that can be tuned are the following:
1. the size of the codegen cache entries of the first query
2. the size of the codegen cache entries of the second query
3. the size of the codegen cache.

If the parameters are chosen badly, or the sizes of the codegen cache
entries change because of other Impala changes (e.g. codegen
optimisations), the conditions may not be satisfied and the tests may
fail, as they did here.

This change makes the tests more robust by
 - increasing the cache footprint of the second query (from 487.40 KB to
   663.68 KB)
 - choosing the size of the codegen cache so as to leave as much margin
   on each side as possible. At present
     - the minimal codegen cache size so that both entries from the
       first query fit the cache is around 2.4 MB
     - the maximal cache size so that both entries from the first query
       are evicted during the second query is around 4.1 MB
   Therefore we choose a cache size of 3.25 MB, which lies in the middle.

Experience has shown that this setup is fragile and breaks easily when
new commits are added to Impala. Therefore this change relaxes some of
the assertions in the tests as a temporary measure to prevent build
failures. For this and other reasons IMPALA-12306 was opened to make
these tests more robust.

Change-Id: I15320b8c0d06f4d93927b19731c11bd4e15b3690
Reviewed-on: http://gerrit.cloudera.org:8080/20224
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-28 08:52:57 +00:00
Daniel Becker
c0feea2c9f IMPALA-12260: Crash if '--asm_module_dir' is set
If Impala is started with the --asm_module_dir flag set and the
codegen cache is used, Impala crashes.

The problem is with the lifetime of 'LlvmCodeGen::symbol_emitter_'. It
is registered as an event listener with the current
'llvm::ExecutionEngine'. Then the engine is cached but the
'LlvmCodeGen' object, which owns the symbol emitter, is destroyed at the
end of the query. When the cached execution engine is destroyed later,
it tries to notify the symbol emitter, but it has already been destroyed
so its pointer is invalid.

This change solves the problem by wrapping the execution engine and the
symbol emitter together in a wrapper class, LlvmExecutionEngineWrapper,
that is responsible for managing their lifetimes. The LlvmCodeGen and
the CodeGenCache classes now hold shared pointers to this wrapper class.
If we add other objects in the future whose lifetimes are tied to the
execution engine (but are not owned by it), they should be put into the
wrapper class.

Testing:
 - added regression tests in tests/custom_cluster/test_codegen_cache.py
   that fail without this change.

Change-Id: I23f871abb962ad317f9c0075ca303c09dd56bcd9
Reviewed-on: http://gerrit.cloudera.org:8080/20155
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-13 16:50:44 +00:00
Joe McDonnell
eb66d00f9f IMPALA-11974: Fix lazy list operators for Python 3 compatibility
Python 3 changes list operators such as range, map, and filter
to be lazy. Some code that expects the list operators to happen
immediately will fail. e.g.

Python 2:
range(0,5) == [0,1,2,3,4]
True

Python 3:
range(0,5) == [0,1,2,3,4]
False

The fix is to wrap locations with list(). i.e.

Python 3:
list(range(0,5)) == [0,1,2,3,4]
True

Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).
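
The before/after behavior can be checked directly on Python 3; each
line below is self-checking:

```python
# A lazy range object never compares equal to a list...
assert range(0, 5) != [0, 1, 2, 3, 4]
# ...so the fix is to wrap it in list().
assert list(range(0, 5)) == [0, 1, 2, 3, 4]

# map and filter are lazy in the same way and need the same wrapping.
assert list(map(abs, [-1, 2])) == [1, 2]
assert list(filter(None, [0, 1, 2])) == [1, 2]
```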

Most of the changes were done via these futurize fixes:
 - libfuturize.fixes.fix_xrange_with_import
 - lib2to3.fixes.fix_map
 - lib2to3.fixes.fix_filter

This eliminates the pylint warnings:
 - xrange-builtin
 - range-builtin-not-iterating
 - map-builtin-not-iterating
 - zip-builtin-not-iterating
 - filter-builtin-not-iterating
 - reduce-builtin
 - deprecated-itertools-function

Testing:
 - Ran core job

Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
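
The division change can be demonstrated in a few self-checking lines
(the __future__ import is a no-op on Python 3 but switches Python 2 to
the same behavior):

```python
from __future__ import division

# True division always yields a float, even for two ints
# (on Python 2 without the import, 7 / 2 would round down to 3)...
assert 7 / 2 == 3.5
# ...so code that needs an integer result (indices, record counts)
# must use the explicit floor-division operator.
assert 7 // 2 == 3
```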

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Yida Wu
e15610633e IMPALA-11965: Fix TestCodegenCache failure when codegen cache disabled by default
The patch fixes the TestCodegenCache failure that occurs when the
codegen cache is changed to be disabled by default, because the test
assumes the codegen cache is enabled with the default settings.

The solution is to specify a default value for codegen_cache_capacity
in the test's start options, which manually ensures the codegen cache
is on during the test.

Tests:
Passed TestCodegenCache in the exhaustive run with codegen cache
disabled by default.

Change-Id: I749a6ba68553834bdea908741aa7449ed32cd569
Reviewed-on: http://gerrit.cloudera.org:8080/19574
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-03-05 16:59:37 +00:00
Yida Wu
4bdb99938a IMPALA-11470: Add Cache For Codegen Functions
The patch adds supports of the cache for CodeGen functions
to improve the performance of sub-second queries.

The main idea is to store the codegen functions to a cache,
and reuse them when it is appropriate to avoid repeated llvm
optimization time which could take over hundreds of milliseconds.

In this patch, we implement a cache to store codegen functions. The
cache is a singleton instance per daemon and contains multiple cache
entries. Each cache entry is at the fragment level, i.e. it stores all
the codegen functions of one fragment; if the exact same fragment comes
again, it should find all the codegen functions it needs in that cache
entry, thereby saving time.

The module bitcode is used as the cache key; it is generated before
module optimization and final compilation. If codegen_cache_mode is
NORMAL, which is the default, we store the full bitcode string as the
key. If codegen_cache_mode is set to OPTIMAL, we store a key containing
only the hash code and the total length of the full key, to reduce
memory consumption.
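
The two key modes can be sketched as follows (the function name and
the choice of SHA-256 are illustrative, not Impala's actual hash):

```python
import hashlib


def make_cache_key(bitcode, mode):
    """NORMAL keeps the full module bitcode as the key; OPTIMAL keeps
    only (hash, length) of the full key, trading a small collision
    risk for much lower memory consumption per entry."""
    if mode == 'NORMAL':
        return bitcode
    # OPTIMAL: hash code plus the total length of the full key.
    return (hashlib.sha256(bitcode).digest(), len(bitcode))
```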

Also, KrpcDataStreamSenderConfig::CodegenHashRow() is changed to
pass the hash seed as an argument because it can't hit the cache
for the fragment if using a dynamic hash seed within the codegen
function.

The codegen cache is disabled automatically for a fragment that uses a
native UDF, because caching can lead to a crash in this case. The
reason is that the UDF is loaded into the llvm execution engine's
global mapping rather than into the llvm module, while the current
cache key is the llvm module bitcode, which cannot reflect a change of
the UDF address if the UDF is reloaded at runtime (for example, on
database recreation). An old UDF address could then be served from the
cache, leading to a crash. Caching is disabled here until there is a
better solution; IMPALA-11771 is filed to follow up.

The patch also introduces following new flags for start and query
options for feature configuration and operation purpose.
Start option for configuration:
  - codegen_cache_capacity: The capacity of the cache, if set to 0,
    codegen cache is disabled.

Query option for operations:
  - disable_codegen_cache: Codegen cache will be disabled when it
    is set to true.

  - codegen_cache_mode: Defined by a new enum type TCodeGenCacheMode.
    There are four values: NORMAL and OPTIMAL, plus NORMAL_DEBUG and
    OPTIMAL_DEBUG, the debug modes of the first two.
    With NORMAL, the full key is stored in the cache. This costs more
    memory per entry because the key is the bitcode of the llvm
    module, which can be large.
    With OPTIMAL, the cache stores only the hash code and length of
    the key. This greatly reduces memory consumption, but hash
    collisions become possible.
    The debug modes behave the same as the non-debug modes but allow
    more logging and statistics, which can be slower.
    Only valid when disable_codegen_cache is set to false.

New impalad metrics:
  - impala.codegen-cache.misses
  - impala.codegen-cache.entries-in-use
  - impala.codegen-cache.entries-in-use-bytes
  - impala.codegen-cache.entries-evicted
  - impala.codegen-cache.hits
  - impala.codegen-cache.entry-sizes

New profile Metrics:
  - CodegenCacheLookupTime
  - CodegenCacheSaveTime
  - ModuleBitcodeGenTime
  - NumCachedFunctions

TPCH-1 performance evaluation (8 iteration) on AWS m5a.4xlarge,
the result removes the first iteration to show the benefit of the
cache:
Query     Cached(s) NoCache(s) Delta(Avg) NoCodegen(s)  Delta(Avg)
TPCH-Q1    0.39      1.02       -61.76%     5.59         -93.02%
TPCH-Q2    0.56      1.21       -53.72%     0.47         19.15%
TPCH-Q3    0.37      0.77       -51.95%     0.43         -13.95%
TPCH-Q4    0.36      0.51       -29.41%     0.33         9.09%
TPCH-Q5    0.39      1.1        -64.55%     0.39         0%
TPCH-Q6    0.24      0.27       -11.11%     0.77         -68.83%
TPCH-Q7    0.39      1.2        -67.5%      0.39         0%
TPCH-Q8    0.58      1.46       -60.27%     0.45         28.89%
TPCH-Q9    0.8       1.38       -42.03%     1            -20%
TPCH-Q10   0.6       1.03       -41.75%     0.85         -29.41%
TPCH-Q11   0.3       0.93       -67.74%     0.2          50%
TPCH-Q12   0.28      0.48       -41.67%     0.38         -26.32%
TPCH-Q13   1.11      1.22       -9.02%      1.16         -4.31%
TPCH-Q14   0.55      0.78       -29.49%     0.45         22.22%
TPCH-Q15   0.33      0.73       -54.79%     0.44         -25%
TPCH-Q16   0.32      0.78       -58.97%     0.41         -21.95%
TPCH-Q17   0.56      0.84       -33.33%     0.89         -37.08%
TPCH-Q18   0.54      0.92       -41.3%      0.89         -39.33%
TPCH-Q19   0.35      2.34       -85.04%     0.35         0%
TPCH-Q20   0.34      0.98       -65.31%     0.31         9.68%
TPCH-Q21   0.83      1.14       -27.19%     0.86         -3.49%
TPCH-Q22   0.26      0.52       -50%        0.25         4%

The results show pretty good performance compared to codegen without
the cache (the default setting). However, compared to codegen
disabled, the cache is, as expected, not always faster for short
queries: it still needs some time to prepare the codegen functions and
to generate the module bitcode used as the key. If that preparation
takes longer than the benefit from the codegened functions, especially
for extremely short queries, the result can be slower than not using
codegen at all. There may be room to improve this in the future.

We also tested the total cache entry size for the TPC-H queries. The
data below shows the total codegen cache used by each query. The
optimal mode is very helpful in reducing the cache size; the reason is
the much smaller key mentioned above, which is the only difference
between the two modes.

Query     Normal(KB)  Optimal(KB)
TPCH-Q1     604.1       50.9
TPCH-Q2     973.4       135.5
TPCH-Q3     561.1       36.5
TPCH-Q4     423.3       41.1
TPCH-Q5     866.9       93.3
TPCH-Q6     295.9       4.9
TPCH-Q7     1105.4      124.5
TPCH-Q8     1382.6      211
TPCH-Q9     1041.4      119.5
TPCH-Q10    738.4       65.4
TPCH-Q11    1201.6      136.3
TPCH-Q12    452.8       46.7
TPCH-Q13    541.3       48.1
TPCH-Q14    696.8       102.8
TPCH-Q15    1148.1      95.2
TPCH-Q16    740.6       77.4
TPCH-Q17    990.1       133.4
TPCH-Q18    376         70.8
TPCH-Q19    1280.1      179.5
TPCH-Q20    1260.9      180.7
TPCH-Q21    722.5       66.8
TPCH-Q22    713.1       49.8

Tests:
Ran exhaustive tests.
Added E2e testcase TestCodegenCache.
Added unit testcase LlvmCodeGenCacheTest.

Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
Reviewed-on: http://gerrit.cloudera.org:8080/19181
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-12-07 21:57:46 +00:00