This change fixes expr-test.cc to work with codegen as it's
originally intended. Fixing it uncovers a couple of bugs fixed
in this patch:
IMPALA-4705: When an IR function is materialized, its
function body is parsed to find all its callee functions
to be materialized too. However, the old code doesn't
detect callee fnctions referenced indirectly (e.g. a
callee function passed as argument to another function).
This change fixes the problem above inspecting the use
lists of llvm::Function objects. When parsing the bitcode
module into memory, LLVM already establishes a use list
for each llvm::Value object which llvm::Function is a
subclass of. A use list contains all the locations in
the module in which the Value is referenced. For a
llvm::Function object, that would be its call sites and
constant expressions referencing the functions. By using
the use lists of llvm::Function in the module, a global
map is established at Impala initialization time to map
functions to their corresponding callee functions. This
map is then used when materializing a function to ensure
all its callee functions are also materialized recursively.
IMPALA-4779: conditional function isfalse(), istrue(),
isnotfalse(), isnotrue() aren't cross-compiled so they
will lead to unexpected query failure when codegen is enabled.
This change will cross-compile these functions.
IMPALA-4780: next_day() always returns NULL when codegen
is enabled. The bound checks for next_day() use some class
static variables initialized in the global constructors
(@llvm.global_ctors). However, we never execute the global
constructors before calling the JIT compiled functions.
This causes these variables to remain as zero, causing all
executions of next_day() to fail the bound checks. The reason
why these class static variables aren't compiled as global
constants in LLVM IR is that TimestampFunctions::MIN_YEAR is
not a compile time constant. This change fixes the problem
above by setting TimestampFunctions::MIN_YEAR to a known constant
value. A DCHECK is added to verify that it matches the value
defined in the boost library.
Change-Id: I40fdb035a565ae2f9c9fbf4db48a548653ef7608
Reviewed-on: http://gerrit.cloudera.org:8080/5732
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
This patch ensures that setting the query option
enable_expr_rewrites=false will disable both constant folding in the
frontend (which it did already) and constant caching in the backend
(which is enabled in this patch). This gives a way for users to revert
to the old behaviour of non-deterministic UDFs before these
optimisations were added in Impala 2.8.
Before this patch, the backend would cache values based on IsConstant().
This meant that there was no way to override caching of values of
non-deterministic UDFs, e.g. with enable_expr_rewrites.
After this patch, we only cache literal values in the backend. This
offers the same performance as before in the common case where the
frontend will constant fold the expressions anyway.
Also rename some functions to more cleanly separate the backend concepts
of "constant" expressions and expressions that can be evaluated without
a TupleRow. In a future change (IMPALA-4617) we should remove the
IsConstant() analysis logic from the backend entirely and pass the
information from the frontend. We should also fix isConstant() in the
frontend so that it only returns true when it is safe to constant-fold
the expression (IMPALA-4606). Once that is done, we could revert back
to using IsConstant() instead of IsLiteral().
Testing:
Added targeted test to test constant folding of UDFs: we expect
different results depending on whether constant folding is enabled.
Also run TestUdfs with expr rewrites enabled and disabled, since this
can exercise different code paths. Refactored test_udfs somewhat to
avoid running uninteresting combinations of query options for
targeted tests and removed some 'drop * if not exists' statements
that aren't necessary when using unique_database.
This change revealed flakiness in test_mem_limit, which seems
to have only worked by coincidence. Updated TrackAllocation() to
actually set the query status when a memory limit is exceeded.
Looped this test for a while to make sure it isn't flaky any
more.
Also fix other test bugs where the vector argument is modified
in-place, which can leak out to other tests.
Change-Id: I0c76e3c8a8d92749256c312080ecd7aac5d99ce7
Reviewed-on: http://gerrit.cloudera.org:8080/5391
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
For LLVM IR UDF, Impalad will link an external LLVM module
in which the IR UDF is defined with the main module. If it
happens that a symbol is defined in both modules, LLVM may
choose to discard the one defined in the external module.
The discarded function and its callee will not be present
in the linked module.
In IMPALA-4595, udf-sample.cc was compiled without any
optimization. Duplicated definition such as StringVal::null()
may have different inlining level between the external module
and the main module. When the duplicated definition in
the external module is discarded, some of its callee
functions (which are not inlined) may not be defined in the
main module so they can no longer be located in the linked
module. This trips up some code in the LlvmCodegen::LinkModule().
In particular, when parsing for functions in external module
which are materialized during linking, certain functions may
not be present due to the reason above. Impalad will hit
a DCHECK in debug build or crash due to null pointer access
in release build.
This change fixes the problem above by taking into account
that certain functions may not be defined anymore after linking.
This change also fixes two incorrect status propagation in
fe-support.cc.
Change-Id: Iaa056a0c888bfcc95b412e1bc1063bb607b58ab7
Reviewed-on: http://gerrit.cloudera.org:8080/5384
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
Adds a new ExprRewriteRule for replacing constant expressions
with their literal equivalent via BE evaluation. Applies the
new rule together with the existing ones on the parse tree,
after analysis.
Limitations
- Constant folding is applied on the unresolved expressions.
As a result, it only works for expressions that are constant
within a single query block, as opposed to expressions that
may become constant after fully substituting inline-view exprs.
- Exprs are not normalized, so some opportunities for constant
folding are missed for certain expr-tree shapes.
This patch includes the following interesting changes:
- Introduces a timestamp literal that can only be produced
by constant folding (not expressible directly via SQL).
- To make sure that rewrites have no user-visible effect,
the original result types and column labels of the top-level
statement are restored after the rewrites are performed.
- Does not fold exprs if their evaluation resulted in a
warning or error, or if the resulting value is not
representable by corresponding FE LiteralExpr.
- Fixes an existing issue with converting strings between
the FE/BE. String produced in the BE that have characters
with a value > 127 are not correctly deserialized into a
Java String via thrift. We detect this case during constant
folding and abandon folding of such exprs.
- Fixes several issues with detecting/reporting errors in
NativeEvalConstExprs().
- Cleans up ExprContext::GetValue() into
ExprContext::GetConstantValue() which clarifies its only use
of evaluating exprs from the FE.
Testing:
- Modifies expr-test.cc to run all tests through the constant
folding path.
- Adds basic planner and rewrite rule tests.
- Exhaustive test run passed
Change-Id: If672b703db1ba0bfc26e5b9130161798b40a69e9
Reviewed-on: http://gerrit.cloudera.org:8080/5109
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This change enables codegen for all builtin aggregate functions,
e.g. timestamp functions and group_concat.
There are several parts to the change:
* Adding support for generic UDAs. Previous the codegen code did not
handle multiple input arguments or NULL return values.
* Defaulting to using the UDA interface when there is not a special
codegen path (we have implementations of all builtin aggregate
functions for the interpreted path).
* Remove all the logic to disable codegen for the special cases that now
are supported.
Also fix the generation of code to get/set NULL bits since I needed
to add functionality there anyway.
Testing:
Add tests that check that codegen was enabled for builtin aggregate
functions. Also fix some gaps in the preexisting tests.
Also add tests for UDAs that check input/output nulls are handled
correctly, in anticipation of enabling codegen for arbitrary UDAs.
The tests are run with both codegen enabled and disabled. To avoid
flaky tests, we switch the UDF tests to use "unique_database".
Perf:
Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1,
since my original approach regressed it ~5%. In the end the problem was
to do with the ordering of loads/stores to the slot and null bit in the
generated code: the previous version of the code exploited some
properties of the particular aggregate function. I ended up replicating
this behaviour to avoid regressing perf.
Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Reviewed-on: http://gerrit.cloudera.org:8080/4655
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch is mostly mechanical move of codegen related logic
from each exec node's Prepare() to its Codegen() function.
After this change, code generation will no longer happen in
Prepare(). Instead, it will happen after Prepare() completes in
PlanFragmentExecutor. This is an intermediate step towards the
final goal of sharing compiled code among fragment instances in
multi-threading.
As part of the clean up, this change also removes the logic for
lazy codegen object creation. In other words, if codegen is enabled,
the codegen object will always be created. This simplifies some
of the logic in ScalarFnCall::Prepare() and various Codegen()
functions by reducing error checking needed. This change also
removes the logic added for tackling IMPALA-1755 as it's not
needed anymore after the clean up.
The clean up also rectifies a not so well documented situation.
Previously, even if a user explicitly sets DISABLE_CODEGEN to true,
we may still codegen a UDF if it was written in LLVM IR or if it
has more than 8 arguments. This patch enforces the query option
by failing the query in both cases. To run the query, the user
must enable codegen. This change also extends the number of
arguments supported in the interpretation path of ScalarFn to 20.
Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Reviewed-on: http://gerrit.cloudera.org:8080/4651
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
Previously, each fragment using dynamic code generation will
parse the bitcode module and populate the LLVM data structures
for all the functions and their bodies in the bitcode module.
This is wasteful as we may only use a few functions out of all
the functions parsed. We rely on dead code elimination to
delete most of the unused functions so we won't waste time
compiling them.
This change implements lazy materialization of the functions'
bodies. On the initial parse of the bitcode module, we just
create the Function objects for each function in the module.
The functions' bodies will be materialized on demand from the
bitcode module when they are actually referenced in the query.
This ensures that the prepare time during codegen is proportional
to the number of IR functions referenced by the query instead
of being proportional to the total number of IR functions in
the module.
This change also stops cross-compiling BufferedTupleStream::GetTupleRow()
as there isn't much benefit for doing it. In addition, move the ctors
and dtors of LikePredicate to the header file to avoid an unnecessary
alias in the IR module.
For TPCH-Q2, a fragment which only codegen 9 functions used to spend
146ms in codegen. It now goes down to 35ms, a 76% reduction.
CodeGen:(Total: 146.041ms, non-child: 146.041ms, % non-child: 100.00%)
- CodegenTime: 0.000ns
- CompileTime: 2.003ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 2.12 MB (2225304)
- NumFunctions: 9 (9)
- NumInstructions: 129 (129)
- OptimizationTime: 29.019ms
- PrepareTime: 114.651ms
CodeGen:(Total: 35.288ms, non-child: 35.288ms, % non-child: 100.00%)
- CodegenTime: 0.000ns
- CompileTime: 1.880ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 2.12 MB (2221276)
- NumFunctions: 9 (9)
- NumInstructions: 129 (129)
- OptimizationTime: 5.101ms
- PrepareTime: 28.044ms
Change-Id: I6ed7862fc5e86005ecea83fa2ceb489e737d66b2
Reviewed-on: http://gerrit.cloudera.org:8080/3220
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
This patch changes the resolution of overloaded functions so that we
prefer functions where there is no loss of precision in argument types.
Previously, the logic would happily convert DECIMAL to FLOAT even if
there was a more suitable overload available. E.g. greatest(TINYINT,
DECIMAL) was resolved to greatest(FLOAT...) instead of greatest(DECIMAL).
This only changes behaviour when no overload exactly matches the argument
types, but the arguments can be converted with no loss of precision,
e.g. TINYINT to DECIMAL.
This patch introduces a conceptual distinction between strict and
non-strict compatibility. All contexts aside from function matching
use non-strict to support the current behavior of implicitly casting
decimals to floats/doubles.
This patch also makes resolution of overloaded functions consistent
regardless of what order functions were added to the Db - overloads are
checked in a canonical order.
Switching to this canonical order revealed further problems with
overload resolution where the correct overload was selected only because
of the order in which it was added to the database. For example, the
logic equally preferred resolving fn(STRING, TINYINT) to
fn(TIMESTAMP, INT) or fn(STRING, INT). This required changes to the
compatibility matrix.
Various cleanup and simplification of the type compatibility logic is
also included.
Change-Id: I50e657c78cdcb925b616b5b088b801510020e255
Reviewed-on: http://gerrit.cloudera.org:8080/845
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Our .test file parser used to not abort tests when there
is a malformed test/section. This patch changes that behavior
to report an error and treat the test as failed.
Quite a few tests were not well-formed, and were not executed
as a result. This patch fixes those tests.
Arguably, the test file parser should be more flexible in which places
to accept comments, but this patch does not address that problem.
Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
UDF invocations in udf.test should not specify a database. This is how
we switch between testing IR UDFs in the ir_function_test database and
native UDFs in the native_function_test database.
Change-Id: I09ede18f2b91440ef7a2a76b0daf41a007af2671
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3130
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 4d6160c0b88285aea754f6353cdd02b5e4b15633)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3295
With this change, leaky UDFs built with the SDK will still fail when
using the test harness, but leaky UDFs running in Impala will only
trigger a warning. This change also updates the test infrastructure to
always check for non-fatal errors/warnings.
Change-Id: I5615349b9d691e4eddea3e03e152ef12e73835e7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2844
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 60ce5190d96add6104aba642d2354d87a26000fa)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2938
Float/Doubles are lossy so using those as the default literal type
is problematic.
Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
This commit is the first step in improving the performance of partition
pruning. Currently, Impala can prune approximately 10K partitions per
sec, thereby introducing significant overhead for huge table with a
large number of partitions. With this commit we reduce that overhead by
3X by batching the partition pruning calls to the backend.
Change-Id: I3303bfc7fb6fe014790f58a5263adeea94d0fe7d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2608
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2687
This patch introduces the ability to specify a prepare and close
function for a UDF, as well as FunctionContext methods for maintaining
state across UDF invocations within a query. Many of the changes are
related to adding an Expr::Open() function which calls the UDF's
prepare function, if specified (it has to be called in Open() since
the LLVM module must be compiled first).
Change-Id: I581d90d03dff71f7ff5d4a6bef839ba6bc46b443
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1693
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8e2ed7fb9051d98f89327715fdebd6f5ed22d6ee)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1757
We weren't initializing the udf mem pool causing UDFs to return strings to crash if used as part
of a constant expression.
Change-Id: Ic3a0e556aec8ce03a9e59f3ccf6980c682046b50
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1447
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
This patch refactors HDFSScanNode to copy and prepare all conjunct
exprs in Prepare(), rather than in the scanner threads. This is
necessary so the UDF exprs get codegen'd. Prepare() also only codegens
the functions for the necessary file formats now, rather than for all
file formats regardless of what's actually be scanned.
Change-Id: Ic3220cbd0cba9a3baa138b1f50ecdc6889ed0cd1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/710
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
Exprs used for partition pruning are prepared/evaluated with a
separate RuntimeState. If these exprs use UDFs, the runtime state
needs access to the process's ExecEnv so we can use the LibCache and
the IR produced by the UDF exprs needs to be optimized and jit'd.
Change-Id: If7c1d6ebc0015ef3c21a0421c1a36cad4be66625
Reviewed-on: http://gerrit.ent.cloudera.com:8080/695
Tested-by: jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
AnalyzeDDLTest was failing because the fesupport binary couldn't
resolve a function used in libTestUdfs.so (the function was defined in
udf.cc, rather than udf.h). I couldn't figure out how to cleanly build
udf.cc into the libTestUdfs.so, so instead I removed the use of the
function in test-udfs.cc.
Change-Id: I81243547584a5b49a5f9265d0d17e035e18d6110
Reviewed-on: http://gerrit.ent.cloudera.com:8080/694
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
This patch also adds a number of improvements to NativeUdfExpr. Highlights include:
* Correctly handling the lowering of AnyVal struct types (required for ABI compatibility)
* A rudimentary library cache for reusing handles produced by dlopen
* More complicated test cases
Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195
Reviewed-on: http://gerrit.ent.cloudera.com:8080/540
Tested-by: jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
I looked around some and I think having create/drop/show [aggregate] function
seems reasonable and extends nicely for UDTs.
The create aggregate function can accept a lot of arguments. The non-essential one, I
went with resolving them by name rather than position (i.e. argName="value"). I think
this is better for the user than specifying it by position.
The grammar is:
CREATE AGGREGATE <name>(<arg_types>) RETURNS <type> [INTERMEDIATE <type>]
LOCATION '/path' UpdateFn='Fn' [comment='comment']
[SerializeFn='symbol'] [MergeFn='symbol'] [InitFn='symbol'] [FinalizeFn='symbol']
The optional args at the end can be in any order. If the other symbols are not
specified, we derive them from the UpdateFn symbol that's required. The analyzer
would try to figure it out and fail if we can't find the derived symbol in the binary.
The simplest example would be:
CREATE AGGREGATE FUNCTION count(float) RETURNS BIGINT LOCATION '/path'
UpdateFn='CountUpdateFn';
In which case we assume the intermediate type is the return type and the other functions
are called 'CountInitFn', 'CountSerializeFn', 'CountMergeFn' 'CountFinalizeFn'.
Change-Id: Iefc5741293050f5b295df28e9d1a7d039ead8675
Reviewed-on: http://gerrit.ent.cloudera.com:8080/513
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>