impala

mirror of https://github.com/apache/impala.git synced 2026-01-06 06:01:03 -05:00

Author	SHA1	Message	Date
Michael Ho	1d933919ee	IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with codegen. This change fixes expr-test.cc to work with codegen as originally intended. Fixing it uncovers a couple of bugs, which are also fixed in this patch: IMPALA-4705: When an IR function is materialized, its function body is parsed to find all of its callee functions so they can be materialized too. However, the old code doesn't detect callee functions referenced indirectly (e.g. a callee function passed as an argument to another function). This change fixes the problem by inspecting the use lists of llvm::Function objects. When parsing the bitcode module into memory, LLVM already establishes a use list for each llvm::Value object (llvm::Function is a subclass of llvm::Value). A use list contains all the locations in the module where the Value is referenced. For an llvm::Function object, these are its call sites and any constant expressions referencing the function. Using the use lists of the llvm::Function objects in the module, a global map from each function to its callee functions is built at Impala initialization time. This map is then used when materializing a function to ensure that all of its callee functions are also materialized, recursively. IMPALA-4779: The conditional functions isfalse(), istrue(), isnotfalse() and isnottrue() aren't cross-compiled, so they lead to unexpected query failures when codegen is enabled. This change cross-compiles these functions. IMPALA-4780: next_day() always returns NULL when codegen is enabled. The bound checks for next_day() use some class static variables initialized in the global constructors (@llvm.global_ctors). However, we never execute the global constructors before calling the JIT-compiled functions. These variables therefore remain zero, causing every execution of next_day() to fail the bound checks.
The reason why these class static variables aren't compiled as global constants in LLVM IR is that TimestampFunctions::MIN_YEAR is not a compile time constant. This change fixes the problem above by setting TimestampFunctions::MIN_YEAR to a known constant value. A DCHECK is added to verify that it matches the value defined in the boost library. Change-Id: I40fdb035a565ae2f9c9fbf4db48a548653ef7608 Reviewed-on: http://gerrit.cloudera.org:8080/5732 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-03 23:35:25 +00:00
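The use-list approach described for IMPALA-4705 can be sketched in plain C++ without LLVM. This is a minimal sketch: `FnInfo`, `BuildCalleeMap` and `Materialize` are hypothetical names, and each use is recorded directly as the name of its enclosing function. In real LLVM code, a use inside a constant expression would first have to be followed up to the instruction that contains it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an llvm::Function and its use list: for every use of
// the function (a call site, or a constant expression such as the function being
// passed as an argument), `users` records the name of the function containing it.
struct FnInfo {
  std::string name;
  std::vector<std::string> users;
};

// caller name -> names of every function it references
using CalleeMap = std::map<std::string, std::set<std::string>>;

// Built once at startup by inverting the use lists.
CalleeMap BuildCalleeMap(const std::vector<FnInfo>& module) {
  CalleeMap callees;
  for (const FnInfo& fn : module) {
    for (const std::string& user : fn.users) callees[user].insert(fn.name);
  }
  return callees;
}

// Materializes `fn` and, recursively, every function it references. The
// `materialized` set stops the recursion on cycles and repeated callees.
void Materialize(const std::string& fn, const CalleeMap& callees,
                 std::set<std::string>* materialized) {
  if (!materialized->insert(fn).second) return;
  auto it = callees.find(fn);
  if (it == callees.end()) return;
  for (const std::string& callee : it->second) {
    Materialize(callee, callees, materialized);
  }
}
```

In the test below, `Compare` is materialized only because its use list names `Sort` as a user. This is the "passed as an argument" case that a call-site scan alone would miss.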
Tim Armstrong	88448d1d4a	IMPALA-4586: don't constant fold in backend. This patch ensures that setting the query option enable_expr_rewrites=false disables both constant folding in the frontend (which it did already) and constant caching in the backend (which this patch adds). This gives users a way to revert to the old behaviour of non-deterministic UDFs from before these optimisations were added in Impala 2.8. Before this patch, the backend would cache values based on IsConstant(). This meant that there was no way to override caching of values of non-deterministic UDFs, e.g. with enable_expr_rewrites. After this patch, we only cache literal values in the backend. This offers the same performance as before in the common case where the frontend constant folds the expressions anyway. Also rename some functions to more cleanly separate the backend concepts of "constant" expressions and expressions that can be evaluated without a TupleRow. In a future change (IMPALA-4617) we should remove the IsConstant() analysis logic from the backend entirely and pass the information from the frontend. We should also fix isConstant() in the frontend so that it only returns true when it is safe to constant-fold the expression (IMPALA-4606). Once that is done, we could revert to using IsConstant() instead of IsLiteral(). Testing: Added a targeted test of constant folding of UDFs: we expect different results depending on whether constant folding is enabled. Also ran TestUdfs with expr rewrites enabled and disabled, since this can exercise different code paths. Refactored test_udfs somewhat to avoid running uninteresting combinations of query options for targeted tests and removed some 'drop * if not exists' statements that aren't necessary when using unique_database. This change revealed flakiness in test_mem_limit, which seems to have only worked by coincidence. Updated TrackAllocation() to actually set the query status when a memory limit is exceeded.
Looped this test for a while to make sure it isn't flaky any more. Also fix other test bugs where the vector argument is modified in-place, which can leak out to other tests. Change-Id: I0c76e3c8a8d92749256c312080ecd7aac5d99ce7 Reviewed-on: http://gerrit.cloudera.org:8080/5391 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-08 04:53:53 +00:00
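The caching policy change in IMPALA-4586 can be illustrated with a small sketch. It assumes a hypothetical `Expr` with an `is_literal` flag and a hypothetical `CachingEvaluator` wrapper; neither is Impala's actual class. Only literals are cached, so a non-deterministic expression is re-evaluated on every call.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical expression: `is_literal` is true only for literals. The old
// policy consulted a broader IsConstant() analysis instead of this flag.
struct Expr {
  bool is_literal;
  std::function<int()> eval;
};

// Evaluator that caches a value only when the expression is a literal.
class CachingEvaluator {
 public:
  explicit CachingEvaluator(Expr expr) : expr_(std::move(expr)) {}

  int GetValue() {
    if (!expr_.is_literal) return expr_.eval();  // re-evaluated on every row
    if (!cached_) cached_ = expr_.eval();        // literal: evaluate once
    return *cached_;
  }

 private:
  Expr expr_;
  std::optional<int> cached_;
};
```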
Michael Ho	9337518137	IMPALA-4595: Ignore discarded functions after linking. For an LLVM IR UDF, Impalad links the external LLVM module in which the IR UDF is defined with the main module. If a symbol is defined in both modules, LLVM may choose to discard the one defined in the external module. The discarded function and its callees will not be present in the linked module. In IMPALA-4595, udf-sample.cc was compiled without any optimization. Duplicated definitions such as StringVal::null() may have different inlining levels in the external module and the main module. When the duplicated definition in the external module is discarded, some of its callee functions (which are not inlined) may not be defined in the main module, so they can no longer be located in the linked module. This trips up some code in LlvmCodegen::LinkModule(). In particular, when looking up functions in the external module that are materialized during linking, certain functions may not be present for the reason above. Impalad will hit a DCHECK in debug builds or crash due to a null pointer access in release builds. This change fixes the problem by taking into account that certain functions may no longer be defined after linking. This change also fixes two incorrect status propagations in fe-support.cc. Change-Id: Iaa056a0c888bfcc95b412e1bc1063bb607b58ab7 Reviewed-on: http://gerrit.cloudera.org:8080/5384 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-07 22:52:35 +00:00
Alex Behm	bbf5255d0e	IMPALA-1788: Fold constant expressions. Adds a new ExprRewriteRule for replacing constant expressions with their literal equivalent via BE evaluation. Applies the new rule together with the existing ones on the parse tree, after analysis. Limitations: - Constant folding is applied on the unresolved expressions. As a result, it only works for expressions that are constant within a single query block, as opposed to expressions that may become constant after fully substituting inline-view exprs. - Exprs are not normalized, so some opportunities for constant folding are missed for certain expr-tree shapes. This patch includes the following interesting changes: - Introduces a timestamp literal that can only be produced by constant folding (not expressible directly via SQL). - To make sure that rewrites have no user-visible effect, the original result types and column labels of the top-level statement are restored after the rewrites are performed. - Does not fold exprs if their evaluation resulted in a warning or error, or if the resulting value is not representable by the corresponding FE LiteralExpr. - Fixes an existing issue with converting strings between the FE and BE. Strings produced in the BE that contain characters with a value > 127 are not correctly deserialized into a Java String via thrift. We detect this case during constant folding and abandon folding of such exprs. - Fixes several issues with detecting/reporting errors in NativeEvalConstExprs(). - Cleans up ExprContext::GetValue() into ExprContext::GetConstantValue(), which clarifies that its only use is evaluating exprs from the FE. Testing: - Modifies expr-test.cc to run all tests through the constant folding path. - Adds basic planner and rewrite rule tests. - Exhaustive test run passed. Change-Id: If672b703db1ba0bfc26e5b9130161798b40a69e9 Reviewed-on: http://gerrit.cloudera.org:8080/5109 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-23 21:11:30 +00:00
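Below is a toy version of the bottom-up folding rule, including the "don't fold if evaluation fails" behaviour. It assumes a hypothetical integer-only expression tree (`Node`, `Fold`, `EvalConst`). Impala's ExprRewriteRule operates on frontend exprs and evaluates them in the BE, which this sketch does not model.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Hypothetical mini expression tree: integer literals, column references and
// two binary operators.
struct Node {
  enum Kind { kLiteral, kSlotRef, kAdd, kDiv } kind;
  long value = 0;  // only meaningful for kLiteral
  std::shared_ptr<Node> lhs, rhs;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr Lit(long v) { return std::make_shared<Node>(Node{Node::kLiteral, v, nullptr, nullptr}); }
NodePtr Slot() { return std::make_shared<Node>(Node{Node::kSlotRef, 0, nullptr, nullptr}); }
NodePtr Bin(Node::Kind k, NodePtr l, NodePtr r) { return std::make_shared<Node>(Node{k, 0, l, r}); }

// Evaluates a subtree if it is constant. Returns nullopt if it references a
// column or if evaluation fails (division by zero), so it is left unfolded.
std::optional<long> EvalConst(const NodePtr& n) {
  switch (n->kind) {
    case Node::kLiteral: return n->value;
    case Node::kSlotRef: return std::nullopt;
    case Node::kAdd:
    case Node::kDiv: {
      auto l = EvalConst(n->lhs), r = EvalConst(n->rhs);
      if (!l || !r) return std::nullopt;
      if (n->kind == Node::kAdd) return *l + *r;
      if (*r == 0) return std::nullopt;  // error: do not fold
      return *l / *r;
    }
  }
  return std::nullopt;
}

// Bottom-up rewrite: replaces every foldable subtree with an equivalent literal.
NodePtr Fold(const NodePtr& n) {
  if (n->kind == Node::kLiteral || n->kind == Node::kSlotRef) return n;
  NodePtr rewritten = Bin(n->kind, Fold(n->lhs), Fold(n->rhs));
  if (auto v = EvalConst(rewritten)) return Lit(*v);
  return rewritten;
}
```

Folding `col + (1 + 2)` yields `col + 3`. The outer addition stays unfolded because it references a column, which matches the single-query-block limitation described above only loosely; the sketch has no notion of inline views.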
Tim Armstrong	d7246d64c7	IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions. This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previously the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is no special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Removing all the logic to disable codegen for the special cases that are now supported. Also fix the generation of code to get/set NULL bits, since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. The tests are run with codegen both enabled and disabled. To avoid flaky tests, we switch the UDF tests to use "unique_database". Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it by ~5%. In the end the problem had to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Reviewed-on: http://gerrit.cloudera.org:8080/4655 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-09 03:27:12 +00:00
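The get/set NULL bit logic mentioned above reduces to masking one bit in a tuple's null-indicator bytes. The following plain C++ sketch assumes a hypothetical `NullIndicator` (a byte offset plus a bit mask). The commit itself changes generated LLVM IR, not hand-written code like this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical null-indicator location: which byte of the tuple holds the bit,
// and which single bit within that byte.
struct NullIndicator {
  int byte_offset;
  std::uint8_t bit_mask;
};

bool IsNull(const std::uint8_t* tuple, NullIndicator ind) {
  return (tuple[ind.byte_offset] & ind.bit_mask) != 0;
}

// Sets or clears only this slot's bit, leaving neighbouring slots' bits intact.
void SetNull(std::uint8_t* tuple, NullIndicator ind, bool is_null) {
  std::uint8_t& b = tuple[ind.byte_offset];
  b = is_null ? static_cast<std::uint8_t>(b | ind.bit_mask)
              : static_cast<std::uint8_t>(b & ~ind.bit_mask);
}
```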
Michael Ho	b15d992abe	IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen(). This patch is mostly a mechanical move of codegen-related logic from each exec node's Prepare() to its Codegen() function. After this change, code generation no longer happens in Prepare(). Instead, it happens after Prepare() completes in PlanFragmentExecutor. This is an intermediate step towards the final goal of sharing compiled code among fragment instances in multi-threading. As part of the cleanup, this change also removes the logic for lazy codegen object creation. In other words, if codegen is enabled, the codegen object will always be created. This simplifies some of the logic in ScalarFnCall::Prepare() and various Codegen() functions by reducing the error checking needed. This change also removes the logic added for tackling IMPALA-1755, as it's no longer needed after the cleanup. The cleanup also rectifies a poorly documented situation. Previously, even if a user explicitly set DISABLE_CODEGEN to true, we might still codegen a UDF if it was written in LLVM IR or if it had more than 8 arguments. This patch enforces the query option by failing the query in both cases. To run the query, the user must enable codegen. This change also extends the number of arguments supported in the interpretation path of ScalarFn to 20. Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651 Reviewed-on: http://gerrit.cloudera.org:8080/4651 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-10-19 08:18:37 +00:00
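The DISABLE_CODEGEN enforcement can be summarized as a single check. This sketch uses a hypothetical `UdfCall` description and a hypothetical `CheckRunnable` helper. It assumes that, after this change, the argument limit for the interpreted path is the 20 arguments mentioned above. The error messages are invented.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a scalar UDF call site.
struct UdfCall {
  bool is_ir;    // defined in an LLVM IR module rather than a native library
  int num_args;
};

// Assumed interpreted-path limit, taken from the commit message above.
constexpr int kMaxInterpretedArgs = 20;

// Returns an empty string if the call can run, otherwise an error message.
// With codegen disabled, calls that need codegen fail instead of silently
// being codegen'd anyway.
std::string CheckRunnable(const UdfCall& udf, bool disable_codegen) {
  if (!disable_codegen) return "";
  if (udf.is_ir) return "IR UDFs require codegen; set DISABLE_CODEGEN=false";
  if (udf.num_args > kMaxInterpretedArgs) {
    return "too many arguments for the interpreted path; set DISABLE_CODEGEN=false";
  }
  return "";
}
```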
Michael Ho	276376acac	IMPALA-3674: Lazy materialization of LLVM module bitcode. Previously, each fragment using dynamic code generation would parse the bitcode module and populate the LLVM data structures for all the functions and their bodies in the bitcode module. This is wasteful, as we may only use a few of the functions parsed. We rely on dead code elimination to delete most of the unused functions so we don't waste time compiling them. This change implements lazy materialization of the functions' bodies. On the initial parse of the bitcode module, we just create the Function objects for each function in the module. The functions' bodies are materialized on demand from the bitcode module when they are actually referenced in the query. This ensures that the prepare time during codegen is proportional to the number of IR functions referenced by the query instead of the total number of IR functions in the module. This change also stops cross-compiling BufferedTupleStream::GetTupleRow(), as there isn't much benefit in doing it. In addition, move the ctors and dtors of LikePredicate to the header file to avoid an unnecessary alias in the IR module. For TPCH-Q2, a fragment that codegens only 9 functions used to spend 146ms in codegen. It now spends 35ms, a 76% reduction.
Before:
CodeGen:(Total: 146.041ms, non-child: 146.041ms, % non-child: 100.00%)
   - CodegenTime: 0.000ns
   - CompileTime: 2.003ms
   - LoadTime: 0.000ns
   - ModuleBitcodeSize: 2.12 MB (2225304)
   - NumFunctions: 9 (9)
   - NumInstructions: 129 (129)
   - OptimizationTime: 29.019ms
   - PrepareTime: 114.651ms
After:
CodeGen:(Total: 35.288ms, non-child: 35.288ms, % non-child: 100.00%)
   - CodegenTime: 0.000ns
   - CompileTime: 1.880ms
   - LoadTime: 0.000ns
   - ModuleBitcodeSize: 2.12 MB (2221276)
   - NumFunctions: 9 (9)
   - NumInstructions: 129 (129)
   - OptimizationTime: 5.101ms
   - PrepareTime: 28.044ms
Change-Id: I6ed7862fc5e86005ecea83fa2ceb489e737d66b2 Reviewed-on: http://gerrit.cloudera.org:8080/3220 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2016-07-20 18:30:25 -07:00
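Lazy materialization amounts to recording a loader for each function at parse time and running it only on first reference. The following minimal sketch uses a hypothetical `LazyModule` that stores function bodies as strings instead of LLVM IR.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical lazily-materialized module: the initial "parse" only declares
// each function together with a way to load its body later.
class LazyModule {
 public:
  void Declare(const std::string& name, std::function<std::string()> loader) {
    loaders_[name] = std::move(loader);
  }

  // Returns the body, materializing it on first reference. Throws
  // std::out_of_range (from map::at) for a function that was never declared.
  const std::string& GetBody(const std::string& name) {
    auto it = bodies_.find(name);
    if (it != bodies_.end()) return it->second;
    return bodies_.emplace(name, loaders_.at(name)()).first->second;
  }

  std::size_t NumMaterialized() const { return bodies_.size(); }

 private:
  std::map<std::string, std::function<std::string()>> loaders_;
  std::map<std::string, std::string> bodies_;
};
```

The cost now scales with the number of functions actually referenced, not the number declared. As a cross-check of the figures above, (146 − 35) / 146 ≈ 0.76, which matches the quoted 76% reduction.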
Tim Armstrong	75887730cb	IMPALA-2233: avoid loss of precision in function arguments. This patch changes the resolution of overloaded functions so that we prefer functions where there is no loss of precision in argument types. Previously, the logic would happily convert DECIMAL to FLOAT even if there was a more suitable overload available. E.g. greatest(TINYINT, DECIMAL) was resolved to greatest(FLOAT...) instead of greatest(DECIMAL). This only changes behaviour when no overload exactly matches the argument types, but the arguments can be converted with no loss of precision, e.g. TINYINT to DECIMAL. This patch introduces a conceptual distinction between strict and non-strict compatibility. All contexts aside from function matching use non-strict compatibility to support the current behavior of implicitly casting decimals to floats/doubles. This patch also makes resolution of overloaded functions consistent regardless of the order in which functions were added to the Db: overloads are checked in a canonical order. Switching to this canonical order revealed further problems with overload resolution, where the correct overload was selected only because of the order in which it was added to the database. For example, the logic equally preferred resolving fn(STRING, TINYINT) to fn(TIMESTAMP, INT) or fn(STRING, INT). This required changes to the compatibility matrix. Various cleanups and simplifications of the type compatibility logic are also included. Change-Id: I50e657c78cdcb925b616b5b088b801510020e255 Reviewed-on: http://gerrit.cloudera.org:8080/845 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-10-01 13:58:40 -07:00
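The strict/non-strict distinction and the canonical ordering can be sketched as a three-pass lookup: exact match first, then lossless (strict) conversions, then lossy (non-strict) ones. Each pass walks the candidates in sorted order. The `Type` enum, the compatibility rules and `Resolve` below are hypothetical and far smaller than Impala's real compatibility matrix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical subset of SQL types; declaration order doubles as canonical order.
enum class Type { TINYINT, INT, DECIMAL, FLOAT, DOUBLE };
using Signature = std::vector<Type>;

// Strict: `from` converts to `to` with no loss of precision (illustrative rules).
bool StrictlyCompatible(Type from, Type to) {
  if (from == to) return true;
  switch (from) {
    case Type::TINYINT: return true;  // fits exactly in every other type here
    case Type::INT: return to == Type::DECIMAL || to == Type::DOUBLE;
    case Type::FLOAT: return to == Type::DOUBLE;
    default: return false;
  }
}

// Non-strict additionally allows the lossy DECIMAL -> FLOAT/DOUBLE casts.
bool NonStrictlyCompatible(Type from, Type to) {
  if (StrictlyCompatible(from, to)) return true;
  return from == Type::DECIMAL && (to == Type::FLOAT || to == Type::DOUBLE);
}

bool Matches(const Signature& args, const Signature& params, bool strict) {
  if (args.size() != params.size()) return false;
  for (std::size_t i = 0; i < args.size(); ++i) {
    bool ok = strict ? StrictlyCompatible(args[i], params[i])
                     : NonStrictlyCompatible(args[i], params[i]);
    if (!ok) return false;
  }
  return true;
}

// Returns the index of the chosen overload, or -1 if none matches. Candidates
// are visited in sorted order so that registration order cannot change the result.
int Resolve(const Signature& args, const std::vector<Signature>& overloads) {
  std::vector<int> order(overloads.size());
  for (int i = 0; i < static_cast<int>(order.size()); ++i) order[i] = i;
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return overloads[a] < overloads[b]; });
  for (int i : order) {
    if (overloads[i] == args) return i;  // pass 1: exact match
  }
  for (bool strict : {true, false}) {    // pass 2: strict, pass 3: non-strict
    for (int i : order) {
      if (Matches(args, overloads[i], strict)) return i;
    }
  }
  return -1;
}
```

With overloads `(FLOAT, FLOAT)` and `(DECIMAL, DECIMAL)`, the call `(TINYINT, DECIMAL)` resolves to the DECIMAL overload, because DECIMAL to FLOAT is only non-strictly compatible.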
Skye Wanderman-Milne	cfd4ff2546	IMPALA-1589: allow up to 8 non-variadic arguments in the interpreted UDF path Change-Id: Ie17763366311554ee1a58ed6b8a8d40973ae20d9 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5604 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-12-16 18:53:16 -08:00
Alex Behm	f696861c5c	Throw error on unrecognized test sections. Our .test file parser previously did not abort a test when it contained a malformed test or section. This patch changes that behavior to report an error and treat the test as failed. Quite a few tests were not well-formed, and were not executed as a result. This patch fixes those tests. Arguably, the test file parser should be more flexible about where it accepts comments, but this patch does not address that problem. Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-12-02 18:08:09 -08:00
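The stricter behaviour amounts to rejecting unknown section headers instead of skipping them. This sketch assumes a simplified format in which a line `---- NAME` opens a section. The set of known section names is an assumed subset, and `ParseSections` is a hypothetical helper.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Returns the section names in order, or throws on an unrecognized section.
// Lines that are not section headers (including "====" separators) are ignored.
std::vector<std::string> ParseSections(const std::string& text) {
  static const std::set<std::string> kKnown = {"QUERY", "RESULTS", "TYPES", "CATCH"};
  std::vector<std::string> sections;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("---- ", 0) != 0) continue;  // not a section header
    std::string name = line.substr(5);
    if (kKnown.count(name) == 0) {
      // Previously such a section could be silently skipped; now the test fails.
      throw std::runtime_error("unrecognized test section: " + name);
    }
    sections.push_back(name);
  }
  return sections;
}
```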
Skye Wanderman-Milne	390e773a44	rand() is not a constant expr. Also fixes a bug in Expr::DebugString(). Change-Id: I32b53072755781d0858481187864d2319b9ae1cb Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5400 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 6de9fab17a5032dd7c9d1ef6b8071703c67d223f) Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5425 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-11-25 18:38:27 -08:00
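A constness check has to consider non-deterministic calls, not only column references. Here is a minimal recursive check. It assumes a hypothetical `Expr` node type and treats `rand` as the only non-deterministic function.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical expression node: a literal, a column reference, or a function call.
struct Expr {
  enum Kind { kLiteral, kSlotRef, kFnCall } kind;
  std::string fn_name;  // only meaningful for kFnCall
  std::vector<std::shared_ptr<Expr>> children;
};

// Assumed list of non-deterministic builtins, reduced to a single entry.
bool IsNondeterministic(const std::string& fn) { return fn == "rand"; }

// An expr is constant only if no node in its tree reads a column or calls a
// non-deterministic function.
bool IsConstant(const Expr& e) {
  if (e.kind == Expr::kSlotRef) return false;
  if (e.kind == Expr::kFnCall && IsNondeterministic(e.fn_name)) return false;
  for (const auto& child : e.children) {
    if (!IsConstant(*child)) return false;
  }
  return true;
}
```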
Skye Wanderman-Milne	3a6600c964	Fix UDF test. UDF invocations in udf.test should not specify a database. This is how we switch between testing IR UDFs in the ir_function_test database and native UDFs in the native_function_test database. Change-Id: I09ede18f2b91440ef7a2a76b0daf41a007af2671 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3130 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 4d6160c0b88285aea754f6353cdd02b5e4b15633) Reviewed-on: http://gerrit.ent.cloudera.com:8080/3295	2014-06-26 22:17:56 -07:00
Skye Wanderman-Milne	6ac9a8104b	IMPALA-1009: UDF/UDA leaks should not fail queries. With this change, leaky UDFs built with the SDK will still fail when using the test harness, but leaky UDFs running in Impala will only trigger a warning. This change also updates the test infrastructure to always check for non-fatal errors/warnings. Change-Id: I5615349b9d691e4eddea3e03e152ef12e73835e7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2844 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 60ce5190d96add6104aba642d2354d87a26000fa) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2938	2014-06-10 21:46:47 -07:00
Nong Li	8f4dc0f2f0	IMPALA-974: Switch from FloatLiteral to DecimalLiteral. Floats/doubles are lossy, so using them as the default literal type is problematic. Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-31 22:19:06 -07:00
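It is easy to show why floats are lossy as a default literal type: `0.1` has no exact binary representation, whereas a decimal can be stored exactly as an unscaled integer plus a scale. The `DecimalLiteral` struct below is a hypothetical sketch, not Impala's class.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Printing 0.1 with 17 significant digits exposes the binary approximation.
void ShowDoubleLiteral() {
  std::printf("%.17g\n", 0.1);  // prints 0.10000000000000001
}

// Hypothetical exact representation: value = unscaled / 10^scale.
struct DecimalLiteral {
  std::int64_t unscaled;
  int scale;
};

// Exact addition; assumes equal scales to keep the sketch short.
DecimalLiteral Add(DecimalLiteral a, DecimalLiteral b) {
  assert(a.scale == b.scale);
  return {a.unscaled + b.unscaled, a.scale};
}
```

For example, 0.1 + 0.2 as doubles is not equal to 0.3. As decimals with scale 1, the sum is exactly 3 / 10^1.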
Skye Wanderman-Milne	c8b2017093	Add decimal UDF/UDA support. Change-Id: Ie48c1cb8e978c7282593b7f602dd68added6d3fd Reviewed-on: http://gerrit.ent.cloudera.com:8080/2625 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 5048f04b332c13b1bff32fb257272b0fea4b8584) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2739	2014-05-29 20:49:53 -07:00
Dimitris Tsirogiannis	ca86e470de	IMPALA-887: Improve partition pruning time. This commit is the first step in improving the performance of partition pruning. Currently, Impala can prune approximately 10K partitions per second, which introduces significant overhead for huge tables with a large number of partitions. With this commit we reduce that overhead by 3x by batching the partition pruning calls to the backend. Change-Id: I3303bfc7fb6fe014790f58a5263adeea94d0fe7d Reviewed-on: http://gerrit.ent.cloudera.com:8080/2608 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2687	2014-05-26 13:10:12 -07:00
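Batching reduces the number of FE-to-BE round trips from one per partition to one per batch. This sketch uses a hypothetical `Backend` whose `EvalBatch` call stands in for a single round trip. It does not model Impala's actual frontend/backend interface, and the batch size is arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical backend: each EvalBatch call models one FE->BE round trip that
// evaluates the pruning predicate on a batch of partition keys.
struct Backend {
  int round_trips = 0;
  std::vector<bool> EvalBatch(const std::vector<int>& keys,
                              const std::function<bool(int)>& pred) {
    ++round_trips;
    std::vector<bool> out;
    out.reserve(keys.size());
    for (int k : keys) out.push_back(pred(k));
    return out;
  }
};

// Returns the partition keys that survive pruning, sending `batch_size` keys
// per backend call instead of one call per partition.
std::vector<int> Prune(Backend& be, const std::vector<int>& keys,
                       const std::function<bool(int)>& pred,
                       std::size_t batch_size) {
  assert(batch_size > 0);
  std::vector<int> kept;
  for (std::size_t i = 0; i < keys.size(); i += batch_size) {
    std::size_t end = std::min(keys.size(), i + batch_size);
    std::vector<int> batch(keys.begin() + static_cast<std::ptrdiff_t>(i),
                           keys.begin() + static_cast<std::ptrdiff_t>(end));
    std::vector<bool> keep = be.EvalBatch(batch, pred);
    for (std::size_t j = 0; j < batch.size(); ++j) {
      if (keep[j]) kept.push_back(batch[j]);
    }
  }
  return kept;
}
```

With 1000 partitions and a batch size of 256, pruning takes 4 round trips instead of 1000.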
Alex Behm	66a6c1f312	Fix UDF query test files. Change-Id: Idea277ea2d20c47b2a81b0f2f06c48455de2ea45 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1780 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-03-06 07:37:14 -08:00
Skye Wanderman-Milne	6ceed1e632	UDF API additions. This patch introduces the ability to specify a prepare and close function for a UDF, as well as FunctionContext methods for maintaining state across UDF invocations within a query. Many of the changes are related to adding an Expr::Open() function which calls the UDF's prepare function, if specified (it has to be called in Open() since the LLVM module must be compiled first). Change-Id: I581d90d03dff71f7ff5d4a6bef839ba6bc46b443 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1693 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 8e2ed7fb9051d98f89327715fdebd6f5ed22d6ee) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1757	2014-03-05 07:32:34 -08:00
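The prepare/evaluate/close life cycle with per-query state might look like the following. `ToyFunctionContext` and its `SetState`/`GetState` methods are hypothetical stand-ins. The real UDF SDK's `FunctionContext` API differs; current versions, for example, take a state-scope argument.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for the UDF SDK's FunctionContext.
class ToyFunctionContext {
 public:
  void SetState(void* ptr) { state_ = ptr; }
  void* GetState() const { return state_; }

 private:
  void* state_ = nullptr;
};

// Prepare: called once before any row is evaluated; allocates per-query state.
void RowNumberPrepare(ToyFunctionContext* ctx) { ctx->SetState(new long(0)); }

// UDF body: returns 1, 2, 3, ... across invocations within one query.
long RowNumber(ToyFunctionContext* ctx) {
  long* counter = static_cast<long*>(ctx->GetState());
  assert(counter != nullptr && "prepare must run first");
  return ++*counter;
}

// Close: frees the state allocated in prepare.
void RowNumberClose(ToyFunctionContext* ctx) {
  delete static_cast<long*>(ctx->GetState());
  ctx->SetState(nullptr);
}
```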
Skye Wanderman-Milne	203fc66456	Add GetTypeDesc() method to FunctionContext. This is currently only implemented for NativeUdfExpr. Change-Id: I81b442c5668dff43d0486d1cfc445bca2af66606 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1664 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit e1087c3a78e6e12938b583c302907bd32c59f524) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1720	2014-03-01 20:24:30 -08:00
Nong Li	d5d4b4785b	Fix broken udf test case. Should not specify DB. Change-Id: I5f6343cbef9f52d349130360e029b38b23d0187a Reviewed-on: http://gerrit.ent.cloudera.com:8080/1505 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-02-10 11:34:56 -08:00
Nong Li	7d578a9e54	Cleanup for IMPALA-774 fix. Change-Id: I47bce71c482b3576957e88980f764c30f45229a9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1454 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1470	2014-02-05 22:58:51 -08:00
Nong Li	ccd8c0338f	IMPALA-774: Fix runtimestate setup when evaluating expr from FE. We weren't initializing the UDF mem pool, causing UDFs that return strings to crash when used as part of a constant expression. Change-Id: Ic3a0e556aec8ce03a9e59f3ccf6980c682046b50 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1447 Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins	2014-02-05 11:02:27 -08:00
Skye Wanderman-Milne	9d05d6d03a	Allow UDF tests to run in parallel. Change-Id: I9512d4a6920c4a71383d9374eb5feb303c3db85d Reviewed-on: http://gerrit.ent.cloudera.com:8080/727 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:47 -08:00
Skye Wanderman-Milne	7e8e184acf	Allow UDFs in conjunct expressions. This patch refactors HDFSScanNode to copy and prepare all conjunct exprs in Prepare(), rather than in the scanner threads. This is necessary so the UDF exprs get codegen'd. Prepare() also now only codegens the functions for the necessary file formats, rather than for all file formats regardless of what's actually being scanned. Change-Id: Ic3220cbd0cba9a3baa138b1f50ecdc6889ed0cd1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/710 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:39 -08:00
Skye Wanderman-Milne	97a6b12e37	Fix UDFs used in partition pruning exprs. Exprs used for partition pruning are prepared/evaluated with a separate RuntimeState. If these exprs use UDFs, the runtime state needs access to the process's ExecEnv so we can use the LibCache, and the IR produced by the UDF exprs needs to be optimized and jit'd. Change-Id: If7c1d6ebc0015ef3c21a0421c1a36cad4be66625 Reviewed-on: http://gerrit.ent.cloudera.com:8080/695 Tested-by: jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:39 -08:00
Skye Wanderman-Milne	b41ff0c8cd	Modify test-udfs.cc so there are no undefined symbols in shared library. AnalyzeDDLTest was failing because the fesupport binary couldn't resolve a function used in libTestUdfs.so (the function was defined in udf.cc, rather than udf.h). I couldn't figure out how to cleanly build udf.cc into the libTestUdfs.so, so instead I removed the use of the function in test-udfs.cc. Change-Id: I81243547584a5b49a5f9265d0d17e035e18d6110 Reviewed-on: http://gerrit.ent.cloudera.com:8080/694 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:27 -08:00
Nong Li	911cfc1bb9	Fix vararg UDFs. Change-Id: I0e202b984ece7de3d220b6ce89b0c0a4c9edcb45 Reviewed-on: http://gerrit.ent.cloudera.com:8080/688 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:26 -08:00
Skye Wanderman-Milne	8692e7df8d	Add timestamp support to CodegenAnyVal Change-Id: I2bbeae16660709c2c15d545e6d1c791912e880db Reviewed-on: http://gerrit.ent.cloudera.com:8080/655 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:21 -08:00
Nong Li	1eb2b7a964	Add execution for vararg UDFs. Change-Id: I46e5670c09ac0b8e62f39dfc832fe880dd1dc995 Reviewed-on: http://gerrit.ent.cloudera.com:8080/572 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:09 -08:00
Skye Wanderman-Milne	b7f83bcd73	Add support for LLVM IR UDFs. This patch also adds a number of improvements to NativeUdfExpr. Highlights include: * Correctly handling the lowering of AnyVal struct types (required for ABI compatibility) * A rudimentary library cache for reusing handles produced by dlopen * More complicated test cases Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195 Reviewed-on: http://gerrit.ent.cloudera.com:8080/540 Tested-by: jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:03 -08:00
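A library cache for `dlopen` handles can be as simple as a map from path to handle. This sketch injects the loader so it stays self-contained; in a real process the loader would call `dlopen(path, RTLD_NOW)`. The class name echoes the LibCache mentioned elsewhere in this log, but this interface is invented for illustration. Reference counting and `dlclose()` are omitted.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

class LibCache {
 public:
  // Loads a library and returns its handle, or nullptr on failure. In a real
  // process this would wrap dlopen(path.c_str(), RTLD_NOW).
  using Loader = std::function<void*(const std::string&)>;

  explicit LibCache(Loader load) : load_(std::move(load)) {}

  // Returns the cached handle for `path`, loading the library on first use.
  void* GetHandle(const std::string& path) {
    auto it = handles_.find(path);
    if (it != handles_.end()) return it->second;
    void* handle = load_(path);
    if (handle != nullptr) handles_[path] = handle;  // failures are not cached
    return handle;
  }

 private:
  Loader load_;
  std::map<std::string, void*> handles_;
};
```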
Nong Li	8963d79f51	Fix build break from UdfContext rename. Change-Id: Ia3df23fcba7d3812ae90565daab89916cbb50861 Reviewed-on: http://gerrit.ent.cloudera.com:8080/549 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:01 -08:00
Nong Li	e39de94316	Add parser/analysis to support UDAs. I looked around some and I think having create/drop/show [aggregate] function seems reasonable and extends nicely for UDTs. The create aggregate function statement can accept a lot of arguments. For the non-essential ones, I went with resolving them by name rather than position (i.e. argName="value"). I think this is better for the user than specifying them by position. The grammar is: CREATE AGGREGATE FUNCTION <name>(<arg_types>) RETURNS <type> [INTERMEDIATE <type>] LOCATION '/path' UpdateFn='Fn' [comment='comment'] [SerializeFn='symbol'] [MergeFn='symbol'] [InitFn='symbol'] [FinalizeFn='symbol'] The optional args at the end can be in any order. If the other symbols are not specified, we derive them from the required UpdateFn symbol. The analyzer tries to derive them and fails if it can't find a derived symbol in the binary. The simplest example would be: CREATE AGGREGATE FUNCTION count(float) RETURNS BIGINT LOCATION '/path' UpdateFn='CountUpdateFn'; In this case we assume the intermediate type is the return type and the other functions are called 'CountInitFn', 'CountSerializeFn', 'CountMergeFn' and 'CountFinalizeFn'. Change-Id: Iefc5741293050f5b295df28e9d1a7d039ead8675 Reviewed-on: http://gerrit.ent.cloudera.com:8080/513 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:59 -08:00
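The symbol-derivation rule can be inferred from the `CountUpdateFn` example: replace the `UpdateFn` suffix with the role name. This rule is an assumption generalized from that single example. `DeriveSymbol` is a hypothetical helper, and a real analyzer would also check that the derived symbol exists in the binary.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical rule: the required UpdateFn symbol ends in "UpdateFn", and each
// optional symbol uses the same prefix plus its role, e.g. CountUpdateFn ->
// CountMergeFn. `role` is one of "Init", "Serialize", "Merge", "Finalize".
std::optional<std::string> DeriveSymbol(const std::string& update_fn,
                                        const std::string& role) {
  static const std::string kSuffix = "UpdateFn";
  if (update_fn.size() < kSuffix.size() ||
      update_fn.compare(update_fn.size() - kSuffix.size(), kSuffix.size(),
                        kSuffix) != 0) {
    return std::nullopt;  // no derivable name; an analyzer would report an error
  }
  return update_fn.substr(0, update_fn.size() - kSuffix.size()) + role + "Fn";
}
```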
Skye Wanderman-Milne	fd99db0300	First pass at UdfExpr. Change-Id: I517bf56541749b5c2459554821c7bf838239fdf0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/439 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:50 -08:00

33 Commits