impala

mirror of https://github.com/apache/impala.git synced 2025-12-30 12:02:10 -05:00

Author	SHA1	Message	Date
Tim Armstrong	d7246d64c7	IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. The tests are run with both codegen enabled and disabled. To avoid flaky tests, we switch the UDF tests to use "unique_database". Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Reviewed-on: http://gerrit.cloudera.org:8080/4655 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-09 03:27:12 +00:00
Tim Armstrong	381e719065	IMPALA-4266: Java udf returning string can give incorrect results The memory management of string results was wrong: strings returned from Exprs must live until the next time FreeLocalAllocations() is called. Otherwise the buffer holding the string is freed or reused by the next UDF call. The fix is to copy string values into a buffer with the right lifetime. Testing: Added a regression test based on Bharath's example that reproduced the bug reliably. Change-Id: I705d271814cb1143f67d8a12f4fd87bab7a8e161 Reviewed-on: http://gerrit.cloudera.org:8080/4941 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-08 02:47:11 +00:00
Skye Wanderman-Milne	9f366645ab	IMPALA-3378/IMPALA-3379: fix various JNI issues This patch: 1) Removes JniUtil::Cleanup() and JniUtil::global_refs_. We never called Cleanup(), and all the jobjects in global_refs_ are meant to have the lifetime of the impalad process. This makes JniUtil::GetGlobalClassRef() and JniUtil::LocalToGlobalRef() thread-safe (which fixes IMPALA-3379). 2) Introduces a new JniUtil::FreeGlobalRef() method, which is a wrapper around the JNI DeleteGlobalRef() method. 3) Change JNI users to use the JniUtil methods instead of the JNI methods directly where appropriate. This makes error checking more consistent, and makes it easier to find all JNI uses. This is possible since GetGlobalClassRef() and LocalToGlobalRef() are now thread-safe and don't leak jobjects. 4) Removes HiveUdfCall::JniContext::cl, as well as other JNI constants, and replaces them with process-wide static singletons. It then moves the initialization to a new HiveUdfCall::Init() method which is once called in the main thread at the beginning of the process. This fixes IMPALA-3378. 5) Deletes the JniContext created for each HiveUdfCall Unfortunately I am not able to repro IMPALA-3378 so there is no test case (I didn't attempt IMPALA-3379 but it's similar). Change-Id: I8cd089e355d2ee2d5ace81f05b214272c05cf941 Reviewed-on: http://gerrit.cloudera.org:8080/2820 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:41 -07:00
Bharath Vissapragada	ef0dac661c	IMPALA-2843: Persist hive udfs across catalog restarts This commit adds a new feature to persist hive/java udfs across catalog restarts. IMPALA-1748 already added this for non-java udfs by storing them in parameters map of the Db object and reading them back at catalog startup. However we follow a different approach for hive udfs by converting them to Hive's function format and adding them as hive functions to the metastore. This makes it possible to share udfs between hive and Impala as the udfs added from one service are accessible to other. This commit takes care of format conversions between hive and impala and user can just add function once in either of the services. Background: Hive and impala treat udfs differently. Hive resolves the evaluate function in the udf class at runtime depending on the data types of the input arguments. So user can add one function by name and can pass any arguments to it as long as there is a compatible evaluate function in the udf class. However Impala takes the input types of the udf as a part of function definition (that maps to only one evaluate function) and loads the function only for those set of input argument types. If we have multiple 'evaluate' methods, we need to add multiple functions one for each of them. This commit adds new variants of CREATE \| DROP FUNCTIONS to Impala which lets the user to create and drop hive/java udfs without input argument types or return types. Catalog takes care of loading/dropping the udf signatures corresponding to each "evaluate" method in the udf symbol class. The syntax is as follows, CREATE FUNCTION [IF NOT EXISTS] <function name> <function_opts> DROP FUNCTION [IF EXISTS] <function name> Examples: CREATE FUNCTION IF NOT EXISTS foo location '/path/to/jar' SYMBOL='TestUdf'; CREATE FUNCTION bar location '/path/to/jar' SYMBOL='TestUdf2'; DROP FUNCTION foo; DROP FUNCTION IF EXISTS bar; The older way of creating hive/java udfs with specific signature is still supported, however they are not persisted across restarts. So a restart of catalog can wipe them out. Additionally this commit also loads all the compatible java udfs added outside of Impala and they needn't be separately loaded. One thing to note here is that the functions added using the new CREATE FUNCTION can only be dropped using the new DROP FUNCTION syntax (without signature). The same rule applies for the java udfs added using the old CREATE FUNCTION syntax (with signature). Change-Id: If31ed3d5ac4192e3bc2d57610a9a0bbe1f62b42d Reviewed-on: http://gerrit.cloudera.org:8080/2250 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Internal Jenkins	2016-02-19 23:04:03 -08:00

4 Commits