impala

mirror of https://github.com/apache/impala.git synced 2026-01-04 09:00:56 -05:00

Author	SHA1	Message	Date
Tim Armstrong	75887730cb	IMPALA-2233: avoid loss of precision in function arguments This patch changes the resolution of overloaded functions so that we prefer functions where there is no loss of precision in argument types. Previously, the logic would happily convert DECIMAL to FLOAT even if there was a more suitable overload available. E.g. greatest(TINYINT, DECIMAL) was resolved to greatest(FLOAT...) instead of greatest(DECIMAL). This only changes behaviour when no overload exactly matches the argument types, but the arguments can be converted with no loss of precision, e.g. TINYINT to DECIMAL. This patch introduces a conceptual distinction between strict and non-strict compatibility. All contexts aside from function matching use non-strict to support the current behavior of implicitly casting decimals to floats/doubles. This patch also makes resolution of overloaded functions consistent regardless of what order functions were added to the Db - overloads are checked in a canonical order. Switching to this canonical order revealed further problems with overload resolution where the correct overload was selected only because of the order in which it was added to the database. For example, the logic equally preferred resolving fn(STRING, TINYINT) to fn(TIMESTAMP, INT) or fn(STRING, INT). This required changes to the compatibility matrix. Various cleanup and simplification of the type compatibility logic is also included. Change-Id: I50e657c78cdcb925b616b5b088b801510020e255 Reviewed-on: http://gerrit.cloudera.org:8080/845 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-10-01 13:58:40 -07:00
Skye Wanderman-Milne	cfd4ff2546	IMPALA-1589: allow up to 8 non-variadic arguments in the interpreted UDF path Change-Id: Ie17763366311554ee1a58ed6b8a8d40973ae20d9 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5604 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-12-16 18:53:16 -08:00
Alex Behm	f696861c5c	Throw error on unrecognized test sections. Our .test file parser used to not abort tests when there is a malformed test/section. This patch changes that behavior to report an error and treat the test as failed. Quite a few tests were not well-formed, and were not executed as a result. This patch fixes those tests. Arguably, the test file parser should be more flexible in which places to accept comments, but this patch does not address that problem. Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-12-02 18:08:09 -08:00
Skye Wanderman-Milne	390e773a44	rand() is not a constant expr Also fixes a bug in Expr::DebugString() Change-Id: I32b53072755781d0858481187864d2319b9ae1cb Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5400 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 6de9fab17a5032dd7c9d1ef6b8071703c67d223f) Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5425 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-11-25 18:38:27 -08:00
Skye Wanderman-Milne	3a6600c964	Fix UDF test UDF invocations in udf.test should not specify a database. This is how we switch between testing IR UDFs in the ir_function_test database and native UDFs in the native_function_test database. Change-Id: I09ede18f2b91440ef7a2a76b0daf41a007af2671 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3130 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 4d6160c0b88285aea754f6353cdd02b5e4b15633) Reviewed-on: http://gerrit.ent.cloudera.com:8080/3295	2014-06-26 22:17:56 -07:00
Skye Wanderman-Milne	6ac9a8104b	IMPALA-1009: UDF/UDA leaks should not fail queries With this change, leaky UDFs built with the SDK will still fail when using the test harness, but leaky UDFs running in Impala will only trigger a warning. This change also updates the test infrastructure to always check for non-fatal errors/warnings. Change-Id: I5615349b9d691e4eddea3e03e152ef12e73835e7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2844 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 60ce5190d96add6104aba642d2354d87a26000fa) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2938	2014-06-10 21:46:47 -07:00
Nong Li	8f4dc0f2f0	IMPALA-974: Switch from FloatLiteral to DecimalLiteral. Float/Doubles are lossy so using those as the default literal type is problematic. Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-31 22:19:06 -07:00
Skye Wanderman-Milne	c8b2017093	Add decimal UDF/UDA support. Change-Id: Ie48c1cb8e978c7282593b7f602dd68added6d3fd Reviewed-on: http://gerrit.ent.cloudera.com:8080/2625 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 5048f04b332c13b1bff32fb257272b0fea4b8584) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2739	2014-05-29 20:49:53 -07:00
Dimitris Tsirogiannis	ca86e470de	IMPALA-887: Improve partition pruning time This commit is the first step in improving the performance of partition pruning. Currently, Impala can prune approximately 10K partitions per sec, thereby introducing significant overhead for huge tables with a large number of partitions. With this commit we reduce that overhead by 3X by batching the partition pruning calls to the backend. Change-Id: I3303bfc7fb6fe014790f58a5263adeea94d0fe7d Reviewed-on: http://gerrit.ent.cloudera.com:8080/2608 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2687	2014-05-26 13:10:12 -07:00
Alex Behm	66a6c1f312	Fix UDF query test files. Change-Id: Idea277ea2d20c47b2a81b0f2f06c48455de2ea45 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1780 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-03-06 07:37:14 -08:00
Skye Wanderman-Milne	6ceed1e632	UDF API additions This patch introduces the ability to specify a prepare and close function for a UDF, as well as FunctionContext methods for maintaining state across UDF invocations within a query. Many of the changes are related to adding an Expr::Open() function which calls the UDF's prepare function, if specified (it has to be called in Open() since the LLVM module must be compiled first). Change-Id: I581d90d03dff71f7ff5d4a6bef839ba6bc46b443 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1693 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 8e2ed7fb9051d98f89327715fdebd6f5ed22d6ee) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1757	2014-03-05 07:32:34 -08:00
Skye Wanderman-Milne	203fc66456	Add GetTypeDesc() method to FunctionContext. This is currently only implemented for NativeUdfExpr. Change-Id: I81b442c5668dff43d0486d1cfc445bca2af66606 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1664 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit e1087c3a78e6e12938b583c302907bd32c59f524) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1720	2014-03-01 20:24:30 -08:00
Nong Li	d5d4b4785b	Fix broken udf test case. Should not specify DB. Change-Id: I5f6343cbef9f52d349130360e029b38b23d0187a Reviewed-on: http://gerrit.ent.cloudera.com:8080/1505 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-02-10 11:34:56 -08:00
Nong Li	7d578a9e54	Cleanup for IMPALA-774 fix. Change-Id: I47bce71c482b3576957e88980f764c30f45229a9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1454 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1470	2014-02-05 22:58:51 -08:00
Nong Li	ccd8c0338f	IMPALA-774: Fix runtimestate setup when evaluating expr from FE. We weren't initializing the udf mem pool, causing UDFs that return strings to crash if used as part of a constant expression. Change-Id: Ic3a0e556aec8ce03a9e59f3ccf6980c682046b50 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1447 Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins	2014-02-05 11:02:27 -08:00
Skye Wanderman-Milne	9d05d6d03a	Allow UDF tests to run in parallel. Change-Id: I9512d4a6920c4a71383d9374eb5feb303c3db85d Reviewed-on: http://gerrit.ent.cloudera.com:8080/727 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:47 -08:00
Skye Wanderman-Milne	7e8e184acf	Allow UDFs in conjunct expressions. This patch refactors HDFSScanNode to copy and prepare all conjunct exprs in Prepare(), rather than in the scanner threads. This is necessary so the UDF exprs get codegen'd. Prepare() also only codegens the functions for the necessary file formats now, rather than for all file formats regardless of what's actually being scanned. Change-Id: Ic3220cbd0cba9a3baa138b1f50ecdc6889ed0cd1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/710 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:39 -08:00
Skye Wanderman-Milne	97a6b12e37	Fix UDFs used in partition pruning exprs. Exprs used for partition pruning are prepared/evaluated with a separate RuntimeState. If these exprs use UDFs, the runtime state needs access to the process's ExecEnv so we can use the LibCache and the IR produced by the UDF exprs needs to be optimized and jit'd. Change-Id: If7c1d6ebc0015ef3c21a0421c1a36cad4be66625 Reviewed-on: http://gerrit.ent.cloudera.com:8080/695 Tested-by: jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:39 -08:00
Skye Wanderman-Milne	b41ff0c8cd	Modify test-udfs.cc so there are no undefined symbols in shared library. AnalyzeDDLTest was failing because the fesupport binary couldn't resolve a function used in libTestUdfs.so (the function was defined in udf.cc, rather than udf.h). I couldn't figure out how to cleanly build udf.cc into the libTestUdfs.so, so instead I removed the use of the function in test-udfs.cc. Change-Id: I81243547584a5b49a5f9265d0d17e035e18d6110 Reviewed-on: http://gerrit.ent.cloudera.com:8080/694 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:27 -08:00
Nong Li	911cfc1bb9	Fix vararg UDFs. Change-Id: I0e202b984ece7de3d220b6ce89b0c0a4c9edcb45 Reviewed-on: http://gerrit.ent.cloudera.com:8080/688 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:26 -08:00
Skye Wanderman-Milne	8692e7df8d	Add timestamp support to CodegenAnyVal Change-Id: I2bbeae16660709c2c15d545e6d1c791912e880db Reviewed-on: http://gerrit.ent.cloudera.com:8080/655 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:21 -08:00
Nong Li	1eb2b7a964	Add execution for vararg UDFs. Change-Id: I46e5670c09ac0b8e62f39dfc832fe880dd1dc995 Reviewed-on: http://gerrit.ent.cloudera.com:8080/572 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:09 -08:00
Skye Wanderman-Milne	b7f83bcd73	Add support for LLVM IR UDFs. This patch also adds a number of improvements to NativeUdfExpr. Highlights include: * Correctly handling the lowering of AnyVal struct types (required for ABI compatibility) * A rudimentary library cache for reusing handles produced by dlopen * More complicated test cases Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195 Reviewed-on: http://gerrit.ent.cloudera.com:8080/540 Tested-by: jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:03 -08:00
Nong Li	8963d79f51	Fix build break from UdfContext rename. Change-Id: Ia3df23fcba7d3812ae90565daab89916cbb50861 Reviewed-on: http://gerrit.ent.cloudera.com:8080/549 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:01 -08:00
Nong Li	e39de94316	Add parser/analysis to support UDAs. I looked around some and I think having create/drop/show [aggregate] function seems reasonable and extends nicely for UDTs. The create aggregate function can accept a lot of arguments. For the non-essential ones, I went with resolving them by name rather than position (i.e. argName="value"). I think this is better for the user than specifying them by position. The grammar is: CREATE AGGREGATE FUNCTION <name>(<arg_types>) RETURNS <type> [INTERMEDIATE <type>] LOCATION '/path' UpdateFn='Fn' [comment='comment'] [SerializeFn='symbol'] [MergeFn='symbol'] [InitFn='symbol'] [FinalizeFn='symbol'] The optional args at the end can be in any order. If the other symbols are not specified, we derive them from the required UpdateFn symbol. The analyzer tries to figure them out and fails if it can't find a derived symbol in the binary. The simplest example would be: CREATE AGGREGATE FUNCTION count(float) RETURNS BIGINT LOCATION '/path' UpdateFn='CountUpdateFn'; In which case we assume the intermediate type is the return type and the other functions are called 'CountInitFn', 'CountSerializeFn', 'CountMergeFn' and 'CountFinalizeFn'. Change-Id: Iefc5741293050f5b295df28e9d1a7d039ead8675 Reviewed-on: http://gerrit.ent.cloudera.com:8080/513 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:59 -08:00
Skye Wanderman-Milne	fd99db0300	First pass at UdfExpr. Change-Id: I517bf56541749b5c2459554821c7bf838239fdf0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/439 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:50 -08:00

26 Commits
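
Commit e39de94316 in the log above describes deriving the optional UDA symbols from the required UpdateFn symbol. The following is a minimal Python sketch of that naming rule only, not Impala's analyzer code. The helper name `derive_uda_symbols`, the `overrides` parameter, and the rule of swapping the last "Update" in the symbol are assumptions made for illustration. The commit message also says the analyzer fails when a derived symbol is missing from the binary, which this sketch does not check.

```python
# Hypothetical illustration of the naming rule in commit e39de94316.
# Not taken from the Impala source; names and behavior are assumptions.

# Optional roles named in the commit message's grammar.
UDA_ROLES = ("Init", "Serialize", "Merge", "Finalize")


def derive_uda_symbols(update_fn, overrides=None):
    """Return a map from argument name (e.g. 'MergeFn') to symbol name.

    Unspecified symbols are derived by replacing the last occurrence of
    'Update' in the UpdateFn symbol with the role name. Entries passed in
    `overrides` (e.g. {'MergeFn': 'MyMerge'}) are used as given.
    """
    if "Update" not in update_fn:
        raise ValueError(
            "cannot derive symbols: %r does not contain 'Update'" % update_fn
        )
    head, _, tail = update_fn.rpartition("Update")
    symbols = {"UpdateFn": update_fn}
    for role in UDA_ROLES:
        symbols[role + "Fn"] = head + role + tail
    # Explicitly named arguments win over derived ones.
    symbols.update(overrides or {})
    return symbols


if __name__ == "__main__":
    for name, symbol in sorted(derive_uda_symbols("CountUpdateFn").items()):
        print(name, "=", symbol)
```

With `UpdateFn='CountUpdateFn'` this yields the names listed in the commit message: 'CountInitFn', 'CountSerializeFn', 'CountMergeFn' and 'CountFinalizeFn'.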