26 Commits

Author SHA1 Message Date
Skye Wanderman-Milne
6ac9a8104b IMPALA-1009: UDF/UDA leaks should not fail queries
With this change, leaky UDFs built with the SDK will still fail when
using the test harness, but leaky UDFs running in Impala will only
trigger a warning. This change also updates the test infrastructure to
always check for non-fatal errors/warnings.
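
The split between the two environments can be sketched roughly as follows. This is a minimal Python model, not Impala's C++ code: the class name `FunctionContextModel`, the `fail_on_leak` flag, and the message format are all assumptions made for illustration.

```python
class FunctionContextModel:
    """Hypothetical model of per-UDF allocation tracking (not Impala's API)."""

    def __init__(self, fail_on_leak):
        # fail_on_leak=True models the SDK test harness,
        # fail_on_leak=False models a UDF running inside Impala.
        self._fail_on_leak = fail_on_leak
        self._outstanding = {}  # handle -> size in bytes
        self._next_handle = 0
        self.warnings = []

    def allocate(self, nbytes):
        handle = self._next_handle
        self._next_handle += 1
        self._outstanding[handle] = nbytes
        return handle

    def free(self, handle):
        del self._outstanding[handle]

    def close(self):
        """Report any allocation that was never freed."""
        if not self._outstanding:
            return
        leaked = sum(self._outstanding.values())
        count = len(self._outstanding)
        msg = f"UDF leaked {leaked} byte(s) in {count} allocation(s)"
        if self._fail_on_leak:
            raise RuntimeError(msg)    # test harness: the leak is an error
        self.warnings.append(msg)      # runtime: the query still succeeds
```

In this model a leaky UDF raises in the harness configuration, while the runtime configuration only records a warning that the caller can surface with the query results.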

Change-Id: I5615349b9d691e4eddea3e03e152ef12e73835e7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2844
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 60ce5190d96add6104aba642d2354d87a26000fa)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2938
2014-06-10 21:46:47 -07:00
Victor Bittorf
09aff77a6c IMPALA-943: removed database udf_test from front-end tests
Added CATCH section to test files.

Change-Id: I28ba3a6e5ae4c53df5b86505573793d7b150863b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2782
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5b616715958f3ebfdc45b8dc0e4baa82bd55f1d2)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2912
2014-06-09 19:06:15 -07:00
Skye Wanderman-Milne
c8b2017093 Add decimal UDF/UDA support.
Change-Id: Ie48c1cb8e978c7282593b7f602dd68added6d3fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2625
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5048f04b332c13b1bff32fb257272b0fea4b8584)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2739
2014-05-29 20:49:53 -07:00
Skye Wanderman-Milne
bd2fc2d1d4 IMPALA-934: Refresh cached UDF library when creating a new function
This change adds the ability to refresh a local cache entry, causing
the old cache entry to be dropped and the library to be reloaded from
HDFS. This is used in ResolveSymbolLookup(), which is called by the
frontend when creating a new function, and in ImpalaServer when
receiving a "create function" heartbeat. This change also makes sure
the FE calls into the backend for jars, so jars get refreshed as well.
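
As a rough illustration of the refresh behavior described above, here is a small Python sketch. The names `LocalLibraryCache`, `fetch`, `get`, and `refresh` are hypothetical and stand in for the C++ library cache; the real code copies files from HDFS rather than calling a Python callable.

```python
class LocalLibraryCache:
    """Hypothetical cache of libraries keyed by their HDFS path."""

    def __init__(self, fetch):
        # fetch(hdfs_path) is an assumed callable that loads the library
        # from HDFS and returns some handle to the local copy.
        self._fetch = fetch
        self._entries = {}

    def get(self, hdfs_path):
        # Normal lookup: reuse the cached entry if there is one.
        if hdfs_path not in self._entries:
            self._entries[hdfs_path] = self._fetch(hdfs_path)
        return self._entries[hdfs_path]

    def refresh(self, hdfs_path):
        # Drop the old entry (if any) and reload it, so a newly created
        # function picks up the current contents of the library.
        self._entries.pop(hdfs_path, None)
        return self.get(hdfs_path)
```

Under this sketch, a "create function" path would call `refresh()` while ordinary query execution would keep calling `get()`.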

Change-Id: I5fd61c1bc2e04838449335d5a68b61af8b101b01
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2286
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e8587794b3b82438190c91b2ebe9d1e12db73981)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2348
2014-04-24 19:39:16 -07:00
Lenni Kuff
bb09b5270f IMPALA-839: Update tests to be more thorough when run exhaustively
Some tests had constraints that existed only to reduce runtime, which
reduced coverage when running in exhaustive mode. Most constraints, however,
exist because running the test across additional dimensions adds no value (or
is invalid for those dimensions). This change updates the tests that have
legitimate constraints to use two new helper methods for constraining the table format
dimension:
create_uncompressed_text_dimension()
create_parquet_dimension()

These will create a dimension that will produce a single test vector, either
uncompressed text or parquet respectively.

Change-Id: Id85387c1efd5d192f8059ef89934933389bfe247
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2149
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e02acbd469bc48c684b2089405b4a20552802481)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2290
2014-04-18 20:11:31 -07:00
Alex Behm
2fff51d9e9 IMP-1329,IMPALA-924: Make ExchangeNode::Open() block until rows are available.
The bug: Coordinator::Wait() is supposed to block until rows become available for
consumption by the client. We rely on Wait() to determine when to advance the query
status to a 'ready' state and signal to the client that rows can be fetched.
Long fetch times can trigger client timeouts at various levels (socket, app, etc.).
Coordinator::Wait() simply opens the coordinator fragment's plan tree.
For most plan nodes, Open() does work to prepare the plan tree so that GetNext()
returns quickly. However, for ExchangeNodes, Open() previously did not wait
until rows were obtained from the underlying stream receiver.
The fix: Make ExchangeNode::Open() block until rows are available.
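
The idea behind the fix can be sketched with a blocking queue. This is a simplified Python model under stated assumptions, not the C++ implementation: `StreamReceiver`, `ExchangeNodeModel`, and `deliver_later` are hypothetical names, and end-of-stream and error handling are omitted.

```python
import queue
import threading


class StreamReceiver:
    """Hypothetical receiver that hands row batches to an exchange node."""

    def __init__(self):
        self._batches = queue.Queue()

    def add_batch(self, rows):
        self._batches.put(rows)

    def get_batch(self):
        return self._batches.get()  # blocks until a batch is available


class ExchangeNodeModel:
    def __init__(self, receiver):
        self._receiver = receiver
        self._first_batch = None

    def open(self):
        # Block here, so the caller only reports 'ready' once rows exist.
        self._first_batch = self._receiver.get_batch()

    def get_next(self):
        # The batch fetched in open() is returned first, without waiting.
        if self._first_batch is not None:
            batch, self._first_batch = self._first_batch, None
            return batch
        return self._receiver.get_batch()


def deliver_later(receiver, rows, delay_s):
    """Simulate a remote sender whose rows arrive after delay_s seconds."""
    threading.Timer(delay_s, receiver.add_batch, args=[rows]).start()
```

With this shape, the waiting happens inside `open()`, and the first `get_next()` call returns immediately.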

Change-Id: I7b197eea11d21fd732414d96c899a17b2d99631c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2128
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2185
2014-04-10 23:49:38 -07:00
Skye Wanderman-Milne
8e9776b824 Mark TestUdfs.test_mem_limits to run serially
This was causing other tests to fail with process mem limit exceeded.

Change-Id: I1407b0896052aece691c681827994961b09d8103
(cherry picked from commit 2bcc46117f504f50ded724fddf74f24bd829c6c6)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2003
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-03-19 14:18:11 -07:00
Skye Wanderman-Milne
3e728f3180 Symbol mangling for UDF prepare/close functions
Change-Id: If8f1386073f467e66ada74e606fc98f3344f0733
(cherry picked from commit 32df8b3f963a2b46ec33aad86a151d4c7ecda39c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1993
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-03-19 02:15:07 -07:00
Skye Wanderman-Milne
44125729dc UDF/UDA memory management improvements
* AggFnEvaluator now uses the UDF mem pool (I'm planning to change
  this to per-exec node pools in the expr refactoring)
* FunctionContext::TrackAllocation()/Free() actually use the UDF's mem tracker
* Added FunctionContextImpl::Close() which sets warnings for leaked allocations

Change-Id: I792ffd49102a92b57e34df18d8ff5f5d0fd27370
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1792
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
(cherry picked from commit 41a5f7cfa718789fa3b2de3a31f085411fb5000c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1954
Tested-by: jenkins
2014-03-17 20:38:25 -07:00
Lenni Kuff
23c619f794 Limit test_udfs to always run with a single exec_option test vector
Change-Id: If3ff1f5f17a95cce88282f9dc165fe5ce85200b9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1781
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1811
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-03-07 18:44:11 -08:00
Skye Wanderman-Milne
6ceed1e632 UDF API additions
This patch introduces the ability to specify a prepare and close
function for a UDF, as well as FunctionContext methods for maintaining
state across UDF invocations within a query. Many of the changes are
related to adding an Expr::Open() function which calls the UDF's
prepare function, if specified (it has to be called in Open() since
the LLVM module must be compiled first).

Change-Id: I581d90d03dff71f7ff5d4a6bef839ba6bc46b443
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1693
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8e2ed7fb9051d98f89327715fdebd6f5ed22d6ee)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1757
2014-03-05 07:32:34 -08:00
Skye Wanderman-Milne
203fc66456 Add GetTypeDesc() method to FunctionContext.
This is currently only implemented for NativeUdfExpr.

Change-Id: I81b442c5668dff43d0486d1cfc445bca2af66606
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1664
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e1087c3a78e6e12938b583c302907bd32c59f524)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1720
2014-03-01 20:24:30 -08:00
Nong Li
904ae86e82 IMPALA-626: Allow dropping functions while they are running.
Change-Id: Ia9d6fa1daadddbd05961696d13b9ff43fef2da61
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1621
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-02-20 13:12:10 -08:00
Lenni Kuff
5f027f61c5 IMPALA-800 / IMPALA-795: Check catalog version before removing entries from the lib cache
There was an issue with the lib cache cleanup code where if a function were dropped
then re-created we might incorrectly remove the new function's library from the cache.

Consider these statements executed in quick succession:
1) create function fn()
2) drop function fn()
3) create function fn()
4) select fn() ...

Since we perform direct-DDL and immediately apply the result of a DDL operation to the
local impalad catalog, steps 1-4 may complete before a statestore catalog update with the
drop from step 2) is received. When the statestore heartbeat with the drop is received, we
incorrectly removed the new function's lib cache entry while the select statement was
executing, causing the crash.

The fix for this problem is to verify the catalog versions to ensure we only drop items
that have a catalog version <= the catalog version the drop corresponds to.
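
A minimal Python sketch of this version check follows. The names `VersionedLibCache`, `add`, and `remove_if_not_newer` are hypothetical; the real lib cache is C++ and tracks more state than a path and a version.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    lib_path: str
    catalog_version: int


class VersionedLibCache:
    """Hypothetical lib cache that records the catalog version per entry."""

    def __init__(self):
        self._entries = {}

    def add(self, fn_name, lib_path, catalog_version):
        self._entries[fn_name] = CacheEntry(lib_path, catalog_version)

    def contains(self, fn_name):
        return fn_name in self._entries

    def remove_if_not_newer(self, fn_name, drop_version):
        """Apply a drop only to entries with version <= drop_version."""
        entry = self._entries.get(fn_name)
        if entry is None or entry.catalog_version > drop_version:
            # Either nothing is cached, or the function was re-created
            # after this drop: the stale update must not remove it.
            return False
        del self._entries[fn_name]
        return True
```

In the four-statement scenario above, the entry created in step 3 carries a newer version than the drop from step 2, so the late drop is ignored.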

Change-Id: I7dd1886bf24740cb41f1315ecbb540e38d9ad363
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1552
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1576
2014-02-17 17:56:49 -08:00
Skye Wanderman-Milne
3598395290 Set sync_ddl=true for tests that drop functions.
This is a temporary "fix" for IMPALA-795 to unblock the build. The
actual fix should prevent a dropped and re-created function from being
re-dropped by an old catalog update.

Change-Id: Id9dc36a8ecd5e7d1a1146ad0ac092ae12cb33529
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1547
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 80439d638a4ac02cedfe1490556b176cd818429f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1559
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-02-14 10:44:54 -08:00
Nong Li
3722711a06 IMPALA-800 workaround. Mark test_libs_with_same_filename as serial.
This test will drop functions in a binary used by the other UDF
tests. That triggers IMPALA-800.

Change-Id: I8e6f1ad5b4a7ece2d891559751142f0c12e07c3c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1556
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
(cherry picked from commit 95100e0bdfd9472183fcc7cd8636666d5b654a37)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1558
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-02-14 00:22:08 -08:00
Nong Li
80d4fd958e IMPALA-786: Drop function should clear library cache.
We were previously only clearing the cache in the catalog service
update loop so the impalad the drop was issued to was not doing the
right thing.

Change-Id: I6bee228e8c0d565cea4ea61cbf64240d83a45a7d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1511
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-02-10 18:51:39 -08:00
Skye Wanderman-Milne
b54d16dabd IMPALA-679: Append hash of HDFS path to filename in CopyHdfsFile() to avoid collisions.
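
One way to picture the collision fix is the Python sketch below. The helper name `local_copy_name`, the choice of SHA-1, the 12-character digest prefix, and the placement of the hash before the extension are all assumptions for illustration; the commit does not say which hash or filename layout CopyHdfsFile() uses.

```python
import hashlib
import os
import posixpath


def local_copy_name(hdfs_path, local_dir):
    """Build a local filename that is unique per HDFS path.

    Two libraries with the same basename in different HDFS directories
    get different local names because the full path is hashed.
    """
    base = posixpath.basename(hdfs_path)  # HDFS paths use '/' separators
    stem, ext = os.path.splitext(base)
    # Assumed scheme: short hex digest of the full HDFS path.
    digest = hashlib.sha1(hdfs_path.encode("utf-8")).hexdigest()[:12]
    return os.path.join(local_dir, f"{stem}.{digest}{ext}")
```

The same HDFS path always maps to the same local name, so repeated copies of one library still land on one file.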
Change-Id: Ia84fa81fe043a9604248d66ed963ef3f91b0601e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1018
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:22 -08:00
Lenni Kuff
39f77b8b8f Add support for cluster-synchronized catalog operations
This change adds support for cluster-synchronized catalog operations. This provides the
guarantee that after a catalog op completes, all other subscribers to the catalog topic have
also processed that update. This is useful when load balancing, because a common workflow
is to target a different impalad for each statement executed.
For example if each of the following were executed sequentially, but targeting
a different node:
1) CREATE TABLE Foo
2) INSERT INTO Foo
3) SELECT * FROM Foo
4) INSERT INTO Foo ....

Since both the INSERT and the CREATE update the catalog, it would not work as expected
without this patch. The user might either get a "table not found" error or would be
missing partition information from the INSERT.

The downside is that this approach to DDL takes a bit longer because we need to wait
until all subscribers have processed an update. If all nodes are healthy, DDL in this mode
should not take significantly longer than it does today. However, a single bad node
might slow down or completely block the completion of all DDL operations. By default
this feature is disabled, but it can be enabled using a new query option: SYNCED_DDL=1
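
The completion condition for a synchronized operation can be sketched as a check over the versions each subscriber has processed. This is a simplified Python model; the function names and the dictionary of per-subscriber versions are hypothetical and do not reflect how the statestore actually reports progress.

```python
def lagging_subscribers(op_version, subscriber_versions):
    """Return the subscribers that have not yet processed op_version.

    subscriber_versions maps a subscriber id to the latest catalog
    version that subscriber has applied (an assumed bookkeeping shape).
    """
    return sorted(
        name for name, version in subscriber_versions.items()
        if version < op_version
    )


def update_visible_everywhere(op_version, subscriber_versions):
    # A synchronized DDL may return to the client only when no
    # subscriber is still behind the version the operation produced.
    return not lagging_subscribers(op_version, subscriber_versions)
```

This also shows the downside noted above: one subscriber that never catches up keeps the check false indefinitely.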

To test this, the base test suite was updated to support selecting a random impalad
to execute each query section in a query test file. This is currently only enabled
for the insert and DDL tests, but could be leveraged by more tests in the future.

TODO: Add additional failure tests around this functionality.
TODO: Add an explicit "sync" statement so users do not need to run all their DDL
in this mode (since it is slower).

Change-Id: I45e757a931bf2a4740cc0cdd1e76ce49a1e22b83
Reviewed-on: http://gerrit.ent.cloudera.com:8080/899
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:58 -08:00
Skye Wanderman-Milne
9d05d6d03a Allow UDF tests to run in parallel.
Change-Id: I9512d4a6920c4a71383d9374eb5feb303c3db85d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/727
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:47 -08:00
Nong Li
4800995d44 Add execution for Hive UDFs.
Change-Id: I6a5ad96fed77e2b8a2701f21a917a8eb7a11d500
Reviewed-on: http://gerrit.ent.cloudera.com:8080/458
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:25 -08:00
Nong Li
904289d168 Add UDA execution.
Change-Id: Ie5aab79742675fc62ed731c13abe83304df80991
Reviewed-on: http://gerrit.ent.cloudera.com:8080/642
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:24 -08:00
Nong Li
6b9a7de02e Add symbol resolution during analysis for create function stmts.
Before this, we had to specify the entire mangled symbol. This can be quite
long and quite tedious (take a look at some of the create UDA test cases that
specify all the symbols).

This patch adds some code to convert from the user function signature to the
mangled name. This means the user can specify the unmangled name and we can
do the symbol lookup. The mangling rules are fairly convoluted, but if the generated
name is wrong, the user can always specify the full symbol.

Some other minor cleanup in:
  - JNI from FE to BE
  - UDFs/UDAs that are loaded as test data

Change-Id: I733dbf3a72cb7b06221c27e622d161bcca0d74a8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/624
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:20 -08:00
Skye Wanderman-Milne
b7f83bcd73 Add support for LLVM IR UDFs.
This patch also adds a number of improvements to NativeUdfExpr. Highlights include:

* Correctly handling the lowering of AnyVal struct types (required for ABI compatibility)
* A rudimentary library cache for reusing handles produced by dlopen
* More complicated test cases

Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195
Reviewed-on: http://gerrit.ent.cloudera.com:8080/540
Tested-by: jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:03 -08:00
Skye Wanderman-Milne
cf7ed25377 Fix UDF test, take two
Change-Id: I817389d94dab665199d2c1b7365e8ce0d1495c41
Reviewed-on: http://gerrit.ent.cloudera.com:8080/504
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:52:53 -08:00
Skye Wanderman-Milne
fd99db0300 First pass at UdfExpr.
Change-Id: I517bf56541749b5c2459554821c7bf838239fdf0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/439
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:52:50 -08:00