Commit Graph

9 Commits

Author SHA1 Message Date
Lenni Kuff
51f003a785 IMP-1156: Add CatalogServer API for listing all UDFs and UDAs in a database
Adds a new client API for retrieving all user defined functions (aggregate and scalar)
in a database. This is a requirement from CM Backup Disaster and Recovery.

Change-Id: I4e33d714795fe808370262f36218ea112f67ec30
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1271
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-01-14 00:01:25 -08:00
Henry Robinson
f241782966 IMPALA-620: Fix re-registration starvation bug in statestore
This patch fixes a slightly pathological state that occurs when the
statestore is under heavy load. The result of the bug is that
subscribers cannot successfully re-register because the statestore never
marks them as failed.

The exact sequence of events is as follows:

1. Subscriber registers with state-store.
2. Statestore does not send heartbeats in timely fashion to
   subscriber. Subscriber times-out.
3. Subscriber is restarted quickly. Statestore does not detect
   restart.
4. Subscriber's RegisterSubscriber() call fails, because statestore
   detects duplicate registration.
5. Subscriber restarts again. Since state-store is slow to send
   heartbeats, the state-store has not detected the restart and the
   subscriber receives a heartbeat message from the statestore and
   does not reject it.
6. Statestore continues to believe subscriber is alive, since the
   heartbeats are not being rejected.

To fix this, we add a registration ID to each successfully registered
subscriber that is known to both subscriber and statestore. If the
subscriber should restart and re-register, it receives a new
registration ID. Whenever a heartbeat arrives, it compares its
registration ID to that sent by the statestore with the heartbeat, and
rejects the heartbeat if they do not match.

We also allow re-registration of existing subscribers (getting rid of
the dreaded "Duplicate subscription" message). A new registration
overwrites an old one.

Change-Id: Ie32df3a586ccb375375ebfbcbec1aaeb930b6bfe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/778
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-01-08 10:53:53 -08:00
Lenni Kuff
13605ad834 Support catalogd in ImpalaCluster test library
Adds basic support for catalogd to our ImpalaCluster test library/object model.
This will allow us to write more programatic tests targeting the catalogd process
including process failure tests and metric check validators.

Change-Id: I8e5f7bc73f999f105437c6d3d52c6d436a354d2d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/617
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:16 -08:00
Alex Behm
6a1cc58936 IMPALA-491: Improve error message when queries are cancelled due to BE nodes dying.
Change-Id: If9a47d9021b08385743093fbe8054b48119eaff9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/523
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:58 -08:00
ishaan
21aa5b547a Catch unknown process id exception in impala_cluster to fix intermittent process test failures. 2014-01-08 10:51:56 -08:00
ishaan
e7c6d57f9c IMP-773: Add better logging/error detection to start-impala-cluster.py 2014-01-08 10:51:25 -08:00
Lenni Kuff
9a4feb7391 Add impala local failure test framework and tests 2014-01-08 10:50:18 -08:00
Alan Choi
251a8a2bf1 IMP-57: rename fe_port to beeswax_port 2014-01-08 10:47:53 -08:00
Lenni Kuff
b3fce13b1d Initial Impala failure testing library + modularize run-workload
This adds initial changes for the Impala failure testing library. It also refactors
run workload into its own module to it can be used in other tests.

The failure testing has two main components - the first is an object model on top on top
of Impala services in a cluster. This allows for enumerating the serivces in the cluster
and executing commands on remote machines. This initial cut is built on top of the
CM service to help with starting/stopping services. The long term goal is to let this run
on both a CM cluster and non-CM cluster as well as locally.

The other part of the failure injection change is failure_inctor module that uses the
Impala service abstraction to select and inject failures into random impala services.

This failure testing framework hasn't been completely validated because the product code
is not yet ready, but it is important to get this checked in so all new changes to
run-workload are based off this refactor.

Change-Id: I73bf44f0ac881ec17bea7cb05d850b45e2ea5be5
2014-01-08 10:46:16 -08:00