Commit Graph

13 Commits

Author SHA1 Message Date
Nong Li
a25400c94e Increase timeout in test_rows_availability to make sure query state is what we expect.
Change-Id: Id4feebcc7b7cecb07555009219e6420e48a0c82b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3534
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3579
2014-07-22 12:12:13 -07:00
Lenni Kuff
6afea60704 Update test logging to print executable SQL statements and log all actions executed
This is the first step in cleaning up the test logging. It provides a common connection
interface that provides tracing around all operations. When a test fails the output will
be executable SQL. It also logs actions such as when a connection is opened, close, or
when an operation is cancelled. Currently only beeswax connections are supported, but
I have a seperate patch that adds support for executing using HS2 as well as Beeswax.

Example of new logging:
-- connecting to: localhost:21000
-- executing against localhost:21000
use functional;

SET disable_codegen=False;
SET abort_on_error=1;
SET batch_size=0;
SET num_nodes=0;

-- executing against localhost:21000
select a.timestamp_col from alltypessmall a inner join alltypessmall b on
(a.timestamp_col = b.timestamp_col)
where a.year=2009 and a.month=1 and b.year=2009 and b.month=1;
-- closing connection to: localhost:21000

Change-Id: Iedc7d4d3a84bfeff6cc1daae6ed1ca97613d7700
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1133
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:54:40 -08:00
Henry Robinson
9bc840dc85 Support for custom cluster configurations in some tests
Test suites that derive from common.CustomClusterTestSuite have a brand
new cluster for every tests case, which they can configure as they wish
with custom arguments using the @with_args() decorator.

A future improvement is to optionally only have one cluster per test
suite, to allow multiple tests to run more quickly if they share
configuration options.

Change-Id: I6abd5740e644996d7ca2800edf4ff11b839d1bc4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/882
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:57 -08:00
Henry Robinson
f241782966 IMPALA-620: Fix re-registration starvation bug in statestore
This patch fixes a slightly pathological state that occurs when the
statestore is under heavy load. The result of the bug is that
subscribers cannot successfully re-register because the statestore never
marks them as failed.

The exact sequence of events is as follows:

1. Subscriber registers with state-store.
2. Statestore does not send heartbeats in timely fashion to
   subscriber. Subscriber times-out.
3. Subscriber is restarted quickly. Statestore does not detect
   restart.
4. Subscriber's RegisterSubscriber() call fails, because statestore
   detects duplicate registration.
5. Subscriber restarts again. Since state-store is slow to send
   heartbeats, the state-store has not detected the restart and the
   subscriber receives a heartbeat message from the statestore and
   does not reject it.
6. Statestore continues to believe subscriber is alive, since the
   heartbeats are not being rejected.

To fix this, we add a registration ID to each successfully registered
subscriber that is known to both subscriber and statestore. If the
subscriber should restart and re-register, it receives a new
registration ID. Whenever a heartbeat arrives, it compares its
registration ID to that sent by the statestore with the heartbeat, and
rejects the heartbeat if they do not match.

We also allow re-registration of existing subscribers (getting rid of
the dreaded "Duplicate subscription" message). A new registration
overwrites an old one.

Change-Id: Ie32df3a586ccb375375ebfbcbec1aaeb930b6bfe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/778
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-01-08 10:53:53 -08:00
Lenni Kuff
2336ed99a4 Re-enable process failure tests + add simple failure tests for catalogd
This brings back online the process failure tests and adds a basic failure
test for the catalog service. The timeouts had to be adjusted to account for the
extra time it takes to load the the catalog and also there is an additional state
store subscriber. Note: the statestore 'live.backends' metric which is used in these
tests needs to be renamed, it really means 'live.subscribers'. However, it requires some
coordination with other teams to make the change.
Also updated start-impala-cluster to check the catalog.ready flag to ensure the impalad
catalog is ready to accept queries.

Change-Id: If22e25dba7dc83aa40bec937b5f82b815bed4645
Reviewed-on: http://gerrit.ent.cloudera.com:8080/730
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:52 -08:00
Lenni Kuff
a2cbd2820e Add Catalog Service and support for automatic metadata refresh
The Impala CatalogService manages the caching and dissemination of cluster-wide metadata.
The CatalogService combines the metadata from the Hive Metastore, the NameNode,
and potentially additional sources in the future. The CatalogService uses the
StateStore to broadcast metadata updates across the cluster.
The CatalogService also directly handles executing metadata updates request from
impalad servers (DDL requests). It exposes a Thrift interface to allow impalads to
directly connect execute their DDL operations.
The CatalogService has two main components - a C++ server that implements StateStore
integration, Thrift service implementiation, and exporting of the debug webpage/metrics.
The other main component is the Java Catalog that manages caching and updating of of all
the metadata. For each StateStore heartbeat, a delta of all metadata updates is broadcast
to the rest of the cluster.

Some Notes On the Changes
---
* The metadata is all sent as thrift structs. To do this all catalog objects (Tables/Views,
Databases, UDFs) have thrift struct to represent them. These are sent with each statestore
delta update.
* The existing Catalog class has been seperated into two seperate sub-classes. An
ImpladCatalog and a CatalogServiceCatalog. See the comments on those classes for more
details.

What is working:
* New CatalogService created
* Working with statestore delta updates and latest UDF changes
* DDL performed on Node 1 is now visible on all other nodes without a "refresh".
* Each DDL operation against the Catalog Service will return the catalog version that
  contains the change. An impalad will wait for the statestore heartbeat that contains this
  version before returning from the DDL comment.
* All table types (Hbase, Hdfs, Views) getting their metadata propagated properly
* Block location information included in CS updates and used by Impalads
* Column and table stats included in CS updates and used by Impalads
* Query tests are all passing

Still TODO:
* Directly return catalog object metadata from DDL requests
* Poll the Hive Metastore to detect new/dropped/modified tables
* Reorganize the FE code for the Catalog Service. I don't think we want everything in the
  same JAR.

Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/601
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:11 -08:00
Alex Behm
e4a24c8c1d Fixed the process failure test that was failing due to a race in
reading/writing a query's profile web page.

Change-Id: Ibf4a27aa17eb6439630d1616c2c719fc1ee2ba4e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/553
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:53:03 -08:00
Alex Behm
6a1cc58936 IMPALA-491: Improve error message when queries are cancelled due to BE nodes dying.
Change-Id: If9a47d9021b08385743093fbe8054b48119eaff9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/523
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:58 -08:00
Lenni Kuff
8095a963d0 Enable statestore to send delta updates
This change adds support for the statestore to send delta updates. It also updates
the scheduler to properly handle receiving delta and non-delta updates.

With this change, the statestore will keeps a versioned log of the history of updates to
each topic, with each topic update getting a sequentially increasing version number that
is unique across the topic.
Subscribers also track the max version of each topic which they have have successfully
processed. The statestore can use this information to send a delta of updates to a
subscriber, rather than all items in the topic. For non-delta updates, the statestore
will send an update that includes all values in the topic.

Additionally, if a client has received an unexpected delta update version range, they
can request a new delta update by setting the "from_version" field of the TTopicDelta
heartbeat response. The next state-store update will be based off of this version the
client responded with.

Still left to do - Garbage collect un-needed deletions (those which have a version
number smaller than min(subscriber.version_number))

Change-Id: I53be0a647d7f7f3f8aea0d13cd9dc219411a1e5a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/217
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:45 -08:00
Lenni Kuff
fa3c927b25 Fix test to wait for impalad num_live_backends=CLUSTER_SIZE after statestore restart 2014-01-08 10:51:30 -08:00
Lenni Kuff
9a4feb7391 Add impala local failure test framework and tests 2014-01-08 10:50:18 -08:00
ishaan
15658f384b Include targeted performance tests in experiments and add a new query 2014-01-08 10:49:02 -08:00
Lenni Kuff
4bd6279646 Add targeted failure injection test suite using debug failpoints
Change-Id: I013e913ad3c89f44524bf19638a1da2b83df7463
2014-01-08 10:47:54 -08:00