impala/tests/statestore/test_statestore.py
Henry Robinson f241782966 IMPALA-620: Fix re-registration starvation bug in statestore
This patch fixes a slightly pathological state that occurs when the
statestore is under heavy load. The result of the bug is that
subscribers cannot successfully re-register because the statestore never
marks them as failed.

The exact sequence of events is as follows:

1. Subscriber registers with statestore.
2. Statestore does not send heartbeats in a timely fashion to
   subscriber. Subscriber times out.
3. Subscriber is restarted quickly. Statestore does not detect
   restart.
4. Subscriber's RegisterSubscriber() call fails, because statestore
   detects duplicate registration.
5. Subscriber restarts again. Since the statestore is slow to send
   heartbeats, it has not detected the restart, so the subscriber
   receives a heartbeat message from the statestore and does not
   reject it.
6. Statestore continues to believe subscriber is alive, since the
   heartbeats are not being rejected.

To fix this, we add a registration ID to each successfully registered
subscriber that is known to both subscriber and statestore. If the
subscriber should restart and re-register, it receives a new
registration ID. Whenever a heartbeat arrives, the subscriber compares
its registration ID to the one sent by the statestore with the
heartbeat, and rejects the heartbeat if they do not match.
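
The subscriber-side check described above can be sketched in a few lines
of Python. This is an illustration only, not the Impala implementation:
the Subscriber class, its method names, and the use of uuid4 values as
registration IDs are all assumptions made for the example.

```python
import uuid


class Subscriber(object):
    """Hypothetical subscriber that remembers the ID of its latest registration."""

    def __init__(self, subscriber_id):
        self.subscriber_id = subscriber_id
        self.registration_id = None  # unknown until registration succeeds

    def on_registered(self, registration_id):
        # Assumption: the statestore returns this ID from a successful registration.
        self.registration_id = registration_id

    def handle_heartbeat(self, registration_id):
        """Return True to accept the heartbeat, False to reject it."""
        return registration_id == self.registration_id


old_id = uuid.uuid4()
new_id = uuid.uuid4()

# First incarnation of the subscriber accepts heartbeats for its registration.
sub = Subscriber("subscriber-1")
sub.on_registered(old_id)
accepted_before_restart = sub.handle_heartbeat(old_id)

# After a restart the new process re-registers and holds a different ID, so a
# heartbeat that still carries the old ID is rejected.
sub = Subscriber("subscriber-1")
sub.on_registered(new_id)
stale_accepted = sub.handle_heartbeat(old_id)
fresh_accepted = sub.handle_heartbeat(new_id)
```

Because a stale heartbeat is now rejected, the statestore gets the
failure signal that was missing in step 6 of the sequence above.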

We also allow re-registration of existing subscribers (getting rid of
the dreaded "Duplicate subscription" message). A new registration
overwrites an old one.
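
On the statestore side, overwriting on re-registration might look like
the following Python sketch. StatestoreRegistry and its methods are
hypothetical names chosen for illustration, and the dict-backed storage
and uuid4 IDs are assumptions rather than details taken from the patch.

```python
import uuid


class StatestoreRegistry(object):
    """Hypothetical registry mapping subscriber IDs to their current registration ID."""

    def __init__(self):
        self._registrations = {}

    def register_subscriber(self, subscriber_id):
        # A repeated registration is no longer an error: the new registration
        # ID simply replaces whatever was stored for this subscriber.
        registration_id = uuid.uuid4()
        self._registrations[subscriber_id] = registration_id
        return registration_id

    def registration_id_for(self, subscriber_id):
        return self._registrations.get(subscriber_id)


registry = StatestoreRegistry()
first = registry.register_subscriber("subscriber-1")
second = registry.register_subscriber("subscriber-1")  # same ID, e.g. after a restart
current = registry.registration_id_for("subscriber-1")
```

Only the most recent registration ID stays valid, so heartbeats tagged
with an earlier ID no longer match what the restarted subscriber holds.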

Change-Id: Ie32df3a586ccb375375ebfbcbec1aaeb930b6bfe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/778
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-01-08 10:53:53 -08:00

#!/usr/bin/env python
# Copyright (c) 2012 Cloudera, Inc. All rights reserved.
#
import pytest
import os
from tests.common.impala_test_suite import ImpalaTestSuite
from tests.common.impala_cluster import Process

class SimpleSubscriberProcess(Process):
  """Runs a subscriber binary that registers with the statestore and immediately exits,
  indicating its success in the exit code"""
  def __init__(self):
    binary_path = os.path.join(
      os.environ['IMPALA_HOME'], "be/build/debug/statestore/statestore-test-client")
    Process.__init__(self, [binary_path])

class TestStatestore(ImpalaTestSuite):
  def test_subscriber_restart(self):
    """Start several clients with the same subscriber ID to confirm that re-registration
    after a process restart works correctly (see IMPALA-620)"""
    s = SimpleSubscriberProcess()
    # Each iteration restarts the same subscriber; every run must re-register cleanly.
    for i in range(5):
      s.start()
      rc, _, _ = s.wait()
      assert rc == 0