mirror of
https://github.com/apache/impala.git
synced 2025-12-31 06:02:51 -05:00
This patch fixes a slightly pathological state that occurs when the statestore is under heavy load. The result of the bug is that subscribers cannot successfully re-register because the statestore never marks them as failed. The exact sequence of events is as follows:

1. Subscriber registers with the statestore.
2. Statestore does not send heartbeats to the subscriber in a timely fashion. Subscriber times out.
3. Subscriber is restarted quickly. Statestore does not detect the restart.
4. Subscriber's RegisterSubscriber() call fails, because the statestore detects a duplicate registration.
5. Subscriber restarts again. Since the statestore is slow to send heartbeats, it has not detected the restart; the subscriber receives a heartbeat message from the statestore and does not reject it.
6. Statestore continues to believe the subscriber is alive, since the heartbeats are not being rejected.

To fix this, we add a registration ID to each successfully registered subscriber that is known to both subscriber and statestore. If the subscriber should restart and re-register, it receives a new registration ID. Whenever a heartbeat arrives, the subscriber compares its registration ID to the one sent by the statestore with the heartbeat, and rejects the heartbeat if they do not match.

We also allow re-registration of existing subscribers (getting rid of the dreaded "Duplicate subscription" message). A new registration overwrites an old one.

Change-Id: Ie32df3a586ccb375375ebfbcbec1aaeb930b6bfe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/778
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
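The registration ID check described in the commit message can be sketched as a toy model. This is an illustration only, not the statestore implementation: the class names, method names, the `impalad-1` identifier, and the use of a random UUID as the registration ID are all assumptions made for the example.

```python
import uuid


class Statestore(object):
  """Hypothetical stand-in for the statestore's registration table."""

  def __init__(self):
    # Maps subscriber ID -> registration ID of its latest registration.
    self.registrations = {}

  def register_subscriber(self, subscriber_id):
    # A new registration overwrites an old one instead of being rejected
    # as a duplicate. The UUID is an assumed ID format.
    registration_id = uuid.uuid4().hex
    self.registrations[subscriber_id] = registration_id
    return registration_id


class Subscriber(object):
  """Hypothetical subscriber that remembers the ID it was registered with."""

  def __init__(self, subscriber_id, statestore):
    self.subscriber_id = subscriber_id
    self.registration_id = statestore.register_subscriber(subscriber_id)

  def handle_heartbeat(self, registration_id):
    # Accept a heartbeat only if it carries the current registration ID.
    return registration_id == self.registration_id


statestore = Statestore()
first = Subscriber('impalad-1', statestore)
stale_id = first.registration_id

# Simulate a quick restart: the new process registers again and gets a new ID.
second = Subscriber('impalad-1', statestore)

# A heartbeat addressed to the old registration is rejected...
stale_accepted = second.handle_heartbeat(stale_id)
# ...while one carrying the current registration ID is accepted.
fresh_accepted = second.handle_heartbeat(statestore.registrations['impalad-1'])
```

Because a rejected heartbeat looks like a failure to the sender, the statestore in this model would get the signal it was missing in step 6 of the sequence above.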
43 lines
1.7 KiB
Python
Executable File
#!/usr/bin/env python
# Copyright (c) 2012 Cloudera, Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Utility functions related to executing shell commands.
import logging
import shlex
from subprocess import Popen, PIPE

logging.basicConfig(level=logging.ERROR, format='%(threadName)s: %(message)s')
LOG = logging.getLogger('shell_util')
LOG.setLevel(level=logging.DEBUG)

def exec_process(cmd):
  """Executes a subprocess, waiting for completion. The process exit code, stdout and
  stderr are returned as a tuple."""
  LOG.debug('Executing: %s' % (cmd,))
  # Start the process, then block until it exits while collecting its output.
  p = exec_process_async(cmd)
  stdout, stderr = p.communicate()
  rc = p.returncode
  return rc, stdout, stderr

def exec_process_async(cmd):
  """Executes a subprocess, returning immediately. The process object is returned for
  later retrieval of the exit code, stdout and stderr."""
  LOG.debug('Executing: %s' % (cmd,))
  # Popen needs a list as its first parameter. The first element is the command,
  # with the rest being arguments.
  return Popen(shlex.split(cmd), shell=False, stdout=PIPE, stderr=PIPE)
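A short usage sketch may help show how the two helpers fit together. It repeats compact copies of them (without logging) so that it runs on its own. The `grep` command string and the interpreter one-liner are made-up examples, and `shlex.quote` assumes Python 3.3 or later. Under Python 3, `communicate()` returns `bytes` here because the pipes are opened in binary mode.

```python
import shlex
import sys
from subprocess import Popen, PIPE


def exec_process_async(cmd):
  # Same approach as the helper above: tokenize the string, then launch the
  # process directly rather than through a shell.
  return Popen(shlex.split(cmd), shell=False, stdout=PIPE, stderr=PIPE)


def exec_process(cmd):
  p = exec_process_async(cmd)
  stdout, stderr = p.communicate()
  return p.returncode, stdout, stderr


# shlex.split keeps a quoted argument together as a single list element.
tokens = shlex.split('grep -r "hello world" /tmp')
# tokens == ['grep', '-r', 'hello world', '/tmp']

# Hypothetical command: run the current interpreter with a one-line script.
cmd = '%s -c "print(1 + 1)"' % shlex.quote(sys.executable)
rc, stdout, stderr = exec_process(cmd)

# The async variant returns at once; the caller decides when to wait.
p = exec_process_async(cmd)
async_stdout, async_stderr = p.communicate()
async_rc = p.returncode
```

Since `shell=False` is used, shell features such as pipes, redirects and glob expansion in `cmd` are passed to the program as literal arguments rather than interpreted.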