impala/common
Henry Robinson 8b4a748d5e IMPALA-1726: Statestore to timeout hung RPCs
If a Heartbeat() RPC appears hung, the statestore should abort that RPC
so as not to hold on to a sender thread, and to trigger the failure
detector to evict the hung node.

We could just add a TCP timeout to the client cache used by the
statestore, but then every RPC would be subject to the same timeout.
UpdateState() typically takes *much* longer than Heartbeat() by design,
so no single timeout would be reasonable for both. Instead, this patch
adds a second client cache used only for Heartbeat() RPCs, with an
aggressive default timeout of 3s (Heartbeat() usually takes ~1-2ms).
UpdateState() also gets a timeout to avoid thread starvation, but a
much less aggressive one of 300s.
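
The two-cache arrangement can be sketched roughly as follows. This is an
illustration only, not the code from the patch: RpcClientCacheSketch,
StatestoreCachesSketch, RpcKind and CacheFor are hypothetical names, and
only the 3s and 300s defaults come from the description above.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Hypothetical stand-in for a client cache. It models only the timeout
// that every connection handed out by the cache would inherit.
struct RpcClientCacheSketch {
  std::string name;
  std::chrono::milliseconds rpc_timeout;
};

// One cache per RPC type, so each can carry its own timeout.
// The defaults mirror the values quoted in the commit message.
struct StatestoreCachesSketch {
  RpcClientCacheSketch heartbeat{"heartbeat", std::chrono::seconds(3)};
  RpcClientCacheSketch update_state{"update_state", std::chrono::seconds(300)};
};

enum class RpcKind { kHeartbeat, kUpdateState };

// Picks the cache (and therefore the timeout) for a given RPC type.
inline const RpcClientCacheSketch& CacheFor(const StatestoreCachesSketch& caches,
                                            RpcKind kind) {
  // The heartbeat timeout is expected to be the more aggressive of the two.
  assert(caches.heartbeat.rpc_timeout < caches.update_state.rpc_timeout);
  return kind == RpcKind::kHeartbeat ? caches.heartbeat : caches.update_state;
}
```

Keeping the timeout on the cache rather than on each call means a sender
thread never has to choose a timeout itself; it only has to ask the right
cache for a connection.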

This patch also adds ClientConnection::DoRpc(), which calls an RPC and
handles various failure modes, including timeout. If DoRpc() returns an
error, the statestore handles it in the usual way, including updating
the failure detector if the failed RPC is Heartbeat().
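
A minimal sketch of the DoRpc() idea, assuming that the transport reports
a timeout by throwing. Every name here (RpcTimeoutError, StatusSketch,
DoRpcSketch, FailureDetectorSketch, SendHeartbeatSketch) is hypothetical
and chosen for illustration; the real ClientConnection::DoRpc() signature
and the real failure detector are not shown in this message.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical error thrown by the transport when a socket timeout expires.
struct RpcTimeoutError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical status type: success flag plus an error message.
struct StatusSketch {
  bool ok;
  std::string msg;
};

// Invokes `rpc` and converts any thrown failure, including a timeout,
// into a status value so that callers have a single error path.
StatusSketch DoRpcSketch(const std::function<void()>& rpc) {
  try {
    rpc();
    return {true, ""};
  } catch (const RpcTimeoutError& e) {
    return {false, std::string("RPC timed out: ") + e.what()};
  } catch (const std::exception& e) {
    return {false, std::string("RPC failed: ") + e.what()};
  }
}

// Hypothetical failure detector: counts consecutive missed heartbeats.
struct FailureDetectorSketch {
  int missed = 0;
  int max_missed;
  explicit FailureDetectorSketch(int max) : max_missed(max) {}

  // Records one heartbeat result; returns true once the peer has missed
  // `max_missed` heartbeats in a row and should be considered failed.
  bool Update(bool heartbeat_ok) {
    missed = heartbeat_ok ? 0 : missed + 1;
    return missed >= max_missed;
  }
};

// Sends one heartbeat and feeds the outcome to the failure detector.
// Returns true if the peer should now be evicted.
bool SendHeartbeatSketch(const std::function<void()>& heartbeat_rpc,
                         FailureDetectorSketch* detector) {
  StatusSketch status = DoRpcSketch(heartbeat_rpc);
  return detector->Update(status.ok);
}
```

With this shape a hung Heartbeat() looks to the caller like any other
failed RPC: the sender thread gets an error status back after the timeout
and the failure detector is updated in the usual way.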

Change-Id: I2f2462278e59581937c9c10910625d2724a11efa
Reviewed-on: http://gerrit.cloudera.org:8080/206
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2015-03-20 14:37:14 -07:00