mirror of
https://github.com/apache/impala.git
If a Heartbeat() RPC appears hung, the statestore should abort that RPC so that it does not tie up a sender thread, and so that the failure detector is triggered to evict the hung node.

We could simply add a TCP timeout to the client cache used by the statestore, but then every RPC would be subject to the same timeout. By design, UpdateState() typically takes *much* longer than Heartbeat(), so no single timeout would be reasonable for both. Instead, this patch adds a second client cache used only for Heartbeat() RPCs, with an aggressive default timeout of 3s (Heartbeat() usually takes ~1-2ms). UpdateState() RPCs also get a timeout to avoid thread starvation, but a much less aggressive one of 300s.

This patch also adds ClientConnection::DoRpc(), which calls an RPC and handles its various failure modes, including timeout. If DoRpc() returns an error, the statestore handles it in the usual way, including updating the failure detector if the failed RPC was Heartbeat().

Change-Id: I2f2462278e59581937c9c10910625d2724a11efa
Reviewed-on: http://gerrit.cloudera.org:8080/206
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
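To make the two-cache design concrete, here is a minimal, self-contained C++ sketch. It does not use Impala's real classes: `DoRpc`, `RpcStatus`, `ClientCacheConfig`, `SimulatedRpc`, `FailureDetector`, and the exception types are hypothetical stand-ins, a socket receive timeout is simulated by comparing a made-up latency against the cache's timeout, and the three-missed-heartbeats eviction threshold is an assumption. It illustrates how per-cache timeouts let a hung Heartbeat() be aborted after 3s while a long UpdateState() still completes, and how a wrapper can turn each failure mode into a status that feeds the failure detector.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the exceptions a transport layer might throw.
struct RpcTimeoutError : std::runtime_error {
  using std::runtime_error::runtime_error;
};
struct RpcTransportError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

enum class RpcStatus { kOk, kTimedOut, kTransportError, kUnknownError };

// Hypothetical per-cache configuration: every client drawn from a cache
// inherits that cache's RPC timeout.
struct ClientCacheConfig {
  std::chrono::milliseconds rpc_timeout;
};
// Aggressive timeout for Heartbeat(), which normally takes ~1-2ms.
const ClientCacheConfig kHeartbeatCache{std::chrono::seconds(3)};
// Lenient timeout for the much slower UpdateState().
const ClientCacheConfig kUpdateStateCache{std::chrono::seconds(300)};

// Simulates an RPC over a socket with a receive timeout: if the peer takes
// longer than the cache's timeout to respond, the call is aborted.
std::function<void()> SimulatedRpc(std::chrono::milliseconds latency,
                                   const ClientCacheConfig& cache) {
  return [latency, timeout = cache.rpc_timeout]() {
    if (latency > timeout) throw RpcTimeoutError("receive timed out");
  };
}

// Hypothetical analogue of ClientConnection::DoRpc(): runs the RPC and maps
// each failure mode to a status instead of letting exceptions escape.
RpcStatus DoRpc(const std::function<void()>& rpc) {
  try {
    rpc();
    return RpcStatus::kOk;
  } catch (const RpcTimeoutError&) {
    return RpcStatus::kTimedOut;
  } catch (const RpcTransportError&) {
    return RpcStatus::kTransportError;
  } catch (...) {
    return RpcStatus::kUnknownError;
  }
}

// Minimal failure detector: a peer is considered failed after
// `max_missed` consecutive failed heartbeats (the threshold is an assumption).
class FailureDetector {
 public:
  explicit FailureDetector(int max_missed) : max_missed_(max_missed) {
    assert(max_missed > 0);
  }

  // Records one heartbeat outcome; returns true if the peer should be evicted.
  bool UpdateHeartbeat(const std::string& peer, bool ok) {
    int& missed = missed_[peer];
    missed = ok ? 0 : missed + 1;
    return missed >= max_missed_;
  }

 private:
  int max_missed_;
  std::map<std::string, int> missed_;
};
```

With these definitions, a 2ms heartbeat on the heartbeat cache returns `RpcStatus::kOk`, a hung 10s heartbeat returns `RpcStatus::kTimedOut`, and a 60s UpdateState() succeeds on the 300s cache but would time out on the 3s one. That asymmetry is why a single shared timeout could not serve both RPCs.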