As originally envisaged, the statestore was not intended to transmit GBs of data to large clusters. Now that it does, some changes to the design are required to improve stability and responsiveness. One of the most acute problems is that topic updates, which may be extremely large, were also responsible for conveying failure detection information. The delay in sending topic heartbeats (which were often blocked behind a queue of other updates) meant that large clusters needed to set large timeouts for every subscriber, and to reduce the frequency at which topic updates were sent.

This change adds a second heartbeat type to the statestore which is responsible only for coordinating failure information between the subscriber and the statestore. These 'keep-alive' heartbeats are sent much more frequently, as they have a tiny payload and do very little work. The statestore no longer uses the result of topic updates to maintain its view of which subscribers are alive or dead, but only the results of the keep-alive heartbeats.

I've tested these changes with a 500MB catalog on a 73 node cluster. Keep-alive frequencies are nice and stable (at 500ms out of the box) even during the initial large topic-update distribution phase, or after a statestore failure.

Internally, this patch changes the nomenclature for the statestore: we stay away from 'heartbeat' (except for the failure detector, which is heartbeat-based), and instead use 'message' as a generic term for both keep-alive and topic update.

Externally, we need to rationalise the flag names to control both update types. The benefit of this change is that only the keep-alive frequency matters for cluster stability, and that frequency depends only on the cluster size, rather than on both the catalog size and the cluster size. So we can set it to, e.g., 1s out of the box with some confidence that that will work for clusters up to ~200 nodes.
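
The separation described above can be illustrated with a minimal sketch. This is not the code in this patch: the class names, the method names, and the threshold of three consecutive missed keep-alives are all assumptions made for illustration. It only shows the idea that keep-alive results drive the liveness view while topic-update results are ignored for that purpose.

```python
from dataclasses import dataclass, field


@dataclass
class SubscriberState:
    """Per-subscriber bookkeeping (hypothetical structure, not Impala's)."""
    missed_keepalives: int = 0      # consecutive failed keep-alives
    failed_topic_updates: int = 0   # kept for diagnostics only


@dataclass
class LivenessTracker:
    """Treats a subscriber as dead after max_missed consecutive failed keep-alives."""
    max_missed: int = 3  # assumed threshold, not taken from the patch
    subscribers: dict = field(default_factory=dict)

    def register(self, subscriber_id: str) -> None:
        self.subscribers[subscriber_id] = SubscriberState()

    def on_keepalive_result(self, subscriber_id: str, ok: bool) -> None:
        state = self.subscribers[subscriber_id]
        # A successful keep-alive resets the count; a failure increments it.
        state.missed_keepalives = 0 if ok else state.missed_keepalives + 1

    def on_topic_update_result(self, subscriber_id: str, ok: bool) -> None:
        # Recorded, but never consulted by is_alive().
        if not ok:
            self.subscribers[subscriber_id].failed_topic_updates += 1

    def is_alive(self, subscriber_id: str) -> bool:
        state = self.subscribers[subscriber_id]
        return state.missed_keepalives < self.max_missed


# Usage: slow or failed topic updates do not affect liveness.
tracker = LivenessTracker(max_missed=3)
tracker.register("subscriber-1")
for _ in range(10):
    tracker.on_topic_update_result("subscriber-1", ok=False)
still_alive = tracker.is_alive("subscriber-1")
```

In this sketch a subscriber whose topic updates are stuck behind a long queue stays alive as long as its keep-alives succeed, which is the property that lets the keep-alive timeout be chosen independently of catalog size.
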
The topic update frequency may actually be set aggressively, because the statestore will naturally throttle itself during times of heavy traffic and nothing will time out.

Change-Id: Ia447e4ebefda890a5b810a213e97f00cca499989
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4003
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5138