Author SHA1 Message Date
Joe McDonnell
71818c673b IMPALA-13253: Add option to enable keepalive for client connections
Client connections can drop without an explicit close. This can
happen if the client machine resets or there is a network disruption.
Some load balancers have an idle timeout that results in the
connection becoming invalid without an explicit teardown. With
short idle timeouts (e.g. AWS LB has a timeout of 350 seconds),
this can impact many connections.

This adds startup options to enable / tune TCP keepalive settings for
client connections:
client_keepalive_probe_period_s - idle time before doing keepalive probes
  If set to > 0, keepalive is enabled.
client_keepalive_retry_period_s - time between keepalive probes
client_keepalive_retry_count - number of keepalive probes
These startup options mirror the startup options for Kudu's
equivalent functionality.
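
As a rough illustration of how the three options interact, the sketch
below estimates how long an unreachable client can go undetected and
whether probes start before a load balancer's idle timeout. The helper
names and the values 300 / 60 / 3 are hypothetical, chosen only for the
example. The arithmetic assumes Linux keepalive semantics (first probe
after the idle period, then one probe per retry period).

```python
def worst_case_detection_s(probe_period_s, retry_period_s, retry_count):
    """Approximate seconds from last activity until a dead peer is detected.

    Assumes the first probe goes out after probe_period_s of idleness and
    the connection is dropped once retry_count probes, spaced
    retry_period_s apart, have gone unanswered.
    """
    return probe_period_s + retry_period_s * retry_count


def probes_before_lb_timeout(probe_period_s, lb_idle_timeout_s):
    """True if keepalive is enabled and the first probe is sent before the
    load balancer's idle timeout would invalidate the connection."""
    return 0 < probe_period_s < lb_idle_timeout_s


if __name__ == "__main__":
    # Hypothetical settings, not defaults taken from this patch.
    print(worst_case_detection_s(300, 60, 3))
    print(probes_before_lb_timeout(300, 350))
```

With a 350 second load balancer timeout, a probe period below 350 keeps
idle connections from silently expiring, while the retry settings bound
how long a dead client holds server resources.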

Thrift has preexisting support for turning on keepalive, but that
support uses the OS defaults for keepalive settings. To add the
ability to tune the keepalive settings, this implements a wrapper
around the Thrift socket (both TLS and non-TLS) and manually sets
the keepalive options on the socket (mirroring code from Kudu's
Socket::SetTcpKeepAlive).
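
For readers unfamiliar with the underlying socket options, here is a
minimal Python sketch of setting them by hand on Linux. It is not the
patch's C++ wrapper: the function name and parameters are hypothetical,
picked to echo the startup options, and the TCP_KEEPIDLE, TCP_KEEPINTVL
and TCP_KEEPCNT constants are assumed to be available (they are on
Linux).

```python
import socket


def set_tcp_keepalive(sock, probe_period_s, retry_period_s, retry_count):
    """Enable TCP keepalive on sock with explicit timings (Linux options).

    Returns False and leaves the socket untouched when probe_period_s <= 0,
    mirroring the "enabled only if > 0" rule described above.
    """
    if probe_period_s <= 0:
        return False
    # Turn keepalive on; without the TCP_* options below, the OS defaults
    # for idle time, probe interval, and probe count would apply.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle seconds before the first probe is sent.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, probe_period_s)
    # Seconds between successive probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, retry_period_s)
    # Unanswered probes allowed before the connection is dropped.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, retry_count)
    return True


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Hypothetical values for illustration only.
    print(set_tcp_keepalive(s, 300, 60, 3))
    s.close()
```

Setting only SO_KEEPALIVE corresponds to the preexisting Thrift behavior
of using OS defaults; the three TCP-level options are what make the
timings tunable per socket.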

This does not enable keepalive by default to make it easy to backport.
A separate patch will turn keepalive on by default.

Testing:
 - Added a custom cluster test that connects with impala-shell
   and verifies that the socket has the keepalive timer.
   Verified that it works on Ubuntu 20, CentOS 7, and Red Hat 8.
 - Used iptables to manually test cases where the client is unreachable
   and verified that the server detects that and closes the connection.

Change-Id: I9e50f263006c456bc0797b8306aa4065e9713450
Reviewed-on: http://gerrit.cloudera.org:8080/22254
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-16 16:45:27 +00:00