mirror of
https://github.com/apache/impala.git
synced 2025-12-25 02:03:09 -05:00
IMPALA-2605: prevent long-running child processes from keeping TCP connection open
The problem: By default, all file descriptors opened by a process, including sockets, are inherited by any forked child processes. This includes the connection socket created at the beginning of each test in ImpalaTestSuite.setup_class(). In TestHiveMetaStoreFailure.test_hms_service_dies(), the Hive Metastore is stopped and restarted, meaning the metastore is now a child process of the test process. Because the long-running metastore holds its own copy of the socket, the client connection is not closed when the parent process (the test) exits, so one of a finite number of connections (64) to Impala is left permanently in use.

This would be barely noticeable except that run-tests.py runs the mini stress test with 4 * <num CPUs> concurrent clients by default. On our build machines, this is 64 clients, which is also the default max number of connections for an impalad. When a test process tries to make the 65th connection (since the leaked connection is still there), it blocks until a connection is freed up. Due to a quirk of the xdist py.test plugin that I don't fully understand, the test framework will not clean up test classes (and close the connections) until a number of tests complete, causing the test process to deadlock.

The solution: use the close_fds argument to make sure the TCP socket is closed in the spawned child process. This is also done in CustomClusterTestSuite._start_impala_cluster() when it starts the new cluster.

This patch also switches test_hms_failure.py to use check_call() instead of call(), and explicitly caps the number of stress clients at 64.

Change-Id: I03feae922883a0624df1422ffb6ba5f1d83fb869
Reviewed-on: http://gerrit.cloudera.org:8080/1853
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
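The descriptor leak and the close_fds fix described in the commit message can be reproduced in isolation. The following is a minimal sketch, not the code from this patch: the helper names (child_sees_socket, run_checked, demo) are hypothetical, it assumes a POSIX system, and because Python 3 makes descriptors non-inheritable by default (PEP 446) it explicitly re-enables inheritance to mimic the older default behaviour the commit message describes.

```python
import os
import socket
import subprocess
import sys

# Code run in the child: exit 0 only if the given descriptor number is an
# open socket in the child process, otherwise exit 1.
CHILD_CODE = (
    "import os, stat, sys\n"
    "try:\n"
    "    mode = os.fstat(int(sys.argv[1])).st_mode\n"
    "except OSError:\n"
    "    sys.exit(1)\n"
    "sys.exit(0 if stat.S_ISSOCK(mode) else 1)\n"
)


def child_sees_socket(fd, close_fds):
    """Return True if a spawned child still has socket `fd` open (hypothetical helper)."""
    cmd = [sys.executable, "-c", CHILD_CODE, str(fd)]
    return subprocess.call(cmd, close_fds=close_fds) == 0


def run_checked(cmd):
    """Spawn `cmd` without leaking descriptors; raise if it exits non-zero.

    Unlike call(), check_call() raises CalledProcessError on failure, so a
    failed restart cannot go unnoticed.
    """
    subprocess.check_call(cmd, close_fds=True)


def demo():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Assumption: opt back in to inheritance to reproduce the old default.
        os.set_inheritable(sock.fileno(), True)
        leaked = child_sees_socket(sock.fileno(), close_fds=False)
        still_open = child_sees_socket(sock.fileno(), close_fds=True)
        return leaked, still_open
    finally:
        sock.close()
```

With close_fds=False the child keeps a copy of the socket, which is what keeps a connection alive after the parent exits; with close_fds=True every descriptor other than stdin, stdout and stderr is closed in the child before it starts.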
committed by Internal Jenkins
parent d3ab94162c
commit 9c4eb9fc61
@@ -40,9 +40,13 @@ NUM_CONCURRENT_TESTS = multiprocessing.cpu_count()
 if 'NUM_CONCURRENT_TESTS' in os.environ:
   NUM_CONCURRENT_TESTS = int(os.environ['NUM_CONCURRENT_TESTS'])
 
-# Default the number of stress clinets to 4x the number of CPUs
+# Default the number of stress clinets to 4x the number of CPUs (but not exceeding the
+# default max # of concurrent connections)
 # This can be overridden by setting the NUM_STRESS_CLIENTS environment variable.
-NUM_STRESS_CLIENTS = multiprocessing.cpu_count() * 4
+# TODO: fix the stress test so it can start more clients than available connections
+# without deadlocking (e.g. close client after each test instead of on test class
+# teardown).
+NUM_STRESS_CLIENTS = min(multiprocessing.cpu_count() * 4, 64)
 if 'NUM_STRESS_CLIENTS' in os.environ:
   NUM_STRESS_CLIENTS = int(os.environ['NUM_STRESS_CLIENTS'])
 
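The effect of the new default can be checked with a small standalone function. This is a sketch under stated assumptions rather than the code in the patch: the function name num_stress_clients, its parameters, and the MAX_CONNECTIONS constant are hypothetical, and the value 64 is taken from the commit message. As in the hunk above, an explicit NUM_STRESS_CLIENTS environment variable bypasses the cap.

```python
import multiprocessing
import os

# Assumption: default max number of impalad connections, per the commit message.
MAX_CONNECTIONS = 64


def num_stress_clients(environ=None, cpu_count=None):
    """Return the stress client count: 4x CPUs capped at 64, unless overridden."""
    environ = os.environ if environ is None else environ
    cpus = multiprocessing.cpu_count() if cpu_count is None else cpu_count
    default = min(cpus * 4, MAX_CONNECTIONS)
    # The environment override is applied after the cap, so it is not clamped.
    return int(environ.get('NUM_STRESS_CLIENTS', default))
```

On a 4-CPU machine this yields 16 clients, while a 32-CPU machine yields 64 rather than 128; setting NUM_STRESS_CLIENTS=100 still returns 100, so an override above the connection limit can reintroduce the deadlock the TODO describes.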