3 Commits

Author SHA1 Message Date
stiga-huang
90944d7340 IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py times out
Recently, we have seen many timeout failures of test_concurrent_ddls.py in S3
builds, e.g. IMPALA-10280, IMPALA-10301, IMPALA-10363. It'd be helpful
to dump the server stacktraces so we can understand why some RPCs are
slow/stuck.

This patch extracts the logic of dumping stacktraces in
script-timeout-check.sh to a separate script, dump-stacktraces.sh.
The script also dumps jstacks of HMS and NameNode. Dumping all these
stacktraces is time-consuming so we do them in parallel, which also
helps to get consistent snapshots of all servers.
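
The parallel dump described above can be sketched in Python. This is only an illustration, not the contents of dump-stacktraces.sh: the function name `dump_all`, the server labels, and the stand-in commands are assumptions made so the sketch runs anywhere. A real run would presumably invoke a tool such as jstack against each server process.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def dump_all(commands, timeout_s=60):
    """Run every dump command concurrently and return {name: stdout}.

    `commands` maps a label (hypothetical, e.g. "hms") to an argv list.
    """
    def run(cmd):
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s)
        return result.stdout

    # One worker per command so all dumps start at roughly the same time,
    # which keeps the snapshots close to each other.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        futures = {name: pool.submit(run, cmd)
                   for name, cmd in commands.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Stand-in commands (assumption): each just prints a line instead of
# dumping a real stack.
demo_commands = {
    "hms": [sys.executable, "-c", "print('hms stack')"],
    "namenode": [sys.executable, "-c", "print('namenode stack')"],
}
dumps = dump_all(demo_commands)
```

Starting all dumps together bounds the total time by the slowest dump rather than the sum of all of them.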

When any test in test_concurrent_ddls.py times out, we use
dump-stacktraces.sh to dump the stacktraces before exiting. Previously,
some tests depended on pytest.mark.timeout to detect timeouts, which
makes it hard to add a custom callback for dumping server stacktraces.
So this patch refactors test_concurrent_ddls.py to rely only on the
timeouts of multiprocessing.
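
A minimal sketch of a multiprocessing-based timeout with a dump callback is shown here. It is not the code from test_concurrent_ddls.py: `run_with_timeout`, `slow_task`, and the `events` list are hypothetical names, and the callback only records a marker where the real test would call dump-stacktraces.sh. It assumes `multiprocessing.pool.ThreadPool`, whose `AsyncResult.get(timeout=...)` raises `multiprocessing.TimeoutError` on overrun.

```python
import time
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool


def run_with_timeout(task, timeout_s, on_timeout):
    """Run task in a worker; on overrun call on_timeout(), then re-raise."""
    with ThreadPool(processes=1) as pool:
        async_result = pool.apply_async(task)
        try:
            return async_result.get(timeout=timeout_s)
        except TimeoutError:
            # The caller controls this hook, so it can dump stacktraces
            # before the failure propagates.
            on_timeout()
            raise


events = []


def slow_task():
    # Stand-in for a stuck DDL; sleeps longer than the timeout below.
    time.sleep(0.5)
    return "done"


try:
    run_with_timeout(slow_task, timeout_s=0.1,
                     on_timeout=lambda: events.append("dumped"))
except TimeoutError:
    events.append("timed out")
```

Because the timeout is raised in the calling code rather than by a pytest plugin, the hook runs at a well-defined point before the test fails.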

Tests:
 - Tested the scripts locally.
 - Verified the error handling of the timeout logic in Jenkins jobs

Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Reviewed-on: http://gerrit.cloudera.org:8080/16800
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-03 08:05:23 +00:00
stiga-huang
482c769b09 IMPALA-9196: Dump jstack and collect logs when tests time out
This patch augments script-timeout-check.sh to also dump the jstack of
the FE when tests time out.

Tests:
 - Manually tested the script with sudo privileges
 - Tested the script in private Jenkins jobs

Change-Id: Ib8a5b140024c236209c7e44149660189890b9d06
Reviewed-on: http://gerrit.cloudera.org:8080/14794
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-14 19:50:38 +00:00
Joe McDonnell
fc4a91cf8c IMPALA-9165: Add timeout for create-load-data.sh
This converts the existing bin/run-all-tests-timeout-check.sh
to a more generic bin/script-timeout-check.sh. It uses this
new script for both bin/run-all-tests.sh and
testdata/bin/create-load-data.sh. The new script takes two
arguments:
 -timeout : timeout in minutes
 -script_name : name of the calling script
The script_name is used in debugging output / output filenames
to make it clear what timed out.
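
The two-argument interface above can be sketched in Python with argparse. The real bin/script-timeout-check.sh is a shell script, so this only illustrates the interface: `parse_args` and the message format are assumptions, and the 150-minute value simply restates a 2.5 hour timeout in minutes.

```python
import argparse


def parse_args(argv):
    """Parse the two options named in the commit message."""
    parser = argparse.ArgumentParser(
        description="Timeout watchdog (illustrative)")
    parser.add_argument("-timeout", type=float, required=True,
                        help="timeout in minutes")
    parser.add_argument("-script_name", required=True,
                        help="name of the calling script")
    return parser.parse_args(argv)


args = parse_args(["-timeout", "150", "-script_name", "create-load-data.sh"])

# The timeout is given in minutes; convert it for sleep-style APIs.
timeout_seconds = args.timeout * 60

# Hypothetical message format: the script name identifies what timed out.
message = "%s timed out after %g minutes" % (args.script_name, args.timeout)
```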

The run-all-tests.sh timeout remains the same.
testdata/bin/create-load-data.sh uses a 2.5 hour timeout.
This should help debug the issue in IMPALA-9165, because at
least the logs would be preserved on the Jenkins job.

Testing:
 - Tested the timeout script by hand with a caller script that
   sleeps longer than the timeout
 - Ran a gerrit-verify-dryrun-external

Change-Id: I19d76bd8850c7d4b5affff4d21f32d8715a382c6
Reviewed-on: http://gerrit.cloudera.org:8080/14741
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2019-11-20 21:59:26 +00:00