IMPALA-12643 (part 1): Limit memory consumption for resolve_minidumps.py

On some platforms (CentOS 7), resolve_minidumps.py's call to
minidump_stackwalk can run away and consume all system memory
until it is OOM killed. The cause is presumably corrupt symbols
in some library. As a workaround, this change detects whether the
prlimit utility is present and, if so, uses it to run
minidump_stackwalk with a 4GB limit on virtual memory. This kills
the process earlier and avoids exhausting system memory.
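
As a rough illustration of the wrapping described above (a sketch, not the code from this change; the helper name `with_memory_limit` and the sample file names are made up for this example), prefixing a command with `prlimit --as=<bytes>` makes it run under an address-space limit:

```python
FOUR_GIB = 4 * 1024 * 1024 * 1024  # 4294967296 bytes


def with_memory_limit(cmd, limit_bytes=FOUR_GIB):
  """Return cmd prefixed with a prlimit invocation capping virtual memory.

  Hypothetical helper for illustration; it only builds the argument list
  and does not check whether prlimit is installed.
  """
  return ["prlimit", "--as={0}".format(limit_bytes)] + list(cmd)


# Sample arguments are placeholders, not paths from the real script.
wrapped = with_memory_limit(["minidump_stackwalk", "crash.dmp", "symbols/"])
print(" ".join(wrapped))
# prlimit --as=4294967296 minidump_stackwalk crash.dmp symbols/
```

Because the limit applies to the whole address space, a runaway allocation fails inside the child process well before the kernel OOM killer would have to step in.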

Testing:
 - Verified that bin/jenkins/finalize.sh runs resolve_minidumps.py
   successfully on a Red Hat 8 Jenkins job
 - Verified that bin/jenkins/finalize.sh works properly on
   my Ubuntu 20 box
 - Ran a Jenkins job on CentOS 7 and verified that the prlimit
   code kills minidump_stackwalk when it reaches 4GB of memory.

Change-Id: I4db8facb8a037327228c3714e047e0d1f0fe1d94
Reviewed-on: http://gerrit.cloudera.org:8080/20862
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Joe McDonnell
2023-12-15 16:55:32 -08:00
parent c3f875eac4
commit c0a015fdac

@@ -32,7 +32,7 @@
 # that were used by the binary. It gets the symbols for all
 # those libraries and resolves the minidump.
 #
-# Usage: resolve_minidump.py --minidump_file [file] --output_file [file]
+# Usage: resolve_minidumps.py --minidump_file [file] --output_file [file]
 # (optional -v or --verbose for more output)
 import errno
@@ -288,9 +288,22 @@ def dump_symbols_for_all_modules(dump_syms, objcopy, module_list, out_dir):
 def resolve_minidump(minidump_stackwalk, minidump_path, symbol_dir, verbose, out_file):
+  minidump_stackwalk_cmd = [minidump_stackwalk, minidump_path, symbol_dir]
+  # There are circumstances where the minidump_stackwalk can go wrong and become
+  # a runaway process capable of using all system memory. If the prlimit utility
+  # is present, we use it to apply a limit on the memory consumption.
+  #
+  # See if we have the prlimit utility
+  check_prlimit = subprocess.run(["prlimit", "-V"], stdout=subprocess.DEVNULL,
+                                 stderr=subprocess.DEVNULL)
+  if check_prlimit.returncode == 0:
+    # The prlimit utility is available, so wrap the minidump_stackwalk command
+    # to apply a 4GB limit on virtual memory. In normal operations, 4G is plenty.
+    prlimit_wrapper = ["prlimit", "--as={0}".format(4 * 1024 * 1024 * 1024)]
+    minidump_stackwalk_cmd = prlimit_wrapper + minidump_stackwalk_cmd
   with open(out_file, "w") as out_f:
     stderr_output = None if verbose else subprocess.DEVNULL
-    subprocess.run([minidump_stackwalk, minidump_path, symbol_dir], stdout=out_f,
+    subprocess.run(minidump_stackwalk_cmd, stdout=out_f,
                    stderr=stderr_output, check=True)
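
One caveat about the detection step in this diff: in Python 3, `subprocess.run` raises `FileNotFoundError` when the executable cannot be found, rather than returning a non-zero return code, so a check based only on `returncode` may not cover hosts where prlimit is absent. A possible hardening, shown here as a sketch that is not part of this change, is to look the utility up with `shutil.which` first. The function name `build_stackwalk_cmd` and the injectable `which` parameter are assumptions made for this example.

```python
import shutil


def build_stackwalk_cmd(minidump_stackwalk, minidump_path, symbol_dir,
                        limit_bytes=4 * 1024 * 1024 * 1024,
                        which=shutil.which):
  """Build the minidump_stackwalk command, wrapped in prlimit if available.

  Hypothetical helper: `which` is injectable so the lookup can be faked
  in tests; by default it searches PATH via shutil.which.
  """
  cmd = [minidump_stackwalk, minidump_path, symbol_dir]
  prlimit = which("prlimit")
  if prlimit is None:
    # prlimit is not installed: run without a memory limit, as before.
    return cmd
  # prlimit is installed: cap the address space of the child process.
  return [prlimit, "--as={0}".format(limit_bytes)] + cmd


# Sample arguments are placeholders, not paths from the real script.
no_limit = build_stackwalk_cmd("minidump_stackwalk", "crash.dmp", "symbols/",
                               which=lambda name: None)
limited = build_stackwalk_cmd("minidump_stackwalk", "crash.dmp", "symbols/",
                              which=lambda name: "/usr/bin/prlimit")
```

The returned list can be passed to `subprocess.run` exactly as `minidump_stackwalk_cmd` is in the diff; only the way prlimit is detected differs.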