38 Commits

Author SHA1 Message Date
Csaba Ringhofer
f98b697c7b IMPALA-13929: Make 'functional-query' the default workload in tests
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.

All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.

The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 on why workload affected
exploration_strategy. An example of an affected test is
TestCatalogHMSFailures which was skipped both in core and exhaustive
runs before this change.

get_workload() functions that return a different workload than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking these tests is out of scope in this patch.
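
The pattern described above can be sketched as follows. This is a minimal illustration, not the real ImpalaTestSuite code: the class names BaseSuite, DefaultWorkloadSuite and TpchSuite are hypothetical stand-ins.

```python
class BaseSuite(object):
    """Hypothetical stand-in for a base test suite class."""

    @classmethod
    def get_workload(cls):
        # Default workload; subclasses no longer need to repeat it.
        return 'functional-query'


class DefaultWorkloadSuite(BaseSuite):
    """Defines no get_workload() of its own and inherits the default."""


class TpchSuite(BaseSuite):
    """Overrides the default because it targets a different workload."""

    @classmethod
    def get_workload(cls):
        return 'tpch'
```

Only suites that need a workload other than the default keep their own override.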

Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-08 07:12:55 +00:00
Joe McDonnell
19678ae65c IMPALA-13431: Deflake TestLogging.test_excessive_cerr_ignore_pid
On a couple of UBSAN runs, test_excessive_cerr_ignore_pid sometimes
fails to find the message providing the next path in the last line
of the ERROR log file. The logs aren't preserved, so we don't know
the exact contents of the log file.

This does two things:
1. It changes the test to preserve the log file on failure by
   copying from the temporary directory to a directory that
   will last past the end of the test. This gives us data to
   work with if we see this again.
2. A theory is that an extra line or two of logging could go to
   the file after it writes the message with the next path.
   This changes the test to check the last 3 lines of the log
   file for the message providing the next path.
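
The check in item 2 could look roughly like the sketch below. The helper name message_in_last_lines and the demo log contents are assumptions made for illustration; the actual test code may differ.

```python
import os
import tempfile


def message_in_last_lines(log_path, needle, num_lines=3):
    """Return True if needle occurs in any of the last num_lines lines."""
    with open(log_path) as f:
        lines = f.readlines()
    return any(needle in line for line in lines[-num_lines:])


# Demo with a made-up log file: the message is followed by two extra lines.
with tempfile.NamedTemporaryFile('w', suffix='.ERROR', delete=False) as tmp:
    tmp.write("first line\n"
              "next log path: /hypothetical/path\n"
              "trailing line 1\n"
              "trailing line 2\n")
    demo_path = tmp.name

found_in_last_3 = message_in_last_lines(demo_path, "next log path", 3)
found_in_last_1 = message_in_last_lines(demo_path, "next log path", 1)
os.remove(demo_path)
```

Checking only the final line misses the message in this demo, while a window of three lines finds it.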

Testing:
 - Ran test with UBSAN

Change-Id: I4745184e983ee5669822059289aab18caf0b72a9
Reviewed-on: http://gerrit.cloudera.org:8080/21926
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-15 23:15:00 +00:00
Riza Suminto
9c87cf41bf IMPALA-13396: Unify tmp dir management in CustomClusterTestSuite
Many custom cluster tests need to create a temporary directory. The
temporary directory typically lives within the scope of a test method and
is cleaned up afterwards. However, some tests create a temporary
directory directly and forget to clean it up afterwards, leaving junk
dirs under /tmp/ or $LOG_DIR.

This patch unifies temporary directory management inside
CustomClusterTestSuite. It introduces a new 'tmp_dir_placeholders' arg in
CustomClusterTestSuite.with_args() that lists the tmp dirs to create.
'impalad_args', 'catalogd_args', and 'impala_log_dir' now accept a
formatting pattern that is replaced by a temporary dir path defined
through 'tmp_dir_placeholders'.
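
A rough sketch of the placeholder idea, under stated assumptions: the helper names (make_tmp_dirs, expand_args, remove_tmp_dirs), the '{name}' pattern syntax and the example flag string are illustrative only, and the real with_args() implementation may work differently.

```python
import os
import shutil
import tempfile


def make_tmp_dirs(placeholders, prefix='impala-test-'):
    """Create one temporary directory per placeholder name."""
    return {name: tempfile.mkdtemp(prefix=prefix + name + '-')
            for name in placeholders}


def expand_args(arg_pattern, tmp_dirs):
    """Substitute {placeholder} patterns with the created directory paths."""
    return arg_pattern.format(**tmp_dirs)


def remove_tmp_dirs(tmp_dirs):
    """Remove every directory created for the test."""
    for path in tmp_dirs.values():
        shutil.rmtree(path, ignore_errors=True)


# Hypothetical usage: one placeholder named 'scratch'.
tmp_dirs = make_tmp_dirs(['scratch'])
impalad_args = expand_args('--scratch_dirs={scratch}', tmp_dirs)
scratch_existed = os.path.isdir(tmp_dirs['scratch'])
remove_tmp_dirs(tmp_dirs)
```

Keeping creation and removal in one place is what prevents leftover directories when an individual test forgets to clean up.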

There are a few occurrences where mkdtemp is called and cannot be replaced
by this work, such as tests/comparison/cluster.py. In those cases, this
patch changes them to supply a prefix arg so that developers know the
directory comes from an Impala test script.

This patch also addresses several flake8 errors in modified files.

Testing:
- Pass custom cluster tests in exhaustive mode.
- Manually ran a few modified tests and observed that the temporary dirs
  are created and removed under logs/custom_cluster_tests/ as the tests
  go.

Change-Id: I8dd665e8028b3f03e5e33d572c5e188f85c3bdf5
Reviewed-on: http://gerrit.cloudera.org:8080/21836
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-02 01:25:39 +00:00
Riza Suminto
3381fbf761 IMPALA-12595: Allow automatic removal of old logs from previous PID
IMPALA-11184 added code to target a specific PID for log rotation. This
aligns with glog behavior and is safer: log rotation is strictly limited
to log files made by the currently running Impalad and excludes logs
made by a previous PID or by other colocated, running Impalads. The
downside of this limit is that logs can accumulate on a node when impalad
is frequently restarted, which can only be resolved by an admin removing
logs manually.

To help avoid this manual removal, this patch adds a backend flag
'log_rotation_match_pid' that relaxes the limit by dropping the PID from
the glob pattern. The default value of this new flag is False. However,
for testing purposes, start-impala-cluster.py overrides it to True because
the test minicluster logs to a common log directory. Setting
'log_rotation_match_pid' to True prevents one impalad from
interfering with the log rotation of other impalads in the minicluster.
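
The effect of keeping or dropping the PID in the glob pattern can be sketched as below. This assumes the usual glog file name layout (program.host.user.log.SEVERITY.date-time.pid); the helper log_glob_pattern and the file names are made up for illustration and are not Impala's actual code.

```python
import fnmatch


def log_glob_pattern(program, severity, pid=None):
    """Build a glob pattern for glog-style log file names.

    Assumed layout: <program>.<host>.<user>.log.<severity>.<date>-<time>.<pid>
    """
    pid_part = str(pid) if pid is not None else '*'
    return '{0}.*.log.{1}.*.{2}'.format(program, severity, pid_part)


files = [
    'impalad.host1.user.log.INFO.20231209-033457.1111',  # previous PID
    'impalad.host1.user.log.INFO.20231209-040000.2222',  # current PID
]

# Matching the PID only sees logs of the running process.
match_pid = fnmatch.filter(files, log_glob_pattern('impalad', 'INFO', pid=2222))
# Dropping the PID also picks up logs left behind by earlier processes.
ignore_pid = fnmatch.filter(files, log_glob_pattern('impalad', 'INFO'))
```

With the PID dropped, old logs become eligible for removal, but so do logs of other daemons writing to the same directory, which is why the minicluster keeps PID matching on.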

As a minimum exercise for this new log rotation behavior,
test_breakpad.py::TestLogging is modified to invoke
start-impala-cluster.py with 'log_rotation_match_pid' set to False.

Testing:
- Add test_excessive_cerr_ignore_pid and test_excessive_cerr_match_pid.
- Split TestLogging into two. One runs test_excessive_cerr_ignore_pid in
  core exploration, while the other runs the rest of the logging tests in
  exhaustive exploration.
- Pass exhaustive tests.

Change-Id: I599799e73f27f941a1d7f3dec0f40b4f05ea5ceb
Reviewed-on: http://gerrit.cloudera.org:8080/20754
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-12-09 03:34:57 +00:00
Joe McDonnell
c233634d74 IMPALA-11975: Fix Dictionary methods to work with Python 3
Python 3 made the main dictionary methods lazy (items(),
keys(), values()). This means that code that uses those
methods may need to wrap the call in list() to get a
list immediately. Python 3 also removed the old iter*
lazy variants.

This changes all locations to use Python 3 dictionary
methods and wraps calls with list() appropriately.
This also changes all iteritems(), itervalues(), iterkeys()
locations to items(), values(), keys(), etc. Python 2
will not use the lazy implementation of these, so there
is a theoretical performance impact. Our python code is
mostly for tests and the performance impact is minimal.
Python 2 will be deprecated when Python 3 is functional.
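
A small illustration of where the list() wrapper matters, written as a sketch with made-up data rather than code from the patch:

```python
counts = {'a': 1, 'b': 0, 'c': 2}

# keys() is a view in Python 3; take a snapshot before mutating the dict,
# otherwise deleting entries while iterating raises RuntimeError.
for key in list(counts.keys()):
    if counts[key] == 0:
        del counts[key]

# items() replaces iteritems() and works on both Python 2 and Python 3.
pairs = sorted(counts.items())

# A view has no sort() method, so convert to a list first.
values = list(counts.values())
values.sort()
```

Where the result is only iterated once and the dict is not modified, the plain call without list() is enough.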

This addresses these pylint warnings:
dict-iter-method
dict-keys-not-iterating
dict-values-not-iterating

Testing:
 - Ran core tests

Change-Id: Ie873ece54a633a8a95ed4600b1df4be7542348da
Reviewed-on: http://gerrit.cloudera.org:8080/19590
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
eb66d00f9f IMPALA-11974: Fix lazy list operators for Python 3 compatibility
Python 3 changes list operators such as range, map, and filter
to be lazy. Some code that expects the list operators to happen
immediately will fail. e.g.

Python 2:
range(0,5) == [0,1,2,3,4]
True

Python 3:
range(0,5) == [0,1,2,3,4]
False

The fix is to wrap locations with list(). i.e.

Python 3:
list(range(0,5)) == [0,1,2,3,4]
True

Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).

Most of the changes were done via these futurize fixes:
 - libfuturize.fixes.fix_xrange_with_import
 - lib2to3.fixes.fix_map
 - lib2to3.fixes.fix_filter

This eliminates the pylint warnings:
 - xrange-builtin
 - range-builtin-not-iterating
 - map-builtin-not-iterating
 - zip-builtin-not-iterating
 - filter-builtin-not-iterating
 - reduce-builtin
 - deprecated-itertools-function

Testing:
 - Ran core job

Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
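
The difference between the two division operators can be illustrated with a short sketch (the variable names are made up for this example):

```python
# On Python 2, put `from __future__ import division` at the top of the file
# to get the results below; Python 3 behaves this way by default.
num_records = 7
num_batches = 2

avg_per_batch = num_records / num_batches    # true division, yields a float
full_pairs = num_records // num_batches      # floor division, yields an int

# Indices must be integers, so '//' is the right operator here.
values = list(range(10))
middle = values[len(values) // 2]
```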

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
ba3518366a IMPALA-11952 (part 4): Fix odds and ends: Octals, long, lambda, etc.
There are a variety of small python 3 syntax differences:
 - Octal constants need to start with 0o rather than just 0
 - Long constants are not supported (i.e. numbers ending with L)
 - Lambda syntax is slightly different
 - The 'ur' string mode is no longer supported
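
A sketch of the Python 3 forms of each item. The lambda case assumes the difference meant here is the removal of tuple parameter unpacking; the values are made up.

```python
import re

mode = 0o755            # Python 2 also accepted 0755; Python 3 requires 0o
big = 12345678901234    # no trailing L; Python 3 ints have arbitrary precision

# Python 2 allowed tuple parameters such as `lambda (k, v): v`.
# Python 3 does not, so take one argument and index into it.
by_value = sorted({'a': 2, 'b': 1}.items(), key=lambda kv: kv[1])

# ur'...' is gone; a plain raw string is already unicode in Python 3.
pattern = re.compile(r'\d+')
```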

Testing:
 - check-python-syntax.sh now passes

Change-Id: Ie027a50ddf6a2a0db4b34ec9b49484ce86947f20
Reviewed-on: http://gerrit.cloudera.org:8080/19554
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-02-28 17:11:50 +00:00
Joe McDonnell
2b550634d2 IMPALA-11952 (part 2): Fix print function syntax
Python 3 now treats print as a function and requires
parentheses in the invocation.

print "Hello World!"
is now:
print("Hello World!")

This fixes all locations to use the function
invocation. This is more complicated when the output
is being redirected to a file or when avoiding the
usual newline.

print >> sys.stderr , "Hello World!"
is now:
print("Hello World!", file=sys.stderr)

To support this properly and guarantee equivalent behavior
between python 2 and python 3, all files that use print
now add this import:
from __future__ import print_function

This also fixes random flake8 issues that intersect with
the changes.

Testing:
 - check-python-syntax.sh shows no errors related to print

Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351
Reviewed-on: http://gerrit.cloudera.org:8080/19552
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-02-28 17:11:50 +00:00
Michael Smith
f4b2ef5a00 IMPALA-11275: log thread info during minidump
Writes ThreadDebugInfo to stdout/stderr when a minidump is generated to
capture thread and query details related to the dump. Example message:
> Minidump in thread [1790536]async-exec-thread running query
  1a47cc1e2df94cb4:88dfa08200000000, fragment instance
  0000000000000000:0000000000000000

Refactors DumpCallback so that repeated writes to STDOUT/STDERR are less
redundant.

Adds unit tests to run with ThreadDebugInfo. Removes the 'static' prefix
from DumpCallback so it can be invoked from unit tests, but doesn't add
it to the header as it's intended to be for internal use.

Testing:
- Added crash to Coordinator::Exec and manually tested dump handling.
- Added a new unit test for DumpCallback.
- Ran tests/custom_cluster/test_breakpad.py to verify nothing broke in
  refactor. Those tests don't have ThreadDebugInfo available.

Change-Id: Iea2bdf10db29a0f8ccbe5e767b708781d42a9b8a
Reviewed-on: http://gerrit.cloudera.org:8080/18508
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-05-13 21:09:57 +00:00
Riza Suminto
87f3dc294f IMPALA-11184: Use ProgramInvocationShortName for log glob pattern
Impala uses FLAGS_log_filename as log symlink name and as part of the
glob pattern during log rotation. The user will not set this flag in
most cases, and it will default to google::ProgramInvocationShortName().
But if the user sets a custom value to this flag, the glob pattern will
mistakenly target the symlink instead of the actual log files. This
leads to a wrong behavior of DeleteOldLogs() and
GetLatestCanonicalLogPath().

This patch replaces FLAGS_log_filename with
google::ProgramInvocationShortName() in glob pattern.

Testing:
- Pass simple-logger-test
- Pass exhaustive test_breakpad.py::TestLogging

Change-Id: I6c71bdb67f70c571d18fb8630d4a816ab75686fa
Reviewed-on: http://gerrit.cloudera.org:8080/18326
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-03-18 00:39:40 +00:00
Riza Suminto
8eeb000c35 IMPALA-11152: Remove dependence on symlink when rotating logs
IMPALA-5256 implements log rotation by following the glog's symlink and
checking the size of the pointed file. While this has been robust most
of the time, there can be a rare situation where the symlink is missing.
Glog itself does not guarantee that the symlink creation will always be
successful. It won't retry symlink creation until the next rotation by
glog. The side effect of this issue is that impala::CheckLogSize() will
spam ERROR log every second for not finding the symlink.

This patch removes the dependence on the glog symlink for this log
rotation. We now directly specify the base file name of the targeted
log kind and pick the latest log path. This patch also makes
impala::CheckLogSize() less chatty by printing an error message for
every FLAGS_logbufsecs (default is 30s).

Testing:
- Add test_breakpad.py::TestLogging::test_excessive_cerr_no_symlink.
- Pass test_breakpad.py in exhaustive exploration.

Change-Id: I30509e98038dbf9ca293144089f6ee92ce186a97
Reviewed-on: http://gerrit.cloudera.org:8080/18286
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-03-05 03:39:08 +00:00
Riza Suminto
b692a92fa2 IMPALA-5256: Force log rotation when max_log_size exceeded
Impala daemons allow STDOUT/STDERR redirection into INFO/ERROR log
respectively through redirect_stdout_stderr startup flag. If
redirect_stdout_stderr is true, daemons redirect STDOUT/STDERR stream to
write into the log file symlink created by glog. There are two problems
with this approach:

1. Glog updates the symlink to point to the new log file when it does
   log rotation. However, Impala is not aware that the symlink points to
   a different file, so cout/cerr writes still go to the oldest log
   file.

2. When there is a lot of write activity to cout/cerr, the log file can
   grow big. However, glog is not aware of STDOUT/STDERR activity. It
   only counts the message bytes written to glog (LOG(INFO),
   LOG(ERROR)). Thus, it only uses its internal bytes count when
   deciding to rotate the logs.

This commit addresses the issue by monitoring the log file size every
second. If Impala sees that the log file has exceeded max_log_size, it
will call google::FlushLogFiles(), ahead of logbufsecs. If the log file
stays big after the flush, we will force the glog to rotate the log.
Since there is no direct way to force glog to rotate, we do this by
changing the log extension to random extension through
google::SetLogFilenameExtension(), and immediately return them to
extensionless (empty string extension).

We also check periodically whether the log file symlink has pointed to a
new file. If it has changed, we reattach the STDOUT/STDERR stream to the
new log file.

Testing:
- Pass the core test.
- Add new exhaustive test TestLogging::test_excessive_cerr.

Change-Id: I1b94727180354fe69989ebf3cd1a8f8cda1cf0c3
Reviewed-on: http://gerrit.cloudera.org:8080/17997
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-11-12 19:54:29 +00:00
Tim Armstrong
4fb8e8e324 IMPALA-8816: reduce custom cluster test runtime in core
This includes some optimisations and a bulk move of tests
to exhaustive.

Move a bunch of custom cluster tests to exhaustive. I selected
these partially based on runtime (i.e. I looked most carefully
at the tests that ran for over a minute) and the likelihood
of them catching a precommit bug.  Regression tests for specific
edge cases and tests for parts of the code that are very stable
were prime candidates.

Remove an unnecessary cluster restart in test_breakpad.

Merge test_scheduler_error into test_failpoints to avoid an unnecessary
cluster restart.

Speed up cluster starts by ensuring that the default statestore args are
applied even when _start_impala_cluster() is called directly. This
shaves a couple of seconds off each restart. We made the default args
use a faster update frequency - see IMPALA-7185 - but they did not
take effect in all tests.

Change-Id: Ib2e3e7ebc9695baec4d69183387259958df10f62
Reviewed-on: http://gerrit.cloudera.org:8080/13967
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-06 21:34:26 +00:00
Lars Volker
c1274fafb0 IMPALA-8191: Wait for additional breakpad processes during test
The Breakpad signal handler forks off a process to write a minidump.
During the breakpad tests we send signals to the Impala daemons and then
wait for all processes to go away. Prior to this change we did this by
waiting on the PID returned by process.get_pid(). It is determined by
iterating over psutil.get_pid_list() which is an ordered list of PIDs
running on the system. We return the first process in the list with a
matching command line. In cases where the PID space rolled over, this
could have been the forked off breakpad process and we'd wait on that
one. During the subsequent check that all processes are indeed gone, we
could then pick up the original Impala daemon that had forked off to
write the minidump and was still in the process of shutting down.

To fix this, we wait for every process twice. Processes are identified
by their command and iterating through them twice makes sure we catch
both the original daemon and its breakpad child.

This change also contains improvements to the logging of processes in
our tests. This should make it easier to identify similar issues in the
future.

Testing: I ran the breakpad tests in exhaustive mode. I didn't try to
exercise it around a PID roll-over, but we shouldn't see the issue in
IMPALA-8191 again.

Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Reviewed-on: http://gerrit.cloudera.org:8080/12501
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-20 04:06:17 +00:00
Tim Armstrong
f9ced753ba IMPALA-7999: clean up start-*d.sh scripts
Delete these wrapper scripts and replace with a generic
start-daemon.sh script that sets environment variables
without the other logic.

Move the logic for setting JAVA_TOOL_OPTIONS into
start-impala-cluster.py.

Remove some options like -jvm_suspend, -gdb, -perf that
may not be used. These can be reintroduced if needed.

Port across the kerberized minicluster logic (which has
probably bitrotted) in case it needs to be revived.

Remove --verbose option that didn't appear to be useful
(it claims to print daemon output to the console,
but output is still redirected regardless).

Removed a level of quoting in custom cluster test argument
handling - this was made unnecessary by properly escaping
arguments with pipes.escape() in run_daemon().

Testing:
* Ran exhaustive tests.
* Ran on CentOS 6 to confirm we didn't reintroduce Popen issue
  worked around by kwho.

Change-Id: Ib67444fd4def8da119db5d3a0832ef1de15b068b
Reviewed-on: http://gerrit.cloudera.org:8080/12271
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-05 13:10:08 +00:00
Lars Volker
b3318ad434 IMPALA-8114: Deflake test_breakpad.py
A test failed recently in a private build and it looked like the loop in
wait_for_num_processes had terminated too early. To make sure that the
forked-off processes that write the minidumps have actually started, we
now sleep for 1 second before entering the wait loop.

Change-Id: Ifcd1fbb498c475a1f186f490abaf90b47ecba05b
Reviewed-on: http://gerrit.cloudera.org:8080/12273
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-25 23:41:39 +00:00
Tim Armstrong
ff628d2b13 IMPALA-7986,IMPALA-7987: run daemons in docker containers
This refactors start-impala-cluster.py to allow multiple implementations
of the minicluster operations like start and stop. There are now
two classes implementing the same set of operations -
MiniClusterOperations and DockerMiniClusterOperations. The docker
versions start and stop the containers added in IMPALA-7948.

With some configuration (see instructions below), the containers can
connect back to services (HDFS, HMS, Kudu, Sentry, etc) running on the
host. Config generation was modified so that services optionally
communicate via the docker bridge network rather than loopback
(the host's loopback interface is not accessible to the containers).

Notes:
* I improved the container build to regenerate containers when cluster
  configs are regenerated (previously the containers could have stale
  configs).
* Switch from CMD to ENTRYPOINT to allow passing in arguments to "docker
  run" without clobbering default args.
* Python 2.6 is not supported for this code path. This only affects
  CentOS 6, which has limited support for docker anyway.
* I deferred implementing wait_for_cluster(), since the existing
  code requires surgery to abstract out assumptions about locating
  processes and web UI ports - see IMPALA-7988.

How to use:
==========
Create a docker network to use for internal cluster communication,
e.g.:
  docker network create -d bridge --gateway=172.17.0.1 \
      --subnet=172.17.0.1/16 impala-cluster

Add the gateway address of the docker network you created to
impala-config-local.sh, e.g.:

  export INTERNAL_LISTEN_HOST=172.17.0.1
  export DEFAULT_FS=hdfs://${INTERNAL_LISTEN_HOST}:20500

Regenerate configs and docker images:

  . bin/impala-config.sh
  ./bin/create-test-configuration.sh
  ninja -j $IMPALA_BUILD_THREADS docker_images

Restart the minicluster and Impala services to pick up the config:

  ./testdata/bin/run-all.sh
  start-impala-cluster.py --docker_network impala-cluster

You can connect with impala-shell and run some queries. You will
likely run into issues, particularly if running against an existing
data load, since "localhost" or "127.0.0.1" get baked into HMS
table definitions.

Testing:
Ran exhaustive tests (not using Docker) to make sure I didn't break
anything.

Change-Id: I5975cced33fa93df43101dd47d19b8af12e93d11
Reviewed-on: http://gerrit.cloudera.org:8080/12095
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-18 04:56:49 +00:00
Tim Armstrong
93ee538c54 IMPALA-7714: remove unsafe code from signal handlers
IMPALA-6271 added LOG statements to some signal handlers and an exit()
call to a different signal handler. These functions are not async-signal
safe.

The fixes are:
* Use the write system call directly. I tried using glog's RAW_LOG
  functionality but had major issues getting it to work.
* Call _exit() directly instead of exit() so that static destructors
  are not run. This is the same default behaviour as SIGTERM. This
  wasn't necessary to prevent this specific crash.
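
The two fixes have rough Python analogues, shown here only as an illustration of the idea and not as the C++ change itself: os.write() issues the write system call directly, and os._exit() terminates without running exit-time cleanup (atexit handlers stand in for static destructors in this sketch). The child script is made up for the demo and assumes a POSIX-like environment.

```python
import subprocess
import sys

# Child process: registers a cleanup handler, then exits via os._exit().
child_code = (
    "import atexit, os\n"
    "atexit.register(lambda: os.write(1, b'cleanup ran\\n'))\n"
    "os.write(1, b'shutting down\\n')\n"
    "os._exit(0)\n"
)

# Only the direct write appears; the exit-time handler is skipped.
output = subprocess.check_output([sys.executable, '-c', child_code])
```

Replacing os._exit(0) with sys.exit(0) in the child would also run the registered handler, which is the behaviour the fix avoids inside a signal handler.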

Testing:
Could reproduce the crash by looping
tests/custom_cluster/test_local_catalog.py until a minidump was
produced. After this fix it did not reproduce after looping for
a few hours.

Ran exhaustive build.

Change-Id: I52037d6510b9b34ec33d3a8b5492226aee1b9d92
Reviewed-on: http://gerrit.cloudera.org:8080/11777
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-10-26 01:50:16 +00:00
Pranay
f699a6ce83 IMPALA-6271: Impala daemon should log a message when it's being shut down
Currently Impalad does not log any message when SIGTERM is sent to
impalad to terminate or to do a graceful shut down. This change logs
a message when SIGTERM is received by impalad/catalogd/statestored.
This logging will assist in debugging the issues seen in the field
where impalad was not gracefully shut down (some other signal
was generated that led to impalad/catalogd/statestored crash).

Testing:
-------
a) Used kill to send signals to impalad/catalogd/statestored
   `kill -s SIGTERM <pid of impalad/catalogd/statestored>` and see the
   log message is being logged in impalad/catalogd/statestored.INFO.
b) Ran test_breakpad.py to check that existing breakpad functionalities
   are not affected.
c) Ran exhaustive tests without failure.
d) Added new test in test_breakpad.py to handle SIGTERM for
   impalad/statestored/catalogd.

Change-Id: Id20da9e30440b7348557beccb8a0da14775fcc29
Reviewed-on: http://gerrit.cloudera.org:8080/10847
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-29 00:08:24 +00:00
Lars Volker
6dc7237fc1 IMPALA-6387: Increase wait for Breakpad crash handling
It seems that a recent slowdown of our test infrastructure might have
caused Breakpad to take a longer time to write Minidumps. There could
also be a more fundamental issue leading to hangs. To rule this out,
this change increases the default timeout to something larger to allow
the tests to complete.

Change-Id: I84742be9af9444607fde4baf8ea1c0092ff181fe
Reviewed-on: http://gerrit.cloudera.org:8080/9018
Tested-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-01-12 17:22:56 +00:00
Michael Brown
bf29ec53c3 IMPALA-6049: breakpad tests: skip all tests with local filesystem
The breakpad tests were recently refactored to support inclusion of one
of them as a core test. In this refactor, we neglected to ensure
setup_class() called its parent. This means the skipping called in said
parent doesn't occur, and the test is executed in an unsupported
environment (local filesystem).

This patch fixes that by ensuring we call the parent setup_class() via
super().
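
The bug and the fix can be sketched with simplified classes. All names here (SkipTest, ParentSuite, BrokenChild, FixedChild, was_skipped) are hypothetical stand-ins, not the real test suite classes.

```python
class SkipTest(Exception):
    """Hypothetical stand-in for a test framework's skip exception."""


class ParentSuite(object):
    target_filesystem = 'local'

    @classmethod
    def setup_class(cls):
        # The parent decides whether the whole suite must be skipped.
        if cls.target_filesystem == 'local':
            raise SkipTest('not supported on local filesystem')


class BrokenChild(ParentSuite):
    @classmethod
    def setup_class(cls):
        # Bug: the parent's setup_class() is never called, so no skip happens.
        cls.ready = True


class FixedChild(ParentSuite):
    @classmethod
    def setup_class(cls):
        # Fix: run the parent's checks first.
        super(FixedChild, cls).setup_class()
        cls.ready = True


def was_skipped(suite):
    """Run setup_class() and report whether it requested a skip."""
    try:
        suite.setup_class()
    except SkipTest:
        return True
    return False
```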

Testing:

$ TARGET_FILESYSTEM="local" impala-py.test tests/custom_cluster/test_breakpad.py \
      -k test_abort_writes_minidump

tests/custom_cluster/test_breakpad.py::TestBreakpadCore::test_abort_writes_minidump
SKIPPED

Change-Id: Ib4a3ff29dd85c79c4c3b3e3afb699861e408aa95
Reviewed-on: http://gerrit.cloudera.org:8080/8272
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-10-14 01:02:08 +00:00
Lars Volker
f03900a805 IMPALA-6023: Fix broken breakpad test
We have a test to make sure that hitting a DCHECK will write a minidump.
We used to pass "-beeswax_port=1" to the server to trigger a DCHECK. A
while ago, this DCHECK seems to have been removed, but we still called
abort() if the ImpalaServer failed to start. This masked the slightly
altered behavior and the test still succeeded.

However, the fix for IMPALA-4786 changed the behavior to call exit(1)
instead of abort() if the ImpalaServer failed to start.

To fix the test, we change it to pass an unresolvable hostname to
impalad, which will result in a call to abort().

This change also splits the breakpad tests into core and exhaustive sets
to make sure that tests which depend on other parts of Impala are
included in every core run.

Change-Id: Ifb5af3e72963280a6677a99aa6a0e5785443bb0c
Reviewed-on: http://gerrit.cloudera.org:8080/8240
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
2017-10-10 23:08:44 +00:00
Lars Volker
dc2f69e5a0 IMPALA-5809: Relax max_minidumps in breakpad test
The change to address IMPALA-5769 added periodic cleaning for minidumps,
which got in the way of the other minidump tests.

This change sets max_minidumps to the default value (9) for all tests to
keep the cleanup thread from interfering, and then sets a smaller limit
where needed.

Change-Id: I977930ae87b8d4671a89c1e07ba76b12eb92fa55
Reviewed-on: http://gerrit.cloudera.org:8080/7716
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-18 04:51:30 +00:00
Lars Volker
294d42adc1 IMPALA-5769: Add periodic minidump cleanup
Minidumps can be written by sending SIGUSR1 to our daemon processes.
That way, an arbitrary number of minidump files can be created. This
change adds minidump cleanup to the periodic log file cleanup to
effectively bound the maximum number of minidumps we keep around.
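
Bounding the number of files kept could be sketched as follows. This is an illustrative Python version, not the actual C++ cleanup code; the helper name remove_old_files, the '.dmp' suffix filter and the demo directory are assumptions.

```python
import os
import shutil
import tempfile


def remove_old_files(directory, max_files, suffix='.dmp'):
    """Delete the oldest files so that at most max_files with suffix remain."""
    paths = [os.path.join(directory, name)
             for name in os.listdir(directory) if name.endswith(suffix)]
    paths.sort(key=os.path.getmtime)  # oldest first
    excess = len(paths) - max_files
    removed = paths[:excess] if excess > 0 else []
    for path in removed:
        os.remove(path)
    return removed


# Demo: create five files with increasing modification times, keep two.
demo_dir = tempfile.mkdtemp(prefix='minidump-sketch-')
for i in range(5):
    path = os.path.join(demo_dir, 'dump%d.dmp' % i)
    open(path, 'w').close()
    os.utime(path, (1000 + i, 1000 + i))

removed = remove_old_files(demo_dir, max_files=2)
remaining = sorted(os.listdir(demo_dir))
shutil.rmtree(demo_dir, ignore_errors=True)
```

Running such a function from the same periodic task that trims log files keeps the directory bounded no matter how often SIGUSR1 is sent.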

Change-Id: Ie02ff2271412d814f84a4ff42ccbca51d91bf980
Reviewed-on: http://gerrit.cloudera.org:8080/7605
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-16 08:41:16 +00:00
Lars Volker
b82957055c IMPALA-4737: Prevent SIGUSR1 from killing daemons when minidumps are disabled
If a user disabled minidumps before this change, we did not register the
signal handler for SIGUSR1 at all. Sending SIGUSR1 to a daemon would
subsequently kill it.

This change registers the SIG_IGN handler to ignore the signal if
minidumps are disabled.
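
The same idea in a short Python sketch (POSIX only, run from the main thread); the actual change is in the C++ daemons and is not shown here.

```python
import os
import signal

# Without a handler, the default action for SIGUSR1 terminates the process.
# Installing SIG_IGN makes the signal a no-op instead.
signal.signal(signal.SIGUSR1, signal.SIG_IGN)

# Sending the signal to ourselves is now harmless.
os.kill(os.getpid(), signal.SIGUSR1)
still_alive = True
```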

Change-Id: I13d866a2eec832500131954a7f693c33585ea51e
Reviewed-on: http://gerrit.cloudera.org:8080/7631
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-16 01:18:22 +00:00
Lars Volker
344c26aa29 IMPALA-5616: Add --enable_minidumps startup flag
If set to 'false', this flag will disable registration of the Breakpad
signal handlers during startup. The default value is 'true'. This does
not affect the ability to disable the handlers by specifying an empty
value for --minidump_path.

This change adds a test to test_breakpad.py.

Change-Id: Ie2039b9140e1c281810b27b76140e2105198bc37
Reviewed-on: http://gerrit.cloudera.org:8080/7541
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-02 01:32:06 +00:00
Lars Volker
5518cbcb78 IMPALA-5424: Ignore errors when removing minidumps folder
On developer machines it can happen that /tmp/minidumps does not exist
when test_minidump_relative_path gets executed. In this case errors from
rmtree should be ignored.
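
A minimal sketch of the behaviour, using a made-up path that is assumed not to exist:

```python
import os
import shutil
import tempfile

missing = os.path.join(tempfile.gettempdir(),
                       'hypothetical-minidumps-does-not-exist')

# By default rmtree raises when the directory is missing.
try:
    shutil.rmtree(missing)
    raised = False
except OSError:
    raised = True

# With ignore_errors=True the missing directory is silently tolerated.
shutil.rmtree(missing, ignore_errors=True)
removed_ok = not os.path.exists(missing)
```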

Change-Id: Ifab76a30898805d2df5e7452079a536d8747ac50
Reviewed-on: http://gerrit.cloudera.org:8080/7062
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-06-02 23:23:52 +00:00
Lars Volker
8afb59045e IMPALA-5187, IMPALA-5208: Bump Breakpad Version, undo IMPALA-3794
This change switches to a new Breakpad version, which includes fixes for
Breakpad bugs #681 and #728. The toolchain change was reviewed here:
https://gerrit.cloudera.org/6866

The change also undoes the workaround introduced in IMPALA-3794.

In addition to running test_breakpad.py in a loop for a while, I
verified that the test fails with the old toolchain version
(88e5b2) and works with the new one (ffe3e4).

To test #728 I added a sleep() call before SendContinueSignalToChild()
and then killed the parent process, manually observing that the child
would die, too.

Change-Id: Ic541ccd565f2bb51f68c085747fc47ae8c905d19
Reviewed-on: http://gerrit.cloudera.org:8080/6883
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-17 15:19:12 +00:00
Lars Volker
a827e9edc1 IMPALA-3794: Workaround for Breakpad ID conflicts
During process startup, Breakpad randomly determines the ID of the
minidump file to be written in case of a crash, seeding the generator
with the current system time at second granularity. If two impalads
start up within the same second, there is a chance for a name conflict.
The one second delay between starting impalads in
start-impala-cluster.py is not sufficient:

I0407 22:34:52.018563 28473 minidump.cc:245] Setting minidump size limit
to 20971520.
I0407 22:34:52.997046 28749 minidump.cc:245] Setting minidump size limit
to 20971520.

When sending a signal to all of them, one process can overwrite the
minidump of another one. This is an upstream issue and is tracked in
Breakpad-681. I further confirmed my suspicion by temporarily giving
each running instance of impalad its own output folder, after which I
was unable to reproduce the issue. However, fixing the underlying issue
is a cleaner solution than changing the folder locations for minidumps
in Impala.

Until this is fixed upstream, we can make sure that we see at least one
minidump for the group of impalads in the test cluster. It is not a
product defect, since we don't support running multiple impalads on a
single host, let alone starting them all at once.
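
The conflict can be reproduced in miniature with a Python sketch. The
function minidump_id is hypothetical and does not reflect Breakpad's actual
generator; it only assumes, as described above, a PRNG seeded with a
timestamp truncated to whole seconds:

```python
import random


def minidump_id(start_time_seconds):
    """Hypothetical ID generator seeded with a whole-second timestamp."""
    rng = random.Random(int(start_time_seconds))
    return "%032x" % rng.getrandbits(128)


# Start times from the two log lines above, as seconds since midnight.
first = minidump_id(22 * 3600 + 34 * 60 + 52.018563)
second = minidump_id(22 * 3600 + 34 * 60 + 52.997046)

# Both processes started within second 52, so the seeds are identical.
collision = first == second
```

Processes that start in different seconds get different seeds under this
assumption, which is why the conflict only appears intermittently.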

To test this I ran the following loop for about an hour on my dev
machine without hitting the issue:

while [ $? -eq 0 ]; do impala-py.test
tests/custom_cluster/test_breakpad.py --exploration_strategy=exhaustive
-k test_minidump_relative_path -x -s; done

Change-Id: I4ae589f6eb5cbbfb860943214edc0e6415eeb862
Reviewed-on: http://gerrit.cloudera.org:8080/6588
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-08 19:58:25 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Taras Bobrovytsky
609b80410e Clean up Python test import statements
Many of our test scripts have import statements that look like
"from xxx import *". It is a good practice to explicitly name what
needs to be imported. This commit implements this practice. Also,
unused import statements are removed.

Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-07-15 23:26:18 +00:00
Lars Volker
948a6c34fc IMPALA-3677: Write minidump on SIGUSR1
Sending SIGUSR1 to any of the impala daemons (catalogd, impalad,
statestored) will trigger a minidump write.

The HotSpot JVM also uses SIGUSR1 internally. However, the documentation
explains that existing signal handlers will be transparently wrapped by
the JVM and no spurious signals should be received by the daemon signal
handler:
http://www.oracle.com/technetwork/java/javase/signals-139944.html

Example: killall -SIGUSR1 catalogd

Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513
Reviewed-on: http://gerrit.cloudera.org:8080/3312
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-07-14 19:04:44 +00:00
Lars Volker
c69cd15a0a IMPALA-3656: Hitting DCHECK/CHECK does not write minidumps
When hitting a DCHECK/CHECK the daemons do not write minidumps. This is
caused by glog's own stack unwinding mechanism, which catches SIGABRT
and removes all other handlers before aborting.

This change bumps the glog version to include a patch that backports an
upstream glog change, which resets the SIGABRT handler only if it is the
one installed by glog itself.

cda16b3443

Change-Id: I08e6b83af1b4ff1b8c916fe6c9052b88b760e188
Reviewed-on: http://gerrit.cloudera.org:8080/3286
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2016-06-11 05:31:32 -07:00
Lars Volker
ca62ce65e9 IMPALA-3684, IMPALA-3693: Disable core files for breakpad tests
The breakpad tests were writing core files when triggering minidump
writes. This was actually not needed and interfered with test execution
and artifact collection. Most notably processes would take a long time
to terminate while writing core files (IMPALA-3684). The core files
would also be wrongly collected by Jenkins (IMPALA-3693).

This change adds code to stop test clusters reliably, making
test_breakpad independent from calling setup-impala-cluster.py via
os.system. It also disables core dumps for the duration of the test and
re-enables them afterwards.
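
One way to disable core dumps temporarily is sketched below using the
standard resource module. This is an assumption about the approach, not the
code added to test_breakpad.py, and the name core_dumps_disabled is
hypothetical. The limit applies to the current process and to children
started while it is in effect:

```python
import contextlib
import resource


@contextlib.contextmanager
def core_dumps_disabled():
    """Set the soft core-file size limit to 0 and restore it on exit."""
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (0, old_hard))
    try:
        yield
    finally:
        resource.setrlimit(resource.RLIMIT_CORE, (old_soft, old_hard))


before = resource.getrlimit(resource.RLIMIT_CORE)
with core_dumps_disabled():
    during = resource.getrlimit(resource.RLIMIT_CORE)
after = resource.getrlimit(resource.RLIMIT_CORE)
```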

Change-Id: If592339632aa662b59be09d911229566d5772321
Reviewed-on: http://gerrit.cloudera.org:8080/3339
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Silvius Rus <srus@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2016-06-09 17:31:00 -07:00
Lars Volker
d16e83214a IMPALA-3581: Change location of minidump folders to log_dir
Currently the default minidump location is /tmp/impala-minidumps, which can be wiped on
reboot on various distributions. This change moves the default location to
FLAGS_log_dir/minidumps/$daemon. The additional trailing $daemon folder is kept to prevent
name collisions in case of local test clusters and strangely configured installations.

For local test clusters the minidumps will be written to
$IMPALA_HOME/logs/cluster/minidumps/{catalogd,impalad,statestored}.
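
The resulting layout can be sketched in Python. The helper name and the
example log directory are hypothetical; only the minidumps/$daemon suffix
comes from the description above:

```python
import os


def default_minidump_dir(log_dir, daemon):
    """Build the default minidump folder for one daemon under log_dir."""
    return os.path.join(log_dir, "minidumps", daemon)


# "/opt/impala/logs/cluster" stands in for $IMPALA_HOME/logs/cluster.
paths = [
    default_minidump_dir("/opt/impala/logs/cluster", daemon)
    for daemon in ("catalogd", "impalad", "statestored")
]
```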

Change-Id: Idecf5a314bfb8b0870e8aa4819c4fb39a107702f
Reviewed-on: http://gerrit.cloudera.org:8080/3171
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:11 -07:00
Lars Volker
df8bf3a965 IMPALA-3490: Add flag to reduce minidump size
IMPALA-2686 added the breakpad library to all impala daemons, thus enabling them
to write minidump files. This change introduces a flag
'minidump_size_limit_hint_kb', which causes breakpad to reduce the amount of
thread stack memory it includes in a minidump, aiming to reduce the minidump
size during crashes with a lot of threads. Once a minidump is expected to
exceed the configured value, breakpad will include the full stack memory for the
first 20 threads, and afterwards capture only 2KB of stack memory for each
additional thread.
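
A rough Python estimate shows the effect on the stack portion of a minidump.
The thread count and the 64KB per-thread stack usage are made-up inputs; only
the 20-thread and 2KB figures come from the description above:

```python
FULL_STACK_THREADS = 20         # threads captured with full stack memory
REDUCED_STACK_BYTES = 2 * 1024  # bytes captured per additional thread


def estimated_stack_bytes(stack_sizes, limit_exceeded):
    """Estimate the stack memory included in a minidump.

    stack_sizes holds the in-use stack size of each thread in bytes.
    """
    if not limit_exceeded:
        return sum(stack_sizes)
    full = sum(stack_sizes[:FULL_STACK_THREADS])
    reduced = sum(
        min(size, REDUCED_STACK_BYTES)
        for size in stack_sizes[FULL_STACK_THREADS:]
    )
    return full + reduced


# Hypothetical process: 1000 threads, each using 64KB of stack.
stacks = [64 * 1024] * 1000
unrestricted = estimated_stack_bytes(stacks, limit_exceeded=False)
restricted = estimated_stack_bytes(stacks, limit_exceeded=True)
```

With these inputs the estimate drops from 65536000 bytes to 3317760 bytes.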

Change-Id: I2f3aa0df51be9f0bf0755fb288702911cdb88052
Reviewed-on: http://gerrit.cloudera.org:8080/2990
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:04 -07:00
Lars Volker
c9df348c38 IMPALA-2686: Add breakpad crash handler to all daemons
This change adds Breakpad crash handling support to catalogd, impalad,
and statestored. The destination folder for minidump files can be
configured via the 'minidump_path' command line flag. Leaving it empty
will disable minidump generation. The daemons will rotate minidump
files. The number of files to keep can be configured with the
'max_minidumps' command line flag.
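
The rotation can be sketched in Python as follows. This is an illustration
rather than the daemons' C++ implementation: the function name, the *.dmp
file pattern, ordering by modification time, and treating a non-positive
limit as "keep everything" are all assumptions:

```python
import glob
import os


def rotate_minidumps(minidump_dir, max_minidumps):
    """Delete the oldest *.dmp files so at most max_minidumps remain.

    Returns the list of removed paths.
    """
    if max_minidumps <= 0:
        # Assumption: a non-positive limit disables rotation.
        return []
    files = sorted(
        glob.glob(os.path.join(minidump_dir, "*.dmp")),
        key=os.path.getmtime,
    )
    excess = files[:-max_minidumps] if len(files) > max_minidumps else []
    for path in excess:
        os.remove(path)
    return excess
```

Calling rotate_minidumps(path, 2) on a folder with five minidumps would
remove the three oldest files and keep the two newest.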

Change-Id: I7a37a38488716ffe34296f3490ae291bbb7228d6
Reviewed-on: http://gerrit.cloudera.org:8080/2028
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:52 -07:00