impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Csaba Ringhofer	f98b697c7b	IMPALA-13929: Make 'functional-query' the default workload in tests This change adds get_workload() to ImpalaTestSuite and removes it from all test suites that already returned 'functional-query'. get_workload() is also removed from CustomClusterTestSuite which used to return 'tpch'. All other changes besides impala_test_suite.py and custom_cluster_test_suite.py are just mass removals of get_workload() functions. The behavior is only changed in custom cluster tests that didn't override get_workload(). By returning 'functional-query' instead of 'tpch', exploration_strategy() will no longer return 'core' in 'exhaustive' test runs. See IMPALA-3947 on why workload affected exploration_strategy. An example for affected test is TestCatalogHMSFailures which was skipped both in core and exhaustive runs before this change. get_workload() functions that return a different workload than 'functional-query' are not changed - it is possible that some of these also don't handle exploration_strategy() as expected, but individually checking these tests is out of scope in this patch. Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115 Reviewed-on: http://gerrit.cloudera.org:8080/22726 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-04-08 07:12:55 +00:00
Joe McDonnell	19678ae65c	IMPALA-13431: Deflake TestLogging.test_excessive_cerr_ignore_pid On a couple of UBSAN runs, test_excessive_cerr_ignore_pid sometimes fails to find the message providing the next path in the last line of the ERROR log file. The logs aren't preserved, so we don't know the exact contents of the log file. This does two things: 1. It changes the test to preserve the log file on failure by copying from the temporary directory to a directory that will last past the end of the test. This gives us data to work with if we see this again. 2. A theory is that an extra line or two of logging could go to the file after it writes the message with the next path. This changes the test to check the last 3 lines of the log file for the message providing the next path. Testing: - Ran test with UBSAN Change-Id: I4745184e983ee5669822059289aab18caf0b72a9 Reviewed-on: http://gerrit.cloudera.org:8080/21926 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-10-15 23:15:00 +00:00
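The "check the last 3 lines" idea from IMPALA-13431 can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper name `message_in_last_lines`, the sample log lines, and the message text are all hypothetical.

```python
def message_in_last_lines(lines, needle, num_lines=3):
    """Return True if `needle` appears in any of the last `num_lines` lines.

    Hypothetical helper: tolerates a line or two of extra logging written
    after the message we are looking for.
    """
    return any(needle in line for line in lines[-num_lines:])


# Made-up log contents for illustration only.
log_lines = [
    "E0101 startup",
    "E0101 next log path: /tmp/example.ERROR.2",
    "E0101 one extra line written after the message",
]

found_in_tail = message_in_last_lines(log_lines, "next log path")
found_in_last_only = message_in_last_lines(log_lines, "next log path", num_lines=1)
```

Checking only the final line (`num_lines=1`) misses the message here, which is the kind of flakiness the commit theorizes about.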
Riza Suminto	9c87cf41bf	IMPALA-13396: Unify tmp dir management in CustomClusterTestSuite There are many custom cluster tests that require creating a temporary directory. The temporary directory typically lives within the scope of a test method and is cleaned up afterwards. However, some tests create a temporary directory directly and forget to clean it up afterwards, leaving junk dirs under /tmp/ or $LOG_DIR. This patch unifies the temporary directory management inside CustomClusterTestSuite. It introduces a new 'tmp_dir_placeholders' arg in CustomClusterTestSuite.with_args() that lists tmp dirs to create. 'impalad_args', 'catalogd_args', and 'impala_log_dir' now accept a formatting pattern that is replaceable by a temporary dir path, defined through 'tmp_dir_placeholders'. There are a few occurrences where mkdtemp is called and not replaceable by this work, such as tests/comparison/cluster.py. In those cases, this patch changes them to supply a prefix arg so that developers know that the directory comes from an Impala test script. This patch also addresses several flake8 errors in modified files. Testing: - Pass custom cluster tests in exhaustive mode. - Manually ran a few modified tests and observed that the temporary dirs are created and removed under logs/custom_cluster_tests/ as the tests go. Change-Id: I8dd665e8028b3f03e5e33d572c5e188f85c3bdf5 Reviewed-on: http://gerrit.cloudera.org:8080/21836 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-10-02 01:25:39 +00:00
Riza Suminto	3381fbf761	IMPALA-12595: Allow automatic removal of old logs from previous PID IMPALA-11184 added code to target a specific PID for log rotation. This aligns with glog behavior and provides safety. That is, it strictly limits log rotation to only consider log files made by the currently running Impalad and excludes logs made by a previous PID or other living colocated Impalads. The downside of this limit is that logs can start to accumulate on a node when impalad is frequently restarted, which is only resolvable by an admin doing manual log removal. To help avoid this manual removal, this patch adds a backend flag 'log_rotation_match_pid' that relaxes the limit by dropping the PID in the glob pattern. The default value for this new flag is False. However, for testing purposes, start-impala-cluster.py will override it to True since the test minicluster logs to a common log directory. Setting 'log_rotation_match_pid' to True will prevent one impalad from interfering with log rotation of other impalads in the minicluster. As a minimum exercise for this new log rotation behavior, test_breakpad.py::TestLogging is modified to invoke start-impala-cluster.py with 'log_rotation_match_pid' set to False. Testing: - Add test_excessive_cerr_ignore_pid and test_excessive_cerr_match_pid. - Split TestLogging into two. One runs test_excessive_cerr_ignore_pid in core exploration, while the other runs the rest of the logging tests in exhaustive exploration. - Pass exhaustive tests. Change-Id: I599799e73f27f941a1d7f3dec0f40b4f05ea5ceb Reviewed-on: http://gerrit.cloudera.org:8080/20754 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-12-09 03:34:57 +00:00
Joe McDonnell	c233634d74	IMPALA-11975: Fix Dictionary methods to work with Python 3 Python 3 made the main dictionary methods lazy (items(), keys(), values()). This means that code that uses those methods may need to wrap the call in list() to get a list immediately. Python 3 also removed the old iter* lazy variants. This changes all locations to use Python 3 dictionary methods and wraps calls with list() appropriately. This also changes all iteritems(), itervalues(), iterkeys() locations to items(), values(), keys(), etc. Python 2 will not use the lazy implementation of these, so there is a theoretical performance impact. Our python code is mostly for tests and the performance impact is minimal. Python 2 will be deprecated when Python 3 is functional. This addresses these pylint warnings: dict-iter-method dict-keys-not-iterating dict-values-not-iterating Testing: - Ran core tests Change-Id: Ie873ece54a633a8a95ed4600b1df4be7542348da Reviewed-on: http://gerrit.cloudera.org:8080/19590 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
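A small sketch of why the list() wrapping described in IMPALA-11975 matters on Python 3. The dictionary and variable names are made up for illustration and are not taken from the patch.

```python
# Hypothetical data for illustration; not from the Impala test suite.
counts = {"a": 1, "b": 2}

# On Python 3, keys() returns a lazy view that tracks the dictionary.
# Wrapping it in list() takes an immediate snapshot.
keys_snapshot = list(counts.keys())

# Iterating over a snapshot makes it safe to delete entries in the loop;
# deleting while iterating over the live view would raise RuntimeError.
for key in list(counts.keys()):
    if counts[key] > 1:
        del counts[key]

remaining = sorted(counts.items())
```

The snapshot still holds both keys after the loop, while the dictionary itself has shrunk.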
Joe McDonnell	eb66d00f9f	IMPALA-11974: Fix lazy list operators for Python 3 compatibility Python 3 changes list operators such as range, map, and filter to be lazy. Some code that expects the list operators to happen immediately will fail. e.g. Python 2: range(0,5) == [0,1,2,3,4] True Python 3: range(0,5) == [0,1,2,3,4] False The fix is to wrap locations with list(). i.e. Python 3: list(range(0,5)) == [0,1,2,3,4] True Since the base operators are now lazy, Python 3 also removes the old lazy versions (e.g. xrange, ifilter, izip, etc). This uses future's builtins package to convert the code to the Python 3 behavior (i.e. xrange -> future's builtins.range). Most of the changes were done via these futurize fixes: - libfuturize.fixes.fix_xrange_with_import - lib2to3.fixes.fix_map - lib2to3.fixes.fix_filter This eliminates the pylint warnings: - xrange-builtin - range-builtin-not-iterating - map-builtin-not-iterating - zip-builtin-not-iterating - filter-builtin-not-iterating - reduce-builtin - deprecated-itertools-function Testing: - Ran core job Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f Reviewed-on: http://gerrit.cloudera.org:8080/19589 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
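The range comparison quoted in IMPALA-11974 extends to map and filter in the same way. A minimal sketch, assuming Python 3; the variable names are illustrative only.

```python
# On Python 3, range() returns a lazy range object, not a list.
lazy = range(0, 5)
lazy_equals_list = (lazy == [0, 1, 2, 3, 4])

# Wrapping in list() forces evaluation and restores list semantics.
as_list = list(range(0, 5))

# map() and filter() are lazy as well, so they get the same treatment.
doubled = list(map(lambda x: x * 2, as_list))
evens = list(filter(lambda x: x % 2 == 0, as_list))
```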
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
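The division change in IMPALA-11973 can be illustrated with a short sketch. It assumes Python 3 semantics; on Python 2 the same behavior would require the "from __future__ import division" line named in the commit at the top of the file. The values and names are hypothetical.

```python
# Hypothetical values for illustration.
num_rows = 7
num_batches = 2

# '/' is true division under Python 3 (or Python 2 with the __future__
# import), so the result is a float.
average = num_rows / num_batches

# '//' is floor division; use it where an integer result is needed,
# for example for indices or record counts.
rows_per_batch = num_rows // num_batches
```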
Joe McDonnell	ba3518366a	IMPALA-11952 (part 4): Fix odds and ends: Octals, long, lambda, etc. There are a variety of small python 3 syntax differences: - Octal constants need to start with 0o rather than just 0 - Long constants are not supported (i.e. numbers ending with L) - Lambda syntax is slightly different - The 'ur' string mode is no longer supported Testing: - check-python-syntax.sh now passes Change-Id: Ie027a50ddf6a2a0db4b34ec9b49484ce86947f20 Reviewed-on: http://gerrit.cloudera.org:8080/19554 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2023-02-28 17:11:50 +00:00
Joe McDonnell	2b550634d2	IMPALA-11952 (part 2): Fix print function syntax Python 3 now treats print as a function and requires the parenthesis in invocation. print "Hello World!" is now: print("Hello World!") This fixes all locations to use the function invocation. This is more complicated when the output is being redirected to a file or when avoiding the usual newline. print >> sys.stderr , "Hello World!" is now: print("Hello World!", file=sys.stderr) To support this properly and guarantee equivalent behavior between python 2 and python 3, all files that use print now add this import: from __future__ import print_function This also fixes random flake8 issues that intersect with the changes. Testing: - check-python-syntax.sh shows no errors related to print Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351 Reviewed-on: http://gerrit.cloudera.org:8080/19552 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-02-28 17:11:50 +00:00
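A short sketch of the two print cases mentioned in IMPALA-11952 (part 2): redirecting output to a file-like object and suppressing the trailing newline. It writes to an in-memory buffer instead of sys.stderr so the result can be inspected; the buffer is an illustrative choice, not something from the patch.

```python
import io

# In-memory stand-in for a stream such as sys.stderr.
buf = io.StringIO()

# Python 2: print >> stream, "Hello World!"
print("Hello World!", file=buf)

# Python 2: print "no newline",   (trailing comma)
print("no newline", end="", file=buf)

output = buf.getvalue()
```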
Michael Smith	f4b2ef5a00	IMPALA-11275: log thread info during minidump Writes ThreadDebugInfo to stdout/stderr when a minidump is generated to capture thread and query details related to the dump. Example message: > Minidump in thread [1790536]async-exec-thread running query 1a47cc1e2df94cb4:88dfa08200000000, fragment instance 0000000000000000:0000000000000000 Refactors DumpCallback so that repeated writes to STDOUT/STDERR are less redundant. Adds unit tests to run with ThreadDebugInfo. Removes the 'static' prefix from DumpCallback so it can be invoked from unit tests, but doesn't add it to the header as it's intended to be for internal use. Testing: - Added crash to Coordinator::Exec and manually tested dump handling. - Added a new unit test for DumpCallback. - Ran tests/custom_cluster/test_breakpad.py to verify nothing broke in refactor. Those tests don't have ThreadDebugInfo available. Change-Id: Iea2bdf10db29a0f8ccbe5e767b708781d42a9b8a Reviewed-on: http://gerrit.cloudera.org:8080/18508 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-05-13 21:09:57 +00:00
Riza Suminto	87f3dc294f	IMPALA-11184: Use ProgramInvocationShortName for log glob pattern Impala uses FLAGS_log_filename as log symlink name and as part of the glob pattern during log rotation. The user will not set this flag in most cases, and it will default to google::ProgramInvocationShortName(). But if the user sets a custom value to this flag, the glob pattern will mistakenly target the symlink instead of the actual log files. This leads to wrong behavior of DeleteOldLogs() and GetLatestCanonicalLogPath(). This patch replaces FLAGS_log_filename with google::ProgramInvocationShortName() in the glob pattern. Testing: - Pass simple-logger-test - Pass exhaustive test_breakpad.py::TestLogging Change-Id: I6c71bdb67f70c571d18fb8630d4a816ab75686fa Reviewed-on: http://gerrit.cloudera.org:8080/18326 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-03-18 00:39:40 +00:00
Riza Suminto	8eeb000c35	IMPALA-11152: Remove dependence on symlink when rotating logs IMPALA-5256 implements log rotation by following glog's symlink and checking the size of the pointed file. While this has been robust most of the time, there can be a rare situation where the symlink is missing. Glog itself does not guarantee that the symlink creation will always be successful. It won't retry symlink creation until the next rotation by glog. The side effect of this issue is that impala::CheckLogSize() will spam the ERROR log every second for not finding the symlink. This patch removes the dependence on the glog symlink for this log rotation. We now directly specify the base file name of the targeted log kind and pick the latest log path. This patch also makes impala::CheckLogSize() less chatty by printing an error message for every FLAGS_logbufsecs (default is 30s). Testing: - Add test_breakpad.py::TestLogging::test_excessive_cerr_no_symlink. - Pass test_breakpad.py in exhaustive exploration. Change-Id: I30509e98038dbf9ca293144089f6ee92ce186a97 Reviewed-on: http://gerrit.cloudera.org:8080/18286 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-03-05 03:39:08 +00:00
Riza Suminto	b692a92fa2	IMPALA-5256: Force log rotation when max_log_size exceeded Impala daemons allow STDOUT/STDERR redirection into INFO/ERROR log respectively through the redirect_stdout_stderr startup flag. If redirect_stdout_stderr is true, daemons redirect the STDOUT/STDERR stream to write into the log file symlink created by glog. There are two problems with this approach: 1. Glog updates the symlink to point to the new log file when it does log rotation. However, Impala is not aware that the symlink points to a different file. So cout/cerr writes still go to the oldest log file. 2. When there is a lot of write activity to cout/cerr, the log file can grow big. However, glog is not aware of STDOUT/STDERR activity. It only counts the message bytes written to glog (LOG(INFO), LOG(ERROR)). Thus, it only uses its internal bytes count when deciding to rotate the logs. This commit addresses the issue by monitoring the log file size every second. If Impala sees that the log file has exceeded max_log_size, it will call google::FlushLogFiles(), ahead of logbufsecs. If the log file stays big after the flush, we will force glog to rotate the log. Since there is no direct way to force glog to rotate, we do this by changing the log extension to a random extension through google::SetLogFilenameExtension(), and immediately returning it to extensionless (empty string extension). We also check periodically whether the log file symlink has pointed to a new file. If it has changed, we reattach the STDOUT/STDERR stream to the new log file. Testing: - Pass the core test. - Add new exhaustive test TestLogging::test_excessive_cerr. Change-Id: I1b94727180354fe69989ebf3cd1a8f8cda1cf0c3 Reviewed-on: http://gerrit.cloudera.org:8080/17997 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-11-12 19:54:29 +00:00
Tim Armstrong	4fb8e8e324	IMPALA-8816: reduce custom cluster test runtime in core This includes some optimisations and a bulk move of tests to exhaustive. Move a bunch of custom cluster tests to exhaustive. I selected these partially based on runtime (i.e. I looked most carefully at the tests that ran for over a minute) and the likelihood of them catching a precommit bug. Regression tests for specific edge cases and tests for parts of the code that are very stable were prime candidates. Remove an unnecessary cluster restart in test_breakpad. Merge test_scheduler_error into test_failpoints to avoid an unnecessary cluster restart. Speed up cluster starts by ensuring that the default statestore args are applied even when _start_impala_cluster() is called directly. This shaves a couple of seconds off each restart. We made the default args use a faster update frequency - see IMPALA-7185 - but they did not take effect in all tests. Change-Id: Ib2e3e7ebc9695baec4d69183387259958df10f62 Reviewed-on: http://gerrit.cloudera.org:8080/13967 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-08-06 21:34:26 +00:00
Lars Volker	c1274fafb0	IMPALA-8191: Wait for additional breakpad processes during test The Breakpad signal handler forks off a process to write a minidump. During the breakpad tests we send signals to the Impala daemons and then wait for all processes to go away. Prior to this change we did this by waiting on the PID returned by process.get_pid(). It is determined by iterating over psutil.get_pid_list() which is an ordered list of PIDs running on the system. We return the first process in the list with a matching command line. In cases where the PID space rolled over, this could have been the forked off breakpad process and we'd wait on that one. During the subsequent check that all processes are indeed gone, we could then pick up the original Impala daemon that had forked off to write the minidump and was still in the process of shutting down. To fix this, we wait for every process twice. Processes are identified by their command and iterating through them twice makes sure we catch both the original daemon and its breakpad child. This change also contains improvements to the logging of processes in our tests. This should make it easier to identify similar issues in the future. Testing: I ran the breakpad tests in exhaustive mode. I didn't try to exercise it around a PID roll-over, but we shouldn't see the issue in IMPALA-8191 again. Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca Reviewed-on: http://gerrit.cloudera.org:8080/12501 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-02-20 04:06:17 +00:00
Tim Armstrong	f9ced753ba	IMPALA-7999: clean up start-d.sh scripts Delete these wrapper scripts and replace with a generic start-daemon.sh script that sets environment variables without the other logic. Move the logic for setting JAVA_TOOL_OPTIONS into start-impala-cluster.py. Remove some options like -jvm_suspend, -gdb, -perf that may not be used. These can be reintroduced if needed. Port across the kerberized minicluster logic (which has probably bitrotted) in case it needs to be revived. Remove --verbose option that didn't appear to be useful (it claims to print daemon output to the console, but output is still redirected regardless). Removed a level of quoting in custom cluster test argument handling - this was made unnecessary by properly escaping arguments with pipes.escape() in run_daemon(). Testing: Ran exhaustive tests. * Ran on CentOS 6 to confirm we didn't reintroduce Popen issue worked around by kwho. Change-Id: Ib67444fd4def8da119db5d3a0832ef1de15b068b Reviewed-on: http://gerrit.cloudera.org:8080/12271 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-02-05 13:10:08 +00:00
Lars Volker	b3318ad434	IMPALA-8114: Deflake test_breakpad.py A test failed recently in a private build and it looked like the loop in wait_for_num_processes had terminated too early. To make sure that the forked-off processes that write the minidumps have actually started, we now sleep for 1 second before entering the wait loop. Change-Id: Ifcd1fbb498c475a1f186f490abaf90b47ecba05b Reviewed-on: http://gerrit.cloudera.org:8080/12273 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-25 23:41:39 +00:00
Tim Armstrong	ff628d2b13	IMPALA-7986,IMPALA-7987: run daemons in docker containers This refactors start-impala-cluster.py to allow multiple implementations of the minicluster operations like start and stop. There are now two classes implementing the same set of operations - MiniClusterOperations and DockerMiniClusterOperations. The docker versions start and stop the containers added in IMPALA-7948. With some configuration (see instructions below), the containers can connect back to services (HDFS, HMS, Kudu, Sentry, etc) running on the host. Config generation was modified so that services optionally communicate via the docker bridge network rather than loopback (the host's loopback interface is not accessible to the containers). Notes: * I improved the container build to regenerate containers when cluster configs are regenerated (previously the containers could have stale configs). * Switch from CMD to ENTRYPOINT to allow passing in arguments to "docker run" without clobbering default args. * Python 2.6 is not supported for this code path. This only affects CentOS 6, which has limited support for docker anyway. * I deferred implementing wait_for_cluster(), since the existing code requires surgery to abstract out assumptions about locating processes and web UI ports - see IMPALA-7988. How to use: ========== Create a docker network to use for internal cluster communication, e.g.: docker network create -d bridge --gateway=172.17.0.1 \ --subnet=172.17.0.1/16 impala-cluster Add the gateway address of the docker network you created to impala-config-local.sh, e.g.: export INTERNAL_LISTEN_HOST=172.17.0.1 export DEFAULT_FS=hdfs://${INTERNAL_LISTEN_HOST}:20500 Regenerate configs and docker images: . bin/impala-config.sh ./bin/create-test-configuration.sh ninja -j $IMPALA_BUILD_THREADS docker_images Restart the minicluster and Impala services to pick up the config: ./testdata/bin/run-all.sh start-impala-cluster.py --docker_network impala-cluster You can connect with impala-shell and run some queries. You will likely run into issues, particularly if running against an existing data load, since "localhost" or "127.0.0.1" get baked into HMS table definitions. Testing: Ran exhaustive tests (not using Docker) to make sure I didn't break anything. Change-Id: I5975cced33fa93df43101dd47d19b8af12e93d11 Reviewed-on: http://gerrit.cloudera.org:8080/12095 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-18 04:56:49 +00:00
Tim Armstrong	93ee538c54	IMPALA-7714: remove unsafe code from signal handlers IMPALA-6271 added LOG statements to some signal handlers and an exit() call to a different signal handler. These functions are not async-signal safe. The fixes are: * Use the write system call directly. I tried using glog's RAW_LOG functionality but had major issues getting it to work. * Call _exit() directly instead of exit() so that static destructors are not run. This is the same default behaviour as SIGTERM. This wasn't necessary to prevent this specific crash. Testing: Could reproduce the crash by looping tests/custom_cluster/test_local_catalog.py until a minidump was produced. After this fix it did not reproduce after looping for a few hours. Ran exhaustive build. Change-Id: I52037d6510b9b34ec33d3a8b5492226aee1b9d92 Reviewed-on: http://gerrit.cloudera.org:8080/11777 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2018-10-26 01:50:16 +00:00
Pranay	f699a6ce83	IMPALA-6271: Impala daemon should log a message when it's being shut down Currently Impalad does not log any message when SIGTERM is sent to impalad to terminate or to do a graceful shut down. This change logs a message when SIGTERM is received by impalad/catalogd/statestored. This logging will assist in debugging the issues seen in the field where impalad was not gracefully shut down (some other signal was generated that led to impalad/catalogd/statestored crash). Testing: ------- a) Used kill to send signals to impalad/catalogd/statestored `kill -s SIGTERM <pid of impalad/catalogd/statestored>` and see the log message is being logged in impalad/catalogd/statestored.INFO. b) Ran test_breakpad.py to check that existing breakpad functionalities are not affected. c) Ran exhaustive tests without failure. d) Added new test in test_breakpad.py to handle SIGTERM for impalad/statestored/catalogd. Change-Id: Id20da9e30440b7348557beccb8a0da14775fcc29 Reviewed-on: http://gerrit.cloudera.org:8080/10847 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-09-29 00:08:24 +00:00
Lars Volker	6dc7237fc1	IMPALA-6387: Increase wait for Breakpad crash handling It seems that a recent slowdown of our test infrastructure might have caused Breakpad to take a longer time to write Minidumps. There could also be a more fundamental issue leading to hangs. To rule this out, this change increases the default timeout to something larger to allow the tests to complete. Change-Id: I84742be9af9444607fde4baf8ea1c0092ff181fe Reviewed-on: http://gerrit.cloudera.org:8080/9018 Tested-by: Lars Volker <lv@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2018-01-12 17:22:56 +00:00
Michael Brown	bf29ec53c3	IMPALA-6049: breakpad tests: skip all tests with local filesystem The breakpad tests were recently refactored to support inclusion of one of them as a core test. In this refactor, we neglected to ensure setup_class() called its parent. This means the skipping called in said parent doesn't occur, and the test is executed in an unsupported environment (local filesystem). This patch fixes that by ensuring we call the parent setup_class() via super(). Testing: $ TARGET_FILESYSTEM="local" impala-py.test tests/custom_cluster/test_breakpad.py \ -k test_abort_writes_minidump tests/custom_cluster/test_breakpad.py::TestBreakpadCore::test_abort_writes_minidump SKIPPED Change-Id: Ib4a3ff29dd85c79c4c3b3e3afb699861e408aa95 Reviewed-on: http://gerrit.cloudera.org:8080/8272 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-14 01:02:08 +00:00
Lars Volker	f03900a805	IMPALA-6023: Fix broken breakpad test We have a test to make sure that hitting a DCHECK will write a minidump. We used to pass "-beeswax_port=1" to the server to trigger a DCHECK. A while ago, this DCHECK seems to have been removed, but we still called abort() if the ImpalaServer failed to start. This masked the slightly altered behavior and the test still succeeded. However, the fix for IMPALA-4786 changed the behavior to call exit(1) instead of abort() if the ImpalaServer failed to start. To fix the test, we change it to pass an unresolvable hostname to impalad, which will result in a call to abort(). This change also splits the breakpad tests into core and exhaustive sets to make sure that tests which depend on other parts of Impala are included in every core run. Change-Id: Ifb5af3e72963280a6677a99aa6a0e5785443bb0c Reviewed-on: http://gerrit.cloudera.org:8080/8240 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Impala Public Jenkins	2017-10-10 23:08:44 +00:00
Lars Volker	dc2f69e5a0	IMPALA-5809: Relax max_minidumps in breakpad test The change to address IMPALA-5769 added periodic cleaning for minidumps, which got in the way of the other minidump tests. This change sets max_minidumps to the default value (9) for all tests to keep the cleanup thread from interfering, and then sets a smaller limit where needed. Change-Id: I977930ae87b8d4671a89c1e07ba76b12eb92fa55 Reviewed-on: http://gerrit.cloudera.org:8080/7716 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-08-18 04:51:30 +00:00
Lars Volker	294d42adc1	IMPALA-5769: Add periodic minidump cleanup Minidumps can be written by sending SIGUSR1 to our daemon processes. That way, an arbitrary number of minidump files can be created. This change adds minidump cleanup to the periodic log file cleanup to effectively bound the maximum number of minidumps we keep around. Change-Id: Ie02ff2271412d814f84a4ff42ccbca51d91bf980 Reviewed-on: http://gerrit.cloudera.org:8080/7605 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-08-16 08:41:16 +00:00
Lars Volker	b82957055c	IMPALA-4737: Prevent SIGUSR1 from killing daemons when minidumps are disabled If a user disabled minidumps before this change, we did not register the signal handler for SIGUSR1 at all. Sending SIGUSR1 to a daemon would subsequently kill it. This change registers the SIG_IGN handler to ignore the signal if minidumps are disabled. Change-Id: I13d866a2eec832500131954a7f693c33585ea51e Reviewed-on: http://gerrit.cloudera.org:8080/7631 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-08-16 01:18:22 +00:00
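The effect of registering SIG_IGN, as described in IMPALA-4737, can be sketched with Python's standard signal module. This is an analogy only: the actual change is in the Impala daemons, not in Python, and the sketch assumes a POSIX platform where SIGUSR1 exists and that the code runs in the main thread.

```python
import os
import signal

# Assumption: POSIX platform, main thread. With no handler installed, the
# default action for SIGUSR1 terminates the process. Registering SIG_IGN
# makes the process discard the signal instead.
signal.signal(signal.SIGUSR1, signal.SIG_IGN)

# Send SIGUSR1 to ourselves; because it is ignored, execution continues.
os.kill(os.getpid(), signal.SIGUSR1)
survived = True

current_handler = signal.getsignal(signal.SIGUSR1)
```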
Lars Volker	344c26aa29	IMPALA-5616: Add --enable_minidumps startup flag If set to 'false', this flag will disable registration of the Breakpad signal handlers during startup. The default value is 'true'. This does not affect the ability to disable the handlers by specifying an empty value for --minidump_path. This change adds a test to test_breakpad.py. Change-Id: Ie2039b9140e1c281810b27b76140e2105198bc37 Reviewed-on: http://gerrit.cloudera.org:8080/7541 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-08-02 01:32:06 +00:00
Lars Volker	5518cbcb78	IMPALA-5424: Ignore errors when removing minidumps folder On developer machines it can happen that /tmp/minidumps does not exist when test_minidump_relative_path gets executed. In this case errors from rmtree should be ignored. Change-Id: Ifab76a30898805d2df5e7452079a536d8747ac50 Reviewed-on: http://gerrit.cloudera.org:8080/7062 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-06-02 23:23:52 +00:00
Lars Volker	8afb59045e	IMPALA-5187, IMPALA-5208: Bump Breakpad Version, undo IMPALA-3794 This change switches to a new Breakpad version, which includes fixes for Breakpad bugs #681 and #728. The toolchain change was reviewed here: https://gerrit.cloudera.org/6866 The change also undoes the workaround introduced in IMPALA-3794. In addition to running test_breakpad.py in a loop for a while, I tested Then I verified that the test fails with the old toolchain version (88e5b2) and works with the new one (ffe3e4). To test #728 I added a sleep() call before SendContinueSignalToChild() and then killed the parent process, manually observing that the child would die, too. Change-Id: Ic541ccd565f2bb51f68c085747fc47ae8c905d19 Reviewed-on: http://gerrit.cloudera.org:8080/6883 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-05-17 15:19:12 +00:00
Lars Volker	a827e9edc1	IMPALA-3794: Workaround for Breakpad ID conflicts Breakpad determines the ID of the minidump file to be written in case of a crash during startup of the process randomly, seeded with the current system time with second granularity. If two impalads start up within the same second, there is a chance for a name conflict. The one second delay between starting impalads in start-impala-cluster.py is not sufficient: I0407 22:34:52.018563 28473 minidump.cc:245] Setting minidump size limit to 20971520. I0407 22:34:52.997046 28749 minidump.cc:245] Setting minidump size limit to 20971520. When sending a signal to all of them, one process can overwrite the minidump of another one. This is an upstream issue and is tracked in Breakpad-681. I further confirmed my suspicion by temporarily giving each running instance of impalad its own output folder and was then unable to reproduce the issue. However, it is a cleaner solution to fix the underlying issue than to change the folder locations for minidumps in impala. Until this is fixed upstream, we can make sure that we see at least one minidump for the group of impalads in the test cluster. It is not a product defect, since we don't support running multiple impalads on a single host, let alone starting them all at once. To test this I ran the following loop for about an hour on my dev machine without hitting the issue: while [ $? -eq 0 ]; do impala-py.test tests/custom_cluster/test_breakpad.py --exploration_strategy=exhaustive -k test_minidump_relative_path -x -s; done Change-Id: I4ae589f6eb5cbbfb860943214edc0e6415eeb862 Reviewed-on: http://gerrit.cloudera.org:8080/6588 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-04-08 19:58:25 +00:00
Dan Hecht	ffa7829b70	IMPALA-3918: Remove Cloudera copyrights and add ASF license header For files that have a Cloudera copyright (and no other copyright notice), make changes to follow the ASF source file header policy here: http://www.apache.org/legal/src-headers.html#headers Specifically: 1) Remove the Cloudera copyright. 2) Modify NOTICE.txt according to http://www.apache.org/legal/src-headers.html#notice to follow that format and add a line for Cloudera. 3) Replace or add the existing ASF license text with the one given on the website. Much of this change was automatically generated via: git grep -li 'Copyright.Cloudera' > modified_files.txt cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.Cloudera#i;' cat modified_files.txt | xargs fix_apache_license.py [1] Some manual fixups were performed following those steps, especially when license text was completely missing from the file. [1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor modification to ORIG_LICENSE to match Impala's license text. Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86 Reviewed-on: http://gerrit.cloudera.org:8080/3779 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-08-09 08:19:41 +00:00
Taras Bobrovytsky	609b80410e	Clean up Python test import statements Many of our test scripts have import statements that look like "from xxx import *". It is a good practice to explicitly name what needs to be imported. This commit implements this practice. Also, unused import statements are removed. Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8 Reviewed-on: http://gerrit.cloudera.org:8080/3444 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Internal Jenkins	2016-07-15 23:26:18 +00:00
Lars Volker	948a6c34fc	IMPALA-3677: Write minidump on SIGUSR1 Sending SIGUSR1 to any of the impala daemons (catalogd, impalad, statestored) will trigger a minidump write. The hotspot JVM also uses SIGUSR1 internally. However, the documentation explains that existing signal handlers will be transparently wrapped by the JVM and no spurious signals should be received by the daemon signal handler: http://www.oracle.com/technetwork/java/javase/signals-139944.html Example: killall -SIGUSR1 catalogd Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Reviewed-on: http://gerrit.cloudera.org:8080/3312 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Internal Jenkins	2016-07-14 19:04:44 +00:00
Lars Volker	c69cd15a0a	IMPALA-3656: Hitting DCHECK/CHECK does not write minidumps When hitting a DCHECK/CHECK the daemons do not write minidumps. This is caused by glog's own stack unwinding mechanism, which catches SIGABRT and removes all other handlers before aborting. This change bumps the glog version to include a patch that backports a change from glog, which resets the SIGABRT handler only if it is the one installed by glog itself. `cda16b3443` Change-Id: I08e6b83af1b4ff1b8c916fe6c9052b88b760e188 Reviewed-on: http://gerrit.cloudera.org:8080/3286 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Lars Volker <lv@cloudera.com>	2016-06-11 05:31:32 -07:00
Lars Volker	ca62ce65e9	IMPALA-3684, IMPALA-3693: Disable core files for breakpad tests The breakpad tests were writing core files when triggering minidump writes. This was actually not needed and interfered with test execution and artifact collection. Most notably processes would take a long time to terminate while writing core files (IMPALA-3684). The core files would also be wrongly collected by Jenkins (IMPALA-3693). This change adds code to stop test clusters reliably, making test_breakpad independent from calling setup-impala-cluster.py via os.system. It also disables core dumps for the duration of the test and re-enables them afterwards. Change-Id: If592339632aa662b59be09d911229566d5772321 Reviewed-on: http://gerrit.cloudera.org:8080/3339 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Silvius Rus <srus@cloudera.com> Tested-by: Lars Volker <lv@cloudera.com>	2016-06-09 17:31:00 -07:00
Lars Volker	d16e83214a	IMPALA-3581: Change location of minidump folders to log_dir Currently the default minidump location is /tmp/impala-minidumps, which can be wiped on reboot on various distributions. This change moves the default location to FLAGS_log_dir/minidumps/$daemon. The additional trailing $daemon folder is kept to prevent name collisions in case of local test clusters and strangely configured installations. For local test clusters the minidumps will be written to $IMPALA_HOME/logs/cluster/minidumps/{catalogd,impalad,statestored}. Change-Id: Idecf5a314bfb8b0870e8aa4819c4fb39a107702f Reviewed-on: http://gerrit.cloudera.org:8080/3171 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2016-05-31 23:32:11 -07:00
Lars Volker	df8bf3a965	IMPALA-3490: Add flag to reduce minidump size IMPALA-2686 added the breakpad library to all impala daemons, thus enabling them to write minidump files. This change introduces a flag 'minidump_size_limit_hint_kb', which causes breakpad to reduce the amount of thread stack memory it includes in a minidump, aiming to reduce the minidump size during crashes with a lot of threads. Once a minidump is expected to exceed the configured value, breakpad will include the full stack memory for the first 20 threads, and afterwards capture only 2KB of stack memory for each additional thread. Change-Id: I2f3aa0df51be9f0bf0755fb288702911cdb88052 Reviewed-on: http://gerrit.cloudera.org:8080/2990 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:18:04 -07:00
Lars Volker	c9df348c38	IMPALA-2686: Add breakpad crash handler to all daemons This change adds breakpad crash handling support to catalogd, impalad, and statestored. The destination folder for minidump files can be configured via the 'minidump_path' command line flag. Leaving it empty will disable minidump generation. The daemons will rotate minidump files. The number of files to keep can be configured with the 'max_minidumps' command line flag. Change-Id: I7a37a38488716ffe34296f3490ae291bbb7228d6 Reviewed-on: http://gerrit.cloudera.org:8080/2028 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:52 -07:00

38 Commits