11 Commits

Author SHA1 Message Date
Joe McDonnell
1913ab46ed IMPALA-14501: Migrate most scripts from impala-python to impala-python3
To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3

This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
   doesn't have a main function, it removes the hash-bang and makes
   sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
   (or python to python3) with minimal changes.
3. The cm-api pypi package (which doesn't support Python 3) has been
   replaced by the cm-client pypi package and interfaces have changed.
   Rather than migrating the code (which hasn't been used in years), this
   deletes the old code and stops installing cm-api into the virtualenv.
   The code can be restored and revamped if there is any interest in
   interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
   bit-rotted. Some pieces can be run manually, but it can't be fully
   verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces locations of impala-python in comments / documentation /
   READMEs.
6. kazoo (used for interacting with ZooKeeper) needed to be upgraded to a
   version that supports Python 3. The newest version of kazoo requires
   upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
   needing other upgrades.
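
As an illustration of the decision in items 1 and 2, here is a minimal sketch. The helper names and the "__main__" substring heuristic are assumptions made for this example; the commit does not say how scripts were actually classified.

```python
import os
import stat

# Interpreters that should no longer appear in a shebang line.
PYTHON2_INTERPRETERS = ("impala-python", "python")


def shebang_interpreter(first_line):
    """Return the interpreter named by a shebang line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    tokens = first_line[2:].split()
    if not tokens:
        return None
    program = os.path.basename(tokens[0])
    # "#!/bin/env impala-python" names the real interpreter as the next token.
    if program == "env" and len(tokens) > 1:
        program = tokens[1]
    return program


def plan_for(source_text):
    """Classify a file as 'leave', 'switch-to-python3' or 'drop-shebang'."""
    lines = source_text.splitlines()
    interpreter = shebang_interpreter(lines[0]) if lines else None
    if interpreter not in PYTHON2_INTERPRETERS:
        return "leave"
    # Assumption: a file mentioning "__main__" has a main entry point.
    if "__main__" in source_text:
        return "switch-to-python3"
    return "drop-shebang"


def clear_executable_bits(path):
    """Remove the user/group/other execute bits from a file."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
```

A file classified as 'drop-shebang' would lose its first line and then be passed to clear_executable_bits().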

The two remaining uses of impala-python are:
 - bin/cmake_aux/create_virtualenv.sh
 - bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.

The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python).

Testing:
 - Ran core job
 - Ran build + dataload on Centos 7, Redhat 8
 - Manual testing of individual scripts (except some bitrotted areas like the
   random query generator)

Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-10-22 16:30:17 +00:00
Joe McDonnell
41a3f4d4ca IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages
When using bin/dump_breakpad_symbols.py to dump symbols for RPM/DEB
packages, the script extracts the packages to a temporary directory
and relies on keeping that directory around until the processing
is finished. The parallel processing added in IMPALA-10048 breaks
the logic that keeps the temporary directory around, so the script
generates errors like:

Found debugging info in /tmp/tmpqfZ9MZ/usr/lib/debug/usr/lib/impala/sbin-retail/impalad.debug
Failed to open ELF file '/tmp/tmpqfZ9MZ/usr/lib/debug/usr/lib/impala/sbin-retail/impalad.debug': No such file or directory
Failed to write symbol file.

This turns off parallelism for bin/dump_breakpad_symbols.py when
processing RPM/DEB packages (i.e. -r/--pkg). This also avoids using
a ThreadPool when num_processes <= 1.
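
A minimal sketch of the resulting control flow, assuming hypothetical helpers process_all() and dump_from_package() (neither name comes from the script). The package path stays serial, so the temporary directory is only removed after every file has been processed, and no ThreadPool is created when num_processes <= 1.

```python
import shutil
import tempfile
from multiprocessing.pool import ThreadPool


def process_all(paths, worker, num_processes):
    """Apply worker to each path, creating a ThreadPool only when it can help."""
    if num_processes <= 1:
        # Serial path: no pool is created at all.
        return [worker(path) for path in paths]
    pool = ThreadPool(num_processes)
    try:
        return pool.map(worker, paths)
    finally:
        pool.close()
        pool.join()


def dump_from_package(extract, worker):
    """Extract a package to a temporary directory and process it serially.

    extract(tmp_dir) is a hypothetical callable that unpacks the package and
    returns the paths to process. The directory is removed only after all
    processing has finished.
    """
    tmp_dir = tempfile.mkdtemp()
    try:
        paths = extract(tmp_dir)
        return process_all(paths, worker, num_processes=1)
    finally:
        shutil.rmtree(tmp_dir)
```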

Testing:
 - Hand tested with Redhat 7 RPMs

Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3
Reviewed-on: http://gerrit.cloudera.org:8080/20943
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-01-29 23:33:34 +00:00
Joe McDonnell
3285cfd690 IMPALA-12125: Support for dumping symbols from RPMs without separate symbols
Some RPMs contain binaries with debug symbols with no separate
debuginfo package needed. bin/dump_breakpad_symbols.py does not
allow this combination, as it expects a corresponding symbol
package. This adds a --no_symbol_pkg option to dump_breakpad_symbols.py
to turn off the requirement that --pkg be combined with --symbol_pkg.
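
The option check could look roughly like the sketch below. The option names -r/--pkg, --symbol_pkg and --no_symbol_pkg are the ones mentioned in these commit messages; the use of argparse, the error messages, and the rejection of --symbol_pkg together with --no_symbol_pkg are assumptions for illustration and may not match the script.

```python
import argparse


def parse_args(argv):
    """Parse options, requiring --symbol_pkg with --pkg unless opted out."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-r", "--pkg", help="package containing the binaries")
    parser.add_argument("--symbol_pkg", help="package containing debug symbols")
    parser.add_argument("--no_symbol_pkg", action="store_true",
                        help="binaries in --pkg already carry their symbols")
    args = parser.parse_args(argv)
    if args.pkg and not args.symbol_pkg and not args.no_symbol_pkg:
        # parser.error() prints a usage message and exits with status 2.
        parser.error("--pkg requires --symbol_pkg unless --no_symbol_pkg is set")
    if args.symbol_pkg and args.no_symbol_pkg:
        parser.error("--symbol_pkg and --no_symbol_pkg are mutually exclusive")
    return args
```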

Testing:
 - Tested with an RPM package with an unstripped impalad binary
 - Tested with the usual RPM + debuginfo RPM combination

Change-Id: I9589b0ed7855fe49c6989ec3dcc51a9e9c4f476b
Reviewed-on: http://gerrit.cloudera.org:8080/20944
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Yida Wu <wydbaggio000@gmail.com>
2024-01-29 17:32:51 +00:00
Joe McDonnell
3bcd770dfc IMPALA-10048: Go parallel for dump_breakpad_symbols.py
This modifies dump_breakpad_symbols.py to use a ThreadPool
to go parallel when there are multiple binaries or
libraries to process. This is common for Jenkins jobs that
dump symbols for all backend tests. The different binaries
write out to different directories, so the threads don't
interfere with each other.
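
A minimal sketch of the pattern, not the script's actual code: dump_one() and dump_all() are hypothetical names, the symbol dump is replaced by writing a placeholder file, and the output layout is simplified. Each binary gets its own output directory, which is why the threads need no locking.

```python
import os
from multiprocessing.pool import ThreadPool


def dump_one(task):
    """Write a placeholder symbol file for one binary into its own directory."""
    binary_name, out_root = task
    out_dir = os.path.join(out_root, binary_name)
    os.makedirs(out_dir)
    sym_path = os.path.join(out_dir, binary_name + ".sym")
    with open(sym_path, "w") as f:
        f.write("placeholder symbols for %s\n" % binary_name)
    return sym_path


def dump_all(binary_names, out_root, num_threads=4):
    """Process all binaries with a thread pool; results keep the input order."""
    pool = ThreadPool(num_threads)
    try:
        return pool.map(dump_one, [(name, out_root) for name in binary_names])
    finally:
        pool.close()
        pool.join()
```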

Testing:
 - Ran locally dumping the symbols for all backend tests
 - Ran a Jenkins job that generates a minidump and triggers
   the minidump symbol processing. It went parallel and
   worked fine.

Change-Id: I93427bb07f1d9718bd6df90acfd247210b54294d
Reviewed-on: http://gerrit.cloudera.org:8080/20802
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2024-01-08 18:51:53 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
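
A small example of the division difference, written for Python 3 (under Python 2, the same results would require "from __future__ import division" at the top of the file). The function names are made up for illustration.

```python
def average(values):
    """True division: the result is a float even for integer inputs."""
    return sum(values) / len(values)


def midpoint_index(items):
    """Integer division: indices and counts must stay integers."""
    return len(items) // 2


print(average([3, 4]))               # 3.5
print(midpoint_index([1, 2, 3, 4, 5]))  # 2
print(-7 // 2)                       # -4, since '//' rounds toward negative infinity
```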

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
10c19b1a57 IMPALA-11511: Add build options for reducing binary sizes
Impala's build produces dozens of C++ binaries
that link in all Impala libraries. Each binary is
hundreds of megabytes, leading to 10s of gigabytes
of disk space. A large proportion of this (~80%) is debug
information. The debug information increases in newer
versions of GCC such as GCC 10.

This introduces two options for reducing the size
of debug information:
 - IMPALA_MINIMAL_DEBUG_INFO=true builds Impala with
   minimal debug information (-g1). This contains line tables
   and can resolve backtraces, but it does not contain
   variable information and restricts further debugging.
 - IMPALA_COMPRESSED_DEBUG_INFO=true builds Impala with
   compressed debug information (-gz). This does not change
   the debug information included, but the compression saves
   significant disk space. gdb is known to work with
   compressed debug information, but other tools may not
   support it. The dump_breakpad_symbols.py script has been
   adjusted to handle these binaries.
These are disabled by default.

Release impalad binary sizes:
Configuration                  | Size (bytes) | % reduction over base
Base                           | 707834808    | N/A
Stripped                       |  83351664    | 88%
Minimal debuginfo              | 215924096    | 69%
Compressed debuginfo           | 301619286    | 57%
Minimal + compressed debuginfo | 120886705    | 83%
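
The percentages in the table follow from the byte counts; a short check, with the sizes copied from the table and names chosen for this example:

```python
BASE_SIZE = 707834808

SIZES = {
    "Stripped": 83351664,
    "Minimal debuginfo": 215924096,
    "Compressed debuginfo": 301619286,
    "Minimal + compressed debuginfo": 120886705,
}


def percent_reduction(size, base=BASE_SIZE):
    """Percentage by which size is smaller than base."""
    return 100.0 * (1.0 - size / base)


for name, size in SIZES.items():
    print("%-31s %5.1f%%" % (name, percent_reduction(size)))
```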

Testing:
 - Generated minidumps and resolved them
 - Verified this is disabled by default

Change-Id: I04a20258a86053d8f3972b9c7c81cd5bec1bbb66
Reviewed-on: http://gerrit.cloudera.org:8080/18962
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-09-26 17:47:55 +00:00
Joe McDonnell
2357958e73 IMPALA-10304: Fix log level and format for pytests
Recent testing showed that the pytests are not
respecting the log level and format set in
conftest.py's configure_logging(). They are using
the default log level of WARNING and the
default formatter.

The issue is that logging.basicConfig() is only
effective the first time it is called. The code
in lib/python/impala_py_lib/helpers.py does a
call to logging.basicConfig() at the global
level, and conftest.py imports that file. This
renders the call in configure_logging()
ineffective.

To avoid this type of confusion, logging.basicConfig()
should only be called from main() functions, never at
import time in library code. This removes the call in
lib/python/impala_py_lib (as it is not needed for a library
without a main function).
It also fixes up various other locations to move the
logging.basicConfig() call to the main() function.
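
The behavior can be reproduced in a few lines. This is a standalone sketch: the function names and the format string are made up, and simulate_library_import() stands in for the import-time call that this change removes.

```python
import logging


def reset_root_logger():
    """Return the root logger to its initial state: no handlers, WARNING."""
    root = logging.getLogger()
    for handler in list(root.handlers):
        root.removeHandler(handler)
    root.setLevel(logging.WARNING)


def simulate_library_import():
    """Stand-in for a library calling basicConfig() at import time."""
    logging.basicConfig()


def configure_logging():
    """Stand-in for the test framework's intended configuration."""
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s: %(message)s")


def root_level_after(import_library_first):
    """Return the root logger level after configure_logging() has run."""
    reset_root_logger()
    if import_library_first:
        simulate_library_import()
    configure_logging()
    return logging.getLogger().level


# The earlier call installs a handler, so the later call is a no-op.
print(root_level_after(True) == logging.WARNING)   # True
print(root_level_after(False) == logging.INFO)     # True
```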

Testing:
 - Ran the end to end tests and custom cluster tests
 - Confirmed the logging format
 - Added an assert in configure_logging() to test that
   the INFO log level is applied to the root logger.

Change-Id: I5d91b7f910b3606c50bcba4579179a0bc8c20588
Reviewed-on: http://gerrit.cloudera.org:8080/16679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 15:32:21 +00:00
Joe McDonnell
56ee90c598 IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7
The locations for native-toolchain packages in IMPALA_TOOLCHAIN
currently do not include the compiler version. This means that
the toolchain can't distinguish between native-toolchain packages
built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause
issues when switching back and forth between branches.

This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment
variable, which is a location inside IMPALA_TOOLCHAIN that would
hold native-toolchain packages. Currently, it is set to the same
as IMPALA_TOOLCHAIN, so there is no difference in behavior.
This lays the groundwork to add the compiler version to this
path when switching to GCC7.
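
A sketch of the idea only. The function name and the "packages-gcc" directory naming are hypothetical, chosen to show how a compiler version could later be added to the path; this commit itself keeps the two locations identical.

```python
import os


def toolchain_packages_home(toolchain, compiler_version=None):
    """Return the directory that holds native-toolchain packages.

    With no compiler version this is the toolchain directory itself, which
    matches the behavior described in this commit. The versioned
    subdirectory name is a hypothetical placeholder.
    """
    if compiler_version is None:
        return toolchain
    return os.path.join(toolchain, "packages-gcc" + compiler_version)
```

With distinct directories per compiler version, packages built by different compilers would no longer collide when switching branches.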

Testing:
 - The only impediment to building with
   IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is
   Impala-lzo. With a custom Impala-lzo, compilation succeeds.
   Either Impala-lzo will be fixed or it will be removed.
 - Core tests

Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b
Reviewed-on: http://gerrit.cloudera.org:8080/15991
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-30 16:25:37 +00:00
Lars Volker
e98c88f233 IMPALA-5110: Add deb support to dump_breakpad_symbols.py
Change-Id: I524d2fe4660551c2fe4ff190b7e5bbb33d986b10
Reviewed-on: http://gerrit.cloudera.org:8080/6462
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-27 22:24:21 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given on the
   website, or add it where it is missing.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.
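
For readers unfamiliar with the perl one-liner above, a rough Python equivalent of its line filter is sketched below. It is not the tool that was used, and the function name is made up; it drops every line that matches Copyright.*Cloudera, case-insensitively.

```python
import re

# Same pattern as the perl command: m#Copyright.*Cloudera#i
COPYRIGHT_PATTERN = re.compile(r"Copyright.*Cloudera", re.IGNORECASE)


def strip_copyright_lines(text):
    """Return text without the lines that match the copyright pattern."""
    kept = [line for line in text.splitlines(keepends=True)
            if not COPYRIGHT_PATTERN.search(line)]
    return "".join(kept)
```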

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Lars Volker
12799fae6c IMPALA-3489: Add script to extract breakpad symbols from binaries
Change-Id: I3ee0972efcb50609407b04cd6f4309b244a84861
Reviewed-on: http://gerrit.cloudera.org:8080/2961
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-17 01:30:11 -07:00