12 Commits

Author SHA1 Message Date
Riza Suminto
3ed2a82a95 IMPALA-14606: Stop building impala-shell for Python 2
This patch stops setting up and building impala-shell for Python 2.
A more thorough cleanup will be done in the future.

Testing:
Pass build and test/shell/ in RHEL8.

Change-Id: Ic7d59b283f4e2f011880ff6221d550b52714a538
Reviewed-on: http://gerrit.cloudera.org:8080/23750
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-12-10 04:40:46 +00:00
Joe McDonnell
1913ab46ed IMPALA-14501: Migrate most scripts from impala-python to impala-python3
To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3
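
As a rough illustration of what the first pipeline selects, the same filter can be sketched in Python. The sample lines and file names below are hypothetical and stand in for `git grep` output; like `grep -v`, the filter drops a whole line as soon as any excluded substring appears in it.

```python
# Substrings that the pipeline removes with `grep -v`.
EXCLUDED_SUBSTRINGS = (
    "impala-python3",
    "impala-python-common",
    "init-impala-python",
)


def still_uses_impala_python(line):
    """Return True if the line mentions impala-python and no excluded name."""
    if "impala-python" not in line:
        return False
    return not any(name in line for name in EXCLUDED_SUBSTRINGS)


# Hypothetical `git grep` output lines, for illustration only.
sample_lines = [
    "bin/foo.py:#!/usr/bin/env impala-python",
    "bin/bar.py:#!/usr/bin/env impala-python3",
    "bin/init-impala-python.sh:echo setup",
    "infra/deps.txt:impala-python-common",
]

matches = [line for line in sample_lines if still_uses_impala_python(line)]
print(matches)
```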

This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
   doesn't have a main function, it removes the hash-bang and makes
   sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
   (or python to python3) with minimal changes.
3. The cm-api pypi package (which doesn't support Python 3) has been
   replaced by the cm-client pypi package and interfaces have changed.
   Rather than migrating the code (which hasn't been used in years), this
   deletes the old code and stops installing cm-api into the virtualenv.
   The code can be restored and revamped if there is any interest in
   interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
   bit-rotted. Some pieces can be run manually, but it can't be fully
   verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces locations of impala-python in comments / documentation /
   READMEs.
6. kazoo (used for interacting with HBase) needed to be upgraded to a
   version that supports Python 3. The newest version of kazoo requires
   upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
   needing other upgrades.
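
Item 1 above can be sketched as follows. This is a minimal illustration, not the script used for the migration: it assumes that "has a main function" means the file contains an `if __name__ == "__main__":` guard, and the helper names are hypothetical.

```python
import re

# Assumption: a file counts as a script if it has a __main__ guard.
MAIN_GUARD_RE = re.compile(
    r"""^if\s+__name__\s*==\s*['"]__main__['"]\s*:""", re.MULTILINE)


def strip_unneeded_shebang(source):
    """Drop a python hash-bang from source text that has no __main__ guard."""
    lines = source.splitlines(keepends=True)
    if not lines:
        return source
    first = lines[0]
    if not (first.startswith("#!") and "python" in first):
        return source  # no python hash-bang, nothing to do
    if MAIN_GUARD_RE.search(source):
        return source  # a real script, keep the hash-bang
    return "".join(lines[1:])


def clear_exec_bits(mode):
    """Return the file mode with all executable bits removed."""
    return mode & ~0o111


library = "#!/bin/env impala-python\ndef f():\n    return 1\n"
script = "#!/bin/env impala-python\nif __name__ == '__main__':\n    pass\n"
print(strip_unneeded_shebang(library))
print(oct(clear_exec_bits(0o755)))
```

To apply the mode change to a real file, the result of `clear_exec_bits` could be passed to `os.chmod`.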

The two remaining uses of impala-python are:
 - bin/cmake_aux/create_virtualenv.sh
 - bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.

The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python).

Testing:
 - Ran core job
 - Ran build + dataload on Centos 7, Redhat 8
 - Manual testing of individual scripts (except some bitrotted areas like the
   random query generator)

Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-10-22 16:30:17 +00:00
Joe McDonnell
ea0969a772 IMPALA-11980 (part 2): Fix absolute import issues for impala_shell
Python 3 changed the behavior of imports with PEP 328. Existing
imports become absolute unless they use the new relative import
syntax. This adapts the impala-shell code to use absolute
imports, fixing issues where it is imported from our test code.

There are several parts to this:
1. It moves impala shell code into shell/impala_shell.
   This matches the directory structure of the PyPi package.
2. It changes the imports in the shell code to be
   absolute paths (i.e. impala_shell.foo rather than foo).
   This fixes issues with Python 3 absolute imports.
   It also eliminates the need for ugly hacks in the PyPi
   package's __init__.py.
3. This changes Thrift generation to put it directly in
   $IMPALA_HOME/shell rather than $IMPALA_HOME/shell/gen-py.
   This means that the generated Thrift code is rooted in
   the same directory as the shell code.
4. This changes the PYTHONPATH to include $IMPALA_HOME/shell
   and not $IMPALA_HOME/shell/gen-py. This means that the
   test code is using the same import paths as the pypi
   package.
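
The effect of item 2 can be illustrated with a small self-contained sketch. The package name `demo_shell` and its modules are hypothetical; the sketch builds a throwaway package whose modules import each other by absolute path (the `impala_shell.foo` style), then imports it with the package's parent directory on `sys.path`, which mirrors putting $IMPALA_HOME/shell on PYTHONPATH.

```python
import importlib
import os
import sys
import tempfile


def build_demo_package(root):
    """Create a hypothetical package whose modules use absolute imports."""
    pkg = os.path.join(root, "demo_shell")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("")
    with open(os.path.join(pkg, "helper.py"), "w") as f:
        f.write("def greet():\n    return 'hello from helper'\n")
    with open(os.path.join(pkg, "client.py"), "w") as f:
        # Absolute import: demo_shell.helper rather than a bare `helper`.
        f.write("from demo_shell.helper import greet\n\n"
                "def run():\n    return greet()\n")
    return pkg


root = tempfile.mkdtemp()
build_demo_package(root)
# Only the directory containing the package needs to be on the path.
sys.path.insert(0, root)
importlib.invalidate_caches()

client = importlib.import_module("demo_shell.client")
result = client.run()
print(result)
```

With a bare `from helper import greet` in `client.py`, the import would fail under Python 3 unless the package directory itself were also on the path.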

With all of these changes, the source code is very close
to the directory structure of the PyPi package. As long as
CMake has generated the thrift files and the Python version
file, only a few differences remain. This removes those
differences by moving the setup.py / MANIFEST.in and other
files from the packaging directory to the top-level
shell/ directory. This means that one can pip install
directly from the source code. i.e. pip install $IMPALA_HOME/shell

This also moves the shell tarball generation script to the
packaging directory and changes bin/impala-shell.sh to use
Python 3.

This sorts the imports using isort for the affected Python files.

Testing:
 - Ran a regular core job with Python 2
 - Ran a core job with Python 3 and verified that the absolute
   import issues are gone.

Change-Id: Ica75a24fa6bcb78999b9b6f4f4356951b81c3124
Reviewed-on: http://gerrit.cloudera.org:8080/22330
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-05-21 15:14:11 +00:00
Joe McDonnell
aefd1b0920 IMPALA-13551: Produce the shell tarball by pip installing impala-shell
Currently, the shell tarball maintains its own packaging code
and directory layout. This is very complicated and currently has
several Python packages directly checked into our repository.

To simplify it, this changes the shell tarball to be based on
pip installing the pypi package. Specifically, the new directory
structure for an unpacked shell tarball is:
impala-shell-4.5.0-SNAPSHOT/
  impala-shell
  install_py${PYTHON_VERSION}/
  install_py${ANOTHER_PYTHON_VERSION}/
For example, install_py2.7 is the Python 2.7 pip install of impala-shell.
install_py3.8 is a Python 3.8 pip install of impala-shell. This means
that the impala-shell script simply picks the install for the
specified version of python and uses that pip install directory.
To make this more consistent across different Linux distributions, this
upgrades pip in the virtualenv to the latest.
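
The selection logic described above can be sketched in Python. This is only an illustration under stated assumptions: the real impala-shell launcher may be implemented differently, and the function names and error text here are hypothetical. Only the `install_py<major>.<minor>` naming comes from the description above.

```python
import sys


def install_dir_name(version_info=None):
    """Build the install directory name for a (major, minor, ...) version."""
    major, minor = (version_info or sys.version_info)[:2]
    return "install_py{}.{}".format(major, minor)


def pick_install_dir(available, version_info=None):
    """Pick the pip install directory matching the interpreter version."""
    name = install_dir_name(version_info)
    if name not in available:
        # Hypothetical wording; the actual message is not shown in the log.
        raise RuntimeError(
            "This tarball was not built for {}; available: {}".format(
                name, ", ".join(sorted(available))))
    return name


available_dirs = ["install_py2.7", "install_py3.8"]
print(pick_install_dir(available_dirs, (3, 8)))
```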

With this, ext-py and pkg_resources.py can be removed.

This requires rearranging the shell build code. Specifically, this splits
out the code that generates impala_build_version.py so that it can run
before generating the pypi package. The shell tarball now has a dependency
on the pypi package and must run after it.

This builds on Michael Smith's work from IMPALA-11399.

Testing:
 - Ran shell tests locally
 - Built on Centos 7, Redhat 8 & 9, Ubuntu 20 & 22, SLES 15

Change-Id: Ifbb66ab2c5bc7180221f98d9bf5e38d62f4ac036
Reviewed-on: http://gerrit.cloudera.org:8080/20171
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-12-17 22:52:01 +00:00
Michael Smith
12325eb7ec IMPALA-12515: Build modules for extra pythons
Adds IMPALA_EXTRA_PACKAGE_PYTHONS to build impala-shell tarball
dependencies for additional Python targets. That can be used to build a
tarball that supports multiple Python 3 minor versions at once.

Updates the impala-shell script to provide a clear error message when
attempting to use the tarball with a Python version that it hasn't been
built for.

Change-Id: I13720a9e3c50f348bef41f5e91f810204e416f13
Reviewed-on: http://gerrit.cloudera.org:8080/20617
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-11-03 16:50:25 +00:00
Joe McDonnell
07d5a93de6 IMPALA-12220: pip install ext-py dependencies in the shell tarball
The impala-shell tarball ships its external dependencies
by building eggs and including them in the ext-py* directories.
On Redhat 9 and Ubuntu 22, the impala-shell tarball encountered
a regression where the sasl package could not access its
Client class:
Error connecting: AttributeError, module 'sasl' has no attribute 'Client'

This only occurs when using eggs (which are zip files). The virtualenv
installs worked fine. Unpacking the eggs and using the content directly
also avoids the problem.

This reworks the shell tarball to instead build wheels and install
them with 'pip install'. This means that the external dependencies
are not packaged in eggs, and this avoids the issue with sasl. This
is a minimal change to avoid the issue until the shell tarball build
can be reworked more extensively.

Testing:
 - Ran shell tests on Redhat 9

Change-Id: I49403979c559b7f8bbe038865c06db6024468d72
Reviewed-on: http://gerrit.cloudera.org:8080/20095
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-21 05:21:01 +00:00
Joe McDonnell
234d641d7b IMPALA-11961/IMPALA-12207: Add Redhat 9 / Ubuntu 22 support
This adds support for Redhat 9 / Ubuntu 22. It updates
to a newer toolchain that has those builds, and it adds
supporting code in bootstrap_system.sh.

Redhat 9 and Ubuntu 22 use python = python3, which requires
various changes to build scripts and tests. Ubuntu 22 uses
Python 3.10, which deprecates certain ssl.PROTOCOL_TLS* constants, so
this adapts test_client_ssl.py to that change until it
can be fully addressed in IMPALA-12219.
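
For context, a minimal sketch of the non-deprecated way to build a client-side TLS context with Python's standard `ssl` module. It does not show how test_client_ssl.py was actually adapted; the function name is hypothetical.

```python
import ssl


def make_client_context():
    """Create a TLS client context without the deprecated generic constant."""
    # PROTOCOL_TLS_CLIENT (available since Python 3.6) negotiates the
    # highest protocol version both sides support and turns on certificate
    # and hostname checking by default.
    return ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)


ctx = make_client_context()
print(ctx.verify_mode, ctx.check_hostname)
```

By contrast, `ssl.SSLContext(ssl.PROTOCOL_TLS)` emits a DeprecationWarning on Python 3.10 and later.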

Various OpenSSL methods have been deprecated. As a workaround
until these can be addressed properly, this specifies
-Wno-deprecated-declarations. This can be removed once the
code is adapted to the non-deprecated APIs in IMPALA-12226.

Impala crashes with tcmalloc errors unless we update to a newer
gperftools, so this moves to gperftools 2.10. gperftools changed
the default for tcmalloc.aggressive_memory_decommit to off, so
this adapts our code to set it for backend tests. The gperftools
upgrade does not show any performance regression:

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(42) | parquet / none / none | 3.08    | -0.64%     | 2.20       | -0.37%         |
+----------+-----------------------+---------+------------+------------+----------------+

With newer Python versions, the impala-virtualenv command
fails to create a Python 3 virtualenv. This switches to
using Python 3's builtin venv command for Python >=3.6.
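
The version switch can be sketched as below. This is an assumption-laden illustration: the function name is hypothetical, and `python -m virtualenv` stands in for whatever the older code path actually invokes.

```python
def venv_command(python_exe, version, target_dir):
    """Return the command used to create a virtual environment."""
    if tuple(version[:2]) >= (3, 6):
        # Python 3's builtin venv module.
        return [python_exe, "-m", "venv", target_dir]
    # Stand-in for the older virtualenv-based path.
    return [python_exe, "-m", "virtualenv", target_dir]


print(venv_command("python3", (3, 10), "/tmp/demo-venv"))
```

The returned list could be handed to `subprocess.check_call` to create the environment.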

Kudu needed a newer version and LLVM required a couple of patches.

Testing:
 - Ran a core job on Ubuntu 22 and Redhat 9. The tests run
   to completion without crashing. There are test failures
   that will be addressed in follow-up JIRAs.
 - Ran dockerised tests on Ubuntu 22.
 - Ran dockerised tests on Ubuntu 20 and Rocky 8.5.

Change-Id: If1fcdb2f8c635ecd6dc7a8a1db81f5f389c78b86
Reviewed-on: http://gerrit.cloudera.org:8080/20073
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-06-21 05:21:01 +00:00
Joe McDonnell
9fb1274867 IMPALA-12117: Use separate cache dirs for shell pip installs
Pip sporadically hits an error when installing impala-shell into
a virtualenv. An example symptom is this (though the issue is
not specific to thrift):
WARNING: Skipping page https://pypi.org/simple/thrift/ because the
   GET request got Content-Type: Unknown. The only supported
   Content-Types are application/vnd.pypi.simple.v1+json,
   application/vnd.pypi.simple.v1+html, and text/html
ERROR: Could not find a version that satisfies the requirement
   thrift==0.16.0 (from impala-shell) (from versions: none)
ERROR: No matching distribution found for thrift==0.16.0

It appears that this error can occur when two pip processes
are installing into virtualenvs simultaneously and share a
cache directory. This happens for our impala-shell build,
because we are doing pip install for Python 2 and Python 3
simultaneously. The impala-python/impala-python3 virtualenvs
do not use a cache directory and are not impacted.

This changes the shell's pip install to give the Python 2 and
Python 3 installs separate cache directories. The cache directories are
placed in ~/.cache like the regular pip cache. These do not
consume much space (a couple MB).
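
A sketch of the idea, using pip's `--cache-dir` option. The cache directory naming (`impala_shell_pip_<interpreter>`) and the function name are hypothetical; the commit only states that the directories live under ~/.cache.

```python
import os


def pip_install_command(python_exe, package_path, cache_root=None):
    """Build a pip install command with a per-interpreter cache directory."""
    if cache_root is None:
        cache_root = os.path.join(os.path.expanduser("~"), ".cache")
    # Hypothetical naming: one cache directory per interpreter name.
    tag = os.path.basename(python_exe)
    cache_dir = os.path.join(cache_root, "impala_shell_pip_" + tag)
    return [python_exe, "-m", "pip", "install",
            "--cache-dir", cache_dir, package_path]


cmd_py2 = pip_install_command("python2", "shell/", cache_root="/tmp/cache")
cmd_py3 = pip_install_command("python3", "shell/", cache_root="/tmp/cache")
print(cmd_py2)
print(cmd_py3)
```

Because the two commands never share a cache directory, concurrent runs cannot interfere with each other's cached index pages.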

Testing:
 - Ran all-build-options-ub2004 ten times without seeing the failure

Change-Id: I3f834b9f8c8cbc09830745ad132677a2fe17e07b
Reviewed-on: http://gerrit.cloudera.org:8080/19813
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-05-08 17:31:06 +00:00
Michael Smith
c5aed4e78e IMPALA-11955: (Addendum) Use impala-python for packaging
Uses impala-python when running packaging scripts to use a known python
version with setuptools available. This supports running on systems
where the `python` binary is available (as Python 2) but doesn't include
setuptools.

In this configuration IMPALA_SYSTEM_PYTHON2_OVERRIDE= is set to disable
building with python2, and only python3 is used for shell packaging.

Also ensures that we use IMPALA_SYSTEM_PYTHON2/3 when using system
python for building.

Testing:
- Manual build with python as a minimal Python 2 install, and Python 3.8
  (including setuptools).

Change-Id: I51c257010ef8fb1790482cdc3315aede908ef095
Reviewed-on: http://gerrit.cloudera.org:8080/19619
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-03-16 00:19:57 +00:00
Gergely Fürnstáhl
1056e16a27 IMPALA-11846: Fix builds with setuptools>=66.0.0
setuptools 66.0.0 introduced a breaking change: it no longer supports
version names that are not PEP 440 compliant. This breaks impala_shell's
packaging and installation test if the system python3's version is 3.8+.

This is a quick fix to unblock builds. The rest of the work will be done
in IMPALA-11849 (e.g. stabilizing the python environments version).

impala_shell releases should not be affected by this, as the version
number we generate is already PEP440 compliant.
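
To show what compliance means in practice, here is the canonical-form check from Appendix B of PEP 440. It is stricter than full PEP 440, which also accepts non-normalized spellings and local version labels. The version strings are illustrative only; the commit does not state which string triggered the failure.

```python
import re

# Canonical-form regular expression from PEP 440, Appendix B.
CANONICAL_RE = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?"
    r"(\.dev(0|[1-9][0-9]*))?$")


def is_canonical(version):
    """Return True if the version string is in canonical PEP 440 form."""
    return CANONICAL_RE.match(version) is not None


# Illustrative version strings, not taken from the build.
for candidate in ("4.5.0", "4.5.0.dev0", "4.5.0rc1", "4.5.0-SNAPSHOT"):
    print(candidate, is_canonical(candidate))
```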

Testing:
 - Built locally with python3.8

Change-Id: I4eb0957fb576e590b86b6fe570216cfb72d11aef
Reviewed-on: http://gerrit.cloudera.org:8080/19431
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-01-19 03:59:23 +00:00
Michael Smith
181fd94068 IMPALA-8373: Test impala-shell with python3
Sets up a python3 virtualenv, installs impala-shell into it, and runs
tests.

Change-Id: I8e123aecd53a7ded44a7da7eb8c8b853cebbfc56
Reviewed-on: http://gerrit.cloudera.org:8080/18588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-06-13 17:13:42 +00:00
Michael Smith
5263d13112 IMPALA-11314: Test PyPI package with system python
Sets up a virtualenv with system python to install the impala-shell PyPI
package into. Using system python provides better coverage for Python
versions likely to be used by customers. Runs impala-shell tests using
the PyPI package to provide better coverage for the artifact customers
will use.

Includes a PyPI install in notests_independent_targets because these
seem to be used for Python testing despite -notests.

Change-Id: I384ea6a7dab51945828cca629860400a23fa0c05
Reviewed-on: http://gerrit.cloudera.org:8080/18586
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-06-13 17:13:42 +00:00