IMPALA-9626: Use Python from the toolchain for Impala

Historically Impala used the Python2 version that was available on
the hosting platform, as long as that version was at least v2.6.
This caused a constant headache, as all Python syntax had to be kept
compatible with Python 2.6 (for CentOS 6). It also caused a recent problem
on CentOS 8: there the system Python was compiled with the
system's GCC (v8.3), which is much more recent than the Impala
standard compiler (GCC 4.9.2). When the Impala virtualenv was
built, the system Python supplied C compiler switches for modules
containing native code that were unknown to the Impala version of GCC,
thus breaking virtualenv installation.
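
The mechanism behind the CentOS 8 failure can be inspected directly: a
Python interpreter records the compiler command and flags it was built
with, and builds of native extension modules typically reuse them. The
following is a minimal sketch using only the standard sysconfig module;
the helper name build_compiler_settings is hypothetical and is not part
of this patch.

```python
import sysconfig


def build_compiler_settings():
  """Return the compiler command and flags recorded when this interpreter was built.

  Hypothetical helper for illustration only; it is not part of the Impala patch.
  Values may be None on platforms that do not record them.
  """
  return {name: sysconfig.get_config_var(name) for name in ("CC", "CFLAGS", "OPT")}


if __name__ == "__main__":
  # Print each recorded setting so it can be compared with the flags
  # supported by the compiler that will build the native modules.
  for name, value in sorted(build_compiler_settings().items()):
    print("{0} = {1}".format(name, value))
```

Running this with the system Python and with the toolchain Python should
show whether the recorded switches match what the toolchain GCC accepts.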

This patch changes the Impala virtualenv to always use the Python2
version from the toolchain, which is built with the toolchain compiler.

This ensures that
- Impala always has a known Python 2.7 version for all its scripts,
- virtualenv modules based on native code will always be installable, as
  the Python environment and the modules are built with the same compiler
  version.

Additional changes:
- Add an auto-use fixture to conftest.py to check that the tests are
  being run with Python 2.7.x
- Make bootstrap_toolchain.py independent from the Impala virtualenv:
  remove the dependency on the "sh" library
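
The version check mentioned above could look roughly like the sketch
below. It is an assumption about the shape of such a check, not the code
added to conftest.py; the name check_python_version and its parameters
are hypothetical.

```python
import sys


def check_python_version(version_info=None, required=(2, 7)):
  """Raise if the interpreter's (major, minor) version differs from `required`.

  Hypothetical helper; the actual fixture in conftest.py may differ.
  """
  if version_info is None:
    version_info = sys.version_info
  actual = tuple(version_info[:2])
  if actual != tuple(required):
    raise RuntimeError("Tests must run with Python {0}.{1}.x, found {2}.{3}".format(
        required[0], required[1], actual[0], actual[1]))
  return actual


# In conftest.py the check could be wrapped in a session-scoped auto-use
# fixture so that it runs once before any test:
#
#   @pytest.fixture(scope="session", autouse=True)
#   def validate_python_version():
#     check_python_version()
```

Keeping the comparison in a plain function makes it testable without
pytest; the fixture only decides when the check runs.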

Tests:
- Passed core-mode tests on CentOS 7.4
- Passed core-mode tests in Docker-based mode for centos:7
  and ubuntu:16.04

Most content in this patch was developed but not published earlier
by Tim Armstrong.

Change-Id: Ic7b40cef89cfb3b467b61b2d54a94e708642882b
Reviewed-on: http://gerrit.cloudera.org:8080/15624
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Laszlo Gaal
2020-03-18 18:03:24 +01:00
committed by Impala Public Jenkins
parent 21aa514353
commit c97191b6a5
5 changed files with 67 additions and 30 deletions

@@ -46,6 +46,7 @@ import tarfile
 import tempfile
 import textwrap
 import urllib
+from bootstrap_toolchain import ToolchainPackage
 LOG = logging.getLogger(os.path.splitext(os.path.basename(__file__))[0])
@@ -83,7 +84,7 @@ def create_virtualenv():
   for member in file.getmembers():
     file.extract(member, build_dir)
   file.close()
-  python_cmd = detect_python_cmd()
+  python_cmd = download_toolchain_python()
   exec_cmd([python_cmd, find_file(build_dir, "virtualenv*", "virtualenv.py"), "--quiet",
       "--python", python_cmd, ENV_DIR])
   shutil.rmtree(build_dir)
@@ -189,21 +190,23 @@ def find_file(*paths):
   return files[0]
-def detect_python_cmd():
-  '''Returns the system command that provides python 2.6 or greater.'''
-  paths = os.getenv("PATH").split(os.path.pathsep)
-  for cmd in ("python", "python27", "python2.7", "python-27", "python-2.7", "python26",
-      "python2.6", "python-26", "python-2.6"):
-    for path in paths:
-      cmd_path = os.path.join(path, cmd)
-      if not os.path.exists(cmd_path) or not os.access(cmd_path, os.X_OK):
-        continue
-      exit = subprocess.call([cmd_path, "-c", textwrap.dedent("""
-          import sys
-          sys.exit(int(sys.version_info[:2] < (2, 6)))""")])
-      if exit == 0:
-        return cmd_path
-  raise Exception("Could not find minimum required python version 2.6")
+def download_toolchain_python():
+  '''Grabs the Python implementation from the Impala toolchain, using the machinery from
+  bin/bootstrap_toolchain.py
+  '''
+  toolchain_root = os.environ.get("IMPALA_TOOLCHAIN")
+  if not toolchain_root:
+    raise Exception(
+        "Impala environment not set up correctly, make sure $IMPALA_TOOLCHAIN is set.")
+  package = ToolchainPackage("python")
+  package.download()
+  python_cmd = os.path.join(package.pkg_directory(), "bin/python")
+  if not os.path.exists(python_cmd):
+    raise Exception("Unexpected error bootstrapping python from toolchain: {0} does not "
+        "exist".format(python_cmd))
+  return python_cmd
 def install_deps():