Commit Graph

23 Commits

Author SHA1 Message Date
Joe McDonnell
11396d3146 IMPALA-13384: Only install gcovr deps for coverage builds
IMPALA-13279 upgraded gcovr to 7.2 and moved it from python 2 to
python 3.8. gcovr has several dependencies that require native
compilation, and this increased the cost of initializing the
Python 3 virtualenv substantially:

Without gcovr: 1m43.279s
With gcovr and deps: 6m35.107s

This moves gcovr to its own requirements file and only installs
gcovr if this is a coverage build (detected from the
.cmake_buid_type file).

Testing:
 - Verified that a coverage build does install gcovr and
   produce a report

Change-Id: I1d0fd6d21273053aaf2acee39fcb83d9093d49a2
Reviewed-on: http://gerrit.cloudera.org:8080/21849
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-09-27 00:25:41 +00:00
Michael Smith
0a42185d17 IMPALA-9627: Update utility scripts for Python 3 (part 2)
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.

Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.

Fixes a impala-shell tip that was supposed to have been two tips (and
had no space after period when they were printed).

Removes out-of-date deploy.py and various Python 2.6 workarounds.

Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py

Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-04-26 18:52:23 +00:00
Joe McDonnell
566df80891 IMPALA-11959: Add Python 3 virtualenv
This adds a Python 3 equivalent to the impala-python
virtualenv base on the toolchain Python 3.7.16.
This modifies bootstrap_virtualenv.py to support
the two different modes. This adds py2-requirements.txt
and py3-requirements.txt to allow some differences
between the Python 2 and Python 3 virtualenvs.

Here are some specific package changes:
 - allpairs is replaced with allpairspy, as allpairs did
   not support Python 3.
 - requests is upgraded slightly, because otherwise is has issues
   with idna==2.8.
 - pylint is limited to Python 3, because we are adding it
   and don't need it on both
 - flake8 is limited to Python 2, because it will take
   some work to switch to a version that works on Python 3
 - cm_api is limited to Python 2, because it doesn't support
   Python 3
 - pytest-random does not support Python 3 and it is unused,
   so it is removed
 - Bump the version of setuptool-scm to support Python 3

This adds impala-pylint, which can be used to do further
Python 3 checks via --py3k. This also adds a bin/check-pylint-py3k.sh
script to enforce specific py3k checks. The banned py3k warnings
are specified in the bin/banned_py3k_warnings.txt. This is currently
empty, but this can ratchet up the py3k strictness over time
to avoid regressions.

This pulls in a new toolchain with the fix for IMPALA-11956
to get Python 3.7.16.

Testing:
 - Hand tested that the allpairs libraries produce the
   same results
 - The python3 virtualenv has no influence on regular
   tests yet

Change-Id: Ica4853f440c9a46a79bd5fb8e0a66730b0b4efc0
Reviewed-on: http://gerrit.cloudera.org:8080/19567
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Michael Smith
0c72c98f91 IMPALA-9627: Update utility scripts for Python 3
Updates utility scripts that don't use impala-python to work with Python
3 so we can build on systems that don't include Python 2 (such as SLES
15 SP4).

Primarily adds 'universal_newlines=True' to subprocess calls so they
return text rather than binary data in Python 3 with a change that's
compatible with Python 2.

Testing:
- built in SLES 15 SP4 container with Python 3

Change-Id: I7f4ce71fa1183aaeeca55d0666aeb113640c5cf2
Reviewed-on: http://gerrit.cloudera.org:8080/19559
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-03-01 04:53:49 +00:00
yx91490
f566e7dee7 IMPALA-10994: Normalize the pip package name part of download URL.
According to PEP-0503, pip repo server doesn't support unnormalized URL
access, and some package name within
'infra/python/deps/*requirements.txt' are unnormalized, e.g. 'Cython',
and pip_download.py will concat $PYPI_MIRROR and package name to get
download URL directly, which maybe unnormalized.

Fix this by normalize package name in download URL using the
recommanded method in PEP-0503.

Change-Id: I479df0ad7acf3c650b8f5317372261d5e2840864
Reviewed-on: http://gerrit.cloudera.org:8080/17987
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-11-26 08:17:10 +00:00
Joe McDonnell
1142c7b58e IMPALA-10606: Simplify impala-python virtualenv bootstrapping
Bootstrapping the impala-python virtualenv requires multiple
rounds of pip installs with different sets of requirements.
This consolidates the requirements.txt, stage2-requirements.txt,
and compiled-requirements.txt into a single requirements.txt.
This will make it easier to upgrade python packages.

This also splits out setuptools into its own
setuptools-requirements.txt. Setuptools is used during the
pip install for several of the dependencies. Recent versions
of setuptools do not support Python 2, but some of the install
tools (like easy_install) don't know how to pick a version
of setuptools that works with Python 2. Splitting it out to its
own requirements file lets us pin the version.

To make review easier, this does not change any of the versions
of the dependencies. It also leaves the stage2-requirements.txt
and compiled-requirements.txt split out in separate sections
of requirements.txt. These will later be turned into a single
alphabetical list.

Testing:
 - Tested impala-python locally
 - Ran GVO

Change-Id: I8e920e5a257f1e0613065685078624a50d59bf2e
Reviewed-on: http://gerrit.cloudera.org:8080/17226
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2021-03-31 03:17:24 +00:00
guojingfeng
7baa31ea04 IMPALA-10093: Replace urllib with wget to download python deps
When build impala in Company internal network, pip_download.py
failed to download dependency eggs from https engpoint Although
correcly set system proxy like http_proxy, https_proxy. Is is
a issue of python2's urllib. I just replace urllib with wget
which can works well with system proxy like https_proxy.

Change-Id: I146d93312701fd682420cb65cf4738bc030f3cfb
Reviewed-on: http://gerrit.cloudera.org:8080/16344
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-11 16:43:35 +00:00
Akshesh
e4388740f7 Make infra/python compatible with both Python 2 & 3
Change-Id: If4285a021bb581f88425daa52ef8a3f844017d82
Reviewed-on: http://gerrit.cloudera.org:8080/13070
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Akshesh Doshi <aksheshdoshi@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-25 20:43:45 +00:00
Philip Zeyliger
202807e2ff Speed up Python dependencies.
This parallelizes downloading some Python libraries, giving a speedup of
$IMPALA_HOME/infra/python/deps/download_requirements.  I've seen this
take from 7-15 seconds before and from 2-5 seconds after.

I also checked that we always have at least Python 2.6 when
building Impala, so I was able to remove the try/except
handling in bootstrap_toolchain.

Change-Id: I7cbf622adb7d037f1a53c519402dcd8ae3c0fe30
Reviewed-on: http://gerrit.cloudera.org:8080/10234
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-01 22:12:39 +00:00
Philip Zeyliger
eaf66172df IMPALA-6863: Make pip_download.py honor redirects.
As part of our continuing woes with PyPi infrastructure, we've now seen
redirects. Following redirects seems like the right thing to do, so I've
changed the downloader code to follow them.

I checked that this is available in Python 2.6.

The build failure signature looks like:

   Downloading AllPairs-2.0.1.tar.gz from cb85d029b3/AllPairs-2.0.1.tar.gz
   ('http error', 302, 'Found', <httplib.HTTPMessage instance at 0x7fbf7819b050>)
   Download failed after several attempts.
   Warning: Unable to download Python requirements.
   Warning: bootstrap_virtualenv or other Python-based tooling may fail.

Change-Id: Ic7551cec43a2d378df7e3cc7d521ace338b56ba2
Reviewed-on: http://gerrit.cloudera.org:8080/10083
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Philip Zeyliger <philip@cloudera.com>
Reviewed-by: Lars Volker <lv@cloudera.com>
2018-04-17 19:49:55 +00:00
Lars Volker
37565e3812 IMPALA-6731: Use private index in bootstrap_virtualenv
This change switches to using a private pypi index url when using a
private pypi mirror. This allows to run the tests without relying on the
public Python pypi mirrors.

Some packages can not detect their dependencies correctly when they get
installed together with the dependencies in the same call to pip. This
change adds a second stage of package installation to separate these
packages from their dependencies.

It also adds a few missing packages and updates some packages to newer
versions.

Testing: Ran this on a box where I blocked DNS resolution to Python's
upstream pypi.

Change-Id: I85f75f1f1a305f3043e0910ab88a880eeb30f00b
Reviewed-on: http://gerrit.cloudera.org:8080/9798
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2018-03-26 22:45:03 +00:00
Lars Volker
b1ef7de0e7 IMPALA-6695: Fix PyPi regex, update setuptools version
pytest-runner, which is required by kudu-python requires are more recent
version of setuptools. Adding an explicit dependency required an update
to the regular expression to parse PyPi URLs.

Change-Id: Ia67189f81a31a9a5a0ed80cd4d6661762ef427b2
Reviewed-on: http://gerrit.cloudera.org:8080/9711
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2018-03-18 16:39:32 +00:00
Tianyi Wang
af6769d95a IMPALA-6690: Fix pip_download.py on python 2.6
IMPALA-6682 used set literal syntax in pip_download.py, which is
introduced in python 2.7. This patch changes it to set constructor.

It's tested on python 2.6.9.

Change-Id: I82b4116ee056f605c8aadf39a8b92b78313cb8bf
Reviewed-on: http://gerrit.cloudera.org:8080/9694
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-17 01:32:22 +00:00
Tianyi Wang
8dde41e802 IMPALA-6682: Remove MD5 assumption from pypi download script
pip_download.py assumes the python repository to use md5 as the hash
algorithm, which is not required by PEP-503 and not always true in
reality. This patch removes this assumption and enables support of all
hash algorithms in python hashlib.

Testing: buildall.sh works with 2 repos. One uses md5 and another uses
sha-256.

Change-Id: Ie78f851490cbab10daa654aece36dab6e6c4329b
Reviewed-on: http://gerrit.cloudera.org:8080/9683
Reviewed-by: Tianyi Wang <twang@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-16 03:39:33 +00:00
Sailesh Mukil
50bd015f2d IMPALA-5333: Add support for Impala to work with ADLS
This patch leverages the AdlFileSystem in Hadoop to allow
Impala to talk to the Azure Data Lake Store. This patch has
functional changes as well as adds test infrastructure for
testing Impala over ADLS.

We do not support ACLs on ADLS since the Hadoop ADLS
connector does not integrate ADLS ACLs with Hadoop users/groups.

For testing, we use the azure-data-lake-store-python client
from Microsoft. This client seems to have some consistency
issues. For example, a drop table through Impala will delete
the files in ADLS, however, listing that directory through
the python client immediately after the drop, will still show
the files. This behavior is unexpected since ADLS claims to be
strongly consistent. Some tests have been skipped due to this
limitation with the tag SkipIfADLS.slow_client. Tracked by
IMPALA-5335.

The azure-data-lake-store-python client also only works on CentOS 6.6
and over, so the python dependencies for Azure will not be downloaded
when the TARGET_FILESYSTEM is not "adls". While running ADLS tests,
the expectation will be that it runs on a machine that is at least
running CentOS 6.6.
Note: This is only a test limitation, not a functional one. Clusters
with older OSes like CentOS 6.4 will still work with ADLS.

Added another dependency to bootstrap_build.sh for the ADLS Python
client.

Testing: Ran core tests with and without TARGET_FILESYSTEM as
'adls' to make sure that all tests pass and that nothing breaks.

Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Reviewed-on: http://gerrit.cloudera.org:8080/6910
Tested-by: Impala Public Jenkins
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
2017-05-25 19:35:24 +00:00
Taras Bobrovytsky
4a79c9e7e3 IMPALA-5181: Extract PYPI metadata from a webpage
There were some build failures due to a failure to download a JSON file
containing package metadata from PYPI. We need to switch to downloading
this from a PYPI mirror. In order to be able to download the metadata
from a PYPI mirror, we need be able to extract the data from a web page,
because PYPI mirrors do not always have a JSON interface.

We implement a regex based html parser in this patch. Also, we increase
the number of download attempts and randomly vary the amount of time
between each attempt.

Testing:
- Tested locally against PYPI and a PYPI mirror.
- Ran a private build that passed (which used a PYPI mirror).

Change-Id: If3845a0d5f568d4352e3cc4883596736974fd7de
Reviewed-on: http://gerrit.cloudera.org:8080/6579
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-08 00:19:08 +00:00
Tim Armstrong
c8e15e484c IMPALA-4593,IMPALA-4635: fix some python build issues
Build C/C++ packages with toolchain GCC to avoid ABI compatibility
issues. This requires a multi-step bootstrapping process:
1. install basic non-C/C++ packages into the virtualenv
2. use Python 2.7 from the virtualenv to bootstrap the toolchain
3. use toolchain gcc to build C/C++ packages
4. build the kudu-python package with toolchain gcc and Cython

To avoid potentially pulling in cached versions of packages
built with a different compiler, this patch also disables pip's
caching. This should not have a significant effect on performance
since we've enabled ccache and cache downloaded packages in
infra/python/deps.

Improve bootstrapping time significantly by using ccache and by
parallelising the numpy build - the most expensive part of the
install process. On a system with a warmed-up ccache,
bootstrapping after deleting infra/python/env takes 1m16s. Previously
it could take over 5m.

Testing:
Tested manually on Ubuntu 16.04 to confirm that it fixes the ABI
problem mentioned in IMPALA-4593. Initially "import kudu" failed
in my dev environment. After deleting infra/python/env and
re-bootstrapping, "import kudu" succeeded.

Also ran the standard test suite on CentOS 6 and built Impala on
a range of platforms (CentOS 5,6,7; SLES 11,12; Debian 6,7;
Ubuntu12.04,14.04,16.04) to make sure nothing broke.

Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf
Reviewed-on: http://gerrit.cloudera.org:8080/6218
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-07 02:56:18 +00:00
Tim Armstrong
51b1310681 IMPALA-3872: allow providing PyPi mirror for python packages
We still rely on the python.org json API, which doesn't seem to be
mirrored (instead there's a html-based index format implemented by
the mirrors).

The mirror can be provided by setting the PYPI_MIRROR environment
variable. The default is "https://pypi.python.org".

Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c
Reviewed-on: http://gerrit.cloudera.org:8080/4770
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-11-08 05:34:50 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files_txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Lars Volker
b94f88a697 IMPALA-3886: Improve log of pip_download.py
pip_download.py prints the following line for each dependency that is
already up-to-date:

File with matching md5sum already exists, skipping download.

This change adds the filename to the message so it is more useful.

Change-Id: Ie3d81743814be37ee8ddbe04c264ed2bf37410f9
Reviewed-on: http://gerrit.cloudera.org:8080/3687
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2016-07-21 08:30:51 -07:00
Taras Bobrovytsky
baf8fe202c IMPALA-3778: Fix ASF packaging build
The tarballs in IMPALA_HOME/infra/python/deps and the thirdparty
directory have been removed in the ASF repository. All Python
dependencies and CDH components must now be downloaded as part of every
build. This caused the ASF packaging build to fail. Before this patch,
we used the system pip to download the Python dependencies, which caused
flakiness and inconsistency on different operating systems. This patch
fixes the problem by using our own script (which requires Python 2.6+ to
be installed on the system), to download all the files in
requirements.txt.

Also replaced all whl and zip Python packages with tar.gz to make it
consistent with the ASF build.

Change-Id: Ibe5a743096cda2059bd330805d324983f6730e19
Reviewed-on: http://gerrit.cloudera.org:8080/3647
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
2016-07-14 19:04:45 +00:00
Tim Armstrong
fc3ff1c52f IMPALA-3763: download_requirements fixes
* Download to infra/python/deps instead of the current directory.
* Download the correct virtualenv version, to match the version on
  cdh5-trunk
* Don't re-download packages repeatedly, instead check the md5sum.

Testing:
Tested manually on the ASF tree, then made sure that
bootstrap_virtualenv completed successfully to make sure we had all
of the requirements downloaded successfully.

Change-Id: I5a3c42236dddfd8a456c82605dc1fdc199a2bc48
Reviewed-on: http://gerrit.cloudera.org:8080/3416
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
2016-06-21 00:37:54 -07:00
Tim Armstrong
ec3a1c7866 download_requirements should download kudu-python and virtualenv
This is required for the ASF migration, since we don't want to include
all of the tarballs in the repo and we want to allow developers to build
using dependencies obtained from the standard upstream sources.

Also remove a workaround for an old issue with building an impyla
development version package.

Change-Id: Ie9216596db0f37d706ea7f77c129cecd5b070429
Reviewed-on: http://gerrit.cloudera.org:8080/3217
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-06-13 17:32:27 -07:00