To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3
This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
doesn't have a main function, it removes the hash-bang and makes
sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
(or python to python3) with minimal changes.
3. The cm-api pypi package (which doesn't support Python 3) has been
replaced by the cm-client pypi package and interfaces have changed.
Rather than migrating the code (which hasn't been used in years), this
deletes the old code and stops installing cm-api into the virtualenv.
The code can be restored and revamped if there is any interest in
interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
bit-rotted. Some pieces can be run manually, but it can't be fully
verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces locations of impala-python in comments / documentation /
READMEs.
6. kazoo (used for interacting with HBase) needed to be upgraded to a
version that supports Python 3. The newest version of kazoo requires
upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
needing other upgrades.
The two remaining uses of impala-python are:
- bin/cmake_aux/create_virtualenv.sh
- bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.
The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python)
Testing:
- Ran core job
- Ran build + dataload on Centos 7, Redhat 8
- Manual testing of individual scripts (except some bitrotted areas like the
random query generator)
Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.
Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.
Fixes a impala-shell tip that was supposed to have been two tips (and
had no space after period when they were printed).
Removes out-of-date deploy.py and various Python 2.6 workarounds.
Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py
Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Python 3 now treats print as a function and requires
the parenthesis in invocation.
print "Hello World!"
is now:
print("Hello World!")
This fixes all locations to use the function
invocation. This is more complicated when the output
is being redirected to a file or when avoiding the
usual newline.
print >> sys.stderr , "Hello World!"
is now:
print("Hello World!", file=sys.stderr)
To support this properly and guarantee equivalent behavior
between python 2 and python 3, all files that use print
now add this import:
from __future__ import print_function
This also fixes random flake8 issues that intersect with
the changes.
Testing:
- check-python-syntax.sh shows no errors related to print
Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351
Reviewed-on: http://gerrit.cloudera.org:8080/19552
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Initializing the impala-python virtualenv takes a couple minutes,
so it is useful to do that in parallel to the rest of the build.
This moves the impala-python initialization to its own step
in the CMake build. It stops using impala-python for commands
invoked from buildall.sh or the CMake build to avoid premature
or concurrent initializations of impala-python. Then, it adds
a dedicated step to initialize impala-python.
Testing:
- Ran a core job and a couple builds
- Rebuilt and verified that impala-python is not reinitialized
if it is already initialized
Change-Id: Ieff51263c55bd234028fed7101c94b4a928590f0
Reviewed-on: http://gerrit.cloudera.org:8080/16607
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
gen_build_version.py now specifies --git-dir explicitly, to avoid
fetching the revision of a containing git directory that's not an Impala
checkout.
This came up when building Impala without a git directory, but within a
build system that happens to have a git directory higher up in the
directory tree.
I tested this by running the script manually and observing
it works identically in the normal case.
Change-Id: I98870a1cb2073ddcf9ba1620e3801e1964930edd
Reviewed-on: http://gerrit.cloudera.org:8080/8500
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins
It was noticed that some build processes did not checkout Impala and
instead built it from a tarball. This would cause our gen_build_version
script to write a blank version info everytime to the version.info file.
This patch takes care of the case where if there is an already existing
version.info file and we cannot get the git rev-parse output, we use
the old file instead. Blank version info is written only when we don't
have an old version.info file and we cannot do a git rev-parse.
Change-Id: Id7af33502bbb70185dc15ffca6219436a616f25b
Reviewed-on: http://gerrit.cloudera.org:8080/4411
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Sailesh Mukil <sailesh@cloudera.com>
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files_txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
gen_build_version.sh previously had a --noclean option which did not
overwrite the version information if it was already populated. Since
--noclean was the default option, it always never updated the version
information.
This patch modifies gen_build_version.py to generate a
common/version.cc instead of a common/version.h. Now,
common/version.h will be a part of the git repo and will not need to
be modified on every build. It declares the functions that will return
the build information. These functions will be defined in
common/version.cc and the build information will change on every new
build.
Since only the .cc file changes on every build, we will not incur a
highly noticable change in build times.
Also changed the function names from GetImpalaBuild...() to
GetImpaladBuild...() so as to avoid naming confusion between the
Impala-lzo and the Impala functions.
There is an accompanying change in the Impala-lzo library too.
Change-Id: Ie461110b6f8ca545f04ea33b7b502aea550b8551
Reviewed-on: http://gerrit.cloudera.org:8080/2651
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Sailesh Mukil <sailesh@cloudera.com>
Python tests and infra scripts will now use "python" from the virtualenv
via $IMPALA_HOME/bin/impala-python. Some scripts could be simplified now
that python 2.6 and a dependable set of third-party libraries are
available but that is not done as part of this commit.
Change-Id: If1cf96898d6350e78ea107b9026b12ba63a4162f
Reviewed-on: http://gerrit.cloudera.org:8080/603
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
Before this patch the -noclean option had almost no effect on the BE build time because
some source files were re-generated with .py scripts regardless.
This change allows ./buildall -skiptests -noclean to do a true incremental rebuild.
Change-Id: Ib3af85db05bdc96a2279a22c1d49d735f2cabd4e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1394
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1415
The packaging builds are done in a way that doesn't allow for us to query the
git hash at the time of build. The fix is to generate a version file before
build time and use that to populate the changes. This changes adds a
save-version.sh script that is run before the package build starts. This change also consolidates the version info gathering between core impala and the CLI.