13 Commits

Author SHA1 Message Date
Joe McDonnell
1913ab46ed IMPALA-14501: Migrate most scripts from impala-python to impala-python3
To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3

This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
   doesn't have a main function, it removes the hash-bang and makes
   sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
   (or python to python3) with minimal changes.
3. The cm-api pypi package (which doesn't support Python 3) has been
   replaced by the cm-client pypi package and interfaces have changed.
   Rather than migrating the code (which hasn't been used in years), this
   deletes the old code and stops installing cm-api into the virtualenv.
   The code can be restored and revamped if there is any interest in
   interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
   bit-rotted. Some pieces can be run manually, but it can't be fully
   verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces locations of impala-python in comments / documentation /
   READMEs.
6. kazoo (used for interacting with HBase) needed to be upgraded to a
   version that supports Python 3. The newest version of kazoo requires
   upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
   needing other upgrades.

The two remaining uses of impala-python are:
 - bin/cmake_aux/create_virtualenv.sh
 - bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.

The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python).

Testing:
 - Ran core job
 - Ran build + dataload on CentOS 7, Red Hat 8
 - Manual testing of individual scripts (except some bitrotted areas like the
   random query generator)

Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-10-22 16:30:17 +00:00
Michael Smith
7d07192e89 IMPALA-9627: Use universal_newlines for Python 3
Fixes subprocess.check_output calls for Python 3 using
universal_newlines=True.
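
As an illustration (not the patch itself), the sketch below shows what the
flag changes: with universal_newlines=True, check_output returns str on
Python 3 instead of bytes. The run_text helper name is hypothetical.

```python
import subprocess
import sys


def run_text(cmd):
    """Run cmd and return its stdout as text (str) rather than bytes."""
    # Without universal_newlines=True, Python 3 would return bytes here.
    return subprocess.check_output(cmd, universal_newlines=True)


# Use the current interpreter as the child process so the example does not
# depend on any external command being installed.
out = run_text([sys.executable, "-c", "print('hello')"])
```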

Change-Id: I3dae9113635cf23ae02f1f630de311e64119c456
Reviewed-on: http://gerrit.cloudera.org:8080/19812
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-28 23:28:49 +00:00
Michael Smith
0a42185d17 IMPALA-9627: Update utility scripts for Python 3 (part 2)
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.

Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.

Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after the period when they were printed).

Removes out-of-date deploy.py and various Python 2.6 workarounds.

Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py

Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-04-26 18:52:23 +00:00
Joe McDonnell
2b550634d2 IMPALA-11952 (part 2): Fix print function syntax
Python 3 now treats print as a function and requires
parentheses in the invocation.

print "Hello World!"
is now:
print("Hello World!")

This fixes all locations to use the function
invocation. This is more complicated when the output
is being redirected to a file or when avoiding the
usual newline.

print >> sys.stderr , "Hello World!"
is now:
print("Hello World!", file=sys.stderr)
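
The newline case follows the same pattern: the function form takes an end
argument. A small sketch under Python 3, writing to an in-memory buffer (an
assumption made here only so the result is easy to inspect):

```python
import io

# On Python 2 this would also need the print_function import from
# __future__ as the first statement of the file.
buf = io.StringIO()

# end="" suppresses the newline that print() would otherwise append.
print("Hello", end="", file=buf)
print(" World!", file=buf)

result = buf.getvalue()
```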

To support this properly and guarantee equivalent behavior
between python 2 and python 3, all files that use print
now add this import:
from __future__ import print_function

This also fixes random flake8 issues that intersect with
the changes.

Testing:
 - check-python-syntax.sh shows no errors related to print

Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351
Reviewed-on: http://gerrit.cloudera.org:8080/19552
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-02-28 17:11:50 +00:00
Fang-Yu Rao
7450c96a76 IMPALA-11133 (Addendum): Encode a string in utf8 before printing it
In the first part of this patch, we decoded a string with 'utf8' in
order to print it (on the command line) since the author field of a
commit could contain non-ASCII characters.

However, we did not take into consideration that in some scenarios,
we would like to redirect the output to another file. If this is the
case, then we may encounter a UnicodeEncodeError due to
sys.stdout.encoding being None. To resolve the issue, we encode the
formatted string with 'utf8'.
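
One way to picture the fix, as a sketch rather than the script's actual
code: encoding explicitly yields bytes, so the output no longer depends on
sys.stdout.encoding. The write_utf8 helper and the BytesIO stand-in for a
redirected stdout are assumptions made for illustration.

```python
import io


def write_utf8(stream, text):
    """Write text to a binary stream as UTF-8, followed by a newline."""
    stream.write(text.encode("utf8"))
    stream.write(b"\n")


# A binary buffer stands in for stdout redirected to a file.
buf = io.BytesIO()
write_utf8(buf, u"Author: Zo\u00eb")
written = buf.getvalue()
```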

Testing:
 - Manually verified that we won't get a UnicodeEncodeError if we
   redirect the output to another file.

Change-Id: Iad9b1fb0a523e219bc9f40a57ff7335808be283f
Reviewed-on: http://gerrit.cloudera.org:8080/18270
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
2022-03-05 06:53:01 +00:00
Fang-Yu Rao
8e5a42fff4 IMPALA-11133: Decode author of a commit with utf8 before printing it
We found that compare_branches.py could fail if the author of a commit
contains non-ASCII characters because the script attempts to print the
field. This patch fixes the problem by explicitly decoding the value of
author with the encoding 'utf8'. The commit message is also decoded with
'utf8' to prevent similar problems from happening when there are
non-ASCII characters in the commit message.
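
A minimal sketch of the decoding step, assuming the fields arrive as raw
bytes from git. The format_commit helper and its output layout are
hypothetical, not taken from compare_branches.py.

```python
def format_commit(author_bytes, subject_bytes):
    """Decode both fields as UTF-8 before formatting them for display."""
    author = author_bytes.decode("utf8")
    subject = subject_bytes.decode("utf8")
    return u"{0} ({1})".format(subject, author)


# b"Zo\xc3\xab" is the UTF-8 encoding of a name with a non-ASCII character.
line = format_commit(b"Zo\xc3\xab", b"Fix typo")
```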

Testing:
 - Manually verified that we won't get the UnicodeDecodeError after this
   patch.

Change-Id: Ieb03b0937a994db2bf08e4199574d04f7fb99f5d
Reviewed-on: http://gerrit.cloudera.org:8080/18256
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-02-23 07:15:47 +00:00
stiga-huang
d8ef9562cc Only fetch needed branches in compare_branches.py
Fetching all branches of a remote repo will be time-consuming. This
changes compare_branches.py to only fetch the needed branches, i.e.
source_branch and target_branch.
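
The idea can be sketched as building one "git fetch <remote> <branch>"
command per branch instead of fetching everything from the remote. The
fetch_commands helper and its remote arguments are assumptions for
illustration. The script's real option names may differ.

```python
def fetch_commands(source_remote, source_branch, target_remote, target_branch):
    """Return git commands that fetch only the two branches being compared."""
    return [
        ["git", "fetch", source_remote, source_branch],
        ["git", "fetch", target_remote, target_branch],
    ]


# Each command could be passed to subprocess.check_call by the caller.
cmds = fetch_commands("asf", "master", "downstream", "release")
```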

Tests:
 - Before this change, it can't finish in 30 mins when comparing two
   downstream branches. After this change, it finishes in one minute.

Change-Id: Ia0c70ad4de1fa79498ca32853b6ea99aee2d40a7
Reviewed-on: http://gerrit.cloudera.org:8080/17246
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-01 04:32:30 +00:00
Joe McDonnell
e7fc18c4ea IMPALA-10608: Update kudu-python version and remove some unused packages
This updates kudu-python to version 1.14.0 (from 1.2.0).
As part of this, it disables ccache for bootstrap_virtualenv.py.
ccache wasn't working anyway, because pip install uses random
temporary directories. It also needs to copy a few files to
the build directory for the Kudu install. The advantage to
upgrading is that the new version no longer has a numpy dependency.

Additionally, this modifies a few minor packages:
 - virtualenv moves to the latest version prior to the rewrite
   that accompanied version 20 (i.e. 16.7.10).
 - setuptools moves to the last version that supports python 2.7 (44.1.1)
 - remove boto3, ipython, and ordereddict

These changes speed up installing the virtualenv.
Before:
real    3m11.956s
user    2m49.620s
sys     0m14.266s
After:
real    1m38.798s
user    1m33.591s
sys     0m8.112s

Testing:
 - Hand tests, GVO run

Change-Id: Ib47770df9e46de448fe2bffef7abe2c3aa942fb9
Reviewed-on: http://gerrit.cloudera.org:8080/17231
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-31 03:17:24 +00:00
Philip Zeyliger
242e822ae6 Add --partial_ok to compare_branches.py.
This change lets compare_branches.py succeed with a partial set of
cherry-picks. In our case, we sometimes get backed up with several
commits needing to be cherry-picked. The first few cherry-pick fine, and
there's a problematic commit sometime down the line. By accepting the
first few that cherry-pick fine, someone resolving the conflict can
start at where that conflict begins, rather than having to wrangle more
commits unnecessarily.
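
The behavior described above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the script's code: the
cherry_pick_all function and the pick callback (returning True on success)
are hypothetical names.

```python
def cherry_pick_all(commits, pick, partial_ok=False):
    """Apply commits in order and return the list that was applied.

    With partial_ok, stop quietly at the first failure as long as at least
    one commit was applied. Otherwise any failure raises an error.
    """
    applied = []
    for commit in commits:
        if pick(commit):
            applied.append(commit)
            continue
        if partial_ok and applied:
            return applied
        raise RuntimeError("cherry-pick failed for %s" % commit)
    return applied


# Commit "c" conflicts, so only the first two are accepted with partial_ok.
applied = cherry_pick_all(["a", "b", "c"], lambda c: c != "c", partial_ok=True)
```

If the very first commit fails, the function still raises even with
partial_ok set, matching the test described in the commit message.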

I tested this with and without the flag, and confirmed that if the
first commit is problematic for cherry-picks, the command does
fail.

Change-Id: I2a8b34577f9cb74565adf90a2b7d5328bc555f85
Reviewed-on: http://gerrit.cloudera.org:8080/10025
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Philip Zeyliger <philip@cloudera.com>
2018-04-11 22:54:51 +00:00
Lars Volker
772afb0369 Add missing brace to example JSON
Change-Id: Ieabf181a6e052790ebe7034a1e4dd1632644fd8e
Reviewed-on: http://gerrit.cloudera.org:8080/9532
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2018-03-07 20:53:49 +00:00
Philip Zeyliger
5ca603a376 IMPALA-6410: compare_branches: use looser expression
We've already got one use of "Cherry-pick:" instead of "Cherry-picks:"
in master, so I'm loosening the regular expression a bit. (And
converting the string search into a case-insensitive regexp search.)
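
A sketch of what such a looser, case-insensitive match could look like. The
exact pattern in the script is not shown in this message, so CHERRY_PICK_RE
and is_marked below are assumptions.

```python
import re

# Accept both "Cherry-pick:" and "Cherry-picks:", in any letter case.
CHERRY_PICK_RE = re.compile(r"cherry-picks?:", re.IGNORECASE)


def is_marked(message):
    """Return True if the commit message carries a cherry-pick marker."""
    return CHERRY_PICK_RE.search(message) is not None


plural = is_marked("Cherry-picks: not for 2.x")
singular = is_marked("Cherry-pick: not for 2.x")
lowercase = is_marked("cherry-picks: not for 2.x")
unrelated = is_marked("Fix a bug in the planner")
```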

I tested this by running it manually and inspecting results.

Change-Id: Ie3f75d9e01d2760571547b1a1a5f42bbc8455a05
Reviewed-on: http://gerrit.cloudera.org:8080/9135
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-26 22:45:57 +00:00
Philip Zeyliger
088d6add78 IMPALA-6410: Use subprocess in compare_branches.py.
Switches bin/compare_branches.py to use 'subprocess' instead
of 'sh'. We often use 'sh' in Impala testing code for its
friendly API, but it has to be installed separately. To avoid
automation that is just doing git operations needing to
either build the Impala python environment or otherwise get
extra libraries, I converted the usages.

As a side-effect, the script outputs the stdout of 'git cherry-pick',
whereas it used to swallow it. I like it better this way.

I tested this by running it in an environment which needed
some cherry-picks.

Change-Id: I509a548a129e7ad67aaf800a8ba03cffad51dd81
Reviewed-on: http://gerrit.cloudera.org:8080/9130
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2018-01-25 20:27:59 +00:00
Philip Zeyliger
674d29d71d IMPALA-6410: Tool to cherrypick changes across branches.
This script compares two branches and optionally cherry-picks changes
across. It uses the Gerrit Change-Id as the key, and it supports a
configuration file and a string to ignore commits.
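
Keying on the Change-Id can be sketched as below. The helper names and the
regular expression are assumptions for illustration. The only detail relied
on is the Gerrit convention, visible in this log, of a "Change-Id:" trailer
consisting of "I" followed by 40 hex digits.

```python
import re

CHANGE_ID_RE = re.compile(r"^Change-Id: (I[0-9a-f]{40})$", re.MULTILINE)


def change_id(message):
    """Return the Gerrit Change-Id from a commit message, or None."""
    match = CHANGE_ID_RE.search(message)
    return match.group(1) if match else None


def missing_in_target(source_msgs, target_msgs):
    """Return source commit messages whose Change-Id is absent from target."""
    target_ids = set(change_id(m) for m in target_msgs)
    target_ids.discard(None)  # commits without a Change-Id never match
    return [m for m in source_msgs if change_id(m) not in target_ids]


source = [
    "Tool to cherrypick\n\nChange-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9",
    "Use subprocess\n\nChange-Id: I509a548a129e7ad67aaf800a8ba03cffad51dd81",
]
target = [
    "Tool to cherrypick (backport)\n\n"
    "Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9",
]
todo = missing_in_target(source, target)
```

Matching on the Change-Id rather than the commit hash is what lets a
cherry-picked commit, which gets a new hash, be recognized as already
present.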

Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9
Reviewed-on: http://gerrit.cloudera.org:8080/9045
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2018-01-24 03:34:52 +00:00