9 Commits

Author SHA1 Message Date
Joe McDonnell
1913ab46ed IMPALA-14501: Migrate most scripts from impala-python to impala-python3
To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3

This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
   doesn't have a main function, it removes the hash-bang and makes
   sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
   (or python to python3) with minimal changes.
3. The cm-api pypi package (which doesn't support Python 3) has been
   replaced by the cm-client pypi package and interfaces have changed.
   Rather than migrating the code (which hasn't been used in years), this
   deletes the old code and stops installing cm-api into the virtualenv.
   The code can be restored and revamped if there is any interest in
   interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
   bit-rotted. Some pieces can be run manually, but it can't be fully
   verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces locations of impala-python in comments / documentation /
   READMEs.
6. kazoo (used for interacting with HBase) needed to be upgraded to a
   version that supports Python 3. The newest version of kazoo requires
   upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
   needing other upgrades.
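
As an illustration of item 1 only, the check could be sketched as below. This is not the tooling used for the change: the function name `strip_shebang_if_library` and the heuristic of searching for `__main__` to detect a main entry point are assumptions made for this example.

```python
import os
import stat
import tempfile


def strip_shebang_if_library(path):
    """Drop the hash-bang and executable bits from a file with no main entry.

    Hypothetical helper: a file counts as a library when it starts with "#!"
    and never mentions "__main__". Returns True if the file was changed.
    """
    with open(path) as f:
        lines = f.readlines()
    if not lines or not lines[0].startswith("#!"):
        return False
    if any("__main__" in line for line in lines):
        return False
    # Rewrite the file without its first (hash-bang) line.
    with open(path, "w") as f:
        f.writelines(lines[1:])
    # Clear the user, group and other executable bits.
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    return True


# Demo on a throwaway file that has a hash-bang but no main entry point.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "helpers.py")
with open(demo_path, "w") as f:
    f.write("#!/bin/env impala-python\ndef helper():\n  return 1\n")
os.chmod(demo_path, 0o755)
changed = strip_shebang_if_library(demo_path)
with open(demo_path) as f:
    first_line = f.readline()
```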

The two remaining uses of impala-python are:
 - bin/cmake_aux/create_virtualenv.sh
 - bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.

The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python).

Testing:
 - Ran core job
 - Ran build + dataload on CentOS 7, Red Hat 8
 - Manual testing of individual scripts (except some bitrotted areas like the
   random query generator)

Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-10-22 16:30:17 +00:00
Michael Smith
0a42185d17 IMPALA-9627: Update utility scripts for Python 3 (part 2)
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.

Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.

Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after the period when they were printed).

Removes out-of-date deploy.py and various Python 2.6 workarounds.

Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py

Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-04-26 18:52:23 +00:00
Michael Smith
df2e03ce0b IMPALA-11974: Fix xrange, split in collect_diagnostics.py
Python 3 removes the xrange function; this commit replaces it with the
range function, similar to earlier replacements in IMPALA-11974.

Adds universal_newlines to Popen so that we return text in Python 3.
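
A minimal sketch of the two changes, not the script's actual code: `run_command` is a hypothetical helper, and the child process is just the current interpreter printing a fixed string.

```python
import subprocess
import sys

# range() exists on both Python 2 and 3; xrange() exists only on Python 2.
squares = [i * i for i in range(3)]


def run_command(cmd):
    """Run cmd and return (return code, combined output as text)."""
    # universal_newlines=True makes Popen decode the pipe, so the output is
    # str rather than bytes on Python 3 and str methods such as split()
    # behave the same on both major versions.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    out, _ = proc.communicate()
    return proc.returncode, out


rc, out = run_command([sys.executable, "-c", "print('a b c')"])
fields = out.split()
```

Without `universal_newlines=True`, `out` would be `bytes` on Python 3, and calls such as `out.split("\n")` would raise a `TypeError`.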

Testing:
- Ran '$IMPALA_HOME/bin/diagnostics/collect_diagnostics.py --pid <pid>
  --minidumps 2 1 --minidumps_dir $IMPALA_HOME/logs/cluster/minidumps
  --stacks 2 1' with Python 2/3 and inspected the results.

Change-Id: I52f075825d47613293b106a7c50d4499c19cd3f4
Reviewed-on: http://gerrit.cloudera.org:8080/19746
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-17 21:54:13 +00:00
Bharath Vissapragada
5f5c8612f6 [diagnostics] Make --minidump_dir consistent with Impala's --minidump_path
Currently, the diagnostics script expects a full path to the actual
directory to which process minidumps are written. This is, however,
inconsistent with Impala's configuration --minidump_path.

Impala creates a subdirectory under FLAGS_minidump_path (for example,
<FLAGS_minidump_path>/impalad) to which it writes the minidumps.

This commit fixes the diagnostic script input --minidump_dir to be
consistent with the above behavior from Impala. It now looks for
minidumps under the directory <--minidump_path>/<process-name>
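
The lookup described here can be sketched as follows. The helper names and the `.dmp` file suffix are assumptions for illustration, not taken from the script.

```python
import os
import tempfile


def minidump_dir_for(minidump_path, process_name):
    """Return <minidump_path>/<process-name>, e.g. .../minidumps/impalad."""
    return os.path.join(minidump_path, process_name)


def list_minidumps(minidump_path, process_name):
    """List minidump files for one process (assumes a '.dmp' suffix)."""
    dump_dir = minidump_dir_for(minidump_path, process_name)
    if not os.path.isdir(dump_dir):
        return []
    return sorted(os.path.join(dump_dir, name)
                  for name in os.listdir(dump_dir)
                  if name.endswith(".dmp"))


# Demo with a temporary directory standing in for the minidump path.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "impalad"))
open(os.path.join(base, "impalad", "a.dmp"), "w").close()
open(os.path.join(base, "impalad", "notes.txt"), "w").close()
found = list_minidumps(base, "impalad")
```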

The users of this script are expected to fix their input args
accordingly.

Change-Id: I9e59f108a1f29a33768a39d0f4554d96e2dcd381
Reviewed-on: http://gerrit.cloudera.org:8080/11353
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
2018-09-04 20:17:37 +00:00
Philip Zeyliger
4ab1b92452 Make collect_diagnostics executable and minor argument-parsing changes.
collect_diagnostics.py was missing a shebang (#!) at its
beginning as well as the executable bit. Since it's
meant to be runnable standalone, I've added those.
I was able to also simplify the argument handling around
the single required --pid argument.

Change-Id: If4d021c2f6f9dec62d6865d32ec0419e41a2441c
Reviewed-on: http://gerrit.cloudera.org:8080/11178
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-14 00:43:23 +00:00
Bharath Vissapragada
3e73645035 Fix diagnostics path to not include the parent dir structure
Without the fix, the diagnostics tar file included the entire
directory structure of the diagnostics root dir.

Before:
=======
$ tar tf /tmp/impala-diagnostics-2018-05-08-11-59-39-spv8Eh.tar.gz
tmp/impala-diagnostics-2018-05-08-11-59-39-spv8Eh/
tmp/impala-diagnostics-2018-05-08-11-59-39-spv8Eh/stacks/
tmp/impala-diagnostics-2018-05-08-11-59-39-spv8Eh/stacks/jstack-0.txt
....

After:
=====
$ tar tf /tmp/impala-diagnostics-2018-05-08-12-01-51-Y0nlQI.tar.gz
impala-diagnostics-2018-05-08-12-01-51-Y0nlQI/
impala-diagnostics-2018-05-08-12-01-51-Y0nlQI/stacks/
impala-diagnostics-2018-05-08-12-01-51-Y0nlQI/stacks/jstack-0.txt
.....
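
One way to get the "After" layout with the standard tarfile module is to pass an arcname when adding the directory. This is a sketch written for Python 3, not necessarily how the script does it; `archive_dir` and the demo paths are hypothetical.

```python
import os
import tarfile
import tempfile


def archive_dir(root_dir, tar_path):
    """Archive root_dir so member names start at its last path component."""
    top_level = os.path.basename(os.path.normpath(root_dir))
    with tarfile.open(tar_path, "w:gz") as tar:
        # arcname replaces the full on-disk path (tmp/...) with the
        # directory name alone.
        tar.add(root_dir, arcname=top_level)


# Demo: build a tiny diagnostics tree and archive it.
work_dir = tempfile.mkdtemp()
root = os.path.join(work_dir, "impala-diagnostics-demo")
os.makedirs(os.path.join(root, "stacks"))
with open(os.path.join(root, "stacks", "jstack-0.txt"), "w") as f:
    f.write("demo\n")
tar_path = os.path.join(work_dir, "impala-diagnostics-demo.tar.gz")
archive_dir(root, tar_path)
with tarfile.open(tar_path, "r:gz") as tar:
    names = tar.getnames()
```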

Tested with python 2.6

Change-Id: I540f6c228a0315780d45cf11961f124478b5dd0c
Reviewed-on: http://gerrit.cloudera.org:8080/10347
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-10 09:29:55 +00:00
Bharath Vissapragada
ce09269fde IMPALA-6747: Automate diagnostics collection.
This commit adds the necessary tooling to automate diagnostics
collection for Impala daemons. The following diagnostics are supported:

1. Native core dump (+ shared libs)
2. GDB/Java thread dump (pstack + jstack)
3. Java heap dump (jmap)
4. Minidumps (using breakpad) *
5. Profiles
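
For orientation, the external tools named in items 1 to 3 are typically invoked along the lines below. This is a sketch under assumptions: the use of gcore for the core dump, the output file names, and both helper functions are illustrative and not taken from the script.

```python
import os
import shutil
import subprocess


def build_commands(pid, out_dir):
    """Return the command line for each diagnostic, keyed by a short name."""
    pid = str(pid)
    heap_file = os.path.join(out_dir, "heap.hprof")
    return {
        # gcore writes the core to <prefix>.<pid> (assumed tool for item 1).
        "core": ["gcore", "-o", os.path.join(out_dir, "core"), pid],
        "pstack": ["pstack", pid],
        "jstack": ["jstack", pid],
        "jmap": ["jmap", "-dump:format=b,file=" + heap_file, pid],
    }


def run_if_available(cmd):
    """Run cmd if its binary is on PATH; return its output text, else None."""
    if shutil.which(cmd[0]) is None:
        return None
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    out, _ = proc.communicate()
    return out


# Build (but do not run) the commands for a made-up pid.
cmds = build_commands(1234, os.path.join("diag", "out"))
```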

Given the required inputs, the script outputs a zip compressed
impala diagnostic bundle with all the diagnostics collected.

The script can be run manually with the following command.

python collect_diagnostics.py --help

Tested with python 2.6 and later.

* minidumps collected here correspond to the state of the Impala
process at the time this script is triggered. This is different
from collect_minidumps.py which archives the entire minidump
directory.

Change-Id: I166e726f1dd1ce81187616e4f06d2404fa379bf8
Reviewed-on: http://gerrit.cloudera.org:8080/10056
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Bharath Vissapragada <bharathv@cloudera.com>
2018-04-13 22:01:33 +00:00
Bharath Vissapragada
cf4f314922 Revert "IMPALA-6747: Automate diagnostics collection."
A couple of things do not work in Python 2.6:
 -- Multiple with statements in the same context
 -- shutil.make_archive()
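
Both features first appeared in Python 2.7. The sketch below shows forms that also work on 2.6; it illustrates the kind of workaround involved and is not the fix that was eventually submitted. File and directory names are hypothetical.

```python
import os
import tarfile
import tempfile

work_dir = tempfile.mkdtemp()
bundle_dir = os.path.join(work_dir, "bundle")
os.makedirs(bundle_dir)
src_path = os.path.join(bundle_dir, "in.txt")
dst_path = os.path.join(bundle_dir, "out.txt")
with open(src_path, "w") as f:
    f.write("stack trace\n")

# Python 2.7+ allows: with open(src_path) as src, open(dst_path, "w") as dst:
# On 2.6 the two context managers have to be nested instead.
with open(src_path) as src:
    with open(dst_path, "w") as dst:
        dst.write(src.read())

# shutil.make_archive() is also 2.7+. The tarfile module can build the same
# kind of archive; try/finally is used because TarFile only became a context
# manager in 2.7.
tar_path = os.path.join(work_dir, "bundle.tar.gz")
tar = tarfile.open(tar_path, "w:gz")
try:
    tar.add(bundle_dir, arcname="bundle")
finally:
    tar.close()

check = tarfile.open(tar_path, "r:gz")
try:
    members = sorted(check.getnames())
finally:
    check.close()
```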

I need a little more time to test the fix with Python 2.6.
Meanwhile, reverting this to unblock others. I'll resubmit
the fix when I'm confident that it works with Python 2.6.

This reverts commit 2883c99500.

Change-Id: I221ede9d5eb4d89ea20992cc27a8284803af3223
Reviewed-on: http://gerrit.cloudera.org:8080/9872
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
2018-03-30 21:35:48 +00:00
Bharath Vissapragada
2883c99500 IMPALA-6747: Automate diagnostics collection.
This commit adds the necessary tooling to automate diagnostics
collection for Impala daemons. The following diagnostics are supported:

1. Native core dump (+ shared libs)
2. GDB/Java thread dump (pstack + jstack)
3. Java heap dump (jmap)
4. Minidumps (using breakpad) *
5. Profiles

Given the required inputs, the script outputs a zip compressed
impala diagnostic bundle with all the diagnostics collected.

The script can be run manually with the following command.

python collect_diagnostics.py --help

* minidumps collected here correspond to the state of the Impala
process at the time this script is triggered. This is different
from collect_minidumps.py which archives the entire minidump
directory.

Change-Id: Ib29caec7c3be5b6a31e60461294979c318300f64
Reviewed-on: http://gerrit.cloudera.org:8080/9815
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-29 00:12:18 +00:00