Commit Graph

9 Commits

Author SHA1 Message Date
Joe McDonnell
1913ab46ed IMPALA-14501: Migrate most scripts from impala-python to impala-python3
To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3

This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
   doesn't have a main function, it removes the hash-bang and makes
   sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
   (or python to python3) with minimal changes.
3. The cm-api pypi package (which doesn't support Python 3) has been
   replaced by the cm-client pypi package and interfaces have changed.
   Rather than migrating the code (which hasn't been used in years), this
   deletes the old code and stops installing cm-api into the virtualenv.
   The code can be restored and revamped if there is any interest in
   interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
   bit-rotted. Some pieces can be run manually, but it can't be fully
   verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces locations of impala-python in comments / documentation /
   READMEs.
6. kazoo (used for interacting with HBase) needed to be upgraded to a
   version that supports Python 3. The newest version of kazoo requires
   upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
   needing other upgrades.
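As a rough illustration of item 1, the check for a hash-bang without an entry point could look like the sketch below. This is not the procedure used for this change: the helper name `has_unneeded_shebang` is hypothetical, and it assumes that "has a main function" means the file contains an `if __name__ == "__main__":` guard.

```python
import re

# Hypothetical helper, not part of the commit. Assumption: a file "has a main
# function" if it contains an `if __name__ == "__main__":` guard.
SHEBANG_RE = re.compile(r"^#!.*python")
MAIN_GUARD_RE = re.compile(
    r"^if\s+__name__\s*==\s*['\"]__main__['\"]\s*:", re.MULTILINE)


def has_unneeded_shebang(source):
    """Return True if the text starts with a python hash-bang but has no main guard."""
    lines = source.splitlines()
    first_line = lines[0] if lines else ""
    if not SHEBANG_RE.match(first_line):
        # No python hash-bang on the first line, so there is nothing to remove.
        return False
    return MAIN_GUARD_RE.search(source) is None
```

A file flagged this way would also need its executable bit cleared, which this sketch does not attempt.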

The two remaining uses of impala-python are:
 - bin/cmake_aux/create_virtualenv.sh
 - bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.

The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python).

Testing:
 - Ran core job
 - Ran build + dataload on CentOS 7, Red Hat 8
 - Manual testing of individual scripts (except some bit-rotted areas like the
   random query generator)

Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-10-22 16:30:17 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator where the code needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
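A minimal sketch of the two division cases described above. The function names are made up for illustration and do not come from the Impala tree.

```python
from __future__ import absolute_import, division, print_function


def average_row_size(total_bytes, num_rows):
    # True division: with the __future__ import, Python 2 returns a float
    # here, the same as Python 3.
    return total_bytes / num_rows


def middle_index(num_items):
    # Floor division: an index must stay an integer, so '//' is used.
    return num_items // 2


print(average_row_size(7, 2))
print(middle_index(7))
```

Without the `division` import, Python 2 would evaluate `7 / 2` as `3`. That kind of silent difference is why each old-division location had to be reviewed by hand.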

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Michael Brown
2cf66cfc49 IMPALA-8169: small changes to Leopard
- Fix a bug in which rsync --chown doesn't work on CentOS 7.

- Update HOST_TESTDATA_EXTERNAL_VOLUME_PATH (for the minicluster data):
  most runs now are on EC2 etc., and they already need a large volume
  for docker images, so just keep the cluster data there, too.

- Reduce extremely verbose logging.

- Default to a database that's part of dataload (tpch_kudu).

- Change some of the controller variables to my preferred defaults.

Change-Id: I169f60dad53d2e4980ed6bd1f350fb0dcf274306
Testing: Regular downstream runs for months.
Reviewed-on: http://gerrit.cloudera.org:8080/12386
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-07 01:48:13 +00:00
Michael Brown
1cfe77dbaa IMPALA-4427: leopard: make DOCKER_IMAGE_NAME required
This patch now requires users of the Leopard framework to supply a
DOCKER_IMAGE_NAME. The cloudera/impala image isn't being maintained, and
until Apache Impala (incubating) decides to publish its own image, no
default image name is possible. It is still possible to run the Leopard
framework against a homegrown Docker image that has Apache Impala
(incubating) and PostgreSQL installed: simply build such an image and
export the DOCKER_IMAGE_NAME environment variable before running the
controller.
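For illustration only, a required environment variable of this kind is typically enforced with a check like the one below. The function name and error message are assumptions and are not taken from the Leopard code.

```python
import os


def get_required_docker_image_name(environ=None):
    """Return DOCKER_IMAGE_NAME, or raise if it is unset or empty.

    Hypothetical helper: the real controller may validate this differently.
    """
    environ = os.environ if environ is None else environ
    image_name = environ.get("DOCKER_IMAGE_NAME", "").strip()
    if not image_name:
        raise ValueError(
            "DOCKER_IMAGE_NAME must be set; there is no default image")
    return image_name
```

Failing early with an explicit message avoids a confusing Docker error later in the run.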

While here, fix some flake8 non-indent problems.

Testing: short Leopard controller / query generator run.

Change-Id: Ic1cb96cb5c9a894f40e0892f3bdd3f3d0158e887
Reviewed-on: http://gerrit.cloudera.org:8080/4936
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Jim Apple <jbapple@cloudera.com>
2016-11-04 20:35:17 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given on the
   website, or add it where it is missing.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
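The perl step above drops every line matching the copyright pattern, ignoring case. A Python sketch of the same filter is shown here for readers less familiar with perl. It is not the tool that was used, and the function name is hypothetical.

```python
import re

# Same pattern as the perl one-liner: m#Copyright.*Cloudera#i
COPYRIGHT_RE = re.compile(r"Copyright.*Cloudera", re.IGNORECASE)


def strip_cloudera_copyright(text):
    """Return text with every line that matches the copyright pattern removed."""
    kept = [line for line in text.splitlines(True)
            if not COPYRIGHT_RE.search(line)]
    return "".join(kept)
```

Like the perl command, this removes whole lines only, so a comment block can be left with an empty leading line that needs manual cleanup.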

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Taras Bobrovytsky
82af47c7af Switch query generator to randomness db by default
Currently the leopard query generator framework uses the functional
database by default. This patch switches the default database to
"randomness", which contains more interesting data and wider tables
than functional. This patch makes it possible to optionally generate
and load the "randomness" database on the target machine. There is
also a nested schema parsing bug fix.

Change-Id: Idea0a095b42cc584cf3b52801385c9a991342218
Reviewed-on: http://gerrit.cloudera.org:8080/1987
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2016-02-03 00:14:28 +00:00
Taras Bobrovytsky
f3d2f6bd7e IMPALA-2898: Fix the leopard framework (qgen)
Some recent commits broke the query generator leopard framework; for
example, QueryResultComparator requires a different number of arguments.

Additional changes:
- Added better support for running the query generator in nested types
  mode
- Added tracking of the number of queries that returned data
- Made it easier to control behavior from a central place by adding
  flags to controller.py

Change-Id: I8f47c52097ccd53df4233b88eea887ce5fab1955
Reviewed-on: http://gerrit.cloudera.org:8080/1968
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-01-30 06:02:09 +00:00
Casey Ching
d202d6a967 Use "impala-python" (virtualenv) instead of system python
Python tests and infra scripts will now use "python" from the virtualenv
via $IMPALA_HOME/bin/impala-python. Some scripts could be simplified now
that Python 2.6 and a dependable set of third-party libraries are
available, but that is not done as part of this commit.

Change-Id: If1cf96898d6350e78ea107b9026b12ba63a4162f
Reviewed-on: http://gerrit.cloudera.org:8080/603
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2015-08-06 02:09:09 +00:00
Taras Bobrovytsky
e33452cb54 Add self-serve framework to the Query Generator
Allows running the query generator continuously. Produces several reports per day. Reports
are presented as a web page which allows the user to conveniently examine the issues in
the report. The user can also start a custom run against a private Impala branch using the
web interface.

Change-Id: If2bfca34904f78f40e15a9c84b1be42fca014b9d
Reviewed-on: http://gerrit.cloudera.org:8080/309
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
2015-06-16 02:39:05 +00:00