Commit Graph

16 Commits

Author SHA1 Message Date
Joe McDonnell
1913ab46ed IMPALA-14501: Migrate most scripts from impala-python to impala-python3
To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3

This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
   doesn't have a main function, it removes the hash-bang and makes
   sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
   (or python to python3) with minimal changes.
3. The cm-api pypi package (which doesn't support Python 3) has been
   replaced by the cm-client pypi package and interfaces have changed.
   Rather than migrating the code (which hasn't been used in years), this
   deletes the old code and stops installing cm-api into the virtualenv.
   The code can be restored and revamped if there is any interest in
   interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
   bit-rotted. Some pieces can be run manually, but it can't be fully
   verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces locations of impala-python in comments / documentation /
   READMEs.
6. kazoo (used for interacting with HBase) needed to be upgraded to a
   version that supports Python 3. The newest version of kazoo requires
   upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
   needing other upgrades.

The two remaining uses of impala-python are:
 - bin/cmake_aux/create_virtualenv.sh
 - bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.

The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python)

Testing:
 - Ran core job
 - Ran build + dataload on Centos 7, Redhat 8
 - Manual testing of individual scripts (except some bitrotted areas like the
   random query generator)

Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-10-22 16:30:17 +00:00
Joe McDonnell
c5a0ec8bdf IMPALA-11980 (part 1): Put all thrift-generated python code into the impala_thrift_gen package
This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated python code, except that it
uses the impala_thrift_gen package rather than impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.

This patches all of the thrift files to add the python namespace.
This has code to apply the patching to the thirdparty thrift
files (hive_metastore.thrift, fb303.thrift) to do the same.

Putting all the generated python into a package makes it easier
to understand where the imports are getting code. When the
subsequent change rearranges the shell code, the thrift generated
code can stay in a separate directory.

This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.

Testing:
 - Ran a core job

Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-15 17:03:02 +00:00
Michael Smith
8b2598cd70 IMPALA-12485: Remove Python 2 has_key
Switch calls to dict#has_key (Python 2-only) for 'key in dict' syntax.

Change-Id: I08e9f6667011d70ceddbf919a61d1be7d6e07ee4
Reviewed-on: http://gerrit.cloudera.org:8080/20541
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-10-10 00:44:23 +00:00
Michael Smith
0a42185d17 IMPALA-9627: Update utility scripts for Python 3 (part 2)
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.

Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.

Fixes a impala-shell tip that was supposed to have been two tips (and
had no space after period when they were printed).

Removes out-of-date deploy.py and various Python 2.6 workarounds.

Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py

Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-04-26 18:52:23 +00:00
Joe McDonnell
2b550634d2 IMPALA-11952 (part 2): Fix print function syntax
Python 3 now treats print as a function and requires
the parenthesis in invocation.

print "Hello World!"
is now:
print("Hello World!")

This fixes all locations to use the function
invocation. This is more complicated when the output
is being redirected to a file or when avoiding the
usual newline.

print >> sys.stderr , "Hello World!"
is now:
print("Hello World!", file=sys.stderr)

To support this properly and guarantee equivalent behavior
between python 2 and python 3, all files that use print
now add this import:
from __future__ import print_function

This also fixes random flake8 issues that intersect with
the changes.

Testing:
 - check-python-syntax.sh shows no errors related to print

Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351
Reviewed-on: http://gerrit.cloudera.org:8080/19552
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-02-28 17:11:50 +00:00
Andrew Sherman
acea702276 IMPALA-8536: Add Scalable Pool Configuration to Admission Controller.
Add 3 configuration parameters to Admission Controller that scale with
the number of hosts in the resource pool. These parameters are specified
to the Impalad through the -llama_site_path flag which points to a Llama
XML configuration file.

The new configuration parameters are:
+ Max Running Queries Multiple - this floating point number is
  multiplied by the current total number of executors at runtime to give
  the maximum number of concurrently running queries allowed in the
  pool. This calculation is rounded up to the nearest integer so the
  result will always be at least one as long as the parameter is
  non-zero.
+ Max Queued Queries Multiple - this floating point number is multiplied
  by the current total number of executors at runtime to give the
  maximum number of queries that can be queued in the pool. This
  calculation is rounded up to the nearest integer so the result will
  always be at least one as long as the parameter is non-zero.
+ Max Memory Multiple - this number of bytes is multiplied by the
  current total number of executors at runtime to give the maximum
  memory available across the cluster for the pool.
If any of these parameters have zero value then they will be ignored.
In this case the corresponding non-scalable parameters will be used, if
they are set.

The new parameters are exposed through the webui.

At various points in the code Admission Controller looks at the Pool
Config objects to find non-scalable parameters such as the max number of
queries that can run in the pool. These access have been encapsulated in
functions that return the scalable version of the configuration value if
the new scalable parameters are being used. Diagnostic messages are
enhanced to show the origin of the encapsulated parameters.

TESTING

All end-to-end tests are running clean with ASAN.

The unit test admission-controller-test.cc has been expanded to test the
newly added code.

Added an end-to-end test that adds and removes Impalads from a
minicluster.

Change-Id: If47508728124076f3b9200c27cffc989f7a4f188
Reviewed-on: http://gerrit.cloudera.org:8080/13307
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-19 05:58:10 +00:00
Lars Volker
dba30cc29b IMPALA-4330: Fix JSON syntax in generate_metrics.py
The hardcoded JSON string in MDL_BASE had a superfluous comma, that
tripped both the simplejson and json parsers. This change removes it so
the string works with both parsers.

Change-Id: I98456df28d48ed22cefcc570e88df78fdf441c23
Reviewed-on: http://gerrit.cloudera.org:8080/4887
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-11-02 20:01:58 +00:00
Tim Armstrong
ee2a06d827 Remove Llama dependency
This change prevents us from depending on LLAMA to build.

Note that the LLAMA MiniKDC is left in - it is a test
utility that does not depend on LLAMA itself.
IMPALA-4292 tracks cleaning this up.

Testing:
Ran a private build to verify that all tests pass.

Change-Id: If2e5e21d8047097d56062ded11b0832a1d397fe0
Reviewed-on: http://gerrit.cloudera.org:8080/4739
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2016-10-18 16:35:58 +00:00
Lars Volker
ef4c9958d0 IMPALA-4047: Remove occurrences of 'CDH'/'cdh' from repo
This change removes some of the occurrences of the strings 'CDH'/'cdh'
from the Impala repository. References to Cloudera-internal Jiras have
been replaced with upstream Jira issues on issues.cloudera.org.

For several categories of occurrences (e.g. pom.xml files,
DOWNLOAD_CDH_COMPONENTS) I also created a list of follow-up Jiras to
remove the occurrences left after this change.

Change-Id: Icb37e2ef0cd9fa0e581d359c5dd3db7812b7b2c8
Reviewed-on: http://gerrit.cloudera.org:8080/4187
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-10-13 00:40:41 +00:00
Thomas Tauber-Marshall
b2c2fe7813 IMPALA-3786: Replace "cloudera" with "apache" (part 2)
As part of the ASF transition, we need to replace references to
Cloudera in Impala with references to Apache. This primarily means
changing Java package names from com.cloudera.impala.* to
org.apache.impala.*

A prior patch renamed all the files as necessary, and this patch
performs the actual code changes. Most of the changes in this patch
were generated with some commands of the form:

find . | grep "\.java\|\.py\|\.h\|\.cc" | \
  xargs sed -i s/'com\(.\)cloudera\(\.\)impala/org\1apache\2impala/g

along with some manual fixes.

After this patch, the remaining references to Cloudera in the repo
mostly fall into the categories:
- External components that have cloudera in their own package names,
  eg. com.cloudera.kudu/llama
- URLs, eg. https://repository.cloudera.com/

Change-Id: I0d35fa6602a7fc0c212b2ef5e2b3322b77dde7e2
Reviewed-on: http://gerrit.cloudera.org:8080/3937
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
2016-09-29 21:14:13 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files_txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
ishaan
ba78337329 Bump version to 2.5.0-cdh5-INTERNAL
Change-Id: I4213571f59a19959fb9ff18f4ebeddb52e761c89
Reviewed-on: http://gerrit.cloudera.org:8080/809
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-11-26 00:59:25 +00:00
Matthew Jacobs
b338034376 Lower-case units in generated metrics mdl file
The mdl file will be consumed by CM. They have asked for the units
to be lower-case.

Change-Id: Iacc583ff2c1680ec02a41feab558fbb2890d95be
Reviewed-on: http://gerrit.cloudera.org:8080/499
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2015-07-01 20:18:37 +00:00
Matthew Jacobs
cbedd03d9f Add option in generate_metrics.py to output CM mdl
Adds support in the script generate_metrics.py to produce
a CM compatible metric definition (MDL) file.

Fixes some metrics missing descriptions and changing some
metrics created as gauges that are really counters.

TODO: Support histograms, stats, and metric defs with args

Change-Id: I3ebb45145035facab5d4408118150f8c8eb8786a
Reviewed-on: http://gerrit.cloudera.org:8080/423
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2015-06-11 02:58:33 +00:00
Matthew Jacobs
f37682a16f Fix packaging build for Python 2.4
cgroups.py was using unsupported "except <Exception> as <var>" syntax.

generate_metrics.py was using the json module which is not available
in Python 2.4, but contains simplejson which provides the same
functionality.

Change-Id: If2c176c15a9573dd2a2acf5ee459ff24ce891ce3
Reviewed-on: http://gerrit.cloudera.org:8080/396
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2015-05-19 17:13:33 +00:00
Matthew Jacobs
fe87bb1563 Add MetricDefs, static definitions of metric metadata generated from json
Adds a static definition of the metric metadata used by Impala. The
metric names, descriptions, and other properties are defined in
common/thrift/metrics.json file, and the generate_metrics.py script
creates a thrift representation. The metric definitions are then
available in a constant map which is used at runtime to instantiate
metrics, looking them up in the map by the metric key.

New metrics should be defined by adding an entry to the list of metrics
in metrics.json with the following properties:

key:         The unique string identifying the metric. If the metric can
             be templated, e.g. rpc call duration, it may be a format
             string (in the format used by strings::Substitute()).
description: A text description of the metric. May also be a format
             string.
label:       A brief title for the metric, not currently used by
             Impala but provided for external tools.
units:       The unit of the metric. Must be a valid value of TUnit.
kind:        The kind of metric, e.g. GAUGE or COUNTER. Must be a valid
             value of TMetricKind.
contexts:    The context in which this metric may be instantiated.
             Usually "IMPALAD", "STATESTORED", "CATALOGD", but may be
             a different kind of 'entity'. Not currently used by
             Impala but provided for modeling purposes for external
             tools.

For example, adding the counter for the total number of queries run over
the lifetime of the impalad process might look like:

  {
    "key": "impala-server.num-queries",
    "description": "The total number of queries processed.",
    "label": "Queries",
    "units": "UNIT",
    "kind": "COUNTER",
    "contexts": [
      "IMPALAD"
    ]
  }

TODO: Incorporate 'label' into the metrics debug page.
TODO: Verify the context at runtime, e.g. verify 'contexts' contains,
      e.g. a DCHECK.

After the metric definition is added, the generate_metrics.py script
will generate the TMetricDefs.thrift that contains a TMetricDef for
the metric definition. At runtime, the metric can be instantiated
using the key defined in metrics.json. Gauges, Counters, and
Properties are instantiated using static methods on MetricGroup. Other
metric types are instantiated using static CreateAndRegister methods
on their associated classes.

TODO: Generate a thrift enum used to lookup metric defs.
TODO: Consolidate the instantiation of metrics that are created
      outside of metrics.h (i.e. collection metrics, memory metrics).
TODO: Need a better way to verify if metric definitions are missing.

Change-Id: Iba7f94144d0c34f273c502ce6b9a2130ea8fedaa
Reviewed-on: http://gerrit.cloudera.org:8080/330
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2015-05-14 21:27:28 +00:00