impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Joe McDonnell	1913ab46ed	IMPALA-14501: Migrate most scripts from impala-python to impala-python3 To remove the dependency on Python 2, existing scripts need to use python3 rather than python. These commands find those locations (for impala-python and regular python): git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python git grep bin/python | grep -v python3 This removes or switches most of these locations by various means: 1. If a python file has a #!/bin/env impala-python (or python) but doesn't have a main function, it removes the hash-bang and makes sure that the file is not executable. 2. Most scripts can simply switch from impala-python to impala-python3 (or python to python3) with minimal changes. 3. The cm-api pypi package (which doesn't support Python 3) has been replaced by the cm-client pypi package and interfaces have changed. Rather than migrating the code (which hasn't been used in years), this deletes the old code and stops installing cm-api into the virtualenv. The code can be restored and revamped if there is any interest in interacting with CM clusters. 4. This switches tests/comparison over to impala-python3, but this code has bit-rotted. Some pieces can be run manually, but it can't be fully verified with Python 3. It shouldn't hold back the migration on its own. 5. This also replaces locations of impala-python in comments / documentation / READMEs. 6. kazoo (used for interacting with HBase) needed to be upgraded to a version that supports Python 3. The newest version of kazoo requires upgrades of other component versions, so this uses kazoo 2.8.0 to avoid needing other upgrades. The two remaining uses of impala-python are: - bin/cmake_aux/create_virtualenv.sh - bin/impala-env-versioned-python These will be removed separately when we drop Python 2 support completely. In particular, these are useful for testing impala-shell with Python 2 until we stop supporting Python 2 for impala-shell. The docker-based tests still use /usr/bin/python, but this can be switched over independently (and doesn't impact impala-python) Testing: - Ran core job - Ran build + dataload on Centos 7, Redhat 8 - Manual testing of individual scripts (except some bitrotted areas like the random query generator) Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc Reviewed-on: http://gerrit.cloudera.org:8080/23468 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-10-22 16:30:17 +00:00
Joe McDonnell	c5a0ec8bdf	IMPALA-11980 (part 1): Put all thrift-generated python code into the impala_thrift_gen package This puts all of the thrift-generated python code into the impala_thrift_gen package. This is similar to what Impyla does for its thrift-generated python code, except that it uses the impala_thrift_gen package rather than impala._thrift_gen. This is a preparatory patch for fixing the absolute import issues. This patches all of the thrift files to add the python namespace. This has code to apply the patching to the thirdparty thrift files (hive_metastore.thrift, fb303.thrift) to do the same. Putting all the generated python into a package makes it easier to understand where the imports are getting code. When the subsequent change rearranges the shell code, the thrift generated code can stay in a separate directory. This uses isort to sort the imports for the affected Python files with the provided .isort.cfg file. This also adds an impala-isort shell script to make it easy to run. Testing: - Ran a core job Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0 Reviewed-on: http://gerrit.cloudera.org:8080/20169 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-04-15 17:03:02 +00:00
Michael Smith	8b2598cd70	IMPALA-12485: Remove Python 2 has_key Switch calls to dict#has_key (Python 2-only) for 'key in dict' syntax. Change-Id: I08e9f6667011d70ceddbf919a61d1be7d6e07ee4 Reviewed-on: http://gerrit.cloudera.org:8080/20541 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-10-10 00:44:23 +00:00
Michael Smith	0a42185d17	IMPALA-9627: Update utility scripts for Python 3 (part 2) We're starting to see environments where the system Python ('python') is Python 3. Updates utility and build scripts to work with Python 3, and updates check-pylint-py3k.sh to check scripts that use system python. Fixes other issues found during a full build and test run with Python 3.8 as the default for 'python'. Fixes a impala-shell tip that was supposed to have been two tips (and had no space after period when they were printed). Removes out-of-date deploy.py and various Python 2.6 workarounds. Testing: - Full build with /usr/bin/python pointed to python3 - run-all-tests passed with python pointed to python3 - ran push_to_asf.py Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb Reviewed-on: http://gerrit.cloudera.org:8080/19697 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-04-26 18:52:23 +00:00
Joe McDonnell	2b550634d2	IMPALA-11952 (part 2): Fix print function syntax Python 3 now treats print as a function and requires the parenthesis in invocation. print "Hello World!" is now: print("Hello World!") This fixes all locations to use the function invocation. This is more complicated when the output is being redirected to a file or when avoiding the usual newline. print >> sys.stderr , "Hello World!" is now: print("Hello World!", file=sys.stderr) To support this properly and guarantee equivalent behavior between python 2 and python 3, all files that use print now add this import: from __future__ import print_function This also fixes random flake8 issues that intersect with the changes. Testing: - check-python-syntax.sh shows no errors related to print Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351 Reviewed-on: http://gerrit.cloudera.org:8080/19552 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-02-28 17:11:50 +00:00
Andrew Sherman	acea702276	IMPALA-8536: Add Scalable Pool Configuration to Admission Controller. Add 3 configuration parameters to Admission Controller that scale with the number of hosts in the resource pool. These parameters are specified to the Impalad through the -llama_site_path flag which points to a Llama XML configuration file. The new configuration parameters are: + Max Running Queries Multiple - this floating point number is multiplied by the current total number of executors at runtime to give the maximum number of concurrently running queries allowed in the pool. This calculation is rounded up to the nearest integer so the result will always be at least one as long as the parameter is non-zero. + Max Queued Queries Multiple - this floating point number is multiplied by the current total number of executors at runtime to give the maximum number of queries that can be queued in the pool. This calculation is rounded up to the nearest integer so the result will always be at least one as long as the parameter is non-zero. + Max Memory Multiple - this number of bytes is multiplied by the current total number of executors at runtime to give the maximum memory available across the cluster for the pool. If any of these parameters have zero value then they will be ignored. In this case the corresponding non-scalable parameters will be used, if they are set. The new parameters are exposed through the webui. At various points in the code Admission Controller looks at the Pool Config objects to find non-scalable parameters such as the max number of queries that can run in the pool. These access have been encapsulated in functions that return the scalable version of the configuration value if the new scalable parameters are being used. Diagnostic messages are enhanced to show the origin of the encapsulated parameters. TESTING All end-to-end tests are running clean with ASAN. The unit test admission-controller-test.cc has been expanded to test the newly added code. Added an end-to-end test that adds and removes Impalads from a minicluster. Change-Id: If47508728124076f3b9200c27cffc989f7a4f188 Reviewed-on: http://gerrit.cloudera.org:8080/13307 Reviewed-by: Andrew Sherman <asherman@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-19 05:58:10 +00:00
Lars Volker	dba30cc29b	IMPALA-4330: Fix JSON syntax in generate_metrics.py The hardcoded JSON string in MDL_BASE had a superfluous comma, that tripped both the simplejson and json parsers. This change removes it so the string works with both parsers. Change-Id: I98456df28d48ed22cefcc570e88df78fdf441c23 Reviewed-on: http://gerrit.cloudera.org:8080/4887 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Jim Apple <jbapple@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-02 20:01:58 +00:00
Tim Armstrong	ee2a06d827	Remove Llama dependency This change prevents us from depending on LLAMA to build. Note that the LLAMA MiniKDC is left in - it is a test utility that does not depend on LLAMA itself. IMPALA-4292 tracks cleaning this up. Testing: Ran a private build to verify that all tests pass. Change-Id: If2e5e21d8047097d56062ded11b0832a1d397fe0 Reviewed-on: http://gerrit.cloudera.org:8080/4739 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2016-10-18 16:35:58 +00:00
Lars Volker	ef4c9958d0	IMPALA-4047: Remove occurrences of 'CDH'/'cdh' from repo This change removes some of the occurrences of the strings 'CDH'/'cdh' from the Impala repository. References to Cloudera-internal Jiras have been replaced with upstream Jira issues on issues.cloudera.org. For several categories of occurrences (e.g. pom.xml files, DOWNLOAD_CDH_COMPONENTS) I also created a list of follow-up Jiras to remove the occurrences left after this change. Change-Id: Icb37e2ef0cd9fa0e581d359c5dd3db7812b7b2c8 Reviewed-on: http://gerrit.cloudera.org:8080/4187 Reviewed-by: Jim Apple <jbapple@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-10-13 00:40:41 +00:00
Thomas Tauber-Marshall	b2c2fe7813	IMPALA-3786: Replace "cloudera" with "apache" (part 2) As part of the ASF transition, we need to replace references to Cloudera in Impala with references to Apache. This primarily means changing Java package names from com.cloudera.impala.* to org.apache.impala.* A prior patch renamed all the files as necessary, and this patch performs the actual code changes. Most of the changes in this patch were generated with some commands of the form: find . | grep "\.java\|\.py\|\.h\|\.cc" | xargs sed -i s/'com\(.\)cloudera\(\.\)impala/org\1apache\2impala/g along with some manual fixes. After this patch, the remaining references to Cloudera in the repo mostly fall into the categories: - External components that have cloudera in their own package names, eg. com.cloudera.kudu/llama - URLs, eg. https://repository.cloudera.com/ Change-Id: I0d35fa6602a7fc0c212b2ef5e2b3322b77dde7e2 Reviewed-on: http://gerrit.cloudera.org:8080/3937 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-09-29 21:14:13 +00:00
Dan Hecht	ffa7829b70	IMPALA-3918: Remove Cloudera copyrights and add ASF license header For files that have a Cloudera copyright (and no other copyright notice), make changes to follow the ASF source file header policy here: http://www.apache.org/legal/src-headers.html#headers Specifically: 1) Remove the Cloudera copyright. 2) Modify NOTICE.txt according to http://www.apache.org/legal/src-headers.html#notice to follow that format and add a line for Cloudera. 3) Replace or add the existing ASF license text with the one given on the website. Much of this change was automatically generated via: git grep -li 'Copyright.Cloudera' > modified_files.txt cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.Cloudera#i;' cat modified_files_txt | xargs fix_apache_license.py [1] Some manual fixups were performed following those steps, especially when license text was completely missing from the file. [1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor modification to ORIG_LICENSE to match Impala's license text. Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86 Reviewed-on: http://gerrit.cloudera.org:8080/3779 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-08-09 08:19:41 +00:00
ishaan	ba78337329	Bump version to 2.5.0-cdh5-INTERNAL Change-Id: I4213571f59a19959fb9ff18f4ebeddb52e761c89 Reviewed-on: http://gerrit.cloudera.org:8080/809 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-11-26 00:59:25 +00:00
Matthew Jacobs	b338034376	Lower-case units in generated metrics mdl file The mdl file will be consumed by CM. They have asked for the units to be lower-case. Change-Id: Iacc583ff2c1680ec02a41feab558fbb2890d95be Reviewed-on: http://gerrit.cloudera.org:8080/499 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2015-07-01 20:18:37 +00:00
Matthew Jacobs	cbedd03d9f	Add option in generate_metrics.py to output CM mdl Adds support in the script generate_metrics.py to produce a CM compatible metric definition (MDL) file. Fixes some metrics missing descriptions and changing some metrics created as gauges that are really counters. TODO: Support histograms, stats, and metric defs with args Change-Id: I3ebb45145035facab5d4408118150f8c8eb8786a Reviewed-on: http://gerrit.cloudera.org:8080/423 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2015-06-11 02:58:33 +00:00
Matthew Jacobs	f37682a16f	Fix packaging build for Python 2.4 cgroups.py was using unsupported "except <Exception> as <var>" syntax. generate_metrics.py was using the json module which is not available in Python 2.4, but contains simplejson which provides the same functionality. Change-Id: If2c176c15a9573dd2a2acf5ee459ff24ce891ce3 Reviewed-on: http://gerrit.cloudera.org:8080/396 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2015-05-19 17:13:33 +00:00
Matthew Jacobs	fe87bb1563	Add MetricDefs, static definitions of metric metadata generated from json Adds a static definition of the metric metadata used by Impala. The metric names, descriptions, and other properties are defined in common/thrift/metrics.json file, and the generate_metrics.py script creates a thrift representation. The metric definitions are then available in a constant map which is used at runtime to instantiate metrics, looking them up in the map by the metric key. New metrics should be defined by adding an entry to the list of metrics in metrics.json with the following properties: key: The unique string identifying the metric. If the metric can be templated, e.g. rpc call duration, it may be a format string (in the format used by strings::Substitute()). description: A text description of the metric. May also be a format string. label: A brief title for the metric, not currently used by Impala but provided for external tools. units: The unit of the metric. Must be a valid value of TUnit. kind: The kind of metric, e.g. GAUGE or COUNTER. Must be a valid value of TMetricKind. contexts: The context in which this metric may be instantiated. Usually "IMPALAD", "STATESTORED", "CATALOGD", but may be a different kind of 'entity'. Not currently used by Impala but provided for modeling purposes for external tools. For example, adding the counter for the total number of queries run over the lifetime of the impalad process might look like: { "key": "impala-server.num-queries", "description": "The total number of queries processed.", "label": "Queries", "units": "UNIT", "kind": "COUNTER", "contexts": [ "IMPALAD" ] } TODO: Incorporate 'label' into the metrics debug page. TODO: Verify the context at runtime, e.g. verify 'contexts' contains, e.g. a DCHECK. After the metric definition is added, the generate_metrics.py script will generate the TMetricDefs.thrift that contains a TMetricDef for the metric definition. At runtime, the metric can be instantiated using the key defined in metrics.json. Gauges, Counters, and Properties are instantiated using static methods on MetricGroup. Other metric types are instantiated using static CreateAndRegister methods on their associated classes. TODO: Generate a thrift enum used to lookup metric defs. TODO: Consolidate the instantiation of metrics that are created outside of metrics.h (i.e. collection metrics, memory metrics). TODO: Need a better way to verify if metric definitions are missing. Change-Id: Iba7f94144d0c34f273c502ce6b9a2130ea8fedaa Reviewed-on: http://gerrit.cloudera.org:8080/330 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2015-05-14 21:27:28 +00:00
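The scalable pool limits in the IMPALA-8536 entry come down to simple arithmetic. The Python below is an illustrative sketch only. Impala's admission controller implements this logic in C++, and the function names here (effective_query_limit, effective_max_memory) are hypothetical. The sketch assumes the current executor count is already known. Following the commit message, a zero multiple falls back to the fixed, non-scalable setting.

```python
import math
from typing import Optional


def effective_query_limit(multiple: float, num_executors: int,
                          fixed_limit: Optional[int]) -> Optional[int]:
    """Hypothetical helper for Max Running/Queued Queries Multiple.

    A non-zero multiple is scaled by the current executor count and rounded up,
    so a positive multiple with at least one executor always yields >= 1.
    A zero multiple is ignored and the fixed (non-scalable) limit, if set, applies.
    """
    if multiple != 0:
        return math.ceil(multiple * num_executors)
    return fixed_limit


def effective_max_memory(bytes_per_executor: int, num_executors: int,
                         fixed_bytes: Optional[int]) -> Optional[int]:
    """Hypothetical helper for Max Memory Multiple (a byte count per executor)."""
    if bytes_per_executor != 0:
        return bytes_per_executor * num_executors
    return fixed_bytes


# A 0.5 multiple on a 3-executor cluster allows ceil(1.5) = 2 running queries.
print(effective_query_limit(0.5, 3, fixed_limit=10))  # 2
print(effective_query_limit(0.0, 3, fixed_limit=10))  # 10 (multiple ignored)
print(effective_max_memory(4 * 1024**3, 3, fixed_bytes=None))  # 12884901888
```

Rounding up rather than truncating keeps a small positive multiple from producing a limit of zero on a small cluster.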

16 Commits
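Several of the commits above touch on Python idioms used in scripts such as generate_metrics.py:

- the print() function (IMPALA-11952);
- `key in dict` in place of dict.has_key() (IMPALA-12485);
- a superfluous comma breaking JSON parsing (IMPALA-4330);
- the metrics.json entry format (the MetricDefs commit).

The following is a hypothetical Python 2/3-compatible sketch, not code from the repository. It checks one metrics.json-style entry for the fields listed in the MetricDefs commit. The helper names (missing_fields, load_metric) are illustrative assumptions, and only field presence is checked, not whether `units` or `kind` are valid TUnit or TMetricKind values.

```python
from __future__ import print_function  # print() works the same on Python 2 and 3

import json
import sys

# Fields the MetricDefs commit lists for each metrics.json entry.
REQUIRED_FIELDS = ("key", "description", "label", "units", "kind", "contexts")

# The example entry from the MetricDefs commit message.
EXAMPLE = """
{
  "key": "impala-server.num-queries",
  "description": "The total number of queries processed.",
  "label": "Queries",
  "units": "UNIT",
  "kind": "COUNTER",
  "contexts": [ "IMPALAD" ]
}
"""


def missing_fields(metric_def):
    """Hypothetical check: return the required fields absent from one entry."""
    # 'field in dict' replaces the Python 2-only dict.has_key(field).
    return [f for f in REQUIRED_FIELDS if f not in metric_def]


def load_metric(text):
    """Parse one entry; report to stderr and return None if it is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError as e:  # json.JSONDecodeError is a ValueError subclass
        print("invalid JSON: %s" % e, file=sys.stderr)
        return None


print(missing_fields(load_metric(EXAMPLE)))  # []
# A trailing comma is not valid JSON, so the standard parser rejects it.
print(load_metric('{"key": "x",}'))  # None
```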