This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated python code, except that it
uses the impala_thrift_gen package rather than impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.
This patches all of the thrift files to add the python namespace.
It also includes code that applies the same patch to the third-party
thrift files (hive_metastore.thrift, fb303.thrift).
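As a rough illustration of the patching step (not the code in this change), the sketch below adds a "namespace py" declaration to thrift source text. The helper name add_py_namespace and the per-file module names are assumptions made for the example.

```python
import re

# Package prefix named in this commit message.
PY_PACKAGE = "impala_thrift_gen"


def add_py_namespace(thrift_source, module_name):
    """Return thrift_source with a 'namespace py' line for module_name.

    Hypothetical helper: the source is returned unchanged if it already
    declares a python namespace.
    """
    if re.search(r"^\s*namespace\s+py\s", thrift_source, flags=re.MULTILINE):
        return thrift_source
    namespace_line = "namespace py %s.%s\n" % (PY_PACKAGE, module_name)
    lines = thrift_source.splitlines(True)
    # Insert before the first existing namespace declaration, or at the top
    # of the file when there is none.
    for i, line in enumerate(lines):
        if line.lstrip().startswith("namespace "):
            lines.insert(i, namespace_line)
            break
    else:
        lines.insert(0, namespace_line)
    return "".join(lines)


example = ("namespace java org.apache.hadoop.hive.metastore.api\n"
           "struct Version {}\n")
patched = add_py_namespace(example, "hive_metastore")
```

Applying the helper twice leaves the text unchanged, which is convenient when the patch is re-run over already patched third-party files.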
Putting all the generated python into a package makes it easier
to see where imports get their code from. When the
subsequent change rearranges the shell code, the thrift generated
code can stay in a separate directory.
This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.
Testing:
- Ran a core job
Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.
Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.
Fixes an impala-shell tip that was supposed to be two tips (and
had no space after the period when they were printed).
Removes out-of-date deploy.py and various Python 2.6 workarounds.
Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py
Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Python 3 treats print as a function and requires
parentheses in the invocation.
print "Hello World!"
is now:
print("Hello World!")
This fixes all locations to use the function
invocation. This is more complicated when the output
is being redirected to a file or when avoiding the
usual newline.
print >> sys.stderr , "Hello World!"
is now:
print("Hello World!", file=sys.stderr)
To support this properly and guarantee equivalent behavior
between python 2 and python 3, all files that use print
now add this import:
from __future__ import print_function
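The newline case mentioned above has no example in this message, so here is a minimal sketch of it. The report() helper and the in-memory buffer are made up for illustration.

```python
from __future__ import print_function

import io
import sys


def report(message, stream=sys.stdout):
    # Python 2 statement form:  print >> stream, message,
    # (the trailing comma suppressed the newline). With the function form,
    # end="" does the same thing on both Python 2 and Python 3.
    print(message, end="", file=stream)


# Hypothetical usage: capture the output in memory instead of a real file.
buf = io.StringIO()
report(u"Hello World!", stream=buf)
```

The __future__ import has to be the first statement in the module (after any docstring or comments), which is why it is added at the top of each affected file.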
This also fixes random flake8 issues that intersect with
the changes.
Testing:
- check-python-syntax.sh shows no errors related to print
Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351
Reviewed-on: http://gerrit.cloudera.org:8080/19552
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Add 3 configuration parameters to Admission Controller that scale with
the number of hosts in the resource pool. These parameters are specified
to the Impalad through the -llama_site_path flag which points to a Llama
XML configuration file.
The new configuration parameters are:
+ Max Running Queries Multiple - this floating point number is
multiplied by the current total number of executors at runtime to give
the maximum number of concurrently running queries allowed in the
pool. This calculation is rounded up to the nearest integer so the
result will always be at least one as long as the parameter is
non-zero.
+ Max Queued Queries Multiple - this floating point number is multiplied
by the current total number of executors at runtime to give the
maximum number of queries that can be queued in the pool. This
calculation is rounded up to the nearest integer so the result will
always be at least one as long as the parameter is non-zero.
+ Max Memory Multiple - this number of bytes is multiplied by the
current total number of executors at runtime to give the maximum
memory available across the cluster for the pool.
If any of these parameters is zero, it is ignored and the
corresponding non-scalable parameter is used instead, if it
is set.
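The arithmetic described above can be sketched as follows. This is an illustration of the rounding and fallback rules only, not the Admission Controller code; the function and argument names are hypothetical.

```python
import math


def scaled_query_limit(multiple, num_executors, fixed_limit):
    """Limit on running or queued queries for a pool.

    A zero multiple means the scalable parameter is ignored and the
    non-scalable (fixed) limit applies.
    """
    if multiple == 0:
        return fixed_limit
    # Round up, so a non-zero multiple gives at least 1 once there is
    # at least one executor.
    return int(math.ceil(multiple * num_executors))


def scaled_mem_limit(bytes_per_executor, num_executors, fixed_limit):
    """Cluster-wide memory limit for a pool, in bytes."""
    if bytes_per_executor == 0:
        return fixed_limit
    return bytes_per_executor * num_executors
```

For example, a Max Running Queries Multiple of 0.5 on a 3-executor pool gives ceil(1.5) = 2 running queries.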
The new parameters are exposed through the webui.
At various points in the code Admission Controller looks at the Pool
Config objects to find non-scalable parameters such as the max number of
queries that can run in the pool. These accesses have been encapsulated in
functions that return the scalable version of the configuration value if
the new scalable parameters are being used. Diagnostic messages are
enhanced to show the origin of the encapsulated parameters.
TESTING
All end-to-end tests are running clean with ASAN.
The unit test admission-controller-test.cc has been expanded to test the
newly added code.
Added an end-to-end test that adds and removes Impalads from a
minicluster.
Change-Id: If47508728124076f3b9200c27cffc989f7a4f188
Reviewed-on: http://gerrit.cloudera.org:8080/13307
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The hardcoded JSON string in MDL_BASE had a superfluous comma that
tripped both the simplejson and json parsers. This change removes it so
the string works with both parsers.
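To show the failure mode, the sketch below feeds a string with a trailing comma to the json module. The two JSON fragments are invented for the example; they are not the MDL_BASE contents.

```python
import json

# Hypothetical fragments, not the real MDL_BASE string.
with_trailing_comma = '{"name": "IMPALA", "version": "1",}'
without_trailing_comma = '{"name": "IMPALA", "version": "1"}'


def parses(text):
    """Return True if text is valid JSON according to the json module."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
```

JSON, unlike a Python dict literal, does not allow a comma after the last member of an object or array.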
Change-Id: I98456df28d48ed22cefcc570e88df78fdf441c23
Reviewed-on: http://gerrit.cloudera.org:8080/4887
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This change prevents us from depending on LLAMA to build.
Note that the LLAMA MiniKDC is left in - it is a test
utility that does not depend on LLAMA itself.
IMPALA-4292 tracks cleaning this up.
Testing:
Ran a private build to verify that all tests pass.
Change-Id: If2e5e21d8047097d56062ded11b0832a1d397fe0
Reviewed-on: http://gerrit.cloudera.org:8080/4739
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
This change removes some of the occurrences of the strings 'CDH'/'cdh'
from the Impala repository. References to Cloudera-internal Jiras have
been replaced with upstream Jira issues on issues.cloudera.org.
For several categories of occurrences (e.g. pom.xml files,
DOWNLOAD_CDH_COMPONENTS) I also created a list of follow-up Jiras to
remove the occurrences left after this change.
Change-Id: Icb37e2ef0cd9fa0e581d359c5dd3db7812b7b2c8
Reviewed-on: http://gerrit.cloudera.org:8080/4187
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
As part of the ASF transition, we need to replace references to
Cloudera in Impala with references to Apache. This primarily means
changing Java package names from com.cloudera.impala.* to
org.apache.impala.*
A prior patch renamed all the files as necessary, and this patch
performs the actual code changes. Most of the changes in this patch
were generated with some commands of the form:
find . | grep "\.java\|\.py\|\.h\|\.cc" | \
xargs sed -i 's/com\(.\)cloudera\(\.\)impala/org\1apache\2impala/g'
along with some manual fixes.
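For readers less familiar with sed, the same substitution can be sketched in Python. This is only an equivalent of the regular expression above applied to a string; rename_package is a name made up for the example.

```python
import re

# Same expression as the sed command: the first separator may be any
# character, the second must be a literal dot.
PATTERN = re.compile(r"com(.)cloudera(\.)impala")


def rename_package(text):
    """Rewrite com?cloudera.impala references to org?apache.impala."""
    return PATTERN.sub(r"org\1apache\2impala", text)
```

Package names of external components such as com.cloudera.kudu do not match the pattern and are left alone, which is consistent with the remaining references listed below.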
After this patch, the remaining references to Cloudera in the repo
mostly fall into the categories:
- External components that have cloudera in their own package names,
eg. com.cloudera.kudu/llama
- URLs, eg. https://repository.cloudera.com/
Change-Id: I0d35fa6602a7fc0c212b2ef5e2b3322b77dde7e2
Reviewed-on: http://gerrit.cloudera.org:8080/3937
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given on the
website, or add that text where it is missing.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
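The perl one-liner above drops every line that matches the copyright pattern. A Python sketch of the same filter, with a hypothetical function name, looks like this:

```python
import re

# Same pattern and case-insensitivity as the perl one-liner.
COPYRIGHT_RE = re.compile(r"Copyright.*Cloudera", re.IGNORECASE)


def strip_cloudera_copyright(text):
    """Return text without the lines that carry a Cloudera copyright."""
    kept = [line for line in text.splitlines(True)
            if not COPYRIGHT_RE.search(line)]
    return "".join(kept)


sample = ("// Copyright 2012 Cloudera Inc.\n"
          "//\n"
          "// Licensed under the Apache License, Version 2.0\n")
cleaned = strip_cloudera_copyright(sample)
```

Only whole lines are removed, so an empty comment line that followed the copyright stays behind; that is the kind of leftover the manual fixups had to handle.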
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
The mdl file will be consumed by CM. They have asked for the units
to be lower-case.
Change-Id: Iacc583ff2c1680ec02a41feab558fbb2890d95be
Reviewed-on: http://gerrit.cloudera.org:8080/499
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
Adds support in the script generate_metrics.py to produce
a CM compatible metric definition (MDL) file.
Fixes some metrics that were missing descriptions and changes some
metrics that were created as gauges but are really counters.
TODO: Support histograms, stats, and metric defs with args
Change-Id: I3ebb45145035facab5d4408118150f8c8eb8786a
Reviewed-on: http://gerrit.cloudera.org:8080/423
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
cgroups.py was using the "except <Exception> as <var>" syntax, which
Python 2.4 does not support. generate_metrics.py was using the json
module, which is not available in Python 2.4; simplejson provides the
same functionality and is available there.
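One common way to stay compatible with such old interpreters is sketched below. It is not necessarily what this change does: the import fallback and the use of sys.exc_info() are assumptions, and load_or_none is a made-up helper.

```python
import sys

try:
    import json  # in the standard library from Python 2.6 onward
except ImportError:
    import simplejson as json  # assumed to be installed on older interpreters


def load_or_none(text):
    """Parse JSON text, returning None (and logging) on malformed input."""
    try:
        return json.loads(text)
    except ValueError:
        # sys.exc_info() also works on interpreters that do not accept
        # the "except ... as ..." form.
        err = sys.exc_info()[1]
        sys.stderr.write("bad JSON: %s\n" % err)
        return None
```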
Change-Id: If2c176c15a9573dd2a2acf5ee459ff24ce891ce3
Reviewed-on: http://gerrit.cloudera.org:8080/396
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
Adds a static definition of the metric metadata used by Impala. The
metric names, descriptions, and other properties are defined in
common/thrift/metrics.json file, and the generate_metrics.py script
creates a thrift representation. The metric definitions are then
available in a constant map which is used at runtime to instantiate
metrics, looking them up in the map by the metric key.
New metrics should be defined by adding an entry to the list of metrics
in metrics.json with the following properties:
key: The unique string identifying the metric. If the metric can
be templated, e.g. rpc call duration, it may be a format
string (in the format used by strings::Substitute()).
description: A text description of the metric. May also be a format
string.
label: A brief title for the metric, not currently used by
Impala but provided for external tools.
units: The unit of the metric. Must be a valid value of TUnit.
kind: The kind of metric, e.g. GAUGE or COUNTER. Must be a valid
value of TMetricKind.
contexts: The context in which this metric may be instantiated.
Usually "IMPALAD", "STATESTORED", "CATALOGD", but may be
a different kind of 'entity'. Not currently used by
Impala but provided for modeling purposes for external
tools.
For example, adding the counter for the total number of queries run over
the lifetime of the impalad process might look like:
{
"key": "impala-server.num-queries",
"description": "The total number of queries processed.",
"label": "Queries",
"units": "UNIT",
"kind": "COUNTER",
"contexts": [
"IMPALAD"
]
}
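A quick way to check that a new entry carries all of the properties listed above is sketched here. The check itself is hypothetical; nothing in this change claims that generate_metrics.py performs it.

```python
import json

# Property names taken from the list in this commit message.
REQUIRED = ("key", "description", "label", "units", "kind", "contexts")


def missing_properties(metric_def):
    """Return the required property names that metric_def lacks, in order."""
    return [name for name in REQUIRED if name not in metric_def]


entry = json.loads("""
{
  "key": "impala-server.num-queries",
  "description": "The total number of queries processed.",
  "label": "Queries",
  "units": "UNIT",
  "kind": "COUNTER",
  "contexts": ["IMPALAD"]
}
""")
```

An empty result from missing_properties(entry) means the entry has every property; it does not verify that 'units' and 'kind' are valid TUnit and TMetricKind values.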
TODO: Incorporate 'label' into the metrics debug page.
TODO: Verify the context at runtime, e.g. verify 'contexts' contains,
e.g. a DCHECK.
After the metric definition is added, the generate_metrics.py script
will generate the TMetricDefs.thrift that contains a TMetricDef for
the metric definition. At runtime, the metric can be instantiated
using the key defined in metrics.json. Gauges, Counters, and
Properties are instantiated using static methods on MetricGroup. Other
metric types are instantiated using static CreateAndRegister methods
on their associated classes.
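The lookup-by-key flow can be illustrated in Python as below. The real lookup happens in C++ against the generated thrift constant map; this sketch only mirrors the idea, the templated key is invented, and substitute() is a rough stand-in for strings::Substitute() that handles only $0 to $9 and no escaping.

```python
def build_metric_map(metric_defs):
    """Index metric definitions by their unique key."""
    by_key = {}
    for metric_def in metric_defs:
        key = metric_def["key"]
        if key in by_key:
            raise ValueError("duplicate metric key: %s" % key)
        by_key[key] = metric_def
    return by_key


def substitute(fmt, *args):
    """Replace $0, $1, ... in fmt with the given arguments."""
    for i, arg in enumerate(args):
        fmt = fmt.replace("$%d" % i, str(arg))
    return fmt


def instantiate(metric_map, key, *args):
    """Look up a definition and fill in its templated key and description."""
    metric_def = metric_map[key]  # raises KeyError if the definition is missing
    return (substitute(metric_def["key"], *args),
            substitute(metric_def["description"], *args))


# Hypothetical templated definition, for illustration only.
metric_map = build_metric_map([
    {"key": "rpc.$0.call_duration", "description": "Duration of $0 calls."},
])
name, description = instantiate(metric_map, "rpc.$0.call_duration", "Ping")
```

A missing definition surfaces as a KeyError at lookup time, which is the situation the TODO about missing metric definitions refers to.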
TODO: Generate a thrift enum used to lookup metric defs.
TODO: Consolidate the instantiation of metrics that are created
outside of metrics.h (i.e. collection metrics, memory metrics).
TODO: Need a better way to verify if metric definitions are missing.
Change-Id: Iba7f94144d0c34f273c502ce6b9a2130ea8fedaa
Reviewed-on: http://gerrit.cloudera.org:8080/330
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins