This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted those that need
an integer result (e.g. indices, counts of records) to use the integer
division '//' operator. Some code was also using relative imports and
needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
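For illustration, a minimal sketch of the header and of the two kinds of
division described above. The function names are hypothetical and are not
taken from the Impala code base:

```python
from __future__ import absolute_import, division, print_function


def average_row_size(total_bytes, num_rows):
    # With "division" imported, "/" is true division on Python 2 as well,
    # so this returns a float instead of a truncated integer.
    return total_bytes / num_rows


def midpoint_index(num_items):
    # An index must be an integer, so floor division is requested explicitly.
    return num_items // 2


print(average_row_size(7, 2))  # 3.5
print(midpoint_index(7))       # 3
```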
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
- Remove Fabric and Paramiko as requirements. They aren't needed by
anything in buildall.sh.
- Add a means to install into the impala-python virtual environment by hand.
impala-pip is fine for this.
- Add another requirements file for extended testing. The dependency
situation is messy and untangling that out of impala-python and into
lib/python should be out of the scope of IMPALA-7460.
- Update core tests, which cover real regressions that have happened in
the past, to run against locations that don't require a Paramiko
import. This moves some logic out of concurrent_select.py into a
thinner module.
- Insulate ssh_util from globally-scoped import so that it only imports
when needed.
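As a rough sketch of the deferred-import idea (not the actual ssh_util
code; both function names are hypothetical), the Paramiko import can live
inside the function that needs it, so importing the module itself never
requires Paramiko:

```python
def describe_local_run(cmd):
    # Hypothetical local code path: no SSH library is needed here.
    return "local: " + cmd


def connect(host):
    # Deferred import: paramiko is only required when a remote connection
    # is actually requested, not when this module is imported.
    import paramiko
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host)
    return client
```

Callers that only use the local path never trigger the import.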
Testing:
- This works in my development environment.
- This works in my downstream stress and query gen environments.
- This works when doing a full data load.
- Impala still builds on a variety of OSs.
Todo:
- A subsequent review will update the versions.
Change-Id: Ibf9010a0387b52c95b7bda5d1d4606eba1008b65
Reviewed-on: http://gerrit.cloudera.org:8080/11264
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given on the
website, or add it where it is missing.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
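For readers less familiar with Perl, the line-filtering step can be
sketched in Python as follows. This is an illustrative equivalent of the
one-liner above, not something that was run as part of this change; the
function name is hypothetical:

```python
import re

# Same pattern as the perl one-liner: case-insensitive "Copyright.*Cloudera".
COPYRIGHT_RE = re.compile(r"Copyright.*Cloudera", re.IGNORECASE)


def strip_cloudera_copyright(text):
    """Drop every line matching the copyright pattern and keep the rest."""
    kept = [line for line in text.splitlines(True)
            if not COPYRIGHT_RE.search(line)]
    return "".join(kept)


sample = "# Copyright 2012 Cloudera Inc.\n#\n# Licensed under the Apache License\n"
print(strip_cloudera_copyright(sample), end="")
```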
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
The major changes are:
1) Collect backtrace and fatal log on crash.
2) Poll memory usage. The data is only displayed at this time.
3) Support kerberos.
4) Add random queries.
5) Generate random and TPC-H nested data on a remote cluster. The
random data generator was converted to use MR for scaling.
6) Add a cluster abstraction to run data loading for #5 on a
remote or local cluster. This also moves and consolidates some
Cloudera Manager utilities that were in the stress test.
7) Clean up the wrappers around impyla. That stuff was getting
messy.
Change-Id: I4e4b72dbee1c867626a0b22291dd6462819e35d7
Reviewed-on: http://gerrit.cloudera.org:8080/1298
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This is general clean up in prep for use with the stress test.
Changes:
1) Failed commands and failure to connect now raise exceptions.
Previously run_cmd() was not guaranteed to do anything at all in
remote mode.
2) Fix the scope of 'hosts', which was defined at the class level but
modified by instance-level functions, so different instances could
clash with each other.
3) Replace uses of opaque *args and **kwargs with named args. The
generic forms should be avoided since they impair readability.
4) Stop trying to get the cluster hosts from an environment variable
unconditionally upon construction.
5) Remove the 'local' member variable. It's not needed, and allowing
'local' to be set to False when no 'hosts' are set makes no sense.
6) Simplify and remove unneeded methods and arguments.
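To illustrate the 'hosts' scoping problem from item 2, here is a minimal
sketch. The class names are hypothetical and do not match the real
classes; it only shows why a class-level list mutated by instance methods
causes instances to clash:

```python
class SharedHostsRunner(object):
    # Class-level list: one object shared by every instance.
    hosts = []

    def add_host(self, host):
        self.hosts.append(host)


class PerInstanceHostsRunner(object):
    def __init__(self, hosts=None):
        # Instance-level list: each runner owns its own copy.
        self.hosts = list(hosts or [])

    def add_host(self, host):
        self.hosts.append(host)


a, b = SharedHostsRunner(), SharedHostsRunner()
a.add_host("host-1")
print(b.hosts)  # ['host-1']

c, d = PerInstanceHostsRunner(), PerInstanceHostsRunner()
c.add_host("host-1")
print(d.hosts)  # []
```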
Change-Id: Id90bd3b640f2681bb7e82a5e6d5e49ed8c5a7b98
Reviewed-on: http://gerrit.cloudera.org:8080/514
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This change wraps fabric's task method in its 'hide' context manager. Status output for
running commands (i.e., which hosts are connected to, which command is running, etc.) is
muted. Error messages (connection errors, command failures) are NOT muted and are still
displayed.
Change-Id: Ibfbbb995ab6fe057faec9af8be90449654b21f8c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1155
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
The plugin runner uses fabric as the underlying mechanism for running remote commands on
cluster hosts. fabric in turn uses paramiko, which generates a lot of log spew. This
change sets paramiko's logging level to ERROR, eliminating excess logging. It also
constrains fabric's logging.
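A minimal sketch of the paramiko part, using only the standard logging
module. It assumes paramiko logs under the "paramiko" logger hierarchy
(e.g. "paramiko.transport"); the exact code in this change may differ:

```python
import logging

# Raise the threshold on the parent logger. Child loggers such as
# "paramiko.transport" inherit it unless they set their own level.
logging.getLogger("paramiko").setLevel(logging.ERROR)

transport_log = logging.getLogger("paramiko.transport")
print(transport_log.isEnabledFor(logging.INFO))   # False
print(transport_log.isEnabledFor(logging.ERROR))  # True
```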
Change-Id: I6229d64f95f9c1512cc01842c4a661e96e421086
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1064
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins