This patch implements 'wait_for_finished_timeout',
'wait_for_admission_control', and 'get_admission_result' for
ImpylaHS2Client.
This patch also changes the behavior of ImpylaHS2Connection to produce
several extra cursors aside from self.__cursor for 'execute' call that
supplies user argument and each 'execute_async' to make issuing multiple
concurrent queries possible. Note that each HS2 cursor opens its own HS2
Session. Therefore, this patch breaks the assumption that an
ImpylaHS2Connection is always under a single HS2 Session (see HIVE-11402
and HIVE-14247 on why concurrent query with shared HS2 Session is
problematic). However, they do share the same query options stored at
self.__query_options. In practice, most Impala tests do not care about
running concurrent queries under a single HS2 session but only require
them to use the same query options.
The following additions are new for both BeeswaxConnection and
ImpylaHS2Connection:
- Add method 'log_client' for convenience.
- Implement consistent query state mapping and checking through several
accessor methods.
- Add methods 'wait_for_impala_state' and 'wait_for_any_impala_state'
that use 'get_impala_exec_state' method internally.
- Add 'fetch_profile_after_close' parameter to 'close_query' method. If
True, 'close_query' will return the query profile after closing the
query.
- Add 'discard_results' parameter for 'fetch' method. This can save time
parsing results if the test does not care about the result.
Reuse existing op_handle_to_query_id and add new
session_handle_to_session_id to parse HS2
TOperationHandle.operationId.guid and TSessionHandle.sessionId.guid
respectively.
To show that ImpylaHS2Connection is on par with BeeswaxConnection, this
patch refactors test_admission_controller.py to test using HS2 protocol
by default. Test that does raw HS2 RPC (require capabilities from
HS2TestSuite) is separated out into a new TestAdmissionControllerRawHS2
class and stays using beeswax protocol by default. All calls to
copy.copy is replaced with copy.deepcopy for safety.
Testing:
- Pass exhaustive tests.
Change-Id: I9ac07732424c16338e060c9392100b54337f11b8
Reviewed-on: http://gerrit.cloudera.org:8080/22362
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
We've relied on a copied version of thrift_sasl.py, which needs
to be updated to be compatible with python 3, so taking this
opportunity to add the thrift_sasl 0.4.1 package to ext-py like
the other external python libs we use.
Change-Id: I7e66c728883ceb5b3e96bc5fd120d44ab81bbb75
Reviewed-on: http://gerrit.cloudera.org:8080/15513
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The shell uses Thrift's TSSLSocket to negotiate secure connections to
Impala. This socket uses a variable SSL_VERSION to determine which SSL
and TLS protocol versions it will connect to.
SSL_VERSION was hardcoded to be PROTOCOL_TLSv1, which only supports
TLSv1 servers and no other protocol version. Change the allowed version
to be PROTOCOL_SSLv23, which supports any TLS or SSL protocol. We rely
on the server not to allow SSLv2 or v3 connections.
Testing: Added a new custom cluster test to confirm that the shell can
connect to a TLSv1.2 cluster. Confirmed that the test is correctly
skipped on machines with an old version of OpenSSL that does not support
TLSv1.2.
Change-Id: I5487f82d110676b9c3c7a5305931da00c7f68ca0
Reviewed-on: http://gerrit.cloudera.org:8080/7675
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files_txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
The major changes are:
1) Collect backtrace and fatal log on crash.
2) Poll memory usage. The data is only displayed at this time.
3) Support kerberos.
4) Add random queries.
5) Generate random and TPC-H nested data on a remote cluster. The
random data generator was converted to use MR for scaling.
6) Add a cluster abstraction to run data loading for #5 on a
remote or local cluster. This also moves and consolidates some
Cloudera Manager utilities that were in the stress test.
7) Cleanup the wrappers around impyla. That stuff was getting
messy.
Change-Id: I4e4b72dbee1c867626a0b22291dd6462819e35d7
Reviewed-on: http://gerrit.cloudera.org:8080/1298
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Many python files had a hashbang and the executable bit set though
they were not intended to be run a standalone script. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.
Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
This is the first iteration of a kerberized development environment.
All the daemons start and use kerberos, with the sole exception of the
hive metastore. This is sufficient to test impala authentication.
When buildall.sh is run using '-kerberize', it will stop before
loading data or attempting to run tests.
Loading data into the cluster is known to not work at this time, the
root causes being that Beeline -> HiveServer2 -> MapReduce throws
errors, and Beeline -> HiveServer2 -> HBase has problems. These are
left for later work.
However, the impala daemons will happily authenticate using kerberos
both from clients (like the impala shell) and amongst each other.
This means that if you can get data into the mini-cluster, you could
query it.
Usage:
* Supply a '-kerberize' option to buildall.sh, or
* Supply a '-kerberize' option to create-test-configuration.sh, then
'run-all.sh -format', re-source impala-config.sh, and then start
impala daemons as usual. You must reformat the cluster because
kerberizing it will change all the ownership of all files in HDFS.
Notable changes:
* Added clean start/stop script for the llama-minikdc
* Creation of Kerberized HDFS - namenode and datanodes
* Kerberized HBase (and Zookeeper)
* Kerberized Hive (minus the MetaStore)
* Kerberized Impala
* Loading of data very nearly working
Still to go:
* Kerberize the MetaStore
* Get data loading working
* Run all tests
* The unknown unknowns
* Extensive testing
Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
Adds support for running all the Impala query tests against a secure cluster. This run
mode can be selected by adding a --use_kerberos flag to run-tests.py and pointing to the
correct (secure) Hive Metastore Service.