This fixes a few impala-shell Python 3 issues:
1. In ImpalaShell's do_history(), the decode() call needs to be
avoided in Python 3, because in Python 3 the cmd is already
a string and doesn't need further decoding. (IMPALA-11315)
2. TestImpalaShell.test_http_socket_timeout() gets a different
error message in Python 3. It throws the "BlockingIOError"
rather than "socker.error". (IMPALA-11316)
3. ImpalaHttpClient.py's code to retrieve the body when
handling an HTTP error needs to have a decode() call
for the body. Otherwise, the body remains bytes and
causes TestImpalaShellInteractive.test_http_interactions_extra()
to fail. (IMPALA-11317)
Testing:
- Ran shell tests in the standard way
- Ran shell tests with the impala-shell executable coming from
a Python 3 virtualenv using the PyPi package
Change-Id: Ie58380a17d7e011f4ce96b27d34717509a0b80a6
Reviewed-on: http://gerrit.cloudera.org:8080/18556
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
impala-shell fail with TypeError when installed with python3. This is
due to behavior change of division operator ('/') between python2 vs
python3. This patch fix the issue by changing the operator with floor
division ('//') that result in integer type as described in
https://peps.python.org/pep-0238/.
Testing:
- Manually install impala-shell with from pip with python3 and verify
the fix works.
Change-Id: Ifbe4df6a7a4136e590f383fc6475e2283e35eadc
Reviewed-on: http://gerrit.cloudera.org:8080/18546
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch ports the implementation of GSSAPI authentication over http
transport from Impyla (https://github.com/cloudera/impyla/pull/415) to
impala-shell.
The implementation adds a new dependency on 'kerberos' python module,
which is a pip-installed module distributed under Apache License Version
2.
When using impala-shell with Kerberos over http, it is assumed that the
host has a preexisting kinit-cached Kerberos ticket that impala-shell
can pass to the server automatically without the user to reenter the
password.
Testing:
- Passed exhaustive tests.
- Tested manually on a real cluster with a full Kerberos setup.
Change-Id: Ia59ba4004490735162adbd468a00a962165c5abd
Reviewed-on: http://gerrit.cloudera.org:8080/18493
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
client
In 'hs2-http' mode, the socket timeout is None, which could cause
hang like symptoms in case of a problematic remote server.
Added support for configurable socket timeout using the new impala-shell
config option '--http_socket_timeout_s'. If a reasonable timeout is
set, impala-shell client can retry in case of connection issues, when
possible. The default value of '--http_socket_timeout_s' is set to None,
to prevent behavior changes for existing clients.
More details on socket timeout here:
https://docs.python.org/3/library/socket.html#socket-timeouts
Testing:
- Added tests for various timeout values in test_shell_commandline.py
- Ran e2e shell tests.
Change-Id: I29fa4ff96cdcf154c3aac7e43340af60d7d61e94
Reviewed-on: http://gerrit.cloudera.org:8080/18336
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
The get_summary() thrift call is not supported in strict_hs2 mode
on impala-shell. The live_progress and live_summary options are
disabled when the strict_hs2_protocol flag is set.
Change-Id: I6aee838a80b4659a13a0a0cb9eabffa2c8767c8f
Reviewed-on: http://gerrit.cloudera.org:8080/18177
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
The insert command was broken for impala-shell in the strict_hs2
mode. The return parameter for close_dml should return two parameters.
The parameters returned by close_dml are rows returned and error
rows. These are not supported by strict hs2 mode since the close
does not return the TDmlResult structure. So the message to
the end user also had to be changed.
Change-Id: Ibe837c99e54d68d1e27b97f0025e17faf0a2cb9f
Reviewed-on: http://gerrit.cloudera.org:8080/18176
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Impala-shell already uses HS2 protocol to connect to Impalad.
This commit allows impala-shell to connect to any server (for
example, Hive) using the hs2 protocol. This will be done via
the "--strict_hs2_protocol" option.
When the "--strict_hs2_protocol" option is turned on, only features
supported by hs2 will work. For instance, "runtime-profile" is an
impalad specific feature and will be disabled.
The "--strict_hs2_protocol" will only work on servers that abide
by the strict definition of what is supported by HS2. So one will
be able to connect to Hive in this mode, but connections to Impala
will not work. Any feature supported by Hive (e.g. kerberos
authentication) should work as well.
Note: While authentication should work, the test framework is not
set up to create an HS2 server that does authentication at this point
so this feature should be used with caution.
Change-Id: I674a45640a4a7b3c9a577830dbc7b16a89865a9e
Reviewed-on: http://gerrit.cloudera.org:8080/17660
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-10234 added support for cookie authentication for LDAP to
impala-shell. But it does not accept user input cookie name via
startup flags, and it retains only one cookie.
In some scenarios, we could use proxy to manage the sessions with
additional HTTP cookies added by proxy.
This patch made cookie support more generic for impala-shell.
It lets the user specify cookie names via a startup flag
"--http_cookie_names" and could retain more than one cookies.
Testing:
- Manualy tested the multiple cookies in HTTP headers with a
customized Impala server which could send and receive multiple
cookies.
- Passed core test, including new test cases.
Change-Id: I193422d5ec891886a522d82ecb0e9d974132ff2a
Reviewed-on: http://gerrit.cloudera.org:8080/17667
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in the toolchain.
Most of the changes are related to replacing boost:: with std::
shared_ptr-s in cpp code (this is a continuation of patch by Sahil).
The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.
The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.
Notable side effects:
- old logic to compile thrift for impala-shell with 0.11.0 was removed
- impala_shell's utf8 handling had to be updated as the new 0.11.0
compilation happens with no_utf8strings. This also made things a
bit faster, e.g the following is ~0.22s instead of ~0.25
shell/impala_shell.py \
-B -q "select * from functional_parquet.alltypes;" > /dev/null
- THRIFT-3921 changed the stream operators to print an enum's name
instead of its number, leading to slightly different messages
in some cases.
- "templates" was added to the thift generator's parameters to avoid
a compilation issue (related to IMPALA-10600). I didn't notice any
change in compilation time. This option generated .tcc files with
templetized readers/writers for Thrift types. Currently we don't
use these, but they could potentially speed up (de)serialization.
Testing:
- ran Impyla's test suite with Python 2 and 3
- ran core tests
Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Reviewed-on: http://gerrit.cloudera.org:8080/17170
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In Python2, print() converts all non-keyword arguments to strings like
str() does and writes them to the stream. str() on QueryStateException
returns its value(i.e. error message) which could be in unicode type.
Python2 will implicitly encode it to str type using the default
encoding, 'ascii'. This could result in UnicodeEncodeError when there
are non-ascii characters in the error message.
This patch explicitly encodes the error message using 'utf-8' encoding
if it's in unicode type and the shell is run in Python2.
Tests:
- Add test in test_shell_interactive.py
Change-Id: Ie10f5b03ecc5877053c2fbada1afaf256b423a71
Reviewed-on: http://gerrit.cloudera.org:8080/17099
Reviewed-by: Tamas Mate <tmate@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
To make impala-shell compatible for Python3, we explicitly distinguish
bytes and text in Python2 by decoding the bytes for all inputs.
Regression 1: multiple queries in one line with unicode chars will break
In precmd() of impala-shell, if there are multiple queries present in
one input line, we split it into individual queries (by
sqlparse.split()) and append them back to the 'cmdqueue'. They will be
passed to precmd() again. In our Python2 implementation, precmd()
expects them to be str type, and will decode them into unicode type.
However, the output type of sqlparse.split() is unicode which doesn't
have a decode() method. Calling decode() on a unicode var will let
Python2 implicitly encode it to str. This may cause UnicodeEncodeError
since implicitly encoding use 'ascii'.
Regression 2: multi-line query with unicode chars will break when
command history is enabled
In _check_for_command_completion(), when calling
readline.replace_history_item in Python2. We encode the completed_cmd
into bytes. However, we shouldn't replace it since the return type is
expected to be unicode.
Tests:
- Add tests for these two regressions in Python2.
Change-Id: Icc4a8d31311a5c59e5fc0e65fe09f770df41bea4
Reviewed-on: http://gerrit.cloudera.org:8080/16960
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-8584 added support for cookie authentication to Impala.
This change adds cookie authentication support to impala-shell
as well when using 'hs2-http' protocol.
Testing:
- Unit tests were added to test cookie handling methods.
- Tested e2e manually with nginx HTTP proxy.
TODO:
- Test with Knox HTTP proxy as well.
Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
Reviewed-on: http://gerrit.cloudera.org:8080/16660
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When the --quiet flag is used with impala-shell, the intention is that
if the query is successful then only the query results should be
printed.
This patch fixes two cases where --quiet was not being respected:
- When using the HTTP transport and --client_connect_timeout_ms is
set, a warning is printed that the timeout is not applied.
- When running in non-interactive mode, a warning is printed that
--live_progress is automatically disabled. This warning is now also
only printed if --live_progress is actually set.
Testing:
- Added a test that runs a simple query with --quiet and confirms the
output is as expected.
Change-Id: I1e94c9445ffba159725bacd6f6bc36f7c91b88fe
Reviewed-on: http://gerrit.cloudera.org:8080/16673
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch extends the 'summary' command of impala-shell to support
retrieving the summary of the original query attempt. The new syntax is
SUMMARY [ALL | LATEST | ORIGINAL]
If 'ALL' is specified, both the latest and original summaries are
printed. If 'LATEST' is specified, only the summary of the latest query
attempt is printed. If 'ORIGINAL' is specified, only the summary of the
original query attempt is printed. The default option is 'LATEST'.
Support for this has only been added to HS2 given that Beeswax is being
deprecated soon.
Tests:
- Add new tests in test_shell_interactive.py
Change-Id: I8605dd0eb2d3a2f64f154afb6c2fd34251c1fec2
Reviewed-on: http://gerrit.cloudera.org:8080/16502
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch introduces a new startup flag
--ping_expose_webserver_url (true by default) to control whether
PingImpalaService, PingImpalaHS2Service RPC calls should expose
the debug web url to the client or not.
This is necessary as the debug web UI is not something that
end-users will necessarily have access to.
If the flag is set to false, the RPC calls will return an empty
string instead of the real url signalling that the debug web ui
is not available.
Note that if the webserver is disabled (--enable_webserver flag
is set to false) the RPC calls will behave the same and return an
empty string for the url.
Change-Id: I7ec3e92764d712b8fee63c1f45b038c31c184cfc
Reviewed-on: http://gerrit.cloudera.org:8080/16573
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The beeswax interface has been marked deprecated for awhile, but it
remains the default protocol for impala-shell because we felt that
changing the default protocol constituted a break change. Now that 4.0
is the next release, we can switch the default protocol to hs2.
This patch also adds a few more deprecation warnings around beeswax.
Change-Id: I65eb14ec03c1e1ef26782554aedd6670bbeedfe8
Reviewed-on: http://gerrit.cloudera.org:8080/16327
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When a query contains WITH clause impala-shell tries to identify whether
it is a DML query or not, so that later it can provide appropriate
result messages. Earlier shlex was used to create tokens and assess the
query type based on that. However shlex can misinterpret some query
strings where whitespace charachters are mixed with quotes, because it
splits the string based on whitespace charachters. In some scenarios
'ValueError: No closing quotation' error can occur.
This change moves the tokenization from shlex to sqlparse.
Testing:
- Added unit test to cover queries that contain mixed whitespaces
and strings
Change-Id: I442d3bc65b90a55c73c847948d5179a8586d71ad
Reviewed-on: http://gerrit.cloudera.org:8080/16389
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, the impala-shell 'profile' command only returns the profile
for the most recent profile attempt. There is no way to get the original
query profile (the profile of the first query attempt that failed) from
the impala-shell.
This patch modifies TGetRuntimeProfileReq and TGetRuntimeProfileResp to
add support for returning both the original and retried profiles for a
retried query. When a query is retried, TGetRuntimeProfileResp currently
contains the profile for the most recent query attempt.
TGetRuntimeProfileReq has a new field called 'include_query_attempts'
and when it is set to true, the TGetRuntimeProfileResp will include all
failed profiles in a new field called failed_profiles /
failed_thrift_profiles.
impala-shell has been modified so the 'profile' command has a new set of
options. The syntax is now:
PROFILE [ALL | LATEST | ORIGINAL]
If 'ALL' is specified, both the latest and original profiles are
printed. If 'LATEST' is specified, only the latest profile is printed.
If 'ORIGINAL' is printed, only the original profile is printed. The
default behavior is equivalent to specifying 'LATEST' (which is the
current behavior before this patch as well).
Support for this has only been added to HS2 given that Beeswax is being
deprecated soon. The new 'profile' options have no affect when the
Beeswax protocol is used.
Most of the code change is in impala-hs2-server and impala-server; a lot
of the GetRuntimeProfile code has been re-factored.
Testing:
* Added new impala-shell tests
* Ran core tests
Change-Id: I89cee02947b311e7bf9c7274f47dfc7214c1bb65
Reviewed-on: http://gerrit.cloudera.org:8080/16406
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
While the ds_hll_sketch() generates a string value as output the data
is not an ascii encoded text but a bitsketch, because of this, when
the shell get this data it disconnect while it tries to decode it.
The issue can be reproduced with a simple method like using unhex
with a wrong input.
Example: SELECT unhex("aa");
This patch contains a solution, where we replace any not UTF-8
decodable characters if we run into an UnicodeDecodeError after
fetching it.
This solution is working with the Thrift 0.9.3 autogenerated gen-py
but still fails with Thrift 0.11.0.
For Thrift 0.11.0 the error is catched and an error message is sent
(not working with beeswax protocol, because it generates a different
error (TypeError) which can come for other reasons too).
Testing:
-manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax'
Change-Id: I0c5f1290356e21aed8ca7f896f953541942aed05
Reviewed-on: http://gerrit.cloudera.org:8080/16418
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
While the ds_hll_sketch() generates a string value as output the data
is not an ascii encoded text but a bitsketch, because of this, when
the shell get this data it disconnect while it tries to decode it.
The issue can be reproduced with a simple method like using unhex
with a wrong input.
Example: SELECT unhex("aa");
This patch contains a solution, where we replace any not UTF-8
decodable characters if we run into an UnicodeDecodeError after
fetching it.
This solution is working with the Thrift 0.9.3 autogenerated gen-py
but still fails with Thrift 0.11.0.
For Thrift 0.11.0 the error is catched and an error message is sent
(not working with beeswax protocol, because it generates a different
error (TypeError) which can come for other reasons too).
Testing:
-manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax'
Change-Id: Ic5cfb907871ca83e5f04a39ca9d7a8e138d711a8
Reviewed-on: http://gerrit.cloudera.org:8080/16305
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
The Impala shell stops fetching rows if it receives a batch that
contains 0 rows. This is incorrect because a batch with 0 rows can be
returned if the fetch request hits a timeout. Instead, the shell should
rely on the value of has_rows / hasMoreRows to determine when to stop
issuing fetch requests.
Tests:
* Added a regression test to test_shell_commandline.py
* Ran all shell tests
Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88
Reviewed-on: http://gerrit.cloudera.org:8080/16222
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala-shell periodically calls GetExecSummary() when the query is
queuing or running. If the query is being retried, GetExecSummary()
should return the TExecSummary of the retried query. So the progress bar
and live_summary can reflect the most recent state.
This patch also modifies get_summary() to return retry information in
error_logs of TExecSummary. Impala-shell and other clients can print the
info right after the query starts being retried. Modified impala-shell
to print the retried query link when the retried query is running.
Example output when the retried query is running:
Query: select count(*) from functional.alltypes where bool_col = sleep(60)
Query submitted at: 2020-06-18 22:08:49 (Coordinator: http://quanlong-OptiPlex-BJ:25000)
Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=9444fe7f0df0da28:29134b0800000000
Failed due to unreachable impalad(s): quanlong-OptiPlex-BJ:22001
Retrying query using query id: 5748d9a3ccc28ba8:a75e2fab00000000
Retried query link: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=5748d9a3ccc28ba8:a75e2fab00000000
[############################### ] 50%
Tests:
- Manually verify the progress bar and live_summary work when the query
is being retried.
- Add tests in test_query_retries.py to validate the get_summary()
results.
Change-Id: I8f96919f00e0b64d589efd15b6b5ec82fb725d56
Reviewed-on: http://gerrit.cloudera.org:8080/16096
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Beeswax clients use get_log() to retrieve the warning/error message
after the query finishes. HS2 clients use GetLog() for the same purpose.
This patch adds the retry information into the returned result if the
query is retried. So clients that print the log can show the original
query failure and the retried query id.
This patch also modifies impala-shell to extract the retried query id
and print the retried query link.
Here's an example of the impala-shell output:
Query: select count(*) from functional.alltypes where bool_col = sleep(60)
Query submitted at: 2020-06-18 21:23:52 (Coordinator: http://quanlong-OptiPlex-BJ:25000)
Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=7944ffee4d81cdd4:e7f9357a00000000
+----------+
| count(*) |
+----------+
| 3650 |
+----------+
WARNINGS: Original query failed:
Failed due to unreachable impalad(s): quanlong-OptiPlex-BJ:22001
Query has been retried using query id: 934b2734f67a1161:a0dbd60200000000
Retried query link: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=934b2734f67a1161:a0dbd60200000000
Tests:
- Add tests in test_query_retries.py to verify client logs returned
from GetLog().
- Run test_query_retries.py.
- Manually run queries in impala-shell and kill impalads. Verify
printed messages when the retried queries succeed or fail.
Change-Id: I58cf94f91a0b92eb9a3088bee3894ac157a954dc
Reviewed-on: http://gerrit.cloudera.org:8080/16093
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds the option --fetch_size to the Impala shell. This new option allows
users to specify the fetch size used when issuing fetch RPCs to the
Impala Coordinator (e.g. TFetchResultsReq and BeeswaxService.fetch).
This parameter applies for all client protocols: beeswax, hs2, hs2-http.
The default --fetch_size is set to 10240 (10x the default batch size).
The new --fetch_size parameter is most effective when result spooling is
enabled. When result spooling is disabled, Impala can only return a
single row batch per fetch RPC (so 1024 rows by default). When result
spooling is enabled, Impala can return up to 100 row batches per fetch
request.
Removes some logic in the the impala_client.py file that attempts to
simulate a fetch_size. The code would issue multiple fetch requests to
fullfill the given fetch_size. This logic is no longer needed now that
result spooling is available.
Testing:
* Ran core tests
* Added new tests in test_shell_client.py and test_shell_commandline.py
Change-Id: I8dc7962aada6b38795241d067a99bd94fabca57b
Reviewed-on: http://gerrit.cloudera.org:8080/16041
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Sahil Takiar <stakiar@cloudera.com>
A minor syntax error slipped past in a recent patch. In python3, the
syntax for catching exceptions requires the 'as' keyword. This error
was missed in code review.
Until automated python3 testing set up, this kind of error is likely
to repeat. See IMPALA-9724.
Change-Id: I0d36c609a3600c8084efcce0026537227144b27d
Reviewed-on: http://gerrit.cloudera.org:8080/15856
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: David Knupp <dknupp@cloudera.com>
This change adds a new condition to avoid re-reading the impala-shell
history when the cmdloop is broken. The loop can break due to exceptions
such as KeyboardInterrupt.
Testing:
- The change was tested manually on local dev env
- Added a new EE shell test to verify the history after SIGINT
Change-Id: If4faf46134f44d91e56748642f47d448707db53c
Reviewed-on: http://gerrit.cloudera.org:8080/15345
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
LdapImpalaShellTest was broken by IMPALA-3343 which changed the
wording of the startup message printed by impala-shell. This patch
fixes the test.
Change-Id: I9e7070fa6da4085ea858f64141529a7a23a86534
Reviewed-on: http://gerrit.cloudera.org:8080/15777
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the main patch for making the the impala-shell cross-compatible with
python 2 and python 3. The goal is wind up with a version of the shell that will
pass python e2e tests irrepsective of the version of python used to launch the
shell, under the assumption that the test framework itself will continue to run
with python 2.7.x for the time being.
Notable changes for reviewers to consider:
- With regard to validating the patch, my assumption is that simply passing
the existing set of e2e shell tests is sufficient to confirm that the shell
is functioning properly. No new tests were added.
- A new pytest command line option was added in conftest.py to enable a user
to specify a path to an alternate impala-shell executable to test. It's
possible to use this to point to an instance of the impala-shell that was
installed as a standalone python package in a separate virtualenv.
Example usage:
USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py
The target virtualenv may be based on either python3 or python2. However,
this has no effect on the version of python used to run the test framework,
which remains tied to python 2.7.x for the foreseeable future.
- The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python
environment independenty from bin/set-pythonpath.sh. The default version
of thrift is thrift-0.11.0 (See IMPALA-9489).
- The wording of the header changed a bit to include the python version
used to run the shell.
Starting Impala Shell with no authentication using Python 3.7.5
Opened TCP connection to localhost:21000
...
OR
Starting Impala Shell with LDAP-based authentication using Python 2.7.12
Opened TCP connection to localhost:21000
...
- By far, the biggest hassle has been juggling str versus unicode versus
bytes data types. Python 2.x was fairly loose and inconsistent in
how it dealt with strings. As a quick demo of what I mean:
Python 2.7.12 (default, Nov 12 2018, 14:36:49)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d = 'like a duck'
>>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
True
...and yet there are weird unexpected gotchas.
>>> d.decode('utf-8') == d.encode('utf-8')
True
>>> d.encode('utf-8') == bytearray(d, 'utf-8')
True
>>> d.decode('utf-8') == bytearray(d, 'utf-8') # fails the eq property?
False
As a result, this was inconsistency was reflected in the way we handled
strings in the impala-shell code, but things still just worked.
In python3, there's a much clearer distinction between strings and bytes, and
as such, much tighter type consistency is expected by standard libs like
subprocess, re, sqlparse, prettytable, etc., which are used throughout the
shell. Even simple calls that worked in python 2.x:
>>> import re
>>> re.findall('foo', b'foobar')
['foo']
...can throw exceptions in python 3.x:
>>> import re
>>> re.findall('foo', b'foobar')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object
Exceptions like this resulted in a many, if not most shell tests failing
under python 3.
What ultimately seemed like a better approach was to try to weed out as many
existing spurious str.encode() and str.decode() calls as I could, and try to
implement what is has colloquially been called a "unicode sandwich" -- namely,
"bytes on the outside, unicode on the inside, encode/decode at the edges."
The primary spot in the shell where we call decode() now is when sanitising
input...
args = self.sanitise_input(args.decode('utf-8'))
...and also whenever a library like re required it. Similarly, str.encode()
is primarily used where a library like readline or csv requires is.
- PYTHONIOENCODING needs to be set to utf-8 to override the default setting for
python 2. Without this, piping or redirecting stdout results in unicode errors.
- from __future__ import unicode_literals was added throughout
Testing:
To test the changes, I ran the e2e shell tests the way we always do (against
the normal build tarball), and then I set up a python 3 virtual env with the
shell installed as a package, and manually ran the tests against that.
No effort has been made at this point to come up with a way to integrate
testing of the shell in a python3 environment into our automated test
processes.
Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615
Reviewed-on: http://gerrit.cloudera.org:8080/15524
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Upgrades the impala-shell's bundled version of sqlparse to 0.3.1.
There were some API changes in 0.2.0+ that required a re-write of
the StripLeadingCommentFilter in impala_shell.py. A slight perf
optimization was also added to avoid using the filter altogether
if no leading comment is readily discernible.
As 0.1.19 was the last version of sqlparse to support python 2.6,
this patch also breaks Impala's compatibility with python 2.6.
No new tests were added, but all existing tests passed without
modification.
Change-Id: I77a1fd5ae311634a18ee04b8c389d8a3f3a6e001
Reviewed-on: http://gerrit.cloudera.org:8080/15642
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Added retries for idempotent rpcs:
OpenSession, PingImpalaHS2Service, GetResultSetMetadata,
CloseImpalaOperation (non dmls), CancelOperation, GetOperationStatus,
GetRuntimeProfile, GetExecSummary, GetLog
Retries were also added to the 'set all' query execution and subsequent
result fetch in the ImpalaHS2Client._open_session()
The retries are only supported for hs2-http protocol and enabled by
default. At most there are 3 retries for a failed rpc. There is a sleep
duration of 'n' seconds after nth retry.
Only failed rpcs due to an error in the http transport are retried and
if an rpc failed because the server returned an error in the rpc
response then such scenarios are not retriable.
Improved error diagnostics by dumping stack trace when ImpalaShell.
_execute_stmt() gets an 'Unknown Exception'.
Testing:
- Added a custom_cluster test which injects fault into the http transport
and checks expected behavior from the various rpcs. Some of these tests
leave the session in an open state and so these tests are not suitable
for the e2e test framework which have metric verifiers expecting related
metrics to be 0 at the end of the test.
- Manually tested real world scenarios with impala-shell client
communicating with an impala coordinator via a fault injecting istio mesh.
- Manually tested dropping connections on an nginx ingress gateway by sending
SIGTERM to all worker processes.
Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5
Reviewed-on: http://gerrit.cloudera.org:8080/15378
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A few built-ins were changed in python 3 -- e.g., xrange became range,
ConfigParser became configparser, etc. We can redefine some of those
things in a single place, and import them from there as needed. Other
items may also be added as we go along.
Change-Id: Ibd3d86df524666a98cbfa463756adac48bd1f8a3
Reviewed-on: http://gerrit.cloudera.org:8080/15514
Reviewed-by: David Knupp <dknupp@cloudera.com>
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In an effort to keep the work of reviewing the changes more manageable
with regard to making the impala-shell python3 compatible, I'm trying
to break the patches up into smaller chunks.
The first patch is the easiest one -- simply addressing the handful of
syntax issues that aren't python 3 compatible, namely changing the
print statements to function calls, changing the way we catch exceptions,
and adding a few simple branches to work around the removal of such
things as dict.iteritems().
We needed the print function imported from __future__ because it allows
us to pass in a file descriptor, e.g., sys.stderr.
Notably, there's nothing in this patch related to string/bytes/unicode
changes from python 2 to 3.
Change-Id: I9a515da01ef03d5936cb1a4d9e4bc6d105386b1d
Reviewed-on: http://gerrit.cloudera.org:8080/15487
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The 'Expect: 100-continue' http header allows http clients to send
only the headers for their request, get a confirmation back from the
server that the headers are valid, and only then send the body of the
request, avoiding the overhead of sending large requests that will
ultimately fail.
This patch adds support for this in the HS2 HTTP server by having
THttpServer look for the header, and if it's present and the request
is validated returning a '100 Continue' response before reading the
body of the request.
It also adds supports for using this header on large requests sent by
impala-shell.
Testing:
- This case is covered by the existing test_large_sql, however that
test was previously broken and passing spuriously. This patch fixes
the test.
- Passed all other shell tests.
Change-Id: I4153968551acd58b25c7923c2ebf75ee29a7e76b
Reviewed-on: http://gerrit.cloudera.org:8080/15284
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
In order to improve usability, this patch makes Impala shell show query
processing status while the query is running. The patch enables shell
option live_progress by default when a user launches impala shell in
the interactive mode. The patch also adds a new command line flag
"--disable_live_progress", which allows a user to disable live_progress
at runtime. In the interactive mode, a user can disable live_progress
by either using the command line flag or setting the option as False in
the config file. As for in the non-interactive mode (when the -q or -f
options are used), live reporting is not supported. Impala-shell will
disable live_progress if the mode is detected.
Testing:
- Added and updated tests in test_shell_interactive.py and test_shell_commandline.py
- Successfully ran all shell related tests
Change-Id: I3765b775f663fa227e59728acffe4d5ea9a5e2d3
Reviewed-on: http://gerrit.cloudera.org:8080/15219
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Modified the '_signal_handler()' in impala-shell.py so when a user
cancels a multiline query by hitting CTRL+C it will cancel the
query, instead of just the current line.
Testing:
-Added 'test_cancellation_mid_command()' to test_shell_interactive.py
to test if it really cancels the partial commands.
-Manually tested by giving partial commands then cancelling them.
Change-Id: Id8d8bdaee929e2655eb66e886ae92a02d3fbd83f
Reviewed-on: http://gerrit.cloudera.org:8080/15233
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for live_summary and live_progress in impalarc.
Testing:
1) Added unit-test cases in test_shell_commandline.py and
test_shell_interactive.py for live_summary and live_progress.
2) Successfully ran all other tests in test_shell_interactive.py and
test_shell_commandline.py
Change-Id: If4549b775a7966ad89d661d0349cc78754e13a86
Reviewed-on: http://gerrit.cloudera.org:8080/14927
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When mt_dop > 0, the summary is reporting the number of fragment
instances, instead of the number of hosts as the header would
imply.
This commit fixes the issue so the number of hosts will be shown
under the #Hosts column. The commit also adds an #Inst column
where the number of instances are shown (current behaviour).
Tests:
* Changed profile tests with mt_dop > 0.
* Updated benchmark tests and shell tests accordingly.
Change-Id: I3bdf9a06d9bd842b2397cd16c28294b6bec7af69
Reviewed-on: http://gerrit.cloudera.org:8080/14715
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When the impala-shell is disconnected, it will try to reconnect
for any command that a user runs (as part of ImpalaShell's precmd()).
This doesn't make sense when the user is trying to quit the shell
(i.e. by typing 'quit' or 'exit' or hitting Ctrl-D).
This skips the attempt to reconnect when quitting the shell.
Testing:
- Added test in test_shell_interactive.py
- Verified by hand
Change-Id: I6a76bc515db609498fa8772e9f0b0c547b82c09e
Reviewed-on: http://gerrit.cloudera.org:8080/14391
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds impala-shell support to connect to HiveServer2 HTTP endpoint.
Relies on toolchain change at https://gerrit.cloudera.org/#/c/13725/.
Use --protocol='hs2-http' to enable this behavior.
Example usages:
---------------
impala-shell --protocol='hs2-http' (No auth)
impala-shell --protocol='hs2-http' --ldap -u..... (PLAIN auth)
impala-shell --protocol-'hs2-http' --ssl --ca_cert... (TLS)
impala-shell --protocol='hs2-http' --ldap --ssl --ca_cert... (LDAP +
TLS)
Limitations:
-----------
- Does not support Kerberos (-k) due to lack ot SPNEGO support.
Testing:
--------
- Parameterized existing shell tests to support this combination.
- Added shell test coverage for LDAP auth.
Change-Id: I8323950857dfe1c1dfd5377fde79f87bc2ce9534
Reviewed-on: http://gerrit.cloudera.org:8080/13746
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Problem:
When assign --output_delimiter to invalid value, the validation of
the argument is done only after the query is running, ValueError is
raised in DelimitedOutputFormatter and caught in _exec_stmt in shell
Solution:
Add --output_delimiter option check before impala-shell initialization
Remove delimiter length check in DelimitedOutputFormatter
Testing:
tests/shell/test_shell_commandline.py passed
Example:
$ impala-shell.sh -B --output_delimiter '||' -q 'select 1,1,1'
Illegal delimiter ||, the delimiter must be a 1-character string.
Change-Id: I7ee2fccd305b104b3aff44c57659b6f14f2f4a05
Reviewed-on: http://gerrit.cloudera.org:8080/13690
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HS2 is added as an option via --protocol=hs2. The user-visible
differences in behaviour are minimal. Beeswax is still the
default and can be explicitly enabled via --protocol=beeswax
but will be deprecated. The default is unchanged because
changing the default could break certain workflows, e.g.
those that explicitly specify the port with -i or deployments
that hit --fe_service_threads for HS2 and somehow rely on
impala-shell not contributing to that limit. For most
workflows the change is transparent and we should change
the default in a major version change.
This support requires Impala-specific extensions to
the HS2 interface, similar to the existing extensions
to Beeswax. Thus the HS2 shell is only
forwards-compatible with newer Impala versions.
I considered trying to gracefully degrade when the
new extensions weren't present, but it didn't seem to be
worth the ongoing testing effort.
Differences between HS2 and Beeswax are abstracted into
ImpalaClient subclasses.
Here are the changes required to make it work:
* Switch to TBinaryProtocolAccelerated to avoid perf
regression. The HS2 protocol requires decoding
more primitive values (because its not a string-per-row),
which was slow with the pure python implementation of
TBinaryProtocol.
* Added bitarray module to efficiently unpack null indicators
* Minimise invasiveness of changes by transposing and stringifying
the columnar results into rows in impala_client.py. The transposition
needs to happen before display anyway.
* Add PingImpalaHS2Service() to get back version string and webserver
address.
* Add CloseImpalaOperation() extension to return DML row counts. This
possibly addresses IMPALA-1789, although we need to confirm that
this is a sufficient solution.
* Add is_closed member to query handles to avoid shell independently
tracking whether the query handle was closed or not.
* Include query status in HS2 log to match beeswax.
* HS2 GetLog() command now includes query status error message for
consistency with beeswax.
* "set"/"set all" uses the client requests options, not the session
default. This captures the effective value of TIMEZONE, which
was previously missing. This also requires test changes where
the tests set non-default values, e.g. for ABORT_ON_ERROR.
* "set all" on the server side returns REMOVED query options - the
shell needs to know these so it can correctly ignore them.
* Clean up self.orig_cmd/self.last_leading comment argument
passing to avoid implicit parameter passing through multiple
function calls.
* Clean up argument handling in shell tests to consistently pass
around lists of arguments instead of strings that are subject
to shell tokenisation rules.
* Consistently close connections in the shell to avoid leaking
HS2 sessions. This is enforced by making ImpalaShell a context
manager and also eliminating all sys.exit() calls that would
bypass the explicit connection closing.
Testing:
* Shell tests can run with both protocols
* Add tests for formatting of all types and NULL values
* Added testing for floating point output formatting, which does
change as a result of switching to server-side vs client-side
formatting.
* Verified that newly-added tests were actually going through HS2
by disabling hs2 on the minicluster and running tests.
* Add checks to test_verify_metrics.py to ensure that no sessions
are left open at the end of tests.
Performance:
Baseline from beeswax shell for large extract is as follows:
$ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null
real 0m6.708s
user 0m5.132s
sys 0m0.204s
After this change it is somewhat slower, but we generally don't consider
bulk extract performance through the shell to be perf-critical:
real 0m7.625s
user 0m6.436s
sys 0m0.256s
Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12
Reviewed-on: http://gerrit.cloudera.org:8080/12884
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, impalarc files can be specified on a per-user basis
(stored in ~/.impalarc), and they aren't created by default. The
Impala shell should pick up /etc/impalarc as well, in addition
to the user-specific configurations.
The intent here is to allow a "global" configuration of the shell
by a system administrator. The default path of the global config
file can be changed by setting the $IMPALA_SHELL_GLOBAL_CONFIG_FILE
environment variable.
Note that the options set in the user config file take precedence
over those in the global config file.
Change-Id: I3a3179b6d9c9e3b2b01d6d3c5847cadb68782816
Reviewed-on: http://gerrit.cloudera.org:8080/13313
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When impala-shell is used to connect to an impala cluster with
--ssl_minimum_version=tlsv1.2, if the Python version being used is
< 2.7.9 the connection will fail due to a limitation of TSSLSocket.
See IMPALA-6990 for more details.
Currently, when this occurs, the error that gets printed is "EOF
occurred in violation of protocol", which is not very helpful. This
patch detect this situation and prints a more informative warning.
Testing:
- Updated test_tls_v12 so that instead of being skipped on affected
platforms, it runs and checks for the presence of the warning.
Change-Id: I3feddaccb9be3a15220ce9e59aa7ed41d41b8ab6
Reviewed-on: http://gerrit.cloudera.org:8080/13003
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch updates the file format in Impala shell config file to accept
both short and long flag names in addition to optparse's dest names
(variable names to store flag values) for better user experience because
dest names are internal to Impala shell.
Format:
[impala]
flag_name=flag_value
Example:
[impala]
; This is long flag.
query=select 1
; This is short flag.
Q=DEFAULT_FILE_FORMAT=parquet
; Flags can be repeated with ,
var=msg1=hello,var=msg2=world
; The old format using internal variable name is still supported for
; backward compatibility.
keyval=msg3=foo,keyval=msg4=bar
Testing:
- Ran all E2E shell tests on Python 2.6 and 2.7.
Change-Id: Ic43603c1b538af08fddcab1b2c1f6ad1af1a6cb9
Reviewed-on: http://gerrit.cloudera.org:8080/12823
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for list type flags in Impala shell config
file, i.e. those that use action="append", such as --var and
--query_option. To make it less error-prone, this patch also updates
the logic for bool flags in the config file to also look at the
correct type from the argument parser instead of relying on whether or
not the default values are set in impala_shell_config_defaults.py.
Testing:
- Added a new test for list type flags
- Ran all shell E2E tests
Change-Id: I824ca15b4e1064a391b13deef9cecd34c928ef73
Reviewed-on: http://gerrit.cloudera.org:8080/12781
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change fixes a regression introduced by "IMPALA-2195 Improper
handling of comments in queries."
The Impala Shell parses input text into several strings using the
sqlparse library. One of the returned strings is the sql command, this
is used to determine the correct do_<command> method to call. Another of
the returned strings is the leading comment, which is a comment that
appears before legal sql text.
Python2 has strings with multiple encodings. The strings returned from
the sqlparse library have the Unicode encoding. Impala Shell converts
the sql command string to utf-8 encoding before using it.
If the Impala Shell needs to send the sql command to an Impala
Coordinator then it (re)constructs the query out of the strings
returned by the sqlparse library. This query is sent to the Coordinator
via Beeswax protocol. The query is converted to an ascii string before
being sent. The conversion can fail if the leading comment string
contains Unicode characters, which can't be directly converted to ascii.
So the trigger for the bug is that the leading comment contains Unicode.
The fix is that the leading comment string should be converted to utf-8
in the same way as the sql command.
TESTING:
Ran all end -to-end tests.
Added two test cases to tests/shell/test_shell_interactive.py
Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Reviewed-on: http://gerrit.cloudera.org:8080/12812
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala shell defines a dictionary of default values for some shell
options. Before this patch, the logic for --config_file checks if
a shell option exists by using the default value dictionary, which
does not contain the exhaustive list of shell options. This causes
a valid option in the Impala shell config file to be treated as
unrecognizable shell option due to the option not having a default
value. The patch fixes the issue by changing the logic that checks
for the existence of an option using the option list from optparse.
The patch also fixes the missing dest parameter for ldap_password_cmd
option.
Testing:
- Updated test_shell_commandline::test_config_file
- Ran all shell tests
Change-Id: Iff371d038fa77ba659e9b7c7a4ed5b374237f2ea
Reviewed-on: http://gerrit.cloudera.org:8080/12245
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>