Needed for python 3 compatibility.
Note that we had to amend the make_shell_tarball.sh to account for the
fact that execfile has been removed from python 3.
Tested by running gerrit-verify-dryrun, and also confirmed I can connect
to a kerberized host.
$ <path_to>/impala-shell-4.0.0-SNAPSHOT/impala-shell -k --ssl -i host.redacted.com
Starting Impala Shell with Kerberos authentication using Python 2.7.12
Using service name 'impala'
SSL is enabled. Impala server certificates will NOT be verified (set --ca_cert to change)
No handlers could be found for logger "thrift.transport.sslcompat"
Opened TCP connection to host.redacted.com:21000
Connected to host.redacted.com:21000
Server version: impalad version 3.4.0-SNAPSHOT RELEASE (build d17bc21...)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v4.0.0-SNAPSHOT (7af6a8d) built on Tue May 5 10:39:12 PDT 2020)
To see more tips, run the TIP command.
***********************************************************************************
[host.redacted.com:21000] default>
Change-Id: Ibd02055d33e2da504eccd571f1f209ae2e5b7876
Reviewed-on: http://gerrit.cloudera.org:8080/15859
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Upgrades the impala-shell's bundled version of sqlparse to 0.3.1.
There were some API changes in 0.2.0+ that required a re-write of
the StripLeadingCommentFilter in impala_shell.py. A slight perf
optimization was also added to avoid using the filter altogether
if no leading comment is readily discernible.
As 0.1.19 was the last version of sqlparse to support python 2.6,
this patch also breaks Impala's compatibility with python 2.6.
No new tests were added, but all existing tests passed without
modification.
Change-Id: I77a1fd5ae311634a18ee04b8c389d8a3f3a6e001
Reviewed-on: http://gerrit.cloudera.org:8080/15642
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We've relied on a copied version of thrift_sasl.py, which needs
to be updated to be compatible with python 3, so taking this
opportunity to add the thrift_sasl 0.4.1 package to ext-py like
the other external python libs we use.
Change-Id: I7e66c728883ceb5b3e96bc5fd120d44ab81bbb75
Reviewed-on: http://gerrit.cloudera.org:8080/15513
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HS2 is added as an option via --protocol=hs2. The user-visible
differences in behaviour are minimal. Beeswax is still the
default and can be explicitly enabled via --protocol=beeswax
but will be deprecated. The default is unchanged because
changing the default could break certain workflows, e.g.
those that explicitly specify the port with -i or deployments
that hit --fe_service_threads for HS2 and somehow rely on
impala-shell not contributing to that limit. For most
workflows the change is transparent and we should change
the default in a major version change.
This support requires Impala-specific extensions to
the HS2 interface, similar to the existing extensions
to Beeswax. Thus the HS2 shell is only
forwards-compatible with newer Impala versions.
I considered trying to gracefully degrade when the
new extensions weren't present, but it didn't seem to be
worth the ongoing testing effort.
Differences between HS2 and Beeswax are abstracted into
ImpalaClient subclasses.
Here are the changes required to make it work:
* Switch to TBinaryProtocolAccelerated to avoid perf
regression. The HS2 protocol requires decoding
more primitive values (because its not a string-per-row),
which was slow with the pure python implementation of
TBinaryProtocol.
* Added bitarray module to efficiently unpack null indicators
* Minimise invasiveness of changes by transposing and stringifying
the columnar results into rows in impala_client.py. The transposition
needs to happen before display anyway.
* Add PingImpalaHS2Service() to get back version string and webserver
address.
* Add CloseImpalaOperation() extension to return DML row counts. This
possibly addresses IMPALA-1789, although we need to confirm that
this is a sufficient solution.
* Add is_closed member to query handles to avoid shell independently
tracking whether the query handle was closed or not.
* Include query status in HS2 log to match beeswax.
* HS2 GetLog() command now includes query status error message for
consistency with beeswax.
* "set"/"set all" uses the client requests options, not the session
default. This captures the effective value of TIMEZONE, which
was previously missing. This also requires test changes where
the tests set non-default values, e.g. for ABORT_ON_ERROR.
* "set all" on the server side returns REMOVED query options - the
shell needs to know these so it can correctly ignore them.
* Clean up self.orig_cmd/self.last_leading comment argument
passing to avoid implicit parameter passing through multiple
function calls.
* Clean up argument handling in shell tests to consistently pass
around lists of arguments instead of strings that are subject
to shell tokenisation rules.
* Consistently close connections in the shell to avoid leaking
HS2 sessions. This is enforced by making ImpalaShell a context
manager and also eliminating all sys.exit() calls that would
bypass the explicit connection closing.
Testing:
* Shell tests can run with both protocols
* Add tests for formatting of all types and NULL values
* Added testing for floating point output formatting, which does
change as a result of switching to server-side vs client-side
formatting.
* Verified that newly-added tests were actually going through HS2
by disabling hs2 on the minicluster and running tests.
* Add checks to test_verify_metrics.py to ensure that no sessions
are left open at the end of tests.
Performance:
Baseline from beeswax shell for large extract is as follows:
$ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null
real 0m6.708s
user 0m5.132s
sys 0m0.204s
After this change it is somewhat slower, but we generally don't consider
bulk extract performance through the shell to be perf-critical:
real 0m7.625s
user 0m6.436s
sys 0m0.256s
Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12
Reviewed-on: http://gerrit.cloudera.org:8080/12884
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes a bug in sqlparse where sqlparse incorrectly splits a
statement that has a new line inside double quotes. The bug in sqlparse
causes Impala shell to go to infinite loop when a statement contains a
new line inside double quotes.
The patch in sqlparse is based on the upstream fix at
https://github.com/andialbrecht/sqlparse/pull/396
Testing:
- Added new end-to-end shell tests
- Ran end-to-end shell tests
Change-Id: I9142f21a888189d351f00ce09baeba123bc0959b
Reviewed-on: http://gerrit.cloudera.org:8080/9195
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Upgrading sqlparse ended up trading one bug for another. The new bug is
not fixed upstream, I sent a patch. The problem is '\\' is not
considered a terminated string and we use this in the phrase "fields
escaped by '\\'" when creating tables.
Change-Id: Id57081f5a96e997afd3aa9b26dca23f627488fc3
Reviewed-on: http://gerrit.cloudera.org:8080/117
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The only thing this commit does is upgrade sqlparse. The upgrade was
done by downloading and extracting the tarball, nothing else (such as
patching). The older version of sqlparse would parse
SELECT
'
;
'
;
into two statements. Neither statement is complete due to the open quote
and this would cause an infinite loop. The bug is already fixed in the
newest version of sqlparse.
Change-Id: I7ce7c269769ae0cde3dc8ca386d0b0e11bea71c1
Reviewed-on: http://gerrit.cloudera.org:8080/102
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The shell uses an external module called sqlparse to strip the comments from a query file.
When sqlparse.format() is invoked, it runs several grouping functions on the
tokenized query text; some of these methods are very slow, and not needed for comment
removal. This change restricts sqlparse to only invoke the grouping function for removing
comments.
Change-Id: I3a067187667fcd3cd331156a325960a3de2db9c2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/944
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins