This patch adds the kerberos-1.3.1 Python module to shell/ext-py so that
the egg file of the Kerberos module is built and added to the impala-shell
tarball when running shell/make_shell_tarball.sh.
The Kerberos Python module is distributed under the Apache License, Version 2.0.
Its source distribution is available at:
https://pypi.org/project/kerberos/
Testing:
- Passed core run.
- Installed impala-shell from the impala-shell tarball on a dev box as a
standalone package. Verified that impala-shell could be run without
additional configuration.
- Installed impala-shell from the impala-shell tarball on a real cluster
with a full Kerberos setup. Verified that impala-shell could
connect to the Impala server with the options "-k --protocol=hs2-http".
Change-Id: Id34074cbe725ba2cf1407fcf59e00475cd417a6d
Reviewed-on: http://gerrit.cloudera.org:8080/18523
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A recent patch upgraded thrift_sasl from 0.4.2 to 0.4.3, which broke
the builds on CentOS 7.
The version of setuptools that Jenkins installs in the virtualenvs on
CentOS 7 can be lower than the version required by thrift_sasl 0.4.3.
Another issue is that the new string syntax for install_requires in
setup.py is not accepted on CentOS 7.
As a workaround, this patch removes the setuptools requirement from
thrift_sasl's setup.py. It also reverts the string syntax for
install_requires to the old form.
Testing:
- Ran core tests on impala-private-parameterized, which uses CentOS 7.
- Ran core tests on pre-review-test, which uses Ubuntu.
Change-Id: I2c256a8ec9a151bca8b3370bfce6ecebf060bad0
Reviewed-on: http://gerrit.cloudera.org:8080/17886
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We are going to publish impala-shell release 4.1.0a1 on PyPI.
This patch upgrades the following three Python libraries, which are used
to generate egg files when building the impala-shell tarball:
- Upgrade bitarray from 1.2.1 to 2.3.0
- Upgrade prettytable from 0.7.1 to 0.7.2
- Upgrade thrift_sasl from 0.4.2 to 0.4.3
It also updates shell/packaging/requirements.txt with the new versions
of the dependent Python libraries.
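The "verified the versions ... as expected" step in the testing notes can be automated. The sketch below is an illustration only, assuming Python 3.8+ (for importlib.metadata, which the Python 2.7 environment used here lacks) and that the pins take the simple "name==version" form; the function names are hypothetical and the real layout of requirements.txt is not shown in this log.

```python
from importlib import metadata


def parse_pins(text):
    """Return {name: version} for every 'name==version' line in text."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if "==" not in line:
            continue  # unpinned or range requirements are ignored here
        name, version = (part.strip() for part in line.split("==", 1))
        pins[name] = version
    return pins


def find_mismatches(pins, get_version=metadata.version):
    """Return {name: (expected, installed_or_None)} for each unmet pin."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

For example, running find_mismatches(parse_pins(...)) on the contents of shell/packaging/requirements.txt would list every pin the current environment does not satisfy; an empty dict means the installed versions match.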
Testing:
- Ran core tests.
- Built the impala-shell package impala_shell-4.1.0a1.tar.gz, installed
impala-shell from the local impala_shell-4.1.0a1.tar.gz, and verified
that impala-shell was installed in ~/.local/lib/python2.7/site-packages.
Verified that the versions of the installed impala-shell and its
dependent Python libraries were as expected.
- Set IMPALA_SHELL_HOME to
~/.local/lib/python2.7/site-packages/impala_shell and copied the egg
files into the installed impala-shell Python package so that the
end-to-end unit tests could run against the impala-shell installed from
the package downloaded from PyPI. The end-to-end impala-shell unit tests
passed.
- Verified the impala-shell tarball generated by
shell/make_shell_tarball.sh.
Change-Id: I378404e2407396d4de3bb0eea4d49a9c5bb4e46a
Reviewed-on: http://gerrit.cloudera.org:8080/17826
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch, Impala mainly used Thrift 0.9.3, but it was
possible to compile the Impala shell with Thrift 0.11.0, so the Thrift
0.11.0 library was already included in the toolchain.
Most of the changes replace boost::shared_ptr with std::shared_ptr
in the C++ code (this continues a patch by Sahil).
The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because thrift_sasl currently pins the Thrift version to 0.9.3
for Python 2.
The current patch uses alpha releases of Impyla and thrift_sasl that
use Thrift 0.11.0.
Notable side effects:
- the old logic to compile Thrift for impala-shell with 0.11.0 was removed
- impala_shell's UTF-8 handling had to be updated because the new 0.11.0
compilation uses no_utf8strings. This also made things a
bit faster, e.g. the following takes ~0.22s instead of ~0.25s:
shell/impala_shell.py \
-B -q "select * from functional_parquet.alltypes;" > /dev/null
- THRIFT-3921 changed the stream operators to print an enum's name
instead of its number, leading to slightly different messages
in some cases.
- "templates" was added to the Thrift generator's parameters to avoid
a compilation issue (related to IMPALA-10600). I didn't notice any
change in compilation time. This option generates .tcc files with
templatized readers/writers for Thrift types. Currently we don't
use these, but they could potentially speed up (de)serialization.
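The UTF-8 change can be illustrated with a small helper. Thrift's Python generator documents no_utf8strings as skipping UTF-8 encoding/decoding of strings in the generated code (with essentially no effect on Python 3), so under Python 2 the shell may receive raw UTF-8 bytes and has to decode them itself before display. The helper below is a hypothetical sketch written in Python 3 syntax, where bytes and str are distinct; the name to_text and the "replace" error policy are assumptions, not impala_shell's actual code.

```python
def to_text(value, encoding="utf-8"):
    """Return value as display text, decoding raw bytes if necessary."""
    if isinstance(value, bytes):
        # errors="replace" keeps output going even if a value is not valid UTF-8
        return value.decode(encoding, errors="replace")
    return str(value)
```

Decoding at display time means the conversion is done once, by the shell, for the values it actually prints.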
Testing:
- ran Impyla's test suite with Python 2 and 3
- ran core tests
Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Reviewed-on: http://gerrit.cloudera.org:8080/17170
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Needed for Python 3 compatibility.
Note that we had to amend make_shell_tarball.sh because execfile has
been removed in Python 3.
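A common Python 3 replacement for execfile() reads the file, compiles it and passes the code object to exec(). This is a generic sketch of that idiom, not the exact change made to make_shell_tarball.sh (which this log does not show); the name exec_file is hypothetical.

```python
import os
import tempfile


def exec_file(path, globals_dict=None):
    """Run the Python source at path, as Python 2's execfile() did.

    Unlike execfile(), a fresh namespace is used when none is given, and
    that namespace is returned so callers can read the names it defines.
    """
    if globals_dict is None:
        globals_dict = {}
    with open(path) as source:
        # Passing path to compile() keeps file names in tracebacks accurate
        code = compile(source.read(), path, "exec")
    exec(code, globals_dict)
    return globals_dict


if __name__ == "__main__":
    # Usage: define a variable in a temporary file and read it back.
    demo_path = os.path.join(tempfile.mkdtemp(), "version_info.py")
    with open(demo_path, "w") as f:
        f.write("VERSION = '4.0.0-SNAPSHOT'\n")
    print(exec_file(demo_path)["VERSION"])
```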
Tested by running gerrit-verify-dryrun, and also confirmed that I can
connect to a kerberized host.
$ <path_to>/impala-shell-4.0.0-SNAPSHOT/impala-shell -k --ssl -i host.redacted.com
Starting Impala Shell with Kerberos authentication using Python 2.7.12
Using service name 'impala'
SSL is enabled. Impala server certificates will NOT be verified (set --ca_cert to change)
No handlers could be found for logger "thrift.transport.sslcompat"
Opened TCP connection to host.redacted.com:21000
Connected to host.redacted.com:21000
Server version: impalad version 3.4.0-SNAPSHOT RELEASE (build d17bc21...)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v4.0.0-SNAPSHOT (7af6a8d) built on Tue May 5 10:39:12 PDT 2020)
To see more tips, run the TIP command.
***********************************************************************************
[host.redacted.com:21000] default>
Change-Id: Ibd02055d33e2da504eccd571f1f209ae2e5b7876
Reviewed-on: http://gerrit.cloudera.org:8080/15859
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Upgrades impala-shell's bundled version of sqlparse to 0.3.1.
API changes in sqlparse 0.2.0+ required a rewrite of
the StripLeadingCommentFilter in impala_shell.py. A slight perf
optimization was also added to skip the filter altogether
when no leading comment is readily discernible.
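The fast path can be sketched without sqlparse: a cheap string check decides whether any stripping is needed, and only then does the code walk past the leading comments. This is a simplified, hypothetical illustration (the functions below are not the StripLeadingCommentFilter, which works on sqlparse tokens); it only recognizes "--" line comments and "/* */" block comments at the start of the statement.

```python
def has_leading_comment(sql):
    """Cheap pre-check: does sql start (after whitespace) with a comment?"""
    stripped = sql.lstrip()
    return stripped.startswith("--") or stripped.startswith("/*")


def split_leading_comment(sql):
    """Return (leading_comment_or_None, remainder_of_statement)."""
    if not has_leading_comment(sql):
        return None, sql  # fast path: no tokenizing work at all
    pos, n = 0, len(sql)
    while True:
        while pos < n and sql[pos].isspace():
            pos += 1  # skip whitespace between consecutive comments
        if sql.startswith("--", pos):
            newline = sql.find("\n", pos)
            pos = n if newline == -1 else newline + 1
        elif sql.startswith("/*", pos):
            close = sql.find("*/", pos + 2)
            pos = n if close == -1 else close + 2
        else:
            break  # first non-comment character reached
    return sql[:pos].strip(), sql[pos:]
```

Most interactive statements have no leading comment, so the common case costs a single lstrip() and two prefix checks.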
As 0.1.19 was the last version of sqlparse to support Python 2.6,
this patch also breaks Impala's compatibility with Python 2.6.
No new tests were added, but all existing tests passed without
modification.
Change-Id: I77a1fd5ae311634a18ee04b8c389d8a3f3a6e001
Reviewed-on: http://gerrit.cloudera.org:8080/15642
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We've relied on a copied version of thrift_sasl.py, which needs
to be updated for Python 3 compatibility, so this patch takes the
opportunity to add the thrift_sasl 0.4.1 package to ext-py like
the other external Python libraries we use.
Change-Id: I7e66c728883ceb5b3e96bc5fd120d44ab81bbb75
Reviewed-on: http://gerrit.cloudera.org:8080/15513
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HS2 is added as an option via --protocol=hs2. The user-visible
differences in behaviour are minimal. Beeswax is still the
default and can be explicitly enabled via --protocol=beeswax
but will be deprecated. The default is unchanged because
changing the default could break certain workflows, e.g.
those that explicitly specify the port with -i or deployments
that hit --fe_service_threads for HS2 and somehow rely on
impala-shell not contributing to that limit. For most
workflows the change is transparent, and we should change
the default in a major version release.
This support requires Impala-specific extensions to
the HS2 interface, similar to the existing extensions
to Beeswax. Thus the HS2 shell is only
forwards-compatible with newer Impala versions.
I considered trying to gracefully degrade when the
new extensions weren't present, but it didn't seem to be
worth the ongoing testing effort.
Differences between HS2 and Beeswax are abstracted into
ImpalaClient subclasses.
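A minimal sketch of that structure: a base class holds the protocol-independent logic, and each subclass supplies its own session handling. It also includes the context-manager pattern that the list below describes for ImpalaShell, so the session is closed even if an error occurs. All class names, method names and the events list are hypothetical and are not the real API in impala_client.py.

```python
from abc import ABC, abstractmethod


class ImpalaClientSketch(ABC):
    """Protocol-independent logic; subclasses supply the wire details."""

    protocol = None  # set by each concrete subclass

    def __init__(self):
        self.events = []  # records calls, for illustration only
        self.connected = False

    def connect(self):
        self._open_session()
        self.connected = True

    def close(self):
        if self.connected:  # closing twice is harmless
            self._close_session()
            self.connected = False

    def __enter__(self):
        self.connect()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()  # runs even if the with-body raised
        return False  # never suppress the exception

    @abstractmethod
    def _open_session(self):
        """Open a protocol-specific session."""

    @abstractmethod
    def _close_session(self):
        """Close the protocol-specific session."""


class BeeswaxClientSketch(ImpalaClientSketch):
    protocol = "beeswax"

    def _open_session(self):
        self.events.append("beeswax open")

    def _close_session(self):
        self.events.append("beeswax close")


class HS2ClientSketch(ImpalaClientSketch):
    protocol = "hs2"

    def _open_session(self):
        self.events.append("hs2 OpenSession")

    def _close_session(self):
        self.events.append("hs2 CloseSession")


def make_client(protocol="beeswax"):
    """Choose the subclass from a --protocol style value."""
    for cls in (BeeswaxClientSketch, HS2ClientSketch):
        if cls.protocol == protocol:
            return cls()
    raise ValueError("unknown protocol: %s" % protocol)
```

Keeping the protocol choice in one factory means the rest of the shell never needs to branch on the protocol.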
Here are the changes required to make it work:
* Switch to TBinaryProtocolAccelerated to avoid perf
regression. The HS2 protocol requires decoding
more primitive values (because it's not a string per row),
which was slow with the pure python implementation of
TBinaryProtocol.
* Added bitarray module to efficiently unpack null indicators
* Minimise invasiveness of changes by transposing and stringifying
the columnar results into rows in impala_client.py. The transposition
needs to happen before display anyway.
* Add PingImpalaHS2Service() to get back version string and webserver
address.
* Add CloseImpalaOperation() extension to return DML row counts. This
possibly addresses IMPALA-1789, although we need to confirm that
this is a sufficient solution.
* Add is_closed member to query handles to avoid the shell independently
tracking whether the query handle was closed or not.
* Include query status in HS2 log to match beeswax.
* HS2 GetLog() command now includes query status error message for
consistency with beeswax.
* "set"/"set all" use the client request's options, not the session
default. This captures the effective value of TIMEZONE, which
was previously missing. This also requires test changes where
the tests set non-default values, e.g. for ABORT_ON_ERROR.
* "set all" on the server side returns REMOVED query options - the
shell needs to know these so it can correctly ignore them.
* Clean up self.orig_cmd/self.last_leading_comment argument
passing to avoid implicit parameter passing through multiple
function calls.
* Clean up argument handling in shell tests to consistently pass
around lists of arguments instead of strings that are subject
to shell tokenisation rules.
* Consistently close connections in the shell to avoid leaking
HS2 sessions. This is enforced by making ImpalaShell a context
manager and also eliminating all sys.exit() calls that would
bypass the explicit connection closing.
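The null-indicator unpacking and the column-to-row transposition described in the list can be sketched in plain Python. Assumption: bit i of an HS2 column's nulls bitmap, counting from the least significant bit of each byte, marks row i as NULL, and a bitmap shorter than the row count means the remaining rows are non-NULL. The shell itself uses the bitarray module for the unpacking; the function names below are hypothetical, and str() stands in for whatever value formatting actually applies.

```python
def unpack_nulls(nulls, num_rows):
    """Return num_rows booleans, True where the bitmap marks NULL."""
    return [
        i // 8 < len(nulls) and bool(nulls[i // 8] & (1 << (i % 8)))
        for i in range(num_rows)
    ]


def columns_to_rows(columns, null_bitmaps, num_rows, null_text="NULL"):
    """Transpose columnar values into rows of display strings."""
    masks = [unpack_nulls(bitmap, num_rows) for bitmap in null_bitmaps]
    rows = []
    for r in range(num_rows):
        row = []
        for values, is_null in zip(columns, masks):
            # The value slot at a NULL position is a placeholder; ignore it
            row.append(null_text if is_null[r] else str(values[r]))
        rows.append(row)
    return rows
```

For example, two columns [1, 2, 3] and ["a", "", "c"] with bitmaps b"\x04" and b"\x02" yield the rows ["1", "a"], ["2", "NULL"] and ["NULL", "c"].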
Testing:
* Shell tests can run with both protocols
* Add tests for formatting of all types and NULL values
* Added testing for floating point output formatting, which does
change as a result of switching to server-side vs client-side
formatting.
* Verified that newly-added tests were actually going through HS2
by disabling hs2 on the minicluster and running tests.
* Add checks to test_verify_metrics.py to ensure that no sessions
are left open at the end of tests.
Performance:
Baseline from beeswax shell for large extract is as follows:
$ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null
real 0m6.708s
user 0m5.132s
sys 0m0.204s
After this change it is roughly 14% slower in wall-clock time (7.625s
vs. 6.708s), but we generally don't consider bulk extract performance
through the shell to be perf-critical:
real 0m7.625s
user 0m6.436s
sys 0m0.256s
Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12
Reviewed-on: http://gerrit.cloudera.org:8080/12884
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes a bug in sqlparse where sqlparse incorrectly splits a
statement that has a newline inside double quotes. The bug in sqlparse
causes the Impala shell to go into an infinite loop when a statement
contains a newline inside double quotes.
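The expected behaviour can be illustrated with a small quote-aware splitter: a semicolon or newline inside a single- or double-quoted string is literal text, and a backslash escapes the next character inside a string. This is a hypothetical sketch for illustration, not sqlparse's algorithm or the upstream fix, and it deliberately ignores comments.

```python
def split_statements(sql):
    """Split sql on semicolons that are outside quoted strings."""
    statements = []
    current = []
    quote = None  # the active quote character, or None outside strings
    i = 0
    while i < len(sql):
        ch = sql[i]
        current.append(ch)
        if quote:
            if ch == "\\" and i + 1 < len(sql):
                current.append(sql[i + 1])  # keep the escaped char verbatim
                i += 1
            elif ch == quote:
                quote = None  # string closed
        elif ch in ("'", '"'):
            quote = ch  # string opened; newlines inside it are plain text
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

Because quote state is tracked character by character, a newline inside "..." never ends the string, so the statement is not cut in half.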
The patch in sqlparse is based on the upstream fix at
https://github.com/andialbrecht/sqlparse/pull/396
Testing:
- Added new end-to-end shell tests
- Ran end-to-end shell tests
Change-Id: I9142f21a888189d351f00ce09baeba123bc0959b
Reviewed-on: http://gerrit.cloudera.org:8080/9195
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Upgrading sqlparse ended up trading one bug for another. The new bug is
not fixed upstream; I sent a patch. The problem is that '\\' is not
recognized as a terminated string, and we use this in the phrase "fields
escaped by '\\'" when creating tables.
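One way to recognize a terminated single-quoted literal is a pattern in which a backslash always consumes the following character, so '\\' (an escaped backslash) is closed by its final quote while '\' is not. sqlparse's actual regex is not shown in this log; the pattern and function name below are assumptions for illustration.

```python
import re

# Hypothetical pattern: an opening quote, then any mix of ordinary
# characters, backslash escapes (e.g. \\ or \'), or doubled quotes (''),
# then a closing quote.
SINGLE_QUOTED = re.compile(r"'(?:[^'\\]|\\.|'')*'", re.DOTALL)


def is_terminated_literal(text):
    """True if text is exactly one complete single-quoted literal."""
    return SINGLE_QUOTED.fullmatch(text) is not None
```

The key design point is that the escape alternative (\\.) takes the backslash and the next character as one unit, so an escaped backslash can never be mistaken for an escape of the closing quote.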
Change-Id: Id57081f5a96e997afd3aa9b26dca23f627488fc3
Reviewed-on: http://gerrit.cloudera.org:8080/117
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The only thing this commit does is upgrade sqlparse. The upgrade was
done by downloading and extracting the tarball and nothing else (no
patching). The older version of sqlparse would parse
SELECT
'
;
'
;
into two statements. Neither statement is complete because of the open
quote, and this would cause an infinite loop. The bug is already fixed in
the newest version of sqlparse.
Change-Id: I7ce7c269769ae0cde3dc8ca386d0b0e11bea71c1
Reviewed-on: http://gerrit.cloudera.org:8080/102
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The shell uses an external module called sqlparse to strip the comments from a query file.
When sqlparse.format() is invoked, it runs several grouping functions on the
tokenized query text; some of these are very slow and are not needed for comment
removal. This change restricts sqlparse to invoking only the grouping function needed
for removing comments.
Change-Id: I3a067187667fcd3cd331156a325960a3de2db9c2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/944
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins