Commit Graph

26 Commits

Author SHA1 Message Date
Joe McDonnell
a9cfc7b33f IMPALA-11624: Bump Impyla dependency to 0.18.0
IMPALA_THRIFT_PY_VERSION is also bumped to 0.16.0p3.
Because Thrift 0.16.0p3 contains no Python-related
patches and Impyla 0.18.0 depends on Thrift 0.16.0,
all Python code now consistently uses Thrift 0.16.0.
This also bumps the Thrift in the shell's ext-py
directory to 0.16.0 (based on the Thrift 0.16.0 PyPI
tarball with the egg directory removed).

Testing:
 - Ran a GVO job

Change-Id: I7265558b0e07959c606cba73cd251c3edfcb3ed5
Reviewed-on: http://gerrit.cloudera.org:8080/18456
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-02-27 20:39:26 +00:00
Michael Smith
16190b4f77 IMPALA-11737: Update sasl to 0.3.1 for Python 3.10
sasl 0.2.1 fails to build with Python 3.10. Updates to sasl 0.3.1 for
Python 3.10 compatibility.

Testing:
- built under Python 3.8
- automated tests exercise the built bundle and a pip install using the
current Python version
- pip3 installed shell/build/dist on Ubuntu 22.04 with Python 3.10

Change-Id: I6b522f2b8cb5546150cd3274c7670a6ca9b8ff63
Reviewed-on: http://gerrit.cloudera.org:8080/19265
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2022-11-28 17:16:42 +00:00
Michael Smith
64b324ac40 IMPALA-11389: Include Python 3 eggs in tarball
Build Python 3 eggs for the shell tarball so it works with both Python 2
and Python 3. The impala-shell script selects eggs based on the
available Python version.

Inlines thrift for impala-shell so we can easily build Python 2 and
Python 3 versions, consistent with other libraries. The impala-shell
version should always be at least as new as IMPALA_THRIFT_PY_VERSION.

Thrift 0.13.0+ wraps all exceptions raised during TSocket read/write
operations in TTransportException. In particular, socket.error exceptions,
which we previously received unwrapped, are now wrapped. Unwraps them
before re-raising to preserve the prior behavior.
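A minimal Python sketch of that unwrapping. It assumes the wrapper exposes the original error as an inner attribute, as Thrift 0.13.0+'s TTransportException is assumed to do here; the stand-in exception class and the helper name call_unwrapping_socket_errors are hypothetical, not impala-shell's actual code.

```python
import socket


class TTransportException(Exception):
    """Hypothetical stand-in for thrift.transport.TTransport.TTransportException.

    Assumed to carry the original error in `inner`, as in Thrift 0.13.0+.
    Defined here only so the sketch runs without Thrift installed.
    """

    def __init__(self, type=0, message=None, inner=None):
        Exception.__init__(self, message)
        self.type = type
        self.message = message
        self.inner = inner


def call_unwrapping_socket_errors(fn, *args, **kwargs):
    """Call fn; if it raises a TTransportException wrapping a socket.error,
    re-raise the raw socket.error instead (the pre-0.13.0 behavior)."""
    try:
        return fn(*args, **kwargs)
    except TTransportException as e:
        # On Python 3, socket.error is an alias of OSError.
        if isinstance(e.inner, socket.error):
            raise e.inner
        # Anything else (e.g. NOT_OPEN without an inner error) stays wrapped.
        raise
```

Callers that already catch socket.error keep working unchanged, which is the point of unwrapping at a single choke point rather than at every call site.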

A specific Python version can be selected with IMPALA_PYTHON_EXECUTABLE;
otherwise the script uses 'python', falling back to 'python3' if 'python'
is unavailable.
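The interpreter selection order above, together with picking the eggs that match the chosen interpreter, can be sketched in Python. The real logic lives in the impala-shell shell script; pick_python and eggs_for_version are hypothetical names, and the egg matching assumes the standard name-version-pyX.Y[-platform].egg file naming.

```python
import os
import shutil
import sys


def pick_python(env=os.environ, which=shutil.which):
    """Return the interpreter to use: IMPALA_PYTHON_EXECUTABLE if set,
    else 'python' if it is on PATH, else 'python3' if it is on PATH."""
    explicit = env.get("IMPALA_PYTHON_EXECUTABLE")
    if explicit:
        return explicit
    for candidate in ("python", "python3"):
        if which(candidate):
            return candidate
    return None  # no usable interpreter found


def eggs_for_version(egg_names, version_info=sys.version_info):
    """Keep only the eggs built for the given interpreter's major.minor.

    Assumes egg files are named name-version-pyX.Y.egg or
    name-version-pyX.Y-platform.egg.
    """
    tag = "-py%d.%d" % (version_info[0], version_info[1])
    # Require '.' or '-' after the tag so py3.1 does not match py3.10.
    return [name for name in egg_names
            if tag + "." in name or tag + "-" in name]
```

The which parameter is injectable only so the lookup order can be checked without depending on what is installed on the machine.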

Adds tests for impala-shell tarball with Python 3.

Change-Id: I94f86de9e2a6303151c2f0e6454b5f629cbc9444
Reviewed-on: http://gerrit.cloudera.org:8080/18653
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-07-14 23:52:04 +00:00
Joe McDonnell
7a26ff4b97 IMPALA-11379: Remove kerberos.egg-info directory
This directory is currently checked in, but it is
overwritten when building the shell. On some Linux
distributions, the output is different from what
is checked in. This causes problems for perf-AB-test
(based on bin/single_node_perf_run.py), which relies on
a build not causing any modifications.

This removes the kerberos.egg-info directory,
which does not need to be checked in.

This also adds checks to the GVO Jenkins jobs
to verify that the source tree is unmodified after
bootstrap_build.sh and bootstrap_development.sh.
These checks are not included in those scripts
directly, because developers can run those scripts
in their development environments, which may have
modifications.

Tests:
 - Uploaded a change without removing the kerberos.egg-info
   directory and verified that the new checks fail
 - Verified that perf-AB-test gets past the current issue

Change-Id: I90b486bb6c1644fc18b56779d6c54e1e1b3c9aaa
Reviewed-on: http://gerrit.cloudera.org:8080/18650
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-22 23:58:44 +00:00
wzhou-code
b867f4c4f1 IMPALA-10745 (part 2): Support Kerberos over HTTP for impala-shell
This patch adds the kerberos-1.3.1 Python module to shell/ext-py so that
the egg file for the Kerberos module is built and added to the impala-shell
tarball when shell/make_shell_tarball.sh runs.
The Kerberos Python module is distributed under the Apache License,
Version 2.0. Its source distribution is available at:
https://pypi.org/project/kerberos/

Testing:
 - Passed core run.
 - Installed impala-shell from the impala-shell tarball on a dev box as a
   standalone package. Verified that impala-shell could be run without
   additional configuration.
 - Installed impala-shell from the impala-shell tarball on a real cluster
   with a full Kerberos setup. Verified that impala-shell could
   connect to the Impala server with the options "-k --protocol=hs2-http".

Change-Id: Id34074cbe725ba2cf1407fcf59e00475cd417a6d
Reviewed-on: http://gerrit.cloudera.org:8080/18523
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-05-15 21:46:06 +00:00
wzhou-code
1a61a8025c IMPALA-10937: Fix broken-build on Centos-7
A recent patch upgraded thrift_sasl from 0.4.2 to 0.4.3, which broke
the builds on Centos-7.
The version of setuptools that Jenkins installs in the virtualenvs on
Centos-7 may be older than the version required by thrift_sasl 0.4.3.
Another issue is that the new string syntax for install_requires in
setup.py is not accepted on Centos-7.
As a workaround, this patch removes the setuptools requirement from
thrift_sasl's setup.py. It also reverts the install_requires strings to
the old syntax.

Testing:
  - Ran core tests on impala-private-parameterized, which uses Centos-7.
  - Ran core tests on pre-review-test, which uses Ubuntu.

Change-Id: I2c256a8ec9a151bca8b3370bfce6ecebf060bad0
Reviewed-on: http://gerrit.cloudera.org:8080/17886
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-09-30 05:41:14 +00:00
wzhou-code
9e76a8f7c3 IMPALA-10784 (part 3): Prepare to publish impala-shell on PyPi
We are going to publish impala-shell release 4.1.0a1 on PyPI.
This patch upgrades the following three Python libraries, which are used
to generate egg files when building the impala-shell tarball:
  upgrade bitarray from 1.2.1 to 2.3.0
  upgrade prettytable from 0.7.1 to 0.7.2
  upgrade thrift_sasl from 0.4.2 to 0.4.3
Updates shell/packaging/requirements.txt with the new versions of the
dependent Python libraries.

Testing:
 - Ran core tests.
 - Built impala-shell package impala_shell-4.1.0a1.tar.gz, installed
   impala-shell package from local impala_shell-4.1.0a1.tar.gz, verified
   impala-shell was installed in ~/.local/lib/python2.7/site-packages.
   Verified that the versions of the installed impala-shell and dependent
   Python libraries were as expected.
 - Set IMPALA_SHELL_HOME to
   ~/.local/lib/python2.7/site-packages/impala_shell and copied the egg
   files into the installed impala-shell Python package, so we can run
   the end-to-end unit tests against the impala-shell installed from the
   package downloaded from PyPI.
   Passed end-to-end impala-shell unit tests.
 - Verified the impala-shell tarball generated by
   shell/make_shell_tarball.sh.

Change-Id: I378404e2407396d4de3bb0eea4d49a9c5bb4e46a
Reviewed-on: http://gerrit.cloudera.org:8080/17826
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-09-28 04:55:57 +00:00
Csaba Ringhofer
94f67a3432 IMPALA-7825: Upgrade Thrift version to 0.11.0
Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in the toolchain.

Most of the changes are related to replacing boost::shared_ptr with
std::shared_ptr in C++ code (this continues a patch by Sahil).

The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins the Thrift version to 0.9.3 for Python 2.

The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.

Notable side effects:
- old logic to compile thrift for impala-shell with 0.11.0 was removed
- impala_shell's utf8 handling had to be updated because the new 0.11.0
  compilation uses no_utf8strings. This also made things a
  bit faster, e.g. the following takes ~0.22s instead of ~0.25s:
  shell/impala_shell.py \
    -B -q "select * from functional_parquet.alltypes;" > /dev/null
- THRIFT-3921 changed the stream operators to print an enum's name
  instead of its number, leading to slightly different messages
  in some cases.
- "templates" was added to the thrift generator's parameters to avoid
  a compilation issue (related to IMPALA-10600). I didn't notice any
  change in compilation time. This option generates .tcc files with
  templatized readers/writers for Thrift types. Currently we don't
  use these, but they could potentially speed up (de)serialization.

Testing:
- ran Impyla's test suite with Python 2 and 3
- ran core tests

Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Reviewed-on: http://gerrit.cloudera.org:8080/17170
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-27 13:36:54 +00:00
David Knupp
8b09343e7d IMPALA-9719: Upgrade sasl-0.1.1 -> 0.2.1
Needed for python 3 compatibility.

Note that we had to amend make_shell_tarball.sh to account for the
fact that execfile has been removed from Python 3.
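For reference, the usual replacement for execfile() that works on both Python 2 and 3 reads the file and passes compiled code to exec(). This is a sketch with a hypothetical helper name (execfile_compat); it is not necessarily the exact change made to make_shell_tarball.sh.

```python
def execfile_compat(path, namespace=None):
    """Run the Python file at `path` like the removed execfile() builtin.

    The file is compiled with its real name so tracebacks point at it,
    and executed in `namespace`, which is returned so callers can read
    the names it defined.
    """
    if namespace is None:
        namespace = {}
    with open(path) as f:
        source = f.read()
    exec(compile(source, path, "exec"), namespace)
    return namespace
```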

Tested by running gerrit-verify-dryrun, and also confirmed I can connect
to a kerberized host.

$ <path_to>/impala-shell-4.0.0-SNAPSHOT/impala-shell -k --ssl -i host.redacted.com
Starting Impala Shell with Kerberos authentication using Python 2.7.12
Using service name 'impala'
SSL is enabled. Impala server certificates will NOT be verified (set --ca_cert to change)
No handlers could be found for logger "thrift.transport.sslcompat"
Opened TCP connection to host.redacted.com:21000
Connected to host.redacted.com:21000
Server version: impalad version 3.4.0-SNAPSHOT RELEASE (build d17bc21...)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v4.0.0-SNAPSHOT (7af6a8d) built on Tue May  5 10:39:12 PDT 2020)

To see more tips, run the TIP command.
***********************************************************************************
[host.redacted.com:21000] default>

Change-Id: Ibd02055d33e2da504eccd571f1f209ae2e5b7876
Reviewed-on: http://gerrit.cloudera.org:8080/15859
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-05-22 05:07:00 +00:00
David Knupp
537c30dd06 IMPALA-9720: Update bitarray 0.9.0 -> 1.2.1
This is needed for python3 compatibility.

Tested by running gerrit-verify-dryrun.

Change-Id: I0641b03e880314a424d9d5a0651945c4f51273bc
Reviewed-on: http://gerrit.cloudera.org:8080/15858
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-05 12:33:31 +00:00
David Knupp
c26e3db4bd IMPALA-9362: Upgrade sqlparse 0.1.19 -> 0.3.1
Upgrades the impala-shell's bundled version of sqlparse to 0.3.1.
There were some API changes in 0.2.0+ that required a rewrite of
the StripLeadingCommentFilter in impala_shell.py. A slight perf
optimization was also added to avoid using the filter altogether
if no leading comment is readily discernible.
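A rough plain-Python sketch of that fast path: only scan for comments when the text actually starts with one. The helper strip_leading_comments is hypothetical and is not the sqlparse-based StripLeadingCommentFilter; it only understands '--' and '/* */' comments at the start of the text.

```python
def strip_leading_comments(sql):
    """Split `sql` into (leading_comments, rest).

    Handles '--' line comments and '/* ... */' block comments that appear
    before the first token; an unterminated block comment swallows the rest.
    """
    # Fast path: no comment at the start, so skip the scan entirely.
    if not sql.lstrip().startswith(("--", "/*")):
        return "", sql
    pos = 0
    while True:
        # Skip whitespace between consecutive comments.
        while pos < len(sql) and sql[pos].isspace():
            pos += 1
        if sql.startswith("--", pos):
            end = sql.find("\n", pos)
            pos = len(sql) if end == -1 else end + 1
        elif sql.startswith("/*", pos):
            end = sql.find("*/", pos + 2)
            pos = len(sql) if end == -1 else end + 2
        else:
            break
    return sql[:pos], sql[pos:]
```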

As 0.1.19 was the last version of sqlparse to support python 2.6,
this patch also breaks Impala's compatibility with python 2.6.

No new tests were added, but all existing tests passed without
modification.

Change-Id: I77a1fd5ae311634a18ee04b8c389d8a3f3a6e001
Reviewed-on: http://gerrit.cloudera.org:8080/15642
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-17 05:04:23 +00:00
David Knupp
5c541512f0 IMPALA-9582: Upgrade thrift_sasl to 0.4.2 for impala-shell
Change-Id: Iff739ebeaf5b022a7418883b638b5c5d17885f3b
Reviewed-on: http://gerrit.cloudera.org:8080/15610
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-01 04:22:38 +00:00
David Knupp
f1c8176e65 IMPALA-3343: Part 2 - Add thrift_sasl library to shell/ext_py/
We've relied on a copied version of thrift_sasl.py, which needs
to be updated for Python 3 compatibility, so we are taking this
opportunity to add the thrift_sasl 0.4.1 package to ext-py, like
the other external Python libraries we use.

Change-Id: I7e66c728883ceb5b3e96bc5fd120d44ab81bbb75
Reviewed-on: http://gerrit.cloudera.org:8080/15513
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-21 12:25:55 +00:00
David Knupp
df875dc05b IMPALA-9424: Add six to shell/ext-py
Change-Id: I003e0008c138ee1f2c290775553d4cfc66e9b7fe
Reviewed-on: http://gerrit.cloudera.org:8080/15293
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-28 01:37:38 +00:00
Tim Armstrong
f1f3ae9ec2 IMPALA-7290: part 2: Add HS2 support to Impala shell
HS2 is added as an option via --protocol=hs2. The user-visible
differences in behaviour are minimal. Beeswax is still the
default and can be explicitly enabled via --protocol=beeswax
but will be deprecated. The default is unchanged because
changing the default could break certain workflows, e.g.
those that explicitly specify the port with -i or deployments
that hit --fe_service_threads for HS2 and somehow rely on
impala-shell not contributing to that limit. For most
workflows the change is transparent, and we should switch
the default in a major version release.

This support requires Impala-specific extensions to
the HS2 interface, similar to the existing extensions
to Beeswax. Thus the HS2 shell is only
forwards-compatible with newer Impala versions.
I considered trying to gracefully degrade when the
new extensions weren't present, but it didn't seem to be
worth the ongoing testing effort.

Differences between HS2 and Beeswax are abstracted into
ImpalaClient subclasses.
Here are the changes required to make it work:
* Switch to TBinaryProtocolAccelerated to avoid perf
  regression. The HS2 protocol requires decoding
  more primitive values (because it's not a string per row),
  which was slow with the pure python implementation of
  TBinaryProtocol.
* Added bitarray module to efficiently unpack null indicators
* Minimise invasiveness of changes by transposing and stringifying
  the columnar results into rows in impala_client.py. The transposition
  needs to happen before display anyway.
* Add PingImpalaHS2Service() to get back version string and webserver
  address.
* Add CloseImpalaOperation() extension to return DML row counts. This
  possibly addresses IMPALA-1789, although we need to confirm that
  this is a sufficient solution.
* Add is_closed member to query handles to avoid the shell independently
  tracking whether the query handle was closed or not.
* Include query status in HS2 log to match beeswax.
* HS2 GetLog() command now includes query status error message for
  consistency with beeswax.
* "set"/"set all" uses the client request's options, not the session
  default. This captures the effective value of TIMEZONE, which
  was previously missing. This also requires test changes where
  the tests set non-default values, e.g. for ABORT_ON_ERROR.
* "set all" on the server side returns REMOVED query options - the
  shell needs to know these so it can correctly ignore them.
* Clean up self.orig_cmd/self.last_leading comment argument
  passing to avoid implicit parameter passing through multiple
  function calls.
* Clean up argument handling in shell tests to consistently pass
  around lists of arguments instead of strings that are subject
  to shell tokenisation rules.
* Consistently close connections in the shell to avoid leaking
  HS2 sessions. This is enforced by making ImpalaShell a context
  manager and also eliminating all sys.exit() calls that would
  bypass the explicit connection closing.
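The null-indicator unpacking and row transposition described in the list above can be sketched in plain Python. Assumptions: each HS2 column arrives as a list of values plus a nulls byte string in which bit i (least-significant bit first within each byte) marks row i as NULL. The helpers unpack_nulls and columns_to_rows are hypothetical, and this uses plain bit arithmetic where the shell itself uses the bitarray module.

```python
def unpack_nulls(nulls, num_rows):
    """Expand a null bitmask into one bool per row (True means NULL).

    Assumes bit i, least-significant bit first within each byte, marks
    row i; rows past the end of the mask are treated as non-NULL.
    """
    mask = bytearray(nulls)  # indexable as ints on Python 2 and 3
    flags = []
    for i in range(num_rows):
        byte = mask[i // 8] if i // 8 < len(mask) else 0
        flags.append(bool((byte >> (i % 8)) & 1))
    return flags


def columns_to_rows(columns, null_masks, null_text="NULL"):
    """Stringify column-oriented values and transpose them into row tuples."""
    num_rows = len(columns[0]) if columns else 0
    rendered = []
    for values, mask in zip(columns, null_masks):
        is_null = unpack_nulls(mask, num_rows)
        rendered.append([null_text if null else str(value)
                         for value, null in zip(values, is_null)])
    # zip(*...) turns a list of columns into a list of rows.
    return list(zip(*rendered))
```

Stringifying per column before transposing keeps the per-value work in one tight loop per column, which matches the columnar layout the server sends.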

Testing:
* Shell tests can run with both protocols
* Add tests for formatting of all types and NULL values
* Added testing for floating point output formatting, which does
  change as a result of switching to server-side vs client-side
  formatting.
* Verified that newly-added tests were actually going through HS2
  by disabling hs2 on the minicluster and running tests.
* Add checks to test_verify_metrics.py to ensure that no sessions
  are left open at the end of tests.

Performance:
Baseline from beeswax shell for large extract is as follows:

  $ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null
  real    0m6.708s
  user    0m5.132s
  sys     0m0.204s

After this change it is somewhat slower (about 14% more wall-clock time),
but we generally don't consider bulk extract performance through the
shell to be perf-critical:
  real    0m7.625s
  user    0m6.436s
  sys     0m0.256s

Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12
Reviewed-on: http://gerrit.cloudera.org:8080/12884
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-20 10:23:28 +00:00
Fredy Wijaya
4cdb6dfa6d IMPALA-6337: Fix infinite loop in Impala shell
This patch fixes a bug in sqlparse where sqlparse incorrectly splits a
statement that has a newline inside double quotes. The bug causes the
Impala shell to enter an infinite loop when a statement contains a
newline inside double quotes.

The patch in sqlparse is based on the upstream fix at
https://github.com/andialbrecht/sqlparse/pull/396

Testing:
- Added new end-to-end shell tests
- Ran end-to-end shell tests

Change-Id: I9142f21a888189d351f00ce09baeba123bc0959b
Reviewed-on: http://gerrit.cloudera.org:8080/9195
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-12 19:29:33 +00:00
Fredy Wijaya
49413d9c5b IMPALA-6999: Upgrade to sqlparse-0.1.19 for Impala shell
sqlparse-0.1.19 is the last version of sqlparse that supports Python
2.6.

Testing:
- Ran all end-to-end tests

Change-Id: Ide51ef3ac52d25a96b0fa832e29b6535197d23cb
Reviewed-on: http://gerrit.cloudera.org:8080/10354
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-10 19:27:52 +00:00
casey
554f1f779b Shell: Fix parsing of strings containing an escaped backslash
Upgrading sqlparse ended up trading one bug for another. The new bug is
not fixed upstream; I sent a patch. The problem is that '\\' is not
considered a terminated string, and we use this in the phrase "fields
escaped by '\\'" when creating tables.

Change-Id: Id57081f5a96e997afd3aa9b26dca23f627488fc3
Reviewed-on: http://gerrit.cloudera.org:8080/117
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-02-27 03:17:28 +00:00
casey
71c5ec7af5 IMPALA-1612: (shell) Upgrade sqlparse for bug fix
The only thing this commit does is upgrade sqlparse. The upgrade was
done by downloading and extracting the tarball, nothing else (such as
patching). The older version of sqlparse would parse

SELECT
'
;
'
;

into two statements. Neither statement is complete due to the open quote
and this would cause an infinite loop. The bug is already fixed in the
newest version of sqlparse.
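The behavior the shell needs here, and in the related fixes for newlines inside double quotes and for '\\' literals, is that ';' only ends a statement when it is outside a string literal. The toy splitter below illustrates that rule; it is hypothetical and is not how sqlparse tokenizes. It treats the example above as a single statement.

```python
def split_statements(text):
    """Split SQL text on ';' that appear outside quoted string literals.

    Single- and double-quoted literals may span newlines and contain ';',
    and a backslash inside a literal escapes the next character, so '\\'
    is a complete literal. Returns (statements, tail), where tail is any
    text after the last top-level ';' (e.g. input still being typed).
    """
    statements = []
    current = []
    quote = None  # quote character of the currently open literal, if any
    i = 0
    while i < len(text):
        ch = text[i]
        current.append(ch)
        if quote:
            if ch == "\\" and i + 1 < len(text):
                # Keep the escaped character verbatim and skip past it.
                current.append(text[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
        i += 1
    return statements, "".join(current).strip()
```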

Change-Id: I7ce7c269769ae0cde3dc8ca386d0b0e11bea71c1
Reviewed-on: http://gerrit.cloudera.org:8080/102
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-02-25 23:50:59 +00:00
ishaan
f262fcea64 Support utf-8 input and output in the shell
Also adds a --strict_unicode option, which controls whether invalid Unicode
code points are ignored on input.
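A minimal sketch of what such an option can control, assuming input arrives as UTF-8 bytes: decode strictly (raising on invalid sequences) when the option is set, otherwise drop the invalid bytes. The helper name decode_input is hypothetical.

```python
def decode_input(raw, strict_unicode=False):
    """Decode raw shell input bytes as UTF-8.

    With strict_unicode, invalid byte sequences raise UnicodeDecodeError;
    otherwise they are silently dropped via the 'ignore' error handler.
    """
    return raw.decode("utf-8", "strict" if strict_unicode else "ignore")
```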

Change-Id: Ice59d6dd3df4557ab3b1fc91d7ddc0e1bf03f1c7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3218
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-07-02 23:18:27 -07:00
ishaan
8e6a998257 Remove generated files from impala-shell external packages.
This replays an earlier change, which was likely rewritten during a merge and
only partially applied.

Change-Id: Idfc656225545dfbef892ea4d21a7240a66931f77
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1327
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1456
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
2014-02-05 19:30:02 -08:00
ishaan
d3ffdbea15 IMPALA-644 The shell takes too long to parse a query file.
The shell uses an external module called sqlparse to strip the comments from a query file.
When sqlparse.format() is invoked, it runs several grouping functions on the
tokenized query text; some of these are very slow and are not needed for comment
removal. This change restricts sqlparse to invoking only the grouping function that
removes comments.

Change-Id: I3a067187667fcd3cd331156a325960a3de2db9c2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/944
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:11 -08:00
ishaan
1656e779b9 Improved comment parsing in query file passed to the shell
2014-01-08 10:50:03 -08:00
ishaan
fa3500eb43 Improve display of query results in the impala shell
2014-01-08 10:48:27 -08:00
ishaan
5166bda649 Make sasl buildable on ubuntu 11/12.
2014-01-08 10:47:52 -08:00
ishaan
83e9bccabb Add the sasl package to source for the shell.
2014-01-08 10:47:51 -08:00