Commit Graph

291 Commits

Author SHA1 Message Date
stiga-huang
1fbca6d43b IMPALA-9569: Fix progress bar and live_summary to show info of the retried query
Impala-shell periodically calls GetExecSummary() when the query is
queuing or running. If the query is being retried, GetExecSummary()
should return the TExecSummary of the retried query. So the progress bar
and live_summary can reflect the most recent state.

This patch also modifies get_summary() to return retry information in
error_logs of TExecSummary. Impala-shell and other clients can print the
info right after the query starts being retried. Modified impala-shell
to print the retried query link when the retried query is running.

Example output when the retried query is running:
Query: select count(*) from functional.alltypes where bool_col = sleep(60)
Query submitted at: 2020-06-18 22:08:49 (Coordinator: http://quanlong-OptiPlex-BJ:25000)
Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=9444fe7f0df0da28:29134b0800000000
Failed due to unreachable impalad(s): quanlong-OptiPlex-BJ:22001

Retrying query using query id: 5748d9a3ccc28ba8:a75e2fab00000000
Retried query link: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=5748d9a3ccc28ba8:a75e2fab00000000
[###############################                               ] 50%

Tests:
- Manually verify the progress bar and live_summary work when the query
  is being retried.
- Add tests in test_query_retries.py to validate the get_summary()
  results.

Change-Id: I8f96919f00e0b64d589efd15b6b5ec82fb725d56
Reviewed-on: http://gerrit.cloudera.org:8080/16096
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-30 12:11:24 +00:00
stiga-huang
931063f0f2 IMPALA-9213: Add query retry info to GetLog result
Beeswax clients use get_log() to retrieve the warning/error message
after the query finishes. HS2 clients use GetLog() for the same purpose.
This patch adds the retry information into the returned result if the
query is retried. So clients that print the log can show the original
query failure and the retried query id.

This patch also modifies impala-shell to extract the retried query id
and print the retried query link.

Here's an example of the impala-shell output:

Query: select count(*) from functional.alltypes where bool_col = sleep(60)
Query submitted at: 2020-06-18 21:23:52 (Coordinator: http://quanlong-OptiPlex-BJ:25000)
Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=7944ffee4d81cdd4:e7f9357a00000000
+----------+
| count(*) |
+----------+
| 3650     |
+----------+
WARNINGS: Original query failed:
Failed due to unreachable impalad(s): quanlong-OptiPlex-BJ:22001

Query has been retried using query id: 934b2734f67a1161:a0dbd60200000000
Retried query link: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=934b2734f67a1161:a0dbd60200000000

Tests:
 - Add tests in test_query_retries.py to verify client logs returned
   from GetLog().
 - Run test_query_retries.py.
 - Manually run queries in impala-shell and kill impalads. Verify
   printed messages when the retried queries succeed or fail.

Change-Id: I58cf94f91a0b92eb9a3088bee3894ac157a954dc
Reviewed-on: http://gerrit.cloudera.org:8080/16093
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-30 05:58:02 +00:00
Sahil Takiar
3088ca8580 IMPALA-9818: Add fetch size as option to impala shell
Adds the option --fetch_size to the Impala shell. This new option allows
users to specify the fetch size used when issuing fetch RPCs to the
Impala Coordinator (e.g. TFetchResultsReq and BeeswaxService.fetch).
This parameter applies for all client protocols: beeswax, hs2, hs2-http.
The default --fetch_size is set to 10240 (10x the default batch size).

The new --fetch_size parameter is most effective when result spooling is
enabled. When result spooling is disabled, Impala can only return a
single row batch per fetch RPC (so 1024 rows by default). When result
spooling is enabled, Impala can return up to 100 row batches per fetch
request.

Removes some logic in the the impala_client.py file that attempts to
simulate a fetch_size. The code would issue multiple fetch requests to
fullfill the given fetch_size. This logic is no longer needed now that
result spooling is available.

Testing:
* Ran core tests
* Added new tests in test_shell_client.py and test_shell_commandline.py

Change-Id: I8dc7962aada6b38795241d067a99bd94fabca57b
Reviewed-on: http://gerrit.cloudera.org:8080/16041
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Sahil Takiar <stakiar@cloudera.com>
2020-06-10 17:46:21 +00:00
Joe McDonnell
00aef7f79e Only build ext-py directories tracked by git for make_shell_tarball.sh
When versions change in shell/ext-py/*, there can be leftover
directories when developers rebase. These leftover directories
are usually empty and unbuildable, so make_shell_tarball.sh will
fail with an error message like:
Creating an egg for .../Impala/shell/ext-py/bitarray-0.9.0
Traceback (most recent call last):
  File "<string>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'setup.py'
...

This changes the build logic to only build directories that are
tracked in git. When a version of an ext-py package changes, the
directory for the old version may stick around, but it is gone from
the git repository and won't be built. The downside is that when
a developer is adding a new package version, it won't be built until
it is added in git. This logic is disabled if IMPALA_HOME is not a
git repository, which can happen when building from release tarballs.

Testing:
 - Added an empty directory in shell/ext-py that was not tracked
   in git. Verified it is not built (and would fail before).
 - Tested the command detecting IMPALA_HOME as a git repository on
   a non-repository directory.

Change-Id: Ibb70ef2d5048d5cfeb260ce62c34f04835c7132d
Reviewed-on: http://gerrit.cloudera.org:8080/15886
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-01 20:45:57 +00:00
Joe McDonnell
56ee90c598 IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7
The locations for native-toolchain packages in IMPALA_TOOLCHAIN
currently do not include the compiler version. This means that
the toolchain can't distinguish between native-toolchain packages
built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause
issues when switching back and forth between branches.

This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment
variable, which is a location inside IMPALA_TOOLCHAIN that would
hold native-toolchain packages. Currently, it is set to the same
as IMPALA_TOOLCHAIN, so there is no difference in behavior.
This lays the groundwork to add the compiler version to this
path when switching to GCC7.

Testing:
 - The only impediment to building with
   IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is
   Impala-lzo. With a custom Impala-lzo, compilation succeeds.
   Either Impala-lzo will be fixed or it will be removed.
 - Core tests

Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b
Reviewed-on: http://gerrit.cloudera.org:8080/15991
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-30 16:25:37 +00:00
David Knupp
8b09343e7d IMPALA-9719: Upgrade sasl-0.1.1 -> 0.2.1
Needed for python 3 compatibility.

Note that we had to amend the make_shell_tarball.sh to account for the
fact that execfile has been removed from python 3.

Tested by running gerrit-verify-dryrun, and also confirmed I can connect
to a kerberized host.

$ <path_to>/impala-shell-4.0.0-SNAPSHOT/impala-shell -k --ssl -i host.redacted.com
Starting Impala Shell with Kerberos authentication using Python 2.7.12
Using service name 'impala'
SSL is enabled. Impala server certificates will NOT be verified (set --ca_cert to change)
No handlers could be found for logger "thrift.transport.sslcompat"
Opened TCP connection to host.redacted.com:21000
Connected to host.redacted.com:21000
Server version: impalad version 3.4.0-SNAPSHOT RELEASE (build d17bc21...)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v4.0.0-SNAPSHOT (7af6a8d) built on Tue May  5 10:39:12 PDT 2020)

To see more tips, run the TIP command.
***********************************************************************************
[host.redacted.com:21000] default>

Change-Id: Ibd02055d33e2da504eccd571f1f209ae2e5b7876
Reviewed-on: http://gerrit.cloudera.org:8080/15859
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-05-22 05:07:00 +00:00
Joe McDonnell
0a0001e1a8 Revert "IMPALA-9718: Delete pkg_resources from IMPALA_HOME/shell/"
The fix for IMPALA-9718 introduced test failures on Centos 7.
See IMPALA-9735.

This reverts commit 75d98b4b08.

Change-Id: Id09c55435f432a8626a45079f58860d6e27ac55e
Reviewed-on: http://gerrit.cloudera.org:8080/15881
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2020-05-07 23:15:32 +00:00
David Knupp
537c30dd06 IMPALA-9720: Update bitarray 0.9.0 -> 1.2.1
This is needed for python3 compatibility.

Tested by running gerrit-verify-dryrun.

Change-Id: I0641b03e880314a424d9d5a0651945c4f51273bc
Reviewed-on: http://gerrit.cloudera.org:8080/15858
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-05 12:33:31 +00:00
David Knupp
75d98b4b08 IMPALA-9718: Delete pkg_resources from IMPALA_HOME/shell/
This file was originally copied here in 2013 because it was somehow
not availble on SUSE Linux systems. However, pkg_resources is part
of the stdlib, and should be available now on any system. Just to
sure about that, I tested on a SUSE Linux machine to be sure.

systest@dknupp-sles-test:~> python
Python 2.7.13 (default, Jan 11 2017, 10:56:06) [GCC] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pkg_resources
>>> pkg_resources.__file__
'/usr/lib/python2.7/site-packages/pkg_resources/__init__.pyc'
>>>

This version is now woefully out of date, and has had to be patched a
couple times for bugs. We should get rid of it.

Change-Id: I2a06f21177a6fa561478c3dd243d9262deb62db4
Reviewed-on: http://gerrit.cloudera.org:8080/15855
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: David Knupp <dknupp@cloudera.com>
2020-05-05 07:16:18 +00:00
David Knupp
6e0085c220 IMPALA-9721: Fix minor python2/3 syntax regression
A minor syntax error slipped past in a recent patch. In python3, the
syntax for catching exceptions requires the 'as' keyword. This error
was missed in code review.

Until automated python3 testing set up, this kind of error is likely
to repeat. See IMPALA-9724.

Change-Id: I0d36c609a3600c8084efcce0026537227144b27d
Reviewed-on: http://gerrit.cloudera.org:8080/15856
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: David Knupp <dknupp@cloudera.com>
2020-05-05 01:21:02 +00:00
Tamas Mate
1a36a0348b IMPALA-9398: Fix shell history duplication when cmdloop breaks
This change adds a new condition to avoid re-reading the impala-shell
history when the cmdloop is broken. The loop can break due to exceptions
such as KeyboardInterrupt.

Testing:
 - The change was tested manually on local dev env
 - Added a new EE shell test to verify the history after SIGINT

Change-Id: If4faf46134f44d91e56748642f47d448707db53c
Reviewed-on: http://gerrit.cloudera.org:8080/15345
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-30 01:55:51 +00:00
Thomas Tauber-Marshall
3a795b5af0 IMPALA-9681: Fix LdapImpalaShellTest
LdapImpalaShellTest was broken by IMPALA-3343 which changed the
wording of the startup message printed by impala-shell. This patch
fixes the test.

Change-Id: I9e7070fa6da4085ea858f64141529a7a23a86534
Reviewed-on: http://gerrit.cloudera.org:8080/15777
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-22 17:01:38 +00:00
David Knupp
bc9d7e063d IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
This is the main patch for making the the impala-shell cross-compatible with
python 2 and python 3. The goal is wind up with a version of the shell that will
pass python e2e tests irrepsective of the version of python used to launch the
shell, under the assumption that the test framework itself will continue to run
with python 2.7.x for the time being.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python
  environment independenty from bin/set-pythonpath.sh. The default version
  of thrift is thrift-0.11.0 (See IMPALA-9489).

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell with no authentication using Python 3.7.5
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell with LDAP-based authentication using Python 2.7.12
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result, this was inconsistency was reflected in the way we handled
  strings in the impala-shell code, but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could, and try to
  implement what is has colloquially been called a "unicode sandwich" -- namely,
  "bytes on the outside, unicode on the inside, encode/decode at the edges."

  The primary spot in the shell where we call decode() now is when sanitising
  input...

  args = self.sanitise_input(args.decode('utf-8'))

  ...and also whenever a library like re required it. Similarly, str.encode()
  is primarily used where a library like readline or csv requires is.

- PYTHONIOENCODING needs to be set to utf-8 to override the default setting for
  python 2. Without this, piping or redirecting stdout results in unicode errors.

- from __future__ import unicode_literals was added throughout

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and manually ran the tests against that.

  No effort has been made at this point to come up with a way to integrate
  testing of the shell in a python3 environment into our automated test
  processes.

Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615
Reviewed-on: http://gerrit.cloudera.org:8080/15524
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-18 05:13:50 +00:00
David Knupp
c26e3db4bd IMPALA-9362: Upgrade sqlparse 0.1.19 -> 0.3.1
Upgrades the impala-shell's bundled version of sqlparse to 0.3.1.
There were some API changes in 0.2.0+ that required a re-write of
the StripLeadingCommentFilter in impala_shell.py. A slight perf
optimization was also added to avoid using the filter altogether
if no leading comment is readily discernible.

As 0.1.19 was the last version of sqlparse to support python 2.6,
this patch also breaks Impala's compatibility with python 2.6.

No new tests were added, but all existing tests passed without
modification.

Change-Id: I77a1fd5ae311634a18ee04b8c389d8a3f3a6e001
Reviewed-on: http://gerrit.cloudera.org:8080/15642
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-17 05:04:23 +00:00
David Knupp
5c541512f0 IMPALA-9582: Upgrade thrift_sasl to 0.4.2 for impala-shell
Change-Id: Iff739ebeaf5b022a7418883b638b5c5d17885f3b
Reviewed-on: http://gerrit.cloudera.org:8080/15610
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-01 04:22:38 +00:00
Abhishek Rawat
fc784f6e95 IMPALA-9466: impala-shell client retry for hs2-http protocol
Added retries for idempotent rpcs:
OpenSession, PingImpalaHS2Service, GetResultSetMetadata,
CloseImpalaOperation (non dmls), CancelOperation, GetOperationStatus,
GetRuntimeProfile, GetExecSummary, GetLog

Retries were also added to the 'set all' query execution and subsequent
result fetch in the ImpalaHS2Client._open_session()

The retries are only supported for hs2-http protocol and enabled by
default. At most there are 3 retries for a failed rpc. There is a sleep
duration of 'n' seconds after nth retry.

Only failed rpcs due to an error in the http transport are retried and
if an rpc failed because the server returned an error in the rpc
response then such scenarios are not retriable.

Improved error diagnostics by dumping stack trace when ImpalaShell.
_execute_stmt() gets an 'Unknown Exception'.

Testing:
- Added a custom_cluster test which injects fault into the http transport
and checks expected behavior from the various rpcs. Some of these tests
leave the session in an open state and so these tests are not suitable
for the e2e test framework which have metric verifiers expecting related
metrics to be 0 at the end of the test.
- Manually tested real world scenarios with impala-shell client
communicating with an impala coordinator via a fault injecting istio mesh.
- Manually tested dropping connections on an nginx ingress gateway by sending
SIGTERM to all worker processes.

Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5
Reviewed-on: http://gerrit.cloudera.org:8080/15378
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-28 00:32:18 +00:00
David Knupp
ed70492580 IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports
A few built-ins were changed in python 3 -- e.g., xrange became range,
ConfigParser became configparser, etc. We can redefine some of those
things in a single place, and import them from there as needed. Other
items may also be added as we go along.

Change-Id: Ibd3d86df524666a98cbfa463756adac48bd1f8a3
Reviewed-on: http://gerrit.cloudera.org:8080/15514
Reviewed-by: David Knupp <dknupp@cloudera.com>
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-21 19:52:07 +00:00
David Knupp
f1c8176e65 IMPALA-3343: Part 2 - Add thrift_sasl library to shell/ext_py/
We've relied on a copied version of thrift_sasl.py, which needs
to be updated to be compatible with python 3, so taking this
opportunity to add the thrift_sasl 0.4.1 package to ext-py like
the other external python libs we use.

Change-Id: I7e66c728883ceb5b3e96bc5fd120d44ab81bbb75
Reviewed-on: http://gerrit.cloudera.org:8080/15513
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-21 12:25:55 +00:00
David Knupp
ed15c2c58f IMPALA-3343: Part 1 -- Fix simple python 2->3 syntax errors
In an effort to keep the work of reviewing the changes more manageable
with regard to making the impala-shell python3 compatible, I'm trying
to break the patches up into smaller chunks.

The first patch is the easiest one -- simply addressing the handful of
syntax issues that aren't python 3 compatible, namely changing the
print statements to function calls, changing the way we catch exceptions,
and adding a few simple branches to work around the removal of such
things as dict.iteritems().

We needed the print function imported from __future__ because it allows
us to pass in a file descriptor, e.g., sys.stderr.

Notably, there's nothing in this patch related to string/bytes/unicode
changes from python 2 to 3.

Change-Id: I9a515da01ef03d5936cb1a4d9e4bc6d105386b1d
Reviewed-on: http://gerrit.cloudera.org:8080/15487
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-20 03:10:07 +00:00
Thomas Tauber-Marshall
3fd6f60b22 IMPALA-9414 (part 2): Support the 'Expect: 100-continue' http header
The 'Expect: 100-continue' http header allows http clients to send
only the headers for their request, get a confirmation back from the
server that the headers are valid, and only then send the body of the
request, avoiding the overhead of sending large requests that will
ultimately fail.

This patch adds support for this in the HS2 HTTP server by having
THttpServer look for the header, and if it's present and the request
is validated returning a '100 Continue' response before reading the
body of the request.

It also adds supports for using this header on large requests sent by
impala-shell.

Testing:
- This case is covered by the existing test_large_sql, however that
  test was previously broken and passing spuriously. This patch fixes
  the test.
- Passed all other shell tests.

Change-Id: I4153968551acd58b25c7923c2ebf75ee29a7e76b
Reviewed-on: http://gerrit.cloudera.org:8080/15284
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
2020-03-13 17:00:42 +00:00
Thomas Tauber-Marshall
c3d65cab55 IMPALA-9414 (part 1): Copy THttpClient from Thrift into Impala
This is a prelimary patch that simply copies THttpClient.py from
Thrift master into Impala, changes imports as appropriate, and adjusts
the formatting from 4 spaces to 2 spaces.

This is to allow us to make modifications to THttpClient in future
patches. There are no functional changes in this patch.

Change-Id: I2662f1d4d455120442ef7c0c198685c07207aeed
Reviewed-on: http://gerrit.cloudera.org:8080/15283
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
2020-03-13 17:00:42 +00:00
Alice Fan
e1d1428181 IMPALA-9384: Improve Impala shell usability by enabling live_progress in interactive mode
In order to improve usability, this patch makes Impala shell show query
processing status while the query is running. The patch enables shell
option live_progress by default when a user launches impala shell in
the interactive mode. The patch also adds a new command line flag
"--disable_live_progress", which allows a user to disable live_progress
at runtime. In the interactive mode, a user can disable live_progress
by either using the command line flag or setting the option as False in
the config file. As for in the non-interactive mode (when the -q or -f
options are used), live reporting is not supported. Impala-shell will
disable live_progress if the mode is detected.

Testing:
- Added and updated tests in test_shell_interactive.py and test_shell_commandline.py
- Successfully ran all shell related tests

Change-Id: I3765b775f663fa227e59728acffe4d5ea9a5e2d3
Reviewed-on: http://gerrit.cloudera.org:8080/15219
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-03-09 21:28:19 +00:00
David Knupp
df875dc05b IMPALA-9424: Add six to shell/ext-py
Change-Id: I003e0008c138ee1f2c290775553d4cfc66e9b7fe
Reviewed-on: http://gerrit.cloudera.org:8080/15293
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-28 01:37:38 +00:00
Adam Tamas
0282c024b4 IMPALA-9036: Fix CTRL+C a multiline query in impala-shell
Modified the '_signal_handler()' in impala-shell.py so when a user
cancels a multiline query by hitting CTRL+C it will cancel the
query, instead of just the current line.

Testing:
-Added 'test_cancellation_mid_command()' to test_shell_interactive.py
to test if it really cancels the partial commands.
-Manually tested by giving partial commands then cancelling them.

Change-Id: Id8d8bdaee929e2655eb66e886ae92a02d3fbd83f
Reviewed-on: http://gerrit.cloudera.org:8080/15233
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-24 15:18:16 +00:00
Tim Armstrong
0bb056e525 IMPALA-4224: execute separate join builds fragments
This enables parallel plans with the join build in a
separate fragment and fixes all of the ensuing fallout.
After this change, mt_dop plans with joins have separate
build fragments. There is still a 1:1 relationship between
join nodes and builders, so the builders are only accessed
by the join node's thread after it is handed off. This lets
us defer the work required to make PhjBuilder and NljBuilder
safe to be shared between nodes.

Planner changes:
* Combined the parallel and distributed planning code paths.
* Misc fixes to generate reasonable thrift structures in the
  query exec requests, i.e. containing the right nodes.
* Fixes to resource calculations for the separate build plans.
** Calculate separate join/build resource consumption.
** Simplified the resource estimation by calculating resource
   consumption for each fragment separately, and assuming that
   all fragments hit their peak resource consumption at the
   same time. IMPALA-9255 is the follow-on to make the resource
   estimation more accurate.

Scheduler changes:
* Various fixes to handle multiple TPlanExecInfos correctly,
  which are generated by the planner for the different cohorts.
* Add logic to colocate build fragments with parent fragments.

Runtime filter changes:
* Build sinks now produce runtime filters, which required
  planner and coordinator fixes to handle.

DataSink changes:
* Close the input plan tree before calling FlushFinal() to release
  resources. This depends on Send() not holding onto references
  to input batches, which was true except for NljBuilder. This
  invariant is documented.

Join builder changes:
* Add a common base class for PhjBuilder and NljBuilder with
  functions to handle synchronisation with the join node.
* Close plan tree earlier in FragmentInstanceState::Exec()
  so that peak resource requirements are lower.
* The NLJ always copies input batches, so that it can close
  its input tree.

JoinNode changes:
* Join node blocks waiting for build-side to be ready,
  then eventually signals that it's done, allowing the builder
  to be cleaned up.
* NLJ and PHJ nodes handle both the integrated builder and
  the external builder. There is a 1:1 relationship between
  the node and the builder, so we don't deal with thread safety
  yet.
* Buffer reservations are transferred between the builder and join
  node when running with the separate builder. This is not really
  necessary right now, since it is all single-threaded, but will
  be important for the shared broadcast.
  - The builder transfers memory for probe buffers to the join node
    at the end of each build phase.
  - At end of each probe phase, reservation needs to be handed back
    to builder (or released).

ExecSummary changes:
* The summary logic was modified to handle connecting fragments
  via join builds. The logic is an extension of what was used
  for exchanges.

Testing:
* Enable --unlock_mt_dop for end-to-end tests
* Migrate some tests to run as part of end-to-end tests instead of
  custom cluster.
* Add mt_dop dimension to various end-to-end tests to provide
  coverage of join queries, spill-to-disk and cancellation.
* Ran a single node TPC-H and TPC-DS stress test with mt_dop=0
  and mt_dop=4.

Perf:
* Ran TPC-H scale factor 30 locally with mt_dop=0. No significant
  change.

Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58
Reviewed-on: http://gerrit.cloudera.org:8080/14859
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-20 01:51:54 +00:00
wzhou-code
6a23ec6985 IMPALA-6393: Add support for live_summary and live_progress in impalarc
This patch adds support for live_summary and live_progress in impalarc.

Testing:
1) Added unit-test cases in test_shell_commandline.py and
   test_shell_interactive.py for live_summary and live_progress.
2) Successfully ran all other tests in test_shell_interactive.py and
   test_shell_commandline.py

Change-Id: If4549b775a7966ad89d661d0349cc78754e13a86
Reviewed-on: http://gerrit.cloudera.org:8080/14927
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-23 01:48:13 +00:00
Andrew Sherman
ed5e7dae94 IMPALA-9240: add HTTP code handling to THttpClient.
Before this change Impala Shell is not checking HTTP return codes when
using the hs2-http protocol. The shell is sending a request message
(e.g. send_CloseOperation) but the HTTP call to send this message may
fail. This will result in a failure when reading the reply (e.g. in
recv_CloseOperation) as there is no reply data to read. This will
typically result in an 'EOFError'.

In code that overrides THttpClient.flush(), check the HTTP code that is
returned after the HTTP call is made. If the code is not 1XX
(informational response) or 2XX (successful) then throw an RPCException.

This change does not contain any attempt to recover from an HTTP failures
but it does allow the failure to be detected and a message to be
printed.

In future it may be possible to retry after certain HTTP errors.

Testing:
- Add a new test for impala-shell that tries to connect to an HTTP
  server that always returns a 503 error. Check that an appropriate
  error message is printed.

Change-Id: I3c105f4b8237b87695324d759ffff81821c08c43
Reviewed-on: http://gerrit.cloudera.org:8080/14924
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-12-20 00:14:00 +00:00
norbert.luksa
2114fc6155 IMPALA-4618: Fixing #Hosts and adding #Instances in exec summary
When mt_dop > 0, the summary is reporting the number of fragment
instances, instead of the number of hosts as the header would
imply.

This commit fixes the issue so the number of hosts will be shown
under the #Hosts column. The commit also adds an #Inst column
where the number of instances are shown (current behaviour).

Tests:
 * Changed profile tests with mt_dop > 0.
 * Updated benchmark tests and shell tests accordingly.

Change-Id: I3bdf9a06d9bd842b2397cd16c28294b6bec7af69
Reviewed-on: http://gerrit.cloudera.org:8080/14715
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-11-26 07:28:23 +00:00
Joe McDonnell
53774770bc IMPALA-9028: impala-shell should not try to reconnect if quitting
When the impala-shell is disconnected, it will try to reconnect
for any command that a user runs (as part of ImpalaShell's precmd()).
This doesn't make sense when the user is trying to quit the shell
(i.e. by typing 'quit' or 'exit' or hitting Ctrl-D).

This skips the attempt to reconnect when quitting the shell.

Testing:
 - Added test in test_shell_interactive.py
 - Verified by hand

Change-Id: I6a76bc515db609498fa8772e9f0b0c547b82c09e
Reviewed-on: http://gerrit.cloudera.org:8080/14391
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-16 03:23:14 +00:00
David Knupp
ba808f67dd IMPALA-1071: Distributable python package for impala-shell
The patch adds a set of scripts for converting the impala-shell
into a true distributable python package. The package can be
installed using familiar python commands, e.g.:

  $ python setup.py (install|develop)

or

  $ pip install -e /path/to/dist/dir

The entry point script, make_python_package.sh, will run as a
part of the standard sequence of steps that results from calling
buildall.sh, and will produce a gzipped tarball inside of
Impala/shell/dist as an artifact. Thereafter, make_python_package.sh
can be run manually any time.

The expectation is that an official maintainer would need to manually
upload official releases to the Python Package Index as appropriate.

Change-Id: Ib8c745bddddf6a16f0c039430152745a2f00e044
Reviewed-on: http://gerrit.cloudera.org:8080/14181
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-10 06:50:50 +00:00
Tim Armstrong
e070dbb02c IMPALA-8932: addendum - protocol var not defined
Change-Id: I75c41a02bc7f1314e48bb5a39b945119264ce478
Reviewed-on: http://gerrit.cloudera.org:8080/14225
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-13 19:51:29 +00:00
Tim Armstrong
954e810b0e IMPALA-8932: shell shouldn't retry kerberos over http
Change-Id: I5dde277a6a0ddbe5a919bcf376bbc19f0b48e95e
Reviewed-on: http://gerrit.cloudera.org:8080/14201
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-10 04:25:04 +00:00
Bharath Vissapragada
fafb2c9786 IMPALA-8864: Handle py ssl library incompatibility in http mode
Older python versions shipped ssl libraries that did not implement
SSLContext class. THttpClient relies on it. This patch,

- Fails the shell gracefully when such a python version is used.
- Skips the http test dimension when running the test suite on a
machine that ships such a python verison (centos 6).

Change-Id: I28846bde0b8bb8f787e6330cddf91645dba4160e
Reviewed-on: http://gerrit.cloudera.org:8080/14069
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2019-08-15 14:01:27 +00:00
Bharath Vissapragada
72c9370856 IMPALA-8717: impala-shell support for HS2 HTTP endpoint
Adds impala-shell support to connect to HiveServer2 HTTP endpoint.
Relies on toolchain change at https://gerrit.cloudera.org/#/c/13725/.

Use --protocol='hs2-http' to enable this behavior.

Example usages:
---------------
impala-shell --protocol='hs2-http'  (No auth)
impala-shell --protocol='hs2-http' --ldap -u..... (PLAIN auth)
impala-shell --protocol-'hs2-http' --ssl --ca_cert... (TLS)
impala-shell --protocol='hs2-http' --ldap --ssl --ca_cert... (LDAP +
TLS)

Limitations:
-----------
- Does not support Kerberos (-k) due to lack ot SPNEGO support.

Testing:
--------
- Parameterized existing shell tests to support this combination.
- Added shell test coverage for LDAP auth.

Change-Id: I8323950857dfe1c1dfd5377fde79f87bc2ce9534
Reviewed-on: http://gerrit.cloudera.org:8080/13746
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
2019-07-29 05:43:48 +00:00
Fredy Wijaya
63b2df7632 Add ext-py/bitarray-0.9.0 in .gitignore
Change-Id: I22abd82eb2c5a4a52fd56fd2266b636a0dbde071
Reviewed-on: http://gerrit.cloudera.org:8080/13745
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-27 20:51:15 +00:00
Jiawei Wang
3a4f8b3ae1 IMPALA-8652 Illegal delimiter error in shell has unknown error
Problem:
When assign --output_delimiter to invalid value, the validation of
the argument is done only after the query is running, ValueError is
raised in DelimitedOutputFormatter and caught in _exec_stmt in shell

Solution:
Add --output_delimiter option check before impala-shell initialization
Remove delimiter length check in DelimitedOutputFormatter

Testing:
tests/shell/test_shell_commandline.py passed

Example:
$ impala-shell.sh -B --output_delimiter '||' -q 'select 1,1,1'
Illegal delimiter ||, the delimiter must be a 1-character string.

Change-Id: I7ee2fccd305b104b3aff44c57659b6f14f2f4a05
Reviewed-on: http://gerrit.cloudera.org:8080/13690
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-21 06:01:57 +00:00
Tim Armstrong
f1f3ae9ec2 IMPALA-7290: part 2: Add HS2 support to Impala shell
HS2 is added as an option via --protocol=hs2. The user-visible
differences in behaviour are minimal. Beeswax is still the
default and can be explicitly enabled via --protocol=beeswax
but will be deprecated. The default is unchanged because
changing the default could break certain workflows, e.g.
those that explicitly specify the port with -i or deployments
that hit --fe_service_threads for HS2 and somehow rely on
impala-shell not contributing to that limit. For most
workflows the change is transparent and we should change
the default in a major version change.

This support requires Impala-specific extensions to
the HS2 interface, similar to the existing extensions
to Beeswax. Thus the HS2 shell is only
forwards-compatible with newer Impala versions.
I considered trying to gracefully degrade when the
new extensions weren't present, but it didn't seem to be
worth the ongoing testing effort.

Differences between HS2 and Beeswax are abstracted into
ImpalaClient subclasses.
Here are the changes required to make it work:
* Switch to TBinaryProtocolAccelerated to avoid perf
  regression. The HS2 protocol requires decoding
  more primitive values (because its not a string-per-row),
  which was slow with the pure python implementation of
  TBinaryProtocol.
* Added bitarray module to efficiently unpack null indicators
* Minimise invasiveness of changes by transposing and stringifying
  the columnar results into rows in impala_client.py. The transposition
  needs to happen before display anyway.
* Add PingImpalaHS2Service() to get back version string and webserver
  address.
* Add CloseImpalaOperation() extension to return DML row counts. This
  possibly addresses IMPALA-1789, although we need to confirm that
  this is a sufficient solution.
* Add is_closed member to query handles to avoid shell independently
  tracking whether the query handle was closed or not.
* Include query status in HS2 log to match beeswax.
* HS2 GetLog() command now includes query status error message for
  consistency with beeswax.
* "set"/"set all" uses the client requests options, not the session
  default. This captures the effective value of TIMEZONE, which
  was previously missing. This also requires test changes where
  the tests set non-default values, e.g. for ABORT_ON_ERROR.
* "set all" on the server side returns REMOVED query options - the
  shell needs to know these so it can correctly ignore them.
* Clean up self.orig_cmd/self.last_leading comment argument
  passing to avoid implicit parameter passing through multiple
  function calls.
* Clean up argument handling in shell tests to consistently pass
  around lists of arguments instead of strings that are subject
  to shell tokenisation rules.
* Consistently close connections in the shell to avoid leaking
  HS2 sessions. This is enforced by making ImpalaShell a context
  manager and also eliminating all sys.exit() calls that would
  bypass the explicit connection closing.

Testing:
* Shell tests can run with both protocols
* Add tests for formatting of all types and NULL values
* Added testing for floating point output formatting, which does
  change as a result of switching to server-side vs client-side
  formatting.
* Verified that newly-added tests were actually going through HS2
  by disabling hs2 on the minicluster and running tests.
* Add checks to test_verify_metrics.py to ensure that no sessions
  are left open at the end of tests.

Performance:
Baseline from beeswax shell for large extract is as follows:

  $ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null
  real    0m6.708s
  user    0m5.132s
  sys     0m0.204s

After this change it is somewhat slower, but we generally don't consider
bulk extract performance through the shell to be perf-critical:
  real    0m7.625s
  user    0m6.436s
  sys     0m0.256s

Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12
Reviewed-on: http://gerrit.cloudera.org:8080/12884
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-20 10:23:28 +00:00
Robbie Zhang
d5673bf241 IMPALA-8595: Support TLSv1.2 with Python < 2.7.9 in shell
IMPALA-5690 replaced thrift 0.9.0 with 0.9.3 in which THRIFT-3505
changed transport/TSSLSocket.py.
In thrift 0.9.3, if the python version is lower than 2.7.9, TSSLSocket
uses PROTOCOL_TLSv1 by default and the SSL version is passed to
TSSLSocket as a paramter when calling TSSLSocket.__init__.
Although TLSv1.2 is supported by Python from 2.7.9, Red Hat/CentOS
support TLSv1.2 from 2.7.5 with upgraded python-libs. We need to get
impala-shell support TLSv1.2 with Python 2.7.5 on Red Hat/CentOS.

TESTING:
impala-py.test tests/custom_cluster/test_client_ssl.py

Change-Id: I3fb6510f4b556bd8c6b1e86380379aba8be4b805
Reviewed-on: http://gerrit.cloudera.org:8080/13457
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-02 02:40:10 +00:00
Ethan Xue
487547ec44 IMPALA-6042: Allow Impala shell to use a global impalarc config
Currently, impalarc files can be specified on a per-user basis
(stored in ~/.impalarc), and they aren't created by default. The
Impala shell should pick up /etc/impalarc as well, in addition
to the user-specific configurations.

The intent here is to allow a "global" configuration of the shell
by a system administrator. The default path of the global config
file can be changed by setting the $IMPALA_SHELL_GLOBAL_CONFIG_FILE
environment variable.

Note that the options set in the user config file take precedence
over those in the global config file.

Change-Id: I3a3179b6d9c9e3b2b01d6d3c5847cadb68782816
Reviewed-on: http://gerrit.cloudera.org:8080/13313
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-30 03:59:54 +00:00
Thomas Tauber-Marshall
10b9195035 IMPALA-8407: Warn when Impala shell fails to connect due to tlsv1.2
When impala-shell is used to connect to an impala cluster with
--ssl_minimum_version=tlsv1.2, if the Python version being used is
< 2.7.9 the connection will fail due to a limitation of TSSLSocket.
See IMPALA-6990 for more details.

Currently, when this occurs, the error that gets printed is "EOF
occurred in violation of protocol", which is not very helpful. This
patch detect this situation and prints a more informative warning.

Testing:
- Updated test_tls_v12 so that instead of being skipped on affected
  platforms, it runs and checks for the presence of the warning.

Change-Id: I3feddaccb9be3a15220ce9e59aa7ed41d41b8ab6
Reviewed-on: http://gerrit.cloudera.org:8080/13003
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-18 23:19:04 +00:00
Yongzhi Chen
f3f7774d1a IMPALA-6216: Make PYTHON_EGG_CACHE configurable
User can set environment variable PYTHON_EGG_CACHE before call
impala-shell.

Testing:
Run impala-shell with or w/o PYTHON_EGG_CACHE configured

Change-Id: I695b2b31d9045eef1a53268f6516858935aed508
Reviewed-on: http://gerrit.cloudera.org:8080/12911
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
2019-04-05 00:43:57 +00:00
Fredy Wijaya
7adb411bd9 IMPALA-8330: Impala shell config file should support flag names
This patch updates the file format in Impala shell config file to accept
both short and long flag names in addition to optparse's dest names
(variable names to store flag values) for better user experience because
dest names are internal to Impala shell.

Format:
[impala]
flag_name=flag_value

Example:
[impala]
; This is long flag.
query=select 1
; This is short flag.
Q=DEFAULT_FILE_FORMAT=parquet
; Flags can be repeated with ,
var=msg1=hello,var=msg2=world
; The old format using internal variable name is still supported for
; backward compatibility.
keyval=msg3=foo,keyval=msg4=bar

Testing:
- Ran all E2E shell tests on Python 2.6 and 2.7.

Change-Id: Ic43603c1b538af08fddcab1b2c1f6ad1af1a6cb9
Reviewed-on: http://gerrit.cloudera.org:8080/12823
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-29 09:38:24 +00:00
Fredy Wijaya
ed9e1d499b IMPALA-8340: Rewrite fixes in IMPALA-8317 and IMPALA-8337
Fixes in IMPALA-8317 and IMPALA-8337 introduced third-party dependencies
in Impala shell which is problematic in multi-Python environment. This
patch rewrites the fixes using an alternative solution when dealing with
duplicate options without any third-party dependencies. For example:

[impala]
keyval=msg1=hello,keyval=msg2=world

Testing:
- Ran all shell tests on Python 2.6 and 2.7.
- Ran make_shell_tarball.sh and ran Impala shell from the tarball
  without any issue.

Change-Id: Ifc0bf391ba26cf5a34f622a4157d7287453cc539
Reviewed-on: http://gerrit.cloudera.org:8080/12844
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-26 04:03:46 +00:00
Fredy Wijaya
eb95c912cb IMPALA-8337: Fix OrderedDict Python compatibility issue in Impala shell
IMPALA-8317 added a patch that uses OrderedDict, which is not available
on Python 2.6 or lower. The patch in IMPALA-8317 also requires a newer
version of configparser to handle duplicate keys, which is not available
in Python 2.6. This patch fixes the issue by using the OrderedDict from
ordereddict third-party library when running on Python 2.6 or lower and
using configparser from the backport.

Testing:
- Ran all E2E shell tests on Python 2.6 and 2.7

Change-Id: Iab1a33542319e9bb75d806bfd38b158995b54aa9
Reviewed-on: http://gerrit.cloudera.org:8080/12830
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-23 04:14:29 +00:00
Andrew Sherman
00df388c5f IMPALA-8332: Remove Impala Shell warnings part 1
In Thrift 0.9.3 the TSSLSocket initializer TSSLSocket.__init__ prints
warnings if positional parameters are used. Change our usage of this
initializer to use named parameters.

Follow up work on "IMPALA-8333 Remove Impala Shell warnings part 2" will
remove one further warning message.

TESTING

Ran all end-to-end tests.
Added tests for the deprecation warnings to test_client_ssl.py.

Change-Id: I31f9a0bb12ca6a1da9129eacd29ac105b883e01b
Reviewed-on: http://gerrit.cloudera.org:8080/12837
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-22 23:11:18 +00:00
Fredy Wijaya
6853184234 IMPALA-8317: Add support for list type flags in Impala shell config file
This patch adds support for list type flags in Impala shell config
file, i.e. those that use action="append", such as --var and
--query_option. To make it less error-prone, this patch also updates
the logic for bool flags in the config file to also look at the
correct type from the argument parser instead of relying on whether or
not the default values are set in impala_shell_config_defaults.py.

Testing:
- Added a new test for list type flags
- Ran all shell E2E tests

Change-Id: I824ca15b4e1064a391b13deef9cecd34c928ef73
Reviewed-on: http://gerrit.cloudera.org:8080/12781
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-21 10:29:43 +00:00
Andrew Sherman
9ad9a1624a IMPALA-8325: Leading Unicode comments cause Impala Shell failure.
This change fixes a regression introduced by "IMPALA-2195 Improper
handling of comments in queries."

The Impala Shell parses input text into several strings using the
sqlparse library. One of the returned strings is the sql command, this
is used to determine the correct do_<command> method to call. Another of
the returned strings is the leading comment, which is a comment that
appears before legal sql text.

Python2 has strings with multiple encodings. The strings returned from
the sqlparse library have the Unicode encoding. Impala Shell converts
the sql command string to utf-8 encoding before using it.

If the Impala Shell needs to send the sql command to an Impala
Coordinator then it (re)constructs the query out of the strings
returned by the sqlparse library. This query is sent to the Coordinator
via Beeswax protocol. The query is converted to an ascii string before
being sent. The conversion can fail if the leading comment string
contains Unicode characters, which can't be directly converted to ascii.
So the trigger for the bug is that the leading comment contains Unicode.

The fix is that the leading comment string should be converted to utf-8
in the same way as the sql command.

TESTING:

Ran all end -to-end tests.
Added two test cases to tests/shell/test_shell_interactive.py

Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Reviewed-on: http://gerrit.cloudera.org:8080/12812
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-20 22:18:16 +00:00
Fredy Wijaya
561158b306 IMPALA-3323: Fix unrecognizable shell option when --config_file is specified
Impala shell defines a dictionary of default values for some shell
options. Before this patch, the logic for --config_file checks if
a shell option exists by using the default value dictionary, which
does not contain the exhaustive list of shell options. This causes
a valid option in the Impala shell config file to be treated as
unrecognizable shell option due to the option not having a default
value. The patch fixes the issue by changing the logic that checks
for the existence of an option using the option list from optparse.
The patch also fixes the missing dest parameter for ldap_password_cmd
option.

Testing:
- Updated test_shell_commandline::test_config_file
- Ran all shell tests

Change-Id: Iff371d038fa77ba659e9b7c7a4ed5b374237f2ea
Reviewed-on: http://gerrit.cloudera.org:8080/12245
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-23 00:15:28 +00:00
Philip Zeyliger
a8d3b765d8 IMPALA-7666: Adding an opaque client identifier to query options.
We sometimes struggle to identify the client (e.g., a given version of a
JDBC driver, Tableau, Hue, etc.) for a given query. This commit adds a
User-Agent header style, called "Client Identifier", which clients can
set as a Query Option. Nothing is done with this header, but it's
written into logs and query profiles.

This commit includes changes to impala-shell to include the version of
impala shell with an associated test.

A future commit will serialize the name of the py.test being run into
this field, which is handy for figuring out where a query came from.

Change-Id: I0a7708492f05d33b2bc99fc3a03b461bbb6f3ea4
Reviewed-on: http://gerrit.cloudera.org:8080/12130
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-10 03:11:35 +00:00
Yongjun Zhang
7cc9092212 IMPALA-5474: Adding a trivial subquery turns error into warning
After adding a subquery to a query that fails with ERROR, it fails with WARNING.
The fix here makes it return ERROR.

Testing:
Added unit tests;
Done real cluster testing with reported cases.

Change-Id: Ibedb11dd3d50bcdb21d508f7d21691925491946e
Reviewed-on: http://gerrit.cloudera.org:8080/12022
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2019-01-04 21:51:48 +00:00