Commit Graph

21 Commits

Author SHA1 Message Date
Csaba Ringhofer
14035065fa IMPALA-12145: Fix profiles with non-ascii character in impala-shell (python2)
As __future__.unicode_literals is imported in impala-shell,
concatenating a str with a literal leads to decoding the
string with the 'ascii' codec, which fails if there are non-ascii
characters. Converting the literal to str solves the issue.
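A minimal sketch of the failure mode described above, written for Python 3 so that it runs anywhere. Python 2 performs the ASCII decode implicitly when a byte str is concatenated with a unicode literal; here the same step is spelled out explicitly. The variable names are hypothetical and are not taken from impala-shell.

```python
# Bytes as Python 2 would hold them in a plain `str` (UTF-8 for "é").
profile_bytes = "é".encode("utf-8")

# With unicode_literals, Python 2 evaluates `profile_bytes + "\n"` by first
# doing the equivalent of profile_bytes.decode("ascii"), which fails here.
try:
    profile_bytes.decode("ascii")
    implicit_decode_failed = False
except UnicodeDecodeError:
    implicit_decode_failed = True

# Keeping both operands as byte strings (the effect of converting the
# literal to str on Python 2) avoids the decode entirely.
joined = profile_bytes + b"\n"
```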

Testing:
- added regression test + ran related EE tests

Change-Id: I99b72dd262fc7c382e8baee1dce7592880c84de2
Reviewed-on: http://gerrit.cloudera.org:8080/19893
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-25 00:33:34 +00:00
yx91490
e00a2cbd81 IMPALA-11380: Fix trailing whitespace for VerticalOutputFormatter
Similar to IMPALA-11332, the current VerticalOutputFormatter is
stripping trailing whitespace from the last line of output. This
rstrip() was intended to remove an extra newline,
but it is matching other white space. This is a
problem for a SQL query like:
select 'Trailing whitespace          ';

This changes the rstrip() to rstrip('\n') to
avoid removing the other white space.
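The difference between the two calls can be illustrated with a short sketch. The variable names are hypothetical; only rstrip() and rstrip('\n') come from the commit message.

```python
# Last line of output for: select 'Trailing whitespace          ';
output = "Trailing whitespace" + " " * 10 + "\n"

# rstrip() with no argument removes the newline AND the trailing spaces.
stripped_all = output.rstrip()

# rstrip("\n") removes only the trailing newline and keeps the spaces.
stripped_newline = output.rstrip("\n")
```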

Testing:
 - Current shell tests pass
 - Added a shell test that verifies trailing whitespace
   is not being stripped.

Change-Id: Id66162d28498e7bef2933651616cf3df2fb0f354
Reviewed-on: http://gerrit.cloudera.org:8080/18722
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-07-27 04:23:41 +00:00
Joe McDonnell
7eb200abf1 IMPALA-11337: Flush row output before writing "Fetched X row(s)"
When redirecting stdout and stderr to a file, the
existing code can sometimes output the "Fetched X row(s)"
line before finishing the row output. e.g.
impala-shell -B -q "select 1" >> outfile.txt 2>> outfile.txt

The rows output goes to stdout while the control messages
like "Fetched X row(s)" go to stderr. Since stdout can buffer
output, that can delay the output. This adds a flush for
stdout before writing the "Fetched X row(s)" message.
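A rough sketch of the ordering fix, assuming a hypothetical helper `print_rows_then_status`; the real shell code is structured differently.

```python
import sys


def print_rows_then_status(rows, out=sys.stdout, err=sys.stderr):
    """Write rows to `out`, flush it, then write the status line to `err`."""
    for row in rows:
        print(row, file=out)
    # Without this flush, buffered row output can land after the status line
    # when both streams are redirected to the same file.
    out.flush()
    print("Fetched %d row(s)" % len(rows), file=err)
```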

Testing:
 - Added a shell test that redirects stdout and stderr to
   a file and verifies the contents. This consistently
   fails without the flush.
 - Other shell tests pass

Change-Id: I83f89c110fd90d2d54331c7121e407d9de99146c
Reviewed-on: http://gerrit.cloudera.org:8080/18625
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-15 05:28:05 +00:00
yx91490
c7784bde55 IMPALA-1682: Support printing the output of a query (rows) vertically.
In vertical mode, impala-shell prints each row in the following format:
first a line containing the line number, then the row's columns one per
line, each column line starting with its name and a colon.

To enable it, use the shell option '-E' or '--vertical', or 'set
VERTICAL=true' in interactive mode. To disable it in interactive mode, use
'set VERTICAL=false'. NOTICE: it will be disabled if the '-B' option or
'set WRITE_DELIMITED=true' is specified.
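The layout described above could be sketched roughly as follows. This is not the VerticalOutputFormatter implementation: the function name is hypothetical, the header line format is an assumption, and the "line number" is assumed to be the 1-based index of the row.

```python
def format_vertical(column_names, rows):
    """Render each row as a numbered header line plus one line per column."""
    lines = []
    for row_number, row in enumerate(rows, start=1):
        # Header layout is an assumption made for this sketch.
        lines.append("*** %d.row ***" % row_number)
        for name, value in zip(column_names, row):
            # Each column line starts with the column name and a colon.
            lines.append("%s: %s" % (name, value))
    return "\n".join(lines)
```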

Tests:
Added test methods to test_shell_interactive.py and test_shell_commandline.py.

Change-Id: I5cee48d5a239d6b7c0f51331275524a25130fadf
Reviewed-on: http://gerrit.cloudera.org:8080/18549
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-13 15:41:07 +00:00
Joe McDonnell
c41e6941ca IMPALA-11332: Fix trailing whitespace for CSV output
The current CSV output is stripping trailing
whitespace from the last line of CSV output. This
rstrip() was intended to remove an extra newline,
but it is matching other white space. This is a
problem for a SQL query like:
select 'Trailing whitespace          ';

This changes the rstrip() to rstrip('\n') to
avoid removing the other white space.

Testing:
 - Current shell tests pass
 - Added a shell test that verifies trailing whitespace
   is not being stripped.

Change-Id: I69d032ca2f581587b0938d0878fdf402fee0d57e
Reviewed-on: http://gerrit.cloudera.org:8080/18580
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-02 09:36:50 +00:00
Joe McDonnell
ed0d9341d3 IMPALA-11325: Fix UnicodeDecodeError for shell file output
When using the --output_file commandline option for
impala-shell, the shell fails with UnicodeDecodeError
if the output contains Unicode characters.

For example, if running this command:
impala-shell -B -q "select '引'" --output_file=output.txt
This fails with:
UnicodeDecodeError : 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)

This happens due to an encode('utf-8') call happening
in OutputStream::write() on a string that is already UTF-8 encoded.
This changes the code to skip the encode('utf-8') call for Python 2.
Python 3 is using a string and still needs the encode call.

This is mostly a pragmatic fix to make the code a little bit
more functional, and there is more work to be done to have
clear contracts for the format() methods and clear points
of conversion to/from bytes.
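The version-dependent handling described above might look roughly like this. The helper name `to_file_payload` is hypothetical and is not the actual OutputStream code.

```python
import sys


def to_file_payload(formatted):
    """Return the value to pass to write() on a file opened in binary mode."""
    if sys.version_info[0] == 2:
        # Python 2: the formatter output is already a UTF-8 byte string,
        # so encoding it again would trigger an implicit ASCII decode.
        return formatted
    # Python 3: the formatter output is text and still needs encoding.
    return formatted.encode("utf-8")
```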

Testing:
 - Ran shell tests with Python 2 and Python 3 on Ubuntu 18
 - Added a shell test that outputs a Unicode character
   to an output file. Without the fix, this test fails.

Change-Id: Ic40be3d530c2694465f7bd2edb0e0586ff0e1fba
Reviewed-on: http://gerrit.cloudera.org:8080/18576
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-02 01:53:09 +00:00
Csaba Ringhofer
94f67a3432 IMPALA-7825: Upgrade Thrift version to 0.11.0
Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in the toolchain.

Most of the changes are related to replacing boost::shared_ptr with
std::shared_ptr in C++ code (this is a continuation of a patch by Sahil).

The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.

The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.

Notable side effects:
- old logic to compile thrift for impala-shell with 0.11.0 was removed
- impala_shell's utf8 handling had to be updated as the new 0.11.0
  compilation happens with no_utf8strings. This also made things a
  bit faster, e.g. the following is ~0.22s instead of ~0.25s:
  shell/impala_shell.py \
    -B -q "select * from functional_parquet.alltypes;" > /dev/null
- THRIFT-3921 changed the stream operators to print an enum's name
  instead of its number, leading to slightly different messages
  in some cases.
- "templates" was added to the thrift generator's parameters to avoid
  a compilation issue (related to IMPALA-10600). I didn't notice any
  change in compilation time. This option generates .tcc files with
  templatized readers/writers for Thrift types. Currently we don't
  use these, but they could potentially speed up (de)serialization.

Testing:
- ran Impyla's test suite with Python 2 and 3
- ran core tests

Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Reviewed-on: http://gerrit.cloudera.org:8080/17170
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-27 13:36:54 +00:00
Andrew Sherman
3b763b5c32 IMPALA-10447: Add a newline when exporting shell output to a file.
Impala shell outputs a batch of rows using OutputStream. Inside
OutputStream, output to a file is handled slightly differently from
output that is written to stdout. When writing to stdout we use print()
(which appends a newline) while when writing to a file we use write()
(which adds nothing). This difference was introduced in IMPALA-3343 so
this bug may be a regression introduced then. To ensure that output is
the same in either case we need to add a newline after writing each
batch of rows to a file.
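A small sketch of the stdout/file asymmetry and the added newline, assuming a hypothetical `write_batch` helper rather than the real OutputStream class.

```python
def write_batch(formatted_rows, out_file=None):
    """Emit one batch of rows followed by exactly one newline."""
    if out_file is None:
        # print() appends the newline itself.
        print(formatted_rows)
    else:
        out_file.write(formatted_rows)
        # write() adds nothing, so the newline is added explicitly.
        out_file.write("\n")
```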

TESTING:
    Added a new test for this case.

Change-Id: I078a06c54e0834bc1f898626afbfff4ded579fa9
Reviewed-on: http://gerrit.cloudera.org:8080/16966
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-26 08:32:29 +00:00
Adam Tamas
fe6e625747 IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix
While ds_hll_sketch() generates a string value as output, the data
is not ASCII-encoded text but a bitsketch. Because of this, when
the shell gets this data, it disconnects while trying to decode it.

The issue can be reproduced with a simple method like using unhex
with a wrong input.
Example: SELECT unhex("aa");

This patch contains a solution where we replace any characters that
cannot be decoded as UTF-8 if we run into a UnicodeDecodeError after
fetching the data.
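The replace-on-failure idea can be sketched as below. The function name is hypothetical, and the sketch uses the standard "replace" error handler, which may differ from what the patch does in detail.

```python
def decode_lossy(raw_bytes):
    """Decode as UTF-8, substituting U+FFFD for undecodable bytes."""
    try:
        return raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to a lossy decode instead of failing the whole fetch.
        return raw_bytes.decode("utf-8", errors="replace")
```

For the byte produced by unhex("aa"), this returns a single replacement character instead of raising.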

This solution works with the Thrift 0.9.3 autogenerated gen-py
but still fails with Thrift 0.11.0.

For Thrift 0.11.0 the error is caught and an error message is sent
(this does not work with the beeswax protocol, because it generates a
different error (TypeError) which can come from other causes too).

Testing:
-manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax'

Change-Id: I0c5f1290356e21aed8ca7f896f953541942aed05
Reviewed-on: http://gerrit.cloudera.org:8080/16418
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
2020-09-05 09:42:46 +00:00
Csaba Ringhofer
b7965d8240 Revert "IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix"
This reverts commit 75146c9138.

Change-Id: I57f790389a8c847877999d2b9b8185939b416c07
Reviewed-on: http://gerrit.cloudera.org:8080/16417
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
2020-09-04 12:28:56 +00:00
Adam Tamas
75146c9138 IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix
While ds_hll_sketch() generates a string value as output, the data
is not ASCII-encoded text but a bitsketch. Because of this, when
the shell gets this data, it disconnects while trying to decode it.

The issue can be reproduced with a simple method like using unhex
with a wrong input.
Example: SELECT unhex("aa");

This patch contains a solution where we replace any characters that
cannot be decoded as UTF-8 if we run into a UnicodeDecodeError after
fetching the data.

This solution works with the Thrift 0.9.3 autogenerated gen-py
but still fails with Thrift 0.11.0.

For Thrift 0.11.0 the error is caught and an error message is sent
(this does not work with the beeswax protocol, because it generates a
different error (TypeError) which can come from other causes too).

Testing:
-manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax'

Change-Id: Ic5cfb907871ca83e5f04a39ca9d7a8e138d711a8
Reviewed-on: http://gerrit.cloudera.org:8080/16305
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
2020-09-04 12:18:28 +00:00
David Knupp
bc9d7e063d IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
This is the main patch for making the impala-shell cross-compatible with
python 2 and python 3. The goal is to wind up with a version of the shell that will
pass python e2e tests irrespective of the version of python used to launch the
shell, under the assumption that the test framework itself will continue to run
with python 2.7.x for the time being.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python
  environment independently from bin/set-pythonpath.sh. The default version
  of thrift is thrift-0.11.0 (See IMPALA-9489).

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell with no authentication using Python 3.7.5
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell with LDAP-based authentication using Python 2.7.12
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result, this inconsistency was reflected in the way we handled
  strings in the impala-shell code, but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in many, if not most, shell tests failing
  under python 3.

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could, and try to
  implement what has colloquially been called a "unicode sandwich" -- namely,
  "bytes on the outside, unicode on the inside, encode/decode at the edges."

  The primary spot in the shell where we call decode() now is when sanitising
  input...

  args = self.sanitise_input(args.decode('utf-8'))

  ...and also whenever a library like re requires it. Similarly, str.encode()
  is primarily used where a library like readline or csv requires it.

- PYTHONIOENCODING needs to be set to utf-8 to override the default setting for
  python 2. Without this, piping or redirecting stdout results in unicode errors.

- from __future__ import unicode_literals was added throughout
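The "unicode sandwich" mentioned in the list above can be sketched in a few lines. All function names here are hypothetical stand-ins, not impala-shell code; the point is only where the decode and encode calls sit.

```python
import io


def read_query(binary_in):
    """Edge: decode bytes to text as soon as they enter the program."""
    return binary_in.read().decode("utf-8")


def process(query_text):
    """Inside: operate on text only (a stand-in for real shell logic)."""
    return query_text.strip().upper()


def write_result(binary_out, text):
    """Edge: encode text to bytes only when it leaves the program."""
    binary_out.write(text.encode("utf-8"))


# Usage with in-memory binary streams standing in for real input and output.
source = io.BytesIO("select '引'\n".encode("utf-8"))
sink = io.BytesIO()
write_result(sink, process(read_query(source)))
```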

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and manually ran the tests against that.

  No effort has been made at this point to come up with a way to integrate
  testing of the shell in a python3 environment into our automated test
  processes.

Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615
Reviewed-on: http://gerrit.cloudera.org:8080/15524
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-18 05:13:50 +00:00
David Knupp
ed70492580 IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports
A few built-ins were changed in python 3 -- e.g., xrange became range,
ConfigParser became configparser, etc. We can redefine some of those
things in a single place, and import them from there as needed. Other
items may also be added as we go along.
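A sketch of what such a single-place compatibility module might contain. The name `range_compat` is hypothetical, and the actual module in the patch may expose different names.

```python
# Hypothetical compat module: resolve each renamed name once here,
# then import it from this module wherever it is needed.
try:
    range_compat = xrange  # Python 2: lazy range
except NameError:
    range_compat = range   # Python 3: range is already lazy

try:
    import configparser                  # Python 3 module name
except ImportError:
    import ConfigParser as configparser  # Python 2 module name
```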

Change-Id: Ibd3d86df524666a98cbfa463756adac48bd1f8a3
Reviewed-on: http://gerrit.cloudera.org:8080/15514
Reviewed-by: David Knupp <dknupp@cloudera.com>
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-21 19:52:07 +00:00
David Knupp
ed15c2c58f IMPALA-3343: Part 1 -- Fix simple python 2->3 syntax errors
In an effort to keep the work of reviewing the changes more manageable
with regard to making the impala-shell python3 compatible, I'm trying
to break the patches up into smaller chunks.

The first patch is the easiest one -- simply addressing the handful of
syntax issues that aren't python 3 compatible, namely changing the
print statements to function calls, changing the way we catch exceptions,
and adding a few simple branches to work around the removal of such
things as dict.iteritems().

We needed the print function imported from __future__ because it allows
us to pass in a file descriptor, e.g., sys.stderr.
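The three kinds of change listed above can be illustrated together. The helper names are hypothetical. On Python 2 the module would also need `from __future__ import print_function` as its first statement, which is left out here so the sketch can be pasted anywhere.

```python
import sys


def iter_items(mapping):
    """Branch around the removal of dict.iteritems() in Python 3."""
    if hasattr(mapping, "iteritems"):
        return mapping.iteritems()
    return iter(mapping.items())


def parse_port(text):
    """Return the port as an int, or None after reporting the error."""
    try:
        return int(text)
    except ValueError as e:  # `except ValueError, e` is invalid in Python 3
        # print as a function accepts a target stream via file=
        print("Invalid port: %s" % e, file=sys.stderr)
        return None
```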

Notably, there's nothing in this patch related to string/bytes/unicode
changes from python 2 to 3.

Change-Id: I9a515da01ef03d5936cb1a4d9e4bc6d105386b1d
Reviewed-on: http://gerrit.cloudera.org:8080/15487
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-20 03:10:07 +00:00
Jiawei Wang
3a4f8b3ae1 IMPALA-8652 Illegal delimiter error in shell has unknown error
Problem:
When --output_delimiter is assigned an invalid value, the argument is
validated only after the query is running: a ValueError is raised in
DelimitedOutputFormatter and caught in _exec_stmt in the shell.

Solution:
Add --output_delimiter option check before impala-shell initialization
Remove delimiter length check in DelimitedOutputFormatter
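An up-front check of this kind might look like the sketch below. The function name is hypothetical, and raising ValueError is an assumption made for the sketch; only the error text comes from the example output in this commit message.

```python
def check_output_delimiter(delimiter):
    """Validate the delimiter before any query is run."""
    if len(delimiter) != 1:
        raise ValueError(
            "Illegal delimiter %s, the delimiter must be a 1-character string."
            % delimiter)
    return delimiter
```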

Testing:
tests/shell/test_shell_commandline.py passed

Example:
$ impala-shell.sh -B --output_delimiter '||' -q 'select 1,1,1'
Illegal delimiter ||, the delimiter must be a 1-character string.

Change-Id: I7ee2fccd305b104b3aff44c57659b6f14f2f4a05
Reviewed-on: http://gerrit.cloudera.org:8080/13690
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-21 06:01:57 +00:00
Tim Armstrong
318051cc21 IMPALA-2717: fix output of formatted unicode to non-TTY
The bug is that PrettyOutputFormatter.format() returned a unicode
object, and Python cannot automatically write unicode objects to
output streams where there is no default encoding.

The fix is to encode the output as UTF-8 in a regular string, which
can be output to any output device. This makes the output type
consistent with DelimitedOutputFormatter.format().

Based on code by Marcell Szabo.

Testing:
Added a basic test.

Played around in an interactive shell to make sure that unicode
characters still work in interactive mode.

Change-Id: I9de641ecf767a2feef3b9f48b344ef2d55e17a7f
Reviewed-on: http://gerrit.cloudera.org:8080/9928
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-12 20:34:47 +00:00
Jim Apple
374f1121da IMPALA-3224: De-Cloudera non-docs JIRA URLs
John Russell is planning to fix the URLs in docs in a separate commit.

Fixed using:

    (git ls-files | xargs replace \
    'https://issues.cloudera.org/browse/IMPALA' 'IMPALA' --) && \
    git checkout HEAD docs

Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182
Reviewed-on: http://gerrit.cloudera.org:8080/6487
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2017-05-07 04:44:57 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Martin Grund
ed18dd4a8b IMPALA-80: Dynamic progress reporting for the shell
This patch adds a way to allow for dynamic progress reporting in the
shell. There are two new command line flags for the shell:

   --live_progress - will print the completed vs total # of scan ranges
   --live_summary - prints an updated exec summary

In addition to the command line flags, these options can be set from
within the shell using:

   set LIVE_SUMMARY=True
   set LIVE_PROGRESS=True

The new options will be listed under shell options. Both reports will be
updated at most once every second; for longer-running queries the interval
will be adjusted to the time between two RPC calls to get the query status. To
provide this information in the ExecSummary, the Thrift structure for
the ExecSummary was extended to contain a progress indicator. The output
is printed to stderr and is only available in interactive mode.
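The "at most every second" behaviour could be sketched as a small throttle. The class name, its parameters and the injectable clock are all hypothetical; the shell's actual reporting loop is tied to its RPC polling and is not shown here.

```python
import time


class ThrottledReporter(object):
    """Forward a progress message at most once per `min_interval_s`."""

    def __init__(self, emit, min_interval_s=1.0, clock=time.time):
        self._emit = emit
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last = None  # time of the last emitted report

    def report(self, message):
        """Emit `message` if enough time has passed; return whether it did."""
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval_s:
            return False
        self._last = now
        self._emit(message)
        return True
```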

An example video is available here:

https://asciinema.org/a/5wi7ypckx4ol4ha1hlg3e3q1k

Change-Id: I70b2ab5fa74dc2ba5bc3b338ef13ddc6ccf367d2
Reviewed-on: http://gerrit.cloudera.org:8080/508
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
2015-07-17 17:59:29 +00:00
ishaan
2b4071bf5d Remove csv.field_size_limit from the shell.
2014-01-08 10:50:55 -08:00
ishaan
26684b07d7 Introduce output formatting for the shell.
2014-01-08 10:50:51 -08:00