impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Csaba Ringhofer	14035065fa	IMPALA-12145: Fix profiles with non-ascii character in impala-shell (python2) As __future__.unicode_literals is imported in impala-shell concatenating an str with a literal leads to decoding the string with 'ascii' codec which fails if there are non-ascii characters. Converting the literal to str solves the issue. Testing: - added regression test + ran related EE tests Change-Id: I99b72dd262fc7c382e8baee1dce7592880c84de2 Reviewed-on: http://gerrit.cloudera.org:8080/19893 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-05-25 00:33:34 +00:00
yx91490	e00a2cbd81	IMPALA-11380: Fix trailing whitespace for VerticalOutputFormatter Similar to IMPALA-11332, The current VerticalOutputFormatter is stripping trailing whitespaces from the last line of output. This rstrip() was intended to remove an extra newline, but it is matching other white space. This is a problem for a SQL query like: select 'Trailing whitespace '; This changes the rstrip() to rstrip('\n') to avoid removing the other white space. Testing: - Current shell tests pass - Added a shell test that verifies trailing whitespace is not being stripped. Change-Id: Id66162d28498e7bef2933651616cf3df2fb0f354 Reviewed-on: http://gerrit.cloudera.org:8080/18722 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-07-27 04:23:41 +00:00
Joe McDonnell	7eb200abf1	IMPALA-11337: Flush row output before writing "Fetched X row(s)" When redirecting stdout and stderr to a file, the existing code can sometimes output the "Fetched X row(s)" line before finishing the row output. e.g. impala-shell -B -q "select 1" >> outfile.txt 2>> outfile.txt The rows output goes to stdout while the control messages like "Fetched X row(s)" go to stderr. Since stdout can buffer output, that can delay the output. This adds a flush for stdout before writing the "Fetched X row(s)" message. Testing: - Added a shell test that redirects stdout and stderr to a file and verifies the contents. This consistently fails without the flush. - Other shell tests pass Change-Id: I83f89c110fd90d2d54331c7121e407d9de99146c Reviewed-on: http://gerrit.cloudera.org:8080/18625 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-15 05:28:05 +00:00
yx91490	c7784bde55	IMPALA-1682: Support printing the output of a query (rows) vertically. In vertical mode, impala-shell prints each row in the following format: first a line containing the line number, then the row's columns one per line, each starting with the column's name and a colon. To enable it, use the shell option '-E' or '--vertical', or 'set VERTICAL=true' in interactive mode. To disable it in interactive mode, use 'set VERTICAL=false'. NOTICE: it will be disabled if the '-B' option or 'set WRITE_DELIMITED=true' is specified. Tests: add methods in test_shell_interactive.py and test_shell_commandline.py. Change-Id: I5cee48d5a239d6b7c0f51331275524a25130fadf Reviewed-on: http://gerrit.cloudera.org:8080/18549 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-13 15:41:07 +00:00
Joe McDonnell	c41e6941ca	IMPALA-11332: Fix trailing whitespace for CSV output The current CSV output is stripping trailing whitespaces from the last line of CSV output. This rstrip() was intended to remove an extra newline, but it is matching other white space. This is a problem for a SQL query like: select 'Trailing whitespace '; This changes the rstrip() to rstrip('\n') to avoid removing the other white space. Testing: - Current shell tests pass - Added a shell test that verifies trailing whitespace is not being stripped. Change-Id: I69d032ca2f581587b0938d0878fdf402fee0d57e Reviewed-on: http://gerrit.cloudera.org:8080/18580 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-02 09:36:50 +00:00
Joe McDonnell	ed0d9341d3	IMPALA-11325: Fix UnicodeDecodeError for shell file output When using the --output_file commandline option for impala-shell, the shell fails with UnicodeDecodeError if the output contains Unicode characters. For example, if running this command: impala-shell -B -q "select '引'" --output_file=output.txt This fails with: UnicodeDecodeError : 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128) This happens due to an encode('utf-8') call happening in OutputStream::write() on a string that is already UTF-8 encoded. This changes the code to skip the encode('utf-8') call for Python 2. Python 3 is using a string and still needs the encode call. This is mostly a pragmatic fix to make the code a little bit more functional, and there is more work to be done to have clear contracts for the format() methods and clear points of conversion to/from bytes. Testing: - Ran shell tests with Python 2 and Python 3 on Ubuntu 18 - Added a shell test that outputs a Unicode character to an output file. Without the fix, this test fails. Change-Id: Ic40be3d530c2694465f7bd2edb0e0586ff0e1fba Reviewed-on: http://gerrit.cloudera.org:8080/18576 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-02 01:53:09 +00:00
Csaba Ringhofer	94f67a3432	IMPALA-7825: Upgrade Thrift version to 0.11.0 Before this patch Impala mainly used Thrift 0.9.3, but it was possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0 Thrift lib was already included in the toolchain. Most of the changes are related to replacing boost:: with std:: shared_ptr-s in cpp code (this is a continuation of patch by Sahil). The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as Impala's test framework relies on Impyla. A thrift_sasl release is also needed, because it currently pins Thrift version to 0.9.3 for Python 2. The current patch uses alpha releases from Impyla and thrift_sasl that use thrift 0.11.0. Notable side effects: - old logic to compile thrift for impala-shell with 0.11.0 was removed - impala_shell's utf8 handling had to be updated as the new 0.11.0 compilation happens with no_utf8strings. This also made things a bit faster, e.g. the following is ~0.22s instead of ~0.25 shell/impala_shell.py \ -B -q "select * from functional_parquet.alltypes;" > /dev/null - THRIFT-3921 changed the stream operators to print an enum's name instead of its number, leading to slightly different messages in some cases. - "templates" was added to the thrift generator's parameters to avoid a compilation issue (related to IMPALA-10600). I didn't notice any change in compilation time. This option generated .tcc files with templatized readers/writers for Thrift types. Currently we don't use these, but they could potentially speed up (de)serialization. Testing: - ran Impyla's test suite with Python 2 and 3 - ran core tests Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Reviewed-on: http://gerrit.cloudera.org:8080/17170 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-04-27 13:36:54 +00:00
Andrew Sherman	3b763b5c32	IMPALA-10447: Add a newline when exporting shell output to a file. Impala shell outputs a batch of rows using OutputStream. Inside OutputStream, output to a file is handled slightly differently from output that is written to stdout. When writing to stdout we use print() (which appends a newline) while when writing to a file we use write() (which adds nothing). This difference was introduced in IMPALA-3343 so this bug may be a regression introduced then. To ensure that output is the same in either case we need to add a newline after writing each batch of rows to a file. TESTING: Added a new test for this case. Change-Id: I078a06c54e0834bc1f898626afbfff4ded579fa9 Reviewed-on: http://gerrit.cloudera.org:8080/16966 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-01-26 08:32:29 +00:00
Adam Tamas	fe6e625747	IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix While ds_hll_sketch() generates a string value as output, the data is not ASCII-encoded text but a bit sketch; because of this, when the shell gets this data it disconnects while trying to decode it. The issue can be reproduced simply, for example by using unhex with a wrong input. Example: SELECT unhex("aa"); This patch contains a solution where we replace any characters that are not UTF-8 decodable if we run into a UnicodeDecodeError after fetching the data. This solution works with the Thrift 0.9.3 autogenerated gen-py but still fails with Thrift 0.11.0. For Thrift 0.11.0 the error is caught and an error message is sent (this does not work with the beeswax protocol, because it generates a different error (TypeError), which can occur for other reasons too). Testing: -manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax' Change-Id: I0c5f1290356e21aed8ca7f896f953541942aed05 Reviewed-on: http://gerrit.cloudera.org:8080/16418 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>	2020-09-05 09:42:46 +00:00
Csaba Ringhofer	b7965d8240	Revert "IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix" This reverts commit `75146c9138`. Change-Id: I57f790389a8c847877999d2b9b8185939b416c07 Reviewed-on: http://gerrit.cloudera.org:8080/16417 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2020-09-04 12:28:56 +00:00
Adam Tamas	75146c9138	IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix While ds_hll_sketch() generates a string value as output, the data is not ASCII-encoded text but a bit sketch; because of this, when the shell gets this data it disconnects while trying to decode it. The issue can be reproduced simply, for example by using unhex with a wrong input. Example: SELECT unhex("aa"); This patch contains a solution where we replace any characters that are not UTF-8 decodable if we run into a UnicodeDecodeError after fetching the data. This solution works with the Thrift 0.9.3 autogenerated gen-py but still fails with Thrift 0.11.0. For Thrift 0.11.0 the error is caught and an error message is sent (this does not work with the beeswax protocol, because it generates a different error (TypeError), which can occur for other reasons too). Testing: -manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax' Change-Id: Ic5cfb907871ca83e5f04a39ca9d7a8e138d711a8 Reviewed-on: http://gerrit.cloudera.org:8080/16305 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2020-09-04 12:18:28 +00:00
David Knupp	bc9d7e063d	IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3. This is the main patch for making the impala-shell cross-compatible with python 2 and python 3. The goal is to wind up with a version of the shell that will pass python e2e tests irrespective of the version of python used to launch the shell, under the assumption that the test framework itself will continue to run with python 2.7.x for the time being. Notable changes for reviewers to consider: - With regard to validating the patch, my assumption is that simply passing the existing set of e2e shell tests is sufficient to confirm that the shell is functioning properly. No new tests were added. - A new pytest command line option was added in conftest.py to enable a user to specify a path to an alternate impala-shell executable to test. It's possible to use this to point to an instance of the impala-shell that was installed as a standalone python package in a separate virtualenv. Example usage: USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py The target virtualenv may be based on either python3 or python2. However, this has no effect on the version of python used to run the test framework, which remains tied to python 2.7.x for the foreseeable future. - The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python environment independently from bin/set-pythonpath.sh. The default version of thrift is thrift-0.11.0 (See IMPALA-9489). - The wording of the header changed a bit to include the python version used to run the shell. Starting Impala Shell with no authentication using Python 3.7.5 Opened TCP connection to localhost:21000 ... OR Starting Impala Shell with LDAP-based authentication using Python 2.7.12 Opened TCP connection to localhost:21000 ... - By far, the biggest hassle has been juggling str versus unicode versus bytes data types. 
Python 2.x was fairly loose and inconsistent in how it dealt with strings. As a quick demo of what I mean: Python 2.7.12 (default, Nov 12 2018, 14:36:49) [GCC 5.4.0 20160609] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> d = 'like a duck' >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8') True ...and yet there are weird unexpected gotchas. >>> d.decode('utf-8') == d.encode('utf-8') True >>> d.encode('utf-8') == bytearray(d, 'utf-8') True >>> d.decode('utf-8') == bytearray(d, 'utf-8') # fails the eq property? False As a result, this inconsistency was reflected in the way we handled strings in the impala-shell code, but things still just worked. In python3, there's a much clearer distinction between strings and bytes, and as such, much tighter type consistency is expected by standard libs like subprocess, re, sqlparse, prettytable, etc., which are used throughout the shell. Even simple calls that worked in python 2.x: >>> import re >>> re.findall('foo', b'foobar') ['foo'] ...can throw exceptions in python 3.x: >>> import re >>> re.findall('foo', b'foobar') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall return _compile(pattern, flags).findall(string) TypeError: cannot use a string pattern on a bytes-like object Exceptions like this resulted in many, if not most, shell tests failing under python 3. What ultimately seemed like a better approach was to try to weed out as many existing spurious str.encode() and str.decode() calls as I could, and try to implement what has colloquially been called a "unicode sandwich" -- namely, "bytes on the outside, unicode on the inside, encode/decode at the edges." The primary spot in the shell where we call decode() now is when sanitising input... args = self.sanitise_input(args.decode('utf-8')) ...and also whenever a library like re required it. 
Similarly, str.encode() is primarily used where a library like readline or csv requires it. - PYTHONIOENCODING needs to be set to utf-8 to override the default setting for python 2. Without this, piping or redirecting stdout results in unicode errors. - from __future__ import unicode_literals was added throughout. Testing: To test the changes, I ran the e2e shell tests the way we always do (against the normal build tarball), and then I set up a python 3 virtual env with the shell installed as a package, and manually ran the tests against that. No effort has been made at this point to come up with a way to integrate testing of the shell in a python3 environment into our automated test processes. Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615 Reviewed-on: http://gerrit.cloudera.org:8080/15524 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-18 05:13:50 +00:00
David Knupp	ed70492580	IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports A few built-ins were changed in python 3 -- e.g., xrange became range, ConfigParser became configparser, etc. We can redefine some of those things in a single place, and import them from there as needed. Other items may also be added as we go along. Change-Id: Ibd3d86df524666a98cbfa463756adac48bd1f8a3 Reviewed-on: http://gerrit.cloudera.org:8080/15514 Reviewed-by: David Knupp <dknupp@cloudera.com> Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-03-21 19:52:07 +00:00
David Knupp	ed15c2c58f	IMPALA-3343: Part 1 -- Fix simple python 2->3 syntax errors In an effort to keep the work of reviewing the changes more manageable with regard to making the impala-shell python3 compatible, I'm trying to break the patches up into smaller chunks. The first patch is the easiest one -- simply addressing the handful of syntax issues that aren't python 3 compatible, namely changing the print statements to function calls, changing the way we catch exceptions, and adding a few simple branches to work around the removal of such things as dict.iteritems(). We needed the print function imported from __future__ because it allows us to pass in a file descriptor, e.g., sys.stderr. Notably, there's nothing in this patch related to string/bytes/unicode changes from python 2 to 3. Change-Id: I9a515da01ef03d5936cb1a4d9e4bc6d105386b1d Reviewed-on: http://gerrit.cloudera.org:8080/15487 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-03-20 03:10:07 +00:00
Jiawei Wang	3a4f8b3ae1	IMPALA-8652 Illegal delimiter error in shell has unknown error Problem: When --output_delimiter is assigned an invalid value, the validation of the argument is done only after the query is running; a ValueError is raised in DelimitedOutputFormatter and caught in _exec_stmt in the shell. Solution: Add an --output_delimiter option check before impala-shell initialization. Remove the delimiter length check in DelimitedOutputFormatter. Testing: tests/shell/test_shell_commandline.py passed Example: $ impala-shell.sh -B --output_delimiter '||' -q 'select 1,1,1' Illegal delimiter ||, the delimiter must be a 1-character string. Change-Id: I7ee2fccd305b104b3aff44c57659b6f14f2f4a05 Reviewed-on: http://gerrit.cloudera.org:8080/13690 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-21 06:01:57 +00:00
Tim Armstrong	318051cc21	IMPALA-2717: fix output of formatted unicode to non-TTY The bug is that PrettyOutputFormatter.format() returned a unicode object, and Python cannot automatically write unicode objects to output streams where there is no default encoding. The fix is to convert to UTF-8 encoded in a regular string, which can be output to any output device. This makes the output type consistent with DelimitedOutputFormatter.format(). Based on code by Marcell Szabo. Testing: Added a basic test. Played around in an interactive shell to make sure that unicode characters still work in interactive mode. Change-Id: I9de641ecf767a2feef3b9f48b344ef2d55e17a7f Reviewed-on: http://gerrit.cloudera.org:8080/9928 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-04-12 20:34:47 +00:00
Jim Apple	374f1121da	IMPALA-3224: De-Cloudera non-docs JIRA URLs John Russell is planning to fix the URLs in docs in a separate commit. Fixed using: (git ls-files | xargs replace \ 'https://issues.cloudera.org/browse/IMPALA' 'IMPALA' --) && \ git checkout HEAD docs Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182 Reviewed-on: http://gerrit.cloudera.org:8080/6487 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-05-07 04:44:57 +00:00
Dan Hecht	ffa7829b70	IMPALA-3918: Remove Cloudera copyrights and add ASF license header For files that have a Cloudera copyright (and no other copyright notice), make changes to follow the ASF source file header policy here: http://www.apache.org/legal/src-headers.html#headers Specifically: 1) Remove the Cloudera copyright. 2) Modify NOTICE.txt according to http://www.apache.org/legal/src-headers.html#notice to follow that format and add a line for Cloudera. 3) Replace or add the existing ASF license text with the one given on the website. Much of this change was automatically generated via: git grep -li 'Copyright.Cloudera' > modified_files.txt cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.Cloudera#i;' cat modified_files_txt | xargs fix_apache_license.py [1] Some manual fixups were performed following those steps, especially when license text was completely missing from the file. [1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor modification to ORIG_LICENSE to match Impala's license text. Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86 Reviewed-on: http://gerrit.cloudera.org:8080/3779 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-08-09 08:19:41 +00:00
Martin Grund	ed18dd4a8b	IMPALA-80: Dynamic progress reporting for the shell This patch adds a way to allow for dynamic progress reporting in the shell. There are two new command line flags for the shell --live_progress - will print the completed vs total # of scan ranges --live_summary - prints an updated exec summary In addition to the command line flags, these options can be set from within the shell using: set LIVE_SUMMARY=True set LIVE_PROGRESS=True The new options will be listed under shell options. Both reports will be updated at most every second, for longer running queries it will be adjusted to the time between two RPC calls to get the query status. To provide this information in the ExecSummary, the Thrift structure for the ExecSummary was extended to contain a progress indicator. The output is printed to stderr and only available in interactive mode. An example video is available here: https://asciinema.org/a/5wi7ypckx4ol4ha1hlg3e3q1k Change-Id: I70b2ab5fa74dc2ba5bc3b338ef13ddc6ccf367d2 Reviewed-on: http://gerrit.cloudera.org:8080/508 Tested-by: Internal Jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>	2015-07-17 17:59:29 +00:00
ishaan	2b4071bf5d	Remove csv.field_size_limit from the shell.	2014-01-08 10:50:55 -08:00
ishaan	26684b07d7	Introduce output formatting for the shell.	2014-01-08 10:50:51 -08:00
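
Note on IMPALA-11332 and IMPALA-11380: both commits describe replacing a bare rstrip() with rstrip('\n') so that only the extra newline is removed. The sketch below illustrates that difference in isolation. The helper name strip_trailing_newline is hypothetical and is not taken from the impala-shell source.

```python
def strip_trailing_newline(formatted):
    """Remove trailing newlines only, keeping any other trailing whitespace.

    Hypothetical helper for illustration; not the impala-shell implementation.
    """
    return formatted.rstrip('\n')


# Formatter output for: select 'Trailing whitespace ';
row = 'Trailing whitespace \n'

# A bare rstrip() strips all trailing whitespace, including the space in the value.
bare = row.rstrip()

# rstrip('\n') strips only the newline, so the trailing space survives.
kept = strip_trailing_newline(row)

print(repr(bare))
print(repr(kept))
```

The same reasoning applies to any formatter that appends a newline per row: stripping with an explicit character set avoids touching whitespace that belongs to the data.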

21 Commits
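
Note on IMPALA-8652: the commit describes moving the --output_delimiter check ahead of shell initialization so an invalid value is rejected before any query runs. A minimal sketch of such an up-front check follows. The function name validate_output_delimiter is an assumption made for illustration; only the error text is taken from the commit message.

```python
def validate_output_delimiter(delimiter):
    """Reject delimiters that are not exactly one character long.

    Hypothetical helper; the real check lives somewhere in impala-shell's
    startup path and may be structured differently.
    """
    if len(delimiter) != 1:
        raise ValueError(
            "Illegal delimiter %s, the delimiter must be a 1-character string."
            % delimiter)
    return delimiter


# Validate before doing any work, so a bad value fails fast.
for candidate in [',', '||']:
    try:
        validate_output_delimiter(candidate)
        print("accepted: %r" % candidate)
    except ValueError as err:
        print("rejected: %s" % err)
```

Checking at argument-parsing time means the user sees the delimiter error directly instead of a failure surfacing after the query has already executed.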