impala

mirror of https://github.com/apache/impala.git synced 2026-01-18 15:00:35 -05:00

Author	SHA1	Message	Date
David Knupp	bc9d7e063d	IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3. This is the main patch for making the the impala-shell cross-compatible with python 2 and python 3. The goal is wind up with a version of the shell that will pass python e2e tests irrepsective of the version of python used to launch the shell, under the assumption that the test framework itself will continue to run with python 2.7.x for the time being. Notable changes for reviewers to consider: - With regard to validating the patch, my assumption is that simply passing the existing set of e2e shell tests is sufficient to confirm that the shell is functioning properly. No new tests were added. - A new pytest command line option was added in conftest.py to enable a user to specify a path to an alternate impala-shell executable to test. It's possible to use this to point to an instance of the impala-shell that was installed as a standalone python package in a separate virtualenv. Example usage: USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py The target virtualenv may be based on either python3 or python2. However, this has no effect on the version of python used to run the test framework, which remains tied to python 2.7.x for the foreseeable future. - The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python environment independenty from bin/set-pythonpath.sh. The default version of thrift is thrift-0.11.0 (See IMPALA-9489). - The wording of the header changed a bit to include the python version used to run the shell. Starting Impala Shell with no authentication using Python 3.7.5 Opened TCP connection to localhost:21000 ... OR Starting Impala Shell with LDAP-based authentication using Python 2.7.12 Opened TCP connection to localhost:21000 ... - By far, the biggest hassle has been juggling str versus unicode versus bytes data types. Python 2.x was fairly loose and inconsistent in how it dealt with strings. As a quick demo of what I mean: Python 2.7.12 (default, Nov 12 2018, 14:36:49) [GCC 5.4.0 20160609] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> d = 'like a duck' >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8') True ...and yet there are weird unexpected gotchas. >>> d.decode('utf-8') == d.encode('utf-8') True >>> d.encode('utf-8') == bytearray(d, 'utf-8') True >>> d.decode('utf-8') == bytearray(d, 'utf-8') # fails the eq property? False As a result, this was inconsistency was reflected in the way we handled strings in the impala-shell code, but things still just worked. In python3, there's a much clearer distinction between strings and bytes, and as such, much tighter type consistency is expected by standard libs like subprocess, re, sqlparse, prettytable, etc., which are used throughout the shell. Even simple calls that worked in python 2.x: >>> import re >>> re.findall('foo', b'foobar') ['foo'] ...can throw exceptions in python 3.x: >>> import re >>> re.findall('foo', b'foobar') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall return _compile(pattern, flags).findall(string) TypeError: cannot use a string pattern on a bytes-like object Exceptions like this resulted in a many, if not most shell tests failing under python 3. What ultimately seemed like a better approach was to try to weed out as many existing spurious str.encode() and str.decode() calls as I could, and try to implement what is has colloquially been called a "unicode sandwich" -- namely, "bytes on the outside, unicode on the inside, encode/decode at the edges." The primary spot in the shell where we call decode() now is when sanitising input... args = self.sanitise_input(args.decode('utf-8')) ...and also whenever a library like re required it. Similarly, str.encode() is primarily used where a library like readline or csv requires is. - PYTHONIOENCODING needs to be set to utf-8 to override the default setting for python 2. Without this, piping or redirecting stdout results in unicode errors. - from __future__ import unicode_literals was added throughout Testing: To test the changes, I ran the e2e shell tests the way we always do (against the normal build tarball), and then I set up a python 3 virtual env with the shell installed as a package, and manually ran the tests against that. No effort has been made at this point to come up with a way to integrate testing of the shell in a python3 environment into our automated test processes. Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615 Reviewed-on: http://gerrit.cloudera.org:8080/15524 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-18 05:13:50 +00:00
David Knupp	ed70492580	IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports A few built-ins were changed in python 3 -- e.g., xrange became range, ConfigParser became configparser, etc. We can redefine some of those things in a single place, and import them from there as needed. Other items may also be added as we go along. Change-Id: Ibd3d86df524666a98cbfa463756adac48bd1f8a3 Reviewed-on: http://gerrit.cloudera.org:8080/15514 Reviewed-by: David Knupp <dknupp@cloudera.com> Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-03-21 19:52:07 +00:00
David Knupp	ed15c2c58f	IMPALA-3343: Part 1 -- Fix simple python 2->3 syntax errors In an effort to keep the work of reviewing the changes more manageable with regard to making the impala-shell python3 compatible, I'm trying to break the patches up into smaller chunks. The first patch is the easiest one -- simply addressing the handful of syntax issues that aren't python 3 compatible, namely changing the print statements to function calls, changing the way we catch exceptions, and adding a few simple branches to work around the removal of such things as dict.iteritems(). We needed the print function imported from __future__ because it allows us to pass in a file descriptor, e.g., sys.stderr. Notably, there's nothing in this patch related to string/bytes/unicode changes from python 2 to 3. Change-Id: I9a515da01ef03d5936cb1a4d9e4bc6d105386b1d Reviewed-on: http://gerrit.cloudera.org:8080/15487 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-03-20 03:10:07 +00:00
Jiawei Wang	3a4f8b3ae1	IMPALA-8652 Illegal delimiter error in shell has unknown error Problem: When assign --output_delimiter to invalid value, the validation of the argument is done only after the query is running, ValueError is raised in DelimitedOutputFormatter and caught in _exec_stmt in shell Solution: Add --output_delimiter option check before impala-shell initialization Remove delimiter length check in DelimitedOutputFormatter Testing: tests/shell/test_shell_commandline.py passed Example: $ impala-shell.sh -B --output_delimiter '\|\|' -q 'select 1,1,1' Illegal delimiter \|\|, the delimiter must be a 1-character string. Change-Id: I7ee2fccd305b104b3aff44c57659b6f14f2f4a05 Reviewed-on: http://gerrit.cloudera.org:8080/13690 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-21 06:01:57 +00:00
Tim Armstrong	318051cc21	IMPALA-2717: fix output of formatted unicode to non-TTY The bug is that PrettyOutputFormatter.format() returned a unicode object, and Python cannot automatically write unicode objects to output streams where there is no default encoding. The fix is to convert to UTF-8 encoded in a regular string, which can be output to any output device. This makes the output type consistent with DelimitedOutputFormatter.format(). Based on code by Marcell Szabo. Testing: Added a basic test. Played around in an interactive shell to make sure that unicode characters still work in interactive mode. Change-Id: I9de641ecf767a2feef3b9f48b344ef2d55e17a7f Reviewed-on: http://gerrit.cloudera.org:8080/9928 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-04-12 20:34:47 +00:00
Jim Apple	374f1121da	IMPALA-3224: De-Cloudera non-docs JIRA URLs John Russell is planning to fix the URLS in docs in a separate commit. Fixed using: (git ls-files \| xargs replace \ 'https://issues.cloudera.org/browse/IMPALA' 'IMPALA' --) && \ git checkout HEAD docs Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182 Reviewed-on: http://gerrit.cloudera.org:8080/6487 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-05-07 04:44:57 +00:00
Dan Hecht	ffa7829b70	IMPALA-3918: Remove Cloudera copyrights and add ASF license header For files that have a Cloudera copyright (and no other copyright notice), make changes to follow the ASF source file header policy here: http://www.apache.org/legal/src-headers.html#headers Specifically: 1) Remove the Cloudera copyright. 2) Modify NOTICE.txt according to http://www.apache.org/legal/src-headers.html#notice to follow that format and add a line for Cloudera. 3) Replace or add the existing ASF license text with the one given on the website. Much of this change was automatically generated via: git grep -li 'Copyright.Cloudera' > modified_files.txt cat modified_files.txt \| xargs perl -n -i -e 'print unless m#Copyright.Cloudera#i;' cat modified_files_txt \| xargs fix_apache_license.py [1] Some manual fixups were performed following those steps, especially when license text was completely missing from the file. [1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor modification to ORIG_LICENSE to match Impala's license text. Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86 Reviewed-on: http://gerrit.cloudera.org:8080/3779 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-08-09 08:19:41 +00:00
Martin Grund	ed18dd4a8b	IMPALA-80: Dynamic progress reporting for the shell This patch adds a way to allow for dynamic progress reporting in the shell. There are two new command line flags for the shell --live_progress - will print the completed vs total # of scan ranges --live_summary - prints an updated exec summary In addition to the command line flags, these options can be set from within the shell using: set LIVE_SUMMARY=True set LIVE_PROGRESS=True The new options will be listed under shell options. Both reports will be updated at most every second, for longer running queries it will be adjusted to the time between two RPC calls to get the query status. To provide this information in the ExecSummary, the Thrift structure for the ExecSummary was extended to contain a progress indicator. The output is printed to stderr and only available in interactive mode. An example video is available here: https://asciinema.org/a/5wi7ypckx4ol4ha1hlg3e3q1k Change-Id: I70b2ab5fa74dc2ba5bc3b338ef13ddc6ccf367d2 Reviewed-on: http://gerrit.cloudera.org:8080/508 Tested-by: Internal Jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>	2015-07-17 17:59:29 +00:00
ishaan	2b4071bf5d	Remove csv.field_size_limit from the shell.	2014-01-08 10:50:55 -08:00
ishaan	26684b07d7	Introduce output formatting for the shell.	2014-01-08 10:50:51 -08:00

10 Commits