impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Csaba Ringhofer	5c003cdcda	IMPALA-12978: Fix impala-shell's live progress with older Impalas If the Impala server has an older version that does not contain IMPALA-12048 then TExecProgress.total_fragment_instances will be None, leading to an error when checking total_fragment_instances > 0. Note that this issue only occurs with Python 3; in Python 2 None > 0 returns False. Testing: - Manually checked with a modified Impala that doesn't set total_fragment_instances. Only the scanner progress bar is shown in this case. Change-Id: Ic6562ff6c908bfebd09b7612bc5bcbd92623a8e6 Reviewed-on: http://gerrit.cloudera.org:8080/21256 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zihao Ye <eyizoha@163.com>	2024-04-09 02:23:05 +00:00
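The Python 2 versus Python 3 difference described in IMPALA-12978 can be illustrated with a minimal sketch. The helper name `has_fragment_progress` is hypothetical and is not taken from impala-shell; it only shows one way to guard a comparison against an unset (None) Thrift field.

```python
def has_fragment_progress(total_fragment_instances):
    """Return True only when a positive fragment instance count was reported.

    Hypothetical helper for illustration. An older server may leave the field
    unset (None); in Python 3, evaluating `None > 0` raises TypeError, so the
    None check has to come first.
    """
    return (total_fragment_instances is not None
            and total_fragment_instances > 0)
```

With a guard like this, a caller could skip the query progress bar and show only the scanner progress bar when the field is missing.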
Zoltan Borok-Nagy	e326b3cc0d	IMPALA-12313: (part 2) Limited UPDATE support for Iceberg tables This patch adds limited UPDATE support for Iceberg tables. The limitations mean users cannot update Iceberg tables if any of the following is true: * UPDATE value of partitioning column * UPDATE table that went through partition evolution * Table has SORT BY properties The above limitations will be resolved by part 3. The usual limitations like writing non-Parquet files, using copy-on-write, modifying V1 tables are out of scope of IMPALA-12313. This patch implements UPDATEs with the merge-on-read technique. This means the UPDATE statement writes both data files and delete files. Data files contain the updated records, delete files contain the position delete records of the old data records that have been touched. To achieve the above this patch introduces a new sink: MultiDataSink. We can configure multiple TableSinks for a single MultiDataSink object. During execution, the row batches sent to the MultiDataSink will be forwarded to all the TableSinks that have been registered. The UPDATE statement for an Iceberg table creates a source select statement with all table columns and virtual columns INPUT__FILE__NAME and FILE__POSITION. E.g. imagine we have a table 'tbl' with schema (i int, s string, k int), and we update the table with: UPDATE tbl SET k = 5 WHERE i % 100 = 11; The generated source statement will be ==> SELECT i, s, 5, INPUT__FILE__NAME, FILE__POSITION FROM tbl WHERE i % 100 = 11; Then we create two table sinks that refer to expressions from the above source statement: Insert sink (i, s, 5) Delete sink (INPUT__FILE__NAME, FILE__POSITION) The tuples in the rowbatch of MultiDataSink contain slots for all the above expressions (i, s, 5, INPUT__FILE__NAME, FILE__POSITION). MultiDataSink forwards each row batch to each registered TableSink. They will pick their relevant expressions from the tuple and write data/delete files. 
The tuples are sorted by INPUT__FILE__NAME and FILE__POSITION because we need to write the delete records in this order. For partitioned tables we need to shuffle and sort the input tuples. In this case we also add virtual columns "PARTITION__SPEC__ID" and "ICEBERG__PARTITION__SERIALIZED" to the source statement and shuffle and sort the rows based on them. Data files and delete files are now separated in the DmlExecState, so at the end of the operation we'll have two sets of files. We use these two sets to create a new Iceberg snapshot. Why does this patch have the limitations? - Because we are shuffling and sorting rows based on the delete records and their partitions. This means that the new data files might not get written in an efficient way, e.g. there will be too many of them, or we will need to keep too many open file handles during writing. Also, if the table has SORT BY properties, we cannot respect it as the input rows are ordered in a way to favor the position deletes. Part 3 will introduce a buffering writer for position delete files. This means we will shuffle and sort records based on the data records' partitions and SORT BY properties while delete records get buffered and written out at the end (sorted by file_path and position). In some edge cases the delete records might not get written efficiently, but it is a smaller problem than inefficient data files. Testing: * negative tests * planner tests * update all supported data types * partitioned tables * Impala/Hive interop tests * authz tests * concurrent tests Change-Id: Iff0ef6075a2b6ebe130d15daa389ac1a505a7a08 Reviewed-on: http://gerrit.cloudera.org:8080/20677 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-12-09 03:04:05 +00:00
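The statement rewrite and sink fan-out described in IMPALA-12313 (part 2) can be sketched in a few lines of Python. This is only an illustration of the idea and not Impala's implementation: `build_update_source_select` and `ProjectingSink` are hypothetical names, and the `MultiDataSink` class here is a toy stand-in that only mirrors the forwarding behaviour the commit message describes.

```python
def build_update_source_select(table, columns, assignments, where):
    """Build the source SELECT for an UPDATE (hypothetical helper).

    Assigned columns are replaced by their new value expression, and the two
    virtual columns needed for position deletes are appended.
    """
    select_list = [assignments.get(col, col) for col in columns]
    select_list += ["INPUT__FILE__NAME", "FILE__POSITION"]
    return "SELECT {0} FROM {1} WHERE {2}".format(
        ", ".join(select_list), table, where)


class ProjectingSink(object):
    """Toy table sink that keeps only the slots it is interested in."""

    def __init__(self, slot_indexes):
        self.slot_indexes = slot_indexes
        self.rows = []

    def send(self, row_batch):
        for row in row_batch:
            self.rows.append(tuple(row[i] for i in self.slot_indexes))


class MultiDataSink(object):
    """Toy sink that forwards every row batch to all registered sinks."""

    def __init__(self, sinks):
        self.sinks = sinks

    def send(self, row_batch):
        for sink in self.sinks:
            sink.send(row_batch)
```

Calling `build_update_source_select("tbl", ["i", "s", "k"], {"k": "5"}, "i % 100 = 11")` reproduces the source statement quoted in the commit message. An insert sink would then project slots 0 to 2, and a delete sink slots 3 and 4.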
Eyizoha	52ad12bc0c	IMPALA-12544: Add additional query progress reporting for the shell This patch modifies the dynamic query progress reporting in impala-shell by adding an extra query progress bar below the scan progress bar. The query progress is calculated using the number of completed fragment instances divided by the total number of fragment instances. Compared to the scan progress, which is calculated based on completed scan ranges divided by the total scan ranges, the query progress provides a more accurate reflection of the actual completion progress of the query. Particularly for computationally intensive queries involving complex aggregations or sorting, such as tpcds query78, there is often additional computation time required after the scanning is complete. In such cases, displaying only 100% scan progress would be inaccurate. Change-Id: I11a704885505442b7499a026fcee3b86696cd064 Reviewed-on: http://gerrit.cloudera.org:8080/20672 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2023-11-10 16:18:51 +00:00
Joe McDonnell	bad064dbea	IMPALA-12224: Improve error handling for shell interactive tests Interactive shell tests can hang waiting for input if the shell process hits errors or exits. For example, the problems in the sasl package seen in IMPALA-12220 cause test_shell_interactive.py to hang. This improves the error detection/handling to avoid hangs for most common shell errors. Specifically, it adds a check for the impala-shell process exiting, and it adds a check for a failure to connect to Impala. Both would previously result in hangs. Testing: - Verified test_shell_interactive.py doesn't hang with hand tests - Remove a vital import from impala-shell so it exits instantly - Simulate a connection problem by overwriting the port with a non-functional port - Test on Redhat 9 with the IMPALA-12220 issue Change-Id: I7556fb687e06b41caa538d8c3231ec9f2ad98162 Reviewed-on: http://gerrit.cloudera.org:8080/20087 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-06-21 05:21:01 +00:00
Vincent Tran	9727b46f3b	IMPALA-11435: Fixup - Suppress logging for 'thrift' in impala-shell Commit `cd9f3f578` aims to suppress logging for the 'thrift' library within impala-shell. However, it does not work in all cases. This change moves the fix into the 'main' function, which suppresses the unwanted message. Tested by connecting through impala-shell with Python2.7 and Python3.6 with SSL enabled. Change-Id: I4de95b1b67abe9a0b4637910b0894addddda23d5 Reviewed-on: http://gerrit.cloudera.org:8080/20074 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-06-15 12:06:26 +00:00
Joe McDonnell	e9fb8e717c	IMPALA-12114: Pull in fix for THRIFT-5705 and add test This pulls in a new toolchain to get a Thrift with the patch for THRIFT-5705. This fixes an issue where idle clients using TLS are needlessly disconnected due to a bug in the read retry count logic inside Thrift. Tests: - This modifies test_thrift_socket.py to make it do more idle polls and check that ImpalaShell is not disconnected. It fails without the THRIFT-5705 patch and passes now. Change-Id: Ifc7704cba032a91b9fd0d5d54d1e0a7e17fb10bb Reviewed-on: http://gerrit.cloudera.org:8080/19962 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Reviewed-by: Andrew Sherman <asherman@cloudera.com>	2023-06-02 15:57:37 +00:00
Csaba Ringhofer	14035065fa	IMPALA-12145: Fix profiles with non-ascii character in impala-shell (python2) As __future__.unicode_literals is imported in impala-shell concatenating an str with a literal leads to decoding the string with 'ascii' codec which fails if there are non-ascii characters. Converting the literal to str solves the issue. Testing: - added regression test + ran related EE tests Change-Id: I99b72dd262fc7c382e8baee1dce7592880c84de2 Reviewed-on: http://gerrit.cloudera.org:8080/19893 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-05-25 00:33:34 +00:00
Joe McDonnell	451543a2e5	IMPALA-11785: Warn if Thrift fastbinary is not working for impala-shell Thrift's fastbinary module provides native code that accelerates the BinaryProtocol. It can make a large performance difference when using the Hiveserver2 protocol with impala-shell. If the fastbinary is not working, it silently falls back to interpreted code. This can happen because the fastbinary couldn't load a particular library, etc. This adds a warning on impala-shell startup when it detects that Thrift's fastbinary is not working. When bin/impala-shell.sh is modified to use python3, impala-shell outputs this error (shortened for legibility): WARNING: Failed to load Thrift's fastbinary module. Thrift's BinaryProtocol will not be accelerated, which can reduce performance. Error was '{path to Python2 thrift fastbinary.so}: undefined symbol: _Py_ZeroStruct' Testing: - Added a simple test that verifies the impala-shell does not output the warning - Outputs warning when Python 2 thrift used for Python 3 shell Change-Id: Id5d0e5db5cfdf1db4521b00f912b4697a7f646e8 Reviewed-on: http://gerrit.cloudera.org:8080/19806 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-05-23 06:41:02 +00:00
jasonmfehr	63d13a35f3	IMPALA-11880: Adds support for authenticating to Impala using JWTs. This support was modeled after the LDAP authentication. If JWT authentication is used, the Impala shell enforces the use of the hs2-http protocol since the JWT is sent via the "Authentication" HTTP header. The following flags have been added to the Impala shell: * -j, --jwt: indicates that JWT authentication will be used * --jwt_cmd: shell command to run to retrieve the JWT to use for authentication Testing New Python tests have been added: * The shell tests ensure that the various command line arguments are handled properly. Situations such as a single authentication method, JWTs cannot be sent in clear text without the proper arguments, etc are asserted. * The Python custom cluster tests leverage a test JWKS and test JWTs. Then, a custom Impala cluster is started with the test JWKS. The Impala shell attempts to authenticate using a valid JWT, an expired (invalid) JWT, and a valid JWT signed by a different, untrusted JWKS. These tests also exercise the Impala JWT authentication mechanism and assert the prometheus JWT auth success and failure metrics are reported accurately. Change-Id: I52247f9262c548946269fe5358b549a3e8c86d4c Reviewed-on: http://gerrit.cloudera.org:8080/19837 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-05-11 23:22:05 +00:00
Michael Smith	910c6ecc85	IMPALA-12094: Fix impala shell summary command Fix various quality-of-life issues with the 'summary' command: - update regex to correctly match query ID for handling "Query id ... not found" errors - fail the command rather than exiting the shell when 'summary' is called with an incorrect argument (such as 'summary 1') - provide a useful message rather than print an exception when 'summary original' is invoked with no failed queries Testing: - added new tests for the 'summary' command Change-Id: I7523d45b27e5e63e1f962fb1f6ebb4f0adc85213 Reviewed-on: http://gerrit.cloudera.org:8080/19797 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-04-28 20:51:50 +00:00
Michael Smith	0a42185d17	IMPALA-9627: Update utility scripts for Python 3 (part 2) We're starting to see environments where the system Python ('python') is Python 3. Updates utility and build scripts to work with Python 3, and updates check-pylint-py3k.sh to check scripts that use system python. Fixes other issues found during a full build and test run with Python 3.8 as the default for 'python'. Fixes an impala-shell tip that was supposed to have been two tips (and had no space after the period when they were printed). Removes out-of-date deploy.py and various Python 2.6 workarounds. Testing: - Full build with /usr/bin/python pointed to python3 - run-all-tests passed with python pointed to python3 - ran push_to_asf.py Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb Reviewed-on: http://gerrit.cloudera.org:8080/19697 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-04-26 18:52:23 +00:00
Peter Rozsa	cd9f3f578e	IMPALA-11435: Suppress logging for 'thrift' in impala-shell This change removes the "No handlers could be found for logger "thrift.transport.sslcompat" notification from impala-shell when SSL is enabled, by adding a NullHandler to logger 'thrift'. Change-Id: Idaa0871751969ec3a3aa8b44fe35f0743c03c547 Reviewed-on: http://gerrit.cloudera.org:8080/19671 Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-04-04 07:23:54 +00:00
jasonmfehr	e17fd9a0d5	IMPALA-11850 Adds HTTP tracing headers when using the hs2-http protocol. When using the hs2 protocol with the http transport, include several tracing http headers by default. These headers are: * X-Request-Id -- client defined string that identifies the http request, this string is meaningful only to the client * X-Impala-Session-Id -- session id generated by the Impala backend, will be omitted on http calls that occur before this id has been generated * X-Impala-Query-Id -- query id generated by the Impala backend, will be omitted on http calls that occur before this id has been generated The Impala shell includes these headers by default. The command line argument --no_http_tracing has been added to remove these headers. The Impala backend logs out these headers if they are on the http request. The log messages are written out at log level 2 (RPC). Testing: - manual testing (verified using debugging proxy and impala logs) - new python test Change-Id: I7857eb5ec03eba32e06ec8d4133480f2e958ad2f Reviewed-on: http://gerrit.cloudera.org:8080/19428 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-02-10 02:09:17 +00:00
jasonmfehr	f2f6b4b580	IMPALA-11375 Impala shell outputs details of each RPC When the Impala shell is using the hs2 protocol, it makes multiple RPCs to the Impala daemon. These calls pass Thrift objects back and forth. This change adds the '--show_rpc' flag, which outputs the details of the RPCs to stdout, and the '--rpc_file' flag, which outputs the RPC details to the specified file path. RPC details include: - operation name - request attempt count - Impala session/query ids (if applicable) - call duration - call status (success/failure) - request Thrift objects - response Thrift objects Certain information is not included in the RPC details: - Thrift object attributes named 'secret' or 'password' are redacted. - Thrift objects with a type of TRowSet or TGetRuntimeProfileResp are not included as the information contained within them is already available in the standard output from the Impala shell. Testing: - Added new tests in the end-to-end test suite. Change-Id: I36f8dbc96726aa2a573133acbe8a558299381f8b Reviewed-on: http://gerrit.cloudera.org:8080/19388 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-01-12 23:31:14 +00:00
Michael Smith	b17446818c	IMPALA-11755: Fix impala-shell --ldap_password_cmd with python 3 subprocess.Popen returns a byte string in Python 3, which serializes incorrectly when sending it as the LDAP password and causes `endswith` to error with > first arg must be bytes or a tuple of bytes, not str Fixes `impala-shell --ldap_password_cmd` run with Python 3 by decoding bytes as unicode. Testing: confirmed that I can successfully authenticate via LDAP with impala-shell in Python 2.7 and Python 3.8. Change-Id: I3638d6f8d3ed7184495dbe3512d9e5ceb0ee8c45 Reviewed-on: http://gerrit.cloudera.org:8080/19283 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-11-29 22:24:43 +00:00
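The bytes-versus-text problem behind IMPALA-11755 can be shown with a minimal sketch. The function name `read_password_from_cmd` is hypothetical and the UTF-8 encoding is an assumption; the point is only that the output of `subprocess.Popen` is bytes in Python 3 and has to be decoded before it is used with text operations such as `endswith`.

```python
import subprocess


def read_password_from_cmd(cmd_args):
    """Run a command and return its stdout as text (hypothetical helper).

    In Python 3 the pipe yields bytes, so calling out.endswith("\n") on the
    raw value fails with "first arg must be bytes or a tuple of bytes, not
    str". Decoding first gives a text string on both Python 2 and Python 3.
    UTF-8 is assumed here for illustration.
    """
    proc = subprocess.Popen(cmd_args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, _ = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("password command failed with exit code %d"
                           % proc.returncode)
    return out.decode("utf-8")
```

The caller decides whether to strip a trailing newline; the sketch deliberately returns the decoded output unchanged.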
wzhou-code	8e350d0a8a	IMPALA-11304: impala-shell make the client retry attempts configurable Currently max tries for connecting to coordinator is hard coded to 4 in hs2-http mode. It's required to make the max tries when connecting to coordinator a configurable option, especially in the environment where coordinator is started slowly. This patch added support for configurable max tries in hs2-http mode using the new impala-shell config option '--connect_max_tries'. The default value of '--connect_max_tries' is set to 4. Testing: - Ran e2e shell tests. - Ran impala-shell with connect_max_tries as 100 before starting impala coordinator daemon, verified that impala-shell connects to coordinator after coordinator daemon was started. Change-Id: I5f7caeb91a69e71a38689785fb1636094295fdb1 Reviewed-on: http://gerrit.cloudera.org:8080/19105 Reviewed-by: Andrew Sherman <asherman@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-10-25 16:07:37 +00:00
Peter Rozsa	81e36d4584	IMPALA-10660: Impala shell prints DOUBLEs with less precision in HS2 than beeswax This change adds a shell option called "hs2_fp_format" which controls the print format of floating-point values in HS2. It lets the user specify a Python-based format specification expression (https://docs.python.org/2.7/library/string.html#formatspec) which will get parsed and applied to floating-point column values. The default value is None; in this case the formatting is the same as before this change. This option does not support the Beeswax protocol, because Beeswax converts all of the column values to strings in its response. Tests: command line tests for various formatting options and for invalid formatting option Change-Id: I424339266be66437941be8bafaa83fa0f2dfbd4e Reviewed-on: http://gerrit.cloudera.org:8080/18990 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-09-23 14:34:52 +00:00
gaoxq	f5fc085733	IMPALA-11233: Unset all query option When using a JDBC connection pool, a connection may set some query options; after the query finishes, the connection is closed and put back into the connection pool. When the connection is used again, the previously set query options are still in effect. We need a way for a statement to reset all query options without creating a new connection. This patch supports UNSET statements in the SQL dialect. UNSET ALL unsets all query options. Testing: - add unset all query option in test_hs2.py Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250 Reviewed-on: http://gerrit.cloudera.org:8080/18430 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-23 05:59:02 +00:00
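The kind of format specification that IMPALA-10660 refers to can be tried out with Python's built-in `format()`. This is a minimal sketch and not the shell's code: `format_fp` is a hypothetical helper, and falling back to `str()` when no format is given is an assumption made for illustration.

```python
def format_fp(value, fp_format=None):
    """Format a floating-point value with an optional format spec.

    Hypothetical helper. fp_format is a Python format specification such as
    ".2f" or "08.3f". An invalid specification makes format() raise
    ValueError.
    """
    if fp_format is None:
        # Assumed default for this sketch: plain string conversion.
        return str(value)
    return format(value, fp_format)
```

For example, `format_fp(3.14159, ".2f")` yields `"3.14"`, and `format_fp(3.14159, "08.3f")` yields `"0003.142"`.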
Joe McDonnell	7eb200abf1	IMPALA-11337: Flush row output before writing "Fetched X row(s)" When redirecting stdout and stderr to a file, the existing code can sometimes output the "Fetched X row(s)" line before finishing the row output. e.g. impala-shell -B -q "select 1" >> outfile.txt 2>> outfile.txt The rows output goes to stdout while the control messages like "Fetched X row(s)" go to stderr. Since stdout can buffer output, that can delay the output. This adds a flush for stdout before writing the "Fetched X row(s)" message. Testing: - Added a shell test that redirects stdout and stderr to a file and verifies the contents. This consistently fails without the flush. - Other shell tests pass Change-Id: I83f89c110fd90d2d54331c7121e407d9de99146c Reviewed-on: http://gerrit.cloudera.org:8080/18625 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-15 05:28:05 +00:00
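The ordering issue fixed in IMPALA-11337 comes from stdout being buffered while the status line goes to stderr. A minimal sketch of the flush-before-status pattern follows; `print_rows_then_status` is a hypothetical function, and the tab-delimited row layout is an assumption.

```python
import sys


def print_rows_then_status(rows, out=None, err=None):
    """Write rows to `out`, flush, then write the status line to `err`.

    Hypothetical helper. Without the flush, buffered row output can reach a
    shared file after the "Fetched X row(s)" line written to stderr.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    for row in rows:
        out.write("\t".join(str(col) for col in row) + "\n")
    out.flush()  # make sure all rows are written before the status line
    err.write("Fetched {0} row(s)\n".format(len(rows)))
    err.flush()
```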
yx91490	c7784bde55	IMPALA-1682: Support printing the output of a query (rows) vertically. In vertical mode, impala-shell prints each row in the following format: first a line that contains the line number, then the row's columns line by line, each column line starting with its name and a colon. To enable it: use shell option '-E' or '--vertical', or 'set VERTICAL=true' in interactive mode. To disable it in interactive mode: 'set VERTICAL=false'. NOTICE: it will be disabled if the '-B' option or 'set WRITE_DELIMITED=true' is specified. Tests: add methods in test_shell_interactive.py and test_shell_commandline.py. Change-Id: I5cee48d5a239d6b7c0f51331275524a25130fadf Reviewed-on: http://gerrit.cloudera.org:8080/18549 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-13 15:41:07 +00:00
Joe McDonnell	0ee5f8084f	IMPALA-11317/IMPALA-11316/IMPALA-11315: impala-shell Python 3 fixes This fixes a few impala-shell Python 3 issues: 1. In ImpalaShell's do_history(), the decode() call needs to be avoided in Python 3, because in Python 3 the cmd is already a string and doesn't need further decoding. (IMPALA-11315) 2. TestImpalaShell.test_http_socket_timeout() gets a different error message in Python 3. It throws the "BlockingIOError" rather than "socket.error". (IMPALA-11316) 3. ImpalaHttpClient.py's code to retrieve the body when handling an HTTP error needs to have a decode() call for the body. Otherwise, the body remains bytes and causes TestImpalaShellInteractive.test_http_interactions_extra() to fail. (IMPALA-11317) Testing: - Ran shell tests in the standard way - Ran shell tests with the impala-shell executable coming from a Python 3 virtualenv using the PyPi package Change-Id: Ie58380a17d7e011f4ce96b27d34717509a0b80a6 Reviewed-on: http://gerrit.cloudera.org:8080/18556 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-05-25 22:47:40 +00:00
Riza Suminto	cf5eaae176	IMPALA-11305: Fix TypeError in impala-shell summary progress impala-shell fails with TypeError when installed with python3. This is due to the behavior change of the division operator ('/') between python2 and python3. This patch fixes the issue by replacing the operator with floor division ('//'), which results in an integer type as described in https://peps.python.org/pep-0238/. Testing: - Manually installed impala-shell from pip with python3 and verified that the fix works. Change-Id: Ifbe4df6a7a4136e590f383fc6475e2283e35eadc Reviewed-on: http://gerrit.cloudera.org:8080/18546 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-05-23 18:44:33 +00:00
wzhou-code	397d1d15a2	IMPALA-10745: Support Kerberos over HTTP for impala-shell This patch ports the implementation of GSSAPI authentication over http transport from Impyla (https://github.com/cloudera/impyla/pull/415) to impala-shell. The implementation adds a new dependency on 'kerberos' python module, which is a pip-installed module distributed under Apache License Version 2. When using impala-shell with Kerberos over http, it is assumed that the host has a preexisting kinit-cached Kerberos ticket that impala-shell can pass to the server automatically without the user having to re-enter the password. Testing: - Passed exhaustive tests. - Tested manually on a real cluster with a full Kerberos setup. Change-Id: Ia59ba4004490735162adbd468a00a962165c5abd Reviewed-on: http://gerrit.cloudera.org:8080/18493 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-05-10 03:22:41 +00:00
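The division change behind IMPALA-11305 is easy to reproduce with a toy progress bar. This is a sketch under stated assumptions, not the shell's summary code: `render_progress_bar` and its `width` parameter are hypothetical. In Python 3, `/` returns a float, and multiplying a string by a float raises TypeError, whereas `//` keeps the value an integer.

```python
def render_progress_bar(completed, total, width=20):
    """Render a fixed-width text progress bar (hypothetical helper).

    Floor division keeps `filled` an int on both Python 2 and Python 3. With
    true division, "#" * filled would raise TypeError in Python 3 because
    `filled` would be a float.
    """
    if total <= 0:
        return "[" + " " * width + "]"
    filled = width * completed // total
    return "[" + "#" * filled + " " * (width - filled) + "]"
```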
Abhishek Rawat	8e755e7571	IMPALA-11126: impala-shell: Support configurable socket timeout for http client In 'hs2-http' mode, the socket timeout is None, which could cause hang-like symptoms in case of a problematic remote server. Added support for configurable socket timeout using the new impala-shell config option '--http_socket_timeout_s'. If a reasonable timeout is set, impala-shell client can retry in case of connection issues, when possible. The default value of '--http_socket_timeout_s' is set to None, to prevent behavior changes for existing clients. More details on socket timeout here: https://docs.python.org/3/library/socket.html#socket-timeouts Testing: - Added tests for various timeout values in test_shell_commandline.py - Ran e2e shell tests. Change-Id: I29fa4ff96cdcf154c3aac7e43340af60d7d61e94 Reviewed-on: http://gerrit.cloudera.org:8080/18336 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2022-04-01 16:31:19 +00:00
Steve Carlin	d34039ced9	IMPALA-11096: Strict_hs2 mode in impala-shell does not support get_summary The get_summary() thrift call is not supported in strict_hs2 mode on impala-shell. The live_progress and live_summary options are disabled when the strict_hs2_protocol flag is set. Change-Id: I6aee838a80b4659a13a0a0cb9eabffa2c8767c8f Reviewed-on: http://gerrit.cloudera.org:8080/18177 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2022-02-07 11:37:44 +00:00
Steve Carlin	5cfdff03f7	IMPALA-11095: Fix Impala-shell strict_hs2 mode inserts The insert command was broken for impala-shell in the strict_hs2 mode. The return parameter for close_dml should return two parameters. The parameters returned by close_dml are rows returned and error rows. These are not supported by strict hs2 mode since the close does not return the TDmlResult structure. So the message to the end user also had to be changed. Change-Id: Ibe837c99e54d68d1e27b97f0025e17faf0a2cb9f Reviewed-on: http://gerrit.cloudera.org:8080/18176 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2022-02-04 07:42:52 +00:00
Steve Carlin	bb9fb663ce	IMPALA-10778: Allow impala-shell to connect directly to HS2 Impala-shell already uses HS2 protocol to connect to Impalad. This commit allows impala-shell to connect to any server (for example, Hive) using the hs2 protocol. This will be done via the "--strict_hs2_protocol" option. When the "--strict_hs2_protocol" option is turned on, only features supported by hs2 will work. For instance, "runtime-profile" is an impalad specific feature and will be disabled. The "--strict_hs2_protocol" will only work on servers that abide by the strict definition of what is supported by HS2. So one will be able to connect to Hive in this mode, but connections to Impala will not work. Any feature supported by Hive (e.g. kerberos authentication) should work as well. Note: While authentication should work, the test framework is not set up to create an HS2 server that does authentication at this point so this feature should be used with caution. Change-Id: I674a45640a4a7b3c9a577830dbc7b16a89865a9e Reviewed-on: http://gerrit.cloudera.org:8080/17660 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-08-27 09:45:59 +00:00
wzhou-code	2b815cbd51	IMPALA-10784: Add support for retaining cookies in impala-shell IMPALA-10234 added support for cookie authentication for LDAP to impala-shell. But it does not accept user input cookie name via startup flags, and it retains only one cookie. In some scenarios, we could use proxy to manage the sessions with additional HTTP cookies added by proxy. This patch made cookie support more generic for impala-shell. It lets the user specify cookie names via a startup flag "--http_cookie_names" and can retain more than one cookie. Testing: - Manually tested the multiple cookies in HTTP headers with a customized Impala server which could send and receive multiple cookies. - Passed core test, including new test cases. Change-Id: I193422d5ec891886a522d82ecb0e9d974132ff2a Reviewed-on: http://gerrit.cloudera.org:8080/17667 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-07-13 00:07:11 +00:00
Csaba Ringhofer	94f67a3432	IMPALA-7825: Upgrade Thrift version to 0.11.0 Before this patch Impala mainly used Thrift 0.9.3, but it was possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0 Thrift lib was already included in the toolchain. Most of the changes are related to replacing boost:: with std:: shared_ptr-s in cpp code (this is a continuation of patch by Sahil). The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as Impala's test framework relies on Impyla. A thrift_sasl release is also needed, because it currently pins Thrift version to 0.9.3 for Python 2. The current patch uses alpha releases from Impyla and thrift_sasl that use thrift 0.11.0. Notable side effects: - old logic to compile thrift for impala-shell with 0.11.0 was removed - impala_shell's utf8 handling had to be updated as the new 0.11.0 compilation happens with no_utf8strings. This also made things a bit faster, e.g. the following is ~0.22s instead of ~0.25 shell/impala_shell.py \ -B -q "select * from functional_parquet.alltypes;" > /dev/null - THRIFT-3921 changed the stream operators to print an enum's name instead of its number, leading to slightly different messages in some cases. - "templates" was added to the thrift generator's parameters to avoid a compilation issue (related to IMPALA-10600). I didn't notice any change in compilation time. This option generated .tcc files with templatized readers/writers for Thrift types. Currently we don't use these, but they could potentially speed up (de)serialization. Testing: - ran Impyla's test suite with Python 2 and 3 - ran core tests Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Reviewed-on: http://gerrit.cloudera.org:8080/17170 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-04-27 13:36:54 +00:00
stiga-huang	d5f67fce41	IMPALA-10523: Fix impala-shell crash in printing error messages that contain UTF-8 characters In Python2, print() converts all non-keyword arguments to strings like str() does and writes them to the stream. str() on QueryStateException returns its value(i.e. error message) which could be in unicode type. Python2 will implicitly encode it to str type using the default encoding, 'ascii'. This could result in UnicodeEncodeError when there are non-ascii characters in the error message. This patch explicitly encodes the error message using 'utf-8' encoding if it's in unicode type and the shell is run in Python2. Tests: - Add test in test_shell_interactive.py Change-Id: Ie10f5b03ecc5877053c2fbada1afaf256b423a71 Reviewed-on: http://gerrit.cloudera.org:8080/17099 Reviewed-by: Tamas Mate <tmate@cloudera.com> Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-03-12 18:19:11 +00:00
stiga-huang	4c6cf4b2ef	IMPALA-10434: Fix impala-shell's unicode regressions on Python2 To make impala-shell compatible for Python3, we explicitly distinguish bytes and text in Python2 by decoding the bytes for all inputs. Regression 1: multiple queries in one line with unicode chars will break In precmd() of impala-shell, if there are multiple queries present in one input line, we split it into individual queries (by sqlparse.split()) and append them back to the 'cmdqueue'. They will be passed to precmd() again. In our Python2 implementation, precmd() expects them to be str type, and will decode them into unicode type. However, the output type of sqlparse.split() is unicode which doesn't have a decode() method. Calling decode() on a unicode var will let Python2 implicitly encode it to str. This may cause UnicodeEncodeError since implicit encoding uses 'ascii'. Regression 2: multi-line query with unicode chars will break when command history is enabled In _check_for_command_completion(), when calling readline.replace_history_item in Python2, we encode the completed_cmd into bytes. However, we shouldn't replace it since the return type is expected to be unicode. Tests: - Add tests for these two regressions in Python2. Change-Id: Icc4a8d31311a5c59e5fc0e65fe09f770df41bea4 Reviewed-on: http://gerrit.cloudera.org:8080/16960 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-01-20 10:20:02 +00:00
Attila Jeges	1c72c5a8f9	IMPALA-10234: Add support for cookie authentication to impala-shell IMPALA-8584 added support for cookie authentication to Impala. This change adds cookie authentication support to impala-shell as well when using 'hs2-http' protocol. Testing: - Unit tests were added to test cookie handling methods. - Tested e2e manually with nginx HTTP proxy. TODO: - Test with Knox HTTP proxy as well. Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51 Reviewed-on: http://gerrit.cloudera.org:8080/16660 Reviewed-by: Attila Jeges <attilaj@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-11-17 19:21:19 +00:00
Thomas Tauber-Marshall	01e1b4df80	IMPALA-10303: Fix warnings from impala-shell with --quiet When the --quiet flag is used with impala-shell, the intention is that if the query is successful then only the query results should be printed. This patch fixes two cases where --quiet was not being respected: - When using the HTTP transport and --client_connect_timeout_ms is set, a warning is printed that the timeout is not applied. - When running in non-interactive mode, a warning is printed that --live_progress is automatically disabled. This warning is now also only printed if --live_progress is actually set. Testing: - Added a test that runs a simple query with --quiet and confirms the output is as expected. Change-Id: I1e94c9445ffba159725bacd6f6bc36f7c91b88fe Reviewed-on: http://gerrit.cloudera.org:8080/16673 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-30 02:17:29 +00:00
stiga-huang	2a1d3acaf1	IMPALA-9870: impala-shell 'summary' to show original and retried queries This patch extends the 'summary' command of impala-shell to support retrieving the summary of the original query attempt. The new syntax is SUMMARY [ALL | LATEST | ORIGINAL] If 'ALL' is specified, both the latest and original summaries are printed. If 'LATEST' is specified, only the summary of the latest query attempt is printed. If 'ORIGINAL' is specified, only the summary of the original query attempt is printed. The default option is 'LATEST'. Support for this has only been added to HS2 given that Beeswax is being deprecated soon. Tests: - Add new tests in test_shell_interactive.py Change-Id: I8605dd0eb2d3a2f64f154afb6c2fd34251c1fec2 Reviewed-on: http://gerrit.cloudera.org:8080/16502 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-24 05:11:06 +00:00
Attila Jeges	3d067572dd	IMPALA-10224: Add startup flag not to expose debug web url to clients This patch introduces a new startup flag --ping_expose_webserver_url (true by default) to control whether PingImpalaService, PingImpalaHS2Service RPC calls should expose the debug web url to the client or not. This is necessary as the debug web UI is not something that end-users will necessarily have access to. If the flag is set to false, the RPC calls will return an empty string instead of the real url signalling that the debug web ui is not available. Note that if the webserver is disabled (--enable_webserver flag is set to false) the RPC calls will behave the same and return an empty string for the url. Change-Id: I7ec3e92764d712b8fee63c1f45b038c31c184cfc Reviewed-on: http://gerrit.cloudera.org:8080/16573 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-14 14:39:39 +00:00
Thomas Tauber-Marshall	179b14876d	IMPALA-10074: Set impala-shell's default protocol to hs2 The beeswax interface has been marked deprecated for a while, but it remains the default protocol for impala-shell because we felt that changing the default protocol constituted a breaking change. Now that 4.0 is the next release, we can switch the default protocol to hs2. This patch also adds a few more deprecation warnings around beeswax. Change-Id: I65eb14ec03c1e1ef26782554aedd6670bbeedfe8 Reviewed-on: http://gerrit.cloudera.org:8080/16327 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-22 03:03:32 +00:00
Tamas Mate	3ef7756628	IMPALA-10051: impala-shell exits with ValueError with WITH clauses When a query contains a WITH clause, impala-shell tries to identify whether it is a DML query or not, so that later it can provide appropriate result messages. Earlier shlex was used to create tokens and assess the query type based on that. However, shlex can misinterpret some query strings where whitespace characters are mixed with quotes, because it splits the string based on whitespace characters. In some scenarios a 'ValueError: No closing quotation' error can occur. This change moves the tokenization from shlex to sqlparse. Testing: - Added unit test to cover queries that contain mixed whitespaces and strings Change-Id: I442d3bc65b90a55c73c847948d5179a8586d71ad Reviewed-on: http://gerrit.cloudera.org:8080/16389 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-18 04:06:22 +00:00
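The shlex failure mode described in IMPALA-10051 can be reproduced with the standard library alone. The following is only a sketch: the helper name `shlex_tokens` and both query strings are hypothetical illustrations, not the query from the bug report, and the sqlparse-based replacement used by the actual patch is not shown.

```python
import shlex


def shlex_tokens(query):
    """Tokenize a query with shlex; return None if shlex rejects its quoting.

    Hypothetical helper for illustration only, not impala-shell code.
    """
    try:
        return shlex.split(query)
    except ValueError:
        # shlex raises ValueError("No closing quotation") when it believes
        # a quote was opened but never closed.
        return None


# A plain WITH query tokenizes without trouble.
ok_query = "with t as (select 1) select * from t"

# A SQL string literal containing a backslash-escaped quote. shlex treats
# 'a\' as a complete quoted token, so the final quote looks unterminated.
bad_query = "with t as (select 'a\\'b') select * from t"

print(shlex_tokens(ok_query))
print(shlex_tokens(bad_query))
```

shlex follows POSIX shell quoting rules rather than SQL ones, which is why a SQL-aware tokenizer is a better fit for classifying statements.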
Sahil Takiar	13f50eaec5	IMPALA-9229: impala-shell 'profile' to show original and retried queries Currently, the impala-shell 'profile' command only returns the profile for the most recent query attempt. There is no way to get the original query profile (the profile of the first query attempt that failed) from the impala-shell. This patch modifies TGetRuntimeProfileReq and TGetRuntimeProfileResp to add support for returning both the original and retried profiles for a retried query. When a query is retried, TGetRuntimeProfileResp currently contains the profile for the most recent query attempt. TGetRuntimeProfileReq has a new field called 'include_query_attempts' and when it is set to true, the TGetRuntimeProfileResp will include all failed profiles in a new field called failed_profiles / failed_thrift_profiles. impala-shell has been modified so the 'profile' command has a new set of options. The syntax is now: PROFILE [ALL | LATEST | ORIGINAL] If 'ALL' is specified, both the latest and original profiles are printed. If 'LATEST' is specified, only the latest profile is printed. If 'ORIGINAL' is specified, only the original profile is printed. The default behavior is equivalent to specifying 'LATEST' (which is the current behavior before this patch as well). Support for this has only been added to HS2 given that Beeswax is being deprecated soon. The new 'profile' options have no effect when the Beeswax protocol is used. Most of the code change is in impala-hs2-server and impala-server; a lot of the GetRuntimeProfile code has been re-factored. Testing: * Added new impala-shell tests * Ran core tests Change-Id: I89cee02947b311e7bf9c7274f47dfc7214c1bb65 Reviewed-on: http://gerrit.cloudera.org:8080/16406 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-17 20:55:45 +00:00
Adam Tamas	fe6e625747	IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix Although ds_hll_sketch() generates a string value as output, the data is not ASCII-encoded text but a bit sketch. Because of this, when the shell gets this data it disconnects while trying to decode it. The issue can be reproduced with a simple method like using unhex with a wrong input. Example: SELECT unhex("aa"); This patch contains a solution where we replace any characters that are not UTF-8 decodable if we run into a UnicodeDecodeError after fetching it. This solution works with the Thrift 0.9.3 autogenerated gen-py but still fails with Thrift 0.11.0. For Thrift 0.11.0 the error is caught and an error message is sent (not working with beeswax protocol, because it generates a different error (TypeError) which can come for other reasons too). Testing: -manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax' Change-Id: I0c5f1290356e21aed8ca7f896f953541942aed05 Reviewed-on: http://gerrit.cloudera.org:8080/16418 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>	2020-09-05 09:42:46 +00:00
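The replace-on-decode-error idea in IMPALA-10012 can be sketched as follows. This is a minimal illustration under stated assumptions, not the shell's actual code: `decode_lossy` is a hypothetical name, and the byte 0xAA stands in for what `unhex("aa")` would return.

```python
def decode_lossy(raw):
    """Decode bytes as UTF-8, substituting U+FFFD for undecodable bytes.

    Hypothetical helper for illustration only, not impala-shell code.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to a lossy decode instead of failing the whole fetch.
        return raw.decode("utf-8", errors="replace")


# 0xAA is a UTF-8 continuation byte, so it is invalid on its own.
binary_value = b"\xaa"
text_value = "h\u00e9llo".encode("utf-8")

print(repr(decode_lossy(binary_value)))
print(decode_lossy(text_value))
```

The trade-off is that the replacement is lossy: the original bytes cannot be recovered from the displayed string, which is acceptable for terminal output but not for exporting binary data.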
Csaba Ringhofer	b7965d8240	Revert "IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix" This reverts commit `75146c9138`. Change-Id: I57f790389a8c847877999d2b9b8185939b416c07 Reviewed-on: http://gerrit.cloudera.org:8080/16417 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2020-09-04 12:28:56 +00:00
Adam Tamas	75146c9138	IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix Although ds_hll_sketch() generates a string value as output, the data is not ASCII-encoded text but a bit sketch. Because of this, when the shell gets this data it disconnects while trying to decode it. The issue can be reproduced with a simple method like using unhex with a wrong input. Example: SELECT unhex("aa"); This patch contains a solution where we replace any characters that are not UTF-8 decodable if we run into a UnicodeDecodeError after fetching it. This solution works with the Thrift 0.9.3 autogenerated gen-py but still fails with Thrift 0.11.0. For Thrift 0.11.0 the error is caught and an error message is sent (not working with beeswax protocol, because it generates a different error (TypeError) which can come for other reasons too). Testing: -manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax' Change-Id: Ic5cfb907871ca83e5f04a39ca9d7a8e138d711a8 Reviewed-on: http://gerrit.cloudera.org:8080/16305 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2020-09-04 12:18:28 +00:00
Sahil Takiar	ea95691b77	IMPALA-9953: Shell should continue fetching even when 0 rows are returned The Impala shell stops fetching rows if it receives a batch that contains 0 rows. This is incorrect because a batch with 0 rows can be returned if the fetch request hits a timeout. Instead, the shell should rely on the value of has_rows / hasMoreRows to determine when to stop issuing fetch requests. Tests: * Added a regression test to test_shell_commandline.py * Ran all shell tests Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88 Reviewed-on: http://gerrit.cloudera.org:8080/16222 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-07-22 23:28:10 +00:00
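The fetch-loop rule from IMPALA-9953 (stop on the has-more flag, never on an empty batch) can be sketched as below. Everything here is hypothetical scaffolding: `fetch_all`, `make_fake_fetcher`, and the `(rows, has_more)` tuple shape are assumptions for illustration and do not mirror the real client API.

```python
def fetch_all(fetch_batch):
    """Drain a result set, stopping only when the server reports no more rows.

    fetch_batch is a hypothetical callable returning (rows, has_more).
    """
    rows = []
    while True:
        batch, has_more = fetch_batch()
        # An empty batch is not a termination signal: a fetch that hits a
        # timeout can legitimately return 0 rows while more are pending.
        rows.extend(batch)
        if not has_more:
            break
    return rows


def make_fake_fetcher(batches):
    """Build a stand-in for a fetch RPC that replays canned batches."""
    iterator = iter(batches)
    return lambda: next(iterator)


# The second batch simulates a fetch timeout: zero rows, but more to come.
batches = [([1, 2], True), ([], True), ([3], False)]
result = fetch_all(make_fake_fetcher(batches))
print(result)
```

A loop that stopped on the first empty batch would return only the first two rows here and silently drop the third.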
stiga-huang	1fbca6d43b	IMPALA-9569: Fix progress bar and live_summary to show info of the retried query Impala-shell periodically calls GetExecSummary() when the query is queuing or running. If the query is being retried, GetExecSummary() should return the TExecSummary of the retried query. So the progress bar and live_summary can reflect the most recent state. This patch also modifies get_summary() to return retry information in error_logs of TExecSummary. Impala-shell and other clients can print the info right after the query starts being retried. Modified impala-shell to print the retried query link when the retried query is running. Example output when the retried query is running: Query: select count(*) from functional.alltypes where bool_col = sleep(60) Query submitted at: 2020-06-18 22:08:49 (Coordinator: http://quanlong-OptiPlex-BJ:25000) Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=9444fe7f0df0da28:29134b0800000000 Failed due to unreachable impalad(s): quanlong-OptiPlex-BJ:22001 Retrying query using query id: 5748d9a3ccc28ba8:a75e2fab00000000 Retried query link: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=5748d9a3ccc28ba8:a75e2fab00000000 [############################### ] 50% Tests: - Manually verify the progress bar and live_summary work when the query is being retried. - Add tests in test_query_retries.py to validate the get_summary() results. Change-Id: I8f96919f00e0b64d589efd15b6b5ec82fb725d56 Reviewed-on: http://gerrit.cloudera.org:8080/16096 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-06-30 12:11:24 +00:00
stiga-huang	931063f0f2	IMPALA-9213: Add query retry info to GetLog result Beeswax clients use get_log() to retrieve the warning/error message after the query finishes. HS2 clients use GetLog() for the same purpose. This patch adds the retry information into the returned result if the query is retried. So clients that print the log can show the original query failure and the retried query id. This patch also modifies impala-shell to extract the retried query id and print the retried query link. Here's an example of the impala-shell output: Query: select count(*) from functional.alltypes where bool_col = sleep(60) Query submitted at: 2020-06-18 21:23:52 (Coordinator: http://quanlong-OptiPlex-BJ:25000) Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=7944ffee4d81cdd4:e7f9357a00000000 +----------+ | count(*) | +----------+ | 3650 | +----------+ WARNINGS: Original query failed: Failed due to unreachable impalad(s): quanlong-OptiPlex-BJ:22001 Query has been retried using query id: 934b2734f67a1161:a0dbd60200000000 Retried query link: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=934b2734f67a1161:a0dbd60200000000 Tests: - Add tests in test_query_retries.py to verify client logs returned from GetLog(). - Run test_query_retries.py. - Manually run queries in impala-shell and kill impalads. Verify printed messages when the retried queries succeed or fail. Change-Id: I58cf94f91a0b92eb9a3088bee3894ac157a954dc Reviewed-on: http://gerrit.cloudera.org:8080/16093 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-06-30 05:58:02 +00:00
Sahil Takiar	3088ca8580	IMPALA-9818: Add fetch size as option to impala shell Adds the option --fetch_size to the Impala shell. This new option allows users to specify the fetch size used when issuing fetch RPCs to the Impala Coordinator (e.g. TFetchResultsReq and BeeswaxService.fetch). This parameter applies for all client protocols: beeswax, hs2, hs2-http. The default --fetch_size is set to 10240 (10x the default batch size). The new --fetch_size parameter is most effective when result spooling is enabled. When result spooling is disabled, Impala can only return a single row batch per fetch RPC (so 1024 rows by default). When result spooling is enabled, Impala can return up to 100 row batches per fetch request. Removes some logic in the impala_client.py file that attempts to simulate a fetch_size. The code would issue multiple fetch requests to fulfill the given fetch_size. This logic is no longer needed now that result spooling is available. Testing: * Ran core tests * Added new tests in test_shell_client.py and test_shell_commandline.py Change-Id: I8dc7962aada6b38795241d067a99bd94fabca57b Reviewed-on: http://gerrit.cloudera.org:8080/16041 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Sahil Takiar <stakiar@cloudera.com>	2020-06-10 17:46:21 +00:00
David Knupp	6e0085c220	IMPALA-9721: Fix minor python2/3 syntax regression A minor syntax error slipped past in a recent patch. In python3, the syntax for catching exceptions requires the 'as' keyword. This error was missed in code review. Until automated python3 testing is set up, this kind of error is likely to repeat. See IMPALA-9724. Change-Id: I0d36c609a3600c8084efcce0026537227144b27d Reviewed-on: http://gerrit.cloudera.org:8080/15856 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: David Knupp <dknupp@cloudera.com>	2020-05-05 01:21:02 +00:00
Tamas Mate	1a36a0348b	IMPALA-9398: Fix shell history duplication when cmdloop breaks This change adds a new condition to avoid re-reading the impala-shell history when the cmdloop is broken. The loop can break due to exceptions such as KeyboardInterrupt. Testing: - The change was tested manually on local dev env - Added a new EE shell test to verify the history after SIGINT Change-Id: If4faf46134f44d91e56748642f47d448707db53c Reviewed-on: http://gerrit.cloudera.org:8080/15345 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-30 01:55:51 +00:00
Thomas Tauber-Marshall	3a795b5af0	IMPALA-9681: Fix LdapImpalaShellTest LdapImpalaShellTest was broken by IMPALA-3343 which changed the wording of the startup message printed by impala-shell. This patch fixes the test. Change-Id: I9e7070fa6da4085ea858f64141529a7a23a86534 Reviewed-on: http://gerrit.cloudera.org:8080/15777 Reviewed-by: Andrew Sherman <asherman@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-22 17:01:38 +00:00
David Knupp	bc9d7e063d	IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3. This is the main patch for making the impala-shell cross-compatible with python 2 and python 3. The goal is to wind up with a version of the shell that will pass python e2e tests irrespective of the version of python used to launch the shell, under the assumption that the test framework itself will continue to run with python 2.7.x for the time being. Notable changes for reviewers to consider: - With regard to validating the patch, my assumption is that simply passing the existing set of e2e shell tests is sufficient to confirm that the shell is functioning properly. No new tests were added. - A new pytest command line option was added in conftest.py to enable a user to specify a path to an alternate impala-shell executable to test. It's possible to use this to point to an instance of the impala-shell that was installed as a standalone python package in a separate virtualenv. Example usage: USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py The target virtualenv may be based on either python3 or python2. However, this has no effect on the version of python used to run the test framework, which remains tied to python 2.7.x for the foreseeable future. - The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python environment independently from bin/set-pythonpath.sh. The default version of thrift is thrift-0.11.0 (See IMPALA-9489). - The wording of the header changed a bit to include the python version used to run the shell. Starting Impala Shell with no authentication using Python 3.7.5 Opened TCP connection to localhost:21000 ... OR Starting Impala Shell with LDAP-based authentication using Python 2.7.12 Opened TCP connection to localhost:21000 ... - By far, the biggest hassle has been juggling str versus unicode versus bytes data types. 
Python 2.x was fairly loose and inconsistent in how it dealt with strings. As a quick demo of what I mean: Python 2.7.12 (default, Nov 12 2018, 14:36:49) [GCC 5.4.0 20160609] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> d = 'like a duck' >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8') True ...and yet there are weird unexpected gotchas. >>> d.decode('utf-8') == d.encode('utf-8') True >>> d.encode('utf-8') == bytearray(d, 'utf-8') True >>> d.decode('utf-8') == bytearray(d, 'utf-8') # fails the eq property? False As a result, this inconsistency was reflected in the way we handled strings in the impala-shell code, but things still just worked. In python3, there's a much clearer distinction between strings and bytes, and as such, much tighter type consistency is expected by standard libs like subprocess, re, sqlparse, prettytable, etc., which are used throughout the shell. Even simple calls that worked in python 2.x: >>> import re >>> re.findall('foo', b'foobar') ['foo'] ...can throw exceptions in python 3.x: >>> import re >>> re.findall('foo', b'foobar') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall return _compile(pattern, flags).findall(string) TypeError: cannot use a string pattern on a bytes-like object Exceptions like this resulted in many, if not most, shell tests failing under python 3. What ultimately seemed like a better approach was to try to weed out as many existing spurious str.encode() and str.decode() calls as I could, and try to implement what has colloquially been called a "unicode sandwich" -- namely, "bytes on the outside, unicode on the inside, encode/decode at the edges." The primary spot in the shell where we call decode() now is when sanitising input... args = self.sanitise_input(args.decode('utf-8')) ...and also whenever a library like re required it. 
Similarly, str.encode() is primarily used where a library like readline or csv requires it. - PYTHONIOENCODING needs to be set to utf-8 to override the default setting for python 2. Without this, piping or redirecting stdout results in unicode errors. - from __future__ import unicode_literals was added throughout. Testing: To test the changes, I ran the e2e shell tests the way we always do (against the normal build tarball), and then I set up a python 3 virtual env with the shell installed as a package, and manually ran the tests against that. No effort has been made at this point to come up with a way to integrate testing of the shell in a python3 environment into our automated test processes. Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615 Reviewed-on: http://gerrit.cloudera.org:8080/15524 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-18 05:13:50 +00:00
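The "unicode sandwich" described in IMPALA-3343 can be sketched in a few lines of Python 3. This is an illustrative sketch only: `mixing_fails` and `find_in_input` are hypothetical names, and the real shell's sanitise_input() logic is not reproduced here.

```python
import re


def mixing_fails():
    """Return True if a str pattern against bytes raises TypeError (Python 3)."""
    try:
        re.findall("foo", b"foobar")
    except TypeError:
        return True
    return False


def find_in_input(raw_input, pattern):
    """Decode bytes at the edge, then work purely with text inside.

    Hypothetical helper for illustration only, not impala-shell code.
    """
    text = raw_input.decode("utf-8")  # bytes -> text at the boundary
    return re.findall(pattern, text)  # text-only from here on


print(mixing_fails())
print(find_in_input(b"foobar foo", "foo"))
```

Keeping decode and encode calls at the input and output boundaries means library calls in the middle always see a single, consistent string type.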
David Knupp	c26e3db4bd	IMPALA-9362: Upgrade sqlparse 0.1.19 -> 0.3.1 Upgrades the impala-shell's bundled version of sqlparse to 0.3.1. There were some API changes in 0.2.0+ that required a re-write of the StripLeadingCommentFilter in impala_shell.py. A slight perf optimization was also added to avoid using the filter altogether if no leading comment is readily discernible. As 0.1.19 was the last version of sqlparse to support python 2.6, this patch also breaks Impala's compatibility with python 2.6. No new tests were added, but all existing tests passed without modification. Change-Id: I77a1fd5ae311634a18ee04b8c389d8a3f3a6e001 Reviewed-on: http://gerrit.cloudera.org:8080/15642 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-17 05:04:23 +00:00


262 Commits