impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Michael Smith	09f15eea78	IMPALA-12517: Decode binary data with Python 3 When impala-shell receives binary data with the HS2 protocol, it uses a stringifier to decode it. In Python 3, 'str' on binary data wraps it in "b'...'"; to get equivalent output to 'str' in Python 2, we need to decode as UTF-8 and handle errors. Adds a test case for how impala-shell formats binary data. Change-Id: I9222cd1ac081a38ab2b37d58628faac0812695ec Reviewed-on: http://gerrit.cloudera.org:8080/20624 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-10-31 22:34:15 +00:00
jasonmfehr	ee300a1af0	IMPALA-12163: Fixes two issues when outputting RPC details. The end time of the exact same rpc call was different between stdout and the rpc details file because the end time was calculated each time the details were written out instead of calculating the end time once and reusing that value. The duration of each rpc call was being calculated incorrectly. Change-Id: Ifd9dec189d0f6fb8713fb1c7b2b6c663e492ef05 Reviewed-on: http://gerrit.cloudera.org:8080/19932 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-05-25 20:51:04 +00:00
Csaba Ringhofer	14035065fa	IMPALA-12145: Fix profiles with non-ascii character in impala-shell (python2) As __future__.unicode_literals is imported in impala-shell concatenating an str with a literal leads to decoding the string with 'ascii' codec which fails if there are non-ascii characters. Converting the literal to str solves the issue. Testing: - added regression test + ran related EE tests Change-Id: I99b72dd262fc7c382e8baee1dce7592880c84de2 Reviewed-on: http://gerrit.cloudera.org:8080/19893 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-05-25 00:33:34 +00:00
Joe McDonnell	451543a2e5	IMPALA-11785: Warn if Thrift fastbinary is not working for impala-shell Thrift's fastbinary module provides native code that accelerates the BinaryProtocol. It can make a large performance difference when using the Hiveserver2 protocol with impala-shell. If the fastbinary is not working, it silently falls back to interpreted code. This can happen because the fastbinary couldn't load a particular library, etc. This adds a warning on impala-shell startup when it detects that Thrift's fastbinary is not working. When bin/impala-shell.sh is modified to use python3, impala-shell outputs this error (shortened for legibility): WARNING: Failed to load Thrift's fastbinary module. Thrift's BinaryProtocol will not be accelerated, which can reduce performance. Error was '{path to Python2 thrift fastbinary.so}: undefined symbol: _Py_ZeroStruct' Testing: - Added a simple test that verifies the impala-shell does not output the warning - Outputs warning when Python 2 thrift used for Python 3 shell Change-Id: Id5d0e5db5cfdf1db4521b00f912b4697a7f646e8 Reviewed-on: http://gerrit.cloudera.org:8080/19806 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-05-23 06:41:02 +00:00
Michael Smith	910c6ecc85	IMPALA-12094: Fix impala shell summary command Fix various quality-of-life issues with the 'summary' command: - update regex to correctly match query ID for handling "Query id ... not found" errors - fail the command rather than exiting the shell when 'summary' is called with an incorrect argument (such as 'summary 1') - provide a useful message rather than print an exception when 'summary original' is invoked with no failed queries Testing: - added new tests for the 'summary' command Change-Id: I7523d45b27e5e63e1f962fb1f6ebb4f0adc85213 Reviewed-on: http://gerrit.cloudera.org:8080/19797 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-04-28 20:51:50 +00:00
Joe McDonnell	0c7c6a335e	IMPALA-11977: Fix Python 3 broken imports and object model differences Python 3 changed some object model methods: - __nonzero__ was removed in favor of __bool__ - func_dict / func_name were removed in favor of __dict__ / __name__ - The next() function was deprecated in favor of __next__ (Code locations should use next(iter) rather than iter.next()) - metaclasses are specified a different way - Locations that specify __eq__ should also specify __hash__ Python 3 also moved some packages around (urllib2, Queue, httplib, etc), and this adapts the code to use the new locations (usually handled on Python 2 via future). This also fixes the code to avoid referencing exception variables outside the exception block and variables outside of a comprehension. Several of these seem like false positives, but it is better to avoid the warning. This fixes these pylint warnings: bad-python3-import eq-without-hash metaclass-assignment next-method-called nonzero-method exception-escape comprehension-escape Testing: - Ran core tests - Ran release exhaustive tests Change-Id: I988ae6c139142678b0d40f1f4170b892eabf25ee Reviewed-on: http://gerrit.cloudera.org:8080/19592 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-03-09 17:17:57 +00:00
Joe McDonnell	eb66d00f9f	IMPALA-11974: Fix lazy list operators for Python 3 compatibility Python 3 changes list operators such as range, map, and filter to be lazy. Some code that expects the list operators to happen immediately will fail. e.g. Python 2: range(0,5) == [0,1,2,3,4] True Python 3: range(0,5) == [0,1,2,3,4] False The fix is to wrap locations with list(). i.e. Python 3: list(range(0,5)) == [0,1,2,3,4] True Since the base operators are now lazy, Python 3 also removes the old lazy versions (e.g. xrange, ifilter, izip, etc). This uses future's builtins package to convert the code to the Python 3 behavior (i.e. xrange -> future's builtins.range). Most of the changes were done via these futurize fixes: - libfuturize.fixes.fix_xrange_with_import - lib2to3.fixes.fix_map - lib2to3.fixes.fix_filter This eliminates the pylint warnings: - xrange-builtin - range-builtin-not-iterating - map-builtin-not-iterating - zip-builtin-not-iterating - filter-builtin-not-iterating - reduce-builtin - deprecated-itertools-function Testing: - Ran core job Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f Reviewed-on: http://gerrit.cloudera.org:8080/19589 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Joe McDonnell	c71de994b0	IMPALA-11952 (part 1): Fix except syntax Python 3 does not support this old except syntax: except Exception, e: Instead, it needs to be: except Exception as e: This uses impala-futurize to fix all locations of the old syntax. Testing: - The check-python-syntax.sh no longer shows errors for except syntax. Change-Id: I1737281a61fa159c8d91b7d4eea593177c0bd6c9 Reviewed-on: http://gerrit.cloudera.org:8080/19551 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-02-28 17:11:50 +00:00
Peter Rozsa	3dd5c9e661	IMPALA-3880: Add list of all tables queried to runtime profile This change adds a new info string to the frontend runtime profile which contains the tables referenced by the query in a comma-separated format. Tests: - Added tests to check if the referenced tables are enumerated correctly - Added test to check if referenced table is filled properly with different DML statements Change-Id: Ib474a5c6522032679701103aa225a18edca62f5a Reviewed-on: http://gerrit.cloudera.org:8080/19401 Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-01-26 16:07:44 +00:00
jasonmfehr	f2f6b4b580	IMPALA-11375 Impala shell outputs details of each RPC When the Impala shell is using the hs2 protocol, it makes multiple RPCs to the Impala daemon. These calls pass Thrift objects back and forth. This change adds the '--show_rpc' flag, which outputs the details of the RPCs to stdout, and the '--rpc_file' flag, which outputs the RPC details to the specified file path. RPC details include: - operation name - request attempt count - Impala session/query ids (if applicable) - call duration - call status (success/failure) - request Thrift objects - response Thrift objects Certain information is not included in the RPC details: - Thrift object attributes named 'secret' or 'password' are redacted. - Thrift objects with a type of TRowSet or TGetRuntimeProfileResp are not included, as the information contained within them is already available in the standard output from the Impala shell. Testing: - Added new tests in the end-to-end test suite. Change-Id: I36f8dbc96726aa2a573133acbe8a558299381f8b Reviewed-on: http://gerrit.cloudera.org:8080/19388 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-01-12 23:31:14 +00:00
wzhou-code	8e350d0a8a	IMPALA-11304: impala-shell make the client retry attempts configurable Currently max tries for connecting to coordinator is hard coded to 4 in hs2-http mode. It's required to make the max tries when connecting to coordinator a configurable option, especially in the environment where coordinator is started slowly. This patch added support for configurable max tries in hs2-http mode using the new impala-shell config option '--connect_max_tries'. The default value of '--connect_max_tries' is set to 4. Testing: - Ran e2e shell tests. - Ran impala-shell with connect_max_tries as 100 before starting impala coordinator daemon, verified that impala-shell connects to coordinator after coordinator daemon was started. Change-Id: I5f7caeb91a69e71a38689785fb1636094295fdb1 Reviewed-on: http://gerrit.cloudera.org:8080/19105 Reviewed-by: Andrew Sherman <asherman@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-10-25 16:07:37 +00:00
Peter Rozsa	81e36d4584	IMPALA-10660: Impala shell prints DOUBLEs with less precision in HS2 than beeswax This change adds a shell option called "hs2_fp_format" which manipulates the print format of floating-point values in HS2. It lets the user specify a Python-based format specification expression (https://docs.python.org/2.7/library/string.html#formatspec) which will get parsed and applied to floating-point column values. The default value is None; in this case the formatting is the same as before this change. This option does not support the Beeswax protocol, because Beeswax converts all of the column values to strings in its response. Tests: command line tests for various formatting options and for invalid formatting option Change-Id: I424339266be66437941be8bafaa83fa0f2dfbd4e Reviewed-on: http://gerrit.cloudera.org:8080/18990 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-09-23 14:34:52 +00:00
Joe McDonnell	bf10341401	IMPALA-11523: Fix test_http_socket_timeout in Docker-based tests When running in the Docker-based tests, TestImpalaShell's test_http_socket_timeout fails with a mismatch in the error message. The test expected "Operation now in progress", but in Docker-based tests it throws "Cannot assign requested address". Since this is testing that a socket timeout of zero gets an error, it seems reasonable to tolerate this extra variant. This modifies the test to allow this error message. Testing: - TestImpalaShell.test_http_socket_timeout passes in the docker-based tests and in a normal core job Change-Id: If463f1100db673bb916b094c1402f1876342c80e Reviewed-on: http://gerrit.cloudera.org:8080/18899 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-08-24 16:15:11 +00:00
yx91490	e00a2cbd81	IMPALA-11380: Fix trailing whitespace for VerticalOutputFormatter Similar to IMPALA-11332, The current VerticalOutputFormatter is stripping trailing whitespaces from the last line of output. This rstrip() was intended to remove an extra newline, but it is matching other white space. This is a problem for a SQL query like: select 'Trailing whitespace '; This changes the rstrip() to rstrip('\n') to avoid removing the other white space. Testing: - Current shell tests pass - Added a shell test that verifies trailing whitespace is not being stripped. Change-Id: Id66162d28498e7bef2933651616cf3df2fb0f354 Reviewed-on: http://gerrit.cloudera.org:8080/18722 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-07-27 04:23:41 +00:00
Michael Smith	64b324ac40	IMPALA-11389: Include Python 3 eggs in tarball Build Python 3 eggs for the shell tarball so it works with both Python 2 and Python 3. The impala-shell script selects eggs based on the available Python version. Inlines thrift for impala-shell so we can easily build Python 2 and Python 3 versions, consistent with other libraries. The impala-shell version should always be at least as new as IMPALA_THRIFT_PY_VERSION. Thrift 0.13.0+ wraps all exceptions during TSocket read/write operations in TTransportException. Specifically socket.error that we got as raw exceptions are now wrapped. Unwraps them before raising to preserve prior behavior. A specific Python version can be selected with IMPALA_PYTHON_EXECUTABLE; otherwise it will use 'python', and if unavailable try 'python3'. Adds tests for impala-shell tarball with Python 3. Change-Id: I94f86de9e2a6303151c2f0e6454b5f629cbc9444 Reviewed-on: http://gerrit.cloudera.org:8080/18653 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-07-14 23:52:04 +00:00
Joe McDonnell	7eb200abf1	IMPALA-11337: Flush row output before writing "Fetched X row(s)" When redirecting stdout and stderr to a file, the existing code can sometimes output the "Fetched X row(s)" line before finishing the row output. e.g. impala-shell -B -q "select 1" >> outfile.txt 2>> outfile.txt The rows output goes to stdout while the control messages like "Fetched X row(s)" go to stderr. Since stdout can buffer output, that can delay the output. This adds a flush for stdout before writing the "Fetched X row(s)" message. Testing: - Added a shell test that redirects stdout and stderr to a file and verifies the contents. This consistently fails without the flush. - Other shell tests pass Change-Id: I83f89c110fd90d2d54331c7121e407d9de99146c Reviewed-on: http://gerrit.cloudera.org:8080/18625 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-15 05:28:05 +00:00
Michael Smith	181fd94068	IMPALA-8373: Test impala-shell with python3 Sets up a python3 virtualenv, installs impala-shell into it, and runs tests. Change-Id: I8e123aecd53a7ded44a7da7eb8c8b853cebbfc56 Reviewed-on: http://gerrit.cloudera.org:8080/18588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-06-13 17:13:42 +00:00
Michael Smith	5263d13112	IMPALA-11314: Test PyPI package with system python Sets up a virtualenv with system python to install the impala-shell PyPI package into. Using system python provides better coverage for Python versions likely to be used by customers. Runs impala-shell tests using the PyPI package to provide better coverage for the artifact customers will use. Includes a PyPI install in notests_independent_targets because these seem to be used for Python testing despite -notests. Change-Id: I384ea6a7dab51945828cca629860400a23fa0c05 Reviewed-on: http://gerrit.cloudera.org:8080/18586 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-06-13 17:13:42 +00:00
yx91490	c7784bde55	IMPALA-1682: Support printing the output of a query (rows) vertically. In vertical mode, impala-shell prints each row in the following format: first a line containing the line number, then the row's columns line by line, each column line starting with its name and a colon. To enable it: use shell option '-E' or '--vertical', or 'set VERTICAL=true' in interactive mode. To disable it in interactive mode: 'set VERTICAL=false'. NOTICE: it will be disabled if the '-B' option or 'set WRITE_DELIMITED=true' is specified. Tests: add methods in test_shell_interactive.py and test_shell_commandline.py. Change-Id: I5cee48d5a239d6b7c0f51331275524a25130fadf Reviewed-on: http://gerrit.cloudera.org:8080/18549 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-13 15:41:07 +00:00
Joe McDonnell	c41e6941ca	IMPALA-11332: Fix trailing whitespace for CSV output The current CSV output is stripping trailing whitespaces from the last line of CSV output. This rstrip() was intended to remove an extra newline, but it is matching other white space. This is a problem for a SQL query like: select 'Trailing whitespace '; This changes the rstrip() to rstrip('\n') to avoid removing the other white space. Testing: - Current shell tests pass - Added a shell test that verifies trailing whitespace is not being stripped. Change-Id: I69d032ca2f581587b0938d0878fdf402fee0d57e Reviewed-on: http://gerrit.cloudera.org:8080/18580 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-02 09:36:50 +00:00
Joe McDonnell	ed0d9341d3	IMPALA-11325: Fix UnicodeDecodeError for shell file output When using the --output_file commandline option for impala-shell, the shell fails with UnicodeDecodeError if the output contains Unicode characters. For example, if running this command: impala-shell -B -q "select '引'" --output_file=output.txt This fails with: UnicodeDecodeError : 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128) This happens due to an encode('utf-8') call happening in OutputStream::write() on a string that is already UTF-8 encoded. This changes the code to skip the encode('utf-8') call for Python 2. Python 3 is using a string and still needs the encode call. This is mostly a pragmatic fix to make the code a little bit more functional, and there is more work to be done to have clear contracts for the format() methods and clear points of conversion to/from bytes. Testing: - Ran shell tests with Python 2 and Python 3 on Ubuntu 18 - Added a shell test that outputs a Unicode character to an output file. Without the fix, this test fails. Change-Id: Ic40be3d530c2694465f7bd2edb0e0586ff0e1fba Reviewed-on: http://gerrit.cloudera.org:8080/18576 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-02 01:53:09 +00:00
Joe McDonnell	6a199be854	IMPALA-11249: Fix add_test_dimensions() locations to call super() The original issue is that the strict HS2 shell tests are not running in precommit or nightly jobs, but they do run in local developer environments. Investigating this showed that the shell tests were running with a weird set of test dimensions that includes table_format_and_file_extension. That dimension is only used in test_insert.py::TestInsertFileExtension. What is happening is that the shell tests and other locations are running add_test_dimensions() without calling super(..., cls).add_test_dimensions(). The behavior is unclear, but there is clearly cross-talk between the different tests that do this. This changes all add_test_dimensions() locations to call super(..., cls).add_test_dimensions() if they don't already. Each location has been tuned to run the same set of tests as before (except the shell tests which now run the strict HS2 tests). As part of this, several shell tests need to be skipped or fixed for strict HS2. Testing: - Ran core job - Ran tests locally to verify the set of tests didn't change. Change-Id: Ib20fd479d3b91ed0ed89a0bc5623cd2a5a458614 Reviewed-on: http://gerrit.cloudera.org:8080/18557 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-05-26 03:42:51 +00:00
Joe McDonnell	0ee5f8084f	IMPALA-11317/IMPALA-11316/IMPALA-11315: impala-shell Python 3 fixes This fixes a few impala-shell Python 3 issues: 1. In ImpalaShell's do_history(), the decode() call needs to be avoided in Python 3, because in Python 3 the cmd is already a string and doesn't need further decoding. (IMPALA-11315) 2. TestImpalaShell.test_http_socket_timeout() gets a different error message in Python 3. It throws the "BlockingIOError" rather than "socket.error". (IMPALA-11316) 3. ImpalaHttpClient.py's code to retrieve the body when handling an HTTP error needs to have a decode() call for the body. Otherwise, the body remains bytes and causes TestImpalaShellInteractive.test_http_interactions_extra() to fail. (IMPALA-11317) Testing: - Ran shell tests in the standard way - Ran shell tests with the impala-shell executable coming from a Python 3 virtualenv using the PyPi package Change-Id: Ie58380a17d7e011f4ce96b27d34717509a0b80a6 Reviewed-on: http://gerrit.cloudera.org:8080/18556 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-05-25 22:47:40 +00:00
Joe McDonnell	a11450db86	IMPALA-11313: Use Thrift 0.14.2 for impala-shell PyPi package Thrift 0.11.0 has known issues where Unicode errors are not handled properly, including one case where the client can hang. The traditional form factor for impala-shell uses a patched Thrift that fixes those issues, but the PyPi package uses the unpatched Thrift 0.11.0. This modifies the requirements.txt file to use Thrift 0.14.2, which has fixes for these Unicode issues. Thrift 0.14.2 has a slightly different error message, so this amends the allowed error messages in test_utf8_decoding_error_handling(). This is a bit awkward, given that the Python code generation continues to happen with Thrift 0.11.0. Comparing the Python code for Thrift 0.11 vs Thrift 0.14, I didn't see noticeable differences. Given that the client can hang, this seems worth fixing ahead of the full conversion to Thrift 0.14 for all of Impala. Testing: - Ran the Unicode error handling tests with a PyPi impala-shell - Ran the shell tests normally Change-Id: I63e0a5dda98df20c9184a347397118b1f3529603 Reviewed-on: http://gerrit.cloudera.org:8080/18560 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-05-24 21:09:27 +00:00
Steve Carlin	c1f99d1369	IMPALA-11213: Fixed impala-shell strict hs2 mode for large fetches The strict hs2 protocol mode is broken when fetching large results. The FetchResults.hasMoreRows field is always returned as false. When there are no more results, Hive returns an empty batch with no rows. HIVE-26108 has been filed to support the hasMoreRows field. Added a framework test that retrieves 1M rows from tpcds. The default number of rows returned from Hive is 10K so this should be more than enough to ensure that multiple fetches are done. Change-Id: Ife436d91e7fe0c30bf020024e20a5d8ad89faa24 Reviewed-on: http://gerrit.cloudera.org:8080/18370 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2022-04-02 20:30:35 +00:00
Abhishek Rawat	8e755e7571	IMPALA-11126: impala-shell: Support configurable socket timeout for http client In 'hs2-http' mode, the socket timeout is None, which could cause hang like symptoms in case of a problematic remote server. Added support for configurable socket timeout using the new impala-shell config option '--http_socket_timeout_s'. If a reasonable timeout is set, impala-shell client can retry in case of connection issues, when possible. The default value of '--http_socket_timeout_s' is set to None, to prevent behavior changes for existing clients. More details on socket timeout here: https://docs.python.org/3/library/socket.html#socket-timeouts Testing: - Added tests for various timeout values in test_shell_commandline.py - Ran e2e shell tests. Change-Id: I29fa4ff96cdcf154c3aac7e43340af60d7d61e94 Reviewed-on: http://gerrit.cloudera.org:8080/18336 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2022-04-01 16:31:19 +00:00
Steve Carlin	5cfdff03f7	IMPALA-11095: Fix Impala-shell strict_hs2 mode inserts The insert command was broken for impala-shell in the strict_hs2 mode. The return parameter for close_dml should return two parameters. The parameters returned by close_dml are rows returned and error rows. These are not supported by strict hs2 mode since the close does not return the TDmlResult structure. So the message to the end user also had to be changed. Change-Id: Ibe837c99e54d68d1e27b97f0025e17faf0a2cb9f Reviewed-on: http://gerrit.cloudera.org:8080/18176 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2022-02-04 07:42:52 +00:00
Steve Carlin	bb9fb663ce	IMPALA-10778: Allow impala-shell to connect directly to HS2 Impala-shell already uses HS2 protocol to connect to Impalad. This commit allows impala-shell to connect to any server (for example, Hive) using the hs2 protocol. This will be done via the "--strict_hs2_protocol" option. When the "--strict_hs2_protocol" option is turned on, only features supported by hs2 will work. For instance, "runtime-profile" is an impalad specific feature and will be disabled. The "--strict_hs2_protocol" will only work on servers that abide by the strict definition of what is supported by HS2. So one will be able to connect to Hive in this mode, but connections to Impala will not work. Any feature supported by Hive (e.g. kerberos authentication) should work as well. Note: While authentication should work, the test framework is not set up to create an HS2 server that does authentication at this point so this feature should be used with caution. Change-Id: I674a45640a4a7b3c9a577830dbc7b16a89865a9e Reviewed-on: http://gerrit.cloudera.org:8080/17660 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-08-27 09:45:59 +00:00
Bikramjeet Vig	a9c8166694	IMPALA-10783: Fixed flakiness in run_and_verify_query_cancellation_test The issue was that when impala-shell is started in a separate process and an error is encountered, the process lingers on, and a long-running query can hold on to resources and potentially affect other tests running on the impala cluster. This patch just makes sure that the impala-shell process is killed regardless of any errors encountered. Change-Id: I9f6d22d639921051cde5675fae1845bedb61c8cc Reviewed-on: http://gerrit.cloudera.org:8080/17768 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-08-17 13:41:26 +00:00
stiga-huang	2dfc68d852	IMPALA-7712: Support Google Cloud Storage This patch adds support for GCS(Google Cloud Storage). Using the gcs-connector, the implementation is similar to other remote FileSystems. New flags for GCS: - num_gcs_io_threads: Number of GCS I/O threads. Defaults to be 16. Follow-up: - Support for spilling to GCS will be addressed in IMPALA-10561. - Support for caching GCS file handles will be addressed in IMPALA-10568. - test_concurrent_inserts and test_failing_inserts in test_acid_stress.py are skipped due to slow file listing on GCS (IMPALA-10562). - Some tests are skipped due to issues introduced by /etc/hosts setting on GCE instances (IMPALA-10563). Tests: - Compile and create hdfs test data on a GCE instance. Upload test data to a GCS bucket. Modify all locations in HMS DB to point to the GCS bucket. Remove some hdfs caching params. Run CORE tests. - Compile and load snapshot data to a GCS bucket. Run CORE tests. Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b Reviewed-on: http://gerrit.cloudera.org:8080/17121 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-03-13 11:20:08 +00:00
Andrew Sherman	3b763b5c32	IMPALA-10447: Add a newline when exporting shell output to a file. Impala shell outputs a batch of rows using OutputStream. Inside OutputStream, output to a file is handled slightly differently from output that is written to stdout. When writing to stdout we use print() (which appends a newline) while when writing to a file we use write() (which adds nothing). This difference was introduced in IMPALA-3343 so this bug may be a regression introduced then. To ensure that output is the same in either case we need to add a newline after writing each batch of rows to a file. TESTING: Added a new test for this case. Change-Id: I078a06c54e0834bc1f898626afbfff4ded579fa9 Reviewed-on: http://gerrit.cloudera.org:8080/16966 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-01-26 08:32:29 +00:00
Tim Armstrong	ab6b7960db	IMPALA-10027: configurable default anonymous user A username can be determined for a session via two mechanisms: * In a secure env, the user is authenticated by LDAP or Kerberos * In an unsecure env, the client specifies the user name, either as a parameter to the OpenSession API (HS2) or as a parameter to the first query run (beeswax) This patch affects what happens if neither of the above mechanisms is used. Previously we would end up with the username being an empty string, but this makes Ranger unhappy. Hive uses the name "anonymous" in this situation, so we change Impala's behaviour too. This is configurable by -anonymous_user_name. -anonymous_user_name= reverts to the old behaviour. Test * Add an end-to-end test that exercises this via impala-shell for HS2, HS2-HTTP and beeswax protocols. * Tweak a couple of existing tests that depended on the previous behavior. Change-Id: I6db491231fa22484aed476062b8fe4c8f69130b0 Reviewed-on: http://gerrit.cloudera.org:8080/16902 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-01-09 00:15:25 +00:00
stiga-huang	cc8ecd0926	IMPALA-10333: Fix utf-8 test failures when impala-shell using older thrift versions In some branches where impala-shell still uses an older version of thrift, e.g. thrift-0.9.3-p8, test_utf8_decoding_error_handling will fail since the internal string representation of thrift versions lower than 0.10.0 is still bytes. Strings won't be decoded to unicode, so there won't be any decoding errors. The test expects bytes that can't be decoded correctly to be replaced with U+FFFD, so it fails. This patch improves the test by also expecting results from older thrift versions, so it can be cherry-picked to older branches. Tests: - Verify the test in master branch and a downstream branch that still uses thrift-0.9.3-p8 in impala-shell. Change-Id: Ieb0baa9b3a1480673af77f7cc35c05eacf4b449f Reviewed-on: http://gerrit.cloudera.org:8080/16767 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-11-23 11:30:48 +00:00
Andrew Sherman	47dbfde0b2	IMPALA-10249: Fix the flaky TestImpalaShell.test_queries_closed test. This test for IMPALA-897 is testing that queries run by Impala Shell from a script file are closed correctly. This is tested by an assertion that there is one in-flight query during execution of a script containing several queries. The test then closes the shell and checks that there are no in-flight queries. This is the assertion which failed. Change this assertion to instead wait for the number of in-flight queries to be zero. This avoids whatever race was causing the flakiness. Change-Id: Ib0485097c34282523ed0df6faa143fee6f74676d Reviewed-on: http://gerrit.cloudera.org:8080/16743 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-11-19 03:19:55 +00:00
stiga-huang	9d6bf35090	IMPALA-10145,IMPALA-10299: Bump impala-shell thrift version to 0.11.0-p4 Currently, impala-shell depends on thrift-0.11.0-p2, while impala servers depend on thrift-0.9.3-p8. After 0.10.0, thrift changes its internal strings representation from bytes to unicode (THRIFT-3503) to support Python3. THRIFT-2087 and THRIFT-5303 are two patches for specifying an error handling method in decoding utf-8 strings in thrift. Without them, impala-shell may get an unexpected UnicodeDecodeError when decoding thrift objects from impala servers. This patch bumps impala-shell's thrift version to 0.11.0-p4 to include these two patches. Tests: - This is a regression after we bump impala-shell's thrift version to 0.11. Added a test to avoid the regression in the future. Change-Id: I0f9898639b5648658efc2d3c5c0ee4721fb85776 Reviewed-on: http://gerrit.cloudera.org:8080/16700 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-11-10 04:25:50 +00:00
Thomas Tauber-Marshall	01e1b4df80	IMPALA-10303: Fix warnings from impala-shell with --quiet When the --quiet flag is used with impala-shell, the intention is that if the query is successful then only the query results should be printed. This patch fixes two cases where --quiet was not being respected: - When using the HTTP transport and --client_connect_timeout_ms is set, a warning is printed that the timeout is not applied. - When running in non-interactive mode, a warning is printed that --live_progress is automatically disabled. This warning is now also only printed if --live_progress is actually set. Testing: - Added a test that runs a simple query with --quiet and confirms the output is as expected. Change-Id: I1e94c9445ffba159725bacd6f6bc36f7c91b88fe Reviewed-on: http://gerrit.cloudera.org:8080/16673 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-30 02:17:29 +00:00
Sahil Takiar	13f50eaec5	IMPALA-9229: impala-shell 'profile' to show original and retried queries Currently, the impala-shell 'profile' command only returns the profile for the most recent query attempt. There is no way to get the original query profile (the profile of the first query attempt that failed) from the impala-shell. This patch modifies TGetRuntimeProfileReq and TGetRuntimeProfileResp to add support for returning both the original and retried profiles for a retried query. When a query is retried, TGetRuntimeProfileResp currently contains the profile for the most recent query attempt. TGetRuntimeProfileReq has a new field called 'include_query_attempts' and when it is set to true, the TGetRuntimeProfileResp will include all failed profiles in a new field called failed_profiles / failed_thrift_profiles. impala-shell has been modified so the 'profile' command has a new set of options. The syntax is now: PROFILE [ALL | LATEST | ORIGINAL] If 'ALL' is specified, both the latest and original profiles are printed. If 'LATEST' is specified, only the latest profile is printed. If 'ORIGINAL' is specified, only the original profile is printed. The default behavior is equivalent to specifying 'LATEST' (which is the current behavior before this patch as well). Support for this has only been added to HS2 given that Beeswax is being deprecated soon. The new 'profile' options have no effect when the Beeswax protocol is used. Most of the code change is in impala-hs2-server and impala-server; a lot of the GetRuntimeProfile code has been re-factored. Testing: * Added new impala-shell tests * Ran core tests Change-Id: I89cee02947b311e7bf9c7274f47dfc7214c1bb65 Reviewed-on: http://gerrit.cloudera.org:8080/16406 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-17 20:55:45 +00:00
Sahil Takiar	ea95691b77	IMPALA-9953: Shell should continue fetching even when 0 rows are returned The Impala shell stops fetching rows if it receives a batch that contains 0 rows. This is incorrect because a batch with 0 rows can be returned if the fetch request hits a timeout. Instead, the shell should rely on the value of has_rows / hasMoreRows to determine when to stop issuing fetch requests. Tests: * Added a regression test to test_shell_commandline.py * Ran all shell tests Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88 Reviewed-on: http://gerrit.cloudera.org:8080/16222 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-07-22 23:28:10 +00:00
Tim Armstrong	6ec6aaae8e	IMPALA-3695: Remove KUDU_IS_SUPPORTED Testing: Ran exhaustive tests. Change-Id: I059d7a42798c38b570f25283663c284f2fcee517 Reviewed-on: http://gerrit.cloudera.org:8080/16085 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-06-18 01:11:18 +00:00
Sahil Takiar	3088ca8580	IMPALA-9818: Add fetch size as option to impala shell Adds the option --fetch_size to the Impala shell. This new option allows users to specify the fetch size used when issuing fetch RPCs to the Impala Coordinator (e.g. TFetchResultsReq and BeeswaxService.fetch). This parameter applies for all client protocols: beeswax, hs2, hs2-http. The default --fetch_size is set to 10240 (10x the default batch size). The new --fetch_size parameter is most effective when result spooling is enabled. When result spooling is disabled, Impala can only return a single row batch per fetch RPC (so 1024 rows by default). When result spooling is enabled, Impala can return up to 100 row batches per fetch request. Removes some logic in the impala_client.py file that attempts to simulate a fetch_size. The code would issue multiple fetch requests to fulfill the given fetch_size. This logic is no longer needed now that result spooling is available. Testing: * Ran core tests * Added new tests in test_shell_client.py and test_shell_commandline.py Change-Id: I8dc7962aada6b38795241d067a99bd94fabca57b Reviewed-on: http://gerrit.cloudera.org:8080/16041 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Sahil Takiar <stakiar@cloudera.com>	2020-06-10 17:46:21 +00:00
Tim Armstrong	c43c03c5ee	IMPALA-3926: part 2: avoid setting LD_LIBRARY_PATH This removes LD_LIBRARY_PATH and LD_PRELOAD from the developer's shell and cleans it up. With the preceding change, toolchain utilities like clang can be run without a special LD_LIBRARY_PATH. This fixes a bug where libjvm.so was registered as a static instead of a shared library, which adds it to the RUNPATH variable in the binary, which provides a default search location that can be overridden by LD_LIBRARY_PATH. Impala binaries don't have the rpath baked in for some libraries, including Impala-lzo, libgcc and libstdc++, so we still need to set LD_LIBRARY_PATH when running those. That is solved with wrapper scripts that set the environment variables only when invoking those binaries, e.g. starting a daemon or running a backend test. I added three scripts because there were 3 sets of environment variables. The scripts are: * run-binary.sh: just sets LD_LIBRARY_PATH * run-jvm-binary.sh: sets LD_LIBRARY_PATH and CLASSPATH * start-daemon.sh: sets LD_LIBRARY_PATH and CLASSPATH and kerberos-related environment variables. The binaries, in almost all cases, work fine without those tweaks, because libstdc++ and libgcc are picked up along with libkuduclient.so from the toolchain (they are in the same directory). I decided to leave good enough alone here. run-binary.sh and friends can be used in any remaining edge cases to run binaries. An alternative to the 3 scripts would be to have an uber-script that set all the variables, but I felt that it was better to be specific about what each binary needed. Cleaning the LD_LIBRARY_PATH mess up has given me a distaste for scattershot setting of environment variables. I am open to revisiting this. Testing: * Ran tests on centos 7 * Manually tested that my dev env with LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu continued to work (for now). All ubuntu 16.04 and 18.04 dev envs that were set up with bootstrap_development.sh will be in this state. Change-Id: I61c83e6cca6debb87a12135e58ee501244bc9603 Reviewed-on: http://gerrit.cloudera.org:8080/14494 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-05-07 08:50:44 +00:00
Tim Armstrong	748e41ab41	IMPALA-9380: async query unregistration This change improves query latency by doing much of the heavyweight work of unregistering a query asynchronously, instead of synchronously on the RPC thread. The biggest win is to move the profile serialization off the RPC thread. Unregistration processing is done by a thread pool with 4 threads by default. This is configurable by --unregistration_thread_pool_size and --unregistration_thread_pool_queue_depth. This fixes a pre-existing bug where a query was temporarily neither in the in-flight queries nor the completed queries. It would be much easier to hit this with async unregistration because there is less synchronisation on the client side. Now the query is briefly in both maps, but this is handled as follows: * All places that look up both the maps will check the in-flight map first, and return a reference to the ClientRequestState, i.e. ignoring the entry in the query log. * The /queries page does not return completed queries if they were found in the in-flight queries map, so avoids duplicate results. The thread safety story changes slightly. Before this change, only one thread could remove the query from the map and close it, with only one thread "winning" the race to remove the ClientRequestState from the map. Since we leave the query in the map while being finalized, we instead use an atomic in ClientRequestState to ensure that only one thread does the finalization. Some misc cleanup was done as a result of these changes: * Fix a pre-existing TSAN race in RuntimeProfile that was revealed by the new concurrent unregister test. * Consolidate the various unknown query handle errors into an error code so that we consistently return the same string. * "Unregister query" should include flushing audit events. Testing: * Add a test that unregisters a query concurrent with other operations. * Ran exhaustive tests Perf: Ran TPC-H 30 with mt_dop=4. No regressions and some improvements: +----------+-----------------------+---------+------------+------------+----------------+ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +----------+-----------------------+---------+------------+------------+----------------+ | TPCH(30) | parquet / none / none | 5.38 | -2.67% | 4.02 | -2.01% | +----------+-----------------------+---------+------------+------------+----------------+ +----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+ | TPCH(30) | TPCH-Q1 | parquet / none / none | 5.36 | 5.17 | +3.61% | 1.82% | 1.17% | 5 | +3.73% | 1.73 | 3.65 | | TPCH(30) | TPCH-Q6 | parquet / none / none | 1.77 | 1.74 | +1.48% | 2.00% | 2.50% | 5 | +2.89% | 0.87 | 1.03 | | TPCH(30) | TPCH-Q12 | parquet / none / none | 3.02 | 3.00 | +0.79% | 2.18% | 2.21% | 5 | +1.55% | 0.00 | 0.57 | | TPCH(30) | TPCH-Q16 | parquet / none / none | 1.65 | 1.64 | +0.81% | 1.35% | 0.03% | 5 | +0.07% | 1.15 | 1.34 | | TPCH(30) | TPCH-Q2 | parquet / none / none | 1.21 | 1.21 | -0.07% | 2.11% | 2.14% | 5 | -0.04% | -0.29 | -0.05 | | TPCH(30) | TPCH-Q4 | parquet / none / none | 2.50 | 2.52 | -0.49% | 2.43% | 3.34% | 5 | -0.09% | -0.29 | -0.27 | | TPCH(30) | TPCH-Q20 | parquet / none / none | 2.86 | 2.90 | -1.28% | 2.30% | 1.24% | 5 | -0.02% | -0.58 | -1.11 | | TPCH(30) | TPCH-Q3 | parquet / none / none | 4.35 | 4.40 | -1.15% | 1.76% | 1.78% | 5 | -1.12% | -0.87 | -1.03 | | TPCH(30) | TPCH-Q19 | parquet / none / none | 4.10 | 4.17 | -1.80% | 1.05% | 1.31% | 5 | -1.25% | -1.73 | -2.40 | | TPCH(30) | TPCH-Q14 | parquet / none / none | 3.20 | 3.25 | -1.52% | 0.79% | 2.56% | 5 | -1.56% | -0.58 | -1.26 | | TPCH(30) | TPCH-Q18 | parquet / none / none | 10.81 | 11.07 | -2.34% | 5.00% | 7.01% | 5 | -1.40% | -0.58 | -0.61 | | TPCH(30) | TPCH-Q7 | parquet / none / none | 11.19 | 11.56 | -3.18% | 3.47% | 6.02% | 5 | -0.90% | -0.87 | -1.03 | | TPCH(30) | TPCH-Q21 | parquet / none / none | 19.91 | 20.32 | -2.02% | 0.66% | 0.47% | 5 | -2.18% | -2.31 | -5.64 | | TPCH(30) | TPCH-Q17 | parquet / none / none | 5.63 | 5.77 | -2.40% | 1.71% | 2.01% | 5 | -1.84% | -1.73 | -2.05 | | TPCH(30) | TPCH-Q5 | parquet / none / none | 3.91 | 4.03 | -2.74% | 1.08% | 1.86% | 5 | -2.45% | -1.44 | -2.88 | | TPCH(30) | TPCH-Q8 | parquet / none / none | 4.55 | 4.71 | -3.48% | 1.90% | 3.53% | 5 | -2.35% | -1.44 | -1.96 | | TPCH(30) | TPCH-Q22 | parquet / none / none | 1.93 | 2.01 | -3.96% | 0.05% | 4.05% | 5 | -2.59% | -2.31 | -2.19 | | TPCH(30) | TPCH-Q10 | parquet / none / none | 4.52 | 4.73 | -4.26% | 1.26% | 2.43% | 5 | -3.40% | -2.02 | -3.51 | | TPCH(30) | TPCH-Q11 | parquet / none / none | 1.02 | 1.05 | -3.58% | 3.94% | 2.36% | 5 | -4.56% | -1.44 | -1.79 | | TPCH(30) | TPCH-Q13 | parquet / none / none | 9.52 | 10.04 | I -5.24% | 2.14% | 0.56% | 5 | I -4.67% | -2.31 | -5.57 | | TPCH(30) | TPCH-Q15 | parquet / none / none | 3.49 | 3.68 | I -5.08% | 0.07% | 0.56% | 5 | I -5.66% | -2.31 | -20.08 | | TPCH(30) | TPCH-Q9 | parquet / none / none | 11.92 | 12.71 | I -6.19% | 0.57% | 3.15% | 5 | I -4.99% | -2.31 | -4.33 | +----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+ Change-Id: I80027b1baeb4ab453938c0f6357b120f4035ba08 Reviewed-on: http://gerrit.cloudera.org:8080/15821 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-05-05 10:12:42 +00:00
David Knupp	bc9d7e063d	IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3. This is the main patch for making the impala-shell cross-compatible with python 2 and python 3. The goal is to wind up with a version of the shell that will pass python e2e tests irrespective of the version of python used to launch the shell, under the assumption that the test framework itself will continue to run with python 2.7.x for the time being. Notable changes for reviewers to consider: - With regard to validating the patch, my assumption is that simply passing the existing set of e2e shell tests is sufficient to confirm that the shell is functioning properly. No new tests were added. - A new pytest command line option was added in conftest.py to enable a user to specify a path to an alternate impala-shell executable to test. It's possible to use this to point to an instance of the impala-shell that was installed as a standalone python package in a separate virtualenv. Example usage: USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py The target virtualenv may be based on either python3 or python2. However, this has no effect on the version of python used to run the test framework, which remains tied to python 2.7.x for the foreseeable future. - The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python environment independently from bin/set-pythonpath.sh. The default version of thrift is thrift-0.11.0 (See IMPALA-9489). - The wording of the header changed a bit to include the python version used to run the shell. Starting Impala Shell with no authentication using Python 3.7.5 Opened TCP connection to localhost:21000 ... OR Starting Impala Shell with LDAP-based authentication using Python 2.7.12 Opened TCP connection to localhost:21000 ... - By far, the biggest hassle has been juggling str versus unicode versus bytes data types. Python 2.x was fairly loose and inconsistent in how it dealt with strings. As a quick demo of what I mean: Python 2.7.12 (default, Nov 12 2018, 14:36:49) [GCC 5.4.0 20160609] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> d = 'like a duck' >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8') True ...and yet there are weird unexpected gotchas. >>> d.decode('utf-8') == d.encode('utf-8') True >>> d.encode('utf-8') == bytearray(d, 'utf-8') True >>> d.decode('utf-8') == bytearray(d, 'utf-8') # fails the eq property? False As a result, this inconsistency was reflected in the way we handled strings in the impala-shell code, but things still just worked. In python3, there's a much clearer distinction between strings and bytes, and as such, much tighter type consistency is expected by standard libs like subprocess, re, sqlparse, prettytable, etc., which are used throughout the shell. Even simple calls that worked in python 2.x: >>> import re >>> re.findall('foo', b'foobar') ['foo'] ...can throw exceptions in python 3.x: >>> import re >>> re.findall('foo', b'foobar') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall return _compile(pattern, flags).findall(string) TypeError: cannot use a string pattern on a bytes-like object Exceptions like this resulted in many, if not most, shell tests failing under python 3. What ultimately seemed like a better approach was to try to weed out as many existing spurious str.encode() and str.decode() calls as I could, and try to implement what has colloquially been called a "unicode sandwich" -- namely, "bytes on the outside, unicode on the inside, encode/decode at the edges." The primary spot in the shell where we call decode() now is when sanitising input... args = self.sanitise_input(args.decode('utf-8')) ...and also whenever a library like re required it. Similarly, str.encode() is primarily used where a library like readline or csv requires it. - PYTHONIOENCODING needs to be set to utf-8 to override the default setting for python 2. Without this, piping or redirecting stdout results in unicode errors. - from __future__ import unicode_literals was added throughout. Testing: To test the changes, I ran the e2e shell tests the way we always do (against the normal build tarball), and then I set up a python 3 virtual env with the shell installed as a package, and manually ran the tests against that. No effort has been made at this point to come up with a way to integrate testing of the shell in a python3 environment into our automated test processes. Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615 Reviewed-on: http://gerrit.cloudera.org:8080/15524 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-18 05:13:50 +00:00
David Knupp	c26e3db4bd	IMPALA-9362: Upgrade sqlparse 0.1.19 -> 0.3.1 Upgrades the impala-shell's bundled version of sqlparse to 0.3.1. There were some API changes in 0.2.0+ that required a re-write of the StripLeadingCommentFilter in impala_shell.py. A slight perf optimization was also added to avoid using the filter altogether if no leading comment is readily discernible. As 0.1.19 was the last version of sqlparse to support python 2.6, this patch also breaks Impala's compatibility with python 2.6. No new tests were added, but all existing tests passed without modification. Change-Id: I77a1fd5ae311634a18ee04b8c389d8a3f3a6e001 Reviewed-on: http://gerrit.cloudera.org:8080/15642 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-17 05:04:23 +00:00
Tim Armstrong	35d2718d36	IMPALA-9547: retry accept in test_shell_commandline This is a point solution to this particular socket.accept() call failing. The more general problem is described in https://www.python.org/dev/peps/pep-0475/ and fixed in Python 3.5. Change-Id: Icc9cab98b059042855ca9149427d079951471be0 Reviewed-on: http://gerrit.cloudera.org:8080/15541 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-03-24 20:31:19 +00:00
Thomas Tauber-Marshall	3fd6f60b22	IMPALA-9414 (part 2): Support the 'Expect: 100-continue' http header The 'Expect: 100-continue' http header allows http clients to send only the headers for their request, get a confirmation back from the server that the headers are valid, and only then send the body of the request, avoiding the overhead of sending large requests that will ultimately fail. This patch adds support for this in the HS2 HTTP server by having THttpServer look for the header and, if it's present and the request is validated, return a '100 Continue' response before reading the body of the request. It also adds support for using this header on large requests sent by impala-shell. Testing: - This case is covered by the existing test_large_sql, however that test was previously broken and passing spuriously. This patch fixes the test. - Passed all other shell tests. Change-Id: I4153968551acd58b25c7923c2ebf75ee29a7e76b Reviewed-on: http://gerrit.cloudera.org:8080/15284 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>	2020-03-13 17:00:42 +00:00
Alice Fan	e1d1428181	IMPALA-9384: Improve Impala shell usability by enabling live_progress in interactive mode In order to improve usability, this patch makes Impala shell show query processing status while the query is running. The patch enables the shell option live_progress by default when a user launches impala shell in interactive mode. The patch also adds a new command line flag "--disable_live_progress", which allows a user to disable live_progress at runtime. In interactive mode, a user can disable live_progress by either using the command line flag or setting the option to False in the config file. In non-interactive mode (when the -q or -f options are used), live reporting is not supported. Impala-shell will disable live_progress if that mode is detected. Testing: - Added and updated tests in test_shell_interactive.py and test_shell_commandline.py - Successfully ran all shell related tests Change-Id: I3765b775f663fa227e59728acffe4d5ea9a5e2d3 Reviewed-on: http://gerrit.cloudera.org:8080/15219 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2020-03-09 21:28:19 +00:00
wzhou-code	66e6879e8c	IMPALA-9346: Fix TestImpalaShell.test_config_file failing issue on CentOS6/Python 2.6 ImpalaShell.test_config_file failed in the negative test case, which ran impala shell with a badly formatted config file containing a wrong option name and a wrong option value. The testing code expects impala shell to return both warning and error messages, but on CentOS6/Python 2.6, impala shell only returns the error message. To fix it, split the test case into two test cases that run impala shell with two different config files. Testing: - Passed all test cases in test_shell_commandline.py and test_shell_interactive.py. - Passed all core tests in pre-review-test. - Passed EE tests in impala-private-parameterized with CentOS6. Change-Id: Ief5e825aa3baead5519132d47efcf0d5300860fd Reviewed-on: http://gerrit.cloudera.org:8080/15139 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-02-04 00:55:54 +00:00
wzhou-code	6a23ec6985	IMPALA-6393: Add support for live_summary and live_progress in impalarc This patch adds support for live_summary and live_progress in impalarc. Testing: 1) Added unit-test cases in test_shell_commandline.py and test_shell_interactive.py for live_summary and live_progress. 2) Successfully ran all other tests in test_shell_interactive.py and test_shell_commandline.py Change-Id: If4549b775a7966ad89d661d0349cc78754e13a86 Reviewed-on: http://gerrit.cloudera.org:8080/14927 Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-01-23 01:48:13 +00:00


147 Commits