impala

mirror of https://github.com/apache/impala.git synced 2025-12-22 03:18:15 -05:00

Author	SHA1	Message	Date
wzhou-code	397d1d15a2	IMPALA-10745: Support Kerberos over HTTP for impala-shell This patch ports the implementation of GSSAPI authentication over http transport from Impyla (https://github.com/cloudera/impyla/pull/415) to impala-shell. The implementation adds a new dependency on 'kerberos' python module, which is a pip-installed module distributed under Apache License Version 2. When using impala-shell with Kerberos over http, it is assumed that the host has a preexisting kinit-cached Kerberos ticket that impala-shell can pass to the server automatically without the user having to reenter the password. Testing: - Passed exhaustive tests. - Tested manually on a real cluster with a full Kerberos setup. Change-Id: Ia59ba4004490735162adbd468a00a962165c5abd Reviewed-on: http://gerrit.cloudera.org:8080/18493 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-05-10 03:22:41 +00:00
Steve Carlin	c1f99d1369	IMPALA-11213: Fixed impala-shell strict hs2 mode for large fetches The strict hs2 protocol mode is broken when fetching large results. The FetchResults.hasMoreRows field is always returned as false. When there are no more results, Hive returns an empty batch with no rows. HIVE-26108 has been filed to support the hasMoreRows field. Added a framework test that retrieves 1M rows from tpcds. The default number of rows returned from Hive is 10K so this should be more than enough to ensure that multiple fetches are done. Change-Id: Ife436d91e7fe0c30bf020024e20a5d8ad89faa24 Reviewed-on: http://gerrit.cloudera.org:8080/18370 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2022-04-02 20:30:35 +00:00
Abhishek Rawat	8e755e7571	IMPALA-11126: impala-shell: Support configurable socket timeout for http client In 'hs2-http' mode, the socket timeout is None, which could cause hang like symptoms in case of a problematic remote server. Added support for configurable socket timeout using the new impala-shell config option '--http_socket_timeout_s'. If a reasonable timeout is set, impala-shell client can retry in case of connection issues, when possible. The default value of '--http_socket_timeout_s' is set to None, to prevent behavior changes for existing clients. More details on socket timeout here: https://docs.python.org/3/library/socket.html#socket-timeouts Testing: - Added tests for various timeout values in test_shell_commandline.py - Ran e2e shell tests. Change-Id: I29fa4ff96cdcf154c3aac7e43340af60d7d61e94 Reviewed-on: http://gerrit.cloudera.org:8080/18336 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2022-04-01 16:31:19 +00:00
Steve Carlin	5cfdff03f7	IMPALA-11095: Fix Impala-shell strict_hs2 mode inserts The insert command was broken for impala-shell in the strict_hs2 mode. The return parameter for close_dml should return two parameters. The parameters returned by close_dml are rows returned and error rows. These are not supported by strict hs2 mode since the close does not return the TDmlResult structure. So the message to the end user also had to be changed. Change-Id: Ibe837c99e54d68d1e27b97f0025e17faf0a2cb9f Reviewed-on: http://gerrit.cloudera.org:8080/18176 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2022-02-04 07:42:52 +00:00
Steve Carlin	bb9fb663ce	IMPALA-10778: Allow impala-shell to connect directly to HS2 Impala-shell already uses HS2 protocol to connect to Impalad. This commit allows impala-shell to connect to any server (for example, Hive) using the hs2 protocol. This will be done via the "--strict_hs2_protocol" option. When the "--strict_hs2_protocol" option is turned on, only features supported by hs2 will work. For instance, "runtime-profile" is an impalad specific feature and will be disabled. The "--strict_hs2_protocol" will only work on servers that abide by the strict definition of what is supported by HS2. So one will be able to connect to Hive in this mode, but connections to Impala will not work. Any feature supported by Hive (e.g. kerberos authentication) should work as well. Note: While authentication should work, the test framework is not set up to create an HS2 server that does authentication at this point so this feature should be used with caution. Change-Id: I674a45640a4a7b3c9a577830dbc7b16a89865a9e Reviewed-on: http://gerrit.cloudera.org:8080/17660 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-08-27 09:45:59 +00:00
wzhou-code	2b815cbd51	IMPALA-10784: Add support for retaining cookies in impala-shell IMPALA-10234 added support for cookie authentication for LDAP to impala-shell. But it does not accept user input cookie name via startup flags, and it retains only one cookie. In some scenarios, we could use proxy to manage the sessions with additional HTTP cookies added by proxy. This patch made cookie support more generic for impala-shell. It lets the user specify cookie names via a startup flag "--http_cookie_names" and could retain more than one cookie. Testing: - Manually tested the multiple cookies in HTTP headers with a customized Impala server which could send and receive multiple cookies. - Passed core test, including new test cases. Change-Id: I193422d5ec891886a522d82ecb0e9d974132ff2a Reviewed-on: http://gerrit.cloudera.org:8080/17667 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-07-13 00:07:11 +00:00
Steve Carlin	09454cea79	IMPALA-10750: Impala-shell changes for HS2 compatibility Need some changes to impala-shell to make the client more HS2 compatible, including: - when the fetch returns the bitset containing nulls, the lack of presence of bits means it is not null. Currently it will fail the query. - adding fetchType to TCLIServiceThrift structure (though unused currently in Impala) Also a small refactor was done to put the functionality that retrieves all query options into its own function. Change-Id: Id3a4c4ce8a5d60db136df1743f32dba22172ee13 Reviewed-on: http://gerrit.cloudera.org:8080/17590 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2021-06-24 17:45:27 +00:00
Csaba Ringhofer	f672c315bc	IMPALA-10682: Add buffering to hs2-http client in impala-shell This change reduces the following command from 8.5s to 1.5s on my machine: shell/impala_shell.py -B -q "select * from tpch_parquet.lineitem limit 100000;" --protocol hs2-http > /dev/null This nearly eliminates the speed difference between hs2 and hs2-http. The root cause of the original slowness is the large number of calls to socket.recv(). The query above used to call it 2809090 times, now it is only 9007. Testing: - ran shell tests Change-Id: If11f287be65b10bee2b0afffea118e3dc70fdbbd Reviewed-on: http://gerrit.cloudera.org:8080/17346 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2021-04-28 06:34:34 +00:00
Csaba Ringhofer	94f67a3432	IMPALA-7825: Upgrade Thrift version to 0.11.0 Before this patch Impala mainly used Thrift 0.9.3, but it was possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0 Thrift lib was already included in the toolchain. Most of the changes are related to replacing boost:: with std:: shared_ptr-s in cpp code (this is a continuation of patch by Sahil). The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as Impala's test framework relies on Impyla. A thrift_sasl release is also needed, because it currently pins Thrift version to 0.9.3 for Python 2. The current patch uses alpha releases from Impyla and thrift_sasl that use thrift 0.11.0. Notable side effects: - old logic to compile thrift for impala-shell with 0.11.0 was removed - impala_shell's utf8 handling had to be updated as the new 0.11.0 compilation happens with no_utf8strings. This also made things a bit faster, e.g. the following is ~0.22s instead of ~0.25 shell/impala_shell.py \ -B -q "select * from functional_parquet.alltypes;" > /dev/null - THRIFT-3921 changed the stream operators to print an enum's name instead of its number, leading to slightly different messages in some cases. - "templates" was added to the thrift generator's parameters to avoid a compilation issue (related to IMPALA-10600). I didn't notice any change in compilation time. This option generated .tcc files with templatized readers/writers for Thrift types. Currently we don't use these, but they could potentially speed up (de)serialization. Testing: - ran Impyla's test suite with Python 2 and 3 - ran core tests Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Reviewed-on: http://gerrit.cloudera.org:8080/17170 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-04-27 13:36:54 +00:00
Abhishek Rawat	573c60a298	IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry_msg' referenced before assign ImpalaHS2Client._open_session() has a 'retry_msg' variable which was not initialized in the code-path where retry was disabled. If an exception was hit with retry disabled, a runtime error (UnboundLocalError) was raised. The fix is to initialize 'retry_msg' in the non retry code-path. Testing: - Forced exception in ImpalaHS2Client._open_session() and verified that proper error message was generated. - Ran impala-shell e2e and custom cluster tests. Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9 Reviewed-on: http://gerrit.cloudera.org:8080/17172 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-03-16 01:14:06 +00:00
Attila Jeges	1c72c5a8f9	IMPALA-10234: Add support for cookie authentication to impala-shell IMPALA-8584 added support for cookie authentication to Impala. This change adds cookie authentication support to impala-shell as well when using 'hs2-http' protocol. Testing: - Unit tests were added to test cookie handling methods. - Tested e2e manually with nginx HTTP proxy. TODO: - Test with Knox HTTP proxy as well. Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51 Reviewed-on: http://gerrit.cloudera.org:8080/16660 Reviewed-by: Attila Jeges <attilaj@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-11-17 19:21:19 +00:00
Andrew Sherman	ea13e74497	IMPALA-10309: Use sleep time from a Retry-After header in Impala Shell When Impala Shell receives an http error message (that is a message with http code greater than or equal to 300), it may sleep for a time before retrying. If the message contains a 'Retry-After' header that has an integer value, then this will be used as the time for which to sleep. The implementation is to use a new HttpError exception (similar to that used in Impyla) which includes more information from the error message (including the headers) so that catchers of the exception can use the 'Retry-After' header if appropriate. TESTING: Hand testing with a proxy that uses the 'Retry-After' header. Added new tests that use the fault injection framework in test_hs2_fault_injection.py Change-Id: I2b4226e7723d585d61deb4d1d6777aac901bfd93 Reviewed-on: http://gerrit.cloudera.org:8080/16702 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-11-11 07:08:12 +00:00
Thomas Tauber-Marshall	01e1b4df80	IMPALA-10303: Fix warnings from impala-shell with --quiet When the --quiet flag is used with impala-shell, the intention is that if the query is successful then only the query results should be printed. This patch fixes two cases where --quiet was not being respected: - When using the HTTP transport and --client_connect_timeout_ms is set, a warning is printed that the timeout is not applied. - When running in non-interactive mode, a warning is printed that --live_progress is automatically disabled. This warning is now also only printed if --live_progress is actually set. Testing: - Added a test that runs a simple query with --quiet and confirms the output is as expected. Change-Id: I1e94c9445ffba159725bacd6f6bc36f7c91b88fe Reviewed-on: http://gerrit.cloudera.org:8080/16673 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-30 02:17:29 +00:00
stiga-huang	2a1d3acaf1	IMPALA-9870: impala-shell 'summary' to show original and retried queries This patch extends the 'summary' command of impala-shell to support retrieving the summary of the original query attempt. The new syntax is SUMMARY [ALL | LATEST | ORIGINAL] If 'ALL' is specified, both the latest and original summaries are printed. If 'LATEST' is specified, only the summary of the latest query attempt is printed. If 'ORIGINAL' is specified, only the summary of the original query attempt is printed. The default option is 'LATEST'. Support for this has only been added to HS2 given that Beeswax is being deprecated soon. Tests: - Add new tests in test_shell_interactive.py Change-Id: I8605dd0eb2d3a2f64f154afb6c2fd34251c1fec2 Reviewed-on: http://gerrit.cloudera.org:8080/16502 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-24 05:11:06 +00:00
Attila Jeges	3d067572dd	IMPALA-10224: Add startup flag not to expose debug web url to clients This patch introduces a new startup flag --ping_expose_webserver_url (true by default) to control whether PingImpalaService, PingImpalaHS2Service RPC calls should expose the debug web url to the client or not. This is necessary as the debug web UI is not something that end-users will necessarily have access to. If the flag is set to false, the RPC calls will return an empty string instead of the real url signalling that the debug web ui is not available. Note that if the webserver is disabled (--enable_webserver flag is set to false) the RPC calls will behave the same and return an empty string for the url. Change-Id: I7ec3e92764d712b8fee63c1f45b038c31c184cfc Reviewed-on: http://gerrit.cloudera.org:8080/16573 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-14 14:39:39 +00:00
Sahil Takiar	13f50eaec5	IMPALA-9229: impala-shell 'profile' to show original and retried queries Currently, the impala-shell 'profile' command only returns the profile for the most recent query attempt. There is no way to get the original query profile (the profile of the first query attempt that failed) from the impala-shell. This patch modifies TGetRuntimeProfileReq and TGetRuntimeProfileResp to add support for returning both the original and retried profiles for a retried query. When a query is retried, TGetRuntimeProfileResp currently contains the profile for the most recent query attempt. TGetRuntimeProfileReq has a new field called 'include_query_attempts' and when it is set to true, the TGetRuntimeProfileResp will include all failed profiles in a new field called failed_profiles / failed_thrift_profiles. impala-shell has been modified so the 'profile' command has a new set of options. The syntax is now: PROFILE [ALL | LATEST | ORIGINAL] If 'ALL' is specified, both the latest and original profiles are printed. If 'LATEST' is specified, only the latest profile is printed. If 'ORIGINAL' is specified, only the original profile is printed. The default behavior is equivalent to specifying 'LATEST' (which is the current behavior before this patch as well). Support for this has only been added to HS2 given that Beeswax is being deprecated soon. The new 'profile' options have no effect when the Beeswax protocol is used. Most of the code change is in impala-hs2-server and impala-server; a lot of the GetRuntimeProfile code has been re-factored. Testing: * Added new impala-shell tests * Ran core tests Change-Id: I89cee02947b311e7bf9c7274f47dfc7214c1bb65 Reviewed-on: http://gerrit.cloudera.org:8080/16406 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-17 20:55:45 +00:00
Adam Tamas	fe6e625747	IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix While ds_hll_sketch() generates a string value as output, the data is not ASCII-encoded text but a bitsketch. Because of this, when the shell gets this data it disconnects while trying to decode it. The issue can be reproduced with a simple method like using unhex with a wrong input. Example: SELECT unhex("aa"); This patch contains a solution, where we replace any characters that are not UTF-8 decodable if we run into a UnicodeDecodeError after fetching the data. This solution works with the Thrift 0.9.3 autogenerated gen-py but still fails with Thrift 0.11.0. For Thrift 0.11.0 the error is caught and an error message is sent (not working with beeswax protocol, because it generates a different error (TypeError) which can come for other reasons too). Testing: -manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax' Change-Id: I0c5f1290356e21aed8ca7f896f953541942aed05 Reviewed-on: http://gerrit.cloudera.org:8080/16418 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>	2020-09-05 09:42:46 +00:00
Csaba Ringhofer	b7965d8240	Revert "IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix" This reverts commit `75146c9138`. Change-Id: I57f790389a8c847877999d2b9b8185939b416c07 Reviewed-on: http://gerrit.cloudera.org:8080/16417 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2020-09-04 12:28:56 +00:00
Adam Tamas	75146c9138	IMPALA-10012: ds_hll_sketch() results ascii codec decoding error fix While ds_hll_sketch() generates a string value as output, the data is not ASCII-encoded text but a bitsketch. Because of this, when the shell gets this data it disconnects while trying to decode it. The issue can be reproduced with a simple method like using unhex with a wrong input. Example: SELECT unhex("aa"); This patch contains a solution, where we replace any characters that are not UTF-8 decodable if we run into a UnicodeDecodeError after fetching the data. This solution works with the Thrift 0.9.3 autogenerated gen-py but still fails with Thrift 0.11.0. For Thrift 0.11.0 the error is caught and an error message is sent (not working with beeswax protocol, because it generates a different error (TypeError) which can come for other reasons too). Testing: -manual testing with these protocols: 'hs2-http', 'hs2', 'beeswax' Change-Id: Ic5cfb907871ca83e5f04a39ca9d7a8e138d711a8 Reviewed-on: http://gerrit.cloudera.org:8080/16305 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2020-09-04 12:18:28 +00:00
stiga-huang	931063f0f2	IMPALA-9213: Add query retry info to GetLog result Beeswax clients use get_log() to retrieve the warning/error message after the query finishes. HS2 clients use GetLog() for the same purpose. This patch adds the retry information into the returned result if the query is retried. So clients that print the log can show the original query failure and the retried query id. This patch also modifies impala-shell to extract the retried query id and print the retried query link. Here's an example of the impala-shell output: Query: select count(*) from functional.alltypes where bool_col = sleep(60) Query submitted at: 2020-06-18 21:23:52 (Coordinator: http://quanlong-OptiPlex-BJ:25000) Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=7944ffee4d81cdd4:e7f9357a00000000 +----------+ | count(*) | +----------+ | 3650 | +----------+ WARNINGS: Original query failed: Failed due to unreachable impalad(s): quanlong-OptiPlex-BJ:22001 Query has been retried using query id: 934b2734f67a1161:a0dbd60200000000 Retried query link: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=934b2734f67a1161:a0dbd60200000000 Tests: - Add tests in test_query_retries.py to verify client logs returned from GetLog(). - Run test_query_retries.py. - Manually run queries in impala-shell and kill impalads. Verify printed messages when the retried queries succeed or fail. Change-Id: I58cf94f91a0b92eb9a3088bee3894ac157a954dc Reviewed-on: http://gerrit.cloudera.org:8080/16093 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-06-30 05:58:02 +00:00
Sahil Takiar	3088ca8580	IMPALA-9818: Add fetch size as option to impala shell Adds the option --fetch_size to the Impala shell. This new option allows users to specify the fetch size used when issuing fetch RPCs to the Impala Coordinator (e.g. TFetchResultsReq and BeeswaxService.fetch). This parameter applies for all client protocols: beeswax, hs2, hs2-http. The default --fetch_size is set to 10240 (10x the default batch size). The new --fetch_size parameter is most effective when result spooling is enabled. When result spooling is disabled, Impala can only return a single row batch per fetch RPC (so 1024 rows by default). When result spooling is enabled, Impala can return up to 100 row batches per fetch request. Removes some logic in the impala_client.py file that attempts to simulate a fetch_size. The code would issue multiple fetch requests to fulfill the given fetch_size. This logic is no longer needed now that result spooling is available. Testing: * Ran core tests * Added new tests in test_shell_client.py and test_shell_commandline.py Change-Id: I8dc7962aada6b38795241d067a99bd94fabca57b Reviewed-on: http://gerrit.cloudera.org:8080/16041 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Sahil Takiar <stakiar@cloudera.com>	2020-06-10 17:46:21 +00:00
David Knupp	bc9d7e063d	IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3. This is the main patch for making the impala-shell cross-compatible with python 2 and python 3. The goal is to wind up with a version of the shell that will pass python e2e tests irrespective of the version of python used to launch the shell, under the assumption that the test framework itself will continue to run with python 2.7.x for the time being. Notable changes for reviewers to consider: - With regard to validating the patch, my assumption is that simply passing the existing set of e2e shell tests is sufficient to confirm that the shell is functioning properly. No new tests were added. - A new pytest command line option was added in conftest.py to enable a user to specify a path to an alternate impala-shell executable to test. It's possible to use this to point to an instance of the impala-shell that was installed as a standalone python package in a separate virtualenv. Example usage: USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py The target virtualenv may be based on either python3 or python2. However, this has no effect on the version of python used to run the test framework, which remains tied to python 2.7.x for the foreseeable future. - The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python environment independently from bin/set-pythonpath.sh. The default version of thrift is thrift-0.11.0 (See IMPALA-9489). - The wording of the header changed a bit to include the python version used to run the shell. Starting Impala Shell with no authentication using Python 3.7.5 Opened TCP connection to localhost:21000 ... OR Starting Impala Shell with LDAP-based authentication using Python 2.7.12 Opened TCP connection to localhost:21000 ... - By far, the biggest hassle has been juggling str versus unicode versus bytes data types. 
Python 2.x was fairly loose and inconsistent in how it dealt with strings. As a quick demo of what I mean: Python 2.7.12 (default, Nov 12 2018, 14:36:49) [GCC 5.4.0 20160609] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> d = 'like a duck' >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8') True ...and yet there are weird unexpected gotchas. >>> d.decode('utf-8') == d.encode('utf-8') True >>> d.encode('utf-8') == bytearray(d, 'utf-8') True >>> d.decode('utf-8') == bytearray(d, 'utf-8') # fails the eq property? False As a result, this inconsistency was reflected in the way we handled strings in the impala-shell code, but things still just worked. In python3, there's a much clearer distinction between strings and bytes, and as such, much tighter type consistency is expected by standard libs like subprocess, re, sqlparse, prettytable, etc., which are used throughout the shell. Even simple calls that worked in python 2.x: >>> import re >>> re.findall('foo', b'foobar') ['foo'] ...can throw exceptions in python 3.x: >>> import re >>> re.findall('foo', b'foobar') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall return _compile(pattern, flags).findall(string) TypeError: cannot use a string pattern on a bytes-like object Exceptions like this resulted in many, if not most, shell tests failing under python 3. What ultimately seemed like a better approach was to try to weed out as many existing spurious str.encode() and str.decode() calls as I could, and try to implement what has colloquially been called a "unicode sandwich" -- namely, "bytes on the outside, unicode on the inside, encode/decode at the edges." The primary spot in the shell where we call decode() now is when sanitising input... args = self.sanitise_input(args.decode('utf-8')) ...and also whenever a library like re required it. 
Similarly, str.encode() is primarily used where a library like readline or csv requires it. - PYTHONIOENCODING needs to be set to utf-8 to override the default setting for python 2. Without this, piping or redirecting stdout results in unicode errors. - from __future__ import unicode_literals was added throughout Testing: To test the changes, I ran the e2e shell tests the way we always do (against the normal build tarball), and then I set up a python 3 virtual env with the shell installed as a package, and manually ran the tests against that. No effort has been made at this point to come up with a way to integrate testing of the shell in a python3 environment into our automated test processes. Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615 Reviewed-on: http://gerrit.cloudera.org:8080/15524 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-18 05:13:50 +00:00
Abhishek Rawat	fc784f6e95	IMPALA-9466: impala-shell client retry for hs2-http protocol Added retries for idempotent rpcs: OpenSession, PingImpalaHS2Service, GetResultSetMetadata, CloseImpalaOperation (non dmls), CancelOperation, GetOperationStatus, GetRuntimeProfile, GetExecSummary, GetLog Retries were also added to the 'set all' query execution and subsequent result fetch in the ImpalaHS2Client._open_session() The retries are only supported for hs2-http protocol and enabled by default. At most there are 3 retries for a failed rpc. There is a sleep duration of 'n' seconds after nth retry. Only failed rpcs due to an error in the http transport are retried and if an rpc failed because the server returned an error in the rpc response then such scenarios are not retriable. Improved error diagnostics by dumping stack trace when ImpalaShell._execute_stmt() gets an 'Unknown Exception'. Testing: - Added a custom_cluster test which injects fault into the http transport and checks expected behavior from the various rpcs. Some of these tests leave the session in an open state and so these tests are not suitable for the e2e test framework which have metric verifiers expecting related metrics to be 0 at the end of the test. - Manually tested real world scenarios with impala-shell client communicating with an impala coordinator via a fault injecting istio mesh. - Manually tested dropping connections on an nginx ingress gateway by sending SIGTERM to all worker processes. Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5 Reviewed-on: http://gerrit.cloudera.org:8080/15378 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-03-28 00:32:18 +00:00
David Knupp	ed70492580	IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports A few built-ins were changed in python 3 -- e.g., xrange became range, ConfigParser became configparser, etc. We can redefine some of those things in a single place, and import them from there as needed. Other items may also be added as we go along. Change-Id: Ibd3d86df524666a98cbfa463756adac48bd1f8a3 Reviewed-on: http://gerrit.cloudera.org:8080/15514 Reviewed-by: David Knupp <dknupp@cloudera.com> Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-03-21 19:52:07 +00:00
David Knupp	ed15c2c58f	IMPALA-3343: Part 1 -- Fix simple python 2->3 syntax errors In an effort to keep the work of reviewing the changes more manageable with regard to making the impala-shell python3 compatible, I'm trying to break the patches up into smaller chunks. The first patch is the easiest one -- simply addressing the handful of syntax issues that aren't python 3 compatible, namely changing the print statements to function calls, changing the way we catch exceptions, and adding a few simple branches to work around the removal of such things as dict.iteritems(). We needed the print function imported from __future__ because it allows us to pass in a file descriptor, e.g., sys.stderr. Notably, there's nothing in this patch related to string/bytes/unicode changes from python 2 to 3. Change-Id: I9a515da01ef03d5936cb1a4d9e4bc6d105386b1d Reviewed-on: http://gerrit.cloudera.org:8080/15487 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-03-20 03:10:07 +00:00
Thomas Tauber-Marshall	3fd6f60b22	IMPALA-9414 (part 2): Support the 'Expect: 100-continue' http header The 'Expect: 100-continue' http header allows http clients to send only the headers for their request, get a confirmation back from the server that the headers are valid, and only then send the body of the request, avoiding the overhead of sending large requests that will ultimately fail. This patch adds support for this in the HS2 HTTP server by having THttpServer look for the header, and if it's present and the request is validated returning a '100 Continue' response before reading the body of the request. It also adds support for using this header on large requests sent by impala-shell. Testing: - This case is covered by the existing test_large_sql, however that test was previously broken and passing spuriously. This patch fixes the test. - Passed all other shell tests. Change-Id: I4153968551acd58b25c7923c2ebf75ee29a7e76b Reviewed-on: http://gerrit.cloudera.org:8080/15284 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>	2020-03-13 17:00:42 +00:00
Thomas Tauber-Marshall	c3d65cab55	IMPALA-9414 (part 1): Copy THttpClient from Thrift into Impala This is a prelimary patch that simply copies THttpClient.py from Thrift master into Impala, changes imports as appropriate, and adjusts the formatting from 4 spaces to 2 spaces. This is to allow us to make modifications to THttpClient in future patches. There are no functional changes in this patch. Change-Id: I2662f1d4d455120442ef7c0c198685c07207aeed Reviewed-on: http://gerrit.cloudera.org:8080/15283 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: David Knupp <dknupp@cloudera.com> Tested-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>	2020-03-13 17:00:42 +00:00
Tim Armstrong	0bb056e525	IMPALA-4224: execute separate join builds fragments This enables parallel plans with the join build in a separate fragment and fixes all of the ensuing fallout. After this change, mt_dop plans with joins have separate build fragments. There is still a 1:1 relationship between join nodes and builders, so the builders are only accessed by the join node's thread after it is handed off. This lets us defer the work required to make PhjBuilder and NljBuilder safe to be shared between nodes. Planner changes: * Combined the parallel and distributed planning code paths. * Misc fixes to generate reasonable thrift structures in the query exec requests, i.e. containing the right nodes. * Fixes to resource calculations for the separate build plans. Calculate separate join/build resource consumption. Simplified the resource estimation by calculating resource consumption for each fragment separately, and assuming that all fragments hit their peak resource consumption at the same time. IMPALA-9255 is the follow-on to make the resource estimation more accurate. Scheduler changes: * Various fixes to handle multiple TPlanExecInfos correctly, which are generated by the planner for the different cohorts. * Add logic to colocate build fragments with parent fragments. Runtime filter changes: * Build sinks now produce runtime filters, which required planner and coordinator fixes to handle. DataSink changes: * Close the input plan tree before calling FlushFinal() to release resources. This depends on Send() not holding onto references to input batches, which was true except for NljBuilder. This invariant is documented. Join builder changes: * Add a common base class for PhjBuilder and NljBuilder with functions to handle synchronisation with the join node. * Close plan tree earlier in FragmentInstanceState::Exec() so that peak resource requirements are lower. * The NLJ always copies input batches, so that it can close its input tree. 
JoinNode changes: * Join node blocks waiting for build-side to be ready, then eventually signals that it's done, allowing the builder to be cleaned up. * NLJ and PHJ nodes handle both the integrated builder and the external builder. There is a 1:1 relationship between the node and the builder, so we don't deal with thread safety yet. * Buffer reservations are transferred between the builder and join node when running with the separate builder. This is not really necessary right now, since it is all single-threaded, but will be important for the shared broadcast. - The builder transfers memory for probe buffers to the join node at the end of each build phase. - At end of each probe phase, reservation needs to be handed back to builder (or released). ExecSummary changes: * The summary logic was modified to handle connecting fragments via join builds. The logic is an extension of what was used for exchanges. Testing: * Enable --unlock_mt_dop for end-to-end tests * Migrate some tests to run as part of end-to-end tests instead of custom cluster. * Add mt_dop dimension to various end-to-end tests to provide coverage of join queries, spill-to-disk and cancellation. * Ran a single node TPC-H and TPC-DS stress test with mt_dop=0 and mt_dop=4. Perf: * Ran TPC-H scale factor 30 locally with mt_dop=0. No significant change. Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58 Reviewed-on: http://gerrit.cloudera.org:8080/14859 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-02-20 01:51:54 +00:00
Andrew Sherman	ed5e7dae94	IMPALA-9240: add HTTP code handling to THttpClient. Before this change Impala Shell is not checking HTTP return codes when using the hs2-http protocol. The shell is sending a request message (e.g. send_CloseOperation) but the HTTP call to send this message may fail. This will result in a failure when reading the reply (e.g. in recv_CloseOperation) as there is no reply data to read. This will typically result in an 'EOFError'. In code that overrides THttpClient.flush(), check the HTTP code that is returned after the HTTP call is made. If the code is not 1XX (informational response) or 2XX (successful) then throw an RPCException. This change does not contain any attempt to recover from HTTP failures but it does allow the failure to be detected and a message to be printed. In future it may be possible to retry after certain HTTP errors. Testing: - Add a new test for impala-shell that tries to connect to an HTTP server that always returns a 503 error. Check that an appropriate error message is printed. Change-Id: I3c105f4b8237b87695324d759ffff81821c08c43 Reviewed-on: http://gerrit.cloudera.org:8080/14924 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-12-20 00:14:00 +00:00
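The status check described in this commit message (accept 1XX and 2XX, raise otherwise) can be sketched roughly as follows. This is an illustration only: `RPCException` is defined locally as a stand-in, and `check_http_code` is a hypothetical name, not the function in the actual patch.

```python
class RPCException(Exception):
    """Local stand-in for the shell's RPC error type (assumed shape)."""


def check_http_code(code, message=""):
    """Raise RPCException unless `code` is a 1XX or 2XX HTTP status."""
    # 100-199 are informational responses, 200-299 are successful ones.
    if code < 100 or code >= 300:
        raise RPCException("HTTP code %d: %s" % (code, message))
```

A caller would invoke this right after the HTTP round trip, before trying to read the reply body, so that a 503 surfaces as a readable error rather than an `EOFError`.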
norbert.luksa	2114fc6155	IMPALA-4618: Fixing #Hosts and adding #Instances in exec summary When mt_dop > 0, the summary is reporting the number of fragment instances, instead of the number of hosts as the header would imply. This commit fixes the issue so the number of hosts will be shown under the #Hosts column. The commit also adds an #Inst column where the number of instances are shown (current behaviour). Tests: * Changed profile tests with mt_dop > 0. * Updated benchmark tests and shell tests accordingly. Change-Id: I3bdf9a06d9bd842b2397cd16c28294b6bec7af69 Reviewed-on: http://gerrit.cloudera.org:8080/14715 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-11-26 07:28:23 +00:00
David Knupp	ba808f67dd	IMPALA-1071: Distributable python package for impala-shell The patch adds a set of scripts for converting the impala-shell into a true distributable python package. The package can be installed using familiar python commands, e.g.: $ python setup.py (install\|develop) or $ pip install -e /path/to/dist/dir The entry point script, make_python_package.sh, will run as a part of the standard sequence of steps that results from calling buildall.sh, and will produce a gzipped tarball inside of Impala/shell/dist as an artifact. Thereafter, make_python_package.sh can be run manually any time. The expectation is that an official maintainer would need to manually upload official releases to the Python Package Index as appropriate. Change-Id: Ib8c745bddddf6a16f0c039430152745a2f00e044 Reviewed-on: http://gerrit.cloudera.org:8080/14181 Reviewed-by: David Knupp <dknupp@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-10-10 06:50:50 +00:00
Tim Armstrong	954e810b0e	IMPALA-8932: shell shouldn't retry kerberos over http Change-Id: I5dde277a6a0ddbe5a919bcf376bbc19f0b48e95e Reviewed-on: http://gerrit.cloudera.org:8080/14201 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-09-10 04:25:04 +00:00
Bharath Vissapragada	fafb2c9786	IMPALA-8864: Handle py ssl library incompatibility in http mode Older python versions shipped ssl libraries that did not implement SSLContext class. THttpClient relies on it. This patch, - Fails the shell gracefully when such a python version is used. - Skips the http test dimension when running the test suite on a machine that ships such a python version (centos 6). Change-Id: I28846bde0b8bb8f787e6330cddf91645dba4160e Reviewed-on: http://gerrit.cloudera.org:8080/14069 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2019-08-15 14:01:27 +00:00
Bharath Vissapragada	72c9370856	IMPALA-8717: impala-shell support for HS2 HTTP endpoint Adds impala-shell support to connect to HiveServer2 HTTP endpoint. Relies on toolchain change at https://gerrit.cloudera.org/#/c/13725/. Use --protocol='hs2-http' to enable this behavior. Example usages: --------------- impala-shell --protocol='hs2-http' (No auth) impala-shell --protocol='hs2-http' --ldap -u..... (PLAIN auth) impala-shell --protocol='hs2-http' --ssl --ca_cert... (TLS) impala-shell --protocol='hs2-http' --ldap --ssl --ca_cert... (LDAP + TLS) Limitations: ----------- - Does not support Kerberos (-k) due to lack of SPNEGO support. Testing: -------- - Parameterized existing shell tests to support this combination. - Added shell test coverage for LDAP auth. Change-Id: I8323950857dfe1c1dfd5377fde79f87bc2ce9534 Reviewed-on: http://gerrit.cloudera.org:8080/13746 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>	2019-07-29 05:43:48 +00:00
Tim Armstrong	f1f3ae9ec2	IMPALA-7290: part 2: Add HS2 support to Impala shell HS2 is added as an option via --protocol=hs2. The user-visible differences in behaviour are minimal. Beeswax is still the default and can be explicitly enabled via --protocol=beeswax but will be deprecated. The default is unchanged because changing the default could break certain workflows, e.g. those that explicitly specify the port with -i or deployments that hit --fe_service_threads for HS2 and somehow rely on impala-shell not contributing to that limit. For most workflows the change is transparent and we should change the default in a major version change. This support requires Impala-specific extensions to the HS2 interface, similar to the existing extensions to Beeswax. Thus the HS2 shell is only forwards-compatible with newer Impala versions. I considered trying to gracefully degrade when the new extensions weren't present, but it didn't seem to be worth the ongoing testing effort. Differences between HS2 and Beeswax are abstracted into ImpalaClient subclasses. Here are the changes required to make it work: * Switch to TBinaryProtocolAccelerated to avoid perf regression. The HS2 protocol requires decoding more primitive values (because its not a string-per-row), which was slow with the pure python implementation of TBinaryProtocol. * Added bitarray module to efficiently unpack null indicators * Minimise invasiveness of changes by transposing and stringifying the columnar results into rows in impala_client.py. The transposition needs to happen before display anyway. * Add PingImpalaHS2Service() to get back version string and webserver address. * Add CloseImpalaOperation() extension to return DML row counts. This possibly addresses IMPALA-1789, although we need to confirm that this is a sufficient solution. * Add is_closed member to query handles to avoid shell independently tracking whether the query handle was closed or not. * Include query status in HS2 log to match beeswax. 
* HS2 GetLog() command now includes query status error message for consistency with beeswax. * "set"/"set all" uses the client requests options, not the session default. This captures the effective value of TIMEZONE, which was previously missing. This also requires test changes where the tests set non-default values, e.g. for ABORT_ON_ERROR. * "set all" on the server side returns REMOVED query options - the shell needs to know these so it can correctly ignore them. * Clean up self.orig_cmd/self.last_leading comment argument passing to avoid implicit parameter passing through multiple function calls. * Clean up argument handling in shell tests to consistently pass around lists of arguments instead of strings that are subject to shell tokenisation rules. * Consistently close connections in the shell to avoid leaking HS2 sessions. This is enforced by making ImpalaShell a context manager and also eliminating all sys.exit() calls that would bypass the explicit connection closing. Testing: * Shell tests can run with both protocols * Add tests for formatting of all types and NULL values * Added testing for floating point output formatting, which does change as a result of switching to server-side vs client-side formatting. * Verified that newly-added tests were actually going through HS2 by disabling hs2 on the minicluster and running tests. * Add checks to test_verify_metrics.py to ensure that no sessions are left open at the end of tests. 
Performance: Baseline from beeswax shell for large extract is as follows: $ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null real 0m6.708s user 0m5.132s sys 0m0.204s After this change it is somewhat slower, but we generally don't consider bulk extract performance through the shell to be perf-critical: real 0m7.625s user 0m6.436s sys 0m0.256s Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12 Reviewed-on: http://gerrit.cloudera.org:8080/12884 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-20 10:23:28 +00:00
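The IMPALA-7290 message mentions transposing and stringifying columnar HS2 results into rows and unpacking null indicators. A rough sketch of that idea follows. It assumes each column carries a byte string in which bit `i % 8` of byte `i // 8` marks row `i` as NULL; that layout, the function name `columns_to_rows`, and the use of plain bit arithmetic in place of the bitarray module are assumptions for illustration, not the code in impala_client.py.

```python
def columns_to_rows(columns, null_masks):
    """Turn per-column value lists into a list of stringified rows.

    columns:    list of lists, one inner list of values per column
    null_masks: list of bytes objects, one null bitmap per column (assumed layout)
    """
    def is_null(mask, i):
        byte = i // 8
        # Rows beyond the end of the bitmap are treated as non-NULL.
        return byte < len(mask) and (mask[byte] >> (i % 8)) & 1 == 1

    stringified = []
    for values, mask in zip(columns, null_masks):
        mask = bytearray(mask)  # indexable as ints on Python 2 and 3
        stringified.append(
            ["NULL" if is_null(mask, i) else str(v) for i, v in enumerate(values)])
    # zip(*cols) transposes the column-major data into row-major order.
    return [list(row) for row in zip(*stringified)]
```

For example, `columns_to_rows([[1, 2, 3], ["a", "b", "c"]], [b"\x02", b""])` marks the second value of the first column as NULL and leaves the second column untouched.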
Yongjun Zhang	7cc9092212	IMPALA-5474: Adding a trivial subquery turns error into warning After adding a subquery to a query that fails with ERROR, it fails with WARNING. The fix here makes it return ERROR. Testing: Added unit tests; Done real cluster testing with reported cases. Change-Id: Ibedb11dd3d50bcdb21d508f7d21691925491946e Reviewed-on: http://gerrit.cloudera.org:8080/12022 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2019-01-04 21:51:48 +00:00
Tim Armstrong	d8792c21c5	IMPALA-1048: show sinks in exec summary The exec summary now includes the total time taken and memory consumed by the data sink at the root of each fragment. Previously the exec summary could hide where time and memory went while executing a query. The high-level changes are: * Generalising logic in the exec summary and runtime profile to handle data sinks, not just plan nodes, including adding richer metadata to runtime profile nodes. * Threading through metadata about the data sinks, like names and estimates, so that it can appear in the exec summary. The major potential downside is that the new timings reported for data stream sender can overlap with the receiver's time and potentially cause confusion. [localhost:21000] default> select count(distinct l_comment) from tpch_parquet.lineitem; summary; Query: select count(distinct l_comment) from tpch_parquet.lineitem Query submitted at: 2018-11-20 16:47:03 (Coordinator: http://tarmstrong-box:25000) Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=f5464383a3bb6878:54b5252b00000000 +---------------------------+ \| count(distinct l_comment) \| +---------------------------+ \| 4580667 \| +---------------------------+ Fetched 1 row(s) in 4.53s +---------------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+ \| Operator \| #Hosts \| Avg Time \| Max Time \| #Rows \| Est. #Rows \| Peak Mem \| Est. 
Peak Mem \| Detail \| +---------------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+ \| F02:ROOT \| 1 \| 50.56us \| 50.56us \| \| \| 0 B \| 0 B \| \| \| 06:AGGREGATE \| 1 \| 275.89us \| 275.89us \| 1 \| 1 \| 16.00 KB \| 10.00 MB \| FINALIZE \| \| 05:EXCHANGE \| 1 \| 49.08us \| 49.08us \| 3 \| 1 \| 32.00 KB \| 16.00 KB \| UNPARTITIONED \| \| F01:EXCHANGE SENDER \| 3 \| 100.06us \| 113.49us \| \| \| 16.00 KB \| 0 B \| \| \| 02:AGGREGATE \| 3 \| 19.32ms \| 19.57ms \| 3 \| 1 \| 16.00 KB \| 10.00 MB \| \| \| 04:AGGREGATE \| 3 \| 1.29s \| 1.43s \| 4.58M \| 4.65M \| 98.02 MB \| 62.63 MB \| \| \| 03:EXCHANGE \| 3 \| 241.64ms \| 246.54ms \| 5.01M \| 4.65M \| 9.05 MB \| 10.12 MB \| HASH(l_comment) \| \| F00:EXCHANGE SENDER \| 3 \| 2.43s \| 2.58s \| \| \| 337.53 KB \| 0 B \| \| \| 01:AGGREGATE \| 3 \| 1.26s \| 1.46s \| 5.01M \| 4.65M \| 97.20 MB \| 121.17 MB \| STREAMING \| \| 00:SCAN HDFS \| 3 \| 39.87ms \| 41.36ms \| 6.00M \| 6.00M \| 27.87 MB \| 80.00 MB \| tpch_parquet.lineitem \| +---------------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+ Testing: Added a basic observability test. Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 Reviewed-on: http://gerrit.cloudera.org:8080/11967 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-07 07:33:01 +00:00
aphadke	a18809ba1f	IMPALA-7943: Bump the default client timeout set on impala-shell As part of IMPALA-7555, we added a default socket timeout of 5 seconds when connecting to an impalad. Under heavy load with kerberos and SSL enabled, we could hit this default timeout. This change bumps up the timeout to 60 secs to make the impala-shell more robust. Change-Id: Ifc40069e86cbf93634320804efba003fb5551afe Reviewed-on: http://gerrit.cloudera.org:8080/12051 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-07 06:57:24 +00:00
aphadke	2fb8ebaef2	IMPALA-7555: Set socket timeout in impala-shell impala-shell does not set any socket timeout while connecting to the impala server. This change sets a timeout on the socket before connecting and unsets it back after successfully connecting. The default timeout on this socket is 5 sec. Usage: impala-shell --client_connect_timeout=<value in ms> Testing: 1. Added a test where I create a random listening socket. impala-shell (with ssl enabled) connects to this socket and times out after 2 sec. 2. Created a kerberized impala cluster with ssl enabled and connected to the impalad using an openssl client (block the beeswax server thread to accept new connection) - E.g. - openssl s_client -connect <IP Addr>:21000 Used impala-shell to connect to the same impalad later. impala-shell timed out after the default of 5 sec. I verified it manually. Change-Id: I130fc47f7a83f591918d6842634b4e5787d00813 Reviewed-on: http://gerrit.cloudera.org:8080/11540 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-18 01:41:42 +00:00
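The pattern this commit message describes (set a timeout before connecting, clear it once the connection succeeds) might look like the sketch below. The function name `connect_with_timeout` is hypothetical; only `settimeout` and `connect` are standard socket methods, and the millisecond unit follows the `--client_connect_timeout=<value in ms>` usage quoted above.

```python
def connect_with_timeout(sock, address, timeout_ms):
    """Connect `sock` to `address`, bounding only the connect step.

    sock:       a socket-like object exposing settimeout() and connect()
    address:    (host, port) tuple
    timeout_ms: connect timeout in milliseconds
    """
    # settimeout() takes seconds, so convert from milliseconds.
    sock.settimeout(timeout_ms / 1000.0)
    # Raises a timeout error if the server does not accept in time.
    sock.connect(address)
    # Back to blocking mode so long-running queries are not cut off.
    sock.settimeout(None)
    return sock
```

With a real socket this would be called as `connect_with_timeout(socket.socket(), (host, 21000), 5000)`; if the connect fails, the exception propagates and the timeout is left as set.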
Nghia Le	72db58acd0	IMPALA-6490: Reconnect shell when remote restarts If the remote impalad died while the shell waited for a command to complete, the shell disconnected. Previously after restarting the remote impalad, we needed to run "connect;" to reconnect, now the shell will automatically reconnect. Testing: Added test_auto_connect_after_impalad_died in test_shell_interactive_reconnect.py Change-Id: Ia13365a9696886f01294e98054cf4e7cd66ab712 Reviewed-on: http://gerrit.cloudera.org:8080/10992 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-07-31 21:50:33 +00:00
Vincent Tran	e7d5a25a45	IMPALA-7130: impala-shell -b / --kerberos_host_fqdn flag overrides value passed in via -i / --impalad After additional testing around IMPALA-2782, it was discovered that impala-shell starts the session displaying the expected hostname (as passed -i flag) on the prompt. This gives the impression that the load balancer was bypassed, however the actual TSSLSocket is still created with the hostname passed in via the -b or --kerberos_host_fqdn flag. This change ensures that the hostname used to create the TSSLSocket will always be the one passed in via the -i flag on impala-shell. This change is required by IMPALA-2782. Testing: Using netcat, we verified that the impala daemon host[:port] value passed into the -i/--impalad option is indeed the one impala-shell tries to connect to in both cases (with and without -b) Change-Id: Ibee05bd0dbe8c6ae108b890f0ae0f6900149773a Reviewed-on: http://gerrit.cloudera.org:8080/10580 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-06-17 06:45:38 +00:00
Zoltan Borok-Nagy	2ee914d5b3	IMPALA-5903: Inconsistent specification of result set and result set metadata Before this commit it was quite random which DDL operations returned a result set and which didn't. With this commit, every DDL operation returns a summary of its execution. They declare their result set schema in Frontend.java, and provide the summary in CatalogOpExecutor.java. Updated the tests according to the new behavior. Change-Id: Ic542fb8e49e850052416ac663ee329ee3974e3b9 Reviewed-on: http://gerrit.cloudera.org:8080/9090 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-04-11 02:21:48 +00:00
Vincent Tran	2c1fbecc9f	IMPALA-2782: Allow impala-shell to connect directly to impalad when configured with load balancer and kerberos. This change adds an impala-shell option -b / --kerberos_host_fqdn. This allows user to optionally specify the load-balancer's host so that impala-shell will accept a direct connection to impala daemons in a kerberized cluster. Change-Id: I4726226a7a3817421b133f74dd4f4cf8c52135f9 Reviewed-on: http://gerrit.cloudera.org:8080/7241 Reviewed-by: <andy@phdata.io> Reviewed-by: Philip Zeyliger <philip@cloudera.com> Tested-by: Impala Public Jenkins	2018-03-21 20:45:48 +00:00
Gabor Kaszab	6d9da17288	IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C Issue 1: When query is cancelled via CTRL-C while being executed in Impala-shell then an exception is thrown from Impala backend saying 'Invalid query handle'. This is because one ImpalaClient was making RPCs while another ImpalaClient cancelled the query on the backend. As a result RPC handlers in ImpalaServer try to access a ClientRequestState that had been cleared from the backend. The issue is reliably reproducible both in wait_to_finish and in fetch states of the query. As a solution the query cancellation is indicated to ImpalaClient via a bool flag. Once a cancellation originated exception reaches Impala shell this flag is checked to decide whether to suppress the error or not. Issue 2: Every time a query was cancelled a 'use db' command was issued automatically. This happened for historical reasons but is not needed anymore (see Jira for more details). Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 Reviewed-on: http://gerrit.cloudera.org:8080/8549 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-29 03:44:51 +00:00
Gabor Kaszab	88cb68cfbe	IMPALA-2181: Add query option levels for display Four display levels are introduced for each query option: REGULAR, ADVANCED, DEVELOPMENT and DEPRECATED. When the query options are displayed in Impala shell using SET then only the REGULAR and ADVANCED options are shown. A new command called SET ALL shows all the options grouped by their option levels. When the query options are displayed through the SET SQL statement then the result set would contain an extra column indicating the level of each option. Similarly to Impala shell here the SET command only displays the REGULAR and ADVANCED options while SET ALL shows them all. If the Impala shell connects to an Impala daemon that predates this change then all the options would be displayed in the REGULAR group. Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Reviewed-on: http://gerrit.cloudera.org:8080/8447 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-28 00:31:15 +00:00
Thomas Tauber-Marshall	6757b6235c	IMPALA-5708: Test failure with invalid exec summary For some queries, the exec summary will not be completely filled in even if the query is FINISHED. In particular, the exec_stats field may not be set. This was causing an error in our test code that converts the exec summary to a more usable format. The situation is essentially deterministic for some queries, but it was being hidden by testing code that caught the error and discarded it in most situations, leading to flaky tests. This patch removes the 'try' that was hiding the error and makes the code check for the presence of exec_stats and handle it rather than generating an error. I filed IMPALA-5783 for followup work to be more rigorous about when the exec summary should and shouldn't be fully present. Testing: - Ran the affected tests in a loop and they are no longer flaky. Change-Id: Id52ac62da2b01f9e163e97cbe4590f8db6b663d2 Reviewed-on: http://gerrit.cloudera.org:8080/7627 Tested-by: Impala Public Jenkins Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>	2017-08-14 19:35:12 +00:00
Matthew Jacobs	77a2941a42	IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting Adds support in the shell to report the number of modified rows for all DML operations, as well as the number of rows with errors. Testing: Added shell tests. Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 Reviewed-on: http://gerrit.cloudera.org:8080/5103 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-17 04:13:25 +00:00
Amos Bird	628685ae74	IMPALA-1654: General partition exprs in DDL operations. This commit handles partition related DDL in a more general way. We can now use compound predicates to specify a list of partitions in statements like ALTER TABLE DROP PARTITION and COMPUTE INCREMENTAL STATS, etc. It will also make sure some statements only accept one partition at a time, such as PARTITION SET LOCATION and LOAD DATA. ALTER TABLE ADD PARTITION remains using the old PartitionKeyValue's logic. The changed partition related DDLs are as follows, Table: p (i int) partitioned by (j int, k string) Partitions: +-------+---+-------+--------+------+--------------+-------------------+ \| j \| k \| #Rows \| #Files \| Size \| Bytes Cached \| Cache Replication \| +-------+---+-------+--------+------+--------------+-------------------+ \| 1 \| a \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 1 \| b \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 1 \| c \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 2 \| d \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 2 \| e \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 2 \| f \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| Total \| \| -1 \| 0 \| 0B \| 0B \| \| +-------+---+-------+--------+------+--------------+-------------------+ 1. show files in p partition (j<2, k='a'); 2. alter table p partition (j<2, k in ("b","c") set cached in 'testPool'; // j can appear more than once, 3.1. alter table p partition (j<2, j>0, k<>"d") set uncached; // it is the same as 3.2. alter table p partition (j<2 and j>0, not k="e") set uncached; // we can also do 'or' 3.3. alter table p partition (j<2 or j>0, k like "%") set uncached; // missing 'k' matches all values of k 4. alter table p partition (j<2) set fileformat textfile; 5. alter table p partition (k rlike ".*") set serdeproperties ("k"="v"); 6. alter table p partition (j is not null) set tblproperties ("k"="v"); 7. alter table p drop partition (j<2); 8. 
compute incremental stats p partition(j<2); The remaining old partition related DDLs are as follows, 1. load data inpath '/path/from' into table p partition (j=2, k="d"); 2. alter table p add partition (j=2, k="g"); 3. alter table p partition (j=2, k="g") set location '/path/to'; 4. insert into p partition (j=2, k="g") values (1), (2), (3); General partition expressions or partially specified partition specs allows partition predicates to return empty partition set no matter 'IF EXISTS' is specified. Examples: [localhost.localdomain:21000] > alter table p drop partition (j=2, k="f"); Query: alter table p drop partition (j=2, k="f") +-------------------------+ \| summary \| +-------------------------+ \| Dropped 1 partition(s). \| +-------------------------+ Fetched 1 row(s) in 0.78s [localhost.localdomain:21000] > alter table p drop partition (j=2, k<"f"); Query: alter table p drop partition (j=2, k<"f") +-------------------------+ \| summary \| +-------------------------+ \| Dropped 2 partition(s). \| +-------------------------+ Fetched 1 row(s) in 0.41s [localhost.localdomain:21000] > alter table p drop partition (k="a"); Query: alter table p drop partition (k="a") +-------------------------+ \| summary \| +-------------------------+ \| Dropped 1 partition(s). 
\| +-------------------------+ Fetched 1 row(s) in 0.25s [localhost.localdomain:21000] > show partitions p; Query: show partitions p +-------+---+-------+--------+------+--------------+-------------------+ \| j \| k \| #Rows \| #Files \| Size \| Bytes Cached \| Cache Replication \| +-------+---+-------+--------+------+--------------+-------------------+ \| 1 \| b \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| 1 \| c \| -1 \| 0 \| 0B \| NOT CACHED \| NOT CACHED \| \| Total \| \| -1 \| 0 \| 0B \| 0B \| \| +-------+---+-------+--------+------+--------------+-------------------+ Fetched 3 row(s) in 0.01s Change-Id: I2c9162fcf9d227b8daf4c2e761d57bab4e26408f Reviewed-on: http://gerrit.cloudera.org:8080/3942 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 03:27:36 +00:00
Matthew Jacobs	99ed6dc67a	IMPALA-4134,IMPALA-3704: Kudu INSERT improvements 1.) IMPALA-4134: Use Kudu AUTO FLUSH Improves performance of writes to Kudu up to 4.2x in bulk data loading tests (load 200 million rows from lineitem). 2.) IMPALA-3704: Improve errors on PK conflicts The Kudu client reports an error for every PK conflict, and all errors were being returned in the error status. As a result, inserts/updates/deletes could return errors with thousands of errors reported. This changes the error handling to log all reported errors as warnings and return only the first error in the query error status. 3.) Improve the DataSink reporting of the insert stats. The per-partition stats returned by the data sink weren't useful for Kudu sinks. Firstly, the number of appended rows was not being displayed in the profile. Secondly, the 'stats' field isn't populated for Kudu tables and thus was confusing in the profile, so it is no longer printed if it is not set in the thrift struct. Testing: Ran local tests, including new tests to verify the query profile insert stats. Manual cluster testing was conducted of the AUTO FLUSH functionality, and that testing informed the default mutation buffer value of 100MB which was found to provide good results. Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Reviewed-on: http://gerrit.cloudera.org:8080/4728 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-10-25 02:06:10 +00:00
Thomas Tauber-Marshall	7fad3e5dc3	IMPALA-3002/IMPALA-1473: Cardinality observability cleanup IMPALA-3002: The shell prints an incorrect value for '#Rows' in the exec summary for broadcast nodes due to incorrect logic around whether to use max or agg stats. This patch makes the behavior consistent with the way the be treats exec summaries in summary-util.cc. This incorrect logic was also duplicated in the impala_beeswax test framework. IMPALA-1473: When there is a merging exchange with a limit, we may copy rows into the output batch beyond the limit. In this case, we currently update the output batch's size to reflect the limit, but we also need to update ExecNode::num_rows_returned_ or the exec summary may show that the exchange node returned more rows than it really did. Additionally, PlanFragmentExecutor::GetNext does not update rows_produced_counter_ in some cases, leading the runtime profile to display an incorrect value for 'RowsProduced'. Change-Id: I386719370386c9cff09b8b35d15dc712dc6480aa Reviewed-on: http://gerrit.cloudera.org:8080/4679 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2016-10-15 01:25:51 +00:00

60 Commits