Recent testing showed that the pytests are not
respecting the log level and format set in
conftest.py's configure_logging(). Instead, they use
the default log level of WARNING and the
default formatter.
The issue is that logging.basicConfig() is only
effective the first time it is called. The code
in lib/python/impala_py_lib/helpers.py calls
logging.basicConfig() at module level, and
conftest.py imports that file. This
renders the call in configure_logging()
ineffective.
To avoid this type of confusion, logging.basicConfig()
should only be called from main() functions, never at
import time in a library. This patch removes the call in
lib/python/impala_py_lib (which, as a library without a main
function, does not need it). It also fixes up various other
locations by moving the logging.basicConfig() call into the
main() function.
Testing:
- Ran the end to end tests and custom cluster tests
- Confirmed the logging format
- Added an assert in configure_logging() to test that
the INFO log level is applied to the root logger.
Change-Id: I5d91b7f910b3606c50bcba4579179a0bc8c20588
Reviewed-on: http://gerrit.cloudera.org:8080/16679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, the impala-shell 'profile' command only returns the profile
for the most recent query attempt. There is no way to get the original
query profile (the profile of the first query attempt that failed) from
the impala-shell.
This patch modifies TGetRuntimeProfileReq and TGetRuntimeProfileResp to
add support for returning both the original and retried profiles for a
retried query. When a query is retried, TGetRuntimeProfileResp currently
contains the profile for the most recent query attempt.
TGetRuntimeProfileReq has a new field called 'include_query_attempts'
and when it is set to true, the TGetRuntimeProfileResp will include all
failed profiles in a new field called failed_profiles /
failed_thrift_profiles.
impala-shell has been modified so the 'profile' command has a new set of
options. The syntax is now:
PROFILE [ALL | LATEST | ORIGINAL]
If 'ALL' is specified, both the latest and original profiles are
printed. If 'LATEST' is specified, only the latest profile is printed.
If 'ORIGINAL' is specified, only the original profile is printed. The
default behavior is equivalent to specifying 'LATEST' (which is the
current behavior before this patch as well).
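For example, after a query has been transparently retried, typing the
following at the shell prompt prints both the latest and the original
profile (illustrative usage):

profile all;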
Support for this has only been added to HS2 given that Beeswax is being
deprecated soon. The new 'profile' options have no effect when the
Beeswax protocol is used.
Most of the code change is in impala-hs2-server and impala-server; a lot
of the GetRuntimeProfile code has been refactored.
Testing:
* Added new impala-shell tests
* Ran core tests
Change-Id: I89cee02947b311e7bf9c7274f47dfc7214c1bb65
Reviewed-on: http://gerrit.cloudera.org:8080/16406
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Many http servers will not accept an http request that has multiple
copies of the "Host" header. A recent toolchain change patches
Thrift so that it will not send the extraneous header (in THttpClient).
This change tests that the duplicate headers are not sent.
TESTING:
Ran all end-to-end tests.
Rewrote an existing Shell test to check that only one "Host" header
is sent.
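For reference, the kind of check involved looks roughly like this
(illustrative sketch against a dummy endpoint, not the actual test, which
drives the real impala-shell):

from http.server import BaseHTTPRequestHandler, HTTPServer

class SingleHostHeaderCheck(BaseHTTPRequestHandler):
  def do_POST(self):
    # get_all() returns one entry per occurrence of the header, so a
    # well-behaved THttpClient yields exactly one 'Host' value here.
    hosts = self.headers.get_all('Host') or []
    assert len(hosts) == 1, 'expected one Host header, got %d' % len(hosts)
    self.send_response(200)
    self.end_headers()

# e.g. HTTPServer(('localhost', 0), SingleHostHeaderCheck).handle_request()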
Change-Id: I82996015d0205923e854dac8bb88604778684c46
Reviewed-on: http://gerrit.cloudera.org:8080/15752
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This removes LD_LIBRARY_PATH and LD_PRELOAD from the
developer's shell and cleans it up. With the preceding
change, toolchain utilities like clang can be run without
a special LD_LIBRARY_PATH.
This fixes a bug where libjvm.so was registered as a
static instead of a shared library, which adds it to the
RUNPATH variable in the binary, which provides a default
search location that can be overridden by LD_LIBRARY_PATH.
Impala binaries don't have the rpath baked in for some
libraries, including Impala-lzo, libgcc and libstdc++,
so we still need to set LD_LIBRARY_PATH when running
those. That is solved with wrapper scripts that set
the environment variables only when invoking those
binaries, e.g. starting a daemon or running a backend
test. I added three scripts because there were 3 sets
of environment variables. The scripts are:
* run-binary.sh: just sets LD_LIBRARY_PATH
* run-jvm-binary.sh: sets LD_LIBRARY_PATH and CLASSPATH
* start-daemon.sh: sets LD_LIBRARY_PATH and CLASSPATH and
kerberos-related environment variables.
The binaries, in almost all cases, work fine without
those tweaks, because libstdc++ and libgcc are picked
up along with libkuduclient.so from the toolchain (they
are in the same directory). I decided to leave good enough
alone here. run-binary.sh and friends can be used in
any remaining edge cases to run binaries.
An alternative to the 3 scripts would be to have an
uber-script that set all the variables, but I felt
that it was better to be specific about what
each binary needed. Cleaning up the LD_LIBRARY_PATH
mess has given me a distaste for scattershot
setting of environment variables. I am open to
revisiting this.
Testing:
* Ran tests on centos 7
* Manually tested that my dev env with
LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu continued
to work (for now). All ubuntu 16.04 and 18.04 dev
envs that were set up with bootstrap_development.sh
will be in this state.
Change-Id: I61c83e6cca6debb87a12135e58ee501244bc9603
Reviewed-on: http://gerrit.cloudera.org:8080/14494
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the main patch for making the impala-shell cross-compatible with
python 2 and python 3. The goal is to wind up with a version of the shell that
will pass python e2e tests irrespective of the version of python used to launch the
shell, under the assumption that the test framework itself will continue to run
with python 2.7.x for the time being.
Notable changes for reviewers to consider:
- With regard to validating the patch, my assumption is that simply passing
the existing set of e2e shell tests is sufficient to confirm that the shell
is functioning properly. No new tests were added.
- A new pytest command line option was added in conftest.py to enable a user
to specify a path to an alternate impala-shell executable to test. It's
possible to use this to point to an instance of the impala-shell that was
installed as a standalone python package in a separate virtualenv.
Example usage:
USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py
The target virtualenv may be based on either python3 or python2. However,
this has no effect on the version of python used to run the test framework,
which remains tied to python 2.7.x for the foreseeable future.
- The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python
environment independently from bin/set-pythonpath.sh. The default version
of thrift is thrift-0.11.0 (See IMPALA-9489).
- The wording of the header changed a bit to include the python version
used to run the shell.
Starting Impala Shell with no authentication using Python 3.7.5
Opened TCP connection to localhost:21000
...
OR
Starting Impala Shell with LDAP-based authentication using Python 2.7.12
Opened TCP connection to localhost:21000
...
- By far, the biggest hassle has been juggling str versus unicode versus
bytes data types. Python 2.x was fairly loose and inconsistent in
how it dealt with strings. As a quick demo of what I mean:
Python 2.7.12 (default, Nov 12 2018, 14:36:49)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d = 'like a duck'
>>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
True
...and yet there are weird unexpected gotchas.
>>> d.decode('utf-8') == d.encode('utf-8')
True
>>> d.encode('utf-8') == bytearray(d, 'utf-8')
True
>>> d.decode('utf-8') == bytearray(d, 'utf-8') # fails the eq property?
False
As a result, this inconsistency was reflected in the way we handled
strings in the impala-shell code, but things still just worked.
In python3, there's a much clearer distinction between strings and bytes, and
as such, much tighter type consistency is expected by standard libs like
subprocess, re, sqlparse, prettytable, etc., which are used throughout the
shell. Even simple calls that worked in python 2.x:
>>> import re
>>> re.findall('foo', b'foobar')
['foo']
...can throw exceptions in python 3.x:
>>> import re
>>> re.findall('foo', b'foobar')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object
Exceptions like this resulted in many, if not most, shell tests failing
under python 3.
What ultimately seemed like a better approach was to try to weed out as many
existing spurious str.encode() and str.decode() calls as I could, and try to
implement what has colloquially been called a "unicode sandwich" -- namely,
"bytes on the outside, unicode on the inside, encode/decode at the edges."
The primary spot in the shell where we call decode() now is when sanitising
input...
args = self.sanitise_input(args.decode('utf-8'))
...and also whenever a library like re required it. Similarly, str.encode()
is primarily used where a library like readline or csv requires it (see
the sketch after this list).
- PYTHONIOENCODING needs to be set to utf-8 to override the default setting for
python 2. Without this, piping or redirecting stdout results in unicode errors.
- from __future__ import unicode_literals was added throughout
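A rough illustration of that "unicode sandwich" shape (simplified sketch
with an illustrative helper, not the actual shell code):

def handle_line(raw_bytes):
  # bytes on the outside: decode once at the edge...
  text = raw_bytes.decode('utf-8')
  # ...unicode on the inside: all processing works on text...
  text = text.strip().rstrip(';')
  # ...and encode once at the edge on the way back out.
  return (text + '\n').encode('utf-8')

# handle_line(b'select 1;') == b'select 1\n' under both python 2 and 3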
Testing:
To test the changes, I ran the e2e shell tests the way we always do (against
the normal build tarball), and then I set up a python 3 virtual env with the
shell installed as a package, and manually ran the tests against that.
No effort has been made at this point to come up with a way to integrate
testing of the shell in a python3 environment into our automated test
processes.
Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615
Reviewed-on: http://gerrit.cloudera.org:8080/15524
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds impala-shell support to connect to HiveServer2 HTTP endpoint.
Relies on toolchain change at https://gerrit.cloudera.org/#/c/13725/.
Use --protocol='hs2-http' to enable this behavior.
Example usages:
---------------
impala-shell --protocol='hs2-http' (No auth)
impala-shell --protocol='hs2-http' --ldap -u..... (PLAIN auth)
impala-shell --protocol='hs2-http' --ssl --ca_cert... (TLS)
impala-shell --protocol='hs2-http' --ldap --ssl --ca_cert... (LDAP +
TLS)
Limitations:
-----------
- Does not support Kerberos (-k) due to lack of SPNEGO support.
Testing:
--------
- Parameterized existing shell tests to support this combination.
- Added shell test coverage for LDAP auth.
Change-Id: I8323950857dfe1c1dfd5377fde79f87bc2ce9534
Reviewed-on: http://gerrit.cloudera.org:8080/13746
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Apparently IMPALA_REMOTE_URL is not generally used for remote cluster
tests: only --testing_remote_cluster is reliably set. Fix the
is_remote_cluster() implementation to take into account
REMOTE_DATA_LOAD and --testing_remote_cluster in addition to
IMPALA_REMOTE_URL. Consistently use is_remote_cluster() in
other tests instead of checking the pytest flag directly.
There were a few lifecycle headaches with how
ImpalaTestClusterProperties is used:
* common.environ is imported from conftest, which means that
the top-level code in the file runs *before* pytest
command-line arguments have been registered and parsed.
* ImpalaTestClusterProperties is used by various code,
like build_flavor_timeout(), which runs before pytest
command-line arguments have been parsed.
* ImpalaTestClusterProperties is called from non-pytest
scripts like start-impala-cluster.py, so the command-line
arguments are not available.
I dealt with the above challenges by making a few changes
to do the detection later:
* Lazily initializing a singleton ImpalaTestClusterProperties (see the
sketch after this list). This was not strictly necessary but makes the
whole problem less sensitive to import order and module dependencies.
* Adding cluster_properties fixture to make ImpalaTestClusterProperties
available in tests without additional boilerplate.
* Removing the caching of the local/remote build calculation.
ImpalaTestClusterProperties is instantiated outside of python
tests, but is_remote_cluster() is only called from python tests,
so if we check flags in is_remote_cluster() we'll get the
right results reliably.
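Roughly, the lazy singleton and the fixture look like this (simplified
sketch; the helper name is illustrative, and ImpalaTestClusterProperties
is the existing class in common.environ):

import pytest

_cluster_properties = None

def get_cluster_properties():
  # Construct on first use, so pytest command-line options and the
  # environment have already been processed by the time we read them.
  global _cluster_properties
  if _cluster_properties is None:
    _cluster_properties = ImpalaTestClusterProperties()
  return _cluster_properties

@pytest.fixture
def cluster_properties():
  # conftest.py exposes the singleton to tests without extra boilerplate.
  return get_cluster_properties()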
As a workaround to unblock remote tests, also assume catalog_v1 if
accessing the web UI fails.
Testing:
Ran core tests against a regular minicluster.
Ran tests against a remote cluster
Change-Id: Ifa6b2a1391f53121d3d7c00c5cf0a57590899ce4
Reviewed-on: http://gerrit.cloudera.org:8080/13386
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HS2 is added as an option via --protocol=hs2. The user-visible
differences in behaviour are minimal. Beeswax is still the
default and can be explicitly enabled via --protocol=beeswax
but will be deprecated. The default is unchanged because
changing the default could break certain workflows, e.g.
those that explicitly specify the port with -i or deployments
that hit --fe_service_threads for HS2 and somehow rely on
impala-shell not contributing to that limit. For most
workflows the change is transparent and we should change
the default in a major version change.
This support requires Impala-specific extensions to
the HS2 interface, similar to the existing extensions
to Beeswax. Thus the HS2 shell is only
forwards-compatible with newer Impala versions.
I considered trying to gracefully degrade when the
new extensions weren't present, but it didn't seem to be
worth the ongoing testing effort.
Differences between HS2 and Beeswax are abstracted into
ImpalaClient subclasses.
Here are the changes required to make it work:
* Switch to TBinaryProtocolAccelerated to avoid perf
regression. The HS2 protocol requires decoding
more primitive values (because it's not a string-per-row),
which was slow with the pure python implementation of
TBinaryProtocol.
* Added bitarray module to efficiently unpack null indicators
* Minimise invasiveness of changes by transposing and stringifying
the columnar results into rows in impala_client.py (see the sketch
after this list). The transposition needs to happen before display anyway.
* Add PingImpalaHS2Service() to get back version string and webserver
address.
* Add CloseImpalaOperation() extension to return DML row counts. This
possibly addresses IMPALA-1789, although we need to confirm that
this is a sufficient solution.
* Add is_closed member to query handles to avoid shell independently
tracking whether the query handle was closed or not.
* Include query status in HS2 log to match beeswax.
* HS2 GetLog() command now includes query status error message for
consistency with beeswax.
* "set"/"set all" uses the client request's options, not the session
default. This captures the effective value of TIMEZONE, which
was previously missing. This also requires test changes where
the tests set non-default values, e.g. for ABORT_ON_ERROR.
* "set all" on the server side returns REMOVED query options - the
shell needs to know these so it can correctly ignore them.
* Clean up self.orig_cmd/self.last_leading_comment argument
passing to avoid implicit parameter passing through multiple
function calls.
* Clean up argument handling in shell tests to consistently pass
around lists of arguments instead of strings that are subject
to shell tokenisation rules.
* Consistently close connections in the shell to avoid leaking
HS2 sessions. This is enforced by making ImpalaShell a context
manager and also eliminating all sys.exit() calls that would
bypass the explicit connection closing.
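The transposition itself is essentially this (simplified sketch; the real
code also stringifies values and fills in NULLs from the decoded null
indicators):

def columns_to_rows(columns):
  # columns: one list of values per output column, all the same length.
  # zip(*columns) walks the columns in lockstep, yielding one row at a time.
  return [list(row) for row in zip(*columns)]

# columns_to_rows([[1, 2], ['a', 'b']]) == [[1, 'a'], [2, 'b']]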
Testing:
* Shell tests can run with both protocols
* Add tests for formatting of all types and NULL values
* Added testing for floating point output formatting, which does
change as a result of switching to server-side vs client-side
formatting.
* Verified that newly-added tests were actually going through HS2
by disabling hs2 on the minicluster and running tests.
* Add checks to test_verify_metrics.py to ensure that no sessions
are left open at the end of tests.
Performance:
Baseline from beeswax shell for large extract is as follows:
$ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null
real 0m6.708s
user 0m5.132s
sys 0m0.204s
After this change it is somewhat slower, but we generally don't consider
bulk extract performance through the shell to be perf-critical:
real 0m7.625s
user 0m6.436s
sys 0m0.256s
Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12
Reviewed-on: http://gerrit.cloudera.org:8080/12884
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, impalarc files can be specified on a per-user basis
(stored in ~/.impalarc), and they aren't created by default. The
Impala shell should pick up /etc/impalarc as well, in addition
to the user-specific configurations.
The intent here is to allow a "global" configuration of the shell
by a system administrator. The default path of the global config
file can be changed by setting the $IMPALA_SHELL_GLOBAL_CONFIG_FILE
environment variable.
Note that the options set in the user config file take precedence
over those in the global config file.
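The precedence boils down to "read the global file first, then let the user
file win", roughly (sketch only; error handling and option parsing omitted):

import os
try:
  from configparser import ConfigParser   # python 3
except ImportError:
  from ConfigParser import ConfigParser   # python 2

def load_impalarc():
  global_path = os.environ.get('IMPALA_SHELL_GLOBAL_CONFIG_FILE',
                               '/etc/impalarc')
  user_path = os.path.expanduser('~/.impalarc')
  parser = ConfigParser()
  # read() silently skips missing files; later files win, so the user's
  # settings override anything set in the global file.
  parser.read([global_path, user_path])
  return parser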
Change-Id: I3a3179b6d9c9e3b2b01d6d3c5847cadb68782816
Reviewed-on: http://gerrit.cloudera.org:8080/13313
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
shell/make_shell_tarball.sh builds a tarball with all the
shell dependencies bundled. We should test the contents of
that tarball in the shell tests instead of using infra/python/env
and the libraries bundled there.
This tarball is one of the default targets (e.g. run by buildall.sh) so
this should not affect any typical development workflows.
Note that this means the shell tests now require the shell tarball to
be built locally, which doesn't necessarily happen for remote cluster
tests, so we preserve the old behaviour in that case.
Testing:
Ran core tests on CentOS 6 and CentOS 7.
Change-Id: I581363639b279a9c2ff1fd982bdb140260b24baa
Reviewed-on: http://gerrit.cloudera.org:8080/13267
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This sets up the tests to be extensible for testing the shell
in both beeswax and HS2 modes.
Testing:
* Add test dimension containing only beeswax in preparation
for HS2 dimension.
* Factor out hardcoded ports.
* Add tests for formatting of all types and NULL values.
* Merge date shell test into general type tests.
* Added testing for floating point output formatting, which does
change as a result of switching to server-side vs client-side
formatting.
* Use unique_database for tests that create tables.
Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412
Reviewed-on: http://gerrit.cloudera.org:8080/13083
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_ssl has logic that waits for the number of in-flight queries to
be 1. However, the logic for wait_for_num_in_flight_queries(1) only
waits for the condition to be true for a period of time and does not
throw an exception when the time has elapsed and the condition is not
met. In other words, the logic in test_ssl that loops while the number
of in-flight queries is 1 never gets executed. I was able to simulate
this issue by making Impala shell take much longer to start.
Prior to this patch, in the event that Impala shell took much longer to
start, the test started sending the commands to Impala shell even when
Impala shell was not ready to receive commands. The patch fixes the
issue by waiting until Impala shell is connected. The patch also adds
asserts in other places that call wait_for_num_in_flight_queries and
updates the default behavior for Impala shell to wait until it is
connected.
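The general shape of the fix is to never let a wait helper time out
silently (illustrative helper, not the actual test utility):

import time

def wait_for(condition, timeout_s=60, interval_s=0.5):
  # Poll until condition() is true. Returns False on timeout instead of
  # raising, so the caller must check (or assert) the result.
  deadline = time.time() + timeout_s
  while time.time() < deadline:
    if condition():
      return True
    time.sleep(interval_s)
  return False

# The point is to never ignore that return value, e.g.:
# assert wait_for(lambda: impalad.get_num_in_flight_queries() == 1)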
Testing:
- Ran core and exhaustive tests several times on CentOS 6 without any
issue
Change-Id: I9805269d8b806aecf5d744c219967649a041d49f
Reviewed-on: http://gerrit.cloudera.org:8080/12047
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_cancellation runs a shell process, executes a query, sleeps,
sends a sigint to the process, and then checks that the query is
cancelled. If the sigint is sent prior to the shell installing its
signal handler, the test can fail with a KeyboardInterrupt.
This patch removes the reliance on the sleep being long enough by
actually reading the output of the shell and only cancelling the
query once the shell shows that it has started running.
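Conceptually the test now does something like this (sketch; the command
line and the marker string are placeholders, not the real ones):

import signal
import subprocess

proc = subprocess.Popen(['impala-shell', '-q', 'select sleep(100000)'],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                        universal_newlines=True)
# Only interrupt once the shell's own output shows the query is running,
# instead of sleeping and hoping the timing works out.
for line in iter(proc.stdout.readline, ''):
  if 'Query submitted' in line:   # placeholder marker string
    break
proc.send_signal(signal.SIGINT)
proc.wait()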
Testing:
- Ran test_cancellation in a loop.
Change-Id: I65302ffb838d5185f77853bc2e53296f3a701d93
Reviewed-on: http://gerrit.cloudera.org:8080/11255
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Thomas Marshall <thomasmarshall@cmu.edu>
This adds an IMPALA_HISTFILE environment variable (and --history_file
argument) to the shell which overrides the default location of
~/.impalahistory for the shell history. The shell tests now override
this variable to /dev/null so they don't store history. The tests that
need history use a pytest fixture to use a temporary file for their
history. This allows them to run in parallel without stomping
on each other's history.
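A rough sketch of that fixture pattern (illustrative, not the exact
fixture added to the tests):

import pytest

@pytest.fixture
def tmp_history_file(tmpdir, monkeypatch):
  # Point the shell at a throwaway history file so tests running in
  # parallel don't stomp on each other (or on the developer's history).
  histfile = tmpdir.join('impala_history')
  monkeypatch.setenv('IMPALA_HISTFILE', str(histfile))
  return histfile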
This also fixes a couple of flaky tests which were previously missing the
"execute_serially" annotation -- that annotation is no longer needed
after this fix.
A couple of the tests still need to be executed serially because they
look at metrics such as the number of executed or running queries, and
those metrics are unstable if other tests run in parallel.
I tested this by running:
./bin/impala-py.test tests/shell/test_shell_interactive.py \
-m 'not execute_serially' \
-n 80 \
--random
... several times in a row on an 88-core box. Prior to the change,
several would fail each time. Now they pass.
Change-Id: I1da5739276e63a50590dfcb2b050703f8e35fec7
Reviewed-on: http://gerrit.cloudera.org:8080/11045
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Todd Lipcon <todd@apache.org>
Jenkins jobs occasionally hang on test_query_cancellation_during_fetch.
There was a workaround proposal submitted under this Jira ID, however,
apparently jobs still hang on this test randomly. Reverting the
workaround and skipping the test until a further fix proposal is provided.
This reverts commit 7810d1f9a2.
Change-Id: I51acee49b5a17c4852410b7568fd1d092b114a6d
Reviewed-on: http://gerrit.cloudera.org:8080/8972
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Apparently test_query_cancellation_during_fetch hangs occasionally
in Jenkins builds. The Impala debug page shows the query being
cancelled, however, on the host the ImpalaShell process related to
that query is still running.
Since I had no luck in reproducing the issue locally, I only have a
theory about what might be going on here: The query is cancelled
successfully on Impala backend and when the test tries to get the
stdout and stderr from the ImpalaShell, it gets stuck. It might be
the case that the ImpalaShell process fetching the query results holds
the stdout. According to the documentation of subprocess.communicate(),
fetching the data may cause issues when the data size is large or
unlimited, which we can consider to be the case here.
As a workaround, there is a new optional parameter to
util.ImpalaShell to omit the stdout; this test wouldn't use it anyway,
and we get rid of fetching the large result from
ImpalaShell.
Change-Id: I082c83b91b6d0c527de92c7992f0dc9d1b290433
Reviewed-on: http://gerrit.cloudera.org:8080/8852
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The ImpalaShell didn't issue the 'USE <current-db>' command after
reconnecting to the Impala daemon. Therefore the client session
used the default DB after reconnection, not the previously selected DB.
Setting the current DB is done by the _validate_database method.
Before this commit it appended the "use <db>" command to the
command queue of the Cmd class. But, at this point we might already
have commands in the command queue that will run before the
"use <db>" command. In case of reconnection, we want to invoke
the USE command right away.
Also, the command processed by the precmd() method can entirely skip
the command queue; therefore, it is not enough to insert the USE
command at the front of the command queue. We need to issue the
USE command with the onecmd() method to execute it immediately.
I extended the _validate_database method with an "immediately" flag.
If this flag is true, _validate_database will use the onecmd() method.
Otherwise, it will append the USE command to the command queue to
maintain the previous behaviour.
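In cmd.Cmd terms, the change is roughly this (simplified sketch of the
idea, not the actual shell code):

def _validate_database(self, immediately=False):
  if self.current_db:
    use_cmd = 'use `%s`' % self.current_db
    if immediately:
      # Run the USE right away (e.g. straight after a reconnect); commands
      # handled via precmd() or already queued would otherwise run first.
      self.onecmd(use_cmd)
    else:
      # Previous behaviour: defer the USE via cmd.Cmd's command queue.
      self.cmdqueue.append(use_cmd)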
I added a new automated test suite named test_shell_interactive_reconnect.py
to the "custom cluster" tests. It sets the default database, and after
reconnection it checks if the shell set it again automatically.
One test case checks if the shell set the DB after manually reconnecting
to the impala daemon by issuing the CONNECT command.
The other test case checks if the shell set the DB after automatic
reconnection due to cluster restart.
I needed to back up the impala shell history file because I didn't
want to pollute it by the test cases (just like the way it is done in
tests/shell/test_shell_interactive.py). I created utility functions for
this in tests/shell/util.py and now test_shell_interactive.py and
the newly created test suite are using these utility functions.
Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
Reviewed-on: http://gerrit.cloudera.org:8080/8368
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The plan-root fragment instance that runs on the coordinator should be
handled like all others: started via RPC and run asynchronously. Without
this, the fragment requires special-case code throughout the
coordinator, and does not show up in system metrics etc.
This patch adds a new sink type, PlanRootSink, to the root fragment
instance so that the coordinator can pull row batches that are pushed by
the root instance. The coordinator signals completion to the fragment
instance via closing the consumer side of the sink, whereupon the
instance is free to complete.
Since the root instance now runs asynchronously with respect to the coordinator,
we add several coordination methods to allow the coordinator to wait for
a point in the instance's execution to be hit - e.g. to wait until the
instance has been opened.
Done in this patch:
* Add PlanRootSink
* Add coordination to PFE to allow coordinator to observe lifecycle
* Make FragmentMgr a singleton
* Removed dead code from Coordinator::Wait() and elsewhere.
* Moved result output exprs out of QES and into PlanRootSink.
* Remove special-case limit-based teardown of coordinator fragment, and
supporting functions in PlanFragmentExecutor.
* Simplified lifecycle of PlanFragmentExecutor by separating Open() into
Open() and Exec(), the latter of which drives the sink by reading
rows from the plan tree.
* Add child profile to PlanFragmentExecutor to measure time spent in
each lifecycle phase.
* Removed dependency between InitExecProfiles() and starting root
fragment.
* Removed mostly dead-code handling of LIMIT 0 queries.
* Ensured that SET returns a result set in all cases.
* Fix test_get_log() HS2 test. Errors are only guaranteed to be visible
after fetch calls return EOS, but test was assuming this would happen
after first fetch.
Change-Id: Ibb0064ec2f085fa3a5598ea80894fb489a01e4df
Reviewed-on: http://gerrit.cloudera.org:8080/4402
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
The impala-shell could not accept wildcard or SAN certificates
previously as the thrift library it depended on did not support them.
This patch subclasses TSSLSocket and adds the logic to take care of
the above mentioned cases by introducing the new
TSSLSocketWithWildcardSAN class.
The certificate matching logic is based on the python-ssl source code.
Added custom cluster tests to test both wildcard matching and SAN
matching.
Added be/src/testutil/certificates-info.txt which contains all the
information about the certificates which are added for the tests.
This has been tested with Python2.4 and Python2.6.
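The matching boils down to leftmost-label wildcard comparison, roughly
(simplified illustration; the real TSSLSocketWithWildcardSAN follows the
python-ssl rules more closely):

def hostname_matches(cert_name, hostname):
  # '*.example.com' matches 'impalad-1.example.com' but not
  # 'a.b.example.com' (the wildcard covers one label) or 'example.com'.
  cert_labels = cert_name.lower().split('.')
  host_labels = hostname.lower().split('.')
  if len(cert_labels) != len(host_labels):
    return False
  if cert_labels[0] == '*':
    return cert_labels[1:] == host_labels[1:]
  return cert_labels == host_labels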
Change-Id: I75e37012eeeb0bcf87a5edf875f0ff915daf8b89
Reviewed-on: http://gerrit.cloudera.org:8080/3765
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
To cancel a query, the shell will create a separate connection inside
its SIGINT handler and send the cancellation RPC. However, this
connection did not start a secure connection if it needed to, meaning
that the cancellation attempt would just hang.
A workaround is to kill the shell process, which I expect is what users
have been doing with this bug, which has been around since 2014.
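Conceptually, the fix makes the handler's ad-hoc connection honour the
same transport settings as the main one (sketch; make_connection and the
attribute names are placeholders):

import signal

def install_cancellation_handler(shell):
  def handler(signum, frame):
    # The cancel RPC travels over a fresh connection created here; build it
    # with the same SSL settings as the shell's main connection so it
    # doesn't hang against an SSL-only server.
    conn = shell.make_connection(use_ssl=shell.use_ssl,
                                 ca_cert=shell.ca_cert)  # placeholder API
    conn.cancel(shell.last_query_handle)
  signal.signal(signal.SIGINT, handler)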
Testing:
I added a custom cluster test that starts Impala with SSL
enabled, and wrote two tests - one just to check SSL connectivity, and
the other to mimic the existing test_cancellation which sends SIGINT to
the shell process. In doing so I refactored the shell testing code a bit
so that all tests use a single ImpalaShell object, rather than rolling
their own Popen() based approaches when they needed to do something
unusual, like cancel a query.
In the cancellation test on my machine, SIGINT can take a few tries to
be effective. I'm not sure if this is a timing thing - perhaps the
Python interpreter doesn't correctly pass signals through to a handler
if it's in a blocking call, for example. The test reliably passes within
~5 tries on my machine, so the test tries 30 times, once per second.
Change-Id: If99085e75708d92a08dbecf0131a2234fedad33a
Reviewed-on: http://gerrit.cloudera.org:8080/3302
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>