impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Joe McDonnell	1913ab46ed	IMPALA-14501: Migrate most scripts from impala-python to impala-python3 To remove the dependency on Python 2, existing scripts need to use python3 rather than python. These commands find those locations (for impala-python and regular python): git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python git grep bin/python | grep -v python3 This removes or switches most of these locations by various means: 1. If a python file has a #!/bin/env impala-python (or python) but doesn't have a main function, it removes the hash-bang and makes sure that the file is not executable. 2. Most scripts can simply switch from impala-python to impala-python3 (or python to python3) with minimal changes. 3. The cm-api pypi package (which doesn't support Python 3) has been replaced by the cm-client pypi package and interfaces have changed. Rather than migrating the code (which hasn't been used in years), this deletes the old code and stops installing cm-api into the virtualenv. The code can be restored and revamped if there is any interest in interacting with CM clusters. 4. This switches tests/comparison over to impala-python3, but this code has bit-rotted. Some pieces can be run manually, but it can't be fully verified with Python 3. It shouldn't hold back the migration on its own. 5. This also replaces locations of impala-python in comments / documentation / READMEs. 6. kazoo (used for interacting with HBase) needed to be upgraded to a version that supports Python 3. The newest version of kazoo requires upgrades of other component versions, so this uses kazoo 2.8.0 to avoid needing other upgrades. The two remaining uses of impala-python are: - bin/cmake_aux/create_virtualenv.sh - bin/impala-env-versioned-python These will be removed separately when we drop Python 2 support completely. In particular, these are useful for testing impala-shell with Python 2 until we stop supporting Python 2 for impala-shell.
The docker-based tests still use /usr/bin/python, but this can be switched over independently (and doesn't impact impala-python) Testing: - Ran core job - Ran build + dataload on Centos 7, Redhat 8 - Manual testing of individual scripts (except some bitrotted areas like the random query generator) Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc Reviewed-on: http://gerrit.cloudera.org:8080/23468 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-10-22 16:30:17 +00:00
Riza Suminto	28cff4022d	IMPALA-14333: Run impala-py.test using Python3 Running exhaustive tests with env var IMPALA_USE_PYTHON3_TESTS=true reveals some tests that require adjustment. This patch makes those adjustments, which mostly revolve around encoding differences and string vs bytes type in Python3. This patch also switches the default to run pytest with Python3 by setting IMPALA_USE_PYTHON3_TESTS=true. The following are the details: Change hash() function in conftest.py to crc32() to produce deterministic hash. Hash randomization is enabled by default since Python 3.3 (see https://docs.python.org/3/reference/datamodel.html#object.__hash__). This causes test sharding (like --shard_tests=1/2) to produce an inconsistent set of tests per shard. Always restart minicluster during custom cluster tests if --shard_tests argument is set, because test order may change and affect test correctness, depending on whether running on fresh minicluster or not. Moved one test case from delimited-latin-text.test to test_delimited_text.py for easier binary comparison. Add bytes_to_str() as a utility function to decode bytes in Python3. This is often needed when inspecting the return value of subprocess.check_output() as a string. Implement DataTypeMetaclass.__lt__ to substitute DataTypeMetaclass.__cmp__ that is ignored in Python3 (see https://peps.python.org/pep-0207/). Fix WEB_CERT_ERR difference in test_ipv6.py. Fix trivial integer parsing in test_restart_services.py. Fix various encoding issues in test_saml2_sso.py, test_shell_commandline.py, and test_shell_interactive.py. Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1. Switch to binary comparison in test_iceberg.py where needed. Specify text mode when calling tempfile.NamedTemporaryFile(). Simplify create_impala_shell_executable_dimension to skip testing dev and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true.
The reason is that several UTF-8 related tests in test_shell_commandline.py break in Python3 pytest + Python2 impala-shell combo. This skipping already happens automatically on build OSes without a system Python2, such as RHEL9 (IMPALA_SYSTEM_PYTHON2 env var is empty). Removed unused vector argument and fixed some trivial flake8 issues. Several pieces of test logic require modification due to intermittent issues in Python3 pytest. These include: Add _run_query_with_client() in test_ranger.py to allow reusing a single Impala client for running several queries. Ensure clients are closed when the test is done. Mark several tests in test_ranger.py with SkipIfFS.hive because they run queries through beeline + HiveServer2, but the Ozone and S3 build environments do not start HiveServer2 by default. Increase the sleep period from 0.1 to 0.5 seconds per iteration in test_statestore.py and mark TestStatestore to execute serially. This is because TServer appears to shut down more slowly when run concurrently with other tests. Handle the deprecation of Thread.setDaemon() as well. Always set force_restart=True for each test method in TestLoggingCore, TestShellInteractiveReconnect, and TestQueryRetries to prevent them from reusing the minicluster from the previous test method. Some of these tests disrupt the minicluster (kill impalad) and will produce a minidump if the metrics verifier for the next tests fails to detect a healthy minicluster state. Testing: Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true. Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4 Reviewed-on: http://gerrit.cloudera.org:8080/23319 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-03 10:01:29 +00:00
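The switch from hash() to crc32() described in IMPALA-14333 can be illustrated with a small sketch. The helper names (`shard_index`, `select_shard`) and the test ids are hypothetical and are not taken from conftest.py; only `zlib.crc32` is a real standard-library call.

```python
import zlib


def shard_index(test_id, num_shards):
    """Map a test id to a 0-based shard number (hypothetical helper)."""
    # zlib.crc32 yields the same value in every Python 3 process, whereas
    # hash() on a str is salted per process unless PYTHONHASHSEED is fixed.
    return zlib.crc32(test_id.encode("utf-8")) % num_shards


def select_shard(test_ids, shard, num_shards):
    """Return the tests that belong to the 1-based `shard` of `num_shards`."""
    return [t for t in test_ids if shard_index(t, num_shards) == shard - 1]


# Made-up test ids, used only for illustration.
tests = ["test_a.py::test_one", "test_a.py::test_two", "test_b.py::test_three"]
first = select_shard(tests, 1, 2)
second = select_shard(tests, 2, 2)
```

Because the shard assignment depends only on the test id, separate processes running shard 1/2 and shard 2/2 should agree on a partition that covers every test exactly once.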
Joe McDonnell	f4e7551094	IMPALA-14087: Fix shell live_progress output display issue on Python 3 When running the shell in a terminal with live_progress=true, live progress overwrites its output by using the ANSI up character to rewrite lines with updates on the query progress. On Python 3, we found that the updates to clear the live progress were overwriting the actual output in the terminal. e.g. +----------+ | count(*) | +----------+ Fetched 1 row(s) in 5.20s To avoid this, the live progress lines need to be fully flushed to stderr before starting to output the result to stdout. This adds a flush call in OverwritingStdErrOutputStream::clear() to force this. Testing: - Hand tested queries with live progress - Added test that redirects stdout and stderr to the same file and verifies that no ANSI up character comes after the query output Change-Id: Id2e21224253f76b2a04767a57b3ade49ce2c914f Reviewed-on: http://gerrit.cloudera.org:8080/22941 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-05-24 04:29:14 +00:00
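The flush-before-results idea in IMPALA-14087 can be sketched as follows. `OverwritingProgress` is a hypothetical stand-in, not the shell's OverwritingStdErrOutputStream, and the escape sequences are the standard ANSI "cursor up" and "erase line" codes rather than values copied from the shell source.

```python
import io
import sys

ANSI_UP = "\x1b[A"      # move the cursor up one line
ANSI_CLEAR = "\x1b[2K"  # erase the whole current line


class OverwritingProgress(object):
    """Hypothetical progress writer that erases its previous output."""

    def __init__(self, stream=sys.stderr):
        self.stream = stream
        self.last_line_count = 0

    def write(self, text):
        # Remove the previous progress lines, then draw the new ones.
        self.clear()
        self.stream.write(text)
        self.last_line_count = text.count("\n")

    def clear(self):
        for _ in range(self.last_line_count):
            self.stream.write(ANSI_UP + ANSI_CLEAR)
        self.last_line_count = 0
        # Flush so the erase sequences reach the terminal before any
        # result rows are written to a different stream (stdout).
        self.stream.flush()


buf = io.StringIO()
progress = OverwritingProgress(buf)
progress.write("50% done\n")
progress.clear()
```

Without the flush, buffered erase sequences could be emitted after the query output and wipe it, which matches the symptom the commit describes.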
Joe McDonnell	ea0969a772	IMPALA-11980 (part 2): Fix absolute import issues for impala_shell Python 3 changed the behavior of imports with PEP328. Existing imports become absolute unless they use the new relative import syntax. This adapts the impala-shell code to use absolute imports, fixing issues where it is imported from our test code. There are several parts to this: 1. It moves impala shell code into shell/impala_shell. This matches the directory structure of the PyPi package. 2. It changes the imports in the shell code to be absolute paths (i.e. impala_shell.foo rather than foo). This fixes issues with Python 3 absolute imports. It also eliminates the need for ugly hacks in the PyPi package's __init__.py. 3. This changes Thrift generation to put it directly in $IMPALA_HOME/shell rather than $IMPALA_HOME/shell/gen-py. This means that the generated Thrift code is rooted in the same directory as the shell code. 4. This changes the PYTHONPATH to include $IMPALA_HOME/shell and not $IMPALA_HOME/shell/gen-py. This means that the test code is using the same import paths as the pypi package. With all of these changes, the source code is very close to the directory structure of the PyPi package. As long as CMake has generated the thrift files and the Python version file, only a few differences remain. This removes those differences by moving the setup.py / MANIFEST.in and other files from the packaging directory to the top-level shell/ directory. This means that one can pip install directly from the source code. i.e. pip install $IMPALA_HOME/shell This also moves the shell tarball generation script to the packaging directory and changes bin/impala-shell.sh to use Python 3. This sorts the imports using isort for the affected Python files. Testing: - Ran a regular core job with Python 2 - Ran a core job with Python 3 and verified that the absolute import issues are gone. 
Change-Id: Ica75a24fa6bcb78999b9b6f4f4356951b81c3124 Reviewed-on: http://gerrit.cloudera.org:8080/22330 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Riza Suminto <riza.suminto@cloudera.com>	2025-05-21 15:14:11 +00:00
Riza Suminto	96ae16b60b	IMPALA-13584: Add option to shows num row report in impala-shell In beeswax all statements with the exception of USE print 'Fetched X row(s) in Ys', while in HS2 some statements (REFRESH, INVALIDATE METADATA) do not print it. While these statements always return 0 rows, the amount of time spent on the statement can be useful. This patch modifies impala-shell to print the elapsed time for such a query, even if the query is not expected to return result metadata. Added --beeswax_compat_num_rows option in impala-shell. It defaults to False. If this option is set (True), 'Fetched 0 row(s) in' will be printed for all Impala protocols, just like beeswax. One exception is the USE query, which will remain silent. Testing: - Added test_beeswax_compat_num_rows in test_shell_interactive.py. - Pass test_shell_interactive.py. Change-Id: Id76ede98c514f73ff1dfa123a0d951e80e7508b4 Reviewed-on: http://gerrit.cloudera.org:8080/22813 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-04-28 19:13:39 +00:00
Csaba Ringhofer	f98b697c7b	IMPALA-13929: Make 'functional-query' the default workload in tests This change adds get_workload() to ImpalaTestSuite and removes it from all test suites that already returned 'functional-query'. get_workload() is also removed from CustomClusterTestSuite which used to return 'tpch'. All other changes besides impala_test_suite.py and custom_cluster_test_suite.py are just mass removals of get_workload() functions. The behavior is only changed in custom cluster tests that didn't override get_workload(). By returning 'functional-query' instead of 'tpch', exploration_strategy() will no longer return 'core' in 'exhaustive' test runs. See IMPALA-3947 on why workload affected exploration_strategy. An example for affected test is TestCatalogHMSFailures which was skipped both in core and exhaustive runs before this change. get_workload() functions that return a different workload than 'functional-query' are not changed - it is possible that some of these also don't handle exploration_strategy() as expected, but individually checking these tests is out of scope in this patch. Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115 Reviewed-on: http://gerrit.cloudera.org:8080/22726 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-04-08 07:12:55 +00:00
Xuebin Su	242095ac8a	IMPALA-13729: Accept error messages not starting with prompt Previously, error_msg_expected() only accepted error messages starting with the following error prompt: ``` Query <query_id> failed:\n ``` However, for some tests using the Beeswax protocol, the error prompt may appear in the middle of the error message instead of at its beginning. Therefore, this patch adapts error_msg_expected() to accept error messages not starting with the error prompt. The error_msg_expected() function is renamed to error_msg_startswith() to better describe its behavior. Change-Id: Iac3e68bcc36776f7fd6cc9c838dd8da9c3ecf58b Reviewed-on: http://gerrit.cloudera.org:8080/22468 Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-02-26 15:29:36 +00:00
Xuebin Su	ad868b9947	IMPALA-13115: Add query id to error messages This patch adds the query id to the error messages in both - the result of the `get_log()` RPC, and - the error message in an RPC response before they are returned to the client, so that the users can easily figure out the errored queries on the client side. To achieve this, the query id of the thread debug info is set in the RPC handler method, and is retrieved from the thread debug info each time the error reporting function or `get_log()` gets called. Due to the change of the error message format, some checks in the impala-shell.py are adapted to keep them valid. Testing: - Added helper function `error_msg_expected()` to check whether an error message is expected. It is stricter than only using the `in` operator. - Added helper function `error_msg_equal()` to check if two error messages are equal regardless of the query ids. - Various test cases are adapted to match the new error message format. - `ImpalaBeeswaxException`, which is used in tests only, is simplified so that it has the same error message format as the exceptions for HS2. - Added an assertion to the case of killing and restarting a worker in the custom cluster test to ensure that the query id is in the error message in the client log retrieved with `get_log()`. Change-Id: I67e659681e36162cad1d9684189106f8eedbf092 Reviewed-on: http://gerrit.cloudera.org:8080/21587 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-08-08 14:11:04 +00:00
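Comparing error messages regardless of query id, as IMPALA-13115 describes, might look roughly like the sketch below. It is not the helper from the Impala test code: the function names are hypothetical, and the query id pattern (two hexadecimal groups joined by a colon) is an assumption. Only the `Query <query_id> failed:\n` prompt comes from the commit messages.

```python
import re

# Assumption: a query id is two hex groups separated by a colon.
ERROR_PROMPT_RE = re.compile(r"Query [0-9a-f]+:[0-9a-f]+ failed:\n")


def strip_error_prompt(msg):
    """Remove every 'Query <query_id> failed:' prompt from a message."""
    return ERROR_PROMPT_RE.sub("", msg)


def messages_match_ignoring_query_id(msg1, msg2):
    """Hypothetical check: equal once the id-bearing prompts are removed."""
    return strip_error_prompt(msg1) == strip_error_prompt(msg2)


a = "Query aa11:bb22 failed:\nAnalysisException: Could not resolve table"
b = "Query cc33:dd44 failed:\nAnalysisException: Could not resolve table"
same = messages_match_ignoring_query_id(a, b)
```

Removing the prompt wherever it occurs, rather than only at the start, also covers the case noted in IMPALA-13729 where the prompt appears in the middle of a message.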
Joe McDonnell	2b98e5fb95	IMPALA-13230: Dump stacktrace for impala-shell when it receives SIGUSR1 It can be useful to get a stacktrace for a running impala-shell for debugging. This uses Python 3's faulthandler to handle the SIGUSR1, so it prints a stacktrace for all threads when it receives SIGUSR1. This does not implement an equivalent functionality for Python 2. Python 2 doesn't have the faulthandler library, and hand tests showed that sending SIGUSR1 to Python 2 impala-shell can interrupt network calls and abort a running query. Testing: - Added a test that verifies the stacktrace is printed and a running query succeeds. Change-Id: If7dae2686b65a1a4f02488abadca3b3c90e48bf1 Reviewed-on: http://gerrit.cloudera.org:8080/21611 Reviewed-by: Yida Wu <wydbaggio000@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2024-08-02 18:07:20 +00:00
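The SIGUSR1 behavior in IMPALA-13230 relies on Python 3's `faulthandler` module. A minimal sketch of registering such a handler is shown below; the wrapper function is hypothetical and this is not the shell's actual startup code.

```python
import faulthandler
import signal
import sys


def install_stack_dump_handler(stream=sys.stderr):
    """Dump all thread stacks to `stream` on SIGUSR1 (hypothetical wrapper).

    Returns False on platforms without faulthandler.register or SIGUSR1
    (for example Windows), True once the handler is registered.
    """
    if not hasattr(faulthandler, "register") or not hasattr(signal, "SIGUSR1"):
        return False
    # `stream` must be backed by a real file descriptor (it needs fileno()).
    faulthandler.register(signal.SIGUSR1, file=stream, all_threads=True)
    return True


if __name__ == "__main__":
    install_stack_dump_handler()
```

With the handler installed, `kill -USR1 <pid>` from another terminal should print a traceback for every thread without terminating the process.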
pranav.lodha	c86582cfe0	IMPALA-12465: Unicode column name support Impala depends on Hive functions for column name validation and uses validateName() function for the same. Since Hive already supports unicode column names, the patch just updates the column name validation function to validateColumnName(). validateName() checks for a certain conformance based on pattern matching standards while validateColumnName() places no restrictions on column names at the Metadata level. Testing: The support is tested and cross-checked with Hive. The tests can be found in unicode-column-name.test. Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Reviewed-on: http://gerrit.cloudera.org:8080/20506 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2023-12-15 21:56:07 +00:00
Joe McDonnell	0c7c6a335e	IMPALA-11977: Fix Python 3 broken imports and object model differences Python 3 changed some object model methods: - __nonzero__ was removed in favor of __bool__ - func_dict / func_name were removed in favor of __dict__ / __name__ - The next() method was removed in favor of __next__ (Code locations should use next(iter) rather than iter.next()) - metaclasses are specified a different way - Locations that specify __eq__ should also specify __hash__ Python 3 also moved some packages around (urllib2, Queue, httplib, etc), and this adapts the code to use the new locations (usually handled on Python 2 via future). This also fixes the code to avoid referencing exception variables outside the exception block and variables outside of a comprehension. Several of these seem like false positives, but it is better to avoid the warning. This fixes these pylint warnings: bad-python3-import eq-without-hash metaclass-assignment next-method-called nonzero-method exception-escape comprehension-escape Testing: - Ran core tests - Ran release exhaustive tests Change-Id: I988ae6c139142678b0d40f1f4170b892eabf25ee Reviewed-on: http://gerrit.cloudera.org:8080/19592 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-03-09 17:17:57 +00:00
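The object model changes listed in IMPALA-11977 can be seen in a short Python 3 sketch. The classes `Countdown` and `Label` are invented for illustration and do not appear in the Impala code base.

```python
class Countdown(object):
    """Iterator counting down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):  # Python 2 spelled this method next()
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    def __bool__(self):  # Python 2 spelled this method __nonzero__()
        return self.current > 0


class Label(object):
    """Value object: defining __eq__ also calls for __hash__ in Python 3."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Label) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


def describe():
    return "demo"


first = next(Countdown(2))       # use next(it), not it.next()
remaining = list(Countdown(3))
function_name = describe.__name__  # replaces func_name
unique_labels = {Label("a"), Label("a")}
```

A class that defines `__eq__` without `__hash__` becomes unhashable in Python 3, which is why the commit pairs the two.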
Joe McDonnell	aa4050b4d9	IMPALA-11976: Fix use of deprecated functions/fields removed in Python 3 Python 3 moved several things around or removed deprecated functions / fields: - sys.maxint was removed, but sys.maxsize provides similar functionality - long was removed, but int provides the same range - file() was removed, but open() already provided the same functionality - Exception.message was removed, but str(exception) is equivalent - Some encodings (like hex) were moved to codecs.encode() - string.letters -> string.ascii_letters - string.lowercase -> string.ascii_lowercase - string.strip was removed This fixes all of those locations. Python 3 also has slightly different rounding behavior from round(), so this changes round() to use future's builtins.round() to get the Python 3 behavior. This fixes the following pylint warnings: - file-builtin - long-builtin - invalid-str-codec - round-builtin - deprecated-string-function - sys-max-int - exception-message-attribute Testing: - Ran core tests Change-Id: I094cd7fd06b0d417fc875add401d18c90d7a792f Reviewed-on: http://gerrit.cloudera.org:8080/19591 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
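The replacements named in IMPALA-11976 map onto Python 3 as in the sketch below. The variable names and sample values are illustrative only; every call is from the standard library.

```python
import codecs
import string
import sys

largest_index = sys.maxsize          # replaces sys.maxint
big_number = 2 ** 64                 # int covers what long used to
hex_text = codecs.encode(b"impala", "hex").decode("ascii")  # replaces "impala".encode("hex")

try:
    raise ValueError("bad input")
except ValueError as e:
    message = str(e)                 # replaces e.message

letters = string.ascii_letters       # replaces string.letters
lowercase = string.ascii_lowercase   # replaces string.lowercase
stripped = "  padded  ".strip()      # replaces string.strip(s)

# Python 3 rounds exact halves to the nearest even integer.
rounded = [round(0.5), round(1.5), round(2.5), round(3.5)]
```

The last line shows the rounding difference the commit mentions: Python 2's built-in round() rounded halves away from zero, so the same inputs gave 1.0, 2.0, 3.0 and 4.0.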
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
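The division change handled in IMPALA-11973 is easiest to see side by side. In this sketch the helper names are invented; on Python 2 the same results would require `from __future__ import absolute_import, division, print_function` as the first statement of the file, which is left out here because it is a no-op on Python 3.

```python
def midpoint_index(items):
    """Index of the middle element; indices need integer division."""
    return len(items) // 2


def average(values):
    """Arithmetic mean; true division keeps the fractional part."""
    return sum(values) / len(values)


rows = ["a", "b", "c", "d", "e"]
middle = rows[midpoint_index(rows)]
mean = average([1, 2])
true_quotient = 7 / 2    # 3.5 with true division
floor_quotient = 7 // 2  # 3, the old Python 2 result for ints
```

Using `/` where an index or count is expected would yield a float and fail at `rows[...]`, which is the kind of location the commit converts to `//`.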
Joe McDonnell	2b550634d2	IMPALA-11952 (part 2): Fix print function syntax Python 3 now treats print as a function and requires the parenthesis in invocation. print "Hello World!" is now: print("Hello World!") This fixes all locations to use the function invocation. This is more complicated when the output is being redirected to a file or when avoiding the usual newline. print >> sys.stderr , "Hello World!" is now: print("Hello World!", file=sys.stderr) To support this properly and guarantee equivalent behavior between python 2 and python 3, all files that use print now add this import: from __future__ import print_function This also fixes random flake8 issues that intersect with the changes. Testing: - check-python-syntax.sh shows no errors related to print Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351 Reviewed-on: http://gerrit.cloudera.org:8080/19552 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-02-28 17:11:50 +00:00
Michael Smith	64b324ac40	IMPALA-11389: Include Python 3 eggs in tarball Build Python 3 eggs for the shell tarball so it works with both Python 2 and Python 3. The impala-shell script selects eggs based on the available Python version. Inlines thrift for impala-shell so we can easily build Python 2 and Python 3 versions, consistent with other libraries. The impala-shell version should always be at least as new as IMPALA_THRIFT_PY_VERSION. Thrift 0.13.0+ wraps all exceptions during TSocket read/write operations in TTransportException. Specifically socket.error that we got as raw exceptions are now wrapped. Unwraps them before raising to preserve prior behavior. A specific Python version can be selected with IMPALA_PYTHON_EXECUTABLE; otherwise it will use 'python', and if unavailable try 'python3'. Adds tests for impala-shell tarball with Python 3. Change-Id: I94f86de9e2a6303151c2f0e6454b5f629cbc9444 Reviewed-on: http://gerrit.cloudera.org:8080/18653 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-07-14 23:52:04 +00:00
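The interpreter selection order described in IMPALA-11389 can be sketched in Python, although the real logic lives in the impala-shell launcher script. The function `pick_python` and its injectable `which` parameter are hypothetical; only the IMPALA_PYTHON_EXECUTABLE variable and the 'python' then 'python3' order come from the commit message.

```python
import os
import shutil


def pick_python(env=os.environ, which=shutil.which):
    """Return the interpreter to launch, or None if none is found."""
    explicit = env.get("IMPALA_PYTHON_EXECUTABLE")
    if explicit:
        # An explicit choice always wins.
        return explicit
    # Otherwise prefer 'python' and fall back to 'python3'.
    for candidate in ("python", "python3"):
        if which(candidate):
            return candidate
    return None


chosen = pick_python(env={"IMPALA_PYTHON_EXECUTABLE": "/opt/py/bin/python3"})
```

Passing `env` and `which` as parameters keeps the sketch testable without touching the real environment.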
gaoxq	f5fc085733	IMPALA-11233: Unset all query option When using a JDBC connection pool, a connection may set some query options; after the query finishes, the connection is closed and returned to the pool. When the connection is used again, the previously set query options still take effect. We need a statement that can reset all query options without creating a new connection. This adds support for UNSET statements in the SQL dialect. UNSET ALL unsets all query options. Testing: - Added an UNSET ALL query option test in test_hs2.py Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250 Reviewed-on: http://gerrit.cloudera.org:8080/18430 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-23 05:59:02 +00:00
Michael Smith	181fd94068	IMPALA-8373: Test impala-shell with python3 Sets up a python3 virtualenv, installs impala-shell into it, and runs tests. Change-Id: I8e123aecd53a7ded44a7da7eb8c8b853cebbfc56 Reviewed-on: http://gerrit.cloudera.org:8080/18588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-06-13 17:13:42 +00:00
Michael Smith	5263d13112	IMPALA-11314: Test PyPI package with system python Sets up a virtualenv with system python to install the impala-shell PyPI package into. Using system python provides better coverage for Python versions likely to be used by customers. Runs impala-shell tests using the PyPI package to provide better coverage for the artifact customers will use. Includes a PyPI install in notests_independent_targets because these seem to be used for Python testing despite -notests. Change-Id: I384ea6a7dab51945828cca629860400a23fa0c05 Reviewed-on: http://gerrit.cloudera.org:8080/18586 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-06-13 17:13:42 +00:00
yx91490	c7784bde55	IMPALA-1682: Support printing the output of a query (rows) vertically. In vertical mode, impala-shell prints each row as follows: first a line containing the line number, then the row's columns one per line, each starting with the column's name and a colon. To enable it, use the shell option '-E' or '--vertical', or 'set VERTICAL=true' in interactive mode. To disable it in interactive mode, use 'set VERTICAL=false'. NOTICE: it is disabled if the '-B' option or 'set WRITE_DELIMITED=true' is specified. Tests: added methods in test_shell_interactive.py and test_shell_commandline.py. Change-Id: I5cee48d5a239d6b7c0f51331275524a25130fadf Reviewed-on: http://gerrit.cloudera.org:8080/18549 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-13 15:41:07 +00:00
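The vertical layout described in IMPALA-1682 (a numbered header line, then one "name: value" line per column) might be produced by something like the sketch below. The function name and the exact decoration of the header line are made up; the shell's real output format may differ.

```python
def format_vertical(column_names, rows):
    """Render rows one column per line (hypothetical formatter)."""
    lines = []
    for row_number, row in enumerate(rows, start=1):
        # Header line carrying the row's number; the asterisks are invented.
        lines.append("*** {0}. row ***".format(row_number))
        for name, value in zip(column_names, row):
            lines.append("{0}: {1}".format(name, value))
    return "\n".join(lines)


output = format_vertical(["id", "name"], [(1, "alice"), (2, "bob")])
print(output)
```

This layout tends to help with wide tables, where a horizontal grid would wrap in the terminal.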
Joe McDonnell	6a199be854	IMPALA-11249: Fix add_test_dimensions() locations to call super() The original issue is that the strict HS2 shell tests are not running in precommit or nightly jobs, but they do run in local developer environments. Investigating this showed that the shell tests were running with a weird set of test dimensions that includes table_format_and_file_extension. That dimension is only used in test_insert.py::TestInsertFileExtension. What is happening is that the shell tests and other locations are running add_test_dimensions() without calling super(..., cls).add_test_dimensions(). The behavior is unclear, but there is clearly cross-talk between the different tests that do this. This changes all add_test_dimensions() locations to call super(..., cls).add_test_dimensions() if they don't already. Each location has been tuned to run the same set of tests as before (except the shell tests which now run the strict HS2 tests). As part of this, several shell tests need to be skipped or fixed for strict HS2. Testing: - Ran core job - Ran tests locally to verify the set of tests didn't change. Change-Id: Ib20fd479d3b91ed0ed89a0bc5623cd2a5a458614 Reviewed-on: http://gerrit.cloudera.org:8080/18557 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-05-26 03:42:51 +00:00
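One way that skipping super() in add_test_dimensions() could cause the cross-talk noted in IMPALA-11249 is shared class-level state. The sketch below is an assumption about the mechanism, not a description of ImpalaTestSuite: every class and attribute name is hypothetical, and the commit itself says the actual behavior is unclear.

```python
class BaseSuite(object):
    # Shared by every subclass until a class assigns its own dict.
    dimensions = {}

    @classmethod
    def add_test_dimensions(cls):
        # Gives the calling class a fresh, private dict.
        cls.dimensions = {"protocol": ["beeswax", "hs2"]}


class LeakySuiteA(BaseSuite):
    @classmethod
    def add_test_dimensions(cls):
        # No super() call: this mutates BaseSuite.dimensions itself.
        cls.dimensions["file_extension"] = [".txt"]


class LeakySuiteB(BaseSuite):
    @classmethod
    def add_test_dimensions(cls):
        # Also no super() call, so it sees LeakySuiteA's entry.
        cls.dimensions["strict_hs2"] = [True]


class FixedSuite(BaseSuite):
    @classmethod
    def add_test_dimensions(cls):
        super(FixedSuite, cls).add_test_dimensions()
        cls.dimensions["strict_hs2"] = [True]


for suite in (LeakySuiteA, LeakySuiteB, FixedSuite):
    suite.add_test_dimensions()
```

After the loop, `LeakySuiteB.dimensions` contains `file_extension` even though that suite never asked for it, while `FixedSuite.dimensions` holds only its own entries.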
Joe McDonnell	f77f577074	IMPALA-11251: Fix TestImpalaShellInteractive.test_unicode_input on Ubuntu 20 TestImpalaShellInteractive.test_unicode_input() was spawning impala-shell using a command that preserves the environment. This is a problem on Ubuntu 20, because Ubuntu 20 uses a newer GCC/libstdc++. Preserving the environment keeps an LD_LIBRARY_PATH setting that uses an older libstdc++ and causes the shell to fail at startup. This switches the test to use shell/util.py's spawn_shell(), which cleans up the environment before running the shell. Testing: - Ran tests on Ubuntu 20 Change-Id: Ib07f557ab3c21d6b39f814dcfc0bf9eb1b61f090 Reviewed-on: http://gerrit.cloudera.org:8080/18558 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-05-26 03:31:38 +00:00
Steve Carlin	d34039ced9	IMPALA-11096: Strict_hs2 mode in impala-shell does not support get_summary The get_summary() thrift call is not supported in strict_hs2 mode on impala-shell. The live_progress and live_summary options are disabled when the strict_hs2_protocol flag is set. Change-Id: I6aee838a80b4659a13a0a0cb9eabffa2c8767c8f Reviewed-on: http://gerrit.cloudera.org:8080/18177 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2022-02-07 11:37:44 +00:00
Steve Carlin	bb9fb663ce	IMPALA-10778: Allow impala-shell to connect directly to HS2 Impala-shell already uses HS2 protocol to connect to Impalad. This commit allows impala-shell to connect to any server (for example, Hive) using the hs2 protocol. This will be done via the "--strict_hs2_protocol" option. When the "--strict_hs2_protocol" option is turned on, only features supported by hs2 will work. For instance, "runtime-profile" is an impalad specific feature and will be disabled. The "--strict_hs2_protocol" will only work on servers that abide by the strict definition of what is supported by HS2. So one will be able to connect to Hive in this mode, but connections to Impala will not work. Any feature supported by Hive (e.g. kerberos authentication) should work as well. Note: While authentication should work, the test framework is not set up to create an HS2 server that does authentication at this point so this feature should be used with caution. Change-Id: I674a45640a4a7b3c9a577830dbc7b16a89865a9e Reviewed-on: http://gerrit.cloudera.org:8080/17660 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-08-27 09:45:59 +00:00
stiga-huang	d5f67fce41	IMPALA-10523: Fix impala-shell crash in printing error messages that contain UTF-8 characters In Python2, print() converts all non-keyword arguments to strings like str() does and writes them to the stream. str() on QueryStateException returns its value(i.e. error message) which could be in unicode type. Python2 will implicitly encode it to str type using the default encoding, 'ascii'. This could result in UnicodeEncodeError when there are non-ascii characters in the error message. This patch explicitly encodes the error message using 'utf-8' encoding if it's in unicode type and the shell is run in Python2. Tests: - Add test in test_shell_interactive.py Change-Id: Ie10f5b03ecc5877053c2fbada1afaf256b423a71 Reviewed-on: http://gerrit.cloudera.org:8080/17099 Reviewed-by: Tamas Mate <tmate@cloudera.com> Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-03-12 18:19:11 +00:00
stiga-huang	4c6cf4b2ef	IMPALA-10434: Fix impala-shell's unicode regressions on Python2 To make impala-shell compatible with Python3, we explicitly distinguish bytes and text in Python2 by decoding the bytes for all inputs. Regression 1: multiple queries in one line with unicode chars will break In precmd() of impala-shell, if there are multiple queries present in one input line, we split it into individual queries (by sqlparse.split()) and append them back to the 'cmdqueue'. They will be passed to precmd() again. In our Python2 implementation, precmd() expects them to be str type, and will decode them into unicode type. However, the output type of sqlparse.split() is unicode which doesn't have a decode() method. Calling decode() on a unicode var makes Python2 implicitly encode it to str. This may cause UnicodeEncodeError since implicit encoding uses 'ascii'. Regression 2: multi-line query with unicode chars will break when command history is enabled In _check_for_command_completion(), when calling readline.replace_history_item in Python2, we encode the completed_cmd into bytes. However, we shouldn't replace it since the return type is expected to be unicode. Tests: - Add tests for these two regressions in Python2. Change-Id: Icc4a8d31311a5c59e5fc0e65fe09f770df41bea4 Reviewed-on: http://gerrit.cloudera.org:8080/16960 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-01-20 10:20:02 +00:00
Tamas Mate	07f3ae3881	IMPALA-10066: Fix test_cancellation_mid_command flakiness This change adds additional synchronisation to fix the flaky test. The test failures were happening because the test did not wait for the output of the SIGINT (^C) to arrive. When this was delayed it cluttered the impala-shell output and other expect calls could fail. Testing: - executed the test locally 250 times without failures; without this fix there were about 3 failures in 100 executions Change-Id: Ief384ce59f3ce24f1ab2dfb5fbaf7c9a39b434e0 Reviewed-on: http://gerrit.cloudera.org:8080/16847 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-12-11 13:43:28 +00:00
Tim Armstrong	24015a6a5d	IMPALA-10312: bump timeout in test_ddl_queries_are_closed This increases the timeout from 10s to 30s for waiting for the queries to be closed under the theory that the test failure is caused by random slowness. Change-Id: I5885df6494122dffe2bbc6877cec3b90a9eb4ec6 Reviewed-on: http://gerrit.cloudera.org:8080/16762 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-11-24 00:55:47 +00:00
wzhou-code	1af60a1560	IMPALA-9180 (part 3): Remove legacy backend port The legacy Thrift-based Impala internal service has been removed so the backend port 22000 can be freed up. This patch sets the flag be_port as a REMOVED_FLAG and cleans up all infrastructure around it. StatestoreSubscriber::subscriber_id is set as hostname + krpc_port. Testing: - Passed the exhaustive test. Change-Id: Ic6909a8da449b4d25ee98037b3eb459af4850dc6 Reviewed-on: http://gerrit.cloudera.org:8080/16533 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-11-03 00:56:26 +00:00
Tamas Mate	3ef7756628	IMPALA-10051: impala-shell exits with ValueError with WITH clauses When a query contains a WITH clause, impala-shell tries to identify whether it is a DML query or not, so that later it can provide appropriate result messages. Earlier, shlex was used to create tokens and assess the query type based on that. However, shlex can misinterpret some query strings where whitespace characters are mixed with quotes, because it splits the string based on whitespace characters. In some scenarios a 'ValueError: No closing quotation' error can occur. This change moves the tokenization from shlex to sqlparse. Testing: - Added unit test to cover queries that contain mixed whitespaces and strings Change-Id: I442d3bc65b90a55c73c847948d5179a8586d71ad Reviewed-on: http://gerrit.cloudera.org:8080/16389 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-18 04:06:22 +00:00
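The shlex failure mode described above can be reproduced with the standard library alone. The sketch below is illustrative: the query strings are made-up inputs, not the ones from the bug report, and the helper `try_shlex` is hypothetical.

```python
import shlex


def try_shlex(query):
    """Return the shlex tokens, or the error message if tokenization fails."""
    try:
        return shlex.split(query)
    except ValueError as e:
        return str(e)


# A plain WITH query tokenizes, although shlex only splits on whitespace.
ok = try_shlex("with t as (select 1) select * from t")

# shlex applies shell quoting rules: a backslash inside single quotes is
# literal, so the quote after it closes the string and the final quote
# opens a new one that never closes.
bad = try_shlex(r"select 'it\'s' from t")
print(bad)
```

shlex follows POSIX shell quoting rules rather than SQL rules, which is why a SQL-aware tokenizer is the safer choice for classifying queries.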
Tamas Mate	2359a1be9d	IMPALA-10119: Fix impala-shell history duplication test The flaky test was TestImpalaShellInteractive.test_history_does_not_duplicate_on_interrupt The test failed with timeout error when the interrupt signal arrived later after the next test query was started. The impala-shell output was ^C instead of the expected query result. This change adds an additional blocking expect call to wait for the interrupt signal to arrive before sending in the next query. Change-Id: I242eb47cc8093c4566de206f46b75b3feab1183c Reviewed-on: http://gerrit.cloudera.org:8080/16391 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2020-09-03 18:25:58 +00:00
Andrew Sherman	e187c40543	IMPALA-9909: Print body of http error code in Impala Shell. Make Impala Shell closer to Impyla by printing the body of any http error code message received when using hs2-over-http. The common case is that there is nothing in the body, in which case the behavior is unchanged. TESTING Added a test for the new functionality. Ran all end-to-end tests. Change-Id: Iabc45eda0b87ca694b8359148cda6a7c1d5a8fff Reviewed-on: http://gerrit.cloudera.org:8080/16269 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-04 22:33:58 +00:00
Andrew Sherman	8aeb28287f	IMPALA-9540 Test that Impala Shell no longer sends duplicate "Host" headers in http mode. Many http servers will not accept an http request that has multiple copies of the "Host" header. A recent toolchain change patches Thrift so that it will not send the extraneous header (in THttpClient). This change tests that the duplicate headers are not sent. TESTING: Ran all end-to-end tests. Rewrote an existing Shell test to check that only one "Host" header is sent. Change-Id: I82996015d0205923e854dac8bb88604778684c46 Reviewed-on: http://gerrit.cloudera.org:8080/15752 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-06-30 21:46:27 +00:00
Tim Armstrong	c43c03c5ee	IMPALA-3926: part 2: avoid setting LD_LIBRARY_PATH This removes LD_LIBRARY_PATH and LD_PRELOAD from the developer's shell and cleans it up. With the preceding change, toolchain utilities like clang can be run without a special LD_LIBRARY_PATH. This fixes a bug where libjvm.so was registered as a static instead of a shared library, which adds it to the RUNPATH variable in the binary, which provides a default search location that can be overridden by LD_LIBRARY_PATH. Impala binaries don't have the rpath baked in for some libraries, including Impala-lzo, libgcc and libstdc++, so we still need to set LD_LIBRARY_PATH when running those. That is solved with wrapper scripts that set the environment variables only when invoking those binaries, e.g. starting a daemon or running a backend test. I added three scripts because there were 3 sets of environment variables. The scripts are: * run-binary.sh: just sets LD_LIBRARY_PATH * run-jvm-binary.sh: sets LD_LIBRARY_PATH and CLASSPATH * start-daemon.sh: sets LD_LIBRARY_PATH and CLASSPATH and kerberos-related environment variables. The binaries, in almost all cases, work fine without those tweaks, because libstdc++ and libgcc are picked up along with libkuduclient.so from the toolchain (they are in the same directory). I decided to leave good enough alone here. run-binary.sh and friends can be used in any remaining edge cases to run binaries. An alternative to the 3 scripts would be to have an uber-script that set all the variables, but I felt that it was better to be specific about what each binary needed. Cleaning the LD_LIBRARY_PATH mess up has given me a distaste for scattershot setting of environment variables. I am open to revisiting this. Testing: * Ran tests on centos 7 * Manually tested that my dev env with LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu continued to work (for now). All ubuntu 16.04 and 18.04 dev envs that were set up with bootstrap_development.sh will be in this state. 
Change-Id: I61c83e6cca6debb87a12135e58ee501244bc9603 Reviewed-on: http://gerrit.cloudera.org:8080/14494 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-05-07 08:50:44 +00:00
Tamas Mate	1a36a0348b	IMPALA-9398: Fix shell history duplication when cmdloop breaks This change adds a new condition to avoid re-reading the impala-shell history when the cmdloop is broken. The loop can break due to exceptions such as KeyboardInterrupt. Testing: - The change was tested manually on local dev env - Added a new EE shell test to verify the history after SIGINT Change-Id: If4faf46134f44d91e56748642f47d448707db53c Reviewed-on: http://gerrit.cloudera.org:8080/15345 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-30 01:55:51 +00:00
David Knupp	bc9d7e063d	IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3. This is the main patch for making the impala-shell cross-compatible with python 2 and python 3. The goal is to wind up with a version of the shell that will pass python e2e tests irrespective of the version of python used to launch the shell, under the assumption that the test framework itself will continue to run with python 2.7.x for the time being. Notable changes for reviewers to consider: - With regard to validating the patch, my assumption is that simply passing the existing set of e2e shell tests is sufficient to confirm that the shell is functioning properly. No new tests were added. - A new pytest command line option was added in conftest.py to enable a user to specify a path to an alternate impala-shell executable to test. It's possible to use this to point to an instance of the impala-shell that was installed as a standalone python package in a separate virtualenv. Example usage: USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py The target virtualenv may be based on either python3 or python2. However, this has no effect on the version of python used to run the test framework, which remains tied to python 2.7.x for the foreseeable future. - The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python environment independently from bin/set-pythonpath.sh. The default version of thrift is thrift-0.11.0 (See IMPALA-9489). - The wording of the header changed a bit to include the python version used to run the shell. Starting Impala Shell with no authentication using Python 3.7.5 Opened TCP connection to localhost:21000 ... OR Starting Impala Shell with LDAP-based authentication using Python 2.7.12 Opened TCP connection to localhost:21000 ... - By far, the biggest hassle has been juggling str versus unicode versus bytes data types. 
Python 2.x was fairly loose and inconsistent in how it dealt with strings. As a quick demo of what I mean: Python 2.7.12 (default, Nov 12 2018, 14:36:49) [GCC 5.4.0 20160609] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> d = 'like a duck' >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8') True ...and yet there are weird unexpected gotchas. >>> d.decode('utf-8') == d.encode('utf-8') True >>> d.encode('utf-8') == bytearray(d, 'utf-8') True >>> d.decode('utf-8') == bytearray(d, 'utf-8') # fails the eq property? False As a result, this inconsistency was reflected in the way we handled strings in the impala-shell code, but things still just worked. In python3, there's a much clearer distinction between strings and bytes, and as such, much tighter type consistency is expected by standard libs like subprocess, re, sqlparse, prettytable, etc., which are used throughout the shell. Even simple calls that worked in python 2.x: >>> import re >>> re.findall('foo', b'foobar') ['foo'] ...can throw exceptions in python 3.x: >>> import re >>> re.findall('foo', b'foobar') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall return _compile(pattern, flags).findall(string) TypeError: cannot use a string pattern on a bytes-like object Exceptions like this resulted in many, if not most, shell tests failing under python 3. What ultimately seemed like a better approach was to try to weed out as many existing spurious str.encode() and str.decode() calls as I could, and try to implement what has colloquially been called a "unicode sandwich" -- namely, "bytes on the outside, unicode on the inside, encode/decode at the edges." The primary spot in the shell where we call decode() now is when sanitising input... args = self.sanitise_input(args.decode('utf-8')) ...and also whenever a library like re required it. 
Similarly, str.encode() is primarily used where a library like readline or csv requires it. - PYTHONIOENCODING needs to be set to utf-8 to override the default setting for python 2. Without this, piping or redirecting stdout results in unicode errors. - from __future__ import unicode_literals was added throughout. Testing: To test the changes, I ran the e2e shell tests the way we always do (against the normal build tarball), and then I set up a python 3 virtual env with the shell installed as a package, and manually ran the tests against that. No effort has been made at this point to come up with a way to integrate testing of the shell in a python3 environment into our automated test processes. Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615 Reviewed-on: http://gerrit.cloudera.org:8080/15524 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-18 05:13:50 +00:00
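The "unicode sandwich" described in this commit can be sketched in a few lines of Python 3. This is an illustration of the pattern only. The function names `read_input`, `find_keywords` and `write_output` are hypothetical and do not appear in the shell.

```python
import re


def read_input(raw):
    """Edge: decode bytes once, on the way in."""
    return raw.decode("utf-8") if isinstance(raw, bytes) else raw


def find_keywords(text):
    """Inside: work on text only, so str patterns always match str input."""
    return re.findall(r"foo", text)


def write_output(text):
    """Edge: encode once, on the way out."""
    return text.encode("utf-8")


# Mixing a str pattern with bytes input fails on Python 3.
try:
    re.findall("foo", b"foobar")
    mixed_ok = True
except TypeError:
    mixed_ok = False

matches = find_keywords(read_input(b"foobar"))
encoded = write_output("naïve")
```

Keeping every decode and encode at the boundary means library calls in the middle never see mixed types.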
Alice Fan	e1d1428181	IMPALA-9384: Improve Impala shell usability by enabling live_progress in interactive mode In order to improve usability, this patch makes Impala shell show query processing status while the query is running. The patch enables shell option live_progress by default when a user launches impala shell in the interactive mode. The patch also adds a new command line flag "--disable_live_progress", which allows a user to disable live_progress at runtime. In the interactive mode, a user can disable live_progress by either using the command line flag or setting the option as False in the config file. As for in the non-interactive mode (when the -q or -f options are used), live reporting is not supported. Impala-shell will disable live_progress if the mode is detected. Testing: - Added and updated tests in test_shell_interactive.py and test_shell_commandline.py - Successfully ran all shell related tests Change-Id: I3765b775f663fa227e59728acffe4d5ea9a5e2d3 Reviewed-on: http://gerrit.cloudera.org:8080/15219 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2020-03-09 21:28:19 +00:00
Adam Tamas	0282c024b4	IMPALA-9036: Fix CTRL+C a multiline query in impala-shell Modified the '_signal_handler()' in impala-shell.py so when a user cancels a multiline query by hitting CTRL+C it will cancel the query, instead of just the current line. Testing: -Added 'test_cancellation_mid_command()' to test_shell_interactive.py to test if it really cancels the partial commands. -Manually tested by giving partial commands then cancelling them. Change-Id: Id8d8bdaee929e2655eb66e886ae92a02d3fbd83f Reviewed-on: http://gerrit.cloudera.org:8080/15233 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-02-24 15:18:16 +00:00
wzhou-code	6a23ec6985	IMPALA-6393: Add support for live_summary and live_progress in impalarc This patch adds support for live_summary and live_progress in impalarc. Testing: 1) Added unit-test cases in test_shell_commandline.py and test_shell_interactive.py for live_summary and live_progress. 2) Successfully ran all other tests in test_shell_interactive.py and test_shell_commandline.py Change-Id: If4549b775a7966ad89d661d0349cc78754e13a86 Reviewed-on: http://gerrit.cloudera.org:8080/14927 Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-01-23 01:48:13 +00:00
Andrew Sherman	ed5e7dae94	IMPALA-9240: add HTTP code handling to THttpClient. Before this change Impala Shell was not checking HTTP return codes when using the hs2-http protocol. The shell sends a request message (e.g. send_CloseOperation) but the HTTP call to send this message may fail. This will result in a failure when reading the reply (e.g. in recv_CloseOperation) as there is no reply data to read. This will typically result in an 'EOFError'. In code that overrides THttpClient.flush(), check the HTTP code that is returned after the HTTP call is made. If the code is not 1XX (informational response) or 2XX (successful) then throw an RPCException. This change does not contain any attempt to recover from HTTP failures but it does allow the failure to be detected and a message to be printed. In future it may be possible to retry after certain HTTP errors. Testing: - Add a new test for impala-shell that tries to connect to an HTTP server that always returns a 503 error. Check that an appropriate error message is printed. Change-Id: I3c105f4b8237b87695324d759ffff81821c08c43 Reviewed-on: http://gerrit.cloudera.org:8080/14924 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-12-20 00:14:00 +00:00
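The status check described above (accept 1XX and 2XX, raise otherwise) might look roughly like the following. This is a sketch under stated assumptions: the `RPCException` class here is a local stand-in, and `check_http_code` is a hypothetical helper, not the actual code in the shell's THttpClient override.

```python
class RPCException(Exception):
    """Local stand-in for the shell's RPC error type (assumed shape)."""


def check_http_code(code, reason=""):
    """Raise unless the HTTP status is informational (1XX) or successful (2XX)."""
    if code // 100 not in (1, 2):
        raise RPCException("HTTP code {0}: {1}".format(code, reason))


check_http_code(200, "OK")  # passes silently

try:
    check_http_code(503, "Service Unavailable")
    error_message = None
except RPCException as e:
    error_message = str(e)
```

Failing at flush time gives the user a specific message instead of a later EOFError when the reply is read.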
Lars Volker	74c7b7e55f	IMPALA-8863: Add support to run tests over HTTP/HS2 This change adds support to run backend tests over HTTP using a new version of Impyla (0.16.1). It also adds a test that exercises authentication over HTTP. Change-Id: I7156558071781378fcb9c8941c0f4dd82eb0d018 Reviewed-on: http://gerrit.cloudera.org:8080/14059 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-11-26 22:46:40 +00:00
Joe McDonnell	53774770bc	IMPALA-9028: impala-shell should not try to reconnect if quitting When the impala-shell is disconnected, it will try to reconnect for any command that a user runs (as part of ImpalaShell's precmd()). This doesn't make sense when the user is trying to quit the shell (i.e. by typing 'quit' or 'exit' or hitting Ctrl-D). This skips the attempt to reconnect when quitting the shell. Testing: - Added test in test_shell_interactive.py - Verified by hand Change-Id: I6a76bc515db609498fa8772e9f0b0c547b82c09e Reviewed-on: http://gerrit.cloudera.org:8080/14391 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-10-16 03:23:14 +00:00
Bharath Vissapragada	72c9370856	IMPALA-8717: impala-shell support for HS2 HTTP endpoint Adds impala-shell support to connect to HiveServer2 HTTP endpoint. Relies on toolchain change at https://gerrit.cloudera.org/#/c/13725/. Use --protocol='hs2-http' to enable this behavior. Example usages: --------------- impala-shell --protocol='hs2-http' (No auth) impala-shell --protocol='hs2-http' --ldap -u..... (PLAIN auth) impala-shell --protocol='hs2-http' --ssl --ca_cert... (TLS) impala-shell --protocol='hs2-http' --ldap --ssl --ca_cert... (LDAP + TLS) Limitations: ----------- - Does not support Kerberos (-k) due to lack of SPNEGO support. Testing: -------- - Parameterized existing shell tests to support this combination. - Added shell test coverage for LDAP auth. Change-Id: I8323950857dfe1c1dfd5377fde79f87bc2ce9534 Reviewed-on: http://gerrit.cloudera.org:8080/13746 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>	2019-07-29 05:43:48 +00:00
Tim Armstrong	f1f3ae9ec2	IMPALA-7290: part 2: Add HS2 support to Impala shell HS2 is added as an option via --protocol=hs2. The user-visible differences in behaviour are minimal. Beeswax is still the default and can be explicitly enabled via --protocol=beeswax but will be deprecated. The default is unchanged because changing the default could break certain workflows, e.g. those that explicitly specify the port with -i or deployments that hit --fe_service_threads for HS2 and somehow rely on impala-shell not contributing to that limit. For most workflows the change is transparent and we should change the default in a major version change. This support requires Impala-specific extensions to the HS2 interface, similar to the existing extensions to Beeswax. Thus the HS2 shell is only forwards-compatible with newer Impala versions. I considered trying to gracefully degrade when the new extensions weren't present, but it didn't seem to be worth the ongoing testing effort. Differences between HS2 and Beeswax are abstracted into ImpalaClient subclasses. Here are the changes required to make it work: * Switch to TBinaryProtocolAccelerated to avoid perf regression. The HS2 protocol requires decoding more primitive values (because its not a string-per-row), which was slow with the pure python implementation of TBinaryProtocol. * Added bitarray module to efficiently unpack null indicators * Minimise invasiveness of changes by transposing and stringifying the columnar results into rows in impala_client.py. The transposition needs to happen before display anyway. * Add PingImpalaHS2Service() to get back version string and webserver address. * Add CloseImpalaOperation() extension to return DML row counts. This possibly addresses IMPALA-1789, although we need to confirm that this is a sufficient solution. * Add is_closed member to query handles to avoid shell independently tracking whether the query handle was closed or not. * Include query status in HS2 log to match beeswax. 
* HS2 GetLog() command now includes query status error message for consistency with beeswax. * "set"/"set all" uses the client requests options, not the session default. This captures the effective value of TIMEZONE, which was previously missing. This also requires test changes where the tests set non-default values, e.g. for ABORT_ON_ERROR. * "set all" on the server side returns REMOVED query options - the shell needs to know these so it can correctly ignore them. * Clean up self.orig_cmd/self.last_leading comment argument passing to avoid implicit parameter passing through multiple function calls. * Clean up argument handling in shell tests to consistently pass around lists of arguments instead of strings that are subject to shell tokenisation rules. * Consistently close connections in the shell to avoid leaking HS2 sessions. This is enforced by making ImpalaShell a context manager and also eliminating all sys.exit() calls that would bypass the explicit connection closing. Testing: * Shell tests can run with both protocols * Add tests for formatting of all types and NULL values * Added testing for floating point output formatting, which does change as a result of switching to server-side vs client-side formatting. * Verified that newly-added tests were actually going through HS2 by disabling hs2 on the minicluster and running tests. * Add checks to test_verify_metrics.py to ensure that no sessions are left open at the end of tests. 
Performance: Baseline from beeswax shell for large extract is as follows: $ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null real 0m6.708s user 0m5.132s sys 0m0.204s After this change it is somewhat slower, but we generally don't consider bulk extract performance through the shell to be perf-critical: real 0m7.625s user 0m6.436s sys 0m0.256s Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12 Reviewed-on: http://gerrit.cloudera.org:8080/12884 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-20 10:23:28 +00:00
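The step of "transposing and stringifying the columnar results into rows" can be illustrated with plain Python. This sketch makes several assumptions: each column is modeled as a `(values, nulls)` pair, the null bitmap sets bit `i % 8` of byte `i // 8` for a NULL at row `i`, and the names `is_null` and `columns_to_rows` are hypothetical. The real shell uses the bitarray module for the unpacking.

```python
def is_null(nulls, i):
    """Check bit i of a byte-string bitmap (assumed least-significant-bit first)."""
    byte_index = i // 8
    if byte_index >= len(nulls):
        return False
    return bool(nulls[byte_index] & (1 << (i % 8)))


def columns_to_rows(columns, null_text="NULL"):
    """Turn [(values, nulls), ...] columns into a list of rows of strings."""
    stringified = []
    for values, nulls in columns:
        stringified.append([
            null_text if is_null(nulls, i) else str(v)
            for i, v in enumerate(values)
        ])
    # zip(*cols) transposes column-major data into row-major data.
    return [list(row) for row in zip(*stringified)]


columns = [
    ([1, 2, 3], b"\x00"),       # no NULLs
    (["a", "", "c"], b"\x02"),  # bit 1 set: second row is NULL
]
rows = columns_to_rows(columns)
```

Doing the transposition in one place keeps the display code unchanged regardless of which protocol produced the results.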
Ethan Xue	487547ec44	IMPALA-6042: Allow Impala shell to use a global impalarc config Currently, impalarc files can be specified on a per-user basis (stored in ~/.impalarc), and they aren't created by default. The Impala shell should pick up /etc/impalarc as well, in addition to the user-specific configurations. The intent here is to allow a "global" configuration of the shell by a system administrator. The default path of the global config file can be changed by setting the $IMPALA_SHELL_GLOBAL_CONFIG_FILE environment variable. Note that the options set in the user config file take precedence over those in the global config file. Change-Id: I3a3179b6d9c9e3b2b01d6d3c5847cadb68782816 Reviewed-on: http://gerrit.cloudera.org:8080/13313 Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-05-30 03:59:54 +00:00
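The precedence rule above (user config overrides global config) can be sketched with the standard configparser module. The file contents, host name and `load_options` helper below are made up for illustration, and the `[impala]` section name is assumed.

```python
import configparser

# Hypothetical contents of a global config file and a user config file.
GLOBAL_RC = "[impala]\nverbose=false\nimpalad=prod-host:21000\n"
USER_RC = "[impala]\nverbose=true\n"


def load_options(global_text, user_text):
    """Read the global config first, then the user config, so user values win."""
    parser = configparser.ConfigParser()
    parser.read_string(global_text)
    parser.read_string(user_text)  # later sources override earlier ones
    return dict(parser["impala"])


options = load_options(GLOBAL_RC, USER_RC)
```

Settings that appear only in the global file still apply, while any key repeated in the user file takes the user's value.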
Tim Armstrong	0a9ea803d2	IMPALA-7290: part 1: clean up shell tests This sets up the tests to be extensible to test shell in both beeswax and HS2 modes. Testing: * Add test dimension containing only beeswax in preparation for HS2 dimension. * Factor out hardcoded ports. * Add tests for formatting of all types and NULL values. * Merge date shell test into general type tests. * Added testing for floating point output formatting, which does change as a result of switching to server-side vs client-side formatting. * Use unique_database for tests that create tables. Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 Reviewed-on: http://gerrit.cloudera.org:8080/13083 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-30 11:30:45 +00:00
Fredy Wijaya	6853184234	IMPALA-8317: Add support for list type flags in Impala shell config file This patch adds support for list type flags in Impala shell config file, i.e. those that use action="append", such as --var and --query_option. To make it less error-prone, this patch also updates the logic for bool flags in the config file to also look at the correct type from the argument parser instead of relying on whether or not the default values are set in impala_shell_config_defaults.py. Testing: - Added a new test for list type flags - Ran all shell E2E tests Change-Id: I824ca15b4e1064a391b13deef9cecd34c928ef73 Reviewed-on: http://gerrit.cloudera.org:8080/12781 Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-03-21 10:29:43 +00:00
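Flags declared with action="append" collect every occurrence into a list. The sketch below uses argparse to show how a multi-valued config entry could be expanded into repeated flags. The newline-separated config format and the `config_to_argv` helper are assumptions for illustration, not the shell's actual config parsing.

```python
import argparse

parser = argparse.ArgumentParser()
# List-type flag: each occurrence appends to the list.
parser.add_argument("--var", action="append", default=[])


def config_to_argv(config):
    """Expand each config value into one --key=value flag per non-empty line."""
    argv = []
    for key, value in config.items():
        for item in value.splitlines():
            item = item.strip()
            if item:
                argv.append("--{0}={1}".format(key, item))
    return argv


config = {"var": "msg1=hello\nmsg2=world"}
args = parser.parse_args(config_to_argv(config))
```

Because the parser declares the flag's action, the config loader can consult that declaration instead of guessing the type from default values.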
Andrew Sherman	9ad9a1624a	IMPALA-8325: Leading Unicode comments cause Impala Shell failure. This change fixes a regression introduced by "IMPALA-2195 Improper handling of comments in queries." The Impala Shell parses input text into several strings using the sqlparse library. One of the returned strings is the sql command, this is used to determine the correct do_<command> method to call. Another of the returned strings is the leading comment, which is a comment that appears before legal sql text. Python2 has strings with multiple encodings. The strings returned from the sqlparse library have the Unicode encoding. Impala Shell converts the sql command string to utf-8 encoding before using it. If the Impala Shell needs to send the sql command to an Impala Coordinator then it (re)constructs the query out of the strings returned by the sqlparse library. This query is sent to the Coordinator via Beeswax protocol. The query is converted to an ascii string before being sent. The conversion can fail if the leading comment string contains Unicode characters, which can't be directly converted to ascii. So the trigger for the bug is that the leading comment contains Unicode. The fix is that the leading comment string should be converted to utf-8 in the same way as the sql command. TESTING: Ran all end-to-end tests. Added two test cases to tests/shell/test_shell_interactive.py Change-Id: I8633935b6e0ca33594afd32ad242779555e09944 Reviewed-on: http://gerrit.cloudera.org:8080/12812 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-03-20 22:18:16 +00:00
Paul Rogers	282199a5ab	IMPALA-7915: Wrap SQL parser to avoid redundant code The FE has several repeated blocks of code to set up the lexer and parser, to parse, and to handle errors. This patch moves this code into a static function that can be used in place of the copies. At the same time, provide a specific ParseException to replace the generic Exception thrown by the parser to allow easier error handling. Some of the uses of the parser assume the return value is Object, others that the value is ParseNode and still others that it is StatementBase. Since the actual return is StatementBase, declares that as the return value of the new static method to clearly state the actual output. Testing: This is just a refactoring. Reran all FE tests to ensure no regressions. Change-Id: I174c59d38542ff311c6c3dc10cf3ad4e40f8b30e Reviewed-on: http://gerrit.cloudera.org:8080/12016 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-15 01:38:37 +00:00
Thomas Tauber-Marshall	1cbcd0c37d	IMPALA-7926: Fix flakiness in test_reconnect test_reconnect launches a shell that connects to one impalad in the minicluster then reconnects to a different impalad while checking that the impalad's open session metric changes accordingly. To do this, the test gets the number of open sessions at the start of the test and then expects that the number of sessions will have increased by 1 on the impalad that the shell is currently connected to. This can be a problem if there is a session left over from another test that is still active when test_reconnect starts but exits while it's running. test_reconnect is already marked to run serially, so there shouldn't be any other sessions open while it runs anyways. The solution is to wait at the start of the test until any sessions left over from other tests have exited. Testing: - Ran the test in an environment where the timing was previously causing it to fail almost deterministically and it now passes. Change-Id: I3017ca3bf7b4e33440cffb80e9a48a63bec14434 Reviewed-on: http://gerrit.cloudera.org:8080/12045 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-13 01:08:52 +00:00
Fredy Wijaya	9c44853998	IMPALA-6591: Fix test_ssl flaky test test_ssl has logic that waits for the number of in-flight queries to be 1. However, the logic for wait_for_num_in_flight_queries(1) only waits for the condition to be true for a period of time and does not throw an exception when the time has elapsed and the condition is not met. In other words, the logic in test_ssl that loops while the number of in-flight queries is 1 never gets executed. I was able to simulate this issue by making Impala shell take much longer to start. Prior to this patch, in the event that Impala shell took much longer to start, the test started sending the commands to Impala shell even when Impala shell was not ready to receive commands. The patch fixes the issue by waiting until Impala shell is connected. The patch also adds asserts in other places that call wait_for_num_in_flight_queries and updates the default behavior for Impala shell to wait until it is connected. Testing: - Ran core and exhaustive tests several times on CentOS 6 without any issue Change-Id: I9805269d8b806aecf5d744c219967649a041d49f Reviewed-on: http://gerrit.cloudera.org:8080/12047 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-12 22:44:34 +00:00
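The underlying problem here is a wait helper that returns silently on timeout. A minimal sketch of a helper that fails loudly instead is shown below. The name `wait_for` and its parameters are hypothetical and are not the test framework's actual API.

```python
import time


def wait_for(condition, timeout_s=10.0, interval_s=0.05):
    """Poll condition() until it is true; raise if the timeout elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            # Failing loudly keeps the caller from proceeding on a false premise.
            raise AssertionError(
                "condition not met within {0} seconds".format(timeout_s))
        time.sleep(interval_s)


wait_for(lambda: True)  # returns immediately

try:
    wait_for(lambda: False, timeout_s=0.1)
    timed_out = False
except AssertionError:
    timed_out = True
```

A caller that needs a boolean result can wrap the call, but the default of raising makes a missed condition visible in the test output.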


100 Commits