Python 3 changed the behavior of imports with PEP328. Existing
imports become absolute unless they use the new relative import
syntax. This adapts the impala-shell code to use absolute
imports, fixing issues where it is imported from our test code.
There are several parts to this:
1. It moves impala shell code into shell/impala_shell.
This matches the directory structure of the PyPi package.
2. It changes the imports in the shell code to be
absolute paths (i.e. impala_shell.foo rather than foo).
This fixes issues with Python 3 absolute imports.
It also eliminates the need for ugly hacks in the PyPi
package's __init__.py.
3. This changes Thrift generation to put it directly in
$IMPALA_HOME/shell rather than $IMPALA_HOME/shell/gen-py.
This means that the generated Thrift code is rooted in
the same directory as the shell code.
4. This changes the PYTHONPATH to include $IMPALA_HOME/shell
and not $IMPALA_HOME/shell/gen-py. This means that the
test code is using the same import paths as the pypi
package.
With all of these changes, the source code is very close
to the directory structure of the PyPi package. As long as
CMake has generated the thrift files and the Python version
file, only a few differences remain. This removes those
differences by moving the setup.py / MANIFEST.in and other
files from the packaging directory to the top-level
shell/ directory. This means that one can pip install
directly from the source code. i.e. pip install $IMPALA_HOME/shell
This also moves the shell tarball generation script to the
packaging directory and changes bin/impala-shell.sh to use
Python 3.
This sorts the imports using isort for the affected Python files.
Testing:
- Ran a regular core job with Python 2
- Ran a core job with Python 3 and verified that the absolute
import issues are gone.
Change-Id: Ica75a24fa6bcb78999b9b6f4f4356951b81c3124
Reviewed-on: http://gerrit.cloudera.org:8080/22330
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
This is the final patch to move all Impala e2e and custom cluster tests
to use HS2 protocol by default. Only beeswax-specific test remains
testing against beeswax protocol by default. We can remove them once
Impala officially remove beeswax support.
HS2 error message formatting in impala-hs2-server.cc is adjusted a bit
to match with formatting in impala-beeswax-server.cc.
Move TestWebPageAndCloseSession from webserver/test_web_pages.py to
custom_cluster/test_web_pages.py to disable glog log buffering.
Testing:
- Pass exhaustive tests, except for some known and unrelated flaky
tests.
Change-Id: I42e9ceccbba1e6853f37e68f106265d163ccae28
Reviewed-on: http://gerrit.cloudera.org:8080/22845
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Jason Fehr <jfehr@cloudera.com>
This patch adds __reset_impala_clients() method in ImpalaConnection.
__reset_impala_clients() then simply clear configuration. It is called
on each setup_method() to ensure that each EE test uses clean test
client. All subclasses of ImpalaTestSuite that declare setup() method
are refactored to declare setup_method() instead, to match newer py.test
convention. Also implement teardown_method() to complement
setup_method(). See "Method and function level setup/teardown" at
https://docs.pytest.org/en/stable/how-to/xunit_setup.html.
CustomClusterTestSuite fully overrides setup_method() and
teardown_method() because it subclasses can be destructive. The custom
cluster test method often restart the whole Impala cluster, rendering
default impala clients initialized at setup_class() unusable. Each
subclass of CustomClusterTestSuite is responsible to ensure that impala
client they are using is in a good state.
This patch improve BeeswaxConnection and ImpylaHS2Connection to only
consider non-REMOVED options as its default options. They lookup for
valid (not REMOVED) query options with their own appropriate way,
memorized the option names as lowercase string and the values as string.
List values are wrapped with double quote. Log in
ImpalaConnection.set_configuration_option() is differentiated from how
SET query looks.
Note that ImpalaTestSuite.run_test_case() modify and restore query
option written at .test file by issuing SET query, not by calling
ImpalaConnection.set_configuration_option(). It is remain unchanged.
Consistently lower case query option everywhere in Impala test code
infrastructure. Fixed several tests that has been unknowingly override
'exec_option' vector dimension due to case sensitive mismatch. Also
fixed some flake8 issues.
Added convenience method execute_query_using_vector() and
create_impala_client_from_vector() in ImpalaTestSuite.
Testing:
- Pass core tests.
Change-Id: Ieb47fec9f384cb58b19fdbd10ff7aa0850ad6277
Reviewed-on: http://gerrit.cloudera.org:8080/22404
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change will print timestamp of an exception or warning
occurred during execution of a query via impala-shell.
The timestamp will use timezone of the machine running impala-shell.
example:
Query submitted at: 2024-08-22 16:17:57 (Coordinator: http://host:25000)
Query state can be monitored at:
http://localhost:25000/query_plan?query_id=e04dcc55e560d1ee:11173fe800000000
^C Cancelling Query
Opened TCP connection to localhost:21050
2024-08-22 16:17:58 [Exception] type=<class 'socket.error'> in FetchResults.
[Errno 4] Interrupted system call
2024-08-22 16:17:58 [Warning] Cancelling Query
2024-08-22 16:17:58 [Warning] close session RPC failed: <class
'shell_exceptions.QueryCancelledByShellException'>
Opened TCP connection to localhost:21050
[localhost:21050] default>
Change-Id: I4abbd02aa9f61210b0333495bf191e72c22a5944
Reviewed-on: http://gerrit.cloudera.org:8080/21426
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The issue occurred in Python 3 when 0 rows were deleted from Iceberg.
It could also happen in other DMLs with older Impala servers where
TDmlResult.rows_deleted was not set. See the Jira for details of
the error.
Testing:
Extended shell tests for Kudu DML reporting to also cover Iceberg.
Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438
Reviewed-on: http://gerrit.cloudera.org:8080/21284
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Python 3 moved several things around or removed deprecated
functions / fields:
- sys.maxint was removed, but sys.maxsize provides similar functionality
- long was removed, but int provides the same range
- file() was removed, but open() already provided the same functionality
- Exception.message was removed, but str(exception) is equivalent
- Some encodings (like hex) were moved to codecs.encode()
- string.letters -> string.ascii_letters
- string.lowercase -> string.ascii_lowercase
- string.strip was removed
This fixes all of those locations. Python 3 also has slightly different
rounding behavior from round(), so this changes round() to use future's
builtins.round() to get the Python 3 behavior.
This fixes the following pylint warnings:
- file-builtin
- long-builtin
- invalid-str-codec
- round-builtin
- deprecated-string-function
- sys-max-int
- exception-message-attribute
Testing:
- Ran cores tests
Change-Id: I094cd7fd06b0d417fc875add401d18c90d7a792f
Reviewed-on: http://gerrit.cloudera.org:8080/19591
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This adds a Python 3 equivalent to the impala-python
virtualenv base on the toolchain Python 3.7.16.
This modifies bootstrap_virtualenv.py to support
the two different modes. This adds py2-requirements.txt
and py3-requirements.txt to allow some differences
between the Python 2 and Python 3 virtualenvs.
Here are some specific package changes:
- allpairs is replaced with allpairspy, as allpairs did
not support Python 3.
- requests is upgraded slightly, because otherwise is has issues
with idna==2.8.
- pylint is limited to Python 3, because we are adding it
and don't need it on both
- flake8 is limited to Python 2, because it will take
some work to switch to a version that works on Python 3
- cm_api is limited to Python 2, because it doesn't support
Python 3
- pytest-random does not support Python 3 and it is unused,
so it is removed
- Bump the version of setuptool-scm to support Python 3
This adds impala-pylint, which can be used to do further
Python 3 checks via --py3k. This also adds a bin/check-pylint-py3k.sh
script to enforce specific py3k checks. The banned py3k warnings
are specified in the bin/banned_py3k_warnings.txt. This is currently
empty, but this can ratchet up the py3k strictness over time
to avoid regressions.
This pulls in a new toolchain with the fix for IMPALA-11956
to get Python 3.7.16.
Testing:
- Hand tested that the allpairs libraries produce the
same results
- The python3 virtualenv has no influence on regular
tests yet
Change-Id: Ica4853f440c9a46a79bd5fb8e0a66730b0b4efc0
Reviewed-on: http://gerrit.cloudera.org:8080/19567
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Python 3 does not support this old except syntax:
except Exception, e:
Instead, it needs to be:
except Exception as e:
This uses impala-futurize to fix all locations of
the old syntax.
Testing:
- The check-python-syntax.sh no longer shows errors
for except syntax.
Change-Id: I1737281a61fa159c8d91b7d4eea593177c0bd6c9
Reviewed-on: http://gerrit.cloudera.org:8080/19551
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
When Impala Shell receives an http error message (that is a message with
http code greater than or equal to 300), it may sleep for a time before
retrying. If the message contains a 'Retry-After' header that has an
integer value, then this will be used as the time for which to sleep.
The implementation is to use a new HttpError exception (similar to that
used in Impyla) which includes more information from the error message
(including the headers) so that catchers of the exception can use the
'Retry-After' header if appropriate.
TESTING:
Hand testing with a proxy that uses the 'Retry-After' header.
Added new tests that use the fault injection framework in
test_hs2_fault_injection.py
Change-Id: I2b4226e7723d585d61deb4d1d6777aac901bfd93
Reviewed-on: http://gerrit.cloudera.org:8080/16702
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds the option --fetch_size to the Impala shell. This new option allows
users to specify the fetch size used when issuing fetch RPCs to the
Impala Coordinator (e.g. TFetchResultsReq and BeeswaxService.fetch).
This parameter applies for all client protocols: beeswax, hs2, hs2-http.
The default --fetch_size is set to 10240 (10x the default batch size).
The new --fetch_size parameter is most effective when result spooling is
enabled. When result spooling is disabled, Impala can only return a
single row batch per fetch RPC (so 1024 rows by default). When result
spooling is enabled, Impala can return up to 100 row batches per fetch
request.
Removes some logic in the the impala_client.py file that attempts to
simulate a fetch_size. The code would issue multiple fetch requests to
fullfill the given fetch_size. This logic is no longer needed now that
result spooling is available.
Testing:
* Ran core tests
* Added new tests in test_shell_client.py and test_shell_commandline.py
Change-Id: I8dc7962aada6b38795241d067a99bd94fabca57b
Reviewed-on: http://gerrit.cloudera.org:8080/16041
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Sahil Takiar <stakiar@cloudera.com>
Added retries for idempotent rpcs:
OpenSession, PingImpalaHS2Service, GetResultSetMetadata,
CloseImpalaOperation (non dmls), CancelOperation, GetOperationStatus,
GetRuntimeProfile, GetExecSummary, GetLog
Retries were also added to the 'set all' query execution and subsequent
result fetch in the ImpalaHS2Client._open_session()
The retries are only supported for hs2-http protocol and enabled by
default. At most there are 3 retries for a failed rpc. There is a sleep
duration of 'n' seconds after nth retry.
Only failed rpcs due to an error in the http transport are retried and
if an rpc failed because the server returned an error in the rpc
response then such scenarios are not retriable.
Improved error diagnostics by dumping stack trace when ImpalaShell.
_execute_stmt() gets an 'Unknown Exception'.
Testing:
- Added a custom_cluster test which injects fault into the http transport
and checks expected behavior from the various rpcs. Some of these tests
leave the session in an open state and so these tests are not suitable
for the e2e test framework which have metric verifiers expecting related
metrics to be 0 at the end of the test.
- Manually tested real world scenarios with impala-shell client
communicating with an impala coordinator via a fault injecting istio mesh.
- Manually tested dropping connections on an nginx ingress gateway by sending
SIGTERM to all worker processes.
Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5
Reviewed-on: http://gerrit.cloudera.org:8080/15378
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>