Impala Shell gets cookies from an HTTPMessage object formed from a
response to an HTTP message. The format of cookies in the message
differs across Python versions. In Python 2 the HTTPMessage is a
mimetools.Message object, and the Set-Cookie values all appear in a
single header, separated by newlines. In Python 3 the HTTPMessage is an
email.message.Message, and the Set-Cookie values appear as duplicate
headers.
Add platform dependent code to get_all_matching_cookies() that loads
cookies from all the Set-Cookie headers.
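A minimal sketch of the platform-dependent lookup described above. The
helper name get_set_cookie_values is hypothetical and this is not the
actual body of get_all_matching_cookies(); the Python 2 branch assumes
the newline-separated single header described in this message.

```python
import sys
from email.message import Message


def get_set_cookie_values(http_message):
    """Return every Set-Cookie value from an HTTP response's headers."""
    if sys.version_info.major == 2:
        # Python 2 (mimetools.Message): all values arrive in one header,
        # separated by newlines.
        combined = http_message.get("Set-Cookie", "")
        return [v.strip() for v in combined.split("\n") if v.strip()]
    # Python 3 (email.message.Message): each value is a separate header.
    return http_message.get_all("Set-Cookie") or []


# Python 3 usage: build a message with duplicate Set-Cookie headers.
msg = Message()
msg["Set-Cookie"] = "KNOX_BACKEND-IMPALA=abc; Path=/"
msg["Set-Cookie"] = "other=1"
cookie_values = get_set_cookie_values(msg)
```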
TESTING:
Changed test_get_all_matching_cookies() to build the HTTPMessage
using a new utility method that creates Set-Cookie headers in
the appropriate format for the platform.
Validated that the KNOX_BACKEND-IMPALA cookie is correctly set in
Impala Shell on a Red Hat 9 system using Python 3 (which is how
the problem was first observed).
Change-Id: I057b5c2b9d78e36f32865537d091c4ac0e80d37f
Reviewed-on: http://gerrit.cloudera.org:8080/20216
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The impala-shell tarball ships its external dependencies
by building eggs and including them in the ext-py* directories.
On Redhat 9 and Ubuntu 22, the impala-shell tarball encountered
a regression where the sasl package could not access its
Client class:
Error connecting: AttributeError, module 'sasl' has no attribute 'Client'
This only occurs when using eggs (which are zip files). The virtualenv
installs worked fine. Unpacking the eggs and using the content directly
also avoids the problem.
This reworks the shell tarball to instead build wheels and install
them with 'pip install'. This means that the external dependencies
are not packaged in eggs, and this avoids the issue with sasl. This
is a minimal change to avoid the issue until the shell tarball build
can be reworked more extensively.
Testing:
- Ran shell tests on Redhat 9
Change-Id: I49403979c559b7f8bbe038865c06db6024468d72
Reviewed-on: http://gerrit.cloudera.org:8080/20095
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for Redhat 9 / Ubuntu 22. It updates
to a newer toolchain that has those builds, and it adds
supporting code in bootstrap_system.sh.
Redhat 9 and Ubuntu 22 use python = python3, which requires
various changes to build scripts and tests. Ubuntu 22 uses
Python 3.10, which deprecates certain ssl.PROTOCOL_TLS* constants, so
this adapts test_client_ssl.py to that change until it
can be fully addressed in IMPALA-12219.
Various OpenSSL methods have been deprecated. As a workaround
until these can be addressed properly, this specifies
-Wno-deprecated-declarations. This can be removed once the
code is adapted to the non-deprecated APIs in IMPALA-12226.
Impala crashes with tcmalloc errors unless we update to a newer
gperftools, so this moves to gperftools 2.10. gperftools changed
the default for tcmalloc.aggressive_memory_decommit to off, so
this adapts our code to set it for backend tests. The gperftools
upgrade does not show any performance regression:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(42) | parquet / none / none | 3.08    | -0.64%     | 2.20       | -0.37%         |
+----------+-----------------------+---------+------------+------------+----------------+
With newer Python versions, the impala-virtualenv command
fails to create a Python 3 virtualenv. This switches to
using Python 3's builtin venv command for Python >=3.6.
Kudu needed a newer version and LLVM required a couple patches.
Testing:
- Ran a core job on Ubuntu 22 and Redhat 9. The tests run
to completion without crashing. There are test failures
that will be addressed in follow-up JIRAs.
- Ran dockerised tests on Ubuntu 22.
- Ran dockerised tests on Ubuntu 20 and Rocky 8.5.
Change-Id: If1fcdb2f8c635ecd6dc7a8a1db81f5f389c78b86
Reviewed-on: http://gerrit.cloudera.org:8080/20073
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Interactive shell tests can hang waiting for input if the
shell process hits errors or exits. For example, the problems
in the sasl package seen in IMPALA-12220 cause test_shell_interactive.py
to hang.
This improves the error detection/handling to avoid hangs for
most common shell errors. Specifically, it adds a check for
the impala-shell process exiting, and it adds a check for
a failure to connect to Impala. Both would previously result
in hangs.
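A rough sketch of the "process exited" check, assuming the test reads
the shell's stdout through a pipe. The helper wait_for_prompt is
hypothetical; the real tests may use a different mechanism to drive the
interactive shell.

```python
import subprocess
import sys


def wait_for_prompt(proc, prompt):
    """Read child output until `prompt` appears; fail fast if the child exits."""
    seen = []
    while True:
        line = proc.stdout.readline()
        if line == "":
            # EOF: the child closed stdout, which normally means it exited.
            rc = proc.wait()
            raise RuntimeError(
                "process exited with code %s before the prompt appeared; "
                "output so far: %r" % (rc, "".join(seen)))
        seen.append(line)
        if prompt in line:
            return "".join(seen)
```

Raising on EOF turns a silent hang into an immediate test failure that
includes whatever the process printed before exiting.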
Testing:
- Verified test_shell_interactive.py doesn't hang with hand
tests
- Remove a vital import from impala-shell so it exits instantly
- Simulate a connection problem by overwriting the port
with a non-functional port
- Test on Redhat 9 with the IMPALA-12220 issue
Change-Id: I7556fb687e06b41caa538d8c3231ec9f2ad98162
Reviewed-on: http://gerrit.cloudera.org:8080/20087
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Commit cd9f3f578 aims to suppress logging for the 'thrift' library
within impala-shell. However, it does not work in all cases. This change
moves the fix into the 'main' function, which suppresses the unwanted
message.
Tested by connecting through impala-shell with Python2.7 and Python3.6
with SSL enabled.
Change-Id: I4de95b1b67abe9a0b4637910b0894addddda23d5
Reviewed-on: http://gerrit.cloudera.org:8080/20074
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This pulls in a new toolchain to get a Thrift with
the patch for THRIFT-5705. This fixes an issue where
idle clients using TLS are needlessly disconnected due
to a bug in the read retry count logic inside Thrift.
Tests:
- This modifies test_thrift_socket.py to make it do
more idle polls and check that ImpalaShell is not
disconnected. It fails without the THRIFT-5705 patch
and passes now.
Change-Id: Ifc7704cba032a91b9fd0d5d54d1e0a7e17fb10bb
Reviewed-on: http://gerrit.cloudera.org:8080/19962
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
The previous fetch_size of 10240 turned out to be suboptimal for HS2
server side, likely because it leads to overallocation in result
'std::vector's. Changed to the closest power of 2 size (8192).
With this change RowMaterializationTimer decreased from 3.4s to 2.7s
for "SELECT * FROM tpch_parquet.lineitem".
Change-Id: I34973cb705db53c496b9944c74995b45cf720d46
Reviewed-on: http://gerrit.cloudera.org:8080/19965
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The end time of the exact same rpc call was different between stdout
and the rpc details file because the end time was calculated each
time the details were written out instead of calculating the end time
once and reusing that value.
The duration of each rpc call was being calculated incorrectly.
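One way to compute the end time once and reuse it, shown as a minimal
sketch. The class name RpcTiming is hypothetical and is not taken from
the shell's code.

```python
import time


class RpcTiming(object):
    """Records start and end once so every output reports the same values."""

    def __init__(self):
        self.start = time.time()
        self.end = None

    def finish(self):
        # Capture the end time exactly once; later calls reuse it.
        if self.end is None:
            self.end = time.time()
        return self.end

    def duration(self):
        return self.finish() - self.start
```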
Change-Id: Ifd9dec189d0f6fb8713fb1c7b2b6c663e492ef05
Reviewed-on: http://gerrit.cloudera.org:8080/19932
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As __future__.unicode_literals is imported in impala-shell,
concatenating a str with a literal leads to decoding the
string with 'ascii' codec which fails if there are non-ascii
characters. Converting the literal to str solves the issue.
Testing:
- added regression test + ran related EE tests
Change-Id: I99b72dd262fc7c382e8baee1dce7592880c84de2
Reviewed-on: http://gerrit.cloudera.org:8080/19893
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Thrift's fastbinary module provides native code that
accelerates the BinaryProtocol. It can make a large
performance difference when using the Hiveserver2
protocol with impala-shell. If the fastbinary is not
working, it silently falls back to interpreted code.
This can happen because the fastbinary couldn't load
a particular library, etc.
This adds a warning on impala-shell startup when
it detects that Thrift's fastbinary is not working.
When bin/impala-shell.sh is modified to use python3,
impala-shell outputs this error (shortened for legibility):
WARNING: Failed to load Thrift's fastbinary module. Thrift's
BinaryProtocol will not be accelerated, which can reduce performance.
Error was '{path to Python2 thrift fastbinary.so}: undefined symbol: _Py_ZeroStruct'
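A minimal sketch of such a startup check. The helper name
check_native_module and the warning wording are illustrative only; it
assumes the accelerator is importable as thrift.protocol.fastbinary.

```python
import importlib


def check_native_module(module_name):
    """Return None if the module imports, else a warning string."""
    try:
        importlib.import_module(module_name)
        return None
    except ImportError as e:
        # ImportError covers a missing module as well as a native library
        # that fails to load (for example, an undefined symbol).
        return "WARNING: Failed to load %s. Error was '%s'" % (module_name, e)


warning = check_native_module("thrift.protocol.fastbinary")
if warning:
    print(warning)
```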
Testing:
- Added a simple test that verifies the impala-shell
does not output the warning
- Outputs warning when Python 2 thrift used for Python 3 shell
Change-Id: Id5d0e5db5cfdf1db4521b00f912b4697a7f646e8
Reviewed-on: http://gerrit.cloudera.org:8080/19806
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This support was modeled after the LDAP authentication.
If JWT authentication is used, the Impala shell enforces the use of the
hs2-http protocol since the JWT is sent via the "Authorization"
HTTP header.
The following flags have been added to the Impala shell:
* -j, --jwt: indicates that JWT authentication will be used
* --jwt_cmd: shell command to run to retrieve the JWT to use for
authentication
Testing
New Python tests have been added:
* The shell tests ensure that the various command line arguments are
handled properly. Cases such as allowing only a single authentication
method and refusing to send JWTs in clear text without the proper
arguments are asserted.
* The Python custom cluster tests leverage a test JWKS and test JWTs.
Then, a custom Impala cluster is started with the test JWKS. The
Impala shell attempts to authenticate using a valid JWT, an expired
(invalid) JWT, and a valid JWT signed by a different, untrusted JWKS.
These tests also exercise the Impala JWT authentication mechanism and
assert the prometheus JWT auth success and failure metrics are
reported accurately.
Change-Id: I52247f9262c548946269fe5358b549a3e8c86d4c
Reviewed-on: http://gerrit.cloudera.org:8080/19837
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Pip sporadically hits an error when installing impala-shell into
a virtualenv. An example symptom is this (though the issue is
not specific to thrift):
WARNING: Skipping page https://pypi.org/simple/thrift/ because the
GET request got Content-Type: Unknown. The only supported
Content-Types are application/vnd.pypi.simple.v1+json,
application/vnd.pypi.simple.v1+html, and text/html
ERROR: Could not find a version that satisfies the requirement
thrift==0.16.0 (from impala-shell) (from versions: none)
ERROR: No matching distribution found for thrift==0.16.0
It appears that this error can occur when two pip processes
are installing into virtualenvs simultaneously and share a
cache directory. This happens for our impala-shell build,
because we are doing pip install for Python 2 and Python 3
simultaneously. The impala-python/impala-python3 virtualenvs
do not use a cache directory and are not impacted.
This changes the shell's pip install to give the Python 2 and
Python 3 separate cache directories. The cache directories are
placed in ~/.cache like the regular pip cache. These do not
consume much space (a couple MB).
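A sketch of giving each interpreter its own pip cache through pip's
--cache-dir option. The helper name and the cache directory names are
hypothetical; the actual build scripts may choose different paths.

```python
import os


def pip_install_args(python_exe, package, cache_name):
    """Build a pip command line that uses a dedicated cache directory."""
    cache_dir = os.path.join(os.path.expanduser("~"), ".cache", cache_name)
    return [python_exe, "-m", "pip", "install",
            "--cache-dir", cache_dir, package]


# Hypothetical cache names, one per Python major version.
py2_args = pip_install_args("python2", "impala-shell", "impala_py2_pip")
py3_args = pip_install_args("python3", "impala-shell", "impala_py3_pip")
```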
Testing:
- Ran all-build-options-ub2004 ten times without seeing the failure
Change-Id: I3f834b9f8c8cbc09830745ad132677a2fe17e07b
Reviewed-on: http://gerrit.cloudera.org:8080/19813
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Fix various quality-of-life issues with the 'summary' command:
- update regex to correctly match query ID for handling "Query id ...
not found" errors
- fail the command rather than exiting the shell when 'summary' is
called with an incorrect argument (such as 'summary 1')
- provide a useful message rather than print an exception when 'summary
original' is invoked with no failed queries
Testing:
- added new tests for the 'summary' command
Change-Id: I7523d45b27e5e63e1f962fb1f6ebb4f0adc85213
Reviewed-on: http://gerrit.cloudera.org:8080/19797
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.
Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.
Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after period when they were printed).
Removes out-of-date deploy.py and various Python 2.6 workarounds.
Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py
Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
In _do_beeswax_rpc(), the exception handling code tries to recognize the
exception and raise more meaningful exceptions. However, in the last
case for unknown exceptions, it does nothing so the method just returns
None. This makes the caller fail with the error "'NoneType' object is
not iterable", since the caller expects the result to be a tuple of two
items:
handle, rpc_status = self._do_beeswax_rpc(...)
This patch prints more details of the unknown exception and then raises
an exception in _do_beeswax_rpc(), so the callers can show more
meaningful errors.
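The pattern can be sketched as follows. do_rpc is a simplified,
hypothetical stand-in for _do_beeswax_rpc(); the recognized exception
types and the "OK" status value are assumptions for illustration.

```python
import traceback


def do_rpc(rpc):
    """Run rpc() and return (result, status); never return None on failure."""
    try:
        return rpc(), "OK"
    except (IOError, OSError) as e:
        # A recognized failure: re-raise with a more meaningful message.
        raise RuntimeError("Connection problem: %s" % e)
    except Exception as e:
        # Unknown failure: print details, then raise so that callers which
        # unpack the returned tuple do not fail on a None return value.
        print("Caught exception %s, type=%s" % (e, type(e)))
        traceback.print_exc()
        raise Exception("Encountered unknown exception")
```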
Tests:
I can't reproduce the error mentioned in the JIRA description, so I
manually modified the code to give _do_beeswax_rpc() a function that
always throws an exception. Here is the console output:
$ impala-shell.sh --protocol=beeswax
[localhost:21000] default> select 1;
Query: select 1
Query submitted at: 2023-04-21 10:24:57 (Coordinator: http://quanlong-OptiPlex-BJ:25000)
Caught exception My error, type=<type 'exceptions.Exception'>
Traceback (most recent call last):
File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1531, in _do_beeswax_rpc
ret = rpc()
File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1412, in myFunc
raise Exception("My error")
Exception: My error
Unknown Exception : Encountered unknown exception
Traceback (most recent call last):
File "/home/quanlong/workspace/Impala/shell/impala_shell.py", line 1325, in _execute_stmt
query_str, self.set_query_options)
File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1414, in execute_query
handle, rpc_status = self._do_beeswax_rpc(myFunc)
File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1604, in _do_beeswax_rpc
raise Exception("Encountered unknown exception")
Exception: Encountered unknown exception
[Not connected] > Goodbye quanlong
Change-Id: I7d847251d3dab815af2427bf7701d60dc05af659
Reviewed-on: http://gerrit.cloudera.org:8080/19777
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Removes import future from pkg_resources.py as this file is fairly old
and only used with Python 2. Adding unicode_literals in IMPALA-3343 /
IMPALA-9489 broke reading files in the HOME directory during package
load via pkg_resources when any of those files contain special
characters (above ASCII 127).
Testing:
- added a file with an em dash to the HOME directory, then ran
impala-shell with Python 2.7 and without setuptools installed. This
reproduced the issue before adding this patch, and is fixed by this
patch.
Change-Id: Ia9e05904adf9ffe303cac281538df1bbcff5e48b
Reviewed-on: http://gerrit.cloudera.org:8080/19641
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
encodestring has been a deprecated alias to encodebytes since Python
3.1. It was removed in Python 3.9. However encodebytes was only added in
Python 3.1, so we need to test and use the appropriate call for each
version.
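A minimal sketch of selecting the appropriate call; the name b64_encode
is hypothetical, and the actual patch may test the version differently.

```python
import base64

# Prefer encodebytes (Python 3.1+); fall back to encodestring on Python 2.
if hasattr(base64, "encodebytes"):
    b64_encode = base64.encodebytes
else:
    b64_encode = base64.encodestring

encoded = b64_encode(b"user:password")
```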
Change-Id: If802eafa984a980d4442c4891876140ff9708096
Reviewed-on: http://gerrit.cloudera.org:8080/19635
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Uses impala-python when running packaging scripts to use a known python
version with setuptools available. This supports running on systems
where the `python` binary is available (as Python 2) but doesn't include
setuptools.
In this configuration IMPALA_SYSTEM_PYTHON2_OVERRIDE= is set to disable
building with python2, and only python3 is used for shell packaging.
Also ensures that we use IMPALA_SYSTEM_PYTHON2/3 when using system
python for building.
Testing:
- Manual build with python as a minimal Python 2 install, and Python 3.8
(including setuptools).
Change-Id: I51c257010ef8fb1790482cdc3315aede908ef095
Reviewed-on: http://gerrit.cloudera.org:8080/19619
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Detect system Pythons (2 and 3) during build configuration. Build the
impala-shell tarball only using available Python versions, and test
available versions.
Drops support for DISABLE_PYTHON3_TEST as it's now automatically
detected. If python3 is present on the system, it's expected to be
usable.
Testing:
- built in SLES 15 SP4 container with Python 3
Change-Id: Iba36d0feba163e1251c66a6a49121d4dac625afc
Reviewed-on: http://gerrit.cloudera.org:8080/19560
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA_THRIFT_PY_VERSION is also bumped to 0.16.0p3.
As 0.16.0p3 Thrift does not contain Python related
patches and Impyla 0.18.0 depends on Thrift 0.16.0,
now we are consistently using Thrift 0.16.0 in all
Python code. This also bumps the Thrift in the
shell's ext-py directory to 0.16.0 (based on the
Thrift 0.16.0 pypi tarball with the egg directory
removed).
Testing:
- Ran a GVO job
Change-Id: I7265558b0e07959c606cba73cd251c3edfcb3ed5
Reviewed-on: http://gerrit.cloudera.org:8080/18456
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
When using the hs2 protocol with the http transport, include several
tracing http headers by default. These headers are:
* X-Request-Id -- client defined string that identifies the
http request, this string is meaningful only
to the client
* X-Impala-Session-Id -- session id generated by the Impala backend,
will be omitted on http calls that occur
before this id has been generated
* X-Impala-Query-Id -- query id generated by the Impala backend,
will be omitted on http calls that occur
before this id has been generated
The Impala shell includes these headers by default. The command
line argument --no_http_tracing has been added to remove these
headers.
The Impala backend logs out these headers if they are on the http
request. The log messages are written out at log level 2 (RPC).
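A sketch of how a client might assemble these headers. The function
name is hypothetical, and using a UUID for X-Request-Id is an
assumption; the message above only says the value is a client-defined
string.

```python
import uuid


def build_tracing_headers(session_id=None, query_id=None, enabled=True):
    """Build the tracing headers for one HTTP request."""
    if not enabled:
        # Corresponds to running the shell with --no_http_tracing.
        return {}
    headers = {"X-Request-Id": str(uuid.uuid4())}
    # Session and query ids are omitted until the backend has generated them.
    if session_id is not None:
        headers["X-Impala-Session-Id"] = session_id
    if query_id is not None:
        headers["X-Impala-Query-Id"] = query_id
    return headers
```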
Testing:
- manual testing (verified using debugging proxy and impala logs)
- new python test
Change-Id: I7857eb5ec03eba32e06ec8d4133480f2e958ad2f
Reviewed-on: http://gerrit.cloudera.org:8080/19428
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala's shell tarball used to include a copy of pkg_resources.py (from
setuptools); due to the Python version we use for packaging, all modules
with native libraries use pkg_resources to load the library. It was
removed in IMPALA-9718 because Impala's copy of pkg_resources didn't
work with Python 3.
Some platforms - RHEL 7 for Python 2, Ubuntu/Debian - don't install
setuptools by default as part of the python package, which causes
impala-shell to error with "ImportError: No module named pkg_resources".
Restores Impala's copy of pkg_resources.py to PYTHONPATH when running
impala-shell under Python 2. Omits it for Python 3 so we use updated
setuptools when available. python-setuptools will still be a manual
requirement with Python 3.
Testing
- manually confirmed impala-shell starts in Ubuntu 20.04 docker
container after 'apt install python' (omits setuptools).
- manually confirmed impala-shell starts in Ubuntu 20.04 docker
container after 'apt install python3-setuptools' (includes python).
Change-Id: I78c05bce75ecc68de2296b1c2e57cd3c17c3cb0a
Reviewed-on: http://gerrit.cloudera.org:8080/19467
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
setuptools 66.0.0 introduced a breaking change: it no longer supports
version names that are not PEP 440 compliant. This breaks impala_shell's
packaging and installation test if the system python3's version is 3.8+.
This is a quick fix to unblock builds. The rest of the work will be done
in IMPALA-11849 (e.g. stabilizing the python environments version).
impala_shell releases should not be affected by this, as the version
number we generate is already PEP440 compliant.
Testing:
- Built locally with python3.8
Change-Id: I4eb0957fb576e590b86b6fe570216cfb72d11aef
Reviewed-on: http://gerrit.cloudera.org:8080/19431
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When the Impala shell is using the hs2 protocol, it makes multiple RPCs
to the Impala daemon. These calls pass Thrift objects back and forth.
This change adds the '--show_rpc' flag, which outputs the details of the
RPCs to stdout, and the '--rpc_file' flag, which outputs the RPC details
to the specified file path.
RPC details include:
- operation name
- request attempt count
- Impala session/query ids (if applicable)
- call duration
- call status (success/failure)
- request Thrift objects
- response Thrift objects
Certain information is not included in the RPC details:
- Thrift object attributes named 'secret' or 'password'
are redacted.
- Thrift objects with a type of TRowSet or TGetRuntimeProfileResp
are not included, as the information contained within them is
already available in the standard output from the Impala shell.
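The redaction rule can be sketched as below, assuming plain Python
objects stand in for Thrift structs. The function name describe and the
replacement text "*******" are hypothetical.

```python
REDACTED_ATTRS = ("secret", "password")


def describe(obj):
    """Return a printable, redacted view of a Thrift-like object."""
    if isinstance(obj, (list, tuple)):
        return [describe(item) for item in obj]
    if not hasattr(obj, "__dict__"):
        # Plain values (strings, numbers, None) are returned unchanged.
        return obj
    details = {}
    for name, value in vars(obj).items():
        if name in REDACTED_ATTRS:
            details[name] = "*******"
        else:
            details[name] = describe(value)
    return details
```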
Testing:
- Added new tests in the end-to-end test suite.
Change-Id: I36f8dbc96726aa2a573133acbe8a558299381f8b
Reviewed-on: http://gerrit.cloudera.org:8080/19388
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-10660 introduced the "hs2_fp_format" shell option. In the help
section describing the query option it states:
Use '%16G' to match Beeswax protocol's floating-point output format
However, '%16G' is not accepted by the shell:
bin/impala-shell.sh --hs2_fp_format='%16G'
Invalid floating point format specification: %16G
This commit changes the example to '16G'.
Also corrected the name of the option in
shell/impala_shell_config_defaults.py from 'fp_format_specification' to
'hs2_fp_format'.
Change-Id: If53e69b495dfeb8d6d65878eff9580c5e12f793d
Reviewed-on: http://gerrit.cloudera.org:8080/19359
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
subprocess.Popen returns a byte string in Python 3, which serializes
incorrectly when sending it as the LDAP password and causes `endswith`
to error with
> first arg must be bytes or a tuple of bytes, not str
Fixes `impala-shell --ldap_password_cmd` run with Python 3 by decoding
bytes as unicode.
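A minimal sketch of the decode step. read_password_from_cmd is a
hypothetical helper, and the UTF-8 encoding is an assumption.

```python
import subprocess
import sys


def read_password_from_cmd(cmd_args):
    """Run a command and return its stdout as text, without a trailing newline."""
    proc = subprocess.Popen(cmd_args, stdout=subprocess.PIPE)
    stdout, _ = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(
            "password command failed with code %d" % proc.returncode)
    # communicate() returns bytes on Python 3; decode before using str methods.
    password = stdout.decode("utf-8")
    if password.endswith("\n"):
        password = password[:-1]
    return password


example = read_password_from_cmd([sys.executable, "-c", "print('hunter2')"])
```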
Testing: confirmed that I can successfully authenticate via LDAP with
impala-shell in Python 2.7 and Python 3.8.
Change-Id: I3638d6f8d3ed7184495dbe3512d9e5ceb0ee8c45
Reviewed-on: http://gerrit.cloudera.org:8080/19283
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
sasl 0.2.1 fails to build with Python 3.10. Updates to sasl 0.3.1 for
Python 3.10 compatibility.
Testing:
- built under Python 3.8
- automated tests will test with built bundle and pip install using
current Python version
- pip3 installed shell/build/dist on Ubuntu 22.04 with Python 3.10
Change-Id: I6b522f2b8cb5546150cd3274c7670a6ca9b8ff63
Reviewed-on: http://gerrit.cloudera.org:8080/19265
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Currently the maximum number of tries for connecting to the coordinator
is hard coded to 4 in hs2-http mode. It should be configurable,
especially in environments where the coordinator starts slowly.
This patch adds support for a configurable number of tries in hs2-http
mode using the new impala-shell config option '--connect_max_tries'.
The default value of '--connect_max_tries' is 4.
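A sketch of a bounded retry loop of this kind. The helper name, the
fixed delay between tries, and the exception handling are assumptions
rather than the shell's actual logic.

```python
import time


def connect_with_retries(connect, max_tries=4, delay_s=1.0):
    """Call connect() until it succeeds or max_tries attempts have failed."""
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return connect()
        except Exception as e:
            last_error = e
            # Wait before the next try, but not after the final failure.
            if attempt < max_tries:
                time.sleep(delay_s)
    raise RuntimeError(
        "failed to connect after %d tries: %s" % (max_tries, last_error))
```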
Testing:
- Ran e2e shell tests.
- Ran impala-shell with connect_max_tries as 100 before starting
impala coordinator daemon, verified that impala-shell connects to
coordinator after coordinator daemon was started.
Change-Id: I5f7caeb91a69e71a38689785fb1636094295fdb1
Reviewed-on: http://gerrit.cloudera.org:8080/19105
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds a shell option called "hs2_fp_format"
which manipulates the print format of floating-point values in HS2.
It lets the user specify a Python-based format specification
expression (https://docs.python.org/2.7/library/string.html#formatspec)
which will get parsed and applied to floating-point
column values. The default value is None, in this case the
formatting is the same as the state before this change.
This option does not support the Beeswax protocol, because Beeswax
converts all of the column values to strings in its response.
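A minimal sketch of applying and validating such a format
specification with Python's built-in format(). Both helper names are
hypothetical, and using str() for the default case is an assumption.

```python
def format_float(value, fp_format=None):
    """Format a float with an optional Python format specification."""
    if fp_format is None:
        return str(value)
    return format(value, fp_format)


def is_valid_fp_format(fp_format):
    """Return True if the specification can be applied to a float."""
    try:
        format(1.0, fp_format)
        return True
    except ValueError:
        return False
```

Note that the specification is passed without a leading '%', for
example '.2f' rather than '%.2f'.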
Tests: command line tests for various formatting options and
for invalid formatting option
Change-Id: I424339266be66437941be8bafaa83fa0f2dfbd4e
Reviewed-on: http://gerrit.cloudera.org:8080/18990
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Similar to IMPALA-11332, the current VerticalOutputFormatter is
stripping trailing whitespaces from the last line of output. This
rstrip() was intended to remove an extra newline,
but it is matching other white space. This is a
problem for a SQL query like:
select 'Trailing whitespace ';
This changes the rstrip() to rstrip('\n') to
avoid removing the other white space.
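The difference between the two calls, shown on a sample string:

```python
output = "Trailing whitespace   \n"

# rstrip() with no argument removes the newline AND the trailing spaces.
stripped_all = output.rstrip()
# rstrip('\n') removes only the trailing newline.
stripped_newline = output.rstrip("\n")
```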
Testing:
- Current shell tests pass
- Added a shell test that verifies trailing whitespace
is not being stripped.
Change-Id: Id66162d28498e7bef2933651616cf3df2fb0f354
Reviewed-on: http://gerrit.cloudera.org:8080/18722
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Build Python 3 eggs for the shell tarball so it works with both Python 2
and Python 3. The impala-shell script selects eggs based on the
available Python version.
Inlines thrift for impala-shell so we can easily build Python 2 and
Python 3 versions, consistent with other libraries. The impala-shell
version should always be at least as new as IMPALA_THRIFT_PY_VERSION.
Thrift 0.13.0+ wraps all exceptions during TSocket read/write operations
in TTransportException. Specifically socket.error that we got as raw
exceptions are now wrapped. Unwraps them before raising to preserve
prior behavior.
A specific Python version can be selected with IMPALA_PYTHON_EXECUTABLE;
otherwise it will use 'python', and if unavailable try 'python3'.
Adds tests for impala-shell tarball with Python 3.
Change-Id: I94f86de9e2a6303151c2f0e6454b5f629cbc9444
Reviewed-on: http://gerrit.cloudera.org:8080/18653
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This reverts commit 0a0001e1a8 to re-apply
IMPALA-9718. CentOS 7 has pkg_resources available in its latest Python
2.7 release; we may need to install python-setuptools if not present in
test environments.
Also provides an isolated PYTHON_EGG_CACHE to avoid pollution from the
general dev/test environment and specifically address
> UserWarning: /var/lib/jenkins/.python-eggs is writable by group/others
and vulnerable to attack when used with get_resource_filename.
Change-Id: I8e443d78671d8afab70d784664e71a70ccfcd587
Reviewed-on: http://gerrit.cloudera.org:8080/18585
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When using a JDBC connection pool, a connection may set some query
options. After the query finishes, the connection is closed and returned
to the pool. When the connection is used again, the query options from
the previous use are still in effect. We need a statement that can reset
all query options without creating a new connection.
Support UNSET statements in the SQL dialect. UNSET ALL unsets all query
options.
Testing:
- add unset all query option in test_hs2.py
Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
Reviewed-on: http://gerrit.cloudera.org:8080/18430
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This directory is currently checked in, but it is
overwritten when building the shell. On some Linux
distributions, the output is different from what
is checked in. This causes problems for perf-AB-test
(based on bin/single_node_perf_run.py), which relies on
a build not causing any modifications.
This removes the kerberos.egg-info directory,
which does not need to be checked in.
This also adds checks to the GVO Jenkins jobs
to verify that the source tree is unmodified after
bootstrap_build.sh and bootstrap_development.sh.
These checks are not included in those scripts
directly, because developers can run those scripts
in their development environments, which may have
modifications.
Tests:
- Uploaded a change without removing the kerberos.egg-info
directory and verified that the new checks fail
- Verified that perf-AB-test gets past the current issue
Change-Id: I90b486bb6c1644fc18b56779d6c54e1e1b3c9aaa
Reviewed-on: http://gerrit.cloudera.org:8080/18650
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala used to have one thrift compiler version to compile C++, Java,
and Python code.
Most Thrift serialization/deserialization is compatible between minor
versions, so it is possible to use different thrift compiler versions
for different target languages. It is beneficial to do so because it
allows Impala to upgrade separate components independently.
This patch implements the infrastructure change required to do so. It
replaces most of the 'THRIFT_*' environment variables and CMake
variables with 'THRIFT_CPP_*', 'THRIFT_JAVA_*', and 'THRIFT_PY_*' to
compile C++, Java, and Python code respectively. All three still refer
to the same thrift version (thrift-0.11.0-p5).
Testing:
- Build Impala and pass core tests.
Change-Id: I56479dc69b79024d1a4d09211bbe88a61fa0c6a4
Reviewed-on: http://gerrit.cloudera.org:8080/18636
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The version string is generated in bin/save-version.sh, where
BUILD_TIME is the output of the `date` command. It can contain non-ASCII
characters depending on the locale. The date string is then written into
the generated impala_build_version.py, so the script's encoding must be
explicitly set to utf-8.
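A minimal sketch of the idea, assuming a PEP 263 coding declaration is
what sets the encoding. The template, the BUILD_TIME variable name, and
both helpers are hypothetical; the real generated file has different
contents:

```python
import io

# Hypothetical template. The first line is a PEP 263 coding declaration,
# which Python 2 needs before it will accept non-ASCII bytes in a source file.
TEMPLATE = u'# -*- coding: utf-8 -*-\nBUILD_TIME = u"{build_time}"\n'


def render_version_module(build_time):
    """Return the source text of the generated module."""
    return TEMPLATE.format(build_time=build_time)


def write_version_module(path, build_time):
    """Write the generated module to `path` as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(render_version_module(build_time))
```

Without the declaration, Python 2 rejects a source file containing
non-ASCII bytes with a SyntaxError.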
Change-Id: I596121e08a52a4bc6d8668cf7e8b61b6c34eb4b9
Reviewed-on: http://gerrit.cloudera.org:8080/18632
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Gergely Fürnstáhl <gfurnstahl@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When redirecting stdout and stderr to a file, the
existing code can sometimes output the "Fetched X row(s)"
line before finishing the row output. For example:
impala-shell -B -q "select 1" >> outfile.txt 2>> outfile.txt
The rows output goes to stdout while the control messages
like "Fetched X row(s)" go to stderr. Since stdout can buffer
output, that can delay the output. This adds a flush for
stdout before writing the "Fetched X row(s)" message.
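A minimal sketch of the ordering fix. The helper name
print_rows_then_status() is hypothetical and this is not the actual
impala-shell code:

```python
import sys


def print_rows_then_status(rows, out=sys.stdout, err=sys.stderr):
    """Write rows to `out`, flush it, then write the status line to `err`."""
    for row in rows:
        out.write(row + "\n")
    # Flush buffered row output so it reaches the destination before
    # the status line is written to the other stream.
    out.flush()
    err.write("Fetched %d row(s)\n" % len(rows))
    err.flush()
```

When both streams point at the same file, the flush ensures the rows
are written before the "Fetched X row(s)" line.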
Testing:
- Added a shell test that redirects stdout and stderr to
a file and verifies the contents. This consistently
fails without the flush.
- Other shell tests pass
Change-Id: I83f89c110fd90d2d54331c7121e407d9de99146c
Reviewed-on: http://gerrit.cloudera.org:8080/18625
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Sets up a virtualenv with system python to install the impala-shell PyPI
package into. Using system python provides better coverage for Python
versions likely to be used by customers. Runs impala-shell tests using
the PyPI package to provide better coverage for the artifact customers
will use.
Includes a PyPI install in notests_independent_targets because these
seem to be used for Python testing despite -notests.
Change-Id: I384ea6a7dab51945828cca629860400a23fa0c05
Reviewed-on: http://gerrit.cloudera.org:8080/18586
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
In vertical mode, impala-shell prints each row as follows: first a
line containing the row number, then the row's columns one per line,
each starting with the column name and a colon.
To enable it, use the shell option '-E' or '--vertical', or 'set
VERTICAL=true' in interactive mode. To disable it in interactive mode,
use 'set VERTICAL=false'. Note: vertical mode is disabled if the '-B'
option or 'set WRITE_DELIMITED=true' is specified.
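A rough sketch of the layout described above. The function name and the
exact text of the row header line are assumptions; the real impala-shell
formatter may differ:

```python
def format_vertical(column_names, rows):
    """Render rows one column per line, each row preceded by a header line."""
    lines = []
    for row_number, row in enumerate(rows, start=1):
        # Header layout is an assumption, not the shell's real output.
        lines.append("*** %d. row ***" % row_number)
        for name, value in zip(column_names, row):
            lines.append("%s: %s" % (name, value))
    return "\n".join(lines)
```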
Tests:
Added test methods to test_shell_interactive.py and
test_shell_commandline.py.
Change-Id: I5cee48d5a239d6b7c0f51331275524a25130fadf
Reviewed-on: http://gerrit.cloudera.org:8080/18549
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Fixes an error building shell_pypi_package with
junitxml_command_wrapper.sh.
Previously `make shell_pypi_package` would error with
/home/michael/Impala/bin/junitxml_command_wrapper.sh: line 46:
DIST_DIR=/home/michael/Impala/shell/dist CLEAN_DIST=true
/home/michael/Impala/shell/packaging/make_python_package.sh:
No such file or directory
Updates the default `DIST_DIR` to point to the documented path that
CMake sets it to, and removes setting `DIST_DIR` as it matches the
default value. Also removes `CLEAN_DIST` as that value is ignored.
Change-Id: I60ffac3edf1a6027afa4ca46ab6dadfc6bfc660a
Reviewed-on: http://gerrit.cloudera.org:8080/18578
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The current CSV output strips trailing whitespace
from the last line of CSV output. This
rstrip() was intended to remove an extra newline,
but it also matches other whitespace. This is a
problem for a SQL query like:
select 'Trailing whitespace ';
This changes the rstrip() to rstrip('\n') to
avoid removing the other white space.
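The difference between the two calls can be seen in a short sketch. The
helper name strip_trailing_newline() is hypothetical:

```python
def strip_trailing_newline(csv_text):
    """Remove trailing newlines only, keeping other trailing whitespace."""
    return csv_text.rstrip('\n')


line = "Trailing whitespace \n"
# rstrip() with no argument removes the trailing space as well.
assert line.rstrip() == "Trailing whitespace"
# rstrip('\n') removes only the newline.
assert strip_trailing_newline(line) == "Trailing whitespace "
```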
Testing:
- Current shell tests pass
- Added a shell test that verifies trailing whitespace
is not being stripped.
Change-Id: I69d032ca2f581587b0938d0878fdf402fee0d57e
Reviewed-on: http://gerrit.cloudera.org:8080/18580
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When using the --output_file commandline option for
impala-shell, the shell fails with UnicodeDecodeError
if the output contains Unicode characters.
For example, if running this command:
impala-shell -B -q "select '引'" --output_file=output.txt
This fails with:
UnicodeDecodeError : 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
This happens due to an encode('utf-8') call happening
in OutputStream::write() on a string that is already UTF-8 encoded.
This changes the code to skip the encode('utf-8') call for Python 2.
Python 3 is using a string and still needs the encode call.
This is mostly a pragmatic fix to make the code a little bit
more functional, and there is more work to be done to have
clear contracts for the format() methods and clear points
of conversion to/from bytes.
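A minimal sketch of the version-dependent handling, assuming the
Python 2 value is already a UTF-8 encoded byte string. The helper name
to_output_bytes() is hypothetical and this is not the actual
OutputStream code:

```python
import sys


def to_output_bytes(value):
    """Return bytes suitable for writing to a file opened in binary mode."""
    if sys.version_info.major == 2:
        # Assumption: the value is already UTF-8 encoded bytes. Calling
        # encode() on it would first decode it implicitly as ASCII, which
        # is what raises UnicodeDecodeError for non-ASCII data.
        return value
    # In Python 3 the value is a text string and must be encoded.
    return value.encode('utf-8')
```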
Testing:
- Ran shell tests with Python 2 and Python 3 on Ubuntu 18
- Added a shell test that outputs a Unicode character
to an output file. Without the fix, this test fails.
Change-Id: Ic40be3d530c2694465f7bd2edb0e0586ff0e1fba
Reviewed-on: http://gerrit.cloudera.org:8080/18576
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This fixes a few impala-shell Python 3 issues:
1. In ImpalaShell's do_history(), the decode() call needs to be
avoided in Python 3, because in Python 3 the cmd is already
a string and doesn't need further decoding. (IMPALA-11315)
2. TestImpalaShell.test_http_socket_timeout() gets a different
error message in Python 3. It throws "BlockingIOError"
rather than "socket.error". (IMPALA-11316)
3. ImpalaHttpClient.py's code to retrieve the body when
handling an HTTP error needs to have a decode() call
for the body. Otherwise, the body remains bytes and
causes TestImpalaShellInteractive.test_http_interactions_extra()
to fail. (IMPALA-11317)
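Items 1 and 3 both come down to decoding only when the value is bytes.
A minimal Python 3 sketch, using a hypothetical helper ensure_text()
that is not part of impala-shell:

```python
def ensure_text(value, encoding='utf-8'):
    """Return `value` as text, decoding it only if it is bytes (Python 3)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already a text string; calling decode() here would fail in Python 3.
    return value
```

An HTTP error body read as bytes would pass through the decode branch,
while a history entry that is already a string is returned unchanged.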
Testing:
- Ran shell tests in the standard way
- Ran shell tests with the impala-shell executable coming from
a Python 3 virtualenv using the PyPI package
Change-Id: Ie58380a17d7e011f4ce96b27d34717509a0b80a6
Reviewed-on: http://gerrit.cloudera.org:8080/18556
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Thrift 0.11.0 has known issues where Unicode errors are
not handled properly, including one case where the client
can hang. The traditional form factor for impala-shell
uses a patched Thrift that fixes those issues, but the
PyPI package uses the unpatched Thrift 0.11.0.
This modifies the requirements.txt file to use Thrift 0.14.2,
which has fixes for these Unicode issues. Thrift 0.14.2 has
a slightly different error message, so this amends the
allowed error messages in test_utf8_decoding_error_handling().
This is a bit awkward, given that the Python code generation
continues to happen with Thrift 0.11.0. Comparing the
Python code for Thrift 0.11 vs Thrift 0.14, I didn't see
noticeable differences. Given that the client can hang,
this seems worth fixing ahead of the full conversion to
Thrift 0.14 for all of Impala.
Testing:
- Ran the Unicode error handling tests with a PyPI
impala-shell
- Ran the shell tests normally
Change-Id: I63e0a5dda98df20c9184a347397118b1f3529603
Reviewed-on: http://gerrit.cloudera.org:8080/18560
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
impala-shell fails with a TypeError when installed with python3. This is
due to the change in behavior of the division operator ('/') between
python2 and python3. This patch fixes the issue by replacing the
operator with floor division ('//'), which yields an integer result for
integer operands as described in https://peps.python.org/pep-0238/.
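A small illustration of the failure mode, assuming the quotient is used
where an int is required, such as string repetition. Both helpers are
hypothetical and not the actual impala-shell code:

```python
def centered(text, width):
    """Left-pad `text` so it is roughly centered in `width` columns."""
    # '//' keeps the padding an int on both Python 2 and Python 3.
    padding = (width - len(text)) // 2
    return " " * padding + text


def centered_with_true_division(text, width):
    """Same as centered(), but with '/' to show the Python 3 failure."""
    # Under Python 3, '/' returns a float here, and str * float raises
    # TypeError. Under Python 2 it returned an int for int operands.
    padding = (width - len(text)) / 2
    return " " * padding + text
```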
Testing:
- Manually installed impala-shell from pip with python3 and verified
that the fix works.
Change-Id: Ifbe4df6a7a4136e590f383fc6475e2283e35eadc
Reviewed-on: http://gerrit.cloudera.org:8080/18546
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>