Commit Graph

6 Commits

Author SHA1 Message Date
Michael Smith
512a73771f IMPALA-14452: Fix impala-shell SSL with Python 3.12
Removes deprecated ImpalaHttpClient constructor that supported port and
path as it has been deprecated since at least 2020 and appears unused.

Removes cert_file and key_file as they were also never used, and if
required must now be passed in via ssl_context.

Updates TSSLSocket fixes for Thrift 0.16 and Python 3.12. _validate_cert
was removed by Thrift 0.16, but everything worked because Thrift used
ssl.match_hostname instead. With Python 3.12 ssl.match_hostname no
longer exists so we rely on OpenSSL to handle verification with
ssl.PROTOCOL_TLS_CLIENT.

Only uses ssl.PROTOCOL_TLS_CLIENT when match_hostname is unavailable to
avoid changing existing behavior. THRIFT-792 identifies that TSocket
suppresses connection errors, where we would otherwise see SSL hostname
verification errors like

    ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED]
    certificate verify failed: IP address mismatch, certificate is not
    valid for '::1'. (_ssl.c:1131)

Python 2.7.9 and 3.2 are minimum required versions; both have been EOL
for several years.
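
The fallback described above can be sketched roughly as follows. This is a minimal illustration using only the standard ssl module, not the actual impala-shell or Thrift code; the helper names needs_openssl_hostname_check and make_verifying_context are hypothetical.

```python
import ssl


def needs_openssl_hostname_check():
    """Return True when ssl.match_hostname is unavailable (removed in Python 3.12)."""
    return not hasattr(ssl, "match_hostname")


def make_verifying_context(ca_file=None):
    """Build a client context where OpenSSL itself verifies the peer.

    ssl.PROTOCOL_TLS_CLIENT enables check_hostname and CERT_REQUIRED by default.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file:
        # Trust only the CA bundle supplied by the caller.
        ctx.load_verify_locations(cafile=ca_file)
    else:
        # Fall back to the system trust store.
        ctx.load_default_certs()
    return ctx
```

A caller would presumably consult needs_openssl_hostname_check() first and keep the older code path when it returns False, which matches the stated goal of not changing existing behavior.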

Testing:
- ran custom_cluster/{test_client_ssl.py,test_ipv6.py} on Ubuntu 24 with
  Python 3.12, OpenSSL 3.0.13.
- ran custom_cluster/test_client_ssl.py on RHEL 7.9 with Python 2.7.5
  and Python 3.6.8, OpenSSL 1.0.2k-fips.
- adds test that hostname checking is configured.

Change-Id: I046a9010ac4cb1f7d705935054b306cddaf8bdc7
Reviewed-on: http://gerrit.cloudera.org:8080/23519
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-10-20 09:55:22 +00:00
Riza Suminto
0292e296c0 IMPALA-14426: Deflake TestImpalaShell.test_cancellation
TestImpalaShell.test_cancellation started failing when run with Python
3.9, with the following error message:

RuntimeError: reentrant call inside <_io.BufferedWriter name='<stderr>'>

This patch is a quick fix that changes the stderr write from print() to
os.write(). Note that the thread-safety issue within _signal_handler in
impala_shell.py during query cancellation still remains.
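
As a rough sketch of the idea (not the actual impala_shell.py code), writing through os.write() goes straight to the file descriptor and avoids re-entering Python's buffered stderr writer. The function names below are hypothetical.

```python
import os
import sys


def write_unbuffered(fd, message):
    """Write message directly to a file descriptor, bypassing buffered I/O.

    Returns the number of bytes written.
    """
    data = (message + "\n").encode("utf-8")
    return os.write(fd, data)


def report_cancellation():
    # Hypothetical call site, for example from inside a signal handler.
    write_unbuffered(sys.stderr.fileno(), "Cancelling query")
```

os.write() may write fewer bytes than requested for large payloads; for short status messages like this one that is unlikely to matter.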

Testing:
Ran and passed test_cancellation on RHEL9 with Python 3.9.

Change-Id: I5403c7b8126b1a35ea841496fdfb6eb93e83376e
Reviewed-on: http://gerrit.cloudera.org:8080/23416
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-12 20:49:22 +00:00
Riza Suminto
28cff4022d IMPALA-14333: Run impala-py.test using Python3
Running exhaustive tests with env var IMPALA_USE_PYTHON3_TESTS=true
reveals some tests that require adjustment. This patch makes those
adjustments, which mostly revolve around encoding differences and the
string vs bytes distinction in Python3. This patch also switches the
default to run pytest with Python3 by setting
IMPALA_USE_PYTHON3_TESTS=true. The following are the details:

Change hash() function in conftest.py to crc32() to produce a
deterministic hash. Hash randomization is enabled by default since
Python 3.3 (see
https://docs.python.org/3/reference/datamodel.html#object.__hash__).
This causes test sharding (like --shard_tests=1/2) to produce an
inconsistent set of tests per shard. Always restart the minicluster
during custom cluster tests if the --shard_tests argument is set,
because test order may change and affect test correctness, depending on
whether a test runs on a fresh minicluster or not.
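
A minimal sketch of crc32-based sharding, assuming test ids are plain strings. The helper names shard_for and select_shard are hypothetical and do not reflect the real conftest.py code.

```python
import zlib


def shard_for(test_id, num_shards):
    """Map a test id to a shard index that is stable across interpreter runs."""
    # The mask keeps the checksum unsigned on Python 2 as well.
    checksum = zlib.crc32(test_id.encode("utf-8")) & 0xFFFFFFFF
    return checksum % num_shards


def select_shard(test_ids, shard, num_shards):
    """Return the tests that belong to a 1-based shard, as in --shard_tests=1/2."""
    return [t for t in test_ids if shard_for(t, num_shards) == shard - 1]
```

Unlike hash() on str, zlib.crc32() does not depend on PYTHONHASHSEED, so every process computes the same partition.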

Moved one test case from delimited-latin-text.test to
test_delimited_text.py for easier binary comparison.

Add bytes_to_str() as a utility function to decode bytes in Python3.
This is often needed when inspecting the return value of
subprocess.check_output() as a string.
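
One plausible shape for such a helper is shown below. Only the name bytes_to_str comes from this commit; the body, the encoding parameter, and run_and_capture are assumptions made for illustration.

```python
import subprocess


def bytes_to_str(value, encoding="utf-8"):
    """Decode bytes to str; pass through values that are already str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def run_and_capture(args):
    """Run a command and return its stdout as text (hypothetical helper)."""
    return bytes_to_str(subprocess.check_output(args))
```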

Implement DataTypeMetaclass.__lt__ as a substitute for
DataTypeMetaclass.__cmp__, which is ignored in Python3 (see
https://peps.python.org/pep-0207/).
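
To illustrate why __lt__ on a metaclass is enough for sorting classes in Python3, here is a small stand-in. OrderedTypeMeta, Int, and Char are hypothetical, and ordering by class name is an assumption, not necessarily what DataTypeMetaclass does.

```python
class OrderedTypeMeta(type):
    """Hypothetical metaclass that orders classes by their name."""

    def __lt__(cls, other):
        if not isinstance(other, OrderedTypeMeta):
            return NotImplemented
        return cls.__name__ < other.__name__


class Int(metaclass=OrderedTypeMeta):
    pass


class Char(metaclass=OrderedTypeMeta):
    pass


def sorted_type_names(types):
    """sorted() only needs __lt__, which is looked up on the metaclass."""
    return [t.__name__ for t in sorted(types)]
```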

Fix WEB_CERT_ERR difference in test_ipv6.py.

Fix trivial integer parsing in test_restart_services.py.

Fix various encoding issues in test_saml2_sso.py,
test_shell_commandline.py, and test_shell_interactive.py.

Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1.

Switch to binary comparison in test_iceberg.py where needed.

Specify text mode when calling tempfile.NamedTemporaryFile().
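
A short sketch of what specifying text mode looks like. The roundtrip_text helper and the .sql suffix are illustrative only.

```python
import tempfile


def roundtrip_text(text):
    """Write str data to a temporary file and read it back."""
    # The default mode is "w+b", which rejects str on Python 3.
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".sql") as tmp:
        tmp.write(text)
        tmp.flush()
        tmp.seek(0)
        return tmp.read()
```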

Simplify create_impala_shell_executable_dimension to skip testing dev
and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. The reason
is that several UTF-8 related tests in test_shell_commandline.py break
with the Python3 pytest + Python2 impala-shell combination. This
skipping already happens automatically on build OSes without a system
Python2 available, like RHEL9 (IMPALA_SYSTEM_PYTHON2 env var is empty).

Removed unused vector argument and fixed some trivial flake8 issues.

Some test logic requires modification due to intermittent issues in
Python3 pytest. These include:

Add _run_query_with_client() in test_ranger.py to allow reusing a single
Impala client for running several queries. Ensure clients are closed
when the test is done. Mark several tests in test_ranger.py with
SkipIfFS.hive because they run queries through beeline + HiveServer2,
but Ozone and S3 build environments do not start HiveServer2 by
default.

Increase the sleep period from 0.1 to 0.5 seconds per iteration in
test_statestore.py and mark TestStatestore to execute serially. This is
because TServer appears to shut down more slowly when run concurrently
with other tests. Handle the deprecation of Thread.setDaemon() as well.
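
For the Thread.setDaemon() deprecation, the usual replacement is the daemon attribute (or the daemon= constructor argument). A minimal sketch with a hypothetical helper name:

```python
import threading


def start_daemon_thread(target):
    """Start target in a daemon thread without calling Thread.setDaemon()."""
    # Thread.setDaemon() is deprecated since Python 3.10; set the attribute instead.
    thread = threading.Thread(target=target)
    thread.daemon = True
    thread.start()
    return thread
```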

Always set force_restart=True for each test method in TestLoggingCore,
TestShellInteractiveReconnect, and TestQueryRetries to prevent them from
reusing the minicluster from a previous test method. Some of these tests
are destructive to the minicluster (kill impalad) and will produce a
minidump if the metrics verifier for the next test fails to detect a
healthy minicluster state.

Testing:
Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true.

Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4
Reviewed-on: http://gerrit.cloudera.org:8080/23319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-03 10:01:29 +00:00
Csaba Ringhofer
5cca1aa9e5 IMPALA-13820: add ipv6 support for webui/hs2/hs2-http/beeswax
Main changes:
- added flag external_interface to override hostname for
  beeswax/hs2/hs2-http port to allow testing ipv6 on these
  interfaces without forcing ipv6 on internal communication
- compile Squeasel with USE_IPV6 to allow ipv6 on webui (webui
  interface can be configured with existing flag webserver_interface)
- fixed the handling of [<ipv6addr>]:<port> style addresses in
  impala-shell (e.g. [::1]:21050) and test framework
- improved handling of custom clusters in test framework to
  allow webui/ImpalaTestSuite's clients to work with non
  standard settings (also fixes these clients with SSL)
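
Bracketed ipv6 addresses such as [::1]:21050 need different splitting than host:port. The sketch below shows one way to do it; parse_host_port and the default port of 21050 are assumptions for illustration, not the impala-shell implementation.

```python
def parse_host_port(address, default_port=21050):
    """Split 'host:port', '[ipv6]:port', or a bare host into (host, port)."""
    if address.startswith("["):
        end = address.find("]")
        if end == -1:
            raise ValueError("missing closing bracket in %r" % address)
        host = address[1:end]
        rest = address[end + 1:]
        if not rest:
            return host, default_port
        if not rest.startswith(":"):
            raise ValueError("unexpected text after bracket in %r" % address)
        return host, int(rest[1:])
    if address.count(":") == 1:
        # Plain hostname or ipv4 address followed by a port.
        host, port = address.split(":")
        return host, int(port)
    # Bare hostname, or an unbracketed ipv6 address with no port.
    return address, default_port
```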

Using ipv4 vs ipv6 vs dual stack can be configured by setting
the interface to bind to with flag webserver_interface and
external_interface. The Thrift server behind hs2/hs2-http/beeswax
only accepts a single host name and uses the first address
returned by getaddrinfo() that it can successfully bind to. This
means that unless an ipv6 address is used (like ::1) the behavior
will depend on the order of addresses returned by getaddrinfo():
63b7a263fc/lib/cpp/src/thrift/transport/TServerSocket.cpp (L481)
For dual stack the only way currently is to bind to "::",
as the Thrift server can only listen on a single socket.
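
To see which address a given host name would resolve to first on a particular machine, the resolver order can be inspected from Python. This is only an approximation of what the C++ Thrift server sees; candidate_bind_addresses is a hypothetical helper.

```python
import socket


def candidate_bind_addresses(host, port):
    """List (family, address) pairs in the order getaddrinfo() returns them."""
    infos = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM, flags=socket.AI_PASSIVE
    )
    return [(family, sockaddr[0]) for family, _, _, _, sockaddr in infos]
```

Calling it with a name such as localhost shows whether the ipv4 or ipv6 entry comes first, which is the order-dependence described above.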

Testing:
- added custom cluster tests for ipv6 only/dual interface
  with and without SSL
- manually tested in dual stack environment with client on a
  different host
- among clients impala-shell and impyla are tested, but not
  JDBC/ODBC
- no tests yet on truly ipv6 only environment, as internal
  communication (e.g. krpc) is not ready for ipv6

To test manually the dev cluster can be started with ipv6 support:
dual mode:
bin/start-impala-cluster.py --impalad_args="--external_interface=:: --webserver_interface=::" --catalogd_args="--webserver_interface=::" --state_store_args="--webserver_interface=::"

ipv6 only:
bin/start-impala-cluster.py --impalad_args="--external_interface=::1 --webserver_interface=::1" --catalogd_args="--webserver_interface=::1" --state_store_args="--webserver_interface=::1"

Change-Id: I51ac66c568cc9bb06f4a3915db07a53c100109b6
Reviewed-on: http://gerrit.cloudera.org:8080/22527
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-21 14:00:31 +00:00
gaurav1086
3781132ef6 IMPALA-13675: OAuth AuthN Support for Impala Shell
This patch adds support for fetching access tokens
from the OAuth server using the OAuth client_id and
client_secret if the access token is not provided.
It covers the client_credentials flow.
The client_secret can either be passed as a file or
entered at a prompt.
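
For orientation, a client_credentials token request in the general OAuth 2.0 sense can be built as below. This sketch only constructs the request and sends nothing; whether impala-shell puts the credentials in the form body or in an HTTP Basic header is not stated here, so the body placement and the build_token_request name are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(server, endpoint, client_id, client_secret):
    """Build (but do not send) a POST request for the client_credentials grant."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return Request(
        "https://%s%s" % (server, endpoint),
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```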

Added a test-only impala-shell parameter, oauth_mock_response_cmd,
to mock the OAuth server response.
Also suppressed existing option hs2_x_forward from the
impala --help output.

Testing (Okta OAuth server):
- Added custom_cluster tests in test_shell_jwt_auth.py:
    test_oauth_auth_with_clientid_and_secret_success
    test_oauth_auth_with_clientid_and_secret_failure
- Tested manually by providing --user <user> and
  --oauth_client_secret_cmd="cat password_file.txt"
- Tested manually by providing --user <user> and no
  --oauth_client_secret_cmd, thereby prompting the user
  to enter the client_secret.

Example command: impala-shell.sh -a
--auth_creds_ok_in_clear --protocol="hs2-http"
--oauth_client_id="client_id"
--oauth_client_secret_cmd="cat client_secret.txt"
--oauth_server="dev.us.auth01.com"
--oauth_endpoint="/oauth/token"

Change-Id: I84e26d54f6a53696660728efb239ffd43de4c55d
Reviewed-on: http://gerrit.cloudera.org:8080/22424
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-05 21:15:47 +00:00
Joe McDonnell
ea0969a772 IMPALA-11980 (part 2): Fix absolute import issues for impala_shell
Python 3 changed the behavior of imports with PEP328. Existing
imports become absolute unless they use the new relative import
syntax. This adapts the impala-shell code to use absolute
imports, fixing issues where it is imported from our test code.

There are several parts to this:
1. It moves impala shell code into shell/impala_shell.
   This matches the directory structure of the PyPi package.
2. It changes the imports in the shell code to be
   absolute paths (i.e. impala_shell.foo rather than foo).
   This fixes issues with Python 3 absolute imports.
   It also eliminates the need for ugly hacks in the PyPi
   package's __init__.py.
3. This changes Thrift generation to put it directly in
   $IMPALA_HOME/shell rather than $IMPALA_HOME/shell/gen-py.
   This means that the generated Thrift code is rooted in
   the same directory as the shell code.
4. This changes the PYTHONPATH to include $IMPALA_HOME/shell
   and not $IMPALA_HOME/shell/gen-py. This means that the
   test code is using the same import paths as the pypi
   package.
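
The effect of package-qualified absolute imports can be reproduced with a throwaway package. Everything here is hypothetical (demo_shell_pkg, helper, main); it only mirrors the impala_shell.foo style described in the list above.

```python
import importlib
import os
import sys
import tempfile


def import_with_absolute_path():
    """Create a tiny package on disk and import a module that uses an absolute import."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_shell_pkg")
    os.mkdir(pkg_dir)
    files = {
        "__init__.py": "",
        "helper.py": "VALUE = 42\n",
        # Absolute import: package-qualified, like impala_shell.foo.
        "main.py": "from demo_shell_pkg.helper import VALUE\n",
    }
    for name, content in files.items():
        with open(os.path.join(pkg_dir, name), "w") as f:
            f.write(content)
    # Only the package's parent directory goes on the path, as with $IMPALA_HOME/shell.
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        return importlib.import_module("demo_shell_pkg.main").VALUE
    finally:
        sys.path.remove(root)
```

Writing `from helper import VALUE` in main.py instead would fail under Python 3 unless the package directory itself were on sys.path.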

With all of these changes, the source code is very close
to the directory structure of the PyPi package. As long as
CMake has generated the thrift files and the Python version
file, only a few differences remain. This removes those
differences by moving the setup.py / MANIFEST.in and other
files from the packaging directory to the top-level
shell/ directory. This means that one can pip install
directly from the source code. i.e. pip install $IMPALA_HOME/shell

This also moves the shell tarball generation script to the
packaging directory and changes bin/impala-shell.sh to use
Python 3.

This sorts the imports using isort for the affected Python files.

Testing:
 - Ran a regular core job with Python 2
 - Ran a core job with Python 3 and verified that the absolute
   import issues are gone.

Change-Id: Ica75a24fa6bcb78999b9b6f4f4356951b81c3124
Reviewed-on: http://gerrit.cloudera.org:8080/22330
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-05-21 15:14:11 +00:00