This patch adds support for fetching access tokens
from the OAuth server using the OAuth client_id and
client_secret when no access token is provided.
It covers the client_credentials flow.
The client_secret can either be passed as a file or
entered at a prompt.
Added a test-only impala-shell param, oauth_mock_response_cmd,
that mocks the OAuth server response.
Also suppressed existing option hs2_x_forward from the
impala --help output.
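As a rough illustration of the client_credentials flow described above,
the sketch below builds the token request a client would send and pulls
the access token out of a response. It is not the impala-shell
implementation: the function names, the use of HTTPS, and the
form-encoded body are assumptions based on the generic OAuth 2.0 flow.

```python
import json
from urllib.parse import urlencode


def build_token_request(server, endpoint, client_id, client_secret):
    """Return (url, headers, body) for a client_credentials token request.

    Hypothetical helper: assumes the server accepts the client credentials
    in a form-encoded POST body over HTTPS.
    """
    url = "https://{0}{1}".format(server, endpoint)
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return url, headers, body


def extract_access_token(response_text):
    """Parse a JSON token response and return its access_token field."""
    payload = json.loads(response_text)
    if "access_token" not in payload:
        raise ValueError("token response has no access_token field")
    return payload["access_token"]


url, headers, body = build_token_request(
    "dev.us.auth01.com", "/oauth/token", "client_id", "s3cret")
# A canned response stands in for the OAuth server here.
token = extract_access_token('{"access_token": "abc", "token_type": "Bearer"}')
```

No network call is made in the sketch; a canned response plays the role
that a mocked server response would play in a test.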
Testing (Okta OAuth server):
- Added custom_cluster tests in test_shell_jwt_auth.py:
test_oauth_auth_with_clientid_and_secret_success
test_oauth_auth_with_clientid_and_secret_failure
- Tested manually by providing --user <user> and
--oauth_client_secret_cmd="cat password_file.txt"
- Tested manually by providing --user <user> and no
--oauth_client_secret_cmd, thereby prompting the user
to enter the client_secret.
Example command: impala-shell.sh -a
--auth_creds_ok_in_clear --protocol="hs2-http"
--oauth_client_id="client_id"
--oauth_client_secret_cmd="cat client_secret.txt"
--oauth_server="dev.us.auth01.com"
--oauth_endpoint="/oauth/token"
Change-Id: I84e26d54f6a53696660728efb239ffd43de4c55d
Reviewed-on: http://gerrit.cloudera.org:8080/22424
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
every rpc call
This patch optimizes the OAuth/JWT flow by setting
cookies in order to avoid token verification in every
RPC call. The default cookie expiry time is 1 day.
This is only valid for hs2-http protocol.
Testing: Modified existing custom cluster tests:
test_jwt_auth_valid and test_oauth_auth_valid:
- total jwt token verification success count = 1:
Reason: The jwt/oauth token is verified only the first time;
the cookie set afterwards removes the need to re-verify
the token for subsequent rpc calls.
- total cookie auth success = rpc count - 1:
Reason: After first verification, all subsequent
authentication will be cookie auth based.
- Benchmarking the query SELECT 1; executed 10,000
times with OAuth authentication showed a total time
of 2.16s with the cookie enabled vs. 2.38s
without the cookie. This indicates a modest
performance gain (~9%) when cookie support is
enabled. The time command output for both scenarios
is:
With cookie enabled:
- real 2.16
- user 0.99
- sys 0.21
With cookie disabled:
- real 2.38
- user 1.12
- sys 0.22
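The expected counts above (one token verification, rpc count - 1 cookie
authentications) can be reproduced with a toy model. This is only a
sketch of the idea: the AuthCounter class, the token string and the
cookie value are made up for illustration and do not mirror the Impala
server code.

```python
class AuthCounter(object):
    """Toy server-side model: verify the token once, then trust the cookie."""

    def __init__(self):
        self.token_verify_success = 0
        self.cookie_auth_success = 0
        self._valid_cookie = None

    def handle_rpc(self, token, cookie=None):
        """Authenticate one RPC and return the cookie to send next time."""
        if cookie is not None and cookie == self._valid_cookie:
            self.cookie_auth_success += 1
            return cookie
        # No usable cookie: fall back to (expensive) token verification.
        if token != "valid-token":
            raise PermissionError("token rejected")
        self.token_verify_success += 1
        self._valid_cookie = "session-cookie"
        return self._valid_cookie


def run_rpcs(server, rpc_count):
    """Send rpc_count RPCs, replaying whatever cookie the server returned."""
    cookie = None
    for _ in range(rpc_count):
        cookie = server.handle_rpc("valid-token", cookie)


server = AuthCounter()
run_rpcs(server, 5)
```

With 5 RPCs the model records 1 token verification and 4 cookie
authentications, matching the pattern the modified tests assert.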
Change-Id: I0e3e5d9cf8bdb99920611b06571515e05e15164e
Reviewed-on: http://gerrit.cloudera.org:8080/22600
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.
All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.
The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 for why the workload affects
exploration_strategy(). An example of an affected test is
TestCatalogHMSFailures which was skipped both in core and exhaustive
runs before this change.
get_workload() functions that return a different workload than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking these tests is out of scope in this patch.
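The shape of the change can be sketched as follows. The classes are
heavily simplified stand-ins that only share names with the real
suites, and TpchLikeSuite is a hypothetical example of a suite that
keeps its own override.

```python
class ImpalaTestSuite(object):
    @classmethod
    def get_workload(cls):
        # Default shared by every suite that does not override it.
        return 'functional-query'


class CustomClusterTestSuite(ImpalaTestSuite):
    # No get_workload() here any more; previously this returned 'tpch'.
    pass


class TpchLikeSuite(ImpalaTestSuite):
    """Hypothetical suite that still needs a different workload."""

    @classmethod
    def get_workload(cls):
        return 'tpch'


inherited = CustomClusterTestSuite.get_workload()
overridden = TpchLikeSuite.get_workload()
```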
Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The custom cluster tests for Impala shell JWT authentication all
contain magic numbers for the expected count of RPCs for the hs2-http
protocol. Thus, any time the rpcs are modified, these tests have the
potential to fail.
Since the JWT tests are focused on all JWT authentications either
succeeding or failing, the actual number of rpcs is not relevant. The
tests now use existing metrics to determine the expected rpc count.
Additionally, the tests use existing metrics to determine when the
assertions can run instead of relying on a sleep statement.
The modified tests passed locally and in Jenkins.
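A minimal sketch of the approach, assuming a hypothetical
wait_for_metric() helper and an in-memory dict in place of real daemon
metrics: the expected value is read from the rpc metric and the test
polls until the auth metric catches up, rather than sleeping for a
fixed time and comparing against a hard-coded number.

```python
import time


def wait_for_metric(read_metric, expected, timeout_s=10.0, interval_s=0.05):
    """Poll read_metric() until it returns expected or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        value = read_metric()
        if value == expected:
            return value
        if time.monotonic() >= deadline:
            raise AssertionError(
                "metric is {0}, expected {1}".format(value, expected))
        time.sleep(interval_s)


# Stand-in metrics: in a real test these would be read from the daemon.
metrics = {"rpc_count": 0, "jwt_success": 0}


def fake_rpc():
    """Pretend one authenticated RPC was served."""
    metrics["rpc_count"] += 1
    metrics["jwt_success"] += 1


for _ in range(3):
    fake_rpc()

# The expected count comes from the rpc metric, not from a magic number.
observed = wait_for_metric(lambda: metrics["jwt_success"], metrics["rpc_count"])
```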
Change-Id: Icf0eebd74e1ce10ad24055b7fab4b1901ce61e03
Reviewed-on: http://gerrit.cloudera.org:8080/22201
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We have plenty of custom_cluster tests that assert against the content of
Impala daemon log files while the process is still running, using
assert_log_contains() and its wrappers. The method specifically mentions
disabling glog buffering ('-logbuflevel=-1'), but not all
custom_cluster tests do that. This often results in flaky tests that are
hard to triage and often neglected if they do not run frequently in core
exploration.
This patch adds a boolean param 'disable_log_buffering' to
CustomClusterTestSuite.with_args for a test to declare its intention to
inspect log files in a live minicluster. If it is True, the minicluster is
started with '-logbuflevel=-1' for all daemons. If it is False, a WARNING
is logged on any call to assert_log_contains().
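The behaviour can be pictured with a stripped-down decorator. This is a
sketch only: the with_args() and assert_log_contains() below are
hypothetical free functions with invented signatures, not the
CustomClusterTestSuite methods.

```python
import functools
import logging

LOG = logging.getLogger("custom_cluster_sketch")


def with_args(impalad_args="", disable_log_buffering=False):
    """Hypothetical decorator that records cluster start options on a test."""
    def decorate(func):
        args = impalad_args
        if disable_log_buffering:
            # Flush every glog message immediately so a live log can be read.
            args = (args + " -logbuflevel=-1").strip()

        @functools.wraps(func)
        def wrapper(*a, **kw):
            return func(*a, **kw)
        wrapper.impalad_args = args
        wrapper.disable_log_buffering = disable_log_buffering
        return wrapper
    return decorate


def assert_log_contains(test_func, log_text, pattern):
    """Warn when the calling test did not ask for unbuffered logs."""
    if not getattr(test_func, "disable_log_buffering", False):
        LOG.warning("%s inspects logs with buffering enabled",
                    test_func.__name__)
    assert pattern in log_text


@with_args(impalad_args="-v=1", disable_log_buffering=True)
def test_reads_live_log():
    return "ok"
```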
Several complex custom_cluster tests are left unchanged and
print such WARNING logs, including:
- TestQueryLive
- TestQueryLogTableBeeswax
- TestQueryLogOtherTable
- TestQueryLogTableHS2
- TestQueryLogTableAll
- TestQueryLogTableBufferPool
- TestStatestoreRpcErrors
- TestWorkloadManagementInitWait
- TestWorkloadManagementSQLDetails
This patch also fixes some small flake8 issues in the modified tests.
test_query_live.py shows signs of flakiness: a test query submitted to
the coordinator fails because the sys.impala_query_live table does not
exist yet from the coordinator's perspective. This patch modifies
test_query_live.py to wait a few seconds until sys.impala_query_live
is queryable.
Testing:
- Pass custom_cluster tests in exhaustive exploration.
Change-Id: I56fb1746b8f3cea9f3db3514a86a526dffb44a61
Reviewed-on: http://gerrit.cloudera.org:8080/22015
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change prints the timestamp of an exception or warning
that occurs during execution of a query via impala-shell.
The timestamp uses the timezone of the machine running impala-shell.
Example:
Query submitted at: 2024-08-22 16:17:57 (Coordinator: http://host:25000)
Query state can be monitored at:
http://localhost:25000/query_plan?query_id=e04dcc55e560d1ee:11173fe800000000
^C Cancelling Query
Opened TCP connection to localhost:21050
2024-08-22 16:17:58 [Exception] type=<class 'socket.error'> in FetchResults.
[Errno 4] Interrupted system call
2024-08-22 16:17:58 [Warning] Cancelling Query
2024-08-22 16:17:58 [Warning] close session RPC failed: <class
'shell_exceptions.QueryCancelledByShellException'>
Opened TCP connection to localhost:21050
[localhost:21050] default>
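The prefix format seen in the example output can be reproduced with a
few lines of Python. This is a sketch of the formatting only;
format_shell_message() is a hypothetical name, not a function from
impala-shell.

```python
from datetime import datetime


def format_shell_message(kind, message, now=None):
    """Prefix a message with a local timestamp, e.g. '2024-08-22 16:17:58'."""
    if now is None:
        now = datetime.now()  # naive local time of this machine
    return "{0} [{1}] {2}".format(
        now.strftime("%Y-%m-%d %H:%M:%S"), kind, message)


line = format_shell_message(
    "Warning", "Cancelling Query", now=datetime(2024, 8, 22, 16, 17, 58))
```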
Change-Id: I4abbd02aa9f61210b0333495bf191e72c22a5944
Reviewed-on: http://gerrit.cloudera.org:8080/21426
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
There are many custom cluster tests that require creating a temporary
directory. The temporary directory typically lives within the scope of a
test method and is cleaned up afterwards. However, some tests create a
temporary directory directly and forget to clean it up afterwards, leaving
junk dirs under /tmp/ or $LOG_DIR.
This patch unifies temporary directory management inside
CustomClusterTestSuite. It introduces a new 'tmp_dir_placeholders' arg in
CustomClusterTestSuite.with_args() that lists the tmp dirs to create.
'impalad_args', 'catalogd_args', and 'impala_log_dir' now accept a
formatting pattern that is replaced by a temporary dir path defined
through 'tmp_dir_placeholders'.
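The placeholder mechanism can be sketched with tempfile and str.format.
The helper names below are hypothetical, and the '{name}' pattern
syntax is an assumption about how the formatting pattern looks; only
the create / substitute / clean up sequence is the point.

```python
import os
import shutil
import tempfile


def make_tmp_dirs(placeholders, parent=None):
    """Create one temporary directory per placeholder name."""
    return {name: tempfile.mkdtemp(prefix=name + "_", dir=parent)
            for name in placeholders}


def expand_args(arg_template, tmp_dirs):
    """Substitute '{name}' patterns with the matching directory path."""
    return arg_template.format(**tmp_dirs)


def cleanup_tmp_dirs(tmp_dirs):
    """Remove every directory created by make_tmp_dirs()."""
    for path in tmp_dirs.values():
        shutil.rmtree(path, ignore_errors=True)


tmp_dirs = make_tmp_dirs(["scratch_dir"])
# '-scratch_dirs' is used purely as an example flag here.
impalad_args = expand_args("-scratch_dirs={scratch_dir}", tmp_dirs)
created = os.path.isdir(tmp_dirs["scratch_dir"])
cleanup_tmp_dirs(tmp_dirs)
removed = not os.path.exists(tmp_dirs["scratch_dir"])
```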
There are a few places where mkdtemp is called and cannot be replaced by
this work, such as tests/comparison/cluster.py. In those cases, this patch
changes them to supply a prefix arg so that developers know the directory
comes from an Impala test script.
This patch also addresses several flake8 errors in modified files.
Testing:
- Pass custom cluster tests in exhaustive mode.
- Manually ran a few modified tests and observed that the temporary dirs
are created and removed under logs/custom_cluster_tests/ as the tests
go.
Change-Id: I8dd665e8028b3f03e5e33d572c5e188f85c3bdf5
Reviewed-on: http://gerrit.cloudera.org:8080/21836
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Added backend flag 'use_xff_address_as_origin' for using the client IP
address from 'X-Forwarded-For' HTTP header as the origin of HTTP
connection. The origin IP address in the SessionState is used by the
ranger client for both authorization (RangerAccessRequestImpl) and
auditing (RangerBufferAuditHandler). Impala does not do any verification
or sanitization of this IP address, so its value should only be trusted
if the deployment environment protects against spoofing.
Also, added a new function 'GetXFFOriginClientAddress' for parsing XFF
header with comma separated IP addresses, which is the most common form
of XFF header representing client and intermediate proxies:
X-Forwarded-For: <client>, <proxy1>, <proxy2>
'GetXFFOriginClientAddress' is now also used for getting the client IP
from XFF header in existing use cases such as trusted domain based
authentication for both HS2 HTTP server and web server.
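The parsing rule for the comma separated form can be illustrated in a
few lines. The real 'GetXFFOriginClientAddress' is a backend function;
the Python below is only a sketch of the rule (take the first entry,
trim whitespace), and its handling of empty input is an assumption.

```python
def get_xff_origin_client_address(xff_header):
    """Return the first (client) entry of an X-Forwarded-For value.

    No verification or sanitization is done on the returned string.
    """
    first = xff_header.split(",", 1)[0]
    return first.strip()


client = get_xff_origin_client_address("10.0.0.1, 192.168.1.1, 192.168.1.2")
single = get_xff_origin_client_address("10.0.0.7")
empty = get_xff_origin_client_address("")
```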
Testing:
- Added unit tests for the new GetXFFOriginClientAddress function for
parsing comma separated IP addresses in XFF header
- Updated existing tests for trusted domain authentication to use
XFF with comma separated IP addresses
- Added custom cluster test which ensures that client IP address from
XFF header is included in the ranger audit logs.
Change-Id: Ib784ad805c649e9576ef34f125509c904b7773ab
Reviewed-on: http://gerrit.cloudera.org:8080/21780
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This support was modeled after the LDAP authentication.
If JWT authentication is used, the Impala shell enforces the use of the
hs2-http protocol since the JWT is sent via the "Authorization"
HTTP header.
The following flags have been added to the Impala shell:
* -j, --jwt: indicates that JWT authentication will be used
* --jwt_cmd: shell command to run to retrieve the JWT to use for
authentication
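How a --jwt_cmd style option might be consumed can be sketched as
follows. The helper names are hypothetical, and the bearer form of the
HTTP header is an assumption based on common practice rather than a
copy of the shell's code.

```python
import shlex
import subprocess
import sys


def read_jwt_from_cmd(jwt_cmd):
    """Run a shell command and return its stdout, stripped, as the JWT."""
    output = subprocess.check_output(jwt_cmd, shell=True)
    return output.decode("utf-8").strip()


def bearer_header(jwt):
    """Assumes the token travels as a standard HTTP bearer credential."""
    return {"Authorization": "Bearer " + jwt}


# A Python one-liner stands in for something like "cat jwt.txt".
demo_cmd = "{0} -c {1}".format(
    shlex.quote(sys.executable), shlex.quote("print('header.payload.sig')"))
headers = bearer_header(read_jwt_from_cmd(demo_cmd))
```

Running the command through a shell keeps the token out of the
impala-shell command line itself, which is the usual reason for a
"_cmd" style option.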
Testing
New Python tests have been added:
* The shell tests ensure that the various command line arguments are
handled properly. They assert situations such as requiring a single
authentication method and refusing to send JWTs in clear text without
the proper arguments.
* The Python custom cluster tests leverage a test JWKS and test JWTs.
Then, a custom Impala cluster is started with the test JWKS. The
Impala shell attempts to authenticate using a valid JWT, an expired
(invalid) JWT, and a valid JWT signed by a different, untrusted JWKS.
These tests also exercise the Impala JWT authentication mechanism and
assert the Prometheus JWT auth success and failure metrics are
reported accurately.
Change-Id: I52247f9262c548946269fe5358b549a3e8c86d4c
Reviewed-on: http://gerrit.cloudera.org:8080/19837
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>