IMPALA-14333: Run impala-py.test using Python3

Running exhaustive tests with the env var IMPALA_USE_PYTHON3_TESTS=true
reveals some tests that require adjustment. This patch makes those
adjustments, which mostly revolve around encoding differences and the
str vs bytes distinction in Python 3. It also switches the default to
running pytest with Python 3 by setting IMPALA_USE_PYTHON3_TESTS=true.
The following are the details:

Change the hash() function in conftest.py to crc32() to produce a
deterministic hash. Hash randomization is enabled by default since
Python 3.3 (see
https://docs.python.org/3/reference/datamodel.html#object.__hash__).
This causes test sharding (like --shard_tests=1/2) to produce an
inconsistent set of tests per shard. Always restart the minicluster
during custom cluster tests if the --shard_tests argument is set,
because the test order may change and affect test correctness,
depending on whether a test runs on a fresh minicluster or not.
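
As an illustration (not the exact conftest.py code, and the helper name
is made up), deterministic sharding can key off zlib.crc32 over the
encoded test id, which is stable across interpreter runs:

  from zlib import crc32

  def in_this_shard(nodeid, num_shards, this_shard):
      # crc32 of the UTF-8 encoded test id is deterministic, unlike
      # hash(), which is randomized per process since Python 3.3.
      return crc32(nodeid.encode('utf-8')) % num_shards == this_shard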

Move one test case from delimited-latin-text.test to
test_delimited_text.py for easier binary comparison.

Add bytes_to_str() as a utility function to decode bytes in Python 3.
This is often needed when inspecting the return value of
subprocess.check_output() as a string.
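
A minimal sketch of such a helper; the real implementation lives in
tests/util/parse_util.py and may differ in details:

  def bytes_to_str(value, encoding='utf-8'):
      # subprocess.check_output() returns bytes in Python 3; decode so
      # callers can keep doing substring checks on str. Leave str
      # untouched so the helper is also safe under Python 2.
      if isinstance(value, bytes):
          return value.decode(encoding)
      return value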

Implement DataTypeMetaclass.__lt__ to replace
DataTypeMetaclass.__cmp__, which is ignored in Python 3 (see
https://peps.python.org/pep-0207/).
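
The change boils down to the following (condensed from the diff further
below):

  class DataTypeMetaclass(type):
      def __lt__(cls, other):
          # Python 3 ignores __cmp__ (PEP 207); max()/sorted() over
          # DataType classes need rich comparison instead.
          if not isinstance(other, DataTypeMetaclass):
              return False
          return (getattr(cls, 'CMP_VALUE', cls.__name__)
                  < getattr(other, 'CMP_VALUE', other.__name__))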

Fix WEB_CERT_ERR difference in test_ipv6.py.

Fix trivial integer parsing in test_restart_services.py.

Fix various encoding issues in test_saml2_sso.py,
test_shell_commandline.py, and test_shell_interactive.py.

Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1.

Switch to binary comparison in test_iceberg.py where needed.

Specify text mode when calling tempfile.NamedTemporaryFile().
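
For example (illustrative file content, not the test's actual key
material):

  import tempfile

  # NamedTemporaryFile defaults to binary mode ('w+b'); tests that
  # write str content need an explicit text mode under Python 3.
  tmp = tempfile.NamedTemporaryFile(mode='w+t', delete=False)
  tmp.write("not a real key\n")
  tmp.flush()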

Simplify create_impala_shell_executable_dimension to skip testing the
dev and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. The
reason is that several UTF-8 related tests in test_shell_commandline.py
break in the Python 3 pytest + Python 2 impala-shell combination. This
skipping already happens automatically on build OSes without a system
Python 2 available, such as RHEL 9 (the IMPALA_SYSTEM_PYTHON2 env var
is empty).

Remove the unused vector argument and fix some trivial flake8 issues.

Several tests require logic changes due to intermittent issues when run
under Python 3 pytest. These include:

Add _run_query_with_client() in test_ranger.py to allow reusing a
single Impala client for running several queries. Ensure clients are
closed when the test is done. Mark several tests in test_ranger.py with
SkipIfFS.hive because they run queries through beeline + HiveServer2,
but the Ozone and S3 build environments do not start HiveServer2 by
default.

Increase the sleep period from 0.1 to 0.5 seconds per iteration in
test_statestore.py and mark TestStatestore to execute serially. This is
because TServer appears to shut down more slowly when run concurrently
with other tests. Handle the deprecation of Thread.setDaemon() as well.
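
For the setDaemon() part, the replacement pattern is roughly as follows
(the worker function is a placeholder):

  import threading

  def worker():
      pass

  t = threading.Thread(target=worker)
  # Thread.setDaemon() is deprecated since Python 3.10; assign the
  # daemon attribute instead, which works on both Python 2 and 3.
  t.daemon = True
  t.start()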

Always set force_restart=True for each test method in TestLoggingCore,
TestShellInteractiveReconnect, and TestQueryRetries to prevent them
from reusing the minicluster from a previous test method. Some of these
tests tear down the minicluster (kill an impalad) and will produce a
minidump if the metrics verifier for the next test fails to detect a
healthy minicluster state.

Testing:
Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true.

Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4
Reviewed-on: http://gerrit.cloudera.org:8080/23319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Riza Suminto
2025-08-17 10:11:59 -07:00
committed by Impala Public Jenkins
parent 789991c6cc
commit 28cff4022d
29 changed files with 402 additions and 263 deletions

View File

@@ -347,7 +347,7 @@ export IMPALA_KERBERIZE=false
unset IMPALA_TOOLCHAIN_KUDU_MAVEN_REPOSITORY
unset IMPALA_TOOLCHAIN_KUDU_MAVEN_REPOSITORY_ENABLED
export IMPALA_USE_PYTHON3_TESTS=${IMPALA_USE_PYTHON3_TESTS:-false}
export IMPALA_USE_PYTHON3_TESTS=${IMPALA_USE_PYTHON3_TESTS:-true}
# Source the branch and local config override files here to override any
# variables above or any variables below that allow overriding via environment

View File

@@ -1553,7 +1553,7 @@ class ImpalaShell(cmd.Cmd, object):
# undecodable elements.
if self.last_query_handle is not None:
self.imp_client.close_query(self.last_query_handle)
log_exception_with_timestamp(e, "UnicodeDecodeError", "Please check for"
log_exception_with_timestamp(e, "UnicodeDecodeError", "Please check for "
"columns containing binary data to find the possible source of the error")
except QueryStateException as e:
# an exception occurred while executing the query
@@ -1965,6 +1965,7 @@ class ImpalaShell(cmd.Cmd, object):
print("Error: OAuth access token not found in json payload")
sys.exit(1)
TIPS = [
"Press TAB twice to see a list of available commands.",
"After running a query, type SUMMARY to see a summary of where time was spent.",

View File

@@ -1,20 +1,5 @@
====
---- QUERY
# test querying text table "extended" ASCII (latin) delimiters:
# fields terminated by '-2' -- thorn character
# escaped by '-22' -- lowercase e with circumflex
# lines terminated by '\n'
select * from functional.text_thorn_ecirc_newline
---- RESULTS
'one','two',3,4
'one\xfeone','two',3,4
'one\xea','two',3,4
'one\xea\xfeone','two',3,4
'one\xea\xea','two',3,4
---- TYPES
STRING,STRING,INT,INT
====
---- QUERY
# create new tables like the ones above to test inserting
create table tecn like functional.text_thorn_ecirc_newline;
---- RESULTS

View File

@@ -45,6 +45,7 @@ from tests.util.calculation_util import get_random_id
from tests.util.filesystem_utils import WAREHOUSE, WAREHOUSE_PREFIX
from tests.util.hdfs_util import NAMENODE
from tests.util.iceberg_util import get_snapshots
from tests.util.parse_util import bytes_to_str
ADMIN = "admin"
OWNER_USER = getuser()
@@ -791,7 +792,7 @@ class TestRanger(CustomClusterTestSuite):
"{0}/service/public/v2/api/policy?servicename=test_impala&policyname={1}".format(
RANGER_HOST, policy_name),
auth=RANGER_AUTH, headers=REST_HEADERS)
assert 300 > r.status_code >= 200, r.content
assert 300 > r.status_code >= 200, bytes_to_str(r.content)
@staticmethod
def _check_privileges(result, expected):
@@ -807,7 +808,14 @@ class TestRanger(CustomClusterTestSuite):
def _run_query_as_user(self, query, username, expect_success):
"""Helper to run an input query as a given user."""
impala_client = self.create_impala_client(user=username)
with self.create_impala_client(user=username) as impala_client:
if expect_success:
return self.execute_query_expect_success(
impala_client, query, query_options={'sync_ddl': 1})
return self.execute_query_expect_failure(impala_client, query)
def _run_query_with_client(self, query, impala_client, expect_success):
"""Helper to run an input query using a given impala_client."""
if expect_success:
return self.execute_query_expect_success(
impala_client, query, query_options={'sync_ddl': 1})
@@ -882,6 +890,7 @@ class TestRanger(CustomClusterTestSuite):
grantee_role = "grantee_role"
resource_owner_role = OWNER_USER
admin_client = self.create_impala_client(user=ADMIN)
user_client = self.create_impala_client(user=OWNER_USER)
unique_database = unique_name + "_db"
table_name = "tbl"
column_names = ["a", "b"]
@@ -899,15 +908,15 @@ class TestRanger(CustomClusterTestSuite):
# able to create a UDF.
admin_client.execute("grant all on uri '{0}{1}' to user {2}"
.format(os.getenv("FILESYSTEM_PREFIX"), udf_uri, OWNER_USER))
self._run_query_as_user("create database {0}".format(unique_database), OWNER_USER,
True)
self._run_query_as_user("create table {0}.{1} ({2} int, {3} string)"
self._run_query_with_client("create database {0}".format(unique_database),
user_client, True)
self._run_query_with_client("create table {0}.{1} ({2} int, {3} string)"
.format(unique_database, table_name, column_names[0], column_names[1]),
OWNER_USER, True)
self._run_query_as_user("create function {0}.{1} "
user_client, True)
self._run_query_with_client("create function {0}.{1} "
"location '{2}{3}' symbol='org.apache.impala.TestUdf'"
.format(unique_database, udf_name, os.getenv("FILESYSTEM_PREFIX"), udf_uri),
OWNER_USER, True)
user_client, True)
for data in test_data:
grantee_type = data[0]
@@ -935,6 +944,8 @@ class TestRanger(CustomClusterTestSuite):
admin_client.execute("revoke create on server from user {0}".format(OWNER_USER))
admin_client.execute("revoke all on uri '{0}{1}' from user {2}"
.format(os.getenv("FILESYSTEM_PREFIX"), udf_uri, OWNER_USER))
admin_client.close()
user_client.close()
def _test_grant_revoke_by_owner_on_database(self, privilege, unique_database,
grantee_type, grantee, resource_owner_role):
@@ -946,10 +957,11 @@ class TestRanger(CustomClusterTestSuite):
set_database_owner_role_stmt = "alter database {0} set owner role {1}"
resource_owner_group = OWNER_USER
admin_client = self.create_impala_client(user=ADMIN)
user_client = self.create_impala_client(user=OWNER_USER)
try:
self._run_query_as_user(grant_database_stmt
.format(privilege, unique_database, grantee_type, grantee), OWNER_USER,
self._run_query_with_client(grant_database_stmt
.format(privilege, unique_database, grantee_type, grantee), user_client,
True)
result = admin_client.execute(show_grant_database_stmt
.format(grantee_type, grantee, unique_database))
@@ -959,8 +971,8 @@ class TestRanger(CustomClusterTestSuite):
[grantee_type, grantee, unique_database, "*", "*", "", "", "", "",
privilege, "false"]])
self._run_query_as_user(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee), OWNER_USER,
self._run_query_with_client(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee), user_client,
True)
result = admin_client.execute(show_grant_database_stmt
.format(grantee_type, grantee, unique_database))
@@ -977,12 +989,12 @@ class TestRanger(CustomClusterTestSuite):
.format(unique_database, resource_owner_group))
admin_client.execute("invalidate metadata")
result = self._run_query_as_user(grant_database_stmt
.format(privilege, unique_database, grantee_type, grantee), OWNER_USER,
result = self._run_query_with_client(grant_database_stmt
.format(privilege, unique_database, grantee_type, grantee), user_client,
False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee), OWNER_USER,
result = self._run_query_with_client(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee), user_client,
False)
assert ERROR_REVOKE in str(result)
@@ -992,12 +1004,12 @@ class TestRanger(CustomClusterTestSuite):
admin_client.execute(set_database_owner_role_stmt
.format(unique_database, resource_owner_role))
result = self._run_query_as_user(grant_database_stmt
.format(privilege, unique_database, grantee_type, grantee), OWNER_USER,
result = self._run_query_with_client(grant_database_stmt
.format(privilege, unique_database, grantee_type, grantee), user_client,
False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee), OWNER_USER,
result = self._run_query_with_client(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee), user_client,
False)
assert ERROR_REVOKE in str(result)
# Change the database owner back to the user 'OWNER_USER'.
@@ -1009,6 +1021,8 @@ class TestRanger(CustomClusterTestSuite):
# from interfering with other tests.
admin_client.execute(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee))
admin_client.close()
user_client.close()
def _test_grant_revoke_by_owner_on_table(self, privilege, unique_database, table_name,
grantee_type, grantee, resource_owner_role):
@@ -1020,23 +1034,24 @@ class TestRanger(CustomClusterTestSuite):
show_grant_table_stmt = "show grant {0} {1} on table {2}.{3}"
resource_owner_group = OWNER_USER
admin_client = self.create_impala_client(user=ADMIN)
user_client = self.create_impala_client(user=OWNER_USER)
set_table_owner_user_stmt = "alter table {0}.{1} set owner user {2}"
set_table_owner_group_stmt = "alter table {0}.{1} set owner group {2}"
set_table_owner_role_stmt = "alter table {0}.{1} set owner role {2}"
try:
self._run_query_as_user(grant_table_stmt
self._run_query_with_client(grant_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee),
OWNER_USER, True)
user_client, True)
result = admin_client.execute(show_grant_table_stmt
.format(grantee_type, grantee, unique_database, table_name))
TestRanger._check_privileges(result, [
[grantee_type, grantee, unique_database, table_name, "*", "", "", "",
"", privilege, "false"]])
self._run_query_as_user(revoke_table_stmt
self._run_query_with_client(revoke_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee),
OWNER_USER, True)
user_client, True)
result = admin_client.execute(show_grant_table_stmt
.format(grantee_type, grantee, unique_database, table_name))
TestRanger._check_privileges(result, [])
@@ -1052,13 +1067,13 @@ class TestRanger(CustomClusterTestSuite):
.format(unique_database, table_name, resource_owner_group))
admin_client.execute("refresh {0}.{1}".format(unique_database, table_name))
result = self._run_query_as_user(grant_table_stmt
result = self._run_query_with_client(grant_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee),
OWNER_USER, False)
user_client, False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user(revoke_table_stmt
result = self._run_query_with_client(revoke_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee),
OWNER_USER, False)
user_client, False)
assert ERROR_REVOKE in str(result)
# Set the owner of the table to a role that has the same name as
@@ -1067,13 +1082,13 @@ class TestRanger(CustomClusterTestSuite):
admin_client.execute(set_table_owner_role_stmt
.format(unique_database, table_name, resource_owner_role))
result = self._run_query_as_user(grant_table_stmt
result = self._run_query_with_client(grant_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee),
OWNER_USER, False)
user_client, False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user(revoke_table_stmt
result = self._run_query_with_client(revoke_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee),
OWNER_USER, False)
user_client, False)
assert ERROR_REVOKE in str(result)
# Change the table owner back to the user 'OWNER_USER'.
admin_client.execute(set_table_owner_user_stmt
@@ -1084,6 +1099,8 @@ class TestRanger(CustomClusterTestSuite):
# from interfering with other tests.
admin_client.execute(revoke_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee))
admin_client.close()
user_client.close()
def _test_grant_revoke_by_owner_on_column(self, privilege, column_names,
unique_database, table_name, grantee_type, grantee, resource_owner_role):
@@ -1095,14 +1112,15 @@ class TestRanger(CustomClusterTestSuite):
show_grant_column_stmt = "show grant {0} {1} on column {2}.{3}.{4}"
resource_owner_group = OWNER_USER
admin_client = self.create_impala_client(user=ADMIN)
user_client = self.create_impala_client(user=OWNER_USER)
set_table_owner_user_stmt = "alter table {0}.{1} set owner user {2}"
set_table_owner_group_stmt = "alter table {0}.{1} set owner group {2}"
set_table_owner_role_stmt = "alter table {0}.{1} set owner role {2}"
try:
self._run_query_as_user(grant_column_stmt
self._run_query_with_client(grant_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee), OWNER_USER, True)
grantee_type, grantee), user_client, True)
result = admin_client.execute(show_grant_column_stmt
.format(grantee_type, grantee, unique_database, table_name,
column_names[0]))
@@ -1110,9 +1128,9 @@ class TestRanger(CustomClusterTestSuite):
[grantee_type, grantee, unique_database, table_name, column_names[0],
"", "", "", "", privilege, "false"]])
self._run_query_as_user(revoke_column_stmt
self._run_query_with_client(revoke_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee), OWNER_USER, True)
grantee_type, grantee), user_client, True)
result = admin_client.execute(show_grant_column_stmt
.format(grantee_type, grantee, unique_database, table_name,
column_names[0]))
@@ -1125,13 +1143,13 @@ class TestRanger(CustomClusterTestSuite):
.format(unique_database, table_name, resource_owner_group))
admin_client.execute("refresh {0}.{1}".format(unique_database, table_name))
result = self._run_query_as_user(grant_column_stmt
result = self._run_query_with_client(grant_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee), OWNER_USER, False)
grantee_type, grantee), user_client, False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user(revoke_column_stmt
result = self._run_query_with_client(revoke_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee), OWNER_USER, False)
grantee_type, grantee), user_client, False)
assert ERROR_REVOKE in str(result)
# Set the owner of the table to a role that has the same name as 'OWNER_USER' and
@@ -1140,13 +1158,13 @@ class TestRanger(CustomClusterTestSuite):
admin_client.execute(set_table_owner_role_stmt
.format(unique_database, table_name, resource_owner_role))
result = self._run_query_as_user(grant_column_stmt
result = self._run_query_with_client(grant_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee), OWNER_USER, False)
grantee_type, grantee), user_client, False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user(revoke_column_stmt
result = self._run_query_with_client(revoke_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee), OWNER_USER, False)
grantee_type, grantee), user_client, False)
assert ERROR_REVOKE in str(result)
# Change the table owner back to the user 'owner_user'.
admin_client.execute(set_table_owner_user_stmt
@@ -1158,19 +1176,22 @@ class TestRanger(CustomClusterTestSuite):
admin_client.execute(revoke_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee))
admin_client.close()
user_client.close()
def _test_grant_revoke_by_owner_on_udf(self, privilege, unique_database, udf_name,
grantee_type, grantee):
# Due to IMPALA-11743 and IMPALA-12685, the owner of a UDF could not grant
# or revoke the SELECT privilege.
result = self._run_query_as_user("grant {0} on user_defined_fn "
"{1}.{2} to {3} {4}".format(privilege, unique_database, udf_name,
grantee_type, grantee), OWNER_USER, False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user("revoke {0} on user_defined_fn "
"{1}.{2} from {3} {4}".format(privilege, unique_database, udf_name,
grantee_type, grantee), OWNER_USER, False)
assert ERROR_REVOKE in str(result)
with self.create_impala_client(user=OWNER_USER) as user_client:
result = self._run_query_with_client("grant {0} on user_defined_fn "
"{1}.{2} to {3} {4}".format(privilege, unique_database, udf_name,
grantee_type, grantee), user_client, False)
assert ERROR_GRANT in str(result)
result = self._run_query_with_client("revoke {0} on user_defined_fn "
"{1}.{2} from {3} {4}".format(privilege, unique_database, udf_name,
grantee_type, grantee), user_client, False)
assert ERROR_REVOKE in str(result)
def _test_allow_catalog_cache_op_from_masked_users(self, unique_name):
"""Verify that catalog cache operations are allowed for masked users
@@ -1820,6 +1841,7 @@ class TestRangerIndependent(TestRanger):
def test_grant_multiple_columns_consolidate_grant_revoke_requests(self):
self._test_grant_multiple_columns(1)
@SkipIfFS.hive
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
impalad_args=LEGACY_CATALOG_IMPALAD_ARGS,
@@ -2199,10 +2221,12 @@ class TestRangerLegacyCatalog(TestRanger):
def test_legacy_catalog_ownership(self):
self._test_ownership()
@SkipIfFS.hive
@pytest.mark.execute_serially
def test_grant_revoke_by_owner_legacy_catalog(self, unique_name):
self._test_grant_revoke_by_owner(unique_name)
@SkipIfFS.hive
@pytest.mark.execute_serially
def test_select_view_created_by_non_superuser_with_catalog_v1(self, unique_name):
self._test_select_view_created_by_non_superuser(unique_name)
@@ -2231,10 +2255,12 @@ class TestRangerLocalCatalog(TestRanger):
pytest.xfail("getTableIfCached() faulty behavior, known issue")
self._test_ownership()
@SkipIfFS.hive
@pytest.mark.execute_serially
def test_grant_revoke_by_owner_local_catalog(self, unique_name):
self._test_grant_revoke_by_owner(unique_name)
@SkipIfFS.hive
@pytest.mark.execute_serially
def test_select_view_created_by_non_superuser_with_local_catalog(self, unique_name):
self._test_select_view_created_by_non_superuser(unique_name)
@@ -2453,6 +2479,8 @@ class TestRangerLocalCatalog(TestRanger):
else:
assert "Error revoking a privilege in Ranger. Ranger error message: " \
"HTTP 403 Error: Grantee group invalid_group doesn't exist" in str(result)
invalid_impala_client.close()
valid_impala_client.close()
@pytest.mark.execute_serially
def test_show_functions(self, unique_name):

View File

@@ -204,7 +204,9 @@ class CustomClusterTestSuite(ImpalaTestSuite):
args[LOG_SYMLINKS] = True
if workload_mgmt:
args[WORKLOAD_MGMT] = True
if force_restart:
if force_restart or pytest.config.option.shard_tests:
# When sharding tests, always restart the cluster to avoid issues with tests
# that depend on a specific test order within a shard.
args[FORCE_RESTART] = True
def merge_args(args_first, args_last):
@@ -349,7 +351,7 @@ class CustomClusterTestSuite(ImpalaTestSuite):
kwargs[IMPALAD_TIMEOUT_S] = args[IMPALAD_TIMEOUT_S]
if FORCE_RESTART in args:
kwargs[FORCE_RESTART] = args[FORCE_RESTART]
if args[FORCE_RESTART] is True:
if args[FORCE_RESTART] is True and not pytest.config.option.shard_tests:
LOG.warning("Test uses force_restart=True to avoid restarting the cluster. "
"Test reorganization/assertion rewrite is needed")
else:
@@ -398,6 +400,7 @@ class CustomClusterTestSuite(ImpalaTestSuite):
@classmethod
def cluster_teardown(cls, name, args):
if args.get(WORKLOAD_MGMT, False):
cls.close_impala_clients()
cls.cluster.graceful_shutdown_impalads()
cls.clear_tmp_dirs()
@@ -618,6 +621,9 @@ class CustomClusterTestSuite(ImpalaTestSuite):
except Exception as e:
LOG.info("Failed to reuse running cluster: %s" % e)
pass
finally:
cls.cluster = ImpalaCluster.get_e2e_test_cluster()
cls.impalad_test_service = cls.create_impala_service()
LOG.info("Starting cluster with command: %s" % cmd_str)
try:

View File

@@ -35,6 +35,7 @@ import socket
import subprocess
import time
import string
from functools import wraps
from getpass import getuser
from impala.hiveserver2 import HiveServer2Cursor
@@ -455,7 +456,7 @@ class ImpalaTestSuite(BaseTestSuite):
@classmethod
def close_impala_clients(cls):
"""Closes Impala clients created by create_impala_clients()."""
# cls.client should be equal to one of belove, unless test method implicitly override.
# cls.client should be equal to one of below, unless test method implicitly override.
# Closing twice would lead to error in some clients (impyla+SSL).
if cls.client not in (cls.beeswax_client, cls.hs2_client, cls.hs2_http_client):
cls.client.close()

View File

@@ -81,7 +81,8 @@ class SkipIfFS:
incorrent_reported_ec = pytest.mark.skipif(IS_OZONE and IS_EC, reason="HDDS-8543")
# These need test infra work to re-enable.
hive = pytest.mark.skipif(not IS_HDFS, reason="Hive doesn't work")
hive = pytest.mark.skipif(
not IS_HDFS, reason="HiveServer2 doesn't work or not started")
hbase = pytest.mark.skipif(not IS_HDFS, reason="HBase not started")
qualified_path = pytest.mark.skipif(not IS_HDFS,
reason="Tests rely on HDFS qualified paths, IMPALA-1872")
@@ -135,6 +136,7 @@ class SkipIf:
not_tuple_cache = pytest.mark.skipif(not IS_TUPLE_CACHE,
reason="Tuple Cache needed")
class SkipIfLocal:
# These are skipped due to product limitations.
hdfs_blocks = pytest.mark.skipif(IS_LOCAL,
@@ -152,6 +154,7 @@ class SkipIfLocal:
root_path = pytest.mark.skipif(IS_LOCAL,
reason="Tests rely on the root directory")
class SkipIfNotHdfsMinicluster:
# These are skipped when not running against a local HDFS mini-cluster.
plans = pytest.mark.skipif(
@@ -165,6 +168,7 @@ class SkipIfNotHdfsMinicluster:
reason="Test is tuned for scheduling decisions made on a 3-node HDFS minicluster "
"with no EC")
class SkipIfBuildType:
dev_build = pytest.mark.skipif(IMPALA_TEST_CLUSTER_PROPERTIES.is_dev(),
reason="Test takes too much time on debug build.")
@@ -173,6 +177,7 @@ class SkipIfBuildType:
remote = pytest.mark.skipif(IMPALA_TEST_CLUSTER_PROPERTIES.is_remote_cluster(),
reason="Test depends on running against a local Impala cluster")
class SkipIfEC:
contain_full_explain = pytest.mark.skipif(IS_EC, reason="Contain full explain output "
"for hdfs tables.")
@@ -225,6 +230,7 @@ class SkipIfHive2:
ranger_auth = pytest.mark.skipif(HIVE_MAJOR_VERSION <= 2,
reason="Hive 2 doesn't support Ranger authorization.")
class SkipIfCatalogV2:
"""Expose decorators as methods so that is_catalog_v2_cluster() can be evaluated lazily
when needed, instead of whenever this module is imported."""
@@ -261,6 +267,7 @@ class SkipIfCatalogV2:
IMPALA_TEST_CLUSTER_PROPERTIES.is_catalog_v2_cluster(),
reason="Table isn't invalidated with Local catalog and enabled hms_event_polling.")
class SkipIfApacheHive():
feature_not_supported = pytest.mark.skipif(IS_APACHE_HIVE,
reason="Apache Hive does not support this feature")

View File

@@ -30,7 +30,6 @@ import requests
import shutil
import subprocess
from abc import ABCMeta, abstractproperty
from cm_api.api_client import ApiResource as CmApiResource
from collections import defaultdict
from collections import OrderedDict
from contextlib import contextmanager
@@ -38,7 +37,6 @@ from getpass import getuser
from io import BytesIO
from multiprocessing.pool import ThreadPool
from random import choice
from sys import maxsize
from tempfile import mkdtemp
from threading import Lock
from time import mktime, strptime
@@ -50,6 +48,13 @@ try:
except ImportError:
from urlparse import urlparse
try:
from cm_api.api_client import ApiResource as CmApiResource
except ImportError:
# If the cm_api module is not available, we will not be able to use Cloudera Manager.
# This is fine for local testing.
pass
from tests.comparison.db_connection import HiveConnection, ImpalaConnection
from tests.common.environ import HIVE_MAJOR_VERSION
from tests.common.errors import Timeout
@@ -179,12 +184,9 @@ class Cluster(with_metaclass(ABCMeta, object)):
"""
Print the cluster impalad version info to the console sorted by hostname.
"""
def _sorter(i1, i2):
return cmp(i1.host_name, i2.host_name)
version_info = self.impala.get_version_info()
print("Cluster Impalad Version Info:")
for impalad in sorted(version_info.keys(), cmp=_sorter):
for impalad in sorted(version_info.keys(), key=lambda x: x.host_name):
print("{0}: {1}".format(impalad.host_name, version_info[impalad]))
@@ -635,7 +637,7 @@ class Impala(Service):
impalads = self.impalads
promise = self._thread_pool.map_async(func, impalads)
# Python doesn't handle ctrl-c well unless a timeout is provided.
results = promise.get(maxsize)
results = promise.get(timeout=(2 ** 31 - 1))
if as_dict:
results = dict(zip(impalads, results))
return results

View File

@@ -26,6 +26,7 @@ from tests.comparison.common import ValExpr, ValExprList
module_contents = dict()
class DataTypeMetaclass(type):
'''Provides sorting of classes used to determine upcasting.'''
@@ -39,9 +40,23 @@ class DataTypeMetaclass(type):
def __cmp__(cls, other):
if not isinstance(other, DataTypeMetaclass):
return -1
return cmp(
getattr(cls, 'CMP_VALUE', cls.__name__),
getattr(other, 'CMP_VALUE', other.__name__))
val_this = getattr(cls, 'CMP_VALUE', cls.__name__)
val_other = getattr(other, 'CMP_VALUE', other.__name__)
if (val_this < val_other):
return -1
elif (val_this > val_other):
return 1
else:
return 0
def __lt__(cls, other):
# This __lt__ method replace __cmp__ that is removed in Python3.
# See https://peps.python.org/pep-0207/.
# It is mainly to serve max() inside update_return_type_and_append() of funcs.py.
if not isinstance(other, DataTypeMetaclass):
return False
return (getattr(cls, 'CMP_VALUE', cls.__name__)
< getattr(other, 'CMP_VALUE', other.__name__))
class DataType(with_metaclass(DataTypeMetaclass, ValExpr)):
@@ -92,7 +107,6 @@ class DataType(with_metaclass(DataTypeMetaclass, ValExpr)):
return type(self)
class Boolean(DataType):
pass
@@ -213,6 +227,10 @@ JOINABLE_TYPES = (Char, Decimal, Int, Timestamp)
TYPES = tuple(set(type_.type for type_ in EXACT_TYPES))
__DECIMAL_TYPE_CACHE = dict()
__CHAR_TYPE_CACHE = dict()
__VARCHAR_TYPE_CACHE = dict()
def get_decimal_class(total_digits, fractional_digits):
cache_key = (total_digits, fractional_digits)
if cache_key not in __DECIMAL_TYPE_CACHE:
@@ -223,7 +241,6 @@ def get_decimal_class(total_digits, fractional_digits):
return __DECIMAL_TYPE_CACHE[cache_key]
__CHAR_TYPE_CACHE = dict()
def get_char_class(length):
if length not in __CHAR_TYPE_CACHE:
__CHAR_TYPE_CACHE[length] = type(
@@ -233,7 +250,6 @@ def get_char_class(length):
return __CHAR_TYPE_CACHE[length]
__VARCHAR_TYPE_CACHE = dict()
def get_varchar_class(length):
if length not in __VARCHAR_TYPE_CACHE:
__VARCHAR_TYPE_CACHE[length] = type(

View File

@@ -696,7 +696,7 @@ def validate_python_version():
def pytest_collection_modifyitems(items, config, session):
"""Hook to handle --shard_tests command line option.
If set, this "deselects" a subset of tests, by hashing
If set, this "deselects" a subset of tests, by hashing (using crc32())
their id into buckets.
"""
if not config.option.shard_tests:
@@ -710,7 +710,7 @@ def pytest_collection_modifyitems(items, config, session):
items_selected, items_deselected = [], []
for i in items:
if hash(i.nodeid) % num_shards == this_shard:
if crc32(i.nodeid.encode('utf-8')) % num_shards == this_shard:
items_selected.append(i)
else:
items_deselected.append(i)

View File

@@ -22,6 +22,7 @@ import logging
import os
import pytest
import requests
import sys
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
from tests.common.network import SKIP_SSL_MSG
@@ -66,7 +67,8 @@ WEBUI_PORTS = [25000, 25010, 25020]
# Error text can depend on both protocol and python version.
CONN_ERR = ["Could not connect", "Connection refused"]
CERT_ERR = ["doesn't match", "certificate verify failed"]
WEB_CERT_ERR = "CertificateError"
WEB_CERT_ERR = ("CertificateError" if sys.version_info.major < 3
else "SSLCertVerificationError")
class TestIPv6Base(CustomClusterTestSuite):

View File

@@ -46,21 +46,24 @@ class TestLoggingCore(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(cluster_size=1,
impalad_args="--max_error_logs_per_instance=2",
disable_log_buffering=True)
disable_log_buffering=True,
force_restart=True)
def test_max_errors(self):
self._test_max_errors(2, 4, True)
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(cluster_size=1,
impalad_args="--max_error_logs_per_instance=3",
disable_log_buffering=True)
disable_log_buffering=True,
force_restart=True)
def test_max_errors_0(self):
self._test_max_errors(3, 0, True)
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(cluster_size=1,
impalad_args="--max_error_logs_per_instance=2",
disable_log_buffering=True)
disable_log_buffering=True,
force_restart=True)
def test_max_errors_no_downgrade(self):
self._test_max_errors(2, -1, False)

View File

@@ -93,6 +93,7 @@ class TestQueryRetries(CustomClusterTestSuite):
_count_query_result = "55"
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(force_restart=True)
def test_retries_from_cancellation_pool(self):
"""Tests that queries are retried instead of cancelled if one of the nodes leaves the
cluster. The retries are triggered by the cancellation pool in the ImpalaServer. The
@@ -138,7 +139,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="-statestore_heartbeat_frequency_ms=1000")
statestored_args="-statestore_heartbeat_frequency_ms=1000",
force_restart=True)
def test_kill_impalad_expect_retry(self):
"""Launch a query, wait for it to start running, kill a random impalad and then
validate that the query has successfully been retried. Increase the statestore
@@ -251,7 +253,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="-statestore_heartbeat_frequency_ms=60000")
statestored_args="-statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_exec_rpc_failure(self):
"""Test ExecFInstance RPC failures. Set a really high statestort heartbeat frequency
so that killed impalads are not removed from the cluster membership. This will cause
@@ -306,8 +309,9 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
impalad_args="--debug_actions=" + _get_rpc_fail_action(FAILED_KRPC_PORT),
statestored_args="--statestore_heartbeat_frequency_ms=1000 \
--statestore_max_missed_heartbeats=2")
statestored_args=("--statestore_heartbeat_frequency_ms=1000 "
"--statestore_max_missed_heartbeats=2"),
force_restart=True)
def test_retry_exec_rpc_failure_before_admin_delay(self):
"""Test retried query triggered by RPC failures by simulating RPC errors at the port
of the 2nd node in the cluster. Simulate admission delay for query with debug_action
@@ -375,7 +379,7 @@ class TestQueryRetries(CustomClusterTestSuite):
impalad_args="--debug_actions=" + _get_rpc_fail_action(FAILED_KRPC_PORT),
statestored_args="--statestore_heartbeat_frequency_ms=1000 \
--statestore_max_missed_heartbeats=2",
cluster_size=2, num_exclusive_coordinators=1)
cluster_size=2, num_exclusive_coordinators=1, force_restart=True)
def test_retry_query_failure_all_executors_blacklisted(self):
"""Test retried query triggered by RPC failures by simulating RPC errors at the port
of the 2nd node, which is the only executor in the cluster. Simulate admission delay
@@ -438,7 +442,7 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="-statestore_heartbeat_frequency_ms=1000")
statestored_args="-statestore_heartbeat_frequency_ms=1000", force_restart=True)
def test_multiple_retries(self):
"""Test that a query can only be retried once, and that if the retry attempt fails,
it fails correctly and with the right error message. Multiple retry attempts are
@@ -498,6 +502,7 @@ class TestQueryRetries(CustomClusterTestSuite):
self.__validate_memz()
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(force_restart=True)
def test_retry_fetched_rows(self):
"""Test that query retries are not triggered if some rows have already been
fetched. Run a query, fetch some rows from it, kill one of the impalads that is
@@ -612,7 +617,7 @@ class TestQueryRetries(CustomClusterTestSuite):
self.client.close_query(handle)
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(disable_log_buffering=True)
@CustomClusterTestSuite.with_args(disable_log_buffering=True, force_restart=True)
def test_query_retry_reaches_spool_limit(self):
"""Test retryable queries with results spooling enabled and
spool_all_results_for_retries=true that reach spooling mem limit will return rows and
@@ -661,6 +666,7 @@ class TestQueryRetries(CustomClusterTestSuite):
"fetched some rows" % self.client.handle_id(handle) in str(e)
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(force_restart=True)
def test_original_query_cancel(self):
"""Test canceling a retryable query with spool_all_results_for_retries=true. Make sure
Coordinator::Wait() won't block in cancellation."""
@@ -682,6 +688,7 @@ class TestQueryRetries(CustomClusterTestSuite):
assert "Cancelled" in str(e)
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(force_restart=True)
def test_retry_finished_query(self):
"""Test that queries in FINISHED state can still be retried before the client fetch
any rows. Sets batch_size to 1 so results will be available as soon as possible.
@@ -708,7 +715,7 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="-statestore_heartbeat_frequency_ms=60000")
statestored_args="-statestore_heartbeat_frequency_ms=60000", force_restart=True)
def test_retry_query_cancel(self):
"""Trigger a query retry, and then cancel the retried query. Validate that the
cancelled query fails with the correct error message. Set a really high statestore
@@ -750,7 +757,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
impalad_args="--debug_actions=RETRY_DELAY_CHECKING_ORIGINAL_DRIVER:SLEEP@1000",
statestored_args="--statestore_heartbeat_frequency_ms=60000")
statestored_args="--statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retrying_query_cancel(self):
"""Trigger a query retry, and then cancel and close the retried query in RETRYING
state. Validate that it doesn't crash the impalad. Set a really high statestore
@@ -780,7 +788,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="--statestore_heartbeat_frequency_ms=60000")
statestored_args="--statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retrying_query_before_inflight(self):
"""Trigger a query retry, and delay setting the original query inflight as that may
happen after the query is retried. Validate that the query succeeds. Set a really
@@ -820,7 +829,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
impalad_args="--debug_actions=RETRY_DELAY_GET_QUERY_DRIVER:SLEEP@2000",
statestored_args="--statestore_heartbeat_frequency_ms=60000")
statestored_args="--statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_query_close_before_getting_query_driver(self):
"""Trigger a query retry, and then close the retried query before getting
the query driver. Validate that it doesn't crash the impalad.
@@ -847,7 +857,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
impalad_args="--debug_actions=QUERY_RETRY_SET_RESULT_CACHE:FAIL",
statestored_args="--statestore_heartbeat_frequency_ms=60000")
statestored_args="--statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_query_result_cacheing_failed(self):
"""Test setting up results cacheing failed."""
@@ -874,7 +885,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
impalad_args="--debug_actions=QUERY_RETRY_SET_QUERY_IN_FLIGHT:FAIL",
statestored_args="--statestore_heartbeat_frequency_ms=60000")
statestored_args="--statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_query_set_query_in_flight_failed(self):
"""Test setting query in flight failed."""
@@ -898,7 +910,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="-statestore_heartbeat_frequency_ms=60000")
statestored_args="-statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_query_timeout(self):
"""Trigger a query retry, and then leave the query handle open until the
'query_timeout_s' causes the handle to be closed. Assert that the runtime profile of
@@ -941,8 +954,10 @@ class TestQueryRetries(CustomClusterTestSuite):
assert impalad_service.get_metric_value('impala-server.num-queries-expired') == 1
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(impalad_args="--idle_session_timeout=1",
statestored_args="--statestore_heartbeat_frequency_ms=60000")
@CustomClusterTestSuite.with_args(
impalad_args="--idle_session_timeout=1",
statestored_args="--statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_query_session_timeout(self):
"""Similar to 'test_retry_query_timeout' except with an idle session timeout."""
self.close_impala_clients()
@@ -977,7 +992,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="-statestore_heartbeat_frequency_ms=60000")
statestored_args="-statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_query_hs2(self):
"""Test query retries with the HS2 protocol. Enable the results set cache as well and
test that query retries work with the results cache."""

View File

@@ -1010,7 +1010,7 @@ class TestGracefulShutdown(CustomClusterTestSuite, HS2TestSuite):
assert cancel == "{0}s000ms".format(get_remain_shutdown_query_cancel(
self.COORD_SHUTDOWN_FAST_DEADLINE_S, self.COORD_SHUTDOWN_FAST_DEADLINE_S))
assert registered == "0"
assert running > 0
assert int(running) > 0
self.cluster.impalads[1].wait_for_exit()
# The slow query should be cancelled.
self.__check_deadline_expired(SLOW_QUERY, slow_query_handle, True)

View File

@@ -25,7 +25,7 @@ from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
from tests.common.skip import SkipIf
from tests.util.filesystem_utils import WAREHOUSE
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp = tempfile.NamedTemporaryFile(mode='w+t', delete=False)
BAD_KEY_FILE = tmp.name

View File

@@ -21,6 +21,7 @@ import base64
import datetime
import os
import pytest
import sys
import uuid
import xml.etree.ElementTree as ET
import zlib
@@ -40,7 +41,7 @@ from tests.shell.util import run_impala_shell_cmd
class NoRedirection(HTTPErrorProcessor):
"""Allows inspecting http redirection responses. """
def http_response(self, request, response):
def http_response(self, request, response): # noqa: U100
return response
@@ -49,6 +50,13 @@ def format_time(time):
return time.strftime("%Y-%m-%dT%H:%M:%SZ")
def encode_if_needed(value):
""" Encodes the value to bytes if needed, depending on the python version. """
if sys.version_info.major < 3:
return value.encode('utf-8') if isinstance(value, str) else value
return value if isinstance(value, bytes) else value.encode('utf-8')
class TestClientSaml(CustomClusterTestSuite):
""" Tests for a client using SAML2 browser profile.
@@ -100,9 +108,9 @@ class TestClientSaml(CustomClusterTestSuite):
"--saml2_ee_test_mode=true"
% (CERT_DIR, CERT_DIR, SP_CALLBACK_URL))
SSO_ARGS_WITH_GROUP_FILTER = (SSO_ARGS + " " +
"--saml2_group_filter=group1,group2 "
"--saml2_group_attribute_name=eduPersonAffiliation")
SSO_ARGS_WITH_GROUP_FILTER = (
"{} --saml2_group_filter=group1,group2 "
"--saml2_group_attribute_name=eduPersonAffiliation").format(SSO_ARGS)
@CustomClusterTestSuite.with_args(impalad_args=SSO_ARGS, cluster_size=1)
def test_saml2_browser_profile_no_group_filter(self, vector):
@@ -131,7 +139,7 @@ class TestClientSaml(CustomClusterTestSuite):
@CustomClusterTestSuite.with_args(
impalad_args=SSO_ARGS_WITH_GROUP_FILTER, cluster_size=1)
def test_saml2_browser_profile_with_group_filter(self, vector):
def test_saml2_browser_profile_with_group_filter(self):
# test the SAML worflow with different attributes
self._test_saml2_browser_workflow("", False)
@@ -157,7 +165,8 @@ class TestClientSaml(CustomClusterTestSuite):
""" Initial POST request to hs2-http port, response should be redirected
to IDP and contain the authnrequest. """
opener = build_opener(NoRedirection)
req = Request("http://localhost:%s" % TestClientSaml.HOST_PORT, " ")
payload = encode_if_needed(" ")
req = Request("http://localhost:%s" % TestClientSaml.HOST_PORT, payload)
req.add_header('X-Hive-Token-Response-Port', TestClientSaml.CLIENT_PORT)
response = opener.open(req)
relay_state, client_id, saml_req_xml = \
@@ -171,7 +180,10 @@ class TestClientSaml(CustomClusterTestSuite):
assert client_id is not None
new_url = response.info()["location"]
assert new_url.startswith(TestClientSaml.IDP_URL)
query = parse_qs(urlparse(new_url).query.encode('ASCII'))
query_part = urlparse(new_url).query
query = parse_qs(query_part.encode('ASCII') if sys.version_info.major < 3
else query_part)
assert "RelayState" in query, query
relay_state = query["RelayState"][0]
assert relay_state is not None
saml_req = query["SAMLRequest"][0]
@@ -187,7 +199,8 @@ class TestClientSaml(CustomClusterTestSuite):
def _request_resource_with_bearer(self, client_id, bearer_token):
""" Send POST request to hs2-http port again, this time with bearer tokan.
The response should contain a security cookie if the validation succeeded """
req = Request("http://localhost:%s" % TestClientSaml.HOST_PORT, " ")
payload = encode_if_needed(" ")
req = Request("http://localhost:%s" % TestClientSaml.HOST_PORT, payload)
req.add_header('X-Hive-Client-Identifier', client_id)
req.add_header('Authorization', "Bearer " + bearer_token)
opener = build_opener(NoRedirection)
@@ -205,10 +218,11 @@ class TestClientSaml(CustomClusterTestSuite):
Impala should answer with a form that submits to CLIENT_PORT and contains
the bearer token as a hidden state. """
authn_resp = self._generate_authn_response(request_id, attributes_xml)
encoded_authn_resp = base64.urlsafe_b64encode(authn_resp)
body = "SAMLResponse=%s&RelayState=%s" % (encoded_authn_resp, relay_state)
encoded_authn_resp = base64.urlsafe_b64encode(authn_resp.encode('utf-8'))
body = (b"SAMLResponse=" + encoded_authn_resp + b"&RelayState="
+ encode_if_needed(relay_state))
opener = build_opener(NoRedirection)
req = Request(TestClientSaml.SP_CALLBACK_URL, body)
req = Request(TestClientSaml.SP_CALLBACK_URL, encode_if_needed(body))
response = opener.open(req)
bearer_token = self._parse_xhtml_form(response, expect_success)
return bearer_token
@@ -244,7 +258,7 @@ class TestClientSaml(CustomClusterTestSuite):
message = input.attrib["value"]
if expect_success:
assert token.startswith("u=user1")
assert token.startswith("u=user1"), str(content)
else:
assert message == TestClientSaml.ASSERTATION_ERROR_MESSAGE
return token

View File

@@ -30,6 +30,7 @@ import time
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
from tests.common.skip import SkipIf
from tests.util.hdfs_util import NAMENODE
from tests.util.parse_util import bytes_to_str
class TestScratchDir(CustomClusterTestSuite):
@@ -537,16 +538,18 @@ class TestScratchDir(CustomClusterTestSuite):
hostname = socket.gethostname()
# Verify that there are leftover files in the remote scratch dirs after being killed.
full_dfs_tmp_path = "{}/impala-scratch".format(self.dfs_tmp_path())
files_result = subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path])
files_result = bytes_to_str(
subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path]))
assert "Found 1 items" in files_result
assert hostname in files_result
full_dfs_tmp_path_with_hostname = "{}/{}".format(full_dfs_tmp_path, hostname)
files_result = subprocess.check_output(["hdfs", "dfs", "-ls",
full_dfs_tmp_path_with_hostname])
files_result = bytes_to_str(
subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path_with_hostname]))
assert "Found 1 items" in files_result
impalad.start()
# Verify that the leftover files being removed after the impala daemon restarted.
files_result = subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path])
files_result = bytes_to_str(
subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path]))
assert files_result == ""
@pytest.mark.execute_serially
@@ -578,6 +581,7 @@ class TestScratchDir(CustomClusterTestSuite):
client.close()
# Verify that no host-level dir in the remote scratch dirs after shutdown.
full_dfs_tmp_path = "{}/impala-scratch".format(self.dfs_tmp_path())
files_result = subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path])
files_result = bytes_to_str(
subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path]))
assert files_result == ""
impalad.start()

View File

@@ -18,7 +18,6 @@
from __future__ import absolute_import, division, print_function
import socket
import pexpect
import pytest
# Follow tests/shell/test_shell_interactive.py naming.
@@ -32,6 +31,7 @@ from tests.verifiers.metric_verifier import MetricVerifier
NUM_QUERIES = 'impala-server.num-queries'
class TestShellInteractiveReconnect(CustomClusterTestSuite):
""" Check if interactive shell is using the current DB after reconnecting """
@pytest.mark.execute_serially
@@ -75,6 +75,7 @@ class TestShellInteractiveReconnect(CustomClusterTestSuite):
assert "alltypesaggmultifilesnopart" in result.stdout, result.stdout
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(force_restart=True)
def test_auto_reconnect_after_impalad_died(self):
"""Test reconnect after restarting the remote impalad without using connect;"""
# Use pexpect instead of ImpalaShell() since after using get_result() in ImpalaShell()

View File

@@ -26,6 +26,7 @@ from tests.common.test_dimensions import (
create_single_exec_option_dimension,
create_uncompressed_text_dimension)
class TestDelimitedText(ImpalaTestSuite):
"""
Tests delimited text files with different tuple delimiters, field delimiters
@@ -43,7 +44,7 @@ class TestDelimitedText(ImpalaTestSuite):
def test_delimited_text(self, vector, unique_database):
self.run_test_case('QueryTest/delimited-text', vector, unique_database)
def test_delimited_text_newlines(self, vector, unique_database):
def test_delimited_text_newlines(self, unique_database):
""" Test text with newlines in strings - IMPALA-1943. Execute queries from Python to
avoid issues with newline handling in test file format. """
self.execute_query_expect_success(self.client, """
@@ -62,7 +63,7 @@ class TestDelimitedText(ImpalaTestSuite):
result = self.execute_query("select * from %s.nl_queries" % unique_database)
assert len(result.data) == 2
assert result.data[0].split("\t") == ["the\n", "\nquick\nbrown", "fox\n"]
assert result.data[1].split("\t") == ["\njumped","over the lazy\n","\ndog"]
assert result.data[1].split("\t") == ["\njumped", "over the lazy\n", "\ndog"]
# The row count may be computed without parsing each row, so could be inconsistent.
result = self.execute_query("select count(*) from %s.nl_queries" % unique_database)
assert len(result.data) == 1
@@ -72,8 +73,15 @@ class TestDelimitedText(ImpalaTestSuite):
"""Verifies Impala is able to properly handle delimited text that contains
extended ASCII/latin characters. Marked as running serial because of shared
cleanup/setup"""
self.run_test_case('QueryTest/delimited-latin-text', vector, unique_database,
encoding="latin-1")
result = self.execute_query_expect_success(
self.hs2_client,
"select * from functional.text_thorn_ecirc_newline order by col1")
assert result.tuples() == [('one', 'two', 3, 4),
(b'one\xea', 'two', 3, 4),
(b'one\xea\xea', 'two', 3, 4),
(b'one\xea\xfeone', 'two', 3, 4),
(b'one\xfeone', 'two', 3, 4)]
self.run_test_case('QueryTest/delimited-latin-text', vector, unique_database)
def test_large_file_of_field_delimiters(self, vector, unique_database):
"""IMPALA-13161: Verifies reading a large file which has full of field delimiters
@@ -83,7 +91,7 @@ class TestDelimitedText(ImpalaTestSuite):
table_loc = self._get_table_location(tbl, vector)
# Generate a 3GB data file that has full of '\x00' (the default field delimiter)
with open("data.txt", "wb") as f:
long_str = "\x00" * 1024 * 1024 * 3
long_str = b"\x00" * 1024 * 1024 * 3
[f.write(long_str) for i in range(1024)]
check_call(["hdfs", "dfs", "-put", "data.txt", table_loc])
self.execute_query("refresh " + tbl)

View File

@@ -47,6 +47,7 @@ from tests.shell.util import run_impala_shell_cmd
from tests.util.filesystem_utils import FILESYSTEM_PREFIX, get_fs_path, IS_HDFS, WAREHOUSE
from tests.util.get_parquet_metadata import get_parquet_metadata
from tests.util.iceberg_util import cast_ts, get_snapshots, IcebergCatalogs, quote
from tests.util.parse_util import bytes_to_str
LOG = logging.getLogger(__name__)
@@ -1130,35 +1131,35 @@ class TestIcebergTable(IcebergTestSuite):
{'key': 6, 'value': 1},
{'key': 7, 'value': 2}]
assert datafiles[1]['lower_bounds'] == \
[{'key': 1, 'value': 'abc'},
[{'key': 1, 'value': b'abc'},
# INT is serialized as 4-byte little endian
{'key': 2, 'value': '\x00\x00\x00\x00'},
{'key': 2, 'value': b'\x00\x00\x00\x00'},
# BOOLEAN is serialized as 0x00 for FALSE
{'key': 3, 'value': '\x00'},
{'key': 3, 'value': b'\x00'},
# BIGINT is serialized as 8-byte little endian
{'key': 4, 'value': '\x40\xaf\x0d\x86\x48\x70\x00\x00'},
{'key': 4, 'value': b'\x40\xaf\x0d\x86\x48\x70\x00\x00'},
# TIMESTAMP is serialized as 8-byte little endian (number of microseconds since
# 1970-01-01 00:00:00)
{'key': 5, 'value': '\xc0\xd7\xff\x06\xd0\xff\xff\xff'},
{'key': 5, 'value': b'\xc0\xd7\xff\x06\xd0\xff\xff\xff'},
# DATE is serialized as 4-byte little endian (number of days since 1970-01-01)
{'key': 6, 'value': '\x93\xfe\xff\xff'},
{'key': 6, 'value': b'\x93\xfe\xff\xff'},
# Unlike other numerical values, DECIMAL is serialized as big-endian.
{'key': 7, 'value': '\xd8\xf0'}]
{'key': 7, 'value': b'\xd8\xf0'}]
assert datafiles[1]['upper_bounds'] == \
[{'key': 1, 'value': 'ghij'},
[{'key': 1, 'value': b'ghij'},
# INT is serialized as 4-byte little endian
{'key': 2, 'value': '\x03\x00\x00\x00'},
{'key': 2, 'value': b'\x03\x00\x00\x00'},
# BOOLEAN is serialized as 0x01 for TRUE
{'key': 3, 'value': '\x01'},
{'key': 3, 'value': b'\x01'},
# BIGINT is serialized as 8-byte little endian
{'key': 4, 'value': '\x81\x58\xc2\x97\x56\xd5\x00\x00'},
{'key': 4, 'value': b'\x81\x58\xc2\x97\x56\xd5\x00\x00'},
# TIMESTAMP is serialized as 8-byte little endian (number of microseconds since
# 1970-01-01 00:00:00)
{'key': 5, 'value': '\x80\x02\x86\xef\x2f\x00\x00\x00'},
{'key': 5, 'value': b'\x80\x02\x86\xef\x2f\x00\x00\x00'},
# DATE is serialized as 4-byte little endian (number of days since 1970-01-01)
{'key': 6, 'value': '\x6d\x01\x00\x00'},
{'key': 6, 'value': b'\x6d\x01\x00\x00'},
# Unlike other numerical values, DECIMAL is serialized as big-endian.
{'key': 7, 'value': '\x00\xdc\x14'}]
{'key': 7, 'value': b'\x00\xdc\x14'}]
def test_using_upper_lower_bound_metrics(self, vector, unique_database):
self.run_test_case('QueryTest/iceberg-upper-lower-bound-metrics', vector,
@@ -2224,7 +2225,7 @@ class TestIcebergV2Table(IcebergTestSuite):
table_location = "{0}/test-warehouse/{1}.db/{2}/data".format(
FILESYSTEM_PREFIX, unique_database, table_name)
files_result = check_output(["hdfs", "dfs", "-ls", table_location])
assert "Found 1 items" in files_result
assert "Found 1 items" in bytes_to_str(files_result)
def test_predicate_push_down_hint(self, vector, unique_database):
self.run_test_case('QueryTest/iceberg-predicate-push-down-hint', vector,

View File

@@ -161,21 +161,20 @@ class TestQueries(ImpalaTestSuite):
file_format = vector.get_value('table_format').file_format
if file_format == 'hbase':
pytest.xfail(reason="IMPALA-283 - select count(*) produces inconsistent results")
vector.get_value('exec_option')['disable_outermost_topn'] = 1
vector.get_value('exec_option')['analytic_rank_pushdown_threshold'] = 0
self.run_test_case('QueryTest/sort', vector)
new_vector = deepcopy(vector)
options = new_vector.get_value('exec_option')
options['disable_outermost_topn'] = 1
options['analytic_rank_pushdown_threshold'] = 0
self.run_test_case('QueryTest/sort', new_vector)
# We can get the sort tests for free from the top-n file
self.run_test_case('QueryTest/top-n', vector)
self.run_test_case('QueryTest/top-n', new_vector)
if file_format in ['parquet', 'orc']:
# set timestamp options to get consistent results for both format.
new_vector = deepcopy(vector)
options = new_vector.get_value('exec_option')
options['convert_legacy_hive_parquet_utc_timestamps'] = 1
options['timezone'] = '"Europe/Budapest"'
self.run_test_case('QueryTest/sort-complex', new_vector)
def test_partitioned_top_n(self, vector):
"""Test partitioned Top-N operator."""
if vector.get_value('table_format').file_format == "hbase":

View File

@@ -33,11 +33,11 @@ from time import sleep, time
from builtins import range
import pytest
from impala_shell.impala_client import utf8_encode_if_needed
from impala_shell.impala_shell import ImpalaShell as ImpalaShellClass
from tests.common.environ import ImpalaTestClusterProperties
from tests.common.impala_service import ImpaladService
from tests.common.impala_test_suite import IMPALAD_HS2_HOST_PORT, ImpalaTestSuite
from tests.common.skip import SkipIf
from tests.common.test_dimensions import (
create_client_protocol_dimension,
create_client_protocol_strict_dimension,
@@ -60,8 +60,9 @@ from tests.shell.util import (
DEFAULT_QUERY = 'select 1'
QUERY_FILE_PATH = os.path.join(os.environ['IMPALA_HOME'], 'tests', 'shell')
RUSSIAN_CHARS = (u"А, Б, В, Г, Д, Е, Ё, Ж, З, И, Й, К, Л, М, Н, О, П, Р,"
u"С, Т, У, Ф, Х, Ц,Ч, Ш, Щ, Ъ, Ы, Ь, Э, Ю, Я")
RUSSIAN_CHARS = utf8_encode_if_needed(
u"А, Б, В, Г, Д, Е, Ё, Ж, З, И, Й, К, Л, М, Н, О, П, Р,"
u"С, Т, У, Ф, Х, Ц,Ч, Ш, Щ, Ъ, Ы, Ь, Э, Ю, Я")
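With RUSSIAN_CHARS wrapped in utf8_encode_if_needed(), the assertions below can compare it against shell output directly on both Python versions. A hedged sketch of the assumed behaviour (not the actual impala_shell implementation):

import sys

def utf8_encode_if_needed_sketch(text):
    # Assumed behaviour: Python 2 compares against byte strings, so encode
    # unicode text to UTF-8 bytes; Python 3 works with str and needs no change.
    if sys.version_info.major < 3:
        return text.encode('utf-8')
    return text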
"""IMPALA-12216 implemented timestamp to be printed in case of any error/warning
during query execution, below is an example :
@@ -90,6 +91,7 @@ def find_query_option(key, string, strip_brackets=True):
assert len(values) == 1
return values[0].strip("[]") if strip_brackets else values[0]
@pytest.fixture
def empty_table(unique_database, request):
"""Create an empty table within the test database before executing test.
@@ -299,7 +301,6 @@ class TestImpalaShell(ImpalaTestSuite):
"Column metadata states there are 11 values, but read 10 values from column id."
)
def test_completed_query_errors_2(self, vector):
if vector.get_value('strict_hs2_protocol'):
pytest.skip("Impala-10827: Multiple queries not supported in strict hs2 mode.")
@@ -412,7 +413,7 @@ class TestImpalaShell(ImpalaTestSuite):
args = ['-p', '-q', 'select 1; profile;']
result_set = run_impala_shell_cmd(vector, args)
# This regex helps us uniquely identify a profile.
regex = re.compile("Operator\s+#Hosts\s+#Inst\s+Avg\s+Time")
regex = re.compile(r"Operator\s+#Hosts\s+#Inst\s+Avg\s+Time")
# We expect two query profiles.
assert len(re.findall(regex, result_set.stdout)) == 2, \
"Could not detect two profiles, stdout: %s" % result_set.stdout
@@ -626,24 +627,24 @@ class TestImpalaShell(ImpalaTestSuite):
def test_international_characters(self, vector):
"""Sanity test to ensure that the shell can read international characters."""
args = ['-B', '-q', "select '{0}'".format(RUSSIAN_CHARS.encode('utf-8'))]
args = ['-B', '-q', "select '{0}'".format(RUSSIAN_CHARS)]
result = run_impala_shell_cmd(vector, args)
assert 'UnicodeDecodeError' not in result.stderr
assert RUSSIAN_CHARS.encode('utf-8') in result.stdout
assert RUSSIAN_CHARS in result.stdout
def test_international_characters_prettyprint(self, vector):
"""IMPALA-2717: ensure we can handle international characters in pretty-printed
output"""
args = ['-q', "select '{0}'".format(RUSSIAN_CHARS.encode('utf-8'))]
args = ['-q', "select '{0}'".format(RUSSIAN_CHARS)]
result = run_impala_shell_cmd(vector, args)
assert 'UnicodeDecodeError' not in result.stderr
assert RUSSIAN_CHARS.encode('utf-8') in result.stdout
assert RUSSIAN_CHARS in result.stdout
def test_international_characters_prettyprint_tabs(self, vector):
"""IMPALA-2717: ensure we can handle international characters in pretty-printed
output when pretty-printing falls back to delimited output."""
args = ['-q', "select '{0}\\t'".format(RUSSIAN_CHARS.encode('utf-8'))]
args = ['-q', "select '{0}\\t'".format(RUSSIAN_CHARS)]
result = run_impala_shell_cmd(vector, args)
protocol = vector.get_value('protocol')
@@ -654,13 +655,13 @@ class TestImpalaShell(ImpalaTestSuite):
assert protocol in ('hs2', 'hs2-http'), protocol
assert 'Reverting to tab delimited text' not in result.stderr
assert 'UnicodeDecodeError' not in result.stderr
assert RUSSIAN_CHARS.encode('utf-8') in result.stdout
assert RUSSIAN_CHARS in result.stdout
def test_international_characters_profile(self, vector):
"""IMPALA-12145: ensure we can handle international characters in the profile. """
if vector.get_value('strict_hs2_protocol'):
pytest.skip("Profile not supported in strict hs2 mode.")
text = RUSSIAN_CHARS.encode('utf-8')
text = RUSSIAN_CHARS
args = ['-o', '/dev/null', '-p', '-q', "select '{0}'".format(text)]
result = run_impala_shell_cmd(vector, args)
assert 'UnicodeDecodeError' not in result.stderr
@@ -687,7 +688,7 @@ class TestImpalaShell(ImpalaTestSuite):
# is running against Hive which is another variable. On thrift 0.14 and higher,
# talking to Hive, the result is b'\\xaa', so allow this as another possibility.
assert '\xef\xbf\xbd' in result.stdout or '\xaa' in result.stdout or \
'\\xaa' in result.stdout
'\\xaa' in result.stdout or '�' in result.stdout
def test_global_config_file(self, vector):
"""Test global and user configuration files."""
@@ -776,13 +777,13 @@ class TestImpalaShell(ImpalaTestSuite):
assert "WARNING: Option 'config_file' can be only set from shell." in result.stderr
err_msg = ("WARNING: Unable to read configuration file correctly. "
"Ignoring unrecognized config option: 'invalid_option'\n")
assert err_msg in result.stderr
assert err_msg in result.stderr
args = ['--config_file=%s/impalarc_with_error' % QUERY_FILE_PATH]
result = run_impala_shell_cmd(vector, args, expect_success=False)
err_msg = ("Unexpected value in configuration file. "
"'maybe' is not a valid value for a boolean option.")
assert err_msg in result.stderr
assert err_msg in result.stderr
# live_progress and live_summary are not supported with strict_hs2_protocol
if not vector.get_value('strict_hs2_protocol'):
@@ -886,7 +887,7 @@ class TestImpalaShell(ImpalaTestSuite):
# Test with an escaped variable.
result = run_impala_shell_cmd(vector, ['--var=msg1=1', '--var=msg2=${var:msg1}2',
'--var=msg3=\${var:msg1}${var:msg2}',
'--var=msg3=\\${var:msg1}${var:msg2}',
"--query=select '${var:msg3}'"])
self._validate_shell_messages(result.stderr, ['${var:msg1}12', 'Fetched 1 row(s)'],
should_exist=True)
@@ -894,7 +895,7 @@ class TestImpalaShell(ImpalaTestSuite):
# Referencing a non-existent variable will result in an error.
result = run_impala_shell_cmd(vector, [
'--var=msg1=1', '--var=msg2=${var:doesnotexist}2',
'--var=msg3=\${var:msg1}${var:msg2}', "--query=select '${var:msg3}'"],
'--var=msg3=\\${var:msg1}${var:msg2}', "--query=select '${var:msg3}'"],
expect_success=False)
self._validate_shell_messages(result.stderr,
['Error: Unknown variable DOESNOTEXIST',
@@ -1045,7 +1046,7 @@ class TestImpalaShell(ImpalaTestSuite):
if e.errno != errno.EINTR:
raise
data = connection.recv(1024)
assert expected_output in data
assert expected_output in str(data)
finally:
if impala_shell.poll() is None:
impala_shell.kill()
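Regarding the str(data) change above: connection.recv() returns bytes on Python 3, and str() yields the textual repr (e.g. "b'...'"), so an ASCII substring check still succeeds. A quick illustration with a made-up payload:

# "200 OK" is found inside "b'HTTP/1.1 200 OK'", the Python 3 repr of the bytes.
assert "200 OK" in str(b"HTTP/1.1 200 OK")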
@@ -1063,7 +1064,7 @@ class TestImpalaShell(ImpalaTestSuite):
try:
socket.setdefaulttimeout(10)
s = socket.socket()
s.bind(("",0))
s.bind(("", 0))
s.listen(1)
test_impalad_port = s.getsockname()[1]
load_balancer_fqdn = "my-load-balancer.local"
@@ -1087,7 +1088,7 @@ class TestImpalaShell(ImpalaTestSuite):
assert "Could not execute command:" in result.stderr
else:
assert "Encountered: EOF" in result.stderr
args = ['-q', 'with v as (select 1) \;"']
args = ['-q', 'with v as (select 1) \\;"']
result = run_impala_shell_cmd(vector, args, expect_success=False)
if vector.get_value('strict_hs2_protocol'):
assert "cannot recognize input near"
@@ -1103,15 +1104,16 @@ class TestImpalaShell(ImpalaTestSuite):
sql_file, sql_path = tempfile.mkstemp()
# This generates a sql file size of ~50K.
num_cols = 1000
os.write(sql_file, "select \n")
for i in range(num_cols):
if i < num_cols:
os.write(sql_file, "col_{0} as a{1},\n".format(i, i))
os.write(sql_file, "col_{0} as b{1},\n".format(i, i))
os.write(sql_file, "col_{0} as c{1}{2}\n".format(
i, i, "," if i < num_cols - 1 else ""))
os.write(sql_file, "from non_existence_large_table;")
os.close(sql_file)
with open(sql_path, 'w') as f:
f.write("select \n")
for i in range(num_cols):
if i < num_cols:
f.write("col_{0} as a{1},\n".format(i, i))
f.write("col_{0} as b{1},\n".format(i, i))
f.write("col_{0} as c{1}{2}\n".format(
i, i, "," if i < num_cols - 1 else ""))
f.write("from non_existence_large_table;")
f.close()
try:
args = ['-f', sql_path, '-d', unique_database]
@@ -1148,7 +1150,7 @@ class TestImpalaShell(ImpalaTestSuite):
tzname = find_query_option("TIMEZONE", result_set.stdout)
assert os.path.isfile("/usr/share/zoneinfo/" + tzname)
def test_find_query_option(self, vector):
def test_find_query_option(self):
"""Test utility function find_query_option()."""
test_input = """
not_an_option
@@ -1177,8 +1179,8 @@ class TestImpalaShell(ImpalaTestSuite):
# --connect_timeout_ms not supported with HTTP transport. Refer to the comment
# in ImpalaClient::_get_http_transport() for details.
# --http_socket_timeout_s not supported for strict_hs2_protocol.
if (vector.get_value('protocol') == 'hs2-http' and
vector.get_value('strict_hs2_protocol')):
if (vector.get_value('protocol') == 'hs2-http'
and vector.get_value('strict_hs2_protocol')):
pytest.skip("THRIFT-4600")
with closing(socket.socket()) as s:
@@ -1259,9 +1261,9 @@ class TestImpalaShell(ImpalaTestSuite):
assert "| |" in result.stdout, result.stdout
assert "| árvíztűrőtükörfúró |" in result.stdout, result.stdout
assert "| 你好hello |" in result.stdout, result.stdout
assert "| \x00\xef\xbf\xbd\x00\xef\xbf\xbd |" in result.stdout, \
result.stdout
assert '| \xef\xbf\xbdD3"\x11\x00 |' in result.stdout, result.stdout
# The last two output lines are malformed UTF-8 strings.
assert "| " + utf8_encode_if_needed("\0�\0�") in result.stdout, result.stdout
assert "| " + utf8_encode_if_needed("�D3\"") in result.stdout, result.stdout
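The replacement character (U+FFFD) appears in these expected strings because Python 3 substitutes it when invalid UTF-8 is decoded with a 'replace' error handler, which is presumably what the shell's output path does here. A quick check of that behaviour:

# Invalid UTF-8 bytes each decode to U+FFFD under errors='replace'.
assert b'\x00\xaa\x00\xab'.decode('utf-8', errors='replace') == '\x00\ufffd\x00\ufffd'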
def test_binary_as_string(self, vector):
query = """select cast(binary_col as string) from functional.binary_tbl
@@ -1292,8 +1294,8 @@ class TestImpalaShell(ImpalaTestSuite):
# way that python2 and python3 represent floating point values, the output
# from the shell will differ with regard to which version of python the
# shell is running under.
assert("3\t3\t30.299999999999997" in result.stdout or
"3\t3\t30.3" in result.stdout), result.stdout
assert ("3\t3\t30.299999999999997" in result.stdout
or "3\t3\t30.3" in result.stdout), result.stdout
assert "4\t4\t40.4" in result.stdout, result.stdout
@@ -1410,8 +1412,8 @@ class TestImpalaShell(ImpalaTestSuite):
def test_http_socket_timeout(self, vector):
"""Test setting different http_socket_timeout_s values."""
if (vector.get_value('strict_hs2_protocol') or
vector.get_value('protocol') != 'hs2-http'):
if (vector.get_value('strict_hs2_protocol')
or vector.get_value('protocol') != 'hs2-http'):
pytest.skip("http socket timeout not supported in strict hs2 mode."
" Only supported with hs2-http protocol.")
# Test http_socket_timeout_s=0, expect errors


@@ -35,6 +35,7 @@ from time import sleep
import pexpect
import pytest
from impala_shell.impala_client import utf8_encode_if_needed
# This import is the actual ImpalaShell class from impala_shell.py.
# We rename it to ImpalaShellClass here because we later import another
# class called ImpalaShell from tests/shell/util.py, and we don't want
@@ -63,6 +64,7 @@ from tests.shell.util import (
spawn_shell,
stderr_get_first_error_msg,
)
from tests.util.parse_util import bytes_to_str
QUERY_FILE_PATH = os.path.join(os.environ['IMPALA_HOME'], 'tests', 'shell')
@@ -117,7 +119,7 @@ class RequestHandler503(http.server.SimpleHTTPRequestHandler):
self.end_headers()
if self.should_send_body_text():
# Optionally send body text with 503 message.
self.wfile.write("EXTRA")
self.wfile.write(b"EXTRA")
class RequestHandler503Extra(RequestHandler503):
@@ -195,7 +197,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
proc.expect(":{0}] {1}>".format(get_impalad_port(vector), db))
if not expectations: return
for e in expectations:
assert e in proc.before
assert e in bytes_to_str(proc.before)
def _wait_for_num_open_sessions(self, vector, impala_service, expected, err):
"""Helper method to wait for the number of open sessions to reach 'expected'."""
@@ -403,7 +405,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
def test_cancellation_mid_command(self, vector):
"""Test that keyboard interrupt cancels multiline query strings"""
if vector.get_value('strict_hs2_protocol'):
pytest.skip("IMPALA-10827: Cancellation infrastructure does not " +
pytest.skip("IMPALA-10827: Cancellation infrastructure does not "
"work in strict hs2 mode.")
shell_cmd = get_shell_cmd(vector)
multiline_query = ["select column_1\n", "from table_1\n", "where ..."]
@@ -413,7 +415,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
for query_line in multiline_query:
child_proc.send(query_line)
child_proc.sendintr()
child_proc.expect("\^C")
child_proc.expect(r"\^C")
child_proc.expect(PROMPT_REGEX)
child_proc.sendline('quit;')
child_proc.wait()
@@ -425,7 +427,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
child_proc.send("\n")
child_proc.expect(">")
child_proc.sendintr()
child_proc.expect("> \^C")
child_proc.expect(r"> \^C")
child_proc.expect(PROMPT_REGEX)
child_proc.sendline('quit;')
child_proc.wait()
@@ -435,7 +437,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
if vector.get_value('strict_hs2_protocol'):
pytest.skip("IMPALA-10827: Failed, need to investigate.")
# test a unicode query spanning multiple lines
unicode_bytes = u'\ufffd'.encode('utf-8')
unicode_bytes = utf8_encode_if_needed(u'\ufffd')
args = "select '{0}'\n;".format(unicode_bytes)
result = run_impala_shell_interactive(vector, args)
assert "Fetched 1 row(s)" in result.stderr, result.stderr
@@ -449,7 +451,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
child_proc = spawn_shell(shell_cmd)
child_proc.expect(PROMPT_REGEX)
child_proc.sendline("select '{0}'\n;".format(unicode_bytes))
child_proc.expect("Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(r"Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(PROMPT_REGEX)
child_proc.sendline('quit;')
child_proc.wait()
@@ -477,7 +479,6 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
assert "Fetched 1 row(s)" in result.stderr, result.stderr
assert "세율중분류구분코드" in result.stdout
def test_welcome_string(self, vector):
"""Test that the shell's welcome message is only printed once
when the shell is started. Ensure it is not reprinted on errors.
@@ -616,7 +617,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
run_impala_shell_interactive(vector, "drop database if exists foo;")
self.create_impala_clients()
def test_multiline_queries_in_history(self, vector, tmp_history_file):
def test_multiline_queries_in_history(self, vector, tmp_history_file): # noqa: U100
"""Test to ensure that multiline queries with comments are preserved in history
Ensure that multiline queries are preserved when they're read back from history.
@@ -638,7 +639,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
for query, _ in queries:
child_proc.expect(PROMPT_REGEX)
child_proc.sendline(query)
child_proc.expect("Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(r"Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(PROMPT_REGEX)
child_proc.sendline('quit;')
child_proc.wait()
@@ -649,7 +650,8 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
assert history_entry in result.stderr, "'%s' not in '%s'" % (history_entry,
result.stderr)
def test_history_does_not_duplicate_on_interrupt(self, vector, tmp_history_file):
def test_history_does_not_duplicate_on_interrupt(
self, vector, tmp_history_file): # noqa: U100
"""This test verifies that once the cmdloop is broken the history file will not be
re-read. The cmdloop can be broken when the user sends a SIGINT or exceptions
occur."""
@@ -662,7 +664,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
# initialize history
child_proc.expect(PROMPT_REGEX)
child_proc.sendline("select 1;")
child_proc.expect("Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(r"Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(PROMPT_REGEX)
child_proc.sendline("quit;")
child_proc.wait()
@@ -671,9 +673,9 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
child_proc = spawn_shell(shell_cmd)
child_proc.expect(PROMPT_REGEX)
child_proc.sendintr()
child_proc.expect("\^C")
child_proc.expect(r"\^C")
child_proc.sendline("select 2;")
child_proc.expect("Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(r"Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(PROMPT_REGEX)
child_proc.sendline("quit;")
child_proc.wait()
@@ -682,12 +684,14 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
p = ImpalaShell(vector)
p.send_cmd('history')
result = p.get_result().stderr.splitlines()
assert "[1]: select 1;" == result[1]
assert "[2]: quit;" == result[2]
assert "[3]: select 2;" == result[3]
assert "[4]: quit;" == result[4]
# Python 2 and Python 3 shell have different first lines in the history output.
start_idx = 1 if "Server version: " in result[0] else 0
assert "[1]: select 1;" == result[start_idx]
assert "[2]: quit;" == result[start_idx + 1]
assert "[3]: select 2;" == result[start_idx + 2]
assert "[4]: quit;" == result[start_idx + 3]
def test_history_file_option(self, vector, tmp_history_file):
def test_history_file_option(self, vector, tmp_history_file): # noqa: U100
"""
Setting the 'tmp_history_file' fixture above means that the IMPALA_HISTFILE
environment will be overridden. Here we override that environment by passing
@@ -704,7 +708,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
history_contents = open(new_hist.name).read()
assert "select 'hi'" in history_contents
def test_rerun(self, vector, tmp_history_file):
def test_rerun(self, vector, tmp_history_file): # noqa: U100
"""Smoke test for the 'rerun' command"""
if vector.get_value('strict_hs2_protocol'):
pytest.skip("Rerun not supported in strict hs2 mode.")
@@ -721,11 +725,12 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
("second_command"))
child_proc.sendline('history;')
child_proc.expect(":{0}] default>".format(get_impalad_port(vector)))
assert '[1]: select \'first_command\';' in child_proc.before
assert '[2]: select \'second_command\';' in child_proc.before
assert '[3]: history;' in child_proc.before
before_line = child_proc.before.decode('UTF-8', 'replace')
assert '[1]: select \'first_command\';' in before_line
assert '[2]: select \'second_command\';' in before_line
assert '[3]: history;' in before_line
# Rerunning command should not add an entry into history.
assert '[4]' not in child_proc.before
assert '[4]' not in before_line
self._expect_with_cmd(child_proc, "@0", vector, ("Command index out of range"))
self._expect_with_cmd(child_proc, "rerun 4", vector, ("Command index out of range"))
self._expect_with_cmd(child_proc, "@-4", vector, ("Command index out of range"))
@@ -861,7 +866,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
# IMPALA-5416: Test that two source commands on a line won't crash the shell.
result = run_impala_shell_interactive(
vector, "source shell.cmds;source shell.cmds;")
assert len(re.findall("version\(\)", result.stdout)) == 2
assert len(re.findall(r"version\(\)", result.stdout)) == 2
finally:
os.chdir(cwd)
@@ -888,11 +893,11 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
# the client does not fetch. For statements returning 0 rows we do not
# want an empty line in stdout.
result = run_impala_shell_interactive(vector, "-- foo \n use default;")
assert re.search('> \[', result.stdout)
assert re.search(r'> \[', result.stdout)
result = run_impala_shell_interactive(vector,
"select * from functional.alltypes limit 0;")
assert "Fetched 0 row(s)" in result.stderr
assert re.search('> \[', result.stdout)
assert re.search(r'> \[', result.stdout)
def test_set_and_set_all(self, vector):
"""IMPALA-2181. Tests the outputs of SET and SET ALL commands. SET should contain the
@@ -1050,15 +1055,15 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
assert '| \'--\' |' in result.stdout
assert '| -- |' in result.stdout
query = ('select * from (\n' +
'select count(*) from functional.alltypes\n' +
query = ('select * from (\n'
'select count(*) from functional.alltypes\n'
') v; -- Incomplete SQL statement in this line')
result = run_impala_shell_interactive(vector, query)
assert '| count(*) |' in result.stdout
query = ('select id from functional.alltypes\n' +
'order by id; /*\n' +
'* Multi-line comment\n' +
query = ('select id from functional.alltypes\n'
'order by id; /*\n'
'* Multi-line comment\n'
'*/')
result = run_impala_shell_interactive(vector, query)
assert '| id |' in result.stdout
@@ -1154,7 +1159,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
proc.sendeof()
proc.wait()
def test_strip_leading_comment(self, vector):
def test_strip_leading_comment(self):
"""Test stripping leading comments from SQL statements"""
assert ('--delete\n', 'select 1') == \
ImpalaShellClass.strip_leading_comment('--delete\nselect 1')
@@ -1230,7 +1235,6 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
"Bad date/time conversion format: yyyy年MM月dd日"
)
def test_timezone_validation(self, vector):
"""Test that query option TIMEZONE is validated when executing a query.
@@ -1300,7 +1304,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
"-i{0}:{1}".format(http_503_server_extra.HOST,
http_503_server_extra.PORT)]
shell_proc = spawn_shell(impala_shell_executable + shell_args)
shell_proc.expect("HTTP code 503: Service Unavailable \[EXTRA\]", timeout=10)
shell_proc.expect(r"HTTP code 503: Service Unavailable \[EXTRA\]", timeout=10)
def run_impala_shell_interactive(vector, input_lines, shell_args=None,


@@ -23,10 +23,8 @@ from contextlib import closing
import logging
import os
import re
import shlex
import socket
from subprocess import PIPE, Popen
import sys
import time
import pexpect
@@ -54,6 +52,7 @@ LOG.addHandler(logging.StreamHandler())
SHELL_HISTORY_FILE = os.path.expanduser("~/.impalahistory")
IMPALA_HOME = os.environ['IMPALA_HOME']
def build_shell_env(env=None):
""" Construct the environment for the shell to run in based on 'env', or the current
process's environment if env is None."""
@@ -69,24 +68,24 @@ def build_shell_env(env=None):
def assert_var_substitution(result):
assert_pattern(r'\bfoo_number=.*$', 'foo_number= 123123', result.stdout, \
assert_pattern(r'\bfoo_number=.*$', 'foo_number= 123123', result.stdout,
'Numeric values not replaced correctly')
assert_pattern(r'\bfoo_string=.*$', 'foo_string=123', result.stdout, \
assert_pattern(r'\bfoo_string=.*$', 'foo_string=123', result.stdout,
'String values not replaced correctly')
assert_pattern(r'\bVariables:[\s\n]*BAR:\s*[0-9]*\n\s*FOO:\s*[0-9]*', \
'Variables:\n\tBAR: 456\n\tFOO: 123', result.stdout, \
assert_pattern(r'\bVariables:[\s\n]*BAR:\s*[0-9]*\n\s*FOO:\s*[0-9]*',
'Variables:\n\tBAR: 456\n\tFOO: 123', result.stdout,
"Set variable not listed correctly by the first SET command")
assert_pattern(r'\bError: Unknown variable FOO1$', \
'Error: Unknown variable FOO1', result.stderr, \
assert_pattern(r'\bError: Unknown variable FOO1$',
'Error: Unknown variable FOO1', result.stderr,
'Missing variable FOO1 not reported correctly')
assert_pattern(r'\bmulti_test=.*$', 'multi_test=456_123_456_123', \
assert_pattern(r'\bmulti_test=.*$', 'multi_test=456_123_456_123',
result.stdout, 'Multiple replaces not working correctly')
assert_pattern(r'\bError:\s*Unknown\s*substitution\s*syntax\s*' +
r'\(RANDOM_NAME\). Use \${VAR:var_name}', \
'Error: Unknown substitution syntax (RANDOM_NAME). Use ${VAR:var_name}', \
assert_pattern(r'\bError:\s*Unknown\s*substitution\s*syntax\s*'
r'\(RANDOM_NAME\). Use \${VAR:var_name}',
'Error: Unknown substitution syntax (RANDOM_NAME). Use ${VAR:var_name}',
result.stderr, "Invalid variable reference")
assert_pattern(r'"This should be not replaced: \${VAR:foo} \${HIVEVAR:bar}"',
'"This should be not replaced: ${VAR:foo} ${HIVEVAR:bar}"', \
'"This should be not replaced: ${VAR:foo} ${HIVEVAR:bar}"',
result.stdout, "Variable escaping not working")
assert_pattern(r'\bVariable MYVAR set to.*$', 'Variable MYVAR set to foo123',
result.stderr, 'No evidence of MYVAR variable being set.')
@@ -97,11 +96,11 @@ def assert_var_substitution(result):
result.stdout, 'No evidence of variable FOO being unset')
assert_pattern(r'\bUnsetting variable BAR$', 'Unsetting variable BAR',
result.stdout, 'No evidence of variable BAR being unset')
assert_pattern(r'\bVariables:[\s\n]*No variables defined\.$', \
'Variables:\n\tNo variables defined.', result.stdout, \
assert_pattern(r'\bVariables:[\s\n]*No variables defined\.$',
'Variables:\n\tNo variables defined.', result.stdout,
'Unset variables incorrectly listed by third SET command.')
assert_pattern(r'\bNo variable called NONEXISTENT is set', \
'No variable called NONEXISTENT is set', result.stdout, \
assert_pattern(r'\bNo variable called NONEXISTENT is set',
'No variable called NONEXISTENT is set', result.stdout,
'Problem unsetting non-existent variable.')
assert_pattern(r'\bVariable COMMENT_TYPE1 set to.*$',
'Variable COMMENT_TYPE1 set to ok', result.stderr,
@@ -112,11 +111,12 @@ def assert_var_substitution(result):
assert_pattern(r'\bVariable COMMENT_TYPE3 set to.*$',
'Variable COMMENT_TYPE3 set to ok', result.stderr,
'No evidence of COMMENT_TYPE3 variable being set.')
assert_pattern(r'\bVariables:[\s\n]*COMMENT_TYPE1:.*[\s\n]*' + \
'COMMENT_TYPE2:.*[\s\n]*COMMENT_TYPE3:.*$',
'Variables:\n\tCOMMENT_TYPE1: ok\n\tCOMMENT_TYPE2: ok\n\tCOMMENT_TYPE3: ok', \
assert_pattern(r'\bVariables:[\s\n]*COMMENT_TYPE1:.*[\s\n]*'
r'COMMENT_TYPE2:.*[\s\n]*COMMENT_TYPE3:.*$',
'Variables:\n\tCOMMENT_TYPE1: ok\n\tCOMMENT_TYPE2: ok\n\tCOMMENT_TYPE3: ok',
result.stdout, 'Set variables not listed correctly by the SET command')
def assert_pattern(pattern, result, text, message):
"""Asserts that the pattern, when applied to text, returns the expected result"""
m = re.search(pattern, text, re.MULTILINE)
@@ -217,6 +217,7 @@ def get_open_sessions_metric(vector):
assert protocol == 'beeswax', protocol
return 'impala-server.num-open-beeswax-sessions'
class ImpalaShellResult(object):
def __init__(self):
self.rc = 0
@@ -380,12 +381,13 @@ def get_dev_impala_shell_executable():
def create_impala_shell_executable_dimension(dev_only=False):
_, include_pypi = get_dev_impala_shell_executable()
dimensions = []
if os.getenv("IMPALA_SYSTEM_PYTHON2"):
python3_pytest = (os.getenv("IMPALA_USE_PYTHON3_TESTS", "false") == "true")
if os.getenv("IMPALA_SYSTEM_PYTHON2") and not python3_pytest:
dimensions.append('dev')
if os.getenv("IMPALA_SYSTEM_PYTHON3"):
dimensions.append('dev3')
if include_pypi and not dev_only:
if os.getenv("IMPALA_SYSTEM_PYTHON2"):
if os.getenv("IMPALA_SYSTEM_PYTHON2") and not python3_pytest:
dimensions.append('python2')
if os.getenv("IMPALA_SYSTEM_PYTHON3"):
dimensions.append('python3')


@@ -20,11 +20,14 @@ from collections import defaultdict
import json
import logging
import socket
import sys
import threading
import time
import traceback
import uuid
import pytest
from builtins import range
from thrift.protocol import TBinaryProtocol
from thrift.server.TServer import TServer
@@ -115,6 +118,7 @@ class KillableThreadedServer(TServer):
self.port = self.serverTransport.port
def shutdown(self):
LOG.info('Server localhost:{} is shutting down'.format(self.port))
self.is_shutdown = True
self.serverTransport.close()
self.wait_until_down()
@@ -127,20 +131,22 @@ class KillableThreadedServer(TServer):
cnxn = TSocket.TSocket('localhost', self.port)
try:
cnxn.open()
LOG.info('Server localhost:{} is up'.format(cnxn.port))
return
except Exception:
if i == num_tries - 1: raise
time.sleep(0.1)
time.sleep(0.5)
def wait_until_down(self, num_tries=10):
for i in range(num_tries):
cnxn = TSocket.TSocket('localhost', self.port)
try:
cnxn.open()
time.sleep(0.1)
except Exception:
LOG.info('Server localhost:{} is down'.format(cnxn.port))
return
raise Exception("Server did not stop")
time.sleep(0.5)
raise Exception("Server localhost:{} did not stop".format(cnxn.port))
def serve(self):
self.serverTransport.listen()
@@ -149,8 +155,12 @@ class KillableThreadedServer(TServer):
# Since accept() can take a while, check again if the server is shutdown to avoid
# starting an unnecessary thread.
if self.is_shutdown: return
t = threading.Thread(target=self.handle, args=(client,))
t.setDaemon(self.daemon)
t = None
if sys.version_info.major < 3:
t = threading.Thread(target=self.handle, args=(client,))
t.setDaemon(True)
else:
t = threading.Thread(target=self.handle, args=(client,), daemon=self.daemon)
t.start()
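For context on the branch above: Thread.setDaemon() is deprecated since Python 3.10 in favour of the daemon attribute/constructor argument. As a side note (not what this patch does), the daemon attribute also exists on Python 2.6+, so a version-agnostic form looks like this, with _handle_stub as a placeholder target:

import threading

def _handle_stub():
    pass

t = threading.Thread(target=_handle_stub)
t.daemon = True  # equivalent to setDaemon(True); no deprecation warning on 3.10+
t.start()
t.join()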
def handle(self, client):
@@ -196,6 +206,9 @@ class StatestoreSubscriber(object):
self.heartbeat_cb, self.update_cb = heartbeat_cb, update_cb
self.subscriber_id = "python-test-client-%s" % uuid.uuid1()
self.exception = None
self.server = None
self.server_thread = None
self.client_transport = None
def __enter__(self):
return self
@@ -239,19 +252,24 @@ class StatestoreSubscriber(object):
return response
def __init_server(self):
LOG.info('Initializing server')
processor = Subscriber.Processor(self)
transport = WildcardServerSocket()
tfactory = TTransport.TBufferedTransportFactory()
pfactory = TBinaryProtocol.TBinaryProtocolFactory()
self.server = KillableThreadedServer(processor, transport, tfactory, pfactory,
daemon=True)
self.server_thread = threading.Thread(target=self.server.serve)
self.server_thread.setDaemon(True)
if sys.version_info.major < 3:
self.server_thread = threading.Thread(target=self.server.serve)
self.server_thread.setDaemon(True)
else:
self.server_thread = threading.Thread(target=self.server.serve, daemon=True)
self.server_thread.start()
self.server.wait_until_up()
self.port = self.server.port
def __init_client(self):
LOG.info('Initializing client')
self.client_transport = \
TTransport.TBufferedTransport(TSocket.TSocket('localhost', 24000))
self.protocol = TBinaryProtocol.TBinaryProtocol(self.client_transport)
@@ -352,6 +370,7 @@ class StatestoreSubscriber(object):
time.sleep(0.2)
@pytest.mark.execute_serially
@SkipIfDockerizedCluster.statestore_not_exposed
class TestStatestore(BaseTestSuite):
def make_topic_update(self, topic_name, key_template="foo", value_template="bar",


@@ -29,6 +29,7 @@ from tests.common.skip import SkipIfFS
from tests.common.test_dimensions import create_exec_option_dimension
from tests.stress.stress_util import run_tasks, Task
from tests.util.filesystem_utils import FILESYSTEM_PREFIX, IS_HDFS
from tests.util.parse_util import bytes_to_str
from tests.conftest import DEFAULT_HIVE_SERVER2
@@ -339,7 +340,8 @@ class TestIcebergConcurrentOperations(ImpalaTestSuite):
table_location = "{0}/test-warehouse/{1}.db/{2}/data".format(
FILESYSTEM_PREFIX, unique_database, table_name)
data_files_on_fs_result = check_output(["hdfs", "dfs", "-ls", table_location])
data_files_on_fs_result = bytes_to_str(
check_output(["hdfs", "dfs", "-ls", table_location]))
# The first row of the HDFS result is a summary, the following lines contain
# 1 file each.
data_files_on_fs_rows = data_files_on_fs_result.strip().split('\n')[1:]
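With the output decoded by bytes_to_str(), the slicing keeps only the per-file lines after the "Found N items" summary. A tiny illustration on a made-up listing (not real hdfs output):

sample = "Found 2 items\n/path/a.parq\n/path/b.parq"
assert sample.strip().split('\n')[1:] == ["/path/a.parq", "/path/b.parq"]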


@@ -31,6 +31,7 @@ from xml.etree.ElementTree import parse
from tests.util.filesystem_base import BaseFilesystem
from tests.util.filesystem_utils import FILESYSTEM_PREFIX
from tests.util.parse_util import bytes_to_str
class HdfsConfig(object):
@@ -220,13 +221,13 @@ class HadoopFsCommandLineClient(BaseFilesystem):
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = process.communicate()
status = process.returncode
return (status, stdout, stderr)
return (status, bytes_to_str(stdout), bytes_to_str(stderr))
def create_file(self, path, file_data, overwrite=True):
"""Creates a temporary file with the specified file_data on the local filesystem,
then puts it into the specified path."""
if not overwrite and self.exists(path): return False
with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
with tempfile.NamedTemporaryFile(mode='w+t', delete=False) as tmp_file:
tmp_file.write(file_data)
put_cmd_params = ['-put', '-d']
if overwrite: put_cmd_params.append('-f')
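The mode='w+t' argument matters because NamedTemporaryFile defaults to binary mode ('w+b'), where writing a str raises TypeError on Python 3. A minimal check with throwaway data:

import tempfile

with tempfile.NamedTemporaryFile(mode='w+t', delete=False) as tmp:
    tmp.write("1,alpha\n2,beta\n")  # accepted because the file is opened in text mode
    local_path = tmp.name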


@@ -17,6 +17,7 @@
from __future__ import absolute_import, division, print_function
import re
import sys
from datetime import datetime
# IMPALA-6715: Every so often the stress test or the TPC workload directories get
@@ -285,3 +286,11 @@ def get_time_summary_stats_counter(counter_name, runtime_profile):
max_value=parse_duration_string_ns(summary_stat['max'])))
return summary_stats
def bytes_to_str(bytes):
"""Utility function to convert bytes to string.
This is needed to handle the differences between Python 2 and 3."""
if sys.version_info.major < 3:
return str(bytes)
return bytes.decode('utf-8', errors='replace')
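A typical call site, mirroring the check_output() usages patched above (the echo command is just a stand-in):

from subprocess import check_output

from tests.util.parse_util import bytes_to_str

output = bytes_to_str(check_output(["echo", "Found 1 items"]))  # bytes on Python 3
assert "Found 1 items" in output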


@@ -21,6 +21,8 @@ from __future__ import absolute_import, division, print_function
import logging
import os
import shlex
import sys
from select import select
from subprocess import PIPE, Popen, STDOUT, call
from textwrap import dedent
@@ -35,6 +37,7 @@ def dump_server_stacktraces():
LOG.debug('Dumping stacktraces of running servers')
call([os.path.join(os.environ['IMPALA_HOME'], "bin/dump-stacktraces.sh")])
def exec_process(cmd):
"""Executes a subprocess, waiting for completion. The process exit code, stdout and
stderr are returned as a tuple."""
@@ -46,6 +49,7 @@ def exec_process(cmd):
rc = p.returncode
return rc, stdout, stderr
def exec_process_async(cmd):
"""Executes a subprocess, returning immediately. The process object is returned for
later retrieval of the exit code etc. """
@@ -55,6 +59,7 @@ def exec_process_async(cmd):
return Popen(shlex.split(cmd), shell=False, stdout=PIPE, stderr=PIPE,
universal_newlines=True)
def shell(cmd, cmd_prepend="set -euo pipefail\n", stdout=PIPE, stderr=STDOUT,
timeout_secs=None, **popen_kwargs):
"""Executes a command and returns its output. If the command's return code is non-zero
@@ -77,6 +82,7 @@ def shell(cmd, cmd_prepend="set -euo pipefail\n", stdout=PIPE, stderr=STDOUT,
remaining_fds.append(stderr_fileno)
stdout = list()
stderr = list()
def _read_available_output():
while True:
available_fds, _, _ = select(remaining_fds, [], [], 0)
@@ -88,7 +94,7 @@ def shell(cmd, cmd_prepend="set -euo pipefail\n", stdout=PIPE, stderr=STDOUT,
if not data:
del remaining_fds[0]
else:
stdout.append(data)
stdout.append(data if sys.version_info.major < 3 else data.decode())
elif fd == stderr_fileno:
if not data:
del remaining_fds[-1]