IMPALA-14333: Run impala-py.test using Python3

Running exhaustive tests with the env var IMPALA_USE_PYTHON3_TESTS=true
reveals some tests that require adjustment. This patch makes those
adjustments, which mostly revolve around encoding differences and the
str vs bytes distinction in Python 3. It also switches the default to
running pytest with Python 3 by setting IMPALA_USE_PYTHON3_TESTS=true.
The following are the details:

Change the hash() function in conftest.py to crc32() to produce a
deterministic hash. Hash randomization is enabled by default since
Python 3.3 (see
https://docs.python.org/3/reference/datamodel.html#object.__hash__).
This causes test sharding (like --shard_tests=1/2) to produce an
inconsistent set of tests per shard. Always restart the minicluster
during custom cluster tests if the --shard_tests argument is set,
because the test order may change and affect test correctness,
depending on whether a test runs on a fresh minicluster or not.
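
As an illustration (not the exact conftest.py code, and the helper name
is made up), deterministic sharding can key off zlib.crc32 over the
encoded test id, which is stable across interpreter runs:

  from zlib import crc32

  def in_this_shard(nodeid, num_shards, this_shard):
      # crc32 of the UTF-8 encoded test id is deterministic, unlike
      # hash(), which is randomized per process since Python 3.3.
      return crc32(nodeid.encode('utf-8')) % num_shards == this_shard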

Move one test case from delimited-latin-text.test to
test_delimited_text.py for easier binary comparison.

Add bytes_to_str() as a utility function to decode bytes in Python 3.
This is often needed when inspecting the return value of
subprocess.check_output() as a string.
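
A minimal sketch of such a helper; the real implementation lives in
tests/util/parse_util.py and may differ in details:

  def bytes_to_str(value, encoding='utf-8'):
      # subprocess.check_output() returns bytes in Python 3; decode so
      # callers can keep doing substring checks on str. Leave str
      # untouched so the helper is also safe under Python 2.
      if isinstance(value, bytes):
          return value.decode(encoding)
      return value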

Implement DataTypeMetaclass.__lt__ to replace
DataTypeMetaclass.__cmp__, which is ignored in Python 3 (see
https://peps.python.org/pep-0207/).
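
The change boils down to the following (condensed from the diff further
below):

  class DataTypeMetaclass(type):
      def __lt__(cls, other):
          # Python 3 ignores __cmp__ (PEP 207); max()/sorted() over
          # DataType classes need rich comparison instead.
          if not isinstance(other, DataTypeMetaclass):
              return False
          return (getattr(cls, 'CMP_VALUE', cls.__name__)
                  < getattr(other, 'CMP_VALUE', other.__name__))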

Fix WEB_CERT_ERR difference in test_ipv6.py.

Fix trivial integer parsing in test_restart_services.py.

Fix various encoding issues in test_saml2_sso.py,
test_shell_commandline.py, and test_shell_interactive.py.

Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1.

Switch to binary comparison in test_iceberg.py where needed.

Specify text mode when calling tempfile.NamedTemporaryFile().
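
For example (illustrative file content, not the test's actual key
material):

  import tempfile

  # NamedTemporaryFile defaults to binary mode ('w+b'); tests that
  # write str content need an explicit text mode under Python 3.
  tmp = tempfile.NamedTemporaryFile(mode='w+t', delete=False)
  tmp.write("not a real key\n")
  tmp.flush()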

Simplify create_impala_shell_executable_dimension to skip testing the
dev and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. The
reason is that several UTF-8 related tests in test_shell_commandline.py
break in the Python 3 pytest + Python 2 impala-shell combination. This
skipping already happens automatically on build OSes without a system
Python 2 available, such as RHEL 9 (the IMPALA_SYSTEM_PYTHON2 env var
is empty).

Remove the unused vector argument and fix some trivial flake8 issues.

Several tests require logic changes due to intermittent issues when run
under Python 3 pytest. These include:

Add _run_query_with_client() in test_ranger.py to allow reusing a
single Impala client for running several queries. Ensure clients are
closed when the test is done. Mark several tests in test_ranger.py with
SkipIfFS.hive because they run queries through beeline + HiveServer2,
but the Ozone and S3 build environments do not start HiveServer2 by
default.

Increase the sleep period from 0.1 to 0.5 seconds per iteration in
test_statestore.py and mark TestStatestore to execute serially. This is
because TServer appears to shut down more slowly when run concurrently
with other tests. Handle the deprecation of Thread.setDaemon() as well.
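
For the setDaemon() part, the replacement pattern is roughly as follows
(the worker function is a placeholder):

  import threading

  def worker():
      pass

  t = threading.Thread(target=worker)
  # Thread.setDaemon() is deprecated since Python 3.10; assign the
  # daemon attribute instead, which works on both Python 2 and 3.
  t.daemon = True
  t.start()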

Always set force_restart=True for each test method in TestLoggingCore,
TestShellInteractiveReconnect, and TestQueryRetries to prevent them
from reusing the minicluster from a previous test method. Some of these
tests tear down the minicluster (kill an impalad) and will produce a
minidump if the metrics verifier for the next test fails to detect a
healthy minicluster state.

Testing:
Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true.

Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4
Reviewed-on: http://gerrit.cloudera.org:8080/23319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Riza Suminto
2025-08-17 10:11:59 -07:00
committed by Impala Public Jenkins
parent 789991c6cc
commit 28cff4022d
29 changed files with 402 additions and 263 deletions

View File

@@ -347,7 +347,7 @@ export IMPALA_KERBERIZE=false
unset IMPALA_TOOLCHAIN_KUDU_MAVEN_REPOSITORY
unset IMPALA_TOOLCHAIN_KUDU_MAVEN_REPOSITORY_ENABLED
export IMPALA_USE_PYTHON3_TESTS=${IMPALA_USE_PYTHON3_TESTS:-false}
export IMPALA_USE_PYTHON3_TESTS=${IMPALA_USE_PYTHON3_TESTS:-true}
# Source the branch and local config override files here to override any
# variables above or any variables below that allow overriding via environment

View File

@@ -1553,7 +1553,7 @@ class ImpalaShell(cmd.Cmd, object):
# undecodable elements.
if self.last_query_handle is not None:
self.imp_client.close_query(self.last_query_handle)
log_exception_with_timestamp(e, "UnicodeDecodeError", "Please check for"
log_exception_with_timestamp(e, "UnicodeDecodeError", "Please check for "
"columns containing binary data to find the possible source of the error")
except QueryStateException as e:
# an exception occurred while executing the query
@@ -1965,6 +1965,7 @@ class ImpalaShell(cmd.Cmd, object):
print("Error: OAuth access token not found in json payload")
sys.exit(1)
TIPS = [
"Press TAB twice to see a list of available commands.",
"After running a query, type SUMMARY to see a summary of where time was spent.",

View File

@@ -1,20 +1,5 @@
====
---- QUERY
# test querying text table "extended" ASCII (latin) delimiters:
# fields terminated by '-2' -- thorn character
# escaped by '-22' -- lowercase e with circumflex
# lines terminated by '\n'
select * from functional.text_thorn_ecirc_newline
---- RESULTS
'one','two',3,4
'one\xfeone','two',3,4
'one\xea','two',3,4
'one\xea\xfeone','two',3,4
'one\xea\xea','two',3,4
---- TYPES
STRING,STRING,INT,INT
====
---- QUERY
# create new tables like the ones above to test inserting
create table tecn like functional.text_thorn_ecirc_newline;
---- RESULTS

View File

@@ -45,6 +45,7 @@ from tests.util.calculation_util import get_random_id
from tests.util.filesystem_utils import WAREHOUSE, WAREHOUSE_PREFIX
from tests.util.hdfs_util import NAMENODE
from tests.util.iceberg_util import get_snapshots
from tests.util.parse_util import bytes_to_str
ADMIN = "admin"
OWNER_USER = getuser()
@@ -791,7 +792,7 @@ class TestRanger(CustomClusterTestSuite):
"{0}/service/public/v2/api/policy?servicename=test_impala&policyname={1}".format(
RANGER_HOST, policy_name),
auth=RANGER_AUTH, headers=REST_HEADERS)
assert 300 > r.status_code >= 200, r.content
assert 300 > r.status_code >= 200, bytes_to_str(r.content)
@staticmethod
def _check_privileges(result, expected):
@@ -807,7 +808,14 @@ class TestRanger(CustomClusterTestSuite):
def _run_query_as_user(self, query, username, expect_success):
"""Helper to run an input query as a given user."""
impala_client = self.create_impala_client(user=username)
with self.create_impala_client(user=username) as impala_client:
if expect_success:
return self.execute_query_expect_success(
impala_client, query, query_options={'sync_ddl': 1})
return self.execute_query_expect_failure(impala_client, query)
def _run_query_with_client(self, query, impala_client, expect_success):
"""Helper to run an input query using a given impala_client."""
if expect_success:
return self.execute_query_expect_success(
impala_client, query, query_options={'sync_ddl': 1})
@@ -882,6 +890,7 @@ class TestRanger(CustomClusterTestSuite):
grantee_role = "grantee_role"
resource_owner_role = OWNER_USER
admin_client = self.create_impala_client(user=ADMIN)
user_client = self.create_impala_client(user=OWNER_USER)
unique_database = unique_name + "_db"
table_name = "tbl"
column_names = ["a", "b"]
@@ -899,15 +908,15 @@ class TestRanger(CustomClusterTestSuite):
# able to create a UDF.
admin_client.execute("grant all on uri '{0}{1}' to user {2}"
.format(os.getenv("FILESYSTEM_PREFIX"), udf_uri, OWNER_USER))
self._run_query_as_user("create database {0}".format(unique_database), OWNER_USER,
True)
self._run_query_as_user("create table {0}.{1} ({2} int, {3} string)"
self._run_query_with_client("create database {0}".format(unique_database),
user_client, True)
self._run_query_with_client("create table {0}.{1} ({2} int, {3} string)"
.format(unique_database, table_name, column_names[0], column_names[1]),
OWNER_USER, True)
self._run_query_as_user("create function {0}.{1} "
user_client, True)
self._run_query_with_client("create function {0}.{1} "
"location '{2}{3}' symbol='org.apache.impala.TestUdf'"
.format(unique_database, udf_name, os.getenv("FILESYSTEM_PREFIX"), udf_uri),
OWNER_USER, True)
user_client, True)
for data in test_data:
grantee_type = data[0]
@@ -935,6 +944,8 @@ class TestRanger(CustomClusterTestSuite):
admin_client.execute("revoke create on server from user {0}".format(OWNER_USER))
admin_client.execute("revoke all on uri '{0}{1}' from user {2}"
.format(os.getenv("FILESYSTEM_PREFIX"), udf_uri, OWNER_USER))
admin_client.close()
user_client.close()
def _test_grant_revoke_by_owner_on_database(self, privilege, unique_database,
grantee_type, grantee, resource_owner_role):
@@ -946,10 +957,11 @@ class TestRanger(CustomClusterTestSuite):
set_database_owner_role_stmt = "alter database {0} set owner role {1}"
resource_owner_group = OWNER_USER
admin_client = self.create_impala_client(user=ADMIN)
user_client = self.create_impala_client(user=OWNER_USER)
try:
self._run_query_as_user(grant_database_stmt
.format(privilege, unique_database, grantee_type, grantee), OWNER_USER,
self._run_query_with_client(grant_database_stmt
.format(privilege, unique_database, grantee_type, grantee), user_client,
True)
result = admin_client.execute(show_grant_database_stmt
.format(grantee_type, grantee, unique_database))
@@ -959,8 +971,8 @@ class TestRanger(CustomClusterTestSuite):
[grantee_type, grantee, unique_database, "*", "*", "", "", "", "",
privilege, "false"]])
self._run_query_as_user(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee), OWNER_USER,
self._run_query_with_client(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee), user_client,
True)
result = admin_client.execute(show_grant_database_stmt
.format(grantee_type, grantee, unique_database))
@@ -977,12 +989,12 @@ class TestRanger(CustomClusterTestSuite):
.format(unique_database, resource_owner_group))
admin_client.execute("invalidate metadata")
result = self._run_query_as_user(grant_database_stmt
.format(privilege, unique_database, grantee_type, grantee), OWNER_USER,
result = self._run_query_with_client(grant_database_stmt
.format(privilege, unique_database, grantee_type, grantee), user_client,
False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee), OWNER_USER,
result = self._run_query_with_client(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee), user_client,
False)
assert ERROR_REVOKE in str(result)
@@ -992,12 +1004,12 @@ class TestRanger(CustomClusterTestSuite):
admin_client.execute(set_database_owner_role_stmt
.format(unique_database, resource_owner_role))
result = self._run_query_as_user(grant_database_stmt
.format(privilege, unique_database, grantee_type, grantee), OWNER_USER,
result = self._run_query_with_client(grant_database_stmt
.format(privilege, unique_database, grantee_type, grantee), user_client,
False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee), OWNER_USER,
result = self._run_query_with_client(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee), user_client,
False)
assert ERROR_REVOKE in str(result)
# Change the database owner back to the user 'OWNER_USER'.
@@ -1009,6 +1021,8 @@ class TestRanger(CustomClusterTestSuite):
# from interfering with other tests.
admin_client.execute(revoke_database_stmt
.format(privilege, unique_database, grantee_type, grantee))
admin_client.close()
user_client.close()
def _test_grant_revoke_by_owner_on_table(self, privilege, unique_database, table_name,
grantee_type, grantee, resource_owner_role):
@@ -1020,23 +1034,24 @@ class TestRanger(CustomClusterTestSuite):
show_grant_table_stmt = "show grant {0} {1} on table {2}.{3}"
resource_owner_group = OWNER_USER
admin_client = self.create_impala_client(user=ADMIN)
user_client = self.create_impala_client(user=OWNER_USER)
set_table_owner_user_stmt = "alter table {0}.{1} set owner user {2}"
set_table_owner_group_stmt = "alter table {0}.{1} set owner group {2}"
set_table_owner_role_stmt = "alter table {0}.{1} set owner role {2}"
try:
self._run_query_as_user(grant_table_stmt
self._run_query_with_client(grant_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee),
OWNER_USER, True)
user_client, True)
result = admin_client.execute(show_grant_table_stmt
.format(grantee_type, grantee, unique_database, table_name))
TestRanger._check_privileges(result, [
[grantee_type, grantee, unique_database, table_name, "*", "", "", "",
"", privilege, "false"]])
self._run_query_as_user(revoke_table_stmt
self._run_query_with_client(revoke_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee),
OWNER_USER, True)
user_client, True)
result = admin_client.execute(show_grant_table_stmt
.format(grantee_type, grantee, unique_database, table_name))
TestRanger._check_privileges(result, [])
@@ -1052,13 +1067,13 @@ class TestRanger(CustomClusterTestSuite):
.format(unique_database, table_name, resource_owner_group))
admin_client.execute("refresh {0}.{1}".format(unique_database, table_name))
result = self._run_query_as_user(grant_table_stmt
result = self._run_query_with_client(grant_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee),
OWNER_USER, False)
user_client, False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user(revoke_table_stmt
result = self._run_query_with_client(revoke_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee),
OWNER_USER, False)
user_client, False)
assert ERROR_REVOKE in str(result)
# Set the owner of the table to a role that has the same name as
@@ -1067,13 +1082,13 @@ class TestRanger(CustomClusterTestSuite):
admin_client.execute(set_table_owner_role_stmt
.format(unique_database, table_name, resource_owner_role))
result = self._run_query_as_user(grant_table_stmt
result = self._run_query_with_client(grant_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee),
OWNER_USER, False)
user_client, False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user(revoke_table_stmt
result = self._run_query_with_client(revoke_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee),
OWNER_USER, False)
user_client, False)
assert ERROR_REVOKE in str(result)
# Change the table owner back to the user 'OWNER_USER'.
admin_client.execute(set_table_owner_user_stmt
@@ -1084,6 +1099,8 @@ class TestRanger(CustomClusterTestSuite):
# from interfering with other tests.
admin_client.execute(revoke_table_stmt
.format(privilege, unique_database, table_name, grantee_type, grantee))
admin_client.close()
user_client.close()
def _test_grant_revoke_by_owner_on_column(self, privilege, column_names,
unique_database, table_name, grantee_type, grantee, resource_owner_role):
@@ -1095,14 +1112,15 @@ class TestRanger(CustomClusterTestSuite):
show_grant_column_stmt = "show grant {0} {1} on column {2}.{3}.{4}"
resource_owner_group = OWNER_USER
admin_client = self.create_impala_client(user=ADMIN)
user_client = self.create_impala_client(user=OWNER_USER)
set_table_owner_user_stmt = "alter table {0}.{1} set owner user {2}"
set_table_owner_group_stmt = "alter table {0}.{1} set owner group {2}"
set_table_owner_role_stmt = "alter table {0}.{1} set owner role {2}"
try:
self._run_query_as_user(grant_column_stmt
self._run_query_with_client(grant_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee), OWNER_USER, True)
grantee_type, grantee), user_client, True)
result = admin_client.execute(show_grant_column_stmt
.format(grantee_type, grantee, unique_database, table_name,
column_names[0]))
@@ -1110,9 +1128,9 @@ class TestRanger(CustomClusterTestSuite):
[grantee_type, grantee, unique_database, table_name, column_names[0],
"", "", "", "", privilege, "false"]])
self._run_query_as_user(revoke_column_stmt
self._run_query_with_client(revoke_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee), OWNER_USER, True)
grantee_type, grantee), user_client, True)
result = admin_client.execute(show_grant_column_stmt
.format(grantee_type, grantee, unique_database, table_name,
column_names[0]))
@@ -1125,13 +1143,13 @@ class TestRanger(CustomClusterTestSuite):
.format(unique_database, table_name, resource_owner_group))
admin_client.execute("refresh {0}.{1}".format(unique_database, table_name))
result = self._run_query_as_user(grant_column_stmt
result = self._run_query_with_client(grant_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee), OWNER_USER, False)
grantee_type, grantee), user_client, False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user(revoke_column_stmt
result = self._run_query_with_client(revoke_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee), OWNER_USER, False)
grantee_type, grantee), user_client, False)
assert ERROR_REVOKE in str(result)
# Set the owner of the table to a role that has the same name as 'OWNER_USER' and
@@ -1140,13 +1158,13 @@ class TestRanger(CustomClusterTestSuite):
admin_client.execute(set_table_owner_role_stmt
.format(unique_database, table_name, resource_owner_role))
result = self._run_query_as_user(grant_column_stmt
result = self._run_query_with_client(grant_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee), OWNER_USER, False)
grantee_type, grantee), user_client, False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user(revoke_column_stmt
result = self._run_query_with_client(revoke_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee), OWNER_USER, False)
grantee_type, grantee), user_client, False)
assert ERROR_REVOKE in str(result)
# Change the table owner back to the user 'owner_user'.
admin_client.execute(set_table_owner_user_stmt
@@ -1158,19 +1176,22 @@ class TestRanger(CustomClusterTestSuite):
admin_client.execute(revoke_column_stmt
.format(privilege, column_names[0], unique_database, table_name,
grantee_type, grantee))
admin_client.close()
user_client.close()
def _test_grant_revoke_by_owner_on_udf(self, privilege, unique_database, udf_name,
grantee_type, grantee):
# Due to IMPALA-11743 and IMPALA-12685, the owner of a UDF could not grant
# or revoke the SELECT privilege.
result = self._run_query_as_user("grant {0} on user_defined_fn "
"{1}.{2} to {3} {4}".format(privilege, unique_database, udf_name,
grantee_type, grantee), OWNER_USER, False)
assert ERROR_GRANT in str(result)
result = self._run_query_as_user("revoke {0} on user_defined_fn "
"{1}.{2} from {3} {4}".format(privilege, unique_database, udf_name,
grantee_type, grantee), OWNER_USER, False)
assert ERROR_REVOKE in str(result)
with self.create_impala_client(user=OWNER_USER) as user_client:
result = self._run_query_with_client("grant {0} on user_defined_fn "
"{1}.{2} to {3} {4}".format(privilege, unique_database, udf_name,
grantee_type, grantee), user_client, False)
assert ERROR_GRANT in str(result)
result = self._run_query_with_client("revoke {0} on user_defined_fn "
"{1}.{2} from {3} {4}".format(privilege, unique_database, udf_name,
grantee_type, grantee), user_client, False)
assert ERROR_REVOKE in str(result)
def _test_allow_catalog_cache_op_from_masked_users(self, unique_name):
"""Verify that catalog cache operations are allowed for masked users
@@ -1820,6 +1841,7 @@ class TestRangerIndependent(TestRanger):
def test_grant_multiple_columns_consolidate_grant_revoke_requests(self):
self._test_grant_multiple_columns(1)
@SkipIfFS.hive
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
impalad_args=LEGACY_CATALOG_IMPALAD_ARGS,
@@ -2199,10 +2221,12 @@ class TestRangerLegacyCatalog(TestRanger):
def test_legacy_catalog_ownership(self):
self._test_ownership()
@SkipIfFS.hive
@pytest.mark.execute_serially
def test_grant_revoke_by_owner_legacy_catalog(self, unique_name):
self._test_grant_revoke_by_owner(unique_name)
@SkipIfFS.hive
@pytest.mark.execute_serially
def test_select_view_created_by_non_superuser_with_catalog_v1(self, unique_name):
self._test_select_view_created_by_non_superuser(unique_name)
@@ -2231,10 +2255,12 @@ class TestRangerLocalCatalog(TestRanger):
pytest.xfail("getTableIfCached() faulty behavior, known issue")
self._test_ownership()
@SkipIfFS.hive
@pytest.mark.execute_serially
def test_grant_revoke_by_owner_local_catalog(self, unique_name):
self._test_grant_revoke_by_owner(unique_name)
@SkipIfFS.hive
@pytest.mark.execute_serially
def test_select_view_created_by_non_superuser_with_local_catalog(self, unique_name):
self._test_select_view_created_by_non_superuser(unique_name)
@@ -2453,6 +2479,8 @@ class TestRangerLocalCatalog(TestRanger):
else:
assert "Error revoking a privilege in Ranger. Ranger error message: " \
"HTTP 403 Error: Grantee group invalid_group doesn't exist" in str(result)
invalid_impala_client.close()
valid_impala_client.close()
@pytest.mark.execute_serially
def test_show_functions(self, unique_name):

View File

@@ -204,7 +204,9 @@ class CustomClusterTestSuite(ImpalaTestSuite):
args[LOG_SYMLINKS] = True
if workload_mgmt:
args[WORKLOAD_MGMT] = True
if force_restart:
if force_restart or pytest.config.option.shard_tests:
# When sharding tests, always restart the cluster to avoid issues with tests
# that depend on a specific test order within a shard.
args[FORCE_RESTART] = True
def merge_args(args_first, args_last):
@@ -349,7 +351,7 @@ class CustomClusterTestSuite(ImpalaTestSuite):
kwargs[IMPALAD_TIMEOUT_S] = args[IMPALAD_TIMEOUT_S]
if FORCE_RESTART in args:
kwargs[FORCE_RESTART] = args[FORCE_RESTART]
if args[FORCE_RESTART] is True:
if args[FORCE_RESTART] is True and not pytest.config.option.shard_tests:
LOG.warning("Test uses force_restart=True to avoid restarting the cluster. "
"Test reorganization/assertion rewrite is needed")
else:
@@ -398,6 +400,7 @@ class CustomClusterTestSuite(ImpalaTestSuite):
@classmethod
def cluster_teardown(cls, name, args):
if args.get(WORKLOAD_MGMT, False):
cls.close_impala_clients()
cls.cluster.graceful_shutdown_impalads()
cls.clear_tmp_dirs()
@@ -618,6 +621,9 @@ class CustomClusterTestSuite(ImpalaTestSuite):
except Exception as e:
LOG.info("Failed to reuse running cluster: %s" % e)
pass
finally:
cls.cluster = ImpalaCluster.get_e2e_test_cluster()
cls.impalad_test_service = cls.create_impala_service()
LOG.info("Starting cluster with command: %s" % cmd_str)
try:

View File

@@ -35,6 +35,7 @@ import socket
import subprocess
import time
import string
from functools import wraps
from getpass import getuser
from impala.hiveserver2 import HiveServer2Cursor
@@ -455,7 +456,7 @@ class ImpalaTestSuite(BaseTestSuite):
@classmethod
def close_impala_clients(cls):
"""Closes Impala clients created by create_impala_clients()."""
# cls.client should be equal to one of belove, unless test method implicitly override.
# cls.client should be equal to one of below, unless test method implicitly override.
# Closing twice would lead to error in some clients (impyla+SSL).
if cls.client not in (cls.beeswax_client, cls.hs2_client, cls.hs2_http_client):
cls.client.close()

View File

@@ -81,7 +81,8 @@ class SkipIfFS:
incorrent_reported_ec = pytest.mark.skipif(IS_OZONE and IS_EC, reason="HDDS-8543")
# These need test infra work to re-enable.
hive = pytest.mark.skipif(not IS_HDFS, reason="Hive doesn't work")
hive = pytest.mark.skipif(
not IS_HDFS, reason="HiveServer2 doesn't work or not started")
hbase = pytest.mark.skipif(not IS_HDFS, reason="HBase not started")
qualified_path = pytest.mark.skipif(not IS_HDFS,
reason="Tests rely on HDFS qualified paths, IMPALA-1872")
@@ -135,6 +136,7 @@ class SkipIf:
not_tuple_cache = pytest.mark.skipif(not IS_TUPLE_CACHE,
reason="Tuple Cache needed")
class SkipIfLocal:
# These are skipped due to product limitations.
hdfs_blocks = pytest.mark.skipif(IS_LOCAL,
@@ -152,6 +154,7 @@ class SkipIfLocal:
root_path = pytest.mark.skipif(IS_LOCAL,
reason="Tests rely on the root directory")
class SkipIfNotHdfsMinicluster:
# These are skipped when not running against a local HDFS mini-cluster.
plans = pytest.mark.skipif(
@@ -165,6 +168,7 @@ class SkipIfNotHdfsMinicluster:
reason="Test is tuned for scheduling decisions made on a 3-node HDFS minicluster "
"with no EC")
class SkipIfBuildType:
dev_build = pytest.mark.skipif(IMPALA_TEST_CLUSTER_PROPERTIES.is_dev(),
reason="Test takes too much time on debug build.")
@@ -173,6 +177,7 @@ class SkipIfBuildType:
remote = pytest.mark.skipif(IMPALA_TEST_CLUSTER_PROPERTIES.is_remote_cluster(),
reason="Test depends on running against a local Impala cluster")
class SkipIfEC:
contain_full_explain = pytest.mark.skipif(IS_EC, reason="Contain full explain output "
"for hdfs tables.")
@@ -225,6 +230,7 @@ class SkipIfHive2:
ranger_auth = pytest.mark.skipif(HIVE_MAJOR_VERSION <= 2,
reason="Hive 2 doesn't support Ranger authorization.")
class SkipIfCatalogV2:
"""Expose decorators as methods so that is_catalog_v2_cluster() can be evaluated lazily
when needed, instead of whenever this module is imported."""
@@ -261,6 +267,7 @@ class SkipIfCatalogV2:
IMPALA_TEST_CLUSTER_PROPERTIES.is_catalog_v2_cluster(),
reason="Table isn't invalidated with Local catalog and enabled hms_event_polling.")
class SkipIfApacheHive():
feature_not_supported = pytest.mark.skipif(IS_APACHE_HIVE,
reason="Apache Hive does not support this feature")

View File

@@ -30,7 +30,6 @@ import requests
import shutil
import subprocess
from abc import ABCMeta, abstractproperty
from cm_api.api_client import ApiResource as CmApiResource
from collections import defaultdict
from collections import OrderedDict
from contextlib import contextmanager
@@ -38,7 +37,6 @@ from getpass import getuser
from io import BytesIO
from multiprocessing.pool import ThreadPool
from random import choice
from sys import maxsize
from tempfile import mkdtemp
from threading import Lock
from time import mktime, strptime
@@ -50,6 +48,13 @@ try:
except ImportError:
from urlparse import urlparse
try:
from cm_api.api_client import ApiResource as CmApiResource
except ImportError:
# If the cm_api module is not available, we will not be able to use Cloudera Manager.
# This is fine for local testing.
pass
from tests.comparison.db_connection import HiveConnection, ImpalaConnection
from tests.common.environ import HIVE_MAJOR_VERSION
from tests.common.errors import Timeout
@@ -179,12 +184,9 @@ class Cluster(with_metaclass(ABCMeta, object)):
"""
Print the cluster impalad version info to the console sorted by hostname.
"""
def _sorter(i1, i2):
return cmp(i1.host_name, i2.host_name)
version_info = self.impala.get_version_info()
print("Cluster Impalad Version Info:")
for impalad in sorted(version_info.keys(), cmp=_sorter):
for impalad in sorted(version_info.keys(), key=lambda x: x.host_name):
print("{0}: {1}".format(impalad.host_name, version_info[impalad]))
@@ -635,7 +637,7 @@ class Impala(Service):
impalads = self.impalads
promise = self._thread_pool.map_async(func, impalads)
# Python doesn't handle ctrl-c well unless a timeout is provided.
results = promise.get(maxsize)
results = promise.get(timeout=(2 ** 31 - 1))
if as_dict:
results = dict(zip(impalads, results))
return results

View File

@@ -26,6 +26,7 @@ from tests.comparison.common import ValExpr, ValExprList
module_contents = dict()
class DataTypeMetaclass(type):
'''Provides sorting of classes used to determine upcasting.'''
@@ -39,9 +40,23 @@ class DataTypeMetaclass(type):
def __cmp__(cls, other):
if not isinstance(other, DataTypeMetaclass):
return -1
return cmp(
getattr(cls, 'CMP_VALUE', cls.__name__),
getattr(other, 'CMP_VALUE', other.__name__))
val_this = getattr(cls, 'CMP_VALUE', cls.__name__)
val_other = getattr(other, 'CMP_VALUE', other.__name__)
if (val_this < val_other):
return -1
elif (val_this > val_other):
return 1
else:
return 0
def __lt__(cls, other):
# This __lt__ method replace __cmp__ that is removed in Python3.
# See https://peps.python.org/pep-0207/.
# It is mainly to serve max() inside update_return_type_and_append() of funcs.py.
if not isinstance(other, DataTypeMetaclass):
return False
return (getattr(cls, 'CMP_VALUE', cls.__name__)
< getattr(other, 'CMP_VALUE', other.__name__))
class DataType(with_metaclass(DataTypeMetaclass, ValExpr)):
@@ -92,7 +107,6 @@ class DataType(with_metaclass(DataTypeMetaclass, ValExpr)):
return type(self)
class Boolean(DataType):
pass
@@ -213,6 +227,10 @@ JOINABLE_TYPES = (Char, Decimal, Int, Timestamp)
TYPES = tuple(set(type_.type for type_ in EXACT_TYPES))
__DECIMAL_TYPE_CACHE = dict()
__CHAR_TYPE_CACHE = dict()
__VARCHAR_TYPE_CACHE = dict()
def get_decimal_class(total_digits, fractional_digits):
cache_key = (total_digits, fractional_digits)
if cache_key not in __DECIMAL_TYPE_CACHE:
@@ -223,7 +241,6 @@ def get_decimal_class(total_digits, fractional_digits):
return __DECIMAL_TYPE_CACHE[cache_key]
__CHAR_TYPE_CACHE = dict()
def get_char_class(length):
if length not in __CHAR_TYPE_CACHE:
__CHAR_TYPE_CACHE[length] = type(
@@ -233,7 +250,6 @@ def get_char_class(length):
return __CHAR_TYPE_CACHE[length]
__VARCHAR_TYPE_CACHE = dict()
def get_varchar_class(length):
if length not in __VARCHAR_TYPE_CACHE:
__VARCHAR_TYPE_CACHE[length] = type(

View File

@@ -696,7 +696,7 @@ def validate_python_version():
def pytest_collection_modifyitems(items, config, session):
"""Hook to handle --shard_tests command line option.
If set, this "deselects" a subset of tests, by hashing
If set, this "deselects" a subset of tests, by hashing (using crc32())
their id into buckets.
"""
if not config.option.shard_tests:
@@ -710,7 +710,7 @@ def pytest_collection_modifyitems(items, config, session):
items_selected, items_deselected = [], []
for i in items:
if hash(i.nodeid) % num_shards == this_shard:
if crc32(i.nodeid.encode('utf-8')) % num_shards == this_shard:
items_selected.append(i)
else:
items_deselected.append(i)

View File

@@ -22,6 +22,7 @@ import logging
import os
import pytest
import requests
import sys
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
from tests.common.network import SKIP_SSL_MSG
@@ -66,7 +67,8 @@ WEBUI_PORTS = [25000, 25010, 25020]
# Error text can depend on both protocol and python version.
CONN_ERR = ["Could not connect", "Connection refused"]
CERT_ERR = ["doesn't match", "certificate verify failed"]
WEB_CERT_ERR = "CertificateError"
WEB_CERT_ERR = ("CertificateError" if sys.version_info.major < 3
else "SSLCertVerificationError")
class TestIPv6Base(CustomClusterTestSuite):

View File

@@ -46,21 +46,24 @@ class TestLoggingCore(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(cluster_size=1,
impalad_args="--max_error_logs_per_instance=2",
disable_log_buffering=True)
disable_log_buffering=True,
force_restart=True)
def test_max_errors(self):
self._test_max_errors(2, 4, True)
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(cluster_size=1,
impalad_args="--max_error_logs_per_instance=3",
disable_log_buffering=True)
disable_log_buffering=True,
force_restart=True)
def test_max_errors_0(self):
self._test_max_errors(3, 0, True)
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(cluster_size=1,
impalad_args="--max_error_logs_per_instance=2",
disable_log_buffering=True)
disable_log_buffering=True,
force_restart=True)
def test_max_errors_no_downgrade(self):
self._test_max_errors(2, -1, False)

View File

@@ -93,6 +93,7 @@ class TestQueryRetries(CustomClusterTestSuite):
_count_query_result = "55"
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(force_restart=True)
def test_retries_from_cancellation_pool(self):
"""Tests that queries are retried instead of cancelled if one of the nodes leaves the
cluster. The retries are triggered by the cancellation pool in the ImpalaServer. The
@@ -138,7 +139,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="-statestore_heartbeat_frequency_ms=1000")
statestored_args="-statestore_heartbeat_frequency_ms=1000",
force_restart=True)
def test_kill_impalad_expect_retry(self):
"""Launch a query, wait for it to start running, kill a random impalad and then
validate that the query has successfully been retried. Increase the statestore
@@ -251,7 +253,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="-statestore_heartbeat_frequency_ms=60000")
statestored_args="-statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_exec_rpc_failure(self):
"""Test ExecFInstance RPC failures. Set a really high statestort heartbeat frequency
so that killed impalads are not removed from the cluster membership. This will cause
@@ -306,8 +309,9 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
impalad_args="--debug_actions=" + _get_rpc_fail_action(FAILED_KRPC_PORT),
statestored_args="--statestore_heartbeat_frequency_ms=1000 \
--statestore_max_missed_heartbeats=2")
statestored_args=("--statestore_heartbeat_frequency_ms=1000 "
"--statestore_max_missed_heartbeats=2"),
force_restart=True)
def test_retry_exec_rpc_failure_before_admin_delay(self):
"""Test retried query triggered by RPC failures by simulating RPC errors at the port
of the 2nd node in the cluster. Simulate admission delay for query with debug_action
@@ -375,7 +379,7 @@ class TestQueryRetries(CustomClusterTestSuite):
impalad_args="--debug_actions=" + _get_rpc_fail_action(FAILED_KRPC_PORT),
statestored_args="--statestore_heartbeat_frequency_ms=1000 \
--statestore_max_missed_heartbeats=2",
cluster_size=2, num_exclusive_coordinators=1)
cluster_size=2, num_exclusive_coordinators=1, force_restart=True)
def test_retry_query_failure_all_executors_blacklisted(self):
"""Test retried query triggered by RPC failures by simulating RPC errors at the port
of the 2nd node, which is the only executor in the cluster. Simulate admission delay
@@ -438,7 +442,7 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="-statestore_heartbeat_frequency_ms=1000")
statestored_args="-statestore_heartbeat_frequency_ms=1000", force_restart=True)
def test_multiple_retries(self):
"""Test that a query can only be retried once, and that if the retry attempt fails,
it fails correctly and with the right error message. Multiple retry attempts are
@@ -498,6 +502,7 @@ class TestQueryRetries(CustomClusterTestSuite):
self.__validate_memz()
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(force_restart=True)
def test_retry_fetched_rows(self):
"""Test that query retries are not triggered if some rows have already been
fetched. Run a query, fetch some rows from it, kill one of the impalads that is
@@ -612,7 +617,7 @@ class TestQueryRetries(CustomClusterTestSuite):
self.client.close_query(handle)
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(disable_log_buffering=True)
@CustomClusterTestSuite.with_args(disable_log_buffering=True, force_restart=True)
def test_query_retry_reaches_spool_limit(self):
"""Test retryable queries with results spooling enabled and
spool_all_results_for_retries=true that reach spooling mem limit will return rows and
@@ -661,6 +666,7 @@ class TestQueryRetries(CustomClusterTestSuite):
"fetched some rows" % self.client.handle_id(handle) in str(e)
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(force_restart=True)
def test_original_query_cancel(self):
"""Test canceling a retryable query with spool_all_results_for_retries=true. Make sure
Coordinator::Wait() won't block in cancellation."""
@@ -682,6 +688,7 @@ class TestQueryRetries(CustomClusterTestSuite):
assert "Cancelled" in str(e)
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(force_restart=True)
def test_retry_finished_query(self):
"""Test that queries in FINISHED state can still be retried before the client fetch
any rows. Sets batch_size to 1 so results will be available as soon as possible.
@@ -708,7 +715,7 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="-statestore_heartbeat_frequency_ms=60000")
statestored_args="-statestore_heartbeat_frequency_ms=60000", force_restart=True)
def test_retry_query_cancel(self):
"""Trigger a query retry, and then cancel the retried query. Validate that the
cancelled query fails with the correct error message. Set a really high statestore
@@ -750,7 +757,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
impalad_args="--debug_actions=RETRY_DELAY_CHECKING_ORIGINAL_DRIVER:SLEEP@1000",
statestored_args="--statestore_heartbeat_frequency_ms=60000")
statestored_args="--statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retrying_query_cancel(self):
"""Trigger a query retry, and then cancel and close the retried query in RETRYING
state. Validate that it doesn't crash the impalad. Set a really high statestore
@@ -780,7 +788,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="--statestore_heartbeat_frequency_ms=60000")
statestored_args="--statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retrying_query_before_inflight(self):
"""Trigger a query retry, and delay setting the original query inflight as that may
happen after the query is retried. Validate that the query succeeds. Set a really
@@ -820,7 +829,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
impalad_args="--debug_actions=RETRY_DELAY_GET_QUERY_DRIVER:SLEEP@2000",
statestored_args="--statestore_heartbeat_frequency_ms=60000")
statestored_args="--statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_query_close_before_getting_query_driver(self):
"""Trigger a query retry, and then close the retried query before getting
the query driver. Validate that it doesn't crash the impalad.
@@ -847,7 +857,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
impalad_args="--debug_actions=QUERY_RETRY_SET_RESULT_CACHE:FAIL",
statestored_args="--statestore_heartbeat_frequency_ms=60000")
statestored_args="--statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_query_result_cacheing_failed(self):
"""Test setting up results cacheing failed."""
@@ -874,7 +885,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
impalad_args="--debug_actions=QUERY_RETRY_SET_QUERY_IN_FLIGHT:FAIL",
statestored_args="--statestore_heartbeat_frequency_ms=60000")
statestored_args="--statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_query_set_query_in_flight_failed(self):
"""Test setting query in flight failed."""
@@ -898,7 +910,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="-statestore_heartbeat_frequency_ms=60000")
statestored_args="-statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_query_timeout(self):
"""Trigger a query retry, and then leave the query handle open until the
'query_timeout_s' causes the handle to be closed. Assert that the runtime profile of
@@ -941,8 +954,10 @@ class TestQueryRetries(CustomClusterTestSuite):
assert impalad_service.get_metric_value('impala-server.num-queries-expired') == 1
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(impalad_args="--idle_session_timeout=1",
statestored_args="--statestore_heartbeat_frequency_ms=60000")
@CustomClusterTestSuite.with_args(
impalad_args="--idle_session_timeout=1",
statestored_args="--statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_query_session_timeout(self):
"""Similar to 'test_retry_query_timeout' except with an idle session timeout."""
self.close_impala_clients()
@@ -977,7 +992,8 @@ class TestQueryRetries(CustomClusterTestSuite):
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(
statestored_args="-statestore_heartbeat_frequency_ms=60000")
statestored_args="-statestore_heartbeat_frequency_ms=60000",
force_restart=True)
def test_retry_query_hs2(self):
"""Test query retries with the HS2 protocol. Enable the results set cache as well and
test that query retries work with the results cache."""

View File

@@ -1010,7 +1010,7 @@ class TestGracefulShutdown(CustomClusterTestSuite, HS2TestSuite):
assert cancel == "{0}s000ms".format(get_remain_shutdown_query_cancel(
self.COORD_SHUTDOWN_FAST_DEADLINE_S, self.COORD_SHUTDOWN_FAST_DEADLINE_S))
assert registered == "0"
assert running > 0
assert int(running) > 0
self.cluster.impalads[1].wait_for_exit()
# The slow query should be cancelled.
self.__check_deadline_expired(SLOW_QUERY, slow_query_handle, True)

View File

@@ -25,7 +25,7 @@ from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
from tests.common.skip import SkipIf
from tests.util.filesystem_utils import WAREHOUSE
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp = tempfile.NamedTemporaryFile(mode='w+t', delete=False)
BAD_KEY_FILE = tmp.name

View File

@@ -21,6 +21,7 @@ import base64
import datetime
import os
import pytest
import sys
import uuid
import xml.etree.ElementTree as ET
import zlib
@@ -40,7 +41,7 @@ from tests.shell.util import run_impala_shell_cmd
class NoRedirection(HTTPErrorProcessor):
"""Allows inspecting http redirection responses. """
def http_response(self, request, response):
def http_response(self, request, response): # noqa: U100
return response
@@ -49,6 +50,13 @@ def format_time(time):
return time.strftime("%Y-%m-%dT%H:%M:%SZ")
def encode_if_needed(value):
""" Encodes the value to bytes if needed, depending on the python version. """
if sys.version_info.major < 3:
return value.encode('utf-8') if isinstance(value, str) else value
return value if isinstance(value, bytes) else value.encode('utf-8')
class TestClientSaml(CustomClusterTestSuite):
""" Tests for a client using SAML2 browser profile.
@@ -100,9 +108,9 @@ class TestClientSaml(CustomClusterTestSuite):
"--saml2_ee_test_mode=true"
% (CERT_DIR, CERT_DIR, SP_CALLBACK_URL))
SSO_ARGS_WITH_GROUP_FILTER = (SSO_ARGS + " " +
"--saml2_group_filter=group1,group2 "
"--saml2_group_attribute_name=eduPersonAffiliation")
SSO_ARGS_WITH_GROUP_FILTER = (
"{} --saml2_group_filter=group1,group2 "
"--saml2_group_attribute_name=eduPersonAffiliation").format(SSO_ARGS)
@CustomClusterTestSuite.with_args(impalad_args=SSO_ARGS, cluster_size=1)
def test_saml2_browser_profile_no_group_filter(self, vector):
@@ -131,7 +139,7 @@ class TestClientSaml(CustomClusterTestSuite):
@CustomClusterTestSuite.with_args(
impalad_args=SSO_ARGS_WITH_GROUP_FILTER, cluster_size=1)
def test_saml2_browser_profile_with_group_filter(self, vector):
def test_saml2_browser_profile_with_group_filter(self):
# test the SAML worflow with different attributes
self._test_saml2_browser_workflow("", False)
@@ -157,7 +165,8 @@ class TestClientSaml(CustomClusterTestSuite):
""" Initial POST request to hs2-http port, response should be redirected
to IDP and contain the authnrequest. """
opener = build_opener(NoRedirection)
req = Request("http://localhost:%s" % TestClientSaml.HOST_PORT, " ")
payload = encode_if_needed(" ")
req = Request("http://localhost:%s" % TestClientSaml.HOST_PORT, payload)
req.add_header('X-Hive-Token-Response-Port', TestClientSaml.CLIENT_PORT)
response = opener.open(req)
relay_state, client_id, saml_req_xml = \
@@ -171,7 +180,10 @@ class TestClientSaml(CustomClusterTestSuite):
assert client_id is not None
new_url = response.info()["location"]
assert new_url.startswith(TestClientSaml.IDP_URL)
query = parse_qs(urlparse(new_url).query.encode('ASCII'))
query_part = urlparse(new_url).query
query = parse_qs(query_part.encode('ASCII') if sys.version_info.major < 3
else query_part)
assert "RelayState" in query, query
relay_state = query["RelayState"][0]
assert relay_state is not None
saml_req = query["SAMLRequest"][0]
@@ -187,7 +199,8 @@ class TestClientSaml(CustomClusterTestSuite):
def _request_resource_with_bearer(self, client_id, bearer_token):
""" Send POST request to hs2-http port again, this time with bearer tokan.
The response should contain a security cookie if the validation succeeded """
req = Request("http://localhost:%s" % TestClientSaml.HOST_PORT, " ")
payload = encode_if_needed(" ")
req = Request("http://localhost:%s" % TestClientSaml.HOST_PORT, payload)
req.add_header('X-Hive-Client-Identifier', client_id)
req.add_header('Authorization', "Bearer " + bearer_token)
opener = build_opener(NoRedirection)
@@ -205,10 +218,11 @@ class TestClientSaml(CustomClusterTestSuite):
Impala should answer with a form that submits to CLIENT_PORT and contains
the bearer token as a hidden state. """
authn_resp = self._generate_authn_response(request_id, attributes_xml)
encoded_authn_resp = base64.urlsafe_b64encode(authn_resp)
body = "SAMLResponse=%s&RelayState=%s" % (encoded_authn_resp, relay_state)
encoded_authn_resp = base64.urlsafe_b64encode(authn_resp.encode('utf-8'))
body = (b"SAMLResponse=" + encoded_authn_resp + b"&RelayState="
+ encode_if_needed(relay_state))
opener = build_opener(NoRedirection)
req = Request(TestClientSaml.SP_CALLBACK_URL, body)
req = Request(TestClientSaml.SP_CALLBACK_URL, encode_if_needed(body))
response = opener.open(req)
bearer_token = self._parse_xhtml_form(response, expect_success)
return bearer_token
@@ -244,7 +258,7 @@ class TestClientSaml(CustomClusterTestSuite):
message = input.attrib["value"]
if expect_success:
assert token.startswith("u=user1")
assert token.startswith("u=user1"), str(content)
else:
assert message == TestClientSaml.ASSERTATION_ERROR_MESSAGE
return token

View File

@@ -30,6 +30,7 @@ import time
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
from tests.common.skip import SkipIf
from tests.util.hdfs_util import NAMENODE
from tests.util.parse_util import bytes_to_str
class TestScratchDir(CustomClusterTestSuite):
@@ -537,16 +538,18 @@ class TestScratchDir(CustomClusterTestSuite):
hostname = socket.gethostname()
# Verify that there are leftover files in the remote scratch dirs after being killed.
full_dfs_tmp_path = "{}/impala-scratch".format(self.dfs_tmp_path())
files_result = subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path])
files_result = bytes_to_str(
subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path]))
assert "Found 1 items" in files_result
assert hostname in files_result
full_dfs_tmp_path_with_hostname = "{}/{}".format(full_dfs_tmp_path, hostname)
files_result = subprocess.check_output(["hdfs", "dfs", "-ls",
full_dfs_tmp_path_with_hostname])
files_result = bytes_to_str(
subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path_with_hostname]))
assert "Found 1 items" in files_result
impalad.start()
# Verify that the leftover files being removed after the impala daemon restarted.
files_result = subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path])
files_result = bytes_to_str(
subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path]))
assert files_result == ""
@pytest.mark.execute_serially
@@ -578,6 +581,7 @@ class TestScratchDir(CustomClusterTestSuite):
client.close()
# Verify that no host-level dir in the remote scratch dirs after shutdown.
full_dfs_tmp_path = "{}/impala-scratch".format(self.dfs_tmp_path())
files_result = subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path])
files_result = bytes_to_str(
subprocess.check_output(["hdfs", "dfs", "-ls", full_dfs_tmp_path]))
assert files_result == ""
impalad.start()

View File

@@ -18,7 +18,6 @@
from __future__ import absolute_import, division, print_function
import socket
import pexpect
import pytest
# Follow tests/shell/test_shell_interactive.py naming.
@@ -32,6 +31,7 @@ from tests.verifiers.metric_verifier import MetricVerifier
NUM_QUERIES = 'impala-server.num-queries'
class TestShellInteractiveReconnect(CustomClusterTestSuite):
""" Check if interactive shell is using the current DB after reconnecting """
@pytest.mark.execute_serially
@@ -75,6 +75,7 @@ class TestShellInteractiveReconnect(CustomClusterTestSuite):
assert "alltypesaggmultifilesnopart" in result.stdout, result.stdout
@pytest.mark.execute_serially
@CustomClusterTestSuite.with_args(force_restart=True)
def test_auto_reconnect_after_impalad_died(self):
"""Test reconnect after restarting the remote impalad without using connect;"""
# Use pexpect instead of ImpalaShell() since after using get_result() in ImpalaShell()

View File

@@ -26,6 +26,7 @@ from tests.common.test_dimensions import (
create_single_exec_option_dimension,
create_uncompressed_text_dimension)
class TestDelimitedText(ImpalaTestSuite):
"""
Tests delimited text files with different tuple delimiters, field delimiters
@@ -43,7 +44,7 @@ class TestDelimitedText(ImpalaTestSuite):
def test_delimited_text(self, vector, unique_database):
self.run_test_case('QueryTest/delimited-text', vector, unique_database)
def test_delimited_text_newlines(self, vector, unique_database):
def test_delimited_text_newlines(self, unique_database):
""" Test text with newlines in strings - IMPALA-1943. Execute queries from Python to
avoid issues with newline handling in test file format. """
self.execute_query_expect_success(self.client, """
@@ -62,7 +63,7 @@ class TestDelimitedText(ImpalaTestSuite):
result = self.execute_query("select * from %s.nl_queries" % unique_database)
assert len(result.data) == 2
assert result.data[0].split("\t") == ["the\n", "\nquick\nbrown", "fox\n"]
assert result.data[1].split("\t") == ["\njumped","over the lazy\n","\ndog"]
assert result.data[1].split("\t") == ["\njumped", "over the lazy\n", "\ndog"]
# The row count may be computed without parsing each row, so could be inconsistent.
result = self.execute_query("select count(*) from %s.nl_queries" % unique_database)
assert len(result.data) == 1
@@ -72,8 +73,15 @@ class TestDelimitedText(ImpalaTestSuite):
"""Verifies Impala is able to properly handle delimited text that contains
extended ASCII/latin characters. Marked as running serial because of shared
cleanup/setup"""
self.run_test_case('QueryTest/delimited-latin-text', vector, unique_database,
encoding="latin-1")
result = self.execute_query_expect_success(
self.hs2_client,
"select * from functional.text_thorn_ecirc_newline order by col1")
assert result.tuples() == [('one', 'two', 3, 4),
(b'one\xea', 'two', 3, 4),
(b'one\xea\xea', 'two', 3, 4),
(b'one\xea\xfeone', 'two', 3, 4),
(b'one\xfeone', 'two', 3, 4)]
self.run_test_case('QueryTest/delimited-latin-text', vector, unique_database)
def test_large_file_of_field_delimiters(self, vector, unique_database):
"""IMPALA-13161: Verifies reading a large file which has full of field delimiters
@@ -83,7 +91,7 @@ class TestDelimitedText(ImpalaTestSuite):
table_loc = self._get_table_location(tbl, vector)
# Generate a 3GB data file that has full of '\x00' (the default field delimiter)
with open("data.txt", "wb") as f:
long_str = "\x00" * 1024 * 1024 * 3
long_str = b"\x00" * 1024 * 1024 * 3
[f.write(long_str) for i in range(1024)]
check_call(["hdfs", "dfs", "-put", "data.txt", table_loc])
self.execute_query("refresh " + tbl)

View File

@@ -47,6 +47,7 @@ from tests.shell.util import run_impala_shell_cmd
from tests.util.filesystem_utils import FILESYSTEM_PREFIX, get_fs_path, IS_HDFS, WAREHOUSE
from tests.util.get_parquet_metadata import get_parquet_metadata
from tests.util.iceberg_util import cast_ts, get_snapshots, IcebergCatalogs, quote
from tests.util.parse_util import bytes_to_str
LOG = logging.getLogger(__name__)
@@ -1130,35 +1131,35 @@ class TestIcebergTable(IcebergTestSuite):
{'key': 6, 'value': 1},
{'key': 7, 'value': 2}]
assert datafiles[1]['lower_bounds'] == \
[{'key': 1, 'value': 'abc'},
[{'key': 1, 'value': b'abc'},
# INT is serialized as 4-byte little endian
{'key': 2, 'value': '\x00\x00\x00\x00'},
{'key': 2, 'value': b'\x00\x00\x00\x00'},
# BOOLEAN is serialized as 0x00 for FALSE
{'key': 3, 'value': '\x00'},
{'key': 3, 'value': b'\x00'},
# BIGINT is serialized as 8-byte little endian
{'key': 4, 'value': '\x40\xaf\x0d\x86\x48\x70\x00\x00'},
{'key': 4, 'value': b'\x40\xaf\x0d\x86\x48\x70\x00\x00'},
# TIMESTAMP is serialized as 8-byte little endian (number of microseconds since
# 1970-01-01 00:00:00)
{'key': 5, 'value': '\xc0\xd7\xff\x06\xd0\xff\xff\xff'},
{'key': 5, 'value': b'\xc0\xd7\xff\x06\xd0\xff\xff\xff'},
# DATE is serialized as 4-byte little endian (number of days since 1970-01-01)
{'key': 6, 'value': '\x93\xfe\xff\xff'},
{'key': 6, 'value': b'\x93\xfe\xff\xff'},
# Unlike other numerical values, DECIMAL is serialized as big-endian.
{'key': 7, 'value': '\xd8\xf0'}]
{'key': 7, 'value': b'\xd8\xf0'}]
assert datafiles[1]['upper_bounds'] == \
[{'key': 1, 'value': 'ghij'},
[{'key': 1, 'value': b'ghij'},
# INT is serialized as 4-byte little endian
{'key': 2, 'value': '\x03\x00\x00\x00'},
{'key': 2, 'value': b'\x03\x00\x00\x00'},
# BOOLEAN is serialized as 0x01 for TRUE
{'key': 3, 'value': '\x01'},
{'key': 3, 'value': b'\x01'},
# BIGINT is serialized as 8-byte little endian
{'key': 4, 'value': '\x81\x58\xc2\x97\x56\xd5\x00\x00'},
{'key': 4, 'value': b'\x81\x58\xc2\x97\x56\xd5\x00\x00'},
# TIMESTAMP is serialized as 8-byte little endian (number of microseconds since
# 1970-01-01 00:00:00)
{'key': 5, 'value': '\x80\x02\x86\xef\x2f\x00\x00\x00'},
{'key': 5, 'value': b'\x80\x02\x86\xef\x2f\x00\x00\x00'},
# DATE is serialized as 4-byte little endian (number of days since 1970-01-01)
{'key': 6, 'value': '\x6d\x01\x00\x00'},
{'key': 6, 'value': b'\x6d\x01\x00\x00'},
# Unlike other numerical values, DECIMAL is serialized as big-endian.
{'key': 7, 'value': '\x00\xdc\x14'}]
{'key': 7, 'value': b'\x00\xdc\x14'}]
def test_using_upper_lower_bound_metrics(self, vector, unique_database):
self.run_test_case('QueryTest/iceberg-upper-lower-bound-metrics', vector,
@@ -2224,7 +2225,7 @@ class TestIcebergV2Table(IcebergTestSuite):
table_location = "{0}/test-warehouse/{1}.db/{2}/data".format(
FILESYSTEM_PREFIX, unique_database, table_name)
files_result = check_output(["hdfs", "dfs", "-ls", table_location])
assert "Found 1 items" in files_result
assert "Found 1 items" in bytes_to_str(files_result)
def test_predicate_push_down_hint(self, vector, unique_database):
self.run_test_case('QueryTest/iceberg-predicate-push-down-hint', vector,

View File

@@ -161,21 +161,20 @@ class TestQueries(ImpalaTestSuite):
file_format = vector.get_value('table_format').file_format
if file_format == 'hbase':
pytest.xfail(reason="IMPALA-283 - select count(*) produces inconsistent results")
vector.get_value('exec_option')['disable_outermost_topn'] = 1
vector.get_value('exec_option')['analytic_rank_pushdown_threshold'] = 0
self.run_test_case('QueryTest/sort', vector)
new_vector = deepcopy(vector)
options = new_vector.get_value('exec_option')
options['disable_outermost_topn'] = 1
options['analytic_rank_pushdown_threshold'] = 0
self.run_test_case('QueryTest/sort', new_vector)
# We can get the sort tests for free from the top-n file
self.run_test_case('QueryTest/top-n', vector)
self.run_test_case('QueryTest/top-n', new_vector)
if file_format in ['parquet', 'orc']:
# set timestamp options to get consistent results for both format.
new_vector = deepcopy(vector)
options = new_vector.get_value('exec_option')
options['convert_legacy_hive_parquet_utc_timestamps'] = 1
options['timezone'] = '"Europe/Budapest"'
self.run_test_case('QueryTest/sort-complex', new_vector)
def test_partitioned_top_n(self, vector):
"""Test partitioned Top-N operator."""
if vector.get_value('table_format').file_format == "hbase":

View File

@@ -33,11 +33,11 @@ from time import sleep, time
from builtins import range
import pytest
from impala_shell.impala_client import utf8_encode_if_needed
from impala_shell.impala_shell import ImpalaShell as ImpalaShellClass
from tests.common.environ import ImpalaTestClusterProperties
from tests.common.impala_service import ImpaladService
from tests.common.impala_test_suite import IMPALAD_HS2_HOST_PORT, ImpalaTestSuite
from tests.common.skip import SkipIf
from tests.common.test_dimensions import (
create_client_protocol_dimension,
create_client_protocol_strict_dimension,
@@ -60,8 +60,9 @@ from tests.shell.util import (
DEFAULT_QUERY = 'select 1'
QUERY_FILE_PATH = os.path.join(os.environ['IMPALA_HOME'], 'tests', 'shell')
RUSSIAN_CHARS = (u"А, Б, В, Г, Д, Е, Ё, Ж, З, И, Й, К, Л, М, Н, О, П, Р,"
u"С, Т, У, Ф, Х, Ц,Ч, Ш, Щ, Ъ, Ы, Ь, Э, Ю, Я")
RUSSIAN_CHARS = utf8_encode_if_needed(
u"А, Б, В, Г, Д, Е, Ё, Ж, З, И, Й, К, Л, М, Н, О, П, Р,"
u"С, Т, У, Ф, Х, Ц,Ч, Ш, Щ, Ъ, Ы, Ь, Э, Ю, Я")
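With RUSSIAN_CHARS wrapped in utf8_encode_if_needed(), the assertions below can compare it against shell output directly on both Python versions. A hedged sketch of the assumed behaviour (not the actual impala_shell implementation):

import sys

def utf8_encode_if_needed_sketch(text):
    # Assumed behaviour: Python 2 compares against byte strings, so encode
    # unicode text to UTF-8 bytes; Python 3 works with str and needs no change.
    if sys.version_info.major < 3:
        return text.encode('utf-8')
    return text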
"""IMPALA-12216 implemented timestamp to be printed in case of any error/warning
during query execution, below is an example :
@@ -90,6 +91,7 @@ def find_query_option(key, string, strip_brackets=True):
assert len(values) == 1
return values[0].strip("[]") if strip_brackets else values[0]
@pytest.fixture
def empty_table(unique_database, request):
"""Create an empty table within the test database before executing test.
@@ -299,7 +301,6 @@ class TestImpalaShell(ImpalaTestSuite):
"Column metadata states there are 11 values, but read 10 values from column id."
)
def test_completed_query_errors_2(self, vector):
if vector.get_value('strict_hs2_protocol'):
pytest.skip("Impala-10827: Multiple queries not supported in strict hs2 mode.")
@@ -412,7 +413,7 @@ class TestImpalaShell(ImpalaTestSuite):
args = ['-p', '-q', 'select 1; profile;']
result_set = run_impala_shell_cmd(vector, args)
# This regex helps us uniquely identify a profile.
regex = re.compile("Operator\s+#Hosts\s+#Inst\s+Avg\s+Time")
regex = re.compile(r"Operator\s+#Hosts\s+#Inst\s+Avg\s+Time")
# We expect two query profiles.
assert len(re.findall(regex, result_set.stdout)) == 2, \
"Could not detect two profiles, stdout: %s" % result_set.stdout
@@ -626,24 +627,24 @@ class TestImpalaShell(ImpalaTestSuite):
def test_international_characters(self, vector):
"""Sanity test to ensure that the shell can read international characters."""
args = ['-B', '-q', "select '{0}'".format(RUSSIAN_CHARS.encode('utf-8'))]
args = ['-B', '-q', "select '{0}'".format(RUSSIAN_CHARS)]
result = run_impala_shell_cmd(vector, args)
assert 'UnicodeDecodeError' not in result.stderr
assert RUSSIAN_CHARS.encode('utf-8') in result.stdout
assert RUSSIAN_CHARS in result.stdout
def test_international_characters_prettyprint(self, vector):
"""IMPALA-2717: ensure we can handle international characters in pretty-printed
output"""
args = ['-q', "select '{0}'".format(RUSSIAN_CHARS.encode('utf-8'))]
args = ['-q', "select '{0}'".format(RUSSIAN_CHARS)]
result = run_impala_shell_cmd(vector, args)
assert 'UnicodeDecodeError' not in result.stderr
assert RUSSIAN_CHARS.encode('utf-8') in result.stdout
assert RUSSIAN_CHARS in result.stdout
def test_international_characters_prettyprint_tabs(self, vector):
"""IMPALA-2717: ensure we can handle international characters in pretty-printed
output when pretty-printing falls back to delimited output."""
args = ['-q', "select '{0}\\t'".format(RUSSIAN_CHARS.encode('utf-8'))]
args = ['-q', "select '{0}\\t'".format(RUSSIAN_CHARS)]
result = run_impala_shell_cmd(vector, args)
protocol = vector.get_value('protocol')
@@ -654,13 +655,13 @@ class TestImpalaShell(ImpalaTestSuite):
assert protocol in ('hs2', 'hs2-http'), protocol
assert 'Reverting to tab delimited text' not in result.stderr
assert 'UnicodeDecodeError' not in result.stderr
assert RUSSIAN_CHARS.encode('utf-8') in result.stdout
assert RUSSIAN_CHARS in result.stdout
def test_international_characters_profile(self, vector):
"""IMPALA-12145: ensure we can handle international characters in the profile. """
if vector.get_value('strict_hs2_protocol'):
pytest.skip("Profile not supported in strict hs2 mode.")
text = RUSSIAN_CHARS.encode('utf-8')
text = RUSSIAN_CHARS
args = ['-o', '/dev/null', '-p', '-q', "select '{0}'".format(text)]
result = run_impala_shell_cmd(vector, args)
assert 'UnicodeDecodeError' not in result.stderr
@@ -687,7 +688,7 @@ class TestImpalaShell(ImpalaTestSuite):
# is running against Hive which is another variable. On thrift 0.14 and higher,
# talking to Hive, the result is b'\\xaa', so allow this as another possibility.
assert '\xef\xbf\xbd' in result.stdout or '\xaa' in result.stdout or \
'\\xaa' in result.stdout
'\\xaa' in result.stdout or '�' in result.stdout
def test_global_config_file(self, vector):
"""Test global and user configuration files."""
@@ -776,13 +777,13 @@ class TestImpalaShell(ImpalaTestSuite):
assert "WARNING: Option 'config_file' can be only set from shell." in result.stderr
err_msg = ("WARNING: Unable to read configuration file correctly. "
"Ignoring unrecognized config option: 'invalid_option'\n")
assert err_msg in result.stderr
assert err_msg in result.stderr
args = ['--config_file=%s/impalarc_with_error' % QUERY_FILE_PATH]
result = run_impala_shell_cmd(vector, args, expect_success=False)
err_msg = ("Unexpected value in configuration file. "
"'maybe' is not a valid value for a boolean option.")
assert err_msg in result.stderr
assert err_msg in result.stderr
# live_progress and live_summary are not supported with strict_hs2_protocol
if not vector.get_value('strict_hs2_protocol'):
@@ -886,7 +887,7 @@ class TestImpalaShell(ImpalaTestSuite):
# Test with an escaped variable.
result = run_impala_shell_cmd(vector, ['--var=msg1=1', '--var=msg2=${var:msg1}2',
'--var=msg3=\${var:msg1}${var:msg2}',
'--var=msg3=\\${var:msg1}${var:msg2}',
"--query=select '${var:msg3}'"])
self._validate_shell_messages(result.stderr, ['${var:msg1}12', 'Fetched 1 row(s)'],
should_exist=True)
@@ -894,7 +895,7 @@ class TestImpalaShell(ImpalaTestSuite):
# Referencing a non-existent variable will result in an error.
result = run_impala_shell_cmd(vector, [
'--var=msg1=1', '--var=msg2=${var:doesnotexist}2',
'--var=msg3=\${var:msg1}${var:msg2}', "--query=select '${var:msg3}'"],
'--var=msg3=\\${var:msg1}${var:msg2}', "--query=select '${var:msg3}'"],
expect_success=False)
self._validate_shell_messages(result.stderr,
['Error: Unknown variable DOESNOTEXIST',
@@ -1045,7 +1046,7 @@ class TestImpalaShell(ImpalaTestSuite):
if e.errno != errno.EINTR:
raise
data = connection.recv(1024)
assert expected_output in data
assert expected_output in str(data)
finally:
if impala_shell.poll() is None:
impala_shell.kill()
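Regarding the str(data) change above: connection.recv() returns bytes on Python 3, and str() yields the textual repr (e.g. "b'...'"), so an ASCII substring check still succeeds. A quick illustration with a made-up payload:

# "200 OK" is found inside "b'HTTP/1.1 200 OK'", the Python 3 repr of the bytes.
assert "200 OK" in str(b"HTTP/1.1 200 OK")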
@@ -1063,7 +1064,7 @@ class TestImpalaShell(ImpalaTestSuite):
try:
socket.setdefaulttimeout(10)
s = socket.socket()
s.bind(("",0))
s.bind(("", 0))
s.listen(1)
test_impalad_port = s.getsockname()[1]
load_balancer_fqdn = "my-load-balancer.local"
@@ -1087,7 +1088,7 @@ class TestImpalaShell(ImpalaTestSuite):
assert "Could not execute command:" in result.stderr
else:
assert "Encountered: EOF" in result.stderr
args = ['-q', 'with v as (select 1) \;"']
args = ['-q', 'with v as (select 1) \\;"']
result = run_impala_shell_cmd(vector, args, expect_success=False)
if vector.get_value('strict_hs2_protocol'):
assert "cannot recognize input near"
@@ -1103,15 +1104,16 @@ class TestImpalaShell(ImpalaTestSuite):
sql_file, sql_path = tempfile.mkstemp()
# This generates a sql file size of ~50K.
num_cols = 1000
os.write(sql_file, "select \n")
for i in range(num_cols):
if i < num_cols:
os.write(sql_file, "col_{0} as a{1},\n".format(i, i))
os.write(sql_file, "col_{0} as b{1},\n".format(i, i))
os.write(sql_file, "col_{0} as c{1}{2}\n".format(
i, i, "," if i < num_cols - 1 else ""))
os.write(sql_file, "from non_existence_large_table;")
os.close(sql_file)
with open(sql_path, 'w') as f:
f.write("select \n")
for i in range(num_cols):
if i < num_cols:
f.write("col_{0} as a{1},\n".format(i, i))
f.write("col_{0} as b{1},\n".format(i, i))
f.write("col_{0} as c{1}{2}\n".format(
i, i, "," if i < num_cols - 1 else ""))
f.write("from non_existence_large_table;")
f.close()
try:
args = ['-f', sql_path, '-d', unique_database]
@@ -1148,7 +1150,7 @@ class TestImpalaShell(ImpalaTestSuite):
tzname = find_query_option("TIMEZONE", result_set.stdout)
assert os.path.isfile("/usr/share/zoneinfo/" + tzname)
def test_find_query_option(self, vector):
def test_find_query_option(self):
"""Test utility function find_query_option()."""
test_input = """
not_an_option
@@ -1177,8 +1179,8 @@ class TestImpalaShell(ImpalaTestSuite):
# --connect_timeout_ms not supported with HTTP transport. Refer to the comment
# in ImpalaClient::_get_http_transport() for details.
# --http_socket_timeout_s not supported for strict_hs2_protocol.
if (vector.get_value('protocol') == 'hs2-http' and
vector.get_value('strict_hs2_protocol')):
if (vector.get_value('protocol') == 'hs2-http'
and vector.get_value('strict_hs2_protocol')):
pytest.skip("THRIFT-4600")
with closing(socket.socket()) as s:
@@ -1259,9 +1261,9 @@ class TestImpalaShell(ImpalaTestSuite):
assert "| |" in result.stdout, result.stdout
assert "| árvíztűrőtükörfúró |" in result.stdout, result.stdout
assert "| 你好hello |" in result.stdout, result.stdout
assert "| \x00\xef\xbf\xbd\x00\xef\xbf\xbd |" in result.stdout, \
result.stdout
assert '| \xef\xbf\xbdD3"\x11\x00 |' in result.stdout, result.stdout
# The last two output lines are malformed UTF-8 strings.
assert "| " + utf8_encode_if_needed("\0�\0�") in result.stdout, result.stdout
assert "| " + utf8_encode_if_needed("�D3\"") in result.stdout, result.stdout
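The replacement character (U+FFFD) appears in these expected strings because Python 3 substitutes it when invalid UTF-8 is decoded with a 'replace' error handler, which is presumably what the shell's output path does here. A quick check of that behaviour:

# Invalid UTF-8 bytes each decode to U+FFFD under errors='replace'.
assert b'\x00\xaa\x00\xab'.decode('utf-8', errors='replace') == '\x00\ufffd\x00\ufffd'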
def test_binary_as_string(self, vector):
query = """select cast(binary_col as string) from functional.binary_tbl
@@ -1292,8 +1294,8 @@ class TestImpalaShell(ImpalaTestSuite):
# way that python2 and python3 represent floating point values, the output
# from the shell will differ with regard to which version of python the
# shell is running under.
assert("3\t3\t30.299999999999997" in result.stdout or
"3\t3\t30.3" in result.stdout), result.stdout
assert ("3\t3\t30.299999999999997" in result.stdout
or "3\t3\t30.3" in result.stdout), result.stdout
assert "4\t4\t40.4" in result.stdout, result.stdout
@@ -1410,8 +1412,8 @@ class TestImpalaShell(ImpalaTestSuite):
def test_http_socket_timeout(self, vector):
"""Test setting different http_socket_timeout_s values."""
if (vector.get_value('strict_hs2_protocol') or
vector.get_value('protocol') != 'hs2-http'):
if (vector.get_value('strict_hs2_protocol')
or vector.get_value('protocol') != 'hs2-http'):
pytest.skip("http socket timeout not supported in strict hs2 mode."
" Only supported with hs2-http protocol.")
# Test http_socket_timeout_s=0, expect errors


@@ -35,6 +35,7 @@ from time import sleep
import pexpect
import pytest
from impala_shell.impala_client import utf8_encode_if_needed
# This import is the actual ImpalaShell class from impala_shell.py.
# We rename it to ImpalaShellClass here because we later import another
# class called ImpalaShell from tests/shell/util.py, and we don't want
@@ -63,6 +64,7 @@ from tests.shell.util import (
spawn_shell,
stderr_get_first_error_msg,
)
from tests.util.parse_util import bytes_to_str
QUERY_FILE_PATH = os.path.join(os.environ['IMPALA_HOME'], 'tests', 'shell')
@@ -117,7 +119,7 @@ class RequestHandler503(http.server.SimpleHTTPRequestHandler):
self.end_headers()
if self.should_send_body_text():
# Optionally send body text with 503 message.
self.wfile.write("EXTRA")
self.wfile.write(b"EXTRA")
class RequestHandler503Extra(RequestHandler503):
@@ -195,7 +197,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
proc.expect(":{0}] {1}>".format(get_impalad_port(vector), db))
if not expectations: return
for e in expectations:
assert e in proc.before
assert e in bytes_to_str(proc.before)
def _wait_for_num_open_sessions(self, vector, impala_service, expected, err):
"""Helper method to wait for the number of open sessions to reach 'expected'."""
@@ -403,7 +405,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
def test_cancellation_mid_command(self, vector):
"""Test that keyboard interrupt cancels multiline query strings"""
if vector.get_value('strict_hs2_protocol'):
pytest.skip("IMPALA-10827: Cancellation infrastructure does not " +
pytest.skip("IMPALA-10827: Cancellation infrastructure does not "
"work in strict hs2 mode.")
shell_cmd = get_shell_cmd(vector)
multiline_query = ["select column_1\n", "from table_1\n", "where ..."]
@@ -413,7 +415,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
for query_line in multiline_query:
child_proc.send(query_line)
child_proc.sendintr()
child_proc.expect("\^C")
child_proc.expect(r"\^C")
child_proc.expect(PROMPT_REGEX)
child_proc.sendline('quit;')
child_proc.wait()
@@ -425,7 +427,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
child_proc.send("\n")
child_proc.expect(">")
child_proc.sendintr()
child_proc.expect("> \^C")
child_proc.expect(r"> \^C")
child_proc.expect(PROMPT_REGEX)
child_proc.sendline('quit;')
child_proc.wait()
@@ -435,7 +437,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
if vector.get_value('strict_hs2_protocol'):
pytest.skip("IMPALA-10827: Failed, need to investigate.")
# test a unicode query spanning multiple lines
unicode_bytes = u'\ufffd'.encode('utf-8')
unicode_bytes = utf8_encode_if_needed(u'\ufffd')
args = "select '{0}'\n;".format(unicode_bytes)
result = run_impala_shell_interactive(vector, args)
assert "Fetched 1 row(s)" in result.stderr, result.stderr
@@ -449,7 +451,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
child_proc = spawn_shell(shell_cmd)
child_proc.expect(PROMPT_REGEX)
child_proc.sendline("select '{0}'\n;".format(unicode_bytes))
child_proc.expect("Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(r"Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(PROMPT_REGEX)
child_proc.sendline('quit;')
child_proc.wait()
@@ -477,7 +479,6 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
assert "Fetched 1 row(s)" in result.stderr, result.stderr
assert "세율중분류구분코드" in result.stdout
def test_welcome_string(self, vector):
"""Test that the shell's welcome message is only printed once
when the shell is started. Ensure it is not reprinted on errors.
@@ -616,7 +617,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
run_impala_shell_interactive(vector, "drop database if exists foo;")
self.create_impala_clients()
def test_multiline_queries_in_history(self, vector, tmp_history_file):
def test_multiline_queries_in_history(self, vector, tmp_history_file): # noqa: U100
"""Test to ensure that multiline queries with comments are preserved in history
Ensure that multiline queries are preserved when they're read back from history.
@@ -638,7 +639,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
for query, _ in queries:
child_proc.expect(PROMPT_REGEX)
child_proc.sendline(query)
child_proc.expect("Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(r"Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(PROMPT_REGEX)
child_proc.sendline('quit;')
child_proc.wait()
@@ -649,7 +650,8 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
assert history_entry in result.stderr, "'%s' not in '%s'" % (history_entry,
result.stderr)
def test_history_does_not_duplicate_on_interrupt(self, vector, tmp_history_file):
def test_history_does_not_duplicate_on_interrupt(
self, vector, tmp_history_file): # noqa: U100
"""This test verifies that once the cmdloop is broken the history file will not be
re-read. The cmdloop can be broken when the user sends a SIGINT or exceptions
occur."""
@@ -662,7 +664,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
# initialize history
child_proc.expect(PROMPT_REGEX)
child_proc.sendline("select 1;")
child_proc.expect("Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(r"Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(PROMPT_REGEX)
child_proc.sendline("quit;")
child_proc.wait()
@@ -671,9 +673,9 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
child_proc = spawn_shell(shell_cmd)
child_proc.expect(PROMPT_REGEX)
child_proc.sendintr()
child_proc.expect("\^C")
child_proc.expect(r"\^C")
child_proc.sendline("select 2;")
child_proc.expect("Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(r"Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
child_proc.expect(PROMPT_REGEX)
child_proc.sendline("quit;")
child_proc.wait()
@@ -682,12 +684,14 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
p = ImpalaShell(vector)
p.send_cmd('history')
result = p.get_result().stderr.splitlines()
assert "[1]: select 1;" == result[1]
assert "[2]: quit;" == result[2]
assert "[3]: select 2;" == result[3]
assert "[4]: quit;" == result[4]
# Python 2 and Python 3 shell have different first lines in the history output.
start_idx = 1 if "Server version: " in result[0] else 0
assert "[1]: select 1;" == result[start_idx]
assert "[2]: quit;" == result[start_idx + 1]
assert "[3]: select 2;" == result[start_idx + 2]
assert "[4]: quit;" == result[start_idx + 3]
def test_history_file_option(self, vector, tmp_history_file):
def test_history_file_option(self, vector, tmp_history_file): # noqa: U100
"""
Setting the 'tmp_history_file' fixture above means that the IMPALA_HISTFILE
environment will be overridden. Here we override that environment by passing
@@ -704,7 +708,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
history_contents = open(new_hist.name).read()
assert "select 'hi'" in history_contents
def test_rerun(self, vector, tmp_history_file):
def test_rerun(self, vector, tmp_history_file): # noqa: U100
"""Smoke test for the 'rerun' command"""
if vector.get_value('strict_hs2_protocol'):
pytest.skip("Rerun not supported in strict hs2 mode.")
@@ -721,11 +725,12 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
("second_command"))
child_proc.sendline('history;')
child_proc.expect(":{0}] default>".format(get_impalad_port(vector)))
assert '[1]: select \'first_command\';' in child_proc.before
assert '[2]: select \'second_command\';' in child_proc.before
assert '[3]: history;' in child_proc.before
before_line = child_proc.before.decode('UTF-8', 'replace')
assert '[1]: select \'first_command\';' in before_line
assert '[2]: select \'second_command\';' in before_line
assert '[3]: history;' in before_line
# Rerunning command should not add an entry into history.
assert '[4]' not in child_proc.before
assert '[4]' not in before_line
self._expect_with_cmd(child_proc, "@0", vector, ("Command index out of range"))
self._expect_with_cmd(child_proc, "rerun 4", vector, ("Command index out of range"))
self._expect_with_cmd(child_proc, "@-4", vector, ("Command index out of range"))
@@ -861,7 +866,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
# IMPALA-5416: Test that two source commands on a line won't crash the shell.
result = run_impala_shell_interactive(
vector, "source shell.cmds;source shell.cmds;")
assert len(re.findall("version\(\)", result.stdout)) == 2
assert len(re.findall(r"version\(\)", result.stdout)) == 2
finally:
os.chdir(cwd)
@@ -888,11 +893,11 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
# the client does not fetch. For statements returning 0 rows we do not
# want an empty line in stdout.
result = run_impala_shell_interactive(vector, "-- foo \n use default;")
assert re.search('> \[', result.stdout)
assert re.search(r'> \[', result.stdout)
result = run_impala_shell_interactive(vector,
"select * from functional.alltypes limit 0;")
assert "Fetched 0 row(s)" in result.stderr
assert re.search('> \[', result.stdout)
assert re.search(r'> \[', result.stdout)
def test_set_and_set_all(self, vector):
"""IMPALA-2181. Tests the outputs of SET and SET ALL commands. SET should contain the
@@ -1050,15 +1055,15 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
assert '| \'--\' |' in result.stdout
assert '| -- |' in result.stdout
query = ('select * from (\n' +
'select count(*) from functional.alltypes\n' +
query = ('select * from (\n'
'select count(*) from functional.alltypes\n'
') v; -- Incomplete SQL statement in this line')
result = run_impala_shell_interactive(vector, query)
assert '| count(*) |' in result.stdout
query = ('select id from functional.alltypes\n' +
'order by id; /*\n' +
'* Multi-line comment\n' +
query = ('select id from functional.alltypes\n'
'order by id; /*\n'
'* Multi-line comment\n'
'*/')
result = run_impala_shell_interactive(vector, query)
assert '| id |' in result.stdout
@@ -1154,7 +1159,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
proc.sendeof()
proc.wait()
def test_strip_leading_comment(self, vector):
def test_strip_leading_comment(self):
"""Test stripping leading comments from SQL statements"""
assert ('--delete\n', 'select 1') == \
ImpalaShellClass.strip_leading_comment('--delete\nselect 1')
@@ -1230,7 +1235,6 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
"Bad date/time conversion format: yyyy年MM月dd日"
)
def test_timezone_validation(self, vector):
"""Test that query option TIMEZONE is validated when executing a query.
@@ -1300,7 +1304,7 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
"-i{0}:{1}".format(http_503_server_extra.HOST,
http_503_server_extra.PORT)]
shell_proc = spawn_shell(impala_shell_executable + shell_args)
shell_proc.expect("HTTP code 503: Service Unavailable \[EXTRA\]", timeout=10)
shell_proc.expect(r"HTTP code 503: Service Unavailable \[EXTRA\]", timeout=10)
def run_impala_shell_interactive(vector, input_lines, shell_args=None,


@@ -23,10 +23,8 @@ from contextlib import closing
import logging
import os
import re
import shlex
import socket
from subprocess import PIPE, Popen
import sys
import time
import pexpect
@@ -54,6 +52,7 @@ LOG.addHandler(logging.StreamHandler())
SHELL_HISTORY_FILE = os.path.expanduser("~/.impalahistory")
IMPALA_HOME = os.environ['IMPALA_HOME']
def build_shell_env(env=None):
""" Construct the environment for the shell to run in based on 'env', or the current
process's environment if env is None."""
@@ -69,24 +68,24 @@ def build_shell_env(env=None):
def assert_var_substitution(result):
assert_pattern(r'\bfoo_number=.*$', 'foo_number= 123123', result.stdout, \
assert_pattern(r'\bfoo_number=.*$', 'foo_number= 123123', result.stdout,
'Numeric values not replaced correctly')
assert_pattern(r'\bfoo_string=.*$', 'foo_string=123', result.stdout, \
assert_pattern(r'\bfoo_string=.*$', 'foo_string=123', result.stdout,
'String values not replaced correctly')
assert_pattern(r'\bVariables:[\s\n]*BAR:\s*[0-9]*\n\s*FOO:\s*[0-9]*', \
'Variables:\n\tBAR: 456\n\tFOO: 123', result.stdout, \
assert_pattern(r'\bVariables:[\s\n]*BAR:\s*[0-9]*\n\s*FOO:\s*[0-9]*',
'Variables:\n\tBAR: 456\n\tFOO: 123', result.stdout,
"Set variable not listed correctly by the first SET command")
assert_pattern(r'\bError: Unknown variable FOO1$', \
'Error: Unknown variable FOO1', result.stderr, \
assert_pattern(r'\bError: Unknown variable FOO1$',
'Error: Unknown variable FOO1', result.stderr,
'Missing variable FOO1 not reported correctly')
assert_pattern(r'\bmulti_test=.*$', 'multi_test=456_123_456_123', \
assert_pattern(r'\bmulti_test=.*$', 'multi_test=456_123_456_123',
result.stdout, 'Multiple replaces not working correctly')
assert_pattern(r'\bError:\s*Unknown\s*substitution\s*syntax\s*' +
r'\(RANDOM_NAME\). Use \${VAR:var_name}', \
'Error: Unknown substitution syntax (RANDOM_NAME). Use ${VAR:var_name}', \
assert_pattern(r'\bError:\s*Unknown\s*substitution\s*syntax\s*'
r'\(RANDOM_NAME\). Use \${VAR:var_name}',
'Error: Unknown substitution syntax (RANDOM_NAME). Use ${VAR:var_name}',
result.stderr, "Invalid variable reference")
assert_pattern(r'"This should be not replaced: \${VAR:foo} \${HIVEVAR:bar}"',
'"This should be not replaced: ${VAR:foo} ${HIVEVAR:bar}"', \
'"This should be not replaced: ${VAR:foo} ${HIVEVAR:bar}"',
result.stdout, "Variable escaping not working")
assert_pattern(r'\bVariable MYVAR set to.*$', 'Variable MYVAR set to foo123',
result.stderr, 'No evidence of MYVAR variable being set.')
@@ -97,11 +96,11 @@ def assert_var_substitution(result):
result.stdout, 'No evidence of variable FOO being unset')
assert_pattern(r'\bUnsetting variable BAR$', 'Unsetting variable BAR',
result.stdout, 'No evidence of variable BAR being unset')
assert_pattern(r'\bVariables:[\s\n]*No variables defined\.$', \
'Variables:\n\tNo variables defined.', result.stdout, \
assert_pattern(r'\bVariables:[\s\n]*No variables defined\.$',
'Variables:\n\tNo variables defined.', result.stdout,
'Unset variables incorrectly listed by third SET command.')
assert_pattern(r'\bNo variable called NONEXISTENT is set', \
'No variable called NONEXISTENT is set', result.stdout, \
assert_pattern(r'\bNo variable called NONEXISTENT is set',
'No variable called NONEXISTENT is set', result.stdout,
'Problem unsetting non-existent variable.')
assert_pattern(r'\bVariable COMMENT_TYPE1 set to.*$',
'Variable COMMENT_TYPE1 set to ok', result.stderr,
@@ -112,11 +111,12 @@ def assert_var_substitution(result):
assert_pattern(r'\bVariable COMMENT_TYPE3 set to.*$',
'Variable COMMENT_TYPE3 set to ok', result.stderr,
'No evidence of COMMENT_TYPE3 variable being set.')
assert_pattern(r'\bVariables:[\s\n]*COMMENT_TYPE1:.*[\s\n]*' + \
'COMMENT_TYPE2:.*[\s\n]*COMMENT_TYPE3:.*$',
'Variables:\n\tCOMMENT_TYPE1: ok\n\tCOMMENT_TYPE2: ok\n\tCOMMENT_TYPE3: ok', \
assert_pattern(r'\bVariables:[\s\n]*COMMENT_TYPE1:.*[\s\n]*'
r'COMMENT_TYPE2:.*[\s\n]*COMMENT_TYPE3:.*$',
'Variables:\n\tCOMMENT_TYPE1: ok\n\tCOMMENT_TYPE2: ok\n\tCOMMENT_TYPE3: ok',
result.stdout, 'Set variables not listed correctly by the SET command')
def assert_pattern(pattern, result, text, message):
"""Asserts that the pattern, when applied to text, returns the expected result"""
m = re.search(pattern, text, re.MULTILINE)
@@ -217,6 +217,7 @@ def get_open_sessions_metric(vector):
assert protocol == 'beeswax', protocol
return 'impala-server.num-open-beeswax-sessions'
class ImpalaShellResult(object):
def __init__(self):
self.rc = 0
@@ -380,12 +381,13 @@ def get_dev_impala_shell_executable():
def create_impala_shell_executable_dimension(dev_only=False):
_, include_pypi = get_dev_impala_shell_executable()
dimensions = []
if os.getenv("IMPALA_SYSTEM_PYTHON2"):
python3_pytest = (os.getenv("IMPALA_USE_PYTHON3_TESTS", "false") == "true")
if os.getenv("IMPALA_SYSTEM_PYTHON2") and not python3_pytest:
dimensions.append('dev')
if os.getenv("IMPALA_SYSTEM_PYTHON3"):
dimensions.append('dev3')
if include_pypi and not dev_only:
if os.getenv("IMPALA_SYSTEM_PYTHON2"):
if os.getenv("IMPALA_SYSTEM_PYTHON2") and not python3_pytest:
dimensions.append('python2')
if os.getenv("IMPALA_SYSTEM_PYTHON3"):
dimensions.append('python3')


@@ -20,11 +20,14 @@ from collections import defaultdict
import json
import logging
import socket
import sys
import threading
import time
import traceback
import uuid
import pytest
from builtins import range
from thrift.protocol import TBinaryProtocol
from thrift.server.TServer import TServer
@@ -115,6 +118,7 @@ class KillableThreadedServer(TServer):
self.port = self.serverTransport.port
def shutdown(self):
LOG.info('Server localhost:{} is shutting down'.format(self.port))
self.is_shutdown = True
self.serverTransport.close()
self.wait_until_down()
@@ -127,20 +131,22 @@ class KillableThreadedServer(TServer):
cnxn = TSocket.TSocket('localhost', self.port)
try:
cnxn.open()
LOG.info('Server localhost:{} is up'.format(cnxn.port))
return
except Exception:
if i == num_tries - 1: raise
time.sleep(0.1)
time.sleep(0.5)
def wait_until_down(self, num_tries=10):
for i in range(num_tries):
cnxn = TSocket.TSocket('localhost', self.port)
try:
cnxn.open()
time.sleep(0.1)
except Exception:
LOG.info('Server localhost:{} is down'.format(cnxn.port))
return
raise Exception("Server did not stop")
time.sleep(0.5)
raise Exception("Server localhost:{} did not stop".format(cnxn.port))
def serve(self):
self.serverTransport.listen()
@@ -149,8 +155,12 @@ class KillableThreadedServer(TServer):
# Since accept() can take a while, check again if the server is shutdown to avoid
# starting an unnecessary thread.
if self.is_shutdown: return
t = threading.Thread(target=self.handle, args=(client,))
t.setDaemon(self.daemon)
t = None
if sys.version_info.major < 3:
t = threading.Thread(target=self.handle, args=(client,))
t.setDaemon(True)
else:
t = threading.Thread(target=self.handle, args=(client,), daemon=self.daemon)
t.start()
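For context on the branch above: Thread.setDaemon() is deprecated since Python 3.10 in favour of the daemon attribute/constructor argument. As a side note (not what this patch does), the daemon attribute also exists on Python 2.6+, so a version-agnostic form looks like this, with _handle_stub as a placeholder target:

import threading

def _handle_stub():
    pass

t = threading.Thread(target=_handle_stub)
t.daemon = True  # equivalent to setDaemon(True); no deprecation warning on 3.10+
t.start()
t.join()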
def handle(self, client):
@@ -196,6 +206,9 @@ class StatestoreSubscriber(object):
self.heartbeat_cb, self.update_cb = heartbeat_cb, update_cb
self.subscriber_id = "python-test-client-%s" % uuid.uuid1()
self.exception = None
self.server = None
self.server_thread = None
self.client_transport = None
def __enter__(self):
return self
@@ -239,19 +252,24 @@ class StatestoreSubscriber(object):
return response
def __init_server(self):
LOG.info('Initializing server')
processor = Subscriber.Processor(self)
transport = WildcardServerSocket()
tfactory = TTransport.TBufferedTransportFactory()
pfactory = TBinaryProtocol.TBinaryProtocolFactory()
self.server = KillableThreadedServer(processor, transport, tfactory, pfactory,
daemon=True)
self.server_thread = threading.Thread(target=self.server.serve)
self.server_thread.setDaemon(True)
if sys.version_info.major < 3:
self.server_thread = threading.Thread(target=self.server.serve)
self.server_thread.setDaemon(True)
else:
self.server_thread = threading.Thread(target=self.server.serve, daemon=True)
self.server_thread.start()
self.server.wait_until_up()
self.port = self.server.port
def __init_client(self):
LOG.info('Initializing client')
self.client_transport = \
TTransport.TBufferedTransport(TSocket.TSocket('localhost', 24000))
self.protocol = TBinaryProtocol.TBinaryProtocol(self.client_transport)
@@ -352,6 +370,7 @@ class StatestoreSubscriber(object):
time.sleep(0.2)
@pytest.mark.execute_serially
@SkipIfDockerizedCluster.statestore_not_exposed
class TestStatestore(BaseTestSuite):
def make_topic_update(self, topic_name, key_template="foo", value_template="bar",


@@ -29,6 +29,7 @@ from tests.common.skip import SkipIfFS
from tests.common.test_dimensions import create_exec_option_dimension
from tests.stress.stress_util import run_tasks, Task
from tests.util.filesystem_utils import FILESYSTEM_PREFIX, IS_HDFS
from tests.util.parse_util import bytes_to_str
from tests.conftest import DEFAULT_HIVE_SERVER2
@@ -339,7 +340,8 @@ class TestIcebergConcurrentOperations(ImpalaTestSuite):
table_location = "{0}/test-warehouse/{1}.db/{2}/data".format(
FILESYSTEM_PREFIX, unique_database, table_name)
data_files_on_fs_result = check_output(["hdfs", "dfs", "-ls", table_location])
data_files_on_fs_result = bytes_to_str(
check_output(["hdfs", "dfs", "-ls", table_location]))
# The first row of the HDFS result is a summary, the following lines contain
# 1 file each.
data_files_on_fs_rows = data_files_on_fs_result.strip().split('\n')[1:]
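With the output decoded by bytes_to_str(), the slicing keeps only the per-file lines after the "Found N items" summary. A tiny illustration on a made-up listing (not real hdfs output):

sample = "Found 2 items\n/path/a.parq\n/path/b.parq"
assert sample.strip().split('\n')[1:] == ["/path/a.parq", "/path/b.parq"]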


@@ -31,6 +31,7 @@ from xml.etree.ElementTree import parse
from tests.util.filesystem_base import BaseFilesystem
from tests.util.filesystem_utils import FILESYSTEM_PREFIX
from tests.util.parse_util import bytes_to_str
class HdfsConfig(object):
@@ -220,13 +221,13 @@ class HadoopFsCommandLineClient(BaseFilesystem):
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = process.communicate()
status = process.returncode
return (status, stdout, stderr)
return (status, bytes_to_str(stdout), bytes_to_str(stderr))
def create_file(self, path, file_data, overwrite=True):
"""Creates a temporary file with the specified file_data on the local filesystem,
then puts it into the specified path."""
if not overwrite and self.exists(path): return False
with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
with tempfile.NamedTemporaryFile(mode='w+t', delete=False) as tmp_file:
tmp_file.write(file_data)
put_cmd_params = ['-put', '-d']
if overwrite: put_cmd_params.append('-f')
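The mode='w+t' argument matters because NamedTemporaryFile defaults to binary mode ('w+b'), where writing a str raises TypeError on Python 3. A minimal check with throwaway data:

import tempfile

with tempfile.NamedTemporaryFile(mode='w+t', delete=False) as tmp:
    tmp.write("1,alpha\n2,beta\n")  # accepted because the file is opened in text mode
    local_path = tmp.name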


@@ -17,6 +17,7 @@
from __future__ import absolute_import, division, print_function
import re
import sys
from datetime import datetime
# IMPALA-6715: Every so often the stress test or the TPC workload directories get
@@ -285,3 +286,11 @@ def get_time_summary_stats_counter(counter_name, runtime_profile):
max_value=parse_duration_string_ns(summary_stat['max'])))
return summary_stats
def bytes_to_str(bytes):
"""Utility function to convert bytes to string.
This is needed to handle the differences between Python 2 and 3."""
if sys.version_info.major < 3:
return str(bytes)
return bytes.decode('utf-8', errors='replace')
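A typical call site, mirroring the check_output() usages patched above (the echo command is just a stand-in):

from subprocess import check_output

from tests.util.parse_util import bytes_to_str

output = bytes_to_str(check_output(["echo", "Found 1 items"]))  # bytes on Python 3
assert "Found 1 items" in output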


@@ -21,6 +21,8 @@ from __future__ import absolute_import, division, print_function
import logging
import os
import shlex
import sys
from select import select
from subprocess import PIPE, Popen, STDOUT, call
from textwrap import dedent
@@ -35,6 +37,7 @@ def dump_server_stacktraces():
LOG.debug('Dumping stacktraces of running servers')
call([os.path.join(os.environ['IMPALA_HOME'], "bin/dump-stacktraces.sh")])
def exec_process(cmd):
"""Executes a subprocess, waiting for completion. The process exit code, stdout and
stderr are returned as a tuple."""
@@ -46,6 +49,7 @@ def exec_process(cmd):
rc = p.returncode
return rc, stdout, stderr
def exec_process_async(cmd):
"""Executes a subprocess, returning immediately. The process object is returned for
later retrieval of the exit code etc. """
@@ -55,6 +59,7 @@ def exec_process_async(cmd):
return Popen(shlex.split(cmd), shell=False, stdout=PIPE, stderr=PIPE,
universal_newlines=True)
def shell(cmd, cmd_prepend="set -euo pipefail\n", stdout=PIPE, stderr=STDOUT,
timeout_secs=None, **popen_kwargs):
"""Executes a command and returns its output. If the command's return code is non-zero
@@ -77,6 +82,7 @@ def shell(cmd, cmd_prepend="set -euo pipefail\n", stdout=PIPE, stderr=STDOUT,
remaining_fds.append(stderr_fileno)
stdout = list()
stderr = list()
def _read_available_output():
while True:
available_fds, _, _ = select(remaining_fds, [], [], 0)
@@ -88,7 +94,7 @@ def shell(cmd, cmd_prepend="set -euo pipefail\n", stdout=PIPE, stderr=STDOUT,
if not data:
del remaining_fds[0]
else:
stdout.append(data)
stdout.append(data if sys.version_info.major < 3 else data.decode())
elif fd == stderr_fileno:
if not data:
del remaining_fds[-1]