impala/tests/custom_cluster/test_s3a_access.py
Laszlo Gaal d9986fd3a2 IMPALA-13156: Investigation: Set explicit credential provider for S3 builds
Lately we have seen several failures during S3 builds that relied on AWS
EC2 IAM instance roles for S3 bucket access credentials.
The failure mode was a spurious failure of the s3a IAM Instance
Credential Provider to actually provide the credentials.

This patch is an attempt to extract more debugging information from such
a failure: according to Hadoop-AWS developers, "Unable to load
credentials from system settings..." is a generic error message from the
Hadoop credential providers when they operate in a chained fashion. This
happens when there is no explicit credential provider specified in
core-site.xml, and the credential providers are tried in sequence.

The patch specifies the Hadoop s3a IAM Instance Credential Provider
when all of the following conditions are true:
- the default file system is S3,
- the minicluster is running on an AWS EC2 VM,
- AWS credentials are provided by an IAM instance role attached to
  the VM.
The conditions are detected by following the rule matrix set up in
bin/impala-config.sh; the same rule set is evaluated in
core-site.xml.py, the Python script that generates the working copy of
core-site.xml.
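
The shape of that check, as it could appear in core-site.xml.py, is
sketched below. This is only an illustration under stated assumptions:
TARGET_FILESYSTEM is the variable bin/impala-config.sh exports for the
default file system, the EC2/IAM flag names and the helper function are
hypothetical placeholders, and the class name is the hadoop-aws IAM
Instance Credential Provider referred to above.

# Illustrative sketch only -- not the exact logic in core-site.xml.py.
import os

# Fully qualified name of the hadoop-aws IAM Instance Credential Provider.
IAM_PROVIDER = 'org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider'


def s3a_credential_provider_overrides():
  """Return the extra core-site.xml property pinning the s3a credential
  provider, or an empty dict to keep the default chained lookup."""
  on_s3 = os.environ.get('TARGET_FILESYSTEM') == 's3'
  on_ec2_vm = os.environ.get('RUNNING_ON_EC2_VM') == 'true'  # hypothetical flag
  iam_role_creds = os.environ.get('S3_ACCESS_VIA_IAM_ROLE') == 'true'  # hypothetical flag
  if on_s3 and on_ec2_vm and iam_role_creds:
    return {'fs.s3a.aws.credentials.provider': IAM_PROVIDER}
  return {}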

The patch also XFAILs test_keys_do_not_work() in test_s3a_access.py in
the custom cluster tests, because the test assumes that the default
chain of Hadoop AWS credential providers is available. This patch
restricts the credential provider set to the IAM Instance Credential
Provider only, which breaks the test mechanism.
The test will be reinstated when a suitable workaround is found, but at
this stage the extended diagnostics provided by this patch for
investigating the credential provider flakiness are considered more
valuable.

Tests:
- private build on S3 for the positive case
- regression tested on an HDFS private build
- in both cases the generated core-site.xml file was inspected during
  the test run.

Change-Id: Ia8c09f8d042a69c5d3227398c720ea38e1c7e12f
Reviewed-on: http://gerrit.cloudera.org:8080/21510
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-09-12 02:55:17 +00:00


# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
from __future__ import absolute_import, division, print_function
import os
import pytest
import stat
import tempfile
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
from tests.common.skip import SkipIf
from tests.util.filesystem_utils import WAREHOUSE

tmp = tempfile.NamedTemporaryFile(delete=False)
BAD_KEY_FILE = tmp.name


@SkipIf.not_s3
class TestS3AAccess(CustomClusterTestSuite):
  cmd_filename = ""

  @classmethod
  def setup_class(cls):
    super(TestS3AAccess, cls).setup_class()
    try:
      tmp.write('echo badkey')
    finally:
      tmp.close()
    # Make this file executable
    tmp_file_stat = os.stat(BAD_KEY_FILE)
    os.chmod(BAD_KEY_FILE, tmp_file_stat.st_mode | stat.S_IEXEC)

  @classmethod
  def teardown_class(cls):
    os.remove(BAD_KEY_FILE)

  def _get_impala_client(self):
    impalad = self.cluster.get_any_impalad()
    return impalad.service.create_beeswax_client()

  @pytest.mark.execute_serially
  @CustomClusterTestSuite.with_args(
      "-s3a_access_key_cmd=\"%s\"\
      -s3a_secret_key_cmd=\"%s\"" % (BAD_KEY_FILE, BAD_KEY_FILE))
  def test_ddl_keys_ignored(self, unique_database):
    '''DDL statements will ignore the S3 keys passed to Impala because the code path
    that it exercises goes through the Hive Metastore which should have the correct keys
    from the core-site configuration.'''
    client = self._get_impala_client()
    # This is repeated in the test below (because it's necessary there), but we still
    # want to make sure that it is tested separately as it's a good test practice.
    self.execute_query_expect_success(client,
        "create external table if not exists {0}.tinytable_s3 like functional.tinytable \
        location '{1}/tinytable'".format(unique_database, WAREHOUSE))

  @pytest.mark.xfail(run=False, reason="Incompatible with IMPALA-13156 debug attempts")
  @pytest.mark.execute_serially
  @CustomClusterTestSuite.with_args(
      "-s3a_access_key_cmd=\"%s\"\
      -s3a_secret_key_cmd=\"%s\"" % (BAD_KEY_FILE, BAD_KEY_FILE))
  def test_keys_do_not_work(self, unique_database):
    '''Test that using incorrect S3 access and secret keys will not allow Impala to
    query S3.
    TODO: We don't have the test infrastructure in place yet to check if the keys do work
    in a custom cluster test. (See IMPALA-3422)'''
    client = self._get_impala_client()
    self.execute_query_expect_success(client,
        "create external table if not exists {0}.tinytable_s3 like functional.tinytable \
        location '{1}/tinytable'".format(unique_database, WAREHOUSE))
    self.execute_query_expect_failure(client, "select * from {0}.tinytable_s3"
        .format(unique_database))