IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS (Google Cloud Storage). The
implementation uses the gcs-connector and is similar to that of the
other remote FileSystems (e.g. S3, ABFS and ADLS).

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Support for caching GCS file handles will be addressed in
   IMPALA-10568.
 - test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   GCS (IMPALA-10562).
 - Some tests are skipped due to issues introduced by the /etc/hosts
   setting on GCE instances (IMPALA-10563).

Tests:
 - Compile and create HDFS test data on a GCE instance. Upload the test
   data to a GCS bucket. Modify all locations in the HMS DB to point to
   the GCS bucket. Remove some HDFS caching params. Run CORE tests.
 - Compile and load snapshot data into a GCS bucket. Run CORE tests.
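Repointing the HMS locations at the GCS bucket presumably amounts to rewriting each `hdfs://` URI into a `gs://` URI that keeps the same path. A minimal Python 3 sketch of that rewrite, assuming locations are stored as full URIs; `to_gcs_location`, the namenode address and the bucket names are hypothetical, and updating the HMS DB rows themselves is not shown.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gcs_location(location, bucket):
  """Rewrite an hdfs:// location so it points at the same path in a GCS bucket.

  Hypothetical helper for illustration; non-HDFS locations are returned as-is.
  """
  parts = urlsplit(location)
  if parts.scheme != "hdfs":
    return location
  # Swap the scheme and authority (namenode host:port -> bucket), keep the path.
  return urlunsplit(("gs", bucket, parts.path, parts.query, parts.fragment))


# Example values only: the namenode address and bucket names are made up.
locations = [
    "hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=1",
    "gs://already-on-gcs/test-warehouse/tpch.lineitem",
]
for loc in locations:
  print(to_gcs_location(loc, "impala-test-bucket"))
```

Leaving locations that are not `hdfs://` untouched makes the rewrite safe to run more than once over the same set of tables.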

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
stiga-huang
2021-02-25 20:19:49 +08:00
committed by Impala Public Jenkins
parent 6c6b0ee869
commit 2dfc68d852
68 changed files with 303 additions and 64 deletions

@@ -25,7 +25,8 @@ import subprocess
 from tests.beeswax.impala_beeswax import ImpalaBeeswaxException
 from tests.common.impala_test_suite import ImpalaTestSuite
-from tests.common.skip import SkipIf, SkipIfS3, SkipIfABFS, SkipIfADLS, SkipIfLocal
+from tests.common.skip import (SkipIf, SkipIfS3, SkipIfABFS, SkipIfADLS, SkipIfGCS,
+                               SkipIfLocal)
 from tests.common.test_dimensions import create_exec_option_dimension
 class TestDataErrors(ImpalaTestSuite):
@@ -106,6 +107,7 @@ class TestHdfsUnknownErrors(ImpalaTestSuite):
     assert "Safe mode is OFF" in output
 @SkipIfS3.qualified_path
+@SkipIfGCS.qualified_path
 @SkipIfABFS.qualified_path
 @SkipIfADLS.qualified_path
 class TestHdfsScanNodeErrors(TestDataErrors):
@@ -125,6 +127,7 @@ class TestHdfsScanNodeErrors(TestDataErrors):
     self.run_test_case('DataErrorsTest/hdfs-scan-node-errors', vector)
 @SkipIfS3.qualified_path
+@SkipIfGCS.qualified_path
 @SkipIfABFS.qualified_path
 @SkipIfADLS.qualified_path
 @SkipIfLocal.qualified_path
@@ -141,6 +144,7 @@ class TestHdfsSeqScanNodeErrors(TestHdfsScanNodeErrors):
 @SkipIfS3.qualified_path
+@SkipIfGCS.qualified_path
 @SkipIfABFS.qualified_path
 @SkipIfADLS.qualified_path
 class TestHdfsRcFileScanNodeErrors(TestHdfsScanNodeErrors):
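Each `@SkipIfGCS.qualified_path` line in the diff marks a test class to be skipped when the suite runs against GCS. A minimal sketch of how such a marker can work, using the stdlib `unittest.skipIf` as a stand-in for the pytest markers Impala's test suite is built on, so the snippet runs without pytest. `make_skip_if_gcs`, the `"gs"` filesystem identifier and the skip reason are hypothetical; this is not the contents of `tests/common/skip.py`.

```python
import unittest


def make_skip_if_gcs(target_filesystem):
  """Build a hypothetical SkipIfGCS marker holder for the given filesystem.

  Treating "gs" as the GCS identifier is an assumption of this sketch.
  """
  is_gcs = target_filesystem == "gs"

  class SkipIfGCS(object):
    # Skip test classes that depend on fully qualified HDFS paths.
    # When is_gcs is False, skipIf returns a no-op decorator.
    qualified_path = unittest.skipIf(
        is_gcs, "Tests rely on HDFS qualified paths")

  return SkipIfGCS


def run_and_count_skips(test_class):
  """Run every test in test_class and return how many were skipped."""
  suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_class)
  result = unittest.TestResult()
  suite.run(result)
  return len(result.skipped)


SkipIfGCS = make_skip_if_gcs("gs")


@SkipIfGCS.qualified_path
class TestQualifiedPaths(unittest.TestCase):
  def test_reads_hdfs_path(self):
    self.assertTrue(True)


print(run_and_count_skips(TestQualifiedPaths))  # prints 1
```

Grouping markers as attributes of a per-filesystem class matches the `SkipIfX.reason` shape used in the diff: a test opts out per limitation (here, qualified paths), and the markers for several filesystems can be stacked on the same class.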