IMPALA-9071: Handle translated external HDFS table in CTAS

After Hive-3 is upgraded to a version containing HIVE-22158, managed
tables can no longer be non-transactional. Creating a non-ACID table
instead creates an external table with the table property
'external.table.purge' set to true.

In Hive-3, external HDFS tables are located by default under
'metastore.warehouse.external.dir' if it is set. This property was
added by HIVE-19837 in Hive 2.7, but has not been added to Hive in cdh6
yet.

In a CTAS statement, we create a temporary HMS Table for analyzing the
Insert part. The table path is computed assuming a managed table, and
the Insert part uses this path for insertion. However, in Hive-3 the
created table is translated to an external table, so it is not the same
as the one we passed to the HMS API: the created table is located in
'metastore.warehouse.external.dir', while the table path we assumed is
in 'metastore.warehouse.dir'. When these two properties differ, the
CTAS statement creates the table in one place and inserts the data in
another.

This patch adds a new method in MetastoreShim that hides the difference
between Hive-2 and Hive-3 in computing the default table path for
non-transactional tables.
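
The path choice described above can be sketched in Python. This is only
an illustration: the real method lives in the Java MetastoreShim and its
name and signature are not shown here. The helper name
default_table_path, its parameters, and the '<db>.db/<table>' layout are
assumptions made for the example.

```python
import posixpath


def default_table_path(conf, db, table, hive_major_version):
    """Hypothetical helper: default path of a non-transactional table.

    `conf` is a plain dict standing in for the Hive configuration.
    """
    managed_root = conf["metastore.warehouse.dir"]
    external_root = conf.get("metastore.warehouse.external.dir")
    # Assumption: on Hive-3 with HIVE-22158 a non-transactional table is
    # translated to an external table, so it lands under the external
    # warehouse dir when that property is set.
    if hive_major_version >= 3 and external_root:
        root = external_root
    else:
        root = managed_root
    # Assumed layout: tables in the default database sit directly under
    # the warehouse root, other databases get a '<db>.db' directory.
    db_dir = "" if db.lower() == "default" else db.lower() + ".db"
    return posixpath.join(root, db_dir, table.lower())
```

With both properties set to different directories, the Hive-2 and Hive-3
results differ, which is exactly the mismatch that made CTAS create the
table in one place and insert into another.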

Changes in the infra:
 - To support customizing the Hive configuration, add an env var,
   CUSTOM_CLASSPATH, in bin/set-classpath.sh that is prepended to the
   existing CLASSPATH. The customized hive-site.xml should be put
   inside CUSTOM_CLASSPATH.
 - Change hive-site.xml.py to generate a hive-site.xml with a
   non-default 'metastore.warehouse.external.dir'.
 - Add an option, --env_vars, in bin/start-impala-cluster.py to pass
   down CUSTOM_CLASSPATH.
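
Putting CUSTOM_CLASSPATH first matters because classpath entries are
searched in order, so a hive-site.xml found there shadows the default
one. The ordering can be illustrated in Python. The real logic is shell
code in bin/set-classpath.sh, and build_classpath is a hypothetical
helper used only for this sketch.

```python
def build_classpath(existing, custom=None):
    """Hypothetical helper: prepend `custom` to `existing` if it is set."""
    # Skip unset or empty parts so no stray ':' ends up in the result.
    parts = [p for p in (custom, existing) if p]
    return ":".join(parts)
```

For example, build_classpath("/a:/b", "/custom") yields "/custom:/a:/b",
while an unset CUSTOM_CLASSPATH leaves the existing value unchanged.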

Tests:
 - Add a custom cluster test that starts Hive with
   metastore.warehouse.external.dir set to a non-default value. Ran it
   locally using CDP components with HIVE-22158. xfail the test until
   we bump CDP_BUILD_NUMBER to 1507246.
 - Ran CORE tests using CDH components.

Change-Id: I460a57dc877ef68ad7dd0864a33b1599b1e9a8d9
Reviewed-on: http://gerrit.cloudera.org:8080/14527
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
stiga-huang
2019-10-24 11:37:25 +08:00
committed by Joe McDonnell
parent b02f514583
commit b6b31e4cc4
9 changed files with 156 additions and 12 deletions

@@ -100,6 +100,8 @@ parser.add_option("--log_level", type="int", dest="log_level", default=1,
                  help="Set the impalad backend logging level")
parser.add_option("--jvm_args", dest="jvm_args", default="",
                  help="Additional arguments to pass to the JVM(s) during startup.")
parser.add_option("--env_vars", dest="env_vars", default="",
                  help="Additional environment variables for Impala to run with")
parser.add_option("--kudu_master_hosts", default=KUDU_MASTER_HOSTS,
                  help="The host name or address of the Kudu master. Multiple masters "
                  "can be specified using a comma separated list.")
@@ -163,6 +165,10 @@ def check_process_exists(binary, attempts=1):
def run_daemon_with_options(daemon_binary, args, output_file, jvm_debug_port=None):
  """Wrapper around run_daemon() with options determined from command-line options."""
  env_vars = {"JAVA_TOOL_OPTIONS": build_java_tool_options(jvm_debug_port)}
  if options.env_vars is not None:
    for kv in options.env_vars.split():
      k, v = kv.split('=')
      env_vars[k] = v
  run_daemon(daemon_binary, args, build_type=options.build_type, env_vars=env_vars,
             output_file=output_file)
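
The parsing in run_daemon_with_options() splits the --env_vars value on
whitespace and then each item on '='. A value that itself contains '='
makes the two-name unpacking raise ValueError. A more defensive variant
could look like the sketch below. parse_env_vars is a hypothetical
standalone helper and is not part of this patch.

```python
def parse_env_vars(spec):
    """Hypothetical helper: parse 'K1=V1 K2=V2' into a dict."""
    env = {}
    # Treat None like an empty string; split() drops extra whitespace.
    for kv in (spec or "").split():
        # partition() splits on the first '=' only, so values may
        # themselves contain '='.
        key, sep, value = kv.partition("=")
        if not sep or not key:
            raise ValueError("expected KEY=VALUE, got %r" % kv)
        env[key] = value
    return env
```

Note that whitespace-based splitting still cannot express values that
contain spaces, which is the same limitation as in the patch.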