mirror of https://github.com/apache/impala.git (synced 2025-12-25 02:03:09 -05:00)
IMPALA-9071: Handle translated external HDFS table in CTAS

After upgrading Hive-3 to a version containing HIVE-22158, managed
tables are no longer allowed to be non-transactional. Creating a
non-ACID table results in an external table with the table property
'external.table.purge' set to true.

In Hive-3, the default location of external HDFS tables is under
'metastore.warehouse.external.dir' if it's set. This property was added
by HIVE-19837 in Hive 2.7, but hasn't been added to Hive in cdh6 yet.

In a CTAS statement, we create a temporary HMS Table for the analysis of
the Insert part. The table path is created assuming it's a managed
table, and the Insert part uses this path for insertion. However, in
Hive-3, the created table is translated to an external table, so it's
not the same as the one we passed to the HMS API. The created table is
located in 'metastore.warehouse.external.dir', while the table path we
assumed is in 'metastore.warehouse.dir'. This introduces bugs when these
two properties differ: the CTAS statement creates the table in one place
and inserts data in another.

This patch adds a new method in MetastoreShim to wrap the difference
between Hive-2 and Hive-3 in getting the default table path for
non-transactional tables.

Changes in the infra:
- To support customizing the Hive configuration, add an env var,
  CUSTOM_CLASSPATH, in bin/set-classpath.sh that is put in front of the
  existing CLASSPATH. The customized hive-site.xml should be put inside
  CUSTOM_CLASSPATH.
- Change hive-site.xml.py to generate a hive-site.xml with a non-default
  'metastore.warehouse.external.dir'.
- Add an option, --env_vars, in bin/start-impala-cluster.py to pass down
  CUSTOM_CLASSPATH.

Tests:
- Add a custom cluster test that starts Hive with
  metastore.warehouse.external.dir set to a non-default value. Run it
  locally using CDP components with HIVE-22158. xfail the test until we
  bump CDP_BUILD_NUMBER to 1507246.
- Run CORE tests using CDH components.

Change-Id: I460a57dc877ef68ad7dd0864a33b1599b1e9a8d9
Reviewed-on: http://gerrit.cloudera.org:8080/14527
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
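
The path mismatch described in the commit message can be illustrated with a minimal sketch. This is not the MetastoreShim method the patch adds (that code is Java and is not shown on this page); the function names, the conf dict, and the example directories are hypothetical. The sketch assumes the usual Hive layout of <warehouse>/<db>.db/<table>, with tables of the 'default' database placed directly under the warehouse root.

```python
import posixpath


def default_table_path(warehouse_dir, db, table):
  """Hive-style default location for a table under a warehouse root (assumed layout)."""
  if db == "default":
    # Tables in the 'default' database live directly under the warehouse root.
    return posixpath.join(warehouse_dir, table)
  return posixpath.join(warehouse_dir, db + ".db", table)


def ctas_target_path(conf, db, table, translated_to_external):
  """Hypothetical helper: pick the warehouse root the created table will really use.

  translated_to_external models a Hive-3 metastore with HIVE-22158, which turns a
  non-transactional managed table into an external one.
  """
  external_dir = conf.get("metastore.warehouse.external.dir")
  if translated_to_external and external_dir:
    return default_table_path(external_dir, db, table)
  return default_table_path(conf["metastore.warehouse.dir"], db, table)


# Example directories are made up for illustration.
conf = {
  "metastore.warehouse.dir": "/test-warehouse/managed",
  "metastore.warehouse.external.dir": "/test-warehouse/external",
}
assumed = ctas_target_path(conf, "db1", "t", translated_to_external=False)
actual = ctas_target_path(conf, "db1", "t", translated_to_external=True)
```

With the two properties set to different directories, assumed and actual differ, which is the situation where the table is created in one place and the data is inserted in another.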
committed by Joe McDonnell
parent b02f514583
commit b6b31e4cc4
@@ -100,6 +100,8 @@ parser.add_option("--log_level", type="int", dest="log_level", default=1,
                   help="Set the impalad backend logging level")
 parser.add_option("--jvm_args", dest="jvm_args", default="",
                   help="Additional arguments to pass to the JVM(s) during startup.")
+parser.add_option("--env_vars", dest="env_vars", default="",
+                  help="Additional environment variables for Impala to run with")
 parser.add_option("--kudu_master_hosts", default=KUDU_MASTER_HOSTS,
                   help="The host name or address of the Kudu master. Multiple masters "
                        "can be specified using a comma separated list.")
@@ -163,6 +165,10 @@ def check_process_exists(binary, attempts=1):
 def run_daemon_with_options(daemon_binary, args, output_file, jvm_debug_port=None):
   """Wrapper around run_daemon() with options determined from command-line options."""
   env_vars = {"JAVA_TOOL_OPTIONS": build_java_tool_options(jvm_debug_port)}
+  if options.env_vars is not None:
+    for kv in options.env_vars.split():
+      k, v = kv.split('=')
+      env_vars[k] = v
   run_daemon(daemon_binary, args, build_type=options.build_type, env_vars=env_vars,
       output_file=output_file)
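
The second hunk turns the --env_vars string into environment variables by splitting on whitespace and then on '='. As written, kv.split('=') fails to unpack when a value itself contains '=' or when an entry has no '=' at all. The following is a minimal standalone sketch of a more forgiving parser; parse_env_vars is a hypothetical helper and is not part of this commit.

```python
def parse_env_vars(spec):
  """Parse a string like 'K1=V1 K2=V2' into a dict.

  Hypothetical helper, not part of the patch. Entries are separated by whitespace,
  and only the first '=' in each entry separates the key from the value.
  """
  env = {}
  for kv in (spec or "").split():
    key, sep, value = kv.partition("=")
    if not sep or not key:
      raise ValueError("expected KEY=VALUE, got %r" % kv)
    env[key] = value
  return env


# The path below is made up for illustration.
env_vars = {"JAVA_TOOL_OPTIONS": ""}
env_vars.update(parse_env_vars("CUSTOM_CLASSPATH=/tmp/custom-conf"))
```

Because entries are split on whitespace, neither this sketch nor the code in the hunk can carry a value that contains spaces; that limitation is inherited from the option's format.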