The previous patch for IMPALA-9071 assumes that every table created by a
CTAS statement is non-transactional. This is wrong: a CTAS statement can
also specify tblproperties, so it can create a transactional table.

This patch fixes the hard-coded external check. Instead, we decide based
on whether the table is transactional. If it is not, HMS translates it
to an external table.
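
The decision described above can be sketched roughly as follows. This is
an illustration only, not the code in this patch: the helper names are
hypothetical, and it assumes the table is marked transactional through
the 'transactional' table property.

```python
def is_transactional(tbl_properties):
    """Return True if the properties request a transactional table.

    Assumption: only the 'transactional' property is consulted, and it is
    compared case-insensitively against 'true'.
    """
    value = tbl_properties.get("transactional", "false")
    return value.strip().lower() == "true"


def expected_table_type(tbl_properties):
    """Hypothetical helper: the table type we expect HMS to end up with.

    Non-transactional tables are expected to be translated to external
    tables; transactional tables stay managed.
    """
    if is_transactional(tbl_properties):
        return "MANAGED_TABLE"
    return "EXTERNAL_TABLE"
```

A CTAS without tblproperties then maps to EXTERNAL_TABLE, while one that
sets 'transactional'='true' stays MANAGED_TABLE.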
Tests:
- Add coverage for creating transactional tables by CTAS.
Change-Id: I4b585216e33e4f7962b19ae2351165288691eaf2
Reviewed-on: http://gerrit.cloudera.org:8080/14546
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
After upgrading Hive-3 to a version containing HIVE-22158, managed tables
are no longer allowed to be non-transactional. Creating a non-ACID table
results in an external table with the table property
'external.table.purge' set to true.

In Hive-3, the default location of external HDFS tables is under
'metastore.warehouse.external.dir' if it's set. This property was
added by HIVE-19837 in Hive 2.7, but hasn't been added to Hive in cdh6
yet.

In a CTAS statement, we create a temporary HMS Table for analyzing the
Insert part. The table path is computed assuming a managed table, and
the Insert part uses this path for insertion. However, in Hive-3 the
created table is translated to an external table, so it differs from
the one we passed to the HMS API. The created table is located in
'metastore.warehouse.external.dir', while the path we assumed is in
'metastore.warehouse.dir'. This causes a bug when the two properties
differ: the CTAS statement creates the table in one place and inserts
data in another.

This patch adds a new method in MetastoreShim that hides the difference
between Hive-2 and Hive-3 in getting the default table path for
non-transactional tables.
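
As a rough sketch of the path selection (not the actual MetastoreShim
code, which is Java), the logic could look like the following. The
function name, the dict-based config, the '<db>.db/<table>' layout and
the fallback to 'metastore.warehouse.dir' when the external dir is unset
are all assumptions made for illustration.

```python
import posixpath

WAREHOUSE_DIR = "metastore.warehouse.dir"
EXTERNAL_WAREHOUSE_DIR = "metastore.warehouse.external.dir"


def default_non_transactional_table_path(conf, db, tbl, hive_major_version):
    """Hypothetical helper: default path of a non-transactional table.

    Hive-2: the table stays managed, so it lives under the managed
    warehouse dir.
    Hive-3: the table is translated to an external table, so it lives
    under the external warehouse dir if that property is set (assumption:
    fall back to the managed warehouse dir otherwise).
    """
    root = conf[WAREHOUSE_DIR]
    if hive_major_version >= 3:
        root = conf.get(EXTERNAL_WAREHOUSE_DIR) or root
    # Assumption: simplified '<root>/<db>.db/<table>' layout.
    return posixpath.join(root, db + ".db", tbl)
```

With the two properties pointing at different directories, the Hive-2
and Hive-3 results differ, which is exactly the case where the Insert
part of a CTAS previously wrote to the wrong location.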
Changes in the infra:
- To support customizing the Hive configuration, add an env var,
  CUSTOM_CLASSPATH, in bin/set-classpath.sh that is put in front of the
  existing CLASSPATH. The customized hive-site.xml should be put inside
  CUSTOM_CLASSPATH.
- Change hive-site.xml.py to generate a hive-site.xml with a non-default
  'metastore.warehouse.external.dir'.
- Add an option, --env_vars, in bin/start-impala-cluster.py to pass
  down CUSTOM_CLASSPATH.
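
The classpath ordering in the first item can be illustrated with a small
sketch. The real change is in the shell script bin/set-classpath.sh; the
function below is a hypothetical stand-in that assumes a ':'-separated
classpath and only shows why an entry placed first wins.

```python
def build_classpath(env):
    """Hypothetical helper: put CUSTOM_CLASSPATH in front of CLASSPATH.

    Entries earlier in the classpath take precedence, so a hive-site.xml
    found under CUSTOM_CLASSPATH shadows the default one.
    """
    custom = env.get("CUSTOM_CLASSPATH", "")
    existing = env.get("CLASSPATH", "")
    # Skip empty parts so no stray ':' ends up in the result.
    return ":".join(part for part in (custom, existing) if part)
```

When CUSTOM_CLASSPATH is unset, the existing CLASSPATH is returned
unchanged.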
Tests:
- Add a custom cluster test that starts Hive with
  metastore.warehouse.external.dir set to a non-default value. Run it
  locally using CDP components with HIVE-22158. xfail the test until
  we bump CDP_BUILD_NUMBER to 1507246.
- Run CORE tests using CDH components.
Change-Id: I460a57dc877ef68ad7dd0864a33b1599b1e9a8d9
Reviewed-on: http://gerrit.cloudera.org:8080/14527
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>