impala

mirror of https://github.com/apache/impala.git synced 2025-12-25 02:03:09 -05:00

Joe McDonnell 0163a10332 IMPALA-9068: Use different directories for external vs managed warehouse

Hive 3 changed the typical storage model for tables to split them
between two directories:
 - hive.metastore.warehouse.dir stores managed tables (which is now
   defined to be only transactional tables)
 - hive.metastore.warehouse.external.dir stores external tables
   (everything that is not a transactional table)
In more recent commits of Hive, there is now validation that the
external tables cannot be stored in the managed directory. In order
to adopt these newer versions of Hive, we need to use separate
directories for external vs managed warehouses.

Most of our test tables are not transactional, so they would reside
in the external directory. To keep the test changes small, this uses
/test-warehouse for the external directory and /test-warehouse/managed
for the managed directory. Having the managed directory be a subdirectory
of /test-warehouse means that the data snapshot code should not need to
change.
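
The layout described above can be illustrated with a minimal Python sketch. It is not Impala's dataload code or Hive's validation logic: the helper names (`default_table_location`, `is_under`, `check_external_location`) are hypothetical, and the sketch ignores per-database subdirectories. Only the two directory paths come from this commit message.

```python
import posixpath

# Directory layout taken from the commit message above.
EXTERNAL_WAREHOUSE_DIR = "/test-warehouse"
MANAGED_WAREHOUSE_DIR = "/test-warehouse/managed"


def default_table_location(table_name: str, is_transactional: bool) -> str:
    """Pick the warehouse directory for a table (hypothetical helper).

    Transactional tables go to the managed directory; everything else
    is treated as external.
    """
    base = MANAGED_WAREHOUSE_DIR if is_transactional else EXTERNAL_WAREHOUSE_DIR
    return posixpath.join(base, table_name)


def is_under(path: str, directory: str) -> bool:
    """Return True if `path` is `directory` itself or lies below it."""
    path = posixpath.normpath(path)
    directory = posixpath.normpath(directory)
    return path == directory or path.startswith(directory + "/")


def check_external_location(location: str) -> None:
    """Reject an external table that is stored in the managed directory.

    This only mimics the kind of validation the commit message mentions;
    the real check lives in Hive and may differ in detail.
    """
    if is_under(location, MANAGED_WAREHOUSE_DIR):
        raise ValueError(
            f"external table location {location!r} is inside the managed "
            f"warehouse directory {MANAGED_WAREHOUSE_DIR!r}"
        )
```

Because the managed directory is nested inside the external one, a plain "is this path under /test-warehouse" test is true for both kinds of table, which is why the sketch checks against the managed directory specifically.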

The Hive 2 configuration doesn't change as it does not have this concept.

Since this changes the dataload layout, this also sets the CDH_MAJOR_VERSION
to 7 for USE_CDP_HIVE=true. This means that dataload will use a separate
location for data as compared to USE_CDP_HIVE=false. That should reduce
conflicts between the two configurations.

Testing:
 - Ran exhaustive tests with USE_CDP_HIVE=false
 - Ran exhaustive tests with USE_CDP_HIVE=true (with current Hive version)
 - Verified that dataload succeeds and tests are able to run with a newer
   Hive version.

Change-Id: I3db69f1b8ca07ae98670429954f5f7a1a359eaec
Reviewed-on: http://gerrit.cloudera.org:8080/15026
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

2020-01-24 17:29:15 +00:00

File               Last commit                                                                       Date
create_table.sql   IMPALA-9068: Use different directories for external vs managed warehouse          2020-01-24 17:29:15 +00:00
file_schema1.avsc  IMPALA-1149: read bytes fields as strings in HdfsAvroScanner::MaterializeTuple()  2014-08-18 20:17:05 -07:00
file_schema2.avsc  IMPALA-8198: DATE: Read from avro.                                                2019-09-27 17:18:35 +00:00
README             IMPALA-441: support default values for Avro tables                                2014-01-08 10:51:39 -08:00
records1.avro      IMPALA-1149: read bytes fields as strings in HdfsAvroScanner::MaterializeTuple()  2014-08-18 20:17:05 -07:00
records1.json      IMPALA-1149: read bytes fields as strings in HdfsAvroScanner::MaterializeTuple()  2014-08-18 20:17:05 -07:00
records2.avro      IMPALA-8198: DATE: Read from avro.                                                2019-09-27 17:18:35 +00:00
records2.json      IMPALA-8198: DATE: Read from avro.                                                2019-09-27 17:18:35 +00:00

README

This folder contains the files necessary to test Impala support for Avro schema resolution
(along with the TestAvroSchemaResolution query test).

create_table.sql creates a functional_avro_snap.schema_resolution_test table and loads
records1.avro and records2.avro. The .avro files were created via the following commands:

java -jar ~/avro-tools-1.7.4.jar fromjson --schema-file file_schema1.avsc --codec snappy records1.json > records1.avro
java -jar ~/avro-tools-1.7.4.jar fromjson --schema-file file_schema2.avsc --codec snappy records2.json > records2.avro

create_table.sql, file_schema1.avsc and file_schema2.avsc contain the relevant schema definitions.
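
For readers unfamiliar with Avro schema resolution, the following Python sketch shows one part of it: matching record fields by name and filling in reader-side defaults. It is an illustration only, not Impala's scanner code and not the contents of the .avsc files in this folder. The function name `resolve_record` and the field names in the usage example are hypothetical, and type promotion and aliases are deliberately left out.

```python
from typing import Any


def resolve_record(
    writer_record: dict[str, Any], reader_fields: list[dict[str, Any]]
) -> dict[str, Any]:
    """Project a record written with one schema onto a reader schema.

    `writer_record` maps field names to the values stored in the file.
    `reader_fields` is the "fields" list of the reader's record schema.
    """
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_record:
            # Field exists in both schemas: take the stored value.
            resolved[name] = writer_record[name]
        elif "default" in field:
            # Field is missing from the file: use the reader's default.
            resolved[name] = field["default"]
        else:
            # Missing from the file and no default: resolution fails.
            raise ValueError(f"no value or default for field {name!r}")
    # Fields present only in the writer record are ignored.
    return resolved
```

A possible use, with made-up fields:

```python
reader_fields = [
    {"name": "id", "type": "int"},
    {"name": "label", "type": "string", "default": "n/a"},
]
row = resolve_record({"id": 1, "extra": True}, reader_fields)
```

Here `row` keeps `id` from the file, takes `label` from the reader's default, and drops `extra`.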