impala/testdata/workloads/functional-query/queries/QueryTest/avro-schema-resolution.test
Alex Behm 6f0b255c5a Address several shortcomings with respect to the usability of Avro tables.
Addressed JIRAs: IMPALA-1947 and IMPALA-1813

New Feature:
Adds support for creating an Avro table without an explicit
Avro schema, using the following syntax:

CREATE TABLE <table_name> column_defs STORED AS AVRO
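
When no explicit Avro schema is given, an Avro schema presumably has to be derived from the column definitions. The Python sketch below shows one plausible mapping and is not Impala's implementation. The function names are hypothetical. Widening TINYINT/SMALLINT to Avro int and wrapping every field in a ["null", type] union are assumptions. Only the CHAR/VARCHAR and TIMESTAMP to STRING mappings come from this message.

```python
import json

# Hypothetical mapping from Impala column types to Avro primitive types.
# CHAR/VARCHAR and TIMESTAMP -> "string" follows this commit message;
# TINYINT/SMALLINT -> "int" is an assumed Hive-style widening.
_AVRO_TYPE = {
    "BOOLEAN": "boolean",
    "TINYINT": "int",
    "SMALLINT": "int",
    "INT": "int",
    "BIGINT": "long",
    "FLOAT": "float",
    "DOUBLE": "double",
    "STRING": "string",
    "CHAR": "string",
    "VARCHAR": "string",
    "TIMESTAMP": "string",
}


def base_type(col_type):
    """Strip length parameters and normalize case: 'varchar(10)' -> 'VARCHAR'."""
    return col_type.split("(", 1)[0].strip().upper()


def column_defs_to_avro_schema(table_name, column_defs):
    """Build an Avro record schema (as a dict) from (name, type) pairs.

    Every field is made nullable via a ["null", type] union; this is an
    assumption of the sketch, not confirmed Impala behavior.
    """
    fields = []
    for name, col_type in column_defs:
        avro_type = _AVRO_TYPE.get(base_type(col_type))
        if avro_type is None:
            raise ValueError(f"type not handled by this sketch: {col_type}")
        fields.append({"name": name, "type": ["null", avro_type]})
    return {"type": "record", "name": table_name, "fields": fields}


schema = column_defs_to_avro_schema(
    "t", [("id", "INT"), ("c", "VARCHAR(10)"), ("ts", "TIMESTAMP")]
)
print(json.dumps(schema))
```

Types outside the table (for example DECIMAL or complex types) are simply rejected here to keep the sketch short.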

Fixes and Improvements:
This patch fixes and unifies the logic for reconciling differences between
an Avro table's Avro schema and its column definitions. This reconciliation
logic runs both during Impala's CREATE TABLE and when a table's metadata is
loaded. Impala generally performs the schema reconciliation during table
creation, but Hive does not. In many cases, Hive's CREATE TABLE stores the
original column definitions in the HMS (in the StorageDescriptor) instead
of the reconciled column definitions.

The reconciliation logic considers the field/column names and follows a
conflict resolution policy similar to Hive's:

Mismatched number of columns -> Prefer Avro columns.
Mismatched name/type -> Prefer Avro column, except:
  A CHAR/VARCHAR column definition maps to an Avro STRING, and is preserved
  as a CHAR/VARCHAR in the reconciled schema.

Behavior for TIMESTAMP:
A TIMESTAMP column definition maps to an Avro STRING and is presented as a STRING
in the reconciled schema, because Avro has no binary TIMESTAMP representation.
As a result, no Avro table may have a TIMESTAMP column (existing behavior).
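
The conflict resolution policy and the TIMESTAMP behavior can be sketched in Python as follows. This is a hypothetical illustration rather than Impala's code: columns are modeled as (name, type) pairs, Avro fields are assumed to be already converted to Impala types (an Avro string appears as "STRING"), and pairing columns by position with the name taken from the Avro schema is an assumption of the sketch.

```python
from typing import List, Tuple

# A column is a (name, type) pair, e.g. ("code", "CHAR(5)").
# Avro fields are assumed to be pre-converted to Impala types.
Column = Tuple[str, str]


def _base_type(col_type: str) -> str:
    """Strip length parameters and normalize case: 'char(5)' -> 'CHAR'."""
    return col_type.split("(", 1)[0].strip().upper()


def reconcile(col_defs: List[Column], avro_cols: List[Column]) -> List[Column]:
    """Hypothetical sketch of the Avro/column-definition reconciliation policy."""
    # Mismatched number of columns: prefer the Avro columns wholesale.
    if len(col_defs) != len(avro_cols):
        return list(avro_cols)

    result: List[Column] = []
    # Pair columns by position (an assumption of this sketch).
    for (_, def_type), (avro_name, avro_type) in zip(col_defs, avro_cols):
        if (_base_type(def_type) in ("CHAR", "VARCHAR")
                and _base_type(avro_type) == "STRING"):
            # A CHAR/VARCHAR backed by an Avro STRING keeps its declared
            # type; the column name still comes from the Avro schema.
            result.append((avro_name, def_type))
        else:
            # Otherwise take the Avro column. This is a no-op when name and
            # type already match, and it also covers TIMESTAMP: the Avro
            # STRING wins, so the column surfaces as STRING.
            result.append((avro_name, avro_type))
    return result


print(reconcile(
    [("id", "INT"), ("code", "CHAR(5)"), ("ts", "TIMESTAMP")],
    [("id", "INT"), ("code", "STRING"), ("ts", "STRING")],
))
# [('id', 'INT'), ('code', 'CHAR(5)'), ('ts', 'STRING')]
```

TIMESTAMP needs no special case in this sketch: because its column definition is not CHAR/VARCHAR, the Avro STRING column is preferred, which matches the behavior described above.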

Change-Id: I8457354568b6049b2dd2794b65fadc06e619d648
Reviewed-on: http://gerrit.cloudera.org:8080/550
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-08-25 09:52:18 +00:00

====
---- QUERY
# see testdata/avro_schema_resolution
select * from schema_resolution_test
---- TYPES
boolean, int, long, float, double, string, string, string
---- RESULTS
true,1,1,1,1,'default string','','NULL'
false,2,2,2,2,'serialized string','','NULL'
====
---- QUERY
# IMPALA-1136: Tests that Impala can read Hive-created Avro tables that have
# no specified Avro schema, i.e., the Avro schema is inferred from the column
# definitions.
# IMPALA-1947: During table loading, a TIMESTAMP column definition becomes a
# STRING column backed by a stored Avro STRING.
# See testdata/avro_schema_resolution
select * from no_avro_schema where year = 2009 order by id limit 1
union all
select * from no_avro_schema where year = 2010 order by id limit 1
---- TYPES
int, boolean, int, int, int, bigint, float, double, string, string, string, int, int
---- RESULTS: VERIFY_IS_EQUAL_SORTED
2430,true,0,0,0,0,0,0,'09/01/09','0','2009-09-01 00:00:00',2009,9
6380,true,0,0,0,0,0,0,'10/01/10','0','2010-10-01 00:00:00',2010,10
====