mirror of https://github.com/apache/impala.git
synced 2026-01-05 21:00:54 -05:00
Addressed JIRAs: IMPALA-1947 and IMPALA-1813

New Feature:
Adds support for creating an Avro table without an explicit Avro schema with the
following syntax:
CREATE TABLE <table_name> column_defs STORED AS AVRO

Fixes and Improvements:
This patch fixes and unifies the logic for reconciling differences between an Avro
table's Avro Schema and its column definitions. This reconciliation logic is executed
during Impala's CREATE TABLE and when loading a table's metadata. Impala generally
performs the schema reconciliation during table creation, but Hive does not. In many
cases, Hive's CREATE TABLE stores the original column definitions in the HMS (in the
StorageDescriptor) instead of the reconciled column definitions.

The reconciliation logic considers the field/column names and follows this conflict
resolution policy which is similar to Hive's:

Mismatched number of columns -> Prefer Avro columns.
Mismatched name/type -> Prefer Avro column, except:
  A CHAR/VARCHAR column definition maps to an Avro STRING, and is preserved
  as a CHAR/VARCHAR in the reconciled schema.

Behavior for TIMESTAMP:
A TIMESTAMP column definition maps to an Avro STRING and is presented as a STRING
in the reconciled schema, because Avro has no binary TIMESTAMP representation.
As a result, no Avro table may have a TIMESTAMP column (existing behavior).

Change-Id: I8457354568b6049b2dd2794b65fadc06e619d648
Reviewed-on: http://gerrit.cloudera.org:8080/550
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
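
The conflict resolution policy in the commit message can be sketched in a few lines of Python. This is an illustrative sketch only, not Impala's implementation: `Column`, `is_char_or_varchar`, and `reconcile` are hypothetical names, types are modeled as plain strings, and taking the reconciled column name from the Avro field in the CHAR/VARCHAR case is an assumption.

```python
from typing import List, NamedTuple


class Column(NamedTuple):
    """Hypothetical stand-in for a column: a name plus a type name string."""
    name: str
    type: str  # e.g. "INT", "STRING", "VARCHAR(10)"


def is_char_or_varchar(type_name: str) -> bool:
    # Strip any length suffix such as "(10)" before comparing.
    base = type_name.split("(", 1)[0].strip().upper()
    return base in ("CHAR", "VARCHAR")


def reconcile(col_defs: List[Column], avro_cols: List[Column]) -> List[Column]:
    """Reconcile column definitions against the columns derived from the Avro schema."""
    # Mismatched number of columns -> prefer the Avro columns.
    if len(col_defs) != len(avro_cols):
        return list(avro_cols)

    result = []
    for col_def, avro_col in zip(col_defs, avro_cols):
        if is_char_or_varchar(col_def.type) and avro_col.type.upper() == "STRING":
            # Exception: a CHAR/VARCHAR definition backed by an Avro STRING
            # keeps its CHAR/VARCHAR type. Using the Avro name is an assumption.
            result.append(Column(avro_col.name, col_def.type))
        else:
            # Any other name/type mismatch (or a match) -> use the Avro column.
            result.append(avro_col)
    return result


if __name__ == "__main__":
    col_defs = [Column("id", "INT"), Column("name", "VARCHAR(10)"), Column("ts", "TIMESTAMP")]
    avro_cols = [Column("id", "BIGINT"), Column("name", "STRING"), Column("ts", "STRING")]
    for col in reconcile(col_defs, avro_cols):
        print(col.name, col.type)
```

In this sketch a TIMESTAMP column definition needs no special case: its Avro counterpart is already a STRING, so the generic "prefer Avro" branch yields the STRING column that the test cases below expect.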
27 lines
1.0 KiB
Plaintext
====
---- QUERY
# see testdata/avro_schema_resolution
select * from schema_resolution_test
---- TYPES
boolean, int, long, float, double, string, string, string
---- RESULTS
true,1,1,1,1,'default string','','NULL'
false,2,2,2,2,'serialized string','','NULL'
====
---- QUERY
# IMPALA-1136: Tests that Impala can read Hive-created Avro tables that have
# no specified Avro schema, i.e., the Avro schema is inferred from the column
# definitions.
# IMPALA-1947: A TIMESTAMP from the column definitions results in a STRING column
# backed by a stored Avro STRING during table loading.
# See testdata/avro_schema_resolution
select * from no_avro_schema where year = 2009 order by id limit 1
union all
select * from no_avro_schema where year = 2010 order by id limit 1
---- TYPES
int, boolean, int, int, int, bigint, float, double, string, string, string, int, int
---- RESULTS: VERIFY_IS_EQUAL_SORTED
2430,true,0,0,0,0,0,0,'09/01/09','0','2009-09-01 00:00:00',2009,9
6380,true,0,0,0,0,0,0,'10/01/10','0','2010-10-01 00:00:00',2010,10
====