impala

mirror of https://github.com/apache/impala.git synced 2026-01-04 09:00:56 -05:00

Files

Skye Wanderman-Milne 9b51b2b6e6 IMPALA-2835: introduce PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

This patch introduces a new query option,
PARQUET_FALLBACK_SCHEMA_RESOLUTION, which allows Parquet files' schemas
to be resolved by either name or position. It's "fallback" because
eventually field IDs will be the primary schema resolution scheme, and
we don't want to create an option whose name we will have to change
later. The default is still by position. I chose a query option
because it makes testing easier and makes it easier to diagnose
resolution problems quickly in the field. If users want to switch the
default behavior to be by name (like Hive), they can use the
--default_query_options flag.

This patch also introduces a new test section, SHELL, which can be
used to execute shell commands in a .test file. This is useful for
copying files into test tables.

Change-Id: Id0c715ea23792b2a6872610839a40532aabbb5a6
Reviewed-on: http://gerrit.cloudera.org:8080/2384
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins

2016-04-02 04:04:25 +00:00

README

switched_map.avsc

switched_map.json

switched_map.parq

README

switched_map.parq was generated by modifying parquet-mr to switch the order of the key and
value fields of a map, and then converting switched_map.json to Parquet using
switched_map.avsc as the schema. According to parquet-tools, switched_map.parq has the
following schema:

message com.cloudera.impala.switched_map {
  required group int_map (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required int32 value;
      required binary key (UTF8);
    }
  }
}
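
The schema above stores value before key, which is what makes this file useful for
telling by-name resolution apart from by-position resolution. The following is a
minimal Python sketch of that difference, not Impala's actual implementation. The
names FILE_FIELDS, TABLE_FIELDS, and resolve are hypothetical, and the assumption
that a reader expects key first and value second is made only for illustration.

```python
from typing import Dict, List, Optional

# Field order as physically stored in switched_map.parq (value first, then key).
FILE_FIELDS: List[str] = ["value", "key"]

# Field order a reader is assumed to expect for a map entry (illustrative assumption).
TABLE_FIELDS: List[str] = ["key", "value"]


def resolve(table_fields: List[str], file_fields: List[str],
            mode: str) -> Dict[str, Optional[int]]:
    """Map each table field to an index into file_fields, or None if unresolved."""
    mapping: Dict[str, Optional[int]] = {}
    for pos, name in enumerate(table_fields):
        if mode == "position":
            # By position: the i-th table field reads the i-th file field.
            mapping[name] = pos if pos < len(file_fields) else None
        elif mode == "name":
            # By name: look the table field up among the file's field names.
            mapping[name] = file_fields.index(name) if name in file_fields else None
        else:
            raise ValueError(f"unknown resolution mode: {mode}")
    return mapping


if __name__ == "__main__":
    for mode in ("position", "name"):
        print(mode, resolve(TABLE_FIELDS, FILE_FIELDS, mode))
```

In this toy model, by-position resolution pairs the table's key with file field 0,
which is the int32 value column, while by-name resolution pairs key with file
field 1, the UTF8 key column.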