mirror of
https://github.com/apache/impala.git
synced 2026-01-01 18:00:30 -05:00
This patch modifies the Parquet scanner to resolve nested schemas, and read and materialize collection types. The high-level modification is to create a CollectionColumnReader that recursively materializes map- and array-type slots.

This patch also adds many tests, most of which query a new table called complextypestbl. This table contains hand-generated data that is meant to expose edge cases in the scanner. The tests mostly test the scanner, with a few tests of other functionality (e.g. array serialization).

I ran a local benchmark comparing this scanner code to the original scanner code on an expanded version of tpch_parquet.lineitem with 48009720 rows. My benchmark involved selecting different numbers of columns with a single scanner thread, and I looked at the HDFS scan node time in the query profiles. This code introduces a 10%-20% regression in single-threaded scan time.

Change-Id: Id27fb728934e8346444f61752c9278d8010e5f3a
Reviewed-on: http://gerrit.cloudera.org:8080/576
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
32 lines
1.7 KiB
JSON
{"type": "record",
"namespace": "com.cloudera.impala",
"name": "ComplexTypesTbl",
"fields": [
{"name": "id", "type": ["null", "long"]},
{"name": "int_array", "type": ["null", {"type": "array", "items": ["null", "int"]}]},
{"name": "int_array_array", "type": ["null", {"type": "array", "items":
["null", {"type": "array", "items": ["null", "int"]}]}]},
{"name": "int_map", "type": ["null", {"type": "map", "values": ["null", "int"]}]},
{"name": "int_map_array", "type": ["null", {"type": "array", "items":
["null", {"type": "map", "values": ["null", "int"]}]}]},
{"name": "nested_struct", "type":
["null", {"type": "record", "name": "r1", "fields": [
{"name": "a", "type": ["null", "int"]},
{"name": "b", "type": ["null", {"type": "array", "items": ["null", "int"]}]},
{"name": "c", "type": ["null", {"type": "record", "name": "r2", "fields": [
{"name": "d", "type": ["null", {"type": "array", "items":
["null", {"type": "array", "items":
["null", {"type": "record", "name": "r3", "fields": [
{"name": "e", "type": ["null", "int"]},
{"name": "f", "type": ["null", "string"]}]}]}]}]}
]}]},
{"name": "g", "type": ["null", {"type": "map", "values":
["null", {"type": "record", "name": "r4", "fields": [
{"name": "h", "type":
["null", {"type": "record", "name": "r5", "fields": [
{"name": "i", "type": ["null", {"type": "array", "items":
["null", "double"]}]}]}]}]}]}]}
]}]}
]
}
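
The nesting in this schema can be hard to follow by eye, so here is a minimal Python sketch that walks an Avro schema of this shape and lists every leaf column with its primitive type. It is an illustration only, not part of Impala or of this patch: the function name `leaf_paths`, the constant `SCHEMA_EXCERPT`, and the `.item` / `.key` / `.value` path suffixes are assumptions made for the example. To keep it short, it embeds a three-field excerpt of the schema above rather than the whole file, and it assumes (as is true here) that no record is referenced by name after its definition.

```python
import json

# Excerpt of the schema above (three of its fields), shortened for the example.
SCHEMA_EXCERPT = """
{"type": "record", "namespace": "com.cloudera.impala", "name": "ComplexTypesTbl",
 "fields": [
  {"name": "id", "type": ["null", "long"]},
  {"name": "int_array", "type": ["null", {"type": "array", "items": ["null", "int"]}]},
  {"name": "int_map_array", "type": ["null", {"type": "array", "items":
    ["null", {"type": "map", "values": ["null", "int"]}]}]}
 ]}
"""


def leaf_paths(node, path=""):
    """Yield (path, primitive_type) for each leaf under an Avro schema node.

    Hypothetical helper: the path suffixes are an illustrative convention.
    """
    if isinstance(node, list):
        # Union such as ["null", X]: descend into every non-null branch.
        for branch in node:
            if branch != "null":
                yield from leaf_paths(branch, path)
    elif isinstance(node, str):
        # A bare string is treated as a primitive type name.
        yield path, node
    elif node["type"] == "record":
        for field in node["fields"]:
            child = f"{path}.{field['name']}" if path else field["name"]
            yield from leaf_paths(field["type"], child)
    elif node["type"] == "array":
        yield from leaf_paths(node["items"], path + ".item")
    elif node["type"] == "map":
        # Avro map keys are always strings; only the values are declared.
        yield path + ".key", "string"
        yield from leaf_paths(node["values"], path + ".value")
    else:
        # Primitive written in object form, e.g. {"type": "int"}.
        yield path, node["type"]


if __name__ == "__main__":
    for column, avro_type in leaf_paths(json.loads(SCHEMA_EXCERPT)):
        print(f"{column}: {avro_type}")
```

Running the same function on the full schema file (loaded with `json.load`) would also list the deeper leaves under `nested_struct`, such as the `e` and `f` fields of record `r3`.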