impala/testdata/ComplexTypesTbl/nonnullable.json
Skye Wanderman-Milne bcc73a36da Nested types: read and materialize nested types in Parquet scanner
This patch modifies the Parquet scanner to resolve nested schemas, and
read and materialize collection types. The high-level modification is
to create a CollectionColumnReader that recursively materializes map-
and array-type slots.

This patch also adds many tests, most of which query a new table
called complextypestbl. This table contains hand-generated data that
is meant to expose edge cases in the scanner. The tests mostly test
the scanner, with a few tests of other functionality (e.g. array
serialization).

I ran a local benchmark comparing this scanner code to the original
scanner code on an expanded version of tpch_parquet.lineitem with
48009720 rows. My benchmark involved selecting different numbers of
columns with a single scanner thread, and I looked at the HDFS scan
node time in the query profiles. This code introduces a 10%-20%
regression in single-threaded scan time.

Change-Id: Id27fb728934e8346444f61752c9278d8010e5f3a
Reviewed-on: http://gerrit.cloudera.org:8080/576
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-02 19:23:54 +00:00

15 lines
258 B
JSON

[
{"id": 8,
 "int_array": [-1],
 "int_array_array": [[-1,-2],[]],
 "int_map": {"k1": -1},
 "int_map_array": [{}, {"k1": 1}, {}, {}],
 "nested_struct": {
   "a": -1,
   "b": [-1],
   "c": {
     "d": [
       [{"e": -1, "f": "nonnullable"}]]},
   "g": {}}}
]
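
The record above packs several edge cases into one row: empty arrays, empty maps, an array of arrays, and a struct nested four levels deep. The following Python sketch (not part of the Impala code base) walks the row recursively and lists every leaf with its path, which gives a rough picture of what a recursive collection reader has to visit. The `flatten` helper and the path syntax are hypothetical, chosen for illustration only. JSON cannot distinguish a map from a struct, so both are treated as objects here.

```python
import json

# The single row from nonnullable.json, embedded so the sketch is self-contained.
RECORD_JSON = """
[
{"id": 8,
 "int_array": [-1],
 "int_array_array": [[-1,-2],[]],
 "int_map": {"k1": -1},
 "int_map_array": [{}, {"k1": 1}, {}, {}],
 "nested_struct": {
   "a": -1,
   "b": [-1],
   "c": {"d": [[{"e": -1, "f": "nonnullable"}]]},
   "g": {}}}
]
"""


def flatten(value, path=""):
    """Yield (path, leaf) pairs by recursing into objects and arrays.

    Empty collections are yielded as leaves so that they stay visible.
    """
    if isinstance(value, dict):
        if not value:
            yield path, {}
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        if not value:
            yield path, []
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield path, value


rows = json.loads(RECORD_JSON)
leaves = dict(flatten(rows[0]))

for leaf_path, leaf in leaves.items():
    print(f"{leaf_path} = {leaf!r}")
```

Running the sketch lists entries such as `int_array_array[1]` (an empty inner array) and `nested_struct.c.d[0][0].f`, the kinds of edge cases the commit message says this table is meant to expose.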