impala/testdata/ComplexTypesTbl/nonnullable.json at 88448d1d4ab31eaaf82f764b36dc7d11d4c63c32

jprdonnelly/impala

mirror of https://github.com/apache/impala.git synced 2025-12-30 12:02:10 -05:00


Skye Wanderman-Milne bcc73a36da Nested types: read and materialize nested types in Parquet scanner

This patch modifies the Parquet scanner to resolve nested schemas, and
read and materialize collection types. The high-level modification is
to create a CollectionColumnReader that recursively materializes map-
and array-type slots.
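Parquet stores nested collections as flat leaf columns in which each value carries a repetition level and a definition level (the Dremel-style encoding), and a collection reader rebuilds arrays from those levels. What follows is a minimal Python sketch of that general scheme for a list of lists of non-null values, such as `int_array_array` in this fixture. It is not Impala's C++ CollectionColumnReader: `shred` and `assemble` are hypothetical names, and the sketch assumes no nullable fields, whereas real Parquet definition levels also count optional fields.

```python
from typing import Any, List, Optional, Tuple

# One leaf entry: (value, repetition level, definition level).
# Simplifying assumption: only lists exist (no optional fields), so the
# definition level is simply how many list levels have an element here.
Entry = Tuple[Optional[Any], int, int]


def shred(record: list, max_depth: int) -> List[Entry]:
    """Flatten one nested list (max_depth list levels) into leaf entries."""
    out: List[Entry] = []

    def walk(node: list, depth: int, rep: int) -> None:
        if not node:
            # Empty list at `depth`: only depth - 1 levels are defined.
            out.append((None, rep, depth - 1))
            return
        for i, child in enumerate(node):
            # The first element inherits the caller's repetition level;
            # later elements repeat at this depth.
            r = rep if i == 0 else depth
            if depth == max_depth:
                out.append((child, r, max_depth))
            else:
                walk(child, depth + 1, r)

    walk(record, 1, 0)
    return out


def assemble(entries: List[Entry], max_depth: int) -> List[list]:
    """Rebuild one nested list per record from leaf entries."""
    records: List[list] = []
    for value, rep, defn in entries:
        if rep == 0:
            records.append([])  # repetition level 0 starts a new record
        node = records[-1]
        for depth in range(1, max_depth + 1):
            if defn < depth:
                break  # the list at this depth has no element here
            if depth == max_depth:
                node.append(value)  # innermost level holds the values
            elif rep <= depth:
                node.append([])  # start a new element at this depth
                node = node[-1]
            else:
                node = node[-1]  # keep filling the current element
    return records


value = [[-1, -2], []]  # int_array_array from nonnullable.json
entries = shred(value, max_depth=2)
print(entries)                # [(-1, 0, 2), (-2, 2, 2), (None, 1, 1)]
print(assemble(entries, 2))   # [[[-1, -2], []]]
```

The empty inner list produces an entry with no value but a lower definition level; that is how the encoding distinguishes `[[-1, -2], []]` from `[[-1, -2]]`, and it is exactly the kind of case a recursive collection reader has to get right.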

This patch also adds many tests, most of which query a new table
called complextypestbl. This table contains hand-generated data that
is meant to expose edge cases in the scanner. The tests mostly test
the scanner, with a few tests of other functionality (e.g. array
serialization).

I ran a local benchmark comparing this scanner code to the original
scanner code on an expanded version of tpch_parquet.lineitem with
48009720 rows. My benchmark involved selecting different numbers of
columns with a single scanner thread, and I looked at the HDFS scan
node time in the query profiles. This code introduces a 10%-20%
regression in single-threaded scan time.

Change-Id: Id27fb728934e8346444f61752c9278d8010e5f3a
Reviewed-on: http://gerrit.cloudera.org:8080/576
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins

2015-09-02 19:23:54 +00:00

JSON, 15 lines, 258 B


 [
 {"id": 8,
  "int_array": [-1],
  "int_array_array": [[-1,-2],[]],
  "int_map": {"k1": -1},
  "int_map_array": [{}, {"k1": 1}, {}, {}],
  "nested_struct": {
    "a": -1,
    "b": [-1],
    "c": {
      "d": [
        [{"e": -1, "f": "nonnullable"}]]},
    "g": {}}}
 ]
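This record packs several of the edge cases the commit message mentions, mostly empty collections at different nesting depths. A small recursive walk can list them; the helper `find_empty_collections` is hypothetical and written only for illustration. It treats JSON objects and arrays generically, so it does not know which objects the table schema declares as maps and which as structs.

```python
import json

# The fixture record, copied from nonnullable.json.
FIXTURE = """
[
{"id": 8,
 "int_array": [-1],
 "int_array_array": [[-1,-2],[]],
 "int_map": {"k1": -1},
 "int_map_array": [{}, {"k1": 1}, {}, {}],
 "nested_struct": {
   "a": -1,
   "b": [-1],
   "c": {
     "d": [
       [{"e": -1, "f": "nonnullable"}]]},
   "g": {}}}
]
"""


def find_empty_collections(node, path=""):
    """Return the path of every empty list or object nested inside `node`."""
    empties = []
    if isinstance(node, dict):
        if not node:
            empties.append(path)
        for key, child in node.items():
            child_path = f"{path}.{key}" if path else key
            empties.extend(find_empty_collections(child, child_path))
    elif isinstance(node, list):
        if not node:
            empties.append(path)
        for i, child in enumerate(node):
            empties.extend(find_empty_collections(child, f"{path}[{i}]"))
    return empties


records = json.loads(FIXTURE)
for record in records:
    # prints: 8 ['int_array_array[1]', 'int_map_array[0]', 'int_map_array[2]',
    #            'int_map_array[3]', 'nested_struct.g']
    print(record["id"], find_empty_collections(record))
```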