The two Parquet files (nullable.parq and nonnullable_orc.parq) were generated as
described in testdata/data/schemas/nested/README.

The two ORC files (nullable.orc and nonnullable.orc) were generated with orc-tools,
which can convert JSON files into ORC format. However, nullable.json and
nonnullable.json must first be modified to match the input format it requires: the
file must not be a single JSON array, but one JSON object per row, joined by '\n'.
Assuming the modified JSON files are named nullable_orc.json and nonnullable_orc.json,
the ORC files can be regenerated by running the following commands in the current
directory:

  wget "https://search.maven.org/remotecontent?filepath=org/apache/orc/orc-tools/1.5.4/orc-tools-1.5.4-uber.jar" \
    -O orc-tools-1.5.4-uber.jar
  java -jar orc-tools-1.5.4-uber.jar convert \
    -s "struct,int_array_Array:array>,int_map:map,int_Map_Array:array>,nested_struct:struct,C:struct>>>,g:map>>>>>" \
    -o nullable.orc \
    nullable_orc.json
  java -jar orc-tools-1.5.4-uber.jar convert \
    -s "struct,int_array_array:array>,Int_Map:map,int_map_array:array>,nested_Struct:struct,c:struct>>>,G:map>>>>>" \
    -o nonnullable.orc \
    nonnullable_orc.json
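
The JSON modification described above (turning a whole-file array into one object per row, joined by '\n') could be scripted rather than done by hand. The following is a minimal sketch, not the procedure originally used to produce the files. It assumes that nullable.json and nonnullable.json each contain a single top-level JSON array of row objects. The function name array_to_json_lines is hypothetical.

```python
import json


def array_to_json_lines(src_path, dst_path):
    """Rewrite a top-level JSON array as one JSON object per line.

    Assumption: src_path holds a single JSON array whose elements are the rows.
    Returns the number of rows written.
    """
    with open(src_path) as src:
        rows = json.load(src)
    if not isinstance(rows, list):
        raise ValueError(f"{src_path}: expected a top-level JSON array")

    # json.dumps without `indent` never emits a raw newline, so each row
    # occupies exactly one line; rows are then joined by '\n'.
    lines = [json.dumps(row, separators=(",", ":")) for row in rows]
    with open(dst_path, "w") as dst:
        dst.write("\n".join(lines))
    return len(rows)
```

Under that assumption, calling array_to_json_lines("nullable.json", "nullable_orc.json") and array_to_json_lines("nonnullable.json", "nonnullable_orc.json") in the current directory would produce the two input files named in the commands above.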