bad_parquet_data.parquet:
Generated with parquet-mr 1.2.5
Contains 3 single-column rows:
"parquet"
"is"
"fun"

repeated_values.parquet:
Generated with parquet-mr 1.2.5
Contains 3 single-column rows:
"parquet"
"parquet"
"parquet"

multiple_rowgroups.parquet:
Generated with parquet-mr 1.2.5
Populated with:
hive> set parquet.block.size=500;
hive> INSERT INTO TABLE tbl SELECT l_comment FROM tpch.lineitem LIMIT 1000;

alltypesagg_hive_13_1.parquet:
Generated with parquet-mr version 1.5.0-cdh5.4.0-SNAPSHOT
hive> create table alltypesagg_hive_13_1 stored as parquet as select * from alltypesagg;

bad_column_metadata.parquet:
Generated with hacked version of parquet-mr 1.8.2-SNAPSHOT
Schema:
 {"type": "record",
  "namespace": "com.cloudera.impala",
  "name": "bad_column_metadata",
  "fields": [
      {"name": "id", "type": ["null", "long"]},
      {"name": "int_array", "type": ["null", {"type": "array", "items": ["null", "int"]}]}
  ]
 }
Contains 3 row groups, each with ten rows and each array containing ten
elements. The first rowgroup column metadata for 'int_array' incorrectly states
there are 50 values (instead of 100), and the second rowgroup column metadata
for 'id' incorrectly states there are 11 values (instead of 10). The third
rowgroup has the correct metadata.

data-bzip2.bz2:
Generated with bzip2, contains single bzip2 stream
Contains 1 column, uncompressed data size < 8M

large_bzip2.bz2:
Generated with bzip2, contains single bzip2 stream
Contains 1 column, uncompressed data size > 8M

data-pbzip2.bz2:
Generated with pbzip2, contains multiple bzip2 streams
Contains 1 column, uncompressed data size < 8M

large_pbzip2.bz2:
Generated with pbzip2, contains multiple bzip2 streams
Contains 1 column, uncompressed data size > 8M
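
The single-stream versus multiple-stream distinction in the bzip2 files above
can be checked with a short script. The following is a minimal sketch using only
Python's standard bz2 module; count_bzip2_streams is a hypothetical helper
introduced here for illustration, not part of any existing tooling, and it
reads the whole input into memory. The sample inputs are built in memory
rather than read from the files listed above.

```python
import bz2


def count_bzip2_streams(data: bytes) -> int:
    """Return the number of concatenated bzip2 streams in `data`.

    Hypothetical helper: each BZ2Decompressor handles exactly one stream,
    and any bytes after that stream's end are exposed as `unused_data`.
    """
    streams = 0
    while data:
        decompressor = bz2.BZ2Decompressor()
        decompressor.decompress(data)  # raises OSError on invalid data
        if not decompressor.eof:
            raise ValueError("truncated bzip2 stream")
        streams += 1
        # Continue with whatever follows the stream that just ended.
        data = decompressor.unused_data
    return streams


# In-memory stand-ins: one stream (like bzip2 output) and two concatenated
# streams (like pbzip2 output, which writes one stream per block).
single = bz2.compress(b"a\n" * 1000)
multi = bz2.compress(b"a\n" * 1000) + bz2.compress(b"b\n" * 1000)

print(count_bzip2_streams(single), count_bzip2_streams(multi))
```

To inspect a real file, pass its contents instead, for example
count_bzip2_streams(open("data-pbzip2.bz2", "rb").read()).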