impala/testdata/bad_parquet_data
Gergely Fürnstáhl 71c904e5c2 IMPALA-10948: Default scale and DecimalType
Added default 0 for scale if it is not set to comply with parquet spec.

Wrapped reading scale and precision in a function to support reading
LogicalType.DecimalType if it is set, falling back to old ones if it is
not, for backward compatibility.

Regenerated bad_parquet_decimals table with filled DecimalType, moved
missing scale test, as it is no longer a bad table.

Added no_scale.parquet table to test reading table without set scale.

Checked it with parquet-tools:
message schema {
  optional fixed_len_byte_array(2) d1 (DECIMAL(4,0));
}

Change-Id: I003220b6e2ef39d25d1c33df62c8432803fdc6eb
Reviewed-on: http://gerrit.cloudera.org:8080/18224
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-03-04 16:49:22 +00:00

These Parquet files were created by modifying Impala's HdfsParquetTableWriter.

String Data
-----------
These files have a single nullable string column 's'.

dict-encoded-negative-len.parq: a single dictionary-encoded value with a negative length.
dict-encoded-out-of-bounds.parq: a single dictionary-encoded value whose length extends past the end of the page.
plain-encoded-negative-len.parq: a single plain-encoded value with a negative length.
plain-encoded-out-of-bounds.parq: a single plain-encoded value whose length extends past the end of the page.
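
As an illustration of what these files exercise, here is a minimal sketch of the
length check a reader needs for a plain-encoded BYTE_ARRAY value, which Parquet
stores as a 4-byte little-endian length followed by that many bytes (dictionary
page entries use the same layout). The names LenCheck and CheckPlainStringLen are
hypothetical and are not taken from Impala's scanner code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical result codes for this sketch (not Impala's Status type).
enum class LenCheck { kOk, kNegativeLen, kOutOfBounds };

// Validates the length prefix of one plain-encoded BYTE_ARRAY value.
// 'buf' points at the 4-byte length prefix and 'remaining' is the number
// of bytes left in the page, including the prefix itself.
LenCheck CheckPlainStringLen(const uint8_t* buf, int64_t remaining) {
  assert(buf != nullptr);
  if (remaining < 4) return LenCheck::kOutOfBounds;
  // Assemble the little-endian prefix byte by byte so the result does
  // not depend on host endianness.
  uint32_t raw = static_cast<uint32_t>(buf[0]) |
                 (static_cast<uint32_t>(buf[1]) << 8) |
                 (static_cast<uint32_t>(buf[2]) << 16) |
                 (static_cast<uint32_t>(buf[3]) << 24);
  int32_t len;
  std::memcpy(&len, &raw, sizeof(len));
  if (len < 0) return LenCheck::kNegativeLen;          // *-negative-len.parq
  if (len > remaining - 4) return LenCheck::kOutOfBounds;  // *-out-of-bounds.parq
  return LenCheck::kOk;
}
```

A reader that skips either check could read memory outside the page buffer when
scanning these files.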

Decimal Data
------------
illegal_decimals.parq has 7 decimal columns (d1, d2, ..., d7). Each of them has an illegal schema in the footer.
It was generated by applying the following changes to HdfsParquetTableWriter::WriteFileFooter:

 Status HdfsParquetTableWriter::WriteFileFooter() {
+  file_metadata_.schema[1].__set_precision(0);
+  file_metadata_.schema[1].__isset.precision = false;
+  file_metadata_.schema[1].logicalType.DECIMAL.precision = 0;
+  file_metadata_.schema[1].logicalType.__isset.DECIMAL = false;
+
+  file_metadata_.schema[2].__set_precision(20);
+  file_metadata_.schema[2].logicalType.DECIMAL.precision = 20;
+  file_metadata_.schema[2].logicalType.__isset.DECIMAL = true;
+  file_metadata_.schema[2].__isset.logicalType = true;
+
+  file_metadata_.schema[3].__set_precision(-1);
+  file_metadata_.schema[3].logicalType.DECIMAL.precision = -1;
+  file_metadata_.schema[3].logicalType.__isset.DECIMAL = true;
+  file_metadata_.schema[3].__isset.logicalType = true;
+
+  file_metadata_.schema[4].__set_type(parquet::Type::FIXED_LEN_BYTE_ARRAY);
+  file_metadata_.schema[4].__isset.type_length = false;
+
+  file_metadata_.schema[5].__set_type(parquet::Type::FIXED_LEN_BYTE_ARRAY);
+  file_metadata_.schema[5].__set_type_length(0);
+
+  file_metadata_.schema[6].__set_scale(-1);
+  file_metadata_.schema[6].logicalType.DECIMAL.scale = -1;
+  file_metadata_.schema[6].logicalType.__isset.DECIMAL = true;
+  file_metadata_.schema[6].__isset.logicalType = true;
+
+  file_metadata_.schema[7].__set_scale(4);
+  file_metadata_.schema[7].__set_precision(2);
+  file_metadata_.schema[7].logicalType.DECIMAL.scale = 4;
+  file_metadata_.schema[7].logicalType.DECIMAL.precision = 2;
+  file_metadata_.schema[7].logicalType.__isset.DECIMAL = true;
+  file_metadata_.schema[7].__isset.logicalType = true;

Then create the table and insert one row into it:

create table my_decimal_tbl (d1 decimal(4,2), d2 decimal(4,2), ..., d7 decimal(4,2)) stored as parquet;
insert into my_decimal_tbl values (cast(0 as decimal(4,2)), cast(0 as decimal(4,2)), ...);
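
For reference, the footer constraints that the seven columns violate can be
sketched as a single validation function. This is a simplified illustration based
on the Parquet DECIMAL rules, not Impala's actual checks: the DecimalSchema struct
and the ValidateDecimal and MaxPrecisionForLength functions are hypothetical, and
the error strings are made up. It assumes each column is stored as a
FIXED_LEN_BYTE_ARRAY, and that decimal(4,2) uses a 2-byte array, as the
parquet-tools output for no_scale.parquet suggests.

```cpp
#include <cmath>
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for the decimal-related fields of a
// Parquet SchemaElement. Field names are illustrative only.
struct DecimalSchema {
  bool has_type_length = false;
  int32_t type_length = 0;
  bool has_precision = false;
  int32_t precision = 0;
  bool has_scale = false;
  int32_t scale = 0;
};

// Largest decimal precision that fits in a signed big-endian integer of
// 'type_length' bytes: floor(log10(2^(8n-1) - 1)).
int MaxPrecisionForLength(int32_t type_length) {
  return static_cast<int>(
      std::floor((8.0 * type_length - 1) * std::log10(2.0)));
}

// Returns an empty string if the schema is acceptable, otherwise a
// description of the first violated constraint.
std::string ValidateDecimal(const DecimalSchema& s) {
  if (!s.has_type_length) return "type_length is not set";          // d4
  if (s.type_length <= 0) return "type_length must be positive";    // d5
  if (!s.has_precision) return "precision is not set";              // d1
  if (s.precision <= 0) return "precision must be positive";        // d3
  if (s.precision > MaxPrecisionForLength(s.type_length)) {
    return "precision is too large for type_length";                // d2
  }
  // A missing scale defaults to 0, as described in the Parquet spec.
  const int32_t scale = s.has_scale ? s.scale : 0;
  if (scale < 0) return "scale must be non-negative";               // d6
  if (scale > s.precision) return "scale exceeds precision";        // d7
  return "";
}
```

The comments map each check to the column that would trip it under the stated
assumptions. The messages Impala actually reports for these columns may differ.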