impala/testdata/workloads/functional-query/queries/QueryTest/parquet-abort-on-error.test
stiga-huang 599c84b4dd IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas
The previous patch added checks on illegal decimal schemas of parquet
files. However, it doesn't return a non-ok status in
ParquetMetadataUtils::ValidateColumn if abort_on_error is set to false.
So we continue to use the illegal file schema and hit the DCHECK.

This patch fixes this and adds test coverage for illegal decimal
schemas.

Tests:
 - Add a bad parquet file with illegal decimal schemas.
 - Add e2e tests on the file.
 - Ran test_fuzz_decimal_tbl 100 times and confirmed that the errors are
   caught as expected.

Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Reviewed-on: http://gerrit.cloudera.org:8080/17748
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-08-05 07:55:26 +00:00

====
---- QUERY
# IMPALA-2558: Trigger bad parse_status_ in HdfsParquetScanner::AssembleRows().
# Abort on error must be used to trigger a status.
# Set a single node and scanner thread to make this test deterministic.
set num_nodes=1;
set num_scanner_threads=1;
select id, cnt from bad_column_metadata t, (select count(*) cnt from t.int_array) v
---- CATCH
Column metadata states there are 50 values, but read 100 values from column element.
====
---- QUERY
# IMPALA-2558. Same as above but only selecting a scalar column.
set num_nodes=1;
set num_scanner_threads=1;
select id from bad_column_metadata
---- CATCH
Column metadata states there are 11 values, but read 10 values from column id.
====
---- QUERY
# IMPALA-10808, IMPALA-10814: Check illegal decimal file schemas
select d1 from bad_parquet_decimals
---- CATCH
File '$NAMENODE/test-warehouse/bad_parquet_decimals_parquet/illegal_decimals.parq' column 'd1' does not have the decimal precision set.
====
---- QUERY
# IMPALA-10808, IMPALA-10814: Check illegal decimal file schemas
select d2 from bad_parquet_decimals
---- CATCH
File '$NAMENODE/test-warehouse/bad_parquet_decimals_parquet/illegal_decimals.parq' column 'd2' has a precision that does not match the table metadata precision. File metadata precision: 20, table metadata precision: 4.
====
---- QUERY
# IMPALA-10808, IMPALA-10814: Check illegal decimal file schemas
select d3 from functional_parquet.bad_parquet_decimals;
---- CATCH
File '$NAMENODE/test-warehouse/bad_parquet_decimals_parquet/illegal_decimals.parq' column 'd3' has a precision that does not match the table metadata precision. File metadata precision: -1, table metadata precision: 4.
====
---- QUERY
# IMPALA-10808, IMPALA-10814: Check illegal decimal file schemas
select d4 from functional_parquet.bad_parquet_decimals
---- CATCH
File '$NAMENODE/test-warehouse/bad_parquet_decimals_parquet/illegal_decimals.parq' column 'd4' does not have the scale set.
====
---- QUERY
# IMPALA-10808, IMPALA-10814: Check illegal decimal file schemas
select d5 from functional_parquet.bad_parquet_decimals
---- CATCH
File '$NAMENODE/test-warehouse/bad_parquet_decimals_parquet/illegal_decimals.parq' column 'd5' does not have type_length set.
====
---- QUERY
# IMPALA-10808, IMPALA-10814: Check illegal decimal file schemas
select d6 from functional_parquet.bad_parquet_decimals
---- CATCH
File '$NAMENODE/test-warehouse/bad_parquet_decimals_parquet/illegal_decimals.parq' column 'd6' has invalid type length: 0
====
---- QUERY
# IMPALA-10808, IMPALA-10814: Check illegal decimal file schemas
select d7 from functional_parquet.bad_parquet_decimals
---- CATCH
File '$NAMENODE/test-warehouse/bad_parquet_decimals_parquet/illegal_decimals.parq' column 'd7' has invalid scale: -1. Precision is 4.
====
---- QUERY
# IMPALA-10808, IMPALA-10814: Check illegal decimal file schemas
select d8 from functional_parquet.bad_parquet_decimals;
---- CATCH
File '$NAMENODE/test-warehouse/bad_parquet_decimals_parquet/illegal_decimals.parq' column 'd8' has invalid scale: 4. Precision is 2.
====