IMPALA-2558: DCHECK in parquet scanner after block read error

There was an incorrect DCHECK in the parquet scanner. If abort_on_error is
false, the intended behaviour is to skip to the next row group, but the DCHECK
assumed that execution should have aborted if a parse error was encountered.

This also:
- Fixes a DCHECK after an empty row group. InitColumns() would try to create
  empty scan ranges for the column readers.
- Uses metadata_range_->file() instead of stream_->filename() in the scanner.
  InitColumns() was using stream_->filename() in error messages, which used to
  work but now stream_ is set to NULL before calling InitColumns().

Change-Id: I8e29e4c0c268c119e1583f16bd6cf7cd59591701
Reviewed-on: http://gerrit.cloudera.org:8080/1257
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
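
The intended control flow described in the message above can be illustrated with a minimal, self-contained sketch. This is not the scanner's actual code: `RowGroup`, `ScanResult`, and `ScanRowGroups` are hypothetical names made up for illustration, and the parse error is simulated with a flag. It only shows the three behaviours the message names: skip a failed row group when abort_on_error is false, do nothing for an empty row group, and build error messages from a filename that does not depend on a per-row-group stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a Parquet row group; not Impala's real type.
struct RowGroup {
  int64_t num_rows;
  bool corrupt;  // Simulates a parse error while reading this row group.
};

// Hypothetical result type for the sketch.
struct ScanResult {
  bool ok = true;                     // False only if the scan aborted.
  int64_t rows_read = 0;              // Rows from successfully read groups.
  std::vector<std::string> warnings;  // One message per failed row group.
};

// Scans all row groups. The filename is passed in by the caller, so error
// messages never depend on per-row-group state that may already be reset.
ScanResult ScanRowGroups(const std::string& filename,
                         const std::vector<RowGroup>& groups,
                         bool abort_on_error) {
  ScanResult result;
  for (std::size_t i = 0; i < groups.size(); ++i) {
    const RowGroup& rg = groups[i];

    // Empty row group: nothing to read, so do not set up any reads for it.
    if (rg.num_rows == 0) continue;

    if (rg.corrupt) {
      result.warnings.push_back("Parse error in row group " +
                                std::to_string(i) + " of " + filename);
      if (abort_on_error) {
        // Abort the whole scan on the first error.
        result.ok = false;
        return result;
      }
      // abort_on_error is false: a parse error here is expected and legal,
      // so skip to the next row group instead of asserting.
      continue;
    }

    result.rows_read += rg.num_rows;
  }
  return result;
}
```

In this sketch, a file with one corrupt row group yields a warning and the rows of the remaining groups when abort_on_error is false, and stops at the first error when it is true.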
2026-01-08 12:02:54 -05:00 · 2015-10-29 18:07:33 -07:00
parent 19b6bf0201
commit dd2eb951d7
8 changed files with 188 additions and 78 deletions
--- a/testdata/datasets/functional/functional_schema_template.sql
+++ b/testdata/datasets/functional/functional_schema_template.sql
@@ -1418,6 +1418,19 @@ hadoop fs -put -f ${IMPALA_HOME}/testdata/data/kite_required_fields.parquet \
 /test-warehouse/kite_required_fields_parquet/
 ====
 ---- DATASET
+-- Parquet file with incorrect column metadata in multiple row groups
+functional
+---- BASE_TABLE_NAME
+bad_column_metadata
+---- COLUMNS
+id bigint
+int_array array<int>
+---- LOAD
+`hadoop fs -mkdir -p /test-warehouse/bad_column_metadata_parquet && \
+hadoop fs -put -f ${IMPALA_HOME}/testdata/data/bad_column_metadata.parquet \
+/test-warehouse/bad_column_metadata_parquet
+====
+---- DATASET
 functional
 ---- BASE_TABLE_NAME
 bad_serde