IMPALA-2558: DCHECK in parquet scanner after block read error

There was an incorrect DCHECK in the parquet scanner. If abort_on_error
is false, the intended behaviour is to skip to the next row group, but
the DCHECK assumed that execution should have aborted if a parse error
was encountered.

This also:
- Fixes a DCHECK after an empty row group. InitColumns() would try to
  create empty scan ranges for the column readers.
- Uses metadata_range_->file() instead of stream_->filename() in the
  scanner. InitColumns() was using stream_->filename() in error
  messages, which used to work, but stream_ is now set to NULL before
  InitColumns() is called.
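
The intended control flow can be sketched roughly as follows. This is a
minimal standalone illustration, not the scanner's actual code: RowGroup,
ScanResult and ScanFile() are hypothetical names, and the real scanner
works with column readers and scan ranges rather than a simple loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not Impala's real types.
struct RowGroup {
  int num_rows;
  bool corrupt;  // simulates a parse error while reading this row group
};

struct ScanResult {
  bool ok = true;          // false only when the scan was aborted
  int rows_read = 0;
  int groups_skipped = 0;  // row groups skipped because of an error
  std::string error;       // last error message, if any
};

// Scans every row group in `file`. With abort_on_error == false, a bad row
// group is skipped and the scan continues with the next one.
ScanResult ScanFile(const std::string& file,
                    const std::vector<RowGroup>& groups,
                    bool abort_on_error) {
  ScanResult result;
  for (std::size_t i = 0; i < groups.size(); ++i) {
    const RowGroup& rg = groups[i];
    // Empty row group: nothing to read, so no readers or ranges are set up.
    if (rg.num_rows == 0) continue;
    if (rg.corrupt) {
      // The file name comes from a value that is always available here,
      // not from a stream object that may already have been reset.
      result.error = "Error in row group " + std::to_string(i) +
                     " of file " + file;
      if (abort_on_error) {
        result.ok = false;
        return result;
      }
      // Not aborting: continuing past the error is the expected path,
      // so nothing here should assert that execution has stopped.
      ++result.groups_skipped;
      continue;
    }
    result.rows_read += rg.num_rows;
  }
  return result;
}
```

In this sketch, a file with one corrupt row group still yields the rows of
the remaining groups when abort_on_error is false, and stops at the first
error when it is true.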

Change-Id: I8e29e4c0c268c119e1583f16bd6cf7cd59591701
Reviewed-on: http://gerrit.cloudera.org:8080/1257
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Skye Wanderman-Milne
2015-10-29 18:07:33 -07:00
committed by Internal Jenkins
parent 19b6bf0201
commit dd2eb951d7
8 changed files with 188 additions and 78 deletions

@@ -1418,6 +1418,19 @@ hadoop fs -put -f ${IMPALA_HOME}/testdata/data/kite_required_fields.parquet \
/test-warehouse/kite_required_fields_parquet/
====
---- DATASET
-- Parquet file with incorrect column metadata in multiple row groups
functional
---- BASE_TABLE_NAME
bad_column_metadata
---- COLUMNS
id bigint
int_array array<int>
---- LOAD
`hadoop fs -mkdir -p /test-warehouse/bad_column_metadata_parquet && \
hadoop fs -put -f ${IMPALA_HOME}/testdata/data/bad_column_metadata.parquet \
/test-warehouse/bad_column_metadata_parquet
====
---- DATASET
functional
---- BASE_TABLE_NAME
bad_serde