IMPALA-2558: DCHECK in parquet scanner after block read error
There was an incorrect DCHECK in the parquet scanner. If abort_on_error
is false, the intended behaviour is to skip to the next row group, but
the DCHECK assumed that execution should have aborted if a parse error
was encountered.

This also:
- Fixes a DCHECK after an empty row group. InitColumns() would try to
  create empty scan ranges for the column readers.
- Uses metadata_range_->file() instead of stream_->filename() in the
  scanner. InitColumns() was using stream_->filename() in error
  messages, which used to work but now stream_ is set to NULL before
  calling InitColumns().

Change-Id: I8e29e4c0c268c119e1583f16bd6cf7cd59591701
Reviewed-on: http://gerrit.cloudera.org:8080/1257
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
commit dd2eb951d7 (parent 19b6bf0201)
committed by Internal Jenkins
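The control-flow change described in the commit message can be illustrated with a small standalone sketch. This is not the actual HdfsParquetScanner code; the types, the row-group loop, and the helper names below are simplified stand-ins, with `abort_on_error` and `filename` mirroring the roles of the query option and `metadata_range_->file()`. The point is the corrected behaviour: on a parse error with abort_on_error == false, log and skip to the next row group instead of asserting, and skip empty row groups before setting up any column readers.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Illustrative stand-in for a Parquet row group; names are hypothetical.
struct RowGroup {
  int num_rows;
  bool corrupt;  // Pretend this row group has bad column metadata.
};

struct ScannerSketch {
  bool abort_on_error;
  std::string filename;  // Analogue of metadata_range_->file().

  // Returns false on a parse error (e.g. mismatched column metadata).
  bool ReadRowGroup(const RowGroup& rg) {
    if (rg.corrupt) {
      std::cerr << "Parse error in file " << filename
                << ": bad column metadata\n";
      return false;
    }
    std::cout << "Read " << rg.num_rows << " rows\n";
    return true;
  }

  void Scan(const std::vector<RowGroup>& row_groups) {
    for (const RowGroup& rg : row_groups) {
      // An empty row group has no scan ranges to issue; skip it outright
      // rather than initializing column readers with empty ranges.
      if (rg.num_rows == 0) continue;

      if (!ReadRowGroup(rg)) {
        // The buggy version effectively asserted here, assuming a parse
        // error always aborts the query. With abort_on_error == false the
        // intent is to move on to the next row group instead.
        if (abort_on_error) {
          std::cerr << "Aborting scan of " << filename << "\n";
          return;
        }
        continue;  // Skip to the next row group.
      }
    }
  }
};

int main() {
  ScannerSketch scanner{/*abort_on_error=*/false,
                        "bad_column_metadata.parquet"};
  // One healthy group, one empty group, one corrupt group, one healthy group.
  scanner.Scan({{10, false}, {0, false}, {5, true}, {7, false}});
  return 0;
}
```

The diff below adds the bad_column_metadata test dataset used to exercise this path.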
@@ -1418,6 +1418,19 @@ hadoop fs -put -f ${IMPALA_HOME}/testdata/data/kite_required_fields.parquet \
 /test-warehouse/kite_required_fields_parquet/
 ====
+---- DATASET
+-- Parquet file with incorrect column metadata in multiple row groups
+functional
+---- BASE_TABLE_NAME
+bad_column_metadata
+---- COLUMNS
+id bigint
+int_array array<int>
+---- LOAD
+`hadoop fs -mkdir -p /test-warehouse/bad_column_metadata_parquet && \
+hadoop fs -put -f ${IMPALA_HOME}/testdata/data/bad_column_metadata.parquet \
+/test-warehouse/bad_column_metadata_parquet
+====
 ---- DATASET
 functional
 ---- BASE_TABLE_NAME
 bad_serde