Mirror of https://github.com/apache/impala.git (synced 2025-12-19 18:12:08 -05:00)
IMPALA-11134: Impala returns "Couldn't skip rows in file" error for old Parquet file
Impala returns a "Couldn't skip rows in file" error for old Parquet
files written by old Impala versions (e.g. Impala 2.5, 2.6). In a DEBUG
build, Impala crashes on a DCHECK:
Check failed: num_buffered_values_ > 0 (-1 vs. 0)
The problem is that in some old Parquet files there can be a mismatch
between 'num_values' in a page and the encoded def/rep levels. These
files usually encode one more def/rep level than 'num_values'.
In SkipTopLevelRows() we skipped values based on the number of def levels:
92ce6fe48e/be/src/exec/parquet/parquet-column-readers.cc (L1308-L1314)
Since some old files contain more def levels than values,
num_buffered_values_ could become negative.
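To make the failure mode concrete, here is a simplified Python model of skipping rows within a single data page. It is not Impala's C++ reader: `PageState` and `skip_by_def_levels` are hypothetical names, and only `num_buffered_values` mirrors the real member. The model assumes, per the description above, that the number of levels to consume is derived from the encoded def levels alone, and that each consumed level decrements the buffered-values counter.

```python
# Simplified, hypothetical model of the pre-fix skip logic for one Parquet
# data page. Not Impala's implementation; names are illustrative only.


class PageState:
    def __init__(self, num_values, def_levels):
        # 'num_values' comes from the page header. Old writers could encode
        # one more def level than this.
        self.num_buffered_values = num_values
        self.def_levels = list(def_levels)
        self.pos = 0  # index of the next def level to consume


def skip_by_def_levels(page, num_rows):
    """Naive skip: read_count is bounded only by the remaining def levels,
    so an extra trailing level drives num_buffered_values below zero."""
    levels_left = len(page.def_levels) - page.pos
    read_count = min(num_rows, levels_left)
    page.pos += read_count
    page.num_buffered_values -= read_count
    return page.num_buffered_values


# A page whose header says 3 values but which encodes 4 def levels.
page = PageState(num_values=3, def_levels=[1, 1, 1, 1])
print(skip_by_def_levels(page, 4))  # -1, the same value the DCHECK reports
```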
This patch takes the value of num_buffered_values_ into account when
calculating 'read_count', so such files can be handled. The patch also
includes the column name in the "Couldn't skip rows" error message,
making it easier to identify problematic columns in the future.
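The fix can be sketched in the same simplified single-page model by additionally clamping `read_count` to the buffered-values counter. This is a hypothetical illustration, not Impala's code: `PageState`, `skip_rows_clamped` and `SkipRowsError` are invented names. In this model a skip that cannot be fully satisfied raises an error carrying the column and file names, in the shape of the new PARQUET_ROWS_SKIPPING message; a real reader would presumably continue with the next page before giving up.

```python
# Hypothetical sketch of the fixed skip logic in a one-page model.


class SkipRowsError(Exception):
    """Stand-in for Impala's PARQUET_ROWS_SKIPPING error status."""


class PageState:
    def __init__(self, num_values, def_levels):
        self.num_buffered_values = num_values
        self.def_levels = list(def_levels)
        self.pos = 0


def skip_rows_clamped(page, num_rows, column, filename):
    levels_left = len(page.def_levels) - page.pos
    # Clamp to num_buffered_values as well, so an extra trailing def level
    # written by old Impala versions can no longer make the counter negative.
    read_count = min(num_rows, levels_left, page.num_buffered_values)
    page.pos += read_count
    page.num_buffered_values -= read_count
    assert page.num_buffered_values >= 0
    if read_count < num_rows:
        # Single-page model only: report which column could not be skipped.
        raise SkipRowsError(
            f"Couldn't skip rows in column '{column}' in file '{filename}'.")
    return read_count


page = PageState(num_values=3, def_levels=[1, 1, 1, 1])
print(skip_rows_clamped(page, 3, "int_col", "/tmp/old.parq"))  # 3
```

Clamping on the counter rather than trusting the def-level stream means the page header's 'num_values' stays authoritative, which is what keeps the DCHECK invariant intact.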
Testing:
* added a Parquet file written by Impala 2.5 and an e2e test for it
Change-Id: I568fe59df720ea040be4926812412ba4c1510a26
Reviewed-on: http://gerrit.cloudera.org:8080/18257
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Committed by Impala Public Jenkins
Parent: f0d4afaff1
Commit: b60ccabd5b
@@ -475,6 +475,8 @@ error_codes = (
("JWKS_PARSE_ERROR", 153, "Error parsing JWKS: $0."),

("JWT_VERIFY_FAILED", 154, "Error verifying JWT Token: $0."),

("PARQUET_ROWS_SKIPPING", 155, "Couldn't skip rows in column '$0' in file '$1'."),
)

import sys
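Each entry in `error_codes` pairs a name and numeric code with a message template whose `$0`, `$1`, ... placeholders are filled with positional arguments when the error is raised (on the C++ side this is presumably done when the error status is constructed). The helper below is a hypothetical Python illustration of that substitution for the new entry, not Impala's own formatter.

```python
import re

# Entry copied from the error_codes table in the diff.
PARQUET_ROWS_SKIPPING = ("PARQUET_ROWS_SKIPPING", 155,
                         "Couldn't skip rows in column '$0' in file '$1'.")


def substitute(fmt, *args):
    """Hypothetical helper: replace each $N with the N-th positional argument."""
    return re.sub(r"\$(\d+)", lambda m: str(args[int(m.group(1))]), fmt)


name, code, template = PARQUET_ROWS_SKIPPING
print(code, substitute(template, "int_col", "/tmp/old.parq"))
```

With the column name as `$0`, the message now points directly at the column whose pages are inconsistent.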