IMPALA-11134: Impala returns "Couldn't skip rows in file" error for old Parquet file

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Impala returns a "Couldn't skip rows in file" error for old Parquet
files written by an old Impala (e.g. Impala 2.5, 2.6). In DEBUG builds
Impala crashes on a DCHECK:

 Check failed: num_buffered_values_ > 0 (-1 vs. 0)

The problem is that in some old Parquet files there can be a mismatch
between 'num_values' in a page and the encoded def/rep levels.
There is usually one more def/rep level encoded in these files.

In SkipTopLevelRows() we skipped values based on how many def levels are
92ce6fe48e/be/src/exec/parquet/parquet-column-readers.cc (L1308-L1314)

Since there are more def levels than values in some old files,
num_buffered_values_ could become negative.

This patch also takes the value of num_buffered_values_ into account
when calculating 'read_count', so we can deal with such files. It also
adds the column name to the "Couldn't skip rows" error message, so that
problematic columns are easier to identify in the future.
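
The effect of capping 'read_count' can be sketched as follows. This is a
simplified illustration, not the Impala code: the function names and
parameters are hypothetical, and it assumes a flat column where each def
level corresponds to one top-level row.

```python
def skip_step_unclamped(rows_to_skip, cached_def_levels, num_buffered_values):
    """Hypothetical model of the old behaviour: count only def levels."""
    read_count = min(rows_to_skip, cached_def_levels)
    return read_count, num_buffered_values - read_count


def skip_step(rows_to_skip, cached_def_levels, num_buffered_values):
    """Hypothetical model of the fix: also cap by the buffered values."""
    # Never consume more than the page's 'num_values' says is left,
    # even if an extra def level was encoded.
    read_count = min(rows_to_skip, cached_def_levels, num_buffered_values)
    return read_count, num_buffered_values - read_count


# A page that claims 3 values but carries 4 def levels.
print(skip_step_unclamped(4, 4, 3))  # (4, -1): buffered count goes negative
print(skip_step(4, 4, 3))            # (3, 0): buffered count stays at zero
```

With the cap in place the remaining row is simply left for the next step
instead of pushing the buffered-value counter below zero.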

Testing:
 * added Parquet file written by Impala 2.5 and e2e test for it

Change-Id: I568fe59df720ea040be4926812412ba4c1510a26
Reviewed-on: http://gerrit.cloudera.org:8080/18257
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

Author: Zoltan Borok-Nagy
Date: 2022-02-19 18:21:49 +01:00
Committed by: Impala Public Jenkins
Parent: f0d4afaff1
Commit: b60ccabd5b

6 changed files with 28 additions and 4 deletions

common/thrift/generate_error_codes.py

@@ -475,6 +475,8 @@ error_codes = (
  ("JWKS_PARSE_ERROR", 153, "Error parsing JWKS: $0."),
  ("JWT_VERIFY_FAILED", 154, "Error verifying JWT Token: $0."),
  ("PARQUET_ROWS_SKIPPING", 155, "Couldn't skip rows in column '$0' in file '$1'."),
)
import sys
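
The new message template takes two positional arguments: the column name
as $0 and the file name as $1. A minimal sketch of how such placeholders
expand is shown below. The substitute() helper, the column name, and the
file path are hypothetical and only illustrate the template; Impala's own
substitution happens elsewhere and is not reproduced here.

```python
import re


def substitute(template, *args):
    """Replace $0, $1, ... with the positional argument of that index."""
    def repl(match):
        return str(args[int(match.group(1))])
    return re.sub(r"\$(\d)", repl, template)


# Template copied from the PARQUET_ROWS_SKIPPING entry above.
PARQUET_ROWS_SKIPPING = "Couldn't skip rows in column '$0' in file '$1'."

# Hypothetical column name and file path.
message = substitute(PARQUET_ROWS_SKIPPING, "int_col", "/path/to/old.parquet")
print(message)
```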
