IMPALA-2213: make Parquet scanner fail query if the file size metadata is stale

This patch changes the Parquet scanner to fail the query with an error
when it cannot read the full footer scan range, which indicates that the
file has been overwritten by a shorter file without the table metadata
being refreshed. Previously this case hit a DCHECK. The patch also adds
tests for this case and for the case where the new file is longer than
the metadata states (which fails with an existing error).

Change-Id: Ie2031ac2dc90e4f2573bd3ca8a3709db60424f07
Reviewed-on: http://gerrit.cloudera.org:8080/1084
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Skye Wanderman-Milne
2015-09-30 21:55:30 -04:00
committed by ishaan
parent 6bac14a283
commit 68fef6a5bf
6 changed files with 173 additions and 9 deletions

@@ -50,5 +50,5 @@ bigint,bigint,string,string,boolean,boolean,bigint,bigint,bigint,bigint
# Parquet file with invalid magic number
SELECT * from bad_magic_number
---- CATCH
-File $NAMENODE/test-warehouse/bad_magic_number_parquet/bad_magic_number.parquet is invalid. Invalid file footer: XXXX
+File '$NAMENODE/test-warehouse/bad_magic_number_parquet/bad_magic_number.parquet' has an invalid version number: XXXX
====
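
The two stale-metadata cases described in the commit message can be illustrated with a small standalone sketch. This is not the Impala scanner code (which is C++); it only mimics the idea in Python under stated assumptions: the footer scan range is computed from the file length recorded in table metadata, a short read signals that the file was replaced by a shorter one, and a wrong trailing magic signals that the recorded length no longer points at the real end of the file. The names `FOOTER_SCAN_BYTES`, `ShortFooterReadError`, `BadMagicError`, and `read_footer_range` are hypothetical.

```python
PARQUET_MAGIC = b"PAR1"         # trailing 4-byte magic of a Parquet file
FOOTER_SCAN_BYTES = 100 * 1024  # hypothetical footer scan range size


class ShortFooterReadError(Exception):
    """Fewer bytes were read than the metadata-derived scan range expects."""


class BadMagicError(Exception):
    """The bytes at the metadata-stated end of file are not the Parquet magic."""


def read_footer_range(path, metadata_file_len):
    """Read the footer scan range implied by a (possibly stale) file length.

    metadata_file_len is the length recorded in table metadata, which may
    differ from the file's current size if the file was overwritten.
    """
    start = max(0, metadata_file_len - FOOTER_SCAN_BYTES)
    expected = metadata_file_len - start
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(expected)
    if len(data) < expected:
        # File is now shorter than the metadata states: report an error
        # instead of asserting.
        raise ShortFooterReadError(
            "read %d of %d footer bytes from %s; metadata may be stale"
            % (len(data), expected, path))
    if data[-4:] != PARQUET_MAGIC:
        # File is now longer (or otherwise changed): the stale offset does
        # not land on the trailing magic.
        raise BadMagicError(
            "unexpected trailing bytes %r in %s" % (data[-4:], path))
    return data
```

In this sketch, both failure modes surface as ordinary errors that a caller can turn into a failed query with a hint to refresh the table metadata.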