impala/testdata/workloads/functional-query/queries/QueryTest/parquet-continue-on-error.test
Henry Robinson 34b5f1c416 IMPALA-(3895,3859): Don't log file data on parse errors
Logging file or table data is a bad idea, and doing it by default is
particularly bad. This patch changes HdfsScanNode::LogRowParseError() to
log a file and offset only.

Testing: See rewritten tests.

To support testing this change, we also fix IMPALA-3895 by introducing
a canonical string, __HDFS_FILENAME__, that replaces every Hadoop filename
in the ERROR output before it is compared with the expected results.
This fixes a number of issues with the old way of matching filenames,
which purported to be a regex but really wasn't. In particular, we can
now match the rest of an ERROR line after the filename, which was not
possible before.

In some cases we don't want to substitute filenames, because the ERROR
section expects a very specific output. In that case we can write:

$NAMENODE/<filename>

and this patch will not perform _any_ filename substitutions on ERROR
sections that contain the $NAMENODE string.
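The substitution rule above can be sketched in a few lines of Python. This is a minimal illustration, not the framework's actual code: the function name `canonicalize_errors` and the filename pattern (an `hdfs://` URI running until whitespace or a single quote) are assumptions made for the example.

```python
import re
from typing import List

# Assumption: a Hadoop filename in error output is an hdfs:// URI that ends at
# whitespace or a single quote. The real test framework's pattern may differ.
HDFS_FILENAME_RE = re.compile(r"hdfs://[^\s']+")
CANONICAL_FILENAME = "__HDFS_FILENAME__"


def canonicalize_errors(expected_section: str, actual_errors: List[str]) -> List[str]:
    """Hypothetical helper: prepare actual error lines for comparison.

    If the expected ERRORS section mentions $NAMENODE, it wants exact paths,
    so no filename substitution is performed on any line.
    """
    if "$NAMENODE" in expected_section:
        return list(actual_errors)
    # Otherwise replace every filename with the canonical string, so the text
    # after the filename can be compared as well.
    return [HDFS_FILENAME_RE.sub(CANONICAL_FILENAME, line) for line in actual_errors]


if __name__ == "__main__":
    expected = "Column metadata states there are 11 values, but read 10 values from column id. file=__HDFS_FILENAME__"
    actual = ["Column metadata states there are 11 values, but read 10 values "
              "from column id. file=hdfs://localhost:20500/test-warehouse/x.parq"]
    print(canonicalize_errors(expected, actual))
```

Because the expected ERRORS text also uses the canonical string, the whole line, including any suffix such as `(1 of 2 similar)`, can then be compared exactly.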

Finally, this patch fixes a bug where a test that had an ERRORS section
but no RESULTS section would silently pass without testing anything.
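One way to avoid that silent pass is to verify each optional section independently rather than nesting the ERRORS check under the RESULTS check. The sketch below is hypothetical (the names `verify_sections` and `_lines` are invented for illustration) and only shows the general idea, not how the patch implements it.

```python
from typing import Dict, List


def _lines(text: str) -> List[str]:
    # Split a section body into its non-empty lines.
    return [line for line in text.splitlines() if line.strip()]


def verify_sections(sections: Dict[str, str], actual_results: List[str],
                    actual_errors: List[str]) -> None:
    """Hypothetical check: raise AssertionError if a present section mismatches.

    Each section is verified on its own, so an ERRORS section is still
    checked when the test case has no RESULTS section.
    """
    if "RESULTS" in sections:
        expected = _lines(sections["RESULTS"])
        if expected != actual_results:
            raise AssertionError(f"RESULTS mismatch: {expected} != {actual_results}")
    if "ERRORS" in sections:
        expected = _lines(sections["ERRORS"])
        if expected != actual_errors:
            raise AssertionError(f"ERRORS mismatch: {expected} != {actual_errors}")
```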

Change-Id: I5a604f8784a9ff7b4bf878f82ee7f56697df3272
Reviewed-on: http://gerrit.cloudera.org:8080/4020
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2016-08-25 10:20:36 +00:00


====
---- QUERY
# Returns all results despite a discrepancy between the number of values
# scanned and the number of values stored in the file metadata.
# Set a single node and scanner thread to make this test deterministic.
set num_nodes=1;
set num_scanner_threads=1;
select id, cnt from bad_column_metadata t, (select count(*) cnt from t.int_array) v
---- TYPES
bigint,bigint
---- RESULTS
1,10
2,10
3,10
4,10
5,10
6,10
7,10
8,10
9,10
10,10
11,10
12,10
13,10
14,10
15,10
16,10
17,10
18,10
19,10
20,10
21,10
22,10
23,10
24,10
25,10
26,10
27,10
28,10
29,10
30,10
---- ERRORS
Column metadata states there are 50 values, but read 100 values from column element. file=__HDFS_FILENAME__ (1 of 2 similar)
====
---- QUERY
# Same as above but only selecting a single scalar column.
set num_nodes=1;
set num_scanner_threads=1;
select id from bad_column_metadata
---- TYPES
bigint
---- RESULTS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
---- ERRORS
Column metadata states there are 11 values, but read 10 values from column id. file=__HDFS_FILENAME__
====
---- QUERY
SELECT * from bad_parquet_strings_negative_len
---- TYPES
STRING
---- RESULTS
---- ERRORS
File '$NAMENODE/test-warehouse/bad_parquet_strings_negative_len_parquet/plain-encoded-negative-len.parq' is corrupt: error decoding value of type STRING at offset 58
File '$NAMENODE/test-warehouse/bad_parquet_strings_negative_len_parquet/dict-encoded-negative-len.parq' is corrupt: error reading dictionary for data of type STRING: could not decode dictionary
====
---- QUERY
SELECT * from bad_parquet_strings_out_of_bounds
---- TYPES
STRING
---- RESULTS
---- ERRORS
File '$NAMENODE/test-warehouse/bad_parquet_strings_out_of_bounds_parquet/plain-encoded-out-of-bounds.parq' is corrupt: error decoding value of type STRING at offset 58
File '$NAMENODE/test-warehouse/bad_parquet_strings_out_of_bounds_parquet/dict-encoded-out-of-bounds.parq' is corrupt: error reading dictionary for data of type STRING: could not decode dictionary
====