Files
impala/testdata/workloads/functional-query/queries/QueryTest
Tim Armstrong 5afd9f7df7 IMPALA-3764,3914: fuzz test HDFS scanners and fix parquet bugs found
This adds a test that performs some simple fuzz testing of HDFS
scanners. It creates a copy of a given HDFS table, with each
file in the table corrupted in a random way: either a single
byte is set to a random value, or the file is truncated to a
random length. It then runs a query that scans the whole table
with several different batch_size settings. I made some effort
to make the failures reproducible by explicitly seeding the
random number generator, and providing a mechanism to override
the seed.

The fuzzer has found crashes resulting from corrupted or truncated
input files for RCFile, SequenceFile, Parquet, and Text LZO so far.
Avro only had a small buffer read overrun detected by ASAN.

Includes fixes for Parquet crashes found by the fuzzer, a small
buffer overrun in Avro, and a DCHECK in MemPool.

Initially it is only enabled for Avro, Parquet, and uncompressed
text. As follow-up work we should fix the bugs in the other scanners
and enable the test for them.

We also don't implement abort_on_error=0 correctly in Parquet:
for some file formats, corrupt headers result in the query being
aborted, so an exception will xfail the test.

Testing:
Ran the test with exploration_strategy=exhaustive in a loop locally
with both DEBUG and ASAN builds for a couple of days over a weekend.
Also ran exhaustive private build.

Change-Id: I50cf43195a7c582caa02c85ae400ea2256fa3a3b
Reviewed-on: http://gerrit.cloudera.org:8080/3833
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-08-11 08:42:41 +00:00
..
2015-06-01 16:51:53 -07:00
2016-02-19 00:03:15 -08:00
2016-03-02 23:23:04 -08:00
2014-06-11 03:10:11 -07:00
2014-06-11 03:10:11 -07:00
2014-06-11 03:10:11 -07:00
2014-06-11 03:10:11 -07:00
2014-06-24 02:14:27 -07:00
2015-03-11 16:39:39 -07:00
2014-01-08 10:48:09 -08:00
2014-06-20 13:35:10 -07:00