impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 09:02:19 -05:00

Files

Tim Armstrong 5afd9f7df7 IMPALA-3764,3914: fuzz test HDFS scanners and fix parquet bugs found

This adds a test that performs some simple fuzz testing of HDFS
scanners. It creates a copy of a given HDFS table, with each
file in the table corrupted in a random way: either a single
byte is set to a random value, or the file is truncated to a
random length. It then runs a query that scans the whole table
with several different batch_size settings. I made some effort
to make the failures reproducible by explicitly seeding the
random number generator, and providing a mechanism to override
the seed.
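
As a rough illustration of the corruption step described above, here is a minimal sketch and not the actual Impala test code. It works on a local directory rather than HDFS, and the names `corrupt_file`, `fuzz_directory`, and the `FUZZ_SEED` environment variable are hypothetical, chosen only for this example. It shows the two corruption modes (single random byte, random truncation) and how one explicit seed makes a run reproducible.

```python
import os
import random


def corrupt_file(path, rng):
    """Corrupt one file in place and return a description of the change.

    Hypothetical helper: either overwrites a single byte with a random
    value or truncates the file to a random shorter length.
    """
    size = os.path.getsize(path)
    if size == 0:
        return "skipped empty file"
    if rng.random() < 0.5:
        # Overwrite one byte at a random offset with a random value.
        offset = rng.randrange(size)
        value = rng.randrange(256)
        with open(path, "r+b") as f:
            f.seek(offset)
            f.write(bytes([value]))
        return "set byte %d to %d" % (offset, value)
    # Truncate the file to a random length below its original size.
    new_len = rng.randrange(size)
    with open(path, "r+b") as f:
        f.truncate(new_len)
    return "truncated to %d bytes" % new_len


def fuzz_directory(directory, seed=None):
    """Corrupt every regular file in `directory` using one seeded RNG.

    The seed can be passed explicitly or via the (assumed) FUZZ_SEED
    environment variable; it is returned so a failure can be replayed.
    """
    if seed is None:
        seed = int(os.environ.get("FUZZ_SEED", random.randrange(2**32)))
    rng = random.Random(seed)
    actions = {}
    # Sort the names so the same seed always hits files in the same order.
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            actions[name] = corrupt_file(path, rng)
    return seed, actions
```

Logging the returned seed with each run and passing it back in later should reproduce the same corruptions, provided the input files are unchanged.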

The fuzzer has found crashes resulting from corrupted or truncated
input files for RCFile, SequenceFile, Parquet, and Text LZO so far.
Avro only had a small buffer read overrun detected by ASAN.

Includes fixes for Parquet crashes found by the fuzzer, a small
buffer overrun in Avro, and a DCHECK in MemPool.

Initially it is only enabled for Avro, Parquet, and uncompressed
text. As follow-up work we should fix the bugs in the other scanners
and enable the test for them.

We also don't implement abort_on_error=0 correctly in Parquet:
for some file formats, corrupt headers result in the query being
aborted, so an exception will xfail the test.

Testing:
Ran the test with exploration_strategy=exhaustive in a loop locally
with both DEBUG and ASAN builds for a couple of days over a weekend.
Also ran an exhaustive private build.

Change-Id: I50cf43195a7c582caa02c85ae400ea2256fa3a3b
Reviewed-on: http://gerrit.cloudera.org:8080/3833
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins

2016-08-11 08:42:41 +00:00

aggregation_no_codegen_only.test

IMPALA-1621,2241,2271,2330,2352: Lazy switch to IO buffers to reduce min mem needed for PAGG/PHJ

2015-09-23 11:07:42 -07:00

aggregation.test

IMPALA-3018: Don't return NULL on zero length allocations.

2016-07-14 19:04:45 +00:00

alloc-fail-init.test

IMPALA-3350: Add some missing StringVal.is_null checks

2016-05-12 14:17:39 -07:00

alloc-fail-update.test

IMPALA-2925: Fix flaky tests in test_alloc_fail_update()

2016-02-10 00:54:11 +00:00

alter-table-set-column-stats.test

IMPALA-3634: Use $FILESYSTEM_PREFIX in alter-table-set-column-stats.test

2016-05-31 23:32:12 -07:00

alter-table.test

IMPALA-1740: Add support for skip.header.line.count.

2016-05-12 14:17:46 -07:00

analytic-fns-tpcds.test

[CDH5] Fix tpcds analytical functions test.

2014-09-26 16:56:40 -07:00

analytic-fns.test

IMPALA-3210: last/first_value() support for IGNORE NULLS

2016-07-18 08:28:09 -07:00

avro-schema-changes.test

IMPALA-3687: Prefer Avro field name during schema reconciliation

2016-07-14 19:04:43 +00:00

avro-schema-resolution.test

IMPALA-3687: Prefer Avro field name during schema reconciliation

2016-07-14 19:04:43 +00:00

avro-writer.test

IMPALA-1185: Make Avro and Seq writers unsupported

2014-09-26 12:28:03 -07:00

chars-formats.test

Char PARQUET, AVRO, and TEXT tests

2014-09-26 12:24:07 -07:00

chars.test

IMPALA-1636: Generalize index-based partition pruning to allow constant

2015-03-07 09:51:27 +00:00

compute-stats-decimal.test

Use unique_database fixture in test_compute_stats.py.

2016-05-12 14:17:50 -07:00

compute-stats-incremental.test

Use unique_database fixture in test_compute_stats.py.

2016-05-12 14:17:50 -07:00

compute-stats-keywords.test

Use unique_database fixture in test_compute_stats.py.

2016-05-12 14:17:50 -07:00

compute-stats-many-partitions.test

IMPALA-1595: Add 'location' to SHOW [TABLE STATS|PARTITIONS] for HDFS tables

2015-04-21 19:27:50 +00:00

compute-stats.test

Use unique_database fixture in test_compute_stats.py.

2016-05-12 14:17:50 -07:00

corrupt-stats.test

Use unique_database fixture in test_compute_stats.py.

2016-05-12 14:17:50 -07:00

create_kudu.test

Kudu Table and Kudu Scan Node

2015-06-01 16:51:53 -07:00

create-database.test

IMPALA-3530: Clean up test_ddl.py. Part 1.

2016-06-10 10:31:15 -07:00

create-table-as-select.test

IMPALA-3530: Clean up test_ddl.py. Part 1.

2016-06-10 10:31:15 -07:00

create-table-like-file.test

IMPALA-3530: Clean up test_ddl.py. Part 1.

2016-06-10 10:31:15 -07:00

create-table-like-table.test

IMPALA-3530: Clean up test_ddl.py. Part 1.

2016-06-10 10:31:15 -07:00

create-table.test

IMPALA-3530: Clean up test_ddl.py. Part 1.

2016-06-10 10:31:15 -07:00

data-source-tables.test

IMPALA-2147: Support IS [NOT] DISTINCT FROM and "<=>" predicates

2016-01-14 05:45:22 +00:00

decimal_avro.test

IMPALA-3206: Enable codegen for AVRO_DECIMAL

2016-07-14 19:04:44 +00:00

decimal.test

IMPALA-3210: last/first_value() support for IGNORE NULLS

2016-07-18 08:28:09 -07:00

delimited-latin-text.test

IMPALA-3491: Use unique_database fixture in test_delimited_text.py.

2016-06-07 09:34:30 -07:00

delimited-text.test

IMPALA-3491: Use unique_database fixture in test_delimited_text.py.

2016-06-07 09:34:30 -07:00

describe-db.test

IMPALA-3491: Use unique_database fixture in test_metadata_query_statements.py.

2016-06-07 09:34:30 -07:00

describe-path.test

IMPALA-3491: Use unique_database fixture in test_metadata_query_statements.py.

2016-06-07 09:34:30 -07:00

distinct-estimate.test

Improve Hll estimate for small cardinalities.

2015-07-16 19:38:17 +00:00

distinct.test

IMPALA-3004: Fix QueryTest tests

2016-02-19 00:03:15 -08:00

empty.test

IMPALA-2894: Move regression test into a different .test file.

2016-01-27 20:41:45 +00:00

exchange-delays.test

Test for IMPALA-2987

2016-03-02 23:23:04 -08:00

explain-level0.test

ExecSummary

2014-06-11 03:10:11 -07:00

explain-level1.test

ExecSummary

2014-06-11 03:10:11 -07:00

explain-level2.test

ExecSummary

2014-06-11 03:10:11 -07:00

explain-level3.test

ExecSummary

2014-06-11 03:10:11 -07:00

exprs.test

IMPALA-2107: Add Base64 encoder/decoder

2016-05-12 14:17:32 -07:00

functions-ddl.test

IMPALA-2843: Persist hive udfs across catalog restarts

2016-02-19 23:04:03 -08:00

grant_revoke.test

IMPALA-3133: Wrong privileges after a REVOKE ALL ON SERVER statement

2016-05-12 14:17:57 -07:00

hbase-compute-stats-incremental.test

IMPALA-3491: Merge test_hbase_metadata.py into compute_stats.py. Use unique db fixture.

2016-05-23 08:40:19 -07:00

hbase-compute-stats.test

IMPALA-3491: Merge test_hbase_metadata.py into compute_stats.py. Use unique db fixture.

2016-05-23 08:40:19 -07:00

hbase-filters.test

IMPALA-642: Conjunctive predicates on HBase table not working...

2014-05-08 13:59:00 -07:00

hbase-inline-view.test

Remove explicit references to functional_hbase tables from .test files.

2015-02-23 23:32:41 +00:00

hbase-inserts.test

Add nested types support to Create Table Like File

2015-08-22 01:46:26 +00:00

hbase-limit.test

Remove explicit references to functional_hbase tables from .test files.

2015-02-23 23:32:41 +00:00

hbase-rowkeys.test

Remove explicit references to functional_hbase tables from .test files.

2015-02-23 23:32:41 +00:00

hbase-scan-node.test

Add partition pruning tests

2014-06-24 02:14:27 -07:00

hbase-show-create-table.test

Remove explicit references to functional_hbase tables from .test files.

2015-02-23 23:32:41 +00:00

hbase-show-stats.test

Remove explicit references to functional_hbase tables from .test files.

2015-02-23 23:32:41 +00:00

hbase-subquery.test

Treat HBase as a file format for functional tests

2014-01-08 10:52:36 -08:00

hbase-top-n.test

Remove explicit references to functional_hbase tables from .test files.

2015-02-23 23:32:41 +00:00

hdfs-caching-validation.test

IMPALA-1595: Add 'location' to SHOW [TABLE STATS|PARTITIONS] for HDFS tables

2015-04-21 19:27:50 +00:00

hdfs-caching.test

IMPALA-2862: Fix regex parsing in test result verifier

2016-02-02 21:55:57 +00:00

hdfs-partitions.test

IMPALA-2514: DCHECK on destroying an ExprContext

2015-10-12 14:41:00 -07:00

hdfs-scan-node.test

Add partition pruning tests

2014-06-24 02:14:27 -07:00

hdfs-text-scan-with-header.test

IMPALA-1740: Add support for skip.header.line.count.

2016-05-12 14:17:46 -07:00

hdfs-text-scan.test

IMPALA-3004: Fix QueryTest tests

2016-02-19 00:03:15 -08:00

hdfs-tiny-scan.test

Fix IMPALA-129, IMPALA-534, and other scanner bugs.

2014-01-08 10:52:14 -08:00

hidden-files.test

IMPALA-3491: Use unique_database fixture in test_hidden_files.py.

2016-05-12 14:17:59 -07:00

impala-demo.test

Test data loading framework improvements

2014-01-08 10:46:49 -08:00

inline-view-limit.test

IMPALA-3004: Fix QueryTest tests

2016-02-19 00:03:15 -08:00

inline-view.test

IMPALA-2375: Disabling/moving tests that don't work with the old HJ

2015-10-07 14:47:40 -07:00

insert_null.test

Add nested types support to Create Table Like File

2015-08-22 01:46:26 +00:00

insert_overwrite.test

Added SHOW TABLE/COLUMN STATS command.

2014-01-08 10:53:51 -08:00

insert_parquet_invalid_codec.test

Enable isilon end to end tests for Impala.

2015-05-27 22:25:12 +00:00

insert_part_key.test

Throw error on unrecognized test sections.

2014-12-02 18:08:09 -08:00

insert_permutation.test

IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems

2016-05-12 14:17:49 -07:00

insert.test

IMPALA-1440: test for insert mem limit

2016-05-31 23:32:12 -07:00

invalid_header.test

IMPALA-3004: Fix QueryTest tests

2016-02-19 00:03:15 -08:00

java-udf.test

IMPALA-3378/IMPALA-3379: fix various JNI issues

2016-05-12 14:17:41 -07:00

joins-against-hbase.test

Remove explicit references to functional_hbase tables from .test files.

2015-02-23 23:32:41 +00:00

joins-partitioned.test

IMPALA-2529: expr test case fails on non-partitioned HJ

2015-10-12 14:41:05 -07:00

joins.test

IMPALA-3645: Free probe expressions' local allocations in ConstructBuildSide()

2016-06-02 09:32:54 -07:00

kudu_alter.test

KUDU-1029, KUDU-1087: Allow altering tblproperties()

2015-09-03 16:44:10 -07:00

kudu_crud.test

IMPALA-3454: Kudu deletes may fail if subqueries are used

2016-05-25 06:41:29 -07:00

kudu_partition_ddl.test

Re-enable Kudu in build using client stubs when needed

2016-03-29 23:57:54 +00:00

kudu_stats.test

IMPALA-3373: Computing stats on Kudu table duplicates the columns

2016-05-12 14:17:34 -07:00

kudu-scan-node.test

IMPALA-2635: Kudu scanner hangs on UNION

2016-01-28 21:49:39 -08:00

kudu-show-create.test

KUDU-1115: Deprecate 'kudu.split_keys' for range partitioning

2015-09-16 16:43:47 -07:00

large_strings.test

IMPALA-3350: Add some missing StringVal.is_null checks

2016-05-12 14:17:39 -07:00

legacy-joins-aggs.test

Fail queries that require a SubplanNode when using legacy joins and aggs.

2015-09-10 04:50:31 +00:00

libs_with_same_filenames.test

IMPALA-3256: TestUdfs.test_libs_with_same_filenames failure

2016-05-12 14:17:29 -07:00

limit.test

Remove explicit references to functional_hbase tables from .test files.

2015-02-23 23:32:41 +00:00

load-java-udfs.test

IMPALA-2843: Persist hive udfs across catalog restarts

2016-02-19 23:04:03 -08:00

load.test

IMPALA-3729: batch_size=1 coverage for avro scanner

2016-07-19 23:30:02 -07:00

local-filesystem.test

IMPALA-3491: Use unique_database fixture in test_local_fs.py

2016-06-08 16:30:32 -07:00

max-nesting-depth.test

Nested Types: Enforce and test maximum nesting depth of 100.

2015-10-05 11:30:54 -07:00

misc.test

IMPALA-2776: Remove escapechartesttable and associated tests.

2016-01-05 06:04:41 +00:00

mixed-format.test

Change the way data is loaded

2014-01-08 10:48:09 -08:00

multiple-filesystems.test

IMPALA-3577, IMPALA-3486: Partitions on multiple filesystems breaks with S3_SKIP_INSERT_STAGING

2016-05-31 23:32:11 -07:00

nested-types-runtime.test

IMPALA-3311: fix string data coming out of aggs in subplans

2016-05-12 23:06:36 -07:00

nested-types-scanner-array-materialization.test

Nested types: read and materialize nested types in Parquet scanner

2015-09-02 19:23:54 +00:00

nested-types-scanner-basic.test

Nested types: read and materialize nested types in Parquet scanner

2015-09-02 19:23:54 +00:00

nested-types-scanner-maps.test

Regenerate complextypestbl files to include nested_struct.g field

2016-04-01 05:06:38 +00:00

nested-types-scanner-multiple-materialization.test

Nested types: read and materialize nested types in Parquet scanner

2015-09-02 19:23:54 +00:00

nested-types-scanner-position.test

Nested types: read and materialize nested types in Parquet scanner

2015-09-02 19:23:54 +00:00

nested-types-subplan.test

IMPALA-2894: Move regression test into a different .test file.

2016-01-27 20:41:45 +00:00

nested-types-tpch.test

IMPALA-2993: don't check for "Failed to allocate buffer for collection" error

2016-02-18 01:25:10 -08:00

nested-types-with-clause.test

IMPALA-2375: Disabling/moving tests that don't work with the old HJ

2015-10-07 14:47:40 -07:00

null_data.test

Add nested types support to Create Table Like File

2015-08-22 01:46:26 +00:00

outer-joins.test

IMPALA-2950: Fully resolve exprs before wrapping with TupleIsNullPredicates.

2016-02-10 07:16:58 +00:00

overflow.test

IMPALA-724: Support infinite / nan values in text files

2014-05-08 12:28:53 -07:00

parquet-abort-on-error.test

IMPALA-2736: Basic column-wise slot materialization in Parquet scanner.

2016-05-12 14:17:48 -07:00

parquet-continue-on-error.test

IMPALA-3745: parquet invalid data handling

2016-06-15 21:33:39 -07:00

parquet-corrupt-rle-counts-abort.test

IMPALA-3754: fix TestParquet.test_corrupt_rle_counts flakiness

2016-06-20 15:37:18 -07:00

parquet-corrupt-rle-counts.test

IMPALA-3754: fix TestParquet.test_corrupt_rle_counts flakiness

2016-06-20 15:37:18 -07:00

parquet-resolution-by-name.test

Query options not correctly reset after each test.

2016-05-12 14:17:38 -07:00

parquet.test

IMPALA-3764,3914: fuzz test HDFS scanners and fix parquet bugs found

2016-08-11 08:42:41 +00:00

partition-col-types.test

IMPALA-2100: Exclude explain header from expected results of test_partitioning.py.

2015-09-08 19:57:55 +00:00

runtime_filters_wait.test

IMPALA-3480: Add query options for min/max filter sizes

2016-05-12 23:06:35 -07:00

runtime_filters.test

IMPALA-2956: Filters should be able to target multiple scan nodes

2016-05-18 01:40:22 -07:00

runtime_row_filters_phj.test

IMPALA-3007: Adjust Bloom Filter size according to NDV estimate

2016-05-12 14:17:46 -07:00

runtime_row_filters.test

IMPALA-2956: Filters should be able to target multiple scan nodes

2016-05-18 01:40:22 -07:00

scanners.test

Add partition pruning tests

2014-06-24 02:14:27 -07:00

semi-joins-exhaustive.test

IMPALA-2375: Unblock old hj/agg test runs

2015-09-27 15:13:32 -07:00

semi-joins.test

IMPALA-2375: Unblock old hj/agg test runs

2015-09-27 15:13:32 -07:00

seq-writer.test

Adding SEQUENCEFILE compressed record format

2014-11-19 17:21:36 -08:00

set.test

IMPALA-3535: Ignore invalid per-pool default query options

2016-05-17 10:09:05 -07:00

show-create-table.test

IMPALA-783: add show create view as alias for show create table

2016-01-20 04:32:21 +00:00

show-data-sources.test

S3: Some more work toward enabling additional S3 test coverage

2015-03-03 08:29:13 +00:00

show-stats.test

IMPALA-3491: Use unique_database fixture in test_metadata_query_statements.py.

2016-06-07 09:34:30 -07:00

show.test

IMPALA-3711: Remove unnecessary privilege checks in getDbsMetadata()

2016-07-07 10:41:29 -07:00

single-node-large-sorts.test

IMPALA-3344: Simplify sorter and document/enforce invariants.

2016-06-02 21:33:08 -07:00

single-node-nlj-exhaustive.test

IMPALA-2824: Restore query options after each test.

2016-01-26 03:13:05 +00:00

single-node-nlj.test

IMPALA-561: Allow multiple callbacks in a thread resource pool.

2016-03-10 23:16:29 +00:00

sort.test

IMPALA-3344: Simplify sorter and document/enforce invariants.

2016-06-02 21:33:08 -07:00

spilling.test

IMPALA-1346/1590/2344: fix sorter buffer mgmt when spilling

2016-06-06 17:34:07 -07:00

strict-mode-abort.test

IMPALA-3579: Strict handling of numeric overflow in text parsing

2016-05-23 08:40:20 -07:00

strict-mode.test

IMPALA-3579: Strict handling of numeric overflow in text parsing

2016-05-23 08:40:20 -07:00

subplans.test

IMPALA-2368: Prevent double Reset() with nested subplans.

2015-09-27 15:13:28 -07:00

subquery.test

IMPALA-3232: Allow not-exists uncorrelated subqueries

2016-05-12 23:06:36 -07:00

test-unmatched-schema.test

IMPALA-3729: batch_size=1 coverage for avro scanner

2016-07-19 23:30:02 -07:00

text-bzip-scan.test

IMPALA-1886/IMPALA-2154: Add support for multi-stream bz2/gzip compressed files.

2016-02-28 21:31:37 -08:00

text-writer.test

IMPALA-1185: Make Avro and Seq writers unsupported

2014-09-26 12:28:03 -07:00

top-n.test

IMPALA-3412: fix CHAR codegen crash in tuple comparator

2016-05-12 14:17:45 -07:00

truncate-table.test

IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems

2016-05-12 14:17:49 -07:00

uda-mem-limit.test

S3: enable more tests for S3

2015-03-11 16:39:39 -07:00

uda.test

IMPALA-1829: UDAs with different intermediate type

2015-08-19 04:37:39 +00:00

udf-errors.test

S3: enable more tests for S3

2015-03-11 16:39:39 -07:00

udf-init-close.test

IMPALA-1030: HdfsTableSink was evaluating exprs in Prepare()

2014-06-12 02:23:20 -07:00

udf-mem-limit.test

S3: enable more tests for S3

2015-03-11 16:39:39 -07:00

udf.test

IMPALA-3674: Lazy materialization of LLVM module bitcode.

2016-07-20 18:30:25 -07:00

union.test

IMPALA-2948: Fix a bug in the planner when fast partition key scan is enabled

2016-02-06 05:28:28 +00:00

use.test

Change the way data is loaded

2014-01-08 10:48:09 -08:00

values.test

IMPALA-2749: Fix decimal multiplication overflow

2016-01-23 23:59:27 +00:00

views-compatibility.test

IMPALA-995: Add plan hints embedded in comments and preserve them in views.

2014-09-18 00:36:03 -07:00

views-ddl.test

IMPALA-3139: Fix drop table statement to not drop views and vice versa

2016-03-15 12:10:33 +00:00

views.test

Added order by query tests

2014-06-20 13:35:10 -07:00

wide-row.test

IMPALA-525: Adjust IO buffer size based on read length and other memory fixes

2014-01-08 10:54:01 -08:00

with-clause.test

IMPALA-2375: Disabling/moving tests that don't work with the old HJ

2015-10-07 14:47:40 -07:00