impala/testdata at 881e00a8bff0469ab7860bcd0d4d4794fb04a4b8

jprdonnelly/impala

mirror of https://github.com/apache/impala.git synced 2026-01-10 00:00:16 -05:00

Zoltan Borok-Nagy 881e00a8bf IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN

If the first number in a row group written by Impala is NaN,
then Impala writes incorrect statistics in the metadata.
This will result in incorrect results when filtering the
data.

This commit fixes the read path when encountering NaNs in
Parquet min/max statistics. If min and max are both NaN, we
can't use the statistics at all. If only one of them is NaN,
the other still can be used.
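
The rule above can be illustrated with a minimal sketch. This is not Impala's actual implementation (the real read path is C++); the function names `usable_min_max` and `can_skip_row_group` and the range-predicate shape are hypothetical, chosen only to show how a NaN bound might be ignored while the other bound stays usable.

```python
import math
from typing import Optional, Tuple


def usable_min_max(min_val: float, max_val: float) -> Tuple[Optional[float], Optional[float]]:
    """Return the bounds that are safe to filter on; None marks a NaN (unusable) bound."""
    safe_min = None if math.isnan(min_val) else min_val
    safe_max = None if math.isnan(max_val) else max_val
    return safe_min, safe_max


def can_skip_row_group(min_val: float, max_val: float, lower: float, upper: float) -> bool:
    """Return True only if the statistics prove no value v satisfies lower <= v <= upper."""
    safe_min, safe_max = usable_min_max(min_val, max_val)
    # A usable min above the range rules out every value in the row group.
    if safe_min is not None and safe_min > upper:
        return True
    # A usable max below the range does the same.
    if safe_max is not None and safe_max < lower:
        return True
    # Both bounds NaN, or neither bound excludes the range: the row group must be read.
    return False
```

With both bounds NaN the sketch never skips a row group, which matches "we can't use the statistics at all"; with a single NaN bound it still filters on the remaining one.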

I added some tests to QueryTest/parquet-stats.test

Change-Id: If3897fc1426541239223670812f59e2bed32f455
Reviewed-on: http://gerrit.cloudera.org:8080/9358
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins

2018-02-22 00:57:46 +00:00

..

Fix IMP-297

2014-01-08 10:46:44 -08:00

AllTypesErrorNoNulls

Timestamp data type implementation.

2012-03-22 21:38:18 -07:00

IMPALA-1136, IMPALA-2161: Skip \u0000 characters when dealing with Avro schemas

2015-09-02 00:37:28 +00:00

avro_schema_resolution

IMPALA-4615: Fix create_table.sql command order

2017-03-09 06:00:15 +00:00

IMPALA-4387: validate decimal type in Avro file schema

2016-10-30 00:12:58 +00:00

bad_parquet_data

IMPALA-3745: parquet invalid data handling

2016-06-15 21:33:39 -07:00

Add HdfsLzoTextScanner

2014-01-08 10:46:35 -08:00

Add support for streaming decompression of gzip text

2014-11-23 01:55:55 -08:00

Add HdfsLzoTextScanner

2014-01-08 10:46:35 -08:00

IMPALA-6386: Invalidate metadata at table level for dataload

2018-01-17 22:52:58 +00:00

IMPALA-6384: RequestPoolService should honor custom group mapping config

2018-01-11 22:52:29 +00:00

IMPALA-6068: Scale back fixing functional-types

2017-12-04 23:46:44 +00:00

ComplexTypesTbl

IMPALA-4675: Case-insensitive matching of Parquet fields.

2017-03-03 10:20:07 +00:00

compressed_formats

IMPALA-1619: Support 64-bit allocations.

2016-07-08 15:42:09 -07:00

CustomerMultiBlock

IMPALA-4993: extend dictionary filtering to collections

2018-01-19 20:37:25 +00:00

IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN

2018-02-22 00:57:46 +00:00

IMPALA-4993: extend dictionary filtering to collections

2018-01-19 20:37:25 +00:00

adding outer joins plus new tests

2011-09-28 09:02:07 -07:00

Perf Work:

2011-12-30 00:26:27 -08:00

ImpalaDemoDataset

Test data loading framework improvements

2014-01-08 10:46:49 -08:00

adding outer joins plus new tests

2011-09-28 09:02:07 -07:00

Fix null string parsing.

2014-01-08 10:44:40 -08:00

LineItemMultiBlock

IMPALA-2466: Add more tests for the HDFS parquet scanner.

2016-03-25 13:10:15 +00:00

IMPALA-3918: Remove Cloudera copyrights and add ASF license header

2016-08-09 08:19:41 +00:00

max_nesting_depth

Nested Types: Enforce and test maximum nesting depth of 100.

2015-10-05 11:30:54 -07:00

multi_compression_parquet_data

IMPALA-5448: fix invalid number of splits reported in Parquet scan node

2017-10-10 01:30:33 +00:00

IMPALA-13: Use SSE string functions that take an explicit length

2014-04-11 11:16:24 -07:00

parquet_nested_types_encodings

IMPALA-4725: Query option to control Parquet array resolution.

2017-03-09 05:07:44 +00:00

parquet_schema_resolution

IMPALA-3786: Replace "cloudera" with "apache" (part 2)

2016-09-29 21:14:13 +00:00

src/main/java/org/apache/impala/datagenerator

IMPALA-4433: Always generate testdata using the same time zone setting

2016-11-15 04:18:33 +00:00

Remove unused deps, centralize some pom versions, upgrade SLF4J and commons-io.

2017-12-20 22:04:18 +00:00

TblWithRaggedColumns

IMP-380 handle '\r' at end of row.

2014-01-08 10:46:14 -08:00

IMP-232: Parallel INSERT OVERWRITE

2014-01-08 10:45:04 -08:00

When a scan range begins at the starting point of the tuple, we'll miss that tuple. This patch fixes

2014-01-08 10:44:24 -08:00

tinytable_seq_snap

IMPALA-362: impalad hangs when read sequence file without contents

2014-01-08 10:50:49 -08:00

IMPALA-4171: Remove JAR from repo.

2016-09-22 02:00:50 +00:00

UnsupportedTypes

IMPALA-3812: Fix error message for unsupported types

2016-11-17 05:31:34 +00:00

IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN

2018-02-22 00:57:46 +00:00

__init__.py

CDH-18416: Don't inline ReadWriteUtil::ReadZLong()

2014-04-28 15:58:15 -07:00

.gitignore

Updates several .gitignore files.

2017-08-31 01:40:47 +00:00

pom.xml

Remove unused deps, centralize some pom versions, upgrade SLF4J and commons-io.

2017-12-20 22:04:18 +00:00