Files
impala/testdata/data
Juan Yu d1c263402e IMPALA-1973: Fixing crash when uninitialized, empty row is added in HdfsTextScanner
This patch fixes an issue when an uninitialized, empty row is falsely
added to the rowbatch. The uninitialized data inside this row leads
later on to a crash when the null byte is checked together with the
offsets (that contains garbage).

The fix is to not only check for the number of materialized columns, but
as well for the number of materialized partition key columns. Only if both are
empty and the parser has an unfinished tuple, add the empty row.

To accommodate for the last row, check in FinishScanRange() if there is an
unfinished tuple with materialized slots or materialized partition key. Write
the fields if necessary.

Change-Id: I2808cc228e62d048d917d3a6352d869d117597ab
(cherry picked from commit c1795a8b40d10fbb32d9051a0e7de5ebffc8a6bd)
Reviewed-on: http://gerrit.cloudera.org:8080/364
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
2015-05-05 00:19:12 +00:00
..
2015-02-27 18:48:56 +00:00
2014-05-16 22:26:11 -07:00
2014-04-14 21:07:32 -07:00
2014-04-14 21:07:32 -07:00
2014-01-08 10:48:41 -08:00
2012-01-18 23:08:29 -08:00

bad_parquet_data.parquet:
Generated with parquet-mr 1.2.5
Contains 3 single-column rows:
"parquet"
"is"
"fun"

repeated_values.parquet:
Generated with parquet-mr 1.2.5
Contains 3 single-column rows:
"parquet"
"parquet"
"parquet"

multiple_rowgroups.parquet:
Generated with parquet-mr 1.2.5
Populated with:
hive> set parquet.block.size=500;
hive> INSERT INTO TABLE tbl
      SELECT l_comment FROM tpch.lineitem LIMIT 1000;

alltypesagg_hive_13_1.parquet:
Generated with parquet-mr version 1.5.0-cdh5.4.0-SNAPSHOT
hive> create table alltypesagg_hive_13_1 stored as parquet as select * from alltypesagg;