impala

mirror of https://github.com/apache/impala.git synced 2026-01-04 18:00:57 -05:00


Lenni Kuff cc1c0c61fd IMP-1291: Support "extended" ASCII characters as delimiters in text files

This fixes how we validate delimiters to be in line with Hive. A delimiter must
fit in a single byte and, as far as I can tell (there isn't documentation), can
be specified in any of the following formats:
- A single ASCII or Unicode character (e.g. '|').
- An escape character in octal format (e.g. \001, which is stored in the
  metastore as the Unicode character \u0001).
- A signed decimal integer in the range [-128, 127]. Negative values are used to
  specify delimiters with character values in the range 128-255 (e.g. -2 maps
  to 254).

Previously, we did not handle the "signed integer" case, so there was no way
to specify a delimiter in the "extended" ASCII range of 128-255.
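
The three formats described above can be sketched in Python as follows. This is
only an illustration, not Impala's or Hive's actual code: the function name
parse_delimiter is hypothetical, and trying the signed-integer form before the
single-character form is an assumption (it means a one-digit string such as "5"
is read as byte value 5, not as the character '5'). The octal form needs no
separate branch here because, per the description above, it is stored as a
single character.

```python
import re

# Strict signed decimal integer: optional minus sign followed by ASCII digits.
_SIGNED_INT = re.compile(r"-?[0-9]+")


def parse_delimiter(spec: str) -> int:
    """Return the delimiter as an unsigned byte value (0-255).

    Hypothetical helper. Raises ValueError if the delimiter does not
    fit in a single byte.
    """
    # Assumption: the signed-integer form is checked first.
    if _SIGNED_INT.fullmatch(spec):
        value = int(spec)
        if -128 <= value <= 127:
            # Reinterpret the signed byte as unsigned, e.g. -2 -> 254.
            return value & 0xFF
        raise ValueError(f"integer delimiter out of range [-128, 127]: {spec!r}")

    # Single character (this also covers an octal escape such as \001,
    # which is stored as the one-character string "\u0001").
    if len(spec) == 1 and ord(spec) <= 0xFF:
        return ord(spec)

    raise ValueError(f"delimiter does not fit in a single byte: {spec!r}")


if __name__ == "__main__":
    for spec in ["|", "\u0001", "-2", "\u00fe"]:
        print(repr(spec), "->", parse_delimiter(spec))
```

Under these assumptions, '|' maps to 124, "\u0001" maps to 1, and both "-2" and
the thorn character (U+00FE) map to 254, while a value such as "128" is rejected
because it lies outside the signed-byte range.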

To support result validation, the test infrastructure had to be updated to support
reading/writing different character encodings.

Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888

2014-03-13 13:00:15 -07:00

mstr

- Move thrift out of FE src and into impala/common

2011-12-30 19:35:20 -08:00

bad_parquet_data.parquet

IMPALA-694: Allow Impala to read files produced by parquet-mr version <= 1.2.8

2014-01-08 10:54:27 -08:00

multiple_rowgroups.parquet

IMPALA-729: fix resource management in Parquet scanner for multiple row groups

2014-01-08 10:56:26 -08:00

oldrcfile.rc

Fix pre-hive 9 rc file scanner.

2014-01-08 10:48:41 -08:00

overflow.txt

- Cleaned up some TODOs.

2012-01-18 23:08:29 -08:00

README

IMPALA-729: fix resource management in Parquet scanner for multiple row groups

2014-01-08 10:56:26 -08:00

repeated_values.parquet

Allow zero bit width dict/RLE decoders.

2014-01-08 10:54:27 -08:00

text-comma-backslash-newline.txt

IMPALA-496: Fix escaping of field delimiter and escape character in inserts

2014-01-08 10:52:09 -08:00

text-dollar-hash-pipe.txt

IMPALA-496: Fix escaping of field delimiter and escape character in inserts

2014-01-08 10:52:09 -08:00

text-thorn-ecirc-newline.txt

IMP-1291: Support "extended" ASCII characters as delimiters in text files

2014-03-13 13:00:15 -07:00

widerow.txt

IMPALA-525: Adjust IO buffer size based on read length and other memory fixes

2014-01-08 10:54:01 -08:00

README

bad_parquet_data.parquet:
Generated with parquet-mr 1.2.5
Contains 3 single-column rows:
"parquet"
"is"
"fun"

repeated_values.parquet:
Generated with parquet-mr 1.2.5
Contains 3 single-column rows:
"parquet"
"parquet"
"parquet"

multiple_rowgroups.parquet:
Generated with parquet-mr 1.2.5
Populated with:
hive> set parquet.block.size=500;
hive> INSERT INTO TABLE tbl
      SELECT l_comment FROM tpch.lineitem LIMIT 1000;