impala/testdata/workloads/tpch/tpch_core.csv
Matthew Jacobs 25428fdb21 Add support for streaming decompression of gzip text
Compressed text formats currently require entire compressed
files be read into memory to be decompressed in a single call
to the decompression codec. This changes the HdfsTextScanner
to drive gzip in a streaming mode, i.e. produce partial output
as input is consumed.

Change-Id: Id5c0805e18cf6b606bcf27a5df4b5f58895809fd
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5233
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 05c3cc55e7a601d97adc4eebe03f878c68a33e56)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5385
2014-11-23 01:55:55 -08:00
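
The streaming mode described in the commit message can be illustrated with a minimal sketch. It uses Python's `zlib` module rather than Impala's C++ codec classes, so it shows only the general technique, not the actual HdfsTextScanner change. The function name `stream_gunzip` and the `chunk_size` parameter are hypothetical and introduced only for this example.

```python
import gzip
import io
import zlib


def stream_gunzip(source, chunk_size=64 * 1024):
    """Yield decompressed pieces while compressed input is still being read.

    `source` is any binary file-like object. The name and signature are
    assumptions made for this sketch, not part of Impala.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        # Each call may return partial output (possibly empty).
        piece = decoder.decompress(chunk)
        if piece:
            yield piece
    # Emit anything still buffered once the input is exhausted.
    tail = decoder.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    original = b"".join(b"%d|sample row\n" % i for i in range(20000))
    compressed = io.BytesIO(gzip.compress(original))
    pieces = list(stream_gunzip(compressed, chunk_size=512))
    # The concatenated partial outputs reproduce the original data.
    assert b"".join(pieces) == original
```

Because the generator holds only one input chunk and one output piece at a time, peak memory no longer grows with the size of the compressed file. This sketch handles a single gzip member only; concatenated members would need extra handling.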

# Manually created file.
file_format:text, dataset:tpch, compression_codec:none, compression_type:none
file_format:text, dataset:tpch, compression_codec:gzip, compression_type:block
file_format:seq, dataset:tpch, compression_codec:gzip, compression_type:block
file_format:seq, dataset:tpch, compression_codec:snap, compression_type:block
file_format:rc, dataset:tpch, compression_codec:none, compression_type:none
file_format:avro, dataset:tpch, compression_codec: none, compression_type: none
file_format:avro, dataset:tpch, compression_codec: snap, compression_type: block
file_format:parquet, dataset:tpch, compression_codec: none, compression_type: none
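
Each non-comment line above is a comma-separated list of `key:value` pairs, with optional spaces after the colon on some lines. As a rough sketch of how such lines could be read, and not a copy of Impala's own test-framework parser, the hypothetical helper `parse_vectors` below turns them into dictionaries and trims the inconsistent whitespace.

```python
def parse_vectors(text):
    """Parse 'key:value, key:value' lines into a list of dicts.

    Hypothetical helper for illustration; blank lines and lines starting
    with '#' are skipped.
    """
    vectors = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        row = {}
        for field in line.split(","):
            # partition splits on the first ':' only.
            key, _, value = field.partition(":")
            row[key.strip()] = value.strip()
        vectors.append(row)
    return vectors


if __name__ == "__main__":
    sample = (
        "# Manually created file.\n"
        "file_format:text, dataset:tpch, compression_codec:gzip, compression_type:block\n"
        "file_format:avro, dataset:tpch, compression_codec: snap, compression_type: block\n"
    )
    for vector in parse_vectors(sample):
        print(vector["file_format"], vector["compression_codec"])
```

Stripping both keys and values means `compression_codec: snap` and `compression_codec:snap` parse to the same entry.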