Mirror of https://github.com/apache/impala.git, synced 2026-01-03 06:00:52 -05:00
Compressed text formats currently require entire compressed files be read into memory to be decompressed in a single call to the decompression codec. This changes the HdfsTextScanner to drive gzip in a streaming mode, i.e. produce partial output as input is consumed.

Change-Id: Id5c0805e18cf6b606bcf27a5df4b5f58895809fd
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5233
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 05c3cc55e7a601d97adc4eebe03f878c68a33e56)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5385
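The streaming mode described in the commit message can be illustrated with a short Python sketch. This is not Impala's C++ HdfsTextScanner code; it only shows the general idea of feeding a gzip decoder one chunk at a time and collecting partial output, using the standard `zlib` module. The helper names `stream_gunzip` and `split_into_chunks`, the chunk size, and the sample rows are assumptions made for illustration, and the sketch handles a single gzip member only.

```python
import gzip
import zlib


def stream_gunzip(chunks):
    """Yield decompressed pieces as compressed chunks are consumed.

    Hypothetical helper for illustration; handles one gzip member.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        out = decoder.decompress(chunk)
        if out:
            # Partial output is available before the whole input has been read.
            yield out
    tail = decoder.flush()
    if tail:
        yield tail


def split_into_chunks(data, size):
    """Simulate reading a compressed file in fixed-size I/O buffers."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


# Made-up delimited text rows, compressed in memory for the demonstration.
original = "".join(f"{i}|row-{i * 7}\n" for i in range(5000)).encode("ascii")
compressed = gzip.compress(original)

pieces = list(stream_gunzip(split_into_chunks(compressed, 256)))
restored = b"".join(pieces)
```

Because output is produced per chunk, peak memory is bounded by the chunk size plus the decoder's window rather than by the size of the whole compressed file, which is the motivation stated in the commit message.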
658 B
    # Manually created file.
    file_format:text, dataset:tpch, compression_codec:none, compression_type:none
    file_format:text, dataset:tpch, compression_codec:gzip, compression_type:block
    file_format:seq, dataset:tpch, compression_codec:gzip, compression_type:block
    file_format:seq, dataset:tpch, compression_codec:snap, compression_type:block
    file_format:rc, dataset:tpch, compression_codec:none, compression_type:none
    file_format:avro, dataset:tpch, compression_codec: none, compression_type: none
    file_format:avro, dataset:tpch, compression_codec: snap, compression_type: block
    file_format:parquet, dataset:tpch, compression_codec: none, compression_type: none
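Each non-comment row above is a comma-separated list of `key:value` pairs, with inconsistent spacing after the colons in some rows. A minimal sketch of reading such rows into dictionaries follows; it is not Impala's own test-vector loader, and the function name `parse_test_vectors` is hypothetical. It assumes that lines starting with `#` are comments and that whitespace around keys and values is insignificant.

```python
def parse_test_vectors(text):
    """Parse 'key:value, key:value' rows into a list of dicts.

    Hypothetical helper; assumes '#' starts a comment line and that
    whitespace around keys and values carries no meaning.
    """
    vectors = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        vector = {}
        for field in line.split(","):
            # Split on the first colon only, then trim stray spaces.
            key, _, value = field.partition(":")
            vector[key.strip()] = value.strip()
        vectors.append(vector)
    return vectors


# Two rows copied from the file above, including one with spaces after colons.
SAMPLE = """\
# Manually created file.
file_format:text, dataset:tpch, compression_codec:gzip, compression_type:block
file_format:avro, dataset:tpch, compression_codec: snap, compression_type: block
"""

vectors = parse_test_vectors(SAMPLE)
```

Trimming both sides of the colon means `compression_codec: snap` and `compression_codec:snap` parse to the same value.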