Mirror of https://github.com/apache/impala.git, synced 2026-01-06 06:01:03 -05:00
Compressed text formats currently require that an entire compressed file be read into memory and decompressed in a single call to the decompression codec. This change makes the HdfsTextScanner drive gzip in a streaming mode, i.e. it produces partial output as input is consumed.

Change-Id: Id5c0805e18cf6b606bcf27a5df4b5f58895809fd
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5233
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 05c3cc55e7a601d97adc4eebe03f878c68a33e56)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5385
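The streaming idea described in this commit message can be illustrated with Python's standard `zlib` module. This is a minimal sketch, not Impala's C++ HdfsTextScanner code. `stream_gunzip` is a hypothetical helper, the input is assumed to be a single-member gzip stream (any bytes after the first member are ignored), and the 64-byte chunk size stands in for however much data a scanner reads at a time. Passing `wbits=16 + zlib.MAX_WBITS` makes zlib expect a gzip header and trailer, and each `decompress` call returns whatever output the bytes fed so far allow, so the whole compressed file never has to be buffered before decompression starts.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def stream_gunzip(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Hypothetical helper: yield decompressed bytes as compressed chunks arrive.

    Assumes a single-member gzip stream (one header, one deflate body, one trailer).
    """
    # 16 + MAX_WBITS: expect a gzip header/trailer instead of a raw zlib stream.
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            # Partial output is available before the rest of the input is read.
            yield out
    tail = decomp.flush()
    if tail:
        yield tail
    if not decomp.eof:
        # Input ran out before the end of the gzip member was reached.
        raise ValueError("truncated gzip stream")


# Usage: feed a compressed buffer to the decompressor 64 bytes at a time.
original = b"".join(b"row %d|some text\n" % i for i in range(1000))
compressed = gzip.compress(original)
pieces = (compressed[i:i + 64] for i in range(0, len(compressed), 64))
assert b"".join(stream_gunzip(pieces)) == original
```

A generator fits the "partial output as input is consumed" model: the caller pulls decompressed data piece by piece and can start parsing rows before the last compressed byte has been read.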
# Generated File.
file_format: text, dataset: tpch, compression_codec: none, compression_type: none
file_format: text, dataset: tpch, compression_codec: gzip, compression_type: block
file_format: text, dataset: tpch, compression_codec: lzo, compression_type: block
file_format: seq, dataset: tpch, compression_codec: none, compression_type: none
file_format: seq, dataset: tpch, compression_codec: def, compression_type: block
file_format: seq, dataset: tpch, compression_codec: def, compression_type: record
file_format: seq, dataset: tpch, compression_codec: gzip, compression_type: block
file_format: seq, dataset: tpch, compression_codec: gzip, compression_type: record
file_format: seq, dataset: tpch, compression_codec: bzip, compression_type: block
file_format: seq, dataset: tpch, compression_codec: bzip, compression_type: record
file_format: seq, dataset: tpch, compression_codec: snap, compression_type: block
file_format: seq, dataset: tpch, compression_codec: snap, compression_type: record
file_format: rc, dataset: tpch, compression_codec: none, compression_type: none
file_format: rc, dataset: tpch, compression_codec: def, compression_type: block
file_format: rc, dataset: tpch, compression_codec: gzip, compression_type: block
file_format: rc, dataset: tpch, compression_codec: bzip, compression_type: block
file_format: rc, dataset: tpch, compression_codec: snap, compression_type: block
file_format: avro, dataset: tpch, compression_codec: none, compression_type: none
file_format: avro, dataset: tpch, compression_codec: def, compression_type: block
file_format: avro, dataset: tpch, compression_codec: snap, compression_type: block
file_format: parquet, dataset: tpch, compression_codec: none, compression_type: none
file_format: parquet, dataset: tpch, compression_codec: def, compression_type: block
file_format: parquet, dataset: tpch, compression_codec: snap, compression_type: block
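Each non-comment line in this file names one test combination: a file format, the tpch dataset, a compression codec, and a compression type (`none`, `block`, or `record`). As a minimal sketch, assuming only the `key: value, key: value` layout shown here, the lines can be parsed into dictionaries with a few lines of Python. `parse_dimension_lines` is a hypothetical name used for illustration; it is not Impala's own test-framework loader.

```python
from collections import defaultdict
from typing import Dict, List


def parse_dimension_lines(text: str) -> List[Dict[str, str]]:
    """Hypothetical parser: turn 'key: value, key: value' lines into dicts.

    Blank lines and lines starting with '#' (e.g. '# Generated File.') are skipped.
    """
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        row = {}
        for field in line.split(","):
            key, sep, value = field.partition(":")
            if not sep:
                raise ValueError(f"malformed field {field!r} in line {line!r}")
            row[key.strip()] = value.strip()
        rows.append(row)
    return rows


# Usage on a few lines copied from the file above.
SAMPLE = """# Generated File.
file_format: text, dataset: tpch, compression_codec: none, compression_type: none
file_format: seq, dataset: tpch, compression_codec: gzip, compression_type: record
file_format: rc, dataset: tpch, compression_codec: snap, compression_type: block
"""

rows = parse_dimension_lines(SAMPLE)
# Group codecs by file format to see which combinations are exercised.
codecs_by_format: Dict[str, List[str]] = defaultdict(list)
for row in rows:
    codecs_by_format[row["file_format"]].append(row["compression_codec"])
assert codecs_by_format["seq"] == ["gzip"]
```

Raising on a field without a colon keeps a typo in a generated file from silently producing a half-filled combination.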