This patch adds support to Impala for scanning .DEFLATE files of tables stored as text. To avoid confusion, note that although these files have a compression type of DEFLATE in Impala, they should be treated as if their compression type were DEFAULT. Hadoop tools such as Hive and MapReduce support reading and writing text files compressed with the deflate algorithm, which is the default compression type. Hadoop uses the zlib library (an implementation of the DEFLATE algorithm) to compress text files into .DEFLATE files. These files are not in the raw deflate format but rather the zlib format: the zlib library supports three flavors of deflate output, and Hadoop uses the flavor that wraps the deflate stream in zlib framing rather than emitting raw deflate.

Testing: A pre-existing unit test validates compressing and decompressing data with compression type DEFLATE. Also modified existing end-to-end tests that query files of various formats and compression types. All core and exhaustive tests pass.

Change-Id: I45e41ab5a12637d396fef0812a09d71fa839b27a
Reviewed-on: http://gerrit.cloudera.org:8080/13857
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
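The distinction the commit message draws between zlib-wrapped deflate (what Hadoop's DefaultCodec writes into .DEFLATE files) and raw deflate can be illustrated with a minimal sketch using Python's standard zlib module. This is only an illustration of the framing difference, not Impala's actual C++ decompression path.

```python
import zlib

data = b"some text table rows\n" * 10

# zlib format: a 2-byte header and an Adler-32 trailer around the deflate
# stream. This is the flavor Hadoop's DefaultCodec produces, and what
# zlib.compress() emits by default.
zlib_wrapped = zlib.compress(data)

# Raw deflate: the same compressed stream with no zlib header or trailer.
# wbits=-15 selects that flavor.
raw_compressor = zlib.compressobj(wbits=-15)
raw_stream = raw_compressor.compress(data) + raw_compressor.flush()

# A decoder expecting zlib framing accepts the wrapped stream but rejects
# the raw one.
assert zlib.decompress(zlib_wrapped) == data
try:
    zlib.decompress(raw_stream)        # default wbits expects zlib framing
except zlib.error as err:
    print("raw deflate rejected:", err)  # e.g. "incorrect header check"

# Decoding the raw stream requires asking for raw deflate explicitly.
assert zlib.decompress(raw_stream, wbits=-15) == data
```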
# Generated File.
file_format: text, dataset: functional, compression_codec: none, compression_type: none
file_format: text, dataset: functional, compression_codec: def, compression_type: block
file_format: text, dataset: functional, compression_codec: gzip, compression_type: block
file_format: text, dataset: functional, compression_codec: bzip, compression_type: block
file_format: text, dataset: functional, compression_codec: snap, compression_type: block
file_format: text, dataset: functional, compression_codec: lzo, compression_type: block
file_format: seq, dataset: functional, compression_codec: none, compression_type: none
file_format: seq, dataset: functional, compression_codec: def, compression_type: block
file_format: seq, dataset: functional, compression_codec: def, compression_type: record
file_format: seq, dataset: functional, compression_codec: gzip, compression_type: block
file_format: seq, dataset: functional, compression_codec: gzip, compression_type: record
file_format: seq, dataset: functional, compression_codec: bzip, compression_type: block
file_format: seq, dataset: functional, compression_codec: bzip, compression_type: record
file_format: seq, dataset: functional, compression_codec: snap, compression_type: block
file_format: seq, dataset: functional, compression_codec: snap, compression_type: record
file_format: rc, dataset: functional, compression_codec: none, compression_type: none
file_format: rc, dataset: functional, compression_codec: def, compression_type: block
file_format: rc, dataset: functional, compression_codec: gzip, compression_type: block
file_format: rc, dataset: functional, compression_codec: bzip, compression_type: block
file_format: rc, dataset: functional, compression_codec: snap, compression_type: block
file_format: avro, dataset: functional, compression_codec: none, compression_type: none
file_format: avro, dataset: functional, compression_codec: def, compression_type: block
file_format: avro, dataset: functional, compression_codec: snap, compression_type: block
file_format: parquet, dataset: functional, compression_codec: none, compression_type: none
file_format: orc, dataset: functional, compression_codec: def, compression_type: block
file_format: hbase, dataset: functional, compression_codec: none, compression_type: none
file_format: kudu, dataset: functional, compression_codec: none, compression_type: none
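Each line of the generated file above is a test vector of comma-separated "key: value" dimensions (file format, dataset, compression codec, compression type). The sketch below, which is not Impala's actual test harness, shows one way such a line could be parsed into a dictionary.

```python
def parse_vector_line(line: str) -> dict:
    """Parse a line like
    'file_format: text, dataset: functional, compression_codec: def, compression_type: block'
    into {'file_format': 'text', ...}. Blank and comment lines yield {}."""
    line = line.strip()
    if not line or line.startswith("#"):
        return {}
    pairs = (field.split(":", 1) for field in line.split(","))
    return {key.strip(): value.strip() for key, value in pairs}


if __name__ == "__main__":
    sample = ("file_format: text, dataset: functional, "
              "compression_codec: def, compression_type: block")
    print(parse_vector_line(sample))
    # {'file_format': 'text', 'dataset': 'functional',
    #  'compression_codec': 'def', 'compression_type': 'block'}
```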