Files
impala/testdata/workloads/functional-query/functional-query_exhaustive.csv
Ethan Xue 6d68c4f6c0 IMPALA-8549: Add support for scanning DEFLATE text files
This patch adds support for scanning .DEFLATE files of tables stored
as text to Impala. Note that although these files report a compression
type of DEFLATE in Impala, they should be treated as if their
compression type were DEFAULT.

Hadoop tools such as Hive and MapReduce support reading and writing
text files compressed with the deflate algorithm, which is the default
compression type. Hadoop uses the zlib library (an implementation of
the DEFLATE algorithm) to compress text files into .DEFLATE files.
These files are not in the raw deflate format but in the zlib format:
zlib can emit three flavors of deflate output (raw deflate,
zlib-wrapped, and gzip-wrapped), and Hadoop uses the zlib-wrapped
flavor rather than raw deflate.
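
The difference between the flavors can be illustrated with Python's
zlib bindings; this is only a sketch of the wrappings involved, not
Impala code, and the sample data is made up:

  import zlib

  data = b"1,alice\n2,bob\n"

  # zlib-wrapped deflate (zlib header + Adler-32 checksum); this is the
  # flavor Hadoop writes into .DEFLATE text files.
  zlib_wrapped = zlib.compress(data)

  # Raw deflate: no header or checksum around the compressed stream.
  raw_obj = zlib.compressobj(wbits=-zlib.MAX_WBITS)
  raw = raw_obj.compress(data) + raw_obj.flush()

  # Gzip-wrapped deflate: gzip header and CRC-32 trailer.
  gz_obj = zlib.compressobj(wbits=zlib.MAX_WBITS | 16)
  gz = gz_obj.compress(data) + gz_obj.flush()

  # The default decompression settings expect the zlib wrapper, so a
  # Hadoop-written .DEFLATE payload round-trips directly.
  assert zlib.decompress(zlib_wrapped) == data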

Testing:
A pre-existing unit test validates compressing and decompressing data
with compression type DEFLATE. Existing end-to-end tests that query
files of various formats and compression types were also updated. All
core and exhaustive tests pass.
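
For a quick manual check along the same lines, a .DEFLATE text file can
be produced by hand; the path, table name, and schema below are
hypothetical and not part of the test suite:

  import os
  import zlib

  # Write one zlib-wrapped stream per file, matching the format Hadoop
  # uses for .DEFLATE text files.
  rows = b"1,alice\n2,bob\n3,carol\n"
  os.makedirs("/tmp/deflate_tbl", exist_ok=True)
  with open("/tmp/deflate_tbl/data.deflate", "wb") as f:
      f.write(zlib.compress(rows))

  # After copying the directory into HDFS, an external text table can
  # be pointed at it and queried, e.g.:
  #   CREATE EXTERNAL TABLE deflate_tbl (id INT, name STRING)
  #   ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  #   LOCATION '/tmp/deflate_tbl';
  #   SELECT * FROM deflate_tbl;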

Change-Id: I45e41ab5a12637d396fef0812a09d71fa839b27a
Reviewed-on: http://gerrit.cloudera.org:8080/13857
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2019-08-08 01:05:02 +00:00

2.3 KiB

# Generated File.
file_format: text, dataset: functional, compression_codec: none, compression_type: none
file_format: text, dataset: functional, compression_codec: def, compression_type: block
file_format: text, dataset: functional, compression_codec: gzip, compression_type: block
file_format: text, dataset: functional, compression_codec: bzip, compression_type: block
file_format: text, dataset: functional, compression_codec: snap, compression_type: block
file_format: text, dataset: functional, compression_codec: lzo, compression_type: block
file_format: seq, dataset: functional, compression_codec: none, compression_type: none
file_format: seq, dataset: functional, compression_codec: def, compression_type: block
file_format: seq, dataset: functional, compression_codec: def, compression_type: record
file_format: seq, dataset: functional, compression_codec: gzip, compression_type: block
file_format: seq, dataset: functional, compression_codec: gzip, compression_type: record
file_format: seq, dataset: functional, compression_codec: bzip, compression_type: block
file_format: seq, dataset: functional, compression_codec: bzip, compression_type: record
file_format: seq, dataset: functional, compression_codec: snap, compression_type: block
file_format: seq, dataset: functional, compression_codec: snap, compression_type: record
file_format: rc, dataset: functional, compression_codec: none, compression_type: none
file_format: rc, dataset: functional, compression_codec: def, compression_type: block
file_format: rc, dataset: functional, compression_codec: gzip, compression_type: block
file_format: rc, dataset: functional, compression_codec: bzip, compression_type: block
file_format: rc, dataset: functional, compression_codec: snap, compression_type: block
file_format: avro, dataset: functional, compression_codec: none, compression_type: none
file_format: avro, dataset: functional, compression_codec: def, compression_type: block
file_format: avro, dataset: functional, compression_codec: snap, compression_type: block
file_format: parquet, dataset: functional, compression_codec: none, compression_type: none
file_format: orc, dataset: functional, compression_codec: def, compression_type: block
file_format: hbase, dataset: functional, compression_codec: none, compression_type: none
file_format: kudu, dataset: functional, compression_codec: none, compression_type: none