mirror of
https://github.com/apache/impala.git
synced 2025-12-30 03:01:44 -05:00
Currently, the dataload scripts don't respect non-standard compression
codecs when loading Parquet data. They always load Snappy-compressed
data, even when a different codec is specified, for example with
--table_format=parquet/zstd.

This fixes the dataload scripts so that they set the compression_codec
query option correctly and thus use the right codec when loading
Parquet. For backwards compatibility, this preserves the behavior that
parquet/none corresponds to the default compression codec (which is
Snappy).

This should make it easier to do performance testing on various
Parquet codecs (like ZSTD).

Testing:
 - Ran bin/load-data.py -w tpch --table_format=parquet/zstd
   and checked the codec in the file with the parquet-reader utility

Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Reviewed-on: http://gerrit.cloudera.org:8080/17259
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
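
The codec selection described in this change can be sketched roughly as follows. This is only an illustration, not the actual dataload code: the helper name `parquet_codec_statement` and the constant `DEFAULT_PARQUET_CODEC` are hypothetical, and the sketch assumes the codec name in the table format string can be passed to the compression_codec query option unchanged (the real scripts may need to map abbreviated names).

```python
# Hypothetical helper for illustration; not taken from the Impala dataload scripts.
DEFAULT_PARQUET_CODEC = "snappy"  # assumption: default Parquet codec, per the commit message


def parquet_codec_statement(table_format: str) -> str:
    """Build a SET statement for a 'parquet/<codec>' table format string."""
    parts = table_format.split("/")
    file_format = parts[0]
    codec = parts[1] if len(parts) > 1 else "none"
    if file_format != "parquet":
        raise ValueError(f"not a Parquet table format: {table_format!r}")
    # For backwards compatibility, parquet/none keeps the default codec.
    if codec == "none":
        codec = DEFAULT_PARQUET_CODEC
    return f"SET COMPRESSION_CODEC={codec};"


if __name__ == "__main__":
    for fmt in ("parquet/none", "parquet/zstd"):
        print(fmt, "->", parquet_codec_statement(fmt))
```

Emitting a statement like this before the INSERT that populates the Parquet table is one way the requested codec could be applied; where exactly the real scripts set the option is not shown here.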