Currently, the dataload scripts don't respect non-standard
compression codecs when loading Parquet data. They always
load Snappy-compressed data, even when a different codec is
specified, e.g. --table_format=parquet/zstd.
This fixes the dataload scripts so that they specify the
compression_codec query option correctly and thus use the
right codec when loading Parquet.
For backwards compatibility, this preserves the behavior
that parquet/none corresponds to the default compression
codec (which is Snappy).
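
A minimal sketch of the mapping described above, for
illustration only. The helper name parquet_codec_option and
the constant DEFAULT_PARQUET_CODEC are hypothetical and are
not taken from the dataload scripts. The sketch assumes the
codec is the second "/"-separated field of the table format.

```python
# Illustrative only: hypothetical names, not the actual dataload code.
DEFAULT_PARQUET_CODEC = "snappy"


def parquet_codec_option(table_format):
    """Build a SET statement for a table format such as 'parquet/zstd'."""
    parts = table_format.split("/")
    if parts[0] != "parquet":
        raise ValueError("not a Parquet table format: " + table_format)
    # Assumption: the codec is the second field; a missing field means 'none'.
    codec = parts[1].lower() if len(parts) > 1 else "none"
    # Backwards compatibility: 'none' keeps the default codec (Snappy).
    if codec in ("", "none"):
        codec = DEFAULT_PARQUET_CODEC
    return "SET COMPRESSION_CODEC={0};".format(codec)
```

With this sketch, "parquet/zstd" selects zstd, while
"parquet/none" still selects snappy.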
This should make it easier to do performance testing on
various Parquet codecs (like ZSTD).
Testing:
- Ran bin/load-data.py -w tpch --table_format=parquet/zstd
  and checked the codec in the file with the parquet-reader
  utility
Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Reviewed-on: http://gerrit.cloudera.org:8080/17259
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>