impala/testdata at 267f4d67f4f9c8b10af539f8f2e0a2abfa4bafd5

jprdonnelly/impala

mirror of https://github.com/apache/impala.git synced 2025-12-30 03:01:44 -05:00

Joe McDonnell d29fab1ad9 IMPALA-10629: Fix parquet compression codecs for data load scripts

Currently, the dataload scripts don't respect non-standard
compression codecs when loading Parquet data. They always
load Snappy-compressed data, even when a different codec is
specified, such as --table_format=parquet/zstd.

This fixes the dataload scripts so that they specify the
compression_codec query option correctly and thus use the
right codec when loading Parquet.

For backwards compatibility, this preserves the behavior
that parquet/none corresponds to the default compression
codec (which is Snappy).
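
The mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code from the Impala dataload scripts: the helper names `parquet_codec_for` and `set_codec_statement` are hypothetical, the codec name is assumed to pass through verbatim, and treating a missing codec the same as `none` is also an assumption.

```python
# Hypothetical sketch; names and structure are assumptions, not Impala's code.
DEFAULT_PARQUET_CODEC = "snappy"  # the commit states the default is Snappy


def parquet_codec_for(table_format):
    """Return the compression_codec value for a 'parquet/<codec>' string."""
    file_format, _, codec = table_format.partition("/")
    if file_format != "parquet":
        raise ValueError("not a Parquet table format: %r" % table_format)
    # Backwards compatibility: parquet/none keeps the default codec.
    # Assumption: a missing codec is handled the same way as 'none'.
    if codec in ("", "none"):
        return DEFAULT_PARQUET_CODEC
    return codec


def set_codec_statement(table_format):
    """Build the SET statement for the compression_codec query option."""
    return "SET compression_codec={0};".format(parquet_codec_for(table_format))


if __name__ == "__main__":
    for fmt in ("parquet/none", "parquet/zstd"):
        print(fmt, "->", set_codec_statement(fmt))
```

With this sketch, `parquet/zstd` yields a statement that selects `zstd`, while `parquet/none` still selects `snappy`.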

This should make it easier to do performance testing on
various Parquet codecs (like ZSTD).

Testing:
 - Ran bin/load-data.py -w tpch --table_format=parquet/zstd
   and checked the codec in the file with the parquet-reader
   utility

Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Reviewed-on: http://gerrit.cloudera.org:8080/17259
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>

2021-04-08 20:46:37 +00:00

…

AllTypesErrorNoNulls

…

IMPALA-10496: SAML implementation in Impala

2021-02-17 22:52:05 +00:00

…

avro_schema_resolution

IMPALA-9068: Use different directories for external vs managed warehouse

2020-01-24 17:29:15 +00:00

IMPALA-8198: DATE: Read from avro.

2019-09-27 17:18:35 +00:00

bad_parquet_data

IMPALA-3745: parquet invalid data handling

2016-06-15 21:33:39 -07:00

…

…

IMPALA-10629: Fix parquet compression codecs for data load scripts

2021-04-08 20:46:37 +00:00

IMPALA-7712: Support Google Cloud Storage

2021-03-13 11:20:08 +00:00

Convert dataload hdfs copy commands to LOAD DATA statements

2020-02-24 21:22:18 +00:00

ComplexTypesTbl

IMPALA-6503: Support reading complex types from ORC

2019-03-08 04:39:08 +00:00

compressed_formats

IMPALA-1619: Support 64-bit allocations.

2016-07-08 15:42:09 -07:00

CustomerMultiBlock

IMPALA-4993: extend dictionary filtering to collections

2018-01-19 20:37:25 +00:00

IMPALA-10467: Implement ds_theta_union() function

2021-02-19 13:32:09 +00:00

IMPALA-10547: Restore TPC-DS "reason" table missing from Kudu schema

2021-02-26 23:44:32 +00:00

…

…

impala-profiles

IMPALA-9382: part 3/3 clean up runtime profile v2 text output

2021-02-11 23:34:47 +00:00

ImpalaDemoDataset

…

…

…

LineItemMultiBlock

IMPALA-5717: Support for reading ORC data files

2018-04-11 05:13:02 +00:00

IMPALA-10196: Remove LlvmCodeGen::CastPtrToLlvmPtr

2020-09-29 20:55:36 +00:00

max_nesting_depth

…

multi_compression_parquet_data

IMPALA-5448: fix invalid number of splits reported in Parquet scan node

2017-10-10 01:30:33 +00:00

IMPALA-8095: Detailed expression cardinality tests

2019-02-09 02:56:52 +00:00

…

parquet_nested_types_encodings

IMPALA-4725: Query option to control Parquet array resolution.

2017-03-09 05:07:44 +00:00

parquet_schema_resolution

IMPALA-3786: Replace "cloudera" with "apache" (part 2)

2016-09-29 21:14:13 +00:00

TblWithRaggedColumns

…

…

…

tinytable_seq_snap

…

IMPALA-8043: Fix BE test failures related to SystemV timezones.

2019-01-15 17:04:55 +00:00

IMPALA-5050: Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS from Parquet

2018-11-14 20:16:14 +00:00

IMPALA-4171: Remove JAR from repo.

2016-09-22 02:00:50 +00:00

UnsupportedTypes

IMPALA-3812: Fix error message for unsupported types

2016-11-17 05:31:34 +00:00

IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-04-02 21:50:17 +00:00

__init__.py

…

.gitignore

Update .gitignore

2018-10-26 22:19:35 +00:00