impala/testdata/bin at 267f4d67f4f9c8b10af539f8f2e0a2abfa4bafd5

jprdonnelly/impala

mirror of https://github.com/apache/impala.git synced 2025-12-30 03:01:44 -05:00

Joe McDonnell d29fab1ad9 IMPALA-10629: Fix parquet compression codecs for data load scripts

Currently, the dataload scripts don't respect non-standard
compression codecs when loading Parquet data. They always
load Snappy, even when something else is specified, such as
--table_format=parquet/zstd.

This fixes the dataload scripts so that they specify the
compression_codec query option correctly and thus use the
right codec when loading Parquet.

For backwards compatibility, this preserves the behavior
that parquet/none corresponds to the default compression
codec (which is Snappy).
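
The mapping described above can be sketched as follows. This is a minimal illustration, not the code in the dataload scripts: the helper name compression_codec_statement and its parsing of the table format string are assumptions. Only the compression_codec query option and the parquet/none fallback to Snappy come from the commit message.

```python
# Hypothetical helper for illustration; not taken from the Impala dataload scripts.
def compression_codec_statement(table_format):
    """Build a SET statement from a 'file_format/codec' string such as 'parquet/zstd'."""
    parts = table_format.lower().split("/")
    file_format = parts[0]
    # Treat a missing or empty codec component as 'none'.
    codec = parts[1] if len(parts) > 1 and parts[1] else "none"
    # Backwards compatibility: parquet/none keeps the default codec (Snappy).
    if file_format == "parquet" and codec == "none":
        codec = "snappy"
    return "SET COMPRESSION_CODEC={0};".format(codec)


print(compression_codec_statement("parquet/zstd"))
print(compression_codec_statement("parquet/none"))
```

Under these assumptions, parquet/zstd yields a statement selecting zstd, while parquet/none still selects snappy.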

This should make it easier to do performance testing on
various Parquet codecs (like ZSTD).

Testing:
 - Ran bin/load-data.py -w tpch --table_format=parquet/zstd
   and checked the codec in the file with the parquet-reader
   utility

Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Reviewed-on: http://gerrit.cloudera.org:8080/17259
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>

2021-04-08 20:46:37 +00:00

check-hbase-nodes.py

IMPALA-4684: Handle Zookeeper ConnectionLoss exceptions

2016-12-22 01:18:56 +00:00

check-schema-diff.sh

IMPALA-7399: Emit a junit xml report when trapping errors

2018-08-23 18:33:58 +00:00

compute-table-stats.sh

IMPALA-10360: Allow simple limit to be treated as sampling hint

2020-12-10 07:15:36 +00:00

copy-data-sources.sh

IMPALA-10198 (part 1): Unify Java in a single java/ directory

2020-10-15 19:30:13 +00:00

copy-udfs-udas.sh

IMPALA-10198 (part 1): Unify Java in a single java/ directory

2020-10-15 19:30:13 +00:00

create-data-source-table.sql

IMPALA-7368: Add initial support for DATE type

2019-04-23 13:33:57 +00:00

create-hbase.sh

IMPALA-3918: Remove Cloudera copyrights and add ASF license header

2016-08-09 08:19:41 +00:00

create-load-data.sh

IMPALA-9331: Add symptom for dataload failing on schema mismatch

2021-04-01 15:09:10 +00:00

create-mini.sql

IMPALA-4110: Clean up issues found by Apache RAT.

2016-09-14 22:09:24 +00:00

create-table-many-blocks.sh

IMPALA-7399: Emit a junit xml report when trapping errors

2018-08-23 18:33:58 +00:00

create-tpcds-testcase-files.sh

IMPALA-7290: part 1: clean up shell tests

2019-04-30 11:30:45 +00:00

generate-block-ids.sh

Expose $IMPALA_MAVEN_OPTIONS for configuring Maven.

2017-11-14 01:29:56 +00:00

generate-load-nested.sh

IMPALA-10198 (part 1): Unify Java in a single java/ directory

2020-10-15 19:30:13 +00:00

generate-schema-statements.py

IMPALA-10629: Fix parquet compression codecs for data load scripts

2021-04-08 20:46:37 +00:00

generate-test-vectors.py

IMPALA-9709: Remove Impala-lzo from the development environment

2020-06-15 23:42:12 +00:00

kill-all.sh

IMPALA-8099: Update the build scripts to support Apache Ranger

2019-02-15 21:28:05 +00:00

kill-hbase.sh

IMPALA-9165: Add back hard kill to kill-hbase.sh

2019-11-23 02:42:25 +00:00

kill-hive-server.sh

IMPALA-7399: Emit a junit xml report when trapping errors

2018-08-23 18:33:58 +00:00

kill-java-service.sh

IMPALA-7399: Emit a junit xml report when trapping errors

2018-08-23 18:33:58 +00:00

kill-mini-dfs.sh

IMPALA-3918: Remove Cloudera copyrights and add ASF license header

2016-08-09 08:19:41 +00:00

kill-ranger-server.sh

IMPALA-8099: Update the build scripts to support Apache Ranger

2019-02-15 21:28:05 +00:00

kill-sentry-service.sh

IMPALA-7399: Emit a junit xml report when trapping errors

2018-08-23 18:33:58 +00:00

load_nested.py

IMPALA-9709: Remove Impala-lzo from the development environment

2020-06-15 23:42:12 +00:00

load-dependent-tables-hive2.sql

IMPALA-8369 (part 4): Hive 3: fixes for functional dataset loading

2019-05-15 11:00:45 +00:00

load-dependent-tables.sql

IMPALA-8369 (part 4): Hive 3: fixes for functional dataset loading

2019-05-15 11:00:45 +00:00

load-metastore-snapshot.sh

IMPALA-7399: Emit a junit xml report when trapping errors

2018-08-23 18:33:58 +00:00

load-test-warehouse-snapshot.sh

IMPALA-7712: Support Google Cloud Storage

2021-03-13 11:20:08 +00:00

load-tpc-kudu.py

IMPALA-3739: Enable stress tests on Kudu

2016-10-21 11:01:37 +00:00

minikdc_env.sh

IMPALA-9361: manually configured kerberized minicluster

2020-02-08 05:16:12 +00:00

random_avro_schema.py

IMPALA-3786: Replace "cloudera" with "apache" (part 2)

2016-09-29 21:14:13 +00:00

README-BENCHMARK-TEST-GENERATION

Added scripts for generating and running benchmarks across different data sets and file formats

2012-05-08 16:06:45 -07:00

run-all.sh

IMPALA-7712: Support Google Cloud Storage

2021-03-13 11:20:08 +00:00

run-hbase.sh

IMPALA-9361: manually configured kerberized minicluster

2020-02-08 05:16:12 +00:00

run-hive-server.sh

IMPALA-10522: Support external use of frontend libraries

2021-03-12 17:49:08 +00:00

run-mini-dfs.sh

IMPALA-7399: Emit a junit xml report when trapping errors

2018-08-23 18:33:58 +00:00

run-ranger-server.sh

IMPALA-8815: fix ranger startup after set-classpath.sh

2019-10-31 05:01:17 +00:00

run-step.sh

IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-11-02 00:40:19 +00:00

setup-hdfs-env.sh

IMPALA-9055: Impala shouldn't set expiration to NEVER for cache directives.

2019-10-18 08:15:00 +00:00

wait-for-hiveserver2.py

Increase wait times for startup of Hive and its Metastore

2016-11-15 20:35:01 +00:00

wait-for-metastore.py

Increase wait times for startup of Hive and its Metastore

2016-11-15 20:35:01 +00:00