impala/testdata/bin at fcdcf1a9d8449b5b7cec5bcd8f7f6b1fa7dd8a88

jprdonnelly/impala

mirror of https://github.com/apache/impala.git synced 2026-01-04 09:00:56 -05:00


ishaan fcdcf1a9d8 Parallelize data loaded through Impala to speed up data loading.

Currently, we execute all the queries involved in data loading serially. This change
creates a separate .sql file for each file format, compression codec and compression
scheme combination, and executes all the files in parallel. Additionally, we now store all the
.sql files (independent of workload) in $IMPALA_HOME/data_load_files/<dataset_name>. Note
that only data loaded through Impala is parallelized; data loaded through Hive and HBase
remains serial.
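
As a rough illustration of the approach described above (not the actual code in generate-schema-statements.py), the sketch below writes one .sql file per file format, compression codec and compression scheme combination under a per-dataset directory, then runs the files concurrently. The helper names (`write_sql_files`, `run_sql_file`, `load_in_parallel`), the file naming pattern, and the sample statements are all hypothetical; `run_sql_file` is a stand-in that only counts statements, where a real loader would submit them to Impala.

```python
import itertools
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_sql_files(out_dir, dataset, statements_by_combo):
    """Write one .sql file per (file format, codec, scheme) combination.

    Files land in <out_dir>/<dataset>/, loosely mirroring the
    data_load_files/<dataset_name> layout from the commit message.
    The file naming pattern is an assumption made for this sketch.
    """
    dataset_dir = os.path.join(out_dir, dataset)
    os.makedirs(dataset_dir, exist_ok=True)
    paths = []
    for combo, statements in sorted(statements_by_combo.items()):
        file_format, codec, scheme = combo
        name = "load-%s-%s-%s.sql" % (file_format, codec, scheme)
        path = os.path.join(dataset_dir, name)
        with open(path, "w") as f:
            f.write(";\n".join(statements) + ";\n")
        paths.append(path)
    return paths


def run_sql_file(path):
    """Hypothetical stand-in for executing a .sql file against Impala.

    It only counts the statements in the file so the sketch stays
    runnable without a cluster.
    """
    with open(path) as f:
        count = sum(1 for stmt in f.read().split(";") if stmt.strip())
    return path, count


def load_in_parallel(paths, run=run_sql_file, max_workers=4):
    """Run every .sql file concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, paths))


if __name__ == "__main__":
    # Illustrative values only; the real lists come from the test vectors.
    combos = itertools.product(["text", "seq"], ["none", "snap"], ["none", "block"])
    statements = {
        combo: ["CREATE TABLE t_%s_%s_%s (id INT)" % combo,
                "INSERT INTO t_%s_%s_%s SELECT id FROM base" % combo]
        for combo in combos
    }
    with tempfile.TemporaryDirectory() as tmp:
        sql_files = write_sql_files(tmp, "functional", statements)
        for path, count in load_in_parallel(sql_files):
            print(os.path.basename(path), count)
```

Keeping each combination in its own file means a failure can be traced to a single format and codec pair, and the files can be re-run individually. Threads are enough here because each worker would mostly wait on the query engine rather than do local computation.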

On our build machines, the time taken to load all the data from snapshot was on the order
of 15 minutes.

Change-Id: If8a862c43f0e75b506ca05d83eacdc05621cbbf8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/804
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins

2014-01-08 10:53:53 -08:00

cache_tables.py

Updated cache_tables to properly load from mini DFS data node directory

2014-01-08 10:44:14 -08:00

compute-table-stats.sh

Treat HBase as a file format for functional tests

2014-01-08 10:52:36 -08:00

copy-udfs-udas.sh

Use hive-exec instead of hive-builtin because hive-builtin does not exist in CDH5 Hive.

2014-01-08 10:53:33 -08:00

create-hbase.sh

Treat HBase as a file format for functional tests

2014-01-08 10:52:36 -08:00

create-load-data.sh

IMPALA-425: Detect read-only tables and disable INSERT/LOAD operations on these tables

2014-01-08 10:53:37 -08:00

create-mini.sql

Shell scripts to start, load and kill a mini dfs cluster.

2012-01-30 17:32:12 -08:00

generate-block-ids.sh

Add utility to warm buffer cache with all blocks for tables used in benchmark queries.

2012-06-28 10:35:53 -07:00

generate-schema-statements.py

Parallelize data loaded through Impala to speed up data loading.

2014-01-08 10:53:53 -08:00

generate-test-vectors.py

Treat HBase as a file format for functional tests

2014-01-08 10:52:36 -08:00

kill-all.sh

IMP-773: Add better logging/error detection to start-impala-cluster.py

2014-01-08 10:51:25 -08:00

kill-hbase.sh

HBase now runs on pseudo-distributed mode with 4 region servers

2012-03-08 15:07:12 -08:00

kill-hive-server.sh

Use an external Hive Metastore Service for local test runs

2014-01-08 10:53:15 -08:00

kill-mini-dfs.sh

Build changes for CDH4 upgrade

2012-06-22 16:05:03 -07:00

load-dependent-tables.sql

Fix build failure because of hbase data loading.

2014-01-08 10:52:37 -08:00

load-hive-builtins.sh

Add HdfsLzoTextScanner

2014-01-08 10:46:35 -08:00

load-test-warehouse-snapshot.sh

Treat HBase as a file format for functional tests

2014-01-08 10:52:36 -08:00

lzo_indexer.sh

Add HdfsLzoTextScanner

2014-01-08 10:46:35 -08:00

README-BENCHMARK-TEST-GENERATION

Added scripts for generating and running benchmarks across different data sets and file formats

2012-05-08 16:06:45 -07:00

run-all.sh

Test CR: Change spacing in run-all.sh

2014-01-08 10:53:50 -08:00

run-hbase.sh

Various build improvements: make sure that unpack-dependencies happens before preparing test data, wait for master up before RS start.

2012-03-26 18:30:37 -07:00

run-hive-server.sh

Use Hive Metastore Service instead of HiveServer 1 in test infrastructure

2014-01-08 10:53:26 -08:00

run-hive.sh

- added data generator under testdata/src/main/java/com/cloudera/impala/datagenerator

2011-06-27 16:19:25 -07:00

run-mini-dfs.sh

Move minicluster_xml_conf to HADOOP_CONF_DIR.

2014-01-08 10:53:03 -08:00

split-hbase.sh

Add retry loop around split-hbase to fix build breaks

2014-01-08 10:52:39 -08:00

wait-for-hbase-master.py

Various build improvements: make sure that unpack-dependencies happens before preparing test data, wait for master up before RS start.

2012-03-26 18:30:37 -07:00