15 Commits

Author SHA1 Message Date
Nong Li
1f6481382e Fix parquet test setup. 2014-01-08 10:49:41 -08:00
Lenni Kuff
cba9cd00dd Fix full data load build break due to constructing incorrect HDFS paths 2014-01-08 10:49:34 -08:00
Lenni Kuff
558d5ce755 Data loading: Exec DDL statements via Impala and don't recreate metadata if it exists 2014-01-08 10:49:28 -08:00
Lenni Kuff
831ee529be Fixed data loading bugs, moved most tables out of load-dependent-tables 2014-01-08 10:48:56 -08:00
Skye Wanderman-Milne
811d5dd00b Create Avro schema directory in test warehouse 2014-01-08 10:48:50 -08:00
Nong Li
0df9476be1 Parquet data loading. 2014-01-08 10:48:48 -08:00
Skye Wanderman-Milne
461a48df2b Refactor testing framework to generate Avro tables. 2014-01-08 10:48:45 -08:00
ishaan
09d6d931f4 Change the way data is loaded 2014-01-08 10:48:09 -08:00
Lenni Kuff
12d18631e3 Test enhancements: dynamic table format data loading, per-workload exploration strategies 2014-01-08 10:47:07 -08:00
Lenni Kuff
1e25c98fb4 Test data loading framework improvements
This change includes a number of improvements for the test data loading framework:
* Named sections for schema template definitions
* Removal of unneeded sections from schema template definitions (e.g. ANALYZE TABLE)
* More granular data loading via table name filters
* Improved robustness in detecting failed data loads
* Table level constraints for specific file formats
* Re-written compute stats script
2014-01-08 10:46:49 -08:00
Michael Ubell
37aaf06f79 IMP-390 Get rid of test dependencies on InProcessQE and Runquery 2014-01-08 10:46:18 -08:00
Lenni Kuff
846b5c55be Disabling running of COMPUTE STATISTICS statements by default during data loading 2014-01-08 10:45:10 -08:00
Lenni Kuff
6e07e0b8d8 Added support for generating ANALYZE TABLE ... COMPUTE STATISTICS statements during data loading
Add support for generating ANALYZE TABLE ... COMPUTE STATISTICS statements to the data loading
workflow. This allows for capturing simple table stats such as number of rows, number of
partitions, and table size in bytes. These are stored in a new MySQL database with the same
name as the metastore except with a '_Stats' suffix. If using Derby, the results are
stored in a new Derby database.
2014-01-08 10:44:34 -08:00
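
The stats-collection step described in commit 6e07e0b8d8 can be sketched roughly as follows. This is a minimal illustration, not the commit's actual code: the function names `stats_db_name` and `analyze_statements`, the metastore name, and the table names are all hypothetical, and the sketch covers only unpartitioned tables (partitioned tables may also need a PARTITION clause).

```python
def stats_db_name(metastore_db):
    """Name of the stats database: the metastore name plus a '_Stats' suffix."""
    return metastore_db + "_Stats"


def analyze_statements(table_names):
    """Build one ANALYZE TABLE ... COMPUTE STATISTICS statement per table."""
    return ["ANALYZE TABLE %s COMPUTE STATISTICS;" % name for name in table_names]


if __name__ == "__main__":
    # Hypothetical metastore and table names, used only for illustration.
    print(stats_db_name("hive_metastore"))
    for stmt in analyze_statements(["alltypes", "lineitem"]):
        print(stmt)
```
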
Lenni Kuff
91f51a1b39 Fixed issue with data loading of workloads that have non-word characters in their names
Fixed a problem where we were not properly looking up the dataset associated
with the given workload if it had non-word characters in its name (anything other than a-z and _). Also cut down
on the execution time of the hive-benchmark workload under the "core" vector.
2014-01-08 10:44:23 -08:00
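
The kind of lookup problem fixed in commit 91f51a1b39 can be illustrated with a short sketch. The commit does not show how the lookup was implemented or repaired, so everything here is an assumption: the `find_dataset` function, the "<workload>: <dataset>" line format, and the sample entries are hypothetical. The point is only that a pattern limited to `\w` cannot match a name such as hive-benchmark, while escaping the name and matching it literally can.

```python
import re


def find_dataset(workload, mapping_lines):
    """Return the dataset mapped to `workload`, or None if there is no entry.

    Assumes a hypothetical "<workload>: <dataset>" line format.
    """
    # re.escape keeps characters such as '-' literal in the pattern.
    pattern = re.compile(r"^%s\s*:\s*(\S+)\s*$" % re.escape(workload))
    for line in mapping_lines:
        match = pattern.match(line.strip())
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # Made-up mapping entries, used only for illustration.
    lines = ["tpch: tpch", "example-workload: example_dataset"]
    # A pattern built from \w+ alone does not match a hyphenated name.
    print(re.fullmatch(r"\w+", "hive-benchmark"))
    print(find_dataset("example-workload", lines))
```
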
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We look up the workload in the workloads directory, then read the associated
query .test files and start executing them.
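
A rough sketch of the lookup described above, assuming each workload is a subdirectory of the workloads directory and its query files end in .test. The function names `parse_workloads` and `find_test_files` and the directory layout are assumptions made for illustration; the commit itself does not show how run-benchmark implements this.

```python
from pathlib import Path


def parse_workloads(arg):
    """Split a comma-separated --workloads value into a list of names."""
    return [name.strip() for name in arg.split(",") if name.strip()]


def find_test_files(workloads_dir, workload):
    """Return the sorted .test files found under <workloads_dir>/<workload>."""
    root = Path(workloads_dir) / workload
    if not root.is_dir():
        raise ValueError("unknown workload: %s" % workload)
    # rglob searches the workload directory and all of its subdirectories.
    return sorted(root.rglob("*.test"))


if __name__ == "__main__":
    # "workloads" is an assumed directory name relative to the current directory.
    for name in parse_workloads("hive-benchmark,tpch"):
        try:
            for path in find_test_files("workloads", name):
                print(name, path)
        except ValueError as err:
            print(err)
```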

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files; I've just moved
them. The one new file is the 'hive-benchmark.test' which contains the hive
benchmark queries.

Also added support for generating schema for different scale factors as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables whose names are distinct from those of the other scale factors.

Run the generated .sql file to load the data. Alternatively, the data can be loaded
by running a new Python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: ./bin/load-data.py -w tpch -e core -s SF3

Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00