impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 18:02:33 -05:00

Author	SHA1	Message	Date
Nong Li	1f6481382e	Fix parquet test setup.	2014-01-08 10:49:41 -08:00
Lenni Kuff	cba9cd00dd	Fix full data load build break due to constructing incorrect HDFS paths	2014-01-08 10:49:34 -08:00
Lenni Kuff	558d5ce755	Data loading: Exec DDL statements via Impala and don't recreate metadata if it exists	2014-01-08 10:49:28 -08:00
Lenni Kuff	831ee529be	Fixed data loading bugs, moved most tables out of load-dependent-tables	2014-01-08 10:48:56 -08:00
Skye Wanderman-Milne	811d5dd00b	Create Avro schema directory in test warehouse	2014-01-08 10:48:50 -08:00
Nong Li	0df9476be1	Parquet data loading.	2014-01-08 10:48:48 -08:00
Skye Wanderman-Milne	461a48df2b	Refactor testing framework to generate Avro tables.	2014-01-08 10:48:45 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	12d18631e3	Test enhancements: dynamic table format data loading, per-workload exploration stategies	2014-01-08 10:47:07 -08:00
Lenni Kuff	1e25c98fb4	Test data loading framework improvements This change includes a number of improvements for the test data loading framework: * Named sections for schema template definitions * Removal of uneeded sections from schema template definitions (ex. ANALYZE TABLE) * More granular data loading via table name filters * Improved robustness in detecting failed data loads * Table level constraints for specific file formats * Re-written compute stats script	2014-01-08 10:46:49 -08:00
Michael Ubell	37aaf06f79	IMP-390 Get rid of test dependencies on InProcessQE and Runquery	2014-01-08 10:46:18 -08:00
Lenni Kuff	846b5c55be	Disabling running of COMPUTE STATISTICS statements by default during data loading	2014-01-08 10:45:10 -08:00
Lenni Kuff	6e07e0b8d8	Added support for generating ANALYZE TABLE ... COMPUTE STATISTICS statements during data loading Add support for generating ANALYZE TABLE ... COMPUTE STATISTICS statements to the data loading workflow. This allows for capturing simple table stats such as number of rows, number of partitions, and table size in bytes. These are stored into a new mysql database with the same name as the metastore except with a '_Stats' suffix. If using Derby a new database results are stored in a new derby database.	2014-01-08 10:44:34 -08:00
Lenni Kuff	91f51a1b39	Fixed issue with data loading of workloads that have non-word characters in their names Fixed a problem where we were not properly looking up the dataset associated with the given workload if it had non-word characters in its name (a-z & _). Also cut down on the execution time of the hive-benchmark workload under the "core" vector.	2014-01-08 10:44:23 -08:00
Lenni Kuff	04edc8f534	Update benchmark tests to run against generic workload, data loading with scale factor, +more This change updates the run-benchmark script to enable it to target one or more workloads. Now benchmarks can be run like: ./run-benchmark --workloads=hive-benchmark,tpch We lookup the workload in the workloads directory, then read the associated query .test files and start executing them. To ensure the queries are not duplicated between benchmark and query tests, I moved all existing queries (under fe/src/test/resources/* to the workloads directory. You do NOT need to look through all the .test files, I've just moved them. The one new file is the 'hive-benchmark.test' which contains the hive benchmark queries. Also added support for generating schema for different scale factors as well as executing against these scale factors. For example, let's say we have a dataset with a scale factor called "SF1". We would first generate the schema using: ./generate_schema_statements --workload=<workload> --scale_factor="SF3" This will create tables with a unique names from the other scale factors. Run the generated .sql file to load the data. Alternatively, the data can loaded by running a new python script: ./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor] For example: load-data.sh -w tpch -e core -s SF3 Then run against this: ./run-benchmark --workloads=<workload> --scale_factor=SF3 This changeset also includes a few other minor tweaks to some of the test scripts. Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6	2014-01-08 10:44:22 -08:00

15 Commits