jprdonnelly/impala

mirror of https://github.com/apache/impala.git synced 2026-01-03 15:00:52 -05:00

Commit Graph

Author: Lenni Kuff

SHA1: bd2c63b69b

Added scripts for generating and running benchmarks across different data sets and file formats

The scripts consist of a few parts:

* generate_test_vectors.rb - This is a Ruby script (I will convert it to Python in a later check-in) that reads a dimension file (in this case benchmark_dimensions.yaml) describing the dimensions we want to explore. Currently the dimensions are data set, file format, and compression algorithm. The script writes a list of "test vectors" to a file, using both a fully exhaustive and a pairwise exploration strategy. The goal of the pairwise strategy is a reduced set of test vectors that still provides coverage but doesn't take as long to run as the exhaustive set. Note that I am checking in the vector outputs, so this script only needs to be run if we want to generate a new set of vectors. I am also checking in a vector file called benchmark_core.vector, which just describes the current benchmark behavior (no compression, text file). This allows running benchmarks with the coverage that exists today.

* testdata/bin/generate_benchmark_statements.rb - This script reads the vector output and generates the statements required to create the benchmark schema and load the data into the benchmark tables with the proper compression/file format settings. It outputs the SQL files "create-benchmark-*.sql" and "load-benchmark-*.sql".

* Updated load-benchmark-data.sh to take a parameter for which set of data to load (pairwise or exhaustive). If no parameters are passed, it just loads the "core" data, which is what it does now.

* Updated run_benchmark.py so that you can specify what type of run you want to do: core, exhaustive, or pairwise. It reads the corresponding vector file and generates the proper table names so that the queries select the proper data. Also added the ability to specify a results file to compare against.
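
To illustrate the difference between the exhaustive and pairwise strategies mentioned for generate_test_vectors.rb, here is a minimal Python sketch. It is not the checked-in Ruby script: the dimension values below are hypothetical stand-ins for whatever benchmark_dimensions.yaml contains, and the greedy selection is just one common way to build a pairwise set.

```python
from itertools import combinations, product

# Hypothetical dimension values for illustration only; the real values live
# in benchmark_dimensions.yaml and are not shown in this commit message.
DIMENSIONS = {
    "data_set": ["small", "large"],
    "file_format": ["text", "seq", "rc"],
    "compression": ["none", "gzip", "snappy"],
}


def exhaustive_vectors(dimensions):
    """Return every combination of dimension values as a list of dicts."""
    names = list(dimensions)
    return [dict(zip(names, values)) for values in product(*dimensions.values())]


def pairs_of(vector):
    """Return the set of (dimension, value) pairs that a vector covers."""
    return set(combinations(sorted(vector.items()), 2))


def pairwise_vectors(dimensions):
    """Greedily pick vectors until every pair of values appears at least once."""
    candidates = exhaustive_vectors(dimensions)
    uncovered = set().union(*(pairs_of(v) for v in candidates))
    chosen = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda v: len(pairs_of(v) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


if __name__ == "__main__":
    print(len(exhaustive_vectors(DIMENSIONS)), "exhaustive vectors")
    print(len(pairwise_vectors(DIMENSIONS)), "pairwise vectors")
```

The pairwise set is smaller because it only guarantees that every pair of values from two different dimensions appears together in some vector, rather than every full combination.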

Overall notes: By default, the current behavior and coverage are maintained when running these scripts without any parameters. Everything needed is checked in, so the Ruby scripts don't need to be run unless we want to add dimensions at a later time.
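
The "no parameters means core" behavior could be expressed roughly as below. This is a sketch under assumptions, not the actual run_benchmark.py: the flag names `--exploration_strategy` and `--compare_with` are hypothetical, and only benchmark_core.vector is named in this commit message (the other two file names are assumed by analogy).

```python
import argparse

# benchmark_core.vector is named in the commit message; the pairwise and
# exhaustive file names are assumptions made for this example.
VECTOR_FILES = {
    "core": "benchmark_core.vector",
    "pairwise": "benchmark_pairwise.vector",
    "exhaustive": "benchmark_exhaustive.vector",
}


def build_parser():
    """Build a parser whose defaults preserve the existing 'core' behavior."""
    parser = argparse.ArgumentParser(description="Benchmark runner sketch")
    # Hypothetical flag: selects which vector file drives the run.
    parser.add_argument(
        "--exploration_strategy",
        choices=sorted(VECTOR_FILES),
        default="core",
    )
    # Hypothetical flag: optional results file to compare against.
    parser.add_argument("--compare_with", default=None)
    return parser


def vector_file_for(argv):
    """Return the vector file selected by the given command-line arguments."""
    args = build_parser().parse_args(argv)
    return VECTOR_FILES[args.exploration_strategy]
```

Calling `vector_file_for([])` selects the core vector file, so a run without arguments keeps today's coverage, while passing the hypothetical `--exploration_strategy pairwise` opts in to the larger set.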

2012-05-08 16:06:45 -07:00

1 Commits