impala/bin/README-RUNNING-BENCHMARKS
Lenni Kuff bd2c63b69b Added scripts for generating and running benchmarks across different data sets and file formats
The scripts consist of a few parts:

* generate_test_vectors.rb - This is a Ruby script (I will convert it to Python in a later check-in) that reads in a dimension file (in this case benchmark_dimensions.yaml) describing the different dimensions we want to explore. Currently the dimensions are data set, file format, and compression algorithm. The script writes a list of "test vectors" to a file. It supports two exploration strategies: fully exhaustive and pairwise. The goal of the pairwise strategy is a reduced set of test vectors that still provides coverage but doesn't take as long to run as the exhaustive set. Note that I am checking in the vector outputs, so this script only needs to be run if we want to generate a new set of vectors. I am also checking in a vector file called benchmark_core.vector, which describes the current benchmark behavior (text file format, no compression). This allows running benchmarks with the coverage that exists today.

* testdata/bin/generate_benchmark_statements.rb - This script reads in the vector output and generates the statements required to create the benchmark schema and load the data into the benchmark tables with the proper compression/file format settings. It outputs SQL files named "create-benchmark-*.sql" and "load-benchmark-*.sql".

* Updated load-benchmark-data.sh to take a parameter specifying which set of data to load (pairwise or exhaustive). If no parameter is passed, it just loads the "core" data, which is what it does today.

* Updated run_benchmark.py so that you can specify which type of run to do: core, exhaustive, or pairwise. It reads the corresponding vector file and generates the table names the queries need in order to select the right data. Also added the ability to specify a results file to compare against.

Overall notes: By default, the current behavior and coverage are maintained when these scripts are run without any parameters. Everything needed is checked in, so the Ruby scripts don't need to be run unless we want to add dimensions later.
2012-05-08 16:06:45 -07:00


To run the benchmark tests, you first need to load the benchmark data. This can be
done with the 'load-benchmark-data.sh' script. There are currently three classes of
benchmark tests and data sets:

  core - All data sets, using the text file format with no compression.
  exhaustive - All combinations of data sets, file formats, and compression.
  pairwise - A subset of the exhaustive test cases that still provides good test
    coverage but takes less time (and fewer machine resources) to run than the
    exhaustive tests.

Load the data matching the test scenario you want to run by passing 'core',
'exhaustive', or 'pairwise' as the parameter to load-benchmark-data.sh. The default
is to load only 'core'.
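
Whichever class is chosen, loading it means creating and populating one table per
test vector with that vector's file format and compression settings. The Python
sketch below shows roughly what the statements for a single vector could look like in
Hive syntax. It is an assumption, not the SQL that generate_benchmark_statements.rb
actually emits: the table naming scheme, the source table, and the format and codec
mappings are all hypothetical.

```python
# Hypothetical mapping from a vector's file_format value to a Hive storage clause.
STORAGE = {"text": "TEXTFILE", "seq": "SEQUENCEFILE", "rc": "RCFILE"}

# Hypothetical mapping from a compression value to a Hadoop codec class.
CODECS = {
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
    "bzip": "org.apache.hadoop.io.compress.BZip2Codec",
}


def load_statements(source_table, columns, vector):
    """Return Hive statements that create one benchmark table for `vector` and
    fill it from an already loaded plain text table `source_table`."""
    fmt, comp = vector["file_format"], vector["compression"]
    # Hypothetical naming scheme: <data set>_<file format>_<compression>.
    target = f"{vector['data_set']}_{fmt}_{comp}"
    stmts = [f"CREATE TABLE IF NOT EXISTS {target} ({columns}) STORED AS {STORAGE[fmt]};"]
    if comp == "none":
        stmts.append("SET hive.exec.compress.output=false;")
    else:
        stmts.append("SET hive.exec.compress.output=true;")
        stmts.append(f"SET mapred.output.compression.codec={CODECS[comp]};")
    # Rewriting the rows through Hive applies the storage format and codec above.
    stmts.append(f"INSERT OVERWRITE TABLE {target} SELECT * FROM {source_table};")
    return stmts


vector = {"data_set": "dataset_a", "file_format": "seq", "compression": "snappy"}
for stmt in load_statements("dataset_a_text", "id INT, line STRING", vector):
    print(stmt)
```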

Once the data is loaded, run the benchmark suite using the 'run_benchmark.py'
script. This script takes an '--exploration_strategy' parameter that can be set to
'core', 'exhaustive', or 'pairwise' depending on which test cases should be run.
The default is to run the 'core' test cases.
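
The two steps can be wrapped in a small Python driver so that the load and the run
always use the same class. This is a hypothetical helper, not part of the
repository. It assumes both scripts are executable and invoked from the current
working directory, and it only passes the '--exploration_strategy' option described
above (other run_benchmark.py options, such as a results file to compare against,
are left out).

```python
import subprocess

STRATEGIES = ("core", "exhaustive", "pairwise")


def benchmark_commands(strategy="core"):
    """Build the load and run commands for one benchmark class. Both steps use the
    same strategy; mixing them would likely query tables that were never loaded."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy {strategy!r}; expected one of {STRATEGIES}")
    return [
        ["./load-benchmark-data.sh", strategy],
        ["./run_benchmark.py", "--exploration_strategy", strategy],
    ]


def run_benchmarks(strategy="core", dry_run=False):
    """Load the data, then run the matching test cases, stopping on the first
    failing command. With dry_run=True the commands are only printed."""
    commands = benchmark_commands(strategy)
    for cmd in commands:
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Calling run_benchmarks("pairwise", dry_run=True) prints the two commands without
executing them; dropping dry_run runs them in order.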