The scripts consist of a few parts:

* generate_test_vectors.rb - This is a Ruby script (I will convert it to Python in a later checkin) that reads in a dimension file (in this case benchmark_dimensions.yaml) describing the dimensions we want to explore. Currently the dimensions are: data set, file format, and compression algorithm. The script writes a list of "test vectors" to a file. It explores the input using both a fully exhaustive and a pairwise exploration strategy. The goal of the pairwise strategy is a reduced set of test vectors that still provides coverage but doesn't take as long to run as the exhaustive set. Note that I am checking in the vector outputs, so this script only needs to be run if we want to generate a new set of vectors. I am also checking in a vector file called benchmark_core.vector, which just describes the current benchmark behavior (no compression, text file). This allows running benchmarks with the coverage that exists today.

* testdata/bin/generate_benchmark_statements.rb - This script reads in the vector output and generates the statements required to create the benchmark schema and load the data into the benchmark tables with the proper compression/file format settings. It outputs the "sql" files "create-benchmark-*.sql" and "load-benchmark-*.sql".

* Updated load-benchmark-data.sh to take a parameter for which set of data to load (pairwise or exhaustive). If no parameter is passed, it just loads the "core" data, which is what it does now.

* Updated run_benchmark.py so that you can specify what type of run you want to do: core, exhaustive, or pairwise. It reads in the corresponding vector file and generates the proper table names so the queries select the proper data. Also added the ability to specify a results file to compare against.

Overall notes: By default, the current behavior and coverage are maintained when running these scripts without any parameters. Everything needed is checked in, so the Ruby scripts don't need to be run unless we want to add dimensions at a later time.
To run the benchmark tests, you first need to load the benchmark data. This can be done
via the 'load-benchmark-data.sh' script. There are currently 3 different classes of
benchmark tests and data sets:

core - All data sets, all use text file format, no compression.

exhaustive - All combinations of data sets, file formats, and compression.

pairwise - A subset of the exhaustive test cases that provides good test
coverage but takes less time (and fewer machine resources) to run than the
exhaustive tests.

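To illustrate the difference between the 'exhaustive' and 'pairwise' classes, here
is a minimal Python sketch. It is not the logic of generate_test_vectors.rb: the
dimension values below are hypothetical placeholders, and the greedy selection is
just one common way to build a set in which every pair of values from two different
dimensions appears in at least one test vector.

```python
from itertools import combinations, product

# Hypothetical dimension values, for illustration only. The real values
# live in the dimension file and are not reproduced here.
DIMENSIONS = {
    "data_set": ["dataset_a", "dataset_b"],
    "file_format": ["text", "format_b", "format_c"],
    "compression": ["none", "codec_a", "codec_b"],
}


def exhaustive_vectors(dimensions):
    """Return every combination of dimension values as a list of dicts."""
    names = list(dimensions)
    return [dict(zip(names, values)) for values in product(*dimensions.values())]


def pairs_of(vector):
    """Return the set of cross-dimension value pairs that one vector covers."""
    return set(combinations(sorted(vector.items()), 2))


def pairwise_vectors(dimensions):
    """Greedily pick vectors until every cross-dimension pair is covered."""
    candidates = exhaustive_vectors(dimensions)
    uncovered = set()
    for vector in candidates:
        uncovered |= pairs_of(vector)

    chosen = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda v: len(pairs_of(v) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


if __name__ == "__main__":
    print("exhaustive:", len(exhaustive_vectors(DIMENSIONS)), "vectors")
    print("pairwise:  ", len(pairwise_vectors(DIMENSIONS)), "vectors")
```

The pairwise set is never larger than the exhaustive one, and the gap widens as
more dimensions or values are added.
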
Load the data matching the test scenario you want to run by passing the
'core', 'exhaustive', or 'pairwise' parameter to load-benchmark-data.sh. The
default is to load only 'core'.

Once the data is loaded, run the benchmark suite using the 'run_benchmark.py'
script. This script takes a parameter '--exploration_strategy' that can be
set to either 'core', 'exhaustive', or 'pairwise' depending on which test
cases should be run. The default is to run the 'core' test cases.
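
The two steps can be sketched as a small Python helper that builds the command
lines for a chosen strategy. This is only an illustration: the './' paths, the
positional form of the load-benchmark-data.sh argument, and the
'--exploration_strategy=VALUE' spelling are assumptions, and the helper name
benchmark_commands is hypothetical. Adjust them to match the actual scripts.

```python
STRATEGIES = ("core", "exhaustive", "pairwise")


def benchmark_commands(strategy="core"):
    """Build the load and run command lines for one exploration strategy.

    The script locations and argument forms are assumptions made for this
    sketch; only the script names and the three strategy values come from
    the text above.
    """
    if strategy not in STRATEGIES:
        raise ValueError("unknown strategy: %r" % (strategy,))
    return [
        # Step 1: load the data matching the chosen strategy.
        ["./load-benchmark-data.sh", strategy],
        # Step 2: run the benchmark suite with the same strategy.
        ["./run_benchmark.py", "--exploration_strategy=" + strategy],
    ]


if __name__ == "__main__":
    for command in benchmark_commands("pairwise"):
        print(" ".join(command))
```

Using the same strategy value for both steps keeps the loaded data in line with
the test cases that are run.
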