This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.
As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means that if you load the "core" dataset, you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).
You will see that now each combination of table format + query exec options is
treated as an individual test case. This will make it much easier to debug
exactly where something failed.
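As a rough illustration (not the actual test code), py.test's parametrize
decorator can expand the cross product of table formats and exec options into
individual test cases; the formats, option values, and run_query helper below
are hypothetical stand-ins:

    import itertools
    import pytest

    # Illustrative values only; the real suite derives these from the test vectors.
    TABLE_FORMATS = ['text/none', 'seq/snap', 'rc/gzip']
    EXEC_OPTIONS = [{'num_nodes': 0}, {'num_nodes': 1}]

    def run_query(sql, table_format, exec_options):
        # Hypothetical stand-in for the real query executor.
        return True

    @pytest.mark.parametrize('table_format,exec_options',
                             list(itertools.product(TABLE_FORMATS, EXEC_OPTIONS)))
    def test_core_queries(table_format, exec_options):
        # Each (table format, exec options) pair shows up as its own test case,
        # so a failure report pinpoints the exact combination that broke.
        assert run_query('select count(*) from alltypes', table_format, exec_options)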
These new tests can be run using the script at tests/run-tests.sh
This adds initial changes for the Impala failure testing library. It also refactors
run-workload into its own module so it can be used in other tests.
The failure testing has two main components. The first is an object model on top
of the Impala services in a cluster. This allows for enumerating the services in the cluster
and executing commands on remote machines. This initial cut is built on top of the
CM service to help with starting/stopping services. The long-term goal is to let this run
on both CM and non-CM clusters, as well as locally.
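As a hedged sketch of what that object model might look like (the class and
method names below are assumptions rather than the real API, and the actual
implementation goes through CM instead of raw ssh):

    import subprocess

    class ImpalaService(object):
        def __init__(self, hostname, role):
            self.hostname = hostname
            self.role = role  # e.g. 'impalad' or 'statestored'

        def run_remote_command(self, command):
            # Execute a command on the machine hosting this service.
            return subprocess.check_output(['ssh', self.hostname, command])

    class ImpalaCluster(object):
        def __init__(self, services):
            self.services = services

        def get_services(self, role=None):
            # Enumerate the services in the cluster, optionally filtered by role.
            return [s for s in self.services if role is None or s.role == role]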
The other part of the failure injection change is the failure_injector module, which uses the
Impala service abstraction to select random Impala services and inject failures into them.
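A minimal sketch of that injection loop, building on the hypothetical service
model above (the restart command and method names are illustrative):

    import random
    import time

    class FailureInjector(object):
        def __init__(self, cluster, interval_seconds=60):
            self.cluster = cluster
            self.interval_seconds = interval_seconds

        def inject_random_failure(self):
            # Pick a random impalad and knock it over; in the real framework the
            # start/stop primitives would come from the CM service abstraction.
            target = random.choice(self.cluster.get_services(role='impalad'))
            target.run_remote_command('sudo service impala-server restart')

        def run(self, iterations):
            for _ in range(iterations):
                self.inject_random_failure()
                time.sleep(self.interval_seconds)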
This failure testing framework hasn't been completely validated because the product code
is not yet ready, but it is important to get this checked in so all new changes to
run-workload are based off this refactor.
Change-Id: I73bf44f0ac881ec17bea7cb05d850b45e2ea5be5
Queries now return rows on both our small (query test) data set and the 10TB
data set. This change also fixes a problem with Python not being set properly and
adds support for reporting query results using the geometric mean.
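For reference, the geometric mean can be computed in log space for numerical
stability; this sketch is illustrative rather than the exact reporting code:

    import math

    def geometric_mean(execution_times):
        # Geometric mean of per-query times; unlike the arithmetic mean, a single
        # slow outlier will not dominate the summary (e.g. for [2.0, 8.0] the
        # geometric mean is 4.0 while the arithmetic mean is 5.0).
        if not execution_times:
            raise ValueError('no execution times to summarize')
        log_sum = sum(math.log(t) for t in execution_times)
        return math.exp(log_sum / len(execution_times))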
Change-Id: Ia432148d96645ecda3f63900b3bfbd29c706d886
This change cleans up run-workload to push more query execution logic into query_executor.
It also adds a new feature to run-workload that supports filtering the file format /
compression combinations to run against.
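A hypothetical sketch of that filtering step (the vector fields and the
allowed-set format are assumptions):

    def filter_table_formats(vectors, allowed):
        # Keep only the test vectors whose file_format/compression pair appears
        # in the allowed set, e.g. allowed = {'seq/snap', 'text/none'}.
        return [v for v in vectors
                if '%s/%s' % (v['file_format'], v['compression']) in allowed]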
This change updates run-workload to provide a more generic interface for query
execution. Now the query executor just takes an execution function and a new
QueryExecOptions object that defines the values to use for execution.
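A rough sketch of the shape of that interface; aside from the QueryExecOptions
name, the classes and fields below are assumptions:

    class QueryExecOptions(object):
        # Bundles the values used for a single query execution; illustrative fields.
        def __init__(self, num_iterations=1, exec_options=None):
            self.num_iterations = num_iterations
            self.exec_options = exec_options or {}

    class QueryExecutor(object):
        def __init__(self, exec_function, exec_options):
            # exec_function is any callable of (query, QueryExecOptions) -> result,
            # so Impala, Hive, or a mock backend can be plugged in unchanged.
            self.exec_function = exec_function
            self.exec_options = exec_options

        def execute(self, query):
            return self.exec_function(query, self.exec_options)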
I also made a change to store partial result sets so we can salvage some work if
a run fails.
Now we save Hive results into a separate file (previously everything was stored
in the same file). Also added the ability to run run-benchmark with Impala skipped,
which will help generate Hive reference results.
Updated the reporting script to reflect this change.
This improves the summary reporting for perf results, fixes a problem with how
the short query names were being stored, and adds support for running multiple
workloads at different scale factors.
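One plausible way to express multiple workloads with per-workload scale factors
is a 'name:scale' spec string; this parser is purely illustrative:

    def parse_workload_specs(spec_string):
        # Parse 'tpch:10tb,tpcds' into (name, scale) pairs, leaving the scale
        # factor as None when one is not given.
        workloads = []
        for spec in spec_string.split(','):
            name, _, scale = spec.partition(':')
            workloads.append((name, scale or None))
        return workloads

    # parse_workload_specs('tpch:10tb,tpcds') -> [('tpch', '10tb'), ('tpcds', None)]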
This change adds a -num_clients flag that specifies the number of clients
(threads) to use when executing each query in a workload. This is used to
validate Impala concurrency/stress handling. The logging was getting garbled
with multiple threads, so I also updated this to use the Python logging module.
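A minimal sketch of the per-query client fan-out using the logging module; the
helper names are hypothetical:

    import logging
    import threading

    logging.basicConfig(format='%(asctime)s %(threadName)s %(message)s',
                        level=logging.INFO)
    LOG = logging.getLogger('run-workload')

    def run_query_with_clients(query, num_clients, exec_function):
        # Run the same query from num_clients threads; exec_function stands in
        # for the real executor.
        results = [None] * num_clients

        def client(idx):
            LOG.info('executing: %s', query)
            results[idx] = exec_function(query)

        threads = [threading.Thread(target=client, args=(i,), name='client-%d' % i)
                   for i in range(num_clients)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Mirrors the current behavior described below: only the first client's
        # result is kept.
        return results[0]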
Currently we only capture and save the results of the first thread that
executes. In the future we might want to update this to capture results from all
the threads.