Commit Graph

23 Commits

Author SHA1 Message Date
Taras Bobrovytsky
e94de02469 Added execution summary, modified benchmark to handle JSON
- Added execution summary to the beeswax client and QueryResult
- Modified report-benchmark-results to handle JSON and perform
  execution summary comparison between runs
- Added comments to the new workload runner

Change-Id: I9c3c5f2fdc5d8d1e70022c4077334bc44e3a2d1d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3598
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
(cherry picked from commit fd0b1406be2511c202e02fa63af94fbbe5e18eee)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3618
2014-07-25 21:06:00 -07:00
ishaan
3bed0be1df Refactor the performance framework and change its execution strategy.
This patch introduces new abstractions and changes the way queries are run via the
workload runner. A new class 'Workload' is introduced, which represents the notion of a
workload in the performance framework (i.e., a set of query names mapped to query
strings).

The new workflow is:
 - run-workload acts as a driver. It accepts user parameters for which queries to
   run and their execution strategy. It generates workload objects and passes them to the
   workload-runner.
 - The workload runner takes a workload, its execution parameters and generates a set of
   test vectors over which the workload is run iteratively.
 - A workload is executed by initializing a QueryExecutor for each query being run in a
   test vector. The workload executor is then responsible for execution and gathering
   results.
 - The execution details of every query being executed are stored and returned to the
   driver (run-workload).
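
The workflow above can be sketched roughly as follows. This is an illustration only, not the code from this patch: apart from the names Workload and QueryExecutor, which the message mentions, every function, field, and parameter here (generate_test_vectors, run_workload, file_format, codec) is a hypothetical stand-in, and the executor returns a placeholder result instead of running a query.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Workload:
    """A set of query names mapped to query strings."""
    name: str
    queries: dict = field(default_factory=dict)


class QueryExecutor:
    """Hypothetical executor: runs one query under one test vector."""

    def __init__(self, query_name, query_string, test_vector):
        self.query_name = query_name
        self.query_string = query_string
        self.test_vector = test_vector

    def execute(self):
        # Placeholder for real execution; returns the details that would be stored.
        return {"query": self.query_name, "vector": self.test_vector, "success": True}


def generate_test_vectors(file_formats, codecs):
    """Build one test vector per combination of the execution parameters."""
    return [{"file_format": f, "codec": c}
            for f, c in itertools.product(file_formats, codecs)]


def run_workload(workload, file_formats, codecs):
    """Run the workload once per test vector and collect per-query results."""
    results = []
    for vector in generate_test_vectors(file_formats, codecs):
        for name, sql in workload.queries.items():
            results.append(QueryExecutor(name, sql, vector).execute())
    return results
```

In this sketch the driver would build a Workload, call run_workload, and receive one result dictionary per query per test vector.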

Change-Id: Ia16360140d65e6733e534e823bc5d5614622ab5f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3616
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
2014-07-25 18:17:11 -07:00
Taras Bobrovytsky
8d6f8ff01c run-workload should exit with a non-zero error code if a query fails and abort_on_error is true
The exception raised by a child thread did not reach the main thread, so the
script exited with 0 instead of 1.
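
A minimal sketch of this failure mode and one way to handle it, assuming a thread-based runner. QueryThread and run_all are hypothetical names, not taken from run-workload. An exception raised inside a thread's run() does not propagate to the thread that calls join(), so the worker records it and the main thread turns it into an exit code.

```python
import threading


class QueryThread(threading.Thread):
    """Hypothetical worker that records an exception instead of losing it."""

    def __init__(self, query_fn):
        super().__init__()
        self._query_fn = query_fn
        self.error = None

    def run(self):
        try:
            self._query_fn()
        except Exception as e:  # stored so the main thread can inspect it
            self.error = e


def run_all(query_fns, abort_on_error=True):
    """Run each callable in its own thread and return a process exit code."""
    threads = [QueryThread(fn) for fn in query_fns]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    failed = any(t.error is not None for t in threads)
    return 1 if (failed and abort_on_error) else 0
```

A driver script could then finish with sys.exit(run_all(...)) so that the failure is visible to the calling shell.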

Change-Id: I09be9dc824386bf25a64af0323cbf78f6d006b91
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3081
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3214
2014-07-15 14:43:10 -07:00
ishaan
c5c58c6bce The workload runner should abort execution if a query fails in a multi-user run.
Currently, we coalesce the results and do not properly catch a failure if one of the
threads has a failed query and exit_on_error is set to True. This patch ensures that we
exit before the next query is run.

Change-Id: Ie650e0f547874386c79c78982ea9916f33e18cda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2654
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-05-27 20:46:21 -07:00
Alex Behm
f4b809dd11 Re-registering resource brokers with Llama if Llama restarts.
All in-flight queries will be blocked until re-registration succeeds
or until a timeout has been reached.

Change-Id: I9c22c9d3a2deff92b227065974109715a1b18595
2014-01-15 15:12:08 -08:00
Alex Behm
c295b5eda8 [CDH5] Fixed JDBC connectivity to Impala and Hive and related Impala tests.
Hive now uses the simple SASL transport because its NOSASL transport is broken
(HIVE-4232). Impala still uses the NOSASL transport. The changes also include more
careful dependency management.

Change-Id: I16633dcef912dce20c8de8cf2f43c45a49460d20
2014-01-15 15:11:47 -08:00
ishaan
7e520f8f23 Make workload runner logging more concise and readable.
This patch makes the workload runner's logging more concise and informative. Specifically,
it:
 - logs the time taken for each iteration of a query.
 - changes the default log level to INFO.
 - makes the output less verbose.

Change-Id: I5f964cf76269fd64ce127b9e4c51fe1deafd1d1b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1076
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:54:35 -08:00
ishaan
f6f8d9d19d Fix query_executor to set user specified Impala query options.
This is currently broken (query options do not get set via run-workload). If any
query options are provided to run-workload, it exits with an error. This patch
re-enables setting query options through run-workload and also moves their validation to
impala_beeswax.

Change-Id: I1df010990f9e57ebd4cf59ada5d9646a883df380
Reviewed-on: http://gerrit.ent.cloudera.com:8080/820
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:49 -08:00
ishaan
565d15579c Add the ability to use a workload as the unit of execution in the Impala benchmark runner.
At the moment, a query is the default unit of execution and parallelism in the Impala
performance suite. With this change, we now have the ability to treat a workload as the
unit of execution. A workload is defined as a unique combination of the dataset, scale
factor, a subset (or all) of the queries in the dataset, and a table format (file format,
compression codec and compression scheme).

It introduces two new command line options in bin/run-workload.py:
  * --execution_scope
    The default scope is 'query', and it maintains previous semantics. The
    new scope is 'workload', which toggles the unit of execution to a workload.
  * --shuffle_query_exec_order
    Shuffles the order in which queries are executed (only applicable when the
    execution_scope is workload); defaults to False.
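
A rough sketch of how these two options might be parsed and applied. The flag names and defaults come from the message above; everything else is an assumption made for illustration: the use of argparse, the treatment of the shuffle flag as a boolean switch, and the order_queries helper are not taken from bin/run-workload.py.

```python
import argparse
import random


def build_parser():
    """Parser exposing only the two options described above."""
    parser = argparse.ArgumentParser(description="Sketch of the execution scope options")
    parser.add_argument("--execution_scope", choices=["query", "workload"],
                        default="query", help="Unit of execution and parallelism")
    parser.add_argument("--shuffle_query_exec_order", action="store_true", default=False,
                        help="Shuffle query order (only applies to the workload scope)")
    return parser


def order_queries(query_names, options, rng=random):
    """Return the execution order implied by the parsed options."""
    names = list(query_names)
    if options.execution_scope == "workload" and options.shuffle_query_exec_order:
        rng.shuffle(names)  # shuffling is ignored for the default 'query' scope
    return names
```

With no arguments the parser yields the 'query' scope and an unshuffled order, matching the previous semantics.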

Change-Id: I790d75f0896210cda8eb999015b0be04246e4c45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/503
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:07 -08:00
ishaan
5ea0bb6d30 Ignore results for failing clients for stress tests with abort_on_error set to False.
Change-Id: I69710da2672dfaa2d377bbaaf5c334963b11ae6f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/332
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:35 -08:00
Alex Leblang
a0ad16af41 Testing framework changes for VTune plugin
This patch contains changes to the general test and plugin framework that were needed to make the VTune plugin run. These changes create a context dictionary that is passed to the plugin.

Change-Id: I12ee2076fb0d777813c56bbb338e6d20426afaff
Reviewed-on: http://gerrit.ent.cloudera.com:8080/111
Reviewed-by: Alex Leblang <alex.leblang@cloudera.com>
Tested-by: Alex Leblang <alex.leblang@cloudera.com>
2014-01-08 10:52:12 -08:00
ishaan
a6cb5f70a4 Introduce the notion of scope to the plugin framework.
Change-Id: I2cf39c38e7e0a359950d9e05e2daed433fc0c38f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/144
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:05 -08:00
Lenni Kuff
3a4dd14654 Update run-workload.py to skip exec_option validation (this is done on the server side) 2014-01-08 10:51:52 -08:00
ishaan
6f9569ea6f Add a plugin framework for run-workload 2014-01-08 10:51:39 -08:00
Lenni Kuff
129c6473b9 Update run-workload to gather, display, and archive runtime profiles for each workload query 2014-01-08 10:48:46 -08:00
Lenni Kuff
4cf7d2634e Update benchmark runner to use mean of all results if num_clients > 1 2014-01-08 10:48:30 -08:00
ishaan
09d6d931f4 Change the way data is loaded 2014-01-08 10:48:09 -08:00
Lenni Kuff
1a2695781d Add support for targeting JDBC via run-workload and add Impala Jdbc Client tool 2014-01-08 10:47:29 -08:00
ishaan
508c357f77 Make the beeswax module try to import saslwrapper first. 2014-01-08 10:47:23 -08:00
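
A sketch of the "try one module first" import pattern this commit describes. The helper name import_first_available is hypothetical, and the fallback module in the usage note is an assumption, since the message names only saslwrapper.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that imports successfully."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: %s"
                      % ", ".join(module_names))
```

A client module might call import_first_available(["saslwrapper", "sasl"]), assuming sasl is the fallback, and use whichever module is returned.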
Lenni Kuff
97380676d8 Fixed case sensitivity issue with exec options in the test Impala Beeswax client code
Also added support for executing run-workload in a mode that continues after query errors
2014-01-08 10:46:50 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means if you load the "core" dataset you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).

You will see that now each combination of table format + query exec options is
treated like an individual test case. This will make it much easier to debug
exactly where something failed.

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
ishaan
c5ddba4296 run-workload + kerberos 2014-01-08 10:46:29 -08:00
Lenni Kuff
b3fce13b1d Initial Impala failure testing library + modularize run-workload
This adds initial changes for the Impala failure testing library. It also refactors
run-workload into its own module so it can be used in other tests.

The failure testing has two main components - the first is an object model on top
of Impala services in a cluster. This allows for enumerating the services in the cluster
and executing commands on remote machines. This initial cut is built on top of the
CM service to help with starting/stopping services. The long term goal is to let this run
on both a CM cluster and non-CM cluster as well as locally.

The other part of the failure injection change is a failure_injector module that uses the
Impala service abstraction to select and inject failures into random impala services.

This failure testing framework hasn't been completely validated because the product code
is not yet ready, but it is important to get this checked in so all new changes to
run-workload are based off this refactor.

Change-Id: I73bf44f0ac881ec17bea7cb05d850b45e2ea5be5
2014-01-08 10:46:16 -08:00