7 Commits

Author SHA1 Message Date
Casey Ching
074e5b4349 Remove hashbang from non-script python files
Many python files had a hashbang and the executable bit set though
they were not intended to be run as standalone scripts. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.

Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2015-08-04 05:26:07 +00:00
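
The audit this commit describes can be sketched roughly as follows. This is a minimal illustration, not the tooling actually used for the change: the helper names (`has_hashbang`, `is_executable`, `find_script_like_files`) are hypothetical, and it assumes "python files" means files ending in `.py`.

```python
import os
import stat


def has_hashbang(path):
    """Return True if the file starts with the two bytes '#!'."""
    with open(path, "rb") as f:
        return f.read(2) == b"#!"


def is_executable(path):
    """Return True if any execute bit (user, group, or other) is set."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


def find_script_like_files(root):
    """Yield .py files under root that carry a hashbang or an execute bit.

    Each hit is a candidate: either a real script, or a module that
    should have its hashbang and execute bit removed.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            if has_hashbang(path) or is_executable(path):
                yield path
```

Running `list(find_script_like_files("."))` from a checkout would list the files to review by hand; deciding which of them are genuine scripts still requires reading them.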
Taras Bobrovytsky
29a7368940 Modified perf_result_datastore to use Impala instead of MySQL
Change-Id: I441a51bc7e03d1bfe2283e77c16cba9394034258
Reviewed-on: http://gerrit.cloudera.org:8080/325
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
2015-04-09 20:25:28 +00:00
Taras Bobrovytsky
fd1a469878 Significant improvements to benchmark report
- Added % change to performance regressions/improvements table
- Automatic extraction of Impala version from runtime profiles
- Execution summary row will not be printed if max time is < 100ms or < 2% of the overall runtime
- Failed queries are ignored
- First result is discarded for each query
- Geometric mean was added to summary
- Improved handling of multiple workloads in a single JSON file
- Improved handling of the case when queries are different in results and reference results
- Works well for single client runs. Additional work is needed to handle multiple client runs well.

Change-Id: Ice7b9cc4fd7502a448d35ace10fbcef183df1769
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4210
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c722f6b0a104df54b550978cd222a9af4d39b929)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5250
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
2014-11-13 18:54:08 -08:00
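
A few of the calculations listed in this commit message can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical and not taken from the report script, times are assumed to be in milliseconds, and the sign convention for the % change (positive means slower than the reference) is a guess.

```python
import math


def drop_first_result(times):
    """Discard the first measurement for a query, keeping the rest."""
    return times[1:]


def geometric_mean(values):
    """Geometric mean computed through logarithms; values must be > 0."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def percent_change(new, reference):
    """Relative change of new against reference, in percent.

    Assumed convention: a positive value means the new run took longer.
    """
    return (new - reference) / reference * 100.0


def is_summary_row_printed(max_time_ms, total_runtime_ms):
    """Apply the rule from the commit message: a row is skipped when its
    max time is under 100 ms or under 2% of the overall runtime."""
    return max_time_ms >= 100 and max_time_ms >= 0.02 * total_runtime_ms
```

The geometric mean is a common choice for summarizing benchmark times because a single very slow query influences it less than it would an arithmetic mean.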
ishaan
565d15579c Add the ability to use a workload as the unit of execution in the Impala benchmark runner.
At the moment, a query is the default unit of execution and parallelism in the Impala
performance suite. With this change, we now have the ability to treat a workload as the
unit of execution. A workload is defined as a unique combination of the dataset, scale
factor, a subset (or all) of the queries in the dataset, and a table format (file format,
compression codec and compression scheme).

It introduces two new command line options in bin/run-workload.py:
  * --execution_scope
    The default scope is 'query', and it maintains previous semantics. The
    new scope is 'workload', which toggles the unit of execution to a workload.
  * --shuffle_query_exec_order
    Shuffles the order in which queries are executed (only applicable when the
    execution_scope is 'workload'), defaults to False.

Change-Id: I790d75f0896210cda8eb999015b0be04246e4c45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/503
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:07 -08:00
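
The difference between the two scopes can be sketched as follows. Only the two option names and their defaults come from the commit message; the parser setup, the `plan_units` helper, and the dict-of-lists workload representation are hypothetical and do not reflect how bin/run-workload.py is actually written.

```python
import argparse
import random


def build_parser():
    """Hypothetical parser exposing the two options named in the commit."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--execution_scope", choices=["query", "workload"],
                        default="query")
    parser.add_argument("--shuffle_query_exec_order", action="store_true",
                        default=False)
    return parser


def plan_units(workloads, scope, shuffle=False, rng=None):
    """Group (workload, query) pairs into units of execution.

    workloads maps a workload name to its list of query names.
    With scope 'query' every query is its own unit; with scope 'workload'
    all queries of one workload form a single unit, optionally shuffled.
    """
    rng = rng or random.Random()
    units = []
    for name in sorted(workloads):
        queries = list(workloads[name])
        if scope == "workload":
            if shuffle:
                rng.shuffle(queries)
            units.append([(name, q) for q in queries])
        else:
            units.extend([(name, q)] for q in queries)
    return units
```

For example, `plan_units({"tpch": ["q1", "q2"]}, "query")` produces two single-query units, while the same call with `"workload"` produces one unit holding both queries.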
Lenni Kuff
dd9798c9f3 IMP-785: calculation_util.calculate_mean does not calculate mean (instead median)
2014-01-08 10:48:35 -08:00
Henry Robinson
e15a39143a Fix definition of calculate_mean
2014-01-08 10:48:34 -08:00
Lenni Kuff
4cf7d2634e Update benchmark runner to use mean of all results if num_clients > 1
2014-01-08 10:48:30 -08:00
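
The three oldest commits concern the difference between a mean and a median, and averaging over several clients. A minimal sketch of the distinction follows; it is not the code from calculation_util, and `calculate_median` and `mean_across_clients` are hypothetical names used only for illustration.

```python
def calculate_mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / float(len(values))


def calculate_median(values):
    """Median: the middle value of the sorted input, or the average of
    the two middle values when the count is even."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def mean_across_clients(per_client_times):
    """Mean over every result from every client, pooled together."""
    pooled = [t for times in per_client_times for t in times]
    return calculate_mean(pooled)
```

The two statistics diverge whenever the data is skewed: for the times 1.0, 2.0 and 9.0 the median is the middle value 2.0, while the mean is pulled upward by the slow run.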