Many python files had a hashbang and the executable bit set though
they were not intended to be run a standalone script. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.
Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
- Added % change to performance regressions/improvements table
- Automatic extraction of Impala version from runtime profiles
- Execution summary row will not be printed if max time is < 100ms or < 2% of the overall runtime
- Failed queries are ignored
- First result is discarded for each query
- Geometric mean was added to summary
- Improved handling of multiple workloads in a single JSON file
- Improved handling of the case when queries are different in results and reference results
- Works well for single client runs. Additional work is needed to handle multiple client runs well.
Change-Id: Ice7b9cc4fd7502a448d35ace10fbcef183df1769
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4210
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c722f6b0a104df54b550978cd222a9af4d39b929)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5250
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
At the moment, a query is the default unit of execution and parallelism in the Impala
performance suite. With this change, we now have the ability to treat a workload as the
unit of execution. A workload is defined as a unique combination of the dataset, scale
factor, a subset (or all) of the queries in the dataset, and a table format (file format,
compression codec and compression scheme).
It introduces two new command line options in bin/run-workload.py:
* --execution_scope
The default scope is 'query', and it maintains previous semantics. The
new scope is 'workload', which toggles the unit of execution to a workload.
* --shuffle_query_exec_order.
Shuffles the order in which queries are executed (only applicable when the
execution_scope if workload), defaults to False.
Change-Id: I790d75f0896210cda8eb999015b0be04246e4c45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/503
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>