Fixed the UTF-8 UnicodeDecodeError that was thrown while dumping and
loading the JSON file. The script now ignores non-decodable characters.
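A minimal sketch of the "ignore non-decodable characters" behaviour,
assuming Python 3 and raw bytes as input. The helper name
load_json_lenient is hypothetical and is not taken from the script:

```python
import json


def load_json_lenient(raw_bytes):
    """Decode bytes as UTF-8, dropping undecodable bytes, then parse JSON."""
    # errors="ignore" silently skips byte sequences that are not valid UTF-8
    # instead of raising UnicodeDecodeError.
    text = raw_bytes.decode("utf-8", errors="ignore")
    return json.loads(text)
```

Dropping bytes loses information, so this only suits cases where the
affected strings are informational rather than keys used for matching.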
Fixed the ZeroDivisionError raised by the t-test when the standard
deviations were 0. "(N/A) Invalid t-test type" is shown for significant
changes, and a hint is printed at the end if any invalid t-test was
detected.
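The zero-deviation case can be guarded roughly as follows. This is a
sketch only: it assumes a Welch-style t statistic computed from means,
standard deviations and sample counts, and the function name and the
None return convention are hypothetical, not the code in
report_benchmark_results.py:

```python
import math


def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Return Welch's t statistic, or None when it is undefined."""
    # Estimated variance of the difference between the two sample means.
    var = std_a ** 2 / n_a + std_b ** 2 / n_b
    if var == 0:
        # Both standard deviations are 0, so the division below would
        # raise ZeroDivisionError. Report "no valid t-test" instead.
        return None
    return (mean_a - mean_b) / math.sqrt(var)
```

A caller can then print a placeholder for the None case and remember
that at least one invalid t-test occurred, so a hint can be printed at
the end of the report.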
Change-Id: I094763188a1f3ddf40b7140c65acf95918a6597f
Reviewed-on: http://gerrit.cloudera.org:8080/18215
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
This makes the --scale option mandatory when running
./bin/single_node_perf_run.py. If the option is not set, the script
attempts to run the workloads against the database
'[workload-name]None_[file-format]', which is typically not what the
user wants.
Also makes some minor documentation improvements to the script.
Testing:
* Confirmed that running without the --scale option set causes the
script to error out with a help message
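For illustration, a mandatory option can be expressed like this with
argparse. This is an assumption made for the sketch: the option-parsing
library the script actually uses is not stated here, and parse_args is
a hypothetical helper:

```python
import argparse


def parse_args(argv):
    """Parse command-line arguments, failing if --scale is missing."""
    parser = argparse.ArgumentParser(prog="single_node_perf_run.py")
    # required=True makes argparse print a usage message and exit with
    # status 2 when the option is absent.
    parser.add_argument("--scale", type=int, required=True,
                        help="scale factor for the workloads")
    return parser.parse_args(argv)
```

Failing early avoids building a database name from a missing value,
as in '[workload-name]None_[file-format]'.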
Change-Id: I9ad13580f8f74388981a37d6960087d95cde574b
Reviewed-on: http://gerrit.cloudera.org:8080/15335
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is useful for benchmarking non-standard configurations,
e.g. with mt_dop enabled.
Testing:
Ran the script, confirmed manually that the arguments took effect.
single_node_perf_run.py <other args> \
--impalad_args=--default_query_options=mt_dop=4 \
--impalad_args=--unlock_mt_dop=true
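One way a script could accept a repeated flag such as --impalad_args
and forward the values is sketched below. It assumes argparse and a
hypothetical helper name; how single_node_perf_run.py parses or
forwards these arguments is not shown in this message:

```python
import argparse


def collect_impalad_args(argv):
    """Gather every --impalad_args value into one space-separated string."""
    parser = argparse.ArgumentParser()
    # action="append" keeps each occurrence instead of overwriting the
    # previous one.
    parser.add_argument("--impalad_args", action="append", default=[])
    # parse_known_args leaves unrelated arguments alone.
    opts, _unknown = parser.parse_known_args(argv)
    return " ".join(opts.impalad_args)
```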
Change-Id: Ib903f0eabb06a7e8981c874c8fe1cec0936b1a64
Reviewed-on: http://gerrit.cloudera.org:8080/14923
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Jim Apple <jbapple@apache.org>
Description:
The tpcds-unmodified workload was added as part of IMPALA-6819.
This change makes the tpcds-unmodified workload available for the
single node perf run.
Testing:
Ran a single node perf run with the following parameters; the test
run was successful:
--iterations 2 --scale 2 --table_formats "parquet/none" \
--num_impalads 1 --workload "tpcds-unmodified" \
--load --query_names "TPCDS-Q17.*" --start_minicluster
Change-Id: I511661c586cd55e3240ccbea9c499b9c3fc98440
Reviewed-on: http://gerrit.cloudera.org:8080/10931
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
single_node_perf_run.py uses "git_hash_A vs. git_hash_B", distinguished
by their position in the command-line arguments.
single_node_perf_run.py calls report_benchmark_results.py, which uses
"reference vs. input", distinguished by their command-line flags. The
output of report_benchmark_results.py uses "{empty string} vs Base".
In the long run, I think it would be better to fix all three to use
the same terminology, but this comment hopefully adds clarity.
Change-Id: Ib236ce7e83dc193ef1382f6304444ce58759a639
Reviewed-on: http://gerrit.cloudera.org:8080/8470
Tested-by: Impala Public Jenkins
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
In single_node_perf_run.py, restore_workloads() can make the tree
"dirty", and when a tree is dirty, git won't let you switch branches
in a way that clobbers the dirty file contents:
$ cd $(mktemp -d)
$ git init .
Initialized empty Git repository in /tmp/tmp.H0NxzTXLUj/.git/
$ touch foo && git add foo && git commit -a -m "foo"
[master (root-commit) 3776149] foo
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 foo
$ git checkout -b ok_foo && echo "ok" >> foo && git commit -a -m "foo is ok"
Switched to a new branch 'ok_foo'
[ok_foo 9fd5bde] foo is ok
1 file changed, 1 insertion(+)
$ git checkout master && echo "not ok" >> foo
Switched to branch 'master'
$ git checkout ok_foo
error: Your local changes to the following files would be overwritten by checkout:
foo
Please, commit your changes or stash them before you can switch branches.
Aborting
Discovered when testing single_node_perf_run.py with
https://gerrit.cloudera.org/#/c/7153/; after this commit, that patch
works with single_node_perf_run.py.
Change-Id: Id0220f3cd7a26d2627e40cd432c23815a6d65ea4
Reviewed-on: http://gerrit.cloudera.org:8080/7291
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
When git checkout would overwrite changes, it fails and alerts the
user to do something with the changes. This patch removes any changes
to files induced by the workload copy-and-paste.
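A sketch of discarding such changes before switching branches. It
assumes the modifications are unstaged edits to tracked files, and both
helper names are hypothetical; the patch itself may clean up
differently:

```python
import subprocess


def discard_changes_cmd(path):
    """Build the git command that restores tracked files under path."""
    # "git checkout -- <path>" restores the working-tree files from the
    # index, discarding unstaged modifications. Untracked files are left
    # untouched.
    return ["git", "checkout", "--", path]


def discard_changes(path, repo_dir="."):
    """Run the restore command in repo_dir."""
    # check_call raises CalledProcessError if git exits non-zero.
    subprocess.check_call(discard_changes_cmd(path), cwd=repo_dir)
```

For example, discard_changes("testdata/workloads") would be called
before the next git checkout of the other hash.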
Testing: using a patch provided by Lars Volker that touched
testdata/workloads/ (https://gerrit.cloudera.org/#/c/7073/), I was
able to reproduce the problem he saw and see that this patch fixed it.
Change-Id: I9a0d004c353eb4b547aeaf3c56289594326653d7
Reviewed-on: http://gerrit.cloudera.org:8080/7145
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
This is a migration from an old and broken script from another
repository. Example use:
bin/single_node_perf_run.py --ninja --workloads targeted-perf \
--load --scale 4 --iterations 20 --num_impalads 3 \
--start_minicluster --query_names PERF_AGG-Q3 \
$(git rev-parse HEAD~1) $(git rev-parse HEAD)
The script can load data, run benchmarks, and compare the statistics
of those runs for significant differences in performance. It glues
together buildall.sh, bin/load-data.py, bin/run-workload.py, and
tests/benchmark/report_benchmark_results.py.
Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9
Reviewed-on: http://gerrit.cloudera.org:8080/6818
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins