impala

mirror of https://github.com/apache/impala.git synced 2026-02-02 15:00:38 -05:00

Author	SHA1	Message	Date
Gergely Fürnstáhl	182617ee87	IMPALA-11113 and IMPALA-11114: fixed single_node_perf_run.py for TPCDS Fixed the UTF-8 UnicodeDecodeError which was thrown while dumping and loading the json file. Now the script ignores non-decodable characters. Fixed the ZeroDivisionError coming from t-test when the standard deviations were 0. "(N/A) Invalid t-test type" is shown for significant changes and a hint at the end if any invalid t-test was detected. Change-Id: I094763188a1f3ddf40b7140c65acf95918a6597f Reviewed-on: http://gerrit.cloudera.org:8080/18215 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2022-02-15 15:20:56 +00:00
Sahil Takiar	d3a2d73fda	IMPALA-9439: Make --scale a mandatory option in single_node_perf_run.py This makes the --scale option mandatory when running ./bin/single_node_perf_run.py. If the option is not set, the script attempts to run the workloads against the database '[workload-name]None_[file-format]', which is typically not what the user wants. Makes some minor documentation improvements to the script. Testing: * Confirmed that running without the --scale option set causes the script to error out with a help message Change-Id: I9ad13580f8f74388981a37d6960087d95cde574b Reviewed-on: http://gerrit.cloudera.org:8080/15335 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-03-02 22:35:31 +00:00
Tim Armstrong	112953c63b	Add --impalad_args to single_node_perf_run.py This is useful for benchmarking non-standard configurations, e.g. with mt_dop enabled. Testing: Ran the script, confirmed manually that the arguments took effect. single_node_perf_run.py <other args> \ --impalad_args=--default_query_options=mt_dop=4 \ --impalad_args=--unlock_mt_dop=true Change-Id: Ib903f0eabb06a7e8981c874c8fe1cec0936b1a64 Reviewed-on: http://gerrit.cloudera.org:8080/14923 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Jim Apple <jbapple@apache.org>	2019-12-22 08:57:32 +00:00
Tim Armstrong	23731ba90c	Fix single_node_perf_run default num_impalads The documentation claims that the default is 1, but it was actually 3. Change-Id: Ia295ce0b0040e02b4fa8faafc0ac749e35b46c19 Reviewed-on: http://gerrit.cloudera.org:8080/14383 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-10-08 09:26:14 +00:00
Jim Apple	fa672909c8	IMPALA-8062: Call impala-config in single_node_perf_run This wraps most shell calls in single_node_perf_run.py with a bash shell that first sources impala-config.sh, to make sure environment variables are set properly. Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba Reviewed-on: http://gerrit.cloudera.org:8080/12277 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-27 03:04:25 +00:00
njanarthanan	dcbff6bbbd	IMPALA-7228: Add tpcds-unmodified to single-node-perf-run Description: tpcds-unmodified workload was added as a part of IMPALA-6819. This change allows tpcds-unmodified workload to be available for the single node perf run. Testing: Ran single node perf run using the following parameters and the test run was successful --iterations 2 --scale 2 --table_formats "parquet/none" \ --num_impalads 1 --workload "tpcds-unmodified" \ --load --query_names "TPCDS-Q17.*" --start_minicluster Change-Id: I511661c586cd55e3240ccbea9c499b9c3fc98440 Reviewed-on: http://gerrit.cloudera.org:8080/10931 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-07-13 20:29:48 +00:00
Jim Apple	216642e28d	IMPALA-6105: Clarify argument order in single_node_perf_run single_node_perf_run.py uses git_hash_A vs. git_hash_B, distinguishing them by their position in the command-line arguments. single_node_perf_run.py calls report_benchmark_results.py, which uses "reference vs. input", distinguished by their command-line flags. The output of report_benchmark_results.py uses "{empty string} vs Base". In the long run, I think it would be better to fix all three to use the same terminology, but this comment hopefully adds clarity. Change-Id: Ib236ce7e83dc193ef1382f6304444ce58759a639 Reviewed-on: http://gerrit.cloudera.org:8080/8470 Tested-by: Impala Public Jenkins Reviewed-by: Jim Apple <jbapple-impala@apache.org>	2017-11-07 16:16:09 +00:00
Jim Apple	01b5973c40	single_node_perf_run.py: clean up newly-added testdata In single_node_perf_run.py, restore_workloads() can make the tree "dirty", and when a tree is dirty, git won't let you switch branches in a way that clobbers the dirty file contents: $ cd $(mktemp -d) $ git init . Initialized empty Git repository in /tmp/tmp.H0NxzTXLUj/.git/ $ touch foo && git add foo && git commit -a -m "foo" [master (root-commit) 3776149] foo 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 foo $ git checkout -b ok_foo && echo "ok" >> foo && git commit -a -m "foo is ok" Switched to a new branch 'ok_foo' [ok_foo 9fd5bde] foo is ok 1 file changed, 1 insertion(+) $ git checkout master && echo "not ok" >> foo Switched to branch 'master' $ git checkout ok_foo error: Your local changes to the following files would be overwritten by checkout: foo Please, commit your changes or stash them before you can switch branches. Aborting Discovered when testing single_node_perf_run with https://gerrit.cloudera.org/#/c/7153/; after this commit, that patch works with single_node_perf_run.py Change-Id: Id0220f3cd7a26d2627e40cd432c23815a6d65ea4 Reviewed-on: http://gerrit.cloudera.org:8080/7291 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-07-11 00:12:24 +00:00
Jim Apple	de9f5230eb	IMPALA-5482: fix git checkout when workloads are modified When git checkout would overwrite changes, it fails and alerts the user to do something with the changes. This patch removes any changes to files induced by the workload copy-and-paste. Testing: using a patch provided by Lars Volker that touched testdata/workloads/ (https://gerrit.cloudera.org/#/c/7073/), I was able to reproduce the problem he saw and see that this patch fixed it. Change-Id: I9a0d004c353eb4b547aeaf3c56289594326653d7 Reviewed-on: http://gerrit.cloudera.org:8080/7145 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-06-11 18:20:22 +00:00
Jim Apple	07a7138817	Add a script to test performance on a developer machine This is a migration from an old and broken script from another repository. Example use: bin/single_node_perf_run.py --ninja --workloads targeted-perf \ --load --scale 4 --iterations 20 --num_impalads 3 \ --start_minicluster --query_names PERF_AGG-Q3 \ $(git rev-parse HEAD~1) $(git rev-parse HEAD) The script can load data, run benchmarks, and compare the statistics of those runs for significant differences in performance. It glues together buildall.sh, bin/load-data.py, bin/run-workload.py, and tests/benchmark/report_benchmark_results.py. Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9 Reviewed-on: http://gerrit.cloudera.org:8080/6818 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-05-31 08:10:48 +00:00

10 Commits
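
The IMPALA-11113/IMPALA-11114 entry above describes a t-test that divided by zero when both standard deviations were 0. A minimal sketch of that kind of guard follows. It assumes a Welch-style statistic, and the function names `welch_t_statistic` and `format_t_test` are hypothetical; the actual formula and code in the Impala scripts may differ. Only the "(N/A) Invalid t-test" wording is taken from the commit message.

```python
import math


def welch_t_statistic(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Return Welch's t statistic, or None when it is undefined.

    Hypothetical helper: when both standard deviations are 0 the
    denominator is 0, so None is returned instead of raising
    ZeroDivisionError.
    """
    variance_term = std_a ** 2 / n_a + std_b ** 2 / n_b
    if variance_term == 0:
        return None
    return (mean_a - mean_b) / math.sqrt(variance_term)


def format_t_test(t_stat):
    """Render the statistic for a report, flagging the undefined case."""
    if t_stat is None:
        # Label wording borrowed from the commit message above.
        return "(N/A) Invalid t-test"
    return "{:.2f}".format(t_stat)


if __name__ == "__main__":
    # Identical timings on every iteration: both standard deviations are 0.
    print(format_t_test(welch_t_statistic(10.0, 0.0, 5, 12.0, 0.0, 5)))
    # Ordinary case with non-zero spread.
    print(format_t_test(welch_t_statistic(10.0, 2.0, 4, 12.0, 2.0, 4)))
```

Returning `None` rather than a sentinel number keeps an undefined result from being mistaken for a real statistic further down the reporting path.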