impala

mirror of https://github.com/apache/impala.git synced 2025-12-31 15:00:10 -05:00

Author	SHA1	Message	Date
Taras Bobrovytsky	e94de02469	Added execution summary, modified benchmark to handle JSON - Added execution summary to the beeswax client and QueryResult - Modified report-benchmark-results to handle JSON and perform execution summary comparison between runs - Added comments to the new workload runner Change-Id: I9c3c5f2fdc5d8d1e70022c4077334bc44e3a2d1d Reviewed-on: http://gerrit.ent.cloudera.com:8080/3598 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: jenkins (cherry picked from commit fd0b1406be2511c202e02fa63af94fbbe5e18eee) Reviewed-on: http://gerrit.ent.cloudera.com:8080/3618	2014-07-25 21:06:00 -07:00
ishaan	3bed0be1df	Refactor the performance framework and change its execution strategy. This patch introduces new abstractions and changes the way queries are run via the workload runner. A new class 'Workload' is introduced, which represents the notion of a workload in the performance framework (i.e., a set of query names mapped to query strings). The new workflow is: - run-workload acts as a driver. It accepts user parameters for which queries to run and their execution strategy. It generates workload objects and passes them to the workload-runner. - The workload runner takes a workload and its execution parameters and generates a set of test vectors over which the workload is run iteratively. - A workload is executed by initializing a QueryExecutor for each query being run in a test vector. The workload executor is then responsible for execution and gathering results. - The execution details of every query being executed are stored and returned to the driver (run-workload). Change-Id: Ia16360140d65e6733e534e823bc5d5614622ab5f Reviewed-on: http://gerrit.ent.cloudera.com:8080/3616 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: jenkins	2014-07-25 18:17:11 -07:00
ishaan	c5c58c6bce	The workload runner should abort execution if a query fails in a multi-user run. Currently, we coalesce the results and do not properly catch a failure if one of the threads has a failed query and exit_on_error is set to True. This patch ensures that we exit before the next query is run. Change-Id: Ie650e0f547874386c79c78982ea9916f33e18cda Reviewed-on: http://gerrit.ent.cloudera.com:8080/2654 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-05-27 20:46:21 -07:00
Nong Li	b310935424	Minor workload runner logging improvements. Change-Id: I75d27593599e654f7fab1cd104dd9fe9fa88cfdb Reviewed-on: http://gerrit.ent.cloudera.com:8080/1145 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Conflicts: tests/common/workload_runner.py	2014-01-08 10:54:38 -08:00
ishaan	7e520f8f23	Make workload runner logging more concise and readable. This patch makes the workload runner's logging concise and more informative. Specifically, it - logs the time taken for each iteration of a query. - changes the default log level to INFO. - makes the output less verbose. Change-Id: I5f964cf76269fd64ce127b9e4c51fe1deafd1d1b Reviewed-on: http://gerrit.ent.cloudera.com:8080/1076 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:54:35 -08:00
Lenni Kuff	927c486e0c	Imply EOL match character for workload runner query name regex matches. This adds the EOL match character '$' to the end of all query name regex strings to make the matching behaviour a bit more user friendly. This way, if the user inputs "TPCH-Q1", it will not match TPCH-Q11/Q12/etc., which is probably what they want. The user can still do a wildcard match using "TPCH-Q1." or "TPCH-Q1.$" Change-Id: Icfb6a111aa464353387e9b631168c44127a7896f Reviewed-on: http://gerrit.ent.cloudera.com:8080/784 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:46 -08:00
Lenni Kuff	8218c19528	Match query names in run-workload using regex. This change allows for matching query names in run-workload using regex strings. For example, the user can now pass run-workload a query name string like: --query_names=tpcds-q.,^tpch. Change-Id: I5b13858ec32cf10769a4c4f2afc49adfeb98ec93 Reviewed-on: http://gerrit.ent.cloudera.com:8080/777 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:46 -08:00
ishaan	0cb16863ee	run-workload should log a warning to console and not fail if abort_on_query_error is False and the query fails. This change also disables printing the runtime_profile to the console. Change-Id: Ic7bc3406d6eddb67a514ecfb4a27add8c40a8604 Reviewed-on: http://gerrit.ent.cloudera.com:8080/687 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:53:25 -08:00
ishaan	ee42aa8d36	Fix incorrect argument in the Impala test suite call to execute_using_jdbc. execute_using_jdbc used to expect a query string. Its interface was recently changed to accept a query object. Additionally, change the interface of the Query() class to enable it to accept raw (qualified) query strings. Change-Id: I44693cd2cccf1041cab32a9821fb76b12d148375 Reviewed-on: http://gerrit.ent.cloudera.com:8080/577 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:53:09 -08:00
ishaan	565d15579c	Add the ability to use a workload as the unit of execution in the Impala benchmark runner. At the moment, a query is the default unit of execution and parallelism in the Impala performance suite. With this change, we now have the ability to treat a workload as the unit of execution. A workload is defined as a unique combination of the dataset, scale factor, a subset (or all) of the queries in the dataset, and a table format (file format, compression codec and compression scheme). It introduces two new command line options in bin/run-workload.py: * --execution_scope The default scope is 'query', and it maintains previous semantics. The new scope is 'workload', which toggles the unit of execution to a workload. * --shuffle_query_exec_order. Shuffles the order in which queries are executed (only applicable when the execution_scope is workload), defaults to False. Change-Id: I790d75f0896210cda8eb999015b0be04246e4c45 Reviewed-on: http://gerrit.ent.cloudera.com:8080/503 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:53:07 -08:00
ishaan	5ea0bb6d30	Ignore results for failing clients for stress tests with abort_on_error set to False. Change-Id: I69710da2672dfaa2d377bbaaf5c334963b11ae6f Reviewed-on: http://gerrit.ent.cloudera.com:8080/332 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:35 -08:00
Alex Leblang	a0ad16af41	Testing framework changes for VTune plugin. This patch contains changes to the general test and plugin framework that were needed to make the VTune plugin run. These changes create a context dictionary that is passed to the plugin. Change-Id: I12ee2076fb0d777813c56bbb338e6d20426afaff Reviewed-on: http://gerrit.ent.cloudera.com:8080/111 Reviewed-by: Alex Leblang <alex.leblang@cloudera.com> Tested-by: Alex Leblang <alex.leblang@cloudera.com>	2014-01-08 10:52:12 -08:00
ishaan	686c81067a	Round Robin impalad selection with multiple clients and multiple impalads.	2014-01-08 10:51:55 -08:00
ishaan	6f9569ea6f	Add a plugin framework for run-workload	2014-01-08 10:51:39 -08:00
Alex Behm	8ad15fabcf	IMPALA-372: Added CREATE/DROP/ALTER VIEW.	2014-01-08 10:51:35 -08:00
Lenni Kuff	9a4feb7391	Add impala local failure test framework and tests	2014-01-08 10:50:18 -08:00
Lenni Kuff	2f8ba3c08e	Disable running mini-impala-cluster tests in run-all-tests.sh due to IMPALA-155	2014-01-08 10:48:45 -08:00
Lenni Kuff	90d7e085fa	Update tests to use num_nodes=0, use external impala cluster, add sanity check run mode	2014-01-08 10:48:38 -08:00
Lenni Kuff	dd9798c9f3	IMP-785: calculation_util.calculate_mean does not calculate mean (instead median)	2014-01-08 10:48:35 -08:00
Lenni Kuff	4cf7d2634e	Update benchmark runner to use mean of all results if num_clients > 1	2014-01-08 10:48:30 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	1a2695781d	Add support for targeting JDBC via run-workload and add Impala Jdbc Client tool	2014-01-08 10:47:29 -08:00
Lenni Kuff	d5177c3c30	Update run-workload to support specifying table format test vectors from command line	2014-01-08 10:47:20 -08:00
Lenni Kuff	97380676d8	Fixed case sensitivity issue with exec options in the test Impala Beeswax client code. Also added support for executing run-workload in a mode that continues after query errors.	2014-01-08 10:46:50 -08:00
Lenni Kuff	ef48f65e76	Add test framework for running Impala query tests via Python. This is the first set of changes required to start getting our functional test infrastructure moved from JUnit to Python. After investigating a number of options, I decided to go with a Python test executor named py.test (http://pytest.org/). It is very flexible, open source (MIT licensed), and will enable us to do some cool things like parallel test execution. As part of this change, we now use our "test vectors" for query test execution. This will be very nice because it means that if you load the "core" dataset, you know you will be able to run the "core" query tests (specified by --exploration_strategy when running the tests). You will see that now each combination of table format + query exec options is treated like an individual test case. This will make it much easier to debug exactly where something failed. These new tests can be run using the script at tests/run-tests.sh	2014-01-08 10:46:50 -08:00
ishaan	c5ddba4296	run-workload + kerberos	2014-01-08 10:46:29 -08:00
Lenni Kuff	5e91fc8ff8	Fix $TABLE table suffix replacement for workloads that don't have tables in a database	2014-01-08 10:46:17 -08:00
Lenni Kuff	b3fce13b1d	Initial Impala failure testing library + modularize run-workload. This adds initial changes for the Impala failure testing library. It also refactors run-workload into its own module so it can be used in other tests. The failure testing has two main components: the first is an object model on top of Impala services in a cluster. This allows for enumerating the services in the cluster and executing commands on remote machines. This initial cut is built on top of the CM service to help with starting/stopping services. The long term goal is to let this run on both a CM cluster and non-CM cluster as well as locally. The other part of the failure injection change is the failure_inctor module that uses the Impala service abstraction to select and inject failures into random impala services. This failure testing framework hasn't been completely validated because the product code is not yet ready, but it is important to get this checked in so all new changes to run-workload are based off this refactor. Change-Id: I73bf44f0ac881ec17bea7cb05d850b45e2ea5be5	2014-01-08 10:46:16 -08:00

28 Commits
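Commits 8218c19528 and 927c486e0c describe matching run-workload query names against regular expressions with an implied trailing '$'. The following is a minimal sketch of that behaviour, not the workload runner's actual code. It assumes comma-separated patterns applied with re.match (anchored at the start of the name). The helper match_query_names and the sample query names are hypothetical.

```python
import re


def match_query_names(patterns, query_names):
    """Return the query names matched by any comma-separated regex in `patterns`.

    Hypothetical helper: each pattern gets an implied '$' so that "TPCH-Q1"
    no longer matches "TPCH-Q11"; re.match anchors at the start of the name.
    """
    regexes = [re.compile(p.strip() + "$") for p in patterns.split(",") if p.strip()]
    return [name for name in query_names if any(r.match(name) for r in regexes)]


names = ["TPCH-Q1", "TPCH-Q11", "TPCH-Q12", "TPCDS-Q3"]
print(match_query_names("TPCH-Q1", names))   # ['TPCH-Q1']
print(match_query_names("TPCH-Q1.", names))  # ['TPCH-Q11', 'TPCH-Q12']
```

With the implied '$', the pattern "TPCH-Q1." matches exactly one trailing character. It therefore selects TPCH-Q11 and TPCH-Q12, but not TPCH-Q1 itself.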
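Commit 686c81067a adds round-robin impalad selection for runs with multiple clients and multiple impalads. The sketch below shows one way to express that assignment. It assumes that client i is simply paired with impalad i mod N, which is not taken from the commit itself. The helper assign_impalads and the host names are hypothetical placeholders.

```python
from itertools import cycle


def assign_impalads(num_clients, impalads):
    """Pair each client id with an impalad in round-robin order (hypothetical helper)."""
    targets = cycle(impalads)
    # Clients are assigned in id order, so client i gets impalads[i % len(impalads)].
    return {client_id: next(targets) for client_id in range(num_clients)}


print(assign_impalads(5, ["host-a:21000", "host-b:21000"]))
# {0: 'host-a:21000', 1: 'host-b:21000', 2: 'host-a:21000', 3: 'host-b:21000', 4: 'host-a:21000'}
```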
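Commits dd9798c9f3 and 4cf7d2634e concern how results are averaged. calculation_util.calculate_mean had been returning the median, and the benchmark runner was changed to use the mean of all results when num_clients > 1. The sketch below uses Python's statistics module rather than calculation_util, whose code is not shown here. Its timings for three clients are made up, and it shows why the two statistics can differ once one client is noticeably slower.

```python
from statistics import mean, median

# Made-up execution times in seconds: two iterations each for num_clients = 3.
per_client_times = [[1, 2], [1, 1], [4, 3]]

# Pool every client's results, then average them (the arithmetic mean).
all_times = [t for client_times in per_client_times for t in client_times]

print(mean(all_times))    # 2
print(median(all_times))  # 1.5
```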