This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.
As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means that if you load the "core" dataset, you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).
You will see that now each combination of table format + query exec options is
treated as an individual test case. This will make it much easier to debug
exactly where something failed.
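As a rough illustration (not the actual test code), py.test's parametrize
decorator can expand the cross product of table formats and exec options into
individual test cases; the formats, option values, and run_query helper below
are hypothetical stand-ins:

    import itertools
    import pytest

    # Illustrative values only; the real suite derives these from the test vectors.
    TABLE_FORMATS = ['text/none', 'seq/snap', 'rc/gzip']
    EXEC_OPTIONS = [{'num_nodes': 0}, {'num_nodes': 1}]

    def run_query(sql, table_format, exec_options):
        # Hypothetical stand-in for the real query executor.
        return True

    @pytest.mark.parametrize('table_format,exec_options',
                             list(itertools.product(TABLE_FORMATS, EXEC_OPTIONS)))
    def test_core_queries(table_format, exec_options):
        # Each (table format, exec options) pair shows up as its own test case,
        # so a failure report pinpoints the exact combination that broke.
        assert run_query('select count(*) from alltypes', table_format, exec_options)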
These new tests can be run using the script at tests/run-tests.sh
This adds initial changes for the Impala failure testing library. It also refactors
run-workload into its own module so it can be used in other tests.
The failure testing has two main components. The first is an object model on top
of the Impala services in a cluster. This allows for enumerating the services in the cluster
and executing commands on remote machines. This initial cut is built on top of the
CM service to help with starting/stopping services. The long-term goal is to let this run
on both CM and non-CM clusters, as well as locally.
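As a hedged sketch of what that object model might look like (the class and
method names below are assumptions rather than the real API, and the actual
implementation goes through CM instead of raw ssh):

    import subprocess

    class ImpalaService(object):
        def __init__(self, hostname, role):
            self.hostname = hostname
            self.role = role  # e.g. 'impalad' or 'statestored'

        def run_remote_command(self, command):
            # Execute a command on the machine hosting this service.
            return subprocess.check_output(['ssh', self.hostname, command])

    class ImpalaCluster(object):
        def __init__(self, services):
            self.services = services

        def get_services(self, role=None):
            # Enumerate the services in the cluster, optionally filtered by role.
            return [s for s in self.services if role is None or s.role == role]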
The other part of the failure injection change is the failure_injector module, which uses the
Impala service abstraction to select random Impala services and inject failures into them.
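A minimal sketch of that injection loop, building on the hypothetical service
model above (the restart command and method names are illustrative):

    import random
    import time

    class FailureInjector(object):
        def __init__(self, cluster, interval_seconds=60):
            self.cluster = cluster
            self.interval_seconds = interval_seconds

        def inject_random_failure(self):
            # Pick a random impalad and knock it over; in the real framework the
            # start/stop primitives would come from the CM service abstraction.
            target = random.choice(self.cluster.get_services(role='impalad'))
            target.run_remote_command('sudo service impala-server restart')

        def run(self, iterations):
            for _ in range(iterations):
                self.inject_random_failure()
                time.sleep(self.interval_seconds)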
This failure testing framework hasn't been completely validated because the product code
is not yet ready, but it is important to get this checked in so all new changes to
run-workload are based off this refactor.
Change-Id: I73bf44f0ac881ec17bea7cb05d850b45e2ea5be5
Queries now return rows on both our small (query test) data set and the 10TB
data set. This change also fixes a problem with Python not being set properly and
adds support for reporting query results using the geometric mean.
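For reference, the geometric mean can be computed in log space for numerical
stability; this sketch is illustrative rather than the exact reporting code:

    import math

    def geometric_mean(execution_times):
        # Geometric mean of per-query times; unlike the arithmetic mean, a single
        # slow outlier will not dominate the summary (e.g. for [2.0, 8.0] the
        # geometric mean is 4.0 while the arithmetic mean is 5.0).
        if not execution_times:
            raise ValueError('no execution times to summarize')
        log_sum = sum(math.log(t) for t in execution_times)
        return math.exp(log_sum / len(execution_times))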
Change-Id: Ia432148d96645ecda3f63900b3bfbd29c706d886
This change cleans up run-workload to push more query execution logic into query_executor.
It also adds a new feature to run-workload that supports filtering the file format /
compression combinations to run against.
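A hypothetical sketch of that filtering step (the vector fields and the
allowed-set format are assumptions):

    def filter_table_formats(vectors, allowed):
        # Keep only the test vectors whose file_format/compression pair appears
        # in the allowed set, e.g. allowed = {'seq/snap', 'text/none'}.
        return [v for v in vectors
                if '%s/%s' % (v['file_format'], v['compression']) in allowed]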
This change updates run-workload to provide a more generic interface for query
execution. Now the query executor just takes an execution function and a new
QueryExecOptions object that defines the values to use for execution.
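A rough sketch of the shape of that interface; aside from the QueryExecOptions
name, the classes and fields below are assumptions:

    class QueryExecOptions(object):
        # Bundles the values used for a single query execution; illustrative fields.
        def __init__(self, num_iterations=1, exec_options=None):
            self.num_iterations = num_iterations
            self.exec_options = exec_options or {}

    class QueryExecutor(object):
        def __init__(self, exec_function, exec_options):
            # exec_function is any callable of (query, QueryExecOptions) -> result,
            # so Impala, Hive, or a mock backend can be plugged in unchanged.
            self.exec_function = exec_function
            self.exec_options = exec_options

        def execute(self, query):
            return self.exec_function(query, self.exec_options)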
I also made a change to store partial result sets so we can salvage some work if
a run fails.
Now we save Hive results into a separate file (previously everything was stored
in the same file). Also added the ability to run run-benchmark with Impala skipped,
which will help generate Hive reference results.
Updated the reporting script to reflect this change.
This improves the summary reporting for perf results, fixes a problem with how
the short query names were being stored, and adds support for running multiple
workloads at different scale factors.
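One plausible way to express multiple workloads with per-workload scale factors
is a 'name:scale' spec string; this parser is purely illustrative:

    def parse_workload_specs(spec_string):
        # Parse 'tpch:10tb,tpcds' into (name, scale) pairs, leaving the scale
        # factor as None when one is not given.
        workloads = []
        for spec in spec_string.split(','):
            name, _, scale = spec.partition(':')
            workloads.append((name, scale or None))
        return workloads

    # parse_workload_specs('tpch:10tb,tpcds') -> [('tpch', '10tb'), ('tpcds', None)]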
This change adds a -num_clients flag that specifies the number of clients
(threads) to use when executing each query in a workload. This is used to
validate Impala concurrency/stress handling. The logging was getting garbled
with multiple threads, so I also updated this to use the Python logging module.
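A minimal sketch of the per-query client fan-out using the logging module; the
helper names are hypothetical:

    import logging
    import threading

    logging.basicConfig(format='%(asctime)s %(threadName)s %(message)s',
                        level=logging.INFO)
    LOG = logging.getLogger('run-workload')

    def run_query_with_clients(query, num_clients, exec_function):
        # Run the same query from num_clients threads; exec_function stands in
        # for the real executor.
        results = [None] * num_clients

        def client(idx):
            LOG.info('executing: %s', query)
            results[idx] = exec_function(query)

        threads = [threading.Thread(target=client, args=(i,), name='client-%d' % i)
                   for i in range(num_clients)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Mirrors the current behavior described below: only the first client's
        # result is kept.
        return results[0]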
Currently we only capture and save the results of the first thread that
executes. In the future we might want to update this to capture results from all
the threads.