impala

mirror of https://github.com/apache/impala.git synced 2025-12-30 21:02:41 -05:00

Author	SHA1	Message	Date
Taras Bobrovytsky	e94de02469	Added execution summary, modified benchmark to handle JSON - Added execution summary to the beeswax client and QueryResult - Modified report-benchmark-results to handle JSON and perform execution summary comparison between runs - Added comments to the new workload runner Change-Id: I9c3c5f2fdc5d8d1e70022c4077334bc44e3a2d1d Reviewed-on: http://gerrit.ent.cloudera.com:8080/3598 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: jenkins (cherry picked from commit fd0b1406be2511c202e02fa63af94fbbe5e18eee) Reviewed-on: http://gerrit.ent.cloudera.com:8080/3618	2014-07-25 21:06:00 -07:00
ishaan	3bed0be1df	Refactor the performance framework and change its execution strategy. This patch introduces new abstractions and changes the way queries are run via the workload runner. A new class 'Workload' is introduced, which represents the notion of a workload in the performance framework (i.e., a set of query names mapped to query strings). The new workflow is: - run-workload acts as a driver. It accepts user parameters for which queries to run and their execution strategy. It generates workload objects and passes them to the workload-runner. - The workload runner takes a workload, its execution parameters and generates a set of test vectors over which the workload is run iteratively. - A workload is executed by initializing a QueryExecutor for each query being run in a test vector. The workload executor is then responsible for execution and gathering results. - The execution details of every query being executed are stored and returned to the driver (run-workload). Change-Id: Ia16360140d65e6733e534e823bc5d5614622ab5f Reviewed-on: http://gerrit.ent.cloudera.com:8080/3616 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: jenkins	2014-07-25 18:17:11 -07:00
Nong Li	a25400c94e	Increase timeout in test_rows_availability to make sure query state is what we expect. Change-Id: Id4feebcc7b7cecb07555009219e6420e48a0c82b Reviewed-on: http://gerrit.ent.cloudera.com:8080/3534 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/3579	2014-07-22 12:12:13 -07:00
Nong Li	202d656ddc	Stop setting query state to EXCEPTION for non-exception cases. We were setting the state to exception on Cancel() all the time. We use the cancellation path as the normal cleanup path so this gets called even when the query went fine (e.g. UnregisterQuery calls Cancel()). We had already plumbed through a 'cause' argument to differentiate. Change-Id: Icf1091c165dec36d3dad7ce308367bbbc9edee4f Reviewed-on: http://gerrit.ent.cloudera.com:8080/3524 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3575	2014-07-22 04:08:28 -07:00
Taras Bobrovytsky	568e851774	Added option to specify the scale factor for pytest This allows execution of tests on a cluster with multiple scale factors. For example: py.test <test file> --impalad <cluster ip>:21000 --scale_factor 300gb Change-Id: I5230a6ef354def44b984eab2ac8a01989b9a471c Reviewed-on: http://gerrit.ent.cloudera.com:8080/3051 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3215	2014-07-15 14:44:37 -07:00
Taras Bobrovytsky	8d6f8ff01c	run-workload should exit with a non-zero error code if a query fails and abort_on_error is true The exception raised by a child thread did not reach the main thread, so the script exited with 0 instead of 1. Change-Id: I09be9dc824386bf25a64af0323cbf78f6d006b91 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3081 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3214	2014-07-15 14:43:10 -07:00
Ippokratis Pandis	6026f1ebe1	IMPALA-1055: Compute stats query statements don't quote DB and table names The compute stats statement was not quoting the DB and table names. If those names were aliasing with keywords, then the compute stats would not execute due to a syntax error. Change-Id: Ie08421246bb54a63a44eaf19d0d835da780b7033 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3170 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3198	2014-06-20 09:32:52 -07:00
Skye Wanderman-Milne	1cc628d32d	IMPALA-950: Skip computing stats for decimal columns. This patch also adds a mechanism to return analysis warnings to client, which is used to log skipped decimal columns. Change-Id: I30c246044a68ec8861cd5bed072bd54e65a079e6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2822 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit fc77422acef7e6f93fdeb5448309414b905f0725) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2984	2014-06-11 19:16:34 -07:00
Nong Li	5e49150a22	Speed up views compat test. - Use a smaller table so hive runs faster - Don't invalidate the catalog, just the view created in hive - This lets us run it in parallel Change-Id: I8085d8967dc96cbbb20e2d719072b29fe591cd98 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2958 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-06-10 20:53:23 -07:00
Ippokratis Pandis	fe0646f76b	IMPALA-1022: Handle cases where in Parquet the expected number of rows in metadata is wrong There are cases of Parquet files where the metadata indicate the wrong number of rows for these files. The parquet-scanner until now was not reporting any problem in this case. Instead it was reading as long as there were values for the read columns. But with IMPALA-1016 we are now reading at most as many rows as the rows per metadata. With this patch, the parquet-scanner, right before it finishes scanning, checks whether it read the expected number of rows (taken from metadata). In cases where the actual number of rows read is less than or greater than the expected number, it either aborts or logs an error. Change-Id: Ie6a66a38e8912730bf04762e6526ec1cadb2bcdc Reviewed-on: http://gerrit.ent.cloudera.com:8080/2755 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2944	2014-06-10 17:27:54 -07:00
Lenni Kuff	b3ebfddadd	Allow tests to access query result column values by col alias or col position For example, you can now do something like: result_set = execute("select * from tbl") result_row = result_set[0] result_row['col_alias'] or result_row[4] to access column values. If the column alias/position does not exist an exception is thrown. Change-Id: Ie4b65619ed17fd90bf39e0966a7fc7e1180dbc5c Reviewed-on: http://gerrit.ent.cloudera.com:8080/2719 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2922	2014-06-09 23:24:26 -07:00
Victor Bittorf	09aff77a6c	IMPALA-943: removed database udf_test from front-end tests Added CATCH section to test files. Change-Id: I28ba3a6e5ae4c53df5b86505573793d7b150863b Reviewed-on: http://gerrit.ent.cloudera.com:8080/2782 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit 5b616715958f3ebfdc45b8dc0e4baa82bd55f1d2) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2912	2014-06-09 19:06:15 -07:00
Henry Robinson	d264ab90fe	Add support for client SSL to Python Beeswax client Change-Id: I0d9352471067bfe19e25221e0ecbbb08f945b962 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2810 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins (cherry picked from commit 545bd30d5cf3cae9a3581d7bc942a909a1a98806) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2850 Tested-by: Henry Robinson <henry@cloudera.com>	2014-06-05 10:48:23 -07:00
ishaan	c5c58c6bce	The workload runner should abort execution if a query fails in a multi-user run. Currently, we coalesce the results and do not properly catch a failure if one of the threads has a failed query and exit_on_error is set to True. This patch ensures that we exit before the next query is run. Change-Id: Ie650e0f547874386c79c78982ea9916f33e18cda Reviewed-on: http://gerrit.ent.cloudera.com:8080/2654 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-05-27 20:46:21 -07:00
Henry Robinson	93a3d65492	Support for LDAP tests * Allow Beeswax connections to optionally use LDAP * Run custom cluster tests from the aux repo, if it exists Change-Id: I054af64e030ad0cd722ae8dd75afda9c58ea2913 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2547 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2640	2014-05-21 05:52:55 -07:00
Henry Robinson	38befd2126	IMPALA-724: Support infinite / nan values in text files This patch allows the text scanner to read 'inf' or 'Infinity' from a row and correctly translate it into floating-point infinity. It also adds is_inf() and is_nan() builtins. Finally, we change the text table writer to write Infinity and NaN for compatibility with Hive. In the future, we might consider adding nan / inf literals to our grammar (postgres has this, see: http://www.postgresql.org/docs/9.3/static/datatype-numeric.html). Change-Id: I796f2852b3c6c3b72e9aae9dd5ad228d188a6ea3 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2393 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins (cherry picked from commit 58091355142cadd2b74874d9aa7c8ab6bf3efe2f) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2483	2014-05-08 12:28:53 -07:00
Lenni Kuff	bb09b5270f	IMPALA-839: Update tests to be more thorough when run exhaustively Some tests have constraints that were there only to help reduce runtime which reduces coverage when running in exhaustive mode. The majority of the constraints are because it adds no value to run the test across additional dimensions (or it is invalid to run with those dimensions). Updates the tests that have legitimate constraints to use two new helper methods for constraining the table format dimension: create_uncompressed_text_dimension() create_parquet_dimension() These will create a dimension that will produce a single test vector, either uncompressed text or parquet respectively. Change-Id: Id85387c1efd5d192f8059ef89934933389bfe247 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2149 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins (cherry picked from commit e02acbd469bc48c684b2089405b4a20552802481) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2290	2014-04-18 20:11:31 -07:00
Lenni Kuff	15327e8136	Migrate DataErrors tests to Python test framework, re-enable subset of tests This re-enables a subset of the stable data errors tests and updates them to work in our test framework. This includes support for updating results via --update_results. This also lets us remove a lot of old code that was there only to support these disabled tests. Change-Id: I4c40c3976d00dfc710d59f3f96c99c1ed33e7e9b Reviewed-on: http://gerrit.ent.cloudera.com:8080/1952 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2277	2014-04-18 02:25:11 -07:00
Nong Li	1cab95066d	Add the return type as a column for SHOW FUNCTIONS. Also includes some misc pattern matching cleanup. Change-Id: I6c9ec78b094a73864b4d669afbd75a48c9bf9585 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2199 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2271	2014-04-17 17:58:13 -07:00
Alex Behm	2fff51d9e9	IMP-1329,IMPALA-924: Make ExchangeNode::Open() block until rows are available. The bug: Coordinator::Wait() is supposed to block until rows become available for consumption by the client. We rely on Wait() to determine when to advance the query status to a 'ready' state and signal to the client that rows can be fetched. Long fetch times can trigger client timeouts at various levels (socket, app, etc.). Coordinator::Wait() simply opens the coordinator fragment's plan tree. For most plan nodes, Open() does work to prepare the plan tree, s.t., GetNext() returns quickly. However, for ExchangeNodes Open() used to not wait until rows are obtained from the underlying stream receiver. The fix: Make ExchangeNode::Open() block until rows are available. Change-Id: I7b197eea11d21fd732414d96c899a17b2d99631c Reviewed-on: http://gerrit.ent.cloudera.com:8080/2128 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2185	2014-04-10 23:49:38 -07:00
Matthew Jacobs	cd2dc3e2bd	Fix test_failpoints to close queries after cancel Change-Id: I4f272ccec84030d8b4f85d0e1554a042ee26be30 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2092 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins (cherry picked from commit d42aad459a68991fc489caf1edbca10ea599d28a) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2116 Reviewed-by: Matthew Jacobs <mj@cloudera.com>	2014-03-28 18:47:25 -07:00
Lenni Kuff	70c05d4caa	IMPALA-897: shell does not close queries after completion when running from a script The problem was that we were setting a flag marking the last_query_handle as closed, but were not resetting the flag before the next query. This caused the first query to be closed properly, but subsequent queries would not be closed. The fix is to change where the flag is reset to the same place as where we assign last_query_handle. Added a test case. Change-Id: I870a96789489bfe4f388910b808409cd0584af8a (cherry picked from commit 1439151af5b63112b0dd631fac9c7ab4d43bba37) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1976 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-03-18 18:46:54 -07:00
Lenni Kuff	dd20958e5d	Minor test cleanup * Prefer 'refresh <table name>' over 'invalidate metadata' * Remove the 'RELOAD' test setup option that was used by only 1 test. * Delete a .py test file that seems to be a duplicate Change-Id: I890546635840bb8f4d55789a89f8c8f33e40d001 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1933 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1946 Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-03-17 17:30:15 -07:00
Lenni Kuff	cc1c0c61fd	IMP-1291: Support "extended" ASCII characters as delimiters in text files This fixes how we validate delimiters to be in line with Hive. A delimiter must fit in a single byte and can be specified in the following formats, as far as I can tell (there isn't documentation): - A single ASCII or unicode character (ex. '\|') - An escape character in octal format (ex. \001. Stored in the metastore as a unicode character: \u0001). - A signed decimal integer in the range [-128:127]. Used to support delimiters for ASCII character values between 128-255 (-2 maps to ASCII 254). Previously, we were not handling the "signed integer" case so there was no way to specify a delimiter in the "extended" ASCII range of 128-255. To support result validation, the test infrastructure had to be updated to support reading/writing different character encodings. Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888	2014-03-13 13:00:15 -07:00
Lenni Kuff	08417c875f	IMPALA-849: Impala does not work with boolean partition key columns This is because in HdfsTable we call "expr.castTo(colType)", but BooleanLiteral (incorrectly) didn't implement "uncheckedCastTo()". This meant that instead of a BooleanLiteral being returned we got back a CastExpr, which cannot be cast to LiteralExpr. As part of this change it turns out Boolean partition columns are also broken in Hive. I filed HIVE-6590 for these issues and we decided to disable INSERT into a boolean partition column for Impala due to this bug. Change-Id: I3e295bb96aadc08d64faf551f6393a7128a7ef27 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1755 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-03-11 18:42:08 -07:00
Alex Behm	9cabee4a71	Wait for the Metastore to come up before starting HiveServer2. Change-Id: Ic8e29efe63f6745e1ff44248657cbd7882bb16d9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1626 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1670 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-02-25 21:05:33 -08:00
Matthew Jacobs	9156cb94ca	Admission controller functional tests The test works by submitting a number of queries (parameterized) with some delay between submissions (parameterized) and the ability to submit to one impalad or many. The queries are set with the WAIT debug action so that we have more control over the state that the admission controller uses to make decisions. Each query is submitted on a separate thread. Depending on the test parameters a varying number of queries will be admitted, queued, and rejected. Once queries are admitted, the query execution blocks and we can cancel the query in order to allow another queued query to be admitted. The test tracks the state of the admission controller using metric counters on each impalad. Change-Id: I455484a7f899032890b22c38592fcea1875f5399 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1413 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins (cherry picked from commit bc2a74d6da622de877422f926ff1892bed867bb1) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1624 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-02-20 14:48:30 -08:00
Nong Li	904ae86e82	IMPALA-626: Allow dropping functions while it is running. Change-Id: Ia9d6fa1daadddbd05961696d13b9ff43fef2da61 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1621 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-02-20 13:12:10 -08:00
Lenni Kuff	7a6892dcbe	Fix race when invalidating catalog metadata and loading a new table There was a race when the catalog was invalidated at the same time a table was being loaded. This is because an uninitialized Table was being returned unexpectedly to the impalad due to the concurrent invalidate. This fixes the problem by updating the CatalogObjectCache to load when a catalog object is uninitialized, rather than load when null. New items can now be added in an initialized or uninitialized state; uninitialized objects are loaded on access. Also adds a stress test for invalidate metadata/invalidate metadata <table>/refresh In addition, it cleans up the locking in the Catalog to make it more straightforward. The top-level catalogLock_ is now only in CatalogServiceCatalog and this lock is used to protect the catalogVersion_. Operations that need to perform an atomic bulk catalog operation can use this lock (such as when the CatalogServer needs to take a snapshot of the catalog to calculate what delta to send to the statestore). Otherwise, the lock is not needed and objects are protected by the synchronization at each level in the object hierarchy (Db->[Function/Table]). That is, Dbs are synchronized by the Db cache, each Db has a Table Cache which is synchronized independently. Change-Id: I9e542cd39cdbef26ddf05499470c0d96bb888765 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1355 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1418	2014-01-31 16:16:32 -08:00
Alex Behm	f4b809dd11	Re-registering resource brokers with Llama if Llama restarts. All in-flight queries will be blocked until re-registration succeeds or until a timeout has been reached. Change-Id: I9c22c9d3a2deff92b227065974109715a1b18595	2014-01-15 15:12:08 -08:00
Alex Behm	c295b5eda8	[CDH5] Fixed JDBC connectivity to Impala and Hive and related Impala tests. Hive now uses the simple SASL transport because its NOSASL transport is broken (HIVE-4232). Impala still uses the NOSASL transport. The changes also include more careful dependency management. Change-Id: I16633dcef912dce20c8de8cf2f43c45a49460d20	2014-01-15 15:11:47 -08:00
Lenni Kuff	51f003a785	IMP-1156: Add CatalogServer API for listing all UDFs and UDAs in a database Adds a new client API for retrieving all user defined functions (aggregate and scalar) in a database. This is a requirement from CM Backup Disaster and Recovery. Change-Id: I4e33d714795fe808370262f36218ea112f67ec30 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1271 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-01-14 00:01:25 -08:00
Henry Robinson	51e58e1f3c	Statestore aesthetic cleanup * Statestore is now one word, without camelcase, everywhere. Previous names included StateStore, state-store and state_store, variously. The only exception is a couple of flags that have 'state_store', and can't be changed for compatibility reasons. * File names are also changed to reflect the standard naming. * Most comments are now 90 chars wide (from 80 before) Change-Id: I83b666c87991537f9b1b80c2f0ea70c2e0c07dcf Reviewed-on: http://gerrit.ent.cloudera.com:8080/1225 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins	2014-01-09 09:56:04 -08:00
Lenni Kuff	6afea60704	Update test logging to print executable SQL statements and log all actions executed This is the first step in cleaning up the test logging. It provides a common connection interface that provides tracing around all operations. When a test fails the output will be executable SQL. It also logs actions such as when a connection is opened, closed, or when an operation is cancelled. Currently only beeswax connections are supported, but I have a separate patch that adds support for executing using HS2 as well as Beeswax. Example of new logging: -- connecting to: localhost:21000 -- executing against localhost:21000 use functional; SET disable_codegen=False; SET abort_on_error=1; SET batch_size=0; SET num_nodes=0; -- executing against localhost:21000 select a.timestamp_col from alltypessmall a inner join alltypessmall b on (a.timestamp_col = b.timestamp_col) where a.year=2009 and a.month=1 and b.year=2009 and b.month=1; -- closing connection to: localhost:21000 Change-Id: Iedc7d4d3a84bfeff6cc1daae6ed1ca97613d7700 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1133 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:54:40 -08:00
Nong Li	b310935424	Minor workload runner logging improvements. Change-Id: I75d27593599e654f7fab1cd104dd9fe9fa88cfdb Reviewed-on: http://gerrit.ent.cloudera.com:8080/1145 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Conflicts: tests/common/workload_runner.py	2014-01-08 10:54:38 -08:00
ishaan	7e520f8f23	Make workload runner logging more concise and readable. This patch makes the workload runner's logging concise and more informative. Specifically, it - logs the time taken for each iteration of a query. - changes the default log level to INFO. - The output is less verbose. Change-Id: I5f964cf76269fd64ce127b9e4c51fe1deafd1d1b Reviewed-on: http://gerrit.ent.cloudera.com:8080/1076 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:54:35 -08:00
Lenni Kuff	9717b7af28	Rename SYNCED_DDL query option to SYNC_DDL Change-Id: I0b5e08694a271c40ac55d8e695cf3a74a012ce06 Reviewed-on: http://gerrit.ent.cloudera.com:8080/972 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:11 -08:00
Lenni Kuff	6bba0c8ffe	Fix bug cleaning up removed Functions and fix test_ddl to create all test dbs When dropping functions, we need to remove the function from the list of Functions with that name AND remove the list from the Function map if the list is empty. The second part wasn't happening. Also fixes the test_ddl to properly create all test databases. Change-Id: Id85af7d5db74a31161f48bea3816bdf734063133 Reviewed-on: http://gerrit.ent.cloudera.com:8080/952 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:00 -08:00
Lenni Kuff	39f77b8b8f	Add support for cluster-synchronized catalog operations This change adds support for cluster-synchronized catalog operations. This provides the guarantee that after a catalog op completes, all other subscribers to the catalog topic have also processed that update. This is useful when load balancing, because a common workflow is to target a different impalad for each statement executed. For example if each of the following were executed sequentially, but targeting a different node: 1) CREATE TABLE Foo 2) INSERT INTO Foo 3) SELECT * FROM Foo 4) INSERT INTO Foo .... Since both the INSERT and the CREATE update the catalog, it would not work as expected without this patch. The user might either get a "table not found" error or would be missing partition information from the INSERT. The downside is that this approach to DDL takes a bit longer because we need to wait until all subscribers have processed an update. If all nodes are healthy, this overhead should not be significantly longer than the current DDL time. However, a single bad node might slow down or completely block the completion of all DDL operations. By default this feature is disabled, but it can be enabled using a new query option: SYNCED_DDL=1 To test this, the base test suite was updated to support selecting a random impalad to execute each query section in a query test file. This is currently only enabled for the insert and DDL tests, but could be leveraged by more tests in the future. TODO: Add additional failure tests around this functionality. TODO: Add an explicit "sync" statement so users do not need to run all their DDL in this mode (since it is slower). Change-Id: I45e757a931bf2a4740cc0cdd1e76ce49a1e22b83 Reviewed-on: http://gerrit.ent.cloudera.com:8080/899 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:58 -08:00
Lenni Kuff	35817f6a17	Support faster DDL operations via the CatalogServer This change adds support for faster DDL via the CatalogServer by directly returning the TCatalogObject from each catalog operation and using this result to update the local impalad's catalog cache directly, rather than waiting for a state store heartbeat that contains the change. Because the Impalad's catalog can now be updated in two ways, it means that we need to be careful when applying updates to ensure no work gets "undone". For example, consider the following sequence of events: t1: [Direct Update] - Add item A - (Catalog Version 9) t2: [Direct Update] - Drop item A - (Catalog Version 10) t3: [StateStore Update] - (From Catalog Version 9) In this case, we need to ensure that the state store update in t3 does not undo the drop in t2, even though that update will contain the change to "add item A". To support this, we now check the catalog versions before adding any item to ensure that an existing item does not overwrite an item with a newer catalog version. To handle the case of removals, a new CatalogUpdateLog is introduced. This log tracks the catalog version each item was removed from the catalog. When adding a new catalog object, it is checked to see if this object was removed in a catalog version > than the version of the current object. If so, the update is ignored. This covers most updates, but there is still one concurrency issue that is not covered with this change. If someone issues an "invalidate metadata" concurrently with a direct catalog operation, it may briefly set the catalog back in time. This seems like okay behavior to me (the command is invalidating the catalog metadata). If we want to address this the CatalogUpdateLog could be extended to track additions to the catalog and we could replay the log after invalidating the metadata (as one possible solution). 
Change-Id: Icc9bdecc3c32436708bf9e9e7974f91d40e514f2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/864 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:58 -08:00
Henry Robinson	9bc840dc85	Support for custom cluster configurations in some tests Test suites that derive from common.CustomClusterTestSuite have a brand new cluster for every test case, which they can configure as they wish with custom arguments using the @with_args() decorator. A future improvement is to optionally only have one cluster per test suite, to allow multiple tests to run more quickly if they share configuration options. Change-Id: I6abd5740e644996d7ca2800edf4ff11b839d1bc4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/882 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:57 -08:00
Henry Robinson	f241782966	IMPALA-620: Fix re-registration starvation bug in statestore This patch fixes a slightly pathological state that occurs when the statestore is under heavy load. The result of the bug is that subscribers cannot successfully re-register because the statestore never marks them as failed. The exact sequence of events is as follows: 1. Subscriber registers with state-store. 2. Statestore does not send heartbeats in timely fashion to subscriber. Subscriber times-out. 3. Subscriber is restarted quickly. Statestore does not detect restart. 4. Subscriber's RegisterSubscriber() call fails, because statestore detects duplicate registration. 5. Subscriber restarts again. Since state-store is slow to send heartbeats, the state-store has not detected the restart and the subscriber receives a heartbeat message from the statestore and does not reject it. 6. Statestore continues to believe subscriber is alive, since the heartbeats are not being rejected. To fix this, we add a registration ID to each successfully registered subscriber that is known to both subscriber and statestore. If the subscriber should restart and re-register, it receives a new registration ID. Whenever a heartbeat arrives, it compares its registration ID to that sent by the statestore with the heartbeat, and rejects the heartbeat if they do not match. We also allow re-registration of existing subscribers (getting rid of the dreaded "Duplicate subscription" message). A new registration overwrites an old one. Change-Id: Ie32df3a586ccb375375ebfbcbec1aaeb930b6bfe Reviewed-on: http://gerrit.ent.cloudera.com:8080/778 Tested-by: jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2014-01-08 10:53:53 -08:00
Lenni Kuff	2336ed99a4	Re-enable process failure tests + add simple failure tests for catalogd This brings back online the process failure tests and adds a basic failure test for the catalog service. The timeouts had to be adjusted to account for the extra time it takes to load the catalog and also there is an additional state store subscriber. Note: the statestore 'live.backends' metric which is used in these tests needs to be renamed, it really means 'live.subscribers'. However, it requires some coordination with other teams to make the change. Also updated start-impala-cluster to check the catalog.ready flag to ensure the impalad catalog is ready to accept queries. Change-Id: If22e25dba7dc83aa40bec937b5f82b815bed4645 Reviewed-on: http://gerrit.ent.cloudera.com:8080/730 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:52 -08:00
Alex Behm	1497002013	Added SHOW TABLE/COLUMN STATS command. Fixed the following stats-related bugs: - Per-partition row count was not distributed properly via CatalogService - HBase column stats were not loaded and distributed properly Enhancements to test framework: - Allow regex specification of expected row or column values - Fixed expected results of some tests because the test framework did not catch that they were incorrect Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51 Reviewed-on: http://gerrit.ent.cloudera.com:8080/813 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:51 -08:00
ishaan	f6f8d9d19d	Fix query_executor to set user specified Impala query options. This is currently broken (query options do not get set via run-workload). If any query options are provided to run-workload, it exits with an error. This patch re-enables setting query options through run-workload and also moves their validation to impala_beeswax. Change-Id: I1df010990f9e57ebd4cf59ada5d9646a883df380 Reviewed-on: http://gerrit.ent.cloudera.com:8080/820 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:53:49 -08:00
Skye Wanderman-Milne	9d05d6d03a	Allow UDF tests to run in parallel. Change-Id: I9512d4a6920c4a71383d9374eb5feb303c3db85d Reviewed-on: http://gerrit.ent.cloudera.com:8080/727 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:47 -08:00
Lenni Kuff	927c486e0c	Imply EOL match character for workload runner query name regex matches This adds the EOL match character '$' to the end of all query names regex string to make the matching behaviour a bit more user friendly. This way if the user inputs "TPCH-Q1" it will not match TPCH-Q11/Q12/etc which is probably what they want. The user can still do a wildcard match using "TPCH-Q1." or "TPCH-Q1.$" Change-Id: Icfb6a111aa464353387e9b631168c44127a7896f Reviewed-on: http://gerrit.ent.cloudera.com:8080/784 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:46 -08:00
Lenni Kuff	8218c19528	Match query names in run-workload using regex This change allows for matching query names in run-workload using a regex strings. For example, the user can now pass run-workload a query name string like: --query_names=tpcds-q.,^tpch. Change-Id: I5b13858ec32cf10769a4c4f2afc49adfeb98ec93 Reviewed-on: http://gerrit.ent.cloudera.com:8080/777 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:46 -08:00
Lenni Kuff	72e211ca4a	Use Hive Metastore Service instead of HiveServer 1 in test infrastructure Change-Id: I4e2ba02b2101bae95d196ab13f9453e1b3a9d7be Reviewed-on: http://gerrit.ent.cloudera.com:8080/689 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:26 -08:00
ishaan	0cb16863ee	run-workload should log a warning to console and not fail if abort_on_query_error is False and the query fails. This change also disables printing the runtime_profile to the console. Change-Id: Ic7bc3406d6eddb67a514ecfb4a27add8c40a8604 Reviewed-on: http://gerrit.ent.cloudera.com:8080/687 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:53:25 -08:00
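
The query-name matching described in commits 8218c19528 and 927c486e0c above can be illustrated with a minimal sketch. This is not the run-workload code: the helper name `match_query_names`, the use of `re.match` (which anchors at the start of the name), and case-sensitive matching are all assumptions made for illustration. Only the comma-separated pattern list and the implied trailing `$` come from the commit messages.

```python
import re


def match_query_names(patterns, query_names):
    """Return the names matched by any pattern in a comma-separated list.

    Hypothetical helper for illustration only; the real run-workload
    implementation may differ.
    """
    # Appending '$' means each pattern must match through the end of the
    # name, so "TPCH-Q1" no longer matches "TPCH-Q11".
    compiled = [re.compile(p + '$') for p in patterns.split(',')]
    # Assumption: re.match is used, so patterns are also anchored at the start.
    return [n for n in query_names if any(c.match(n) for c in compiled)]


names = ['TPCH-Q1', 'TPCH-Q11', 'TPCH-Q12', 'TPCDS-Q3']
print(match_query_names('TPCH-Q1', names))   # exact name only
print(match_query_names('TPCH-Q1.', names))  # explicit wildcard for Q10-Q19
```

With the implied `$`, a plain name selects exactly one query, and a user who wants the broader match has to ask for it with an explicit `.`.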


121 Commits