- Added execution summary to the beeswax client and QueryResult
- Modified report-benchmark-results to handle JSON and perform
execution summary comparison between runs
- Added comments to the new workload runner
Change-Id: I9c3c5f2fdc5d8d1e70022c4077334bc44e3a2d1d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3598
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
(cherry picked from commit fd0b1406be2511c202e02fa63af94fbbe5e18eee)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3618
This patch introduces new abstractions and changes the way queries are run via the
workload runner. A new class 'Workload' is introduced, which represents the notion of a
workload in the performance framework (i.e., a set of query names mapped to query
strings).
The new workflow is:
- run-workload acts as a driver. It accepts user parameters for which queries to
run and their execution strategy. It generates workload objects and passes them to the
workload-runner.
- The workload runner takes a workload, its execution parameters and generates a set of
test vectors over which the workload is run iteratively.
- A workload is executed by initializing a QueryExecutor for each query being run in a
test vector. The workload executor is then responsible for execution and gathering
results.
- The execution details of every query being executed are stored and returned to the
driver (run-workload).
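As a rough illustration of the workflow above (not the classes added by this patch), a
workload can be modeled as a mapping of query names to query strings that is run once
per test vector. The names `Workload`, `run_workload` and the `executor` callable are
hypothetical stand-ins:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Workload:
    """Hypothetical stand-in: a named set of query names mapped to query strings."""
    name: str
    queries: Dict[str, str] = field(default_factory=dict)


def run_workload(workload, test_vectors, executor):
    """Run every query in the workload once per test vector and collect the details."""
    results = []
    for vector in test_vectors:
        for query_name, query_string in workload.queries.items():
            # The executor owns execution; in this sketch it is just a callable.
            outcome = executor(query_string, vector)
            results.append({"query": query_name, "vector": vector, "result": outcome})
    # The collected execution details go back to the driver.
    return results
```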
Change-Id: Ia16360140d65e6733e534e823bc5d5614622ab5f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3616
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
We were setting the state to exception on Cancel() all the time.
We use the cancellation path as the normal cleanup path so this
gets called even when the query went fine (e.g. UnregisterQuery
calls Cancel()). We had already plumbed through a 'cause' argument
to differentiate.
Change-Id: Icf1091c165dec36d3dad7ce308367bbbc9edee4f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3524
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3575
The compute stats statement was not quoting the DB and table names. If those names
collided with keywords, the compute stats statement would fail with a syntax
error.
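A minimal sketch of the quoting idea, assuming backtick-quoted identifiers. The helper
names and the row-count query are illustrative only; they are not the statements that
compute stats actually issues:

```python
def quote_identifier(name):
    """Wrap an identifier in backticks so a keyword-like name parses as an identifier."""
    if "`" in name:
        # Escaping embedded backticks is out of scope for this sketch.
        raise ValueError("identifier must not contain a backtick: %r" % name)
    return "`%s`" % name


def build_row_count_query(db_name, table_name):
    """Build a row-count query with both the database and table names quoted."""
    return "SELECT COUNT(*) FROM %s.%s" % (
        quote_identifier(db_name), quote_identifier(table_name))
```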
Change-Id: Ie08421246bb54a63a44eaf19d0d835da780b7033
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3170
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3198
This patch also adds a mechanism to return analysis warnings to
client, which is used to log skipped decimal columns.
Change-Id: I30c246044a68ec8861cd5bed072bd54e65a079e6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2822
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit fc77422acef7e6f93fdeb5448309414b905f0725)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2984
- Use a smaller table so hive runs faster
- Don't invalidate the catalog, just the view created in hive
- This lets us run it in parallel
Change-Id: I8085d8967dc96cbbb20e2d719072b29fe591cd98
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2958
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
There are Parquet files whose metadata indicates the wrong number of rows.
Until now, the parquet-scanner did not report any problem in this case.
Instead, it kept reading as long as there were values for the read columns.
But with IMPALA-1016 we now read at most as many rows as the metadata specifies.
With this patch, the parquet-scanner, right before it finishes scanning, checks whether
it read the expected number of rows (taken from metadata). In cases where the actual
number of rows read is less than or greater than the expected number, it either aborts
or logs an error.
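The end-of-scan check can be sketched as follows. This is Python pseudocode for the
idea, not the scanner code itself; the `abort_on_error` flag deciding between aborting
and logging is an assumption, as are all of the names:

```python
class RowCountMismatch(Exception):
    """Raised when the scan aborts because of a row count mismatch."""


def check_row_count(rows_read, rows_in_metadata, abort_on_error, error_log):
    """Compare the rows actually read with the count recorded in the file metadata."""
    if rows_read == rows_in_metadata:
        return True
    msg = "metadata states %d rows but read %d rows" % (rows_in_metadata, rows_read)
    if abort_on_error:
        # Assumed behaviour: abort the scan when errors are fatal.
        raise RowCountMismatch(msg)
    # Otherwise record the problem and let the scan finish.
    error_log.append(msg)
    return False
```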
Change-Id: Ie6a66a38e8912730bf04762e6526ec1cadb2bcdc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2755
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2944
For example, you can now do something like:
result_set = execute("select * from tbl")
result_row = result_set[0]
result_row['col_alias'] or result_row[4]
to access column values. If the column alias/position does not exist an exception is
thrown.
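One way such a row object could be built is sketched below. The class name
`ResultRow` and its constructor are hypothetical; only the lookup behaviour follows
the description above:

```python
class ResultRow(object):
    """Hypothetical row wrapper that allows lookup by column alias or by position."""

    def __init__(self, column_labels, values):
        self._labels = list(column_labels)
        self._values = list(values)

    def __getitem__(self, key):
        if isinstance(key, int):
            # Positional access: reject positions outside the row.
            if not 0 <= key < len(self._values):
                raise IndexError("no column at position %d" % key)
            return self._values[key]
        # Alias access: reject aliases that are not part of the result set.
        if key not in self._labels:
            raise KeyError("no column with alias %r" % key)
        return self._values[self._labels.index(key)]
```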
Change-Id: Ie4b65619ed17fd90bf39e0966a7fc7e1180dbc5c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2719
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2922
Currently, we coalesce the results and do not properly catch a failure if one of the
threads has a failed query and exit_on_error is set to True. This patch ensures that we
exit before the next query is run.
Change-Id: Ie650e0f547874386c79c78982ea9916f33e18cda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2654
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This patch allows the text scanner to read 'inf' or 'Infinity' from a
row and correctly translate it into floating-point infinity. It also
adds is_inf() and is_nan() builtins.
Finally, we change the text table writer to write Infinity and NaN for
compatibility with Hive.
In the future, we might consider adding nan / inf literals to our
grammar (postgres has this, see:
http://www.postgresql.org/docs/9.3/static/datatype-numeric.html).
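The read and write sides can be sketched in Python as below. This is not the scanner
or writer code; the function names are made up, and the accepted spellings beyond
'inf' and 'Infinity' (signs, case-insensitivity, 'nan') as well as the '-Infinity'
output form are assumptions:

```python
import math


def parse_float_field(text):
    """Parse a text field, mapping inf/Infinity/NaN spellings to float specials."""
    token = text.strip()
    unsigned = token.lstrip("+-").lower()
    if unsigned in ("inf", "infinity"):
        return -math.inf if token.startswith("-") else math.inf
    if unsigned == "nan":
        return math.nan
    return float(token)


def format_float_field(value):
    """Write special values as 'Infinity' and 'NaN'."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        # The negative form is an assumption; the message only names Infinity and NaN.
        return "Infinity" if value > 0 else "-Infinity"
    return repr(value)
```

Here math.isinf() and math.isnan() play the role of the new is_inf() and is_nan()
builtins.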
Change-Id: I796f2852b3c6c3b72e9aae9dd5ad228d188a6ea3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2393
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 58091355142cadd2b74874d9aa7c8ab6bf3efe2f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2483
Some tests have constraints that were there only to help reduce runtime, which
reduces coverage when running in exhaustive mode. The majority of the constraints
exist because running the test across additional dimensions adds no value (or
is invalid with those dimensions). Updates the tests that have
legitimate constraints to use two new helper methods for constraining the table format
dimension:
create_uncompressed_text_dimension()
create_parquet_dimension()
These will create a dimension that will produce a single test vector, either
uncompressed text or parquet respectively.
Change-Id: Id85387c1efd5d192f8059ef89934933389bfe247
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2149
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e02acbd469bc48c684b2089405b4a20552802481)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2290
This re-enables a subset of the stable data errors tests and updates them to
work in our test framework. This includes support for updating results via --update_results.
This also lets us remove a lot of old code that was there only to support these disabled
tests.
Change-Id: I4c40c3976d00dfc710d59f3f96c99c1ed33e7e9b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1952
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2277
The bug: Coordinator::Wait() is supposed to block until rows become available for
consumption by the client. We rely on Wait() to determine when to advance the query
status to a 'ready' state and signal to the client that rows can be fetched.
Long fetch times can trigger client timeouts at various levels (socket, app, etc.).
Coordinator::Wait() simply opens the coordinator fragment's plan tree.
For most plan nodes, Open() does work to prepare the plan tree so that GetNext()
returns quickly. However, for ExchangeNodes, Open() did not wait
until rows were obtained from the underlying stream receiver.
The fix: Make ExchangeNode::Open() block until rows are available.
Change-Id: I7b197eea11d21fd732414d96c899a17b2d99631c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2128
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2185
The problem was that we were setting a flag marking the last_query_handle as closed, but
were not resetting the flag before the next query. This caused the first query to
be closed properly, but subsequent queries would not be closed. The fix is to change
where the flag is reset to the same place as where we assign last_query_handle.
Added a test case.
Change-Id: I870a96789489bfe4f388910b808409cd0584af8a
(cherry picked from commit 1439151af5b63112b0dd631fac9c7ab4d43bba37)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1976
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
This fixes how we validate delimiters to be in line with Hive. A delimiter must
fit in a single byte and can be specified in the following formats, as far as I can
tell (there isn't documentation):
- A single ASCII or unicode character (ex. '|')
- An escape character in octal format (ex. \001. Stored in the metastore as a
unicode character: \u0001).
- A signed decimal integer in the range [-128:127]. Used to support delimiters
for ASCII character values between 128-255 (-2 maps to ASCII 254).
Previously, we were not handling the "signed integer" case so there was no way
to specify a delimiter in the "extended" ASCII range of 128-255.
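The "signed integer" case amounts to reinterpreting a signed byte as an unsigned one.
A small sketch of just that mapping (the function name is hypothetical, and the other
delimiter formats are not handled here):

```python
def signed_delimiter_to_byte(text):
    """Map a signed decimal delimiter in [-128, 127] to its unsigned byte value."""
    value = int(text)
    if not -128 <= value <= 127:
        raise ValueError("delimiter must fit in a single byte: %r" % text)
    # Two's complement reinterpretation: negative values land in 128-255.
    return value & 0xFF
```

For example, signed_delimiter_to_byte("-2") gives 254, matching the mapping described
above.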
To support result validation, the test infrastructure had to be updated to support
reading/writing different character encodings.
Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888
This is because in HdfsTable we call "expr.castTo(colType)", but BooleanLiteral
(incorrectly) didn't implement "uncheckedCastTo()". This meant that instead of a
BooleanLiteral being returned we got back a CastExpr, which cannot be cast to LiteralExpr.
While making this change, it turned out that Boolean partition columns are also broken in Hive. I
filed HIVE-6590 for these issues and we decided to disable INSERT into a boolean partition
column for Impala due to this bug.
Change-Id: I3e295bb96aadc08d64faf551f6393a7128a7ef27
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1755
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
The test works by submitting a number of queries (parameterized) with
some delay between submissions (parameterized) and the ability to
submit to one impalad or many. The queries are set with the WAIT debug
action so that we have more control over the state that the admission
controller uses to make decisions. Each query is submitted on a
separate thread. Depending on the test parameters a varying number of
queries will be admitted, queued, and rejected. Once queries are
admitted, the query execution blocks and we can cancel the query in
order to allow another queued query to be admitted. The test tracks
the state of the admission controller using metric counters on each
impalad.
Change-Id: I455484a7f899032890b22c38592fcea1875f5399
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1413
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
(cherry picked from commit bc2a74d6da622de877422f926ff1892bed867bb1)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1624
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
There was a race when the catalog was invalidated at the same time a table
was being loaded. This is because an uninitialized Table was being returned
unexpectedly to the impalad due to the concurrent invalidate.
This fixes the problem by updating the CatalogObjectCache to load when
a catalog object is uninitialized, rather than load when null. New items can
now be added in an initialized or uninitialized state; uninitialized objects
are loaded on access.
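The load-on-access behaviour can be sketched as follows. This is a simplified Python
model, not the Java CatalogObjectCache; the class and method names are invented, and
holding one lock across the load is a simplification:

```python
import threading


class CacheEntry(object):
    """An entry that exists in the cache but may not have been loaded yet."""

    def __init__(self, value=None, initialized=False):
        self.value = value
        self.initialized = initialized


class ObjectCache(object):
    def __init__(self, loader):
        self._loader = loader  # callable: name -> fully loaded object
        self._entries = {}
        self._lock = threading.Lock()

    def add_initialized(self, name, value):
        with self._lock:
            self._entries[name] = CacheEntry(value, initialized=True)

    def add_uninitialized(self, name):
        # Also models an invalidate: the name stays known, the contents do not.
        with self._lock:
            self._entries[name] = CacheEntry()

    def get(self, name):
        with self._lock:
            entry = self._entries.get(name)
            if entry is None:
                return None
            if not entry.initialized:
                # Load because the entry is uninitialized, not because it is missing.
                entry.value = self._loader(name)
                entry.initialized = True
            return entry.value
```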
Also adds a stress test for invalidate metadata/invalidate metadata <table>/refresh.
In addition, it cleans up the locking in the Catalog to make it more
straightforward. The top-level catalogLock_ is now only in CatalogServiceCatalog
and this lock is used to protect the catalogVersion_. Operations that need to
perform an atomic bulk catalog operation can use this lock (such as when the
CatalogServer needs to take a snapshot of the catalog to calculate what delta to send
to the statestore). Otherwise, the lock is not needed and objects are protected by the
synchronization at each level in the object hierarchy (Db->[Function/Table]). That is,
Dbs are synchronized by the Db cache, each Db has a Table Cache which is synchronized
independently.
Change-Id: I9e542cd39cdbef26ddf05499470c0d96bb888765
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1355
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1418
All in-flight queries will be blocked until re-registration succeeds
or until a timeout has been reached.
Change-Id: I9c22c9d3a2deff92b227065974109715a1b18595
Adds a new client API for retrieving all user defined functions (aggregate and scalar)
in a database. This is a requirement from CM Backup Disaster and Recovery.
Change-Id: I4e33d714795fe808370262f36218ea112f67ec30
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1271
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
* Statestore is now one word, without camelcase, everywhere. Previous
names included StateStore, state-store and state_store,
variously. The only exception is a couple of flags that have
'state_store', and can't be changed for compatibility reasons.
* File names are also changed to reflect the standard naming.
* Most comments are now 90 chars wide (from 80 before)
Change-Id: I83b666c87991537f9b1b80c2f0ea70c2e0c07dcf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1225
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
This is the first step in cleaning up the test logging. It provides a common connection
interface that provides tracing around all operations. When a test fails the output will
be executable SQL. It also logs actions such as when a connection is opened, closed, or
when an operation is cancelled. Currently only beeswax connections are supported, but
I have a separate patch that adds support for executing using HS2 as well as Beeswax.
Example of new logging:
-- connecting to: localhost:21000
-- executing against localhost:21000
use functional;
SET disable_codegen=False;
SET abort_on_error=1;
SET batch_size=0;
SET num_nodes=0;
-- executing against localhost:21000
select a.timestamp_col from alltypessmall a inner join alltypessmall b on
(a.timestamp_col = b.timestamp_col)
where a.year=2009 and a.month=1 and b.year=2009 and b.month=1;
-- closing connection to: localhost:21000
Change-Id: Iedc7d4d3a84bfeff6cc1daae6ed1ca97613d7700
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1133
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
This patch makes the workload runner's logging concise and more informative. Specifically,
it
- logs the time taken for each iteration of a query.
- changes the default log level to INFO.
- makes the output less verbose.
Change-Id: I5f964cf76269fd64ce127b9e4c51fe1deafd1d1b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1076
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
When dropping functions, we need to remove the function from the list
of Functions with that name AND remove the list from the Function map if
the list is empty. The second part wasn't happening.
Also fixes the test_ddl to properly create all test databases.
Change-Id: Id85af7d5db74a31161f48bea3816bdf734063133
Reviewed-on: http://gerrit.ent.cloudera.com:8080/952
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
This change adds support for cluster-synchronized catalog operations. This provides the
guarantee that after a catalog op completes, all other subscribers to the catalog topic have
also processed that update. This is useful when load balancing, because a common workflow
is to target a different impalad for each statement executed.
For example if each of the following were executed sequentially, but targeting
a different node:
1) CREATE TABLE Foo
2) INSERT INTO Foo
3) SELECT * FROM Foo
4) INSERT INTO Foo ....
Since both the INSERT and the CREATE update the catalog, it would not work as expected
without this patch. The user might either get a "table not found" error or would be
missing partition information from the INSERT.
The downside is that this approach to DDL takes a bit longer because we need to wait
until all subscribers have processed an update. If all nodes are healthy, this overhead
should not be significantly longer than the current DDL time. However, a single bad node
might slow down or completely block the completion of all DDL operations. By default
this feature is disabled, but it can be enabled using a new query option: SYNCED_DDL=1
To test this, the base test suite was updated to support selecting a random impalad
to execute each query section in a query test file. This is currently only enabled
for the insert and DDL tests, but could be leveraged by more tests in the future.
TODO: Add additional failure tests around this functionality.
TODO: Add an explicit "sync" statement so users do not need to run all their DDL
in this mode (since it is slower).
Change-Id: I45e757a931bf2a4740cc0cdd1e76ce49a1e22b83
Reviewed-on: http://gerrit.ent.cloudera.com:8080/899
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This change adds support for faster DDL via the CatalogServer by directly
returning the TCatalogObject from each catalog operation and using this result
to update the local impalad's catalog cache directly, rather than waiting
for a state store heartbeat that contains the change.
Because the Impalad's catalog can now be updated in two ways, it means that
we need to be careful when applying updates to ensure no work gets "undone".
For example, consider the following sequence of events:
t1: [Direct Update] - Add item A - (Catalog Version 9)
t2: [Direct Update] - Drop item A - (Catalog Version 10)
t3: [StateStore Update] - (From Catalog Version 9)
In this case, we need to ensure that the state store update in t3 does not undo the
drop in t2, even though that update will contain the change to "add item A".
To support this, we now check the catalog versions before adding any item to ensure
that an older item does not overwrite an item with a newer catalog version.
To handle the case of removals, a new CatalogUpdateLog is introduced. This log tracks
the catalog version at which each item was removed from the catalog. When adding a new
catalog object, we check whether this object was removed in a catalog version greater
than the version of the object being added. If so, the update is ignored.
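The version checks can be sketched with the t1-t3 sequence from above. This is a
simplified Python model, not the actual catalog code; `VersionedCatalogCache` and its
`removed_at` dict (standing in for the CatalogUpdateLog) are hypothetical:

```python
class VersionedCatalogCache(object):
    def __init__(self):
        self.items = {}       # item name -> catalog version of the cached copy
        self.removed_at = {}  # item name -> catalog version at which it was removed

    def add(self, name, version):
        """Apply an 'add' update unless it would undo newer work. Returns True if applied."""
        removed_version = self.removed_at.get(name)
        if removed_version is not None and removed_version > version:
            return False  # the item was dropped after this update was produced
        existing = self.items.get(name)
        if existing is not None and existing > version:
            return False  # a newer copy of the item is already cached
        self.items[name] = version
        return True

    def remove(self, name, version):
        """Record the removal version and drop any older cached copy."""
        self.removed_at[name] = max(version, self.removed_at.get(name, version))
        if name in self.items and self.items[name] < version:
            del self.items[name]


cache = VersionedCatalogCache()
cache.add("A", 9)      # t1: direct update adds item A
cache.remove("A", 10)  # t2: direct update drops item A
cache.add("A", 9)      # t3: stale statestore update is ignored
```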
This covers most updates, but there is still one concurrency issue that is not covered
with this change. If someone issues an "invalidate metadata" concurrently with a
direct catalog operation, it may briefly set the catalog back in time. This seems like
okay behavior to me (the command is invalidating the catalog metadata). If we want
to address this the CatalogUpdateLog could be extended to track additions to the catalog
and we could replay the log after invalidating the metadata (as one possible solution).
Change-Id: Icc9bdecc3c32436708bf9e9e7974f91d40e514f2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/864
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Test suites that derive from common.CustomClusterTestSuite have a brand
new cluster for every test case, which they can configure as they wish
with custom arguments using the @with_args() decorator.
A future improvement is to optionally only have one cluster per test
suite, to allow multiple tests to run more quickly if they share
configuration options.
Change-Id: I6abd5740e644996d7ca2800edf4ff11b839d1bc4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/882
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
This patch fixes a slightly pathological state that occurs when the
statestore is under heavy load. The result of the bug is that
subscribers cannot successfully re-register because the statestore never
marks them as failed.
The exact sequence of events is as follows:
1. Subscriber registers with statestore.
2. Statestore does not send heartbeats in a timely fashion to
subscriber. Subscriber times-out.
3. Subscriber is restarted quickly. Statestore does not detect
restart.
4. Subscriber's RegisterSubscriber() call fails, because statestore
detects duplicate registration.
5. Subscriber restarts again. Since the statestore is slow to send
heartbeats, the statestore has not detected the restart and the
subscriber receives a heartbeat message from the statestore and
does not reject it.
6. Statestore continues to believe subscriber is alive, since the
heartbeats are not being rejected.
To fix this, we add a registration ID to each successfully registered
subscriber that is known to both subscriber and statestore. If the
subscriber should restart and re-register, it receives a new
registration ID. Whenever a heartbeat arrives, it compares its
registration ID to that sent by the statestore with the heartbeat, and
rejects the heartbeat if they do not match.
We also allow re-registration of existing subscribers (getting rid of
the dreaded "Duplicate subscription" message). A new registration
overwrites an old one.
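The registration ID handshake can be sketched like this. It is a toy Python model of
the protocol described above, not the Thrift-based implementation; the class names,
method names and the use of uuid for IDs are assumptions:

```python
import uuid


class Statestore(object):
    def __init__(self):
        self._registrations = {}  # subscriber id -> current registration id

    def register(self, subscriber_id):
        """(Re-)register a subscriber; a new registration overwrites an old one."""
        registration_id = uuid.uuid4().hex
        self._registrations[subscriber_id] = registration_id
        return registration_id

    def heartbeat_for(self, subscriber_id):
        """Return the registration id that accompanies a heartbeat."""
        return self._registrations[subscriber_id]


class Subscriber(object):
    def __init__(self, subscriber_id):
        self.subscriber_id = subscriber_id
        self.registration_id = None

    def register_with(self, statestore):
        self.registration_id = statestore.register(self.subscriber_id)

    def accept_heartbeat(self, registration_id):
        """Reject heartbeats that belong to a previous registration."""
        return registration_id == self.registration_id
```

A restarted subscriber holds no registration ID until it re-registers, so a heartbeat
carrying the old ID is rejected and the statestore can mark the old registration as
failed.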
Change-Id: Ie32df3a586ccb375375ebfbcbec1aaeb930b6bfe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/778
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This brings back online the process failure tests and adds a basic failure
test for the catalog service. The timeouts had to be adjusted to account for the
extra time it takes to load the catalog and also there is an additional state
store subscriber. Note: the statestore 'live.backends' metric which is used in these
tests needs to be renamed; it really means 'live.subscribers'. However, it requires some
coordination with other teams to make the change.
Also updated start-impala-cluster to check the catalog.ready flag to ensure the impalad
catalog is ready to accept queries.
Change-Id: If22e25dba7dc83aa40bec937b5f82b815bed4645
Reviewed-on: http://gerrit.ent.cloudera.com:8080/730
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Fixed the following stats-related bugs:
- Per-partition row count was not distributed properly via CatalogService
- HBase column stats were not loaded and distributed properly
Enhancements to test framework:
- Allow regex specification of expected row or column values
- Fixed expected results of some tests because the test framework
did not catch that they were incorrect
Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/813
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
This is currently broken (query options do not get set via run-workload). If any
query options are provided to run-workload, it exits with an error. This patch
re-enables setting query options through run-workload and also moves their validation to
impala_beeswax.
Change-Id: I1df010990f9e57ebd4cf59ada5d9646a883df380
Reviewed-on: http://gerrit.ent.cloudera.com:8080/820
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
This adds the EOL match character '$' to the end of every query name
regex string to make the matching behaviour a bit more user friendly. This way,
if the user inputs "TPCH-Q1", it will not match TPCH-Q11/Q12/etc.,
which is probably what they want. The user can still do a wildcard
match using "TPCH-Q1.*" or "TPCH-Q1.*$"
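The effect of the appended '$' can be checked with Python's re module. The helper
name is hypothetical, and the use of re.match (which anchors at the start of the name)
is an assumption about how run-workload applies the pattern:

```python
import re


def matches_query_name(pattern, query_name):
    """Match a user-supplied pattern against a query name, anchored at the end."""
    # Appending '$' forces the pattern to match through the end of the name.
    return re.match(pattern + "$", query_name) is not None


print(matches_query_name("TPCH-Q1", "TPCH-Q1"))     # exact name
print(matches_query_name("TPCH-Q1", "TPCH-Q11"))    # no longer a prefix match
print(matches_query_name("TPCH-Q1.*", "TPCH-Q11"))  # explicit wildcard
```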
Change-Id: Icfb6a111aa464353387e9b631168c44127a7896f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/784
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
This change allows for matching query names in run-workload using
regex strings. For example, the user can now pass
run-workload a query name string like: --query_names=tpcds-q.*,^tpch.*
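A minimal sketch of how such an argument could be applied, assuming the value is split
on commas and each piece is matched with re.match. The function name and the
lower-case query names are made up for the example:

```python
import re


def select_queries(query_names_arg, available_names):
    """Return the available query names that match any comma-separated pattern."""
    patterns = [p.strip() for p in query_names_arg.split(",") if p.strip()]
    return [name for name in available_names
            if any(re.match(pattern, name) for pattern in patterns)]


selected = select_queries("tpcds-q.*,^tpch.*",
                          ["tpcds-q1", "tpch-q3", "functional-q1"])
```

Splitting on commas keeps the flag simple, at the cost that a pattern cannot itself
contain a comma (for example a quantifier such as {1,2}).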
Change-Id: I5b13858ec32cf10769a4c4f2afc49adfeb98ec93
Reviewed-on: http://gerrit.ent.cloudera.com:8080/777
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>