Commit Graph

16 Commits

Author SHA1 Message Date
Srinath Shankar
0df773eed6 Check RuntimeState for cancellation in sorter.
Currently, cancellation checking when a SortNode is executing only
happens when a batch is being added to the sorter (SortNode::SortInput()) or
when a batch is being retrieved from the sorter (SortNode::GetNext()).

This fix passes a RuntimeState into the Sorter instance itself, which
checks for cancellation at the following points:
i) During an in-memory sort (in Partition() and SortHelper()). In Partition(),
 the cancellation check may be delayed if the input is completely sorted.
ii) During an intermediate merge before each batch of rows from a merge is
 copied into a run.
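
As a rough illustration of point i), the sketch below shows a quicksort that polls a
cancellation flag inside both its partition step and its recursive helper. It is written
in Python for brevity and is not the Impala Sorter code: the RuntimeState class here is a
hypothetical stand-in that models only the cancellation flag, and Cancelled, partition,
sort_helper and sort_rows are names invented for the example.

```python
import threading


class Cancelled(Exception):
    """Raised when the sort observes a cancellation request."""


class RuntimeState:
    """Hypothetical stand-in for per-query state; only models cancellation."""

    def __init__(self):
        self._cancelled = threading.Event()

    def cancel(self):
        self._cancelled.set()

    def is_cancelled(self):
        return self._cancelled.is_set()


def partition(rows, lo, hi, state):
    # Check once per partition call so a long sort notices cancellation.
    if state.is_cancelled():
        raise Cancelled()
    pivot = rows[hi]
    i = lo
    for j in range(lo, hi):
        if rows[j] < pivot:
            rows[i], rows[j] = rows[j], rows[i]
            i += 1
    rows[i], rows[hi] = rows[hi], rows[i]
    return i


def sort_helper(rows, lo, hi, state):
    while lo < hi:
        if state.is_cancelled():
            raise Cancelled()
        p = partition(rows, lo, hi, state)
        # Recurse into the smaller side, loop on the larger one.
        if p - lo < hi - p:
            sort_helper(rows, lo, p - 1, state)
            lo = p + 1
        else:
            sort_helper(rows, p + 1, hi, state)
            hi = p - 1


def sort_rows(rows, state):
    """Sorts rows in place, raising Cancelled if state is cancelled."""
    sort_helper(rows, 0, len(rows) - 1, state)
    return rows
```

Another thread can call state.cancel() while sort_rows() is running; the sort then stops
at the next check rather than running to completion.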

Change-Id: I5c28c7244ee2e40627cf14542b99f872e3a8c343
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3007
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3059
2014-06-14 17:48:40 -07:00
ishaan
6f416dd2c2 Close all queries in test_cancellation
The queries in test_cancellation are currently cancelled but not closed, causing some test
queries to eventually time out because the admission controller limits are exceeded. This
patch ensures that all queries issued in test_cancellation are closed.
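
The cancel-then-close pattern can be sketched as follows. FakeClient is a hypothetical
in-memory stand-in, not the client used by the Impala tests, and its method names are
assumptions made for the example. The point is only that the close runs in a finally
block, so the handle is released even if the cancel fails.

```python
class FakeClient:
    """Hypothetical query client that tracks which handles are still open."""

    def __init__(self):
        self.open_handles = set()
        self._next_handle = 0

    def execute_async(self, sql):
        self._next_handle += 1
        self.open_handles.add(self._next_handle)
        return self._next_handle

    def cancel(self, handle):
        # Cancelling stops execution but does not release the handle.
        pass

    def close_query(self, handle):
        self.open_handles.discard(handle)


def run_and_cancel(client, sql):
    handle = client.execute_async(sql)
    try:
        client.cancel(handle)
    finally:
        # Always close, so the query stops counting against any limits.
        client.close_query(handle)
```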

Change-Id: I65b26672155e31889bb6f43d3ac87be0f7b4eb72
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2187
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2213
2014-04-13 17:45:51 -07:00
Lenni Kuff
6afea60704 Update test logging to print executable SQL statements and log all actions executed
This is the first step in cleaning up the test logging. It provides a common connection
interface that provides tracing around all operations. When a test fails the output will
be executable SQL. It also logs actions such as when a connection is opened, closed, or
when an operation is cancelled. Currently only Beeswax connections are supported, but
I have a separate patch that adds support for executing using HS2 as well as Beeswax.

Example of new logging:
-- connecting to: localhost:21000
-- executing against localhost:21000
use functional;

SET disable_codegen=False;
SET abort_on_error=1;
SET batch_size=0;
SET num_nodes=0;

-- executing against localhost:21000
select a.timestamp_col from alltypessmall a inner join alltypessmall b on
(a.timestamp_col = b.timestamp_col)
where a.year=2009 and a.month=1 and b.year=2009 and b.month=1;
-- closing connection to: localhost:21000
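
A connection wrapper that produces output in this shape might look like the sketch below.
It is an illustration only: TracingConnection, its constructor arguments, and its method
names are assumptions, not the interface added by this patch, and execute_fn stands in
for whatever actually sends the statement to the server.

```python
class TracingConnection:
    """Wraps a query-executing callable and logs each action as SQL text."""

    def __init__(self, host_port, execute_fn, log=print):
        self.host_port = host_port
        self.options = {}
        self._execute_fn = execute_fn
        self._log = log
        self._log("-- connecting to: %s" % host_port)

    def set_option(self, name, value):
        # Record the option and log it as an executable SET statement.
        self.options[name] = value
        self._log("SET %s=%s;" % (name, value))

    def execute(self, sql):
        self._log("-- executing against %s" % self.host_port)
        # Log the statement with exactly one trailing semicolon.
        self._log(sql.rstrip().rstrip(";") + ";")
        return self._execute_fn(sql)

    def close(self):
        self._log("-- closing connection to: %s" % self.host_port)
```

Because every logged line is either a SQL comment or a statement, the log of a failed
test can be pasted into a shell and replayed.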

Change-Id: Iedc7d4d3a84bfeff6cc1daae6ed1ca97613d7700
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1133
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:54:40 -08:00
Alex Behm
93e5b262c2 Added COMPUTE STATS command for gathering table and column stats.
A compute stats command computes the table and column stats for a given
table and persists them in the metastore.
The table stats consist of the per-partition and per-table row count.
The column stats are computed on a per-table basis and consist of the
number of distinct values and the number of NULLs per column.

This patch introduces a new 'child query' concept that
compute stats utilizes. Child queries are cancelled
if the parent query is cancelled. A compute stats statement is
executed by the following query hierarchy:
parent: compute stats query (DDL)
- child: compute table stats query (QUERY)
- child: compute column stats query (QUERY)

The new child query concept is necessary to decouple child query fetches
from parent query fetches, i.e., we could not execute a child query as
part of the original compute stats query, because then a client could
fetch the results we need for updating the Metastore statistics. The
reason why our existing CTAS works without this decoupling
is that its insert 'child query' is not fetchable.
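
The parent/child relationship can be sketched as below. This is a simplified model, not
the Impala implementation: the Query class and its fields are hypothetical, and it shows
only that cancelling the parent also cancels each child, while a child keeps its own
state and results separate from the parent.

```python
class Query:
    """Hypothetical query object with optional child queries."""

    def __init__(self, name, kind):
        self.name = name
        self.kind = kind          # e.g. "DDL" or "QUERY"
        self.cancelled = False
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

    def cancel(self):
        # Cancelling a parent propagates to all of its children.
        self.cancelled = True
        for child in self.children:
            child.cancel()


def build_compute_stats():
    parent = Query("compute stats", "DDL")
    parent.add_child(Query("compute table stats", "QUERY"))
    parent.add_child(Query("compute column stats", "QUERY"))
    return parent
```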

Change-Id: I560533e3cb09bcbbdb3eea7fcf0b460bc6b36dcd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/873
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:14 -08:00
Lenni Kuff
35817f6a17 Support faster DDL operations via the CatalogServer
This change adds support for faster DDL via the CatalogServer by directly
returning the TCatalogObject from each catalog operation and using this result
to update the local impalad's catalog cache directly, rather than waiting
for a state store heartbeat that contains the change.
Because the impalad's catalog can now be updated in two ways, we need to be
careful when applying updates to ensure no work gets "undone".

For example, consider the following sequence of events:
t1: [Direct Update] - Add item A - (Catalog Version 9)
t2: [Direct Update] - Drop item A - (Catalog Version 10)
t3: [StateStore Update] - (From Catalog Version 9)

In this case, we need to ensure that the state store update in t3 does not undo the
drop in t2, even though that update will contain the change to "add item A".

To support this, we now check the catalog versions before adding any item to ensure
that an existing item does not overwrite an item with a newer catalog version.
To handle the case of removals, a new CatalogUpdateLog is introduced. This log tracks
the catalog version at which each item was removed from the catalog. When adding a new
catalog object, we check whether the object was removed in a catalog version greater
than the version of the object being added. If so, the update is ignored.
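
The version checks described above can be sketched as follows. This is a minimal model
under stated assumptions, not the CatalogServer code: CatalogCache and its methods are
hypothetical names, the removal log is reduced to a dictionary from item name to the
version at which it was removed, and an add carrying the same version as the existing
item is assumed to be allowed.

```python
class CatalogCache:
    """Hypothetical local catalog cache with a removal log."""

    def __init__(self):
        self._objects = {}   # name -> (catalog version, payload)
        self._removed = {}   # name -> catalog version at which it was removed

    def contains(self, name):
        return name in self._objects

    def add(self, name, version, payload=None):
        existing = self._objects.get(name)
        # Do not overwrite an item that has a newer catalog version.
        if existing is not None and existing[0] > version:
            return False
        # Ignore the update if the item was removed at a later version.
        if self._removed.get(name, -1) > version:
            return False
        self._objects[name] = (version, payload)
        return True

    def remove(self, name, version):
        existing = self._objects.get(name)
        if existing is not None and existing[0] > version:
            return False
        self._objects.pop(name, None)
        self._removed[name] = max(version, self._removed.get(name, -1))
        return True
```

Replaying the t1 to t3 sequence against this model: add("A", 9) succeeds, remove("A", 10)
drops the item and records version 10, and the later state store update add("A", 9) is
ignored because 10 is greater than 9.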

This covers most updates, but there is still one concurrency issue that is not covered
with this change. If someone issues an "invalidate metadata" concurrently with a
direct catalog operation, it may briefly set the catalog back in time. This seems like
okay behavior to me (the command is invalidating the catalog metadata). If we want
to address this, the CatalogUpdateLog could be extended to track additions to the catalog
and we could replay the log after invalidating the metadata (as one possible solution).

Change-Id: Icc9bdecc3c32436708bf9e9e7974f91d40e514f2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/864
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:58 -08:00
Lenni Kuff
d698881f71 Improve test run throughput by executing more tests in parallel
This updates the tests to run more test cases in parallel and also removes some
unneeded "invalidate metadata" calls. This cut down the 'serial' execution time
for me by 10+ minutes.

Change-Id: I04b4d6db508a26a1a2e4b972bcf74f4d8b9dde5a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/757
Tested-by: jenkins
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:46 -08:00
Nong Li
78539ee531 Allow insert cancellation test to fail due to IMPALA-551
Change-Id: I5d98be1cc503cc51206051a7c6a493bf884ab5b3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/594
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:10 -08:00
Nong Li
b5ca38cf7e IMPALA-551: Fix file cleanup in hdfs-table-sink on early error cases.
Change-Id: I50324381e5ddbf8d80dd72d27b16b5255bf5f985
Reviewed-on: http://gerrit.ent.cloudera.com:8080/492
Tested-by: jenkins
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-01-08 10:52:57 -08:00
ishaan
53cd9eadab Treat HBase as a file format for functional tests
Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922
Reviewed-on: http://gerrit.ent.cloudera.com:8080/102
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:36 -08:00
Lenni Kuff
73a9c5f024 Disable open file verification for INSERT cancel tests due to IMPALA-551
Change-Id: Ib11e4ac7e5161faa9c45872afafd88105e32433b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/326
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:28 -08:00
Lenni Kuff
04af2828c3 Add insert cancellation test, dynamically build failpoint dimensions from explain output
Change-Id: I43fc5860edef6d47698c6b3edbb623dcc7fd37c9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:27 -08:00
Skye Wanderman-Milne
f87878f111 IMPALA-412: Impala might hang when an impalad dies during query execution
Allows queries to be cancelled while fetching rows and fixes some test bugs.
2014-01-08 10:51:51 -08:00
Nong Li
f60f2d3e50 Implement support for grouped scan ranges in io mgr and integration with parquet.
2014-01-08 10:49:18 -08:00
Nong Li
0df9476be1 Parquet data loading.
2014-01-08 10:48:48 -08:00
ishaan
09d6d931f4 Change the way data is loaded
2014-01-08 10:48:09 -08:00
Lenni Kuff
5869611fb8 Add targeted cancellation tests
2014-01-08 10:47:07 -08:00