Currently, cancellation checking when a SortNode is executing only
happens when a batch is being added to the sorter (SortNode::SortInput()) or
when a batch is being retrieved from the sorter (SortNode::GetNext())
This fix passes in a RuntimeState into the Sorter instance itself, which
checks for cancellation at the following points:
i) During an in-memory sort (In Partition() and SortHelper()). In Partition(),
the cancellation check may be delayed if the input is completely sorted.
ii) During an intermediate merge before each batch of rows from a merge is
copied into a run.
Change-Id: I5c28c7244ee2e40627cf14542b99f872e3a8c343
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3007
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3059
The queries in test_cancellation are currently cancelled but not closed, causing some test
queries to eventually time out because the admission controller limits are passed. This
patch ensures that all queries issued in test_cancellation are closed.
Change-Id: I65b26672155e31889bb6f43d3ac87be0f7b4eb72
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2187
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2213
This is the first step in cleaning up the test logging. It provides a common connection
interface that provides tracing around all operations. When a test fails the output will
be executable SQL. It also logs actions such as when a connection is opened, close, or
when an operation is cancelled. Currently only beeswax connections are supported, but
I have a seperate patch that adds support for executing using HS2 as well as Beeswax.
Example of new logging:
-- connecting to: localhost:21000
-- executing against localhost:21000
use functional;
SET disable_codegen=False;
SET abort_on_error=1;
SET batch_size=0;
SET num_nodes=0;
-- executing against localhost:21000
select a.timestamp_col from alltypessmall a inner join alltypessmall b on
(a.timestamp_col = b.timestamp_col)
where a.year=2009 and a.month=1 and b.year=2009 and b.month=1;
-- closing connection to: localhost:21000
Change-Id: Iedc7d4d3a84bfeff6cc1daae6ed1ca97613d7700
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1133
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
A compute stats command computes the table and column stats for a given
table and persists them in the metastore.
The table stats consist of the per-partition and per-table row count.
The column stats are computed on a per-table basis and consist of the
number of distinct values and the number of NULLs per column.
This patch introduces a new 'child query' concept that
compute stats utilizes. Child queries are cancelled
if the parent query is cancelled. A compute stats stmt is
executed by the following query hirarchy:
parent: compute stats query (DDL)
- child: compute table stats query (QUERY)
- child: compute column stats query (QUERY)
The new child query concept is necessary to decouple child query fetches
from parent query fetches, i.e., we could not execute a child query as
part of the original compute stats query, because then a client could
fetch the results we need for updating the Metastore statistics. The
reason why our existing CTAS works without this decoupling
is that its insert 'child query' is not fetchable.
Change-Id: I560533e3cb09bcbbdb3eea7fcf0b460bc6b36dcd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/873
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
This change adds support for faster DDL via the CatalogServer by directly
returning the TCatalogObject from each catalog operation and using this result
to update the local impalad's catalog cache directly, rather than waiting
for a state store heartbeat that contains the change.
Because the Impalad's catalog can now be updated in two ways, it means that
we need to be careful when applying updates to ensure no work gets "undone".
For example, consider the following sequence of events:
t1: [Direct Update] - Add item A - (Catalog Version 9)
t2: [Direct Update] - Drop item A - (Catalog Version 10)
t3: [StateStore Update] - (From Catalog Version 9)
In this case, we need to ensure that the state store update in t3 does not undo the
drop in t2, even though that update will contain the change to "add item A".
To support this, we now check the catalog versions before adding any item to ensure
that an existing item does not overwrite an item with a newer catalog version.
To handle the case of removals, a new CatalogUpdateLog is introduced. This log tracks
the catalog version each item was removed from the catalog. When adding a new
catalog object, it is checked to see if this object was removed in a catalog version >
than the version of the current object. If so, the update is ignored.
This covers most updates, but there is still one concurrency issue that is not covered
with this change. If someone issues an "invalidate metadata" concurrently with a
direct catalog operation, it may briefly set the catalog back in time. This seems like
okay behavior to me (the command is invalidating the catalog metadata). If we want
to address this the CatalogUpdateLog could be extended to track additions to the catalog
and we could replay the log after invalidating the metadata (as one possible solution).
Change-Id: Icc9bdecc3c32436708bf9e9e7974f91d40e514f2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/864
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
This updates the tests to run more test cases in parallel and also removes some
unneeded "invalidate metadata" calls. This cut down the 'serial' execution time
for me by 10+ minutes.
Change-Id: I04b4d6db508a26a1a2e4b972bcf74f4d8b9dde5a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/757
Tested-by: jenkins
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>