impala

mirror of https://github.com/apache/impala.git synced 2026-01-04 09:00:56 -05:00

Author	SHA1	Message	Date
Srinath Shankar	0df773eed6	Check RuntimeState for cancellation in sorter. Currently, cancellation checking when a SortNode is executing only happens when a batch is being added to the sorter (SortNode::SortInput()) or when a batch is being retrieved from the sorter (SortNode::GetNext()). This fix passes a RuntimeState into the Sorter instance itself, which checks for cancellation at the following points: i) During an in-memory sort (in Partition() and SortHelper()). In Partition(), the cancellation check may be delayed if the input is completely sorted. ii) During an intermediate merge, before each batch of rows from a merge is copied into a run. Change-Id: I5c28c7244ee2e40627cf14542b99f872e3a8c343 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3007 Reviewed-by: Srinath Shankar <sshankar@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3059	2014-06-14 17:48:40 -07:00
ishaan	6f416dd2c2	Close all queries in test_cancellation The queries in test_cancellation are currently cancelled but not closed, causing some test queries to eventually time out because the admission controller limits are passed. This patch ensures that all queries issued in test_cancellation are closed. Change-Id: I65b26672155e31889bb6f43d3ac87be0f7b4eb72 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2187 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2213	2014-04-13 17:45:51 -07:00
Lenni Kuff	6afea60704	Update test logging to print executable SQL statements and log all actions executed This is the first step in cleaning up the test logging. It provides a common connection interface that provides tracing around all operations. When a test fails the output will be executable SQL. It also logs actions such as when a connection is opened, closed, or when an operation is cancelled. Currently only beeswax connections are supported, but I have a separate patch that adds support for executing using HS2 as well as Beeswax. Example of new logging: -- connecting to: localhost:21000 -- executing against localhost:21000 use functional; SET disable_codegen=False; SET abort_on_error=1; SET batch_size=0; SET num_nodes=0; -- executing against localhost:21000 select a.timestamp_col from alltypessmall a inner join alltypessmall b on (a.timestamp_col = b.timestamp_col) where a.year=2009 and a.month=1 and b.year=2009 and b.month=1; -- closing connection to: localhost:21000 Change-Id: Iedc7d4d3a84bfeff6cc1daae6ed1ca97613d7700 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1133 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:54:40 -08:00
Alex Behm	93e5b262c2	Added COMPUTE STATS command for gathering table and column stats. A compute stats command computes the table and column stats for a given table and persists them in the metastore. The table stats consist of the per-partition and per-table row count. The column stats are computed on a per-table basis and consist of the number of distinct values and the number of NULLs per column. This patch introduces a new 'child query' concept that compute stats utilizes. Child queries are cancelled if the parent query is cancelled. A compute stats stmt is executed by the following query hierarchy: parent: compute stats query (DDL) - child: compute table stats query (QUERY) - child: compute column stats query (QUERY) The new child query concept is necessary to decouple child query fetches from parent query fetches, i.e., we could not execute a child query as part of the original compute stats query, because then a client could fetch the results we need for updating the Metastore statistics. The reason why our existing CTAS works without this decoupling is that its insert 'child query' is not fetchable. Change-Id: I560533e3cb09bcbbdb3eea7fcf0b460bc6b36dcd Reviewed-on: http://gerrit.ent.cloudera.com:8080/873 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:14 -08:00
Lenni Kuff	35817f6a17	Support faster DDL operations via the CatalogServer This change adds support for faster DDL via the CatalogServer by directly returning the TCatalogObject from each catalog operation and using this result to update the local impalad's catalog cache directly, rather than waiting for a state store heartbeat that contains the change. Because the Impalad's catalog can now be updated in two ways, it means that we need to be careful when applying updates to ensure no work gets "undone". For example, consider the following sequence of events: t1: [Direct Update] - Add item A - (Catalog Version 9) t2: [Direct Update] - Drop item A - (Catalog Version 10) t3: [StateStore Update] - (From Catalog Version 9) In this case, we need to ensure that the state store update in t3 does not undo the drop in t2, even though that update will contain the change to "add item A". To support this, we now check the catalog versions before adding any item to ensure that an older item does not overwrite an item with a newer catalog version. To handle the case of removals, a new CatalogUpdateLog is introduced. This log tracks the catalog version at which each item was removed from the catalog. When adding a new catalog object, it is checked to see if this object was removed in a catalog version greater than the version of the current object. If so, the update is ignored. This covers most updates, but there is still one concurrency issue that is not covered with this change. If someone issues an "invalidate metadata" concurrently with a direct catalog operation, it may briefly set the catalog back in time. This seems like okay behavior to me (the command is invalidating the catalog metadata). If we want to address this the CatalogUpdateLog could be extended to track additions to the catalog and we could replay the log after invalidating the metadata (as one possible solution). Change-Id: Icc9bdecc3c32436708bf9e9e7974f91d40e514f2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/864 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:58 -08:00
Lenni Kuff	d698881f71	Improve test run throughput by executing more tests in parallel This updates the tests to run more test cases in parallel and also removes some unneeded "invalidate metadata" calls. This cut down the 'serial' execution time for me by 10+ minutes. Change-Id: I04b4d6db508a26a1a2e4b972bcf74f4d8b9dde5a Reviewed-on: http://gerrit.ent.cloudera.com:8080/757 Tested-by: jenkins Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:53:46 -08:00
Nong Li	78539ee531	Allow insert cancellation test to fail due to IMPALA-551 Change-Id: I5d98be1cc503cc51206051a7c6a493bf884ab5b3 Reviewed-on: http://gerrit.ent.cloudera.com:8080/594 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:10 -08:00
Nong Li	b5ca38cf7e	IMPALA-551: Fix file cleanup in hdfs-table-sink on early error cases. Change-Id: I50324381e5ddbf8d80dd72d27b16b5255bf5f985 Reviewed-on: http://gerrit.ent.cloudera.com:8080/492 Tested-by: jenkins Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:57 -08:00
ishaan	53cd9eadab	Treat HBase as a file format for functional tests Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922 Reviewed-on: http://gerrit.ent.cloudera.com:8080/102 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:36 -08:00
Lenni Kuff	73a9c5f024	Disable open file verification for INSERT cancel tests due to IMPALA-551 Change-Id: Ib11e4ac7e5161faa9c45872afafd88105e32433b Reviewed-on: http://gerrit.ent.cloudera.com:8080/326 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:28 -08:00
Lenni Kuff	04af2828c3	Add insert cancellation test, dynamically build failpoint dimensions from explain output Change-Id: I43fc5860edef6d47698c6b3edbb623dcc7fd37c9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/297 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:27 -08:00
Skye Wanderman-Milne	f87878f111	IMPALA-412: Impala might hang when an impalad dies during query execution Allows queries to be cancelled while fetching rows and fixes some test bugs.	2014-01-08 10:51:51 -08:00
Nong Li	f60f2d3e50	Implement support for grouped scan ranges in io mgr and integration with parquet.	2014-01-08 10:49:18 -08:00
Nong Li	0df9476be1	Parquet data loading.	2014-01-08 10:48:48 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	5869611fb8	Add targeted cancellation tests	2014-01-08 10:47:07 -08:00

16 Commits
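
Note on commit 35817f6a17: the version check described in that message can be illustrated with a minimal Python sketch. This is not Impala's code. The class `CatalogCache`, its methods, and the dictionary used in place of the CatalogUpdateLog are hypothetical names chosen for illustration, and objects are reduced to a name plus a catalog version.

```python
class CatalogCache:
    """Hypothetical stand-in for an impalad's local catalog cache."""

    def __init__(self):
        # name -> catalog version of the cached object
        self._objects = {}
        # name -> catalog version at which the object was removed
        # (plays the role the commit message gives to CatalogUpdateLog)
        self._removed_at = {}

    def add(self, name, version):
        """Apply an add/update. Returns True if applied, False if ignored."""
        existing = self._objects.get(name)
        if existing is not None and existing > version:
            # A newer copy is already cached; do not overwrite it.
            return False
        removed = self._removed_at.get(name)
        if removed is not None and removed > version:
            # The object was dropped at a later version; ignore the stale add.
            return False
        self._objects[name] = version
        return True

    def remove(self, name, version):
        """Drop the object and remember the version of the removal."""
        self._objects.pop(name, None)
        self._removed_at[name] = version

    def contains(self, name):
        return name in self._objects


# Replay of the t1..t3 sequence from the commit message.
cache = CatalogCache()
cache.add("A", 9)               # t1: direct update, add item A
cache.remove("A", 10)           # t2: direct update, drop item A
applied = cache.add("A", 9)     # t3: state store update built from version 9
```

In this sketch the t3 update is ignored because the removal of `A` was recorded at version 10, which is greater than the incoming version 9, so the drop is not undone. The concurrent "invalidate metadata" case mentioned in the message is not modeled.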