impala

mirror of https://github.com/apache/impala.git synced 2026-01-03 15:00:52 -05:00

Author	SHA1	Message	Date
Taras Bobrovytsky	7faaa65996	Added order by query tests - Added static order by tests to test_queries.py and QueryTest/sort.test - test_order_by.py also contains tests with static queries that are run with multiple memory limits. - Added stress, scratch disk and failpoints tests - Incorporated Srinath's change that copied all order by with limit tests into the top-n.test file Extra time required: Serial: scratch disk: 42 seconds test queries sort : 77 seconds test sort: 56 seconds sort stress: 142 seconds TOTAL: 5 min 17 seconds Parallel(8 threads): scratch disk: 40 seconds test queries sort: 42 seconds test sort: 49 seconds sort stress: 93 seconds TOTAL: 3 min 44 sec Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205	2014-06-20 13:35:10 -07:00
Lenni Kuff	c45e9a70d9	[CDH5] Add DDL support for HDFS caching This change adds DDL support for HDFS caching. The DDL allows the user to indicate a table or partition should be cached and which pool to cache the data into: * Create a cached table: CREATE TABLE ... CACHED IN 'poolName' * Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName' * Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED When a table/partition is marked as cached, a new HDFS caching request is submitted to cache the location (HDFS path) of the table/partition and the ID of that request is stored within the table metadata (in the table properties). This is stored as: 'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS and persisted across HDFS restarts. When a cached table or partition is dropped it is important to uncache the cached data (drop the associated cache request). For partitioned tables, this means dropping all cache requests from all cached partitions in the table. Likewise, if a partitioned table is created as cached, new partitions should be marked as cached by default. It is desirable to know which cache pools exist early on (in analysis) so the query will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To support this, a new cache pool catalog object type was introduced. The catalog server caches the known pools (periodically refreshing the cache) and sends the known pools out in catalog updates. This allows impalads to perform analysis checks on cache pool existence without going to HDFS. It would be easy to use this to add basic cache pool management in the future (ADD/DROP/SHOW CACHE POOL). Waiting for the table/partition to become cached may take a long time. Instead of blocking the user from accessing the table during this period, we will wait for the cache requests to complete in the background and once they have finished the table metadata will be automatically refreshed.
Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-05-27 16:47:15 -07:00
Lenni Kuff	bb09b5270f	IMPALA-839: Update tests to be more thorough when run exhaustively Some tests have constraints that were there only to help reduce runtime which reduces coverage when running in exhaustive mode. The majority of the constraints are because it adds no value to run the test across additional dimensions (or it is invalid to run with those dimensions). Updates the tests that have legitimate constraints to use two new helper methods for constraining the table format dimension: create_uncompressed_text_dimension() create_parquet_dimension() These will create a dimension that will produce a single test vector, either uncompressed text or parquet respectively. Change-Id: Id85387c1efd5d192f8059ef89934933389bfe247 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2149 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins (cherry picked from commit e02acbd469bc48c684b2089405b4a20552802481) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2290	2014-04-18 20:11:31 -07:00
ishaan	098ad99b82	Skip the invalidate metadata stress test until the race in the catalog server is resolved. Change-Id: I71911078d274f894f5a28c0e7123e5e5ac8dc940 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1507 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1702 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>	2014-02-28 10:42:48 -08:00
Lenni Kuff	7a6892dcbe	Fix race when invalidating catalog metadata and loading a new table There was a race when the catalog was invalidated at the same time a table was being loaded. This is because an uninitialized Table was being returned unexpectedly to the impalad due to the concurrent invalidate. This fixes the problem by updating the CatalogObjectCache to load when a catalog object is uninitialized, rather than load when null. New items can now be added in an initialized or uninitialized state; uninitialized objects are loaded on access. Also adds a stress test for invalidate metadata/invalidate metadata <table>/refresh. In addition, it cleans up the locking in the Catalog to make it more straightforward. The top-level catalogLock_ is now only in CatalogServiceCatalog and this lock is used to protect the catalogVersion_. Operations that need to perform an atomic bulk catalog operation can use this lock (such as when the CatalogServer needs to take a snapshot of the catalog to calculate what delta to send to the statestore). Otherwise, the lock is not needed and objects are protected by the synchronization at each level in the object hierarchy (Db->[Function/Table]). That is, Dbs are synchronized by the Db cache, each Db has a Table Cache which is synchronized independently. Change-Id: I9e542cd39cdbef26ddf05499470c0d96bb888765 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1355 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1418	2014-01-31 16:16:32 -08:00
Lenni Kuff	409d2ae5d7	Migrate run-test to python and add mini-stress test as part of buildall	2014-01-08 10:47:34 -08:00

6 Commits