impala

mirror of https://github.com/apache/impala.git synced 2026-01-05 21:00:54 -05:00

Author	SHA1	Message	Date
Lars Volker	4fc7f15376	IMPALA-2862: Fix regex parsing in test result verifier Test results can be verified using regular expressions. The extraction of the regular expression substring from the expected test results had a bug where only the first character of an expression was considered. This led to wrong but undetected test results. Change-Id: Ia670da6e0758455a86dc44744b96b9465d890af3 Reviewed-on: http://gerrit.cloudera.org:8080/1818 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Internal Jenkins	2016-02-02 21:55:57 +00:00
Bharath Vissapragada	aed3505c8d	IMPALA-1651: CREATE TABLE LIKE shouldn't inherit hdfs caching settings from source table Change-Id: Ia5dba8ac463d088b50e1d16a7b5db1941d7c6989 Reviewed-on: http://gerrit.cloudera.org:8080/1917 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Internal Jenkins	2016-01-28 13:40:32 +00:00
Michael Ho	eb479d040e	IMPALA-1714: Check if tables or partitions specified in "CREATE/ALTER TABLE ... CACHE IN ..." statements are cacheable. This change adds more analysis checks to verify the location of the table or partition specified in a "CREATE/ALTER TABLE ... CACHED IN ..." statement can actually be cached. Caching is only supported for HDFS locations. If table-wide caching is enabled for a table, adding a partition at an uncacheable location will be disallowed for that table unless the attribute "UNCACHED" is explicitly specified. Enabling table-wide caching for a table at an uncacheable location or a table with partitions at uncacheable locations will also be disallowed. However, caching can still be enabled for individual partitions whose underlying locations support caching. Change-Id: I2299c9285126f4b035360f2ef902147188ccd5f1 Reviewed-on: http://gerrit.cloudera.org:8080/1373 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins	2015-11-06 23:30:14 +00:00
Henry Robinson	f22b8659fd	IMPALA-1595: Add 'location' to SHOW [TABLE STATS|PARTITIONS] for HDFS tables This patch adds a 'location' column to the output of SHOW TABLE STATS / SHOW PARTITIONS. This helps users understand the effects of ALTER TABLE SET LOCATION commands, particularly for partitions, and is easier to identify than the output of DESCRIBE FORMATTED. Some existing tests in alter-table.test have been updated to include checking the location output before and after a SET LOCATION command. The tests in show.test have also been updated to check for the location; all other tests that use SHOW [TABLE STATS|PARTITIONS] use a generic regex to avoid overly verbose tests. Change-Id: I9d276f7b133c38c9319e0906397ca1c31cec95bb Reviewed-on: http://gerrit.cloudera.org:8080/316 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2015-04-21 19:27:50 +00:00
Martin Grund	fdafbc5709	IMPALA-1645 and IMPALA-1632: Verify Cache Directives When a table is loaded in the catalog, we will now perform a check to verify that the cache directive ID and cache replication factor are still valid and the data is current. If the cache directive no longer exists, we issue an error message and mark the table / partition as uncached. Furthermore, the replication factor is updated with the information from the actual cache directive. In the case of an insert statement there is a special situation, as the catalog update happens synchronously and will try to access cache directive information that might be stale. Thus, in this insert path, we catch the possible not-found exception and reset the caching information. Change-Id: I882041ce5395b8a3d17e9fc2750053393340df65 Reviewed-on: http://gerrit.cloudera.org:8080/40 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: Internal Jenkins	2015-02-11 03:35:46 +00:00
Martin Grund	cee1e84c1e	IMPALA-1587: Extending caching directives for multiple replicas This patch adds the possibility to specify the number of replicas that should be cached in main memory. This can be useful in high QPS scenarios, as the majority of the load no longer falls on a single cached replica but is spread across a set of cached replicas. While the cache replication factor can be larger than the block replication factor on disk, the difference will be ignored by HDFS until more replicas become available. This extends the current syntax for specifying the cache pool in the following way: cached in 'poolName' is extended with the optional replication factor: cached in 'poolName' with replication = XX. By default, the cache replication factor is set to 1. As this value is not yet configurable in HDFS, it is defined as a constant in the JniCatalog thrift specification. If a partitioned table is cached, all its child partitions inherit this cache replication factor. If child partitions have a custom cache replication factor, changing the cache replication factor on the partitioned table afterwards will overwrite this custom value. If a new partition is added to the table, it will again inherit the cache replication factor of the parent, independent of the cache pool that is used to cache the partition. To review changes and the status of the replication factor for tables and partitions, the replication factor is part of the output of the "show partitions" command. Change-Id: I2aee63258d6da14fb5ce68574c6b070cf948fb4d Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5533 Tested-by: jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>	2015-01-26 20:30:59 -08:00
Lenni Kuff	745c091fcc	[CDH5] Update SHOW TABLE STATS to include per-partition HDFS caching stats Change-Id: I71b01f84bbd308108d775e78c644e867b48e05be Reviewed-on: http://gerrit.ent.cloudera.com:8080/2621 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-05-28 08:54:54 -07:00
Lenni Kuff	c45e9a70d9	[CDH5] Add DDL support for HDFS caching This change adds DDL support for HDFS caching. The DDL allows the user to indicate a table or partition should be cached and which pool to cache the data into: * Create a cached table: CREATE TABLE ... CACHED IN 'poolName' * Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName' * Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED When a table/partition is marked as cached, a new HDFS caching request is submitted to cache the location (HDFS path) of the table/partition and the ID of that request is stored within the table metadata (in the table properties). This is stored as: 'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS and persisted across HDFS restarts. When a cached table or partition is dropped it is important to uncache the cached data (drop the associated cache request). For partitioned tables, this means dropping all cache requests from all cached partitions in the table. Likewise, if a partitioned table is created as cached, new partitions should be marked as cached by default. It is desirable to know which cache pools exist early on (in analysis) so the query will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To support this, a new cache pool catalog object type was introduced. The catalog server caches the known pools (periodically refreshing the cache) and sends the known pools out in catalog updates. This allows impalads to perform analysis checks on cache pool existence without going to HDFS. It would be easy to use this to add basic cache pool management in the future (ADD/DROP/SHOW CACHE POOL). Waiting for the table/partition to become cached may take a long time. Instead of blocking the user from accessing the table during this period, we will wait for the cache requests to complete in the background, and once they have finished the table metadata will be automatically refreshed. Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-05-27 16:47:15 -07:00

8 Commits
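
The caching DDL described in the commit messages above (CACHED IN 'poolName', the optional WITH REPLICATION = XX suffix, and SET UNCACHED) can be illustrated with a small statement builder. This is only a sketch: the helper names cache_clause and alter_cache_stmt are hypothetical and not part of Impala, the table and pool names in the usage note are made up, and no escaping of identifiers or pool names is attempted.

```python
from typing import Optional


def cache_clause(pool: str, replication: Optional[int] = None) -> str:
    """Build a CACHED IN clause as described in the commit messages.

    Hypothetical helper: the pool name is inserted verbatim, without escaping.
    """
    if replication is not None and replication < 1:
        raise ValueError("replication must be >= 1")
    clause = f"CACHED IN '{pool}'"
    if replication is not None:
        # Optional suffix introduced by IMPALA-1587.
        clause += f" WITH REPLICATION = {replication}"
    return clause


def alter_cache_stmt(
    table: str,
    partition_spec: Optional[str] = None,
    pool: Optional[str] = None,
    replication: Optional[int] = None,
) -> str:
    """Build an ALTER TABLE statement that caches or uncaches a table/partition.

    Passing pool=None produces the SET UNCACHED form.
    """
    stmt = f"ALTER TABLE {table}"
    if partition_spec:
        stmt += f" PARTITION ({partition_spec})"
    if pool is None:
        return f"{stmt} SET UNCACHED"
    return f"{stmt} SET {cache_clause(pool, replication)}"
```

For example, alter_cache_stmt("sales", "year=2015", "testPool", 3) yields an ALTER TABLE ... SET CACHED IN statement with a replication factor of 3 for one partition, while alter_cache_stmt("sales") yields the SET UNCACHED form for the whole table.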