impala

mirror of https://github.com/apache/impala.git synced 2026-01-03 15:00:52 -05:00

Author	SHA1	Message	Date
Dan Hecht	1fee56cb26	IMPALA-1080: Implement "SET <query_option>" as SQL statement. Also add support for "SET", which returns a table of query options and their respective values. The front-end parses the option into a (key, value) pair and then the existing backend logic is used to set the option, or return the result sets. Change-Id: I40dbd98537e2a73bdd5b27d8b2575a2fe6f8295b Reviewed-on: http://gerrit.ent.cloudera.com:8080/3582 Reviewed-by: Daniel Hecht <dhecht@cloudera.com> Tested-by: jenkins (cherry picked from commit aa0f6a2fc1d3fe21f22cc7bc56887e1fdb02250b) Reviewed-on: http://gerrit.ent.cloudera.com:8080/3614	2014-07-25 10:25:09 -07:00
Alex Behm	e9864d5f78	Introduce type hierarchy and add complex types. This patch replaces ColumnType with a hierarchy of types that models the existing scalar types as well as the new complex types ARRAY, MAP, and STRUCT. Change-Id: Ia895f41153e99febb0c35412acac12689c3c2064 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3491 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3538	2014-07-21 20:00:46 -07:00
Lenni Kuff	7157f54bbe	Support DROP STATS <table name> Adds support for dropping all table and column stats from a table. Once incremental stats are supported, this will provide the user a way to force a recompute of all stats. Change-Id: I27e03d5986b64eb91852bfc3417ffa971d432d6b Reviewed-on: http://gerrit.ent.cloudera.com:8080/3533 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins (cherry picked from commit f1f074f24bfdc77c4cef147fe9d26f27df80ab81) Reviewed-on: http://gerrit.ent.cloudera.com:8080/3551	2014-07-21 10:28:16 -07:00
Abdullah Yousufi	f4d1afe0ce	IMPALA-921: Change EXPLAIN_LEVEL value from 0 to 1 in impala-shell for SET command Change-Id: I2bfcefb5c8143d4cb4d74157c5309cd9445bac02 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3383 Reviewed-by: Abdullah Yousufi <abdullah.yousufi@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3499	2014-07-15 12:32:43 -07:00
Henry Robinson	dd4c1c32dc	Add optional RM reservation limit to memtrackers If RM and per-query memory limits were enabled at the same time, the per-query limit would be ignored if RM wanted to expand the memory allocation. This change adds an optional reservation limit to a memtracker. The original limit goes back to being a hard limit - i.e. any attempt to consume more than that amount results in failure. The RM reservation limit is the RM-allocated memory limit. If that is exceeded it triggers the ExpandRmReservation() method, which tries to retrieve more memory as long as the hard limit is observed. The net effect is that per-query memory limits have the intended, hard-limit effect, while the RM limits coexist nicely and can expand with more memory as required. At the same time, we change the precedence of various ways of suggesting an initial reservation size so that the user can change the reservation size via a query option (MEM_RESERVATION_SIZE). Change-Id: I41bfa4eb1336810a8a5946f6be3472111a052144 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3134 Tested-by: jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2014-07-01 18:08:47 -07:00
Lenni Kuff	ad933ec765	Switch terminology of 'impersonated user' to 'delegated user' This is to help ensure naming is consistent across the platform and also avoid confusion with HS2 "impersonation" which is something very different. Change-Id: I48c1b76dff75b92b11ddc7aab0eb9a3a5d20e489 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3315 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins (cherry picked from commit 931f6a66c0d8dff25b746d127dc1f36e96b12f98) Reviewed-on: http://gerrit.ent.cloudera.com:8080/3326	2014-06-28 20:46:06 -07:00
Henry Robinson	2a374e5893	Prepare resource broker for cancellation changes This patch anticipates the changes to Llama that allow a client-specified resource ID to be returned with every reservation or expansion request. Doing this allows us to remove the tricky coordination logic between WaitForNotification() and AMNotification() when we don't know which side will access the rendezvous data structures first. Now we can guarantee that the consumer-side will be set-up before the notification is received. Change-Id: I908b1dae8d074a84b0465e3a444d6651f126efd7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3093 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins	2014-06-21 00:08:19 -07:00
Victor Bittorf	2d7f2e19b2	IMPALA-938: Infer schema from Parquet file Syntax is "CREATE TABLE name LIKE fileformat '/path/to/file'". Supports all options that CREATE TABLE does. Currently only PARQUET is supported. Run testdata/bin/create-load-data.sh after pulling this patch. Change-Id: Ibb9fbb89dbde6acceb850b914c48d12f22b33f55 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2720 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3158	2014-06-20 17:38:01 -07:00
Paden Tomasello	dca53ce023	Changes to row-batch.cc and Data.thrift interface. This change will allow row-batch.cc to use LZ4 codec. It will be implemented in a following patch. Change-Id: I9302da1b72c83fcf8420724138d40ad0d82c554b Reviewed-on: http://gerrit.ent.cloudera.com:8080/3030 Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3155	2014-06-19 12:53:41 -07:00
Alex Behm	ef6705d7e0	Rename MergeNode to UnionNode. Change-Id: I9e3675a103757db1345b04bd1d102d2719efddd0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3128 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3154 Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-06-19 12:44:21 -07:00
Alex Behm	677062be3d	Rework planning of unions s.t. a UnionStmt produces a single MergeNode. This patch changes the planning of a UnionStmt s.t. it always produces a single fragment with a MergeNode connecting all child fragments as its root. The data partition of the returned fragment and how the child fragments are merged depend on the data partitions of the child fragments: - All child fragments are unpartitioned or partitioned: The returned fragment has an UNPARTITIONED or RANDOM data partition, respectively. The MergeNode absorbs the plan trees of all child fragments. - Mixed partitioned/unpartitioned child fragments: The returned fragment is RANDOM partitioned. The plan trees of all partitioned child fragments are absorbed into the MergeNode. All unpartitioned child fragments are connected to the MergeNode via a RANDOM exchange, and remain unchanged otherwise. Also adds support for random partitioned data exchanges. Change-Id: I82b2d12c104d98c4e7133234653ee1b67658ef7a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2876 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3143	2014-06-19 00:56:58 -07:00
Alex Behm	9dc883b140	IMPALA-1005: Print consistent plan fragment ids in explain plan and runtime profile. Change-Id: I63b59a896dc9dc0c9ed1d5e889f7b5626ba61202 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3037 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3124	2014-06-18 15:44:43 -07:00
Paden Tomasello	0326f17bb3	Adding Lz4 Codec. Change-Id: I037d4e0de3b2cd2b8582caea058c8e1f2f880ff3 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3027 Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com> Tested-by: jenkins	2014-06-16 14:20:34 -07:00
Matthew Jacobs	dbe1b534ed	IMPALA-1050: NPE error when pool placement policy cannot map user to pool Change-Id: I53ed823ee55bee96269f4119af7da2dab25d4a7c Reviewed-on: http://gerrit.ent.cloudera.com:8080/3028 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 569bd5d4a8e30a907a33551c58a3ab80849b8dc9) Reviewed-on: http://gerrit.ent.cloudera.com:8080/3061	2014-06-15 13:38:20 -07:00
Nong Li	5bbf006d19	Update parquet spec to 2.0 and add decimal logical type. Change-Id: I1a4cbe73a2494f8b2dd09f44bfcc0a019e710344 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3034 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-06-13 11:18:00 -07:00
Alex Behm	0251e5215c	Allow MergeNode with constant selects to run correctly on multiple fragment instances. Change-Id: I0b1ff27f591366b960aa944fadabbb4b35f4b9b4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2832 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3002	2014-06-12 16:39:55 -07:00
Skye Wanderman-Milne	1cc628d32d	IMPALA-950: Skip computing stats for decimal columns. This patch also adds a mechanism to return analysis warnings to client, which is used to log skipped decimal columns. Change-Id: I30c246044a68ec8861cd5bed072bd54e65a079e6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2822 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit fc77422acef7e6f93fdeb5448309414b905f0725) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2984	2014-06-11 19:16:34 -07:00
Nong Li	5d903efca3	ExecSummary The runtime profile as we present it is not very useful and I think the structure of it makes it hard to consume. This patch adds a new client facing schemed set of counters that are collected from the runtime profiles. For example, with this structure it would be easy to have the shell get the stats of a running query and print a useful progress report or to check the most relevant metrics for diagnosing issues. Here's an example of the output for one of the tpch queries: Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ------------------------------------------------------------------------------------------------------------------------ 09:MERGING-EXCHANGE 1 79.738us 79.738us 5 5 0 -1.00 B UNPARTITIONED 05:TOP-N 3 84.693us 88.810us 5 5 12.00 KB 120.00 B 04:AGGREGATE 3 5.263ms 6.432ms 5 5 44.00 KB 10.00 MB MERGE FINALIZE 08:AGGREGATE 3 16.659ms 27.444ms 52.52K 600.12K 3.20 MB 15.11 MB MERGE 07:EXCHANGE 3 2.644ms 5.1ms 52.52K 600.12K 0 0 HASH(o_orderpriority) 03:AGGREGATE 3 342.913ms 966.291ms 52.52K 600.12K 10.80 MB 15.11 MB 02:HASH JOIN 3 2s165ms 2s171ms 144.87K 600.12K 13.63 MB 941.01 KB INNER JOIN, BROADCAST \|--06:EXCHANGE 3 8.296ms 8.692ms 57.22K 15.00K 0 0 BROADCAST \| 01:SCAN HDFS 2 1s412ms 1s978ms 57.22K 15.00K 24.21 MB 176.00 MB tpch.orders o 00:SCAN HDFS 3 8s032ms 8s558ms 3.79M 600.12K 32.29 MB 264.00 MB tpch.lineitem l Change-Id: Iaad4b9dd577c375006313f19442bee6d3e27246a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2964 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-06-11 03:10:11 -07:00
Srinath Shankar	5755b0bdee	Order by without limit for Impala. Enable order-by without limit. Added BufferedBlockMgr to allocate buffers and spill to disk. Added Sorter for the external sort implementation. Added new SortNode execution node that completely sorts its input. Changes to enable writing in IoMgr went in a separate patch. Reviewed-on: http://gerrit.ent.cloudera.com:8080/1539 Reviewed-by: Srinath Shankar <sshankar@cloudera.com> Tested-by: jenkins Conflicts: testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test Change-Id: I3ece32affe5b006f53bbdfcc03ded01471e818ac Reviewed-on: http://gerrit.ent.cloudera.com:8080/2900 Reviewed-by: Srinath Shankar <sshankar@cloudera.com> Tested-by: jenkins	2014-06-09 16:58:08 -07:00
Henry Robinson	60cbe1b0e1	IMPALA-741: Support partitions with non-existent HDFS locations If a partition had a location that did not exist in HDFS, Impala would refuse to load its metadata. This meant a typo could render a table unloadable. We fix this problem by removing the existence check from the frontend, and by inheriting access from the first extant parent of the partition directory. Fixing this exposed a second issue, where Impala wouldn't create directories for partitions in the right place after an INSERT if the partition location had been changed. To get this right we have to plumb the partition ID through to Coordinator::FinalizeSuccessfulInsert(), so that the coordinator can look up the partition's location from the query-wide descriptor table. As a by-product, this patch rationalises the per-partition, per-fragment statistics gathering a little bit by putting almost all the per-partition stats into TInsertPartitionStatus. Change-Id: I9ee0a1a1ef62cf28f55be3249e8142c362083163 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2851 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins	2014-06-08 18:44:45 -07:00
Nong Li	8f4dc0f2f0	IMPALA-974: Switch from FloatLiteral to DecimalLiteral. Float/Doubles are lossy so using those as the default literal type is problematic. Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-31 22:19:06 -07:00
Lenni Kuff	c45e9a70d9	[CDH5] Add DDL support for HDFS caching This change adds DDL support for HDFS caching. The DDL allows the user to indicate a table or partition should be cached and which pool to cache the data into: * Create a cached table: CREATE TABLE ... CACHED IN 'poolName' * Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName' * Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED When a table/partition is marked as cached, a new HDFS caching request is submitted to cache the location (HDFS path) of the table/partition and the ID of that request is stored within the table metadata (in the table properties). This is stored as: 'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS and persisted across HDFS restarts. When a cached table or partition is dropped it is important to uncache the cached data (drop the associated cache request). For partitioned tables, this means dropping all cache requests from all cached partitions in the table. Likewise, if a partitioned table is created as cached, new partitions should be marked as cached by default. It is desirable to know which cache pools exist early on (in analysis) so the query will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To support this, a new cache pool catalog object type was introduced. The catalog server caches the known pools (periodically refreshing the cache) and sends the known pools out in catalog updates. This allows impalads to perform analysis checks on cache pool existence without going to HDFS. It would be easy to use this to add basic cache pool management in the future (ADD/DROP/SHOW CACHE POOL). Waiting for the table/partition to become cached may take a long time. Instead of blocking the user from accessing the table during this period we will wait for the cache requests to complete in the background and once they have finished the table metadata will be automatically refreshed. Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-05-27 16:47:15 -07:00
Dimitris Tsirogiannis	ca86e470de	IMPALA-887: Improve partition pruning time This commit is the first step in improving the performance of partition pruning. Currently, Impala can prune approximately 10K partitions per sec, thereby introducing significant overhead for huge tables with a large number of partitions. With this commit we reduce that overhead by 3X by batching the partition pruning calls to the backend. Change-Id: I3303bfc7fb6fe014790f58a5263adeea94d0fe7d Reviewed-on: http://gerrit.ent.cloudera.com:8080/2608 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2687	2014-05-26 13:10:12 -07:00
Nong Li	723f583b4d	Allow adding predicates after processing build table. Change-Id: I4c845d9f08f0be29e548eceac3912871acd0270f Reviewed-on: http://gerrit.ent.cloudera.com:8080/2658 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-22 13:09:51 -07:00
Lenni Kuff	83e239723f	Add TRole/TPrivilege structs to Thrift CatalogObjects These are used as our internal representation of the authorization policy metadata (as opposed to directly using the Sentry Thrift structs). Versioned/managed in the same way as other TCatalogObjects. Change-Id: Ia1ed9bd4e25e9072849edebcae7c2d3a7aed660d Reviewed-on: http://gerrit.ent.cloudera.com:8080/2545 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins (cherry picked from commit c89431775fcca19cdbeddba635b83fd121d39b04) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2646	2014-05-21 15:51:24 -07:00
Henry Robinson	e87c0eb22a	[CDH5] Detect pseudo-distributed Llama cluster Since we're no longer using the MiniLlama, we need to explicitly set whether or not the cluster is pseudo-distributed. Impala needs this information to correctly translate datanode addresses to a format that Llama understands. This change (adapted from one made by Casey) adds a method to the frontend (callable via JNI) to get a configuration value from the Hadoop configuration. We'll set that configuration value for local RM testing. Change-Id: Ifd51db98a993ac0270dac2b832babbc394483c1a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2549 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-05-20 21:24:33 -07:00
Matthew Jacobs	f9c9a7ca13	Add SHOW DATA SOURCES Change-Id: Ieeb0df107f45a58b8a99f717e96453da93ee7270 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2529 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit b2392c5bfe9fc928ad19af6ff6737e6dc6324e63) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2614	2014-05-19 17:52:27 -07:00
Matthew Jacobs	fb49706ec8	Add additional types to TColumnValue and fix field names Adds 8 and 16 byte integer values and a binary value to TColumnValue and fixes the field names. Change-Id: Ie318fe7dad43b0cc0032b65b6b04c3fe173ae9b8 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2418 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 68c476822402d27d985ed78fa5d14a843b681082) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2493	2014-05-08 17:38:54 -07:00
Matthew Jacobs	ebc6c5894e	External Data Source: Frontend and catalog changes Initial frontend and catalog changes for external data sources. Change-Id: Ia0e61ef97cfd7a4e138ef555c17f2e45bbf08c18 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2224 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit dfa14c828957f751db9c89bae0bdc040ce6f648c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2485	2014-05-08 14:56:19 -07:00
Matthew Jacobs	61b36a42bd	External Data Source: Few small API changes * Rename getStats() to prepare() * Adds TRowBatch.num_rows to indicate number of rows when no cols are materialized * Changes api and sample poms to produce source jars Change-Id: I02dcc89e27716978708386cfc3f7940ee5dbc023 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2406 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 2d7fcba8b7442b54a388f8b994d0cfa08940bbd7) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2434	2014-05-02 17:10:25 -07:00
Nong Li	03e5665e56	Decimal: Read/Write to parquet. This adds support for the FIXED_LENGTH_BYTE_ARRAY parquet type and encoding for decimals. Change-Id: I9d5780feb4530989b568ec8d168cbdc32b7039bd Reviewed-on: http://gerrit.ent.cloudera.com:8080/1727 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2432	2014-05-02 16:38:35 -07:00
Matthew Jacobs	1f07f2d7ee	External Data Source: Thrift structure changes A few changes to the external data source thrift types: * Change RowBatch to return entire columns. Adds Data.TColumnData to represent an entire column. * Makes all fields in ExternalDataSource (except for status fields on the result structures) optional in case fields become deprecated in the future. * Adds a limit parameter to the TOpenParams structure in case the data source needs to apply the limit itself. Change-Id: I62db68bfb64d2190dfdd0c84be5925ad5db031ef Reviewed-on: http://gerrit.ent.cloudera.com:8080/2345 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins (cherry picked from commit faf220d628359be1368f898493900fc2e2913c53) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2385 Reviewed-by: Matthew Jacobs <mj@cloudera.com>	2014-04-27 12:57:13 -07:00
Matthew Jacobs	25c0ebf58c	External Data Source: Public API Adds the thrift structures for the public external data source API and a new maven project containing the Java ExternalDataSource interface and the generated Java thrift classes. The ExternalDataSource.thrift structures can evolve in a backward compatible way. The ExternalDataSource Java interface will always contain a version number in the namespace (e.g. com.cloudera.impala.extdatasource.v1 for V1) so we can potentially make breaking changes to the interface in the future but still support older versions. A trivial implementation of the ExternalDataSource API is also added for testing purposes. TODO: Make the sample data source implementation realistic. Change-Id: I827d6420a87ed7a2bce34e050362ca98ddc5dbcc Reviewed-on: http://gerrit.ent.cloudera.com:8080/2241 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit f29814e9ede9d4c889f2648606fcf511feeb47ae) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2313	2014-04-22 18:34:48 -07:00
Nong Li	1cab95066d	Add the return type as a column for SHOW FUNCTIONS. Also includes some misc pattern matching cleanup. Change-Id: I6c9ec78b094a73864b4d669afbd75a48c9bf9585 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2199 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2271	2014-04-17 17:58:13 -07:00
Matthew Jacobs	d0c353a9b4	IMPALA-922: Return helpful errors with Yarn group rules When the -fair_scheduler_allocation_path is configured with a policy that uses the "primaryGroup" Yarn queue allocation rule, Yarn throws an error if the user is not on the local OS. Currently the user will get an error message that says: "java.io.IOException: No groups found for user <username>". We now return a more helpful error message. Change-Id: I014ac15ef607e473957752f23af94d0cc4efec0f Reviewed-on: http://gerrit.ent.cloudera.com:8080/2078 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 3cf37dc4e91afe887ada988f256b7008983580d2) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2244	2014-04-15 15:32:05 -07:00
Henry Robinson	99c37aac37	IMPALA-827: Add an option for directories created by INSERT to inherit their parent's permissions This patch adds --insert_inherit_permissions. If true, all new partition directories created by INSERT will inherit their permissions from their parent. When false, the directories are created with the default permissions. Change-Id: Ib2b4c251e51ea5048387169678e8dde34ecfe5f6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1917 Tested-by: jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2014-04-04 10:25:20 -07:00
Lenni Kuff	fd174a5e69	[CDH5] Remove duplication of network addresses in HdfsTable (updated for HDFS caching) This is a port of 0b9134a from CDH4, but required some adjustments to work on CDH5 due to the HDFS caching work. The differences from CDH4/CDH5 are mainly in HdfsTable/HdfsPartition. I added a new BlockReplica class to represent a single block with info on the host index + caching info. This removes duplication of TNetworkAddresses in the block location metadata of HdfsTable. Each HdfsTable now contains a list of TNetworkAddress and the BlockLocations just reference an index in this list to specify the host, rather than duplicating the TNetworkAddress. For a table with 100K blocks, this reduces the size of the THdfsTable struct by an additional ~50+% (on top of the duplicate file path changes). This takes the total size of the table from: 21.1MB -> 9.4MB (file path duplication) -> 4.2MB (host duplication) = ~80% total improvement. Change-Id: If7f11764dc0961376f9648779d253829f4cd83a2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1367 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1887 Reviewed-by: Nong Li <nong@cloudera.com>	2014-03-14 14:30:08 -07:00
Alex Behm	7fcd7cd64e	Add list of tables missing stats to explain header and mem-limit exceeded error. Change-Id: Ibe8f329d5513ae84a8134b9ddb3645fa174d8a66 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1501 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1880	2014-03-12 21:15:22 -07:00
Matthew Jacobs	e817c3742c	Admission controller: fix a number of TODOs * Remove requirement that fair scheduler and Llama conf files be on the classpath if specified as relative paths. Now they can be specified as any relative or absolute path. * Add flags to disable all per-pool max requests limits or mem limits. * Rename RequestPoolUtils to RequestPoolService * Make it more clear RequestPoolService is a singleton by putting it in ExecEnv * FileWatchService: use Executors.newScheduledThreadPool instead of a thread * Moved MEGABYTE (and related constants) to new Constants class (frontend) * Test RequestPoolService: Removed AllocationFileLoaderServiceHelper, replaced with reflection Change-Id: Iadf79cf77a7894a469c3587d0019a6d0bee7e58f Reviewed-on: http://gerrit.ent.cloudera.com:8080/1787 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit b9a167f6fdb4ab2595aca6035e1f9d926b909d94) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1858	2014-03-12 14:23:54 -07:00
Henry Robinson	da1c7d37ff	Add memory and VCPU expansion to RM-enabled queries * Each node has one QueryResourceMgr per query it is running fragments for. A QueryResourceMgr handles creating expansion RPC requests, and monitoring the thread:VCPU ratio for each query (and requesting more VCPUs from YARN if oversubscribed). * MemTrackers now have an ExpandLimit API which does nothing unless they have a QueryResourceMgr. This method blocks for now, but when the IO manager changes its API to use TryConsume(), we'll need to issue these asynchronously to avoid keeping hold of a thread. * ResourceBroker etc. got updated to support the Expansion API. Change-Id: Ia3c4635497f0563cfc5cd0e330e5f1f586577200 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1800 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins	2014-03-07 08:58:05 -08:00
Matthew Jacobs	989830186f	Remove RM pool configuration and yarn_pool query option/profile property Admission control adds support for configuring pools via a fair scheduler allocation configuration, so the pool configuration mechanism is no longer needed. This also renames the "yarn_pool" query option to the more general "request_pool" as it can also be used to configure the admission controller when RM/Yarn is not used. Similarly, the query profile shows the pool as "Request Pool" rather than "Yarn Pool". Change-Id: Id2cefb77ccec000e8df954532399d27eb18a2309 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1668 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 8d59416fb519ec357f23b5267949fd9682c9d62f) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1759	2014-03-06 14:46:09 -08:00
Matthew Jacobs	41d90312fa	Admission controller: user to pool resolution, authorization, and pool configs Adds RequestPoolUtils which exposes user to pool resolution, authorization, and relevant pool configurations by wrapping Yarn classes that provide that functionality. (To support CDH4, those Yarn classes will come from thirdparty/cdh4-extras.) RequestPoolUtils is created once by the backend and the instance lives for the duration of the process. Change-Id: I53db075555578614356d33f9d939c5378b9ec797 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1566 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 8e385bdb54ed97e567c672a76723936c24cfe45f) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1758	2014-03-06 14:21:31 -08:00
Skye Wanderman-Milne	6ceed1e632	UDF API additions This patch introduces the ability to specify a prepare and close function for a UDF, as well as FunctionContext methods for maintaining state across UDF invocations within a query. Many of the changes are related to adding an Expr::Open() function which calls the UDF's prepare function, if specified (it has to be called in Open() since the LLVM module must be compiled first). Change-Id: I581d90d03dff71f7ff5d4a6bef839ba6bc46b443 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1693 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 8e2ed7fb9051d98f89327715fdebd6f5ed22d6ee) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1757	2014-03-05 07:32:34 -08:00
Nong Li	f0a67153d3	Decimal analysis changes. Change-Id: Ib7d6a6a7650cc9058ff1486fc7546ab66c698d46 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1734 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-03-03 21:15:00 -08:00
Matthew Jacobs	b879b4c2e4	Admission controller: Separate TPoolStats mem_usage and mem_estimate Change-Id: I521de3a99faca3aaf10e3900a4a12b0d2fa7a0f3 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1704 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit b8fa9c0bf7b555d36180be42c89cd4d7f6b8ec7b) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1737	2014-03-03 19:44:51 -08:00
Nong Li	309ab4df0d	Update backend to support hdfs caching. Change-Id: I22761c8893c8fd222564d4e2a97bfba1284cd741 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1724 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-03-02 00:36:33 -08:00
Lenni Kuff	d6cbd3dc44	Ensure db/table names are always case insensitive in catalog topic entry keys This fixes a bug that can happen with 'invalidate metadata <table name>' if the following sequence of events happens: 1) Table is created in Impala (table names are always treated as lower case) 2) Table is dropped and re-created in Hive, using the same name but different casing 3) invalidate metadata <table name> is run in Impala, which will update the existing table with the version from the Hive metastore. When building the next statestore update, the catalog server will send an update out thinking that the table from 1) was dropped and the table from 3) was added because the topic entry key is case sensitive. This may incorrectly remove the table from an impalad's catalog. The fix is to always treat db/table names as case insensitive. Change-Id: Ib59edc403989781bf12e0405c0ccd37b8e41ee41 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1634 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/1637	2014-02-23 00:20:16 -08:00
Matthew Jacobs	af84be67dd	Admission controller: add memory limits in addition to number of requests Adds the ability to set per-pool memory limits. Each impalad tracks the memory used by queries in each pool; a per-pool memory tracker is added between the per-query trackers and the process memory tracker. The current memory usage is disseminated via statestore heartbeats (along with the other per-pool stats) and a cluster-wide estimate of the pool memory usage is updated when topic updates are received. The admission controller will not admit incoming requests if the total memory usage is already over the configured pool limit. Change-Id: Ie9bc82d99643352ba77fb91b6c25b42938b1f745 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1508 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins (cherry picked from commit 64a137930a318e56a7090a317e6aa5df67ea72cd) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1623 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-02-20 14:19:34 -08:00
Henry Robinson	bb028a54b7	IMPALA-809: Concurrently received statestore heartbeats are no longer an error This patch fixes a problem observed when a subscriber was processing a heartbeat, and while doing so tried to re-register with the statestore. The statestore would schedule a heartbeat for the new registration, but the subscriber would return an error, thinking that it was still re-registering (see UpdateState() for the try_lock logic that gave rise to this error). The statestore, upon receiving the error, would update its failure detector and eventually mark the subscriber as failed, unnecessarily forcing a re-registration loop. This only regularly happens when UpdateState() takes a long time, i.e. when a subscriber callback takes a while. This patch also adds metrics to measure the amount of time callbacks take. Change-Id: I157cdfd550279a6942e7ca54fe622520c8ad5dcf Reviewed-on: http://gerrit.ent.cloudera.com:8080/1574 Tested-by: jenkins Reviewed-by: Henry Robinson <henry@cloudera.com> (cherry picked from commit bc0a8819e754623bc9e5e5ab805369ad8381e5b9) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1610	2014-02-19 17:34:37 -08:00
Nong Li	0d2919fe7f	Refactor scalar and aggregate function analysis and execution. This patch cleans up analysis and execution of scalar and aggregate functions so that there is no difference between how builtins and user functions are handled. The only difference is that the catalog is populated with the builtins all the time. The BE always gets a TFunction object and just executes it (builtins will have an empty hdfs file location). This removes the opcode registry and all of the functionality is subsumed by the catalog, most of which was already duplicated there anyway. This also introduces the concept of a system database: a database that the user cannot modify and that is populated automatically on startup. Change-Id: Iaa3f84dad0a1a57691f5c7d8df7305faf01d70ed Reviewed-on: http://gerrit.ent.cloudera.com:8080/1386 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1577	2014-02-18 18:40:08 -08:00

268 Commits