If RM and per-query memory limits were enabled at the same time, the
per-query limit would be ignored if RM wanted to expand the memory
allocation. This change adds an optional reservation limit to a
memtracker. The original limit goes back to being a hard limit -
i.e. any attempt to consume more than that amount results in
failure. The RM reservation limit is the RM-allocated memory limit. If
that is exceeded it triggers the ExpandRmReservation() method, which tries
to retrieve more memory as long as the hard limit is observed.
The net effect is that per-query memory limits have the intended,
hard-limit effect, while the RM limits coexist nicely and can expand
with more memory as required.
At the same time, we change the precedence of various ways of suggesting
an initial reservation size so that the user can change the reservation
size via a query option (MEM_RESERVATION_SIZE).
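A minimal sketch of the consume path this implies, assuming simplified
MemTracker internals (the real class has more state and is thread-safe;
only ExpandRmReservation() is named in the description above):

    #include <cstdint>

    // Illustrative sketch only, not the actual MemTracker.
    struct MemTracker {
      int64_t consumption = 0;
      int64_t hard_limit = 0;       // per-query limit; never exceeded
      int64_t rm_reservation = 0;   // RM-granted bytes; may grow on demand

      // Hypothetical stand-in for the RM round trip: grow the reservation,
      // but never past the hard limit.
      bool ExpandRmReservation(int64_t delta) {
        if (rm_reservation + delta > hard_limit) return false;
        rm_reservation += delta;  // pretend the RM granted the expansion
        return true;
      }

      bool TryConsume(int64_t bytes) {
        int64_t next = consumption + bytes;
        if (next > hard_limit) return false;  // hard-limit failure
        if (next > rm_reservation &&
            !ExpandRmReservation(next - rm_reservation)) {
          return false;
        }
        consumption = next;
        return true;
      }
    };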
Change-Id: I41bfa4eb1336810a8a5946f6be3472111a052144
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3134
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
The runtime profile as we present it is not very useful, and its structure
makes it hard to consume. This patch adds a new client-facing, schematized
set of counters collected from the runtime profiles. For example, with this
structure it would be easy for the shell to fetch the stats of a running
query and print a useful progress report, or to check the most relevant
metrics when diagnosing issues.
Here's an example of the output for one of the tpch queries:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
------------------------------------------------------------------------------------------------------------------------
09:MERGING-EXCHANGE 1 79.738us 79.738us 5 5 0 -1.00 B UNPARTITIONED
05:TOP-N 3 84.693us 88.810us 5 5 12.00 KB 120.00 B
04:AGGREGATE 3 5.263ms 6.432ms 5 5 44.00 KB 10.00 MB MERGE FINALIZE
08:AGGREGATE 3 16.659ms 27.444ms 52.52K 600.12K 3.20 MB 15.11 MB MERGE
07:EXCHANGE 3 2.644ms 5.1ms 52.52K 600.12K 0 0 HASH(o_orderpriority)
03:AGGREGATE 3 342.913ms 966.291ms 52.52K 600.12K 10.80 MB 15.11 MB
02:HASH JOIN 3 2s165ms 2s171ms 144.87K 600.12K 13.63 MB 941.01 KB INNER JOIN, BROADCAST
|--06:EXCHANGE 3 8.296ms 8.692ms 57.22K 15.00K 0 0 BROADCAST
| 01:SCAN HDFS 2 1s412ms 1s978ms 57.22K 15.00K 24.21 MB 176.00 MB tpch.orders o
00:SCAN HDFS 3 8s032ms 8s558ms 3.79M 600.12K 32.29 MB 264.00 MB tpch.lineitem l
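As a rough illustration of the kind of aggregation this schema enables
(field names here are illustrative, not the actual counter structure):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Given per-host runtimes for one operator, derive the "Avg Time" and
    // "Max Time" columns shown above.
    struct OperatorTimes { int64_t avg_ns = 0; int64_t max_ns = 0; };

    OperatorTimes AggregateTimes(const std::vector<int64_t>& per_host_ns) {
      OperatorTimes t;
      if (per_host_ns.empty()) return t;
      int64_t sum = 0;
      for (int64_t ns : per_host_ns) {
        sum += ns;
        t.max_ns = std::max(t.max_ns, ns);
      }
      t.avg_ns = sum / static_cast<int64_t>(per_host_ns.size());
      return t;
    }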
Change-Id: Iaad4b9dd577c375006313f19442bee6d3e27246a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2964
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Enable order-by without limit
Added BufferedBlockMgr to allocate buffers and spill to disk.
Added Sorter for the external sort implementation.
Added a new SortNode execution node that completely sorts its input.
Changes to enable writing in the IoMgr went in a separate patch.
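For intuition, a self-contained sketch of the spill-then-merge pattern the
Sorter implements (the real code sorts rows in BufferedBlockMgr blocks and
spills them to disk; here "spilling" just collects sorted runs in memory):

    #include <algorithm>
    #include <functional>
    #include <queue>
    #include <tuple>
    #include <vector>

    std::vector<int> ExternalSort(std::vector<int> input, size_t run_size) {
      // Phase 1: produce sorted runs no larger than the memory budget.
      std::vector<std::vector<int>> runs;
      for (size_t i = 0; i < input.size(); i += run_size) {
        size_t end = std::min(input.size(), i + run_size);
        std::vector<int> run(input.begin() + i, input.begin() + end);
        std::sort(run.begin(), run.end());
        runs.push_back(std::move(run));  // in Impala: blocks written to disk
      }
      // Phase 2: k-way merge via a min-heap of (value, run, offset) heads.
      using Head = std::tuple<int, size_t, size_t>;
      std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heap;
      for (size_t r = 0; r < runs.size(); ++r) {
        if (!runs[r].empty()) heap.emplace(runs[r][0], r, 0);
      }
      std::vector<int> out;
      while (!heap.empty()) {
        auto [v, r, i] = heap.top();
        heap.pop();
        out.push_back(v);
        if (i + 1 < runs[r].size()) heap.emplace(runs[r][i + 1], r, i + 1);
      }
      return out;
    }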
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1539
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Conflicts:
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
Change-Id: I3ece32affe5b006f53bbdfcc03ded01471e818ac
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2900
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Make new partition directories created by INSERT inherit their parent's permissions
This patch adds --insert_inherit_permissions. If true, all
new partition directories created by INSERT will inherit their
permissions from their parent. When false, the directories are created
with the default permissions.
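A sketch of the inheritance step using the libhdfs C API (error handling
trimmed; the flag check happens in the caller):

    #include <hdfs.h>

    // After creating a new partition directory, copy the parent directory's
    // permission bits onto it. Only invoked when --insert_inherit_permissions
    // is true; otherwise the directory keeps the default permissions.
    void InheritParentPermissions(hdfsFS fs, const char* parent_dir,
                                  const char* new_dir) {
      hdfsFileInfo* info = hdfsGetPathInfo(fs, parent_dir);
      if (info == nullptr) return;       // parent missing; keep defaults
      short perms = info->mPermissions;  // parent's mode bits
      hdfsFreeFileInfo(info, 1);
      hdfsChmod(fs, new_dir, perms);
    }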
Change-Id: Ib2b4c251e51ea5048387169678e8dde34ecfe5f6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1917
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
Admission control now supports configuring pools via a fair scheduler
allocation configuration, so the old pool configuration mechanism is no
longer needed. This also renames the "yarn_pool" query option to the more
general "request_pool", since it can also be used to configure the admission
controller when RM/Yarn is not used. Similarly, the query profile now shows
the pool as "Request Pool" rather than "Yarn Pool".
Change-Id: Id2cefb77ccec000e8df954532399d27eb18a2309
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1668
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8d59416fb519ec357f23b5267949fd9682c9d62f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1759
Impala reserves resources from YARN via Llama and handles resource
preemptions by cancelling affected queries. Adds the Impala Resource
Broker for interacting with Llama. Refactors the scheduler and coordinator
to move fragment-to-host assignment logic into the scheduler. The local
test setup uses MiniLLama.
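A hypothetical outline of the broker's role (the names below are
illustrative stand-ins, not the actual Llama/Thrift API):

    #include <cstdint>
    #include <string>

    struct Reservation { std::string id; };  // illustrative handle

    class ResourceBroker {
     public:
      // Blocks until Llama grants (or rejects) the query's resources.
      bool Reserve(const std::string& pool, int64_t mem_bytes, int vcores,
                   Reservation* granted);

      // Invoked on a preemption notification from Llama; the coordinator
      // then cancels every fragment of the affected query.
      void OnPreempted(const Reservation& preempted);
    };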
Change-Id: Ic7b0fe43de52d30f4207b4e65cce7e6a294e54e1
This change adds support for cluster-synchronized catalog operations. This
provides the guarantee that after a catalog op completes, all other
subscribers to the catalog topic have also processed that update. This is
useful when load balancing, because a common workflow is to target a
different impalad for each statement executed.
For example if each of the following were executed sequentially, but targeting
a different node:
1) CREATE TABLE Foo
2) INSERT INTO Foo
3) SELECT * FROM Foo
4) INSERT INTO Foo ....
Since both the INSERT and the CREATE update the catalog, this would not work
as expected without this patch: the user might either get a "table not found"
error or be missing partition information from the INSERT.
The downside is that this approach to DDL takes a bit longer because we need to wait
until all subscribers have processed an update. If all nodes are healthy, this overhead
should not be significantly longer than the current DDL time. However, a single bad node
might slow down or completely block the completion of all DDL operations. By
default this feature is disabled, but it can be enabled using a new query
option: SYNCED_DDL=1.
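The wait this implies can be pictured as follows (a sketch with illustrative
names, not the actual catalog/statestore API): after a DDL produces catalog
version v, block until every subscriber reports a processed version >= v.

    #include <cstdint>
    #include <vector>

    bool AllSubscribersCaughtUp(const std::vector<int64_t>& subscriber_versions,
                                int64_t ddl_version) {
      for (int64_t seen : subscriber_versions) {
        if (seen < ddl_version) return false;  // one bad node blocks the DDL
      }
      return true;
    }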
To test this, the base test suite was updated to support selecting a random impalad
to execute each query section in a query test file. This is currently only enabled
for the insert and DDL tests, but could be leveraged by more tests in the future.
TODO: Add additional failure tests around this functionality.
TODO: Add an explicit "sync" statement so users do not need to run all their DDL
in this mode (since it is slower).
Change-Id: I45e757a931bf2a4740cc0cdd1e76ce49a1e22b83
Reviewed-on: http://gerrit.ent.cloudera.com:8080/899
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This patch goes some way to improving recovery after an INSERT
fails. Inserts now write intermediate results to
<table_dir>/.impala_insert_staging. After execution completes, either
successfully or not, the query-specific directory under that directory
is deleted.
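A sketch of that cleanup using the libhdfs C API (path layout as described
above; error handling trimmed):

    #include <hdfs.h>
    #include <string>

    // Whether the INSERT succeeded or failed, remove the query-specific
    // subdirectory under <table_dir>/.impala_insert_staging.
    void CleanupInsertStaging(hdfsFS fs, const std::string& table_dir,
                              const std::string& query_id) {
      std::string staging_dir =
          table_dir + "/.impala_insert_staging/" + query_id;
      hdfsDelete(fs, staging_dir.c_str(), /*recursive=*/1);
    }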
This doesn't completely solve cleanup (although it goes as far as IMPALA-449
suggests). Two things to do in the future:
* Have each backend delete its own staging files on error. The difficulty is
that backends currently don't know whether they were cancelled because of an
error or because a LIMIT was reached.
* If the operation to move files to their final destinations fails during
FinalizeQuery(), the coordinator should perform compensating actions and
delete the files that were already moved.
Note: We also considered a query-wide and impalad-wide option to change
the staging dir. There are advantages to this (all intermediate results
go to a known location which is easy to clean up on failure), but also
security and other operational concerns. Worth revisiting in the future.
Change-Id: Ia54cf36db6a382e359877f87d7d40aad7fdb77be
Reviewed-on: http://gerrit.ent.cloudera.com:8080/670
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
- new class MemLimit
- new query flag MEM_LIMIT
- implementation of impalad flag mem_limit
Still missing:
- parsing a mem limit spec that contains "M/G", as in: 1.25G
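A sketch of what that missing parsing could look like (accepts plain bytes
or an M/G suffix, e.g. "512M" or "1.25G"; returns -1 on a malformed spec):

    #include <cctype>
    #include <cstdint>
    #include <cstdlib>
    #include <string>

    int64_t ParseMemSpec(const std::string& spec) {
      if (spec.empty()) return -1;
      char* end = nullptr;
      double value = strtod(spec.c_str(), &end);
      if (end == spec.c_str() || value < 0) return -1;
      int64_t multiplier = 1;
      if (*end != '\0') {
        switch (toupper(*end)) {
          case 'M': multiplier = 1LL << 20; break;
          case 'G': multiplier = 1LL << 30; break;
          default: return -1;
        }
        if (*(end + 1) != '\0') return -1;  // trailing junk
      }
      return static_cast<int64_t>(value * multiplier);
    }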
This patch implements the HiveServer2 API.
We have tested it with Lenni's patch against the tpch workload. It has also
been tested manually against Hive's beeline with queries and metadata operations.
All of the HiveServer2 code is implemented in impala-hs2-server.cc. Beeswax
code is refactored to impala-beeswax-server.cc.
HiveServer2 has a few more metadata operations. These operations go through
impala-hs2-server to the ddl-executor and then to the FE. The logic is
implemented in fe/src/main/java/com/cloudera/impala/service/MetadataOp.java.
Because of the Thrift union issue, I had to modify the generated C++ files.
Therefore, all of the HiveServer2 Thrift-generated C++ code is checked into
be/src/service/hiveserver2/. Once the Thrift issue is resolved, I'll remove
these files.
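As a rough illustration of that routing (the type and method names below are
simplified stand-ins, not the exact Thrift or executor signatures):

    // The HS2 handler wraps the request and forwards it, via the ddl
    // executor and JNI, to MetadataOp.java in the FE.
    void ImpalaServer::GetSchemas(TGetSchemasResp& resp,
                                  const TGetSchemasReq& req) {
      TMetadataOpRequest op;                     // illustrative wrapper
      op.opcode = TMetadataOpcode::GET_SCHEMAS;  // illustrative enum
      op.get_schemas_req = req;
      resp = ExecMetadataOpViaFrontend(op);      // JNI hop into the Java FE
    }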
Change-Id: I9a8fe5a09bf250ddc43584249bdc87b6da5a5881
- created new class PlanFragment, which encapsulates everything having to do with a single
plan fragment, including its partition, output exprs, destination node, etc.
- created new class DataPartition
- explicit classes for fragment and plan node ids, to avoid getting them
mixed up, which is easy to do with plain ints (see the sketch after this list)
- added an IdGenerator class
- moved PlanNode.ExplainPlanLevel to Types.thrift, so it can also be used for
PlanFragment.getExplainString()
- Changed planner interface to return scan ranges with a complete list of server locations,
instead of making a server assignment.
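The typed-id idea from the list above, sketched in C++ (the real classes
live in the frontend; names here just mirror the description):

    #include <cstdint>

    // Distinct types for plan node and fragment ids make mixing them up a
    // compile error instead of a silent bug, as it would be with plain ints.
    template <typename Tag>
    struct Id {
      explicit Id(int32_t v) : value(v) {}
      int32_t value;
    };
    struct PlanNodeTag {};
    struct FragmentTag {};
    using PlanNodeId = Id<PlanNodeTag>;
    using FragmentId = Id<FragmentTag>;

    template <typename IdType>
    class IdGenerator {
     public:
      IdType Next() { return IdType(next_++); }
     private:
      int32_t next_ = 0;
    };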
Also included: cleaned up AggregateInfo:
- the 2nd phase of a DISTINCT aggregation is now captured separately from a merge aggregation.
- moved analysis functionality into AggregateInfo
Removed broken test cases from the functional-planner workload (they are
handled correctly in functional-newplanner).
- added DataStreamMgr::Cancel(), which is used to propagate cancellation from the
coordinator to all (possibly blocked) ExchangeNodes
- all exec nodes now check for cancellation before they do anything that might block for a while
- fixed up logic related to async cancellation
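The "check before blocking" rule can be sketched like this (RuntimeState here
is a simplified stand-in for the real class):

    #include <atomic>

    struct RuntimeState {
      std::atomic<bool> cancelled{false};
      bool is_cancelled() const { return cancelled.load(); }
    };

    // Every exec node tests the flag before any call that might block;
    // DataStreamMgr::Cancel() separately wakes receivers already blocked.
    bool GetNextChecksCancellation(RuntimeState* state) {
      if (state->is_cancelled()) return false;  // bail out before blocking
      // ... safe to block waiting for a row batch here ...
      return true;
    }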
Added support for async query execution via beeswax interface:
- implemented ImpalaServer::query()
- QueryExecState now tracks beeswax's idea of the query state
- ImpalaServer::get_state() now returns the actual state
Fixed handling of ExecNode::Close():
- needs to be called for the entire plan tree, regardless of what fails (can't use
RETURN_IF_ERROR() inside of it)
- needs to be called for every Open() call by the coordinator/ImpalaServer
- made the coordinator asynchronous
- renamed ImpalaBackendService to ImpalaInternalService;
- new class ImpalaServer implements ImpalaService and ImpalaInternalService
- renaming ImpalaInternalService fields to conform to c++ style
- merged impala-service.{cc,h} and backend-service.{cc,h} into impala-server.{cc,h}
- added TStatusCode field to Status.ErrorDetail
- removed ImpalaInternalService.CloseChannel
Also removed JdbcDriverTest.java
The ODBC test suite (CDH/hive-odbc-test) passes, except for tests with
"create table" and "show table". We should have a nightly regression run of
the ODBC tests against impalad.
There are still a few issues:
1. running with num_nodes > 0 crashes the coordinator;
2. workarounds for a few ODBC JIRAs;
3. no tests for bool/timestamp because ODBC doesn't support them.
review: issue 110
At the same time, this patch removes the partitionKeyRegex in favour
of explicitly sending a list of literal expressions for each file path
from the front end.
- breaks out ImpalaService implementation into impala-service.{cc,h} and
completes the implementation (minus cancellation)
- reorg of testutil/QueryExecutor: now we have a QueryExecutorIf with two implementations,
InProcessQueryExecutor (the existing one) and ImpaladQueryExecutor (which
executes against a running impalad process)
- added the flag --backends="host:port,host:port,...", which TestEnv uses to create clients for ImpalaBackendServices
running on those nodes; this is just a hack to enable runquery for multi-node execution (see the parsing sketch at the end)
- impalad-main.cc: main() of impala daemon, which will export both ImpalaService and
ImpalaBackendService (but at the moment only does the latter; everything related to ImpalaService is commented out)
- com.cloudera.impala.service.Frontend: API to the frontend functionality; invoked by impalad via jni; ignore for now
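The parsing sketch referenced above: splitting a --backends value into
(host, port) pairs (illustrative helper, not the actual TestEnv code):

    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    std::vector<std::pair<std::string, int>> ParseBackends(
        const std::string& flag_value) {
      std::vector<std::pair<std::string, int>> backends;
      std::stringstream ss(flag_value);
      std::string entry;  // one "host:port"
      while (std::getline(ss, entry, ',')) {
        size_t colon = entry.rfind(':');
        if (colon == std::string::npos) continue;  // skip malformed entries
        backends.emplace_back(entry.substr(0, colon),
                              std::stoi(entry.substr(colon + 1)));
      }
      return backends;
    }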