impala

mirror of https://github.com/apache/impala.git synced 2026-01-03 06:00:52 -05:00

Author	SHA1	Message	Date
Matthew Jacobs	6c12546561	IMPALA-4833: Compute precise per-host reservation size Before this change, the per-host reservation size was computed by the Planner. However, scheduling happens after planning, so the Planner must assume that all fragments run on all hosts, and the reservation size is likely much larger than it needs to be. This moves the computation of the per-host reservation size to the BE where it can be computed more precisely. This also includes a number of plan/profile changes. Change-Id: Idbcd1e9b1be14edc4017b4907e83f9d56059fbac Reviewed-on: http://gerrit.cloudera.org:8080/7630 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2017-08-12 08:10:07 +00:00
Tim Armstrong	a98b90bd38	IMPALA-4674: Part 2: port backend exec to BufferPool Always create global BufferPool at startup using 80% of memory and limit reservations to 80% of query memory (same as BufferedBlockMgr). The query's initial reservation is computed in the planner, claimed centrally (managed by the InitialReservations class) and distributed to query operators from there. min_spillable_buffer_size and default_spillable_buffer_size query options control the buffer size that the planner selects for spilling operators. Port ExecNodes to use BufferPool: * Each ExecNode has to claim its reservation during Open() * Port Sorter to use BufferPool. * Switch from BufferedTupleStream to BufferedTupleStreamV2 * Port HashTable to use BufferPool via a Suballocator. This also makes PAGG memory consumption more efficient (avoid wasting buffers) and improve the spilling algorithm: * Allow preaggs to execute with 0 reservation - if streams and hash tables cannot be allocated, it will pass through rows. * Halve the buffer requirement for spilling aggs - avoid allocating buffers for aggregated and unaggregated streams simultaneously. * Rebuild spilled partitions instead of repartitioning (IMPALA-2708) TODO in follow-up patches: * Rename BufferedTupleStreamV2 to BufferedTupleStream * Implement max_row_size query option. Testing: * Updated tests to reflect new memory requirements Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e Reviewed-on: http://gerrit.cloudera.org:8080/5801 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-08-05 01:03:02 +00:00
Tim Armstrong	64fd0115e5	IMPALA-4862: make resource profile consistent with backend behaviour This moves away from the PipelinedPlanNodeSet approach of enumerating sets of concurrently-executing nodes because unions would force creating many overlapping sets of nodes. The new approach computes the peak resources during Open() and the peak resources between Open() and Close() (i.e. while calling GetNext()) bottom-up for each plan node in a fragment. The fragment resources are then combined to produce the query resources. The basic assumptions for the new resource estimates are: * resources are acquired during or after the first call to Open() and released in Close(). * Blocking nodes call Open() on their child before acquiring their own resources (this required some backend changes). * Blocking nodes call Close() on their children before returning from Open(). * The peak resource consumption of the query is the sum of the independent fragments (except for the parallel join build plans where we can assume there will be synchronisation). This is conservative but we don't synchronise fragment Open() and Close() across exchanges so can't make stronger assumptions in general. Also compute the sum of minimum reservations. This will be useful in the backend to determine exactly when all of the initial reservations have been claimed from a shared pool of initial reservations. Testing: * Updated planner tests to reflect behavioural changes. * Added extra resource requirement planner tests for unions, subplans, pipelines of blocking operators, and bushy join plans. * Added single-node plans to resource-requirements tests. These have more complex plan trees inside a single fragment, which is useful for testing the peak resource requirement logic. Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d Reviewed-on: http://gerrit.cloudera.org:8080/7223 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-07-12 01:17:24 +00:00
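The bottom-up computation described in the IMPALA-4862 message can be illustrated with a small sketch. This is not Impala's planner code: `PlanNode`, its `mem` field, and `peak_resources` are hypothetical names, the model covers only single-child chains, and the treatment of streaming nodes (own resources added to the child's peak in both phases) is a conservative assumption. The blocking-node rule follows the stated assumptions: the child is opened before the node acquires its own resources, and the child is closed before Open() returns.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PlanNode:
    """Hypothetical plan node; `mem` is the node's own resource need."""
    name: str
    mem: int
    blocking: bool = False
    child: Optional["PlanNode"] = None


def peak_resources(node: PlanNode) -> Tuple[int, int]:
    """Return (peak during Open(), peak between Open() and Close())."""
    if node.child is None:
        # Leaf: holds only its own resources in both phases.
        return node.mem, node.mem

    child_open, child_exec = peak_resources(node.child)
    if node.blocking:
        # Open() first opens the child (own resources not yet held), then
        # acquires own resources and drains the child via GetNext().
        open_peak = max(child_open, node.mem + child_exec)
        # The child is closed before Open() returns, so only own resources remain.
        exec_peak = node.mem
    else:
        # Streaming node (assumption): own resources coexist with the child's.
        open_peak = node.mem + child_open
        exec_peak = node.mem + child_exec
    return open_peak, exec_peak
```

Under this model, a blocking sort needing 50 units over a scan needing 100 peaks at 150 during Open() and drops to 50 afterwards, which is the kind of distinction the two per-node peaks are meant to capture.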
Tim Armstrong	9a29dfc91b	IMPALA-3748: minimum buffer requirements in planner Compute the minimum buffer requirement for spilling nodes and per-host estimates for the entire plan tree. This builds on top of the existing resource estimation code, which computes the sets of plan nodes that can execute concurrently. This is cleaned up so that the process of producing resource requirements is clearer. It also removes the unused VCore estimates. Fixes various bugs and other issues: * computeCosts() was not called for unpartitioned fragments, so the per-operator memory estimate was not visible. * Nested loop join was not treated as a blocking join. * The TODO comment about union was misleading * Fix the computation for mt_dop > 1 by distinguishing per-instance and per-host estimates. * Always generate an estimate instead of unpredictably returning -1/"unavailable" in many circumstances - there was little rhyme or reason to when this happened. * Remove the special "trivial plan" estimates. With the rest of the cleanup we generate estimates <= 10MB for those trivial plans through the normal code path. I left one bug (IMPALA-4862) unfixed because it is subtle, will affect estimates for many plans and will be easier to review once we have the test infra in place. Testing: Added basic planner tests for resource requirements in both the MT and non-MT cases. Re-enabled the explain_level tests, which appears to be the only coverage for many of these estimates. Removed the complex and brittle test cases and replaced with a couple of much simpler end-to-end tests. Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37 Reviewed-on: http://gerrit.cloudera.org:8080/5847 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-04-18 20:36:08 +00:00
Nong Li	5d903efca3	ExecSummary The runtime profile as we present it is not very useful and I think the structure of it makes it hard to consume. This patch adds a new client facing schemed set of counters that are collected from the runtime profiles. For example, with this structure it would be easy to have the shell get the stats of a running query and print a useful progress report or to check the most relevant metrics for diagnosing issues. Here's an example of the output for one of the tpch queries:
Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------
09:MERGING-EXCHANGE        1   79.738us   79.738us        5           5         0        -1.00 B  UNPARTITIONED
05:TOP-N                   3   84.693us   88.810us        5           5  12.00 KB       120.00 B
04:AGGREGATE               3    5.263ms    6.432ms        5           5  44.00 KB       10.00 MB  MERGE FINALIZE
08:AGGREGATE               3   16.659ms   27.444ms   52.52K     600.12K   3.20 MB       15.11 MB  MERGE
07:EXCHANGE                3    2.644ms      5.1ms   52.52K     600.12K         0              0  HASH(o_orderpriority)
03:AGGREGATE               3  342.913ms  966.291ms   52.52K     600.12K  10.80 MB       15.11 MB
02:HASH JOIN               3    2s165ms    2s171ms  144.87K     600.12K  13.63 MB      941.01 KB  INNER JOIN, BROADCAST
|--06:EXCHANGE             3    8.296ms    8.692ms   57.22K      15.00K         0              0  BROADCAST
|  01:SCAN HDFS            2    1s412ms    1s978ms   57.22K      15.00K  24.21 MB      176.00 MB  tpch.orders o
00:SCAN HDFS               3    8s032ms    8s558ms    3.79M     600.12K  32.29 MB      264.00 MB  tpch.lineitem l
Change-Id: Iaad4b9dd577c375006313f19442bee6d3e27246a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2964 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-06-11 03:10:11 -07:00
Alex Behm	15e05082c0	IMPALA-831: Distributed aggregation and top-n over unions. Change-Id: I056e8271421008378db93e8b2393861cc9dd4b90 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1840 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1886	2014-03-13 15:42:31 -07:00
Alex Behm	7fcd7cd64e	Add list of tables missing stats to explain header and mem-limit exceeded error. Change-Id: Ibe8f329d5513ae84a8134b9ddb3645fa174d8a66 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1501 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1880	2014-03-12 21:15:22 -07:00
Alex Behm	58950a52a3	IMPALA-798: Distributed execution of CTAS and explain CTAS. Change-Id: I32004a4b31c54cf5c185169fece143a61213d12d Reviewed-on: http://gerrit.ent.cloudera.com:8080/1850 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1867	2014-03-12 16:51:50 -07:00
Alex Behm	69a840d965	Consistent memory estimates for explain tests. Our new build machines (e.g., beefy) have more cores than our other machines, so scan nodes may have a different memory estimate causing the explain tests to fail. This patch fixes the num_scanner_threads to 1 for explain tests to ensure consistent estimates. Change-Id: Ie6194f3c3b17d04aa141d04fcddb7ac948e92fcf Reviewed-on: http://gerrit.ent.cloudera.com:8080/1735 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1753 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-03-05 05:38:30 -08:00
Lenni Kuff	95404d4888	Support prioritized background table loading The overall goal of this change is to allow table metadata to be loaded in the background but also to allow prioritization of loading on an as-needed basis. As part of analysis, any tables that are not loaded are tracked and if analysis fails the Impalad will make an RPC to the CatalogServer to request that the metadata loading of these tables be prioritized and analysis will be restarted. To support this, the CatalogServer now has a deque of the tables to load. For background loading, tables to load are added to the tail of the deque. However, a new CatalogServer RPC was added that can prioritize the loading of one or more tables in which case they will get added to the head of the deque. The next table to load is always taken from the head. This helps prioritize loading but is admittedly not the most fair approach. To support the prioritized loading, some changes had to be made on the Impalad side during analysis: - During analysis, any tables that are missing metadata are tracked. - Analysis now runs in a loop. If it fails due to an AnalysisException AND at least 1 table/view was missing metadata, these tables missing metadata are requested to be loaded by calling the CatalogServer. - The impalad will wait until the required tables are received (by getting notified each time there is a call to updateCatalog()), and will wait to run analysis until all tables are available. Once the tables are available, analysis will restart. This change also introduces two new flags: --load_catalog_in_background (bool). When this is true (the default) the catalog server will run a periodic background thread to queue all unloaded tables for loading. This is generally the desired behavior, but there may be some cases (very large metastores) where this may need to be disabled. --num_metadata_loading_threads (int32). The number of threads to use when loading catalog metadata (degree of parallelism). The default is 16, but it can be increased to improve performance at the cost of stressing the Hive metastore/HDFS. Change-Id: Ib94dbbf66ffcffea8c490f50f5c04d19fb2078ad Reviewed-on: http://gerrit.ent.cloudera.com:8080/1476 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1538	2014-02-13 23:43:06 -08:00
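The deque scheme in the prioritized-loading message (background requests go to the tail, prioritized requests go to the head, the loader always takes from the head) can be sketched as follows. This is an illustration only, not the CatalogServer implementation: the class and method names are hypothetical, and moving an already-queued table to the head rather than queuing it twice is an assumption the commit message does not state.

```python
from collections import deque
from typing import Iterable, Optional


class TableLoadQueue:
    """Hypothetical stand-in for the CatalogServer's deque of tables to load."""

    def __init__(self) -> None:
        self._pending = deque()

    def queue_background(self, table: str) -> None:
        # Background loading: append to the tail of the deque.
        if table not in self._pending:
            self._pending.append(table)

    def prioritize(self, tables: Iterable[str]) -> None:
        # Prioritized loading: push to the head. Iterating in reverse keeps
        # the caller's order among the prioritized tables.
        for table in reversed(list(tables)):
            if table in self._pending:
                self._pending.remove(table)  # assumption: move, don't duplicate
            self._pending.appendleft(table)

    def next_table(self) -> Optional[str]:
        # The next table to load is always taken from the head.
        return self._pending.popleft() if self._pending else None
```

Because every prioritized request jumps ahead of all background work, a steady stream of them can delay background tables indefinitely, which matches the message's remark that the approach is not the most fair.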
Alex Behm	6799c93922	Simplified/enhanced explain plans with a total of four explain levels. There are now 4 explain levels summarized as follows: - Level 0: MINIMAL Non-fragmented parallel plan only showing plan nodes with minimal attributes - Level 1: STANDARD Non-fragmented parallel plan with some details in plan nodes - Level 2: EXTENDED Non-fragmented parallel plan with full details in plan nodes including the table/column stats, row size, #hosts, cardinality, and estimated per-host memory requirement - Level 3: VERBOSE Fragmented parallel plan with full details (like level 2) This patch also includes several bugfixes related to plan costing and/or testing of explain plans. Change-Id: I622310f01d1b3d53ea1031adaf3b3ffdd94eba30 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1211 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-10 19:17:59 -08:00

11 Commits