impala

mirror of https://github.com/apache/impala.git synced 2025-12-23 21:08:39 -05:00

Author	SHA1	Message	Date
Bikramjeet Vig	d99bc14936	IMPALA-8858: Add metrics tracking num queries running on executor groups With this patch, all executor groups with at least one executor will have a metric added that displays the number of queries (admitted by the local coordinator) running on them. The metric is removed only when the group has no executors in it. It gets updated when either the cluster membership changes or a query gets admitted or released by the admission controller. Also adds the ability to delete metrics from a metric group after registration. Testing: - Added a custom cluster test and a BE metric test. - Had to modify some metric tests that relied on ordering of metrics by their name. Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Reviewed-on: http://gerrit.cloudera.org:8080/14103 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-09-20 02:43:58 +00:00
Lars Volker	576f205bff	IMPALA-8936: Improve queuing reason for unhealthy executor groups In some situations, users might actually expect not having a healthy executor group around, e.g. when they're starting one and it takes a while to come online. This change makes the queuing reason more generic and drops the "unhealthy" concept from it to reduce confusion. Change-Id: Idceab7fb56335bab9d787b0f351a41e6efd7dd59 Reviewed-on: http://gerrit.cloudera.org:8080/14210 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-09-11 08:04:05 +00:00
Sahil Takiar	df365a8e30	IMPALA-8803: Coordinator should release admitted memory per-backend Changes the Coordinator to release admitted memory when each Backend completes, rather than waiting for the entire query to complete before releasing admitted memory. When the Coordinator detects that a Backend has completed (via ControlService::ReportExecStatus) it updates the state of the Backend in Coordinator::BackendResourceState. BackendResourceState tracks the state of the admitted resources for each Backend and decides when the resources for a group of Backend states should be released. BackendResourceState defines a state machine to help coordinate the state of the admitted memory for each Backend. It guarantees that by the time the query is shutdown, all Backends release their admitted memory. BackendResourceState implements three rules to control the rate at which the Coordinator releases admitted memory from the AdmissionController: * Resources are released at most once every 1 second, this prevents short lived queries from causing high load on the admission controller * Resources are released at most O(log(num_backends)) times; the BackendResourceStates can release multiple BackendStates from the AdmissionController at a time * All pending resources are released if the only remaining Backend is the Coordinator Backend; this is useful for result spooling where all Backends may complete, except for the Coordinator Backend Exposes the following hidden startup flags to help tune the heuristics above: --batched_release_decay_factor * Defaults to 2 * Controls the base value for the O(log(num_backends)) bound when batching the release of Backends. --release_backend_states_delay_ms * Defaults to 1000 milliseconds * Controls how often Backends can release their resources. Testing: * Ran core tests * Added new tests to test_result_spooling.py and test_admission_controller.py Change-Id: I88bb11e0ede7574568020e0277dd8ac8d2586dc9 Reviewed-on: http://gerrit.cloudera.org:8080/14104 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-09-07 21:15:52 +00:00
Bikramjeet Vig	b5193f36de	IMPALA-8806: Add metrics to improve observability of executor groups This patch adds 3 metrics under a new metric group called "cluster-membership" that keep track of the number of executor groups that have at least one live executor, number of executor groups that are in a healthy state and the number of backends registered with the statestore. Testing: Modified tests to use these metrics for verification. Change-Id: I7745ea1c7c6778d3fb5e59adbc873697beb0f3b9 Reviewed-on: http://gerrit.cloudera.org:8080/13979 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-08-06 02:33:59 +00:00
Lars Volker	2397ae5590	IMPALA-8484: Run queries on disjoint executor groups This change adds support for running queries inside a single admission control pool on one of several, disjoint sets of executors called "executor groups". Executors can be configured with an executor group through the newly added '--executor_groups' flag. Note that in anticipation of future changes, the flag already uses the plural form, but only a single executor group may be specified for now. Each executor group specification can optionally contain a minimum size, separated by a ':', e.g. --executor_groups default-pool-1:3. Only when the cluster membership contains at least that number of executors for the groups will it be considered for admission. Executor groups are mapped to resource pools by their name: An executor group can service queries from a resource pool if the pool name is a prefix of the group name separated by a '-'. For example, queries in poll poolA can be serviced by executor groups named poolA-1 and poolA-2, but not by groups name foo or poolB-1. During scheduling, executor groups are considered in alphabetical order. This means that one group is filled up entirely before a subsequent group is considered for admission. Groups also need to pass a health check before considered. In particular, they must contain at least the minimum number of executors specified. If no group is specified during startup, executors are added to the default executor group. If - during admission - no executor group for a pool can be found and the default group is non-empty, then the default group is considered. The default group does not have a minimum size. This change inverts the order of scheduling and admission. Prior to this change, queries were scheduled before submitting them to the admission controller. Now the admission controller computes schedules for all candidate executor groups before each admission attempt. If the cluster membership has not changed, then the schedules of the previous attempt will be reused. This means that queries will no longer fail if the cluster membership changes while they are queued in the admission controller. This change also alters the default behavior when using a dedicated coordinator and no executors have registered yet. Prior to this change, a query would fail immediately with an error ("No executors registered in group"). Now a query will get queued and wait until executors show up, or it times out after the pools queue timeout period. Testing: This change adds a new custom cluster test for executor groups. It makes use of new capabilities added to start-impala-cluster.py to bring up additional executors into an already running cluster. Additionally, this change adds an instructional implementation of executor group based autoscaling, which can be used during development. It also adds a helper to run queries concurrently. Both are used in a new test to exercise the executor group logic and to prevent regressions to these tools. In addition to these tests, the existing tests for the admission controller (both BE and EE tests) thoroughly exercise the changed code. Some of them required changes themselves to reflect the new behavior. I looped the new tests (test_executor_groups and test_auto_scaling) for a night (110 iterations each) without any issues. I also started an autoscaling cluster with a single group and ran TPC-DS, TPC-H, and test_queries on it successfully. Known limitations: When using executor groups, only a single coordinator and a single AC pool (i.e. the default pool) are supported. Executors to not include the number of currently running queries in their statestore updates and so admission controllers are not aware of the number of queries admitted by other controllers per host. Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b Reviewed-on: http://gerrit.cloudera.org:8080/13550 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-07-21 04:54:03 +00:00

1 2

55 Commits