Commit Graph

10 Commits

Author SHA1 Message Date
Bikramjeet Vig
45fb0fb3e7 IMPALA-10397: De-flake test_single_workload
This patch removes a flaky part of the test that relies on query
completion rate. Since we already verify that the number of healthy
executor groups increases, this additional check does not add much to
the test.

Change-Id: I6f75afdbe676d9dd6922b6ba8aa1919daa161947
Reviewed-on: http://gerrit.cloudera.org:8080/17239
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-30 23:46:48 +00:00
Bikramjeet Vig
0b79464d9c IMPALA-10397: Fix test_single_workload
The logs on failed runs indicated that the autoscaler never started
another cluster. This can only happen if it never notices a queued
query, which is possible since this test was only failing in release
builds. This patch increases the runtime of the sample query to
make execution more predictable.
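
As an illustration of the approach (not the test's actual workload), a
query can be made to run longer by forcing Impala's sleep() builtin to
be evaluated per row; the table, duration, and client handle below are
placeholders:

    def run_slow_query(client):
        # Hypothetical sketch: lengthen the sample query's runtime so the
        # autoscaler reliably observes it queued. sleep(ms) is an Impala
        # builtin; the table and duration are placeholders rather than the
        # test's actual query.
        return client.execute(
            "select count(*) from functional.alltypes where bool_col = sleep(5)")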

Testing:
Looped the test locally on a release build.

Change-Id: Ide3c7fb4509ce9a797b4cbdd141b2a319b923d4e
Reviewed-on: http://gerrit.cloudera.org:8080/17218
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-25 22:59:16 +00:00
Bikramjeet Vig
f888d36295 IMPALA-10397: Reduce flakiness in test_single_workload
This test failed recently due to a timeout waiting for executors to
come up. The logs showed that the executors came up on time, but this
was not recognized by the coordinator. This patch attempts to reduce
flakiness by increasing the timeout and adding more logging in case
this happens again in the future.

Testing:
Ran the test in a loop locally for a few hours.

Change-Id: I73ea5eb663db6d03832b19ed323670590946f514
Reviewed-on: http://gerrit.cloudera.org:8080/17028
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-10 06:05:51 +00:00
Lars Volker
ddd07e1713 IMPALA-8867: Further deflake test_auto_scaling
A previous attempt to deflake this test by lowering the threshold for
multi-group throughput had improved things, but we still saw another
occurrence of test_auto_scaling failing recently. This change lowers the
threshold even further to try to eradicate the flakiness. Inspecting
the logs of the failed run showed that the new threshold would have
prevented the failure.

Change-Id: I29808982cc6226152c544cb99f76961b582975a7
Reviewed-on: http://gerrit.cloudera.org:8080/14740
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-11-26 02:03:41 +00:00
Sahil Takiar
df365a8e30 IMPALA-8803: Coordinator should release admitted memory per-backend
Changes the Coordinator to release admitted memory when each Backend
completes, rather than waiting for the entire query to complete before
releasing admitted memory. When the Coordinator detects that a Backend
has completed (via ControlService::ReportExecStatus) it updates the
state of the Backend in Coordinator::BackendResourceState.
BackendResourceState tracks the state of the admitted resources for
each Backend and decides when the resources for a group of Backend
states should be released. BackendResourceState defines a state machine
to help coordinate the state of the admitted memory for each Backend.
It guarantees that by the time the query is shut down, all Backends
release their admitted memory.

BackendResourceState implements three rules to control the rate at
which the Coordinator releases admitted memory from the
AdmissionController (a sketch of these heuristics follows the flag
descriptions below):
* Resources are released at most once per second; this prevents
short-lived queries from causing high load on the admission controller.
* Resources are released at most O(log(num_backends)) times; the
BackendResourceState can release multiple BackendStates from the
AdmissionController at a time.
* All pending resources are released if the only remaining Backend is
the Coordinator Backend; this is useful for result spooling, where all
Backends may complete except for the Coordinator Backend.

Exposes the following hidden startup flags to help tune the heuristics
above:

--batched_release_decay_factor
* Defaults to 2
* Controls the base value for the O(log(num_backends)) bound when
batching the release of Backends.

--release_backend_states_delay_ms
* Defaults to 1000 milliseconds
* Controls how often Backends can release their resources.
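
A minimal Python model of how the three rules and these two flags could
interact is sketched below. It is illustrative only: the real logic
lives in the C++ BackendResourceState class, and its state machine is
simplified away here.

    import math
    import time

    class BatchedReleaseModel(object):
        """Illustrative model of the release heuristics, not the real code."""

        def __init__(self, num_backends, decay_factor=2, delay_ms=1000):
            # decay_factor models --batched_release_decay_factor and
            # delay_ms models --release_backend_states_delay_ms.
            self.delay_s = delay_ms / 1000.0
            # Rule 2: at most O(log(num_backends)) release batches overall.
            self.max_batches = max(
                1, int(math.log(max(num_backends, 2), decay_factor)))
            self.batches_released = 0
            self.last_release_time = float("-inf")
            self.pending = []  # completed Backends still holding admitted memory

        def backend_completed(self, backend, only_coordinator_left=False):
            """Returns the Backends whose admitted memory gets released now."""
            self.pending.append(backend)
            if only_coordinator_left:
                return self._release_pending()  # rule 3: coordinator-only
            if time.time() - self.last_release_time < self.delay_s:
                return []                       # rule 1: rate-limit releases
            if self.batches_released >= self.max_batches:
                return []                       # rule 2: batch budget used up
            self.last_release_time = time.time()
            self.batches_released += 1
            return self._release_pending()

        def query_shutdown(self):
            """At query shutdown any remaining admitted memory is released."""
            return self._release_pending()

        def _release_pending(self):
            batch, self.pending = self.pending, []
            return batch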

Testing:
* Ran core tests
* Added new tests to test_result_spooling.py and
test_admission_controller.py

Change-Id: I88bb11e0ede7574568020e0277dd8ac8d2586dc9
Reviewed-on: http://gerrit.cloudera.org:8080/14104
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-07 21:15:52 +00:00
Lars Volker
cebed8b88a IMPALA-8867: Deflake test_auto_scaling, improve logging
test_auto_scaling sometimes failed to reach the required query rate
within the expected time. This change increases how long we wait for
the query rate to increase. It also adds logging for the maximum
query rate observed to help with debugging in case the issue still
occurs.
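
A minimal sketch of that pattern, assuming a hypothetical
get_query_rate() accessor and placeholder target/timeout values; the
test's real helper and thresholds may differ:

    import time

    def wait_for_query_rate(get_query_rate, target_rate, timeout_s,
                            interval_s=1):
        """Poll the observed query rate; on timeout, report the maximum seen."""
        max_rate = 0.0
        deadline = time.time() + timeout_s
        while time.time() < deadline:
            rate = get_query_rate()
            max_rate = max(max_rate, rate)
            if rate >= target_rate:
                return max_rate
            time.sleep(interval_s)
        raise AssertionError("Query rate never reached %.1f; maximum observed "
                             "rate was %.1f" % (target_rate, max_rate))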

Testing: I looped this for a day on my desktop box and did not observe
any issues.

Change-Id: I22c43618a40ff197784add69223359e23fa1bdec
Reviewed-on: http://gerrit.cloudera.org:8080/14074
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-16 01:50:34 +00:00
Tim Armstrong
4fb8e8e324 IMPALA-8816: reduce custom cluster test runtime in core
This includes some optimisations and a bulk move of tests
to exhaustive.

Move a bunch of custom cluster tests to exhaustive. I selected
these partially based on runtime (i.e. I looked most carefully
at the tests that ran for over a minute) and the likelihood
of them catching a precommit bug. Regression tests for specific
edge cases and tests for parts of the code that are very stable
were prime candidates.

Remove an unnecessary cluster restart in test_breakpad.

Merge test_scheduler_error into test_failpoints to avoid an unnecessary
cluster restart.

Speed up cluster starts by ensuring that the default statestore args are
applied even when _start_impala_cluster() is called directly. This
shaves a couple of seconds off each restart. We made the default args
use a faster update frequency - see IMPALA-7185 - but they did not
take effect in all tests.
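
A rough sketch of the idea, assuming the statestore flag from
IMPALA-7185 and a simplified helper; the actual default value and the
plumbing inside the test framework may differ:

    # Assumed fast statestore update frequency (IMPALA-7185); the value the
    # test framework really uses is not shown here.
    DEFAULT_STATESTORE_ARGS = "--statestore_update_frequency_ms=50"

    def build_statestore_args(user_args=""):
        """Fold the fast defaults in even for direct callers of
        _start_impala_cluster(), so every restart gets quick membership
        updates."""
        return ("%s %s" % (DEFAULT_STATESTORE_ARGS, user_args)).strip()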

Change-Id: Ib2e3e7ebc9695baec4d69183387259958df10f62
Reviewed-on: http://gerrit.cloudera.org:8080/13967
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-06 21:34:26 +00:00
Bikramjeet Vig
b5193f36de IMPALA-8806: Add metrics to improve observability of executor groups
This patch adds three metrics under a new metric group called
"cluster-membership" that track the number of executor groups with at
least one live executor, the number of executor groups in a healthy
state, and the number of backends registered with the statestore.
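
For illustration, a test could verify cluster state through these
metrics roughly as below. The metric keys are plausible names under the
new group but are assumptions, and get_metric_value() stands in for the
test framework's accessor:

    # Assumed metric keys under the "cluster-membership" group.
    TOTAL_GROUPS = "cluster-membership.executor-groups.total"
    HEALTHY_GROUPS = "cluster-membership.executor-groups.total-healthy"
    LIVE_BACKENDS = "cluster-membership.backends.total"

    def assert_membership(get_metric_value, expected_backends):
        assert get_metric_value(HEALTHY_GROUPS) >= 1, "no healthy executor group"
        assert get_metric_value(TOTAL_GROUPS) >= get_metric_value(HEALTHY_GROUPS)
        assert get_metric_value(LIVE_BACKENDS) == expected_backends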

Testing:
Modified tests to use these metrics for verification.

Change-Id: I7745ea1c7c6778d3fb5e59adbc873697beb0f3b9
Reviewed-on: http://gerrit.cloudera.org:8080/13979
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-06 02:33:59 +00:00
Lars Volker
6dd36d9d1b IMPALA-8798: Skip TestAutoScaling on EC runs
TestAutoScaling uses ConcurrentWorkload, which does not set the
required query options to scan erasure-coded files. This change
disables the affected test when running with erasure-coding enabled.
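
A minimal sketch of such a skip, assuming a pytest-style marker and an
environment flag that signals an erasure-coded HDFS; the real suite may
use its own SkipIf helper instead:

    import os
    import pytest

    # Assumption: the test environment exposes whether HDFS erasure coding is
    # enabled; the variable name here is a placeholder.
    IS_EC = os.environ.get("ERASURE_CODING", "false") == "true"

    @pytest.mark.skipif(IS_EC, reason="ConcurrentWorkload does not set the "
                                      "query options needed for EC files")
    def test_auto_scaling():
        ...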

Change-Id: I96690914f12679619ba96d79edde8af9cccf65fa
Reviewed-on: http://gerrit.cloudera.org:8080/13935
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2019-07-27 02:42:28 +00:00
Lars Volker
2397ae5590 IMPALA-8484: Run queries on disjoint executor groups
This change adds support for running queries inside a single admission
control pool on one of several disjoint sets of executors called
"executor groups".

Executors can be configured with an executor group through the newly
added '--executor_groups' flag. Note that in anticipation of future
changes, the flag already uses the plural form, but only a single
executor group may be specified for now. Each executor group
specification can optionally contain a minimum size, separated by a
':', e.g. --executor_groups default-pool-1:3. Only when the cluster
membership contains at least that number of executors for the group
will it be considered for admission.
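
For illustration, a specification of the form "name[:min_size]" could
be interpreted as sketched below; this mirrors the description above
rather than the actual backend parsing code:

    def parse_executor_group(spec):
        """Split an --executor_groups value into (name, minimum size).
        Treating a missing minimum size as 1 is an assumption here."""
        name, sep, min_size = spec.partition(":")
        return name, int(min_size) if sep else 1

    # parse_executor_group("default-pool-1:3") -> ("default-pool-1", 3)
    # parse_executor_group("default-pool-1")   -> ("default-pool-1", 1)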

Executor groups are mapped to resource pools by their name: An executor
group can service queries from a resource pool if the pool name is a
prefix of the group name separated by a '-'. For example, queries in
pool poolA can be serviced by executor groups named poolA-1 and poolA-2,
but not by groups named foo or poolB-1.
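
A sketch of that mapping rule as a plain prefix check, assuming '-' is
the only separator involved; the actual matching code may handle more
cases:

    def group_services_pool(group_name, pool_name):
        """True if the pool name is a '-'-separated prefix of the group name."""
        return group_name.startswith(pool_name + "-")

    # group_services_pool("poolA-1", "poolA")  -> True
    # group_services_pool("poolA-2", "poolA")  -> True
    # group_services_pool("poolB-1", "poolA")  -> False
    # group_services_pool("foo", "poolA")      -> False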

During scheduling, executor groups are considered in alphabetical order.
This means that one group is filled up entirely before a subsequent
group is considered for admission. Groups also need to pass a health
check before being considered. In particular, they must contain at least the
minimum number of executors specified.

If no group is specified during startup, executors are added to the
default executor group. If - during admission - no executor group for a
pool can be found and the default group is non-empty, then the default
group is considered. The default group does not have a minimum size.

This change inverts the order of scheduling and admission. Prior to this
change, queries were scheduled before being submitted to the admission
controller. Now the admission controller computes schedules for all
candidate executor groups before each admission attempt. If the cluster
membership has not changed, then the schedules of the previous attempt
will be reused. This means that queries will no longer fail if the
cluster membership changes while they are queued in the admission
controller.

This change also alters the default behavior when using a dedicated
coordinator and no executors have registered yet. Prior to this change,
a query would fail immediately with an error ("No executors registered
in group"). Now a query will get queued and wait until executors show
up, or it times out after the pool's queue timeout period.

Testing:

This change adds a new custom cluster test for executor groups. It
makes use of new capabilities added to start-impala-cluster.py to bring
up additional executors into an already running cluster.

Additionally, this change adds an instructional implementation of
executor-group-based autoscaling, which can be used during development.
It also adds a helper to run queries concurrently. Both are used in a
new test to exercise the executor group logic and to prevent regressions
in these tools.

In addition to these tests, the existing tests for the admission
controller (both BE and EE tests) thoroughly exercise the changed code.
Some of them required changes themselves to reflect the new behavior.

I looped the new tests (test_executor_groups and test_auto_scaling) for
a night (110 iterations each) without any issues.

I also started an autoscaling cluster with a single group and ran
TPC-DS, TPC-H, and test_queries on it successfully.

Known limitations:

When using executor groups, only a single coordinator and a single AC
pool (i.e. the default pool) are supported. Executors do not include the
number of currently running queries in their statestore updates, and so
admission controllers are not aware of the number of queries admitted by
other controllers per host.

Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b
Reviewed-on: http://gerrit.cloudera.org:8080/13550
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-07-21 04:54:03 +00:00