impala

mirror of https://github.com/apache/impala.git synced 2025-12-21 02:48:14 -05:00

Author	SHA1	Message	Date
Bikramjeet Vig	06c9016a37	IMPALA-8762: Track host level admission stats across all coordinators This patch adds the ability to share the per-host stats for locally admitted queries across all coordinators. This helps to get a more consolidated view of the cluster for stats like slots_in_use and mem_admitted when making local admission decisions. Testing: Added e2e py test Change-Id: I2946832e0a89b077d0f3bec755e4672be2088243 Reviewed-on: http://gerrit.cloudera.org:8080/17683 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-07-28 05:33:16 +00:00
Joe McDonnell	1d63348b93	IMPALA-9549: Handle catalogd startup delays when using local catalog Impalads should be tolerant of delays in catalogd startup. Currently, when running with the local catalog (use_local_catalog=true), impalad startup can fail when catalogd startup is delayed. What happens is that ImpalaServer's constructor calls ImpalaServer::UpdateCatalogMetrics(), which maintains two metrics counting the number of tables and databases. This is before the code in ImpalaServer::Start() that waits for the catalogd to start (added by IMPALA-4704), so there is no guarantee that catalogd is running. The UpdateCatalogMetrics() call ends up calling getDbs() in the frontend catalog. LocalCatalog::getDbs() tries to load the databases (and thus contact catalogd), and this call will fail if catalogd is not running. This fails startup. use_local_catalog=false is immune to this only because it does not contact catalogd in Catalog::getDbs(). This moves the UpdateCatalogMetrics() call from the ImpalaServer constructor to ImpalaServer::Start() after the impalad has already waited for the catalogd to start up. It also limits the call to run only in coordinators. Prior to this change, when using local catalog, the executors would have catalog.num-databases and catalog.num-tables set to the right values at startup. These values would not be kept up to date. With this change, the executors do not have these values set. Without local catalog, both before and after this change, executors do not have accurate counts for catalog.num-databases or catalog.num-tables. Testing: - Added a test to custom_cluster.test_catalog_wait to delay catalogd start up by 60 seconds and verify that the impalads successfully start up. This test fails prior to this change. - Hand tested to verify that the metrics that are maintained by UpdateCatalogMetrics() are not meaningfully changed for coordinators and that executors do not have metrics set. 
Change-Id: I1b5a94c59faaaa25927a169dcb58f310ce6b1044 Reviewed-on: http://gerrit.cloudera.org:8080/15561 Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-03-27 03:41:53 +00:00
Lars Volker	2397ae5590	IMPALA-8484: Run queries on disjoint executor groups This change adds support for running queries inside a single admission control pool on one of several, disjoint sets of executors called "executor groups". Executors can be configured with an executor group through the newly added '--executor_groups' flag. Note that in anticipation of future changes, the flag already uses the plural form, but only a single executor group may be specified for now. Each executor group specification can optionally contain a minimum size, separated by a ':', e.g. --executor_groups default-pool-1:3. Only when the cluster membership contains at least that number of executors for the group will it be considered for admission. Executor groups are mapped to resource pools by their name: An executor group can service queries from a resource pool if the pool name is a prefix of the group name separated by a '-'. For example, queries in pool poolA can be serviced by executor groups named poolA-1 and poolA-2, but not by groups named foo or poolB-1. During scheduling, executor groups are considered in alphabetical order. This means that one group is filled up entirely before a subsequent group is considered for admission. Groups also need to pass a health check before being considered. In particular, they must contain at least the minimum number of executors specified. If no group is specified during startup, executors are added to the default executor group. If - during admission - no executor group for a pool can be found and the default group is non-empty, then the default group is considered. The default group does not have a minimum size. This change inverts the order of scheduling and admission. Prior to this change, queries were scheduled before submitting them to the admission controller. Now the admission controller computes schedules for all candidate executor groups before each admission attempt. 
If the cluster membership has not changed, then the schedules of the previous attempt will be reused. This means that queries will no longer fail if the cluster membership changes while they are queued in the admission controller. This change also alters the default behavior when using a dedicated coordinator and no executors have registered yet. Prior to this change, a query would fail immediately with an error ("No executors registered in group"). Now a query will get queued and wait until executors show up, or it times out after the pool's queue timeout period. Testing: This change adds a new custom cluster test for executor groups. It makes use of new capabilities added to start-impala-cluster.py to bring up additional executors into an already running cluster. Additionally, this change adds an instructional implementation of executor group based autoscaling, which can be used during development. It also adds a helper to run queries concurrently. Both are used in a new test to exercise the executor group logic and to prevent regressions to these tools. In addition to these tests, the existing tests for the admission controller (both BE and EE tests) thoroughly exercise the changed code. Some of them required changes themselves to reflect the new behavior. I looped the new tests (test_executor_groups and test_auto_scaling) for a night (110 iterations each) without any issues. I also started an autoscaling cluster with a single group and ran TPC-DS, TPC-H, and test_queries on it successfully. Known limitations: When using executor groups, only a single coordinator and a single AC pool (i.e. the default pool) are supported. Executors do not include the number of currently running queries in their statestore updates and so admission controllers are not aware of the number of queries admitted by other controllers per host. 
Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b Reviewed-on: http://gerrit.cloudera.org:8080/13550 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-07-21 04:54:03 +00:00
Vuk Ercegovac	6769220e28	IMPALA-6198: marks a test as debug-only The test_catalog_wait test uses flags that are only compiled for debug binaries. This change marks the test as debug-only so that it does not break release tests. Change-Id: I92640b8192545cccea0411c04cc5fcf59fbefed0 Reviewed-on: http://gerrit.cloudera.org:8080/8573 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-16 12:04:17 +00:00
Vuk Ercegovac	6a2b7a64fb	IMPALA-4704: Turns on client connections when local catalog initialized. Currently, impalad starts beeswax and hs2 servers even if the catalog has not yet been initialized. As a result, client connections see an error message stating that the impalad is not yet ready. This patch changes the impalad startup sequence to wait until the catalog is received before opening beeswax and hs2 ports and starting their servers. Testing: - python e2e tests that start a cluster without a catalog and check that client connections are rejected as expected. Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6 Reviewed-on: http://gerrit.cloudera.org:8080/8202 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-13 21:14:14 +00:00

5 Commits
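
The pool-to-group mapping described in the IMPALA-8484 commit message can be illustrated with a minimal Python sketch. This is not Impala's implementation: the function names, the `DEFAULT_GROUP` constant, and the default minimum size of 1 when no ':' suffix is given are assumptions made for illustration. The fallback rule is also an interpretation: the sketch uses the default group when no healthy matching group is found and the default group has at least one executor.

```python
from typing import Dict, List, Tuple

# Hypothetical name for the default executor group (assumption, not from the commit).
DEFAULT_GROUP = "default"


def parse_group_spec(spec: str) -> Tuple[str, int]:
    """Split 'name[:min_size]' into (name, min_size).

    Assumption: a spec without ':' gets a minimum size of 1.
    """
    name, sep, size = spec.partition(":")
    return name, int(size) if sep else 1


def group_serves_pool(group: str, pool: str) -> bool:
    """A group serves a pool if the pool name followed by '-' prefixes the group name."""
    return group.startswith(pool + "-")


def candidate_groups(pool: str,
                     live_executors: Dict[str, int],
                     min_sizes: Dict[str, int]) -> List[str]:
    """Return healthy groups for a pool in alphabetical order.

    live_executors maps group name -> number of registered executors.
    min_sizes maps group name -> required minimum size (missing means 1).
    Falls back to the default group if no healthy match exists and the
    default group is non-empty (the default group has no minimum size).
    """
    healthy = sorted(
        group for group, count in live_executors.items()
        if group_serves_pool(group, pool) and count >= min_sizes.get(group, 1)
    )
    if not healthy and live_executors.get(DEFAULT_GROUP, 0) > 0:
        return [DEFAULT_GROUP]
    return healthy
```

With the example names from the commit message, `poolA-1` and `poolA-2` match pool `poolA`, while `foo` and `poolB-1` do not. Sorting the candidates mirrors the statement that groups are considered in alphabetical order, so an earlier group is tried before a later one.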