Commit Graph

103 Commits

Author SHA1 Message Date
Qifan Chen
07a3e6e0df IMPALA-10992: Planner changes for estimated peak memory
This patch provides replan support for multiple executor group sets.
Each executor group set is associated with a distinct number of nodes
and a threshold for estimated memory per host in bytes that can be
denoted as [<group_name_prefix>:<#nodes>, <threshold>].

In the patch, a query of type EXPLAIN, QUERY or DML can be compiled
more than once. In each attempt, per-host memory is estimated and
compared with the threshold of an executor group set. If the estimated
memory is no more than the threshold, the iteration stops and the
final plan is determined; the executor group set whose threshold was
satisfied is selected to run the query.
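
The selection loop can be pictured with a small sketch (illustration
only, not the actual FE code; the ExecutorGroupSet record and the
compile_plan() callback are made up for this example):

    # Hypothetical sketch of the replan iteration described above.
    from collections import namedtuple

    ExecutorGroupSet = namedtuple(
        "ExecutorGroupSet", ["name", "num_nodes", "threshold_bytes"])

    def plan_for_group_sets(query, group_sets, compile_plan):
        """Compile 'query' against group sets in ascending threshold order
        and stop at the first whose per-host memory threshold is met."""
        ordered = sorted(group_sets, key=lambda g: g.threshold_bytes)
        plan = None
        for gs in ordered:
            # Each attempt replans for the group set's node count and
            # estimates per-host peak memory for that plan.
            plan = compile_plan(query, num_nodes=gs.num_nodes)
            if plan.per_host_mem_estimate <= gs.threshold_bytes:
                return gs, plan
        # No threshold satisfied: fall back to the largest group set.
        return ordered[-1], plan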

A new query option 'enable_replan', defaulting to 1 (enabled), is
added. It can be set to 0 to disable this feature and generate the
distributed plan for the default executor group.

To avoid long compilation times, the following enhancements are
enabled. Note that 1) can be disabled when a relevant metadata change
is detected.

 1. Authorization is performed only for the 1st compilation;
 2. openTransaction() is called for transactional queries in 1st
    compilation and the saved transactional info is used in
    subsequent compilations. Similar logic is applied to Kudu
    transactional queries.

To facilitate testing, the patch imposes an artificial setup with two
executor groups in the FE as follows.

 1. [regular:<#nodes>, 64MB]
 2. [large:<#nodes>, 8PB]

This setup is enabled when a new query option 'test_replan' is set
to 1 in backend tests, or RuntimeEnv.INSTANCE.isTestEnv() is true as
in most frontend tests. This query option is set to 0 by default.

Compilation time increases when a query is compiled in several
iterations, as shown below for several TPC-DS queries. The increase
is mostly due to redundant work in either the single-node plan
creation or the value transfer graph recomputation phase. For small
queries, the increase can be avoided if they can be compiled in a
single iteration by properly setting the smallest threshold among all
executor group sets. For example, for the set of queries listed below,
the smallest threshold can be set to 320MB to catch both q15 and q21
in one compilation.

                              Compilation time (ms)
 Query   Estimated memory   2 iterations   1 iteration   Increase
 q1            408MB            60.14          25.75      133.56%
 q11          1.37GB           261.00         109.61      138.11%
 q10a          519MB           139.24          54.52      155.39%
 q13           339MB           143.82          60.08      139.38%
 q14a         3.56GB           762.68         312.92      143.73%
 q14b         2.20GB           522.01         245.13      112.95%
 q15           314MB             9.73           4.28      127.33%
 q21           275MB            16.00           8.18       95.59%
 q23a         1.50GB           461.69         231.78       99.19%
 q23b         1.34GB           461.31         219.61      110.05%
 q4           2.60GB           218.05         105.07      107.52%
 q67          5.16GB           694.59         334.24      101.82%

Testing:
 1. Almost all FE and BE tests are now run in the artificial
    two-executor-group setup, except a few where a specific cluster
    configuration is desirable;
 2. Ran core tests successfully;
 3. Added a new observability test and a new query assignment test;
 4. Disabled the concurrent inserts test (test_concurrent_inserts) and
    the failing inserts test (test_failing_inserts) in local catalog
    mode due to flakiness, reported in IMPALA-11189 and IMPALA-11191.

Change-Id: I75cf17290be2c64fd4b732a5505bdac31869712a
Reviewed-on: http://gerrit.cloudera.org:8080/18178
Reviewed-by: Qifan Chen <qchen@cloudera.com>
Tested-by: Qifan Chen <qchen@cloudera.com>
2022-03-21 20:17:28 +00:00
Fucun Chu
157086cb80 IMPALA-10771: Add Tencent COS support
This patch adds support for COS (Cloud Object Storage). Using
hadoop-cos, the implementation is similar to other remote FileSystems.

New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.

Follow-up:
- Support for caching COS file handles will be addressed in
   IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   COS (IMPALA-10773).

Tests:
 - Upload hdfs test data to a COS bucket. Modify all locations in HMS
   DB to point to the COS bucket. Remove some hdfs caching params.
   Run CORE tests.

Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Reviewed-on: http://gerrit.cloudera.org:8080/17503
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-12-08 16:32:02 +00:00
Bikramjeet Vig
aa2302e53b IMPALA-10596: De-flake TestAdmissionControllerStress
Currently some tests in TestAdmissionControllerStress fail because
they rely on queries holding onto resources on backends until they
are explicitly closed. This was done by keeping each query alive by
fetching a small number of rows at intervals. After result spooling
was turned on by default, the queries started to finish early and
released their resources on other backends, which allowed queued
queries to be admitted. This patch turns off result spooling for
these queries to avoid that situation.

Testing:
Ran test in a loop on my local dev machine.

Change-Id: I1594a82f507db8dc094d4441dd8739d037f974ff
Reviewed-on: http://gerrit.cloudera.org:8080/17272
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-10 00:51:47 +00:00
Thomas Tauber-Marshall
9adb093ae0 IMPALA-10594: Handle failed coordinators in admissiond
This patch adds a statestore callback for the admissiond that monitors
for coordinators that have been removed from the cluster membership
and releases all of the resources for queries running on those
coordinators.

Testing:
- Added a custom cluster test that kills a coordinator and verifies
  that resources for queries running on it are eventually released.

Change-Id: I883f323bb765680ef24b3c3f51fb209dea15f0b0
Reviewed-on: http://gerrit.cloudera.org:8080/17209
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-01 22:19:25 +00:00
Thomas Tauber-Marshall
8ac761348a IMPALA-10591: Handle failed ReleaseQueryBackends rpcs
When the admission control service is in use, coordinators will
retry failed ReleaseQueryBackends rpcs 3 times before giving up.
This can potentially result in resources not being released when they
are no longer in use.

This patch fixes the issue by automatically releasing all remaining
backends when ReleaseQuery is called in the context of the admission
control service (either as the result of a ReleaseQuery rpc or when
cleaning up a query's resources with the heartbeat mechanism).

Testing:
- Added a custom cluster test that simulates failed
  ReleaseQueryBackends rpcs and ensures that query resources are
  eventually released.

Change-Id: I22842b2fb8ee170b5e91f12cd83f57a5f5502ae9
Reviewed-on: http://gerrit.cloudera.org:8080/17208
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
2021-04-01 16:27:14 +00:00
Bikramjeet Vig
96e7f85275 IMPALA-10596: De-flake teardown in TestAdmissionControllerStress
Currently, if the threads running queries in
TestAdmissionControllerStress hit an error, they close their client,
which ultimately closes the query it was running. If teardown()
runs after the client is closed, it tries to cancel the query that
the thread was running and hits an exception trying to cancel an
already closed query. This results in pytest throwing the exception
encountered in teardown() instead of the original exception that
caused the test to fail in the first place. This patch fixes this by
removing the query handle from the thread if the client is closed.

Testing:
Simulated hitting an error condition in the main thread that
initially triggered this condition.

Change-Id: I8aa8315d9f598ba80d13cd2091e3cc743c64ba77
Reviewed-on: http://gerrit.cloudera.org:8080/17256
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-01 05:56:25 +00:00
Thomas Tauber-Marshall
e3bafcbef4 IMPALA-10590: Introduce admission service heartbeat mechanism
Currently, if a ReleaseQuery rpc fails, it's possible for the
admission service to think that some resources are still being used
that are actually free.

This patch fixes the issue by introducing a periodic heartbeat rpc
from coordinators to the admission service which contains a list of
queries registered at that coordinator.

If there is a query that the admission service thinks is running but
is not included in the heartbeat, the admission service can conclude
that the query must have already completed and release its resources.
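
A rough sketch of that reconciliation step (hypothetical data
structures and release_query() callback; not the admissiond
implementation):

    # Hypothetical sketch of the heartbeat reconciliation described above.
    def reconcile_heartbeat(admitted_queries, coordinator_id,
                            heartbeat_query_ids, release_query):
        """admitted_queries maps coordinator id -> set of query ids the
        admission service still believes are running there."""
        tracked = admitted_queries.get(coordinator_id, set())
        # A query we track but the coordinator no longer reports must
        # have completed, so its resources can be released.
        for query_id in tracked - set(heartbeat_query_ids):
            release_query(query_id)
            tracked.discard(query_id)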

Testing:
- Added a test that uses a debug action to simulate ReleaseQuery rpcs
  failing and checks that query resources are released properly.

Change-Id: Ia528d92268cea487ada20b476935a81166f5ad34
Reviewed-on: http://gerrit.cloudera.org:8080/17194
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-24 22:41:27 +00:00
Andrew Sherman
b28da054f3 IMPALA-10592: prevent pytest from hanging at exit.
In TestAdmissionControllerStress mark worker threads as daemons so that
an exception in teardown() will not cause pytest to hang just after
printing the test results.
https://stackoverflow.com/questions/19219596/py-test-hangs-after-showing-test-results

TESTING:

Simulated the failure in IMPALA-10596 by throwing an exception during
teardown(). Without this fix the pytest invocation hangs.

Change-Id: I74cca8f577c7fbc4d394311e2f039cf4f68b08df
Reviewed-on: http://gerrit.cloudera.org:8080/17212
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-22 23:33:55 +00:00
Thomas Tauber-Marshall
c823f17f2c IMPALA-10577: Add retrying of AdmitQuery
This patch adds retries of the AdmitQuery rpc by coordinators.
This helps to ensure that if an admissiond goes down and is restarted
or is temporarily unreachable, queries won't fail.

The retries are done with backoff and jitter to avoid overloading the
admissiond in these scenarios.

A new flag, --admission_max_retry_time_s, is added to control how long
queries will continue retrying before giving up.

The AdmitQuery rpc is made idempotent - if a query is submitted with
the same query id as one the admissiond already knows about,
AdmitQuery will return OK without submitting the query to be scheduled
again.
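
A sketch of the retry loop, assuming a generic admit_rpc() callable
and made-up backoff constants (the coordinator's actual backoff policy
may differ):

    # Hypothetical sketch of AdmitQuery retries with backoff and jitter.
    import random
    import time

    def admit_with_retries(admit_rpc, max_retry_time_s,
                           base_backoff_s=0.1, max_backoff_s=5.0):
        deadline = time.monotonic() + max_retry_time_s
        attempt = 0
        while True:
            try:
                # Idempotent: resubmitting the same query id is safe.
                return admit_rpc()
            except ConnectionError:
                if time.monotonic() >= deadline:
                    raise
                # Exponential backoff capped at max_backoff_s, with random
                # jitter so coordinators do not retry in lockstep.
                backoff = min(max_backoff_s, base_backoff_s * 2 ** attempt)
                time.sleep(random.uniform(0, backoff))
                attempt += 1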

Testing:
- Added a custom cluster test that checks that queries won't fail when
  the admissiond goes down.

Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Reviewed-on: http://gerrit.cloudera.org:8080/17188
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-18 00:36:54 +00:00
stiga-huang
2dfc68d852 IMPALA-7712: Support Google Cloud Storage
This patch adds support for GCS (Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Support for caching GCS file handles will be addressed in
   IMPALA-10568.
 - test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   GCS (IMPALA-10562).
 - Some tests are skipped due to issues introduced by /etc/hosts setting
   on GCE instances (IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-13 11:20:08 +00:00
Riza Suminto
49ac55fb69 IMPALA-9856: Enable result spooling by default.
Result spooling has been relatively stable since it was introduced,
and it has several benefits described in IMPALA-8656. This patch
enables the result spooling (SPOOL_QUERY_RESULTS) query option by
default.

Furthermore, some tests need to be adjusted to account for result
spooling being on by default. The following are the adjustment
categories and the tests that fall under each category.

Change in assertions:
PlannerTest#testAcidTableScans
PlannerTest#testBloomFilterAssignment
PlannerTest#testConstantFolding
PlannerTest#testFkPkJoinDetection
PlannerTest#testFkPkJoinDetectionWithHDFSNumRowsEstDisabled
PlannerTest#testKuduSelectivity
PlannerTest#testMaxRowSize
PlannerTest#testMinMaxRuntimeFilters
PlannerTest#testMinMaxRuntimeFiltersWithHDFSNumRowsEstDisabled
PlannerTest#testMtDopValidation
PlannerTest#testParquetFiltering
PlannerTest#testParquetFilteringDisabled
PlannerTest#testPartitionPruning
PlannerTest#testPreaggBytesLimit
PlannerTest#testResourceRequirements
PlannerTest#testRuntimeFilterQueryOptions
PlannerTest#testSortExprMaterialization
PlannerTest#testSpillableBufferSizing
PlannerTest#testTableSample
PlannerTest#testTpch
PlannerTest#testKuduTpch
PlannerTest#testTpchNested
PlannerTest#testUnion
TpcdsPlannerTest
custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates
custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs
metadata/test_explain.py::TestExplain::test_explain_level2
metadata/test_explain.py::TestExplain::test_explain_level3
metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation

Increase BUFFER_POOL_LIMIT:
query_test/test_queries.py::TestQueries::test_analytic_fns
query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation
query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output
query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive
query_test/test_udfs.py::TestUdfExecution::test_mem_limits

Increase MEM_LIMIT:
query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::test_exchange_mem_usage_scaling
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling

Increase MAX_ROW_SIZE:
custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config
query_test/test_insert.py::TestInsertQueries::test_insert_large_string
query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit
query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter
query_test/test_scanners.py::TestWideRow::test_wide_row

Disable result spooling to maintain assertion:
custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_config_change_while_queued
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query
custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs
query_test/test_insert.py::TestInsertQueries::test_insert_large_string (the last query only)
query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage
query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings
query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit
shell/test_shell_client.py::TestShellClient::test_fetch_size

Testing:
- Pass exhaustive tests.

Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
Reviewed-on: http://gerrit.cloudera.org:8080/16755
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-02 04:58:51 +00:00
Thomas Tauber-Marshall
91adb33b22 IMPALA-9975 (part 2): Introduce new admission control daemon
A recent patch (IMPALA-9930) introduces a new admission control rpc
service, which can be configured to perform admission control for
coordinators. In that patch, the admission service runs in an impalad.

This patch separates the service out to run in a new daemon, called
the admissiond. It also integrates this new daemon with the build
infrastructure around Docker.

Some notable changes:
- Adds a new class, AdmissiondEnv, which performs the same function
  for the admissiond as ExecEnv does for impalads.
- The '/admission' http endpoint is exposed on the admissiond's webui
  if the admission control service is in use, otherwise it is exposed
  on coordinator impalad's webuis.
- start-impala-cluster.py takes a new flag --enable_admission_service
  which configures the minicluster to have an admissiond with all
  coordinators using it for admission control.
- Coordinators are now configured to use the admission service by
  specifying the startup flag --admission_service_host. This is
  intended to mirror the configuration of the statestored/catalogd
  location.

Testing:
- Existing tests for the admission control service are modified to run
  with an admissiond.
- Manually ran start-impala-cluster.py with --enable_admission_service
  and --docker_network to verify Docker integration.

Change-Id: Id677814b31e9193035e8cf0d08aba0ce388a0ad9
Reviewed-on: http://gerrit.cloudera.org:8080/16891
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-13 06:03:37 +00:00
Tim Armstrong
ab6b7960db IMPALA-10027: configurable default anonymous user
A username can be determined for a session via two mechanisms:
* In a secure env, the user is authenticated by LDAP or Kerberos
* In an unsecure env, the client specifies the user name, either
  as a parameter to the OpenSession API (HS2) or as a parameter
  to the first query run (beeswax)

This patch affects what happens if neither of the above mechanisms
is used. Previously we would end up with the username being an
empty string, but this makes Ranger unhappy. Hive uses the name
"anonymous" in this situation, so we change Impala's behaviour too.

This is configurable by -anonymous_user_name. -anonymous_user_name=
reverts to the old behaviour.

Testing:
* Add an end-to-end test that exercises this via impala-shell for
  HS2, HS2-HTTP and beeswax protocols.
* Tweak a couple of existing tests that depended on the previous
  behavior.

Change-Id: I6db491231fa22484aed476062b8fe4c8f69130b0
Reviewed-on: http://gerrit.cloudera.org:8080/16902
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-09 00:15:25 +00:00
Thomas Tauber-Marshall
ca884476bb IMPALA-9930 (part 2): Introduce new admission control rpc service
This patch introduces a new krpc service, AdmissionControlService,
which coordinators can use to submit queries for admission.

This patch adds some simple configuration flags that make it possible
to have coordinators use this service to submit their queries for
admission to other coordinators. These flags are only to make this
patch testable and will be replaced when the separate admission
control daemon is introduced in IMPALA-9975.

The interface consists of the following RPCs:
- AdmitQuery: takes a TQueryExecRequest and a TQueryOptions
  (serialized into sidecars), places the request on a queue to be
  processed by a thread pool and then immediately returns.
- GetQueryStatus: takes a query id and returns the current admission
  status, including the QuerySchedulePB if admission has completed
  successfully but the query has not been released yet.
- ReleaseQueryBackends: called when individual backends complete but
  the overall query is still running to release resources
  incrementally. This RPC will be called at most O(log(# backends))
  per query due to BackendResourceState, which batches backends to
  release together.
- ReleaseQuery: called when the query has completely finished.
  Releases all remaining resources.
- CancelAdmission: called if a query is cancelled before an admission
  decision has been made to indicate that it should no longer be
  considered for admission.

The majority of the patch consists of two classes:
- AdmissionControlClient: used to abstract whether admission is being
  performed locally or remotely. In the local case, it is basically
  just a wrapper around AdmissionController. In the remote case, it
  handles serializing/deserializing of RPC params, polling
  GetQueryStatus() until a decision has been made, etc.
- AdmissionControlService: exports the RPC interface and acts as a
  wrapper around AdmissionController.

Some notable changes involved:
- AdmissionController::SubmitForAdmission() no longer blocks while a
  query is queued. Instead, a new function WaitOnQueued() can be used
  to monitor the admission status of a queued query.
- Adding events to the query timeline is moved out of
  AdmissionController and into the AdmissionControlClient classes, so
  that it always happens on the coordinator.
- When a cluster is run in the new admission control service mode,
  only the impalad that is performing admission control exposes the
  /admission http endpoint. Observability will be cleaned up in a
  subsequent patch.

Testing:
- Modified existing admission control tests to run both with and
  without the admission control service enabled, including both the
  functional and stress tests. The 'num_queries' param in the stress
  test is modified to only use a single value to reduce the number of
  tests that are run and keep the running time reasonable.
- Ran tpch10 on a local minicluster and observed no significant
  regressions.

Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Reviewed-on: http://gerrit.cloudera.org:8080/16412
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-03 23:46:29 +00:00
Bikramjeet Vig
a34bb2ad11 IMPALA-8990: Fix flakiness in test_set_request_pool
This patch fixes flakiness in test_set_request_pool where one of the
test cases expects a query to already be accounted for in a resource
pool but sometimes gets into a state where it runs before the
previous query passes admission control. This fixes the issue by
waiting for the previous query to clear admission control before
attempting to run the aforementioned failing query.

Testing:
- Ran test locally in a loop for an hour to flush out flakiness.

Change-Id: Ife06509e936443579ca60780013ce01352c8932e
Reviewed-on: http://gerrit.cloudera.org:8080/16749
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-03 05:19:24 +00:00
Bikramjeet Vig
b97c9e7966 IMPALA-8202: Extend query timeout for test_mem_limit
With the current timeout set to 1 sec, we saw flaky failure where
the query timed out due to client inactivity. This can happen if
the thread keeping it alive fails to execute a fetch within the
query timeout period. This patch attempts to fix this flakiness
by increasing the timeout period.

Testing:
Looped the test locally.

Change-Id: Ic02f73bea528af12053043e0a57b4158532833b4
Reviewed-on: http://gerrit.cloudera.org:8080/16774
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-24 01:57:56 +00:00
Andrew Sherman
4159a8085c IMPALA-10244: Make non-scalable failures to dequeue observable.
One of the important ways to observe Impala throughput is by looking at
when queries are queued. This can be an indication that more resources
should be added to the cluster by adding more executor groups. This is
only a good strategy if adding more resources will help with the current
workload. In some situations the head of the query queue cannot be
executed because of resource constraints on the coordinator. In these
cases the coordinator is the bottleneck so adding more executor groups
will not help. This change is to make these cases observable by adding a
new counter which is incremented when a dequeue fails because of
resource constraints on the coordinator.

The two cases that cause the counter to be incremented are:
- when there are not enough admission control slots on the coordinator
- when there is not enough memory on the coordinator
Other conditions may be added in the future.

TESTING:
Added new unit tests.
Ran all end-to-end tests.

Change-Id: I3456396ac139c562ad9cd3ac1a624d8f35487518
Reviewed-on: http://gerrit.cloudera.org:8080/16613
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-23 06:35:54 +00:00
Thomas Tauber-Marshall
28181cbe6c IMPALA-9930 (part 1): Initial refactor for admission control service
This patch contains the following refactors that are needed for the
admission control service, in order to make the main patch easier to
review:
- Adds a new class AdmissionControlClient which will be used to
  abstract the logic for submitting queries to either a local or
  remote admission controller out from ClientRequestState/Coordinator.
  Currently only local submission is supported.
- SubmitForAdmission now takes a BackendId representing the
  coordinator instead of assuming that the local impalad will be the
  coordinator.
- The CRS_BEFORE_ADMISSION debug action is moved into
  SubmitForAdmission() so that it will be executed on whichever daemon
  is performing admission control rather than always on the
  coordinator (needed for TestAdmissionController.test_cancellation).
- ShardedQueryMap is extended to allow keys to be either TUniqueId or
  UniqueIdPB and Add(), Get(), and Delete() convenience functions are
  added.
- Some utils related to serializing Thrift objects into sidecars are
  added.

Testing:
- Passed a run of existing core tests.

Change-Id: I7974a979cf05ed569f31e1ab20694e29fd3e4508
Reviewed-on: http://gerrit.cloudera.org:8080/16411
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-29 23:14:50 +00:00
Qifan Chen
2ef6184ee1 IMPALA-9989: Improve admission control pool stats logging
This work addresses a current limitation in the admission controller
by appending the last known memory consumption statistics for the set
of queries running or waiting on a host or in a pool to the existing
memory exhaustion message. These statistics are logged in impalad.INFO
when a query is queued, or queued and then timed out, due to memory
pressure in the pool or on the host. The statistics can also be part
of the query profile.

The new memory consumption statistics can be either per-host stats or
aggregated pool stats. The per-host stats describe memory consumption
for every pool on a host. The aggregated pool stats describe the
aggregated memory consumption on all hosts for a pool. For each stats
type, information such as the query ids and memory consumption of up
to the top 5 queries is provided, in addition to the min, max, average
and total memory consumption for the query set.

When a query request is queued due to memory exhaustion, the new
consumption statistics are logged when the BE logging level is set
to 2.

When a query request is timed out due to memory exhaustion, the new
consumption statistics are logged when the BE logging level is set
to 1.

Testing:
1. Added a new test TopNQueryCheck in admission-controller-test.cc to
   verify that the topN query memory consumption details are reported
   correctly.
2. Added two new tests in test_admission_controller.py to simulate
   queries being queued and then timed out due to pool or host memory
   pressure.
3. Added a new test TopN in mem-tracker-test.cc to
   verify that the topN query memory consumption details are computed
   correctly from a mem tracker hierarchy.
4. Ran Core tests successfully.

Change-Id: Id995a9d044082c3b8f044e1ec25bb4c64347f781
Reviewed-on: http://gerrit.cloudera.org:8080/16220
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-28 20:22:57 +00:00
Tim Armstrong
521c81466f IMPALA-10036: schedule unpartitioned fragments on random executor
Some plans have unpartitioned fragments that are not the coordinator
fragment. It would be best to avoid scheduling these on a coordinator,
both to reduce coordinator load and to allow using the coordinator
mem estimates for these queries.

Testing:
Extended the dedicated coordinator admission control test with
a query with an unpartitioned fragment. The test failed on master
with dedicated coordinator estimates.

Manually ran the repro with TPC-DS Q23, which is a more complex
query with unpartitioned fragments, and confirmed that could
run with low coordinator mem limits.

Ran exhaustive tests.

Change-Id: I1eee78f24b048968f2fdf91f0da95b78552f8d33
Reviewed-on: http://gerrit.cloudera.org:8080/16272
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2020-08-03 04:30:13 +00:00
Bikramjeet Vig
f9cb0a65fe IMPALA-9077: Remove scalable admission control configs
Removed the 3 scalable configs added in IMPALA-8536:
- Max Memory Multiple
- Max Running Queries Multiple
- Max Queued Queries Multiple

This patch removes the functionality related to those configs but
retains the additional test coverage and cleanup added in
IMPALA-8536. This removal is to make it easier to enhance
Admission Control using Executor Groups which has turned out to
be a useful building block.

Testing:
Ran core tests.

Change-Id: Ib9bd63f03758a6c4eebb99c64ee67e60cb56b5ac
Reviewed-on: http://gerrit.cloudera.org:8080/16039
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-10 04:02:16 +00:00
Tim Armstrong
748e41ab41 IMPALA-9380: async query unregistration
This change improves query latency by doing much
of the heavyweight work of unregistering a query
asynchronously, instead of synchronously on the
RPC thread. The biggest win is to move the
profile serialization off the RPC thread.

Unregistration processing is done by a thread pool
with 4 threads by default. This is configurable by
--unregistration_thread_pool_size and
--unregistration_thread_pool_queue_depth.

This fixes a pre-existing bug where a
query was temporarily neither in the in-flight
queries nor the completed queries. It would be
much easier to hit this with async unregistration
because there is less synchronisation on the client
side. Now the query is briefly in both maps, but
this is handled as follows:
* All places that look up both the maps will check
  the in-flight map first, and return a reference to
  the ClientRequestState, i.e. ignoring the entry in
  the query log.
* The /queries page does not return completed queries
  if they were found in the in-flight queries map, so
  avoids duplicate results.

The thread safety story changes slightly.
Before this change, only one thread could
remove the query from the map and close it,
with only one thread "winning" the race to remove
the ClientRequestState from the map. Since we leave
the query in the map while being finalized, we
instead use an atomic in ClientRequestState to ensure
that only one thread does the finalization.
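
As an analogy only (the real code uses a C++ atomic in
ClientRequestState), the single-finalizer guarantee can be sketched as
a flag that exactly one thread manages to flip:

    # Hypothetical sketch of the "only one thread finalizes" guarantee.
    import threading

    class RequestState:
        def __init__(self):
            self._finalize_lock = threading.Lock()
            self._finalize_started = False

        def try_start_finalize(self):
            """Return True for exactly one caller; later callers get False."""
            with self._finalize_lock:
                if self._finalize_started:
                    return False
                self._finalize_started = True
                return True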

Some misc cleanup was done as a result of these changes:
* Fix a pre-existing TSAN race in RuntimeProfile that
  was revealed by the new concurrent unregister test.
* Consolidate the various unknown query handle errors into
  an error code so that we consistently return the same
  string.
* "Unregister query" should include flushing audit events.

Testing:
* Add a test that unregisters a query concurrent with other
  operations.
* Ran exhaustive tests

Perf:
Ran TPC-H 30 with mt_dop=4. No regressions and some improvements:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 5.38    | -2.67%     | 4.02       | -2.01%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| TPCH(30) | TPCH-Q1  | parquet / none / none | 5.36   | 5.17        |   +3.61%   |   1.82%   |   1.17%        | 5     |   +3.73%       | 1.73    | 3.65   |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.77   | 1.74        |   +1.48%   |   2.00%   |   2.50%        | 5     |   +2.89%       | 0.87    | 1.03   |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 3.02   | 3.00        |   +0.79%   |   2.18%   |   2.21%        | 5     |   +1.55%       | 0.00    | 0.57   |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 1.65   | 1.64        |   +0.81%   |   1.35%   |   0.03%        | 5     |   +0.07%       | 1.15    | 1.34   |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 1.21   | 1.21        |   -0.07%   |   2.11%   |   2.14%        | 5     |   -0.04%       | -0.29   | -0.05  |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.50   | 2.52        |   -0.49%   |   2.43%   |   3.34%        | 5     |   -0.09%       | -0.29   | -0.27  |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.86   | 2.90        |   -1.28%   |   2.30%   |   1.24%        | 5     |   -0.02%       | -0.58   | -1.11  |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 4.35   | 4.40        |   -1.15%   |   1.76%   |   1.78%        | 5     |   -1.12%       | -0.87   | -1.03  |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 4.10   | 4.17        |   -1.80%   |   1.05%   |   1.31%        | 5     |   -1.25%       | -1.73   | -2.40  |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 3.20   | 3.25        |   -1.52%   |   0.79%   |   2.56%        | 5     |   -1.56%       | -0.58   | -1.26  |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 10.81  | 11.07       |   -2.34%   |   5.00%   |   7.01%        | 5     |   -1.40%       | -0.58   | -0.61  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 11.19  | 11.56       |   -3.18%   |   3.47%   |   6.02%        | 5     |   -0.90%       | -0.87   | -1.03  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 19.91  | 20.32       |   -2.02%   |   0.66%   |   0.47%        | 5     |   -2.18%       | -2.31   | -5.64  |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 5.63   | 5.77        |   -2.40%   |   1.71%   |   2.01%        | 5     |   -1.84%       | -1.73   | -2.05  |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 3.91   | 4.03        |   -2.74%   |   1.08%   |   1.86%        | 5     |   -2.45%       | -1.44   | -2.88  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 4.55   | 4.71        |   -3.48%   |   1.90%   |   3.53%        | 5     |   -2.35%       | -1.44   | -1.96  |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 1.93   | 2.01        |   -3.96%   |   0.05%   |   4.05%        | 5     |   -2.59%       | -2.31   | -2.19  |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 4.52   | 4.73        |   -4.26%   |   1.26%   |   2.43%        | 5     |   -3.40%       | -2.02   | -3.51  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.02   | 1.05        |   -3.58%   |   3.94%   |   2.36%        | 5     |   -4.56%       | -1.44   | -1.79  |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.52   | 10.04       | I -5.24%   |   2.14%   |   0.56%        | 5     | I -4.67%       | -2.31   | -5.57  |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.49   | 3.68        | I -5.08%   |   0.07%   |   0.56%        | 5     | I -5.66%       | -2.31   | -20.08 |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 11.92  | 12.71       | I -6.19%   |   0.57%   |   3.15%        | 5     | I -4.99%       | -2.31   | -4.33  |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+

Change-Id: I80027b1baeb4ab453938c0f6357b120f4035ba08
Reviewed-on: http://gerrit.cloudera.org:8080/15821
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-05 10:12:42 +00:00
Tim Armstrong
b0c6740fae IMPALA-8998: admission control accounting for mt_dop
This integrates mt_dop with the "slots" mechanism that's used
for non-default executor groups.

The idea is simple - the degree of parallelism on a backend
determines the number of slots consumed. The effective
degree of parallelism is used, not the raw mt_dop setting.
E.g. if the query only has a single input split and executes
only a single fragment instance on a host, we don't want
to count the full mt_dop value for admission control.

--admission_control_slots is added as a new flag that
replaces --max_concurrent_queries, since the name better
reflects the concept. --max_concurrent_queries is kept
for backwards compatibility and has the same meaning
as --admission_control_slots.

The admission control logic is extended to take this into
account. We also add an immediate rejection code path
since it is now possible for queries to not be admittable
based on the # of available slots.

We only factor in the "width" of the plan - i.e. the number
of instances of fragments. We don't account for the number
of distinct fragments, since they may not actually execute
in parallel with each other because of dependencies.

This number is added to the per-host profile as the
"AdmissionSlots" counter.

Testing:
Added unit tests for rejection and queue/admit checks.

Also includes a fix for IMPALA-9054 where we increase
the timeout.

Added end-to-end tests:
* test_admission_slots in test_mt_dop.py that checks the
  admission slot calculation via the profile.
* End-to-end admission test that exercises the admit
  immediately and queueing code paths.

Added checks to test_verify_metrics (which runs after
end-to-end tests) to ensure that the per-backend
slots in use goes to 0 when the cluster is quiesced.

Change-Id: I7b6b6262ef238df26b491352656a26e4163e46e5
Reviewed-on: http://gerrit.cloudera.org:8080/14357
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2019-10-16 20:26:11 +00:00
Sahil Takiar
c67e0868e0 IMPALA-8926, IMPALA-8989: Fix flaky result spooling tests
Disable the tests TestResultSpooling::test_full_queue(_large_fetch)
until we figure out why they are flaky.

Replace the sleep in TestAdmissionController::test_release_backend with
assert_eventually to reduce flakiness.

Change-Id: I7ea6bf3d84f174745c8a0b1e0f2b55ce05ee618b
Reviewed-on: http://gerrit.cloudera.org:8080/14337
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-04 01:10:25 +00:00
Bikramjeet Vig
104a454d3e IMPALA-8928: Add MEM_LIMIT_EXECUTORS query option
This developer only option provides a way to set the memory limit for
only the executors which would be useful for writing tests in the
future.

Testing:
- Added a simple sanity check

Change-Id: I20dfd4e8fc7ffd9130db9f942efa78965c724e18
Reviewed-on: http://gerrit.cloudera.org:8080/14294
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-27 02:02:11 +00:00
Sahil Takiar
df365a8e30 IMPALA-8803: Coordinator should release admitted memory per-backend
Changes the Coordinator to release admitted memory when each Backend
completes, rather than waiting for the entire query to complete before
releasing admitted memory. When the Coordinator detects that a Backend
has completed (via ControlService::ReportExecStatus) it updates the
state of the Backend in Coordinator::BackendResourceState.
BackendResourceState tracks the state of the admitted resources for
each Backend and decides when the resources for a group of Backend
states should be released. BackendResourceState defines a state machine
to help coordinate the state of the admitted memory for each Backend.
It guarantees that by the time the query is shutdown, all Backends
release their admitted memory.

BackendResourceState implements three rules to control the rate at
which the Coordinator releases admitted memory from the
AdmissionController:
* Resources are released at most once every 1 second; this prevents
short-lived queries from causing high load on the admission controller
* Resources are released at most O(log(num_backends)) times; the
BackendResourceStates can release multiple BackendStates from the
AdmissionController at a time
* All pending resources are released if the only remaining Backend is
the Coordinator Backend; this is useful for result spooling where all
Backends may complete, except for the Coordinator Backend

Exposes the following hidden startup flags to help tune the heuristics
above:

--batched_release_decay_factor
* Defaults to 2
* Controls the base value for the O(log(num_backends)) bound when
batching the release of Backends.

--release_backend_states_delay_ms
* Defaults to 1000 milliseconds
* Controls how often Backends can release their resources.
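
The logarithmic bound can be illustrated with a little arithmetic (a
sketch only, not the BackendResourceState code):

    # Hypothetical sketch: with a decay factor of 2, batching releases so
    # that each batch must reach a multiplicatively growing size caps the
    # number of release calls at roughly log2(num_backends).
    import math

    def max_release_batches(num_backends, decay_factor=2):
        if num_backends <= 1:
            return num_backends
        return math.ceil(math.log(num_backends, decay_factor))

    # Example: a 128-backend query triggers at most about 7 batched
    # release calls instead of 128 individual ones.
    print(max_release_batches(128))  # -> 7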

Testing:
* Ran core tests
* Added new tests to test_result_spooling.py and
test_admission_controller.py

Change-Id: I88bb11e0ede7574568020e0277dd8ac8d2586dc9
Reviewed-on: http://gerrit.cloudera.org:8080/14104
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-07 21:15:52 +00:00
Bikramjeet Vig
0018b710f4 IMPALA-8760: Disable TestAdmissionControllerStress tests for CentOS 6
This test is tuned for certain timing which makes it flaky when run on
CentOS 6 where that timing is a bit off. Since this is not providing any
additional coverage by running on a different OS, it'll be disabled for
CentOS 6.

Change-Id: If63799f880f0883532467a00e362105a78878f17
Reviewed-on: http://gerrit.cloudera.org:8080/14124
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-27 02:32:38 +00:00
Bikramjeet Vig
e0c0fe988a IMPALA-8791: Handle the case where there is no fragment scheduled on the coordinator

This patch fixes a bug where if an insert or CTAS query has no
fragments scheduled on the coordinator and a mem limit is to be
enforced on the query (either through query option or automatically
through estimates) then the same limit is also applied to the
coordinator backend even though it does not execute anything.

Highlights:
- coord_backend_mem_to_admit_/mem_limit will always refer to the memory
to admit/limit for the coordinator regardless of which fragments are
scheduled on it.

- There will always be a BackendExecParams added for the coordinator
because the coordinator always spawns a QueryState object with a
mem_tracker for tracking runtime filter memory and the result set
cache. For the case where this BackendExecParams is empty (no
instances scheduled), this ensures that some minimal amount of memory
is accounted for by the admission controller and the right mem limit
is applied to the QueryState spawned by the coordinator.

- added changes to Coordinator and Coordinator::BackendState classes
to handle an empty BackendExecParams object

Testing:
The following cases need to be tested, based on the kinds of
fragments scheduled on the coordinator backend:
1. Coordinator fragment + other exec fragments
2. Coordinator fragment only
3. other exec fragments only (eg. insert into values OR insert
   into select 1)
4. No fragments, but coordinator still creates a QueryState

Case 1 is covered by tests working with non-dedicated coordinators.
Rest are covered by test_dedicated_coordinator_planner_estimates and
test_sanity_checks_dedicated_coordinator in
test_admission_controller.py

Change-Id: Ic4fba02bb7b20553a20634f8c5989d43ba2c0721
Reviewed-on: http://gerrit.cloudera.org:8080/14058
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-14 06:11:45 +00:00
Tim Armstrong
4fb8e8e324 IMPALA-8816: reduce custom cluster test runtime in core
This includes some optimisations and a bulk move of tests
to exhaustive.

Move a bunch of custom cluster tests to exhaustive. I selected
these partially based on runtime (i.e. I looked most carefully
at the tests that ran for over a minute) and the likelihood
of them catching a precommit bug.  Regression tests for specific
edge cases and tests for parts of the code that are very stable
were prime candidates.

Remove an unnecessary cluster restart in test_breakpad.

Merge test_scheduler_error into test_failpoints to avoid an unnecessary
cluster restart.

Speed up cluster starts by ensuring that the default statestore args are
applied even when _start_impala_cluster() is called directly. This
shaves a couple of seconds off each restart. We made the default args
use a faster update frequency - see IMPALA-7185 - but they did not
take effect in all tests.

Change-Id: Ib2e3e7ebc9695baec4d69183387259958df10f62
Reviewed-on: http://gerrit.cloudera.org:8080/13967
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-06 21:34:26 +00:00
Bikramjeet Vig
a0b0fd4fce IMPALA-7486: Add specialized estimation scheme for dedicated coordinators
This patch computes two memory estimates in the frontend:
an estimate for any host that is an executor (including
a combined coordinator and executor) and an estimate
for a dedicated coordinator. Both estimates are computed
regardless of whether the coordinator is dedicated or not.

Admission control then, in the case when the coordinator
is dedicated, uses the coordinator memory estimate for
the coordinator node and the executor memory estimate
for all other nodes.
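
A sketch of how the per-host admitted memory could be chosen under
this scheme; the function and parameter names are hypothetical:

    # Hypothetical sketch of picking the per-host memory to admit.
    def mem_to_admit_for_host(is_coordinator, is_dedicated_coordinator,
                              coord_mem_estimate, executor_mem_estimate,
                              mem_limit_query_option=None):
        # If the MEM_LIMIT query option is set, it applies to all
        # backends, both executors and coordinators.
        if mem_limit_query_option is not None:
            return mem_limit_query_option
        if is_coordinator and is_dedicated_coordinator:
            return coord_mem_estimate
        return executor_mem_estimate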

Other highlights:
 - if MEM_LIMIT query option is set, it is applied to all backends,
   both executors and coordinators.
 - the min_query_mem_limit pool config is not enforced on the
   dedicated coordinator estimates unless MEM_LIMIT query option is set.
 - the lower cap on estimates and the admission checks based on the
   min mem limit required for reservation are applied separately on
   coordinator's and executors' mem requirements.
 - Added a hidden startup option 'use_dedicated_coordinator_estimates'
   which if set to false, reverts to previous estimation behavior.

Testing:
- Added unit test for admission/rejection in dedicated
  coordinator clusters.
- Added end to end tests.

Change-Id: I2b94e7293b91dec8a18491079c34923eadd94b21
Reviewed-on: http://gerrit.cloudera.org:8080/13740
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-01 13:18:19 +00:00
Lars Volker
2397ae5590 IMPALA-8484: Run queries on disjoint executor groups
This change adds support for running queries inside a single admission
control pool on one of several, disjoint sets of executors called
"executor groups".

Executors can be configured with an executor group through the newly
added '--executor_groups' flag. Note that in anticipation of future
changes, the flag already uses the plural form, but only a single
executor group may be specified for now. Each executor group
specification can optionally contain a minimum size, separated by a
':', e.g. --executor_groups default-pool-1:3. Only when the cluster
membership contains at least that number of executors for the groups
will it be considered for admission.

Executor groups are mapped to resource pools by their name: An executor
group can service queries from a resource pool if the pool name is a
prefix of the group name separated by a '-'. For example, queries in
pool poolA can be serviced by executor groups named poolA-1 and
poolA-2, but not by groups named foo or poolB-1.
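
The prefix rule can be sketched as follows (the helper name is made
up):

    # Hypothetical sketch: a group can serve a pool if the pool name is a
    # prefix of the group name, separated by a '-'.
    def group_serves_pool(group_name, pool_name):
        return group_name.startswith(pool_name + "-")

    print(group_serves_pool("poolA-1", "poolA"))  # True
    print(group_serves_pool("poolA-2", "poolA"))  # True
    print(group_serves_pool("foo", "poolA"))      # False
    print(group_serves_pool("poolB-1", "poolA"))  # False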

During scheduling, executor groups are considered in alphabetical order.
This means that one group is filled up entirely before a subsequent
group is considered for admission. Groups also need to pass a health
check before considered. In particular, they must contain at least the
minimum number of executors specified.

If no group is specified during startup, executors are added to the
default executor group. If - during admission - no executor group for a
pool can be found and the default group is non-empty, then the default
group is considered. The default group does not have a minimum size.

This change inverts the order of scheduling and admission. Prior to this
change, queries were scheduled before submitting them to the admission
controller. Now the admission controller computes schedules for all
candidate executor groups before each admission attempt. If the cluster
membership has not changed, then the schedules of the previous attempt
will be reused. This means that queries will no longer fail if the
cluster membership changes while they are queued in the admission
controller.

This change also alters the default behavior when using a dedicated
coordinator and no executors have registered yet. Prior to this change,
a query would fail immediately with an error ("No executors registered
in group"). Now a query will get queued and wait until executors show
up, or it times out after the pools queue timeout period.

Testing:

This change adds a new custom cluster test for executor groups. It
makes use of new capabilities added to start-impala-cluster.py to bring
up additional executors into an already running cluster.

Additionally, this change adds an instructional implementation of
executor group based autoscaling, which can be used during development.
It also adds a helper to run queries concurrently. Both are used in a
new test to exercise the executor group logic and to prevent regressions
to these tools.

In addition to these tests, the existing tests for the admission
controller (both BE and EE tests) thoroughly exercise the changed code.
Some of them required changes themselves to reflect the new behavior.

I looped the new tests (test_executor_groups and test_auto_scaling) for
a night (110 iterations each) without any issues.

I also started an autoscaling cluster with a single group and ran
TPC-DS, TPC-H, and test_queries on it successfully.

Known limitations:

When using executor groups, only a single coordinator and a single AC
pool (i.e. the default pool) are supported. Executors do not include the
number of currently running queries in their statestore updates, so
admission controllers are not aware of the number of queries admitted by
other controllers per host.

Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b
Reviewed-on: http://gerrit.cloudera.org:8080/13550
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-07-21 04:54:03 +00:00
Tim Armstrong
84dfbb4880 IMPALA-8565: larger query in test_admission_controller
It appears that in some rare cases, fragments from the
query used can finish too early, before the client has
started fetching to hit EOS.

Reduce the chances of this happening by increasing
the query size and reducing the fetch frequency.

Testing:
I could not reproduce this rare flake locally, probably
because my hardware is different from the hardware that
the failures occurred on.

I looped the failing test for a while locally to try and
ensure that the flakiness did not at least get worse.

Change-Id: Ifd7716d5d5ccc4b2f79b3acf2fbc880716b4c0fe
Reviewed-on: http://gerrit.cloudera.org:8080/13718
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-25 11:03:11 +00:00
Tim Armstrong
9ecbe7d3dc IMPALA-8553,IMPALA-8552: fix checks for remote cluster
Apparently IMPALA_REMOTE_URL is not generally used for remote cluster
tests: only --testing_remote_cluster is reliably set. Fix the
is_remote_cluster() implementation to take into account
REMOTE_DATA_LOAD and --testing_remote_cluster in addition to
IMPALA_REMOTE_URL. Consistently use is_remote_cluster() in
other tests instead of checking the pytest flag directly.

There were a few lifecycle headaches with how
ImpalaTestClusterProperties is used:
* common.environ is imported from conftest, which means that
  the top-level code in the file runs *before* pytest
  command-line arguments have been registered and parsed.
* ImpalaTestClusterProperties is used by various code,
  like build_flavor_timeout(), which runs before pytest
  command-line arguments have been parsed.
* ImpalaTestClusterProperties is called from non-pytest
  scripts like start-impala-cluster.py, so the command-line
  arguments are not available.

I dealt with the above challenges by making a few changes
to do the detection later:
* Lazily initializing a singleton ImpalaTestClusterProperties.
  This was not strictly necessary but makes the whole problem
  less sensitive to import order and module dependencies.
* Adding cluster_properties fixture to make ImpalaTestClusterProperties
  available in tests without additional boilerplate.
* Removing the caching of the local/remote build calculation.
  ImpalaTestClusterProperties is instantiated outside of python
  tests, but is_remote_cluster() is only called from python tests,
  so if we check flags in is_remote_cluster() we'll get the
  right results reliably.

As a workaround to unblock remote tests, also assume catalog_v1 if
accessing the web UI fails.

Testing:
Ran core tests against a regular minicluster.

Ran tests against a remote cluster

Change-Id: Ifa6b2a1391f53121d3d7c00c5cf0a57590899ce4
Reviewed-on: http://gerrit.cloudera.org:8080/13386
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-20 20:27:31 +00:00
Andrew Sherman
acea702276 IMPALA-8536: Add Scalable Pool Configuration to Admission Controller.
Add 3 configuration parameters to Admission Controller that scale with
the number of hosts in the resource pool. These parameters are specified
to the Impalad through the -llama_site_path flag which points to a Llama
XML configuration file.

The new configuration parameters are:
+ Max Running Queries Multiple - this floating point number is
  multiplied by the current total number of executors at runtime to give
  the maximum number of concurrently running queries allowed in the
  pool. This calculation is rounded up to the nearest integer so the
  result will always be at least one as long as the parameter is
  non-zero.
+ Max Queued Queries Multiple - this floating point number is multiplied
  by the current total number of executors at runtime to give the
  maximum number of queries that can be queued in the pool. This
  calculation is rounded up to the nearest integer so the result will
  always be at least one as long as the parameter is non-zero.
+ Max Memory Multiple - this number of bytes is multiplied by the
  current total number of executors at runtime to give the maximum
  memory available across the cluster for the pool.
If any of these parameters has a zero value then it is ignored. In
that case the corresponding non-scalable parameter is used, if it is
set. (A sketch of the scaling arithmetic follows below.)
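
As a rough illustration of the scaling arithmetic (a sketch only; the
real logic lives in the C++ admission controller and these names are
made up):

  import math

  def scaled_limit(multiple, num_executors, non_scalable_value):
    """Return the scaled limit, rounded up, or fall back to the
    non-scalable parameter when the multiple is zero (i.e. unset)."""
    if multiple == 0:
      return non_scalable_value
    return int(math.ceil(multiple * num_executors))

  # Example: a max-running-queries multiple of 0.5 on a 10-executor pool
  # allows ceil(0.5 * 10) = 5 concurrently running queries.
  assert scaled_limit(0.5, 10, non_scalable_value=20) == 5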

The new parameters are exposed through the webui.

At various points in the code the Admission Controller looks at the
Pool Config objects to find non-scalable parameters such as the maximum
number of queries that can run in the pool. These accesses have been
encapsulated in functions that return the scalable version of the
configuration value when the new scalable parameters are in use.
Diagnostic messages have been enhanced to show the origin of the
encapsulated parameters.

TESTING

All end-to-end tests are running clean with ASAN.

The unit test admission-controller-test.cc has been expanded to test the
newly added code.

Added an end-to-end test that adds and removes Impalads from a
minicluster.

Change-Id: If47508728124076f3b9200c27cffc989f7a4f188
Reviewed-on: http://gerrit.cloudera.org:8080/13307
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-19 05:58:10 +00:00
Tim Armstrong
d820952d86 IMPALA-8469: admit_mem_limit for dedicated coordinator
Refactored to avoid the code duplication that resulted in this bug:
* admit_mem_limit is calculated once in ExecEnv
* The local backend descriptor is always constructed with
  a static helper: Scheduler::BuildLocalBackendDescriptor()

I chose to factor it in this way, in part, to avoid invasive
changes to scheduler-test, which currently doesn't depend on
ExecEnv or ImpalaServer.

Testing:
Added basic test that reproduces the bug.

Change-Id: Iaceb21b753b9b021bedc4187c0d44aaa6a626521
Reviewed-on: http://gerrit.cloudera.org:8080/13180
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-01 00:37:04 +00:00
Bikramjeet Vig
95c6495ae5 IMPALA-8295: Fix flakiness in TestAdmissionControllerStress
Currently test_mem_limit in TestAdmissionControllerStress encounters
flaky failures due to a timeout while waiting for queries to end.
Increase the timeout from 60 to 90 seconds to accommodate this.

Testing:
Ran the test in a loop for more than 15 hours without any failures.

Change-Id: If9ada7bdbe2461d164100badabb905cf9a8cf4cc
Reviewed-on: http://gerrit.cloudera.org:8080/13103
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-25 00:05:42 +00:00
Bikramjeet Vig
75ec9336e0 IMPALA-8202: Add more logging to TestAdmissionControllerStress
This change adds some logging and fixes a bug where the string
match that identified a timeout was wrong. The added logging
provides more information about the state to help diagnose the
root cause of IMPALA-8202.

Change-Id: Ib62cefac0dc53da0a9f26fdd959975deafc70c8f
Reviewed-on: http://gerrit.cloudera.org:8080/12829
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-22 05:00:35 +00:00
Tim Armstrong
c3c69ae362 IMPALA-5043: diagnostics for topic staleness in AC
The default threshold for the admission control topic to be considered
stale is 5s.

Adds diagnostics for stale topic updates:
* A banner on the /admission web UI if the topic is considered stale.
* Time since last update on the /admission web UI
* Append a warning to rejection/queuing messages where topic staleness
  may have affected the decision.
* Append a warning to profiles of admitted queries where the topic was
  stale at the time the query was admitted.
* Include the topic staleness in all profiles of admitted queries
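
For intuition, the staleness check boils down to comparing the time
since the last topic update against the threshold; a simplified Python
sketch (the real code is C++ and these names are illustrative):

  import time

  TOPIC_STALE_THRESHOLD_S = 5.0  # default threshold described above

  def staleness_warning(last_update_time_s):
    """Return a warning string if the admission control topic is stale,
    or an empty string otherwise."""
    age_s = time.time() - last_update_time_s
    if age_s > TOPIC_STALE_THRESHOLD_S:
      return ("Warning: admission control information from statestore is "
              "stale: %.1fs since last update was received." % age_s)
    return ""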

Testing:
Add a custom cluster test that kills the statestore, validates that
admission control behaves as expected and that staleness warnings
show up in the appropriate places.

Change-Id: Ib9e26adb6419589ccf7625e423356df45bee4ac9
Reviewed-on: http://gerrit.cloudera.org:8080/12407
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-09 05:16:43 +00:00
Sahil Takiar
ec2dfd9482 IMPALA-7625: test_web_pages.py backend tests are failing
The backend tests in test_web_page.py do some basic validation of
the query_backends and query_finstances endpoints. The tests were
failing due to a race condition in the tests themselves. The issue is
that the backend endpoints are only usable while a query is running.
For completed queries, the backend endpoints return nothing.
The original test worked by running a "complex" query asynchronously and
then querying the backend endpoints before the query completed. The
assumption was that the query would run long enough for the request to
the backend endpoint to complete. However, that is not always the case
and the query easily completes before the backend endpoints can be
reached.

This patch updates the test so that instead of running a "complex"
query, it runs a query that calls the sleep UDF, guaranteeing the query
will take enough time for the backend endpoints to be reached.

Furthermore, the patch adds additional validation and cleanup of all the
backend endpoint tests.
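
The shape of the fix, as a condensed sketch (the helper and endpoint
parameter names here are approximations, not the exact test code):

  import requests

  def check_backends_while_running(client, impalad_web_url):
    # sleep() keeps the query running long enough for the backend
    # endpoints to be polled while the query is still in flight.
    handle = client.execute_async("select sleep(10000)")
    try:
      resp = requests.get(impalad_web_url + "/query_backends?json")
      resp.raise_for_status()
      # A running query should report at least one backend.
      assert resp.json().get("backend_states"), "no backends reported"
    finally:
      client.close_query(handle)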

Testing:

Ran core tests.

Change-Id: I2092afe1bfaec30cd3e7b0040f06865e43fe62fb
Reviewed-on: http://gerrit.cloudera.org:8080/12024
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-07 22:05:04 +00:00
Fredy Wijaya
9c44853998 IMPALA-6591: Fix test_ssl flaky test
test_ssl has logic that waits for the number of in-flight queries to
be 1. However, wait_for_num_in_flight_queries(1) only waits for the
condition to be true for a period of time and does not throw an
exception when that time has elapsed and the condition is still not
met. In other words, the logic in test_ssl that loops while the number
of in-flight queries is 1 never gets executed. I was able to simulate
this issue by making the Impala shell take much longer to start.

Prior to this patch, if the Impala shell took much longer to start,
the test began sending commands to the shell even though it was not
yet ready to receive them. The patch fixes the issue by waiting until
the Impala shell is connected. The patch also adds asserts in other
places that call wait_for_num_in_flight_queries and updates the
default behavior for the Impala shell to wait until it is
connected.
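
A minimal sketch of the stricter waiting pattern (the polling helper
name matches the one mentioned above, the rest is illustrative):

  import time

  def wait_for_num_in_flight_queries(impalad_service, expected, timeout_s=30):
    """Poll until the number of in-flight queries equals 'expected'.
    Returns False on timeout instead of silently succeeding."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
      if impalad_service.get_num_in_flight_queries() == expected:
        return True
      time.sleep(0.5)
    return False

  # Callers now assert on the result instead of assuming success:
  #   assert wait_for_num_in_flight_queries(impalad, 1), "shell never connected"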

Testing:
- Ran core and exhaustive tests several times on CentOS 6 without any
  issue

Change-Id: I9805269d8b806aecf5d744c219967649a041d49f
Reviewed-on: http://gerrit.cloudera.org:8080/12047
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-12-12 22:44:34 +00:00
Tim Armstrong
3f0989a4fc IMPALA-7811: optionally count JVM heap towards process mem limit
Adds a flag --mem_limit_includes_jvm that alters memory accounting to
include the amount of memory we think that the JVM is likely to use.
By default this flag is false, so behaviour is unchanged.

We're not ready to change the default but I want to check this in to
enable experimentation.

Two metrics are counted towards the process limit:
* The maximum JVM heap size. We count this because the JVM memory
  consumption can expand up to this threshold at any time.
* JVM non-heap committed memory. This can be a non-trivial amount of
  memory (e.g. I saw 150MB on one production cluster). There isn't a
  hard upper bound on this memory that I know of, but it should not
  grow rapidly.

This requires adjustments in a couple of other places:
* Admission control previously assumed that all of the process memory
  limit was available to queries (an assumption that is not strictly
  true because of untracked memory, etc., but close enough). However,
  the JVM heap makes a large part of the process limit unusable to
  queries, so we should only admit up to "process limit - max JVM heap
  size" per node.
* The buffer pool is now a percentage of the remaining process limit
  after the JVM heap, instead of the total process limit.
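
A back-of-the-envelope example of the new accounting (the buffer pool
fraction shown is illustrative, not the configured default):

  GB = 1024 ** 3

  process_mem_limit = 12 * GB
  max_jvm_heap = 1 * GB  # e.g. -Xmx1g; counted because the heap may grow to this

  # Memory that admission control may hand out per node under the new flag.
  admittable = process_mem_limit - max_jvm_heap  # 11 GB

  # The buffer pool is now sized from the limit remaining after the JVM heap,
  # not from the whole process limit.
  buffer_pool_fraction = 0.80  # illustrative fraction
  buffer_pool_bytes = int(buffer_pool_fraction * (process_mem_limit - max_jvm_heap))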

Currently, end-to-end tests fail if run with this flag for two reasons:
* The default JVM heap size is 1/4 of physical memory, which means that
  essentially all of the process memory limit is consumed by the JVM
  heaps when we run 3 impala daemons per host, unless -Xmx is
  explicitly set.
* If the heap size is limited to 1-2GB like below, then most tests pass
  but TestInsert.test_insert_large_string fails because IMPALA-4865
  lets it create giant strings that eat up all the JVM heap.

  start-impala-cluster.py \
      --impalad_args=--mem_limit_includes_jvm=true --jvm_args="-Xmx1g"

Testing:
Add a custom cluster test that uses the new option and validates the
memory consumption values.

Change-Id: I39dd715882a32fc986755d573bd46f0fd9eefbfc
Reviewed-on: http://gerrit.cloudera.org:8080/10928
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-12-04 08:20:34 +00:00
Sahil Takiar
691f9d9ff9 IMPALA-6249: Expose several build flags via web UI
Exposes a list of build flags via the impalad web UI. The build flags
can be viewed on the root page under the "Version" section. They can
also be accessed by tests through the debug version of the root page
(e.g. by adding &json to the URL). The build flags are listed in a JSON
array so that they can be parsed easily. This should help run Impala
tests against a remote Impala cluster.

The build flags are read in CMakeLists.txt and then stored in
preprocessor variables.

Three build flags are exposed as part of this commit:
- Is_NDEBUG = [true, false]
    - Whether NDEBUG was true or false at compile time
- CMake_Build_Type = [DEBUG, RELEASE, ADDRESS_SANITIZER, TIDY, UBSAN,
  UBSAN_FULL, TSAN, CODE_COVERAGE_RELEASE, CODE_COVERAGE_DEBUG]
    - The value of CMAKE_BUILD_TYPE at compile time
- Library_Link_Type = [DYNAMIC, STATIC]
    - Derived from the compile time value of BUILD_SHARED_LIBS
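
A hedged example of how a test might read the flags from the debug
page (the JSON key name below is an assumption, not taken verbatim
from the patch):

  import requests

  def get_build_flags(impalad_web_url):
    """Fetch the root page as JSON and return the list of build flags."""
    resp = requests.get(impalad_web_url + "/?json")
    resp.raise_for_status()
    # Assumed key name; the flags are published as a JSON array so they
    # can be parsed without scraping HTML.
    return resp.json().get("flags", [])

  # e.g. get_build_flags("http://localhost:25000")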

There are a few other minor changes that are a part of this commit:

* The patch modifies environ.py so that it supports fetching build metadata
for both local and remote clusters.

* The tests under the tests/webserver directory were not being run because
'webserver' was not whitelisted in tests/run-tests.py. This patch fixes
that and addresses several test failures in run-tests.py.

* It reverts part of IMPALA-6947 so that there is no dependency from
start-impala-cluster.py to environ.py. The timeout discussed in
IMPALA-6947 is now set at compile time.

Testing:

Added new tests to webserver/test_web_pages.py to ensure that the build
flags are being set. Some tests only run against a local cluster
because we have no way of getting the build info from a remote
cluster, whereas local clusters contain a .cmake_build_type file.

Change-Id: I47e3ad4cbf844909bdaf22a6f9d7bd915dce3f19
Reviewed-on: http://gerrit.cloudera.org:8080/11410
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-11-05 22:47:31 +00:00
Bikramjeet Vig
932bf2dd81 IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
This patch adds the queuing status, that is, whether the query was
queued and what the latest queuing reason was, to the ExecSummary.
It also allows impala-shell to expose this status by pulling it
out of the ExecSummary when either live_summary or live_progress
is set to true.

Testing:
Added custom cluster tests.

Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49
Reviewed-on: http://gerrit.cloudera.org:8080/11816
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-30 02:02:37 +00:00
Sean Mackrory
7a022cf36a IMPALA-7681. Add Azure Blob File System (ADLS Gen2) support.
HADOOP-15407 adds a new FileSystem implementation called "ABFS" for the
ADLS Gen2 service. It's in the hadoop-azure module as a replacement for
WASB. Filesystem semantics should be the same, so skipped tests and
other behavior changes have simply mirrored what is done for ADLS Gen1
by default. Tests skipped on ADLS Gen1 due to eventual consistency of
the Python client can be run against ADLS Gen2.

Change-Id: I5120b071760e7655e78902dce8483f8f54de445d
Reviewed-on: http://gerrit.cloudera.org:8080/11630
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-20 06:43:00 +00:00
poojanilangekar
ec2dabafb9 IMPALA-7545: Add queuing reason to query log
After this change, the HS2 GetLog() function returns the queuing
reason for a query when it is queued by the AdmissionController.

Testing: Added an end-to-end test to test_admission_controller.py
to verify the query logs returned.

Change-Id: I2e5d8de4f6691a9ba2594ca68c54ea4dca760545
Reviewed-on: http://gerrit.cloudera.org:8080/11669
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-16 01:47:05 +00:00
Bikramjeet Vig
9ac874d1be IMPALA-7684: Fix Admission result printed in the query profile
With this patch the correct admission result is printed both when the
query is admitted immediately and when it is admitted after being
queued.

Testing:
Added coverage for checking the result for these two cases to an
existing test.

Change-Id: I410993a555d9590cca42902fbfa1fe7aa883be06
Reviewed-on: http://gerrit.cloudera.org:8080/11634
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-10 03:00:27 +00:00
Bikramjeet Vig
fc91e706b4 IMPALA-7349: Add Admission control support for automatically setting
per host memory limit for a query

With this patch the per host memory limit of a query is automatically
set using the mem_limit set in the query options and the mem_estimate
calculated by the planner based on the following pseudo code:

if mem_limit is set in query options:
  use that and if 'clamp-mem-limit-query-option' is true:
    enforce the min/max query mem limits defined in the pool config.
else:
  mem_limit = max(mem_estimate,
    min_mem_limit_required_to_accommodate_largest_initial_reservation)
  finally, enforce min/max query mem limits defined in the pool
  config on this value.
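
The same logic, written out as a small Python sketch (variable names
are illustrative):

  def choose_per_host_mem_limit(query_opt_mem_limit, mem_estimate,
                                largest_initial_reservation,
                                min_query_mem_limit, max_query_mem_limit,
                                clamp_mem_limit_query_option):
    def clamp(value):
      # Enforce the pool's min/max query mem limits where configured (> 0).
      if min_query_mem_limit:
        value = max(value, min_query_mem_limit)
      if max_query_mem_limit:
        value = min(value, max_query_mem_limit)
      return value

    if query_opt_mem_limit:  # mem_limit set in query options
      return clamp(query_opt_mem_limit) if clamp_mem_limit_query_option \
          else query_opt_mem_limit
    # Otherwise start from the planner's estimate, but never go below what
    # the largest initial reservation requires.
    return clamp(max(mem_estimate, largest_initial_reservation))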

This calculated mem limit will also be used for admission accounting
and consequently for admission control. Moreover, three new pool
configuration options have been added to enable this behaviour:

"min-query-mem-limit" & "max-query-mem-limit" => help
clamp the per host memory limit for a query. If both these limits
are not configured, then the estimates from planning are not used
as a memory limit and only used for making admission decisions.
Moreover the estimates will no longer have a lower bound based
on the largest initial reservation.

"clamp-mem-limit-query-option" => if false, the mem_limit defined in
the query options is used directly and the max/min query mem limits
are not enforced on it.

Testing:
Added e2e test cases.
Added frontend tests for changes to RequestPoolService.
Successfully passed exhaustive tests.

Change-Id: Ifec00141651982f5975803c2165b7d7a10ebeaa6
Reviewed-on: http://gerrit.cloudera.org:8080/11157
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-05 04:38:24 +00:00
Tim Armstrong
e83fe23a5f IMPALA-7632: fix erasure coding build for custom cluster tests
Fix tests to always pass query options via the query_options
parameter.

Modified the infrastructure to fail on non-erasure-coding builds if
tests pass in default query options in the wrong way.

Skip a restart test that makes assumptions about scheduling that EC
seems to break.

Testing:
Ran core tests with erasure coding enabled.

Change-Id: I4d809faedc0c45417519f13c73559efb6c54154e
Reviewed-on: http://gerrit.cloudera.org:8080/11536
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-28 01:23:41 +00:00
Bikramjeet Vig
ab6bd74ff7 IMPALA-7516: Fix query location accounting
As a side-effect of IMPALA-5216, queries that were scheduled to be
executed but eventually got rejected or canceled before starting
execution (initializing the coordinator object) got added to the
'query_locations' map but never got removed. As a result, the queries
debug page would sometimes show queries as running under the
"Query Locations" section even when none were. This patch fixes that
behavior.

Testing:
Added a custom cluster test.

Change-Id: I66c8e59299747df57c9f39db1cb1f46ff6bbab7e
Reviewed-on: http://gerrit.cloudera.org:8080/11372
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-10 22:05:49 +00:00
Bikramjeet Vig
5cb956e3fd IMPALA-7361: Fix flakiness in test_heterogeneous_proc_mem_limit
This patch fixes flakiness in test_heterogeneous_proc_mem_limit wherein
a fragment on a host from a previous query lingers and holds on to
resources, causing the next query to fail the test, since the test
expects the cluster to be idle at that point.

Change-Id: I3e5c9b0c6a99d7157640c02e6b3c808b4ae9e73c
Reviewed-on: http://gerrit.cloudera.org:8080/11166
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-09 03:04:33 +00:00