Commit Graph

103 Commits

Author SHA1 Message Date
Dan Hecht
11aaa6caa0 IMPALA-7046: introduce "global" debug_actions
The motivation is to add jitter to backend startup in test_failpoints.
The race in IMPALA-7033 can be reproduced by adding jitter to the exec
rpcs when some backends fail. Let's add jitter to test_failpoints to get
better coverage of exec startup races.

This builds on top of the debug action extensions added in the async
admission control patch by allowing the new "global" debug actions
(i.e. actions that can be used in points outside of the ExecNodes).
See the code comments for details.

For now, we're only using the SLEEP and JITTER commands, but I've
included a FAIL command as well since I'll want to use that to write a
test for IMPALA-6788 to simulate exec rpc failure.

Note that I don't bother resolving the actions ahead of time (like we do
for ExecNode actions). It doesn't seem worth it since the resolution
only needs to occur after we've matched the label and I don't expect the
same label to be hit many times within a single thread. We can always
optimize this later if needed.

Testing:
- Verified that test_failpoints can reproduce the race in
  IMPALA-7033 by reverting that fix and testing.
- Ran the modified tests and grepped the impalad log to see
  that the sleeps are still occurring.
- Manually verified global FAIL command (in a build with another patch).
- Manually verified invalid debug_actions (both ExecNode and global).

Change-Id: I77663a539be18711a4f12c470ffd7474e3d69388
Reviewed-on: http://gerrit.cloudera.org:8080/10690
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-20 19:37:51 +00:00
Tim Armstrong
0d476f81f1 IMPALA-6969: add AC last queued reason to profile
The reason is updated during initial admission and when the query is at
the head of the queue but can't be admitted. It is not updated while
the query is in the middle of the queue.

Together with the async admission change, this makes it possible to
determine from the profile why the query has not been admitted yet.

Testing:
Added admission control tests that check that the
string is set for queries queued based both on the
query count and the max memory.

Looped the tests overnight to confirm non-flakiness.

Change-Id: Ida9b75dc50dfb7a27f59deda91bad6ac838130a1
Reviewed-on: http://gerrit.cloudera.org:8080/10731
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-19 19:24:09 +00:00
Tim Armstrong
547bc57121 IMPALA-7174: fix test_cancellation for RELEASE builds
The test was DOA when run against a release build because the debug
actions that it depends on were disabled. The fix is to enable the
debug actions for release builds, similar to other debug actions.

I assume the original motivation of the NDEBUG checks was to avoid
adding overhead to release builds. The cost is minimised by quickly
checking whether the string is empty before proceeding with any
further work.

Also remove wonky exception handling - the test was swallowing
exceptions but we don't expect that code to throw exceptions.

Testing:
Looped the test on a release build.

Change-Id: I41da7b5ac58a468a8ed117777969906f63df6d4b
Reviewed-on: http://gerrit.cloudera.org:8080/10722
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-15 03:21:27 +00:00
Bikramjeet Vig
2de9db8fc6 IMPALA-5216: Make admission control queuing async
Implement asynchronous admission control queuing. This is achieved by
running the admission control code-path in a separate thread. Major
changes include: propagating cancellation to the admission control
thread and dequeuing thread, and adding a new Query Operation State
called "PENDING" that represents the state between completion of
planning and starting of query execution.

Testing:
- Added a deterministic end to end test and a session expiry test.
- Ran multiple stress tests successfully with a cancellation probability
of 60% and with different values for the following parameters:
max_requests, queue_wait_timeout_ms. Ensured that the impalad was in a
valid state afterwards (no orphan fragments or wrong metrics).
- Ran all exhaustive tests and ASAN core tests successfully.
- Ran data load successfully.

Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf
Reviewed-on: http://gerrit.cloudera.org:8080/10060
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-13 15:48:17 +00:00
Bikramjeet Vig
8d474a0170 IMPALA-3134: Support different proc mem limits among impalads for
admission control checks

Currently the admission controller assumes that all backends have the
same process mem limit as the impalad it itself is running on. With
this patch the proc mem limit for each impalad is available to the
admission controller and it uses it for making correct admission
decisions. It currently works under the assumption that the
per-process memory limit does not change dynamically.
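
A rough sketch of the per-backend check described above. The function name,
the dict-based bookkeeping and the host names are assumptions made for
illustration; they are not the admission controller's actual data structures.

```python
from typing import Dict, List


def backends_without_headroom(mem_needed: int,
                              mem_reserved: Dict[str, int],
                              proc_mem_limit: Dict[str, int]) -> List[str]:
    """Return the hosts where the query would not fit (hypothetical helper).

    Each host is compared against its own process mem limit instead of
    assuming every backend shares the coordinator's limit.
    """
    return sorted(
        host
        for host, limit in proc_mem_limit.items()
        if mem_reserved.get(host, 0) + mem_needed > limit
    )


# Illustrative numbers only (arbitrary units).
limits = {"host-a": 100, "host-b": 50}
reserved = {"host-a": 10, "host-b": 10}
too_big = backends_without_headroom(60, reserved, limits)
fits = backends_without_headroom(40, reserved, limits)
```

With a single shared limit of 100, the first request would have looked
admissible on both hosts; with per-host limits it is flagged on host-b only.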

Testing:
Added an e2e test.

IMPALA-5662: Log the queuing reason for a query

The queuing reason is now logged both while queuing for the first
time and while trying to dequeue.

Change-Id: Idb72eee790cc17466bbfa82e30f369a65f2b060e
Reviewed-on: http://gerrit.cloudera.org:8080/10396
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-22 22:27:21 +00:00
Tianyi Wang
21d92aacbf IMPALA-7019: Schedule EC as remote & disable failed tests
This patch schedules HDFS EC files without considering locality. Failed
tests are disabled and a jenkins build should succeed with export
ERASURE_CODING=true.

Testing: It passes core tests.

Cherry-picks: not for 2.x.

Change-Id: I138738d3e28e5daa1718c05c04cd9dd146c4ff84
Reviewed-on: http://gerrit.cloudera.org:8080/10413
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-22 01:10:14 +00:00
Tim Armstrong
ab2fc5c8b8 IMPALA-6227: reduce window of metric inconsistency
The admission controller test fetches multiple metrics relating to the
admission controller. Before this patch it fetched the whole metrics
list for each metric, meaning there was a substantial window for
the metrics to be inconsistent for a single backend. Now the metrics are
only fetched once. Metric updates are not transactional so there is
still a small window for raciness if an admission decision is made
exactly when the metrics are fetched.

Also try to detect the specific race between updating "dequeued"
and "admitted" that we saw in practice, since the race is still
possible with a smaller window. In that case we retry getting
the metrics.
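
A minimal sketch of the fetch-once-and-retry idea, not the test's actual
code. The fetch_all callable, the metric key names and the consistency
check (dequeued must not exceed admitted) are assumptions for illustration.

```python
from typing import Callable, Dict


def fetch_consistent_metrics(fetch_all: Callable[[], Dict[str, int]],
                             max_attempts: int = 5) -> Dict[str, int]:
    """Read all needed metrics from one snapshot, retrying on a torn read."""
    for _ in range(max_attempts):
        # One fetch of the whole metrics list, then pick values from that
        # single snapshot rather than refetching the list per metric.
        snapshot = fetch_all()
        admitted = snapshot["admitted"]
        dequeued = snapshot["dequeued"]
        # Assumed invariant: a dequeued query is also counted as admitted,
        # so dequeued > admitted means we raced with a metric update.
        if dequeued <= admitted:
            return {"admitted": admitted, "dequeued": dequeued}
    raise RuntimeError("metrics stayed inconsistent after retries")
```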

Change-Id: I2f16edbec53e49446c4c37ef5f926eedb5604319
Reviewed-on: http://gerrit.cloudera.org:8080/10330
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-08 22:22:47 +00:00
Tim Armstrong
418c705787 IMPALA-6679,IMPALA-6678: reduce scan reservation
This has two related changes.

IMPALA-6679: defer scanner reservation increases
------------------------------------------------
When starting each scan range, check to see how big the initial scan
range is (the full thing for row-based formats, the footer for
Parquet) and determine whether more reservation would be useful.

For Parquet, base the ideal reservation on the actual column layout
of each file. This avoids reserving memory that we won't use for
the actual files that we're scanning. This also avoids the need to
estimate ideal reservation in the planner.

We also release scanner thread reservations above the minimum as
soon as threads complete, so that resources can be released slightly
earlier.

IMPALA-6678: estimate Parquet column size for reservation
---------------------------------------------------------
This change also reduces reservation computed by the planner in certain
cases by estimating the on-disk size of column data based on stats. It
also reduces the default per-column reservation to 4MB since it appears
that < 8MB columns are generally common in practice and the method for
estimating column size is biased towards over-estimating. There are two
main cases to consider for the performance implications:
* Memory is available to improve query perf - if we underestimate, we
  can increase the reservation so we can do "efficient" 8MB I/Os for
  large columns.
* The ideal reservation is not available - query performance is affected
  because we can't overlap I/O and compute as much and may do smaller
  (probably 4MB I/Os). However, we should avoid pathological behaviour
  like tiny I/Os.

When stats are not available, we just default to reserving 4MB per
column, which typically is more memory than required. When stats are
available, the memory required can be reduced below that when heuristics
tell us with high confidence that the column data for most or all files
is smaller than 4MB.

The stats-based heuristic could reduce scan performance if both the
conservative heuristics significantly underestimate the column size
and memory is constrained such that we can't increase the scan
reservation at runtime (in which case the memory might be used by
a different operator or scanner thread).

Observability:
Added counters to track when threads were not spawned due to reservation
and to track when reservation increases are requested and denied. These
allow determining if performance may have been affected by memory
availability.

Testing:
Updated test_mem_usage_scaling.py memory requirements and added steps
to regenerate the requirements. Looped the test for a while to flush out
flakiness.

Added targeted planner and query tests for reservation calculations and
increases.

Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Reviewed-on: http://gerrit.cloudera.org:8080/9757
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-28 23:41:39 +00:00
Tim Armstrong
c557a5bfb7 IMPALA-6906: disable test that depends on memory estimates on S3
S3 divides up scan ranges into synthetic blocks smaller than the
equivalent HDFS blocks, which in turn affects the memory estimate
calculation, so the test that was tuned for HDFS does not work
in the same way on S3.

The test is exercising an admission control code path that is
independent of the filesystem, so we don't gain important coverage by
running this on S3.

ADLS can have similar block size issues, so skip that too.

Change-Id: Ida763a402203286c02ad3cbcbed5336c70abef7c
Reviewed-on: http://gerrit.cloudera.org:8080/10207
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-26 02:59:27 +00:00
Tim Armstrong
3ebf30a2a4 IMPALA-6847: work around high memory estimates for AC
Adds MAX_MEM_ESTIMATE_FOR_ADMISSION query option, which takes
effect if and only if
* Memory-based admission control is enabled for the pool
* No mem_limit is set (i.e. best practices are not being followed)

In that case min(MAX_MEM_ESTIMATE_FOR_ADMISSION, mem_estimate)
is used for admission control instead of mem_estimate.

This provides a way to override the planner's estimate if
it happens to be incorrect and is preventing the query from
running. Setting MEM_LIMIT is usually a better alternative
but sometimes it is not feasible to set MEM_LIMIT for each
individual query.
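
The rule above can be sketched roughly as follows. The function and
parameter names are hypothetical, and treating None or 0 as "not set" and
using mem_limit directly when it is set are assumptions of this sketch
rather than a description of the real implementation.

```python
from typing import Optional


def mem_for_admission(mem_estimate: int,
                      mem_limit: Optional[int],
                      max_mem_estimate_for_admission: Optional[int],
                      pool_has_mem_limit: bool) -> int:
    """Pick the memory figure (bytes) used for the admission decision."""
    if mem_limit:
        # Assumption: an explicit mem_limit is used as-is, so the new
        # option has no effect.
        return mem_limit
    if pool_has_mem_limit and max_mem_estimate_for_admission:
        # Cap the planner's estimate; never raise it.
        return min(max_mem_estimate_for_admission, mem_estimate)
    return mem_estimate


GB = 1024 ** 3
capped = mem_for_admission(40 * GB, None, 8 * GB, True)
uncapped = mem_for_admission(4 * GB, None, 8 * GB, True)
```

Here capped is 8 GB because the estimate exceeds the cap, while uncapped
stays at the 4 GB estimate.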

Testing:
Added an admission control test to verify that the query option allows
queries with high estimates to run.

Also tested manually on a minicluster started with:

  start-impala-cluster.py --impalad_args='-vmodule admission-controller=3 \
      -default_pool_mem_limit 12884901888'

Change-Id: Ia5fc32a507ad0f00f564dfe4f954a829ac55d14e
Reviewed-on: http://gerrit.cloudera.org:8080/10058
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-18 01:18:20 +00:00
Bikramjeet Vig
4438a85a34 IMPALA-5814: Remove startup flag to disable admission control
Remove "--disable_admission_control" startup flag and its related
functionality and usage.

Cherry-picks: not for 2.x

Change-Id: I9bf4087ce03ca63f82fd27c6d94b578881b85d42
Reviewed-on: http://gerrit.cloudera.org:8080/9964
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-12 21:05:22 +00:00
Tim Armstrong
35d459479e IMPALA-6227: more logging in test_admission_controller
To enable debugging the occasional flakiness, let's log what each query
is doing by default.

Change-Id: Icbb58a9be4a5d023c1ee3fd76e5992dfba03188c
Reviewed-on: http://gerrit.cloudera.org:8080/9555
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-09 00:00:18 +00:00
Tim Armstrong
b0d3433e36 IMPALA-4953,IMPALA-6437: separate AC/scheduler from catalog topic updates
This adds a set of "prioritized" statestore topics that are small but
are important to deliver in a timely manner. These are delivered more
frequently by a separate thread pool to reduce the window for stale
admission control and scheduling information.

The contract between statestore and subscriber is changed so that the
statestore can send concurrent Update() RPCs for disjoint sets of
topics. This required changes to the subscriber implementation, which
assumed that only one Update RPC would arrive at a time.

It also changes the locking in the statestore so that the prioritized
update threads don't get stuck behind the catalog threads holding
'topic_lock_'. Specifically, it uses a reader-writer lock to protect
modification of the set of topics and a reader-writer lock per topic to
allow the topic data to be read by multiple threads concurrently.

Added metrics to monitor the per-topic update interval.

Testing:
Ran core tests.

Inspected metrics on Impala daemons, saw that membership and request
queue processing times had more samples recorded than the catalog
topic, reflecting the increased frequency.

Ran under thread sanitizer, made sure no data races were reported in
Statestore or StatestoreSubscriber.

Change-Id: Ifc49c2d0f2a5bfad822545616b8c62b4b95dc210
Reviewed-on: http://gerrit.cloudera.org:8080/9123
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-02-14 22:44:40 +00:00
Tim Armstrong
acfd169c8e IMPALA-4319: remove some deprecated query options
Adds a concept of a "removed" query option that has no effect but does
not return an error when a user attempts to set it. These options are
not returned by "set" or "set all" commands that are executed in
impala-shell or server-side.

These query options have been deprecated for several releases:
DEFAULT_ORDER_BY_LIMIT, ABORT_ON_DEFAULT_LIMIT_EXCEEDED,
V_CPU_CORES, RESERVATION_REQUEST_TIMEOUT, RM_INITIAL_MEM,
SCAN_NODE_CODEGEN_THRESHOLD, MAX_IO_BUFFERS

RM_INITIAL_MEM did still have an effect, but it was undocumented and
MEM_LIMIT should be used in preference.

DISABLE_CACHED_READS also had an effect but it was documented as
deprecated.

Otherwise the options had no effect at all.

Testing:
Ran exhaustive build.

Updated query option tests to reflect the new behaviour.

Cherry-picks: not for 2.x.

Change-Id: I9e742e9b0eca0e5c81fd71db3122fef31522fcad
Reviewed-on: http://gerrit.cloudera.org:8080/9118
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-02-01 08:26:26 +00:00
stiga-huang
5c593be59c IMPALA-6301: Fix test failures when username or group name contains dots
Some tests use the local user's group name to construct SQLs, which may
lead to syntax errors when group name contains dots. We need to quote
the group names in SQL to avoid this error. Besides, a test in
test_admission_controller uses '\w+' to match the local user name. This
expression cannot match usernames with dots, which causes test failure
as well. Instead, we should use '\S+'.
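
A small illustration of the regex difference using Python's re module. The
sample string and user name are made up; the real test matches a different
message.

```python
import re

# Hypothetical log fragment containing a user name with a dot.
line = "user=jane.doe pool=default"

# \w matches only letters, digits and underscore, so it stops at the dot.
word_match = re.search(r"user=(\w+)", line).group(1)

# \S matches any non-whitespace character, so the dot is included.
nonspace_match = re.search(r"user=(\S+)", line).group(1)
```

With this input, word_match is "jane" and nonspace_match is "jane.doe".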

Change-Id: Ib8ae15bb6a929dc48d3ad2176c8b3fafff87f32b
Reviewed-on: http://gerrit.cloudera.org:8080/8807
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
2017-12-13 23:06:45 +00:00
Tim Armstrong
dc1282fbc9 IMPALA-6241: timeout in admission control test under ASAN
The fix for IMPALA-6241 is to increase the timeout for all slow builds.

While testing that fix, I discovered that the ASAN build detection logic
was failing silently, resulting in it assuming that it was testing a
DEBUG build. The error was:

  Unexpected DW_AT_name in first CU:
  /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_preinit.cc;
  choosing DEBUG

The fix for that issue is to remove the build type detection heuristic
and instead just write a file with the build type as part of the build process.

Testing:
Before this change I was able to reproduce locally every 5-10 test
iterations. After this change I haven't seen it reproduce.

Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536
Reviewed-on: http://gerrit.cloudera.org:8080/8652
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-11-29 03:28:22 +00:00
Tim Armstrong
1a7b0d0bdc IMPALA-6227: deflake admission stress tests
The problem was that, during the initial admission decision phase, some
queries were initially queued then dequeued once memory came available.
All of the accounting in the test implicitly relies on queries not being
dequeued until queries are later explicitly ended, so if this happened,
the test broke in multiple subtle ways.

This happened because the query only scanned a small number of
rows, which could be all buffered on the receiver side of the
exchange even before the client fetched any rows from the coordinator.
This means that the reserved memory on some backends could increase
then decrease during the initial admission phase, resulting in a
query being queued then dequeued.

The fix is to increase the number of rows returned by the query so that
all fragments remain active during the initial admission phase.
This increased test execution time somewhat, so I also had to bump the
queue wait timeout for the admission stress tests (they assume that
queries don't time out in the queue).

Testing:
Ran the test under debug, release and ASAN builds, i.e.

  impala-py.test tests/custom_cluster/test_admission_controller.py \
    --workload_exploration_strategy="functional-query:exhaustive"

I looped the mem_limit test for a while to confirm it didn't reproduce
(it reproduced reliably every 2-3 iterations before this fix).

It still reproduces every 5-10 runs with exhaustive+release, so I
need to do further work to make it more robust.

Change-Id: Iafb3af0ce68f96e5d713dbb3b37dd0b50ea66bb4
Reviewed-on: http://gerrit.cloudera.org:8080/8631
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-11-23 07:48:18 +00:00
Tim Armstrong
7487c5de04 IMPALA-1575: part 2: yield admission control resources
This change releases admission control resources more eagerly,
once the query has finished actively executing. Some resources
(tracked and untracked) are still consumed by the client request
as long as it remains open, e.g. memory for control structures
and the result cache. However, these resources are relatively
small and should not block admission of new queries.

The same as in part 1, query execution is considered to be finished
under any of the following conditions:
1. The query encounters an error and fails
2. The query is cancelled due to the idle query timeout
3. The query reaches eos (or the DML completes)
4. The client cancels the query without closing the query

Admission control resources are released in two ways:
1. by calling AdmissionController::ReleaseQuery() on the coordinator
   promptly after query execution finishes, instead of waiting for
   UnregisterQuery(). This means that the query and its memory are
   no longer considered "admitted".
2. by changing the behaviour of MemTracker::GetPoolMemReserved() so
   that it is aware of when a query has finished executing and does not
   consider its entire memory limit to be "reserved".

The preconditions for releasing an admitted query are subtle because the
queries are being admitted to a distributed system, not just the
coordinator.  The comment for ReleaseAdmissionControlResources()
documents the preconditions and rationale. Note that the preconditions
are not weaker than the preconditions of calling UnregisterQuery()
before this patch.

Testing:
TestAdmissionController is extended to end queries in four ways:
cancellation by client, idle timeout, the last row being fetched,
and the client closing the query. The test uses a mix of all four.
After the query ends, all clients wait for the test to complete
before closing the query or closing the connection. This ensures
that the admission control decisions are based entirely on the
query end behavior. This test works for both query admission control
and mem_limit admission control and can detect both kinds of admission
control resources ("admitted" and "reserved") not being released
promptly.

I ran into a problem similar to IMPALA-3772 with the admission control
tests becoming flaky due to query timeouts on release builds, which I
solved in a similar way by increasing the frequency of statestore
updates.

This is based on an earlier patch by Joe McDonnell.

Change-Id: Ib1fae8dc1c4b0eca7bfa8fadae4a56ef2b37947a
Reviewed-on: http://gerrit.cloudera.org:8080/8581
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-11-20 04:34:47 +00:00
Tim Armstrong
a772f84562 IMPALA-6171: Revert "IMPALA-1575: part 2: yield admission control resources"
This reverts commit fe90867d89.

Change-Id: I3eec4b5a6ff350933ffda0bb80949c5960ecdf25
Reviewed-on: http://gerrit.cloudera.org:8080/8499
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
2017-11-08 22:03:59 +00:00
Tim Armstrong
fe90867d89 IMPALA-1575: part 2: yield admission control resources
This change releases admission control resources more eagerly,
once the query has finished actively executing. Some resources
(tracked and untracked) are still consumed by the client request
as long as it remains open, e.g. memory for control structures
and the result cache. However, these resources are relatively
small and should not block admission of new queries.

The same as in part 1, query execution is considered to be finished
under any of the following conditions:
1. The query encounters an error and fails
2. The query is cancelled due to the idle query timeout
3. The query reaches eos (or the DML completes)
4. The client cancels the query without closing the query

Admission control resources are released in two ways:
1. by calling AdmissionController::ReleaseQuery() on the coordinator
   promptly after query execution finishes, instead of waiting for
   UnregisterQuery(). This means that the query and its memory are
   no longer considered "admitted".
2. by changing the behaviour of MemTracker::GetPoolMemReserved() so
   that it is aware of when a query has finished executing and does not
   consider its entire memory limit to be "reserved".

The preconditions for releasing an admitted query are subtle because the
queries are being admitted to a distributed system, not just the
coordinator.  The comment for ReleaseAdmissionControlResources()
documents the preconditions and rationale. Note that the preconditions
are not weaker than the preconditions of calling UnregisterQuery()
before this patch.

Testing:
TestAdmissionController is extended to end queries in four ways:
cancellation by client, idle timeout, the last row being fetched,
and the client closing the query. The test uses a mix of all four.
After the query ends, all clients wait for the test to complete
before closing the query or closing the connection. This ensures
that the admission control decisions are based entirely on the
query end behavior. This test works for both query admission control
and mem_limit admission control and can detect both kinds of admission
control resources ("admitted" and "reserved") not being released
promptly.

This is based on an earlier patch by Joe McDonnell.

Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac
Reviewed-on: http://gerrit.cloudera.org:8080/8323
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-11-07 05:16:11 +00:00
Philip Zeyliger
c9740b43d1 IMPALA-5908: Allow SET to unset modified query options.
The query 'SET <option>=""' will now unset an option within the session,
reverting it to its default state.

This change became necessary when "SET" started returning an empty
string for unset options which don't have a default. The test
infrastructure (impala_test_suite.py) resets options to what it thinks
is its defaults, and, when this broke, some ASAN builds started to fail,
presumably due to a timing issue with how we re-use connections between
tests.

Previously, SessionState copied over the default options from the server
when the session was created and then mutated that. To support unsetting
options at the session layer, this change keeps a pointer to the default
server settings, keeps separately the mutations, and overlays the
options each time they're requested. Similarly, for configuration
overlays that happen per-query, the overlay is now done explicitly,
because empty per-query overlay values (key=..., value="") now have no effect.

Because "set key=''" is ambiguous between "set to the empty string" and
"unset", it's now impossible to set to the empty string, at the session
layer, an option that is configured at a previous layer. In practice,
this is just debug_action and request_pool. debug_action is essentially
an internal tool. For request_pool, this means that setting the default
request_pool via impalad command line is now a bad idea, as it can't
be cleared at a per-session level. For request_pool, the correct
course of action for users is to use placement rules, and to have a
default placement rule.
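
The layering described above could look roughly like this. It is only a
sketch: the SessionOptions class and its method names are hypothetical, and
the real SessionState is C++ and handles more than plain string maps.

```python
from typing import Dict, Optional


class SessionOptions:
    """Server defaults plus session overrides, overlaid on each request."""

    def __init__(self, server_defaults: Dict[str, str]) -> None:
        # Keep a reference to the server defaults instead of copying them.
        self._defaults = server_defaults
        # Only the options the session has changed are stored here.
        self._overrides: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        if value == "":
            # SET <option>="" drops the override, reverting to the default.
            self._overrides.pop(key, None)
        else:
            self._overrides[key] = value

    def effective(self,
                  per_query: Optional[Dict[str, str]] = None) -> Dict[str, str]:
        merged = dict(self._defaults)
        merged.update(self._overrides)
        for key, value in (per_query or {}).items():
            if value != "":  # empty per-query values have no effect
                merged[key] = value
        return merged


defaults = {"request_pool": "default-pool", "mem_limit": "0"}
session = SessionOptions(defaults)
session.set("mem_limit", "2g")
after_set = session.effective()["mem_limit"]
session.set("mem_limit", "")
after_unset = session.effective()["mem_limit"]
```

This also shows the limitation noted above: because an empty value means
"unset", request_pool cannot be overridden with an empty string once it has
a server-level default.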

Testing:
* Added a simple test that triggered this side-effect without this code.
  Specifically, "impala-python infra/python/env/bin/py.test tests/metadata/test_set.py -s"
  with the modified set.test triggers.
* Amended tests/custom_cluster/test_admission_controller.py; it was
  useful for testing these code paths.
* Added cases to query-options-test to check behavior for both
  defaulted and non-defaulted values.
* Added a custom cluster test that checks that overlays are
  working against
* Ran an ASAN build where this was triggering previously.

Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Reviewed-on: http://gerrit.cloudera.org:8080/8070
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-10-05 03:04:38 +00:00
Matthew Jacobs
77e9e262af IMPALA-5838: Improve errors on AC buffer mem rejection
The error message returned when a query is rejected due to
insufficient buffer memory is misleading. It recommended a
mem_limit which would be high enough, but changing the
mem_limit may result in changing the plan, which may result
in further changes to the buffer memory requirement.

In particular, this can happen when the planner compares the
expected hash table size to the mem_limit, and decides to
choose a partitioned join over a broadcast join.

While we might consider other code changes to improve this,
for now let's just be clear in the error message.

Testing:
* Adds tests that verify the expected behavior with the new
  error message.

Change-Id: I3dc3517195508d86078a8a4b537ae7d2f52fbcb7
Reviewed-on: http://gerrit.cloudera.org:8080/7834
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-29 02:43:47 +00:00
Bikramjeet Vig
c67b198a19 IMPALA-5784: Separate planner and user set query options in profile
This separation will help the user better understand the query
runtime profile.

Testing:
Modified an existing test case.

Change-Id: Ibfc7832963fa0bd278a45c06a5a54e1bf40d8876
Reviewed-on: http://gerrit.cloudera.org:8080/7721
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-24 02:42:01 +00:00
Matthew Jacobs
7264c54751 IMPALA-5644,IMPALA-5810: Min reservation improvements
Rejects queries during admission control if:
* the largest (across all backends) min buffer reservation is
  greater than the query mem_limit or buffer_pool_limit
* the sum of the min buffer reservations across the cluster
  is larger than the pool max mem resources
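
The two rejection conditions above, as an illustrative sketch. The function
name, argument names and message text are assumptions, and treating None as
"no limit configured" is a simplification.

```python
from typing import Dict, Optional


def rejection_reason(min_reservation: Dict[str, int],
                     mem_limit: Optional[int],
                     buffer_pool_limit: Optional[int],
                     pool_max_mem: Optional[int]) -> Optional[str]:
    """Return why the query is rejected, or None if both checks pass."""
    largest = max(min_reservation.values(), default=0)
    total = sum(min_reservation.values())
    # Check 1: the largest per-backend minimum must fit the query limits.
    for name, limit in (("mem_limit", mem_limit),
                        ("buffer_pool_limit", buffer_pool_limit)):
        if limit is not None and largest > limit:
            return f"min reservation {largest} exceeds {name} {limit}"
    # Check 2: the cluster-wide sum must fit the pool's max memory.
    if pool_max_mem is not None and total > pool_max_mem:
        return f"total min reservation {total} exceeds pool max mem {pool_max_mem}"
    return None


# Illustrative numbers only (arbitrary units).
per_backend = {"host-a": 30, "host-b": 50}
ok = rejection_reason(per_backend, 64, None, 100)
too_large = rejection_reason(per_backend, 40, None, 100)
pool_full = rejection_reason(per_backend, 64, None, 60)
```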

There are some other interesting cases to consider later:
* every per-backend min buffer reservation is less than the
  associated backend's process mem_limit; the current
  admission control code doesn't know about other backend's
  proc mem_limits.

Also reduces minimum non-reservation memory (IMPALA-5810).
See the JIRA for experimental results that show this
slightly improves min memory requirements for small queries.
One reason to tweak this is to compensate for the fact that
BufferedBlockMgr didn't count small buffers against the
BlockMgr limit, but BufferPool counts all buffers against
it.

Testing:
* Adds new test cases in test_admission_controller.py
* Adds BE tests in reservation-tracker-test for the
  reservation-util code.

Change-Id: Iabe87ce8f460356cfe4d1be4d7092c5900f9d79b
Reviewed-on: http://gerrit.cloudera.org:8080/7678
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-22 08:27:12 +00:00
Tim Armstrong
c4f903033c IMPALA-3200: more buffer pool end-to-end tests
This adds most of the end-to-end tests described in the test plan.
See http://goo.gl/v3Strz.

* End-to-end test for disk spill encryption.
* Admission control test for the case when acquiring initial
  reservation fails.
* Initial reservation acquire failure test
* scratch_limit tests for Join, Agg, Sort, Analytic
* Memory usage scaling tests for Join, Agg, Sort, Analytic

Also splits out the slow sort queries in test_spilling and moves them
to exhaustive so the individual tests run faster and have better
parallelism.

Testing:
Ran all the core tests. Will do a full exhaustive run before
committing.

Change-Id: I554aa5ddfef4f8e75295596e720a14eee1afa17f
Reviewed-on: http://gerrit.cloudera.org:8080/7552
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-07 00:57:46 +00:00
Bikramjeet Vig
83bfc142e4 IMPALA-4276: Profile displays non-default query options set by planner
Fix to populate the non-default query options set by planner in the
runtime profile.

Added a corresponding test case.

Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
Reviewed-on: http://gerrit.cloudera.org:8080/7419
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-07-21 01:14:07 +00:00
Bikramjeet Vig
67bc7a774c IMPALA-5104: Admit queries with mem equal to proc mem_limit
This allows queries to be admitted with estimated or requested memory
equal to the process memory limit.

Added a corresponding test case.

Change-Id: I197648f4162f2057141517b4b42ab5196884a65a
Reviewed-on: http://gerrit.cloudera.org:8080/7401
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-07-12 04:40:51 +00:00
Tim Armstrong
9a29dfc91b IMPALA-3748: minimum buffer requirements in planner
Compute the minimum buffer requirement for spilling nodes and
per-host estimates for the entire plan tree.

This builds on top of the existing resource estimation code, which
computes the sets of plan nodes that can execute concurrently. This is
cleaned up so that the process of producing resource requirements is
clearer. It also removes the unused VCore estimates.

Fixes various bugs and other issues:
* computeCosts() was not called for unpartitioned fragments, so
  the per-operator memory estimate was not visible.
* Nested loop join was not treated as a blocking join.
* The TODO comment about union was misleading
* Fix the computation for mt_dop > 1 by distinguishing per-instance and
  per-host estimates.
* Always generate an estimate instead of unpredictably returning
  -1/"unavailable" in many circumstances - there was little rhyme or
  reason to when this happened.
* Remove the special "trivial plan" estimates. With the rest of the
  cleanup we generate estimates <= 10MB for those trivial plans through
  the normal code path.

I left one bug (IMPALA-4862) unfixed because it is subtle, will affect
estimates for many plans and will be easier to review once we have the
test infra in place.

Testing:
Added basic planner tests for resource requirements in both the MT and
non-MT cases.

Re-enabled the explain_level tests, which appears to be the only
coverage for many of these estimates. Removed the complex and
brittle test cases and replaced with a couple of much simpler
end-to-end tests.

Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Reviewed-on: http://gerrit.cloudera.org:8080/5847
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-18 20:36:08 +00:00
David Knupp
f590bc0da6 IMPALA-4750: Rename test infra classes so they don't mimic test classes.
This patch addresses warning messages from pytest re: the imported
TestMatrix, TestVector, and TestDimension classes, which were being
collected as potential test classes. The fix was simply to prefix
the class names with "Impala":

git grep -l 'TestDimension' | xargs \
    sed -i 's/TestDimension/ImpalaTestDimension/g'

git grep -l 'TestMatrix' | xargs \
    sed -i 's/TestMatrix/ImpalaTestMatrix/g'

git grep -l 'TestVector' | xargs \
    sed -i 's/TestVector/ImpalaTestVector/g'

The tests all passed in an exhaustive run on the upstream jenkins
server:

http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/

Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1
Reviewed-on: http://gerrit.cloudera.org:8080/5794
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2017-01-26 23:40:22 +00:00
Thomas Tauber-Marshall
3be4b3efd0 IMPALA-1169: Admission control info on the queries debug webpage
This patch adds a new event, 'Queued', to the query event log to
indicate when a query is queued by the admission controller. This
means that queries on the '/queries' page that are currently
queued will display this as their 'Last Event', making it possible
to see which queries are currently queued.

It also adds a column to show the resource pool associated with
the queries, and it updates the wording of the first event that
gets marked for each query from 'Start execution' to 'Query
submitted', since this is before planning and admission control
and therefore execution hasn't actually started yet.

Change-Id: I504e3c829a14318721e3a42de6281bcc578f7283
Reviewed-on: http://gerrit.cloudera.org:8080/4756
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-11-07 23:26:02 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Matthew Jacobs
7237241087 IMPALA-3790: Fix admission control flaky stress test
In addition to a previous change which extended the
admission control test timeouts for code coverage jobs, the
tests with high concurrency are still experiencing timeouts
in the admission control queues (which is different from the
timeouts that were set on the test cases). Rather than
extend the timeouts on the queues as well (which would
increase the already ridiculously long test time ~2hrs),
this limits the number of concurrent queries that are
submitted with code coverage.

Change-Id: Id62f7603f1174aa02469c6ca57513c3f1fa1e221
Reviewed-on: http://gerrit.cloudera.org:8080/3861
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 02:58:02 +00:00
Matthew Jacobs
f60b2beb8d IMPALA-3790: AC tests timeout in codecoverage builds
The codecoverage builds are often timing out in the
admission control stress tests. I don't believe there to be
any admission control issue, just the codecoverage overhead
seems to be very, very high, especially under the concurrent
load generated by this tool. This increases the timeout
significantly.

Change-Id: I9e844cd9fe31464eb410707ae7ef7c71f492f129
Reviewed-on: http://gerrit.cloudera.org:8080/3794
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-07-27 22:43:45 +00:00
Taras Bobrovytsky
609b80410e Clean up Python test import statements
Many of our test scripts have import statements that look like
"from xxx import *". It is a good practice to explicitly name what
needs to be imported. This commit implements this practice. Also,
unused import statements are removed.
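
A minimal illustration of the practice, using standard-library modules as
stand-ins for the test modules (the path below is only an example):

```python
# Wildcard form that this kind of cleanup replaces (shown as a comment only):
#   from os.path import *
# Explicit form: every imported name is visible at the top of the file.
from os.path import basename, join

path = join("tests", "custom_cluster", "test_admission_controller.py")
print(basename(path))
```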

Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-07-15 23:26:18 +00:00
Matthew Jacobs
4680252a9e IMPALA-3772: Fix admission control flaky stress test
The admission control stress test could occasionally fail
with some queued queries timing out. At some point the
default statestore topic update rpc interval was increased
to 2sec, and because the updates trigger the admission
control queue check (to determine if queued queries can be
dequeued), it is possible that the topic updates weren't
frequent enough for a large admission control queue to fully
drain within the 60sec queue timeout.

In the test that failed, queries were submitted round-robin
so the admission control behavior is non-deterministic.
There isn't enough information to prove that this is
definitely the issue, but it seems likely and improving the
test seems valuable regardless.

This change reduces the statestore topic update interval to
500ms so that state can be shared faster and the queued
requests have more chances to drain within the 60sec queue
timeout. The code was also a bit confusing before because it
was waiting for statestore topic updates but the test had
only changed the heartbeat frequency (which is a different
RPC). This now sets the shorter interval for both and makes
the code a bit clearer.
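
As a rough illustration of the reasoning above: if each topic update gives
the queue one chance to be checked, the number of chances before the queue
timeout is the timeout divided by the update interval. The function name is
hypothetical, and the one-check-per-update model is a simplifying assumption
rather than a description of the admission controller code.

```python
def dequeue_checks(queue_timeout_ms, update_interval_ms):
    """Upper bound on queue checks triggered by topic updates before timeout."""
    return queue_timeout_ms // update_interval_ms

QUEUE_TIMEOUT_MS = 60 * 1000  # the 60sec queue timeout mentioned above

print(dequeue_checks(QUEUE_TIMEOUT_MS, 2000))  # 30 checks at a 2sec interval
print(dequeue_checks(QUEUE_TIMEOUT_MS, 500))   # 120 checks at a 500ms interval
```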

Change-Id: I235a14c4674240dc0a01dabb664da87c752153cf
Reviewed-on: http://gerrit.cloudera.org:8080/3450
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2016-07-14 19:04:44 +00:00
Matthew Jacobs
a1b035a251 IMPALA-3600: Add missing admission control tests
* -require_username (not strictly admission control related
  but it came up in the context of RM).
* Coverage of failure cases: The handling of the full queue
  case wasn't being verified. This changes existing stress
  test to expect a specific message when the queue is full.
* Requesting MAXINT memory, which previously led to an
  overflow in the pool-level mem tracker accounting.

This does not yet address:
* Changing pool cfg while running
* Verify profile string for queued reason

This is just a minimal incremental change to get additional
coverage. Right now, many of the tests rely on some
pre-defined configuration files which is cumbersome. In the
future, we plan on refreshing the configuration story at
which point we should also build more general test
infrastructure for easily testing different configurations.

Change-Id: I6682b15a5caac5824384c4b48a7b40afa2548954
Reviewed-on: http://gerrit.cloudera.org:8080/3272
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2016-06-06 18:34:13 -07:00
Matthew Jacobs
309040088f IMPALA-1092: Fix estimates for trivial coord-only queries
Trivial queries (e.g. those with only const exprs or limit
0) currently get assigned a mem resource estimate of '0' (to
indicate unknown), which the scheduler treats conservatively
by reserving 4gb/node. This changes the Planner to handle
these trivial queries differently, assigning them a tiny
mem estimate.
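
A small sketch of the behavior described above. The constant and function
names are hypothetical, and the "tiny" estimate value is an assumed
placeholder; only the 0-means-unknown convention and the 4gb/node fallback
come from this message.

```python
UNKNOWN_ESTIMATE = 0                      # '0' indicates an unknown estimate
CONSERVATIVE_RESERVATION = 4 * 1024 ** 3  # 4gb per node, used when unknown
TRIVIAL_QUERY_ESTIMATE = 10 * 1024 ** 2   # assumed placeholder for "tiny"

def per_node_reservation(mem_estimate):
    """Return the per-node memory a scheduler would reserve for an estimate."""
    if mem_estimate == UNKNOWN_ESTIMATE:
        return CONSERVATIVE_RESERVATION
    return mem_estimate

# Before: a trivial query reports 0 and gets the conservative reservation.
print(per_node_reservation(UNKNOWN_ESTIMATE))
# After: the planner assigns a tiny estimate, which is reserved as-is.
print(per_node_reservation(TRIVIAL_QUERY_ESTIMATE))
```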

Change-Id: I4913d316cec039dc3fffbaecf28d4caa97e398d1
Reviewed-on: http://gerrit.cloudera.org:8080/2308
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-02-26 15:37:24 -08:00
Matthew Jacobs
3c2cb95698 Memory-Based Admission Control Improvements
Improves the memory-based admission control mechanism by
"reserving" memory on both a cluster-wide basis and also a
per-node basis. The algorithm today only accounts for
cluster-wide memory, so we may oversubscribe particular
impalads even if queries are admitted within the allowed
cluster-wide pool memory limits.

The header comment in admission-controller.h explains the
new algorithm in much more detail.
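
The sketch below only illustrates the idea of checking both limits; it is
not the algorithm from admission-controller.h. All names are hypothetical,
and it assumes a query needs the same amount of memory on every host it
runs on.

```python
GB = 1024 ** 3

def can_admit(mem_per_host, hosts, pool_reserved, pool_max_mem,
              host_reserved, host_mem_limit):
    """Admit only if both the pool-wide and the per-host limits hold."""
    # Cluster-wide check: total memory reserved in the pool after admission.
    if pool_reserved + mem_per_host * len(hosts) > pool_max_mem:
        return False
    # Per-host check: no single host may be oversubscribed.
    return all(host_reserved.get(h, 0) + mem_per_host <= host_mem_limit[h]
               for h in hosts)

limits = {"a": 8 * GB, "b": 8 * GB, "c": 8 * GB}
reserved = {"a": 7 * GB, "b": 1 * GB}

# Fits in the pool (8GB + 4GB <= 24GB) but would oversubscribe host "a".
print(can_admit(2 * GB, ["a", "b"], 8 * GB, 24 * GB, reserved, limits))
# Same pool usage, but hosts "b" and "c" both have room.
print(can_admit(2 * GB, ["b", "c"], 8 * GB, 24 * GB, reserved, limits))
```

A cluster-wide check alone would admit both queries, which is the
oversubscription case described above.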

Testing notes:
The admission control functional tests exercise the max
running queries and memory limits at a high level, though
they don’t yet validate the behavior of the new per-node mem
accounting. Those tests will be written soon. For now, there
is manual test coverage of the new mem resources behavior:
 a) Submitting queries with mem limits, queries use a subset
    of the nodes so they compete for resources on some nodes
    more than others - sanity checked via metrics
 b) Submitting queries without mem limits, verified
    mem_reserved appears to be as expected.

Most* EE tests were also forced to run with admission control
enabled (by manually changing gflag defaults before running),
and checked there was not unexpected behavior in the following
scenarios:
 a) a default pool with no limits (i.e. should be same
    behavior as no admission control)
 b) a default pool with mem resources set, and a default query
    mem limit of 1g
 c) a default pool with mem resources set but no default mem
    limit: this case falls back to using planner estimates
    during admission (supported for legacy only), so queries
    with very large estimates are rejected and those tests
    fail (this is expected). The new CM UI will not allow
    this configuration moving forward so it should be
    uncommon.

*Excluding custom_cluster and some others for unrelated issues.
The majority of regular tests passed, those that failed were not
due to AC issues.

Change-Id: Ia0d1eb8c07457cbe4b67b7f7f57136b4774720bc
Reviewed-on: http://gerrit.cloudera.org:8080/1710
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-02-11 21:23:10 +00:00
Matthew Jacobs
7c81e7e057 IMPALA-2538,IMPALA-1168: Add per-pool settings, metric rename
Adds the ability to set:
1) Per-pool MEM_LIMIT query options (IMPALA-2538).
2) Per-pool queue timeouts (milliseconds) (IMPALA-1168).

Both are set via the llama-site.xml (future work to define
a better admission control specific configuration format,
IMPALA-2573).

Also renames a number of admission control pool metrics in
preparation for a larger change to the admission control
logic. This will allow us to get the new metric names
available so that CM can start collecting them before the
other changes which will take longer to get in.

This mostly just changes the string metric key names with a
few small exceptions:
1) The 'cluster-mem-usage' metric was removed because we
   will not be sending per-backend mem_usage in statestore
   updates anymore, so the aggregate metric no longer makes
   sense.
2) The 'cluster-mem-reserved' metric is registered even
   though it is not yet updated. Having it exposed is
   necessary to unblock CM. The follow-up change adds the
   ability to collect mem_reserved from the pool MemTracker
   and to update this metric.

Change-Id: Ie36b8a06b1b11c8ecad63c3ac4506d369b9835fa
Reviewed-on: http://gerrit.cloudera.org:8080/1806
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-02-05 20:17:19 +00:00
Casey Ching
074e5b4349 Remove hashbang from non-script python files
Many python files had a hashbang and the executable bit set though
they were not intended to be run as standalone scripts. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.

Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2015-08-04 05:26:07 +00:00
Matthew Jacobs
3e031fb3fc Change default admission control limits
Increases the pool limits for the number of concurrent requests
and the default queue size from 20/50 to 200/200. This is
accompanied by a CM change that updates these pool defaults
as well.

Change-Id: Ifd7c7862a45e56bdb34c6d9c3b2620eef9591369
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5329
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 26b184d3ce207200a7441eaf7818074089cbad65)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5363
2014-11-21 15:40:12 -08:00
Henry Robinson
433a86263c Separate failure detection from topic updates in statestore
As originally envisaged, the statestore was not intended to transmit GBs
of data to large clusters. Now that it is, some changes to the design
are required to improve stability and responsiveness.

One of the most acute problems is the way that the topic updates, which
may be extremely large, were responsible for conveying failure detection
information. The delay in sending topic heartbeats (which were often
blocked behind a queue of other updates) meant that large clusters
needed to set large timeouts for every subscriber, and to reduce the
frequency at which topic updates were sent.

This change adds a second heartbeat type to the statestore which is
responsible only for coordinating failure information between the
subscriber and the statestore. These 'keep-alive' heartbeats are sent
much more frequently, as they have a tiny payload and do very little
work. The statestore now does not use the result of topic updates to
maintain its view of which subscribers are alive or dead, but only the
results of the keep-alive heartbeats.
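
A minimal sketch of that separation, not the statestore implementation: the
class and method names are hypothetical, and counting consecutive missed
keep-alives against a threshold is an assumed failure-detection policy.

```python
class LivenessTracker(object):
    """Tracks subscriber liveness from keep-alive results only."""

    def __init__(self, max_missed_keepalives):
        self.max_missed = max_missed_keepalives
        self.missed = {}  # subscriber id -> consecutive missed keep-alives

    def on_keepalive_result(self, subscriber_id, succeeded):
        if succeeded:
            self.missed[subscriber_id] = 0
        else:
            self.missed[subscriber_id] = self.missed.get(subscriber_id, 0) + 1

    def on_topic_update_result(self, subscriber_id, succeeded):
        # Deliberately ignored: slow or failed topic updates say nothing
        # about whether the subscriber is alive.
        pass

    def is_alive(self, subscriber_id):
        return self.missed.get(subscriber_id, 0) < self.max_missed

tracker = LivenessTracker(max_missed_keepalives=3)
for _ in range(10):
    tracker.on_topic_update_result("impalad-1", False)
print(tracker.is_alive("impalad-1"))  # still alive: topic updates don't count
for _ in range(3):
    tracker.on_keepalive_result("impalad-1", False)
print(tracker.is_alive("impalad-1"))  # dead after three missed keep-alives
```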

I've tested these changes with a 500MB catalog on a 73 node
cluster. Keep-alive frequencies are nice and stable (at 500ms out of the
box) even during the initial large topic-update distribution phase, or
after a statestore failure.

Internally, this patch changes the nomenclature for the statestore: we
stay away from 'heartbeat' (except for the failure detector, which is
heartbeat-based), and instead use 'message' as a generic term for both
keep-alive and topic update. Externally, we need to rationalise the flag
names to control both update types. The benefit of this change is that
only the keep-alive frequency matters for cluster stability, and that
depends only on the cluster size, rather than on both the catalog size
and the cluster size. So we can set it to, e.g., 1s out of the box with some
confidence that that will work for clusters up to ~200 nodes. The topic
update frequency may actually be set aggressively, because the
statestore will naturally throttle itself during times of heavy traffic
and nothing will time out.

Change-Id: Ia447e4ebefda890a5b810a213e97f00cca499989
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4003
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5138
2014-11-05 14:59:25 -08:00
Dan Hecht
4895c68bd1 Remove the test workaround for IMPALA-1047.
I ran 330 iterations of the test (~13 hours) without a single failure.
Also inspected the Cancel code and didn't notice any potential races
that could lead to this.  If anyone knows some secret recipe for
reproducing this, please let me know.  Otherwise, let's remove the
workaround in the test.

Change-Id: I216ed6416e150a790b8453afe5890efadf472739
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3720
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 06c771301004deb775ebc0c41eb5d065c5f2ed1e)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3752
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
2014-08-04 18:34:48 -07:00
Matthew Jacobs
b3c98cf3c8 Fix occasional admission control test failures
The admission control tests could occasionally fail when cancelled
queries return OK (IMPALA-1047). Until fixed, we can just treat
such queries as if they were cancelled.

Change-Id: Id9fc8e9f585e466059d4ffefb4d9ed407206ad1d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3019
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 2901a8a960076f2aec74cb5a1f5000953359a68f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3025
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2014-06-16 15:50:33 -07:00
Matthew Jacobs
dbe1b534ed IMPALA-1050: NPE error when pool placement policy cannot map user to pool
Change-Id: I53ed823ee55bee96269f4119af7da2dab25d4a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3028
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 569bd5d4a8e30a907a33551c58a3ab80849b8dc9)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3061
2014-06-15 13:38:20 -07:00
Lenni Kuff
bb09b5270f IMPALA-839: Update tests to be more thorough when run exhaustively
Some tests have constraints that were there only to help reduce runtime which
reduces coverage when running in exhaustive mode. The majority of the constraints
are because it adds no value to run the test across additional dimensions (or
it is invalid to run with those dimensions). Updates the tests that have
legitimate constraints to use two new helper methods for constraining the table format
dimension:
create_uncompressed_text_dimension()
create_parquet_dimension()

These will create a dimension that will produce a single test vector, either
uncompressed text or parquet respectively.

Change-Id: Id85387c1efd5d192f8059ef89934933389bfe247
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2149
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e02acbd469bc48c684b2089405b4a20552802481)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2290
2014-04-18 20:11:31 -07:00
Matthew Jacobs
e817c3742c Admission controller: fix a number of TODOs
* Remove requirement that fair scheduler and Llama conf files be on the classpath if
  specified as relative paths. Now they can be specified as any relative or absolute
  path.
* Add flags to disable all per-pool max requests limits or mem limits.
* Rename RequestPoolUtils to RequestPoolService
* Make it more clear RequestPoolService is a singleton by putting it in ExecEnv
* FileWatchService: use Executors.newScheduledThreadPool instead of a thread
* Moved MEGABYTE (and related constants) to new Constants class (frontend)
* Test RequestPoolService: Removed AllocationFileLoaderServiceHelper, replaced with
  reflection

Change-Id: Iadf79cf77a7894a469c3587d0019a6d0bee7e58f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1787
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit b9a167f6fdb4ab2595aca6035e1f9d926b909d94)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1858
2014-03-12 14:23:54 -07:00
Matthew Jacobs
d64c516fa8 Admission controller: Add mem limit to tests
Change-Id: Ieae5c25e0d034317113f97ed66b8971cd80e0bae
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1705
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 0d8d1fa370264acd94d62399863ab751e6cbff06)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1804
2014-03-07 15:46:08 -08:00
Matthew Jacobs
d0386083fb Admission control tests: Increase thread join timeout and remove unnecessary locking
In some rare cases on overloaded machines, the thread join timeout of 10 seconds isn't
long enough. Also, taking the lock at that time isn't necessary because the main thread
will not attempt to cancel a thread unless it is already in the list of running threads.
Threads are added to that list only after they submit their query.

Change-Id: I23a67d726bc25221f0e9331ca1a3e9f5363f821d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1744
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 27cf239592fafdb36a5680c480914f38a16037da)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1760
2014-03-07 14:49:57 -08:00
Matthew Jacobs
989830186f Remove RM pool configuration and yarn_pool query option/profile property
Admission control adds support for configuring pools via a fair scheduler
allocation configuration, so the pool configuration mechanism is no longer
needed. This also renames the "yarn_pool" query option to the more general
"request_pool" as it can also be used to configure the admission controller
when RM/Yarn is not used. Similarly, the query profile shows the pool as
"Request Pool" rather than "Yarn Pool".

Change-Id: Id2cefb77ccec000e8df954532399d27eb18a2309
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1668
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8d59416fb519ec357f23b5267949fd9682c9d62f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1759
2014-03-06 14:46:09 -08:00