TestRPCTimeout::test_reportexecstatus_retries is broken in exhaustive
exploration after IMPALA-10465. This patch fixes the issue by changing the
test query to raise a column constraint violation rather than a primary
key conflict.
Testing:
- Pass custom_cluster/test_rpc_timeout.py::TestRPCTimeout in exhaustive
exploration.
Change-Id: I67b9555d823f5cf5be59900d89e305ef92e5e89f
Reviewed-on: http://gerrit.cloudera.org:8080/18573
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In a rare case, the callback Coordinator::BackendState::ExecCompleteCb()
is not invoked for the corresponding ExecQueryFInstances RPC when the
RPC is cancelled. This causes the coordinator to wait indefinitely when
calling Coordinator::BackendState::Cancel() to cancel a fragment
instance.
This patch adds a timeout to BackendState::WaitOnExecLocked() so that
the coordinator is not blocked indefinitely when cancelling a query.
Testing:
- Added a test case to simulate the missing callback when a query
fails. Verified that the coordinator would hang without the fix,
and does not hang with it.
- Passed exhaustive-debug tests.
Change-Id: I915511afe2df3017cbbf37f6aff3c5ff7f5473be
Reviewed-on: http://gerrit.cloudera.org:8080/18439
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
--control_service_queue_mem_limit is set to 1% of the process memory
limit by default to increase the maximum size of the queue in typical
production deployments. E.g. an Impala daemon with
a 50GB memory limit will have a limit of 512MB on
the control service queue.
Add --control_service_queue_mem_limit_floor_bytes so
that this does not have the unintended effect of
reducing the memory given to the control service queue
on daemons with small memory limits.
I.e. the default behaviour does not change for
impala daemons with a daemon mem limit of <= 5000MB,
but it does increase the control service queue memory
limit for impala daemons with mem limits > 5000MB.
The default process memory limit in the mocked backend
test ExecEnv is changed to 8GB. Previously it
was unlimited, so we couldn't calculate 1% of it.
It cannot be unlimited in an actual impalad since
IMPALA-5653 was fixed.
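As a sketch, the combined effect of the percentage and the floor
(hypothetical Python; flag names are reused for illustration, the real
logic is in the C++ backend, and the 50MB floor is inferred from the
5000MB threshold above):

```python
def control_service_queue_limit(process_mem_limit_bytes,
                                pct=0.01,
                                floor_bytes=50 * 1024 * 1024):
    """Queue limit is 1% of the process memory limit, but never below
    the floor, so small daemons keep their old limit."""
    return max(int(process_mem_limit_bytes * pct), floor_bytes)

GiB = 1024 ** 3
# A 50GiB daemon gets a 512MiB queue limit (1% of 50GiB).
print(control_service_queue_limit(50 * GiB) // (1024 * 1024))  # 512
# A 4GiB daemon stays at the 50MiB floor instead of dropping to ~41MiB.
print(control_service_queue_limit(4 * GiB) // (1024 * 1024))   # 50
```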
Testing:
This had been previously problematic on a 64 node TPC-DS
workload with mt_dop=12 where impalads had ~100GB of memory.
Status report RPCs would fail and have to be retried.
We tested this new value on the same workload and the retries
were avoided.
Change-Id: Ic7fe93b5ce7eb6b63e48293ac287d98cc1d9e3fa
Reviewed-on: http://gerrit.cloudera.org:8080/16848
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Adds the following startup flag for statestore subscribers:
'statestore_client_rpc_timeout_ms'. The timeout is set to 5 minutes by
default.
Testing:
* Adds some tests for catalog_client_rpc_timeout_ms that validate the
timeout is used correctly and that retries are triggered.
Change-Id: If49892ff1950cf474f951aabf4c952dbc44189e2
Reviewed-on: http://gerrit.cloudera.org:8080/16150
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch ports the ExecQueryFInstances rpc to use KRPC. The
parameters for this call contain a huge number of Thrift structs
(e.g. everything related to TPlanNode and TExpr), so to avoid
converting all of these to protobuf, and the resulting effect that
would have on the FE and catalog, this patch stores most of the
parameters in a sidecar (in particular the TQueryCtx,
TPlanFragmentCtx's, and TPlanFragmentInstanceCtx's).
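A toy model of the resulting request layout (purely illustrative
Python; json stands in for the Thrift serializer, and the class mirrors
the rpc's shape, not any actual generated code):

```python
import json  # stand-in for the Thrift serializer

class ExecQueryFInstancesRequest:
    """Toy model of the request: a small protobuf-style body plus an
    opaque sidecar blob carrying the big Thrift structs unchanged."""
    def __init__(self, query_id, thrift_structs):
        self.query_id = query_id  # small field: lives in the protobuf body
        # TQueryCtx / TPlanFragmentCtx's / TPlanFragmentInstanceCtx's stay
        # Thrift-serialized and travel as one sidecar of raw bytes, so
        # they never need to be converted to protobuf.
        self.sidecar = json.dumps(thrift_structs).encode()

payload = {"TQueryCtx": {"query_id": "q1"}, "TPlanFragmentCtx": []}
req = ExecQueryFInstancesRequest("q1", payload)
print(type(req.sidecar).__name__)  # bytes
```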
Testing:
- Passed a full exhaustive run on the minicluster.
Set up a ten node cluster with tpch 500:
- Ran perf tests: 3 iterations per tpch query, 4 concurrent streams,
no perf change.
- Ran the stress test for 1000 queries, passed.
Change-Id: Id3f1c6099109bd8e5361188005a7d0e892147570
Reviewed-on: http://gerrit.cloudera.org:8080/13583
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch removes the FAULT_INJECTION_RPC_DELAY macro and replaces
its uses with DebugAction, which is more flexible. For example, it
supports JITTER, which injects random delays.
Every backend rpc has a debug action of the form RPC_NAME_DELAY.
DebugAction has previously always been used via query options.
However, for the rpcs considered here there is not always a query with
an accessible TQueryOptions available (for example, we do not send any
query info with the RemoteShutdown rpc), so this patch introduces a
flag, '--debug_actions', which is used to control these rpc delay
debug actions.
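A minimal sketch of how such a delay action string could be
interpreted (hypothetical Python modeled on the RPC_NAME_DELAY:JITTER@ms
syntax; the real DebugAction implementation is in the C++ backend):

```python
import random
import time

def run_debug_action(actions, label, sleep=time.sleep, rng=random.uniform):
    """Parse a --debug_actions style string such as
    'REPORT_EXEC_STATUS_DELAY:JITTER@1000' and inject a delay when
    'label' matches. JITTER sleeps a random time in [0, max_ms]."""
    for action in actions.split("|"):
        if not action:
            continue
        name, _, spec = action.partition(":")
        if name != label:
            continue
        kind, _, arg = spec.partition("@")
        max_ms = int(arg or 0)
        ms = rng(0, max_ms) if kind == "JITTER" else max_ms
        sleep(ms / 1000.0)

# Capture the injected delay instead of actually sleeping; pin the rng
# to its maximum so the result is deterministic.
delays = []
run_debug_action("REPORT_EXEC_STATUS_DELAY:JITTER@1000",
                 "REPORT_EXEC_STATUS_DELAY",
                 sleep=delays.append, rng=lambda a, b: b)
print(delays)  # [1.0]
```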
Testing:
- Updated existing tests to use the new mechanism.
Change-Id: I712b188e0cdf91f431c9b94052501e5411af407b
Reviewed-on: http://gerrit.cloudera.org:8080/13060
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The coordinator currently waits indefinitely if it does not receive a
status report from a backend. This could cause a query to hang
indefinitely in certain situations, for example if the backend decides
to cancel itself as a result of failed status report rpcs.
This patch adds a thread to ImpalaServer which periodically iterates
over all queries for which that server is the coordinator and cancels
any that haven't had a report from a backend in a certain amount of
time.
This patch adds two flags:
--status_report_max_retry_s: the maximum number of seconds a backend
will attempt to send status reports before giving up. This is used
in place of --status_report_max_retries which is now deprecated.
--status_report_cancellation_padding: the coordinator will wait
--status_report_max_retry_s *
(1 + --status_report_cancellation_padding / 100)
before concluding a backend is not responding and cancelling the
query.
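The coordinator's wait before cancelling can be expressed as follows
(hypothetical helper; the 600s/50% values are made-up examples, not the
flag defaults):

```python
def cancellation_timeout_s(status_report_max_retry_s,
                           status_report_cancellation_padding_pct):
    """How long the coordinator waits without a backend report before
    concluding the backend is unresponsive and cancelling the query:
    max_retry_s * (1 + padding / 100)."""
    return status_report_max_retry_s * (
        1 + status_report_cancellation_padding_pct / 100.0)

# With a 600s backend retry budget and 50% padding, the coordinator
# cancels after 900s.
print(cancellation_timeout_s(600, 50))  # 900.0
```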
Testing:
- Added a functional test that runs a query that is cancelled through
the new mechanism.
- Passed a full set of exhaustive tests.
Ran tests on a 10 node cluster loaded with tpch 500:
- Ran the stress test for 1000 queries with the debug actions:
'REPORT_EXEC_STATUS_DELAY:JITTER@1000'
Prior to this patch, this setup resulted in hanging queries. With
this patch, no hangs were observed.
- Ran perf tests with 4 concurrent streams, 3 iterations per query.
Found no change in performance.
Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987
Reviewed-on: http://gerrit.cloudera.org:8080/12299
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This fixes all core e2e tests running on my local dockerised
minicluster build. I do not yet have a CI job or script running
but I wanted to get feedback on these changes sooner. The second
part of the change will include the CI script and any follow-on
fixes required for the exhaustive tests.
The following fixes were required:
* Detect docker_network from TEST_START_CLUSTER_ARGS
* get_webserver_port() no longer depends on the caller passing in
the default webserver port. It previously failed because it
relied on start-impala-cluster.py setting -webserver_port
for *all* processes.
* Add SkipIf markers for tests that don't make sense or are
non-trivial to fix for containerised Impala.
* Support loading Impala-lzo plugin from host for tests that depend on
it.
* Fix some tests that had 'localhost' hardcoded - they should instead
use $INTERNAL_LISTEN_HOST, which defaults to localhost.
* Fix bug with sorting impala daemons by backend port, which is
the same for all dockerised impalads.
Testing:
I ran tests locally as follows, after setting up a docker network and
starting the other services:
./buildall.sh -noclean -notests -ninja
ninja -j $IMPALA_BUILD_THREADS docker_images
export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster"
export FE_TEST=false
export BE_TEST=false
export JDBC_TEST=false
export CLUSTER_TEST=false
./bin/run-all-tests.sh
Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755
Reviewed-on: http://gerrit.cloudera.org:8080/12639
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The coordinator skips over any stale or duplicated status
reports of fragment instances. In the previous implementation,
the index pointing into the vector of Thrift profiles wasn't
updated when skipping over a status report. This breaks the
assumption that the status reports and thrift profiles vectors
have one-to-one correspondence. Consequently, we may end up
passing the wrong profile to InstanceStats::Update(), leading
to random crashes.
This change fixes the problem above by using iterators to
iterate through the status reports and thrift profiles vectors
and ensures that both iterators are updated on every iteration
of the loop.
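The fix amounts to advancing both sequences together, e.g.
(illustrative Python; the real code uses C++ iterators over the Thrift
vectors):

```python
def apply_reports(reports, profiles, last_seq_no):
    """Walk the reports and profiles vectors in lockstep. The bug was
    advancing only the report index on a skip, which paired later
    reports with the wrong profiles; advancing both together keeps the
    one-to-one correspondence."""
    applied = []
    for report, profile in zip(reports, profiles):
        if report["seq"] <= last_seq_no.get(report["inst"], -1):
            continue  # stale or duplicated report: skip BOTH entries
        last_seq_no[report["inst"]] = report["seq"]
        applied.append((report["inst"], profile))
    return applied

reports = [{"inst": "f0", "seq": 1}, {"inst": "f0", "seq": 1},
           {"inst": "f0", "seq": 2}]
profiles = ["p1", "p1-dup", "p2"]
# The duplicated seq=1 report is dropped together with its profile.
print(apply_reports(reports, profiles, {}))
```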
Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752
Reviewed-on: http://gerrit.cloudera.org:8080/12651
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The test is designed to cause ReportExecStatus() rpcs to fail by
backing up the control service queue. Previously, after a failed
ReportExecStatus() we would wait 'report_status_retry_interval_ms'
between retries, which was 100ms by default and wasn't touched by the
test. That 100ms was right on the edge of being enough time for the
coordinator to keep up with processing the reports, so that some would
fail but most would succeed. It was always possible that we could hit
IMPALA-2990 in this setup, but it was unlikely.
Now, with IMPALA-4555, 'report_status_retry_interval_ms' was removed
and we instead wait 'status_report_interval_ms' between retries. By
default, this is 5000ms, so it should give the coordinator even more
time and make these issues less likely. However, the test sets
'status_report_interval_ms' to 10ms, which isn't nearly enough time
for the coordinator to do its processing, causing lots of the
ReportExecStatus() rpcs to fail and making us hit IMPALA-2990 pretty
often.
The solution is to set 'status_report_interval_ms' to 100ms in the
test, which roughly achieves the same retry frequency as before. The
same change is made to a similar test, test_reportexecstatus_timeout.
Testing:
- Ran test_reportexecstatus_retry in a loop 400 times without seeing a
failure. It previously repro-ed for me about once per 50 runs.
- Manually verified that both tests are still hitting the error paths
that they are supposed to be testing.
Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Reviewed-on: http://gerrit.cloudera.org:8080/12461
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
QueryState periodically collects runtime profiles from all of its
fragment instances and sends them to the coordinator. Previously, each
time this happened, if the rpc failed, QueryState would retry twice
after a configurable timeout and then cancel the fragment instances
under the assumption that the coordinator no longer existed.
We've found in real clusters that this logic is too sensitive to
failed rpcs and can result in fragment instances being cancelled even
in cases where the coordinator is still running.
This patch makes a few improvements to this logic:
- When a report fails to send, instead of retrying the same report
quickly (after waiting report_status_retry_interval_ms), we wait the
regular reporting interval (status_report_interval_ms), regenerate
any stale portions of the report, and then retry.
- A new flag, --status_report_max_retries, is introduced, which
controls the number of failed reports that are allowed before the
query is cancelled. --report_status_retry_interval_ms is removed.
- Backoff is used for repeated failed attempts, such that for a period
between retries of 't', on try 'n' the actual timeout will be t * n.
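The retry loop can be sketched as follows (illustrative Python;
function and argument names are made up):

```python
import time

def send_with_backoff(send_report, max_retries, interval_s,
                      sleep=time.sleep):
    """Retry failed status reports with linear backoff: after failed
    try n, wait interval_s * n before the next try. Give up (so the
    query gets cancelled) after max_retries consecutive failures."""
    for n in range(1, max_retries + 1):
        if send_report():
            return True
        sleep(interval_s * n)
    return False

# Simulate two failures then a success; record the waits instead of
# actually sleeping.
waits = []
attempts = iter([False, False, True])
ok = send_with_backoff(lambda: next(attempts), 3, 5, sleep=waits.append)
print(ok, waits)  # True [5, 10]
```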
Testing:
- Added a test which results in a large number of failed intermediate
status reports but still succeeds.
Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971
Reviewed-on: http://gerrit.cloudera.org:8080/12049
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change converts the ReportExecStatus() RPC from a Thrift
based RPC to KRPC. This is done as part of the preparation
for fixing IMPALA-2990, as we can take advantage of TCP connection
multiplexing in KRPC to avoid overwhelming the coordinator
with too many connections, by reducing the number of TCP connections
to one per executor.
This patch also introduces a new service pool for all query execution
control related RPCs so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
To avoid unnecessary delays due to sharing network connections
between the DataStream service and the Control service, this change
adds the service name as part of the user credentials for the
ConnectionId so each service will use a separate connection.
The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.
This patch also fixes IMPALA-7241, which may lead to duplicated
dml stats being applied. The fix adds a monotonically
increasing version number to fragment instances' reports. The
coordinator ignores any report with a version smaller than or equal
to that of the last report.
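The versioning rule can be sketched as follows (illustrative Python;
class and field names are made up, the real logic is in the C++
coordinator):

```python
class BackendReportState:
    """Coordinator-side guard against duplicated dml stats: each
    fragment instance report carries a monotonically increasing
    version, and anything at or below the last applied version is
    dropped."""
    def __init__(self):
        self.last_version = -1
        self.applied = []

    def apply(self, version, dml_stats):
        if version <= self.last_version:
            return False  # stale or duplicated report: ignore it
        self.last_version = version
        self.applied.append(dml_stats)
        return True

state = BackendReportState()
# Reports 1 and 0 arriving late (e.g. rpc retries) are dropped.
print([state.apply(v, f"stats-{v}") for v in (0, 1, 1, 0, 2)])
# [True, True, False, False, True]
```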
Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure
and RPC retries/timeout.
Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Reviewed-on: http://gerrit.cloudera.org:8080/10855
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The "exec resources" reference count on the QueryState expects that
it will transition from 0 -> non-zero -> 0 at most once. The
reference count is taken on the coordinator side (sender of this
RPC) and also the backend (receiver of this RPC). Usually, the
lifetimes of those references overlap (the coordinator won't give up
the reference until the backend execution is complete or failed),
and so the assumption is not violated. However, when the RPC times
out, the receiver may run after the sender has given up its
reference (since the sender doesn't know the receiver is actually
still executing).
As it turns out, the coordinator doesn't really need to take a
reference given the current code (verified via code inspection), as
these resources are backend-only. So, stop taking the reference on
the coordinator side, and add some DCHECKs to document that (the
dchecks aren't particularly good at verifying it, however, since the
lifetimes generally will overlap).
Note that this patch can't be easily backported to older versions
without careful inspection since older versions of the code may have
relied on the reference count protecting things used by the
coordinator.
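The invariant can be sketched with a toy refcount (illustrative
Python; asserts stand in for the C++ DCHECKs):

```python
class ExecResourceRefcount:
    """Sketch of the invariant: the count may go 0 -> non-zero -> 0 at
    most once. Taking a reference after it has returned to zero (as a
    timed-out RPC's late receiver could, once the sender gave up its
    reference) violates the invariant."""
    def __init__(self):
        self.count = 0
        self.released_to_zero = False

    def acquire(self):
        assert not self.released_to_zero, \
            "0 -> non-zero -> 0 transition already happened"
        self.count += 1

    def release(self):
        self.count -= 1
        if self.count == 0:
            self.released_to_zero = True

rc = ExecResourceRefcount()
rc.acquire(); rc.release()   # backend-only reference: fine
try:
    rc.acquire()             # late receiver: invariant violated
except AssertionError:
    print("DCHECK fired")
```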
Testing:
- New test_rpc_timeout case that reproduced the problem 100%
- exhaustive build
Change-Id: If60d983e0e68b00e6557185db1f86757ab8b3f2d
Reviewed-on: http://gerrit.cloudera.org:8080/11339
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The datastream sender default timeout is 2 mins, which could
block some fragments from completing until the timeout and
cause the metric "num-fragments-in-flight" not to return to 0
after 60 seconds.
Decrease the sender timeout to 30 seconds and add
some logging.
Change-Id: I19f8b3fea66c5a0398e3476a46f060be9f951983
Reviewed-on: http://gerrit.cloudera.org:8080/4080
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given
on the website, or add it where missing.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This patch adds a configurable timeout for all backend client
RPCs to avoid query hang issues.
Prior to this change, Impala didn't set socket send/recv timeouts for
backend clients. An RPC would wait forever for data. In extreme cases
of bad network, or if the destination host has a kernel panic, the
sender will not get a response and the RPC will hang. A query hang is
hard to detect. If the hang happens in ExecRemoteFragment() or
CancelPlanFragments(), the query cannot be cancelled unless you
restart the coordinator.
Added send/recv timeouts to all RPCs to avoid query hangs. For the
catalog client, keep the default timeout of 0 (no timeout) because
ExecDdl() could take a very long time if a table has many partitions,
mainly waiting for the HMS API call.
Added a wrapper RetryRpcRecv() to wait longer for the receiver's
response. This is needed by certain RPCs. For example, for
TransmitData() sent by DataStreamSender, the receiver could hold the
response to apply back pressure.
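The recv-retry idea can be sketched as follows (illustrative Python;
the real wrapper operates on Thrift socket clients in C++, and the
helper names here are made up):

```python
import socket

def rpc_with_recv_retry(send, recv, recv_retries):
    """Send once with the normal socket timeout; if the receive times
    out, retry only the receive, giving the peer (e.g. a DataStream
    receiver applying back pressure) more time to respond."""
    send()
    for attempt in range(recv_retries):
        try:
            return recv()
        except socket.timeout:
            if attempt == recv_retries - 1:
                raise  # receiver really is gone: propagate the timeout

# Simulate a receiver that times out twice and then responds.
calls = []
replies = iter([socket.timeout(), socket.timeout(), "ok"])
def fake_recv():
    r = next(replies)
    if isinstance(r, Exception):
        raise r
    return r

print(rpc_with_recv_retry(lambda: calls.append("sent"), fake_recv, 3))
# ok -- and the request was sent exactly once
```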
If an RPC fails, the connection is left in an unrecoverable state.
We don't put the underlying connection back in the cache but close it
instead. This ensures a broken connection won't cause more RPC
failures.
Added a retry for the CancelPlanFragment RPC. This reduces the chance
that a cancel request gets lost due to an unstable network, but it can
make cancellation take longer and makes test_lifecycle.py more flaky.
The metric num-fragments-in-flight might not be 0 yet due to previous
tests. Modified the test to check the metric delta instead of
comparing to 0 to reduce flakiness. However, this might not capture
some failures.
Besides the new EE test, I used the following iptables rules to
inject network failures and verify RPCs never hang.
1. Block network traffic on a port completely
iptables -A INPUT -p tcp -m tcp --dport 22002 -j DROP
2. Randomly drop 5% of TCP packets to slowdown network
iptables -A INPUT -p tcp -m tcp --dport 22000 -m statistic --mode random --probability 0.05 -j DROP
Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964
Reviewed-on: http://gerrit.cloudera.org:8080/3343
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins