This patch makes the following changes to support running KRPC over UDS.
- Add FLAGS_rpc_use_unix_domain_socket to enable running KRPC over
  UDS. Add FLAGS_uds_address_unique_id to specify the unique ID for the
  UDS address. It can be 'ip_address', 'backend_id', or 'none'.
- Add a uds_address field to NetworkAddressPB and TNetworkAddress.
  Replace TNetworkAddress with NetworkAddressPB for KRPC-related
  class variables and APIs.
- Set the UDS address for each daemon as @impala-krpc:<unique_id>
  during initialization, with unique_id determined by the startup flag
  FLAGS_uds_address_unique_id.
- When FLAGS_rpc_use_unix_domain_socket is true, the socket of the
  KRPC server is bound to the UDS address of the daemon. The KRPC
  client connects to the KRPC server with the server's UDS address
  when creating the proxy service, which in turn calls
  kudu::Socket::Connect() to connect to the KRPC server (see the
  sketch after this list).
- The rpcz web page shows TCP-related stats as 'N/A' when using UDS.
  The remote UDS address for KRPC inbound connections is shown as '*'
  on the rpcz web page when using UDS since remote UDS addresses are
  not available.
- Add new unit-tests for UDS.
- The BackendId of the admissiond is not available, so use the
  admissiond's IP address as the unique ID for UDS.
  TODO: Advertise the BackendId of the admissiond in global admission
  control mode.
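For illustration only, a minimal POSIX-level sketch of what an
abstract-namespace UDS address of the form '@impala-krpc:<unique_id>'
means: the leading '@' stands for a NUL byte in sun_path, so the socket
never appears on the filesystem. The actual connection in this patch
goes through kudu::Socket::Connect(); the sketch just shows the
underlying address format and a direct connect call.

  #include <cstddef>
  #include <cstring>
  #include <string>
  #include <sys/socket.h>
  #include <sys/un.h>
  #include <unistd.h>

  // Connect to an abstract-namespace UDS address such as
  // "@impala-krpc:<unique_id>". Returns the connected fd, or -1 on error.
  int ConnectAbstractUds(const std::string& uds_address) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    // Abstract namespace: sun_path starts with a NUL byte followed by the
    // name, i.e. the part after the leading '@'.
    std::string name = uds_address.substr(1);
    size_t copied = name.copy(addr.sun_path + 1, sizeof(addr.sun_path) - 2);

    socklen_t len = offsetof(sockaddr_un, sun_path) + 1 + copied;
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), len) != 0) {
      close(fd);
      return -1;
    }
    return fd;
  }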
Testing:
- Passed core tests with FLAGS_rpc_use_unix_domain_socket at its
  default value of false.
- Passed core tests with FLAGS_rpc_use_unix_domain_socket set to true.
Change-Id: I439f5a03eb425c17451bcaa96a154bb0bca17ee7
Reviewed-on: http://gerrit.cloudera.org:8080/18369
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for COS (Cloud Object Storage). Using
hadoop-cos, the implementation is similar to other remote FileSystems.
New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.
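As a rough illustration (the helper below is hypothetical; only the
flag name comes from this patch), such a per-filesystem flag is
typically a gflags definition used to size an I/O worker pool at
startup:

  #include <thread>
  #include <vector>
  #include <gflags/gflags.h>

  DEFINE_int32(num_cos_io_threads, 16, "Number of COS I/O threads");

  // Hypothetical illustration: start one I/O worker per configured thread.
  void StartCosIoWorkers(std::vector<std::thread>* workers,
                         void (*worker_fn)()) {
    for (int i = 0; i < FLAGS_num_cos_io_threads; ++i) {
      workers->emplace_back(worker_fn);
    }
  }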
Follow-up:
- Support for caching COS file handles will be addressed in
IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
COS (IMPALA-10773).
Tests:
- Upload hdfs test data to a COS bucket. Modify all locations in HMS
DB to point to the COS bucket. Remove some hdfs caching params.
Run CORE tests.
Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Reviewed-on: http://gerrit.cloudera.org:8080/17503
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
ImpaladCatalog#updateCatalog() doesn't trigger a full topic update
request when detecting catalogServiceId changes. It just updates the
local catalogServiceId and throws an exception to abort applying the
DDL/DML results. This causes a problem when catalogd is restarted and
the DDL/DML is executed on the restarted instance. In this case, only
the local catalogServiceId is updated to the latest. The local catalog
remains stale. Then, when dealing with subsequent updates from the
statestore, the catalogServiceId always matches, so updates will be
applied without exceptions. However, the catalog objects usually won't
be updated since they have higher versions (from the old catalogd
instance) than those in the update. This brings the local catalog out
of sync until the catalog version of the new catalogd grows large
enough.
Note that when dealing with catalog updates from the statestore, if the
catalogServiceId doesn't match, impalad will request a full topic update.
See more in ImpalaServer::CatalogUpdateCallback().
This patch fixes the issue by checking the catalogServiceId before
invoking UpdateCatalogCache() in the FE. If the catalogServiceId doesn't
match the one in the DDL/DML result, wait until it changes. The next
update from the statestore will change it and unblock the DDL/DML thread.
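The blocking behaviour can be pictured with a minimal sketch (class and
member names here are illustrative, not the actual Impala code): the
DDL/DML thread waits on a condition variable until a statestore-driven
update installs a matching catalogServiceId.

  #include <condition_variable>
  #include <mutex>
  #include <string>

  class CatalogServiceIdTracker {
   public:
    // Called by the DDL/DML path before applying its catalog update. Blocks
    // until the locally known service id matches the one in the DDL/DML result.
    void WaitForServiceId(const std::string& expected_id) {
      std::unique_lock<std::mutex> l(lock_);
      cv_.wait(l, [&] { return current_id_ == expected_id; });
    }

    // Called when a statestore topic update carries a new catalog service id.
    void SetServiceId(const std::string& new_id) {
      {
        std::lock_guard<std::mutex> l(lock_);
        current_id_ = new_id;
      }
      cv_.notify_all();
    }

   private:
    std::mutex lock_;
    std::condition_variable cv_;
    std::string current_id_;
  };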
Testing:
Added several tests in
tests/custom_cluster/test_restart_services.py
Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Reviewed-on: http://gerrit.cloudera.org:8080/17645
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Make the deadline for a shutdown triggered by SIGRTMIN configurable
via the flag shutdown_deadline_s. Previously, the deadline for a
shutdown initiated by SIGRTMIN was fixed at one year and was
independent of the flag. This patch ensures that this shutdown
behaviour is also governed by the common flag shutdown_deadline_s.
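A minimal sketch of the intended behaviour, assuming a dedicated
signal-wait loop and a hypothetical StartGracefulShutdown() entry point
(the flag's default value below is a placeholder): the SIGRTMIN path
now takes its deadline from shutdown_deadline_s rather than a
hard-coded one year.

  #include <csignal>
  #include <cstdint>
  #include <gflags/gflags.h>

  // Deadline flag shared with the :shutdown() statement path. The default
  // value here is a placeholder, not necessarily Impala's actual default.
  DEFINE_int64(shutdown_deadline_s, 3600, "Deadline for graceful shutdown");

  // Hypothetical stand-in for the routine that begins quiescing the daemon.
  void StartGracefulShutdown(int64_t deadline_s) { (void)deadline_s; }

  // Signal-handling loop: SIGRTMIN starts a shutdown whose deadline comes
  // from FLAGS_shutdown_deadline_s instead of a fixed one-year value.
  void SignalWaitLoop() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    pthread_sigmask(SIG_BLOCK, &set, nullptr);
    int sig = 0;
    while (sigwait(&set, &sig) == 0) {
      if (sig == SIGRTMIN) StartGracefulShutdown(FLAGS_shutdown_deadline_s);
    }
  }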
TESTING:
1. Modified existing test to reflect the configurable deadline.
2. Verified manually
3. Ran the cluster tests (which include test_restart_services)
Change-Id: I52cb1ba76e7ce9de86ceb2f84389b1ab257e4c05
Reviewed-on: http://gerrit.cloudera.org:8080/17348
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for GCS (Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.
New flags for GCS:
- num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.
Follow-up:
- Support for spilling to GCS will be addressed in IMPALA-10561.
- Support for caching GCS file handles will be addressed in
IMPALA-10568.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
GCS (IMPALA-10562).
- Some tests are skipped due to issues introduced by /etc/hosts setting
on GCE instances (IMPALA-10563).
Tests:
- Compile and create hdfs test data on a GCE instance. Upload test data
to a GCS bucket. Modify all locations in HMS DB to point to the GCS
bucket. Remove some hdfs caching params. Run CORE tests.
- Compile and load snapshot data to a GCS bucket. Run CORE tests.
Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The legacy Thrift-based Impala internal service has been deprecated
and can be removed now.
This patch removes ImpalaInternalService. All infrastructure around it
is cleaned up, except one place for the flag be_port:
StatestoreSubscriber::subscriber_id contains be_port, but we cannot
change the format of subscriber_id now. This remaining be_port issue
will be fixed in a succeeding patch (part 4).
TQueryCtx.coord_address is changed to TQueryCtx.coord_hostname since the
port in TQueryCtx.coord_address is set to be_port and is now unused.
Also rename TQueryCtx.coord_krpc_address to TQueryCtx.coord_ip_address.
Testing:
- Passed the exhaustive test.
- Passed Quasar-L0 test.
Change-Id: I5fa83c8009590124dded4783f77ef70fa30119e6
Reviewed-on: http://gerrit.cloudera.org:8080/16291
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch contains the following refactors that are needed for the
admission control service, in order to make the main patch easier to
review:
- Adds a new class AdmissionControlClient which will be used to
abstract the logic for submitting queries to either a local or
remote admission controller out from ClientRequestState/Coordinator.
Currently only local submission is supported.
- SubmitForAdmission now takes a BackendId representing the
coordinator instead of assuming that the local impalad will be the
coordinator.
- The CRS_BEFORE_ADMISSION debug action is moved into
SubmitForAdmission() so that it will be executed on whichever daemon
is performing admission control rather than always on the
coordinator (needed for TestAdmissionController.test_cancellation).
- ShardedQueryMap is extended to allow keys to be either TUniqueId or
  UniqueIdPB, and Add(), Get(), and Delete() convenience functions are
  added (see the sketch after this list).
- Some utils related to serializing Thrift objects into sidecars are
added.
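A rough sketch of the kind of sharded map described above (class name,
shard count, and locking scheme are illustrative, not the actual
ShardedQueryMap code); keys such as TUniqueId or UniqueIdPB would
additionally need a hash function.

  #include <array>
  #include <functional>
  #include <mutex>
  #include <unordered_map>

  // Entries are spread over N independently locked shards so that
  // concurrent accesses for different query ids rarely contend.
  template <typename K, typename V, int NUM_SHARDS = 4>
  class ShardedMapSketch {
   public:
    bool Add(const K& key, const V& value) {
      Shard& s = GetShard(key);
      std::lock_guard<std::mutex> l(s.lock);
      return s.map.emplace(key, value).second;
    }

    bool Get(const K& key, V* value) {
      Shard& s = GetShard(key);
      std::lock_guard<std::mutex> l(s.lock);
      auto it = s.map.find(key);
      if (it == s.map.end()) return false;
      *value = it->second;
      return true;
    }

    bool Delete(const K& key) {
      Shard& s = GetShard(key);
      std::lock_guard<std::mutex> l(s.lock);
      return s.map.erase(key) > 0;
    }

   private:
    struct Shard {
      std::mutex lock;
      std::unordered_map<K, V> map;
    };
    Shard& GetShard(const K& key) {
      return shards_[std::hash<K>()(key) % NUM_SHARDS];
    }
    std::array<Shard, NUM_SHARDS> shards_;
  };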
Testing:
- Passed a run of existing core tests.
Change-Id: I7974a979cf05ed569f31e1ab20694e29fd3e4508
Reviewed-on: http://gerrit.cloudera.org:8080/16411
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch introduces the concept of 'backend ids', which are unique
ids that can be used to identify individual impalads. The ids are
generated by each impalad on startup.
The patch then uses the ids to fix a bug where the statestore may fail
to inform coordinators when an executor has failed and restarted. The
bug was caused by the fact that the statestore cluster membership
topic was keyed on statestore subscriber ids, which are host:port
pairs.
So, if an impalad fails and a new one is started at the same host:port
before a particular coordinator has a cluster membership update
generated for it by the statestore, the statestore has no way of
differentiating the prior impalad from the newly started impalad, and
the topic update will not show the deletion of the original impalad.
With this patch, the cluster membership topic is now keyed by backend
id. Since the restarted impalad will have a different backend id, the
next membership update after the prior impalad fails is guaranteed to
reflect that failure.
The patch also logs the backend ids on startup and adds them to the
/backends webui page and to the query locations section of the
/queries page, for use in debugging.
Further patches will apply the backend ids in other places where we
currently key off host:port pairs to identify impalads.
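An illustrative comparison of the two keying schemes (types simplified
to strings here; the real ids are UniqueIdPB/TUniqueId values generated
at startup):

  #include <random>
  #include <sstream>
  #include <string>

  // Illustrative only: generate a per-process backend id at startup.
  std::string GenerateBackendId() {
    std::random_device rd;
    std::mt19937_64 gen(rd());
    std::ostringstream ss;
    ss << std::hex << gen() << gen();
    return ss.str();
  }

  // Old scheme: the topic key collides if a new impalad reuses the same
  // host:port, so the old entry's deletion can be missed.
  std::string OldTopicKey(const std::string& host, int port) {
    return host + ":" + std::to_string(port);
  }

  // New scheme: the restarted impalad gets a fresh backend id, so the old
  // entry is seen as deleted and the new one as added in the next update.
  std::string NewTopicKey(const std::string& backend_id) {
    return backend_id;
  }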
Testing:
- Added an e2e test that uses a new debug action to add delay to
statestore topic updates. Due to the use of JITTER the test is
non-deterministic, but it repros the original issue locally for me
about 50% of the time.
- Passed a full run of existing tests.
Change-Id: Icf8067349ed6b765f6fed830b7140f60738e9061
Reviewed-on: http://gerrit.cloudera.org:8080/15321
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch applies various fixes to Impala and to the copied Kudu
source code in be/src/kudu/* to allow everything to compile.
Some highlights of the changes made:
- Various Kudu files were removed from compilation due to issues like
relying on libraries that Impala does not provide. The linking of
some executables is also changed for similar reasons.
- The Kudu Cache implementation was changed to support unique_ptr,
allowing us to remove various uses of MakeScopeExitTrigger.
- Some flags that have a DEFINE in both Kudu and Impala are modified
to change one of the DEFINEs to a DECLARE.
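For context, a minimal gflags example of the DEFINE/DECLARE distinction
(the flag name is made up): a flag may only be DEFINEd in one
translation unit, while other translation units reference it through
DECLARE.

  // kudu/some_module.cc (the one place that keeps the definition):
  #include <gflags/gflags.h>
  DEFINE_int32(example_shared_flag, 10, "A flag defined exactly once");

  // impala/other_module.cc (previously also had a DEFINE, now declares it):
  #include <gflags/gflags.h>
  DECLARE_int32(example_shared_flag);

  int UseFlag() { return FLAGS_example_shared_flag; }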
This patch was in part based on the patches that were applied the last
time we rebased the Kudu code in IMPALA-7006, and I ensured that all
changes from those commits that are still relevant were included here.
I also went through all commits that have been applied to the
be/src/kudu directory since the last rebase and ensured that all
relevant changes from those are included here.
Testing:
- Passed an exhaustive DEBUG build and a core ASAN build.
Change-Id: I1eb4caf927c729109426fb50a28b5e15d6ac46cb
Reviewed-on: http://gerrit.cloudera.org:8080/15144
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Add retries to catalogd RPCs. Previously, connection failures triggered
a retry, but failures on the actual RPC did not trigger a retry. This
change replaces all usages of ClientCache::DoRpc() in the
CatalogOpExecutor with ClientCache::DoRpcWithRetry(). It also moves
the connection retry loop into DoRpcWithRetry(), instead of relying on
the ClientCache to retry the connection.
This patch is based on IMPALA-8904, which adds similar functionality to
statestore RPCs.
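A simplified sketch of the retry pattern described above (this is not
the actual ClientCache::DoRpcWithRetry() signature): the retry loop
wraps both reconnecting and re-issuing the RPC, rather than retrying
only the connection attempt.

  #include <functional>
  #include <string>

  struct Status {
    bool ok;
    std::string msg;
    static Status OK() { return {true, ""}; }
  };

  // Retry the whole connect-then-call sequence, so a failure of the RPC
  // itself (not just of the connection attempt) is also retried.
  Status DoRpcWithRetrySketch(const std::function<Status()>& reconnect,
                              const std::function<Status()>& rpc,
                              int max_attempts) {
    Status status = Status::OK();
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
      status = reconnect();
      if (!status.ok) continue;      // retry the connection on failure
      status = rpc();
      if (status.ok) return status;  // done
      // Otherwise fall through and retry the RPC as well.
    }
    return status;
  }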
Testing:
* Renamed test_statestore_rpc_errors.py to test_services_rpc_errors.py
and added new tests for catalogd RPC errors
* Added new tests to test_restart_services.py
* Ran core tests
Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Reviewed-on: http://gerrit.cloudera.org:8080/14246
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds a helper script to initiate graceful daemon shutdown
via the signaling mechanism. It also includes that helper script in the
docker containers.
Testing: This change adds a test to verify that the script works as
expected. In addition, I manually verified that the script gets added to
the containers and that calling it inside the container will cause a
shutdown as expected.
Change-Id: I877483a385cd0747f69b82a6488de203a4029599
Reviewed-on: http://gerrit.cloudera.org:8080/13912
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds support for running queries inside a single admission
control pool on one of several, disjoint sets of executors called
"executor groups".
Executors can be configured with an executor group through the newly
added '--executor_groups' flag. Note that in anticipation of future
changes, the flag already uses the plural form, but only a single
executor group may be specified for now. Each executor group
specification can optionally contain a minimum size, separated by a
':', e.g. --executor_groups default-pool-1:3. Only when the cluster
membership contains at least that number of executors for the group
will it be considered for admission.
Executor groups are mapped to resource pools by their name: An executor
group can service queries from a resource pool if the pool name is a
prefix of the group name separated by a '-'. For example, queries in
pool poolA can be serviced by executor groups named poolA-1 and poolA-2,
but not by groups named foo or poolB-1.
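A small sketch of the two conventions just described (the parsing and
the default minimum size of 1 are assumptions; the real handling lives
in the backend):

  #include <string>
  #include <utility>

  // Parse an --executor_groups entry of the form "<name>[:<min size>]",
  // e.g. "default-pool-1:3" -> {"default-pool-1", 3}. A missing minimum
  // size is assumed here to default to 1.
  std::pair<std::string, int> ParseExecutorGroupSpec(const std::string& spec) {
    size_t colon = spec.rfind(':');
    if (colon == std::string::npos) return {spec, 1};
    return {spec.substr(0, colon), std::stoi(spec.substr(colon + 1))};
  }

  // A group can serve a pool if the pool name, followed by '-', is a prefix
  // of the group name, e.g. pool "poolA" matches "poolA-1" and "poolA-2".
  bool GroupServesPool(const std::string& group, const std::string& pool) {
    const std::string prefix = pool + "-";
    return group.compare(0, prefix.size(), prefix) == 0;
  }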
During scheduling, executor groups are considered in alphabetical order.
This means that one group is filled up entirely before a subsequent
group is considered for admission. Groups also need to pass a health
check before being considered. In particular, they must contain at least the
minimum number of executors specified.
If no group is specified during startup, executors are added to the
default executor group. If - during admission - no executor group for a
pool can be found and the default group is non-empty, then the default
group is considered. The default group does not have a minimum size.
This change inverts the order of scheduling and admission. Prior to this
change, queries were scheduled before submitting them to the admission
controller. Now the admission controller computes schedules for all
candidate executor groups before each admission attempt. If the cluster
membership has not changed, then the schedules of the previous attempt
will be reused. This means that queries will no longer fail if the
cluster membership changes while they are queued in the admission
controller.
This change also alters the default behavior when using a dedicated
coordinator and no executors have registered yet. Prior to this change,
a query would fail immediately with an error ("No executors registered
in group"). Now a query will get queued and wait until executors show
up, or it times out after the pools queue timeout period.
Testing:
This change adds a new custom cluster test for executor groups. It
makes use of new capabilities added to start-impala-cluster.py to bring
up additional executors into an already running cluster.
Additionally, this change adds an instructional implementation of
executor group based autoscaling, which can be used during development.
It also adds a helper to run queries concurrently. Both are used in a
new test to exercise the executor group logic and to prevent regressions
to these tools.
In addition to these tests, the existing tests for the admission
controller (both BE and EE tests) thoroughly exercise the changed code.
Some of them required changes themselves to reflect the new behavior.
I looped the new tests (test_executor_groups and test_auto_scaling) for
a night (110 iterations each) without any issues.
I also started an autoscaling cluster with a single group and ran
TPC-DS, TPC-H, and test_queries on it successfully.
Known limitations:
When using executor groups, only a single coordinator and a single AC
pool (i.e. the default pool) are supported. Executors do not include the
number of currently running queries in their statestore updates and so
admission controllers are not aware of the number of queries admitted by
other controllers per host.
Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b
Reviewed-on: http://gerrit.cloudera.org:8080/13550
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds a class to track cluster membership called
ClusterMembershipMgr. It replaces the logic that was partially
duplicated between the ImpalaServer and the Coordinator and makes sure
that the local backend descriptor is consistent (IMPALA-8469).
The ClusterMembershipMgr maintains a view of the cluster membership and
incorporates incoming updates from the statestore. It also registers the
local backend with the statestore after startup. Clients can obtain a
consistent, immutable snapshot of the current cluster membership from
the ClusterMembershipMgr. Additionally, callbacks can be registered to
receive notifications of cluster membership changes. The ImpalaServer
and Frontend use this mechanism.
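An illustrative sketch (names and types simplified) of the
snapshot-plus-callback pattern described above: readers take an
immutable shared snapshot, and registered callbacks are invoked
whenever a new snapshot is installed.

  #include <functional>
  #include <map>
  #include <memory>
  #include <mutex>
  #include <string>
  #include <vector>

  struct MembershipSnapshot {
    // backend id -> address, immutable once published.
    std::map<std::string, std::string> backends;
  };

  class ClusterMembershipMgrSketch {
   public:
    using SnapshotPtr = std::shared_ptr<const MembershipSnapshot>;
    using UpdateCallback = std::function<void(const SnapshotPtr&)>;

    // Clients (e.g. the ImpalaServer and Frontend) read a consistent view.
    SnapshotPtr GetSnapshot() {
      std::lock_guard<std::mutex> l(lock_);
      return current_;
    }

    void RegisterUpdateCallback(UpdateCallback cb) {
      std::lock_guard<std::mutex> l(lock_);
      callbacks_.push_back(std::move(cb));
    }

    // Called when a statestore update produces a new membership view.
    void PublishSnapshot(SnapshotPtr snapshot) {
      std::vector<UpdateCallback> cbs;
      {
        std::lock_guard<std::mutex> l(lock_);
        current_ = snapshot;
        cbs = callbacks_;
      }
      for (const auto& cb : cbs) cb(snapshot);
    }

   private:
    std::mutex lock_;
    SnapshotPtr current_ = std::make_shared<MembershipSnapshot>();
    std::vector<UpdateCallback> callbacks_;
  };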
This change also generalizes the fix for IMPALA-7665: updates from the
statestore to the cluster membership topic are only made visible to the
rest of the local server after a post-recovery grace period has elapsed.
As part of this the flag
'failed_backends_query_cancellation_grace_period_ms' is replaced with
'statestore_subscriber_recovery_grace_period_ms'. To tell the initial
startup from post-recovery, a new metric
'statestore-subscriber.num-connection-failures' is exposed by the
daemon, which tracks the total number of connection failures to the
statestore over the lifetime of the process.
This change also unifies the naming of executor-related classes, in
particular it renames "BackendConfig" to "ExecutorGroup". In
anticipation of a subsequent change (IMPALA-8484), it adds maps to store
multiple executor groups.
This change also disables the generation of default operators from the
thrift files and instead adds explicit implementations for the ones that
we rely on. This forces us to explicitly specify comparators when
manipulating containers of thrift structs and will help prevent
accidental bugs.
Testing: This change adds a backend unit test for the new cluster
membership manager. The observable behavior of Impala does not change,
and the existing scheduler unit test and end to end tests should make
sure of that.
Change-Id: Ib3cf9a8bb060d0c6e9ec8868b7b21ce01f8740a3
Reviewed-on: http://gerrit.cloudera.org:8080/13207
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The test relies on scheduling decisions made on a 3 node minicluster
without erasure coding. This patch ensures that this test is skipped
if those conditions are not met by adding a new
SkipIfNotHdfsMinicluster.scheduling marker for this purpose. Existing
tests that rely on the same conditions were also updated to use the
marker.
Change-Id: I0a54b6e149c42b696c954b5240d6de61453bf7f9
Reviewed-on: http://gerrit.cloudera.org:8080/13406
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, if the statestore restarts and disseminates an inconsistent
view of cluster membership to the coordinators, then they might believe
that the backends no longer in the membership update are down and would
start canceling queries that are running or scheduled to run on those
allegedly failed backends. This patch adds a grace period after
statestore recovery/successful registration that gives it enough time
to gather a consistent state of the cluster.
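A hedged sketch of the grace-period check (the class is illustrative
and the flag default below is a placeholder; only the flag name comes
from this patch): membership-driven cancellations are suppressed until
the grace period since the last successful re-registration has elapsed.

  #include <chrono>
  #include <gflags/gflags.h>

  DEFINE_int64(failed_backends_query_cancellation_grace_period_ms, 30000,
               "Grace period after statestore recovery before acting on "
               "apparently failed backends (default here is illustrative)");

  class RecoveryGracePeriodSketch {
   public:
    using Clock = std::chrono::steady_clock;

    // Called whenever the subscriber successfully (re-)registers with the
    // statestore.
    void OnStatestoreRecovered() { last_recovery_ = Clock::now(); }

    // Membership-based cancellation only proceeds once the grace period has
    // elapsed, giving the statestore time to rebuild a consistent view.
    bool MayCancelQueriesForMissingBackends() const {
      auto elapsed_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
          Clock::now() - last_recovery_).count();
      return elapsed_ms >=
          FLAGS_failed_backends_query_cancellation_grace_period_ms;
    }

   private:
    Clock::time_point last_recovery_ = Clock::now();
  };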
Testing:
- Added an e2e test.
- Did manual stress testing using concurrent_select.py with
statestore_subscriber_timeout_seconds set to 2 secs and
failed_backends_query_cancellation_grace_period_ms set to 5 seconds,
and the statestore being restarted every 15 seconds. To avoid other
effects of statestore restarts cropping up, I used a local catalog
(catalog v2) and ignored query errors caused by the scheduler having
an incomplete view of the cluster (no backends).
Change-Id: I30b68bd8bde4bf589d58d42d6f683afb166de959
Reviewed-on: http://gerrit.cloudera.org:8080/13061
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch enables a user who has access to the impalad process to
initiate the graceful shutdown process with a deadline of one year by
sending the SIGRTMIN signal to it.
Sample usage: "kill -SIGRTMIN <IMPALAD_PID>"
Testing:
Added relevant e2e tests.
Tested on CentOS 6, CentOS 7, Ubuntu 16.04, Ubuntu 18.04 and SLES 12
Change-Id: I521ffd7526ac9a8a5c4996994eb68d6a855aef86
Reviewed-on: http://gerrit.cloudera.org:8080/12973
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The :shutdown command is used to shutdown a remote server. The common
case is that a user specifies the impalad to shutdown by specifying a
host e.g. :shutdown('host100'). If a user has more than one impalad on a
remote host then the form :shutdown('<host>:<port>') can be used to
specify the port by which the impalad can be contacted. Prior to
IMPALA-7985 this port was the backend port, e.g.
:shutdown('host100:22000'). With IMPALA-7985 the port to use is the KRPC
port, e.g. :shutdown('host100:27000').
Shutdown is implemented by making an rpc call to the target impalad.
This changes the implementation of this call to use KRPC.
To aid the user in finding the KRPC port, the KRPC address is added to
the /backends section of the debug web page.
We attempt to detect the case where :shutdown is pointed at a thrift
port (like the backend port) and print an informative message.
Documentation of this change will be done in IMPALA-8098.
Further improvements to DoRpcWithRetry() will be done in IMPALA-8143.
For discussion of why it was chosen to implement this change in an
incompatible way, see comments in
https://issues.apache.org/jira/browse/IMPALA-7985.
TESTING
Ran all end-to-end tests.
Enhanced the test for /backends in test_web_pages.py.
In test_restart_services.py, added a call to the old backend port to the
test. Some expected error messages were changed in line with what KRPC
returns.
Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e
Reviewed-on: http://gerrit.cloudera.org:8080/12260
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
There were two races:
* queries were terminated because of an impalad being detected
as failed by the statestore even if the query had finished
executing on that impalad.
* NUM_FRAGMENTS_IN_FLIGHT was used to detect the backend being
idle, but it was decremented before the final status report
was sent.
The fixes are:
* keep track of the backends that triggered the potential cancellation,
and only proceed with the cancellation if the coordinator has fragments
still executing on the backend.
* add a new metric that keeps track of the number of executing queries,
which isn't decremented until the final status report is sent.
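A simplified sketch of the first fix (sets of backend ids stand in for
the real data structures): cancellation only proceeds if the
coordinator still has fragment instances executing on one of the
backends that were reported as failed.

  #include <set>
  #include <string>

  // 'executing_backends' is the set of backends on which this query still
  // has fragment instances running; 'failed_backends' is the set that
  // triggered the potential cancellation.
  bool ShouldCancelQuery(const std::set<std::string>& executing_backends,
                         const std::set<std::string>& failed_backends) {
    for (const std::string& backend : failed_backends) {
      // Only cancel if the query is actually still executing something on
      // a backend that failed.
      if (executing_backends.count(backend) > 0) return true;
    }
    return false;
  }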
Also do some cleanup/improvements in this code:
* use proper error codes for some errors
* more overloads for Status::Expected()
* also add a metric for the total number of queries executed on the
backend
Testing:
Add a new version of test_shutdown_executor with delays that
trigger both races. This test only runs in exhaustive to avoid
adding ~20s to core build time.
Ran exhaustive tests.
Looped test_restart_services overnight.
Change-Id: I7c1a80304cb6695d228aca8314e2231727ab1998
Reviewed-on: http://gerrit.cloudera.org:8080/12082
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Exposes a list of build flags via the impalad web UI. The build flags
can be viewed on the root page under the "Version" section. They can
be accessed via other tests through the debug version of the root page
(e.g. adding &json to the URL). The build flags are listed in a JSON
array so that they can be parsed easily. This should help run Impala
tests against a remote Impala cluster.
The build flags are read in CMakeLists.txt and then stored in
preprocessor variables.
Three build flags are exposed as part of this commit:
- Is_NDEBUG = [true, false]
- Whether NDEBUG was true or false at compile time
- CMake_Build_Type = [DEBUG, RELEASE, ADDRESS_SANITIZER, TIDY, UBSAN,
UBSAN_FULL, TSAN, CODE_COVERAGE_RELEASE, CODE_COVERAGE_DEBUG]
- The value of CMAKE_BUILD_TYPE at compile time
- Library_Link_Type = [DYNAMIC, STATIC]
- Derived from the compile time value of BUILD_SHARED_LIBS
There are a few other minor changes that are a part of this commit:
* The patch modifies environ.py so that it supports fetching build metadata
for both local and remote clusters.
* The tests under the tests/webserver directory were not being run because
'webserver' was not whitelisted in tests/run-tests.py. This patch fixes
that and addresses several test failures in run-tests.py.
* It reverts part of IMPALA-6947 so that there is no dependency from
start-impala-cluster.py on environ.py. The timeout discussed in IMPALA-6947
is now set at compile time.
Testing:
Added new tests to webserver/test_web_pages.py to ensure that the build
flags are being set. Some tests only run against a local
cluster because we have no way of getting the build info from a remote
cluster, whereas local clusters contain a .cmake_build_type file.
Change-Id: I47e3ad4cbf844909bdaf22a6f9d7bd915dce3f19
Reviewed-on: http://gerrit.cloudera.org:8080/11410
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Fix tests to always pass query options via the query_options
parameter.
Modified the infrastructure to fail on non-erasure-coding builds if
tests pass in default query options in the wrong way.
Skip a restart test that makes assumptions about scheduling that EC
seems to break.
Testing:
Ran core tests with erasure coding enabled.
Change-Id: I4d809faedc0c45417519f13c73559efb6c54154e
Reviewed-on: http://gerrit.cloudera.org:8080/11536
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the same patch except with fixes for the test failures
on EC and S3 noted in the JIRA.
This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).
Details:
* In order to allow future admin commands, this is implemented with
function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
":shutdown('hostname')", so that non-coordinators can be shut down
and for the convenience of the client, which does not have to
connect to the specific impalad. There is no assumption that the
other impalad is registered in the statestore; just that the
coordinator can connect to the other daemon's thrift endpoint.
This simplifies things and allows shutdown in various important
cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
slower shutdown by specifying a deadline in seconds after the
statement is executed.
* If shutting down, a banner is shown on the root debug page.
Workflow:
1. (if a coordinator) clients are prevented from submitting
queries to this coordinator via some out-of-band mechanism,
e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
which stop scheduling fragment instances on this daemon
(if an executor).
4. the query startup grace period (which is ideally set to the AC
queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered
queries), it shuts itself down.
6. If the daemon does not successfully quiesce (e.g. rogue clients,
long-running queries), after a longer timeout (counted from the start
of the shutdown process) it will shut down anyway.
What this does:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
there is an out-of-band mechanism to prevent submission of more
queries to the shut down coordinator. If queries are submitted to
a coordinator after shutdown has started, they will fail.
* Long running queries or other issues (e.g. stuck fragments) will
slow down but not prevent eventual shutdown.
Limitations:
* The startup grace period needs to be configured to be greater than
the latency of statestore updates + scheduling + admission +
coordinator startup. Otherwise a coordinator may send a
fragment instance to the shutting down impalad. (We could
automate this configuration as a follow-on)
* The startup grace period means a minimum latency for shutdown,
even if the cluster is idle.
* We depend on the statestore detecting the process going down
if queries are still running on that backend when the timeout
expires. This may still be subject to existing problems,
e.g. IMPALA-2990.
Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
queries are running.
* End-to-end test of shutting down a coordinator
- New queries cannot be started on coord, existing queries continue to run
- Exercises various Beeswax and HS2 operations.
Change-Id: I8f3679ef442745a60a0ab97c4e9eac437aef9463
Reviewed-on: http://gerrit.cloudera.org:8080/11484
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).
Details:
* In order to allow future admin commands, this is implemented with
function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
":shutdown('hostname')", so that non-coordinators can be shut down
and for the convenience of the client, which does not have to
connect to the specific impalad. There is no assumption that the
other impalad is registered in the statestore; just that the
coordinator can connect to the other daemon's thrift endpoint.
This simplifies things and allows shutdown in various important
cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
slower shutdown by specifying a deadline in seconds after the
statement is executed.
* If shutting down, a banner is shown on the root debug page.
Workflow:
1. (if a coordinator) clients are prevented from submitting
queries to this coordinator via some out-of-band mechanism,
e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
which stop scheduling fragment instances on this daemon
(if an executor).
4. the query startup grace period (which is ideally set to the AC
queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered
queries), it shuts itself down.
6. If the daemon does not successfully quiesce (e.g. rogue clients,
long-running queries), after a longer timeout (counted from the start
of the shutdown process) it will shut down anyway.
What this does:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
there is an out-of-band mechanism to prevent submission of more
queries to the shut down coordinator. If queries are submitted to
a coordinator after shutdown has started, they will fail.
* Long running queries or other issues (e.g. stuck fragments) will
slow down but not prevent eventual shutdown.
Limitations:
* The startup grace period needs to be configured to be greater than
the latency of statestore updates + scheduling + admission +
coordinator startup. Otherwise a coordinator may send a
fragment instance to the shutting down impalad. (We could
automate this configuration as a follow-on)
* The startup grace period means a minimum latency for shutdown,
even if the cluster is idle.
* We depend on the statestore detecting the process going down
if queries are still running on that backend when the timeout
expires. This may still be subject to existing problems,
e.g. IMPALA-2990.
Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
queries are running.
* End-to-end test of shutting down a coordinator
- New queries cannot be started on coord, existing queries continue to run
- Exercises various Beeswax and HS2 operations.
Change-Id: I4d5606ccfec84db4482c1e7f0f198103aad141a0
Reviewed-on: http://gerrit.cloudera.org:8080/10744
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Previously, ImpalaServer::MembershipCallback() is used by each
Impala backend node to update cluster membership. It also removes
stale connections to nodes which are no longer members of the cluster.
However, the way it detects removed members is flawed as it relies
on query_locations_ to determine whether stale connections may
exist to the removed members. query_locations_ is a map of host
name to a set of queries running on that host. An entry for a remote
node only exists in query_locations_ if an Impalad node has acted
as coordinator of a query with fragment instances scheduled to run
on that remote node.
This change fixes this problem by closing connections to remote
hosts which are removed from the cluster regardless of whether
it can be found in query_locations_. A new test is added to
exercise this path by restarting Impalad backend nodes between
queries. Also change impala_cluster.py to use bin/start-impala.sh
to start the Impala daemon instead of directly forking and exec'ing
Impalad. This is needed as start-impala.sh sets up the proper
Java-related environment variables.
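An illustrative sketch of the fix (the client-cache teardown is passed
in as a stand-in callback): connections are closed to every host
removed from the membership, without consulting query_locations_.

  #include <functional>
  #include <set>
  #include <string>

  // Close connections to every host that was removed from the cluster,
  // whether or not this node ever coordinated a query with fragment
  // instances on that host. 'close_connections' stands in for the real
  // client-cache teardown call.
  void CloseStaleConnections(
      const std::set<std::string>& previous_members,
      const std::set<std::string>& current_members,
      const std::function<void(const std::string&)>& close_connections) {
    for (const std::string& host : previous_members) {
      if (current_members.count(host) == 0) close_connections(host);
    }
  }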
Change-Id: I41b7297cf665bf291b09b23524d19b1d10ab281d
Reviewed-on: http://gerrit.cloudera.org:8080/10327
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-5990 introduced a bug where restarting the statestore
deterministically clears the catalog metadata, which never comes back.
The cause of the bug is an incorrect condition used by the catalog to
detect the restart of the statestore.
A custom cluster regression test is added. The process-restarting
utility function in the custom cluster test is changed to use
shell=True in Popen.
Change-Id: I332a60e172af84b93b3544373fe363cdced5e8d0
Reviewed-on: http://gerrit.cloudera.org:8080/9921
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tianyi Wang <twang@cloudera.com>