This patch makes the following changes to support running KRPC over UDS.
- Add FLAGS_rpc_use_unix_domain_socket to enable running KRPC over
  UDS. Add FLAGS_uds_address_unique_id to specify the unique ID for the
  UDS address. It can be 'ip_address', 'backend_id', or 'none'.
- Add a uds_address field to NetworkAddressPB and TNetworkAddress.
  Replace TNetworkAddress with NetworkAddressPB for KRPC-related
  class variables and APIs.
- Set the UDS address for each daemon as @impala-krpc:<unique_id>
  during initialization, with unique_id determined by the startup flag
  FLAGS_uds_address_unique_id.
- When FLAGS_rpc_use_unix_domain_socket is true, the socket of the
  KRPC server is bound to the UDS address of the daemon. The KRPC
  client connects to the KRPC server with the server's UDS address
  when creating the proxy service, which in turn calls
  kudu::Socket::Connect() to connect to the KRPC server (see the
  sketch after this list).
- The rpcz web page shows TCP-related stats as 'N/A' when using UDS.
  The remote UDS address for KRPC inbound connections is shown as '*'
  on the rpcz web page when using UDS since remote UDS addresses are
  not available.
- Add new unit-tests for UDS.
- The BackendId of the admissiond is not available, so use the
  admissiond's IP address as the unique ID for UDS.
  TODO: Advertise the BackendId of the admissiond in global admission
  control mode.
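For illustration only, a minimal POSIX-level sketch of what an
abstract-namespace UDS address of the form '@impala-krpc:<unique_id>'
means: the leading '@' stands for a NUL byte in sun_path, so the socket
never appears on the filesystem. The actual connection in this patch
goes through kudu::Socket::Connect(); the sketch just shows the
underlying address format and a direct connect call.

  #include <cstddef>
  #include <cstring>
  #include <string>
  #include <sys/socket.h>
  #include <sys/un.h>
  #include <unistd.h>

  // Connect to an abstract-namespace UDS address such as
  // "@impala-krpc:<unique_id>". Returns the connected fd, or -1 on error.
  int ConnectAbstractUds(const std::string& uds_address) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    // Abstract namespace: sun_path starts with a NUL byte followed by the
    // name, i.e. the part after the leading '@'.
    std::string name = uds_address.substr(1);
    size_t copied = name.copy(addr.sun_path + 1, sizeof(addr.sun_path) - 2);

    socklen_t len = offsetof(sockaddr_un, sun_path) + 1 + copied;
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), len) != 0) {
      close(fd);
      return -1;
    }
    return fd;
  }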
Testing:
- Passed core tests with FLAGS_rpc_use_unix_domain_socket at its
  default value of false.
- Passed core tests with FLAGS_rpc_use_unix_domain_socket set to true.
Change-Id: I439f5a03eb425c17451bcaa96a154bb0bca17ee7
Reviewed-on: http://gerrit.cloudera.org:8080/18369
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for COS (Cloud Object Storage). Using
hadoop-cos, the implementation is similar to other remote FileSystems.
New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.
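As a rough illustration (the helper below is hypothetical; only the
flag name comes from this patch), such a per-filesystem flag is
typically a gflags definition used to size an I/O worker pool at
startup:

  #include <thread>
  #include <vector>
  #include <gflags/gflags.h>

  DEFINE_int32(num_cos_io_threads, 16, "Number of COS I/O threads");

  // Hypothetical illustration: start one I/O worker per configured thread.
  void StartCosIoWorkers(std::vector<std::thread>* workers,
                         void (*worker_fn)()) {
    for (int i = 0; i < FLAGS_num_cos_io_threads; ++i) {
      workers->emplace_back(worker_fn);
    }
  }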
Follow-up:
- Support for caching COS file handles will be addressed in
IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
COS (IMPALA-10773).
Tests:
- Upload hdfs test data to a COS bucket. Modify all locations in HMS
DB to point to the COS bucket. Remove some hdfs caching params.
Run CORE tests.
Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Reviewed-on: http://gerrit.cloudera.org:8080/17503
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
ImpaladCatalog#updateCatalog() doesn't trigger a full topic update
request when detecting catalogServiceId changes. It just updates the
local catalogServiceId and throws an exception to abort applying the
DDL/DML results. This causes a problem when catalogd is restarted and
the DDL/DML is executed on the restarted instance. In this case, only
the local catalogServiceId is updated to the latest. The local catalog
remains stale. Then, when dealing with subsequent updates from the
statestore, the catalogServiceId always matches, so updates will be
applied without exceptions. However, the catalog objects usually won't
be updated since they have higher versions (from the old catalogd
instance) than those in the update. This brings the local catalog out
of sync until the catalog version of the new catalogd grows large
enough.
Note that when dealing with catalog updates from the statestore, if the
catalogServiceId doesn't match, impalad will request a full topic update.
See more in ImpalaServer::CatalogUpdateCallback().
This patch fixes the issue by checking the catalogServiceId before
invoking UpdateCatalogCache() in the FE. If the catalogServiceId doesn't
match the one in the DDL/DML result, wait until it changes. The next
update from the statestore will change it and unblock the DDL/DML thread.
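The blocking behaviour can be pictured with a minimal sketch (class and
member names here are illustrative, not the actual Impala code): the
DDL/DML thread waits on a condition variable until a statestore-driven
update installs a matching catalogServiceId.

  #include <condition_variable>
  #include <mutex>
  #include <string>

  class CatalogServiceIdTracker {
   public:
    // Called by the DDL/DML path before applying its catalog update. Blocks
    // until the locally known service id matches the one in the DDL/DML result.
    void WaitForServiceId(const std::string& expected_id) {
      std::unique_lock<std::mutex> l(lock_);
      cv_.wait(l, [&] { return current_id_ == expected_id; });
    }

    // Called when a statestore topic update carries a new catalog service id.
    void SetServiceId(const std::string& new_id) {
      {
        std::lock_guard<std::mutex> l(lock_);
        current_id_ = new_id;
      }
      cv_.notify_all();
    }

   private:
    std::mutex lock_;
    std::condition_variable cv_;
    std::string current_id_;
  };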
Testing:
Added several tests in
tests/custom_cluster/test_restart_services.py
Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Reviewed-on: http://gerrit.cloudera.org:8080/17645
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Make the deadline for a shutdown triggered by SIGRTMIN configurable
via the flag shutdown_deadline_s. Previously, the deadline for a
shutdown initiated by SIGRTMIN was fixed at one year and was
independent of the flag. This patch ensures that this shutdown
behaviour is also governed by the common flag shutdown_deadline_s.
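A minimal sketch of the intended behaviour, assuming a dedicated
signal-wait loop and a hypothetical StartGracefulShutdown() entry point
(the flag's default value below is a placeholder): the SIGRTMIN path
now takes its deadline from shutdown_deadline_s rather than a
hard-coded one year.

  #include <csignal>
  #include <cstdint>
  #include <gflags/gflags.h>

  // Deadline flag shared with the :shutdown() statement path. The default
  // value here is a placeholder, not necessarily Impala's actual default.
  DEFINE_int64(shutdown_deadline_s, 3600, "Deadline for graceful shutdown");

  // Hypothetical stand-in for the routine that begins quiescing the daemon.
  void StartGracefulShutdown(int64_t deadline_s) { (void)deadline_s; }

  // Signal-handling loop: SIGRTMIN starts a shutdown whose deadline comes
  // from FLAGS_shutdown_deadline_s instead of a fixed one-year value.
  void SignalWaitLoop() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    pthread_sigmask(SIG_BLOCK, &set, nullptr);
    int sig = 0;
    while (sigwait(&set, &sig) == 0) {
      if (sig == SIGRTMIN) StartGracefulShutdown(FLAGS_shutdown_deadline_s);
    }
  }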
TESTING:
1. Modified existing test to reflect the configurable deadline.
2. Verified manually
3. Ran the cluster tests (which include test_restart_services)
Change-Id: I52cb1ba76e7ce9de86ceb2f84389b1ab257e4c05
Reviewed-on: http://gerrit.cloudera.org:8080/17348
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for GCS (Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.
New flags for GCS:
- num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.
Follow-up:
- Support for spilling to GCS will be addressed in IMPALA-10561.
- Support for caching GCS file handles will be addressed in
IMPALA-10568.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
GCS (IMPALA-10562).
- Some tests are skipped due to issues introduced by /etc/hosts setting
on GCE instances (IMPALA-10563).
Tests:
- Compile and create hdfs test data on a GCE instance. Upload test data
to a GCS bucket. Modify all locations in HMS DB to point to the GCS
bucket. Remove some hdfs caching params. Run CORE tests.
- Compile and load snapshot data to a GCS bucket. Run CORE tests.
Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The legacy Thrift-based Impala internal service has been deprecated
and can be removed now.
This patch removes ImpalaInternalService. All infrastructure around it
is cleaned up, except one place for the flag be_port:
StatestoreSubscriber::subscriber_id contains be_port, but we cannot
change the format of subscriber_id now. This remaining be_port issue
will be fixed in a succeeding patch (part 4).
TQueryCtx.coord_address is changed to TQueryCtx.coord_hostname since the
port in TQueryCtx.coord_address is set to be_port and is now unused.
Also rename TQueryCtx.coord_krpc_address to TQueryCtx.coord_ip_address.
Testing:
- Passed the exhaustive test.
- Passed Quasar-L0 test.
Change-Id: I5fa83c8009590124dded4783f77ef70fa30119e6
Reviewed-on: http://gerrit.cloudera.org:8080/16291
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch contains the following refactors that are needed for the
admission control service, in order to make the main patch easier to
review:
- Adds a new class AdmissionControlClient which will be used to
abstract the logic for submitting queries to either a local or
remote admission controller out from ClientRequestState/Coordinator.
Currently only local submission is supported.
- SubmitForAdmission now takes a BackendId representing the
coordinator instead of assuming that the local impalad will be the
coordinator.
- The CRS_BEFORE_ADMISSION debug action is moved into
SubmitForAdmission() so that it will be executed on whichever daemon
is performing admission control rather than always on the
coordinator (needed for TestAdmissionController.test_cancellation).
- ShardedQueryMap is extended to allow keys to be either TUniqueId or
  UniqueIdPB, and Add(), Get(), and Delete() convenience functions are
  added (see the sketch after this list).
- Some utils related to serializing Thrift objects into sidecars are
added.
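A rough sketch of the kind of sharded map described above (class name,
shard count, and locking scheme are illustrative, not the actual
ShardedQueryMap code); keys such as TUniqueId or UniqueIdPB would
additionally need a hash function.

  #include <array>
  #include <functional>
  #include <mutex>
  #include <unordered_map>

  // Entries are spread over N independently locked shards so that
  // concurrent accesses for different query ids rarely contend.
  template <typename K, typename V, int NUM_SHARDS = 4>
  class ShardedMapSketch {
   public:
    bool Add(const K& key, const V& value) {
      Shard& s = GetShard(key);
      std::lock_guard<std::mutex> l(s.lock);
      return s.map.emplace(key, value).second;
    }

    bool Get(const K& key, V* value) {
      Shard& s = GetShard(key);
      std::lock_guard<std::mutex> l(s.lock);
      auto it = s.map.find(key);
      if (it == s.map.end()) return false;
      *value = it->second;
      return true;
    }

    bool Delete(const K& key) {
      Shard& s = GetShard(key);
      std::lock_guard<std::mutex> l(s.lock);
      return s.map.erase(key) > 0;
    }

   private:
    struct Shard {
      std::mutex lock;
      std::unordered_map<K, V> map;
    };
    Shard& GetShard(const K& key) {
      return shards_[std::hash<K>()(key) % NUM_SHARDS];
    }
    std::array<Shard, NUM_SHARDS> shards_;
  };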
Testing:
- Passed a run of existing core tests.
Change-Id: I7974a979cf05ed569f31e1ab20694e29fd3e4508
Reviewed-on: http://gerrit.cloudera.org:8080/16411
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch introduces the concept of 'backend ids', which are unique
ids that can be used to identify individual impalads. The ids are
generated by each impalad on startup.
The patch then uses the ids to fix a bug where the statestore may fail
to inform coordinators when an executor has failed and restarted. The
bug was caused by the fact that the statestore cluster membership
topic was keyed on statestore subscriber ids, which are host:port
pairs.
So, if an impalad fails and a new one is started at the same host:port
before a particular coordinator has a cluster membership update
generated for it by the statestore, the statestore has no way of
differentiating the prior impalad from the newly started impalad, and
the topic update will not show the deletion of the original impalad.
With this patch, the cluster membership topic is now keyed by backend
id. Since the restarted impalad will have a different backend id, the
next membership update after the prior impalad fails is guaranteed to
reflect that failure.
The patch also logs the backend ids on startup and adds them to the
/backends webui page and to the query locations section of the
/queries page, for use in debugging.
Further patches will apply the backend ids in other places where we
currently key off host:port pairs to identify impalads.
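An illustrative comparison of the two keying schemes (types simplified
to strings here; the real ids are UniqueIdPB/TUniqueId values generated
at startup):

  #include <random>
  #include <sstream>
  #include <string>

  // Illustrative only: generate a per-process backend id at startup.
  std::string GenerateBackendId() {
    std::random_device rd;
    std::mt19937_64 gen(rd());
    std::ostringstream ss;
    ss << std::hex << gen() << gen();
    return ss.str();
  }

  // Old scheme: the topic key collides if a new impalad reuses the same
  // host:port, so the old entry's deletion can be missed.
  std::string OldTopicKey(const std::string& host, int port) {
    return host + ":" + std::to_string(port);
  }

  // New scheme: the restarted impalad gets a fresh backend id, so the old
  // entry is seen as deleted and the new one as added in the next update.
  std::string NewTopicKey(const std::string& backend_id) {
    return backend_id;
  }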
Testing:
- Added an e2e test that uses a new debug action to add delay to
statestore topic updates. Due to the use of JITTER the test is
non-deterministic, but it repros the original issue locally for me
about 50% of the time.
- Passed a full run of existing tests.
Change-Id: Icf8067349ed6b765f6fed830b7140f60738e9061
Reviewed-on: http://gerrit.cloudera.org:8080/15321
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch applies various fixes to Impala and to the copied Kudu
source code in be/src/kudu/* to allow everything to compile.
Some highlights of the changes made:
- Various Kudu files were removed from compilation due to issues like
relying on libraries that Impala does not provide. The linking of
some executables is also changed for similar reasons.
- The Kudu Cache implementation was changed to support unique_ptr,
allowing us to remove various uses of MakeScopeExitTrigger.
- Some flags that have a DEFINE in both Kudu and Impala are modified
to change one of the DEFINEs to a DECLARE.
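For context, a minimal gflags example of the DEFINE/DECLARE distinction
(the flag name is made up): a flag may only be DEFINEd in one
translation unit, while other translation units reference it through
DECLARE.

  // kudu/some_module.cc (the one place that keeps the definition):
  #include <gflags/gflags.h>
  DEFINE_int32(example_shared_flag, 10, "A flag defined exactly once");

  // impala/other_module.cc (previously also had a DEFINE, now declares it):
  #include <gflags/gflags.h>
  DECLARE_int32(example_shared_flag);

  int UseFlag() { return FLAGS_example_shared_flag; }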
This patch was in part based on the patches that were applied the last
time we rebased the Kudu code in IMPALA-7006, and I ensured that all
changes from those commits that are still relevant were included here.
I also went through all commits that have been applied to the
be/src/kudu directory since the last rebase and ensured that all
relevant changes from those are included here.
Testing:
- Passed an exhaustive DEBUG build and a core ASAN build.
Change-Id: I1eb4caf927c729109426fb50a28b5e15d6ac46cb
Reviewed-on: http://gerrit.cloudera.org:8080/15144
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Add retries to catalogd RPCs. Previously, connection failures triggered
a retry, but failures on the actual RPC did not trigger a retry. This
change replaces all usages of ClientCache::DoRpc() in the
CatalogOpExecutor with ClientCache::DoRpcWithRetry(). It also moves
the connection retry loop into DoRpcWithRetry(), instead of relying on
the ClientCache to retry the connection.
This patch is based on IMPALA-8904, which adds similar functionality to
statestore RPCs.
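A simplified sketch of the retry pattern described above (this is not
the actual ClientCache::DoRpcWithRetry() signature): the retry loop
wraps both reconnecting and re-issuing the RPC, rather than retrying
only the connection attempt.

  #include <functional>
  #include <string>

  struct Status {
    bool ok;
    std::string msg;
    static Status OK() { return {true, ""}; }
  };

  // Retry the whole connect-then-call sequence, so a failure of the RPC
  // itself (not just of the connection attempt) is also retried.
  Status DoRpcWithRetrySketch(const std::function<Status()>& reconnect,
                              const std::function<Status()>& rpc,
                              int max_attempts) {
    Status status = Status::OK();
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
      status = reconnect();
      if (!status.ok) continue;      // retry the connection on failure
      status = rpc();
      if (status.ok) return status;  // done
      // Otherwise fall through and retry the RPC as well.
    }
    return status;
  }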
Testing:
* Renamed test_statestore_rpc_errors.py to test_services_rpc_errors.py
and added new tests for catalogd RPC errors
* Added new tests to test_restart_services.py
* Ran core tests
Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Reviewed-on: http://gerrit.cloudera.org:8080/14246
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds a helper script to initiate graceful daemon shutdown
via the signaling mechanism. It also includes that helper script in the
docker containers.
Testing: This change adds a test to verify that the script works as
expected. In addition, I manually verified that the script gets added to
the containers and that calling it inside the container will cause a
shutdown as expected.
Change-Id: I877483a385cd0747f69b82a6488de203a4029599
Reviewed-on: http://gerrit.cloudera.org:8080/13912
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds support for running queries inside a single admission
control pool on one of several, disjoint sets of executors called
"executor groups".
Executors can be configured with an executor group through the newly
added '--executor_groups' flag. Note that in anticipation of future
changes, the flag already uses the plural form, but only a single
executor group may be specified for now. Each executor group
specification can optionally contain a minimum size, separated by a
':', e.g. --executor_groups default-pool-1:3. Only when the cluster
membership contains at least that number of executors for the group
will it be considered for admission.
Executor groups are mapped to resource pools by their name: An executor
group can service queries from a resource pool if the pool name is a
prefix of the group name separated by a '-'. For example, queries in
pool poolA can be serviced by executor groups named poolA-1 and poolA-2,
but not by groups named foo or poolB-1.
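A small sketch of the two conventions just described (the parsing and
the default minimum size of 1 are assumptions; the real handling lives
in the backend):

  #include <string>
  #include <utility>

  // Parse an --executor_groups entry of the form "<name>[:<min size>]",
  // e.g. "default-pool-1:3" -> {"default-pool-1", 3}. A missing minimum
  // size is assumed here to default to 1.
  std::pair<std::string, int> ParseExecutorGroupSpec(const std::string& spec) {
    size_t colon = spec.rfind(':');
    if (colon == std::string::npos) return {spec, 1};
    return {spec.substr(0, colon), std::stoi(spec.substr(colon + 1))};
  }

  // A group can serve a pool if the pool name, followed by '-', is a prefix
  // of the group name, e.g. pool "poolA" matches "poolA-1" and "poolA-2".
  bool GroupServesPool(const std::string& group, const std::string& pool) {
    const std::string prefix = pool + "-";
    return group.compare(0, prefix.size(), prefix) == 0;
  }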
During scheduling, executor groups are considered in alphabetical order.
This means that one group is filled up entirely before a subsequent
group is considered for admission. Groups also need to pass a health
check before being considered. In particular, they must contain at least the
minimum number of executors specified.
If no group is specified during startup, executors are added to the
default executor group. If - during admission - no executor group for a
pool can be found and the default group is non-empty, then the default
group is considered. The default group does not have a minimum size.
This change inverts the order of scheduling and admission. Prior to this
change, queries were scheduled before submitting them to the admission
controller. Now the admission controller computes schedules for all
candidate executor groups before each admission attempt. If the cluster
membership has not changed, then the schedules of the previous attempt
will be reused. This means that queries will no longer fail if the
cluster membership changes while they are queued in the admission
controller.
This change also alters the default behavior when using a dedicated
coordinator and no executors have registered yet. Prior to this change,
a query would fail immediately with an error ("No executors registered
in group"). Now a query will get queued and wait until executors show
up, or it times out after the pools queue timeout period.
Testing:
This change adds a new custom cluster test for executor groups. It
makes use of new capabilities added to start-impala-cluster.py to bring
up additional executors into an already running cluster.
Additionally, this change adds an instructional implementation of
executor group based autoscaling, which can be used during development.
It also adds a helper to run queries concurrently. Both are used in a
new test to exercise the executor group logic and to prevent regressions
to these tools.
In addition to these tests, the existing tests for the admission
controller (both BE and EE tests) thoroughly exercise the changed code.
Some of them required changes themselves to reflect the new behavior.
I looped the new tests (test_executor_groups and test_auto_scaling) for
a night (110 iterations each) without any issues.
I also started an autoscaling cluster with a single group and ran
TPC-DS, TPC-H, and test_queries on it successfully.
Known limitations:
When using executor groups, only a single coordinator and a single AC
pool (i.e. the default pool) are supported. Executors do not include the
number of currently running queries in their statestore updates and so
admission controllers are not aware of the number of queries admitted by
other controllers per host.
Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b
Reviewed-on: http://gerrit.cloudera.org:8080/13550
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds a class to track cluster membership called
ClusterMembershipMgr. It replaces the logic that was partially
duplicated between the ImpalaServer and the Coordinator and makes sure
that the local backend descriptor is consistent (IMPALA-8469).
The ClusterMembershipMgr maintains a view of the cluster membership and
incorporates incoming updates from the statestore. It also registers the
local backend with the statestore after startup. Clients can obtain a
consistent, immutable snapshot of the current cluster membership from
the ClusterMembershipMgr. Additionally, callbacks can be registered to
receive notifications of cluster membership changes. The ImpalaServer
and Frontend use this mechanism.
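An illustrative sketch (names and types simplified) of the
snapshot-plus-callback pattern described above: readers take an
immutable shared snapshot, and registered callbacks are invoked
whenever a new snapshot is installed.

  #include <functional>
  #include <map>
  #include <memory>
  #include <mutex>
  #include <string>
  #include <vector>

  struct MembershipSnapshot {
    // backend id -> address, immutable once published.
    std::map<std::string, std::string> backends;
  };

  class ClusterMembershipMgrSketch {
   public:
    using SnapshotPtr = std::shared_ptr<const MembershipSnapshot>;
    using UpdateCallback = std::function<void(const SnapshotPtr&)>;

    // Clients (e.g. the ImpalaServer and Frontend) read a consistent view.
    SnapshotPtr GetSnapshot() {
      std::lock_guard<std::mutex> l(lock_);
      return current_;
    }

    void RegisterUpdateCallback(UpdateCallback cb) {
      std::lock_guard<std::mutex> l(lock_);
      callbacks_.push_back(std::move(cb));
    }

    // Called when a statestore update produces a new membership view.
    void PublishSnapshot(SnapshotPtr snapshot) {
      std::vector<UpdateCallback> cbs;
      {
        std::lock_guard<std::mutex> l(lock_);
        current_ = snapshot;
        cbs = callbacks_;
      }
      for (const auto& cb : cbs) cb(snapshot);
    }

   private:
    std::mutex lock_;
    SnapshotPtr current_ = std::make_shared<MembershipSnapshot>();
    std::vector<UpdateCallback> callbacks_;
  };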
This change also generalizes the fix for IMPALA-7665: updates from the
statestore to the cluster membership topic are only made visible to the
rest of the local server after a post-recovery grace period has elapsed.
As part of this the flag
'failed_backends_query_cancellation_grace_period_ms' is replaced with
'statestore_subscriber_recovery_grace_period_ms'. To tell the initial
startup from post-recovery, a new metric
'statestore-subscriber.num-connection-failures' is exposed by the
daemon, which tracks the total number of connection failures to the
statestore over the lifetime of the process.
This change also unifies the naming of executor-related classes, in
particular it renames "BackendConfig" to "ExecutorGroup". In
anticipation of a subsequent change (IMPALA-8484), it adds maps to store
multiple executor groups.
This change also disables the generation of default operators from the
thrift files and instead adds explicit implementations for the ones that
we rely on. This forces us to explicitly specify comparators when
manipulating containers of thrift structs and will help prevent
accidental bugs.
Testing: This change adds a backend unit test for the new cluster
membership manager. The observable behavior of Impala does not change,
and the existing scheduler unit test and end to end tests should make
sure of that.
Change-Id: Ib3cf9a8bb060d0c6e9ec8868b7b21ce01f8740a3
Reviewed-on: http://gerrit.cloudera.org:8080/13207
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The test relies on scheduling decisions made on a 3 node minicluster
without erasure coding. This patch ensures that this test is skipped
if those conditions are not met by adding a new
SkipIfNotHdfsMinicluster.scheduling marker for this purpose. Existing
tests that rely on the same conditions were also updated to use the
marker.
Change-Id: I0a54b6e149c42b696c954b5240d6de61453bf7f9
Reviewed-on: http://gerrit.cloudera.org:8080/13406
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, if the statestore restarts and disseminates an inconsistent
view of cluster membership to the coordinators, then they might believe
that the backends no longer in the membership update are down and would
start canceling queries that are running or scheduled to run on those
allegedly failed backends. This patch adds a grace period after
statestore recovery/successful registration that gives it enough time
to gather a consistent state of the cluster.
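A hedged sketch of the grace-period check (the class is illustrative
and the flag default below is a placeholder; only the flag name comes
from this patch): membership-driven cancellations are suppressed until
the grace period since the last successful re-registration has elapsed.

  #include <chrono>
  #include <gflags/gflags.h>

  DEFINE_int64(failed_backends_query_cancellation_grace_period_ms, 30000,
               "Grace period after statestore recovery before acting on "
               "apparently failed backends (default here is illustrative)");

  class RecoveryGracePeriodSketch {
   public:
    using Clock = std::chrono::steady_clock;

    // Called whenever the subscriber successfully (re-)registers with the
    // statestore.
    void OnStatestoreRecovered() { last_recovery_ = Clock::now(); }

    // Membership-based cancellation only proceeds once the grace period has
    // elapsed, giving the statestore time to rebuild a consistent view.
    bool MayCancelQueriesForMissingBackends() const {
      auto elapsed_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
          Clock::now() - last_recovery_).count();
      return elapsed_ms >=
          FLAGS_failed_backends_query_cancellation_grace_period_ms;
    }

   private:
    Clock::time_point last_recovery_ = Clock::now();
  };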
Testing:
- Added an e2e test.
- Did manual stress testing using concurrent_select.py with
statestore_subscriber_timeout_seconds set to 2 secs and
failed_backends_query_cancellation_grace_period_ms set to 5 seconds,
and the statestore being restarted every 15 seconds. To avoid other
effects of statestore restarts cropping up, I used a local catalog
(catalog v2) and ignored query errors caused by the scheduler having
an incomplete view of the cluster (no backends).
Change-Id: I30b68bd8bde4bf589d58d42d6f683afb166de959
Reviewed-on: http://gerrit.cloudera.org:8080/13061
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch enables a user who has access to the impalad process to
initiate the graceful shutdown process with a deadline of one year by
sending the SIGRTMIN signal to it.
Sample usage: "kill -SIGRTMIN <IMPALAD_PID>"
Testing:
Added relevant e2e tests.
Tested on CentOS 6, CentOS 7, Ubuntu 16.04, Ubuntu 18.04 and SLES 12
Change-Id: I521ffd7526ac9a8a5c4996994eb68d6a855aef86
Reviewed-on: http://gerrit.cloudera.org:8080/12973
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The :shutdown command is used to shutdown a remote server. The common
case is that a user specifies the impalad to shutdown by specifying a
host e.g. :shutdown('host100'). If a user has more than one impalad on a
remote host then the form :shutdown('<host>:<port>') can be used to
specify the port by which the impalad can be contacted. Prior to
IMPALA-7985 this port was the backend port, e.g.
:shutdown('host100:22000'). With IMPALA-7985 the port to use is the KRPC
port, e.g. :shutdown('host100:27000').
Shutdown is implemented by making an rpc call to the target impalad.
This changes the implementation of this call to use KRPC.
To aid the user in finding the KRPC port, the KRPC address is added to
the /backends section of the debug web page.
We attempt to detect the case where :shutdown is pointed at a thrift
port (like the backend port) and print an informative message.
Documentation of this change will be done in IMPALA-8098.
Further improvements to DoRpcWithRetry() will be done in IMPALA-8143.
For discussion of why it was chosen to implement this change in an
incompatible way, see comments in
https://issues.apache.org/jira/browse/IMPALA-7985.
TESTING
Ran all end-to-end tests.
Enhanced the test for /backends in test_web_pages.py.
In test_restart_services.py, added a call to the old backend port to the
test. Some expected error messages were changed in line with what KRPC
returns.
Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e
Reviewed-on: http://gerrit.cloudera.org:8080/12260
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
There were two races:
* queries were terminated because of an impalad being detected
as failed by the statestore even if the query had finished
executing on that impalad.
* NUM_FRAGMENTS_IN_FLIGHT was used to detect the backend being
idle, but it was decremented before the final status report
was sent.
The fixes are:
* keep track of the backends that triggered the potential cancellation,
and only proceed with the cancellation if the coordinator has fragments
still executing on the backend.
* add a new metric that keeps track of the number of executing queries,
which isn't decremented until the final status report is sent.
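A simplified sketch of the first fix (sets of backend ids stand in for
the real data structures): cancellation only proceeds if the
coordinator still has fragment instances executing on one of the
backends that were reported as failed.

  #include <set>
  #include <string>

  // 'executing_backends' is the set of backends on which this query still
  // has fragment instances running; 'failed_backends' is the set that
  // triggered the potential cancellation.
  bool ShouldCancelQuery(const std::set<std::string>& executing_backends,
                         const std::set<std::string>& failed_backends) {
    for (const std::string& backend : failed_backends) {
      // Only cancel if the query is actually still executing something on
      // a backend that failed.
      if (executing_backends.count(backend) > 0) return true;
    }
    return false;
  }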
Also do some cleanup/improvements in this code:
* use proper error codes for some errors
* more overloads for Status::Expected()
* also add a metric for the total number of queries executed on the
backend
Testing:
Add a new version of test_shutdown_executor with delays that
trigger both races. This test only runs in exhaustive to avoid
adding ~20s to core build time.
Ran exhaustive tests.
Looped test_restart_services overnight.
Change-Id: I7c1a80304cb6695d228aca8314e2231727ab1998
Reviewed-on: http://gerrit.cloudera.org:8080/12082
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Exposes a list of build flags via the impalad web UI. The build flags
can be viewed on the root page under the "Version" section. They can
be accessed via other tests through the debug version of the root page
(e.g. adding &json to the URL). The build flags are listed in a JSON
array so that they can be parsed easily. This should help run Impala
tests against a remote Impala cluster.
The build flags are read in CMakeLists.txt and then stored in
preprocessor variables.
Three build flags are exposed as part of this commit:
- Is_NDEBUG = [true, false]
- Whether NDEBUG was true or false at compile time
- CMake_Build_Type = [DEBUG, RELEASE, ADDRESS_SANITIZER, TIDY, UBSAN,
UBSAN_FULL, TSAN, CODE_COVERAGE_RELEASE, CODE_COVERAGE_DEBUG]
- The value of CMAKE_BUILD_TYPE at compile time
- Library_Link_Type = [DYNAMIC, STATIC]
- Derived from the compile time value of BUILD_SHARED_LIBS
There are a few other minor changes that are a part of this commit:
* The patch modifies environ.py so that it supports fetching build metadata
for both local and remote clusters.
* The tests under the tests/webserver directory were not being run because
'webserver' was not whitelisted in tests/run-tests.py. This patch fixes
that and addresses several test failures in run-tests.py.
* It reverts part of IMPALA-6947 so that there is no dependency from
start-impala-cluster.py on environ.py. The timeout discussed in IMPALA-6947
is now set at compile time.
Testing:
Added new tests to webserver/test_web_pages.py to ensure that the build
flags are being set. Some tests only run against a local
cluster because we have no way of getting the build info from a remote
cluster, whereas local clusters contain a .cmake_build_type file.
Change-Id: I47e3ad4cbf844909bdaf22a6f9d7bd915dce3f19
Reviewed-on: http://gerrit.cloudera.org:8080/11410
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Fix tests to always pass query options via the query_options
parameter.
Modified the infrastructure to fail on non-erasure-coding builds if
tests pass in default query options in the wrong way.
Skip a restart test that makes assumptions about scheduling that EC
seems to break.
Testing:
Ran core tests with erasure coding enabled.
Change-Id: I4d809faedc0c45417519f13c73559efb6c54154e
Reviewed-on: http://gerrit.cloudera.org:8080/11536
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the same patch except with fixes for the test failures
on EC and S3 noted in the JIRA.
This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).
Details:
* In order to allow future admin commands, this is implemented with
function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
":shutdown('hostname')", so that non-coordinators can be shut down
and for the convenience of the client, which does not have to
connect to the specific impalad. There is no assumption that the
other impalad is registered in the statestore; just that the
coordinator can connect to the other daemon's thrift endpoint.
This simplifies things and allows shutdown in various important
cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
slower shutdown by specifying a deadline in seconds after the
statement is executed.
* If shutting down, a banner is shown on the root debug page.
Workflow:
1. (if a coordinator) clients are prevented from submitting
queries to this coordinator via some out-of-band mechanism,
e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
which stop scheduling fragment instances on this daemon
(if an executor).
4. the query startup grace period (which is ideally set to the AC
queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered
queries), it shuts itself down.
6. If the daemon does not successfully quiesce (e.g. rogue clients,
long-running queries), after a longer timeout (counted from the start
of the shutdown process) it will shut down anyway.
What this does:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
there is an out-of-band mechanism to prevent submission of more
queries to the shut down coordinator. If queries are submitted to
a coordinator after shutdown has started, they will fail.
* Long running queries or other issues (e.g. stuck fragments) will
slow down but not prevent eventual shutdown.
Limitations:
* The startup grace period needs to be configured to be greater than
the latency of statestore updates + scheduling + admission +
coordinator startup. Otherwise a coordinator may send a
fragment instance to the shutting down impalad. (We could
automate this configuration as a follow-on)
* The startup grace period means a minimum latency for shutdown,
even if the cluster is idle.
* We depend on the statestore detecting the process going down
if queries are still running on that backend when the timeout
expires. This may still be subject to existing problems,
e.g. IMPALA-2990.
Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
queries are running.
* End-to-end test of shutting down a coordinator
- New queries cannot be started on coord, existing queries continue to run
- Exercises various Beeswax and HS2 operations.
Change-Id: I8f3679ef442745a60a0ab97c4e9eac437aef9463
Reviewed-on: http://gerrit.cloudera.org:8080/11484
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).
Details:
* In order to allow future admin commands, this is implemented with
function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
":shutdown('hostname')", so that non-coordinators can be shut down
and for the convenience of the client, which does not have to
connect to the specific impalad. There is no assumption that the
other impalad is registered in the statestore; just that the
coordinator can connect to the other daemon's thrift endpoint.
This simplifies things and allows shutdown in various important
cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
slower shutdown by specifying a deadline in seconds after the
statement is executed.
* If shutting down, a banner is shown on the root debug page.
Workflow:
1. (if a coordinator) clients are prevented from submitting
queries to this coordinator via some out-of-band mechanism,
e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
which stop scheduling fragment instances on this daemon
(if an executor).
4. the query startup grace period (which is ideally set to the AC
queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered
queries), it shuts itself down.
6. If the daemon does not successfully quiesce (e.g. rogue clients,
long-running queries), after a longer timeout (counted from the start
of the shutdown process) it will shut down anyway.
What this does:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
there is an out-of-band mechanism to prevent submission of more
queries to the shut down coordinator. If queries are submitted to
a coordinator after shutdown has started, they will fail.
* Long running queries or other issues (e.g. stuck fragments) will
slow down but not prevent eventual shutdown.
Limitations:
* The startup grace period needs to be configured to be greater than
the latency of statestore updates + scheduling + admission +
coordinator startup. Otherwise a coordinator may send a
fragment instance to the shutting down impalad. (We could
automate this configuration as a follow-on)
* The startup grace period means a minimum latency for shutdown,
even if the cluster is idle.
* We depend on the statestore detecting the process going down
if queries are still running on that backend when the timeout
expires. This may still be subject to existing problems,
e.g. IMPALA-2990.
Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
queries are running.
* End-to-end test of shutting down a coordinator
- New queries cannot be started on coord, existing queries continue to run
- Exercises various Beeswax and HS2 operations.
Change-Id: I4d5606ccfec84db4482c1e7f0f198103aad141a0
Reviewed-on: http://gerrit.cloudera.org:8080/10744
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Previously, ImpalaServer::MembershipCallback() is used by each
Impala backend node to update cluster membership. It also removes
stale connections to nodes which are no longer members of the cluster.
However, the way it detects removed members is flawed as it relies
on query_locations_ to determine whether stale connections may
exist to the removed members. query_locations_ is a map of host
name to a set of queries running on that host. An entry for a remote
node only exists in query_locations_ if an Impalad node has acted
as coordinator of a query with fragment instances scheduled to run
on that remote node.
This change fixes this problem by closing connections to remote
hosts which are removed from the cluster regardless of whether
it can be found in query_locations_. A new test is added to
exercise this path by restarting Impalad backend nodes between
queries. Also change impala_cluster.py to use bin/start-impala.sh
to start the Impala daemon instead of directly forking and exec'ing
Impalad. This is needed as start-impala.sh sets up the proper
Java-related environment variables.
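An illustrative sketch of the fix (the client-cache teardown is passed
in as a stand-in callback): connections are closed to every host
removed from the membership, without consulting query_locations_.

  #include <functional>
  #include <set>
  #include <string>

  // Close connections to every host that was removed from the cluster,
  // whether or not this node ever coordinated a query with fragment
  // instances on that host. 'close_connections' stands in for the real
  // client-cache teardown call.
  void CloseStaleConnections(
      const std::set<std::string>& previous_members,
      const std::set<std::string>& current_members,
      const std::function<void(const std::string&)>& close_connections) {
    for (const std::string& host : previous_members) {
      if (current_members.count(host) == 0) close_connections(host);
    }
  }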
Change-Id: I41b7297cf665bf291b09b23524d19b1d10ab281d
Reviewed-on: http://gerrit.cloudera.org:8080/10327
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-5990 introduced a bug where restarting the statestore
deterministically clears the catalog metadata, which never comes back.
The cause of the bug is an incorrect condition used by the catalog to
detect the restart of the statestore.
A custom cluster regression test is added. The process-restarting
utility function in the custom cluster test is changed to use
shell=True in Popen.
Change-Id: I332a60e172af84b93b3544373fe363cdced5e8d0
Reviewed-on: http://gerrit.cloudera.org:8080/9921
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tianyi Wang <twang@cloudera.com>