Commit Graph

57 Commits

Joe McDonnell
3e76da9f51 IMPALA-9708: Remove Sentry support
Impala 4 decided to drop Sentry support in favor of Ranger. This
removes Sentry support and related tests. It retires startup
flags related to Sentry and does the first round of removing
obsolete code. This does not adjust documentation to remove
references to Sentry, and other dead code will be removed
separately.

Some issues came up when implementing this. Here is a summary
of how this patch resolves them:
1. authorization_provider currently defaults to "sentry", but
   "ranger" requires extra parameters to be set. This changes the
   default value of authorization_provider to "", which translates
   internally to the noop policy that does no authorization (see the
   sketch after this list).
2. These flags are Sentry specific and are now retired:
 - authorization_policy_provider_class
 - sentry_catalog_polling_frequency_s
 - sentry_config
3. The authorization_factory_class may be obsolete now that
   there is only one authorization policy, but this leaves it
   in place.
4. Sentry is the last component using CDH_COMPONENTS_HOME, so
   that is removed. There are still Maven dependencies coming
   from the CDH_BUILD_NUMBER repository, so that is not removed.
5. To make the transition easier, testdata/bin/kill-sentry-service.sh
   is not removed and it is still called from testdata/bin/kill-all.sh.
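
To make item 1 concrete, here is a sketch of selecting Ranger after
this change. The Ranger-specific flag values are illustrative
assumptions, not something this patch adds:

  # "sentry" is gone and "" (no authorization) is the new default, so
  # Ranger must be selected explicitly along with its extra parameters:
  impalad --authorization_provider=ranger \
      --ranger_service_type=hive \
      --ranger_app_id=impala \
      --server_name=server1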

Testing:
 - Core job passes

Change-Id: I8e99c15936d6d250cf258e3a1dcba11d3eb4661e
Reviewed-on: http://gerrit.cloudera.org:8080/15833
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-20 17:43:40 +00:00
stiga-huang
b6b31e4cc4 IMPALA-9071: Handle translated external HDFS table in CTAS
After upgrading Hive-3 to a version containing HIVE-22158, managed
tables are no longer allowed to be non-transactional. Creating a
non-ACID table instead results in an external table with the table
property 'external.table.purge' set to true.

In Hive-3, the default location of external HDFS tables is
'metastore.warehouse.external.dir' if it's set. This property was
added by HIVE-19837 in Hive 2.7, but hasn't been added to Hive in cdh6
yet.

In a CTAS statement, we create a temporary HMS Table to analyze the
Insert part. The table path is created assuming it's a managed table,
and the Insert part uses this path for insertion. However, in Hive-3,
the created table is translated to an external table, so it's not the
same as the one we passed to the HMS API. The created table is located
in 'metastore.warehouse.external.dir', while the table path we assumed
is in 'metastore.warehouse.dir'. This introduces bugs when these two
properties differ: the CTAS statement creates the table in one place
and inserts data in another.

This patch adds a new method in MetastoreShim to wrap the difference for
getting the default table path for non transactional tables between
Hive-2 and Hive-3.

Changes in the infra:
 - To support customizing the Hive configuration, add an env var,
   CUSTOM_CLASSPATH, in bin/set-classpath.sh that is prepended to the
   existing CLASSPATH. The customized hive-site.xml should be put inside
   CUSTOM_CLASSPATH (see the sketch after this list).
 - Change hive-site.xml.py to generate a hive-site.xml with a non-default
   'metastore.warehouse.external.dir'.
 - Add an option, --env_vars, in bin/start-impala-cluster.py to pass
   down CUSTOM_CLASSPATH.
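
As referenced in the list above, a minimal sketch of the intended usage
(the directory path is illustrative and the exact --env_vars value
syntax is an assumption):

  # Put a directory containing the customized hive-site.xml in front of
  # the CLASSPATH and propagate the variable to the started daemons:
  export CUSTOM_CLASSPATH=/tmp/custom-hive-conf
  bin/start-impala-cluster.py --env_vars=CUSTOM_CLASSPATH=$CUSTOM_CLASSPATH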

Tests:
 - Add a custom cluster test that starts Hive with
   metastore.warehouse.external.dir set to a non-default value. Run
   it locally using CDP components with HIVE-22158. xfail the test until
   we bump CDP_BUILD_NUMBER to 1507246.
 - Run CORE tests using CDH components

Change-Id: I460a57dc877ef68ad7dd0864a33b1599b1e9a8d9
Reviewed-on: http://gerrit.cloudera.org:8080/14527
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2019-10-24 22:10:03 +00:00
Tim Armstrong
4fb8e8e324 IMPALA-8816: reduce custom cluster test runtime in core
This includes some optimisations and a bulk move of tests
to exhaustive.

Move a bunch of custom cluster tests to exhaustive. I selected
these partially based on runtime (i.e. I looked most carefully
at the tests that ran for over a minute) and the likelihood
of them catching a precommit bug.  Regression tests for specific
edge cases and tests for parts of the code that are very stable
were prime candidates.

Remove an unnecessary cluster restart in test_breakpad.

Merge test_scheduler_error into test_failpoints to avoid an unnecessary
cluster restart.

Speed up cluster starts by ensuring that the default statestore args are
applied even when _start_impala_cluster() is called directly. This
shaves a couple of seconds off each restart. We made the default args
use a faster update frequency - see IMPALA-7185 - but they did not
take effect in all tests.

Change-Id: Ib2e3e7ebc9695baec4d69183387259958df10f62
Reviewed-on: http://gerrit.cloudera.org:8080/13967
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-06 21:34:26 +00:00
Lars Volker
2397ae5590 IMPALA-8484: Run queries on disjoint executor groups
This change adds support for running queries inside a single admission
control pool on one of several disjoint sets of executors called
"executor groups".

Executors can be configured with an executor group through the newly
added '--executor_groups' flag. Note that in anticipation of future
changes, the flag already uses the plural form, but only a single
executor group may be specified for now. Each executor group
specification can optionally contain a minimum size, separated by a
':', e.g. --executor_groups default-pool-1:3. A group is only
considered for admission when the cluster membership contains at least
that number of executors for it.

Executor groups are mapped to resource pools by their name: An executor
group can service queries from a resource pool if the pool name is a
prefix of the group name separated by a '-'. For example, queries in
pool poolA can be serviced by executor groups named poolA-1 and poolA-2,
but not by groups named foo or poolB-1.

During scheduling, executor groups are considered in alphabetical order.
This means that one group is filled up entirely before a subsequent
group is considered for admission. Groups also need to pass a health
check before being considered. In particular, they must contain at
least the specified minimum number of executors.

If no group is specified during startup, executors are added to the
default executor group. If - during admission - no executor group for a
pool can be found and the default group is non-empty, then the default
group is considered. The default group does not have a minimum size.
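
A sketch of the startup flags described above (group names and sizes
are illustrative):

  # An executor in group "poolA-1" with a minimum group size of 3; the
  # group can service queries from resource pool "poolA":
  impalad --executor_groups=poolA-1:3 ...
  # An executor in the default group (no --executor_groups given):
  impalad ...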

This change inverts the order of scheduling and admission. Prior to this
change, queries were scheduled before submitting them to the admission
controller. Now the admission controller computes schedules for all
candidate executor groups before each admission attempt. If the cluster
membership has not changed, then the schedules of the previous attempt
will be reused. This means that queries will no longer fail if the
cluster membership changes while they are queued in the admission
controller.

This change also alters the default behavior when using a dedicated
coordinator and no executors have registered yet. Prior to this change,
a query would fail immediately with an error ("No executors registered
in group"). Now a query will get queued and wait until executors show
up, or it times out after the pool's queue timeout period.

Testing:

This change adds a new custom cluster test for executor groups. It
makes use of new capabilities added to start-impala-cluster.py to bring
up additional executors into an already running cluster.

Additionally, this change adds an instructional implementation of
executor group based autoscaling, which can be used during development.
It also adds a helper to run queries concurrently. Both are used in a
new test to exercise the executor group logic and to prevent regressions
to these tools.

In addition to these tests, the existing tests for the admission
controller (both BE and EE tests) thoroughly exercise the changed code.
Some of them required changes themselves to reflect the new behavior.

I looped the new tests (test_executor_groups and test_auto_scaling) for
a night (110 iterations each) without any issues.

I also started an autoscaling cluster with a single group and ran
TPC-DS, TPC-H, and test_queries on it successfully.

Known limitations:

When using executor groups, only a single coordinator and a single AC
pool (i.e. the default pool) are supported. Executors do not include the
number of currently running queries in their statestore updates, so
admission controllers are not aware of the number of queries admitted by
other controllers per host.

Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b
Reviewed-on: http://gerrit.cloudera.org:8080/13550
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-07-21 04:54:03 +00:00
Fredy Wijaya
1b531be9be IMPALA-8589: Re-enable flaky test_query_event_hooks.py
This patch fixes the flaky test_query_event_hooks.py. The patch also
cuts the waiting time for the impalad timeout from the default 60
seconds to 5 seconds, particularly for tests where Impala startup is
expected to fail.

Testing:
- Ran test_query_event_hooks.py in a loop.

Change-Id: Ia64550e986b5eba59a1d77657943932bb977d470
Reviewed-on: http://gerrit.cloudera.org:8080/13713
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-26 00:13:44 +00:00
Todd Lipcon
800f635855 IMPALA-8667. Remove --pull_incremental_stats flag
This flag was added as a "chicken bit" -- so we could disable the new
feature if we had some problems with it. It's been out in the wild for a
number of months and we haven't seen any such problems, so at this point
let's stop maintaining the old code path.

Change-Id: I8878fcd8a2462963c7db3183a003bb9816dda8f9
Reviewed-on: http://gerrit.cloudera.org:8080/13671
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-19 01:07:00 +00:00
Hao Hao
6bb404dc35 IMPALA-8504 (part 2): Support CREATE TABLE statement with Kudu/HMS integration
This commit supports the actual handling of CREATE TABLE DDL for managed
Kudu tables when integration with the Hive Metastore is enabled. When
Kudu/HMS integration is enabled, for a CREATE TABLE statement, Impala can
rely on Kudu to create the table in the HMS.
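
A sketch of the kind of statement this covers, issued through
impala-shell (table definition illustrative):

  impala-shell -q "CREATE TABLE t (id INT PRIMARY KEY, name STRING)
      PARTITION BY HASH (id) PARTITIONS 3 STORED AS KUDU"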

Change-Id: Icffe412395f47f5e07d97bad457020770cfa7502
Reviewed-on: http://gerrit.cloudera.org:8080/13375
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
Reviewed-by: Grant Henke <granthenke@apache.org>
Tested-by: Thomas Marshall <tmarshall@cloudera.com>
2019-06-04 17:36:59 +00:00
Michael Ho
2ece4c9b2e IMPALA-8341: Data cache for remote reads
This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on the
configuration string, which is a list of directories, separated by
commas, followed by a ':' and the storage capacity per directory.
An example configuration string is the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse. When
the number of backing files per partition exceeds
--data_cache_max_files_per_partition, files are deleted in the order
in which they are created. Stale cache entries referencing deleted
files are erased lazily or evicted due to inactivity.

Optionally, checksumming can be enabled to verify that data read from
the cache is consistent with what was inserted, and that multiple
attempted insertions with the same cache key have the same content.
Checksumming is enabled by default for debug builds.

To probe for cached data, the interface Lookup() is used; to insert
data into the cache, the interface Store() is used. Please note that
eviction currently happens inline during Store().

This patch also adds two startup flags to start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each impalad
creates its caching directory;
'--data_cache_size' specifies the capacity string for each cache directory.
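
A sketch of the new flags in use (values illustrative, exact flag
syntax assumed):

  # Each impalad creates its caching directory under /tmp, capped at
  # 500MB per cache directory:
  bin/start-impala-cluster.py --data_cache_dir=/tmp --data_cache_size=500MB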

Testing done:
- added a new BE and EE test
- exhaustive (debug, release) builds with cache enabled
- core ASAN build with cache enabled

Perf:
- 16-stream TPCDS at 3TB on a 20-node S3 cluster shows about 30% improvement
over runs without the cache, with a cache size of 150GB per node.
The performance is at parity with an HDFS cluster using EBS as
the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Reviewed-on: http://gerrit.cloudera.org:8080/12987
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-03 19:39:42 +00:00
Tim Armstrong
d820952d86 IMPALA-8469: admit_mem_limit for dedicated coordinator
Refactored to avoid the code duplication that resulted in this bug:
* admit_mem_limit is calculated once in ExecEnv
* The local backend descriptor is always constructed with
  a static helper: Scheduler::BuildLocalBackendDescriptor()

I chose to factor it in this way, in part, to avoid invasive
changes to scheduler-test, which currently doesn't depend on
ExecEnv or ImpalaServer.

Testing:
Added basic test that reproduces the bug.

Change-Id: Iaceb21b753b9b021bedc4187c0d44aaa6a626521
Reviewed-on: http://gerrit.cloudera.org:8080/13180
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-01 00:37:04 +00:00
Tim Armstrong
2ca7f8e7c0 IMPALA-7995: part 1: fixes for e2e dockerised impala tests
This fixes all core e2e tests running on my local dockerised
minicluster build. I do not yet have a CI job or script running
but I wanted to get feedback on these changes sooner. The second
part of the change will include the CI script and any follow-on
fixes required for the exhaustive tests.

The following fixes were required:
* Detect docker_network from TEST_START_CLUSTER_ARGS
* get_webserver_port() does not depend on the caller passing in
  the default webserver port. It failed previously because it
  relied on start-impala-cluster.py setting -webserver_port
  for *all* processes.
* Add SkipIf markers for tests that don't make sense or are
  non-trivial to fix for containerised Impala.
* Support loading Impala-lzo plugin from host for tests that depend on
  it.
* Fix some tests that had 'localhost' hardcoded - instead it should
  be $INTERNAL_LISTEN_HOST, which defaults to localhost.
* Fix bug with sorting impala daemons by backend port, which is
  the same for all dockerised impalads.

Testing:
I ran tests locally as follows after having set up a docker network and
starting other services:

  ./buildall.sh -noclean -notests -ninja
  ninja -j $IMPALA_BUILD_THREADS docker_images
  export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster"
  export FE_TEST=false
  export BE_TEST=false
  export JDBC_TEST=false
  export CLUSTER_TEST=false
  ./bin/run-all-tests.sh

Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755
Reviewed-on: http://gerrit.cloudera.org:8080/12639
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-13 02:42:32 +00:00
Radford Nguyen
f998d64767 IMPALA-8363: Fix E2E start with impala_log_dir
This commit fixes the `CustomClusterTestSuite` to wait for the
correct number of executors when `impala_log_dir` is specified
in the test decorator.  Previously, the default value of 3
was always used, regardless of `cluster_size`.

Testing:

- Manual verification using tests/authorization/test_ranger.py
with custom `impala_log_dir` and `cluster_size` arguments.
Failed before changes, passed after changes

- Ran all original E2E tests

Change-Id: I4f46f40474b4b380abe88647a37e8e4d2231d745
Reviewed-on: http://gerrit.cloudera.org:8080/12935
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-11 21:19:31 +00:00
Tim Armstrong
f9ced753ba IMPALA-7999: clean up start-*d.sh scripts
Delete these wrapper scripts and replace with a generic
start-daemon.sh script that sets environment variables
without the other logic.

Move the logic for setting JAVA_TOOL_OPTIONS into
start-impala-cluster.py.

Remove some options like -jvm_suspend, -gdb, -perf that
may not be used. These can be reintroduced if needed.

Port across the kerberized minicluster logic (which has
probably bitrotted) in case it needs to be revived.

Remove --verbose option that didn't appear to be useful
(it claims to print daemon output to the console,
but output is still redirected regardless).

Removed a level of quoting in custom cluster test argument
handling - this was made unnecessary by properly escaping
arguments with pipes.escape() in run_daemon().

Testing:
* Ran exhaustive tests.
* Ran on CentOS 6 to confirm we didn't reintroduce Popen issue
  worked around by kwho.

Change-Id: Ib67444fd4def8da119db5d3a0832ef1de15b068b
Reviewed-on: http://gerrit.cloudera.org:8080/12271
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-05 13:10:08 +00:00
Philip Zeyliger
13dfdc64db IMPALA-6664: Tag log statements with fragment or query ids.
This implements much of the desire in IMPALA-6664 to tag all log
statements with their query ids. It re-uses the existing ThreadDebugInfo
infrastructure as well as the existing
InstallLogMessageListenerFunction() patch to glog (currently used for
log redaction) to prefix log messages with fragment ids or query ids,
when available. The fragment id is the query id with the last bits
incremented, so it's possible to correlate a given query's log messages.
For example:

  $ grep 85420d575b9ff4b9:402b8868 logs/cluster/impalad.INFO
  I0108 10:39:16.453958 14752 impala-server.cc:1052] 85420d575b9ff4b9:402b886800000000] Registered query query_id=85420d575b9ff4b9:402b886800000000 session_id=aa45e480434f0516:101ae5ac12679d94
  I0108 10:39:16.454738 14752 Frontend.java:1242] 85420d575b9ff4b9:402b886800000000] Analyzing query: select count(*) from tpcds.web_sales
  I0108 10:39:16.456627 14752 Frontend.java:1282] 85420d575b9ff4b9:402b886800000000] Analysis finished.
  I0108 10:39:16.463538 14818 admission-controller.cc:598] 85420d575b9ff4b9:402b886800000000] Schedule for id=85420d575b9ff4b9:402b886800000000 in pool_name=default-pool per_host_mem_estimate=180.02 MB PoolConfig: max_requests=-1 max_queued=200 max_mem=-1.00 B
  I0108 10:39:16.463603 14818 admission-controller.cc:603] 85420d575b9ff4b9:402b886800000000] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=0,  local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0)
  I0108 10:39:16.463780 14818 admission-controller.cc:635] 85420d575b9ff4b9:402b886800000000] Admitted query id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:16.463896 14818 coordinator.cc:93] 85420d575b9ff4b9:402b886800000000] Exec() query_id=85420d575b9ff4b9:402b886800000000 stmt=select count(*) from tpcds.web_sales
  I0108 10:39:16.464795 14818 coordinator.cc:356] 85420d575b9ff4b9:402b886800000000] starting execution on 2 backends for query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:16.466384 24891 impala-internal-service.cc:49] ExecQueryFInstances(): query_id=85420d575b9ff4b9:402b886800000000 coord=pannier.sf.cloudera.com:22000 #instances=2
  I0108 10:39:16.467339 14818 coordinator.cc:370] 85420d575b9ff4b9:402b886800000000] started execution on 2 backends for query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:16.467536 14823 query-state.cc:579] 85420d575b9ff4b9:402b886800000000] Executing instance. instance_id=85420d575b9ff4b9:402b886800000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=1
  I0108 10:39:16.467627 14824 query-state.cc:579] 85420d575b9ff4b9:402b886800000001] Executing instance. instance_id=85420d575b9ff4b9:402b886800000001 fragment_idx=1 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
  I0108 10:39:16.820933 14824 query-state.cc:587] 85420d575b9ff4b9:402b886800000001] Instance completed. instance_id=85420d575b9ff4b9:402b886800000001 #in-flight=1 status=OK
  I0108 10:39:17.122299 14823 krpc-data-stream-mgr.cc:294] 85420d575b9ff4b9:402b886800000000] DeregisterRecvr(): fragment_instance_id=85420d575b9ff4b9:402b886800000000, node=2
  I0108 10:39:17.123500 24038 coordinator.cc:709] Backend completed: host=pannier.sf.cloudera.com:22001 remaining=2 query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.123509 24038 coordinator-backend-state.cc:265] query_id=85420d575b9ff4b9:402b886800000000: first in-progress backend: pannier.sf.cloudera.com:22000
  I0108 10:39:17.167752 14752 impala-beeswax-server.cc:197] 85420d575b9ff4b9:402b886800000000] get_results_metadata(): query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.168762 14752 coordinator.cc:483] 85420d575b9ff4b9:402b886800000000] ExecState: query id=85420d575b9ff4b9:402b886800000000 execution completed
  I0108 10:39:17.168808 14752 coordinator.cc:608] 85420d575b9ff4b9:402b886800000000] Coordinator waiting for backends to finish, 1 remaining. query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.168880 14823 query-state.cc:587] 85420d575b9ff4b9:402b886800000000] Instance completed. instance_id=85420d575b9ff4b9:402b886800000000 #in-flight=0 status=OK
  I0108 10:39:17.168977 14821 query-state.cc:252] UpdateBackendExecState(): last report for 85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.174401 24038 coordinator.cc:709] Backend completed: host=pannier.sf.cloudera.com:22000 remaining=1 query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.174513 14752 coordinator.cc:814] 85420d575b9ff4b9:402b886800000000] Release admission control resources for query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.174815 14821 query-state.cc:604] Cancel: query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.174837 14821 krpc-data-stream-mgr.cc:325] cancelling all streams for fragment_instance_id=85420d575b9ff4b9:402b886800000001
  I0108 10:39:17.174856 14821 krpc-data-stream-mgr.cc:325] cancelling all streams for fragment_instance_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.179621 14752 impala-beeswax-server.cc:239] 85420d575b9ff4b9:402b886800000000] close(): query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.179651 14752 impala-server.cc:1131] 85420d575b9ff4b9:402b886800000000] UnregisterQuery(): query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.179666 14752 impala-server.cc:1238] 85420d575b9ff4b9:402b886800000000] Cancel(): query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.179814 14752 coordinator.cc:684] 85420d575b9ff4b9:402b886800000000] CancelBackends() query_id=85420d575b9ff4b9:402b886800000000, tried to cancel 0 backends
  I0108 10:39:17.203898 14752 query-exec-mgr.cc:184] 85420d575b9ff4b9:402b886800000000] ReleaseQueryState(): deleted query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:18.108947 14752 impala-server.cc:1993] 85420d575b9ff4b9:402b886800000000] Connection from client ::ffff:172.16.35.186:52096 closed, closing 1 associated session(s)
  I0108 10:39:18.108996 14752 impala-server.cc:1249] 85420d575b9ff4b9:402b886800000000] Closing session: aa45e480434f0516:101ae5ac12679d94
  I0108 10:39:18.109035 14752 impala-server.cc:1291] 85420d575b9ff4b9:402b886800000000] Closed session: aa45e480434f0516:101ae5ac12679d94

There are a few caveats here: the thread state isn't "scoped", so the "Closing
session" log statement is technically not part of the query. When that thread
is re-used for another query, it corrects itself. Some threads, like 14821,
aren't using the thread locals. In some cases, we should go back and
add GetThreadDebugInfo()->SetQueryId(...) statements.

I've used this to debug some crashes (of my own doing) while running
parallel tests, and it's been quite helpful.

An alternative would be to use Kudu's be/src/kudu/util/async_logger.h,
and add the "Listener" functionality to it directly. Another alternative
would be to re-write all the *LOG macros, but this is quite painful (and
presumably was rejected when log redaction was introduced).

I changed thread-debug-info to capture TUniqueId (a thrift struct with
two int64s) rather than the string representation. This made it easier
to compare with the "0:0" id, which we treat as "unset".  If a developer
needs to analyze it from a debugger, gdb can print out hex just fine.

I added some context to request-context to be able to pipe ids through
to disk IO threads as well.

To test this, I moved "assert_log_contains" up to impala_test_suite, and
had it handle the default log location case. The test needs a sleep for
log buffering, but it seems like a test with a sleep running in
parallel is better than a custom cluster test, which reboots the cluster
(and loads metadata).

Change-Id: I6634ef9d1a7346339f24f2d40a7a3aa36a535da8
Reviewed-on: http://gerrit.cloudera.org:8080/12129
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-25 00:47:09 +00:00
Tim Armstrong
3f0989a4fc IMPALA-7811: optionally count JVM heap towards process mem limit
Adds a flag --mem_limit_includes_jvm that alters memory accounting to
include the amount of memory we think that the JVM is likely to use.
By default this flag is false, so behaviour is unchanged.

We're not ready to change the default but I want to check this in to
enable experimentation.

Two metrics are counted towards the process limit:
* The maximum JVM heap size. We count this because the JVM memory
  consumption can expand up to this threshold at any time.
* JVM non-heap committed memory. This can be a non-trivial amount of
  memory (e.g. I saw 150MB on one production cluster). There isn't a
  hard upper bound on this memory that I know of, but it should not
  grow rapidly.

This requires adjustments in a couple of other places:
* Admission control previously assumed that all of the process memory
  limit was available to queries (an assumption that is not strictly
  true because of untracked memory, etc., but close enough). However,
  the JVM heap makes a large part of the process limit unusable to
  queries, so we should only admit up to "process limit - max JVM heap
  size" per node (see the sketch after this list).
* The buffer pool is now a percentage of the remaining process limit
  after the JVM heap, instead of the total process limit.
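
To make the accounting in the first bullet concrete, a sketch with
illustrative numbers (flag values assumed):

  # A 12GB process mem limit that includes a 4GB JVM heap:
  start-impala-cluster.py \
      --impalad_args="--mem_limit=12g --mem_limit_includes_jvm=true" \
      --jvm_args="-Xmx4g"
  # Admission control then admits at most 12GB - 4GB = 8GB of query
  # memory per node, and the buffer pool is sized as a percentage of
  # the remaining 8GB rather than of the full 12GB.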

Currently, end-to-end tests fail if run with this flag for two reasons:
* The default JVM heap size is 1/4 of physical memory, which means that
  essentially all of the process memory limit is consumed by the JVM
  heaps when we run 3 impala daemons per host, unless -Xmx is
  explicitly set.
* If the heap size is limited to 1-2GB like below, then most tests pass
  but TestInsert.test_insert_large_string fails because IMPALA-4865
  lets it create giant strings that eat up all the JVM heap.

  start-impala-cluster.py \
      --impalad_args=--mem_limit_includes_jvm=true --jvm_args="-Xmx1g"

Testing:
Add a custom cluster test that uses the new option and validates
the memory consumption values.

Change-Id: I39dd715882a32fc986755d573bd46f0fd9eefbfc
Reviewed-on: http://gerrit.cloudera.org:8080/10928
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-12-04 08:20:34 +00:00
Fredy Wijaya
76842acc34 IMPALA-7824: INVALIDATE METADATA should not hang when Sentry is unavailable
Before this patch, running INVALIDATE METADATA when Sentry is
unavailable could cause an Impala query to hang. The PolicyReader
thread in SentryProxy serves two use cases: a background thread that
periodically refreshes the Sentry policy, and a synchronous operation
for INVALIDATE METADATA. For the background thread, we need to swallow
any exception thrown while refreshing the Sentry policy in order to
not kill the background thread. For a synchronous reset operation,
such as INVALIDATE METADATA, swallowing an exception causes the Impala
catalog to wait indefinitely for authorization catalog objects that
never get processed due to Sentry being unavailable. The patch updates
the code to not swallow any exception in INVALIDATE METADATA and to
return the exception to the caller.

Testing:
- Ran all FE tests
- Added a new E2E test
- Ran all E2E authorization tests

Change-Id: Icff987a6184f62a338faadfdc1a0d349d912fc37
Reviewed-on: http://gerrit.cloudera.org:8080/11897
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-11-08 07:05:09 +00:00
Vuk Ercegovac
97f028299c IMPALA-7622: adds profile metrics for incremental stats
Reapplies the change after fixing where the frontend profile is placed
in the runtime profile.

When computing incremental statistics by fetching the stats directly
from catalogd, a potentially expensive RPC is made from the impalad
coordinator to catalogd. This change adds metrics to the frontend
section of the profile to track how long the request takes, the size
of the compressed bytes received, and the number of partitions received.

The profile for a 'compute incremental ...' command on a table with
no statistics looks like this:

Frontend:
     - StatsFetch.CompressedBytes: 0
     - StatsFetch.TotalPartitions: 24
     - StatsFetch.NumPartitionsWithStats: 0
     - StatsFetch.Time: 26ms

And the profile looks as follows when the table has stats, so the stats
are fetched:

Frontend:
     - StatsFetch.CompressedBytes: 24622
     - StatsFetch.TotalPartitions: 23
     - StatsFetch.NumPartitionsWithStats: 23
     - StatsFetch.Time: 14ms

Testing:
- manual inspection
- e2e test to check the profile

Change-Id: I94559a749500d44aa6aad564134d55c39e1d5273
Reviewed-on: http://gerrit.cloudera.org:8080/11670
Reviewed-by: Tianyi Wang <twang@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-12 23:44:42 +00:00
Adam Holley
21f521a7c2 IMPALA-7554: Update custom cluster tests to have new logs for sentry
This patch adds the ability to create a new log for each spawn of the
Sentry service. This will enable better troubleshooting for the
custom cluster tests that restart the Sentry service.

Testing:
- Ran all custom cluster tests.

Change-Id: I6e538af7fd6e6ea21dc3f4442bdebf3b31558516
Reviewed-on: http://gerrit.cloudera.org:8080/11624
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-12 01:00:56 +00:00
Vuk Ercegovac
d918b2aeb5 Revert "IMPALA-7622: adds profile metrics when fetching incremental stats"
Breaks downstream dependence on profile (1/2 of changes).

This reverts commit 235748316c.

Change-Id: I80b4c0e4b8487572285ac788ab0195896f221842
Reviewed-on: http://gerrit.cloudera.org:8080/11551
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-01 21:33:43 +00:00
Vuk Ercegovac
235748316c IMPALA-7622: adds profile metrics when fetching incremental stats
When computing incremental statistics by fetching the stats directly
from catalogd, a potentially expensive RPC is made from the impalad
coordinator to catalogd. This change adds metrics to the frontend
section of the profile to track how long the request takes, the size
of the compressed bytes received, and the number of partitions received.

The profile for a 'compute incremental ...' command on a table with
no statistics looks like this:

Frontend:
     - StatsFetch.CompressedBytes: 0
     - StatsFetch.TotalPartitions: 24
     - StatsFetch.NumPartitionsWithStats: 0
     - StatsFetch.Time: 26ms

And the profile looks as follows when the table has stats, so the stats
are fetched:

Frontend:
     - StatsFetch.CompressedBytes: 24622
     - StatsFetch.TotalPartitions: 23
     - StatsFetch.NumPartitionsWithStats: 23
     - StatsFetch.Time: 14ms

Testing:
- manual inspection
- e2e test to check the profile

Change-Id: Ic9b268548c7a98c751eb99855ee08313d1d5a903
Reviewed-on: http://gerrit.cloudera.org:8080/11534
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-28 11:22:53 +00:00
Tim Armstrong
e83fe23a5f IMPALA-7632: fix erasure coding build for custom cluster tests
Fix tests to always pass query options via the query_options
parameter.

Modified the infrastructure to fail on non-erasure-coding builds if
tests pass in default query options in the wrong way.

Skip a restart test that makes assumptions about scheduling that EC
seems to break.

Testing:
Ran core tests with erasure coding enabled.

Change-Id: I4d809faedc0c45417519f13c73559efb6c54154e
Reviewed-on: http://gerrit.cloudera.org:8080/11536
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-28 01:23:41 +00:00
Adam Holley
48640b5dfa IMPALA-7456: Deprecate file-based authorization
This patch simply adds a warning message to the log when the
authorization_policy_file run-time flag is used. Sentry has
deprecated the use of policy files, and they do not support
user-level privileges, which are required for object ownership.
SENTRY-1922 tracks its removal.

Test:
- Added custom cluster test to validate logs
- Ran all custom cluster tests

Change-Id: Ibbb13f3ef1c3a00812c180ecef022ea638c2ebc7
Reviewed-on: http://gerrit.cloudera.org:8080/11502
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-25 23:03:27 +00:00
Adam Holley
c5dc6ded68 IMPALA-7537: REVOKE GRANT OPTION regression
This patch fixes several issues around granting and revoking of
privileges.  This includes:
- REVOKE ALL ON SERVER where the privilege has the grant option was
  removed from the cache but not from Sentry.
- With the addition of the grantoption to the name in the catalog
  object, refactoring was required to make grants and revokes work
  correctly.

Assertions with regard to granting and revoking:
- If there is a privilege that has the grant option, that privilege
  can be revoked simply with "REVOKE privilege..." or the grant option
  can be removed with "REVOKE GRANT OPTION ON..."
- We should not limit the privilege being revoked simply because it
  has the grant option.
- If a privilege already exists without the grant option, granting the
  privilege with the grant option should add the grant option to it.
- If a privilege already exists with the grant option, granting the
  privilege without the grant option will not change anything; the
  expectation is that if you want to remove the grant option, you
  should explicitly use "REVOKE GRANT OPTION ON ...".

Testing:
- Added new grant/revoke tests that validate cache and Sentry refresh
- Ran all FE, E2E, and custom-cluster tests.

Change-Id: I3be5c8f15e9bc53e9661347578832bf446abaedc
Reviewed-on: http://gerrit.cloudera.org:8080/11483
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-25 22:21:57 +00:00
Tim Armstrong
16f9437b4b IMPALA-7589: default query options for custom cluster
The bug that caused the erasure coding test failure was that the
default query options specified by the test overrode the allow_erasure_coded_files
option that was added by the custom cluster test infrastructure when running
erasure coded tests.

Testing:
Manually ran a custom cluster test with and without ERASURE_CODING=true
and with --capture=no and confirmed the right arguments were passed
to start-impala-cluster.py.

Change-Id: I14f60ea8746657a731e48850b0e48300a2b7c66d
Reviewed-on: http://gerrit.cloudera.org:8080/11463
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-24 20:02:40 +00:00
Adam Holley
23f5338bf6 Revert "Revert "IMPALA-7074: Update OWNER privilege on CREATE, DROP, and SET OWNER""
The problem was caused by a Hive update that changed notifications.
HIVE-15180 was added but was incomplete and resulted in the break.
HIVE-17747 fixed the issue by properly creating the messages.

Change-Id: I4b9276c36bf96afccd7b8ff48803a30b47062c3d
Reviewed-on: http://gerrit.cloudera.org:8080/11466
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-20 00:51:28 +00:00
Thomas Tauber-Marshall
23da624113 Revert "IMPALA-7074: Update OWNER privilege on CREATE, DROP, and SET OWNER"
This patch has been causing a large number of build failures. Revert
it until we figure out why.

Change-Id: I7f4fc028962d4c6a630456a12a65884a62f01442
Reviewed-on: http://gerrit.cloudera.org:8080/11456
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-18 02:11:48 +00:00
Adam Holley
e5b424ba4e IMPALA-7074: Update OWNER privilege on CREATE, DROP, and SET OWNER
This patch adds calls to automatically create or remove owner
privileges in the catalog based on the statement.  This is similar to
the existing pattern where after privileges are granted in Sentry,
they are created in the catalog directly instead of pulled from
Sentry.

When object ownership is enabled:
- CREATE DATABASE grants the user OWNER privileges on that database.
- ALTER DATABASE SET OWNER transfers the OWNER privileges to the new
  owner.
- DROP DATABASE revokes the OWNER privileges from the owner.
This applies to DATABASE, TABLE, and VIEW.

Example:
If ownership is enabled, when a table is created, the creator is the
owner, and Sentry will create owner privileges for the created table so
the user can continue working with it without waiting for a Sentry
refresh. Inserts will be available immediately.

Testing:
- Created new custom cluster tests for object ownership

Change-Id: I1e09332e007ed5aa6a0840683c879a8295c3d2b0
Reviewed-on: http://gerrit.cloudera.org:8080/11314
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-14 06:03:44 +00:00
Todd Lipcon
b986f2a8bb IMPALA-7510. Support principals/privileges with LocalCatalog
This enables support for Sentry authorization when LocalCatalog is
enabled. The design is detailed in a change to the comment on
CatalogdMetaProvider, but to recap it briefly here:

At a high level, this patch takes the approach of duplicating the "v1"
catalog flow for PRINCIPAL and PRIVILEGE catalog objects. Namely, the catalog
daemon publishes complete objects into the statestore topic, and the
impalad fully replicates them locally.

I took this approach rather than trying to do fine-grained caching and
invalidation for the following reasons:

- The PRINCIPAL and PRIVILEGE metadata is typically many orders of magnitude
  smaller than table metadata. So, the benefit of fine-grained caching
  and eviction is not as great.

- The PRINCIPAL and PRIVILEGE catalog objects are fairly tightly intertwined
  with relationships between them and backwards mappings maintained from
  groups back to principals. This logic is implemented by the
  AuthorizationPolicy class. Implementing similar mapping in a
  fine-grained caching approach would be a reasonable amount of work.

- This bit of code is under some current flux as others are working on
  implementing more fine grained permissioning. Thus, trying to
  duplicate the logic in a "fetch-on-demand" implementation might turn
  out to be chasing somewhat of a moving target.

In order to take this approach, the patch is organized as follows:

- refactored some of the role/principal removal logic from ImpaladCatalog
  into AuthorizationPolicy. This makes it easier to perform the similar
  "subscribe" with less duplicated code.

- changed catalogd to publish PRINCIPAL and PRIVILEGE objects to v2
  catalogs in addition to v1.

- passed through LocalCatalog.getAuthPolicy to CatalogdMetaProvider, and
  added an AuthorizationPolicy member there. This member is maintained
  when we see PRINCIPAL and PRIVILEGE objects come via the catalog
  updates.

- had to implement LocalCatalog.isReady() to ensure that we don't allow
  user access until the first topic update has been consumed.

- additionally had to copy some other code from ImpaladCatalog to
  protect against various races -- we need a CatalogDeltaLog as well as
  careful sequencing of the order in which the objects apply.

With this patch and the following one to enable UDF support, I was able
to run the tests in tests/authorization successfully with LocalCatalog
enabled.

Change-Id: Iccce5aabdb6afe466fdaeae0fb3700c66e658558
Reviewed-on: http://gerrit.cloudera.org:8080/11358
Reviewed-by: Todd Lipcon <todd@apache.org>
Tested-by: Todd Lipcon <todd@apache.org>
2018-09-06 02:39:08 +00:00
Todd Lipcon
8dcf54aee2 IMPALA-7469. Invalidate LocalCatalog cache based on topic updates
This implements cache invalidation inside CatalogdMetaProvider. The
design is as follows:

- when the catalogd collects updates into the statestore topic, it now
  adds an additional entry for each table and database. These additional
  entries are minimal - they only include the object's name, but no
  metadata. This new behavior is conditional on a new flag
  --catalog_topic_mode. The default mode is to keep the old style, but
  it can be configured to mixed (support both v1 and v2) or v2-only
  (see the sketch after this list).

- the old-style topic entries are prefixed with a '1:' whereas the new
  minimal entries are prefixed with a '2:'. The impalad will subscribe
  to one or the other prefix depending on whether it is running with
  --use_local_catalog. Thus, old impalads will not be confused by the
  new entries and vice versa.

- when the impalad gets these topic updates, it forwards them through to
  the catalog implementation. The LocalCatalog implementation forwards
  them to the CatalogdMetaProvider, which uses them to invalidate
  cached metadata as appropriate.
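
As referenced in the first bullet, a sketch of the new flag on catalogd
(the mode names here are assumptions based on the description above):

  # Old-style full entries only (the default):
  catalogd --catalog_topic_mode=full
  # Publish both v1 and v2 entries during a migration:
  catalogd --catalog_topic_mode=mixed
  # Minimal v2-only entries for --use_local_catalog impalads:
  catalogd --catalog_topic_mode=minimal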

This patch includes some basic unit tests. I also did some manual
testing by connecting to different impalads and verifying that a session
connected to impalad #1 saw the effects of DDLs made by impalad #2
within a short period of time (the statestore topic update frequency).

Existing end-to-end tests cover these code paths pretty thoroughly:

- if we didn't automatically invalidate the cache on a coordinator
  in response to DDL operations, then any test which expects to
  "read its own writes" (eg access a table after creating one)
  would fail
- if we didn't propagate invalidations via the statestore, then
  all of the tests that use sync_ddl would fail.

I verified the test coverage above using some of the tests in
test_ddl.py -- I selectively commented out a few of the invalidation
code paths in the new code and verified that tests failed until I
re-introduced them. Along the way I also improved test_ddl so that, when
this code is broken, it properly fails with a timeout. It also has a bit
of expanded coverage for both the SYNC_DDL and non-SYNC cases.

I also wrote a new custom-cluster test for LocalCatalog that verifies
a few of the specific edge cases like detecting catalogd restart,
SYNC_DDL behavior in mixed mode, etc.

One notable exception here is the implementation of INVALIDATE METADATA
This turned out to be complex to implement, so I left a lengthy TODO
describing the issue and filed a JIRA.

Change-Id: I615f9e6bd167b36cd8d93da59426dd6813ae4984
Reviewed-on: http://gerrit.cloudera.org:8080/11280
Reviewed-by: Todd Lipcon <todd@apache.org>
Tested-by: Todd Lipcon <todd@apache.org>
2018-09-05 22:51:15 +00:00
Vuk Ercegovac
72ee4a4275 IMPALA-7425: Change incremental stats to pull from catalogd.
Currently, incremental stats can consume a substantial
amount of metadata memory (per table, partition, and column).
This metadata is transmitted from catalogd to all coordinators.
As a result, memory is used at all coordinators, all the time,
for every loaded table that uses incremental stats. A consequence
is that coordinators and catalogd die from OOM more often
when incremental stats are used, and more network bandwidth is used.

This change removes incremental stats from impalads. These stats
are only needed when computing incremental statistics and merging
new results with the existing results. They are not used by queries.
As a result, the change requires that coordinators fetch
incremental stats directly from catalogd when computing incremental stats.
In addition, catalogd no longer sends incremental stats to coordinators
via the statestore.

The option is enabled by setting a new flag, --pull_incremental_statistics,
on the catalogd and all impalad coordinators.
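
A sketch of enabling the option as described above (per the text, the
flag must be set on catalogd and on every coordinator):

  catalogd --pull_incremental_statistics=true ...
  impalad --pull_incremental_statistics=true ...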

Testing:
  - manual testing
  - added end-to-end tests with --pull_incremental_statistics enabled
    for the compute-stats-incremental.test
  - added fe CatalogTest for new catalogd service method
  - passes exhaustive tests when --pull_incremental_statistics is enabled
    and disabled

Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e
Reviewed-on: http://gerrit.cloudera.org:8080/11193
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-05 20:49:54 +00:00
Vuk Ercegovac
c692e5cc9e IMPALA-7408: add a debugging flag to disable reading fs data from catalogd
Add the flag --disable_catalog_data_ops_debug_only, which skips loading
files from the file system in catalogd. The flag is false by default
and is hidden. Its intent is to avoid time-consuming accesses to
the file system when debugging metadata issues where the file-system
contents are not available. For example, a recent ~18 GB catalog
takes 10 hours to load without the flag set vs. 1 hour to load with
the flag. The extra time comes from accessing the file system, failing,
and logging exceptions.

This flag specifically disables copying jars from the fs when loading
Java functions and it skips loading avro schema files. Additional cases
can be added under this flag if more are needed.
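
A sketch of the intended debugging usage (standard gflags boolean
syntax assumed):

  # Skip file-system accesses while reproducing a metadata issue:
  catalogd --disable_catalog_data_ops_debug_only=true ...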

Testing:
- manually confirmed that jars and avro schema files are skipped.
- added a test to check the same behavior in a custom cluster test.
- ran core tests.

Change-Id: I15789fb489b285e2a6565025eb17c63cdc726354
Reviewed-on: http://gerrit.cloudera.org:8080/11191
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-15 01:58:18 +00:00
Todd Lipcon
4aec50484a IMPALA-7308. Support Avro tables in LocalCatalog
This adds support for loading Avro-formatted tables in LocalCatalog. In
the case that the table properties indicate a table is Avro-formatted,
the semantics are identical to the existing catalog implementation:

- if an explicit avro schema is specified, it overrides the schema
  provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the
  inferred schema takes the place of the one provided by the HMS (thus
  promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS
  schema and the inferred schema, an error is emitted.

The semantics for LocalCatalog are slightly different in the case of
tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is
loaded, all partitions are inspected, and, if any partition is
discovered with Avro format, the above rules are applied. This has some
very unexpected results, described in an earlier email to
dev@impala.apache.org [1]. To summarize that email thread, the existing
behavior was decided to be unintuitive and inconsistent with Hive.
Additionally, this behavior requires loading all partitions up-front,
which gets in the way of the goal of lazy/granular metadata loading in
LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has
  been selected at a table level.

- if an Avro partition is added to a non-Avro table, and that partition
  has a schema that isn't compatible with the table's schema, an error
  will occur on read.

The thread additionally discusses adding an error message on "alter" to
prevent users from adding an Avro partition to a table with an
incompatible schema. To keep the scope of this patch minimal, that is
not yet implemented here. I filed IMPALA-7309 to change the behavior of
the existing catalog implementation to match.

A new test verifies the behavior, set to 'xfail' when running on the
existing catalog implementation.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Reviewed-on: http://gerrit.cloudera.org:8080/10970
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
2018-08-07 17:38:04 +00:00
Michael Ho
8d7f638654 IMPALA-7212: Removes --use_krpc flag and remove old DataStream services
This change removes the flag --use_krpc, which allowed users
to fall back to the Thrift-based implementation of the DataStream
services. This flag was originally added during development of
IMPALA-2567. It has served its purpose.

As we port more ImpalaInternalServices to use KRPC, it's becoming
increasingly burdensome to maintain parallel implementations of the
RPC handlers. Therefore, going forward, KRPC is always enabled.
This change removes the Thrift-based implementation of the DataStream
services and also simplifies some of the tests which were skipped when
KRPC is disabled.

Testing done: core debug build.

Change-Id: Icfed200751508478a3d728a917448f2dabfc67c3
Reviewed-on: http://gerrit.cloudera.org:8080/10835
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-24 02:36:50 +00:00
Tim Armstrong
e07fbc1b63 IMPALA-7185: low statestore custom cluster interval
This changes the default statestore interval for the custom cluster
tests. This can reduce the time taken for the cluster to start and
metadata to load. On some tests this resulted in saving 5+ seconds
per test. Overall it shaved around a minute off the custom cluster
tests.
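
As a sketch of the kind of setting involved (the flag name and value
here are assumptions, shown only to illustrate the idea):

  # A faster statestore update frequency shortens startup/metadata waits:
  statestored --statestore_update_frequency_ms=50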

Testing:
Ran 10 iterations of the tests.

Change-Id: Ia5d1612283ff420d95b0dd0ca5a2a67f56765f79
Reviewed-on: http://gerrit.cloudera.org:8080/10845
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-02 22:06:38 +00:00
Taras Bobrovytsky
8060f4d50e IMPALA-7102 (Part 1): Disable reading of erasure coding by default
In this patch we add a query option, ALLOW_ERASURE_CODED_FILES, that
allows us to enable or disable support for erasure-coded files. Even
though Impala should be able to handle HDFS erasure coded files already,
this feature hasn't been tested thoroughly yet. Also, Impala lacks
metrics, observability and DDL commands related to erasure coding. This
is a query option instead of a startup flag because we want to make it
possible for advanced users to enable the feature.
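
A sketch of an advanced user opting in for a session (table name
illustrative; multi-statement -q usage assumed):

  impala-shell -q "set allow_erasure_coded_files=true;
      select count(*) from t"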

We may also need a follow on patch to also disable the write path with
this flag.

Cherry-picks: not for 2.x

Change-Id: Icd3b1754541262467a6e67068b0b447882a40fb3
Reviewed-on: http://gerrit.cloudera.org:8080/10646
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-29 23:26:35 +00:00
Michael Ho
3b72a6c0da IMPALA-2567: Enable KRPC by default
This change enables the switch to use KRPC by default.
It also fixes a bug in KrpcDataStreamMgr to
check if the maintenance thread was started before calling
Join() on it. This shows up in BE tests, as the maintenance
thread isn't started in them.

Testing done: exhaustive build.

Change-Id: Iae736c1c1351758969b4d84e34fc5b2d048660a0
Reviewed-on: http://gerrit.cloudera.org:8080/9461
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-05 08:57:40 +00:00
Lars Volker
a8fc9f0fc7 IMPALA-6508: add KRPC test flag
This change adds a flag "--use_krpc" to start-impala-cluster.py. The
flag is currently passed as an argument to the impalad daemon. In the
future it will also enable KRPC for the catalogd and statestored
daemons.

This change also adds a flag "--test_krpc" to pytest. When running tests
using "impala-py.test --test_krpc", the test cluster will be started
by passing "--use_krpc" to start-impala-cluster.py (see above).
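
A sketch of the two flags described above (test path illustrative):

  # Start the test cluster with KRPC enabled:
  bin/start-impala-cluster.py --use_krpc
  # Run tests against it, marking the cluster as KRPC-capable:
  impala-py.test --test_krpc tests/query_test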

This change also adds a SkipIf to skip tests based on whether the
cluster was started with KRPC support or not.

- SkipIf.not_krpc can be used to mark a test that depends on KRPC.
- SkipIf.not_thrift can be used to mark a test that depends on Thrift
  RPC.

This change adds a meta test to make sure that the new SkipIf decorators
work correctly. The test should be removed as soon as real tests have
been added with the new decorators.

Change-Id: Ie01a5de2afac4a0f43d5fceff283f6108ad6a3ab
Reviewed-on: http://gerrit.cloudera.org:8080/9291
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
2018-02-16 09:26:01 +00:00
Vuk Ercegovac
6a2b7a64fb IMPALA-4704: Turns on client connections when local catalog initialized.
Currently, impalad starts beeswax and hs2 servers even if the
catalog has not yet been initialized. As a result, client
connections see an error message stating that the impalad
is not yet ready.

This patch changes the impalad startup sequence to wait
until the catalog is received before opening beeswax and hs2 ports
and starting their servers.

Testing:
- python e2e tests that start a cluster without a catalog
  and check that client connections are rejected as expected.

Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Reviewed-on: http://gerrit.cloudera.org:8080/8202
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-11-13 21:14:14 +00:00
Matthew Jacobs
7a1ff1e5e9 IMPALA-5539: Fix Kudu timestamp with -use_local_tz_for_unix_ts
The -use_local_tz_for_unix_timestamp_conversion flag exists
to specify if TIMESTAMPs should be interpreted as localtime
or UTC when converting to/from Unix time via builtins:
  from_unixtime(bigint unixtime)
  unix_timestamp(string datetime[, ...])
  unix_timestamp(timestamp datetime)

However, the KuduScanner was calling into code that, when
the gflag above was set, interpreted Unix times as local
time.  Unfortunately the write path (KuduTableSink) and some
FE TIMESTAMP code (see KuduUtil.java) did not have this
behavior, i.e. we were handling the gflag inconsistently.
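
The two interpretations differ by the host's timezone offset, which is
easy to see in Python (illustrative only; Impala's conversion code is C++):

  from datetime import datetime, timezone

  unix_ts = 0
  print(datetime.fromtimestamp(unix_ts, tz=timezone.utc))  # 1970-01-01 00:00:00+00:00
  print(datetime.fromtimestamp(unix_ts))                   # shifted to the host's local time

A read path using one interpretation while the write path uses the other
silently shifts timestamps, which is the inconsistency fixed here.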

Tests:
* Adds a custom cluster test to run Kudu test cases with
  -use_local_tz_for_unix_timestamp_conversion.
* Adds tests for the new builtin
  unix_micros_to_utc_timestamp() which run in a custom
  cluster test (added test_local_tz_conversion.py) as well
  as in the regular tests (added to test_exprs.py).

Change-Id: I423a810427353be76aa64442044133a9a22cdc9b
Reviewed-on: http://gerrit.cloudera.org:8080/7311
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-07-19 22:17:13 +00:00
Dimitris Tsirogiannis
e2c53a8bdf IMPALA-5147: Add the ability to exclude hosts from query execution
This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines whether a specific host is included
in the scheduler's backend configuration and hence in scheduling
decisions.
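
The scheduling-side effect, sketched in Python for illustration (the
actual scheduler is C++; the data shapes here are assumptions):

  from collections import namedtuple

  Backend = namedtuple("Backend", ["host", "is_executor"])

  cluster = [Backend("host1", False),   # started with is_executor disabled
             Backend("host2", True),
             Backend("host3", True)]

  # Only hosts started as executors are candidates for fragment placement.
  schedulable = [b for b in cluster if b.is_executor]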

Testing:
- Added a custom cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Reviewed-on: http://gerrit.cloudera.org:8080/6628
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-26 01:45:40 +00:00
Dimitris Tsirogiannis
296df3c826 IMPALA-4041: Limit catalog and admission control updates to coordinators
With this commit we add the ability to restrict catalog updates to a
designated set of coordinator nodes. A new startup option, termed
'is_coordinator', is added to indicate whether a node is a coordinator.
Coordinators accept connections through HS2 and Beeswax interfaces
and can also participate in query execution. Non-coordinator nodes
do not receive catalog updates from the statestore, do not initialize
a query scheduler and cannot accept Beeswax and HS2 client connections.
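
The startup gating this implies, sketched in Python for illustration
(the real logic is in the C++ daemons; the function names are
placeholders):

  def subscribe_to_catalog_topic(): pass   # placeholder
  def init_query_scheduler(): pass         # placeholder
  def start_client_servers(): pass         # placeholder: Beeswax + HS2

  def start_daemon(is_coordinator):
      # Non-coordinators skip all of this: no catalog updates, no
      # scheduler, no client-facing ports; they only execute fragments.
      if is_coordinator:
          subscribe_to_catalog_topic()
          init_query_scheduler()
          start_client_servers()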

Testing:
- Added a custom cluster test that launches a cluster in which the
number of coordinators is less than the cluster size and runs a number
of smoke queries.
- Successfully run exhaustive tests.

Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Reviewed-on: http://gerrit.cloudera.org:8080/6344
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-28 22:27:25 +00:00
David Knupp
f590bc0da6 IMPALA-4750: Rename test infra classes so they don't mimic test classes.
This patch addresses warning messages from pytest re: the imported
TestMatrix, TestVector, and TestDimension classes, which were being
collected as potential test classes. The fix was simply to prefix
the class names with 'Impala':

git grep -l 'TestDimension' | xargs \
    sed -i 's/TestDimension/ImpalaTestDimension/g'

git grep -l 'TestMatrix' | xargs \
    sed -i 's/TestMatrix/ImpalaTestMatrix/g'

git grep -l 'TestVector' | xargs \
    sed -i 's/TestVector/ImpalaTestVector/g'

The tests all passed in an exhaustive run on the upstream jenkins
server:

http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/

Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1
Reviewed-on: http://gerrit.cloudera.org:8080/5794
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2017-01-26 23:40:22 +00:00
Lars Volker
8b7f876649 IMPALA-4722: Disable log caching in test_scratch_disk
test_scratch_disk fails sporadically when trying to assert the presence
of log messages. This is probably caused by log caching, since after
such failures the log files do contain the lines in question.

I manually tested this by running the tests repeatedly for 2 days (10k
runs).

To make future diagnosis of similar problems easier, this change also
adds more output to assert_impalad_log_contains().
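
A sketch of the kind of diagnostic output this adds (illustrative
Python; the real helper is assert_impalad_log_contains(), whose exact
signature is not reproduced here):

  import re

  def assert_log_contains(log_text, pattern, expected_count=1):
      found = len(re.findall(pattern, log_text))
      # Report the pattern and the observed count so sporadic failures
      # can be diagnosed from the test output alone.
      assert found == expected_count, (
          "expected %d occurrence(s) of %r in log, found %d"
          % (expected_count, pattern, found))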

Change-Id: I9f21284338ee7b4374aca249b6556282b0148389
Reviewed-on: http://gerrit.cloudera.org:8080/5669
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-12 18:58:48 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Taras Bobrovytsky
609b80410e Clean up Python test import statements
Many of our test scripts have import statements that look like
"from xxx import *". It is good practice to explicitly name what
needs to be imported. This commit implements that practice and
also removes unused import statements.
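
For example (the module and class names here are illustrative):

  # Before: pulls every public name into the test module's namespace.
  from tests.common.impala_test_suite import *

  # After: the dependency is explicit and unused names are easy to spot.
  from tests.common.impala_test_suite import ImpalaTestSuite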

Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-07-15 23:26:18 +00:00
Michael Brown
067af1957c IMPALA-3614: work around pytest bugs causing custom cluster test skips
All versions of pytest contain various bugs regarding test marking
(including skips) when tests are both:

1. class-level marked
2. inherited

More info is available in IMPALA-3614 and IMPALA-2943, but the gist is
that it's possible for some tests to be skipped when they shouldn't be.
This is happening pretty badly with the custom cluster tests, because
CustomClusterTestSuite has a class level skipif mark.

The easiest workaround for now is to remove the pytest skipif mark in
CustomClusterTestSuite and skip using explicit pytest.skip() in the
setup_class() method. Some CustomClusterTestSuite children implemented
their own setup_* methods, and I adjusted them both to clean them up
and to call the parent methods properly via super().
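
The shape of the workaround, as a sketch (illustrative Python; the
condition probed in setup_class is an assumption):

  import os
  import pytest

  class CustomClusterTestSuite(object):
      @classmethod
      def setup_class(cls):
          # Runtime skip instead of a class-level @pytest.mark.skipif,
          # which pytest can mishandle on inherited test classes.
          if os.environ.get("RUN_CUSTOM_CLUSTER_TESTS") != "true":  # hypothetical probe
              pytest.skip("only runs as part of the custom cluster tests")

  class TestMyFeature(CustomClusterTestSuite):
      @classmethod
      def setup_class(cls):
          super(TestMyFeature, cls).setup_class()  # preserve the parent's skip logic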

Testing:

I ran the following combinations of all the custom cluster tests:

DEBUG   / HDFS  / core
RELEASE / HDFS  / exhaustive
DEBUG   / LOCAL / core
DEBUG   / S3    / core

Before, we'd get situations in which most of the tests were skipped.
Consider the RELEASE/HDFS/exhaustive situation:

  custom_cluster/test_admission_controller.py .....
  custom_cluster/test_alloc_fail.py ss
  custom_cluster/test_breakpad.py sssss
  custom_cluster/test_delegation.py sss
  custom_cluster/test_exchange_delays.py ss
  custom_cluster/test_hdfs_fd_caching.py s
  custom_cluster/test_hive_parquet_timestamp_conversion.py ss
  custom_cluster/test_insert_behaviour.py ss
  custom_cluster/test_legacy_joins_aggs.py s
  custom_cluster/test_parquet_max_page_header.py s
  custom_cluster/test_permanent_udfs.py sss
  custom_cluster/test_query_expiration.py sss
  custom_cluster/test_redaction.py ssss
  custom_cluster/test_s3a_access.py s
  custom_cluster/test_scratch_disk.py ssss
  custom_cluster/test_session_expiration.py s
  custom_cluster/test_spilling.py ssss
  authorization/test_authorization.py ss
  authorization/test_grant_revoke.py s

Now, more tests run appropriately:

  custom_cluster/test_admission_controller.py .....
  custom_cluster/test_alloc_fail.py ss
  custom_cluster/test_breakpad.py sssss
  custom_cluster/test_delegation.py ...
  custom_cluster/test_exchange_delays.py ss
  custom_cluster/test_hdfs_fd_caching.py .
  custom_cluster/test_hive_parquet_timestamp_conversion.py ..
  custom_cluster/test_insert_behaviour.py ..
  custom_cluster/test_kudu_not_available.py .
  custom_cluster/test_legacy_joins_aggs.py .
  custom_cluster/test_parquet_max_page_header.py .
  custom_cluster/test_permanent_udfs.py ...
  custom_cluster/test_query_expiration.py ...
  custom_cluster/test_redaction.py ....
  custom_cluster/test_s3a_access.py s
  custom_cluster/test_scratch_disk.py ....
  custom_cluster/test_session_expiration.py .
  custom_cluster/test_spilling.py ....
  authorization/test_authorization.py ..
  authorization/test_grant_revoke.py .

Change-Id: Ie301b69718f8690322cc3b4130fb1c715344779c
Reviewed-on: http://gerrit.cloudera.org:8080/3265
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Michael Brown <mikeb@cloudera.com>
2016-06-06 17:34:07 -07:00
Lars Volker
c9df348c38 IMPALA-2686: Add breakpad crash handler to all daemons
This change adds breakpad crash handling support to catalogd, impalad,
and statestored. The destination folder for minidump files can be
configured via the 'minidump_path' command line flag. Leaving it empty
will disable minidump generation. The daemons will rotate minidump
files. The number of files to keep can be configured with the
'max_minidumps' command line flag.
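
The rotation policy, sketched in Python for illustration (the daemons
implement this in C++; the file naming is an assumption):

  import glob
  import os

  def rotate_minidumps(minidump_path, max_minidumps):
      dumps = sorted(glob.glob(os.path.join(minidump_path, "*.dmp")),
                     key=os.path.getmtime)
      # Assumes max_minidumps >= 1; delete the oldest files beyond the
      # limit so only the newest max_minidumps dumps remain.
      for stale in dumps[:-max_minidumps]:
          os.remove(stale)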

Change-Id: I7a37a38488716ffe34296f3490ae291bbb7228d6
Reviewed-on: http://gerrit.cloudera.org:8080/2028
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:52 -07:00
Skye Wanderman-Milne
9c4eb9fc61 IMPALA-2605: prevent long-running child processes from keeping TCP connection open
The problem: By default, all file descriptors opened by a process,
including sockets, are inherited by any forked child processes. This
includes the connection socket created at the beginning of each test
in ImpalaTestSuite.setup_class(). In
TestHiveMetaStoreFailure.test_hms_service_dies(), the Hive Metastore
is stopped and restarted, meaning the metastore is now a child process
of the test process. This causes the client connection not to be
closed when the parent process (the test) exits, meaning that one of a
finite number of connections (64) to Impala is left permanently in
use.

This would be barely noticeable except that run-tests.py runs the mini
stress test with 4 * <num CPUs> concurrent clients by default. On our
build machines, this is 64 clients, which is also the default max
number of connections for an impalad. When a test process tries to
make the 65th connection (since the leaked connection is still there),
it blocks until a connection is freed up. Due to a quirk of the xdist
py.test plugin that I don't fully understand, the test framework will
not clean up test classes (and close the connections) until a number
of tests complete, causing the test process to deadlock.

The solution: use the close_fds argument to make sure the TCP socket
is closed in the spawned child process. This is also done in
CustomClusterTestSuite._start_impala_cluster() when it starts the new
cluster.
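
The key detail is subprocess's close_fds argument (a sketch; the script
path here is only an example):

  import subprocess

  # close_fds=True keeps the child from inheriting the test's open file
  # descriptors, including the TCP socket connected to Impala, so a
  # long-lived child cannot pin the connection open after the test exits.
  subprocess.check_call(["testdata/bin/run-hive-server.sh"], close_fds=True)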

This patch also switches test_hms_failure.py to use check_call()
instead of call(), and explicitly caps the number of stress clients at
64.

Change-Id: I03feae922883a0624df1422ffb6ba5f1d83fb869
Reviewed-on: http://gerrit.cloudera.org:8080/1853
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
2016-01-22 22:59:22 +00:00
Tim Armstrong
7e92a5b8c9 Improve error handling when starting test impala cluster
Check return code of start-impala-cluster.py and check that statestored
was found in test_custom_cluster. This avoids various strange scenarios
where the cluster wasn't created correctly.
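
One way to fail fast on a bad exit status (a sketch; the actual change
may check the code differently):

  import subprocess

  # check_call() raises CalledProcessError on a non-zero exit status, so
  # a failed cluster start aborts the run instead of being ignored.
  subprocess.check_call(["bin/start-impala-cluster.py"])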

Change-Id: Iebaf325d085b85ad156f2bf8a39dddcf6319fb09
Reviewed-on: http://gerrit.cloudera.org:8080/1765
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-01-20 23:42:08 +00:00
Vlad Berindei
b6c20b2a40 Allow Impala to run against local filesystem.
Allow Impala to start with only a running HMS (and no additional services like
HDFS, HBase, Hive, or YARN) and use the local file system.

Skip all tests that need these services, that use HDFS caching, or that assume
multiple impalads are running.

To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the
current user has write permission, since this is where the test data will
be extracted.
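
For example (the prefix path is an arbitrary user-writable location):

export TARGET_FILESYSTEM=local
export WAREHOUSE_LOCATION_PREFIX=/home/$USER/local-warehouse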

Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS             1348 tests passed
S3               1157 tests passed
Local Filesystem 1161 tests passed

Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
2015-12-05 06:48:32 +00:00
Tim Armstrong
1d2afcfec2 IMPALA-2079: Part 1: report non-writable scratch dirs at startup
Previously Impala could erroneously decide to use non-writable scratch
directories, e.g. if /tmp/impala-scratch already exists and is not
writable by the current user.

With this change, if we cannot remove and recreate a fresh scratch directory,
it is not used.  If we have no valid scratch directories, we log an
error and continue startup.
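
The validation logic, sketched in Python for illustration (the actual
change is in the C++ FilesystemUtil code):

  import os
  import shutil

  def usable_scratch_dirs(candidates):
      usable = []
      for d in candidates:
          try:
              # Removing and recreating the directory proves it is
              # writable by the current user before we spill there.
              shutil.rmtree(d, ignore_errors=True)
              os.makedirs(d)
              usable.append(d)
          except OSError:
              continue  # skip non-writable directories
      if not usable:
          print("ERROR: no usable scratch directories; continuing startup")
      return usable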

Add unit test for CreateDirectory to test behavior for success and
failure cases.

Add system tests to check logging and query execution in various
scenarios where we do not have scratch available.

Modify FilesystemUtil to use non-exception-throwing Boost functions to
avoid unhandled exceptions escaping into the rest of the Impala
codebase, which does not expect the use of exceptions.

Change-Id: Icaa8429051942424e1d811c54bde10102ac7f7b3
Reviewed-on: http://gerrit.cloudera.org:8080/565
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-08-14 00:38:22 +00:00