94 Commits

Author SHA1 Message Date
Joe McDonnell
48b38810e8 IMPALA-14465: Unset HEAPCHECK when custom cluster tests restart Kudu
Custom cluster tests like TestKuduHMSIntegration restart the Kudu
service with custom startup flags. On Redhat8 ARM64, these tests
have been failing due to Kudu being unresponsive after this
restart. Debugging showed that Kudu was stuck early in startup.
This only reproduced via the custom cluster tests and never via
regular minicluster startup.

When custom cluster tests restart Kudu, the script to restart
Kudu inherits environment variables from the test runner. It
turns out that the HEAPCHECK environment variable (even when
empty) causes Kudu to get stuck during startup on Redhat8
ARM64 after the recent toolchain update.

As a short-term fix, this unsets HEAPCHECK when restarting the
Kudu service for these tests. There will need to be further
investigation / cleanup beyond this.
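
A minimal sketch of the workaround (the restart command below is a placeholder,
not the actual script invocation used by the tests):

```
# Hypothetical sketch of unsetting HEAPCHECK before restarting Kudu; the real
# restart script and its arguments in the Impala test framework may differ.
import os
import subprocess


def restart_kudu_without_heapcheck(restart_cmd):
  """Run the Kudu restart command with HEAPCHECK removed from the environment."""
  env = dict(os.environ)
  # Remove HEAPCHECK even if it is set to an empty string; an empty value is
  # enough to hang Kudu startup on Redhat8 ARM64.
  env.pop("HEAPCHECK", None)
  subprocess.check_call(restart_cmd, env=env)


# Example usage with a placeholder command:
# restart_kudu_without_heapcheck(["./restart-kudu.sh", "--with-custom-flags"])
```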

Testing:
 - Ran the Kudu custom cluster tests on Redhat8 ARM64 and
   on Ubuntu 20 x86_64

Change-Id: I51513e194d9e605df199672231b412fae40343af
Reviewed-on: http://gerrit.cloudera.org:8080/23467
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-26 04:45:22 +00:00
jasonmfehr
0fe8de0f3f IMPALA-14401: Deflake/Improve OpenTelemetry Tracing Tests
Contains the following improvements to the custom cluster tests that
exercise Impala queries as OpenTelemetry traces:

1. Supporting code for asserting traces was moved to
   'tests/util/otel_trace.py'. The moved code was modified to remove
   all references to 'self'. Since this code used
   'self.assert_impalad_log_contains', it had to be modified so the
   caller provides the correct log file path to search. The
   '__find_span_log' function was updated to call a new generic file
   grep function to run the necessary log file search regex. All
   other code was moved unmodified.

2. Classes 'TestOtelTraceSelectsDMLs' and 'TestOtelTraceDDLs'
   contained a total of 11 individual tests that used the
   'unique_database' fixture. When this fixture is used in a test, it
   results in two DDLs being run before the test to drop/create the
   database and one DDL being run after the test to drop the database.
   These classes now create a test database once during 'setup_class'
   and drop it once during 'teardown_class' because creating a new
   database for each test was unnecessary. This change dropped test
   execution time from about 97 seconds to about 77 seconds.

3. Each test now has comments describing what the test is asserting.

4. The unnecessary sleep in 'test_query_exec_fail' was removed, saving
   five seconds of test execution time.

5. New test 'test_dml_insert_fail' added. Previously, the situation
   where an insert DML failed was not tested. The test passed without
   any changes to backend code.

6. Test 'test_ddl_createtable_fail' is greatly simplified by using a
   debug action to fail the query instead of multiple parallel
   queries where one dropped the database that the other was inserting
   into. The simplified setup eliminated test flakiness caused by
   timing differences and sped up test execution by about 5 seconds.

7. Fixed test flakiness caused by timing issues. Depending on when
   the close process was initiated, span events sometimes appear in
   the QueryExecution span and sometimes in the Close span, and the
   test assertions cannot handle both cases. All span event
   assertions for the Close span were removed. IMPALA-14334 will fix
   these assertions.

8. The function 'query_id_from_ui', which retrieves the query profile
   using the Impala debug UI, now makes multiple attempts to retrieve
   the query. In slower test environments, such as ASAN builds, the
   query may not yet be available when the function is first called,
   which used to cause tests to fail. Adding retries eliminates this
   flakiness (a sketch of the retry pattern follows this list).
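
A minimal sketch of the retry pattern described in item 8 (the helper name,
arguments, and timings below are illustrative, not the actual test code):

```
# Illustrative retry loop; the real query_id_from_ui helper may differ.
import time


def get_query_profile_with_retries(fetch_profile, query_id, attempts=5, delay_s=1):
  """Call fetch_profile(query_id) until it returns a profile or attempts run out."""
  for _ in range(attempts):
    profile = fetch_profile(query_id)
    if profile is not None:
      return profile
    # The query may not be visible yet on slow (e.g. ASAN) builds; wait and retry.
    time.sleep(delay_s)
  raise AssertionError("query %s not found after %d attempts" % (query_id, attempts))
```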

Testing accomplished by running tests in test_otel_trace.py both
locally and in a full Jenkins build.

Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: I0c3e0075df688c7ae601c6f2e5743f56d6db100e
Reviewed-on: http://gerrit.cloudera.org:8080/23385
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-15 23:21:29 +00:00
Riza Suminto
28cff4022d IMPALA-14333: Run impala-py.test using Python3
Running exhaustive tests with env var IMPALA_USE_PYTHON3_TESTS=true
reveals some tests that require adjustment. This patch makes such
adjustments, which mostly revolve around encoding differences and the
string vs bytes types in Python3. This patch also switches the default to
run pytest with Python3 by setting IMPALA_USE_PYTHON3_TESTS=true. The
following are the details:

Change the hash() function in conftest.py to crc32() to produce a
deterministic hash. Hash randomization is enabled by default since
Python 3.3 (see
https://docs.python.org/3/reference/datamodel.html#object.__hash__).
This causes test sharding (like --shard_tests=1/2) to produce an
inconsistent set of tests per shard. Always restart the minicluster during
custom cluster tests if the --shard_tests argument is set, because the
test order may change and affect test correctness, depending on whether
the test runs on a fresh minicluster or not.
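
A minimal sketch of deterministic sharding with crc32 instead of the built-in
hash() (illustrative only, not the exact conftest.py code):

```
# Illustrative deterministic sharding; the exact conftest.py logic may differ.
import zlib


def shard_for(test_name, num_shards):
  """Map a test name to a shard deterministically across pytest processes."""
  # hash() is randomized per process since Python 3.3, so use crc32 instead.
  return zlib.crc32(test_name.encode("utf-8")) % num_shards


# With --shard_tests=1/2, a test would belong to one of the two shards:
assert shard_for("test_kudu_alter_table", 2) in (0, 1)
```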

Moved one test case from delimited-latin-text.test to
test_delimited_text.py for easier binary comparison.

Add bytes_to_str() as a utility function to decode bytes in Python3.
This is often needed when inspecting the return value of
subprocess.check_output() as a string.
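
An illustrative version of such a helper (the actual utility in the test
framework may differ in name or behavior):

```
# Illustrative bytes_to_str() helper for Python 3.
import subprocess


def bytes_to_str(value, encoding="utf-8"):
  """Decode bytes to str under Python 3; pass str through unchanged."""
  if isinstance(value, bytes):
    return value.decode(encoding)
  return value


# subprocess.check_output() returns bytes in Python 3, so decode before string ops:
output = bytes_to_str(subprocess.check_output(["echo", "hello"]))
assert "hello" in output
```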

Implement DataTypeMetaclass.__lt__ to substitute
DataTypeMetaclass.__cmp__ that is ignored in Python3 (see
https://peps.python.org/pep-0207/).
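
A minimal sketch of a metaclass providing __lt__ so that classes can be sorted
under Python 3 (illustrative; the real DataTypeMetaclass likely orders its
classes differently):

```
# Python 3 ignores __cmp__ (PEP 207), so ordering must come from rich comparisons
# such as __lt__. Names below are made up for the example.
class OrderedByNameMeta(type):
  def __lt__(cls, other):
    return cls.__name__ < other.__name__


class Int(metaclass=OrderedByNameMeta): pass
class String(metaclass=OrderedByNameMeta): pass

assert sorted([String, Int]) == [Int, String]  # sorted() only needs __lt__
```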

Fix WEB_CERT_ERR difference in test_ipv6.py.

Fix trivial integer parsing in test_restart_services.py.

Fix various encoding issues in test_saml2_sso.py,
test_shell_commandline.py, and test_shell_interactive.py.

Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1.

Switch to binary comparison in test_iceberg.py where needed.

Specify text mode when calling tempfile.NamedTemporaryFile().

Simplify create_impala_shell_executable_dimension to skip testing the dev
and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. The reason
is that several UTF-8 related tests in test_shell_commandline.py break
in the Python3 pytest + Python2 impala-shell combo. This skipping already
happens automatically on build OSes without system Python2 available, like
RHEL9 (the IMPALA_SYSTEM_PYTHON2 env var is empty).

Removed unused vector argument and fixed some trivial flake8 issues.

Several tests require logic modifications due to intermittent issues in
Python3 pytest. These include:

Add _run_query_with_client() in test_ranger.py to allow reusing a single
Impala client for running several queries. Ensure clients are closed
when the test is done. Mark several tests in test_ranger.py with
SkipIfFS.hive because they run queries through beeline + HiveServer2,
but the Ozone and S3 build environments do not start HiveServer2 by
default.

Increase the sleep period from 0.1 to 0.5 seconds per iteration in
test_statestore.py and mark TestStatestore to execute serially. This is
because TServer appears to shut down more slowly when run concurrently
with other tests. Handle the deprecation of Thread.setDaemon() as well.

Always set force_restart=True for each test method in TestLoggingCore,
TestShellInteractiveReconnect, and TestQueryRetries to prevent them from
reusing the minicluster from a previous test method. Some of these tests
destroy the minicluster (kill impalad) and will produce minidumps if the
metrics verifier for the next test fails to detect a healthy minicluster
state.

Testing:
Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true.

Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4
Reviewed-on: http://gerrit.cloudera.org:8080/23319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-03 10:01:29 +00:00
jasonmfehr
789991c6cc IMPALA-13237: [Patch 8] - OpenTelemetry Traces for DML/DDL Queries and Handle Leading Comments
Trace DML/DDL Queries
* Adds tracing for alter, compute, create, delete, drop, insert,
  invalidate metadata, and with queries.
* Stops tracing beeswax queries since that protocol is deprecated.
* Adds Coordinator attribute to Init and Root spans for identifying
  where the query is running.

Comment Handling
* Corrects handling of leading comments, both inline and full line.
  Previously, queries with comments before the first keyword were
  always ignored.
* Adds be ctest tests for determining whether or not a query should
  be traced.

General Improvements
* Handles the case where the first query keyword is followed by a
  newline character or an inline comment (with or without spaces
  between).
* Corrects traces for errored/cancelled queries. These cases
  short-circuit the normal query processing code path and have to be
  handled accordingly.
* Ends the root span when the query ends instead of waiting for the
  ClientRequestState to go out of scope. This change removes
  use-after-free issues caused by reading from ClientRequestState
  when the SpanManager went out of scope during that object's dtor.
* Simplifies minimum TLS version handling because the validators
  on the ssl_minimum_version flag eliminate invalid values that
  previously had to be accounted for.
* Removes the unnecessary otel_trace_enabled() function.
* Fixes IMPALA-14314 by waiting for the full trace to be written to
  the output file before asserting that trace.

Testing
* Full test suite passed.
* ASAN/TSAN builds passed.
* Adds new ctest test.
* Adds custom cluster tests to assert traces for the new supported
  query types.
* Adds custom cluster tests to assert traces for errored and
  cancelled queries.

Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: Ie9e83d7f761f3d629f067e0a0602224e42cd7184
Reviewed-on: http://gerrit.cloudera.org:8080/23279
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-09-03 04:38:36 +00:00
jasonmfehr
2ad6f818a5 IMPALA-13237: [Patch 5] - Implement OpenTelemetry Traces for Select Queries Tracking
Adds representation of Impala select queries using OpenTelemetry
traces.

Each Impala query is represented as its own individual OpenTelemetry
trace. The one exception is retried queries which will have an
individual trace for each attempt. These traces consist of a root span
and several child spans. Each child span has the root as its parent.
No child span has another child span as its parent. Each child span
represents one high-level query lifecycle stage. Each child span also
has span attributes that further describe the state of the query.

Child spans:
  1. Init
  2. Submitted
  3. Planning
  4. Admission Control
  5. Query Execution
  6. Close

Each child span contains a mix of universal attributes (available on
all spans) and query phase specific attributes. For example, the
"ErrorMsg" attribute, present on all child spans, is the error
message (if any) at the end of that particular query phase. One
example of a child span specific attribute is "QueryType" on the
Planning span. Since query type is first determined during query
planning, the "QueryType" attribute is present on the Planning span
and has a value of "QUERY" (since only selects are supported).

Since queries can run for lengthy periods of time, the Init span
communicates the beginning of a query along with global query
attributes. For example, span attributes include query id, session
id, sql, user, etc.

Once the query has closed, the root span is closed.

Testing accomplished with new custom cluster tests.

Generated-by: Github Copilot (GPT-4.1, Claude Sonnet 3.7)
Change-Id: Ie40b5cd33274df13f3005bf7a704299ebfff8a5b
Reviewed-on: http://gerrit.cloudera.org:8080/22924
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-12 04:11:06 +00:00
Zoltan Borok-Nagy
438461db9e IMPALA-14138: Manually disable block location loading via Hadoop config
For storage systems that support block location information (HDFS,
Ozone) we always retrieve it with the assumption that we can use it for
scheduling, to do local reads. But it's also typical that Impala is not
co-located with the storage system, not even in on-prem deployments.
E.g. when Impala runs in containers, and even if they are co-located,
we don't try to figure out which container runs on which machine.

In such cases we should not reach out to the storage system to collect
file information because it can be very expensive for large tables and
we won't benefit from it at all. Since there is currently no easy way
to tell if Impala is co-located with the storage system, this patch
adds configuration options to disable block location retrieval during
table loading.

It can be disabled globally via Hadoop Configuration:

'impala.preload-block-locations-for-scheduling': 'false'

We can restrict it to filesystem schemes, e.g.:

'impala.preload-block-locations-for-scheduling.scheme.hdfs': 'false'

When multiple storage systems are configured with the same scheme, we
can still control block location loading based on authority, e.g.:

'impala.preload-block-locations-for-scheduling.authority.mycluster': 'false'

The latter only disables block location loading for URIs like
'hdfs://mycluster/warehouse/tablespace/...'

If block location loading is disabled by any of the switches, it cannot
be re-enabled by another, i.e. the most restrictive setting prevails.
E.g:
  disable scheme 'hdfs', enable authority 'mycluster'
     ==> hdfs://mycluster/ is still disabled

  disable globally, enable scheme 'hdfs', enable authority 'mycluster'
     ==> hdfs://mycluster/ is still disabled, as everything else is.
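
A minimal Python sketch of the "most restrictive setting prevails" rule
(illustrative only; the real check lives in the Java catalog code and reads
these keys from the Hadoop Configuration):

```
# Illustrative decision logic for the examples above.
PREFIX = "impala.preload-block-locations-for-scheduling"


def preload_block_locations(conf, scheme, authority):
  """Return False if any applicable switch disables block location loading."""
  keys = [PREFIX,
          "%s.scheme.%s" % (PREFIX, scheme),
          "%s.authority.%s" % (PREFIX, authority)]
  # The most restrictive setting wins: a single 'false' disables loading.
  return all(conf.get(key, "true").lower() != "false" for key in keys)


conf = {"%s.scheme.hdfs" % PREFIX: "false",
        "%s.authority.mycluster" % PREFIX: "true"}
assert preload_block_locations(conf, "hdfs", "mycluster") is False
```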

Testing:
 * added unit tests for FileSystemUtil
 * added unit tests for the file metadata loaders
 * custom cluster tests with custom Hadoop configuration

Change-Id: I1c7a6a91f657c99792db885991b7677d2c240867
Reviewed-on: http://gerrit.cloudera.org:8080/23175
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-07-17 13:08:15 +00:00
Riza Suminto
0b1a32fad8 IMPALA-13850 (part 4): Implement in-place reset for CatalogD
This patch improves the availability of CatalogD under huge INVALIDATE
METADATA operations. Previously, CatalogServiceCatalog.reset() held
versionLock_.writeLock() for the whole reset duration. When the number
of databases, tables, or functions is large, this write lock can be held
for a long time, preventing any other catalog operation from proceeding.

This patch improves the situation by:
1. Making CatalogServiceCatalog.reset() rebuild dbCache_ in place and
   occasionally release the write lock between rebuild stages.
2. Fetching database, table, and function metadata from the MetaStore in
   the background using an ExecutorService. A new catalog_reset_max_threads
   flag controls the number of threads used for the parallel fetch.

In order to do so, lexicographic order must be enforced during reset(),
and all Db invalidation within a single stage must complete before the
write lock is released. Stages should run in approximately the same
amount of time. A catalog operation over a database must ensure that no
reset operation is currently running, or that the database name is
lexicographically less than the current database-under-invalidation.

This patch adds CatalogResetManager to do the background metadata
fetching and to provide helper methods that facilitate waiting for reset
progress. CatalogServiceCatalog must hold the versionLock_.writeLock()
before calling most CatalogResetManager methods.

These are the methods in the CatalogServiceCatalog class that must wait
for CatalogResetManager.waitOngoingMetadataFetch():

addDb()
addFunction()
addIncompleteTable()
addTable()
invalidateTableIfExists()
removeDb()
removeFunction()
removeTable()
renameTable()
replaceTableIfUnchanged()
tryLock()
updateDb()
InvalidateAwareDbSnapshotIterator.hasNext()

A concurrent global IM (INVALIDATE METADATA) must wait until the
currently running global IM completes. The waiting happens by calling
waitFullMetadataFetch().

CatalogServiceCatalog.getAllDbs() gets a snapshot of dbCache_ values at a
point in time. With this patch, it is now possible that some Db in this
snapshot may be removed from dbCache_ by a concurrent reset(). Callers
that care about snapshot integrity, like
CatalogServiceCatalog.getCatalogDelta(), should be careful when iterating
the snapshot. They must iterate in lexicographic order, similar to
reset(), and make sure not to go beyond the current
database-under-invalidation. They must also skip the Db currently being
inspected if Db.isRemoved() is true. The helper class
InvalidateAwareDbSnapshot was added for this kind of iteration.
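
A minimal Python sketch of the snapshot iteration rule described above
(illustrative; the real logic is the Java InvalidateAwareDbSnapshot helper and
the names here are made up):

```
# Illustrative sketch of iterating a dbCache_ snapshot in lexicographic order
# while a concurrent reset() is running.
def iterate_db_snapshot(snapshot_dbs, db_under_invalidation):
  """Yield dbs that are safe to read while a reset() is still in progress."""
  for db in sorted(snapshot_dbs, key=lambda d: d["name"]):
    # Do not go beyond the database the reset is currently invalidating.
    if db_under_invalidation is not None and db["name"] >= db_under_invalidation:
      break
    if db["is_removed"]:  # skip dbs removed from dbCache_ by the concurrent reset()
      continue
    yield db
```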

Override CatalogServiceCatalog.getDb() and
CatalogServiceCatalog.getDbs() to wait until the first metadata reset
completes or the looked-up Db is found in the cache.

Expand test_restart_catalogd_twice to test_restart_legacy_catalogd_twice
and test_restart_local_catalogd_twice. Update
CustomClusterTestSuite.wait_for_wm_init_complete() to correctly pass
timeout values to the helper methods that it calls. Reduce cluster_size
from 10 to 3 in a few tests of test_workload_mgmt_init.py to avoid
flakiness.

Fixed HMS connection leak between tests in AuthorizationStmtTest (see
IMPALA-8073).

Testing:
- Pass exhaustive tests.

Change-Id: Ib4ae2154612746b34484391c5950e74b61f85c9d
Reviewed-on: http://gerrit.cloudera.org:8080/22640
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
2025-07-09 14:05:04 +00:00
Csaba Ringhofer
5cca1aa9e5 IMPALA-13820: add ipv6 support for webui/hs2/hs2-http/beeswax
Main changes:
- added flag external_interface to override the hostname for the
  beeswax/hs2/hs2-http ports to allow testing ipv6 on these
  interfaces without forcing ipv6 on internal communication
- compile Squeasel with USE_IPV6 to allow ipv6 on the webui (the webui
  interface can be configured with the existing flag webserver_interface)
- fixed the handling of [<ipv6addr>]:<port> style addresses in
  impala-shell (e.g. [::1]:21050) and the test framework (see the
  parsing sketch after this list)
- improved handling of custom clusters in the test framework to
  allow the webui/ImpalaTestSuite clients to work with non-standard
  settings (also fixes these clients with SSL)
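
A minimal sketch of parsing a bracketed IPv6 host:port address such as
"[::1]:21050" (illustrative; the actual impala-shell and test framework code
may differ):

```
# Illustrative host:port splitting that handles bracketed IPv6 addresses.
def split_host_port(address):
  """Split 'host:port' or '[ipv6]:port' into (host, port)."""
  if address.startswith("["):
    host, _, rest = address[1:].partition("]")
    port = rest.lstrip(":")
  else:
    host, _, port = address.rpartition(":")
  return host, int(port)


assert split_host_port("[::1]:21050") == ("::1", 21050)
assert split_host_port("localhost:21050") == ("localhost", 21050)
```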

Whether ipv4, ipv6, or dual stack is used can be configured by setting
the interface to bind to with the webserver_interface and
external_interface flags. The Thrift server behind hs2/hs2-http/beeswax
only accepts a single host name and uses the first address
returned by getaddrinfo() that it can successfully bind to. This
means that unless an ipv6 address is used (like ::1) the behavior
will depend on the order of addresses returned by getaddrinfo():
63b7a263fc/lib/cpp/src/thrift/transport/TServerSocket.cpp (L481)
For dual stack the only current option is to bind to "::",
as the Thrift server can only listen on a single socket.

Testing:
- added custom cluster tests for ipv6 only/dual interface
  with and without SSL
- manually tested in dual stack environment with client on a
  different host
- among clients impala-shell and impyla are tested, but not
  JDBC/ODBC
- no tests yet on truly ipv6 only environment, as internal
  communication (e.g. krpc) is not ready for ipv6

To test manually the dev cluster can be started with ipv6 support:
dual mode:
bin/start-impala-cluster.py --impalad_args="--external_interface=:: --webserver_interface=::" --catalogd_args="--webserver_interface=::" --state_store_args="--webserver_interface=::"

ipv6 only:
bin/start-impala-cluster.py --impalad_args="--external_interface=::1 --webserver_interface=::1" --catalogd_args="--webserver_interface=::1" --state_store_args="--webserver_interface=::1"

Change-Id: I51ac66c568cc9bb06f4a3915db07a53c100109b6
Reviewed-on: http://gerrit.cloudera.org:8080/22527
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-21 14:00:31 +00:00
Mihaly Szjatinya
e0cb533c25 IMPALA-13912: Use SHARED_CLUSTER_ARGS in more custom cluster tests
In addition to IMPALA-13503, which allowed having a single cluster
running for the entire test class, this attempts to minimize restarts
between the existing tests without modifying any of their code.

This changeset saves the command line with which
'start-impala-cluster.py' has been run and skips restarting if the
command line for the next test is the same.
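
A minimal sketch of the skip-if-same-command-line idea (names are illustrative;
the actual bookkeeping lives in the custom cluster test suite):

```
# Illustrative sketch of skipping a cluster restart when the start command line
# is unchanged.
class ClusterRestarter(object):
  def __init__(self, start_cluster_fn):
    self._start_cluster = start_cluster_fn
    self._last_cmdline = None

  def ensure_cluster(self, cmdline, force_restart=False):
    """Restart only when the command line changed or a restart is forced."""
    if force_restart or cmdline != self._last_cmdline:
      self._start_cluster(cmdline)
      self._last_cmdline = cmdline
```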

Some tests, however, do require a restart due to the specific metrics
being tested. Such tests set the 'force_restart' flag within the
'with_args' decorator. NOTE: running the tests in a different order may
reveal more tests like this, resulting in test failures.

Experimentally, this results in ~150 fewer restarts, mostly coming from
restarts between tests. As for restarts between different variants of
the same test, most of the cluster tests are restricted to a single
variant, although multi-variant tests occur occasionally.

Change-Id: I7c9115d4d47b9fe0bfd9dbda218aac2fb02dbd09
Reviewed-on: http://gerrit.cloudera.org:8080/22901
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-19 17:48:25 +00:00
Csaba Ringhofer
f98b697c7b IMPALA-13929: Make 'functional-query' the default workload in tests
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.

All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.

The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 on why workload affected
exploration_strategy. An example for affected test is
TestCatalogHMSFailures which was skipped both in core and exhaustive
runs before this change.

get_workload() functions that return a different workload than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking these tests is out of scope in this patch.

Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-08 07:12:55 +00:00
Zoltan Borok-Nagy
bd3486c051 IMPALA-13586: Initial support for Iceberg REST Catalogs
This patch adds initial support for Iceberg REST Catalogs. This means
now it's possible to run an Impala cluster without the Hive Metastore,
and without the Impala CatalogD. Impala Coordinators can directly
connect to an Iceberg REST server and fetch metadata for databases and
tables from there. The support is read-only, i.e. DDL and DML statements
are not supported yet.

This was initially developed in the context of a company Hackathon
program, i.e. it was a team effort that I squashed into a single commit
and polished the code a bit.

The Hackathon team members were:
* Daniel Becker
* Gabor Kaszab
* Kurt Deschler
* Peter Rozsa
* Zoltan Borok-Nagy

The Iceberg REST Catalog support can be configured via a Java properties
file; its location can be specified via:
 --catalog_config_dir: Directory of configuration files

Currently only one configuration file can be in the directory, as we only
support a single Catalog at a time. The following properties are mandatory
in the config file:
* connector.name=iceberg
* iceberg.catalog.type=rest
* iceberg.rest-catalog.uri

The first two properties can only be 'iceberg' and 'rest' for now; they
are needed for future extensibility.

Moreover, Impala Daemons need to specify the following flags to connect
to an Iceberg REST Catalog:
 --use_local_catalog=true
 --catalogd_deployed=false

Testing
* e2e tests added to test basic functionality against a custom-built
  Iceberg REST server that delegates to HadoopCatalog under the hood
* Further testing, e.g. Ranger tests, is expected in subsequent
  commits

TODO:
* manual testing against Polaris / Lakekeeper, we could add automated
  tests in a later patch

Change-Id: I1722b898b568d2f5689002f2b9bef59320cb088c
Reviewed-on: http://gerrit.cloudera.org:8080/22353
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-02 20:04:12 +00:00
Riza Suminto
ea8f74a6ac IMPALA-13861: Standardize workload management tests
This patch standardizes tests against workload management tables
(sys.impala_query_log and sys.impala_query_live) to use a common
superclass named WorkloadManagementTestSuite. The setup_method of this
superclass waits for workload management init completion
(wait_for_wm_init_complete()), while the teardown_method waits until
impala-server.completed-queries.queued metric reaches
0 (wait_for_wm_idle()).

test_query_log.py and test_workload_mgmt_sql_details.py are refactored
to extend from WorkloadManagementTestSuite. Tests to assert the query
log table flush behavior are grouped together in TestQueryLogTableFlush.
test_workload_mgmt_sql_details.py::TestWorkloadManagementSQLDetails now
uses 1 minicluster instance for all tests.

test_workload_mgmt_init.py does not extend from
WorkloadManagementTestSuite because it tests cluster start and
restart scenarios. This patch only adds wait_for_wm_idle() in
teardown_method where it makes sense to do so.

test_query_live.py does not extend from WorkloadManagementTestSuite
because most of its test methods require a long
--query_log_write_interval_s so that DML queries from the workload
management worker do not disturb sys.impala_query_live.

The workload_mgmt parameter in CustomClusterTestSuite.with_args() is
standardized to set up the appropriate default flags in cluster_setup()
rather than passing it down to _start_impala_cluster():
IMPALAD_ARGS
  --enable_workload_mgmt=true --query_log_write_interval_s=1 \
  --shutdown_grace_period_s=0 --shutdown_deadline_s=60
and CATALOGD_ARGS
  --enable_workload_mgmt=true

Note that the IMPALAD_ARGS and CATALOGD_ARGS flags added by the
workload_mgmt and impalad_graceful_shutdown parameters can still be
overridden with different values by explicitly adding them to the
impalad_args and catalogd_args parameters. Setting workload_mgmt=True now
automatically enables graceful shutdown for the test, so
impalad_graceful_shutdown=True has been removed.
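
A hedged usage sketch of the standardized parameter (the module path, class
name, and override value below are assumptions made for the example; the
defaults applied by workload_mgmt=True are the flags listed above):

```
# Illustrative test using the standardized workload_mgmt parameter.
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite  # path assumed


class TestMyWorkloadMgmtFeature(CustomClusterTestSuite):

  @CustomClusterTestSuite.with_args(
      workload_mgmt=True,  # applies the default IMPALAD_ARGS / CATALOGD_ARGS above
      impalad_args="--query_log_write_interval_s=5")  # explicit override still wins
  def test_example(self):
    pass
```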

With the beeswax protocol deprecated, this patch also changes the
protocol under test from beeswax to hs2. TestQueryLogTableBeeswax is now
renamed to TestQueryLogTableBasic.

Additionally, print total wait time in wait_for_metric_value().

Testing:
- Run modified tests and pass.

Change-Id: Iecf6452fa963304e263805ebeb017c843d17dd16
Reviewed-on: http://gerrit.cloudera.org:8080/22617
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-21 22:31:11 +00:00
Riza Suminto
7ae19b069e IMPALA-13823: Clear existing entry of TMP_DIRS at cluster_setup
An assertion was hit in CustomClusterTestSuite.make_tmp_dir() when
attempting to create a tmp dir that already has a mapping in
CustomClusterTestSuite.TMP_DIRS. This can happen if different custom
cluster test runs declare the same tmp_dir_placeholders and one of the
earlier runs fails to tear down properly, resulting in clear_tmp_dirs()
not being called.

This patch fixes the issue by clearing the existing TMP_DIRS entry
without removing the underlying filesystem path. A WARN log is also
printed about the dirty entry in TMP_DIRS.

Testing:
- Manually commented out the clear_tmp_dirs() call in cluster_teardown()
  and ran TestQueryLogTableBufferPool.test_select. Confirmed that all 4 of
  its test runs complete, the warning logs are printed to
  logs/custom_cluster_tests/results/TEST-impala-custom-cluster.xml,
  and the tmp dir stays in logs/custom_cluster_tests/.

Change-Id: I3f528bb155eb3cf4dfa58a6a23feb438809556bc
Reviewed-on: http://gerrit.cloudera.org:8080/22572
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-07 00:56:29 +00:00
jasonmfehr
ea989dfb28 IMPALA-13815: Fix Flaky Workload Management Tests
The CustomClusterTestSuite.wait_for_wm_init() function checks for two
specific log lines to be logged by the catalog. The first line is
logged when workload management initialization is complete. The
second line is logged when a catalog topic update has been assembled.

However, if workload management initialization is slow, then there
may not be a catalog topic update assembled after the initialization
completes. When this happens, an assertion fails despite the workload
management tables having been properly initialized and loaded by the
catalog.

This patch simplifies the CustomClusterTestSuite.wait_for_wm_init()
function so it waits until the catalogd logs that it has completed
workload management initialization and then checks each coordinator's
local catalog cache for the workload management tables.

The following test suites passed locally and in an ASAN build. These
tests all call the 'wait_for_wm_init' function of
CustomClusterTestSuite.
  * tests/custom_cluster/test_query_live.py
  * tests/custom_cluster/test_query_log.py
  * tests/custom_cluster/test_workload_mgmt_init.py
  * tests/custom_cluster/test_workload_mgmt_sql_details.py

Change-Id: Ieb4c86fa79bb1df000b6241bdd31c7641d807c4f
Reviewed-on: http://gerrit.cloudera.org:8080/22570
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-03-05 14:33:32 +00:00
jasonmfehr
aac67a077e IMPALA-13201: System Table Queries Execute When Admission Queues are Full
Queries that run only against in-memory system tables are currently
subject to the same admission control process as all other queries.
Since these queries do not use any resources on executors, admission
control does not need to consider the state of executors when
deciding to admit these queries.

This change adds a boolean configuration option 'onlyCoordinators'
to the fair-scheduler.xml file for specifying that a request pool
applies only to the coordinators. When a query is submitted to a
coordinator-only request pool, no executors are required to be
running. Instead, all fragment instances are executed exclusively on
the coordinators.
A new member was added to the ClusterMembershipMgr::Snapshot struct
to hold the ExecutorGroup of all coordinators. This object is kept up
to date by processing statestore messages and is used when executing
queries that either require the coordinators (such as queries against
sys.impala_query_live) or that use an only coordinators request pool.

Testing was accomplished by:
1. Adding cluster membership manager ctests to assert cluster
   membership manager correctly builds the list of non-quiescing
   coordinators.
2. RequestPoolService JUnit tests to assert the new optional
   <onlyCoords> config in the fair scheduler xml file is correctly
   parsed.
3. ExecutorGroup ctests modified to assert the new function.
4. Custom cluster admission controller tests to assert that queries with
   a coordinator-only request pool run only on the active coordinators.

Change-Id: I5e0e64db92bdbf80f8b5bd85d001ffe4c8c9ffda
Reviewed-on: http://gerrit.cloudera.org:8080/22249
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-14 04:27:11 +00:00
jasonmfehr
d3b6cbcc20 IMPALA-13201: Remove Unused Parameter from Test retry Function
The custom cluster tests utilize the retry() function defined in
retry.py. This function takes as input another function to do the
assertions. This assertion function used to have a single boolean
parameter indicating if the retry was on its last attempt. In
actuality, this boolean was not used and thus caused flake8 failures.

This change removes this unused parameter from the assertion function
passed in to the retry function.

Change-Id: I1bce9417b603faea7233c70bde3816beed45539e
Reviewed-on: http://gerrit.cloudera.org:8080/22452
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-07 05:46:06 +00:00
Xuebin Su
d7ee509e93 IMPALA-12648: Add KILL QUERY statement
To support killing queries programmatically, this patch adds a new type
of SQL statement, the KILL QUERY statement, to cancel and unregister a
query on any coordinator in the cluster.

A KILL QUERY statement looks like
```
KILL QUERY '123:456';
```
where `123:456` is the query id of the query we want to kill. We follow
syntax from HIVE-17483. For backward compatibility, 'KILL' and 'QUERY'
are added as "unreserved keywords", like 'DEFAULT'. This allows the
three keywords to be used as identifiers.

A user is authorized to kill a query only if the user is an admin or is
the owner of the query. KILL QUERY statements are not affected by
admission control.

Implementation:

Since we don't know in advance which impalad is the coordinator of the
query we want to kill, we need to broadcast the kill request to all the
coordinators in the cluster. Upon receiving a kill request, each
coordinator checks whether it is the coordinator of the query:
- If yes, it cancels and unregisters the query,
- If no, it reports "Invalid or unknown query handle".

Currently, a KILL QUERY statement is not interruptible. IMPALA-13663 is
created for this.

For authorization, this patch adds a custom handler of
AuthorizationException for each statement to allow the exception to be
handled by the backend. This is because we don't know whether the user
is the owner of the query until we reach its coordinator.

To support cancelling child queries, this patch changes
ChildQuery::Cancel() to bypass the HS2 layer so that the session of the
child query will not be added to the connection used to execute the
KILL QUERY statement.

Testing:
- A new ParserTest case is added to test using "unreserved keywords" as
  identifiers.
- New E2E test cases are added for the KILL QUERY statement.
- Added a new dimension in TestCancellation to use the KILL QUERY
  statement.
- Added file tests/common/cluster_config.py and made
  CustomClusterTestSuite.with_args() composable so that common cluster
  configs can be reused in custom cluster tests.

Change-Id: If12d6e47b256b034ec444f17c7890aa3b40481c0
Reviewed-on: http://gerrit.cloudera.org:8080/21930
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2025-01-22 22:22:54 +00:00
jasonmfehr
cb35dc8769 IMPALA-13536: Workload Management Tests Failing on Init Check
Most of the workload management tests verify that the workload
management process has successfully completed. Part of this
verification ensures a catalog update has propagated the workload
management changes to the coordinators by determining the catalog
version, from the catalogd logs, that contains the workload
management table changes and ensuring that version is in the
coordinator logs.

The test flakiness occurs when multiple catalogd versions are
combined into a later version. Specifically, tests were failing
because the coordinator logs were checked for catalog version X but
the actual version in the coordinator logs was X+1.

The fix for the test flakiness is to allow for the expected catalog
version or any later version.

Change-Id: I9f20a149ab1f45ee3506f098f8594965a24a89d3
Reviewed-on: http://gerrit.cloudera.org:8080/22200
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-12-13 05:44:58 +00:00
jasonmfehr
490f90c65e IMPALA-13536: Fix Workload Management Init Tests Issues
Several problems with the workload management code and
test_workload_mgmt_init.py tests have been uncovered by the Ozone
tests.

* test_create_on_version_1_0_0 - Test comment said it ran on 10
      nodes, test configuration specified 1 node. Fix was to modify
      the test configuration.
* test_create_on_version_1_1_0 - Test comment said it ran on 10
      nodes, test configuration specified 1 node. Fix was to modify
      the test configuration.
* test_invalid_* - All four of these tests run the same internal
      function to execute the test. This internal function was not
      waiting long enough for the expected failure to appear. The
      fixed internal function waits longer for the expected failure.

Additionally, the @CustomClusterTestSuite annotation has a new option
named 'log_symlinks' which, if set to True, resolves all daemon log
symlinks and outputs their actual paths to the log. Failed tests can
then be easily traced to the exact log files for that test.
The existing workload management tests in testdata have been expanded
to also assert the expected table properties are present.

Modified tests passed on Ozone builds both with and without erasure
coding enabled.

Change-Id: Ie3f34088d1d925f30abb63471387e6fdb62b95a7
Reviewed-on: http://gerrit.cloudera.org:8080/22119
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-12-11 01:01:46 +00:00
Michael Smith
2085edbe1c IMPALA-13503: CustomClusterTestSuite for whole class
Allow using CustomClusterTestSuite with a single cluster for the whole
class. This speeds up tests by letting us group together multiple test
cases on the same cluster configuration and only starting the cluster
once.

Updates tuple cache tests as an example of how this can be used. Reduces
test_tuple_cache execution time from 100s to 60s.

Change-Id: I7a08694edcf8cc340d89a0fb33beb8229163b356
Reviewed-on: http://gerrit.cloudera.org:8080/22006
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-18 23:57:39 +00:00
Riza Suminto
95f353ac4a IMPALA-13507: Allow disabling glog buffering via with_args fixture
We have plenty of custom_cluster tests that assert against the content of
Impala daemon log files while the process is still running, using
assert_log_contains() and its wrappers. The method's documentation
specifically mentions disabling glog buffering ('-logbuflevel=-1'), but
not all custom_cluster tests do that. This often results in flaky tests
that are hard to triage and often neglected if they do not frequently run
in core exploration.

This patch adds a boolean param 'disable_log_buffering' to
CustomClusterTestSuite.with_args for tests to declare the intention to
inspect log files in a live minicluster. If it is True, the minicluster
is started with '-logbuflevel=-1' for all daemons. If it is False, a
WARNING is logged on any call to assert_log_contains().
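
A minimal sketch of the behavior described above (illustrative only; the real
logic lives inside CustomClusterTestSuite and may differ in detail):

```
# Illustrative sketch of how disable_log_buffering could influence startup args
# and log assertions.
import logging

LOG = logging.getLogger(__name__)


def daemon_args_for_test(base_args, disable_log_buffering):
  """Append -logbuflevel=-1 so glog flushes lines immediately when requested."""
  args = list(base_args)
  if disable_log_buffering:
    args.append("-logbuflevel=-1")
  return args


def check_log_assertion_allowed(disable_log_buffering):
  """Warn when a log assertion runs against a cluster with buffered glog output."""
  if not disable_log_buffering:
    LOG.warning("assert_log_contains() called without disable_log_buffering=True; "
                "the expected log line may not be flushed yet")
```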

There are several complex custom_cluster tests that are left unchanged
and print out such WARNING logs, such as:
- TestQueryLive
- TestQueryLogTableBeeswax
- TestQueryLogOtherTable
- TestQueryLogTableHS2
- TestQueryLogTableAll
- TestQueryLogTableBufferPool
- TestStatestoreRpcErrors
- TestWorkloadManagementInitWait
- TestWorkloadManagementSQLDetails

This patch also fixed some small flake8 issues on modified tests.

There is a sign of flakiness in test_query_live.py where a test query is
submitted to the coordinator and fails because the sys.impala_query_live
table does not exist yet from the coordinator's perspective. This patch
modifies test_query_live.py to wait for a few seconds until
sys.impala_query_live is queryable.

Testing:
- Pass custom_cluster tests in exhaustive exploration.

Change-Id: I56fb1746b8f3cea9f3db3514a86a526dffb44a61
Reviewed-on: http://gerrit.cloudera.org:8080/22015
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-05 04:49:05 +00:00
jasonmfehr
7b6ccc644b IMPALA-12737: Query columns in workload management tables.
Adds "Select Columns", "Where Columns", "Join Columns", "Aggregate
Columns", and "OrderBy Columns" to the query profile and the workload
management active/completed queries tables. These fields are
presented as comma separate strings containing the fully qualified
column name in the format database.table_name.column_name. Aggregate
columns include all columns in the order by and having clauses.

Since new columns are being added, the workload management init
process is also being modified to allow for one-way upgrades of the
table schemas if necessary.  Additionally, workload management can be
set up to run under a schema version that is not the latest. This
ability will be useful during troubleshooting. To enable these
upgrades, the workload management initialization that manages the
structure of the tables has been moved to the catalogd.

The changes in this patch must be backwards compatible so that Impala
clusters running previous workload management code can co-exist with
Impala clusters running this workload management code. To enable that
backwards compatibility, a new table property named
'wm_schema_version' is now used to track the schema version of the
workload management tables. Thus, the old property 'schema_version'
will always be set to '1.0.0' since modifying that property value
causes Impala running previous workload management code to error at
startup.

Testing accomplished by
* Adding/updating workload and custom cluster tests to assert the new
  columns and the workload management upgrade process.
* JUnit tests added to verify the new workload management columns are
  being correctly parsed.
* GTests added to ensure the workload management columns are
  correctly defined and in the correct order.

Change-Id: I78f3670b067c0c192ee8a212fba95466fbcb51d7
Reviewed-on: http://gerrit.cloudera.org:8080/21142
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2024-10-31 17:06:43 +00:00
Riza Suminto
9c87cf41bf IMPALA-13396: Unify tmp dir management in CustomClusterTestSuite
There are many custom cluster tests that require creating a temporary
directory. The temporary directory typically lives within the scope of a
test method and is cleaned up afterwards. However, some tests create
temporary directories directly and forget to clean them up afterwards,
leaving junk dirs under /tmp/ or $LOG_DIR.

This patch unifies the temporary directory management inside
CustomClusterTestSuite. It introduces a new 'tmp_dir_placeholders' arg in
CustomClusterTestSuite.with_args() that lists the tmp dirs to create.
'impalad_args', 'catalogd_args', and 'impala_log_dir' now accept a
formatting pattern that is replaced by a temporary dir path, defined
through 'tmp_dir_placeholders'.
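
A hedged usage sketch (the placeholder substitution syntax "{scratch_dir}", the
flag value, and the module path below are assumptions made for the example):

```
# Illustrative use of tmp_dir_placeholders with a formatting pattern.
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite  # path assumed


class TestScratchDirCleanup(CustomClusterTestSuite):

  @CustomClusterTestSuite.with_args(
      tmp_dir_placeholders=["scratch_dir"],
      impalad_args="--scratch_dirs={scratch_dir}")
  def test_spill(self):
    pass
```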

There are a few occurrences where mkdtemp is called and is not
replaceable by this work, such as tests/comparison/cluster.py. In that
case, this patch changes them to supply a prefix arg so that developers
know the directory comes from an Impala test script.

This patch also addressed several flake8 errors in modified files.

Testing:
- Pass custom cluster tests in exhaustive mode.
- Manually run few modified tests and observe that the temporary dirs
  are created and removed under logs/custom_cluster_tests/ as the tests
  go.

Change-Id: I8dd665e8028b3f03e5e33d572c5e188f85c3bdf5
Reviewed-on: http://gerrit.cloudera.org:8080/21836
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-02 01:25:39 +00:00
stiga-huang
fcee022e60 IMPALA-13208: Add cluster id to the membership and request-queue topic names
To share catalogd and statestore across Impala clusters, this adds the
cluster id to the membership and request-queue topic names. Impalads are
then only visible to each other inside the same cluster, i.e. when using
the same cluster id. Note that impalads still subscribe to the same
catalog-update topic so they can share the same catalog service.
If the cluster id is empty, the original topic names are used.

This also adds the non-empty cluster id as the prefix of the statestore
subscriber id for impalad and admissiond.

Tests:
 - Add custom cluster test
 - Ran exhaustive tests

Change-Id: I2ff41539f568ef03c0ee2284762b4116b313d90f
Reviewed-on: http://gerrit.cloudera.org:8080/21573
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-07-18 03:38:27 +00:00
Michael Smith
3b35ddc8ca IMPALA-13051: Speed up, refactor query log tests
Sets faster default shutdown_grace_period_s and shutdown_deadline_s when
impalad_graceful_shutdown=True in tests. Impala waits until the grace
period has passed and all queries are stopped (or the deadline is
exceeded) before flushing the query log, so a grace period of 0 is
sufficient. Adds them in setup_method to reduce duplication in test
declarations.

Re-uses TQueryTableColumn Thrift definitions for testing.

Moves waiting for query log table to exist to setup_method rather than
as a side-effect of get_client.

Refactors workload management code to reduce if-clause nesting.

Adds functional query workload tests for both the sys.impala_query_log
and the sys.impala_query_live tables to assert the names and order of
the individual columns within each table.

Renames the python tests for the sys.impala_query_log table removing the
unnecessary "_query_log_table_" string from the name of each test.

Change-Id: I1127ef041a3e024bf2b262767d56ec5f29bf3855
Reviewed-on: http://gerrit.cloudera.org:8080/21358
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2024-05-13 22:46:42 +00:00
jasonmfehr
711a9f2bad IMPALA-12426: Query History Table
Adds the ability for users to specify that Impala will create and
maintain an internal Iceberg table that contains data about all
completed queries. This table is automatically created at startup by
each coordinator if it does not exist. Then, most completed queries are
queued in memory and flushed to the query history table at a set
interval (either minutes or number of records). Set, use, and show
queries are not written to this table. This commit leverages the
InternalServer class to maintain the query history table.

Ctest unit tests have been added to assert the various pieces of code.
New custom cluster tests have been added to assert the query history
table is properly populated with completed queries.

Negative testing consists of attempting sql injection attacks and
syntactically incorrect queries.

Impala built-in string functions benchmarks have been updated to include
the new built-in functions.

Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844
Reviewed-on: http://gerrit.cloudera.org:8080/20770
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-03-19 22:17:16 +00:00
Riza Suminto
3381fbf761 IMPALA-12595: Allow automatic removal of old logs from previous PID
IMPALA-11184 added code to target a specific PID for log rotation. This
aligns with glog behavior and provides safety; that is, it strictly
limits log rotation to only consider log files made by the currently
running impalad and excludes logs made by a previous PID or other living
co-located impalads. The downside of this limit is that logs can start
to accumulate on a node when impalad is frequently restarted, and this is
only resolvable by an admin manually removing logs.

To help avoid this manual removal, this patch adds a backend flag
'log_rotation_match_pid'; when it is False, the limit is relaxed by
dropping the PID from the glob pattern. The default value for this new
flag is False. However, for testing purposes, start-impala-cluster.py
will override it to True since the test minicluster logs to a common log
directory. Setting 'log_rotation_match_pid' to True prevents one impalad
from interfering with the log rotation of another impalad in the
minicluster.

As a minimum exercise for this new log rotation behavior,
test_breakpad.py::TestLogging is modified to invoke
start-impala-cluster.py with 'log_rotation_match_pid' set to False.

Testing:
- Add test_excessive_cerr_ignore_pid and test_excessive_cerr_match_pid.
- Split TestLogging into two. One run test_excessive_cerr_ignore_pid in
  core exploration, while the other run the rest of logging tests in
  exhaustive exploration.
- Pass exhaustive tests.

Change-Id: I599799e73f27f941a1d7f3dec0f40b4f05ea5ceb
Reviewed-on: http://gerrit.cloudera.org:8080/20754
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-12-09 03:34:57 +00:00
wzhou-code
819db8fa46 IMPALA-12155: Support High Availability for CatalogD
To support catalog HA, we allow two catalogd instances in an Active-
Passive HA pair to be added to an Impala cluster.
We add a preemptive behavior for catalogd. When enabled, the
preemptive behavior allows the catalogd with the higher priority to
become active while the paired catalogd becomes standby. The active
catalogd acts as the source of metadata and provides the catalog
service for the Impala cluster.

To enable catalog HA for a cluster, the two catalogds in the HA pair and
the statestore must be started with the startup flag "enable_catalogd_ha".

The catalogd in an Active-Passive HA pair can be assigned an instance
priority value to indicate a preference for which catalogd should assume
the active role. The registration ID, which is assigned by the
statestore, can be used as the instance priority value. A lower numerical
value in the registration ID corresponds to a higher priority. The
catalogd with the higher priority is designated as active; the other
catalogd is designated as standby. Only the active catalogd propagates
the IMPALA_CATALOG_TOPIC to the cluster. This guarantees only one writer
for the IMPALA_CATALOG_TOPIC in an Impala cluster.

The statestore, which is the registration center of an Impala cluster,
assigns the roles for the catalogds in the HA pair after both catalogds
register with the statestore. When the statestore detects that the active
catalogd is not healthy, it fails over the catalog service to the standby
catalogd. When a failover occurs, the statestore sends notifications with
the address of the active catalogd to all coordinators and catalogds in
the cluster. The events are logged in the statestore and catalogd logs.
When the catalogd with the higher priority recovers from a failure, the
statestore does not resume it as active, to avoid flip-flopping between
the two catalogds.

To make a specific catalogd in the HA pair the active instance, the
catalogd must be started with the startup flag "force_catalogd_active"
so that it will be assigned the active role when it registers with the
statestore. This allows administrators to manually perform catalog
service failover.

Added option "--enable_catalogd_ha" in bin/start-impala-cluster.py.
If the option is specified when running the script, the script will
create an Impala cluster with two catalogd instances in an HA pair.

Testing:
 - Passed the core tests.
 - Added unit-test for auto failover and manual failover.

Change-Id: I68ce7e57014e2a01133aede7853a212d90688ddd
Reviewed-on: http://gerrit.cloudera.org:8080/19914
Reviewed-by: Xiang Yang <yx91490@126.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
2023-06-21 14:02:55 +00:00
Joe McDonnell
0c7c6a335e IMPALA-11977: Fix Python 3 broken imports and object model differences
Python 3 changed some object model methods:
 - __nonzero__ was removed in favor of __bool__
 - func_dict / func_name were removed in favor of __dict__ / __name__
 - The next() function was deprecated in favor of __next__
   (Code locations should use next(iter) rather than iter.next())
 - metaclasses are specified a different way
 - Locations that specify __eq__ should also specify __hash__ (see the
   compatibility sketch after this list)
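
A minimal sketch of the Python 2/3 compatible object model patterns named above
(class and attribute names are made up for the example):

```
# Illustrative Python 2/3 compatible class: __bool__/__nonzero__ aliasing and
# keeping __hash__ alongside __eq__.
class QueryHandle(object):
  def __init__(self, query_id):
    self.query_id = query_id

  def __bool__(self):          # Python 3 truthiness hook
    return bool(self.query_id)
  __nonzero__ = __bool__       # Python 2 spelling of the same hook

  def __eq__(self, other):
    return isinstance(other, QueryHandle) and self.query_id == other.query_id

  def __hash__(self):          # __eq__ without __hash__ breaks hashing in Python 3
    return hash(self.query_id)
```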

Python 3 also moved some packages around (urllib2, Queue, httplib,
etc), and this adapts the code to use the new locations (usually
handled on Python 2 via future). This also fixes the code to
avoid referencing exception variables outside the exception block
and variables outside of a comprehension. Several of these seem
like false positives, but it is better to avoid the warning.

This fixes these pylint warnings:
bad-python3-import
eq-without-hash
metaclass-assignment
next-method-called
nonzero-method
exception-escape
comprehension-escape

Testing:
 - Ran core tests
 - Ran release exhaustive tests

Change-Id: I988ae6c139142678b0d40f1f4170b892eabf25ee
Reviewed-on: http://gerrit.cloudera.org:8080/19592
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
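
A minimal example of the future imports and of using the integer division
operator where an integer result is needed:

```
# Example of the __future__ imports added to every file and of using '//' where
# an integer result is required (e.g. indices or record counts).
from __future__ import absolute_import, division, print_function


def middle_index(num_rows):
  # '/' is true division under Python 3 (and under Python 2 with the import
  # above), so use '//' when an integer index is needed.
  return num_rows // 2


print(middle_index(7))  # 3
```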

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Michael Smith
88d49b6919 IMPALA-11693: Enable allow_erasure_coded_files by default
Enables allow_erasure_coded_files by default as we've now completed all
planned work to support it.

Testing
- Ran HDFS+EC test suite
- Ran Ozone+EC test suite

Change-Id: I0cfef087f2a7ae0889f47e85c5fab61a795d8fd4
Reviewed-on: http://gerrit.cloudera.org:8080/19362
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-01-31 16:53:46 +00:00
stiga-huang
77d80aeda6 IMPALA-11812: Deduplicate column schema in hmsPartitions
A list of HMS Partitions will be created in many workloads in catalogd,
e.g. table loading, bulk altering of partitions by ComputeStats or
AlterTableRecoverPartitions, etc. Currently, each hmsPartition holds its
own list of column schemas, i.e. a List<FieldSchema>. This results in
lots of FieldSchema instances if the table is wide and lots of
partitions need to be loaded/operated on. Though the strings of column
names and comments are interned, the FieldSchema objects can still
occupy the majority of the heap. See the histogram in the JIRA
description.

In reality, the hmsPartition instances of a table can share the
table-level column schema since Impala doesn't respect the
partition-level schema.

This patch replaces the column list in the StorageDescriptor of
hmsPartitions with the table-level column list to remove the duplication.
It also adds some progress logs to batch HMS operations and avoids
misleading logs when the event-processor is disabled.

Tests:
- Ran exhaustive tests
- Add tests on wide table operations that hit OOM errors without this
  fix.

Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d
Reviewed-on: http://gerrit.cloudera.org:8080/19391
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-01-01 04:38:36 +00:00
Fang-Yu Rao
db1cac2a49 IMPALA-10399, IMPALA-11060, IMPALA-11788: Reset Ranger policy repository in an E2E test
test_show_grant_hive_privilege() uses Ranger's REST API to get all the
existing policies from the Ranger server after creating a policy that
grants the LOCK and SELECT privileges on all the tables and columns in
the unique database, in order to verify that the granted privileges
indeed exist in Ranger's policy repository.

The way we download all the policies from the Ranger server in
test_show_grant_hive_privilege(), however, did not always work.
Specifically, when there were already a lot of existing policies in
Ranger, the policy that granted the LOCK and SELECT privileges would not
be included in the result returned by a single GET request. We found
that to reproduce the issue it sufficed to add 300 Ranger policies
before adding the policy granting those 2 privileges.

Moreover, we found that even when we set the argument 'stream' of
requests.get() to True and used iter_content() to read the response in
chunks, we still could not retrieve the policy added in
test_show_grant_hive_privilege().
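
For context, the streamed download attempt looked roughly like the sketch
below; the URL, endpoint, and credentials are placeholders, not the test's
actual values:

  import requests

  resp = requests.get("http://localhost:6080/service/public/v2/api/policy",
                      auth=("admin", "admin"), stream=True)
  chunks = []
  for chunk in resp.iter_content(chunk_size=8192):
    chunks.append(chunk)
  policies = b"".join(chunks)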

As a workaround, instead of changing how we download all the policies
from the Ranger server, this patch resets Ranger's policy repository for
Impala before we create the policy granting those 2 privileges so that
this test will be more resilient to the number of existing policies in
the repository.

Change-Id: Iff56ec03ceeb2912039241ea302f4bb8948d61f8
Reviewed-on: http://gerrit.cloudera.org:8080/19373
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
2022-12-28 01:48:26 +00:00
Michael Smith
35dc24fbc8 IMPALA-10148: Cleanup cores in TestHooksStartupFail
Generalizes the coredump cleanup and expected-startup-failure handling
from test_provider.py and uses it in test_query_event_hooks.py's
TestHooksStartupFail to ensure core dumps are cleaned up.
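
A minimal sketch of what such a cleanup helper can look like (names and
location are assumptions, not the actual code in test_provider.py):

  import glob
  import os

  def cleanup_coredumps(core_dir):
    # Remove core files left behind by an intentionally failed daemon start
    # so that later tests see a clean directory.
    for path in glob.glob(os.path.join(core_dir, "core*")):
      os.remove(path)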

Testing: ran the changed tests and observed core files being created and
cleaned up while they ran. Observed that other core files already present
were not cleaned up, as expected.

Change-Id: Iec32e0acbadd65aa78264594c85ffcd574cf3458
Reviewed-on: http://gerrit.cloudera.org:8080/19103
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-10-13 16:20:34 +00:00
Bikramjeet Vig
06c9016a37 IMPALA-8762: Track host level admission stats across all coordinators
This patch adds the ability to share the per-host stats for locally
admitted queries across all coordinators. This helps to get a more
consolidated view of the cluster for stats like slots_in_use and
mem_admitted when making local admission decisions.

Testing:
Added e2e py test

Change-Id: I2946832e0a89b077d0f3bec755e4672be2088243
Reviewed-on: http://gerrit.cloudera.org:8080/17683
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-07-28 05:33:16 +00:00
Thomas Tauber-Marshall
91adb33b22 IMPALA-9975 (part 2): Introduce new admission control daemon
A recent patch (IMPALA-9930) introduced a new admission control RPC
service, which can be configured to perform admission control for
coordinators. In that patch, the admission service runs in an impalad.

This patch separates the service out to run in a new daemon, called
the admissiond. It also integrates this new daemon with the build
infrastructure around Docker.

Some notable changes:
- Adds a new class, AdmissiondEnv, which performs the same function
  for the admissiond as ExecEnv does for impalads.
- The '/admission' HTTP endpoint is exposed on the admissiond's webui
  if the admission control service is in use; otherwise it is exposed
  on the coordinator impalads' webuis.
- start-impala-cluster.py takes a new flag --enable_admission_service,
  which configures the minicluster to have an admissiond with all
  coordinators using it for admission control (a usage sketch follows
  this list).
- Coordinators are now configured to use the admission service by
  specifying the startup flag --admission_service_host. This is
  intended to mirror the configuration of the statestored/catalogd
  location.
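
A usage sketch (the Python wrapper below is illustrative; only the flag
itself comes from this patch):

  import subprocess

  # Start a 3-node minicluster whose coordinators delegate admission
  # control to a separate admissiond via the new flag.
  subprocess.check_call([
      "bin/start-impala-cluster.py",
      "--cluster_size=3",
      "--enable_admission_service",
  ])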

Testing:
- Existing tests for the admission control service are modified to run
  with an admissiond.
- Manually ran start-impala-cluster.py with --enable_admission_service
  and --docker_network to verify Docker integration.

Change-Id: Id677814b31e9193035e8cf0d08aba0ce388a0ad9
Reviewed-on: http://gerrit.cloudera.org:8080/16891
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-13 06:03:37 +00:00
Fang-Yu Rao
34668fab87 IMPALA-10092: Do not skip test vectors of Kudu tests in a custom cluster
We found that the following 4 tests do not run even when we remove all
the decorators like "@SkipIfKudu.no_hybrid_clock" or
"@SkipIfHive3.kudu_hms_notifications_not_supported" that skip the tests.
This is because the 3 classes containing them inherit from
CustomClusterTestSuite, which adds a constraint that only allows test
vectors with 'file_format' and 'compression_codec' being "text" and
"none", respectively, to be run.

1. TestKuduOperations::test_local_tz_conversion_ops
2. TestKuduClientTimeout::test_impalad_timeout
3. TestKuduHMSIntegration::test_create_managed_kudu_tables
4. TestKuduHMSIntegration::test_kudu_alter_table

To address this issue, this patch creates a parent class for those
3 classes above and overrides add_custom_cluster_constraints() in this
newly created parent class so that we do not skip test vectors with
'file_format' and 'compression_codec' being "kudu" and "none",
respectively.
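
A rough sketch of such a parent class (the class name and constraint code
below are approximations, not copied from the patch):

  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  class CustomKuduTest(CustomClusterTestSuite):
    @classmethod
    def add_custom_cluster_constraints(cls):
      # Unlike the base class, keep test vectors whose file_format is
      # 'kudu' (with 'none' compression) instead of restricting the
      # matrix to 'text'/'none'.
      cls.ImpalaTestMatrix.add_constraint(lambda v:
          v.get_value('table_format').file_format == 'kudu' and
          v.get_value('table_format').compression_codec == 'none')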

In addition, this patch removes a redundant method call to
super(CustomClusterTestSuite, cls).add_test_dimensions() in
CustomClusterTestSuite.add_custom_cluster_constraints() since
super(CustomClusterTestSuite, cls).add_test_dimensions() had
already been called immediately before the call to
add_custom_cluster_constraints() in
CustomClusterTestSuite.add_test_dimensions().

Testing:
 - Manually verified that after removing the decorators to skip those
   tests, those tests could be run.

Change-Id: I60a4bd4ac5a9026629fb840ab9cc7b5f9948290c
Reviewed-on: http://gerrit.cloudera.org:8080/16348
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-28 01:37:16 +00:00
Joe McDonnell
3e76da9f51 IMPALA-9708: Remove Sentry support
Impala 4 decided to drop Sentry support in favor of Ranger. This
removes Sentry support and related tests. It retires startup
flags related to Sentry and does the first round of removing
obsolete code. This does not adjust documentation to remove
references to Sentry, and other dead code will be removed
separately.

Some issues came up when implementing this. Here is a summary
of how this patch resolves them:
1. authorization_provider currently defaults to "sentry", but
   "ranger" requires extra parameters to be set. This changes the
   default value of authorization_provider to "", which translates
   internally to the noop policy that does no authorization.
2. These flags are Sentry specific and are now retired:
 - authorization_policy_provider_class
 - sentry_catalog_polling_frequency_s
 - sentry_config
3. The authorization_factory_class may be obsolete now that
   there is only one authorization policy, but this leaves it
   in place.
4. Sentry is the last component using CDH_COMPONENTS_HOME, so
   that is removed. There are still Maven dependencies coming
   from the CDH_BUILD_NUMBER repository, so that is not removed.
5. To make the transition easier, testdata/bin/kill-sentry-service.sh
   is not removed and it is still called from testdata/bin/kill-all.sh.

Testing:
 - Core job passes

Change-Id: I8e99c15936d6d250cf258e3a1dcba11d3eb4661e
Reviewed-on: http://gerrit.cloudera.org:8080/15833
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-20 17:43:40 +00:00
stiga-huang
b6b31e4cc4 IMPALA-9071: Handle translated external HDFS table in CTAS
After upgrading Hive-3 to a version containing HIVE-22158, managed
tables are no longer allowed to be non-transactional. Creating non-ACID
tables will instead result in creating an external table with the table
property 'external.table.purge' set to true.

In Hive-3, external HDFS tables are located under
'metastore.warehouse.external.dir' by default if it's set. This property
was added by HIVE-19837 in Hive 2.7, but hasn't been added to Hive in
cdh6 yet.

In a CTAS statement, we create a temporary HMS table for the analysis
of the Insert part. The table path is created assuming it's a managed
table, and the Insert part will use this path for insertion. However, in
Hive-3, the created table is translated to an external table, so it's
not the same as the one we passed to the HMS API. The created table is
located in 'metastore.warehouse.external.dir', while the table path we
assumed is in 'metastore.warehouse.dir'. This introduces bugs when these
two properties are different: the CTAS statement will create the table
in one place and insert data in another.

This patch adds a new method in MetastoreShim to wrap the difference
between Hive-2 and Hive-3 in getting the default table path for
non-transactional tables.

Changes in the infra:
 - To support customizing hive configuration, add an env var,
   CUSTOM_CLASSPATH in bin/set-classpath.sh to be put in front of
   existing CLASSPATH. The customized hive-site.xml should be put inside
   CUSTOM_CLASSPATH.
 - Change hive-site.xml.py to generate a hive-site.xml with non default
   'metastore.warehouse.external.dir'
 - Add an option, --env_vars, in bin/start-impala-cluster.py to pass
   down CUSTOM_CLASSPATH.

Tests:
 - Add a custom cluster test to start Hive with
   metastore.warehouse.external.dir set to a non-default value. Run
   it locally using CDP components with HIVE-22158. xfail the test until
   we bump CDP_BUILD_NUMBER to 1507246.
 - Run CORE tests using CDH components

Change-Id: I460a57dc877ef68ad7dd0864a33b1599b1e9a8d9
Reviewed-on: http://gerrit.cloudera.org:8080/14527
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2019-10-24 22:10:03 +00:00
Tim Armstrong
4fb8e8e324 IMPALA-8816: reduce custom cluster test runtime in core
This includes some optimisations and a bulk move of tests
to exhaustive.

Move a bunch of custom cluster tests to exhaustive. I selected
these partially based on runtime (i.e. I looked most carefully
at the tests that ran for over a minute) and the likelihood
of them catching a precommit bug.  Regression tests for specific
edge cases and tests for parts of the code that are very stable
were prime candidates.

Remove an unnecessary cluster restart in test_breakpad.

Merge test_scheduler_error into test_failpoints to avoid an unnecessary
cluster restart.

Speed up cluster starts by ensuring that the default statestore args are
applied even when _start_impala_cluster() is called directly. This
shaves a couple of seconds off each restart. We made the default args
use a faster update frequency - see IMPALA-7185 - but they did not
take effect in all tests.

Change-Id: Ib2e3e7ebc9695baec4d69183387259958df10f62
Reviewed-on: http://gerrit.cloudera.org:8080/13967
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-06 21:34:26 +00:00
Lars Volker
2397ae5590 IMPALA-8484: Run queries on disjoint executor groups
This change adds support for running queries inside a single admission
control pool on one of several disjoint sets of executors called
"executor groups".

Executors can be configured with an executor group through the newly
added '--executor_groups' flag. Note that in anticipation of future
changes, the flag already uses the plural form, but only a single
executor group may be specified for now. Each executor group
specification can optionally contain a minimum size, separated by a
':', e.g. --executor_groups default-pool-1:3. Only when the cluster
membership contains at least that number of executors for the group
will it be considered for admission.

Executor groups are mapped to resource pools by their name: an executor
group can service queries from a resource pool if the pool name is a
prefix of the group name separated by a '-'. For example, queries in
pool poolA can be serviced by executor groups named poolA-1 and poolA-2,
but not by groups named foo or poolB-1.
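
The matching rule can be summarized with a small illustrative sketch (not
the backend implementation):

  def group_serves_pool(group_name, pool_name):
    # A group named "<pool>-<suffix>" services queries from pool "<pool>".
    return group_name.startswith(pool_name + "-")

  assert group_serves_pool("poolA-1", "poolA")
  assert not group_serves_pool("poolB-1", "poolA")
  assert not group_serves_pool("foo", "poolA")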

During scheduling, executor groups are considered in alphabetical order.
This means that one group is filled up entirely before a subsequent
group is considered for admission. Groups also need to pass a health
check before being considered. In particular, they must contain at least
the minimum number of executors specified.

If no group is specified during startup, executors are added to the
default executor group. If - during admission - no executor group for a
pool can be found and the default group is non-empty, then the default
group is considered. The default group does not have a minimum size.

This change inverts the order of scheduling and admission. Prior to this
change, queries were scheduled before submitting them to the admission
controller. Now the admission controller computes schedules for all
candidate executor groups before each admission attempt. If the cluster
membership has not changed, then the schedules of the previous attempt
will be reused. This means that queries will no longer fail if the
cluster membership changes while they are queued in the admission
controller.

This change also alters the default behavior when using a dedicated
coordinator and no executors have registered yet. Prior to this change,
a query would fail immediately with an error ("No executors registered
in group"). Now a query will get queued and wait until executors show
up, or it times out after the pool's queue timeout period.

Testing:

This change adds a new custom cluster test for executor groups. It
makes use of new capabilities added to start-impala-cluster.py to bring
up additional executors into an already running cluster.

Additionally, this change adds an instructional implementation of
executor group based autoscaling, which can be used during development.
It also adds a helper to run queries concurrently. Both are used in a
new test to exercise the executor group logic and to prevent regressions
to these tools.

In addition to these tests, the existing tests for the admission
controller (both BE and EE tests) thoroughly exercise the changed code.
Some of them required changes themselves to reflect the new behavior.

I looped the new tests (test_executor_groups and test_auto_scaling) for
a night (110 iterations each) without any issues.

I also started an autoscaling cluster with a single group and ran
TPC-DS, TPC-H, and test_queries on it successfully.

Known limitations:

When using executor groups, only a single coordinator and a single AC
pool (i.e. the default pool) are supported. Executors do not include the
number of currently running queries in their statestore updates and so
admission controllers are not aware of the number of queries admitted by
other controllers per host.

Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b
Reviewed-on: http://gerrit.cloudera.org:8080/13550
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-07-21 04:54:03 +00:00
Fredy Wijaya
1b531be9be IMPALA-8589: Re-enable flaky test_query_event_hooks.py
This patch fixes the flaky test_query_event_hooks.py. The patch also
cuts down the waiting time for impalad timeout from the default 60
seconds to 5 seconds, especially for those tests that expect Impala
startup to fail.

Testing:
- Ran test_query_event_hooks.py in a loop.

Change-Id: Ia64550e986b5eba59a1d77657943932bb977d470
Reviewed-on: http://gerrit.cloudera.org:8080/13713
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-26 00:13:44 +00:00
Todd Lipcon
800f635855 IMPALA-8667. Remove --pull_incremental_stats flag
This flag was added as a "chicken bit" -- so we could disable the new
feature if we had some problems with it. It's been out in the wild for a
number of months and we haven't seen any such problems, so at this point
let's stop maintaining the old code path.

Change-Id: I8878fcd8a2462963c7db3183a003bb9816dda8f9
Reviewed-on: http://gerrit.cloudera.org:8080/13671
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-19 01:07:00 +00:00
Hao Hao
6bb404dc35 IMPALA-8504 (part 2): Support CREATE TABLE statement with Kudu/HMS integration
This commit supports the actual handling of CREATE TABLE DDL for managed
Kudu tables when integration with the Hive Metastore is enabled. With
Kudu/HMS integration enabled, Impala can rely on Kudu to create the
table in the HMS for a CREATE TABLE statement.

Change-Id: Icffe412395f47f5e07d97bad457020770cfa7502
Reviewed-on: http://gerrit.cloudera.org:8080/13375
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
Reviewed-by: Grant Henke <granthenke@apache.org>
Tested-by: Thomas Marshall <tmarshall@cloudera.com>
2019-06-04 17:36:59 +00:00
Michael Ho
2ece4c9b2e IMPALA-8341: Data cache for remote reads
This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on the
configuration string, which is a list of directories, separated by
commas, followed by the storage capacity per directory.
An example configuration string looks like the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.
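
An illustrative sketch of how such a configuration string breaks down (not
the backend parsing code):

  def parse_data_cache_config(config):
    # "/data/0,/data/1:150GB" -> (["/data/0", "/data/1"], "150GB"); each
    # listed directory may hold up to the given capacity.
    dirs, capacity = config.rsplit(":", 1)
    return dirs.split(","), capacity

  assert parse_data_cache_config("/data/0,/data/1:150GB") == \
      (["/data/0", "/data/1"], "150GB")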

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse. When
the number of backing files per partition exceeds,
--data_cache_max_files_per_partition, files are deleted in the order
in which they are created. Stale cache entries referencing deleted
files are erased lazily or evicted due to inactivity.

Optionally, checksumming can be enabled to verify that a read from the
cache is consistent with what was inserted and that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
to insert data into the cache, the interface Store() is used. Please note
that eviction currently happens inline during Store().

This patch also adds two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad
creates its caching directory, and '--data_cache_size' specifies the
capacity string for each cache directory.

Testing done:
- added a new BE and EE test
- exhaustive (debug, release) builds with cache enabled
- core ASAN build with cache enabled

Perf:
- 16-stream TPCDS at 3TB on a 20-node S3 cluster shows about 30% improvement
over runs without the cache, with a cache size of 150GB per node.
The performance is at parity with a configuration of an HDFS cluster using
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Reviewed-on: http://gerrit.cloudera.org:8080/12987
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-03 19:39:42 +00:00
Tim Armstrong
d820952d86 IMPALA-8469: admit_mem_limit for dedicated coordinator
Refactored to avoid the code duplication that resulted in this bug:
* admit_mem_limit is calculated once in ExecEnv
* The local backend descriptor is always constructed with
  a static helper: Scheduler::BuildLocalBackendDescriptor()

I chose to factor it in this way, in part, to avoid invasive
changes to scheduler-test, which currently doesn't depend on
ExecEnv or ImpalaServer.

Testing:
Added basic test that reproduces the bug.

Change-Id: Iaceb21b753b9b021bedc4187c0d44aaa6a626521
Reviewed-on: http://gerrit.cloudera.org:8080/13180
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-01 00:37:04 +00:00
Tim Armstrong
2ca7f8e7c0 IMPALA-7995: part 1: fixes for e2e dockerised impala tests
This fixes all core e2e tests running on my local dockerised
minicluster build. I do not yet have a CI job or script running
but I wanted to get feedback on these changes sooner. The second
part of the change will include the CI script and any follow-on
fixes required for the exhaustive tests.

The following fixes were required:
* Detect docker_network from TEST_START_CLUSTER_ARGS
* get_webserver_port() does not depend on the caller passing in
  the default webserver port. It failed previously because it
  relied on start-impala-cluster.py setting -webserver_port
  for *all* processes.
* Add SkipIf markers for tests that don't make sense or are
  non-trivial to fix for containerised Impala.
* Support loading Impala-lzo plugin from host for tests that depend on
  it.
* Fix some tests that had 'localhost' hardcoded - instead it should
  be $INTERNAL_LISTEN_HOST, which defaults to localhost.
* Fix bug with sorting impala daemons by backend port, which is
  the same for all dockerised impalads.

Testing:
I ran tests locally as follows after having set up a docker network and
starting other services:

  ./buildall.sh -noclean -notests -ninja
  ninja -j $IMPALA_BUILD_THREADS docker_images
  export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster"
  export FE_TEST=false
  export BE_TEST=false
  export JDBC_TEST=false
  export CLUSTER_TEST=false
  ./bin/run-all-tests.sh

Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755
Reviewed-on: http://gerrit.cloudera.org:8080/12639
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-13 02:42:32 +00:00
Radford Nguyen
f998d64767 IMPALA-8363: Fix E2E start with impala_log_dir
This commit fixes the `CustomClusterTestSuite` to wait for the
correct number of executors when `impala_log_dir` is specified
in the test decorator.  Previously, the default value of 3
was always used, regardless of `cluster_size`.

Testing:

- Manual verification using tests/authorization/test_ranger.py
with custom `impala_log_dir` and `cluster_size` arguments.
Failed before changes, passed after changes

- Ran all original E2E tests

Change-Id: I4f46f40474b4b380abe88647a37e8e4d2231d745
Reviewed-on: http://gerrit.cloudera.org:8080/12935
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-11 21:19:31 +00:00
Tim Armstrong
f9ced753ba IMPALA-7999: clean up start-*d.sh scripts
Delete these wrapper scripts and replace them with a generic
start-daemon.sh script that sets environment variables
without the other logic.

Move the logic for setting JAVA_TOOL_OPTIONS into
start-impala-cluster.py.

Remove some options like -jvm_suspend, -gdb, -perf that
may not be used. These can be reintroduced if needed.

Port across the kerberized minicluster logic (which has
probably bitrotted) in case it needs to be revived.

Remove --verbose option that didn't appear to be useful
(it claims to print daemon output to the console,
but output is still redirected regardless).

Removed a level of quoting in custom cluster test argument
handling - this was made unnecessary by properly escaping
arguments with pipes.escape() in run_daemon().

Testing:
* Ran exhaustive tests.
* Ran on CentOS 6 to confirm we didn't reintroduce Popen issue
  worked around by kwho.

Change-Id: Ib67444fd4def8da119db5d3a0832ef1de15b068b
Reviewed-on: http://gerrit.cloudera.org:8080/12271
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-05 13:10:08 +00:00
Philip Zeyliger
13dfdc64db IMPALA-6664: Tag log statements with fragment or query ids.
This implements much of the desire in IMPALA-6664 to tag all log
statements with their query ids. It re-uses the existing ThreadDebugInfo
infrastructure as well as the existing
InstallLogMessageListenerFunction() patch to glog (currently used for
log redaction) to prefix log messages with fragment ids or query ids,
when available. The fragment id is the query id with the last bits
incremented, so it's possible to correlate a given query's log messages.
For example:

  $ grep 85420d575b9ff4b9:402b8868 logs/cluster/impalad.INFO
  I0108 10:39:16.453958 14752 impala-server.cc:1052] 85420d575b9ff4b9:402b886800000000] Registered query query_id=85420d575b9ff4b9:402b886800000000 session_id=aa45e480434f0516:101ae5ac12679d94
  I0108 10:39:16.454738 14752 Frontend.java:1242] 85420d575b9ff4b9:402b886800000000] Analyzing query: select count(*) from tpcds.web_sales
  I0108 10:39:16.456627 14752 Frontend.java:1282] 85420d575b9ff4b9:402b886800000000] Analysis finished.
  I0108 10:39:16.463538 14818 admission-controller.cc:598] 85420d575b9ff4b9:402b886800000000] Schedule for id=85420d575b9ff4b9:402b886800000000 in pool_name=default-pool per_host_mem_estimate=180.02 MB PoolConfig: max_requests=-1 max_queued=200 max_mem=-1.00 B
  I0108 10:39:16.463603 14818 admission-controller.cc:603] 85420d575b9ff4b9:402b886800000000] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=0,  local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0)
  I0108 10:39:16.463780 14818 admission-controller.cc:635] 85420d575b9ff4b9:402b886800000000] Admitted query id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:16.463896 14818 coordinator.cc:93] 85420d575b9ff4b9:402b886800000000] Exec() query_id=85420d575b9ff4b9:402b886800000000 stmt=select count(*) from tpcds.web_sales
  I0108 10:39:16.464795 14818 coordinator.cc:356] 85420d575b9ff4b9:402b886800000000] starting execution on 2 backends for query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:16.466384 24891 impala-internal-service.cc:49] ExecQueryFInstances(): query_id=85420d575b9ff4b9:402b886800000000 coord=pannier.sf.cloudera.com:22000 #instances=2
  I0108 10:39:16.467339 14818 coordinator.cc:370] 85420d575b9ff4b9:402b886800000000] started execution on 2 backends for query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:16.467536 14823 query-state.cc:579] 85420d575b9ff4b9:402b886800000000] Executing instance. instance_id=85420d575b9ff4b9:402b886800000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=1
  I0108 10:39:16.467627 14824 query-state.cc:579] 85420d575b9ff4b9:402b886800000001] Executing instance. instance_id=85420d575b9ff4b9:402b886800000001 fragment_idx=1 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
  I0108 10:39:16.820933 14824 query-state.cc:587] 85420d575b9ff4b9:402b886800000001] Instance completed. instance_id=85420d575b9ff4b9:402b886800000001 #in-flight=1 status=OK
  I0108 10:39:17.122299 14823 krpc-data-stream-mgr.cc:294] 85420d575b9ff4b9:402b886800000000] DeregisterRecvr(): fragment_instance_id=85420d575b9ff4b9:402b886800000000, node=2
  I0108 10:39:17.123500 24038 coordinator.cc:709] Backend completed: host=pannier.sf.cloudera.com:22001 remaining=2 query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.123509 24038 coordinator-backend-state.cc:265] query_id=85420d575b9ff4b9:402b886800000000: first in-progress backend: pannier.sf.cloudera.com:22000
  I0108 10:39:17.167752 14752 impala-beeswax-server.cc:197] 85420d575b9ff4b9:402b886800000000] get_results_metadata(): query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.168762 14752 coordinator.cc:483] 85420d575b9ff4b9:402b886800000000] ExecState: query id=85420d575b9ff4b9:402b886800000000 execution completed
  I0108 10:39:17.168808 14752 coordinator.cc:608] 85420d575b9ff4b9:402b886800000000] Coordinator waiting for backends to finish, 1 remaining. query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.168880 14823 query-state.cc:587] 85420d575b9ff4b9:402b886800000000] Instance completed. instance_id=85420d575b9ff4b9:402b886800000000 #in-flight=0 status=OK
  I0108 10:39:17.168977 14821 query-state.cc:252] UpdateBackendExecState(): last report for 85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.174401 24038 coordinator.cc:709] Backend completed: host=pannier.sf.cloudera.com:22000 remaining=1 query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.174513 14752 coordinator.cc:814] 85420d575b9ff4b9:402b886800000000] Release admission control resources for query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.174815 14821 query-state.cc:604] Cancel: query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.174837 14821 krpc-data-stream-mgr.cc:325] cancelling all streams for fragment_instance_id=85420d575b9ff4b9:402b886800000001
  I0108 10:39:17.174856 14821 krpc-data-stream-mgr.cc:325] cancelling all streams for fragment_instance_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.179621 14752 impala-beeswax-server.cc:239] 85420d575b9ff4b9:402b886800000000] close(): query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.179651 14752 impala-server.cc:1131] 85420d575b9ff4b9:402b886800000000] UnregisterQuery(): query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.179666 14752 impala-server.cc:1238] 85420d575b9ff4b9:402b886800000000] Cancel(): query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.179814 14752 coordinator.cc:684] 85420d575b9ff4b9:402b886800000000] CancelBackends() query_id=85420d575b9ff4b9:402b886800000000, tried to cancel 0 backends
  I0108 10:39:17.203898 14752 query-exec-mgr.cc:184] 85420d575b9ff4b9:402b886800000000] ReleaseQueryState(): deleted query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:18.108947 14752 impala-server.cc:1993] 85420d575b9ff4b9:402b886800000000] Connection from client ::ffff:172.16.35.186:52096 closed, closing 1 associated session(s)
  I0108 10:39:18.108996 14752 impala-server.cc:1249] 85420d575b9ff4b9:402b886800000000] Closing session: aa45e480434f0516:101ae5ac12679d94
  I0108 10:39:18.109035 14752 impala-server.cc:1291] 85420d575b9ff4b9:402b886800000000] Closed session: aa45e480434f0516:101ae5ac12679d94

There are a few caveats here: the thread state isn't "scoped", so the "Closing
session" log statement is technically not part of the query. When that thread
is re-used for another query, it corrects itself. Some threads, like 14821,
aren't using the thread locals. In some cases, we should go back and
add GetThreadDebugInfo()->SetQueryId(...) statements.

I've used this to debug some crashes (of my own doing) while running
parallel tests, and it's been quite helpful.

An alternative would be to use Kudu's be/src/kudu/util/async_logger.h,
and add the "Listener" functionality to it directly. Another alternative
would be to re-write all the *LOG macros, but this is quite painful (and
presumably was rejected when log redaction was introduced).

I changed thread-debug-info to capture TUniqueId (a thrift struct with
two int64s) rather than the string representation. This made it easier
to compare with the "0:0" id, which we treat as "unset".  If a developer
needs to analyze it from a debugger, gdb can print out hex just fine.

I added some context to request-context to be able to pipe ids through
to disk IO threads as well.

To test this, I moved "assert_log_contains" up to impala_test_suite, and
had it handle the default log location case. The test needs a sleep for
log buffering, but it seems like a test with a sleep running in
parallel is better than a custom cluster test, which reboots the cluster
(and loads metadata).

Change-Id: I6634ef9d1a7346339f24f2d40a7a3aa36a535da8
Reviewed-on: http://gerrit.cloudera.org:8080/12129
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-25 00:47:09 +00:00