impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
stiga-huang	3725b4ea63	IMPALA-14617: Deflake TestMetadataReplicas.test_catalog_restart TestMetadataReplicas.test_catalog_restart creates a db and an underlying table in Hive, then expects that an INVALIDATE command can bring up the table in impalads. The test runs in the legacy catalogd mode that has the bug of IMPALA-12103. So if the INVALIDATE command runs in a state where catalogd has the db in cache but the db doesn't show up in impalad's cache yet, catalogd will just return the table and impalad will skip it because the db does not exist. Then the above assertion fails. The db is added to catalogd's cache by processing the CREATE_DATABASE HMS event, which is asynchronous with executing the INVALIDATE command. If the command is triggered before that, the test passes. If the command is triggered after that, the test fails. When the test was written, we didn't have HMS event processing yet. It was expected that the db is also added in catalogd by the INVALIDATE command. To deflake the issue, this patch disables HMS event processing in this test, so catalogd always has a consistent cache with impalad when executing the INVALIDATE command. This patch also changes the log level of a log in ImpaladCatalog to warn if a table is not added because its db is missing. Tests: - Ran the test locally 10 times. Change-Id: I2d17404cc8093eacf9b51df3d22caf5cbb6a61a9 Reviewed-on: http://gerrit.cloudera.org:8080/23798 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-18 06:27:47 +00:00
Riza Suminto	6b005a793a	IMPALA-14296: Fix with_args fixture in TestMetadataReplicas IMPALA-13947 has an incorrect fixture edit that causes the following error: common/custom_cluster_test_suite.py:396: in setup_method pytest.fail("Cannot specify with_args on both class and methods") E Failed: Cannot specify with_args on both class and methods This patch moves the with_args fixture at test_catalog_restart up to the class level. Testing: Run and pass TestMetadataReplicas in exhaustive mode. Change-Id: I9016eac859fb01326b3d1e0a8e8e135f03d696bb Reviewed-on: http://gerrit.cloudera.org:8080/23280 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Reviewed-by: Xuebin Su <xsu@cloudera.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2025-08-11 11:34:54 +00:00
Riza Suminto	1cead45114	IMPALA-13947: Test local catalog mode by default Local catalog mode has been the default and has worked well in downstream Impala for over 5 years. This patch turns on local catalog mode by default (--catalog_topic_mode=minimal and --use_local_catalog=true) as the preferred mode going forward. Implemented LocalCatalog.setIsReady() to facilitate using local catalog mode for FE tests. Some FE tests fail due to behavior differences in local catalog mode like IMPALA-7539. This is probably OK since Impala now largely hands over FileSystem permission checks to Apache Ranger. The following custom cluster tests are pinned to evaluate under legacy catalog mode because their behavior changed in local catalog mode: TestCalcitePlanner.test_calcite_frontend TestCoordinators.test_executor_only_lib_cache TestMetadataReplicas TestTupleCacheCluster TestWorkloadManagementSQLDetailsCalcite.test_tpcds_8_decimal At TestHBaseHmsColumnOrder.test_hbase_hms_column_order, set --use_hms_column_order_for_hbase_tables=true flag for both impalad and catalogd to get consistent column order in either local or legacy catalog mode. Changed TestCatalogRpcErrors.test_register_subscriber_rpc_error assertions to be more fine grained by matching individual query id. Moved most of the test methods from TestRangerLegacyCatalog to TestRangerLocalCatalog, except for some that do need to run in legacy catalog mode. Also renamed TestRangerLocalCatalog to TestRangerDefaultCatalog. Table ownership issue in local catalog mode remains unresolved (see IMPALA-8937). Testing: Pass exhaustive tests. Change-Id: Ie303e294972d12b98f8354bf6bbc6d0cb920060f Reviewed-on: http://gerrit.cloudera.org:8080/23080 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-06 21:42:24 +00:00
Riza Suminto	0b1a32fad8	IMPALA-13850 (part 4): Implement in-place reset for CatalogD This patch improves the availability of CatalogD under huge INVALIDATE METADATA operations. Previously, CatalogServiceCatalog.reset() held versionLock_.writeLock() for the whole reset duration. When the number of databases, tables, or functions is large, this write lock can be held for a long time, preventing any other catalog operation from proceeding. This patch improves the situation by: 1. Making CatalogServiceCatalog.reset() rebuild dbCache_ in place and occasionally release the write lock between rebuild stages. 2. Fetching databases, tables, and functions metadata from MetaStore in the background using ExecutorService. Added the catalog_reset_max_threads flag to control the number of threads used for parallel fetching. In order to do so, lexicographic order must be enforced during reset(), and all Db invalidation within a single stage must be complete before releasing the write lock. Stages should run in approximately the same amount of time. A catalog operation over a database must ensure that no reset operation is currently running, or that the database name is lexicographically less than the current database-under-invalidation. This patch adds CatalogResetManager to do background metadata fetching and to provide helper methods that facilitate waiting for reset progress. CatalogServiceCatalog must hold the versionLock_.writeLock() before calling most of the CatalogResetManager methods. These are the methods in the CatalogServiceCatalog class that must wait for CatalogResetManager.waitOngoingMetadataFetch(): addDb() addFunction() addIncompleteTable() addTable() invalidateTableIfExists() removeDb() removeFunction() removeTable() renameTable() replaceTableIfUnchanged() tryLock() updateDb() InvalidateAwareDbSnapshotIterator.hasNext() A concurrent global IM must wait until the currently running global IM completes. The waiting happens by calling waitFullMetadataFetch().
CatalogServiceCatalog.getAllDbs() gets a snapshot of dbCache_ values at a time. With this patch, it is now possible that some Db in this snapshot may be removed from dbCache() by a concurrent reset(). Callers that care about snapshot integrity, like CatalogServiceCatalog.getCatalogDelta(), should be careful when iterating the snapshot. They must iterate in lexicographic order, similar to reset(), and make sure that they do not go beyond the current database-under-invalidation. They also must skip the Db currently being inspected if Db.isRemoved() is True. Added helper class InvalidateAwareDbSnapshot for this kind of iteration. Overrode CatalogServiceCatalog.getDb() and CatalogServiceCatalog.getDbs() to wait until the first reset's metadata is complete or the looked-up Db is found in cache. Expanded test_restart_catalogd_twice into test_restart_legacy_catalogd_twice and test_restart_local_catalogd_twice. Updated CustomClusterTestSuite.wait_for_wm_init_complete() to correctly pass timeout values to the helper methods that it calls. Reduced cluster_size from 10 to 3 in a few tests of test_workload_mgmt_init.py to avoid flakiness. Fixed HMS connection leak between tests in AuthorizationStmtTest (see IMPALA-8073). Testing: - Pass exhaustive tests. Change-Id: Ib4ae2154612746b34484391c5950e74b61f85c9d Reviewed-on: http://gerrit.cloudera.org:8080/22640 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2025-07-09 14:05:04 +00:00
Csaba Ringhofer	f98b697c7b	IMPALA-13929: Make 'functional-query' the default workload in tests This change adds get_workload() to ImpalaTestSuite and removes it from all test suites that already returned 'functional-query'. get_workload() is also removed from CustomClusterTestSuite which used to return 'tpch'. All other changes besides impala_test_suite.py and custom_cluster_test_suite.py are just mass removals of get_workload() functions. The behavior is only changed in custom cluster tests that didn't override get_workload(). By returning 'functional-query' instead of 'tpch', exploration_strategy() will no longer return 'core' in 'exhaustive' test runs. See IMPALA-3947 on why workload affected exploration_strategy. An example of an affected test is TestCatalogHMSFailures, which was skipped in both core and exhaustive runs before this change. get_workload() functions that return a different workload than 'functional-query' are not changed - it is possible that some of these also don't handle exploration_strategy() as expected, but individually checking these tests is out of scope in this patch. Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115 Reviewed-on: http://gerrit.cloudera.org:8080/22726 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-04-08 07:12:55 +00:00
Joe McDonnell	eb66d00f9f	IMPALA-11974: Fix lazy list operators for Python 3 compatibility Python 3 changes list operators such as range, map, and filter to be lazy. Some code that expects the list operators to happen immediately will fail. e.g. Python 2: range(0,5) == [0,1,2,3,4] True Python 3: range(0,5) == [0,1,2,3,4] False The fix is to wrap locations with list(). i.e. Python 3: list(range(0,5)) == [0,1,2,3,4] True Since the base operators are now lazy, Python 3 also removes the old lazy versions (e.g. xrange, ifilter, izip, etc). This uses future's builtins package to convert the code to the Python 3 behavior (i.e. xrange -> future's builtins.range). Most of the changes were done via these futurize fixes: - libfuturize.fixes.fix_xrange_with_import - lib2to3.fixes.fix_map - lib2to3.fixes.fix_filter This eliminates the pylint warnings: - xrange-builtin - range-builtin-not-iterating - map-builtin-not-iterating - zip-builtin-not-iterating - filter-builtin-not-iterating - reduce-builtin - deprecated-itertools-function Testing: - Ran core job Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f Reviewed-on: http://gerrit.cloudera.org:8080/19589 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Michael Smith	1eb0510eaa	IMPALA-11456: Collapse filesystem Skip logic Combines all SkipIf* classes for different filesystems into a single SkipIfFS class. Many cases are simplified to 'not IS_HDFS', with the rest as filesystem-specific special cases. The 'jira' option is removed in favor of specific flags for each issue. Change-Id: Ib928a6274baaaec45614887b9e762346a25812a1 Reviewed-on: http://gerrit.cloudera.org:8080/18781 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-08-10 22:37:08 +00:00
Michael Smith	830625b104	IMPALA-9442: Add Ozone to minicluster Adds Ozone as an alternative to hdfs in the minicluster. Select by setting `export TARGET_FILESYSTEM=ozone`. With that flag, run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot because Ozone does not support HBase (HDDS-3589); snapshot loading doesn't work yet primarily due to HDDS-5502. Uses the o3fs interface because Ozone puts specific restrictions on bucket names (no underscores, for instance), and it was a lot easier to use an interface where everything is written to a single bucket than to update all Impala's use of HDFS-style paths to make `test-warehouse` a bucket inside a volume. Specifies reduced Ozone client retries during shutdown where Ozone may not be available. Passes tests with FE_TEST=false BE_TEST=false. Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be Reviewed-on: http://gerrit.cloudera.org:8080/18738 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-08-03 16:58:20 +00:00
Fucun Chu	157086cb80	IMPALA-10771: Add Tencent COS support This patch adds support for COS (Cloud Object Storage). Using hadoop-cos, the implementation is similar to other remote FileSystems. New flags for COS: - num_cos_io_threads: Number of COS I/O threads. Defaults to 16. Follow-up: - Support for caching COS file handles will be addressed in IMPALA-10772. - test_concurrent_inserts and test_failing_inserts in test_acid_stress.py are skipped due to slow file listing on COS (IMPALA-10773). Tests: - Upload hdfs test data to a COS bucket. Modify all locations in HMS DB to point to the COS bucket. Remove some hdfs caching params. Run CORE tests. Change-Id: Idce135a7591d1b4c74425e365525be3086a39821 Reviewed-on: http://gerrit.cloudera.org:8080/17503 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-12-08 16:32:02 +00:00
stiga-huang	2dfc68d852	IMPALA-7712: Support Google Cloud Storage This patch adds support for GCS (Google Cloud Storage). Using the gcs-connector, the implementation is similar to other remote FileSystems. New flags for GCS: - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16. Follow-up: - Support for spilling to GCS will be addressed in IMPALA-10561. - Support for caching GCS file handles will be addressed in IMPALA-10568. - test_concurrent_inserts and test_failing_inserts in test_acid_stress.py are skipped due to slow file listing on GCS (IMPALA-10562). - Some tests are skipped due to issues introduced by the /etc/hosts setting on GCE instances (IMPALA-10563). Tests: - Compile and create hdfs test data on a GCE instance. Upload test data to a GCS bucket. Modify all locations in HMS DB to point to the GCS bucket. Remove some hdfs caching params. Run CORE tests. - Compile and load snapshot data to a GCS bucket. Run CORE tests. Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b Reviewed-on: http://gerrit.cloudera.org:8080/17121 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-03-13 11:20:08 +00:00
Tim Armstrong	4fb8e8e324	IMPALA-8816: reduce custom cluster test runtime in core This includes some optimisations and a bulk move of tests to exhaustive. Move a bunch of custom cluster tests to exhaustive. I selected these partially based on runtime (i.e. I looked most carefully at the tests that ran for over a minute) and the likelihood of them catching a precommit bug. Regression tests for specific edge cases and tests for parts of the code that are very stable were prime candidates. Remove an unnecessary cluster restart in test_breakpad. Merge test_scheduler_error into test_failpoints to avoid an unnecessary cluster restart. Speed up cluster starts by ensuring that the default statestore args are applied even when _start_impala_cluster() is called directly. This shaves a couple of seconds off each restart. We made the default args use a faster update frequency - see IMPALA-7185 - but they did not take effect in all tests. Change-Id: Ib2e3e7ebc9695baec4d69183387259958df10f62 Reviewed-on: http://gerrit.cloudera.org:8080/13967 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-08-06 21:34:26 +00:00
Sahil Takiar	691f9d9ff9	IMPALA-6249: Expose several build flags via web UI Exposes a list of build flags via the impalad web UI. The build flags can be viewed on the root page under the "Version" section. They can be accessed via other tests through the debug version of the root page (e.g. adding &json to the URL). The build flags are listed in a JSON array so that they can be parsed easily. This should help run Impala tests against a remote Impala cluster. The build flags are read in CMakeLists.txt and then stored in preprocessor variables. Three build flags are exposed as part of this commit: - Is_NDEBUG = [true, false] - Whether NDEBUG was true or false at compile time - CMake_Build_Type = [DEBUG, RELEASE, ADDRESS_SANITIZER, TIDY, UBSAN, UBSAN_FULL, TSAN, CODE_COVERAGE_RELEASE, CODE_COVERAGE_DEBUG] - The value of CMAKE_BUILD_TYPE at compile time - Library_Link_Type = [DYNAMIC, STATIC] - Derived from the compile time value of BUILD_SHARED_LIBS There are a few other minor changes that are a part of this commit: * The patch modifies environ.py so that it supports fetching build metadata for both local and remote clusters. * The tests under the tests/webserver directory were not being run because 'webserver' was not whitelisted in tests/run-tests.py. This patch fixes that and addresses several test failures in run-tests.py. * It reverts part of IMPALA-6947 so that there is no dependency from start-impala-cluster.py to environ.py. The timeout discussed in IMPALA-6947 is now set at compile time. Testing: Added new tests to webserver/test_web_pages.py to ensure that the build flags are being set. Some tests are only run when run against a local cluster because we have no way of getting the build info from a remote cluster, whereas local clusters contain a .cmake_build_type file. 
Change-Id: I47e3ad4cbf844909bdaf22a6f9d7bd915dce3f19 Reviewed-on: http://gerrit.cloudera.org:8080/11410 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-05 22:47:31 +00:00
Sean Mackrory	7a022cf36a	IMPALA-7681. Add Azure Blob File System (ADLS Gen2) support. HADOOP-15407 adds a new FileSystem implementation called "ABFS" for the ADLS Gen2 service. It's in the hadoop-azure module as a replacement for WASB. Filesystem semantics should be the same, so skipped tests and other behavior changes have simply mirrored what is done for ADLS Gen1 by default. Tests skipped on ADLS Gen1 due to eventual consistency of the Python client can be run against ADLS Gen2. Change-Id: I5120b071760e7655e78902dce8483f8f54de445d Reviewed-on: http://gerrit.cloudera.org:8080/11630 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-20 06:43:00 +00:00
Vuk Ercegovac	6af65697f2	IMPALA-7017: deflake/fix test_catalog_restart test The custom_cluster/test_metadata_replicas.py:test_catalog_restart test has been recently flaky/broken for two reasons: 1) Variable support for Hive and non-hdfs filesystems. Other tests that depend on Hive have disabled tests for non-hdfs filesystems. Since the functionality tested is not intended for all filesystems, this change disables this test for all filesystems other than hdfs. 2) Several builds have been flaky when looking up catalogd's version. This change adds a retry for obtaining the version. Change-Id: Iab6edb01f0bd7f5408cfef28fd05fdc95fb78469 Reviewed-on: http://gerrit.cloudera.org:8080/10397 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-05-17 09:01:14 +00:00
Vuk Ercegovac	28c1f76529	IMPALA-6948,IMPALA-6962: add end-to-end tests Adds end-to-end tests to validate that following various metadata operations, the catalog state in catalogd and impalads is the same. For IMPALA-6962, catalogd process restart for tests is fixed. Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5 Reviewed-on: http://gerrit.cloudera.org:8080/10291 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-05-09 22:27:36 +00:00

16 Commits
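The Python 3 behavior changes described in IMPALA-11974 and IMPALA-11973 can be illustrated with a short sketch. It is not taken from the Impala code base. The helper names squares_of_evens and midpoint_index are hypothetical and only show the two patterns those commits describe: wrapping lazy map/filter/range results in list(), and using the integer division operator // where an integer result is needed.

```python
# Illustrative sketch only; these helpers are hypothetical, not Impala code.
# On Python 2, IMPALA-11973 adds
#   from __future__ import absolute_import, division, print_function
# at the top of each file; on Python 3 these behaviors are the default.


def squares_of_evens(values):
    # map() and filter() return lazy iterators on Python 3, so wrap the
    # result in list() when the caller needs a real list (IMPALA-11974).
    return list(map(lambda v: v * v, filter(lambda v: v % 2 == 0, values)))


def midpoint_index(num_rows):
    # '/' is true division on Python 3 (and on Python 2 with the
    # "division" future import), so use '//' for an integer index.
    return num_rows // 2


if __name__ == "__main__":
    print(range(0, 5) == [0, 1, 2, 3, 4])        # False on Python 3
    print(list(range(0, 5)) == [0, 1, 2, 3, 4])  # True
    print(squares_of_evens(range(6)))            # [0, 4, 16]
    print(7 / 2, midpoint_index(7))              # 3.5 3
```

The list() wrapping matters only where the result is compared, indexed, measured with len(), or iterated more than once. Code that consumes the result a single time in a for loop works with the lazy form on both versions.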
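The snapshot-iteration rule from IMPALA-13850 can be sketched in Python: iterate in lexicographic order, do not go beyond the current database-under-invalidation, and skip removed Dbs. This is a simplified model under stated assumptions, not the Java InvalidateAwareDbSnapshot class. The Db dataclass, the iterate_snapshot function, and passing the database-under-invalidation as a plain string are all hypothetical. The commit states that InvalidateAwareDbSnapshotIterator.hasNext() waits on the ongoing metadata fetch, but this sketch simply stops at that point to stay self-contained.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Db:
    # Hypothetical stand-in for a catalog Db entry.
    name: str
    removed: bool = False  # models Db.isRemoved()


def iterate_snapshot(snapshot: Dict[str, Db],
                     db_under_invalidation: Optional[str]) -> Iterator[Db]:
    """Yield Dbs from a dbCache_-style snapshot in lexicographic order.

    db_under_invalidation is None when no reset is running. Otherwise,
    only names lexicographically less than it are treated as settled;
    this sketch stops there instead of waiting for the reset to advance.
    """
    for name in sorted(snapshot):
        if db_under_invalidation is not None and name >= db_under_invalidation:
            break  # the real iterator would wait for reset progress here
        db = snapshot[name]
        if db.removed:
            continue  # dropped by a concurrent reset; skip it
        yield db
```

For example, with a snapshot holding databases a, b, c, and d while c is under invalidation, the sketch yields only a and b. Sorting the snapshot keys mirrors the commit's requirement that readers follow the same lexicographic order as reset().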