This change updates the way column names are
projected in the SQL query generated for JDBC
external tables. Instead of relying on optional
mapping or default behavior, all column names are now
explicitly quoted using appropriate quote characters.
Column names are now wrapped with quote characters
based on the JDBC driver being used:
1. Backticks (`) for Hive, Impala and MySQL
2. Double quotes (") for all other databases
This improves support for case-sensitive or
reserved column names.
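The quoting rule above can be sketched minimally as follows (the driver-name strings and helper names are illustrative assumptions, not Impala's actual constants or code):

```python
# Hypothetical sketch of per-driver identifier quoting; the driver-name
# strings below are assumptions, not Impala's actual constants.
BACKTICK_DRIVERS = {"HIVE", "IMPALA", "MYSQL"}

def quote_column(name, driver):
    """Wrap a column name in the quote character for the given driver."""
    quote = "`" if driver.upper() in BACKTICK_DRIVERS else '"'
    # Escape any embedded quote characters by doubling them.
    return quote + name.replace(quote, quote * 2) + quote

def project_columns(columns, driver):
    """Build the SELECT list with every column explicitly quoted."""
    return ", ".join(quote_column(c, driver) for c in columns)
```

Explicit quoting makes reserved words like `order` and mixed-case names usable as column names regardless of the target database.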
Change-Id: I5da5bc7ea5df8f094b7e2877a0ebf35662f93805
Reviewed-on: http://gerrit.cloudera.org:8080/23066
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
This adds more tests in test_catalogd_ha.py for warm failover.
Refactored _test_metadata_after_failover to run in the following way:
- Run DDL/DML in the active catalogd.
- Kill the active catalogd and wait until the failover finishes.
- Verify the DDL/DML results in the new active catalogd.
- Restart the killed catalogd
It accepts two methods as parameters to perform the DDL/DML and the
verification. In the last step, the killed catalogd is restarted so we
keep having 2 catalogds and can merge these tests into a single test by
invoking _test_metadata_after_failover for different method pairs. This
saves some test time.
The following DDL/DML statements are tested:
- CreateTable
- AddPartition
- REFRESH
- DropPartition
- INSERT
- DropTable
After each failover, the table is verified to be warmed up (i.e. loaded).
Also validates startup flags to make sure enable_insert_events and
enable_reload_events are both set to true when warm failover is enabled,
i.e. --catalogd_ha_reset_metadata_on_failover=false.
Change-Id: I6b20adeb0bd175592b425e521138c41196347600
Reviewed-on: http://gerrit.cloudera.org:8080/23206
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
This patch improves the availability of catalogd under huge INVALIDATE
METADATA operations. Previously, CatalogServiceCatalog.reset() held
versionLock_.writeLock() for the whole reset duration. When the number
of databases, tables, or functions is large, this write lock can be held
for a long time, preventing any other catalog operation from proceeding.
This patch improves the situation by:
1. Making CatalogServiceCatalog.reset() rebuild dbCache_ in place and
occasionally release the write lock between rebuild stages.
2. Fetching database, table, and function metadata from the MetaStore in
the background using an ExecutorService. Added a catalog_reset_max_threads
flag to control the number of threads used for the parallel fetch.
In order to do so, lexicographic order must be enforced during reset(),
and all Db invalidations within a single stage must be complete before
the write lock is released. Stages should run in approximately the same
amount of time. A catalog operation over a database must ensure that no
reset operation is currently running, or that the database name is
lexicographically less than the current database-under-invalidation.
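The invariant above can be illustrated with a simplified sketch (names are hypothetical; the real logic lives in CatalogServiceCatalog and CatalogResetManager):

```python
# Simplified illustration of the staged-reset invariant: an operation on
# database `db_name` may proceed without waiting only if no reset is
# running, or the name sorts strictly before the database currently being
# invalidated. All names here are hypothetical.
def can_proceed_without_wait(db_name, reset_running, db_under_invalidation):
    if not reset_running:
        return True
    # Databases are invalidated in lexicographic order, so anything
    # strictly before the current one has already been rebuilt.
    return db_name < db_under_invalidation
```

Enforcing lexicographic order is what makes this check possible: a name before the database-under-invalidation is guaranteed to be in an already-rebuilt stage.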
This patch adds CatalogResetManager to do the background metadata
fetching and provide helper methods that facilitate waiting for reset
progress. CatalogServiceCatalog must hold versionLock_.writeLock()
before calling most CatalogResetManager methods.
These are the methods in the CatalogServiceCatalog class that must wait
for CatalogResetManager.waitOngoingMetadataFetch():
addDb()
addFunction()
addIncompleteTable()
addTable()
invalidateTableIfExists()
removeDb()
removeFunction()
removeTable()
renameTable()
replaceTableIfUnchanged()
tryLock()
updateDb()
InvalidateAwareDbSnapshotIterator.hasNext()
A concurrent global IM must wait until the currently running global IM
completes. The waiting happens by calling waitFullMetadataFetch().
CatalogServiceCatalog.getAllDbs() gets a snapshot of the dbCache_ values
at a point in time. With this patch, it is now possible that some Db in
this snapshot may be removed from dbCache_ by a concurrent reset().
Callers that care about snapshot integrity, like
CatalogServiceCatalog.getCatalogDelta(), should be careful when
iterating the snapshot. They must iterate in lexicographic order,
similar to reset(), and make sure they do not go beyond the current
database-under-invalidation. They also must skip the Db currently being
inspected if Db.isRemoved() returns true.
Added the helper class InvalidateAwareDbSnapshot for this kind of
iteration. Override CatalogServiceCatalog.getDb() and
CatalogServiceCatalog.getDbs() to wait until the first metadata reset
completes or the looked-up Db is found in the cache.
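A simplified sketch of this snapshot iteration (function and parameter names are hypothetical; the real iteration, including waiting for reset progress, is in InvalidateAwareDbSnapshot):

```python
# Hypothetical sketch of safely iterating a dbCache_ snapshot during a
# concurrent reset(): iterate in lexicographic order, stop at the
# database currently under invalidation, and skip Dbs already removed.
# The real code waits for the reset to advance instead of stopping.
def iterate_snapshot(snapshot_dbs, db_under_invalidation, is_removed):
    for db in sorted(snapshot_dbs):
        if db_under_invalidation is not None and db >= db_under_invalidation:
            break  # Do not go beyond the database-under-invalidation.
        if is_removed(db):
            continue  # Skip Dbs removed by the concurrent reset().
        yield db
```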
Expand test_restart_catalogd_twice to test_restart_legacy_catalogd_twice
and test_restart_local_catalogd_twice. Update
CustomClusterTestSuite.wait_for_wm_init_complete() to correctly pass
timeout values to helper methods that it calls. Reduce cluster_size from
10 to 3 in a few tests of test_workload_mgmt_init.py to avoid flakiness.
Fixed HMS connection leak between tests in AuthorizationStmtTest (see
IMPALA-8073).
Testing:
- Pass exhaustive tests.
Change-Id: Ib4ae2154612746b34484391c5950e74b61f85c9d
Reviewed-on: http://gerrit.cloudera.org:8080/22640
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.
All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.
The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 on why workload affected
exploration_strategy. An example of an affected test is
TestCatalogHMSFailures, which was skipped in both core and exhaustive
runs before this change.
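The effect described above can be sketched as a hypothetical simplification of the workload-dependent logic referenced in IMPALA-3947 (the function below is an illustration, not the actual implementation):

```python
# Hypothetical simplification of how a test's workload historically
# affected its exploration strategy (see IMPALA-3947): non-default
# workloads downgraded 'exhaustive' runs to 'core'.
def exploration_strategy(workload, requested):
    if requested == "exhaustive" and workload != "functional-query":
        return "core"
    return requested
```

Under this model, a custom cluster test that returned 'tpch' never actually ran in exhaustive mode, which is why switching the default to 'functional-query' changes behavior.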
get_workload() functions that return a different workload than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking these tests is out of scope in this patch.
Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is an enhancement request to support JDBC tables
created by Hive JDBC Storage handler. This is essentially
done by making JDBC table properties compatible with
Impala. The properties are translated when the table is loaded, and the
translation is maintained only in the Impala cluster, i.e. it is not
written back to HMS.
Impala includes JDBC drivers for PostgreSQL and MySQL
making 'driver.url' not mandatory in such cases. The
Impala JDBC driver is still required for Impala-to-Impala
JDBC connections. Additionally, Hive allows adding database
driver JARs at runtime via Beeline, enabling users to
dynamically include JDBC driver JARs. However, Impala does
not support adding database driver JARs at runtime,
making the driver.url field still useful
in cases where additional drivers are needed.
'hive.sql.query' property is not handled in this patch.
It'll be covered in a separate jira.
Testing: End-to-end tests are included in
test_ext_data_sources.py.
Change-Id: I1674b93a02f43df8c1a449cdc54053cc80d9c458
Reviewed-on: http://gerrit.cloudera.org:8080/22134
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The test failed with a "Data source does not exist" error due
to name conflicts with pre-existing DataSource objects.
To resolve this, each data source name is made unique for
each concurrently running test dimension. The fix ensures
that the test runs smoothly without encountering errors
related to conflicting data source names.
Testing: To test this change, it needs to be built with the
-ubsan flag, after which a bash script is triggered to set
some environment variables, followed by the './bin/run-all-tests.sh'
command to make sure all tests are run. Some important
environment variables of the bash script include:
1. EXPLORATION_STRATEGY set to exhaustive to ensure all
possible scenarios are covered.
2. The specific test file to run is query_test/
test_ext_data_sources.py::TestExtDataSources
::test_data_source_tables and custom_cluster/
test_ext_data_sources.py, while frontend (FE), backend (BE),
and cluster tests are disabled. End-to-end tests are enabled
(EE_TEST=true), with iteration and failure limits also
specified.
Change-Id: I29822855da8136e013c8a62bb0489a181bf131ae
Reviewed-on: http://gerrit.cloudera.org:8080/21815
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In some deployment environments, JDBC tables are set as transactional
tables by default. This causes catalogd to fail to load the metadata for
JDBC tables. This patch explicitly adds the table property
"transactional=false" for JDBC tables to avoid them being set as
transactional tables.
Operations on JDBC tables are processed only on the coordinator. The
number of processed rows should be estimated as 0 for DataSourceScanNode
by the planner so that coordinator-only query plans are generated for
simple queries on JDBC tables and the queries can be executed without
involving executor nodes. Also adds a Preconditions check to make sure
numNodes equals 1 for DataSourceScanNode.
Updates FileSystemUtil.copyFileFromUriToLocal() function to write log
message for all types of exceptions.
Testing:
- Fixed planner tests for data source tables.
- Ran end-to-end tests of JDBC tables with query option
'exec_single_node_rows_threshold' as default value 100.
- Passed core-tests.
Change-Id: I556faeda923a4a11d4bef8c1250c9616f77e6fa6
Reviewed-on: http://gerrit.cloudera.org:8080/21141
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TestExtDataSources::test_catalogd_ha_failover failed to delete the data
source object after the catalog service failed over to the standby
catalogd. Log messages showed that the coordinator tried to submit the
DDL request to the original active catalogd since it had not yet
received the failover notification from the statestored.
To fix the flaky test, wait until the coordinator receives the failover
notification from the statestored before executing the DDL request to
drop the data source.
Testing:
- Looped the test for more than a hundred runs without failure.
Change-Id: Ia6225271357740c055c25fdd349f1dc9162c2f53
Reviewed-on: http://gerrit.cloudera.org:8080/21078
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
for external data source tables
This patch adds support for the DATE datatype in predicates
for external data sources.
Testing:
- Added tests for date predicates with operators:
'=', '>', '<', '>=', '<=', '!=', 'BETWEEN'.
Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4
Reviewed-on: http://gerrit.cloudera.org:8080/20915
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A previous patch for IMPALA-12642 added support for query options on
Impala external JDBC tables. It added one unit test to verify the query
option ENABLED_RUNTIME_FILTER_TYPES by checking the queries in the
Queries Web page. The test failed on Ubuntu 18.04 since the value of the
query option is shown as a single-quoted string, instead of a
double-quoted string. This patch fixes the error.
Testing:
- Ran tests/custom_cluster/test_ext_data_sources.py on Ubuntu 18.04,
and Ubuntu 20.04.
Change-Id: I996c8fac038132f2b132d5e6ac36aca1dff59d72
Reviewed-on: http://gerrit.cloudera.org:8080/20978
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: gaurav singh <gsingh@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
environment setup fails.
This patch modifies the mysql tests to be marked as xfailed
if the mysql environment fails to set up successfully.
Change-Id: Ib7829aed09d25ff3e636004f3d1f32ecc6f37299
Reviewed-on: http://gerrit.cloudera.org:8080/20975
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
This patch uses JDBC connection string to apply query options to the
Impala server by setting the properties in "jdbc.properties" when
creating JDBC external DataSource table.
jdbc.properties is specified as a comma-delimited key=value string, like
"MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES=\"BLOOM,MIN_MAX\"".
Fixed Impala to allow the value of ENABLED_RUNTIME_FILTER_TYPES to have
double quotes at the beginning and end of the string.
jdbc.properties can be used for other databases like Postgres and MySQL
to set additional properties. The test cases will be added in a
separate patch.
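A minimal sketch of parsing such a comma-delimited key=value string while keeping a double-quoted value like "BLOOM,MIN_MAX" intact (this parser is an illustration of the format described above, not Impala's actual implementation):

```python
import re

# Illustrative parser for strings like
#   'MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES="BLOOM,MIN_MAX"'
# Commas inside double-quoted values must not split the pairs.
def parse_jdbc_properties(s):
    # Split on commas that leave an even number of quotes to their right,
    # i.e. commas that are not inside a double-quoted value.
    pairs = re.split(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)', s)
    props = {}
    for pair in pairs:
        if not pair.strip():
            continue
        key, _, value = pair.partition("=")
        props[key.strip()] = value.strip().strip('"')
    return props
```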
Testing:
- Added end-to-end tests for setting query options on Impala JDBC
tables.
- Passed core tests.
Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992
Reviewed-on: http://gerrit.cloudera.org:8080/20837
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
DataSource objects are saved in an in-memory cache in the Catalog
server. They are not persisted to the HMS. The objects are lost after
the Catalog server is restarted, and users need to recreate DataSource
objects before creating new external DataSource tables.
This patch makes DataSource objects persistent by saving them
as DataConnector objects with type "impalaDataSource" in HMS.
Since HMS events for DataConnector are not handled, the Catalog server
has to refresh DataSource objects when the catalogd becomes active.
Note that this feature is not supported for Apache Hive 3.1 and older
versions.
Testing:
- Added two end-to-end unit tests with restarting of the Catalog
server, and catalogd HA failover.
These two tests are skipped when USE_APACHE_HIVE is set to true
and the Apache Hive version is 3.x or older.
- Passed all-build-options-ub2004.
- Passed core test.
Change-Id: I500a99142bb62ce873e693d573064ad4ffa153ab
Reviewed-on: http://gerrit.cloudera.org:8080/20768
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
This patch adds support to read Impala tables in the Impala cluster
through JDBC external data source. It also adds a new counter
NumExternalDataSourceGetNext in profile for the total number of calls
to ExternalDataSource::GetNext().
Setting query options for Impala will be supported in a follow-up patch.
Testing:
- Added an end-to-end unit test to read Impala tables from Impala
cluster through JDBC external data source.
Manually ran the unit-test with Impala tables in Impala cluster on a
remote host by setting $INTERNAL_LISTEN_HOST in jdbc.url as the ip
address of the remote host on which an Impala cluster is running.
- Added LDAP test for reading table through JDBC external data source
with LDAP authentication.
Manually ran the unit-test with Impala tables in a remote Impala
cluster.
- Passed core tests.
Change-Id: I79ad3273932b658cb85c9c17cc834fa1b5fbd64f
Reviewed-on: http://gerrit.cloudera.org:8080/20731
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
This patch adds a check for the existence of the mysqld.sock
file in the directory /var/run/mysqld/ inside the mysqld
docker container. If the file is not present, the
test is skipped.
Testing: tested manually with and without the mysqld.sock
file.
Change-Id: I393fd03fa6efd4c11781d219f66978a4f556c668
Reviewed-on: http://gerrit.cloudera.org:8080/20780
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
tables for MySQL
This patch adds MySQL tests for the "external data source"
mechanism in Impala that implements data sources for querying over JDBC.
This patch also fixes the handling of case-sensitive table and
column names in MySQL queries.
Testing:
- Added a unit test for mysql and ran it with the JDBC
driver mysql-connector-j-8.1.0.jar. This test requires
adding docker to the sudoers group. Also, the test is
only run in 'exhaustive' mode.
Change-Id: I446ec3d4ebaf53c8edac0b2d181514bde587dfae
Reviewed-on: http://gerrit.cloudera.org:8080/20710
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
DataSourceScanNode does not handle eos properly in
DataSourceScanNode::GetNext(). Rows returned from the
external data source could be dropped if data_source_batch_size
is set to a value greater than the default of 1024.
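The required behavior can be illustrated with a small sketch (highly simplified; the real code is C++ in DataSourceScanNode::GetNext()): when the source's batch is larger than the consumer's row-batch capacity, the excess rows must be buffered and flushed after eos rather than dropped.

```python
# Simplified illustration of draining an external data source whose
# batches may exceed the consumer's capacity; names are hypothetical.
def drain_source(batches, capacity):
    """Yield rows in chunks of at most `capacity`, losing none, even
    when the final (eos) batch is larger than `capacity`."""
    buffered = []
    for batch in batches:
        buffered.extend(batch)
        while len(buffered) >= capacity:
            yield buffered[:capacity]
            buffered = buffered[capacity:]
    if buffered:  # Flush the remainder after eos instead of dropping it.
        yield buffered
```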
Testing:
- Added an end-to-end test with data_source_batch_size as 2048.
The test failed without the fix and passed with it.
Also added a test with data_source_batch_size as 512.
- Passed core tests.
Change-Id: I978d0a65faa63a47ec86a0127c0bee8dfb79530b
Reviewed-on: http://gerrit.cloudera.org:8080/20636
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
creating external jdbc tables
This patch builds on top of IMPALA-5741 to copy the jdbc jar from
remote filesystems: Ozone and S3. Previously, only hdfs was supported.
Testing:
Commented out "@skipif.not_hdfs" qualifier in files:
- tests/query_test/test_ext_data_sources.py
- tests/custom_cluster/test_ext_data_sources.py
1) tested locally by running tests:
- impala-py.test tests/query_test/test_ext_data_sources.py
- impala-py.test tests/custom_cluster/test_ext_data_sources.py
2) tested using jenkins job for ozone and S3
Change-Id: I804fa3d239a4bedcd31569f2b46edb7316d7f004
Reviewed-on: http://gerrit.cloudera.org:8080/20639
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
This patch makes external data sources work in LocalCatalog mode:
- Add APIs in CatalogdMetaProvider to fetch DataSource from Catalog
server through RPC.
- Add getDataSources() and getDataSource() in LocalCatalog.
- Add LocalDataSourceTable class for loading DataSource table in
LocalCatalog.
- Handle request for loading DataSource in CatalogServiceCatalog on
Catalog server.
- Enable tests which are skipped by
SkipIfCatalogV2.data_sources_unsupported().
Remove SkipIfCatalogV2.data_sources_unsupported().
- Add end-to-end tests for LocalCatalog mode.
Testing:
- Passed core tests
Change-Id: I40841c9be9064ac67771c4d3f5acbb3b552a2e55
Reviewed-on: http://gerrit.cloudera.org:8080/20574
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>