15 Commits

Author SHA1 Message Date
Pranav Lodha
4c549d79f2 IMPALA-12992: Support for Hive JDBC Storage handler tables
This patch adds support for JDBC tables created by the Hive JDBC
storage handler. The Hive JDBC table properties are translated into
Impala-compatible properties when the table is loaded. The translated
properties are kept only in the Impala cluster, i.e. they are not
written back to HMS.
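
The load-time translation can be pictured with a minimal Python sketch. It assumes that the Hive storage handler properties carry a `hive.sql.` prefix and that Impala expects the same keys without it. The exact property names and the helper `translate_hive_jdbc_props` are illustrative assumptions, not taken from the patch.

```python
# Hypothetical sketch: translate Hive JDBC storage handler table properties
# into Impala-style keys at load time. All key names are assumptions.
HIVE_PREFIX = "hive.sql."


def translate_hive_jdbc_props(hive_props):
    """Return a new dict with the 'hive.sql.' prefix stripped from keys.

    The input dict is never modified, mirroring the idea that the
    translated properties are not written back to HMS.
    """
    translated = {}
    for key, value in hive_props.items():
        if key.startswith(HIVE_PREFIX):
            translated[key[len(HIVE_PREFIX):]] = value
        else:
            # Keys without the prefix pass through unchanged.
            translated[key] = value
    return translated


# Example properties as they might be stored in HMS (values are made up).
hms_props = {
    "hive.sql.database.type": "POSTGRES",
    "hive.sql.jdbc.url": "jdbc:postgresql://localhost:5432/functional",
    "hive.sql.table": "alltypes",
}
impala_props = translate_hive_jdbc_props(hms_props)
```

Returning a fresh dict rather than mutating the input keeps the HMS copy of the properties intact, which matches the behaviour described above.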

Impala includes JDBC drivers for PostgreSQL and MySQL, so 'driver.url'
is not mandatory in those cases. The Impala JDBC driver is still
required for Impala-to-Impala JDBC connections. Hive allows users to
add database driver JARs at runtime via Beeline. Impala does not
support adding driver JARs at runtime, so 'driver.url' remains useful
when additional drivers are needed.

The 'hive.sql.query' property is not handled in this patch.
It will be covered in a separate JIRA.

Testing: End-to-end tests are included in
test_ext_data_sources.py.

Change-Id: I1674b93a02f43df8c1a449cdc54053cc80d9c458
Reviewed-on: http://gerrit.cloudera.org:8080/22134
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-27 11:44:38 +00:00
Pranav Lodha
d3134bfb4a IMPALA-13101: test_data_source_tables fails with Data source does not exist
The test failed with a "Data source does not exist" error due to name
conflicts with pre-existing data source objects. To resolve this, each
data source name is made unique for each concurrently running test
dimension, so the names no longer conflict.
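
A minimal sketch of the idea, assuming a hypothetical helper `unique_data_source_name` and made-up dimension identifiers. The actual test may derive its unique suffix differently.

```python
import re
import uuid


def unique_data_source_name(base_name, dimension_id=None):
    """Build a data source name that differs per test dimension.

    If no dimension id is given, a random suffix is used instead.
    """
    suffix = dimension_id if dimension_id is not None else uuid.uuid4().hex[:8]
    # Replace characters that are not safe in an identifier.
    safe_suffix = re.sub(r"[^0-9A-Za-z_]", "_", str(suffix))
    return "{0}_{1}".format(base_name, safe_suffix)


# Two concurrently running dimensions get two distinct names.
name_a = unique_data_source_name("TestJdbcDataSource", "table_format=text/none")
name_b = unique_data_source_name("TestJdbcDataSource", "table_format=parquet/none")
```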

Testing: Built the change with the -ubsan flag, then ran a bash script
that sets some environment variables and invokes
'./bin/run-all-tests.sh' to run the tests. The important environment
variables set by the script include:

1. EXPLORATION_STRATEGY set to exhaustive to ensure all
possible scenarios are covered.
2. The tests to run are
query_test/test_ext_data_sources.py::TestExtDataSources::test_data_source_tables
and custom_cluster/test_ext_data_sources.py, while frontend (FE),
backend (BE), and cluster tests are disabled. End-to-end tests are
enabled (EE_TEST=true), with iteration and failure limits also
specified.

Change-Id: I29822855da8136e013c8a62bb0489a181bf131ae
Reviewed-on: http://gerrit.cloudera.org:8080/21815
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-09-18 21:02:20 +00:00
wzhou-code
6c0c26146d IMPALA-12896: Avoid JDBC table to be set as transactional table
In some deployment environments, JDBC tables are set as transactional
tables by default. This causes catalogd to fail to load the metadata
for JDBC tables. This patch explicitly adds the table property
"transactional=false" to JDBC tables so that they are not set as
transactional tables.

Operations on JDBC tables are processed only on the coordinator. The
planner should estimate the number of processed rows as 0 for
DataSourceScanNode so that coordinator-only query plans are generated
for simple queries on JDBC tables and those queries can be executed
without invoking executor nodes. Also adds a Preconditions check to
make sure numNodes equals 1 for DataSourceScanNode.

Updates the FileSystemUtil.copyFileFromUriToLocal() function to write a
log message for all types of exceptions.

Testing:
 - Fixed planner tests for data source tables.
 - Ran end-to-end tests of JDBC tables with query option
   'exec_single_node_rows_threshold' as default value 100.
 - Passed core-tests.

Change-Id: I556faeda923a4a11d4bef8c1250c9616f77e6fa6
Reviewed-on: http://gerrit.cloudera.org:8080/21141
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-03-13 20:40:26 +00:00
wzhou-code
564f2ced73 IMPALA-12848: Fixed flaky test test_catalogd_ha_failover
TestExtDataSources::test_catalogd_ha_failover failed to delete the data
source object after the catalog service failed over to the standby
catalogd. Log messages showed that the coordinator tried to submit the
DDL request to the original active catalogd because it had not yet
received the failover notification from statestored.

To fix the flaky test, wait until the coordinator receives the failover
notification from statestored before executing the DDL request to drop
the data source.
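
The wait can be sketched as a generic polling helper. `wait_until` and the fake condition below are hypothetical; how the real test detects that the coordinator has seen the failover is not shown in this message.

```python
import time


def wait_until(condition, timeout_s=30.0, interval_s=0.5):
    """Poll condition() until it returns True or the timeout expires.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in for "the coordinator has received the failover notification":
# here it simply becomes true on the third poll.
polls = {"count": 0}


def coordinator_sees_new_active_catalogd():
    polls["count"] += 1
    return polls["count"] >= 3


notified = wait_until(coordinator_sees_new_active_catalogd,
                      timeout_s=5.0, interval_s=0.01)
# Only after this returns True would the test issue the DROP DATA SOURCE DDL.
```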

Testing:
 - Ran the test in a loop more than a hundred times without failure.

Change-Id: Ia6225271357740c055c25fdd349f1dc9162c2f53
Reviewed-on: http://gerrit.cloudera.org:8080/21078
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-03-01 08:53:59 +00:00
gaurav1086
f7a43b18aa IMPALA-12503: Support date data type for predicates
for external data source table

This patch adds support for the DATE data type in predicates
for external data source tables.

Testing:
- Added tests for date predicates with operators:
  '=', '>', '<', '>=', '<=', '!=', 'BETWEEN'.

Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4
Reviewed-on: http://gerrit.cloudera.org:8080/20915
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-02-05 21:28:00 +00:00
wzhou-code
b094f0e2e7 IMPALA-12642 (Part-2): Fixed unit-test to verify query options for JDBC external table
The previous patch for IMPALA-12642 added support for query options on
Impala external JDBC tables. It added one unit test that verifies the
query option ENABLED_RUNTIME_FILTER_TYPES by checking the queries in
the Queries web page. The test failed on Ubuntu 18.04 because the value
of the query option is shown as a single-quoted string instead of a
double-quoted string. This patch fixes the test.

Testing:
 - Ran tests/custom_cluster/test_ext_data_sources.py on Ubuntu 18.04,
   and Ubuntu 20.04.

Change-Id: I996c8fac038132f2b132d5e6ac36aca1dff59d72
Reviewed-on: http://gerrit.cloudera.org:8080/20978
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: gaurav singh <gsingh@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
2024-01-31 21:41:09 +00:00
gaurav1086
adfe82c97c IMPALA-12471 PART-2: skip mysql ext jdbc tests if
setup environment fails.

This patch modifies the MySQL tests to be marked as xfailed
if the MySQL environment fails to set up successfully.

Change-Id: Ib7829aed09d25ff3e636004f3d1f32ecc6f37299
Reviewed-on: http://gerrit.cloudera.org:8080/20975
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
2024-01-31 01:00:09 +00:00
wzhou-code
f8e8cd0906 IMPALA-12642: Support query options for Impala external JDBC table
This patch uses the JDBC connection string to apply query options to
the Impala server by setting the properties in "jdbc.properties" when
creating a JDBC external DataSource table.
jdbc.properties is specified as a comma-delimited key=value string, like
"MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES=\"BLOOM,MIN_MAX\"".

Fixed Impala to allow the value of ENABLED_RUNTIME_FILTER_TYPES to have
double quotes at the beginning and end of the string.
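
The quoting matters because the value "BLOOM,MIN_MAX" itself contains the delimiter. The Python sketch below splits a jdbc.properties-style string on commas that sit outside double quotes. It is an illustration of the format only; `parse_jdbc_properties` is a hypothetical helper and not how Impala parses the property.

```python
def parse_jdbc_properties(text):
    """Split a comma-delimited key=value string, honoring double quotes.

    Commas inside a double-quoted value do not end the pair. The quotes
    are kept as part of the value.
    """
    tokens = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            tokens.append("".join(current))
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current))

    pairs = {}
    for token in tokens:
        token = token.strip()
        if not token:
            continue
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError("expected key=value, got: %r" % token)
        pairs[key.strip()] = value.strip()
    return pairs


props = parse_jdbc_properties(
    'MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES="BLOOM,MIN_MAX"')
```

A naive `text.split(",")` would cut the filter-type list in two, which is why the value is quoted in the first place.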

jdbc.properties can be used for other databases like Postgres and MySQL
to set additional properties. The test cases will be added in a
separate patch.

Testing:
 - Added end-to-end tests for setting query options on Impala JDBC
   tables.
 - Passed core tests.

Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992
Reviewed-on: http://gerrit.cloudera.org:8080/20837
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-17 23:16:42 +00:00
wzhou-code
a2c2f118d2 IMPALA-12375: Make DataSource Object persistent
DataSource objects are saved in an in-memory cache in the Catalog
server. They are not persisted to the HMS. The objects are lost after
the Catalog server is restarted, and the user needs to recreate
DataSource objects before creating new external DataSource tables.
This patch makes DataSource objects persistent by saving them as
DataConnector objects with type "impalaDataSource" in HMS.
Since HMS events for DataConnector are not handled, the Catalog server
has to refresh DataSource objects when the catalogd becomes active.
Note that this feature is not supported for Apache Hive 3.1 and older
versions.

Testing:
 - Added two end-to-end unit tests, one with a restart of the Catalog
   server and one with catalogd HA failover.
   These two tests are skipped when USE_APACHE_HIVE is set to true
   and the Apache Hive version is 3.x or older.
 - Passed all-build-options-ub2004.
 - Passed core test.

Change-Id: I500a99142bb62ce873e693d573064ad4ffa153ab
Reviewed-on: http://gerrit.cloudera.org:8080/20768
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
2024-01-03 03:25:18 +00:00
wzhou-code
ec22a1e1ca IMPALA-12502: Support Impala to Impala federation
This patch adds support to read Impala tables in the Impala cluster
through JDBC external data source. It also adds a new counter
NumExternalDataSourceGetNext in profile for the total number of calls
to ExternalDataSource::GetNext().
Setting query options for Impala will be supported in a following patch.

Testing:
 - Added an end-to-end unit test to read Impala tables from Impala
   cluster through JDBC external data source.
   Manually ran the unit-test with Impala tables in Impala cluster on a
   remote host by setting $INTERNAL_LISTEN_HOST in jdbc.url as the ip
   address of the remote host on which an Impala cluster is running.
 - Added LDAP test for reading table through JDBC external data source
   with LDAP authentication.
   Manually ran the unit-test with Impala tables in a remote Impala
   cluster.
 - Passed core tests.

Change-Id: I79ad3273932b658cb85c9c17cc834fa1b5fbd64f
Reviewed-on: http://gerrit.cloudera.org:8080/20731
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-12-22 21:44:49 +00:00
gaurav1086
aeb9a82060 IMPALA-12471 PART-2: Add check for mysqld socket file
This patch adds a check for the existence of the mysqld.sock
file in the directory /var/run/mysqld/ inside the mysqld
docker container. If the file is not present, the
test is skipped.

Testing: tested manually with and without the mysqld.sock
file.

Change-Id: I393fd03fa6efd4c11781d219f66978a4f556c668
Reviewed-on: http://gerrit.cloudera.org:8080/20780
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-12-13 06:52:49 +00:00
gaurav1086
4c762725c7 IMPALA-12471: Add unit tests of external jdbc
tables for MySQL

This patch adds MySQL tests for the JDBC data source, which is
implemented through the "external data source" mechanism in Impala.

This patch also fixes the handling of case-sensitive table and
column names in MySQL queries.

Testing:
- Added a unit test for MySQL and ran it with the JDBC
driver mysql-connector-j-8.1.0.jar. This test requires
docker to be added to the sudoers group. Also, the test is
only run in 'exhaustive' mode.

Change-Id: I446ec3d4ebaf53c8edac0b2d181514bde587dfae
Reviewed-on: http://gerrit.cloudera.org:8080/20710
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-12-02 06:03:05 +00:00
wzhou-code
a5a99adcd2 IMPALA-12376: DataSourceScanNode drop some returned rows
DataSourceScanNode does not handle eos properly in the function
DataSourceScanNode::GetNext(). Rows returned from the
external data source could be dropped if data_source_batch_size
is set to a value greater than the default value of 1024.
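
The general pattern at stake can be sketched in Python: the final batch may carry both rows and the eos flag, so its rows must be consumed before the loop exits. This does not reproduce the actual C++ bug, whose details are not given in this message. `fetch_all_rows` and `make_source` are hypothetical stand-ins.

```python
def fetch_all_rows(get_next):
    """Drain a batch source whose get_next() returns (rows, eos)."""
    rows = []
    while True:
        batch, eos = get_next()
        # Consume the batch before checking eos: the last batch can
        # contain rows and still report eos=True.
        rows.extend(batch)
        if eos:
            return rows


def make_source(total_rows, batch_size):
    """Fake external data source that serves integers in batches."""
    state = {"next": 0}

    def get_next():
        start = state["next"]
        end = min(start + batch_size, total_rows)
        state["next"] = end
        return list(range(start, end)), end >= total_rows

    return get_next


# Batch sizes above and below the default of 1024, as in the tests below.
rows_2048 = fetch_all_rows(make_source(total_rows=5000, batch_size=2048))
rows_512 = fetch_all_rows(make_source(total_rows=5000, batch_size=512))
```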

Testing:
 - Added end-to-end test with data_source_batch_size as 2048.
   The test failed without the fix and passed with it.
   Also added test with data_source_batch_size as 512.
 - Passed core tests.

Change-Id: I978d0a65faa63a47ec86a0127c0bee8dfb79530b
Reviewed-on: http://gerrit.cloudera.org:8080/20636
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-11-03 00:05:54 +00:00
gaurav1086
39adf42a30 IMPALA-12470: Support different schemes for jdbc driver url when
creating external jdbc table

This patch builds on top of IMPALA-5741 to copy the JDBC jar from
remote filesystems: Ozone and S3. Currently we only support HDFS.
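
As an illustration of distinguishing driver URLs by scheme, the Python sketch below uses the standard `urllib.parse.urlparse`. The scheme names in `REMOTE_SCHEMES`, the example URLs, and the helper `is_remote_driver_url` are assumptions for illustration, not taken from the patch.

```python
from urllib.parse import urlparse

# Assumed scheme names for HDFS, S3 and Ozone; adjust to the deployment.
REMOTE_SCHEMES = {"hdfs", "s3a", "ofs"}


def is_remote_driver_url(driver_url):
    """Return True if the URL's scheme is one of the assumed remote ones."""
    return urlparse(driver_url).scheme in REMOTE_SCHEMES


hdfs_url = "hdfs://localhost:20500/jdbc-drivers/postgresql-jdbc.jar"
s3_url = "s3a://example-bucket/jdbc-drivers/postgresql-jdbc.jar"
local_path = "/tmp/jdbc-drivers/postgresql-jdbc.jar"

schemes = [urlparse(u).scheme for u in (hdfs_url, s3_url, local_path)]
```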

Testing:
Commented out "@skipif.not_hdfs" qualifier in files:
  - tests/query_test/test_ext_data_sources.py
  - tests/custom_cluster/test_ext_data_sources.py
1) tested locally by running tests:
  - impala-py.test tests/query_test/test_ext_data_sources.py
  - impala-py.test tests/custom_cluster/test_ext_data_sources.py
2) tested using jenkins job for ozone and S3

Change-Id: I804fa3d239a4bedcd31569f2b46edb7316d7f004
Reviewed-on: http://gerrit.cloudera.org:8080/20639
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-11-01 23:32:10 +00:00
wzhou-code
c77a457520 IMPALA-7131: Support external data sources in LocalCatalog mode
This patch makes external data sources work in LocalCatalog mode:
 - Add APIs in CatalogdMetaProvider to fetch DataSource from Catalog
   server through RPC.
 - Add getDataSources() and getDataSource() in LocalCatalog.
 - Add LocalDataSourceTable class for loading DataSource table in
   LocalCatalog.
 - Handle request for loading DataSource in CatalogServiceCatalog on
   Catalog server.
 - Enable tests which are skipped by
   SkipIfCatalogV2.data_sources_unsupported().
   Remove SkipIfCatalogV2.data_sources_unsupported().
 - Add end-to-end tests for LocalCatalog mode.

Testing:
 - Passed core tests

Change-Id: I40841c9be9064ac67771c4d3f5acbb3b552a2e55
Reviewed-on: http://gerrit.cloudera.org:8080/20574
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-10-30 16:04:47 +00:00