11 Commits

Author SHA1 Message Date
wzhou-code
e50bfa8376 IMPALA-12925: Fix decimal data type for external JDBC table
Decimal type is a primitive data type for Impala. Current code returns
wrong values for columns with decimal data type in external JDBC tables.

This patch fixes the wrong values returned from the JDBC data source
and supports pushing down predicates on decimal columns to the remote
database and remote Impala.
The decimal precision and scale of the columns in external JDBC table
must be no less than the decimal precision and scale of the
corresponding columns in the table of remote database. Otherwise,
Impala fails with an error since it may cause truncation of decimal
data.
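
The precision and scale rule above can be illustrated with a minimal
Python sketch. The function name `fits_without_truncation` and its
arguments are hypothetical and are not taken from the Impala code; the
check only restates the rule from this commit message.

```python
def fits_without_truncation(local_precision, local_scale,
                            remote_precision, remote_scale):
    """Return True if the column declared on the external JDBC table is at
    least as wide as the remote column in both precision and scale.

    Hypothetical helper that only restates the rule in the commit message.
    """
    return (local_precision >= remote_precision
            and local_scale >= remote_scale)


# DECIMAL(10,2) declared locally for a remote DECIMAL(9,2) column.
wide_enough = fits_without_truncation(10, 2, 9, 2)
# DECIMAL(9,2) declared locally for a remote DECIMAL(10,4) column.
too_narrow = fits_without_truncation(9, 2, 10, 4)
```

Under this rule the first declaration would be accepted and the second
would be the case in which Impala reports an error.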

Testing:
 - Added a planner test for pushing down predicates on decimal columns.
 - Added end-to-end unit tests for tables with decimal columns for
   Postgres, MySQL, and Impala-to-Impala.
 - Passed core-tests.

Change-Id: I8c9d2e0667c42c0e52436b158e3dfe3ec14b9e3b
Reviewed-on: http://gerrit.cloudera.org:8080/21218
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-04-05 09:16:53 +00:00
wzhou-code
c0507c02cd IMPALA-12896 (Part 2): JDBC table must be created as external table
In some deployment environments, the default table type is
transactional. In these scenarios, JDBC tables that are not created as
external tables are rejected by HMS because they fail the strict
managed table checks.

This patch forces JDBC tables to be created as external tables and
requires at least one column for JDBC tables.

Testing:
 - Updated frontend unit tests and end-to-end unit tests to create JDBC
   tables as external tables.
 - Passed core tests

Change-Id: Ib5533b52434cdf1c430e30ac28a0146ab4d9d4b9
Reviewed-on: http://gerrit.cloudera.org:8080/21159
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-03-23 09:54:30 +00:00
wzhou-code
74b6df7997 IMPALA-12930: Fix TestExtDataSources.test_jdbc_data_source failure
The patch for IMPALA-12802 added some negative test cases for altering
external JDBC tables. These test cases verify the error messages.
One of the test cases failed in some test environments because the
Postgres server returned a different error message.

This patch fixes the unit-test failure by checking whether the error
message matches one of two possible error messages.

Testing:
 - Ran the unit-test on Jenkins with CentOS and Ubuntu and verified that
   the unit-test passed for the different error messages.

Change-Id: I84566f67751538d72a4d17da21e7ea907e1dcdd2
Reviewed-on: http://gerrit.cloudera.org:8080/21181
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-03-22 01:11:26 +00:00
wzhou-code
eb5c8d6884 IMPALA-12802: Support ALTER TABLE for JDBC tables
IMPALA-12793 changed the syntax for creating JDBC tables. The
connection settings (URL, username, password, JDBC driver, etc.) are
now set as table properties.

This patch allows users to change these table properties or edit
columns via the ALTER TABLE statement.

Testing:
 - Added frontend analysis unit-tests.
 - Added end-to-end unit-test.
 - Passed Core tests

Change-Id: I5ebb5de2c686d2015db78641f78299dd5f33621e
Reviewed-on: http://gerrit.cloudera.org:8080/21088
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-03-20 09:01:30 +00:00
gaurav1086
96964be7a3 IMPALA-12815: Support timestamp for scan predicates
for external data source table.

Binary SCAN predicates involving timestamp literals are pushed down
to the remote database. The current logic assumes ISO 8601 (SQL standard)
format for timestamp literals - 'yyyy-mm-dd hh:mm:ss.ms'
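
As an illustration of the literal layout named above, the following
Python sketch parses literals with and without a time component and with
and without fractional seconds. The helper name
`parse_timestamp_literal` and the list of accepted layouts are
assumptions made for this example; this is not the code Impala uses.

```python
from datetime import datetime

# Accepted layouts, most specific first (an assumption for illustration).
_FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_timestamp_literal(text):
    """Parse 'yyyy-mm-dd[ hh:mm:ss[.fraction]]' into a datetime."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # Layout did not match; try the next, less specific one.
            continue
    raise ValueError("unsupported timestamp literal: %r" % text)


with_millis = parse_timestamp_literal("2024-03-02 17:15:43.123")
date_only = parse_timestamp_literal("2024-03-02")
```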

Testing:
- Added custom cluster tests for timestamp predicates with operators:
  '=', '>', '<', '>=', '<=', '!=', 'BETWEEN' for postgres, mysql
  and remote impala.
- Added coverage for timestamps with and without a time component.
- Added coverage for timestamps with and without milliseconds.
- Added Planner tests to check predicate pushdown for date/timestamp
  literals, date/timestamp functions and CASTs

Change-Id: If6ffe672b4027e2cee094cec4f99b9df9308e441
Reviewed-on: http://gerrit.cloudera.org:8080/21015
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2024-03-02 17:15:43 +00:00
wzhou-code
edd1e21493 IMPALA-12793: Create JDBC table without data source
This patch changes the syntax of the CREATE TABLE statement for JDBC
tables to:
  CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
  (col_name data_type
    [constraint_specification]
    [COMMENT 'col_comment']
    [, ...]
  )
  [COMMENT 'table_comment']
  STORED BY JDBC
  TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)

Both "STORED BY JDBC" and "STORED AS JDBC" are acceptable. A table
property '__IMPALA_DATA_SOURCE_NAME' is added to the JDBC table with
value 'impalajdbcdatasource', which is shown in the output of command
'show create table'.
The following required JDBC parameters must be specified as table
properties: database.type, jdbc.url, jdbc.driver, driver.url, and table.
Otherwise, AnalysisException will be thrown.
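
The required-property check described above can be sketched in Python as
follows. The helper `missing_jdbc_properties` is hypothetical and the
property values are placeholders; only the five property names come from
this commit message.

```python
# Required table properties as listed in the commit message.
REQUIRED_JDBC_PROPERTIES = (
    "database.type", "jdbc.url", "jdbc.driver", "driver.url", "table")


def missing_jdbc_properties(tbl_properties):
    """Return the required JDBC property names absent from the mapping."""
    return [key for key in REQUIRED_JDBC_PROPERTIES
            if key not in tbl_properties]


# Placeholder values for illustration only.
example_properties = {
    "database.type": "POSTGRES",
    "jdbc.url": "jdbc:postgresql://localhost:5432/example_db",
    "jdbc.driver": "org.postgresql.Driver",
    "table": "alltypes",
}
missing = missing_jdbc_properties(example_properties)
```

Here `missing` names `driver.url`, which is the situation in which the
analysis of the CREATE TABLE statement would be expected to fail.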

Testing:
 - Added frontend unit tests for new syntax of creating JDBC table.
 - Updated end-to-end unit tests to create JDBC tables without data
   source.
 - Passed core tests

Change-Id: I765aa86b430246786ad85ab6857cefaf4332c920
Reviewed-on: http://gerrit.cloudera.org:8080/21016
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-02-27 02:39:59 +00:00
wzhou-code
fc74ca672a IMPALA-12378: Auto Ship JDBC Data Source
This patch moves the source files of the jdbc package to fe.
The data source location is now optional: a data source can be created
without specifying an HDFS location. In that case, Impala assumes the
data source class is in the classpath and that an instance of it can
be created with the current class loader. Impala still tries to load
the jar file of the data source at runtime if one is set in the data
source location.

Testing:
 - Passed core test
 - Passed dockerised-tests

Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
Reviewed-on: http://gerrit.cloudera.org:8080/20971
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2024-02-07 16:29:11 +00:00
gaurav1086
f7a43b18aa IMPALA-12503: Support date data type for predicates
for external data source table

This patch adds support for predicates on the DATE data type
for external data sources.

Testing:
- Added tests for date predicates with operators:
  '=', '>', '<', '>=', '<=', '!=', 'BETWEEN'.

Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4
Reviewed-on: http://gerrit.cloudera.org:8080/20915
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-02-05 21:28:00 +00:00
wzhou-code
f8e8cd0906 IMPALA-12642: Support query options for Impala external JDBC table
This patch uses the JDBC connection string to apply query options to
the Impala server by setting the properties in "jdbc.properties" when
creating an external JDBC DataSource table.
jdbc.properties is specified as a comma-delimited key=value string, like
"MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES=\"BLOOM,MIN_MAX\"".

Fixed Impala to allow the value of ENABLED_RUNTIME_FILTER_TYPES to have
double quotes at the beginning and end of the string.

jdbc.properties can be used for other databases like Postgres and MySQL
to set additional properties. The test cases will be added in a separate
patch.
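
To illustrate why the quoting matters for the comma-delimited
jdbc.properties format described above, here is a minimal Python sketch
that splits such a string on commas outside double quotes. The function
names are hypothetical, and this is not the parser used by Impala or by
the JDBC driver; it only shows one way the example string could be read.

```python
def parse_jdbc_properties(text):
    """Split 'k1=v1, k2="a,b"' into a dict, ignoring quoted commas."""
    parts, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ',' and not in_quotes:
            # A comma outside quotes ends the current key=value pair.
            parts.append(''.join(current))
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current))

    props = {}
    for part in parts:
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition('=')
        if not sep:
            raise ValueError("expected key=value, got %r" % part)
        props[key.strip()] = value.strip()
    return props


def strip_enclosing_quotes(value):
    """Drop one pair of double quotes around the value, if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


props = parse_jdbc_properties(
    'MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES="BLOOM,MIN_MAX"')
filter_types = strip_enclosing_quotes(props["ENABLED_RUNTIME_FILTER_TYPES"])
```

The quoted value keeps its internal comma, so `BLOOM,MIN_MAX` is treated
as a single option value instead of two separate properties.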

Testing:
 - Added end-to-end tests for setting query options on Impala JDBC
   tables.
 - Passed core tests.

Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992
Reviewed-on: http://gerrit.cloudera.org:8080/20837
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-17 23:16:42 +00:00
wzhou-code
c77a457520 IMPALA-7131: Support external data sources in LocalCatalog mode
This patch makes external data sources work in LocalCatalog mode:
 - Add APIs in CatalogdMetaProvider to fetch DataSource from Catalog
   server through RPC.
 - Add getDataSources() and getDataSource() in LocalCatalog.
 - Add LocalDataSourceTable class for loading DataSource table in
   LocalCatalog.
 - Handle request for loading DataSource in CatalogServiceCatalog on
   Catalog server.
 - Enable tests which are skipped by
   SkipIfCatalogV2.data_sources_unsupported().
   Remove SkipIfCatalogV2.data_sources_unsupported().
 - Add end-to-end tests for LocalCatalog mode.

Testing:
 - Passed core tests

Change-Id: I40841c9be9064ac67771c4d3f5acbb3b552a2e55
Reviewed-on: http://gerrit.cloudera.org:8080/20574
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-10-30 16:04:47 +00:00
Fucun Chu
c2bd30a1b3 IMPALA-5741: Initial support for reading tiny RDBMS tables
This patch uses the "external data source" mechanism in Impala to
implement a data source for querying via JDBC.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed, i.e., the fragment is unpartitioned. Queries
    are executed on the coordinator.
  - Queries that read the following data types from external JDBC
    tables are not supported:
    BINARY, CHAR, DATETIME, and COMPLEX.
  - Only binary predicates with the operators =, !=, <=, >=, <, > can
    be pushed to the RDBMS.
  - The following data types are not supported for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - External tables with complex types of columns are not supported.
  - Support is limited to the following databases:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Catalog V2 is not supported (IMPALA-7131).
  - DataSource objects are not persistent (IMPALA-12375).
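
The predicate limitations listed above can be sketched in Python as a
split between predicates that may be pushed to the RDBMS and predicates
that must be evaluated by Impala. The tuple layout and the function name
`split_predicates` are hypothetical; only the operator list and the
unsupported type list are taken from this commit message.

```python
# Operators and predicate types as listed in the commit message.
PUSHABLE_OPERATORS = {"=", "!=", "<=", ">=", "<", ">"}
UNSUPPORTED_PREDICATE_TYPES = {"DECIMAL", "TIMESTAMP", "DATE", "BINARY"}


def split_predicates(predicates):
    """Split (column, column_type, operator, literal) tuples into
    predicates that can be pushed down and those that cannot."""
    pushed, residual = [], []
    for pred in predicates:
        _, col_type, op, _ = pred
        if (op in PUSHABLE_OPERATORS
                and col_type.upper() not in UNSUPPORTED_PREDICATE_TYPES):
            pushed.append(pred)
        else:
            residual.append(pred)
    return pushed, residual


example_predicates = [
    ("id", "INT", "=", 1),
    ("date_col", "DATE", ">", "2024-01-01"),
    ("name", "STRING", "LIKE", "a%"),
]
pushed, residual = split_predicates(example_predicates)
```

Only the first predicate qualifies under the limitations of this initial
patch; later commits in this log extend pushdown to DATE, TIMESTAMP, and
DECIMAL predicates.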

Additional fixes are planned on top of this patch.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the Postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

5. Run queries against the data source tables created in the previous
step. There is no need to restart the Impala cluster.

Testing:
 - Added unit-test for Postgres and ran unit-test with JDBC driver
   postgresql-42.5.1.jar.
 - Ran a manual unit-test for MySQL with JDBC driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Reviewed-on: http://gerrit.cloudera.org:8080/17842
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-10-10 02:13:59 +00:00