DECIMAL is a primitive data type in Impala. The current code returns
wrong values for DECIMAL columns in external JDBC tables.
This patch fixes the wrong values returned from the JDBC data source and
adds support for pushing down predicates on DECIMAL columns to remote
databases and remote Impala.
The decimal precision and scale of a column in an external JDBC table
must be no less than the precision and scale of the corresponding
column in the remote database table. Otherwise, Impala fails with an
error, since reading the column could truncate decimal data.
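The precision and scale rule can be sketched as follows. This is a minimal illustration, not Impala's implementation: the `DecimalType` class, the `check_decimal_compatible` function, and the error text are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecimalType:
    """A DECIMAL(precision, scale) column type (hypothetical helper)."""
    precision: int
    scale: int


def check_decimal_compatible(local: DecimalType, remote: DecimalType) -> None:
    """Raise ValueError if the local column is narrower than the remote one.

    Encodes the rule stated in the commit message: both the precision and
    the scale of the JDBC table column must be no less than those of the
    corresponding remote column. The error text is illustrative only.
    """
    if local.precision < remote.precision or local.scale < remote.scale:
        raise ValueError(
            f"DECIMAL({local.precision},{local.scale}) is narrower than "
            f"remote DECIMAL({remote.precision},{remote.scale})"
        )


# A wider or equal local column is accepted; a narrower one is rejected.
check_decimal_compatible(DecimalType(38, 10), DecimalType(10, 2))
try:
    check_decimal_compatible(DecimalType(9, 2), DecimalType(10, 2))
except ValueError as err:
    print(err)
```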
Testing:
- Added a Planner test for pushing down predicates on DECIMAL columns.
- Added end-to-end unit-tests for tables with DECIMAL columns for
Postgres, MySQL, and Impala-to-Impala.
- Passed core-tests.
Change-Id: I8c9d2e0667c42c0e52436b158e3dfe3ec14b9e3b
Reviewed-on: http://gerrit.cloudera.org:8080/21218
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In some deployment environments, the default table type is
transactional. In these environments, JDBC tables that are created as
non-external tables are rejected by HMS because they fail the strict
managed table checks.
This patch forces JDBC tables to be created as external tables, and
requires at least one column for JDBC tables.
Testing:
- Updated frontend unit tests and end-to-end unit tests to create JDBC
tables as external tables.
- Passed core tests
Change-Id: Ib5533b52434cdf1c430e30ac28a0146ab4d9d4b9
Reviewed-on: http://gerrit.cloudera.org:8080/21159
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The patch for IMPALA-12802 added some negative test cases for altering
external JDBC tables. These test cases verify the error messages.
One of the test cases failed in some test environments because the
Postgres server returned a different error message.
This patch fixes the unit-test failure by checking whether the error
message matches one of two possible error messages.
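The check can be sketched as a helper that accepts any one of several expected messages. This is only an illustration of the idea: `matches_any` is a hypothetical name, and the two message strings below are placeholders, not the actual Postgres error texts.

```python
from typing import Iterable


def matches_any(actual: str, expected_variants: Iterable[str]) -> bool:
    """Return True if any expected variant occurs in the actual error text."""
    return any(variant in actual for variant in expected_variants)


# Placeholder messages standing in for the two server-dependent variants.
EXPECTED = ("placeholder error variant A", "placeholder error variant B")

error_from_server = "ERROR: placeholder error variant B (details)"
assert matches_any(error_from_server, EXPECTED)
```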
Testing:
- Ran the unit-test on Jenkins with CentOS and Ubuntu and verified that
the unit-test passed for the different error messages.
Change-Id: I84566f67751538d72a4d17da21e7ea907e1dcdd2
Reviewed-on: http://gerrit.cloudera.org:8080/21181
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-12793 changed the syntax for creating JDBC tables. The
connection settings (URL, username, password, JDBC driver, etc.) are
set as table properties.
This patch allows users to change these table properties, or edit
columns, via the ALTER TABLE statement.
Testing:
- Added frontend analysis unit-tests.
- Added end-to-end unit-test.
- Passed Core tests
Change-Id: I5ebb5de2c686d2015db78641f78299dd5f33621e
Reviewed-on: http://gerrit.cloudera.org:8080/21088
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
for external data source table.
Binary SCAN predicates involving timestamp literals are pushed down
to the remote database. The current logic assumes the ISO 8601 (SQL
standard) format for timestamp literals: 'yyyy-mm-dd hh:mm:ss.ms'.
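As an illustration of the assumed literal format, the sketch below renders a timestamp with millisecond precision and places it in a binary predicate. The function names are hypothetical, and the exact SQL text Impala generates for the remote database is not shown in this commit message, so the predicate string is an assumption.

```python
from datetime import datetime


def to_timestamp_literal(ts: datetime) -> str:
    """Format ts as a quoted 'yyyy-mm-dd hh:mm:ss.ms' literal.

    Microseconds are truncated (not rounded) to milliseconds.
    """
    base = ts.strftime("%Y-%m-%d %H:%M:%S")
    millis = ts.microsecond // 1000
    return f"'{base}.{millis:03d}'"


def timestamp_predicate(column: str, op: str, ts: datetime) -> str:
    """Build a binary predicate such as: col >= '2024-01-02 03:04:05.678'."""
    return f"{column} {op} {to_timestamp_literal(ts)}"


print(timestamp_predicate("timestamp_col", ">=",
                          datetime(2024, 1, 2, 3, 4, 5, 678000)))
```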
Testing:
- Added custom cluster tests for timestamp predicates with operators
'=', '>', '<', '>=', '<=', '!=', and 'BETWEEN' for Postgres, MySQL,
and remote Impala.
- Added coverage for timestamps with and without a time component.
- Added coverage for timestamps with and without milliseconds.
- Added Planner tests to check predicate pushdown for date/timestamp
literals, date/timestamp functions, and CASTs.
Change-Id: If6ffe672b4027e2cee094cec4f99b9df9308e441
Reviewed-on: http://gerrit.cloudera.org:8080/21015
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
This patch changes the syntax of the CREATE TABLE statement for JDBC
tables to:
  CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
    (col_name data_type
      [constraint_specification]
      [COMMENT 'col_comment']
      [, ...]
    )
    [COMMENT 'table_comment']
    STORED BY JDBC
    TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)
Both "STORED BY JDBC" and "STORED AS JDBC" are accepted. A table
property '__IMPALA_DATA_SOURCE_NAME' is added to the JDBC table with
the value 'impalajdbcdatasource', which is shown in the output of the
'show create table' command.
The following required JDBC parameters must be specified as table
properties: database.type, jdbc.url, jdbc.driver, driver.url, and table.
Otherwise, an AnalysisException is thrown.
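The required-property check can be sketched as follows. This is an illustration only: `missing_jdbc_properties` is a hypothetical helper, the property values are made up, and exact, case-sensitive key matching is an assumption (the actual check lives in Impala's frontend analysis).

```python
# Required keys as listed in the commit message.
REQUIRED_JDBC_PROPERTIES = (
    "database.type", "jdbc.url", "jdbc.driver", "driver.url", "table",
)


def missing_jdbc_properties(tbl_properties: dict) -> list:
    """Return the required keys that are absent from TBLPROPERTIES."""
    return [key for key in REQUIRED_JDBC_PROPERTIES
            if key not in tbl_properties]


# Hypothetical property values; 'driver.url' and 'table' are left out.
props = {
    "database.type": "POSTGRES",
    "jdbc.url": "jdbc:postgresql://localhost:5432/example_db",
    "jdbc.driver": "org.postgresql.Driver",
}
print(missing_jdbc_properties(props))
```

An analyzer built on this idea would raise its analysis error when the returned list is non-empty.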
Testing:
- Added frontend unit tests for new syntax of creating JDBC table.
- Updated end-to-end unit tests to create JDBC tables without data
source.
- Passed core tests
Change-Id: I765aa86b430246786ad85ab6857cefaf4332c920
Reviewed-on: http://gerrit.cloudera.org:8080/21016
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch moves the source files of the jdbc package to fe.
The data source location is now optional: a data source can be created
without specifying an HDFS location. In that case, Impala assumes the
data source class is on the classpath and that an instance of it can be
created with the current class loader. Impala still tries to load the
jar file of the data source at runtime if one is set in the data source
location.
Testing:
- Passed core tests
- Passed dockerised-tests
Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
Reviewed-on: http://gerrit.cloudera.org:8080/20971
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
for external data source table
This patch adds support for the DATE data type in predicates
for external data sources.
Testing:
- Added tests for date predicates with operators:
'=', '>', '<', '>=', '<=', '!=', 'BETWEEN'.
Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4
Reviewed-on: http://gerrit.cloudera.org:8080/20915
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch uses the JDBC connection string to apply query options to
the Impala server by setting the properties in "jdbc.properties" when
creating a JDBC external DataSource table.
jdbc.properties is specified as a comma-delimited key=value string, like
"MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES=\"BLOOM,MIN_MAX\"".
Fixed Impala to allow the value of ENABLED_RUNTIME_FILTER_TYPES to have
double quotes at the beginning and end of the string.
jdbc.properties can also be used for other databases, like Postgres and
MySQL, to set additional properties. Test cases for these will be added
in a separate patch.
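The example value shows why the double quotes matter: the comma inside "BLOOM,MIN_MAX" must not be treated as a pair separator. The sketch below is one way to split such a string. It is not the parser used by Impala or the JDBC driver; `parse_jdbc_properties` is a hypothetical name, and keeping the quotes in the returned value is an assumption.

```python
def parse_jdbc_properties(text: str) -> dict:
    """Split a comma-delimited key=value string, honoring double quotes.

    Commas inside double quotes do not split pairs. The quotes are kept
    in the value and left for the consumer to interpret.
    """
    pairs, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            pairs.append("".join(current))
            current = []
        else:
            current.append(ch)
    pairs.append("".join(current))

    result = {}
    for pair in pairs:
        if not pair.strip():
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        result[key.strip()] = value.strip()
    return result


print(parse_jdbc_properties(
    'MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES="BLOOM,MIN_MAX"'))
```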
Testing:
- Added end-to-end tests for setting query options on Impala JDBC
tables.
- Passed core tests.
Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992
Reviewed-on: http://gerrit.cloudera.org:8080/20837
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch makes external data sources work in LocalCatalog mode:
- Add APIs in CatalogdMetaProvider to fetch DataSource from Catalog
server through RPC.
- Add getDataSources() and getDataSource() in LocalCatalog.
- Add LocalDataSourceTable class for loading DataSource table in
LocalCatalog.
- Handle request for loading DataSource in CatalogServiceCatalog on
Catalog server.
- Enable tests which are skipped by
SkipIfCatalogV2.data_sources_unsupported().
Remove SkipIfCatalogV2.data_sources_unsupported().
- Add end-to-end tests for LocalCatalog mode.
Testing:
- Passed core tests
Change-Id: I40841c9be9064ac67771c4d3f5acbb3b552a2e55
Reviewed-on: http://gerrit.cloudera.org:8080/20574
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
This patch uses the "external data source" mechanism in Impala to
implement a data source for querying over JDBC.
It has some limitations due to the restrictions of "external data
source":
- It is not distributed, i.e., the fragment is unpartitioned. Queries
are executed on the coordinator.
- Queries that read the following data types from external JDBC tables
are not supported:
BINARY, CHAR, DATETIME, and COMPLEX.
- Only binary predicates with the operators =, !=, <=, >=, <, and >
can be pushed down to the RDBMS.
- The following data types are not supported for predicates:
DECIMAL, TIMESTAMP, DATE, and BINARY.
- External tables with complex types of columns are not supported.
- Support is limited to the following databases:
MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
- Catalog V2 is not supported (IMPALA-7131).
- DataSource objects are not persistent (IMPALA-12375).
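The two predicate-related limitations above can be read as a simple eligibility rule. The sketch below encodes only what this commit message lists; the function and constant names are hypothetical, and the real planner logic is not shown here.

```python
# Operators and excluded types exactly as listed in this commit message.
PUSHABLE_OPERATORS = frozenset({"=", "!=", "<=", ">=", "<", ">"})
UNSUPPORTED_PREDICATE_TYPES = frozenset(
    {"DECIMAL", "TIMESTAMP", "DATE", "BINARY"})


def can_push_down(operator: str, column_type: str) -> bool:
    """Return True if a binary predicate may be pushed to the RDBMS.

    column_type may carry parameters, e.g. 'DECIMAL(10,2)'; only the
    base type name is compared, case-insensitively.
    """
    base_type = column_type.split("(", 1)[0].strip().upper()
    return (operator in PUSHABLE_OPERATORS
            and base_type not in UNSUPPORTED_PREDICATE_TYPES)


print(can_push_down("=", "INT"))
print(can_push_down("<", "DECIMAL(10,2)"))
```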
Additional fixes are planned on top of this patch.
Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.
To query the RDBMS tables, follow these steps (note that existing
data source tables will be rebuilt):
1. Make sure the Impala cluster has been started.
2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
   ${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh
3. Create an `alltypes` table in the Postgres database.
   ${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh
4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
   ${IMPALA_HOME}/bin/impala-shell.sh -f \
   ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql
5. The data source tables created in the previous step are now ready to
query. There is no need to restart the Impala cluster.
Testing:
- Added a unit-test for Postgres and ran it with JDBC driver
postgresql-42.5.1.jar.
- Manually ran the unit-test for MySQL with JDBC driver
mysql-connector-j-8.1.0.jar.
- Ran core tests successfully.
Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Reviewed-on: http://gerrit.cloudera.org:8080/17842
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>