Cleans up repetitive patterns in pom.xml.
Centralize plugin configuration in pluginManagement. Replace the inline
maven-compiler-plugin configuration with the newer maven.compiler.release
property and update to the latest plugin version.
Centralize common dependencies in dependencyManagement, including
exclusions when appropriate. Remove exclusions that are no longer
relevant.
Compared the dependency:tree output before and after; the only differences
are that commons-cli now comes from hadoop and jersey-servlet/jersey-server
are effectively excluded; all versions matched. Also ensured that the build
compiles with USE_APACHE_COMPONENTS=true.
Adds com.amazonaws:aws-java-sdk-bundle to exclusion checking to ensure
it's not accidentally included alongside impala-minimal-s3a-aws-sdk.
Removes an io.netty exclusion missed in IMPALA-12816.
Updates commons-dbcp2 to 2.12.0 to match Hive.
Change-Id: If96649840e23036b4a73ee23e8d12516497994f0
Reviewed-on: http://gerrit.cloudera.org:8080/23432
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala is preparing to switch to JDK17 for Java compilation by default.
While the source version might remain at 1.8 for longer, we should
experiment with targeting binary version 17.
This patch adds the IMPALA_JAVA_TARGET env var to control the target binary
version. It is initialized in impala-config-java.sh based on the value of
the IMPALA_JDK_VERSION env var.
Testing:
Passed data load and FE tests with IMPALA_JDK_VERSION=17.
Change-Id: If194d87c542d416b878661403c32c6adc2930199
Reviewed-on: http://gerrit.cloudera.org:8080/23096
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The unit test `JdbcDataSourceTest.java` was originally
implemented using the H2 database, which is no longer
available in Impala's environment. The test code was
also outdated and erroneous.
This commit fixes the failure of JdbcDataSourceTest.java and rewrites it
against Postgres, ensuring compatibility with Impala's current environment
and alignment with the JDBC and external data source APIs. Please note that
this test is moved to the fe folder to fix the "BackendConfig instance not
initialized" error.
To test this file, run the following command:
pushd fe && mvn -fae test -Dtest=JdbcDataSourceTest
Please note that the tests in JdbcDataSourceTest depend on previous tests,
so individual tests in this class cannot be run separately.
Change-Id: Ie07173d256d73c88f5a6c041f087db16b6ff3127
Reviewed-on: http://gerrit.cloudera.org:8080/21805
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch moves the source files of the jdbc package to fe.
The data source location is now optional: a data source can be created
without specifying an HDFS location, assuming the data source class is on
the classpath and an instance of it can be created with the current class
loader. Impala still tries to load the jar file of the data source at
runtime if a data source location is set.
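A minimal sketch of that lookup order, assuming the jar (if any) has already
been copied to a local path; the class and helper names are illustrative and
not Impala's actual code:
  import java.io.File;
  import java.net.URL;
  import java.net.URLClassLoader;

  final class DataSourceClassLoading {
    // Try the current class loader first; fall back to the jar from the
    // (optional) data source location only when the class is not found.
    static Class<?> loadDataSourceClass(String className, String localJarPath)
        throws Exception {
      try {
        return Class.forName(className);  // class already on the classpath
      } catch (ClassNotFoundException e) {
        if (localJarPath == null || localJarPath.isEmpty()) throw e;
        URLClassLoader loader = new URLClassLoader(
            new URL[] { new File(localJarPath).toURI().toURL() },
            DataSourceClassLoading.class.getClassLoader());
        return Class.forName(className, true, loader);
      }
    }
  }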
Testing:
- Passed core test
- Passed dockerised-tests
Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
Reviewed-on: http://gerrit.cloudera.org:8080/20971
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
for external data source table
This patch adds support for the DATE data type in predicates
for external data sources.
Testing:
- Added tests for date predicates with operators:
'=', '>', '<', '>=', '<=', '!=', 'BETWEEN'.
Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4
Reviewed-on: http://gerrit.cloudera.org:8080/20915
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch uses the JDBC connection string to apply query options to the
Impala server by setting the properties in "jdbc.properties" when
creating a JDBC external DataSource table.
jdbc.properties is specified as a comma-delimited key=value string, like
"MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES=\"BLOOM,MIN_MAX\"".
Fixed Impala to allow the value of ENABLED_RUNTIME_FILTER_TYPES to have
double quotes at the beginning and end of the string.
jdbc.properties can be used for other databases like Postgres and MySQL
to set additional properties. Those test cases will be added in a separate
patch.
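As a rough illustration (a hedged sketch, not Impala's actual parser) of how
such a string can be split into individual options while keeping commas
inside double-quoted values such as "BLOOM,MIN_MAX" intact:
  import java.util.LinkedHashMap;
  import java.util.Map;

  final class JdbcPropertiesSketch {
    // Parse "K1=V1, K2=\"A,B\"" into a map, honoring quoted values.
    static Map<String, String> parse(String properties) {
      Map<String, String> options = new LinkedHashMap<>();
      if (properties == null || properties.trim().isEmpty()) return options;
      // Split on commas that are not inside double quotes.
      for (String entry : properties.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")) {
        int eq = entry.indexOf('=');
        if (eq <= 0) continue;  // skip malformed entries
        String key = entry.substring(0, eq).trim();
        String value = entry.substring(eq + 1).trim();
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
          value = value.substring(1, value.length() - 1);  // strip surrounding quotes
        }
        options.put(key.toUpperCase(), value);
      }
      return options;
    }
  }
Parsing the example string above would yield MEM_LIMIT=1000000000 and
ENABLED_RUNTIME_FILTER_TYPES=BLOOM,MIN_MAX.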
Testing:
- Added end-to-end tests for setting query options on Impala JDBC
tables.
- Passed core tests.
Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992
Reviewed-on: http://gerrit.cloudera.org:8080/20837
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
external data source
In the current implementation of the external JDBC data source, the user
has to provide both the username and password in plain text, which is not
a good practice.
This patch extends the existing implementation so that the user provides
either:
a) a username and password, or
b) a username along with a key and keystore.
If the user provides the password, then that password is used.
However, if no password is provided and the user provides only the
key and keystore, then the password is fetched from the secure jceks
keystore.
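A minimal sketch of that fallback using Hadoop's credential provider API;
the class, method, and example keystore path are illustrative, while
Configuration.getPassword() and hadoop.security.credential.provider.path
are standard Hadoop:
  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;

  final class JdbcPasswordSketch {
    // Prefer an explicit password; otherwise look the key up in the jceks keystore.
    static String resolvePassword(String password, String keystoreUri, String key)
        throws IOException {
      if (password != null && !password.isEmpty()) return password;  // case a)
      Configuration conf = new Configuration();
      // e.g. keystoreUri = "jceks://hdfs/path/to/creds.jceks" (illustrative path)
      conf.set("hadoop.security.credential.provider.path", keystoreUri);
      char[] secret = conf.getPassword(key);                          // case b)
      return secret == null ? null : new String(secret);
    }
  }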
Testing:
- Added unit test TestExtDataSourcesWithKeyStore
Change-Id: Iec83a9b6e00456f0a1bbee747bd752b2cf9bf238
Reviewed-on: http://gerrit.cloudera.org:8080/20809
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support to read Impala tables in the Impala cluster
through the JDBC external data source. It also adds a new counter,
NumExternalDataSourceGetNext, to the profile for the total number of calls
to ExternalDataSource::GetNext().
Setting query options for Impala will be supported in a follow-up patch.
Testing:
- Added an end-to-end unit test to read Impala tables from an Impala
cluster through the JDBC external data source.
Manually ran the unit test with Impala tables in an Impala cluster on a
remote host by setting $INTERNAL_LISTEN_HOST in jdbc.url to the IP
address of the remote host on which the Impala cluster is running.
- Added an LDAP test for reading a table through the JDBC external data
source with LDAP authentication.
Manually ran the unit test with Impala tables in a remote Impala
cluster.
- Passed core tests.
Change-Id: I79ad3273932b658cb85c9c17cc834fa1b5fbd64f
Reviewed-on: http://gerrit.cloudera.org:8080/20731
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
tables for MySQL
This patch adds MySQL tests for the "external data source" mechanism in
Impala that implements a data source for querying over JDBC.
This patch also fixes the handling of case-sensitive table and
column names for MySQL queries.
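One common way to keep MySQL identifiers case-sensitive in generated SQL is
to quote them with backticks; the sketch below shows that idea only and is
not necessarily the exact fix applied in this patch:
  final class MySqlIdentifiers {
    // Backtick-quote an identifier so MySQL preserves its exact case;
    // embedded backticks are escaped by doubling them.
    static String quote(String identifier) {
      return "`" + identifier.replace("`", "``") + "`";  // quote("AllTypes") -> `AllTypes`
    }
  }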
Testing:
- Added a unit test for MySQL and ran it with the JDBC
driver mysql-connector-j-8.1.0.jar. This test requires
adding docker to the sudoers group. Also, the test is
only run in 'exhaustive' mode.
Change-Id: I446ec3d4ebaf53c8edac0b2d181514bde587dfae
Reviewed-on: http://gerrit.cloudera.org:8080/20710
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
in GenericJdbcDatabaseAccessor close() function
The earlier change had a bug where the temporary jdbc jar file was
deleted from the /tmp directory too early, before it could be loaded.
The GenericJdbcDatabaseAccessor class loader loads classes on demand.
Hence, move the file deletion logic to the GenericJdbcDatabaseAccessor
close() function instead.
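A hedged sketch of that change, with illustrative class and field names: the
locally cached driver jar is remembered and only removed when the accessor
is closed:
  import java.io.File;

  final class DriverJarCleanupSketch {
    private File cachedDriverJar;  // e.g. a temporary copy under /tmp

    void close() {
      // By now the on-demand class loader no longer needs the jar.
      if (cachedDriverJar != null && cachedDriverJar.exists()) {
        cachedDriverJar.delete();
      }
      cachedDriverJar = null;
    }
  }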
Testing:
1. Make sure the Impala cluster has been started.
2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh
Verify that the mysql-jdbc.jar is present in the hdfs path:
hadoop fs -ls /test-warehouse/data-sources/jdbc-drivers
3. Create an `alltypes` table in the mysql database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh
4. Create mysql data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql
5. Make sure that the mysql jar file is not present in the classpath
grep 'mysql' /home/gsingh/Impala/fe/target/build-classpath.txt \
/home/gsingh/Impala/fe/target/test-classpath.txt \
/home/gsingh/Impala/java/executor-deps/target/build-executor-\
deps-classpath.txt | wc -l
returns 0
6. Run the impala-shell query:
use functional;
select count(*) from alltypes_jdbc_mysql_datasource;
The query executes successfully and returns the row count.
Change-Id: I1becc01a9d93a99be8f47dfe99258dea3a8abeb3
Reviewed-on: http://gerrit.cloudera.org:8080/20706
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The backend function DataSourceScanNode::GetNext() handles count queries
inefficiently. Even when no column data is returned from the
external data source, it still tries to materialize rows and add
them to the RowBatch one by one up to the row count. It also
calls GetNextInputBatch() multiple times (count / batch_size), and each
GetNextInputBatch() call invokes a JNI function in the external data source.
This patch improves DataSourceScanNode::GetNext() and
JdbcDataSource.getNext() to avoid the unnecessary function calls.
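The idea, sketched with illustrative names rather than Impala's actual
classes: when a count query projects no columns, one call is enough instead
of count / batch_size round trips across the JNI boundary:
  final class CountQuerySketch {
    // Number of GetNextInputBatch()-style calls needed to answer the scan.
    static long callsNeeded(long rowCount, long batchSize, int numProjectedCols) {
      if (numProjectedCols == 0) {
        return 1;  // fast path: nothing to materialize, report the count once
      }
      return (rowCount + batchSize - 1) / batchSize;  // one call per batch of rows
    }
  }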
Testing:
- Ran query_test/test_ext_data_sources.py, which contains count
queries for the jdbc external table.
- Passed core-tests.
Change-Id: I9953dca949eb773022f1d6dcf48d8877857635d6
Reviewed-on: http://gerrit.cloudera.org:8080/20653
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
/tmp after class loaded
This patch fixes the bug introduced in the previous patch for IMPALA-12470.
It adds the prefix "file://" to the standard unix path string to
create the corresponding valid hadoop.fs.Path object. For example,
"/tmp" is converted to "file:///tmp".
Testing:
1. Deleted all the jar files in the /tmp directory.
2. Ran the local jdbc ext data sources tests:
- impala-py.test tests/query_test/test_ext_data_sources.py
- impala-py.test tests/custom_cluster/test_ext_data_sources.py
3. After the tests completed successfully, verified that there were
no .jar files in the /tmp directory.
Change-Id: Iab7cc66383bc62f209987dd3fb42fc3fc6604726
Reviewed-on: http://gerrit.cloudera.org:8080/20654
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
creating external jdbc table
This patch builds on top of IMPALA-5741 to copy the jdbc jar from
remote filesystems: Ozone and S3. Previously we only supported hdfs.
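A hedged sketch of the approach, with illustrative names: resolve the
FileSystem from the jar's own URI scheme instead of assuming HDFS, then copy
it to a local path:
  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  final class RemoteJarCopySketch {
    static void copyDriverJar(String jarUri, String localDir, Configuration conf)
        throws IOException {
      Path src = new Path(jarUri);              // hdfs://, ofs://, s3a://, ...
      FileSystem fs = src.getFileSystem(conf);  // scheme-aware filesystem
      fs.copyToLocalFile(src, new Path("file://" + localDir));
    }
  }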
Testing:
Commented out the "@skipif.not_hdfs" qualifier in the following files:
- tests/query_test/test_ext_data_sources.py
- tests/custom_cluster/test_ext_data_sources.py
1) tested locally by running tests:
- impala-py.test tests/query_test/test_ext_data_sources.py
- impala-py.test tests/custom_cluster/test_ext_data_sources.py
2) tested using jenkins job for ozone and S3
Change-Id: I804fa3d239a4bedcd31569f2b46edb7316d7f004
Reviewed-on: http://gerrit.cloudera.org:8080/20639
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
This patch uses the "external data source" mechanism in Impala to
implement data source for querying JDBC.
It has some limitations due to the restrictions of "external data
source":
- It is not distributed, e.g., the fragment is unpartitioned. The queries
are executed on the coordinator.
- Queries which read the following data types from external JDBC tables
are not supported:
BINARY, CHAR, DATETIME, and COMPLEX.
- Only binary predicates with the operators =, !=, <=, >=,
<, and > are pushed down to the RDBMS (a sketch follows below).
- The following data types are not supported in predicates:
DECIMAL, TIMESTAMP, DATE, and BINARY.
- External tables with complex types of columns are not supported.
- Support is limited to the following databases:
MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
- Catalog V2 is not supported (IMPALA-7131).
- DataSource objects are not persistent (IMPALA-12375).
Additional fixes are planned on top of this patch.
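A hedged sketch of the operator restriction listed above (illustrative
names, not the planner's actual code): only these binary operators are
turned into a WHERE clause for the RDBMS; a predicate that is not pushed
down is evaluated by Impala itself:
  final class PushdownSketch {
    // Returns SQL for a supported binary predicate, or null if it cannot
    // be pushed down to the RDBMS.
    static String toWhereClause(String column, String op, String literal) {
      switch (op) {
        case "=": case "!=": case "<": case ">": case "<=": case ">=":
          return column + " " + op + " " + literal;
        default:
          return null;
      }
    }
  }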
Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.
In order to query the RDBMS tables, the following steps should be
followed (note that existing data source tables will be rebuilt):
1. Make sure the Impala cluster has been started.
2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh
3. Create an `alltypes` table in the Postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh
4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql
5. Queries can now be run against the data source tables created in the
last step. There is no need to restart the Impala cluster.
Testing:
- Added a unit test for Postgres and ran it with the JDBC driver
postgresql-42.5.1.jar.
- Ran a manual unit test for MySQL with the JDBC driver
mysql-connector-j-8.1.0.jar.
- Ran core tests successfully.
Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Reviewed-on: http://gerrit.cloudera.org:8080/17842
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>