Commit Graph

77 Commits

Author SHA1 Message Date
wzhou-code
fc74ca672a IMPALA-12378: Auto Ship JDBC Data Source
This patch moves the source files of jdbc package to fe.
Data source location is optional. Data source could be created without
specifying HDFS location. Assume data source class is in the classpath
and instance of data source class could be created with current class
loader. Impala still try to load the jar file of the data source in
runtime if it's set in data source location.

Testing:
 - Passed core test
 - Passed dockerised-tests

Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
Reviewed-on: http://gerrit.cloudera.org:8080/20971
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2024-02-07 16:29:11 +00:00
gaurav1086
f7a43b18aa IMPALA-12503: Support date data type for predicates
for external data source table

This patch adds support for datatype date as predicates
for external data sources.

Testing:
- Added tests for date predicates with operators:
  '=', '>', '<', '>=', '<=', '!=', 'BETWEEN'.

Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4
Reviewed-on: http://gerrit.cloudera.org:8080/20915
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-02-05 21:28:00 +00:00
Csaba Ringhofer
c14156eb3a IMPALA-12746: Bump jackson.databind to 2.15.3
Also sets dependencyManagement to force using the same version
for jackson-databind, jackson-core and jackon-annotations. This is
needed because datagenerator depends on kitesdk, which would pull in a
very old jackson-core version (2.3.1) and lead to build failures
with the newer jackson.databind.

Change-Id: I8440426da1395045cf149aca0044286015861e5f
Reviewed-on: http://gerrit.cloudera.org:8080/20914
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-24 15:13:36 +00:00
wzhou-code
f8e8cd0906 IMPALA-12642: Support query options for Impala external JDBC table
This patch uses JDBC connection string to apply query options to the
Impala server by setting the properties in "jdbc.properties" when
creating JDBC external DataSource table.
jdbc.properties are specified as comma-delimited key=value string, like
"MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES=\"BLOOM,MIN_MAX\"".

Fixed Impala to allow value of ENABLED_RUNTIME_FILTER_TYPES to have
double quotes in the beginning and ending of string.

jdbc.properties can be used for other databases like Postgres and MySQL
to set additional properties. The test cases will be added in separate
patch.

Testing:
 - Added end-to-end tests for setting query options on Impala JDBC
   tables.
 - Passed core tests.

Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992
Reviewed-on: http://gerrit.cloudera.org:8080/20837
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-17 23:16:42 +00:00
Gaurav Singh
9a132bc436 IMPALA-12380: Securing dbcp.password for JDBC
external data source

In the current implementation of external JDBC data source,
the user has to provide both the username and password in
plain text which is not a good practice.

This patch extends the functionality of existing implementation
to either provide:
a) username and password
b) username or key and keystore

If the user provides the password, then that password is used.
However, if no password is provided and the user provides only the
key/keystore, then it fetches the password from the secure jceks
keystore.

Testing:
- Added unit test TestExtDataSourcesWithKeyStore

Change-Id: Iec83a9b6e00456f0a1bbee747bd752b2cf9bf238
Reviewed-on: http://gerrit.cloudera.org:8080/20809
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-02 23:43:42 +00:00
wzhou-code
ec22a1e1ca IMPALA-12502: Support Impala to Impala federation
This patch adds support to read Impala tables in the Impala cluster
through JDBC external data source. It also adds a new counter
NumExternalDataSourceGetNext in profile for the total number of calls
to ExternalDataSource::GetNext().
Setting query options for Impala will be supported in a following patch.

Testing:
 - Added an end-to-end unit test to read Impala tables from Impala
   cluster through JDBC external data source.
   Manually ran the unit-test with Impala tables in Impala cluster on a
   remote host by setting $INTERNAL_LISTEN_HOST in jdbc.url as the ip
   address of the remote host on which an Impala cluster is running.
 - Added LDAP test for reading table through JDBC external data source
   with LDAP authentication.
   Manually ran the unit-test with Impala tables in a remote Impala
   cluster.
 - Passed core tests.

Change-Id: I79ad3273932b658cb85c9c17cc834fa1b5fbd64f
Reviewed-on: http://gerrit.cloudera.org:8080/20731
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-12-22 21:44:49 +00:00
gaurav1086
4c762725c7 IMPALA-12471: Add unit tests of external jdbc
tables for MySQL

This patch adds MySql tests for the "external data source"
mechanism in Impala to implement data source for querying JDBC.

This patch also fixes the handling of case-sensitive table and
column names for MySQL query.

Testing:
- Added unit test for mysql and ran unit-test with JDBC
driver mysql-connector-j-8.1.0.jar. This test requires
to add the docker to sudoer's group. Also, the test is
only run in 'exhaustive' mode.

Change-Id: I446ec3d4ebaf53c8edac0b2d181514bde587dfae
Reviewed-on: http://gerrit.cloudera.org:8080/20710
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-12-02 06:03:05 +00:00
Daniel Becker
6d3317b9a1 IMPALA-12570: Add longer strings to tables containing collections
IMPALA-12373 introduces small string optimisation, after which not all
strings will have a var-len part.

IMPALA-12159 adds support for ORDER BY for collections of variable
length types in the select list, but the test tables it uses only/mostly
contain short strings.

This patch has two modifications:

1. It introduces longer strings in 'collection_tbl' and
'collection_struct_mix'. It also adds two more rows to the existing one
in 'collection_tbl' so that it can be used in sorting tests. These
tables are only used by complex types tests, so the impact is limited.

2. It modifies RandomNestedDataGenerator.java, so that now it takes a
parameter for string length. Some variable names are changed to clearer
names. The references to and uses of RandomNestedDataGenerator are
updated.

Change-Id: Ief770d6bc9258fce159a733d5afa34fe594b96f8
Reviewed-on: http://gerrit.cloudera.org:8080/20718
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-11-22 18:50:22 +00:00
gaurav1086
8fe471d469 IMPALA-12470 (PART-3): delete temporary jar file
in GenericJdbcDatabaseAccessor close() function

The earlier change had a bug where we are deleting
the temporary jdbc jar file too early from the
/tmp directory before it can be loaded. The
GenericJdbcDatabaseAccessor class loader works by
OnDemand loading. Hence move the delete file logic
to the GenericJdbcDatabaseAccessor close()
function instead.

Testing:
1. Make sure the Impala cluster has been started.

2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

Verify that the mysql-jdbc.jar is present in the hdfs path:
hadoop fs -ls /test-warehouse/data-sources/jdbc-drivers

3. Create an `alltypes` table in the mysql database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create mysql data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

5. Make sure that the mysql jar file is not present in the classpath
grep 'mysql' /home/gsingh/Impala/fe/target/build-classpath.txt \
/home/gsingh/Impala/fe/target/test-classpath.txt \
/home/gsingh/Impala/java/executor-deps/target/build-executor-\
deps-classpath.txt | wc -l

returns 0

6. Run the impala-shell query:
use functional;
select count(*) from alltypes_jdbc_mysql_datasource;

executes successfully and returns the row count.

Change-Id: I1becc01a9d93a99be8f47dfe99258dea3a8abeb3
Reviewed-on: http://gerrit.cloudera.org:8080/20706
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-11-15 02:38:02 +00:00
wzhou-code
d318f1c992 IMPALA-12377: Improve count(*) performance for jdbc external table
Backend function DataSourceScanNode::GetNext() handles count query
inefficiently. Even when there are no column data returned from
external data source, it still tries to materialize rows and add
rows to RowBatch one by one up to the number of row count. It also
call GetNextInputBatch() multiple times (count / batch_size), while
GetNextInputBatch() invokes JNI function in external data source.

This patch improves the DataSourceScanNode::GetNext() and
JdbcDataSource.getNext() to avoid unnecessary function calls.

Testing:
 - Ran query_test/test_ext_data_sources.py which consists count
   queries for jdbc external table.
 - Passed core-tests.

Change-Id: I9953dca949eb773022f1d6dcf48d8877857635d6
Reviewed-on: http://gerrit.cloudera.org:8080/20653
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-11-14 04:48:23 +00:00
gaurav1086
4ed6d765ed IMPALA-12470 (PART-2): delete temporary file in
/tmp after class loaded

This patch fixes the bug added in the previous patch for IMPALA-12470.
It adds the prefix "file://" to the unix standard path string to
create the corresponding valid hadoop.fs.Path object. For example:
"/tmp" is converted to "file:///tmp".

Testing:
1. Deleted all the jar files in the /tmp directory.
2. Ran the local jdbc ext data sources tests:
  - impala-py.test tests/query_test/test_ext_data_sources.py
  - impala-py.test tests/custom_cluster/test_ext_data_sources.py
3. Upon completion of the tests successfully, Verified that there were
   no .jar files in the /tmp directory.

Change-Id: Iab7cc66383bc62f209987dd3fb42fc3fc6604726
Reviewed-on: http://gerrit.cloudera.org:8080/20654
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-11-06 16:15:51 +00:00
gaurav1086
39adf42a30 IMPALA-12470: Support different schemes for jdbc driver url when
creating external jdbc table

This patch builds on top of IMPALA-5741 to copy the jdbc jar from
remote filesystems: Ozone and S3. Currenty we only support hdfs.

Testing:
Commented out "@skipif.not_hdfs" qualifier in files:
  - tests/query_test/test_ext_data_sources.py
  - tests/custom_cluster/test_ext_data_sources.py
1) tested locally by running tests:
  - impala-py.test tests/query_test/test_ext_data_sources.py
  - impala-py.test tests/custom_cluster/test_ext_data_sources.py
2) tested using jenkins job for ozone and S3

Change-Id: I804fa3d239a4bedcd31569f2b46edb7316d7f004
Reviewed-on: http://gerrit.cloudera.org:8080/20639
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-11-01 23:32:10 +00:00
Michael Smith
098ad53f65 IMPALA-12480: Use Hadoop version for hadoop-aliyun
Uses the imported Hadoop version for the hadoop-aliyun module, which is
a tool in the hadoop project. This allows us to exclude vulnerable
versions of jdom that were previously included via hadoop-aliyun.

Change-Id: I270f3895ec668d9fb907f35b04cad2f149e3d0de
Reviewed-on: http://gerrit.cloudera.org:8080/20532
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-10-10 20:38:36 +00:00
Fucun Chu
c2bd30a1b3 IMPALA-5741: Initial support for reading tiny RDBMS tables
This patch uses the "external data source" mechanism in Impala to
implement data source for querying JDBC.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed, e.g, fragment is unpartitioned. The queries
    are executed on coordinator.
  - Queries which read following data types from external JDBC tables
    are not supported:
    BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Following data types are not supported for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - External tables with complex types of columns are not supported.
  - Support is limited to the following databases:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Catalog V2 is not supported (IMPALA-7131).
  - DataSource objects are not persistent (IMPALA-12375).

Additional fixes are planned on top of this patch.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the Postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

5. It's ready to run query to access data source tables created
in last step. Don't need to restart Impala cluster.

Testing:
 - Added unit-test for Postgres and ran unit-test with JDBC driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with JDBC driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Reviewed-on: http://gerrit.cloudera.org:8080/17842
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-10-10 02:13:59 +00:00
Michael Smith
1cf5bc6e79 Update version to 4.4.0-SNAPSHOT
Change-Id: I21c3b823c1b0db198d442d155c01d4cfd3a5c522
Reviewed-on: http://gerrit.cloudera.org:8080/20534
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-10-07 01:43:15 +00:00
Sai Hemanth Gantasala
a0cdb7b594 IMPALA-12231: Bump GBN to get HMS thrift API changes
We need a couple of hive changes HIVE-27319 and HIVE-27337 for catalogD
to work with latest HMS server to fix IMPALA-11768 and IMPALA-11939
respectively.

Bump CDP_BUILD_NUMBER (GBN) to 44206393
Bump various CDP versiona numbers to be based on 7.2.18.0-273

TESTING: Exhaustive tests ran clean
Added a couple of tests for IMPALA-11939 and IMPALA-11768

Change-Id: I117873b628aed3e24280f9fcd79643f918c8d5f3
Reviewed-on: http://gerrit.cloudera.org:8080/20420
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-09-12 09:36:57 +00:00
Steve Carlin
bc83d46a9a IMPALA-12424: Allow third party JniFrontend interface.
This patch allows a third party to inject their own frontend
class instead of using the default JniFrontend included in the
project.

The test case includes an interface that runs queries as normal
except for the "select 1" query which gets changed to "select 42".

Change-Id: I89e677da557b39232847644b6ff17510e2b3c3d5
Reviewed-on: http://gerrit.cloudera.org:8080/20459
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-09-08 20:20:56 +00:00
Zoltan Borok-Nagy
a95859be0b IMPALA-12359: Add missing package-info file used by HiveVersionInfo
We create a minimal-impala-hive-exec.jar based on Hive's hive-exec.jar:
https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml#L34

This excludes lots of class files, including
org/apache/hive/common/package-info.class that is used by
HiveVersionAnnotation and HiveVersionInfo classes.

Because of this HiveVersionInfo returns "Unknown" version resulting
in failing Iceberg operations.

Change-Id: I444330a654d7d86e653588eb91d2f063d5be8c08
Reviewed-on: http://gerrit.cloudera.org:8080/20340
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-08-11 11:28:43 +00:00
Michael Smith
7fb6a9a1d2 IMPALA-11941: (Addendum) Use released jamm 0.4.0
Switches to the 0.4.0 release of jamm, as building a shaded JAR from
source was a temporary measure.

Change-Id: I5b88b479580f7d0baff502ad9551d2764971babf
Reviewed-on: http://gerrit.cloudera.org:8080/20237
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-25 00:27:56 +00:00
Laszlo Gaal
ee069687fc IMPALA-12212: Bump Maven to 3.9.2, pull dependencies in parallel
Maven 3.9.x offers a new dependency resolver, HttpClient, which allows
downloading project dependencies in parallel.

This patch bumps the Maven version installed by bootstrap_system.sh to
v3.9.2, and adds the flags enabling the new resolver to download
dependencies (including POM files) in parallel. Parallelism is set to
10 threads.

The flags are added to a project-specific Maven setting file in the
newly created java/.mvn directory. The settings file is added to the
RAT exclusion list in bin/rat_exclude_files.txt.

The --show-version flag is added for debugging purposes.

The same flags are added to the JAMM subproject as well.

The new resolver in Maven 3.9 has also changed the warning message
emitted for missing component checksums, so the new warning string
is added to the filter in bin/mvn-quiet.sh
Unfortunately Maven 3.9 has also changed the way it responds to missing
checksum files: the resolver now emits a stack trace when checksums
cannot be determined, and missing checksums are not explicitly ignored.

Detailed documentation for the new Maven resolver in Maven 3.9.0+ is
located at:
https://maven.apache.org/guides/mini/guide-resolver-transport.html
resolver configuration reference:
https://maven.apache.org/resolver/configuration.html

Tests:
- verified in a core-mode test run with Maven 3.9.2 installed
- verified in a local build using an earlier version of Maven
  to verify that the new default setting does not cause regressions
  with the old dependency resolver.

Change-Id: I75d05215effc724f5bd471646fb352f37443e185
Reviewed-on: http://gerrit.cloudera.org:8080/20142
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-07-24 18:50:34 +00:00
Joe McDonnell
a281d8eb8e IMPALA-12284: Use Maven's batch mode when building jamm
This adds the --batch-mode flag to the maven invocation
the builds jamm. That disables some of the download progress
output, reducing the total size of the output.

Testing:
 - Ran a build locally

Change-Id: I1634240b191168b13cf3be7c9266e21a746844b1
Reviewed-on: http://gerrit.cloudera.org:8080/20196
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-14 23:45:28 +00:00
Michael Smith
87fd844d3e IMPALA-11941: (Addendum) Produce shaded copy of Jamm
Produces a shaded copy of a pre-release jamm jar that supports Java 17.
Building a copy of jamm and directly depending on it meant any consumer
of Impala would have to provide their own build of it.

Testing: ran custom_cluster/test_local_catalog.py with Java 8 and 17

Change-Id: Ida42d720a2639b65391c07a9237556311e04fac6
Reviewed-on: http://gerrit.cloudera.org:8080/20147
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-07-01 01:10:12 +00:00
Daniel Becker
679d58fa6d IMPALA-12238: RandomNestedDataGenerator should take a seed argument
RandomNestedDataGenerator can be used to produce parquet files with
random data from Avro schemas. This change makes it possible to provide
a seed value for the random generator so the generated files are
reproducible. The seed can be given as the last (optional) command line
argument. It is parsed as a Java 'long'.

Testing:
 - manually verified that when run with the same arguments (including
   the seed), the data generator produces the same results

Change-Id: Iee33604bbfe12895100afbd0f98ac302dee9a238
Reviewed-on: http://gerrit.cloudera.org:8080/20136
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Daniel Becker <daniel.becker@cloudera.com>
2023-06-28 16:18:08 +00:00
Michael Smith
3b0705ba63 IMPALA-11941: Support Java 17 in Impala
Enables building for Java 17 - and particularly using Java 17 in
containers - but won't run a minicluster fully with Java 17 as some
projects (Hadoop) don't yet support it.

Starting with Java 15, ehcache.sizeof encounters
UnsupportedOperationException: can't get field offset on a hidden class
in class members pointing to capturing lambda functions. Java 17 also
introduces new modules that need to be added to add-opens. Both of these
pose problems for continued use of ehcache.

Adds https://github.com/jbellis/jamm as a new cache weigher for Java
15+. We build from HEAD as an external project until Java 17 support is
released (https://github.com/jbellis/jamm/issues/44). Adds the
'java_weigher' option to select 'sizeof' or 'jamm'; defaults to 'auto',
which uses jamm for Java 15+ and sizeof for everything else. Also adds
metrics for viewing cache weight results.

Adds JAVA_HOME/lib/server to LD_LIBRARY_PATH in run-jvm-binary to
simplify switching between JDK versions for testing. You can now
- export IMPALA_JDK_VERSION=11
- source bin/impala-config.sh
- start-impala-cluster.py
and have Impala running a different JDK (11) version.

Retains add-opens calls that are still necessary due to dependencies'
use of lambdas for jamm, and all others for ehcache. Add-opens are still
required as a fallback, as noted in
https://github.com/jbellis/jamm#object-graph-crawling. We catch the
exceptions jamm and ehcache throw - CannotAccessFieldException,
UnsupportedOperationException - to avoid crashing Impala, and add it to
the list of banned log messages (as we should add-opens when we find
them).

Testing:
- container test run with Java 11 and 17 (excludes custom cluster)
- manual custom_cluster/test_local_catalog.py +
  test_banned_log_messages.py run with Java 11 and 17 (Java 8 build)
- full Java 11 build (passed except IMPALA-12184)
- add test catalog cache entry size metrics fit reasonable bounds
- add unit test for utility to find jamm jar file in classpath

Change-Id: Ic378896f572e030a3a019646a96a32a07866a737
Reviewed-on: http://gerrit.cloudera.org:8080/19863
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-24 10:11:54 +00:00
Peter Rozsa
6b571eb7e4 IMPALA-12184: Java UDF increment on an empty string is inconsistent
This change removes the Text-typed overload for BufferAlteringUDF to
avoid ambiguous function matchings. It also changes the 2-parameter
function in BufferAlteringUDF to cover Text typed arguments.

Tests:
 - test_udfs.py manually executed

Change-Id: I3a17240ce39fef41b0453f162ab5752f1c940f41
Reviewed-on: http://gerrit.cloudera.org:8080/20038
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-20 17:00:35 +00:00
Michael Smith
683bef1ca4 IMPALA-11253: Support testing with Java 11 (take 2)
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'.  The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.

This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.

This also updates the versions of Maven plugins related to the build.

Source and target releases are still set to Java 8 compatibility.

Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with

  JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
    CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
    run-all-tests.sh

Testing: ran test suite with Java 11

This reverts the revert commit 1b6011c, restoring these changes minus
code to update IMPALA_JDK_VERSION based on $JAVA -version as that could
break subsequent sourcing of impala-config.sh.

Change-Id: Ie16504ad5738b1f228f97044afd3d9017ccc6c53
Reviewed-on: http://gerrit.cloudera.org:8080/19928
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-25 16:04:29 +00:00
Michael Smith
1b6011c6a0 Revert "IMPALA-11253: Support testing with Java 11"
This reverts commit ee6395db76 as it is
not flexible enough at detecting Java automatically in likely build
environments.

Change-Id: I836c9f7fd10740b15f7e40b2e7f889ac7ee61fc3
Reviewed-on: http://gerrit.cloudera.org:8080/19908
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-05-21 14:00:14 +00:00
Michael Smith
ee6395db76 IMPALA-11253: Support testing with Java 11
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'.  The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.

This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.

This also updates the versions of Maven plugins related to the build.

Source and target releases are still set to Java 8 compatibility.

Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with

  JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
    CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
    run-all-tests.sh

Testing: ran test suite with Java 11

Change-Id: I15d309e2092c12d7fdd2c99b727f3a8eed8bc07a
Reviewed-on: http://gerrit.cloudera.org:8080/19539
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-05-19 22:32:00 +00:00
Michael Smith
d91cdb0cec IMPALA-12077: Remove deprecated Avro methods
Switches Avro methods deprecated in Avro 1.8 to new alternatives.

Change-Id: I8c01886774eb4ca5964a82c2fa568d7c4354c70c
Reviewed-on: http://gerrit.cloudera.org:8080/19772
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-21 22:53:46 +00:00
Michael Smith
f0289f3cbb IMPALA-11273: Remove APIs deprecated in Java 11
Replaces constructor calls for object versions of primitives - Integer,
Long, Float, Double, Boolean - with optimized valueOf calls as using
constructors for these is deprecated according to jdeprscan.

Removes override of finalize. Use of finalize is deprecated, and
hive-udf-call.cc ensures we always call close when unloading the UDF.
Adds try-with-resources to UdfExecutorTest to handle test cleanup.

Updates BigDecimal.setScale to use RoundingMode.

Change-Id: Idfb053223b6e098e6032502f873361696dd2da84
Reviewed-on: http://gerrit.cloudera.org:8080/19721
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-15 09:45:29 +00:00
Peter Rozsa
afe59f7f0d IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
This change fixes the behavior of BytesWritable and TextWritable's
getBytes() method. Now the returned byte array could be handled as
the underlying buffer as it gets loaded before the UDF's evaluation,
and tracks the changes as a regular Java byte array; the resizing
operation still resets the reference. The operations that wrote back
to the native heap were also removed as these operations are now
handled in the byte array. ImpalaStringWritable class is also removed,
writables that used it before now store the data directly.

Tests:
 - Test UDFs added as BufferAlteringUdf and GenericBufferAlteringUdf
 - E2E test ran for UDFs

Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Reviewed-on: http://gerrit.cloudera.org:8080/19507
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-03-08 19:54:38 +00:00
Csaba Ringhofer
67bb870aa3 IMPALA-11911: Fix NULL argument handling in Hive GenericUDFs
Before this patch if an argument of a GenericUDF was NULL, then Impala
passed it as null instead of a DeferredObject. This was incorrect, as
a DeferredObject is expected with a get() function that returns null.
See the Jira for more details and GenericUDF examples in Hive.

TestGenericUdf's NULL handling was further broken in IMPALA-11549,
leading to throwing null pointer exceptions when the UDF's result is
NULL. This test bug was not detected, because Hive udf tests were
running with default abort_java_udf_on_exception=false, which means
that exceptions from Hive UDFs only led to warnings and returning NULL,
which was the expected result in all affected test queries.

This patch fixes the behavior in HiveUdfExecutorGeneric and improves
FE/EE tests to catch null handling related issues. Most Hive UDF tests
are run with abort_java_udf_on_exception=true after this patch to treat
exceptions in UDFs as errors. The ones where the test checks that NULL
is returned if an exception is thrown while abort_java_udf_on_exception
is false are moved to new .test files.
TestGenericUdf is also fixed (and simplified) to handle NULL return
values correctly.

Change-Id: I53238612f4037572abb6d2cc913dd74ee830a9c9
Reviewed-on: http://gerrit.cloudera.org:8080/19499
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-03-06 13:45:56 +00:00
yx91490
f4d306cbca IMPALA-11629: Support for huawei OBS FileSystem
This patch adds support for huawei OBS (Object Storage Service)
FileSystem. The implementation is similar to other remote FileSystems.

New flags for OBS:
- num_obs_io_threads: Number of OBS I/O threads. Defaults to be 16.

Testing:
 - Upload hdfs test data to an OBS bucket. Modify all locations in HMS
   DB to point to the OBS bucket. Remove some hdfs caching params.
   Run CORE tests.

Change-Id: I84a54dbebcc5b71e9bcdd141dae9e95104d98cb1
Reviewed-on: http://gerrit.cloudera.org:8080/19110
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-02-09 08:10:19 +00:00
Peter Rozsa
1d05381b7b IMPALA-11745: Add Hive's ESRI geospatial functions as builtins
This change adds geospatial functions from Hive's ESRI library
as builtin UDFs. Plain Hive UDFs are imported without changes,
but the generic and varargs functions are handled differently;
generic functions are added with all of the combinations of
their parameters (cartesian product of the parameters), and
varargs functions are unfolded as an nth parameter simple
function. The varargs function wrappers are generated at build
time and they can be configured in
gen_geospatial_udf_wrappers.py. These additional steps are
required because of the limitations in Impala's UDF Executor
(lack of varargs support and only partial generics support)
which could be further improved; in this case, the additional
wrapping/mapping steps could be removed.

Changes regarding function handling/creating are sourced from
https://gerrit.cloudera.org/c/19177

A new backend flag was added to turn this feature on/off
as "geospatial_library". The default value is "NONE" which
means no geospatial function gets registered
as builtin, "HIVE_ESRI" value enables this implementation.

The ESRI geospatial implementation for Hive currently only
available in Hive 4, but CDP Hive backported it to Hive 3,
therefore for Apache Hive this feature is disabled
regardless of the "geospatial_library" flag.

Known limitations:
 - ST_MultiLineString, ST_MultiPolygon only works
   with the WKT overload
 - ST_Polygon supports a maximum of 6 pairs of coordinates
 - ST_MultiPoint, ST_LineString supports a maximum of 7
   pairs of coordinates
 - ST_ConvexHull, ST_Union supports a maximum of 6 geoms

These limits can be increased in gen_geospatial_udf_wrappers.py

Tests:
 - test_geospatial_udfs.py added based on
   https://github.com/Esri/spatial-framework-for-hadoop

Co-Authored-by: Csaba Ringhofer <csringhofer@cloudera.com>

Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Reviewed-on: http://gerrit.cloudera.org:8080/19425
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-02-07 20:18:47 +00:00
Zoltan Borok-Nagy
b88cfadbbd IMPALA-11777: Bump CDP_BUILD_NUMBER to get HIVE-24498
Without HIVE-24498 we get java.lang.NoClassDefFoundError exceptions
when we write Iceberg tables via Hive. This makes it hard to write
interop tests between Hive and Impala which use Iceberg tables.

I also exclude some private Java components to get things built.

Change-Id: I486c2b1b224f72e082e331a57cf25a37ebb9fa54
Reviewed-on: http://gerrit.cloudera.org:8080/19331
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
2022-12-13 13:30:25 +00:00
Daniel Becker
a71e69f570 IMPALA-11792: Update Impala version to 4.3.0-SNAPSHOT
As 4.2.0 has been released this commit updates the master to 4.3.0.
This step needs to happen on each release.

Testing:
 - Ran a build

Change-Id: Iebedcfbc1fd8018391a6c78a9aca4a9d754780fa
Reviewed-on: http://gerrit.cloudera.org:8080/19344
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-12-13 05:44:10 +00:00
Csaba Ringhofer
86740a7d35 IMPALA-11549: Support Hive GenericUdfs that return primitive java types
Before this patch only the Writable* types were accepted in GenericUdfs
as return types, while some GenericUdfs in the wild return primitive java
types (e.g. Integer instead of IntWritable). For legacy Hive UDFs these
return types were already handled, so the only change needed was to
map the ObjectInspector subclasses (e.g. JavaIntObjectInspector) to the
correct JavaUdfDataType in Impala.

Testing:
- Added a subclass for TestGenericUdf (TestGenericUdfWithJavaReturnTypes)
  that returns primitive java types (probably inheriting in the opposite
  direction would be more logical, but the diff is smaller this way).
- Changed EE tests to also use TestGenericUdfWithJavaReturnTypes.
- Changed FE tests (UdfExecutorTest) to check both
  TestGenericUdfWithJavaReturnTypes and TestGenericUdf.
- Also added a test with BINARY type to UdfExecutorTest as this was
  forgotten during the original BINARY patch.

Change-Id: I30679045d6693ebd35718b6f1a22aaa4963c1e63
Reviewed-on: http://gerrit.cloudera.org:8080/19304
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-12-08 17:51:00 +00:00
Michael Smith
c3ec9272c5 IMPALA-11724: Use CDP Ozone in test environment
Updates the test environment to default to the CDP build of Ozone, as
the latest build of CDP Hive depends on pre-release features unavailable
in Ozone 1.2.1. Apache Ozone 1.2 can still be used by setting
USE_APACHE_OZONE=true.

The latest CDP build also includes a version of Ozone based on
ozone#master with a candidate version of 1.3.0. Both Apache and CDP
therefore have builds of Ozone we can test with that use the new
artifact names introduced in Ozone 1.2, so this patch cleans up setup
that was only needed for Ozone versions prior to 1.2.

Change-Id: I1177a1b820fe21adca9f8c1cc51ff73ee001d3f2
Reviewed-on: http://gerrit.cloudera.org:8080/19247
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2022-11-16 22:13:06 +00:00
yacai
c953426692 IMPALA-11683: Support Aliyun OSS File System
This patch adds support for OSS (Aliyun Object Storage Service).
Using the hadoop-aliyun, the implementation is similar to other
remote FileSystems.

Tests:
- Prepare:
  Initialize OSS-related environment variables:
  OSS_ACCESS_KEY_ID, OSS_SECRET_ACCESS_KEY, OSS_ACCESS_ENDPOINT.
  Compile and create hdfs test data on a ECS instance. Upload test data
  to an OSS bucket.
- Modify all locations in HMS DB to point to the OSS bucket.
  Remove some hdfs caching params. Run CORE tests.

Change-Id: I267e6531da58e3ac97029fea4c5e075724587910
Reviewed-on: http://gerrit.cloudera.org:8080/19165
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-11-16 10:14:49 +00:00
Michael Smith
83c5e6e409 IMPALA-11670: Upgrade components, add envvars for override
Upgrades guava to 31.1-jre and jackson-databind to 2.13.4.2 to address
CVEs. Adds environment variables for commonly-updated components so they
can be customized via the branch-specific impala-config-branch.sh in a
way that allows both to be updated regularly without merge conflicts.

Also updates httpcomponents.httpcore to 4.4.14 to be consistent with
other httpcomponents libraries included transitively.

Change-Id: I1c2c4481ca3f498abf302aa05361d950b1ed1216
Reviewed-on: http://gerrit.cloudera.org:8080/19147
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-10-19 15:54:00 +00:00
Michael Smith
22e5ca3d0a IMPALA-11667: Clean up Java dependency exclusions
Use dependencyManagement to simplify Java dependencies by directly
controlling versions of transitive dependencies instead of using
exclusions and direct inclusion.

Dependency management specifies versions authoritatively, so redundant
version declarations are also removed.

Change-Id: I424a175135855dcbd38ae432ea111cca5f562633
Reviewed-on: http://gerrit.cloudera.org:8080/19146
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-10-19 15:54:00 +00:00
Michael Smith
a1fddf1022 IMPALA-11628: Switch to reload4j, update slf4j
Switches from log4j 1.x to reload4j, a maintained fork. Updates slf4j to
the latest version so we can include all CVE fixes.

slf4j 2.0.x requires Java 8 and adds a backward-compatible fluent
logging api. Neither seems like a problem for Impala.

Bans all use of log4j 1.x so we only use reload4j.

Change-Id: I5238b9c8247af3e0f4cb05c0b76a75bfee37f5c8
Reviewed-on: http://gerrit.cloudera.org:8080/19102
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-10-11 21:23:11 +00:00
wzhou-code
010f1b943c IMPALA-11639: Upgrade Spring framework to 5.3.20
This patch upgrade the Spring framework to 5.3.20 to
address multiple CVEs:
 - CVE-2022-22971
 - CVE-2022-22968
 - CVE-2022-22970

Testing:
 - Ran core job
 - Ran custom cluster tests in exhaustive mode

Change-Id: I33f4f1d22fc27227e31d744658a17c16b61b9677
Reviewed-on: http://gerrit.cloudera.org:8080/19091
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-10-05 19:34:39 +00:00
Steve Carlin
4e813b7085 IMPALA-11528: Catalogd should start up with a corrupt Hive function.
This commit handles the case for a specific kind of corrupt function
within the Hive Metastore in the following situation:

A valid Hive SQL function gets created in HMS. This UDF is written in
Java and must derive from the "UDF" class. After creating this function
in Impala, we then replace the underlying jar file with a class that
does NOT derive from the "UDF" class.

In this scenario, catalogd should reject the function and still start
up gracefully. Before this commit, catalogd wasn't coming up. The
reason for this was because the Hive function
FunctionUtils.getUDFClassType() has a dependency on UDAF and was
throwing a LinkageError exception, so we need to include the UDAF
class in the shaded jar.

Change-Id: I54e7a1df6d018ba6cf5ecf32dc9946edf86e2112
Reviewed-on: http://gerrit.cloudera.org:8080/18927
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
2022-09-13 14:48:31 +00:00
Joe McDonnell
7581cedd52 IMPALA-11394: Update jackson-databind to 2.12.6.1
This updates jackson-databind to address CVE-2020-36518.

Testing:
 - Ran a core job

Change-Id: I8db403a102097a22c48f5d9d42ced3b85930078f
Reviewed-on: http://gerrit.cloudera.org:8080/18891
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-08-23 15:50:35 +00:00
Csaba Ringhofer
7ca11dfc7f IMPALA-9482: Support for BINARY columns
This patch adds support for BINARY columns for all table formats with
the exception of Kudu.

In Hive the main difference between STRING and BINARY is that STRING is
assumed to be UTF8 encoded, while BINARY can be any byte array.
Some other differences in Hive:
- BINARY can be only cast from/to STRING
- Only a small subset of built-in STRING functions support BINARY.
- In several file formats (e.g. text) BINARY is base64 encoded.
- No NDV is calculated during COMPUTE STATISTICS.

As Impala doesn't treat STRINGs as UTF8, BINARY and STRING become nearly
identical, especially from the backend's perspective. For this reason,
BINARY is implemented a bit differently compared to other types:
while the frontend treats STRING and BINARY as two separate types, most
of the backend uses PrimitiveType::TYPE_STRING for BINARY too, e.g.
in SlotDesc. Only the following parts of backend need to differentiate
between STRING and BINARY:
- table scanners
- table writers
- HS2/Beeswax service
These parts have access to column metadata, which allows to add special
handling for BINARY.

Only a very few builtins are allowed for BINARY at the moment:
- length
- min/max/count
- coalesce and similar "selector" functions
Other STRING functions can be only used by casting to STRING first.
Adding support for more of these functions is very easy, as simply
the BINARY type has to be "connected" to the already existing STRING
function's signature. Functions where the result depends on utf8_mode
need to ensure that with BINARY it always works as if utf8_mode=0 (for
example length() is mapped to bytes() as length count utf8 chars if
utf8_mode=1).

All kinds of UDFs (native, Hive legacy, Hive generic) support BINARY,
though in case of legacy Hive UDFs it is only supported if the argument
and return types are set explicitely to ensure backward compatibility.
See IMPALA-11340 for details.

The original plan was to behave as close to Hive as possible, but I
realized that Hive has more relaxed casting rules than Impala, which
led to STRING<->BINARY casts being necessary in more cases in Impala.
This was needed to disallow passing a BINARY to functions that expect
a STRING argument. An example for the difference is that in
INSERT ... VALUES () string literals need to be explicitly cast to
BINARY, while this is not needed in Hive.

Testing:
- Added functional.binary_tbl for all file formats (except Kudu)
  to test scanning.
- Removed functional.unsupported_types and related tests, as now
  Impala supports all (non-complex) types that Hive does.
- Added FE/EE tests mainly based on the ones added to the DATE type

Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582
Reviewed-on: http://gerrit.cloudera.org:8080/16066
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-08-19 13:55:42 +00:00
Joe McDonnell
4845f36b4e IMPALA-11207: Use hadoop-cloud-storage for Cloud dependencies
Hadoop provides hadoop-cloud-storage, which includes most of
the dependencies that Impala currently uses like hadoop-aws,
hadoop-azure, Knox's gateway-cloud-bindings, etc. Hadoop has
put in a lot of work to make sure that this package includes
the right version of dependencies (including shading some
dependencies for GCS). It seems like this is a more reliable
way to consume these dependencies.

This switches the Java build to use hadoop-cloud-storage
and removes the dependencies that it replaces. This eliminates
the need to control the version of oauth and GCS, as those
are determined by hadoop-cloud-storage.

Change-Id: I3a1631289f990513823c2b17eb9241cc1b5a7ffd
Reviewed-on: http://gerrit.cloudera.org:8080/18817
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-08-15 21:11:42 +00:00
Michael Smith
830625b104 IMPALA-9442: Add Ozone to minicluster
Adds Ozone as an alternative to hdfs in the minicluster. Select by
setting `export TARGET_FILESYSTEM=ozone`. With that flag,
run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot
because Ozone does not support HBase (HDDS-3589); snapshot loading
doesn't work yet primarily due to HDDS-5502.

Uses the o3fs interface because Ozone puts specific restrictions on
bucket names (no underscores, for instance), and it was a lot easier to
use an interface where everything is written to a single bucket than to
update all Impala's use of HDFS-style paths to make `test-warehouse` a
bucket inside a volume.

Specifies reduced Ozone client retries during shutdown where Ozone may
not be available.

Passes tests with FE_TEST=false BE_TEST=false.

Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be
Reviewed-on: http://gerrit.cloudera.org:8080/18738
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-08-03 16:58:20 +00:00
Csaba Ringhofer
112f887ea5 IMPALA-11342: Fix class loading in Hive UDFs' constructors
Loading new classes from the same jar in the constructor of UDFs
did not work in the catalog because the URLClassLoader was closed
too early. Extended the lifecycle of the class loader a bit to
let the catalog finish all initialisation.

Note that the instantiation of legacy Hive UDFs doesn't seem
necessary in the catalog, we can get all relevant info from
the class. Generic UDFs do need to be instantiated to be able
to call initialize().

Testing:
- added new classes to load in test UDFs and loaded these
  in constructor / initialize()
- ran the Hive UDF ee tests

Change-Id: If16e38b8fc3b2577a5d32104ea9e6948b9562e24
Reviewed-on: http://gerrit.cloudera.org:8080/18611
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-07-20 19:58:15 +00:00
Michael Smith
00db9a27df IMPALA-11407: Upgrade google-oauth-client to 1.33.3
Upgrades google-oauth-client and google-oauth-client-java6 to 1.33.3 to
address CVE-2021-22573. These are included as dependencies of
com.google.cloud.bigdataoss/gcs-connector, which does not yet have a
release that includes versions 1.33.3 or later.

Change-Id: I8d95913f26e6073373374e169ee045881f40f065
Reviewed-on: http://gerrit.cloudera.org:8080/18683
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-07-01 04:10:13 +00:00