Commit Graph

187 Commits

Author SHA1 Message Date
Michael Smith
22b59d27d0 IMPALA-13243: Update Dropwizard Metrics to 4.2.x
Updates Dropwizard Metrics components to the latest 4.2.x release,
4.2.26. We directly use metrics-core, and metrics-jvm/metrics-json are
imported via Hive (via
https://github.com/joshelser/dropwizard-hadoop-metrics2).

Dropwizard Metrics manually tested with these versions on
https://github.com/joshelser/dropwizard-hadoop-metrics2/pull/8.

Change-Id: Ie9bec7a7c23194604430531bd83b25c5969e888e
Reviewed-on: http://gerrit.cloudera.org:8080/21599
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-07-23 05:22:59 +00:00
Steve Carlin
4c00cbff7e IMPALA-13136: Refactor AnalyzedFunctionCallExpr (for Calcite)
The analyze method is now called after the Expr is constructed.

This code is more in line with the existing way that Impala
constructs the Expr object.

Change-Id: Ideb662d9c7536659cb558bf62baec29c82217aa2
Reviewed-on: http://gerrit.cloudera.org:8080/21525
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-06-20 17:14:04 +00:00
Steve Carlin
a6db27850a IMPALA-12940: Added filtering capability for Calcite planner
The Filter RelNode is now handled in the Calcite planner.

The parsing and analysis is done by Calcite so there were no
changes added to that portion. The ImpalaFilterRel class was
created to handle the conversion of the Calcite LogicalFilter
to create a filter condition within the Impala plan nodes.

There is no explicit filter plan node in Impala. Instead, the
filter condition attaches itself to an existing plan node. The
filter condition gets passed into the children plan nodes through
the ParentPlanRelContext.

The ExprConjunctsConverter class is responsible for creating the
filter Expr list that is used. The list contains separate AND
conditions that are on the top level.
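
A minimal sketch of what "separate AND conditions on the top level" means,
written in Python rather than Java for brevity. The And and Leaf classes and
the top_level_conjuncts function are hypothetical stand-ins, not Calcite or
Impala classes: nested ANDs at the root are flattened into a list, and
anything below a non-AND node is left alone.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class And:
    """Hypothetical binary AND node; not a Calcite or Impala class."""
    left: "Pred"
    right: "Pred"


@dataclass(frozen=True)
class Leaf:
    """Hypothetical leaf predicate, stored as its SQL text."""
    sql: str


Pred = Union[And, Leaf]


def top_level_conjuncts(pred: Pred) -> List[Pred]:
    """Flatten ANDs reachable from the root; stop at any non-AND node."""
    if isinstance(pred, And):
        return top_level_conjuncts(pred.left) + top_level_conjuncts(pred.right)
    return [pred]


# (c1 > 5 AND c2 = 'a') AND (c3 IS NULL OR c4 < 0)
cond = And(And(Leaf("c1 > 5"), Leaf("c2 = 'a'")), Leaf("c3 IS NULL OR c4 < 0"))
conjuncts = top_level_conjuncts(cond)
print([c.sql for c in conjuncts])
```

The OR predicate stays a single list entry because it is not an AND at the
top level.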

Change-Id: If104bf1cd801d5ee92dd7e43d398a21a18be5d97
Reviewed-on: http://gerrit.cloudera.org:8080/21498
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2024-06-19 19:09:47 +00:00
Steve Carlin
141f38197b IMPALA-12935: First pass on Calcite planner functions
This commit handles the first pass on getting functions to work
through the Calcite planner. Only basic functions will work with
this commit. Implicit conversions for parameters are not yet supported.
Custom UDFs are also not supported yet.

The ImpalaOperatorTable is used at validation time to check for
existence of the function name for Impala. At first, it will check
Calcite operators for the existence of the function name (A TODO,
IMPALA-13096, is that we need to remove non-supported names from the
parser file). It is preferable to use the Calcite Operator since
Calcite does some optimizations based on the Calcite Operator class.

If the name is not found within the Calcite Operators, a check is done
within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function.
If found, an SqlOperator class is generated on the fly to handle this
function.
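
The lookup order described above can be sketched as follows. This is an
illustration in Python, not the ImpalaOperatorTable code: the two tables, their
contents, and the resolve_function name are assumptions made for the example.

```python
from typing import Dict, Optional, Set

# Hypothetical stand-ins for Calcite's operator table and Impala's BuiltinsDb.
CALCITE_OPERATORS: Dict[str, str] = {"upper": "calcite:UPPER", "+": "calcite:PLUS"}
IMPALA_BUILTINS: Set[str] = {"upper", "murmur_hash"}


def resolve_function(name: str) -> Optional[str]:
    """Prefer a Calcite operator; otherwise fall back to an Impala builtin."""
    key = name.lower()
    if key in CALCITE_OPERATORS:
        # Preferred, because Calcite optimizes based on its own operator class.
        return CALCITE_OPERATORS[key]
    if key in IMPALA_BUILTINS:
        # Stands in for generating an operator on the fly.
        return f"generated:{key}"
    return None  # unknown name; UDFs are not handled in this sketch


print(resolve_function("UPPER"))
print(resolve_function("murmur_hash"))
print(resolve_function("my_udf"))
```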

The validation process for Calcite includes a call into the operator
method "inferReturnType". This method will validate that there exists
a function that will handle the operands, and if so, return the "return
type" of the function. In this commit, we will assume that the Calcite
operators will match Impala functionality. In later commits, there
will be overrides where we will use Impala validation for operators
where Calcite's validation isn't good enough.

After validation is complete, the functions will be in a Calcite format.
After the rest of compilation (relnode conversion, optimization) is
complete, the function needs to be converted back into Impala form (the
Expr object) to eventually get it into its thrift request.

In this commit, all functions are converted into Expr starting in the
ImpalaProjectRel, since this is the RelNode where functions do their
thing. The RexCallConverter and RexLiteralConverter get called via the
CreateExprVisitor for this conversion.

Since Calcite is providing the analysis portion of the planning, there
is no need to go through Impala's Analyzer object. However, the Impala
planner requires Expr objects to be analyzed. To get around this, the
AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist which
analyze the expression in the constructor. While this could potentially
be combined with the existing FunctionCallExpr and NullLiteral objects,
this fits in with the general plan to avoid changing "fe" Impala code
as much as we can until much later in the commit cycle. Also, there
will be other Analyzed*Expr classes created in the future, but this
commit is intended for basic function call expressions only.

One minor change to the parser is added with this commit. Calcite parser
does not acknowledge the "string" datatype, so this has been
added here in Parser.jj and config.fmpp.

Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88
Reviewed-on: http://gerrit.cloudera.org:8080/21357
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-06-07 17:57:14 +00:00
Zoltan Borok-Nagy
1324a6e6c9 IMPALA-13108: Update version to 4.5.0-SNAPSHOT
Updated IMPALA_VERSION in impala-config.sh

Executed the following for Java:

  cd java
  mvn versions:set -DnewVersion=4.5.0-SNAPSHOT

Change-Id: Ie7803fe523406dbdd1ac066a35bb31d21765a244
Reviewed-on: http://gerrit.cloudera.org:8080/21460
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-05-29 23:47:05 +00:00
Sai Hemanth Gantasala
68f8a6a1df IMPALA-12607: Bump the GBN and fetch events specific to the db/table
from the metastore

Bump the GBN to 49623641 to leverage HIVE-27499, so that Impala can
directly fetch the latest events specific to the db/table from the
metastore, instead of fetching the events from metastore and then
filtering in the cache matching the DbName/TableName.

Implementation Details:
Currently when a DDL/DML is performed in Impala, we fetch all the
events from metastore based on current eventId and then filter them in
Impala which can be a bottleneck if the events count is huge. This can
be optimized by including db name and/or table name in the notification
event request object and then filter by event type in Impala. This can
provide performance boost on tables that generate a lot of events.
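
A toy comparison of the two request shapes, in Python. ToyMetastore, Event and
next_events are hypothetical and do not mirror the HMS thrift API; the sketch
only shows that both approaches return the same events while the filtered
request transfers fewer of them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    event_id: int
    db: str
    table: str


class ToyMetastore:
    """In-memory stand-in for the metastore; not the HMS client API."""

    def __init__(self, events: Iterable[Event]) -> None:
        self._events = list(events)
        self.events_sent = 0  # how many events crossed the "wire"

    def next_events(self, last_id: int, db: Optional[str] = None,
                    table: Optional[str] = None) -> List[Event]:
        out = [e for e in self._events
               if e.event_id > last_id
               and (db is None or e.db == db)
               and (table is None or e.table == table)]
        self.events_sent += len(out)
        return out


EVENTS = [Event(1, "sales", "orders"), Event(2, "hr", "staff"),
          Event(3, "sales", "orders"), Event(4, "sales", "items")]

# Old shape: fetch everything after the last event id, then filter locally.
old = ToyMetastore(EVENTS)
client_filtered = [e for e in old.next_events(0)
                   if e.db == "sales" and e.table == "orders"]

# New shape: put the db/table names in the request itself.
new = ToyMetastore(EVENTS)
server_filtered = new.next_events(0, db="sales", table="orders")

print(old.events_sent, new.events_sent)
```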

Note:
Also included ShowUtils class in hive-minimal-exec jar as it is
required in the current build version

Testing:
1) Did some tests in local cluster
2) Added a test case in MetaStoreEventsProcessorTest

Change-Id: I6aecd5108b31c24e6e2c6f9fba6d4d44a3b00729
Reviewed-on: http://gerrit.cloudera.org:8080/20979
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-05-10 05:47:28 +00:00
Steve Carlin
2a3ce2071b IMPALA-12934: Added Calcite parsing files to Impala
Adding the framework to create our own parsing syntax for Impala using
the base Calcite Parser.jj file.

The Parser.jj file here was grabbed from Calcite 1.36. So with this commit,
we are using the same parsing analysis as Calcite 1.36. Any changes made
on top of the Parser.jj file or the config.fmpp file in the future are Impala
specific changes, so a diff can be done from this commit to see all the Impala
parsing changes.

The config.fmpp file was grabbed from Calcite 1.36 default_config.fmpp. The
Calcite intention of the config.fmpp file is to allow markup of variables in
the Parser.jj file. So it is always preferable to modify the
default_config.fmpp file when possible. Our version is grabbed from
https://github.com/apache/calcite/blob/main/core/src/main/codegen/config.fmpp
and slightly modified with the class name to make it compile for Impala.

There's no unit test needed since there is no functional change. The Calcite
planner will eventually make changes in the ".jj" file to support the differences
between the Impala parser and the Calcite parser.

Change-Id: If756b5ea8beb85661a30fb5d029e74ebb6719767
Reviewed-on: http://gerrit.cloudera.org:8080/21194
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-05-09 01:12:45 +00:00
Peter Rozsa
7ad9400656 IMPALA-13044: Upgrade bouncycastle to 1.78
This patch upgrades bouncycastle to 1.78. As of bouncycastle:1.71, the
*-jdk15on artifact is no longer available, the artifact is changed to
*-jdk18on.

Tests:
 - core tests ran

Change-Id: I8372916ab79b863e7a07d22e8333abd54492fa29
Reviewed-on: http://gerrit.cloudera.org:8080/21371
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-05-03 00:09:15 +00:00
Joe McDonnell
d09c502490 IMPALA-13049: Add dependency management for log4j2 to use 2.18.0
Currently, there is no dependency management for the log4j2
version. Impala itself doesn't use log4j2. However, recently
we encountered a case where one dependency brought in
log4j-core 2.18.0 and another brought in log4j-api 2.17.1.
log4j-core 2.18.0 relies on the existence of the ServiceLoaderUtil
class from log4j-api 2.18.0. log4j-api 2.17.1 doesn't have this
class, which causes class not found exceptions.

This uses dependency management to set the log4j2 version to 2.18.0
for log4j-core and log4j-api to avoid any mismatch.

Testing:
 - Ran a local build and verified that both log4j-core and log4j-api
   are using 2.18.0.

Change-Id: Ib4f8485adadb90f66f354a5dedca29992c6d4e6f
Reviewed-on: http://gerrit.cloudera.org:8080/21379
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-05-01 02:37:49 +00:00
Steve Carlin
b39cd79ae8 IMPALA-12872: Use Calcite for optimization - part 1: simple queries
This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: This step is to optimize the query. In this cut, it will be
a nop, but in later versions, it will perform logical optimizations via
Calcite's rule mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements will get processed. Any non-query statement will revert
back to the current Impala planner.

In this iteration, any queries besides the minimal ones listed above will
result in a caught exception which will then be run through the current
Impala planner. The tests that do work can be found in calcite.test and
run through the custom cluster test test_experimental_planner.py
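
The fallback behaviour can be pictured with the Python sketch below. All
function names and the keyword check are invented for illustration. The commit
describes an up-front check for non-query statements plus a caught exception
for unsupported queries; the sketch folds both into one exception path to stay
short.

```python
class UnsupportedQueryError(Exception):
    """Raised by the toy new planner for anything it cannot handle."""


# Keywords the toy planner rejects; a crude stand-in for "not yet supported".
_UNSUPPORTED = {"where", "join", "group", "order"}


def plan_with_new_planner(sql: str) -> str:
    tokens = sql.lower().split()
    if not tokens or tokens[0] != "select" or _UNSUPPORTED & set(tokens):
        raise UnsupportedQueryError(sql)
    return f"new-plan({sql})"


def plan_with_original_planner(sql: str) -> str:
    return f"original-plan({sql})"


def plan(sql: str) -> str:
    """Try the new planner first; on failure, rerun through the original."""
    try:
        return plan_with_new_planner(sql)
    except UnsupportedQueryError:
        return plan_with_original_planner(sql)


print(plan("select * from tbl"))
print(plan("select c1 from tbl where c1 > 1"))
print(plan("create table t (i int)"))
```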

This iteration should support all types with the exception of complex
types. Calcite does not have a STRING type, so the string type is
represented as VARCHAR(MAXINT) similar to how Hive represents their
STRING type.

The ImpalaTypeConverter file is used to convert the Impala Type object
to corresponding Calcite objects.

Authorization is not yet working with this current commit. A Jira has been
filed (IMPALA-13011) to deal with this.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Reviewed-on: http://gerrit.cloudera.org:8080/21109
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-04-25 20:09:09 +00:00
wzhou-code
fc74ca672a IMPALA-12378: Auto Ship JDBC Data Source
This patch moves the source files of the jdbc package to fe.
The data source location is now optional: a data source can be created
without specifying an HDFS location. In that case Impala assumes the
data source class is on the classpath and that an instance can be
created with the current class loader. Impala still tries to load the
data source jar file at runtime if a data source location is set.

Testing:
 - Passed core test
 - Passed dockerised-tests

Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
Reviewed-on: http://gerrit.cloudera.org:8080/20971
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2024-02-07 16:29:11 +00:00
gaurav1086
f7a43b18aa IMPALA-12503: Support date data type for predicates
for external data source table

This patch adds support for datatype date as predicates
for external data sources.

Testing:
- Added tests for date predicates with operators:
  '=', '>', '<', '>=', '<=', '!=', 'BETWEEN'.

Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4
Reviewed-on: http://gerrit.cloudera.org:8080/20915
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-02-05 21:28:00 +00:00
Csaba Ringhofer
c14156eb3a IMPALA-12746: Bump jackson.databind to 2.15.3
Also sets dependencyManagement to force using the same version
for jackson-databind, jackson-core and jackson-annotations. This is
needed because datagenerator depends on kitesdk, which would pull in a
very old jackson-core version (2.3.1) and lead to build failures
with the newer jackson.databind.

Change-Id: I8440426da1395045cf149aca0044286015861e5f
Reviewed-on: http://gerrit.cloudera.org:8080/20914
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-24 15:13:36 +00:00
wzhou-code
f8e8cd0906 IMPALA-12642: Support query options for Impala external JDBC table
This patch uses JDBC connection string to apply query options to the
Impala server by setting the properties in "jdbc.properties" when
creating JDBC external DataSource table.
jdbc.properties are specified as comma-delimited key=value string, like
"MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES=\"BLOOM,MIN_MAX\"".
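
One way to split such a string, sketched in Python. parse_jdbc_properties is a
hypothetical helper, not the parser used by Impala. The point it illustrates is
that a comma inside double quotes must not end a key=value pair, and that the
quotes are kept on the value.

```python
from typing import Dict, List


def parse_jdbc_properties(text: str) -> Dict[str, str]:
    """Split on commas outside double quotes, then on the first '='."""
    parts: List[str] = []
    buf: List[str] = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            buf.append(ch)
        elif ch == "," and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))

    props: Dict[str, str] = {}
    for part in parts:
        if not part.strip():
            continue  # tolerate a trailing comma
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {part!r}")
        props[key.strip()] = value.strip()  # surrounding quotes are kept
    return props


example = 'MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES="BLOOM,MIN_MAX"'
print(parse_jdbc_properties(example))
```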

Fixed Impala to allow value of ENABLED_RUNTIME_FILTER_TYPES to have
double quotes in the beginning and ending of string.

jdbc.properties can be used for other databases like Postgres and MySQL
to set additional properties. The test cases will be added in separate
patch.

Testing:
 - Added end-to-end tests for setting query options on Impala JDBC
   tables.
 - Passed core tests.

Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992
Reviewed-on: http://gerrit.cloudera.org:8080/20837
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-17 23:16:42 +00:00
Gaurav Singh
9a132bc436 IMPALA-12380: Securing dbcp.password for JDBC
external data source

In the current implementation of external JDBC data source,
the user has to provide both the username and password in
plain text which is not a good practice.

This patch extends the functionality of existing implementation
to either provide:
a) username and password
b) username or key and keystore

If the user provides the password, then that password is used.
However, if no password is provided and the user provides only the
key/keystore, then it fetches the password from the secure jceks
keystore.
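
The precedence rule can be sketched as below, in Python. Only "dbcp.password"
appears in this commit message; the property names "dbcp.password.key" and
"dbcp.password.keystore", the keystore URI, and both functions are assumptions
made for the example.

```python
from typing import Callable, Mapping


def resolve_password(props: Mapping[str, str],
                     read_keystore: Callable[[str, str], str]) -> str:
    """An explicit password wins; otherwise look the key up in the keystore."""
    password = props.get("dbcp.password")
    if password:
        return password
    keystore = props.get("dbcp.password.keystore")  # assumed property name
    key = props.get("dbcp.password.key")            # assumed property name
    if keystore and key:
        return read_keystore(keystore, key)
    raise ValueError("need either a password or a key and keystore")


def fake_keystore(keystore_uri: str, key: str) -> str:
    """Toy lookup for this example; a real one would open a JCEKS file."""
    return {("jceks://hdfs/secrets.jceks", "db.pw"): "s3cret"}[(keystore_uri, key)]


plain = {"dbcp.username": "impala", "dbcp.password": "hunter2"}
secure = {"dbcp.username": "impala",
          "dbcp.password.keystore": "jceks://hdfs/secrets.jceks",
          "dbcp.password.key": "db.pw"}
print(resolve_password(plain, fake_keystore))
print(resolve_password(secure, fake_keystore))
```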

Testing:
- Added unit test TestExtDataSourcesWithKeyStore

Change-Id: Iec83a9b6e00456f0a1bbee747bd752b2cf9bf238
Reviewed-on: http://gerrit.cloudera.org:8080/20809
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-02 23:43:42 +00:00
wzhou-code
ec22a1e1ca IMPALA-12502: Support Impala to Impala federation
This patch adds support to read Impala tables in the Impala cluster
through JDBC external data source. It also adds a new counter
NumExternalDataSourceGetNext in profile for the total number of calls
to ExternalDataSource::GetNext().
Setting query options for Impala will be supported in a following patch.

Testing:
 - Added an end-to-end unit test to read Impala tables from Impala
   cluster through JDBC external data source.
   Manually ran the unit-test with Impala tables in Impala cluster on a
   remote host by setting $INTERNAL_LISTEN_HOST in jdbc.url as the ip
   address of the remote host on which an Impala cluster is running.
 - Added LDAP test for reading table through JDBC external data source
   with LDAP authentication.
   Manually ran the unit-test with Impala tables in a remote Impala
   cluster.
 - Passed core tests.

Change-Id: I79ad3273932b658cb85c9c17cc834fa1b5fbd64f
Reviewed-on: http://gerrit.cloudera.org:8080/20731
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-12-22 21:44:49 +00:00
gaurav1086
4c762725c7 IMPALA-12471: Add unit tests of external jdbc
tables for MySQL

This patch adds MySql tests for the "external data source"
mechanism in Impala to implement data source for querying JDBC.

This patch also fixes the handling of case-sensitive table and
column names for MySQL query.

Testing:
- Added unit test for mysql and ran unit-test with JDBC
driver mysql-connector-j-8.1.0.jar. This test requires
adding docker to the sudoers group. Also, the test is
only run in 'exhaustive' mode.

Change-Id: I446ec3d4ebaf53c8edac0b2d181514bde587dfae
Reviewed-on: http://gerrit.cloudera.org:8080/20710
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-12-02 06:03:05 +00:00
Daniel Becker
6d3317b9a1 IMPALA-12570: Add longer strings to tables containing collections
IMPALA-12373 introduces small string optimisation, after which not all
strings will have a var-len part.

IMPALA-12159 adds support for ORDER BY for collections of variable
length types in the select list, but the test tables it uses only/mostly
contain short strings.

This patch has two modifications:

1. It introduces longer strings in 'collection_tbl' and
'collection_struct_mix'. It also adds two more rows to the existing one
in 'collection_tbl' so that it can be used in sorting tests. These
tables are only used by complex types tests, so the impact is limited.

2. It modifies RandomNestedDataGenerator.java, so that now it takes a
parameter for string length. Some variable names are changed to clearer
names. The references to and uses of RandomNestedDataGenerator are
updated.

Change-Id: Ief770d6bc9258fce159a733d5afa34fe594b96f8
Reviewed-on: http://gerrit.cloudera.org:8080/20718
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-11-22 18:50:22 +00:00
gaurav1086
8fe471d469 IMPALA-12470 (PART-3): delete temporary jar file
in GenericJdbcDatabaseAccessor close() function

The earlier change had a bug where we are deleting
the temporary jdbc jar file too early from the
/tmp directory before it can be loaded. The
GenericJdbcDatabaseAccessor class loader works by
OnDemand loading. Hence move the delete file logic
to the GenericJdbcDatabaseAccessor close()
function instead.
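
The ordering problem can be shown with a small Python sketch. TempDriverJar is
a hypothetical class, not GenericJdbcDatabaseAccessor. Because the file is read
lazily, on first use, deleting it right after creation would break the later
load, so the delete lives in close().

```python
import os
import tempfile
from typing import Optional


class TempDriverJar:
    """Toy holder for a temporary file that is loaded on demand."""

    def __init__(self, payload: bytes) -> None:
        fd, self.path = tempfile.mkstemp(suffix=".jar")
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        self._loaded: Optional[bytes] = None

    def load(self) -> bytes:
        # Lazy load: the file must still exist when this first runs.
        if self._loaded is None:
            with open(self.path, "rb") as f:
                self._loaded = f.read()
        return self._loaded

    def close(self) -> None:
        # Safe place to delete: nothing reads the file after close().
        if os.path.exists(self.path):
            os.remove(self.path)


jar = TempDriverJar(b"driver-bytes")
data = jar.load()
jar.close()
print(data, os.path.exists(jar.path))
```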

Testing:
1. Make sure the Impala cluster has been started.

2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

Verify that the mysql-jdbc.jar is present in the hdfs path:
hadoop fs -ls /test-warehouse/data-sources/jdbc-drivers

3. Create an `alltypes` table in the mysql database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create mysql data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

5. Make sure that the mysql jar file is not present in the classpath
grep 'mysql' /home/gsingh/Impala/fe/target/build-classpath.txt \
/home/gsingh/Impala/fe/target/test-classpath.txt \
/home/gsingh/Impala/java/executor-deps/target/build-executor-\
deps-classpath.txt | wc -l

returns 0

6. Run the impala-shell query:
use functional;
select count(*) from alltypes_jdbc_mysql_datasource;

executes successfully and returns the row count.

Change-Id: I1becc01a9d93a99be8f47dfe99258dea3a8abeb3
Reviewed-on: http://gerrit.cloudera.org:8080/20706
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-11-15 02:38:02 +00:00
wzhou-code
d318f1c992 IMPALA-12377: Improve count(*) performance for jdbc external table
Backend function DataSourceScanNode::GetNext() handles count query
inefficiently. Even when there are no column data returned from
external data source, it still tries to materialize rows and add
rows to RowBatch one by one up to the row count. It also
calls GetNextInputBatch() multiple times (count / batch_size), while
GetNextInputBatch() invokes JNI function in external data source.

This patch improves the DataSourceScanNode::GetNext() and
JdbcDataSource.getNext() to avoid unnecessary function calls.
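
To make the cost difference concrete, here is a toy Python model. CountingSource
and both count functions are hypothetical and do not reflect the actual C++ or
Java changes. It compares one source call per batch with a single call that
reports all remaining rows when no column data is needed.

```python
class CountingSource:
    """Toy external source that tracks how often it is called."""

    def __init__(self, total_rows: int, batch_size: int) -> None:
        self.remaining = total_rows
        self.batch_size = batch_size
        self.calls = 0

    def get_next(self) -> int:
        """Return the row count of the next batch (no column data)."""
        self.calls += 1
        n = min(self.remaining, self.batch_size)
        self.remaining -= n
        return n

    def get_all_remaining(self) -> int:
        """Return every remaining row count in one call."""
        self.calls += 1
        n, self.remaining = self.remaining, 0
        return n


def count_batched(source: CountingSource) -> int:
    total = 0
    while source.remaining > 0:
        total += source.get_next()
    return total


def count_single_call(source: CountingSource) -> int:
    return source.get_all_remaining()


batched = CountingSource(10_000, 1024)
single = CountingSource(10_000, 1024)
print(count_batched(batched), batched.calls)
print(count_single_call(single), single.calls)
```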

Testing:
 - Ran query_test/test_ext_data_sources.py which contains count
   queries for jdbc external table.
 - Passed core-tests.

Change-Id: I9953dca949eb773022f1d6dcf48d8877857635d6
Reviewed-on: http://gerrit.cloudera.org:8080/20653
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-11-14 04:48:23 +00:00
gaurav1086
4ed6d765ed IMPALA-12470 (PART-2): delete temporary file in
/tmp after class loaded

This patch fixes the bug added in the previous patch for IMPALA-12470.
It adds the prefix "file://" to the unix standard path string to
create the corresponding valid hadoop.fs.Path object. For example:
"/tmp" is converted to "file:///tmp".
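
The conversion amounts to the following Python sketch. to_local_file_uri is a
hypothetical helper; the patch itself builds a hadoop.fs.Path in Java.

```python
def to_local_file_uri(path: str) -> str:
    """Prefix an absolute unix path with file:// unless it already has a scheme."""
    if "://" in path:
        return path  # already a URI, leave it untouched
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute path, got {path!r}")
    return "file://" + path


print(to_local_file_uri("/tmp"))
```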

Testing:
1. Deleted all the jar files in the /tmp directory.
2. Ran the local jdbc ext data sources tests:
  - impala-py.test tests/query_test/test_ext_data_sources.py
  - impala-py.test tests/custom_cluster/test_ext_data_sources.py
3. After the tests completed successfully, verified that there were
   no .jar files in the /tmp directory.

Change-Id: Iab7cc66383bc62f209987dd3fb42fc3fc6604726
Reviewed-on: http://gerrit.cloudera.org:8080/20654
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-11-06 16:15:51 +00:00
gaurav1086
39adf42a30 IMPALA-12470: Support different schemes for jdbc driver url when
creating external jdbc table

This patch builds on top of IMPALA-5741 to copy the jdbc jar from
remote filesystems: Ozone and S3. Currently we only support hdfs.

Testing:
Commented out "@skipif.not_hdfs" qualifier in files:
  - tests/query_test/test_ext_data_sources.py
  - tests/custom_cluster/test_ext_data_sources.py
1) tested locally by running tests:
  - impala-py.test tests/query_test/test_ext_data_sources.py
  - impala-py.test tests/custom_cluster/test_ext_data_sources.py
2) tested using jenkins job for ozone and S3

Change-Id: I804fa3d239a4bedcd31569f2b46edb7316d7f004
Reviewed-on: http://gerrit.cloudera.org:8080/20639
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-11-01 23:32:10 +00:00
Michael Smith
098ad53f65 IMPALA-12480: Use Hadoop version for hadoop-aliyun
Uses the imported Hadoop version for the hadoop-aliyun module, which is
a tool in the hadoop project. This allows us to exclude vulnerable
versions of jdom that were previously included via hadoop-aliyun.

Change-Id: I270f3895ec668d9fb907f35b04cad2f149e3d0de
Reviewed-on: http://gerrit.cloudera.org:8080/20532
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-10-10 20:38:36 +00:00
Fucun Chu
c2bd30a1b3 IMPALA-5741: Initial support for reading tiny RDBMS tables
This patch uses the "external data source" mechanism in Impala to
implement data source for querying JDBC.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed, i.e., the fragment is unpartitioned. The
    queries are executed on the coordinator.
  - Queries which read following data types from external JDBC tables
    are not supported:
    BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Following data types are not supported for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - External tables with complex types of columns are not supported.
  - Support is limited to the following databases:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Catalog V2 is not supported (IMPALA-7131).
  - DataSource objects are not persistent (IMPALA-12375).

Additional fixes are planned on top of this patch.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the Postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

5. Queries can now be run against the data source tables created
in the last step. The Impala cluster does not need to be restarted.

Testing:
 - Added unit-test for Postgres and ran unit-test with JDBC driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with JDBC driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Reviewed-on: http://gerrit.cloudera.org:8080/17842
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-10-10 02:13:59 +00:00
Michael Smith
1cf5bc6e79 Update version to 4.4.0-SNAPSHOT
Change-Id: I21c3b823c1b0db198d442d155c01d4cfd3a5c522
Reviewed-on: http://gerrit.cloudera.org:8080/20534
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-10-07 01:43:15 +00:00
Sai Hemanth Gantasala
a0cdb7b594 IMPALA-12231: Bump GBN to get HMS thrift API changes
We need a couple of hive changes HIVE-27319 and HIVE-27337 for catalogD
to work with latest HMS server to fix IMPALA-11768 and IMPALA-11939
respectively.

Bump CDP_BUILD_NUMBER (GBN) to 44206393
Bump various CDP version numbers to be based on 7.2.18.0-273

TESTING: Exhaustive tests ran clean
Added a couple of tests for IMPALA-11939 and IMPALA-11768

Change-Id: I117873b628aed3e24280f9fcd79643f918c8d5f3
Reviewed-on: http://gerrit.cloudera.org:8080/20420
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-09-12 09:36:57 +00:00
Steve Carlin
bc83d46a9a IMPALA-12424: Allow third party JniFrontend interface.
This patch allows a third party to inject their own frontend
class instead of using the default JniFrontend included in the
project.

The test case includes an interface that runs queries as normal
except for the "select 1" query which gets changed to "select 42".

Change-Id: I89e677da557b39232847644b6ff17510e2b3c3d5
Reviewed-on: http://gerrit.cloudera.org:8080/20459
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-09-08 20:20:56 +00:00
Zoltan Borok-Nagy
a95859be0b IMPALA-12359: Add missing package-info file used by HiveVersionInfo
We create a minimal-impala-hive-exec.jar based on Hive's hive-exec.jar:
https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml#L34

This excludes lots of class files, including
org/apache/hive/common/package-info.class that is used by
HiveVersionAnnotation and HiveVersionInfo classes.

Because of this HiveVersionInfo returns "Unknown" version resulting
in failing Iceberg operations.

Change-Id: I444330a654d7d86e653588eb91d2f063d5be8c08
Reviewed-on: http://gerrit.cloudera.org:8080/20340
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-08-11 11:28:43 +00:00
Michael Smith
7fb6a9a1d2 IMPALA-11941: (Addendum) Use released jamm 0.4.0
Switches to the 0.4.0 release of jamm, as building a shaded JAR from
source was a temporary measure.

Change-Id: I5b88b479580f7d0baff502ad9551d2764971babf
Reviewed-on: http://gerrit.cloudera.org:8080/20237
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-25 00:27:56 +00:00
Laszlo Gaal
ee069687fc IMPALA-12212: Bump Maven to 3.9.2, pull dependencies in parallel
Maven 3.9.x offers a new dependency resolver, HttpClient, which allows
downloading project dependencies in parallel.

This patch bumps the Maven version installed by bootstrap_system.sh to
v3.9.2, and adds the flags enabling the new resolver to download
dependencies (including POM files) in parallel. Parallelism is set to
10 threads.

The flags are added to a project-specific Maven setting file in the
newly created java/.mvn directory. The settings file is added to the
RAT exclusion list in bin/rat_exclude_files.txt.

The --show-version flag is added for debugging purposes.

The same flags are added to the JAMM subproject as well.

The new resolver in Maven 3.9 has also changed the warning message
emitted for missing component checksums, so the new warning string
is added to the filter in bin/mvn-quiet.sh.
Unfortunately Maven 3.9 has also changed the way it responds to missing
checksum files: the resolver now emits a stack trace when checksums
cannot be determined, and missing checksums are not explicitly ignored.

Detailed documentation for the new Maven resolver in Maven 3.9.0+ is
located at:
https://maven.apache.org/guides/mini/guide-resolver-transport.html
resolver configuration reference:
https://maven.apache.org/resolver/configuration.html

Tests:
- verified in a core-mode test run with Maven 3.9.2 installed
- verified in a local build using an earlier version of Maven
  to verify that the new default setting does not cause regressions
  with the old dependency resolver.

Change-Id: I75d05215effc724f5bd471646fb352f37443e185
Reviewed-on: http://gerrit.cloudera.org:8080/20142
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-07-24 18:50:34 +00:00
Joe McDonnell
a281d8eb8e IMPALA-12284: Use Maven's batch mode when building jamm
This adds the --batch-mode flag to the Maven invocation
that builds jamm. That disables some of the download progress
output, reducing the total size of the output.

Testing:
 - Ran a build locally

Change-Id: I1634240b191168b13cf3be7c9266e21a746844b1
Reviewed-on: http://gerrit.cloudera.org:8080/20196
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-14 23:45:28 +00:00
Michael Smith
87fd844d3e IMPALA-11941: (Addendum) Produce shaded copy of Jamm
Produces a shaded copy of a pre-release jamm jar that supports Java 17.
Building a copy of jamm and directly depending on it meant any consumer
of Impala would have to provide their own build of it.

Testing: ran custom_cluster/test_local_catalog.py with Java 8 and 17

Change-Id: Ida42d720a2639b65391c07a9237556311e04fac6
Reviewed-on: http://gerrit.cloudera.org:8080/20147
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-07-01 01:10:12 +00:00
Daniel Becker
679d58fa6d IMPALA-12238: RandomNestedDataGenerator should take a seed argument
RandomNestedDataGenerator can be used to produce parquet files with
random data from Avro schemas. This change makes it possible to provide
a seed value for the random generator so the generated files are
reproducible. The seed can be given as the last (optional) command line
argument. It is parsed as a Java 'long'.
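
As an illustration only, the optional-seed pattern described above can be
sketched in Python (the generator itself is a Java tool). The argument
layout, with three required arguments followed by the seed, and the names
make_rng and sample_values are assumptions made for this sketch, not the
tool's actual interface.

```python
import random

REQUIRED_ARGS = 3  # hypothetical number of required arguments


def make_rng(args):
    """Return a generator seeded from the optional last argument, if given."""
    if len(args) > REQUIRED_ARGS:
        return random.Random(int(args[REQUIRED_ARGS]))
    return random.Random()  # unseeded: output differs between runs


def sample_values(rng, count=5):
    """Draw values in the signed 64-bit range, mirroring a Java 'long'."""
    return [rng.randint(-2**63, 2**63 - 1) for _ in range(count)]
```

Two generators built from the same argument list, seed included, produce
the same sequence, which is the property the manual test checks.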

Testing:
 - manually verified that when run with the same arguments (including
   the seed), the data generator produces the same results

Change-Id: Iee33604bbfe12895100afbd0f98ac302dee9a238
Reviewed-on: http://gerrit.cloudera.org:8080/20136
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Daniel Becker <daniel.becker@cloudera.com>
2023-06-28 16:18:08 +00:00
Michael Smith
3b0705ba63 IMPALA-11941: Support Java 17 in Impala
Enables building for Java 17 - and particularly using Java 17 in
containers - but won't run a minicluster fully with Java 17 as some
projects (Hadoop) don't yet support it.

Starting with Java 15, ehcache.sizeof encounters
UnsupportedOperationException: can't get field offset on a hidden class
in class members pointing to capturing lambda functions. Java 17 also
introduces new modules that need to be added to add-opens. Both of these
pose problems for continued use of ehcache.

Adds https://github.com/jbellis/jamm as a new cache weigher for Java
15+. We build from HEAD as an external project until Java 17 support is
released (https://github.com/jbellis/jamm/issues/44). Adds the
'java_weigher' option to select 'sizeof' or 'jamm'; defaults to 'auto',
which uses jamm for Java 15+ and sizeof for everything else. Also adds
metrics for viewing cache weight results.
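
A minimal sketch of the 'auto' selection rule described above, written in
Python for illustration. The helper names parse_java_major and
choose_weigher are hypothetical, and the version parsing is an assumption
about typical java.version strings rather than Impala's actual logic.

```python
def parse_java_major(version):
    """Extract the major version from strings like '1.8.0_372' or '17.0.7'."""
    parts = version.split(".")
    # Java 8 and earlier report themselves as '1.x'.
    major = parts[1] if parts[0] == "1" and len(parts) > 1 else parts[0]
    # Drop suffixes such as '-ea' or '+35' before converting.
    return int(major.split("-")[0].split("+")[0])


def choose_weigher(java_weigher, java_version):
    """Resolve the 'java_weigher' option to 'sizeof' or 'jamm'."""
    if java_weigher in ("sizeof", "jamm"):
        return java_weigher  # explicit choice wins
    if java_weigher != "auto":
        raise ValueError("unknown java_weigher: %s" % java_weigher)
    return "jamm" if parse_java_major(java_version) >= 15 else "sizeof"
```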

Adds JAVA_HOME/lib/server to LD_LIBRARY_PATH in run-jvm-binary to
simplify switching between JDK versions for testing. You can now
- export IMPALA_JDK_VERSION=11
- source bin/impala-config.sh
- start-impala-cluster.py
and have Impala running a different JDK (11) version.

Retains add-opens calls that are still necessary due to dependencies'
use of lambdas for jamm, and all others for ehcache. Add-opens are still
required as a fallback, as noted in
https://github.com/jbellis/jamm#object-graph-crawling. We catch the
exceptions jamm and ehcache throw - CannotAccessFieldException,
UnsupportedOperationException - to avoid crashing Impala, and add it to
the list of banned log messages (as we should add-opens when we find
them).

Testing:
- container test run with Java 11 and 17 (excludes custom cluster)
- manual custom_cluster/test_local_catalog.py +
  test_banned_log_messages.py run with Java 11 and 17 (Java 8 build)
- full Java 11 build (passed except IMPALA-12184)
- add test catalog cache entry size metrics fit reasonable bounds
- add unit test for utility to find jamm jar file in classpath

Change-Id: Ic378896f572e030a3a019646a96a32a07866a737
Reviewed-on: http://gerrit.cloudera.org:8080/19863
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-24 10:11:54 +00:00
Peter Rozsa
6b571eb7e4 IMPALA-12184: Java UDF increment on an empty string is inconsistent
This change removes the Text-typed overload for BufferAlteringUDF to
avoid ambiguous function matchings. It also changes the 2-parameter
function in BufferAlteringUDF to cover Text typed arguments.

Tests:
 - test_udfs.py manually executed

Change-Id: I3a17240ce39fef41b0453f162ab5752f1c940f41
Reviewed-on: http://gerrit.cloudera.org:8080/20038
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-20 17:00:35 +00:00
Michael Smith
683bef1ca4 IMPALA-11253: Support testing with Java 11 (take 2)
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'.  The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.

This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.
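
The precedence between the two variables can be sketched as follows. This
is a Python illustration, not the shell logic in impala-config.sh:
resolve_java_home, find_jdk and system_java_home are hypothetical names,
and the distribution-specific directory search is left to the caller.

```python
def resolve_java_home(env, find_jdk, system_java_home):
    """Pick a Java home from environment settings.

    env: mapping of environment variables.
    find_jdk: hypothetical callback mapping '8' or '11' to an install path.
    system_java_home: path found by the pre-existing 'system' logic.
    """
    override = env.get("IMPALA_JAVA_HOME_OVERRIDE")
    if override:
        return override  # preferred over IMPALA_JDK_VERSION
    version = env.get("IMPALA_JDK_VERSION", "system")
    if version == "system":
        return system_java_home
    if version not in ("8", "11"):
        raise ValueError("unsupported IMPALA_JDK_VERSION: %s" % version)
    return find_jdk(version)
```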

This also updates the versions of Maven plugins related to the build.

Source and target releases are still set to Java 8 compatibility.

Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with

  JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
    CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
    run-all-tests.sh

Testing: ran test suite with Java 11

This reverts the revert commit 1b6011c, restoring these changes minus
code to update IMPALA_JDK_VERSION based on $JAVA -version as that could
break subsequent sourcing of impala-config.sh.

Change-Id: Ie16504ad5738b1f228f97044afd3d9017ccc6c53
Reviewed-on: http://gerrit.cloudera.org:8080/19928
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-25 16:04:29 +00:00
Michael Smith
1b6011c6a0 Revert "IMPALA-11253: Support testing with Java 11"
This reverts commit ee6395db76 as it is
not flexible enough at detecting Java automatically in likely build
environments.

Change-Id: I836c9f7fd10740b15f7e40b2e7f889ac7ee61fc3
Reviewed-on: http://gerrit.cloudera.org:8080/19908
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-05-21 14:00:14 +00:00
Michael Smith
ee6395db76 IMPALA-11253: Support testing with Java 11
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'.  The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.

This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.

This also updates the versions of Maven plugins related to the build.

Source and target releases are still set to Java 8 compatibility.

Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with

  JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
    CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
    run-all-tests.sh

Testing: ran test suite with Java 11

Change-Id: I15d309e2092c12d7fdd2c99b727f3a8eed8bc07a
Reviewed-on: http://gerrit.cloudera.org:8080/19539
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-05-19 22:32:00 +00:00
Michael Smith
d91cdb0cec IMPALA-12077: Remove deprecated Avro methods
Switches Avro methods deprecated in Avro 1.8 to new alternatives.

Change-Id: I8c01886774eb4ca5964a82c2fa568d7c4354c70c
Reviewed-on: http://gerrit.cloudera.org:8080/19772
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-21 22:53:46 +00:00
Michael Smith
f0289f3cbb IMPALA-11273: Remove APIs deprecated in Java 11
Replaces constructor calls for object versions of primitives - Integer,
Long, Float, Double, Boolean - with optimized valueOf calls as using
constructors for these is deprecated according to jdeprscan.

Removes override of finalize. Use of finalize is deprecated, and
hive-udf-call.cc ensures we always call close when unloading the UDF.
Adds try-with-resources to UdfExecutorTest to handle test cleanup.

Updates BigDecimal.setScale to use RoundingMode.

Change-Id: Idfb053223b6e098e6032502f873361696dd2da84
Reviewed-on: http://gerrit.cloudera.org:8080/19721
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-15 09:45:29 +00:00
Peter Rozsa
afe59f7f0d IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
This change fixes the behavior of BytesWritable and TextWritable's
getBytes() method. Now the returned byte array can be handled as
the underlying buffer as it gets loaded before the UDF's evaluation,
and tracks the changes as a regular Java byte array; the resizing
operation still resets the reference. The operations that wrote back
to the native heap were also removed as these operations are now
handled in the byte array. ImpalaStringWritable class is also removed,
writables that used it before now store the data directly.

Tests:
 - Test UDFs added as BufferAlteringUdf and GenericBufferAlteringUdf
 - E2E test ran for UDFs

Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Reviewed-on: http://gerrit.cloudera.org:8080/19507
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-03-08 19:54:38 +00:00
Csaba Ringhofer
67bb870aa3 IMPALA-11911: Fix NULL argument handling in Hive GenericUDFs
Before this patch if an argument of a GenericUDF was NULL, then Impala
passed it as null instead of a DeferredObject. This was incorrect, as
a DeferredObject is expected with a get() function that returns null.
See the Jira for more details and GenericUDF examples in Hive.

TestGenericUdf's NULL handling was further broken in IMPALA-11549,
leading to throwing null pointer exceptions when the UDF's result is
NULL. This test bug was not detected, because Hive udf tests were
running with default abort_java_udf_on_exception=false, which means
that exceptions from Hive UDFs only led to warnings and returning NULL,
which was the expected result in all affected test queries.
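
To illustrate how the lenient mode can hide such a bug, here is a small
Python sketch. It is an analogy, not Impala code: null_unsafe_udf and
evaluate are hypothetical names, and abort_on_exception only mimics the
behavior of the abort_java_udf_on_exception query option described above.

```python
import warnings


def null_unsafe_udf(value):
    # Deliberately buggy: dereferences its argument without a NULL check.
    return value.upper()


def evaluate(udf, value, abort_on_exception):
    """Run a UDF under one of the two error-handling modes."""
    try:
        return udf(value)
    except Exception as exc:
        if abort_on_exception:
            # Strict mode: the exception becomes a query error.
            raise RuntimeError("UDF failed: %r" % exc)
        # Lenient mode: only a warning, and the result is NULL.
        warnings.warn("UDF failed, returning NULL: %r" % exc)
        return None
```

With abort_on_exception=False, evaluate(null_unsafe_udf, None, False)
returns None, which matches a test that expects NULL, so the failure goes
unnoticed. With abort_on_exception=True the same call raises.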

This patch fixes the behavior in HiveUdfExecutorGeneric and improves
FE/EE tests to catch null handling related issues. Most Hive UDF tests
are run with abort_java_udf_on_exception=true after this patch to treat
exceptions in UDFs as errors. The ones where the test checks that NULL
is returned if an exception is thrown while abort_java_udf_on_exception
is false are moved to new .test files.
TestGenericUdf is also fixed (and simplified) to handle NULL return
values correctly.

Change-Id: I53238612f4037572abb6d2cc913dd74ee830a9c9
Reviewed-on: http://gerrit.cloudera.org:8080/19499
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-03-06 13:45:56 +00:00
yx91490
f4d306cbca IMPALA-11629: Support for huawei OBS FileSystem
This patch adds support for Huawei OBS (Object Storage Service)
FileSystem. The implementation is similar to other remote FileSystems.

New flags for OBS:
- num_obs_io_threads: Number of OBS I/O threads. Defaults to 16.

Testing:
 - Upload hdfs test data to an OBS bucket. Modify all locations in HMS
   DB to point to the OBS bucket. Remove some hdfs caching params.
   Run CORE tests.

Change-Id: I84a54dbebcc5b71e9bcdd141dae9e95104d98cb1
Reviewed-on: http://gerrit.cloudera.org:8080/19110
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-02-09 08:10:19 +00:00
Peter Rozsa
1d05381b7b IMPALA-11745: Add Hive's ESRI geospatial functions as builtins
This change adds geospatial functions from Hive's ESRI library
as builtin UDFs. Plain Hive UDFs are imported without changes,
but the generic and varargs functions are handled differently;
generic functions are added with all of the combinations of
their parameters (cartesian product of the parameters), and
varargs functions are unfolded as simple n-parameter
functions. The varargs function wrappers are generated at build
time and they can be configured in
gen_geospatial_udf_wrappers.py. These additional steps are
required because of the limitations in Impala's UDF Executor
(lack of varargs support and only partial generics support)
which could be further improved; in this case, the additional
wrapping/mapping steps could be removed.
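
The two expansion strategies can be sketched as below. This is a minimal
illustration and not the contents of gen_geospatial_udf_wrappers.py: the
function names, the tuple-based signature representation and the example
type names are assumptions.

```python
from itertools import product


def expand_generic(name, param_type_options):
    """One signature per combination of the allowed types of each parameter."""
    return [(name, combo) for combo in product(*param_type_options)]


def unfold_varargs(name, arg_type, max_args, min_args=1):
    """One fixed-arity signature for each argument count up to max_args."""
    return [(name, (arg_type,) * n) for n in range(min_args, max_args + 1)]
```

For a hypothetical two-parameter generic function where each parameter
accepts two types, expand_generic yields four signatures, while
unfold_varargs("example_fn", "BINARY", 3) yields signatures with one, two
and three BINARY parameters. The upper bound plays the role of the limits
listed under the known limitations.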

Changes regarding function handling/creating are sourced from
https://gerrit.cloudera.org/c/19177

A new backend flag, "geospatial_library", was added to turn this
feature on or off. The default value is "NONE", which
means no geospatial function gets registered
as builtin; the "HIVE_ESRI" value enables this implementation.

The ESRI geospatial implementation for Hive is currently only
available in Hive 4, but CDP Hive backported it to Hive 3.
For Apache Hive this feature is therefore disabled
regardless of the "geospatial_library" flag.

Known limitations:
 - ST_MultiLineString, ST_MultiPolygon only work
   with the WKT overload
 - ST_Polygon supports a maximum of 6 pairs of coordinates
 - ST_MultiPoint, ST_LineString support a maximum of 7
   pairs of coordinates
 - ST_ConvexHull, ST_Union support a maximum of 6 geoms

These limits can be increased in gen_geospatial_udf_wrappers.py

Tests:
 - test_geospatial_udfs.py added based on
   https://github.com/Esri/spatial-framework-for-hadoop

Co-Authored-by: Csaba Ringhofer <csringhofer@cloudera.com>

Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Reviewed-on: http://gerrit.cloudera.org:8080/19425
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-02-07 20:18:47 +00:00
Zoltan Borok-Nagy
b88cfadbbd IMPALA-11777: Bump CDP_BUILD_NUMBER to get HIVE-24498
Without HIVE-24498 we get java.lang.NoClassDefFoundError exceptions
when we write Iceberg tables via Hive. This makes it hard to write
interop tests between Hive and Impala which use Iceberg tables.

I also exclude some private Java components to get things built.

Change-Id: I486c2b1b224f72e082e331a57cf25a37ebb9fa54
Reviewed-on: http://gerrit.cloudera.org:8080/19331
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
2022-12-13 13:30:25 +00:00
Daniel Becker
a71e69f570 IMPALA-11792: Update Impala version to 4.3.0-SNAPSHOT
As 4.2.0 has been released this commit updates the master to 4.3.0.
This step needs to happen on each release.

Testing:
 - Ran a build

Change-Id: Iebedcfbc1fd8018391a6c78a9aca4a9d754780fa
Reviewed-on: http://gerrit.cloudera.org:8080/19344
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-12-13 05:44:10 +00:00
Csaba Ringhofer
86740a7d35 IMPALA-11549: Support Hive GenericUdfs that return primitive java types
Before this patch only the Writable* types were accepted in GenericUdfs
as return types, while some GenericUdfs in the wild return primitive java
types (e.g. Integer instead of IntWritable). For legacy Hive UDFs these
return types were already handled, so the only change needed was to
map the ObjectInspector subclasses (e.g. JavaIntObjectInspector) to the
correct JavaUdfDataType in Impala.

Testing:
- Added a subclass for TestGenericUdf (TestGenericUdfWithJavaReturnTypes)
  that returns primitive java types (probably inheriting in the opposite
  direction would be more logical, but the diff is smaller this way).
- Changed EE tests to also use TestGenericUdfWithJavaReturnTypes.
- Changed FE tests (UdfExecutorTest) to check both
  TestGenericUdfWithJavaReturnTypes and TestGenericUdf.
- Also added a test with BINARY type to UdfExecutorTest as this was
  forgotten during the original BINARY patch.

Change-Id: I30679045d6693ebd35718b6f1a22aaa4963c1e63
Reviewed-on: http://gerrit.cloudera.org:8080/19304
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-12-08 17:51:00 +00:00
Michael Smith
c3ec9272c5 IMPALA-11724: Use CDP Ozone in test environment
Updates the test environment to default to the CDP build of Ozone, as
the latest build of CDP Hive depends on pre-release features unavailable
in Ozone 1.2.1. Apache Ozone 1.2 can still be used by setting
USE_APACHE_OZONE=true.

The latest CDP build also includes a version of Ozone based on
ozone#master with a candidate version of 1.3.0. Both Apache and CDP
therefore have builds of Ozone we can test with that use the new
artifact names introduced in Ozone 1.2, so this patch cleans up setup
that was only needed for Ozone versions prior to 1.2.

Change-Id: I1177a1b820fe21adca9f8c1cc51ff73ee001d3f2
Reviewed-on: http://gerrit.cloudera.org:8080/19247
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2022-11-16 22:13:06 +00:00
yacai
c953426692 IMPALA-11683: Support Aliyun OSS File System
This patch adds support for OSS (Aliyun Object Storage Service)
using hadoop-aliyun. The implementation is similar to other
remote FileSystems.

Tests:
- Prepare:
  Initialize OSS-related environment variables:
  OSS_ACCESS_KEY_ID, OSS_SECRET_ACCESS_KEY, OSS_ACCESS_ENDPOINT.
  Compile and create hdfs test data on an ECS instance. Upload test data
  to an OSS bucket.
- Modify all locations in HMS DB to point to the OSS bucket.
  Remove some hdfs caching params. Run CORE tests.

Change-Id: I267e6531da58e3ac97029fea4c5e075724587910
Reviewed-on: http://gerrit.cloudera.org:8080/19165
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-11-16 10:14:49 +00:00
Michael Smith
83c5e6e409 IMPALA-11670: Upgrade components, add envvars for override
Upgrades guava to 31.1-jre and jackson-databind to 2.13.4.2 to address
CVEs. Adds environment variables for commonly-updated components so they
can be customized via the branch-specific impala-config-branch.sh in a
way that allows both to be updated regularly without merge conflicts.

Also updates httpcomponents.httpcore to 4.4.14 to be consistent with
other httpcomponents libraries included transitively.

Change-Id: I1c2c4481ca3f498abf302aa05361d950b1ed1216
Reviewed-on: http://gerrit.cloudera.org:8080/19147
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-10-19 15:54:00 +00:00