
43 Commits

Author SHA1 Message Date
Michael Smith
22b59d27d0 IMPALA-13243: Update Dropwizard Metrics to 4.2.x
Updates Dropwizard Metrics components to the latest 4.2.x release,
4.2.26. We use metrics-core directly; metrics-jvm and metrics-json are
pulled in through Hive (via
https://github.com/joshelser/dropwizard-hadoop-metrics2).

Dropwizard Metrics was manually tested with these versions in
https://github.com/joshelser/dropwizard-hadoop-metrics2/pull/8.

Change-Id: Ie9bec7a7c23194604430531bd83b25c5969e888e
Reviewed-on: http://gerrit.cloudera.org:8080/21599
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-07-23 05:22:59 +00:00
Zoltan Borok-Nagy
1324a6e6c9 IMPALA-13108: Update version to 4.5.0-SNAPSHOT
Updated IMPALA_VERSION in impala-config.sh

Executed the following for Java:

  cd java
  mvn versions:set -DnewVersion=4.5.0-SNAPSHOT

Change-Id: Ie7803fe523406dbdd1ac066a35bb31d21765a244
Reviewed-on: http://gerrit.cloudera.org:8080/21460
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-05-29 23:47:05 +00:00
Peter Rozsa
7ad9400656 IMPALA-13044: Upgrade bouncycastle to 1.78
This patch upgrades bouncycastle to 1.78. As of bouncycastle:1.71, the
*-jdk15on artifacts are no longer available; they have been replaced by
the *-jdk18on artifacts.

Tests:
 - core tests ran

Change-Id: I8372916ab79b863e7a07d22e8333abd54492fa29
Reviewed-on: http://gerrit.cloudera.org:8080/21371
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-05-03 00:09:15 +00:00
Joe McDonnell
d09c502490 IMPALA-13049: Add dependency management for log4j2 to use 2.18.0
Currently, there is no dependency management for the log4j2
version. Impala itself doesn't use log4j2. However, recently
we encountered a case where one dependency brought in
log4j-core 2.18.0 and another brought in log4j-api 2.17.1.
log4j-core 2.18.0 relies on the existence of the ServiceLoaderUtil
class from log4j-api 2.18.0. log4j-api 2.17.1 doesn't have this
class, which causes class not found exceptions.

This uses dependency management to set the log4j2 version to 2.18.0
for log4j-core and log4j-api to avoid any mismatch.
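A minimal sketch of why pinning both artifacts removes the mismatch, assuming Maven's rule that a version listed in dependencyManagement overrides the version a transitive dependency requests. ManagedVersions and resolve are hypothetical names, and the model deliberately ignores nearest-wins mediation, scopes, and version ranges.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of dependencyManagement: a managed version
// replaces whatever version a transitive dependency asks for. Nearest-wins
// mediation, scopes, and version ranges are not modeled.
final class ManagedVersions {
  // groupId:artifactId -> pinned version
  private final Map<String, String> managed = new HashMap<>();

  ManagedVersions manage(String artifact, String version) {
    managed.put(artifact, version);
    return this;
  }

  // Returns the version used for a transitively requested artifact.
  String resolve(String artifact, String requestedVersion) {
    String pinned = managed.get(artifact);
    return pinned != null ? pinned : requestedVersion;
  }
}
```

With log4j-core and log4j-api both managed at 2.18.0, a transitive request for log4j-api 2.17.1 resolves to 2.18.0, so the two artifacts can no longer drift apart.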

Testing:
 - Ran a local build and verified that both log4j-core and log4j-api
   are using 2.18.0.

Change-Id: Ib4f8485adadb90f66f354a5dedca29992c6d4e6f
Reviewed-on: http://gerrit.cloudera.org:8080/21379
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-05-01 02:37:49 +00:00
Steve Carlin
b39cd79ae8 IMPALA-12872: Use Calcite for optimization - part 1: simple queries
This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterates through the SqlNode from the previous step
and makes sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validates the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Changes the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan serves as the node that reads from an HDFS table and
the LogicalProject only projects out the columns used in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: Optimizes the query. In this cut, it is a no-op, but in
later versions it will perform logical optimizations via Calcite's rule
mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.

ExecRequestCreator: Implements the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It also creates the TExecRequest object
needed by the runtime server.
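The stage sequence above can be pictured as a chain in which each step consumes the previous step's output. This is a hypothetical sketch: PlannerPipeline and its methods are invented for illustration, and the Object-typed values stand in for Calcite's SqlNode/RelNode and Impala's PlanNode/TExecRequest rather than using those classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical staged planner driver. Stage names are taken from the commit
// message; Object values are stand-ins for the real Calcite/Impala types.
final class PlannerPipeline {
  private final List<String> names = new ArrayList<>();
  private final List<Function<Object, Object>> stages = new ArrayList<>();

  // Appends a stage; stages run in the order they are added.
  PlannerPipeline then(String name, Function<Object, Object> stage) {
    names.add(name);
    stages.add(stage);
    return this;
  }

  // Feeds each stage's output into the next one and records the stage
  // names in 'trace' in the order they ran.
  Object run(Object input, List<String> trace) {
    Object current = input;
    for (int i = 0; i < stages.size(); i++) {
      trace.add(names.get(i));
      current = stages.get(i).apply(current);
    }
    return current;
  }
}
```

For example, registering CalciteQueryParser through ExecRequestCreator in the order listed above and running "select c1 from tbl" produces a trace with the seven stage names in that order, with each stage wrapping or passing through the previous result.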

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements get processed. Any non-query statement reverts to the
current Impala planner.

In this iteration, any query other than the minimal ones listed above
results in a caught exception and is then run through the current
Impala planner. The tests that do work can be found in calcite.test and
are run by the custom cluster test test_experimental_planner.py.
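A rough sketch of this fallback behavior, assuming the experimental planner signals an unsupported query by throwing. FallbackPlanner is a hypothetical name, and the leading-keyword SELECT test is a simplification of whatever checks CalciteJniFrontend actually performs.

```java
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch: 'experimental' stands in for the Calcite-based
// planner and 'legacy' for the existing Impala planner.
final class FallbackPlanner {
  private final Function<String, String> experimental;
  private final Function<String, String> legacy;

  FallbackPlanner(Function<String, String> experimental,
                  Function<String, String> legacy) {
    this.experimental = experimental;
    this.legacy = legacy;
  }

  String plan(String sql) {
    // Simplified statement check: only text starting with SELECT is tried.
    boolean isSelect = sql.trim().toLowerCase(Locale.ROOT).startsWith("select");
    if (!isSelect) {
      return legacy.apply(sql);   // non-query statements never reach Calcite
    }
    try {
      return experimental.apply(sql);
    } catch (RuntimeException e) {
      return legacy.apply(sql);   // unsupported query: caught, then re-planned
    }
  }
}
```

Catching the exception, rather than predicting support up front, keeps the experimental path safe to enable while it handles only a few query shapes.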

This iteration should support all types except complex types.
Calcite does not have a STRING type, so the string type is
represented as VARCHAR(MAXINT), similar to how Hive represents its
STRING type.

The ImpalaTypeConverter file is used to convert the Impala Type object
to corresponding Calcite objects.
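A sketch of the STRING rule on plain type names, assuming a string-to-string mapping is enough to illustrate it. The real ImpalaTypeConverter works on Impala Type and Calcite type objects; TypeNameMapping is hypothetical, and only STRING and INT are mapped explicitly here.

```java
import java.util.Locale;

// Hypothetical sketch of the type-name mapping. Plain strings are used so
// the example runs without Impala or Calcite on the classpath.
final class TypeNameMapping {
  static String toCalciteTypeName(String impalaType) {
    String t = impalaType.trim().toUpperCase(Locale.ROOT);
    if (t.startsWith("ARRAY<") || t.startsWith("MAP<") || t.startsWith("STRUCT<")) {
      // Complex types are not supported in this iteration.
      throw new UnsupportedOperationException("complex type: " + impalaType);
    }
    if (t.equals("STRING")) {
      // Calcite has no STRING type, so use the widest VARCHAR, as Hive does.
      return "VARCHAR(" + Integer.MAX_VALUE + ")";
    }
    if (t.equals("INT")) {
      return "INTEGER";  // Calcite's name for a 4-byte integer type
    }
    // Other scalar names are returned unchanged in this sketch; a real
    // converter needs an explicit mapping for each type.
    return t;
  }
}
```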

Authorization is not yet working with this current commit. A Jira has been
filed (IMPALA-13011) to deal with this.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Reviewed-on: http://gerrit.cloudera.org:8080/21109
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-04-25 20:09:09 +00:00
wzhou-code
fc74ca672a IMPALA-12378: Auto Ship JDBC Data Source
This patch moves the source files of the jdbc package to fe.
The data source location is now optional: a data source can be created
without specifying an HDFS location. In that case, the data source class
is assumed to be on the classpath, and an instance of it is created with
the current class loader. If a data source location is set, Impala still
tries to load the data source's jar file at runtime.
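A minimal sketch of the two lookup paths, assuming the jar location has already been resolved to a local file (the HDFS handling is not modeled). DataSourceClassResolver is a hypothetical name; Class.forName and URLClassLoader are standard JDK APIs.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

// Hypothetical sketch; 'jarPath' is assumed to be a local file path.
final class DataSourceClassResolver {
  static Class<?> resolve(String className, String jarPath)
      throws ClassNotFoundException, MalformedURLException {
    ClassLoader current = DataSourceClassResolver.class.getClassLoader();
    if (jarPath == null || jarPath.isEmpty()) {
      // No location: assume the class is already on the classpath.
      return Class.forName(className, true, current);
    }
    // Location set: load from that jar, with the current loader as parent.
    // The loader is left open because the loaded class may need to load
    // further classes from the same jar later.
    URL jarUrl = Paths.get(jarPath).toUri().toURL();
    URLClassLoader jarLoader = new URLClassLoader(new URL[] {jarUrl}, current);
    return Class.forName(className, true, jarLoader);
  }
}
```

Because URLClassLoader delegates to its parent first, a class that is already on the classpath is still taken from the classpath even when a jar location is set.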

Testing:
 - Passed core test
 - Passed dockerised-tests

Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
Reviewed-on: http://gerrit.cloudera.org:8080/20971
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2024-02-07 16:29:11 +00:00
Csaba Ringhofer
c14156eb3a IMPALA-12746: Bump jackson.databind to 2.15.3
Also sets dependencyManagement to force using the same version
for jackson-databind, jackson-core and jackson-annotations. This is
needed because datagenerator depends on kitesdk, which would pull in a
very old jackson-core version (2.3.1) and lead to build failures
with the newer jackson.databind.

Change-Id: I8440426da1395045cf149aca0044286015861e5f
Reviewed-on: http://gerrit.cloudera.org:8080/20914
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-24 15:13:36 +00:00
Michael Smith
098ad53f65 IMPALA-12480: Use Hadoop version for hadoop-aliyun
Uses the imported Hadoop version for the hadoop-aliyun module, which is
a tool in the hadoop project. This allows us to exclude vulnerable
versions of jdom that were previously included via hadoop-aliyun.

Change-Id: I270f3895ec668d9fb907f35b04cad2f149e3d0de
Reviewed-on: http://gerrit.cloudera.org:8080/20532
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-10-10 20:38:36 +00:00
Michael Smith
1cf5bc6e79 Update version to 4.4.0-SNAPSHOT
Change-Id: I21c3b823c1b0db198d442d155c01d4cfd3a5c522
Reviewed-on: http://gerrit.cloudera.org:8080/20534
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-10-07 01:43:15 +00:00
Steve Carlin
bc83d46a9a IMPALA-12424: Allow third party JniFrontend interface.
This patch allows a third party to inject their own frontend
class instead of using the default JniFrontend included in the
project.

The test case includes an interface that runs queries as normal
except for the "select 1" query which gets changed to "select 42".
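A hypothetical sketch of a pluggable frontend loaded by class name, mirroring the "select 1" to "select 42" test. QueryFrontend, DefaultFrontend, Select42Frontend, and FrontendLoader are invented names; the actual injection point and how the class name is configured are not shown here.

```java
// Hypothetical stand-in for the frontend entry point.
interface QueryFrontend {
  String createExecRequest(String sql);
}

// Stand-in for the default frontend: returns a fake "plan" string.
final class DefaultFrontend implements QueryFrontend {
  @Override
  public String createExecRequest(String sql) {
    return "plan[" + sql + "]";
  }
}

// Mirrors the described test: rewrites "select 1" and delegates the rest.
final class Select42Frontend implements QueryFrontend {
  private final QueryFrontend delegate = new DefaultFrontend();

  @Override
  public String createExecRequest(String sql) {
    String rewritten = sql.trim().equalsIgnoreCase("select 1") ? "select 42" : sql;
    return delegate.createExecRequest(rewritten);
  }
}

final class FrontendLoader {
  // Instantiates the named class reflectively, or the default if none is set.
  static QueryFrontend load(String className) throws ReflectiveOperationException {
    if (className == null || className.isEmpty()) {
      return new DefaultFrontend();
    }
    Class<?> cls = Class.forName(className);
    return (QueryFrontend) cls.getDeclaredConstructor().newInstance();
  }
}
```

Delegating to the default implementation keeps the third-party class small: it only intercepts the queries it cares about.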

Change-Id: I89e677da557b39232847644b6ff17510e2b3c3d5
Reviewed-on: http://gerrit.cloudera.org:8080/20459
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-09-08 20:20:56 +00:00
Michael Smith
7fb6a9a1d2 IMPALA-11941: (Addendum) Use released jamm 0.4.0
Switches to the 0.4.0 release of jamm, as building a shaded JAR from
source was a temporary measure.

Change-Id: I5b88b479580f7d0baff502ad9551d2764971babf
Reviewed-on: http://gerrit.cloudera.org:8080/20237
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-25 00:27:56 +00:00
Michael Smith
87fd844d3e IMPALA-11941: (Addendum) Produce shaded copy of Jamm
Produces a shaded copy of a pre-release jamm jar that supports Java 17.
Building a copy of jamm and directly depending on it meant any consumer
of Impala would have to provide their own build of it.

Testing: ran custom_cluster/test_local_catalog.py with Java 8 and 17

Change-Id: Ida42d720a2639b65391c07a9237556311e04fac6
Reviewed-on: http://gerrit.cloudera.org:8080/20147
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-07-01 01:10:12 +00:00
yx91490
f4d306cbca IMPALA-11629: Support for huawei OBS FileSystem
This patch adds support for Huawei OBS (Object Storage Service)
FileSystem. The implementation is similar to other remote FileSystems.

New flags for OBS:
- num_obs_io_threads: Number of OBS I/O threads. Defaults to 16.

Testing:
 - Upload hdfs test data to an OBS bucket. Modify all locations in HMS
   DB to point to the OBS bucket. Remove some hdfs caching params.
   Run CORE tests.

Change-Id: I84a54dbebcc5b71e9bcdd141dae9e95104d98cb1
Reviewed-on: http://gerrit.cloudera.org:8080/19110
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-02-09 08:10:19 +00:00
Daniel Becker
a71e69f570 IMPALA-11792: Update Impala version to 4.3.0-SNAPSHOT
As 4.2.0 has been released, this commit updates master to 4.3.0.
This step needs to happen on each release.

Testing:
 - Ran a build

Change-Id: Iebedcfbc1fd8018391a6c78a9aca4a9d754780fa
Reviewed-on: http://gerrit.cloudera.org:8080/19344
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-12-13 05:44:10 +00:00
Michael Smith
c3ec9272c5 IMPALA-11724: Use CDP Ozone in test environment
Updates the test environment to default to the CDP build of Ozone, as
the latest build of CDP Hive depends on pre-release features unavailable
in Ozone 1.2.1. Apache Ozone 1.2 can still be used by setting
USE_APACHE_OZONE=true.

The latest CDP build also includes a version of Ozone based on
ozone#master with a candidate version of 1.3.0. Both Apache and CDP
therefore have builds of Ozone we can test with that use the new
artifact names introduced in Ozone 1.2, so this patch cleans up setup
that was only needed for Ozone versions prior to 1.2.

Change-Id: I1177a1b820fe21adca9f8c1cc51ff73ee001d3f2
Reviewed-on: http://gerrit.cloudera.org:8080/19247
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2022-11-16 22:13:06 +00:00
yacai
c953426692 IMPALA-11683: Support Aliyun OSS File System
This patch adds support for OSS (Aliyun Object Storage Service).
Using hadoop-aliyun, the implementation is similar to other
remote FileSystems.

Tests:
- Prepare:
  Initialize OSS-related environment variables:
  OSS_ACCESS_KEY_ID, OSS_SECRET_ACCESS_KEY, OSS_ACCESS_ENDPOINT.
  Compile and create hdfs test data on an ECS instance. Upload test data
  to an OSS bucket.
- Modify all locations in HMS DB to point to the OSS bucket.
  Remove some hdfs caching params. Run CORE tests.

Change-Id: I267e6531da58e3ac97029fea4c5e075724587910
Reviewed-on: http://gerrit.cloudera.org:8080/19165
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-11-16 10:14:49 +00:00
Michael Smith
83c5e6e409 IMPALA-11670: Upgrade components, add envvars for override
Upgrades guava to 31.1-jre and jackson-databind to 2.13.4.2 to address
CVEs. Adds environment variables for commonly-updated components so they
can be customized via the branch-specific impala-config-branch.sh in a
way that allows both to be updated regularly without merge conflicts.

Also updates httpcomponents.httpcore to 4.4.14 to be consistent with
other httpcomponents libraries included transitively.

Change-Id: I1c2c4481ca3f498abf302aa05361d950b1ed1216
Reviewed-on: http://gerrit.cloudera.org:8080/19147
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-10-19 15:54:00 +00:00
Michael Smith
22e5ca3d0a IMPALA-11667: Clean up Java dependency exclusions
Use dependencyManagement to simplify Java dependencies by directly
controlling versions of transitive dependencies instead of using
exclusions and direct inclusion.

Dependency management specifies versions authoritatively, so redundant
version declarations are also removed.

Change-Id: I424a175135855dcbd38ae432ea111cca5f562633
Reviewed-on: http://gerrit.cloudera.org:8080/19146
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-10-19 15:54:00 +00:00
Michael Smith
a1fddf1022 IMPALA-11628: Switch to reload4j, update slf4j
Switches from log4j 1.x to reload4j, a maintained fork. Updates slf4j to
the latest version so we can include all CVE fixes.

slf4j 2.0.x requires Java 8 and adds a backward-compatible fluent
logging api. Neither seems like a problem for Impala.

Bans all use of log4j 1.x so we only use reload4j.

Change-Id: I5238b9c8247af3e0f4cb05c0b76a75bfee37f5c8
Reviewed-on: http://gerrit.cloudera.org:8080/19102
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-10-11 21:23:11 +00:00
wzhou-code
010f1b943c IMPALA-11639: Upgrade Spring framework to 5.3.20
This patch upgrades the Spring framework to 5.3.20 to
address multiple CVEs:
 - CVE-2022-22971
 - CVE-2022-22968
 - CVE-2022-22970

Testing:
 - Ran core job
 - Ran custom cluster tests in exhaustive mode

Change-Id: I33f4f1d22fc27227e31d744658a17c16b61b9677
Reviewed-on: http://gerrit.cloudera.org:8080/19091
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-10-05 19:34:39 +00:00
Steve Carlin
4e813b7085 IMPALA-11528: Catalogd should start up with a corrupt Hive function.
This commit handles the case for a specific kind of corrupt function
within the Hive Metastore in the following situation:

A valid Hive SQL function gets created in HMS. This UDF is written in
Java and must derive from the "UDF" class. After creating this function
in Impala, we then replace the underlying jar file with a class that
does NOT derive from the "UDF" class.

In this scenario, catalogd should reject the function and still start
up gracefully. Before this commit, catalogd failed to start because the
Hive function FunctionUtils.getUDFClassType() depends on UDAF and was
throwing a LinkageError, so the UDAF class must be included in the
shaded jar.

Change-Id: I54e7a1df6d018ba6cf5ecf32dc9946edf86e2112
Reviewed-on: http://gerrit.cloudera.org:8080/18927
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
2022-09-13 14:48:31 +00:00
Joe McDonnell
7581cedd52 IMPALA-11394: Update jackson-databind to 2.12.6.1
This updates jackson-databind to address CVE-2020-36518.

Testing:
 - Ran a core job

Change-Id: I8db403a102097a22c48f5d9d42ced3b85930078f
Reviewed-on: http://gerrit.cloudera.org:8080/18891
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-08-23 15:50:35 +00:00
Joe McDonnell
4845f36b4e IMPALA-11207: Use hadoop-cloud-storage for Cloud dependencies
Hadoop provides hadoop-cloud-storage, which includes most of
the dependencies that Impala currently uses like hadoop-aws,
hadoop-azure, Knox's gateway-cloud-bindings, etc. Hadoop has
put in a lot of work to make sure that this package includes
the right version of dependencies (including shading some
dependencies for GCS). It seems like this is a more reliable
way to consume these dependencies.

This switches the Java build to use hadoop-cloud-storage
and removes the dependencies that it replaces. This eliminates
the need to control the version of oauth and GCS, as those
are determined by hadoop-cloud-storage.

Change-Id: I3a1631289f990513823c2b17eb9241cc1b5a7ffd
Reviewed-on: http://gerrit.cloudera.org:8080/18817
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-08-15 21:11:42 +00:00
Michael Smith
830625b104 IMPALA-9442: Add Ozone to minicluster
Adds Ozone as an alternative to hdfs in the minicluster. Select by
setting `export TARGET_FILESYSTEM=ozone`. With that flag,
run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot
because Ozone does not support HBase (HDDS-3589); snapshot loading
doesn't work yet primarily due to HDDS-5502.

Uses the o3fs interface because Ozone puts specific restrictions on
bucket names (no underscores, for instance), and it was a lot easier to
use an interface where everything is written to a single bucket than to
update all of Impala's uses of HDFS-style paths to make `test-warehouse` a
bucket inside a volume.

Specifies reduced Ozone client retries during shutdown where Ozone may
not be available.

Passes tests with FE_TEST=false BE_TEST=false.

Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be
Reviewed-on: http://gerrit.cloudera.org:8080/18738
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-08-03 16:58:20 +00:00
Michael Smith
00db9a27df IMPALA-11407: Upgrade google-oauth-client to 1.33.3
Upgrades google-oauth-client and google-oauth-client-java6 to 1.33.3 to
address CVE-2021-22573. These are included as dependencies of
com.google.cloud.bigdataoss/gcs-connector, which does not yet have a
release that includes versions 1.33.3 or later.

Change-Id: I8d95913f26e6073373374e169ee045881f40f065
Reviewed-on: http://gerrit.cloudera.org:8080/18683
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-07-01 04:10:13 +00:00
Riza Suminto
06b1db4675 IMPALA-11369: Separate thrift compiler for different component
Impala used to have one thrift compiler version to compile C++, Java,
and Python code.

Thrift serialization/deserialization is mostly compatible across minor
versions, so it is possible to use different thrift compiler versions
for different target languages. This is beneficial because it allows
Impala to upgrade separate components independently.

This patch implements the infrastructure change required to do so. It
replaces most of the 'THRIFT_*' environment and CMake variables with
'THRIFT_CPP_*', 'THRIFT_JAVA_*', and 'THRIFT_PY_*' to compile C++,
Java, and Python code, respectively. All three still refer to the same
thrift version (thrift-0.11.0-p5).

Testing:
- Build Impala and pass core tests.

Change-Id: I56479dc69b79024d1a4d09211bbe88a61fa0c6a4
Reviewed-on: http://gerrit.cloudera.org:8080/18636
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-21 02:40:59 +00:00
Tamas Mate
97d3b25be3 IMPALA-11338: Update IMPALA_VERSION to 4.2.0-SNAPSHOT
As 4.1.0 has been released, this commit updates master to 4.2.0.
This step needs to happen on each release, related changes are:
IMPALA-10198, IMPALA-10057

Testing:
 - Ran a build

Change-Id: Idab47eedb27ca4be42300dfc2eeb81eefe407b70
Reviewed-on: http://gerrit.cloudera.org:8080/18595
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-07 22:50:50 +00:00
Joe McDonnell
3627b027fe IMPALA-11229: Upgrade Spring framework to 5.3.18
This upgrades the Spring framework to 5.3.18 to
address multiple CVEs:
 - CVE-2022-22965
 - CVE-2022-22950
 - CVE-2021-22060

Testing:
 - Ran core job
 - Ran custom cluster tests in exhaustive mode

Change-Id: Ie1b299c5b24e70c9db6eb0ce37fee9e32908423e
Reviewed-on: http://gerrit.cloudera.org:8080/18405
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
2022-04-13 15:12:25 +00:00
Joe McDonnell
26398855bf IMPALA-10930: Bump the Java artifact versions to 4.1.0-SNAPSHOT
This changes the Maven pom.xml files to use version
4.1.0-SNAPSHOT rather than 4.0.0-SNAPSHOT. In the
past, these versions were a fixed value, but that
changed with IMPALA-10198. This is a new step that
needs to happen on each release.

Testing:
 - Ran a build

Change-Id: I10a589b4fbc15048199943a0e06d079f57840239
Reviewed-on: http://gerrit.cloudera.org:8080/18390
Reviewed-by: Tamas Mate <tmater@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-04-11 16:06:46 +00:00
Joe McDonnell
aa404b856f IMPALA-11197/IMPALA-11149: Address CVEs in pac4j/xmlsec
This upgrades pac4j and several of its dependencies
(including xmlsec) to address CVEs in those components.
Specifically:
 - pac4j 4.5.5 addresses CVE-2021-44878
 - xmlsec 2.2.3 addresses CVE-2021-40690
 - bcprov 1.68 addresses CVE-2020-15522

This also upgrades springframework to 5.2.9.RELEASE to
match the version for pac4j 4.5.5.

Testing:
 - Ran core job

Change-Id: I8421d867dd0fce8eeaa6bc13a511ca3e8dd05723
Reviewed-on: http://gerrit.cloudera.org:8080/18348
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-03-24 15:49:47 +00:00
Fucun Chu
4186727fe6 IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2
Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. These shim classes
implement the methods that differ between cdp-hive-3 and apache-hive-3
and are used by frontend code. At build time, based on the environment
variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as a
source directory by a build plugin in fe/pom.xml.

Some code that directly uses Hive 4 APIs needs to be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes that code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.

Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3
2. Ran full-suite of tests against HMS-3
3. Running full-tests against ASF-HMS-3 will need more work
supporting Tez in the mini-cluster (for dataloading) and HMS
transaction support. This will be on-going effort and test failures
on ASF-Hive-3 will be fixed in additional sub-tasks.

Notes:
1. Patch uses a custom build of Apache Hive to be deployed in
mini-cluster. This build has the fixes for HIVE-21569, HIVE-20038.
This hack will be added to the build script in additional sub-tasks.

Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Reviewed-on: http://gerrit.cloudera.org:8080/17774
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-02-27 06:36:19 +00:00
Fucun Chu
157086cb80 IMPALA-10771: Add Tencent COS support
This patch adds support for COS (Cloud Object Storage). Using
hadoop-cos, the implementation is similar to other remote FileSystems.

New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.

Follow-up:
- Support for caching COS file handles will be addressed in
   IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   COS (IMPALA-10773).

Tests:
 - Upload hdfs test data to a COS bucket. Modify all locations in HMS
   DB to point to the COS bucket. Remove some hdfs caching params.
   Run CORE tests.

Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Reviewed-on: http://gerrit.cloudera.org:8080/17503
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-12-08 16:32:02 +00:00
Zoltan Borok-Nagy
7b2bb13ecc IMPALA-10810: Bump json-smart from 2.3 to 2.4.7
I noticed that our json-smart dependency is stale and we could
pick up a newer version.

This patch upgrades it to 2.4.7 which is the newest version at
the time of writing.

Change-Id: I6b43f606f40e172aa267b55c564fa64d68515bd5
Reviewed-on: http://gerrit.cloudera.org:8080/17702
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-07-22 23:05:34 +00:00
Csaba Ringhofer
94f67a3432 IMPALA-7825: Upgrade Thrift version to 0.11.0
Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in the toolchain.

Most of the changes are related to replacing boost:: with std::
shared_ptr-s in cpp code (this is a continuation of patch by Sahil).

The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.

The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.

Notable side effects:
- old logic to compile thrift for impala-shell with 0.11.0 was removed
- impala_shell's utf8 handling had to be updated as the new 0.11.0
  compilation happens with no_utf8strings. This also made things a
  bit faster, e.g. the following takes ~0.22s instead of ~0.25s:
  shell/impala_shell.py \
    -B -q "select * from functional_parquet.alltypes;" > /dev/null
- THRIFT-3921 changed the stream operators to print an enum's name
  instead of its number, leading to slightly different messages
  in some cases.
- "templates" was added to the thrift generator's parameters to avoid
  a compilation issue (related to IMPALA-10600). I didn't notice any
  change in compilation time. This option generates .tcc files with
  templatized readers/writers for Thrift types. Currently we don't
  use these, but they could potentially speed up (de)serialization.

Testing:
- ran Impyla's test suite with Python 2 and 3
- ran core tests

Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Reviewed-on: http://gerrit.cloudera.org:8080/17170
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-27 13:36:54 +00:00
Joe McDonnell
267f4d67f4 IMPALA-10455: Reorder Maven repositories for cleaner mirror semantics
When using a Maven mirror that uses a mirrorOf pattern, the order
of repositories in the pom.xml has a strong influence on whether the
build tries the mirror for a particular artifact. If an early
repository matches the mirrorOf condition, Maven may try the mirror
for all artifacts, even those that only exist in the s3 bucket.
This extra check can slow down the build, especially if the mirror
is slow to respond for unknown artifacts.

For Impala, the common case is for a mirror to cover everything
except the artifacts that come from the Kudu local repository or
the s3 bucket. To optimize for that case, this reorders the Maven
repositories to be in this order:
1. Local/S3 repositories
2. Regular repositories
3. Banned repositories
The repositories are otherwise unchanged.

Testing:
 - Ran an ordinary build
 - Ran a build with a mirrorOf "external:*,!impala.cdp.repo" and verified
   that the build went directly to the s3 bucket first.
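A simplified sketch of how a mirrorOf pattern such as "external:*,!impala.cdp.repo" is evaluated, assuming Maven's documented rules: "*" matches every repository, "external:*" matches repositories that are neither on localhost nor file-based, a bare id matches that repository, and "!id" excludes it. MirrorOfMatcher is a hypothetical name, and real Maven accepts further pattern forms that are not modeled here.

```java
import java.net.URI;

// Hypothetical, simplified model of Maven's mirrorOf matching.
final class MirrorOfMatcher {
  static boolean matches(String pattern, String repoId, String repoUrl) {
    boolean result = false;
    for (String raw : pattern.split(",")) {
      String token = raw.trim();
      if (token.length() > 1 && token.startsWith("!")) {
        if (token.substring(1).equals(repoId)) {
          return false;           // explicit exclusion wins immediately
        }
      } else if (token.equals(repoId)) {
        return true;              // explicit inclusion
      } else if (token.equals("*")) {
        result = true;            // keep scanning for a later exclusion
      } else if (token.equals("external:*") && isExternal(repoUrl)) {
        result = true;            // keep scanning for a later exclusion
      }
    }
    return result;
  }

  // A repository is "external" unless it is file-based or on localhost.
  private static boolean isExternal(String repoUrl) {
    URI uri = URI.create(repoUrl);
    String host = uri.getHost();
    boolean local = "file".equalsIgnoreCase(uri.getScheme())
        || "localhost".equals(host) || "127.0.0.1".equals(host);
    return !local;
  }
}
```

Under that pattern, impala.cdp.repo is never sent to the mirror even though its URL is external, while a file:// repository is not matched by external:* in the first place.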

Change-Id: I7046c7ec5391833e98ee6a463fb8c08b6a04cb26
Reviewed-on: http://gerrit.cloudera.org:8080/17020
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-08 21:38:35 +00:00
stiga-huang
2dfc68d852 IMPALA-7712: Support Google Cloud Storage
This patch adds support for GCS (Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Support for caching GCS file handles will be addressed in
   IMPALA-10568.
 - test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   GCS (IMPALA-10562).
 - Some tests are skipped due to issues introduced by /etc/hosts setting
   on GCE instances (IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-13 11:20:08 +00:00
John Sherman
a29d06db53 IMPALA-9218: Add support for locally compiled Hive
- Add HIVE_VERSION_OVERRIDE, HIVE_STORAGE_API_VERSION_OVERRIDE,
  HIVE_METASTORE_THRIFT_DIR_OVERRIDE, HIVE_HOME_OVERRIDE environment
  variable support to impala-config.sh
- When used together with HIVE_SRC_DIR_OVERRIDE allows a user to
  specify a locally compiled version of Hive for development and the
  minicluster
- Hive jars are expected to have been installed into the local maven
  repository
- Currently only version 3 of Hive is supported due to the absence of
  API shims for Hive 4.0
Example:
  ~/hive $ mvn package install -Pdist -DskipTests

Example configuration:
export HIVE_VERSION_OVERRIDE=3.1.0-SNAPSHOT
export HIVE_STORAGE_API_VERSION_OVERRIDE=2.6.0
export HIVE_HOME_OVERRIDE=\
~/hive/packaging/target/apache-hive-3.1.0-SNAPSHOT-bin/apache-hive-3.1.0-SNAPSHOT-bin
export HIVE_SRC_DIR_OVERRIDE=~/hive
export HIVE_METASTORE_THRIFT_DIR_OVERRIDE=~/hive/standalone-metastore/src/main/thrift/

Change-Id: I21892c153c445e3a5d93f2bc8f5e0b799929dd34
Reviewed-on: http://gerrit.cloudera.org:8080/17094
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-12 03:15:44 +00:00
Csaba Ringhofer
20d4df3651 IMPALA-10496: Bump springframework dependency to 4.3.29
Impala depended on springframework 4.3.19 through pac4j
since IMPALA-10496.

Testing:
- used dependency-check-maven plugin to check that the CVEs
  related to springframework disappear

Change-Id: I81a2b00a0dd1b1560fa97a13ccf4cf6bb69b4b51
Reviewed-on: http://gerrit.cloudera.org:8080/17112
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-24 02:12:10 +00:00
wzhou-code
4a65fcfbe5 IMPALA-10516: Bump up the versions of jackson databind and slf4j
A flaw was found in FasterXML Jackson Databind: it did not properly
secure entity expansion.

This patch bumps jackson-databind up to 2.10.5.1 and slf4j up to
1.7.30.

Testing:
 - Did a clean build of Impala on a local machine. Verified that the new
   jar versions jackson-databind-2.10.5.1.jar, slf4j-api-1.7.30.jar,
   and slf4j-log4j12-1.7.30.jar are listed in
   fe/target/build-classpath.txt.
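
The verification step above amounts to checking that specific jar file
names appear on the generated classpath. A minimal Python sketch, assuming
build-classpath.txt holds a single path-separator-joined list of jar paths
(the usual output of `mvn dependency:build-classpath`); the function name
is hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, List


def missing_jars(classpath: str, expected: Iterable[str]) -> List[str]:
    """Return the expected jar file names that are not on the classpath.

    `classpath` is assumed to be os.pathsep-joined jar paths, as written by
    `mvn dependency:build-classpath`.
    """
    present = {Path(entry).name
               for entry in classpath.strip().split(os.pathsep) if entry}
    return [jar for jar in expected if jar not in present]


if __name__ == "__main__":
    cp_file = Path("fe/target/build-classpath.txt")
    if cp_file.exists():
        # An empty list means every expected jar is on the classpath.
        print(missing_jars(cp_file.read_text(), [
            "jackson-databind-2.10.5.1.jar",
            "slf4j-api-1.7.30.jar",
            "slf4j-log4j12-1.7.30.jar",
        ]))
```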

Change-Id: Ie7b84a90fec955dbaebd36b63294229b05eb00d8
Reviewed-on: http://gerrit.cloudera.org:8080/17085
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-19 01:33:59 +00:00
Csaba Ringhofer
08cb4d36d2 IMPALA-10496: SAML implementation in Impala
The bulk of the SAML2-related code is on the Java side because:
- There is already an implementation for Hive under review (HIVE-24543).
- The only SAML library for C++ seems to be OpenSAML, which seemed
  quite hard to use and is a heavy dependency.

Doing authentication in Java required some plumbing, as the hs2-http
port is served by C++ code and HTTP-related processing happens in
THttpServer/THttpTransport, which is not a "real" web server, just
a simple HTTP implementation that processes the headers and passes
the content to the Thrift service.
- HTTP headers (and in one case the body) are inspected, and if the
  request is SAML related, it is wrapped in a TWrappedHttpRequest
  and sent to the Frontend. The Frontend processes it and returns
  a TWrappedHttpResponse with the info to return to the client.
- After the last SAML message (with the bearer token), we generate
  an auth cookie in C++ (which can be validated in C++), so later
  requests in the session don't need to call into Java.
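
The commit does not spell out the cookie format. A common way to make a
cookie that C++ can validate on its own is to sign the user name and issue
time with an HMAC keyed by a server-side secret. The Python sketch below
illustrates that idea only; the `user&issued_at&signature` layout, the
one-day default lifetime and the function names are assumptions, not
Impala's actual cookie scheme.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def make_auth_cookie(secret: bytes, user: str,
                     now: Optional[float] = None) -> str:
    """Build a hypothetical 'user&issued_at&signature' cookie value,
    signed with HMAC-SHA256 so the server can validate it locally."""
    issued_at = int(time.time() if now is None else now)
    payload = f"{user}&{issued_at}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).digest()
    return f"{payload}&{base64.b64encode(sig).decode()}"


def validate_auth_cookie(secret: bytes, cookie: str, max_age_s: int = 86400,
                         now: Optional[float] = None) -> Optional[str]:
    """Return the user name if the signature matches and the cookie is
    younger than max_age_s; otherwise return None."""
    try:
        # rsplit keeps any '&' inside the user name intact.
        user, issued_at, sig_b64 = cookie.rsplit("&", 2)
        issued = int(issued_at)
        sig = base64.b64decode(sig_b64, validate=True)
    except ValueError:  # also covers binascii.Error from bad base64
        return None
    expected = hmac.new(secret, f"{user}&{issued_at}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    current = time.time() if now is None else now
    if current - issued > max_age_s:
        return None
    return user
```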

SAML auth can work alongside LDAP and Kerberos: for each hs2-http
request, the path and the HTTP headers are inspected to decide
whether it is SAML related, and if not, we fall back to the other
auth mechanisms. This "mixed mode" has no tests yet, so I consider it
experimental.
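
The per-request decision described above can be sketched as follows. The
Python dataclasses are hypothetical stand-ins for the Thrift
TWrappedHttpRequest/TWrappedHttpResponse structs (their real fields are not
listed here), and the SAML path prefix and header name are placeholders,
since the commit does not say which path or headers Impala checks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class WrappedHttpRequest:
    """Hypothetical stand-in for the Thrift TWrappedHttpRequest struct."""
    method: str
    path: str
    headers: Dict[str, str]
    content: Optional[bytes] = None


@dataclass
class WrappedHttpResponse:
    """Hypothetical stand-in for the Thrift TWrappedHttpResponse struct."""
    status_code: int
    headers: Dict[str, str] = field(default_factory=dict)
    content: bytes = b""


# Placeholders: the real SAML path and header names are not given here.
SAML_PATH_PREFIX = "/saml2"
SAML_HEADER = "x-saml-client"

Frontend = Callable[[WrappedHttpRequest], WrappedHttpResponse]
Fallback = Callable[[Dict[str, str]], WrappedHttpResponse]


def is_saml_request(path: str, headers: Dict[str, str]) -> bool:
    """Decide from the path and headers whether a request belongs to SAML."""
    header_names = {name.lower() for name in headers}
    return path.startswith(SAML_PATH_PREFIX) or SAML_HEADER in header_names


def authenticate(method: str, path: str, headers: Dict[str, str],
                 body: Optional[bytes], frontend: Frontend,
                 fallback: Fallback) -> WrappedHttpResponse:
    """Hand SAML requests to the (Java) frontend; otherwise fall back to
    the other auth mechanisms, e.g. LDAP or Kerberos."""
    if is_saml_request(path, headers):
        return frontend(WrappedHttpRequest(method, path, headers, body))
    return fallback(headers)
```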

Planned followup work:
- It would be great to import the logic implemented in Hive instead
  of copy-pasting most of it. I plan to do this in a followup commit,
  as this needs changes on the Hive side too.
- Adding more tests will be much easier once we have an hs2-http
  client that supports SAML. See IMPALA-10496 for Impyla support.
- Currently the debug webserver does not support SAML auth.
  Implementing SAML for the webserver is problematic on the statestore,
  which doesn't have a Frontend.

Testing:
- Added EE tests that use Python's urllib2 to send SAML
  requests to Impala. Impala behaves slightly differently
  during tests (saml2_ee_test_mode=true).
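
For reference, a client-side request of the kind these tests send can be
built with Python 3's urllib.request (the successor of urllib2). The sketch
assumes the standard SAML 2.0 HTTP-POST binding, in which the
base64-encoded response is sent as a `SAMLResponse` form field with an
optional `RelayState`; the function name is hypothetical and this is not
the code of the actual EE tests.

```python
import base64
import urllib.parse
import urllib.request


def build_saml_post(url: str, saml_response_xml: str,
                    relay_state: str = "") -> urllib.request.Request:
    """Build (but do not send) an HTTP-POST binding request carrying a
    base64-encoded SAML response as form data."""
    encoded = base64.b64encode(saml_response_xml.encode("utf-8")).decode("ascii")
    fields = {"SAMLResponse": encoded}
    if relay_state:
        fields["RelayState"] = relay_state
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"})
```

Usage would look like
`req = build_saml_post("http://<impalad-host>:<hs2-http-port>/<saml-path>", xml)`
followed by `urllib.request.urlopen(req)`; the URL parts are placeholders.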

Change-Id: Ia0c026cba1b90e7ff6ec5ae49be78b0d1edd8dfa
Reviewed-on: http://gerrit.cloudera.org:8080/16833
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-17 22:52:05 +00:00
Joe McDonnell
4b654e7c97 IMPALA-10058: Use commit hash as version for Kudu java artifacts
This uses a new version of the native toolchain where Kudu
now uses the commit hash as the version for its jars.
This means that IMPALA_KUDU_VERSION is the same as
IMPALA_KUDU_JAVA_VERSION, so this consolidates everything
to use IMPALA_KUDU_VERSION. This also eliminates SNAPSHOT
versions for the Kudu jars.

Kudu changed one error message, so this updates the impacted
tests.

Testing:
 - Ran a core job

Change-Id: I1a6c9676f4521d6709393143d3e82533486164d3
Reviewed-on: http://gerrit.cloudera.org:8080/16686
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-03 20:05:20 +00:00
Joe McDonnell
97792c4bad IMPALA-10198 (part 2): Add support for mvn versions:set
This adds support for setting the version of Java
artifacts through "mvn versions:set". It changes
the modules to inherit the version from the parent
pom.

Previously, we used a mix of 0.1-SNAPSHOT and
1.0-SNAPSHOT. This now uses 4.0.0-SNAPSHOT across the
board. With each release, we can use "mvn versions:set"
to update the versions. The only exception is the
Hive UDF code that we build for testing. This remains
at version 1.0 to avoid test changes.
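
Because the modules now inherit their version from the parent pom, running
`mvn versions:set -DnewVersion=...` from the root should only need to
update the root's `<version>` and each module's `<parent><version>`. The
Python sketch below checks that a module POM follows this pattern. It
assumes POMs use the standard `http://maven.apache.org/POM/4.0.0`
namespace; the function name and the coordinates in the test are
illustrative.

```python
import xml.etree.ElementTree as ET

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def inherits_parent_version(child_pom: str, parent_version: str) -> bool:
    """True if the module POM declares no <version> of its own and its
    <parent><version> equals parent_version."""
    root = ET.fromstring(child_pom)
    # find() with a plain path only looks at direct children of <project>.
    own_version = root.find("m:version", POM_NS)
    parent_ver = root.find("m:parent/m:version", POM_NS)
    return (own_version is None
            and parent_ver is not None
            and (parent_ver.text or "").strip() == parent_version)
```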

Testing:
 - Ran core job
 - Added build-all-flag-combinations.sh case that
   does "mvn versions:set" and runs a build

Change-Id: I661b32e1e445169bac2ffe4f9474f14090031743
Reviewed-on: http://gerrit.cloudera.org:8080/16559
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 19:30:13 +00:00
Joe McDonnell
97856478ec IMPALA-10198 (part 1): Unify Java in a single java/ directory
This changes all existing Java code to be submodules under
a single root pom. The root pom is impala-parent/pom.xml
with minor changes to add submodules.

This avoids most of the weird CMake/maven interactions,
because there is now a single maven invocation for all
the Java code.

This moves all the Java projects other than fe into
a top level java directory. fe is left where it is
to avoid disruption (but still is compiled via the
java directory's root pom). Various pieces of code
that reference the old locations are updated.

Based on research, there are two options for dealing
with the shaded dependencies. The first is to have an
entirely separate Maven project with a separate Maven
invocation. In this case, the consumers of the shaded
jars will see the reduced set of transitive dependencies.
The second is to have the shaded dependencies as modules
with a single Maven invocation. The consumer would see
all of the original transitive dependencies and need to
exclude them all. See MSHADE-206/MNG-5899. This chooses
the second.
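
With the second option, every consumer of a shaded module has to exclude
the original transitive dependencies itself. The Python sketch below
renders such a `<dependency>` entry with one `<exclusion>` per
`groupId:artifactId`. The function name and coordinates are hypothetical;
this only illustrates what the consumer-side exclusions look like and is
not part of the commit.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def dependency_with_exclusions(group_id: str, artifact_id: str, version: str,
                               excluded: Iterable[str]) -> str:
    """Render a <dependency> element that excludes each 'groupId:artifactId'
    in `excluded`, e.g. dependencies already bundled into a shaded jar."""
    dep = ET.Element("dependency")
    for tag, text in (("groupId", group_id), ("artifactId", artifact_id),
                      ("version", version)):
        ET.SubElement(dep, tag).text = text
    exclusions = ET.SubElement(dep, "exclusions")
    for coord in excluded:
        group, artifact = coord.split(":")  # expects exactly 'g:a'
        exclusion = ET.SubElement(exclusions, "exclusion")
        ET.SubElement(exclusion, "groupId").text = group
        ET.SubElement(exclusion, "artifactId").text = artifact
    return ET.tostring(dep, encoding="unicode")
```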

This only moves code around and does not focus on version
numbers or making "mvn versions:set" work.

Testing:
 - Ran a core job
 - Verified existing maven commands from fe/ directory still work
 - Compared the *-classpath.txt files from fe and executor-deps
   and verified they are the same except for paths

Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a
Reviewed-on: http://gerrit.cloudera.org:8080/16500
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2020-10-15 19:30:13 +00:00