This reverts commit 52b87fcefd.
The original commit caused an issue when Impala is deployed together
with Apache Atlas. Coordinator failed to start with error message:
java.lang.NoClassDefFoundError: org/apache/logging/log4j/core/Layout
Solved minor conflict in impala-config.sh due to IMPALA-14478 applied
after IMPALA-14454.
Change-Id: I77127db8d833c675c18c30eb3d6542ca906cd2a9
Reviewed-on: http://gerrit.cloudera.org:8080/23788
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Switches to shaded Hbase so it can include its own versions of
dependencies. Note that hbase-client includes hbase-common,
hbase-protocol.
Excludes older protobuf-java from mysql-connector so we get it from
Hadoop.
Allows orc-format 1.0, which is a dependency in future ORC releases.
Change-Id: I386d03c3123ce1159abc54c505f60e0ae619f5fe
Reviewed-on: http://gerrit.cloudera.org:8080/23553
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Cleans up repetitive patterns in pom.xml.
Centralize plugin configuration in pluginManagement. Replace inline
maven-compiler-plugin configuration with newer maven.compiler.release
and update to latest plugin version.
Centralize common dependencies in dependencyManagement, including
exclusions when appropriate. Remove exclusions that are no longer
relevant.
Compared before and after with dependency:tree; only difference is that
commons-cli now comes from hadoop and jersey-serv{let,er} are
effectively excluded; all versions matched. Also ensured
USE_APACHE_COMPONENTS=true compiles.
Adds com.amazonaws:aws-java-sdk-bundle to exclusion checking to ensure
it's not accidentally included alongside impala-minimal-s3a-aws-sdk.
Removes missed io.netty exclusion from IMPALA-12816.
Updates commons-dbcp2 to 2.12.0 to match Hive.
Change-Id: If96649840e23036b4a73ee23e8d12516497994f0
Reviewed-on: http://gerrit.cloudera.org:8080/23432
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This updates the GBN to include the RANGER-4585 functionality
to support multi-column policy creation. IMPALA-12554 will use
this to create a single multi-column policy for a GRANT statement
rather than many single-column policies.
This fixes a few issues encountered during the upgrade:
1. This includes the fix for IMPALA-13433 to make test_sfs.py
resilient to HMS versions that do not properly create the
database directories.
2. This modified test_metadata_query_statements.py to use
unique directories for the databases to avoid HMS bugs.
3. The version of Avro changed, which changed the version of
Jackson and the package name of the JsonParseException.
This adds code to tolerate both the old and new package
name in the error message.
4. This includes the fix for IMPALA-13391 to exclude log4j-slf4j-impl
from hadoop-cloud-storage.
5. This excludes an unnecessary org.cloudera.logredactor
dependency.
Testing:
- Ran a core job
Change-Id: I32727020a69a66c3af4f4096fe15bc81600e2215
Reviewed-on: http://gerrit.cloudera.org:8080/21921
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Fang-Yu Rao <fangyu.rao@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Uses the imported Hadoop version for the hadoop-aliyun module, which is
a tool in the hadoop project. This allows us to exclude vulnerable
versions of jdom that were previously included via hadoop-aliyun.
Change-Id: I270f3895ec668d9fb907f35b04cad2f149e3d0de
Reviewed-on: http://gerrit.cloudera.org:8080/20532
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'. The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.
This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.
This also updates the versions of Maven plugins related to the build.
Source and target releases are still set to Java 8 compatibility.
Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with
JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
run-all-tests.sh
Testing: ran test suite with Java 11
This reverts the revert commit 1b6011c, restoring these changes minus
code to update IMPALA_JDK_VERSION based on $JAVA -version as that could
break subsequent sourcing of impala-config.sh.
Change-Id: Ie16504ad5738b1f228f97044afd3d9017ccc6c53
Reviewed-on: http://gerrit.cloudera.org:8080/19928
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'. The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.
This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.
This also updates the versions of Maven plugins related to the build.
Source and target releases are still set to Java 8 compatibility.
Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with
JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
run-all-tests.sh
Testing: ran test suite with Java 11
Change-Id: I15d309e2092c12d7fdd2c99b727f3a8eed8bc07a
Reviewed-on: http://gerrit.cloudera.org:8080/19539
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This patch adds support for huawei OBS (Object Storage Service)
FileSystem. The implementation is similar to other remote FileSystems.
New flags for OBS:
- num_obs_io_threads: Number of OBS I/O threads. Defaults to be 16.
Testing:
- Upload hdfs test data to an OBS bucket. Modify all locations in HMS
DB to point to the OBS bucket. Remove some hdfs caching params.
Run CORE tests.
Change-Id: I84a54dbebcc5b71e9bcdd141dae9e95104d98cb1
Reviewed-on: http://gerrit.cloudera.org:8080/19110
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Without HIVE-24498 we get java.lang.NoClassDefFoundError exceptions
when we write Iceberg tables via Hive. This makes it hard to write
interop tests between Hive and Impala which use Iceberg tables.
I also exclude some private Java components to get things built.
Change-Id: I486c2b1b224f72e082e331a57cf25a37ebb9fa54
Reviewed-on: http://gerrit.cloudera.org:8080/19331
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
Updates the test environment to default to the CDP build of Ozone, as
the latest build of CDP Hive depends on pre-release features unavailable
in Ozone 1.2.1. Apache Ozone 1.2 can still be used by setting
USE_APACHE_OZONE=true.
The latest CDP build also includes a version of Ozone based on
ozone#master with a candidate version of 1.3.0. Both Apache and CDP
therefore have builds of Ozone we can test with that use the new
artifact names introduced in Ozone 1.2, so this patch cleans up setup
that was only needed for Ozone versions prior to 1.2.
Change-Id: I1177a1b820fe21adca9f8c1cc51ff73ee001d3f2
Reviewed-on: http://gerrit.cloudera.org:8080/19247
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This patch adds support for OSS (Aliyun Object Storage Service).
Using the hadoop-aliyun, the implementation is similar to other
remote FileSystems.
Tests:
- Prepare:
Initialize OSS-related environment variables:
OSS_ACCESS_KEY_ID, OSS_SECRET_ACCESS_KEY, OSS_ACCESS_ENDPOINT.
Compile and create hdfs test data on a ECS instance. Upload test data
to an OSS bucket.
- Modify all locations in HMS DB to point to the OSS bucket.
Remove some hdfs caching params. Run CORE tests.
Change-Id: I267e6531da58e3ac97029fea4c5e075724587910
Reviewed-on: http://gerrit.cloudera.org:8080/19165
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Use dependencyManagement to simplify Java dependencies by directly
controlling versions of transitive dependencies instead of using
exclusions and direct inclusion.
Dependency management specifies versions authoritatively, so redundant
version declarations are also removed.
Change-Id: I424a175135855dcbd38ae432ea111cca5f562633
Reviewed-on: http://gerrit.cloudera.org:8080/19146
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Switches from log4j 1.x to reload4j, a maintained fork. Updates slf4j to
the latest version so we can include all CVE fixes.
slf4j 2.0.x requires Java 8 and adds a backward-compatible fluent
logging api. Neither seems like a problem for Impala.
Bans all use of log4j 1.x so we only use reload4j.
Change-Id: I5238b9c8247af3e0f4cb05c0b76a75bfee37f5c8
Reviewed-on: http://gerrit.cloudera.org:8080/19102
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Hadoop provides hadoop-cloud-storage, which includes most of
the dependencies that Impala currently uses like hadoop-aws,
hadoop-azure, Knox's gateway-cloud-bindings, etc. Hadoop has
put in a lot of work to make sure that this package includes
the right version of dependencies (including shading some
dependencies for GCS). It seems like this is a more reliable
way to consume these dependencies.
This switches the Java build to use hadoop-cloud-storage
and removes the dependencies that it replaces. This eliminates
the need to control the version of oauth and GCS, as those
are determined by hadoop-cloud-storage.
Change-Id: I3a1631289f990513823c2b17eb9241cc1b5a7ffd
Reviewed-on: http://gerrit.cloudera.org:8080/18817
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds Ozone as an alternative to hdfs in the minicluster. Select by
setting `export TARGET_FILESYSTEM=ozone`. With that flag,
run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot
because Ozone does not support HBase (HDDS-3589); snapshot loading
doesn't work yet primarily due to HDDS-5502.
Uses the o3fs interface because Ozone puts specific restrictions on
bucket names (no underscores, for instance), and it was a lot easier to
use an interface where everything is written to a single bucket than to
update all Impala's use of HDFS-style paths to make `test-warehouse` a
bucket inside a volume.
Specifies reduced Ozone client retries during shutdown where Ozone may
not be available.
Passes tests with FE_TEST=false BE_TEST=false.
Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be
Reviewed-on: http://gerrit.cloudera.org:8080/18738
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Upgrades google-oauth-client and google-oauth-client-java6 to 1.33.3 to
address CVE-2021-22573. These are included as dependencies of
com.google.cloud.bigdataoss/gcs-connector, which does not yet have a
release that includes versions 1.33.3 or later.
Change-Id: I8d95913f26e6073373374e169ee045881f40f065
Reviewed-on: http://gerrit.cloudera.org:8080/18683
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As 4.1.0 has been released this commit updates the master to 4.2.0.
This step needs to happen on each release, related changes are:
IMPALA-10198, IMPALA-10057
Testing:
- Ran a build
Change-Id: Idab47eedb27ca4be42300dfc2eeb81eefe407b70
Reviewed-on: http://gerrit.cloudera.org:8080/18595
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This changes the Maven pom.xml files to use verison
4.1.0-SNAPSHOT rather than 4.0.0-SNAPSHOT. In the
past, these versions were a fixed value, but that
changed with IMPALA-10198. This is a new step that
needs to happen on each release.
Testing:
- Ran a build
Change-Id: I10a589b4fbc15048199943a0e06d079f57840239
Reviewed-on: http://gerrit.cloudera.org:8080/18390
Reviewed-by: Tamas Mate <tmater@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for COS(Cloud Object Storage). Using the
hadoop-cos, the implementation is similar to other remote FileSystems.
New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to be 16.
Follow-up:
- Support for caching COS file handles will be addressed in
IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
COS (IMPALA-10773).
Tests:
- Upload hdfs test data to a COS bucket. Modify all locations in HMS
DB to point to the COS bucket. Remove some hdfs caching params.
Run CORE tests.
Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Reviewed-on: http://gerrit.cloudera.org:8080/17503
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for GCS(Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.
New flags for GCS:
- num_gcs_io_threads: Number of GCS I/O threads. Defaults to be 16.
Follow-up:
- Support for spilling to GCS will be addressed in IMPALA-10561.
- Support for caching GCS file handles will be addressed in
IMPALA-10568.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
GCS (IMPALA-10562).
- Some tests are skipped due to issues introduced by /etc/hosts setting
on GCE instances (IMPALA-10563).
Tests:
- Compile and create hdfs test data on a GCE instance. Upload test data
to a GCS bucket. Modify all locations in HMS DB to point to the GCS
bucket. Remove some hdfs caching params. Run CORE tests.
- Compile and load snapshot data to a GCS bucket. Run CORE tests.
Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for setting the version of Java
artifacts through "mvn versions:set". It changes
the modules to inherit the version from the parent
pom.
Previously, we used a mix of 0.1-SNAPSHOT and
1.0-SNAPSHOT. This now uses 4.0.0-SNAPSHOT across the
board. With each release, we can use "mvn versions:set"
to update the versions. The only exception is the
Hive UDF code that we build for testing. This remains
at version 1.0 to avoid test changes.
Testing:
- Ran core job
- Added build-all-flag-combinations.sh case that
does "mvn versions:set" and runs a build
Change-Id: I661b32e1e445169bac2ffe4f9474f14090031743
Reviewed-on: http://gerrit.cloudera.org:8080/16559
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This changes all existing Java code to be submodules under
a single root pom. The root pom is impala-parent/pom.xml
with minor changes to add submodules.
This avoids most of the weird CMake/maven interactions,
because there is now a single maven invocation for all
the Java code.
This moves all the Java projects other than fe into
a top level java directory. fe is left where it is
to avoid disruption (but still is compiled via the
java directory's root pom). Various pieces of code
that reference the old locations are updated.
Based on research, there are two options for dealing
with the shaded dependencies. The first is to have an
entirely separate Maven project with a separate Maven
invocation. In this case, the consumers of the shaded
jars will see the reduced set of transitive dependencies.
The second is to have the shaded dependencies as modules
with a single Maven invocation. The consumer would see
all of the original transitive dependencies and need to
exclude them all. See MSHADE-206/MNG-5899. This chooses
the second.
This only moves code around and does not focus on version
numbers or making "mvn versions:set" work.
Testing:
- Ran a core job
- Verified existing maven commands from fe/ directory still work
- Compared the *-classpath.txt files from fe and executor-deps
and verified they are the same except for paths
Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a
Reviewed-on: http://gerrit.cloudera.org:8080/16500
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>