Impala Frontend has plenty of dependency, along with multitudes of
dependency exclusion/inclusion rules in it. This patch adds maven
dependency tree log to logs/mvn/mvn.log when invoking "make java"
command.
Testing:
Manually run "make java" from $IMPALA_HOME and verify that the
dependency trees are logged to logs/mvn/mvn.log.
Change-Id: I8cbe20faeab24bae708733d54996bd6c1dd97757
Reviewed-on: http://gerrit.cloudera.org:8080/23551
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Maven 3.9.x offers a new dependency resolver, HttpClient, which allows
downloading project dependencies in parallel.
This patch bumps the Maven version installed by bootstrap_system.sh to
v3.9.2, and adds the flags enabling the new resolver to download
dependencies (including POM files) in parallel. Parallelism is set to
10 threads.
The flags are added to a project-specific Maven setting file in the
newly created java/.mvn directory. The settings file is added to the
RAT exclusion list in bin/rat_exclude_files.txt.
The --show-version flag is added for debugging purposes.
The same flags are added to the JAMM subproject as well.
The new resolver in Maven 3.9 has also changed the warning message
emitted for missing component checksums, so the new warning string
is added to the filter in bin/mvn-quiet.sh
Unfortunately Maven 3.9 has also changed the way it responds to missing
checksum files: the resolver now emits a stack trace when checksums
cannot be determined, and missing checksums are not explicitly ignored.
Detailed documentation for the new Maven resolver in Maven 3.9.0+ is
located at:
https://maven.apache.org/guides/mini/guide-resolver-transport.html
resolver configuration reference:
https://maven.apache.org/resolver/configuration.html
Tests:
- verified in a core-mode test run with Maven 3.9.2 installed
- verified in a local build using an earlier version of Maven
to verify that the new default setting does not cause regressions
with the old dependency resolver.
Change-Id: I75d05215effc724f5bd471646fb352f37443e185
Reviewed-on: http://gerrit.cloudera.org:8080/20142
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
This adds the --batch-mode flag to the maven invocation
the builds jamm. That disables some of the download progress
output, reducing the total size of the output.
Testing:
- Ran a build locally
Change-Id: I1634240b191168b13cf3be7c9266e21a746844b1
Reviewed-on: http://gerrit.cloudera.org:8080/20196
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Enables building for Java 17 - and particularly using Java 17 in
containers - but won't run a minicluster fully with Java 17 as some
projects (Hadoop) don't yet support it.
Starting with Java 15, ehcache.sizeof encounters
UnsupportedOperationException: can't get field offset on a hidden class
in class members pointing to capturing lambda functions. Java 17 also
introduces new modules that need to be added to add-opens. Both of these
pose problems for continued use of ehcache.
Adds https://github.com/jbellis/jamm as a new cache weigher for Java
15+. We build from HEAD as an external project until Java 17 support is
released (https://github.com/jbellis/jamm/issues/44). Adds the
'java_weigher' option to select 'sizeof' or 'jamm'; defaults to 'auto',
which uses jamm for Java 15+ and sizeof for everything else. Also adds
metrics for viewing cache weight results.
Adds JAVA_HOME/lib/server to LD_LIBRARY_PATH in run-jvm-binary to
simplify switching between JDK versions for testing. You can now
- export IMPALA_JDK_VERSION=11
- source bin/impala-config.sh
- start-impala-cluster.py
and have Impala running a different JDK (11) version.
Retains add-opens calls that are still necessary due to dependencies'
use of lambdas for jamm, and all others for ehcache. Add-opens are still
required as a fallback, as noted in
https://github.com/jbellis/jamm#object-graph-crawling. We catch the
exceptions jamm and ehcache throw - CannotAccessFieldException,
UnsupportedOperationException - to avoid crashing Impala, and add it to
the list of banned log messages (as we should add-opens when we find
them).
Testing:
- container test run with Java 11 and 17 (excludes custom cluster)
- manual custom_cluster/test_local_catalog.py +
test_banned_log_messages.py run with Java 11 and 17 (Java 8 build)
- full Java 11 build (passed except IMPALA-12184)
- add test catalog cache entry size metrics fit reasonable bounds
- add unit test for utility to find jamm jar file in classpath
Change-Id: Ic378896f572e030a3a019646a96a32a07866a737
Reviewed-on: http://gerrit.cloudera.org:8080/19863
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds geospatial functions from Hive's ESRI library
as builtin UDFs. Plain Hive UDFs are imported without changes,
but the generic and varargs functions are handled differently;
generic functions are added with all of the combinations of
their parameters (cartesian product of the parameters), and
varargs functions are unfolded as an nth parameter simple
function. The varargs function wrappers are generated at build
time and they can be configured in
gen_geospatial_udf_wrappers.py. These additional steps are
required because of the limitations in Impala's UDF Executor
(lack of varargs support and only partial generics support)
which could be further improved; in this case, the additional
wrapping/mapping steps could be removed.
Changes regarding function handling/creating are sourced from
https://gerrit.cloudera.org/c/19177
A new backend flag was added to turn this feature on/off
as "geospatial_library". The default value is "NONE" which
means no geospatial function gets registered
as builtin, "HIVE_ESRI" value enables this implementation.
The ESRI geospatial implementation for Hive currently only
available in Hive 4, but CDP Hive backported it to Hive 3,
therefore for Apache Hive this feature is disabled
regardless of the "geospatial_library" flag.
Known limitations:
- ST_MultiLineString, ST_MultiPolygon only works
with the WKT overload
- ST_Polygon supports a maximum of 6 pairs of coordinates
- ST_MultiPoint, ST_LineString supports a maximum of 7
pairs of coordinates
- ST_ConvexHull, ST_Union supports a maximum of 6 geoms
These limits can be increased in gen_geospatial_udf_wrappers.py
Tests:
- test_geospatial_udfs.py added based on
https://github.com/Esri/spatial-framework-for-hadoop
Co-Authored-by: Csaba Ringhofer <csringhofer@cloudera.com>
Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Reviewed-on: http://gerrit.cloudera.org:8080/19425
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Some tests saw log spew that causes the INFO log files to
be filled with output like this:
E0903 02:25:39.453887 12060 TransactionKeepalive.java:137] Unexpected exception thrown
Java exception follows:
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: org/apache/impala/common/TransactionKeepalive$HeartbeatContext
at org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114)
at java.lang.Thread.run(Thread.java:748)
...
It turns out that the catalogd/impalad use a CLASSPATH in
tests that refers to fe/target/classes. The maven command
that runs frontend tests recompiles these classes and
causes the files in fe/target/classes to be deleted and
recreated. There are race conditions where this causes
the symptoms above.
This changes the CLASSPATH to use the frontend jars, which
are not impacted by the machinations on fe/target/classes.
To find the appropriate jar, set-classpath.sh needs to
know the Impala version. This adds IMPALA_VERSION in
bin/impala-config.sh to provide an easy to use
environment variable.
To make the versioning more uniform, this modifies
bin/save-version.sh to use this environment variable.
It also adds a check to make sure that the Java pom.xml
files use the same version as the environment variable.
It fails the build if the Java pom.xml files do not
match.
Testing:
- Ran core jobs
- Checked the log file sizes on jobs
- Changed a Java pom.xml's version and verified that
bin/validate-java-pom-versions.sh fails
Change-Id: Id35544e446c5bf283c322d3fe2e7ad475cfa12eb
Reviewed-on: http://gerrit.cloudera.org:8080/18415
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
This changes all existing Java code to be submodules under
a single root pom. The root pom is impala-parent/pom.xml
with minor changes to add submodules.
This avoids most of the weird CMake/maven interactions,
because there is now a single maven invocation for all
the Java code.
This moves all the Java projects other than fe into
a top level java directory. fe is left where it is
to avoid disruption (but still is compiled via the
java directory's root pom). Various pieces of code
that reference the old locations are updated.
Based on research, there are two options for dealing
with the shaded dependencies. The first is to have an
entirely separate Maven project with a separate Maven
invocation. In this case, the consumers of the shaded
jars will see the reduced set of transitive dependencies.
The second is to have the shaded dependencies as modules
with a single Maven invocation. The consumer would see
all of the original transitive dependencies and need to
exclude them all. See MSHADE-206/MNG-5899. This chooses
the second.
This only moves code around and does not focus on version
numbers or making "mvn versions:set" work.
Testing:
- Ran a core job
- Verified existing maven commands from fe/ directory still work
- Compared the *-classpath.txt files from fe and executor-deps
and verified they are the same except for paths
Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a
Reviewed-on: http://gerrit.cloudera.org:8080/16500
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>