Upgrades guava to 31.1-jre and jackson-databind to 2.13.4.2 to address
CVEs. Adds environment variables for commonly-updated components so they
can be customized via the branch-specific impala-config-branch.sh in a
way that allows both to be updated regularly without merge conflicts.
Also updates httpcomponents.httpcore to 4.4.14 to be consistent with
other httpcomponents libraries included transitively.
Change-Id: I1c2c4481ca3f498abf302aa05361d950b1ed1216
Reviewed-on: http://gerrit.cloudera.org:8080/19147
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Use dependencyManagement to simplify Java dependencies by directly
controlling versions of transitive dependencies instead of using
exclusions and direct inclusion.
Dependency management specifies versions authoritatively, so redundant
version declarations are also removed.
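As a toy model only (not Maven's actual resolver), the effect of dependencyManagement on transitive versions can be sketched in Python. The coordinates and versions in the example are placeholders. The model assumes Maven's usual "nearest wins" mediation for unmanaged artifacts.

```python
from typing import Dict, List, Tuple

# (groupId:artifactId, requested version, depth in the dependency tree)
Request = Tuple[str, str, int]


def resolve(requests: List[Request], managed: Dict[str, str]) -> Dict[str, str]:
    """Toy model of Maven version selection.

    Unmanaged artifacts use "nearest wins": the request closest to the root
    is chosen, and ties go to the first one declared. A version listed in
    `managed` (dependencyManagement) overrides that choice, but only for
    artifacts that actually appear in the tree; it never adds new ones.
    """
    nearest: Dict[str, Tuple[int, str]] = {}
    for artifact, version, depth in requests:
        if artifact not in nearest or depth < nearest[artifact][0]:
            nearest[artifact] = (depth, version)
    resolved = {artifact: version for artifact, (_, version) in nearest.items()}
    for artifact, version in managed.items():
        if artifact in resolved:
            resolved[artifact] = version
    return resolved


requests = [
    ("com.google.guava:guava", "19.0", 2),
    ("com.google.guava:guava", "27.0-jre", 3),
    ("org.slf4j:slf4j-api", "1.7.30", 1),
]
# Without management, the nearest guava request (19.0) wins.
print(resolve(requests, {}))
# With management, guava is pinned; log4j is not added because nothing uses it.
print(resolve(requests, {"com.google.guava:guava": "31.1-jre", "log4j:log4j": "1.2.17"}))
```

Pinning a version in one place is what makes the old pattern of excluding a transitive artifact and re-adding it directly unnecessary.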
Change-Id: I424a175135855dcbd38ae432ea111cca5f562633
Reviewed-on: http://gerrit.cloudera.org:8080/19146
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Switches from log4j 1.x to reload4j, a maintained fork. Updates slf4j to
the latest version so we can include all CVE fixes.
slf4j 2.0.x requires Java 8 and adds a backward-compatible fluent
logging api. Neither seems like a problem for Impala.
Bans all use of log4j 1.x so we only use reload4j.
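The commit does not show how the ban is enforced. Maven's enforcer plugin offers a bannedDependencies rule for this kind of check. As a hedged illustration of the idea only, a Python sketch that flags log4j 1.x coordinates in a resolved dependency list could look like this. The ban pattern and the sample coordinates are assumptions.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence

# Assumed ban list: log4j 1.x is published as groupId "log4j", artifactId "log4j".
BANNED_PATTERNS = ("log4j:log4j:*",)


def find_banned(coordinates: Iterable[str],
                patterns: Sequence[str] = BANNED_PATTERNS) -> List[str]:
    """Return each groupId:artifactId:version coordinate matching a ban pattern."""
    return [c for c in coordinates
            if any(fnmatchcase(c, pattern) for pattern in patterns)]


resolved = [
    "ch.qos.reload4j:reload4j:1.2.22",
    "org.slf4j:slf4j-reload4j:2.0.3",
    "log4j:log4j:1.2.17",
]
banned = find_banned(resolved)
if banned:
    print("banned dependencies found:", banned)
```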
Change-Id: I5238b9c8247af3e0f4cb05c0b76a75bfee37f5c8
Reviewed-on: http://gerrit.cloudera.org:8080/19102
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
This patch upgrades the Spring framework to 5.3.20 to
address multiple CVEs:
- CVE-2022-22971
- CVE-2022-22968
- CVE-2022-22970
Testing:
- Ran core job
- Ran custom cluster tests in exhaustive mode
Change-Id: I33f4f1d22fc27227e31d744658a17c16b61b9677
Reviewed-on: http://gerrit.cloudera.org:8080/19091
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit handles the case for a specific kind of corrupt function
within the Hive Metastore in the following situation:
A valid Hive SQL function gets created in HMS. This UDF is written in
Java and must derive from the "UDF" class. After creating this function
in Impala, we then replace the underlying jar file with a class that
does NOT derive from the "UDF" class.
In this scenario, catalogd should reject the function and still start
up gracefully. Before this commit, catalogd failed to start because
the Hive function FunctionUtils.getUDFClassType() depends on UDAF and
was throwing a LinkageError, so the UDAF class needs to be included in
the shaded jar.
Change-Id: I54e7a1df6d018ba6cf5ecf32dc9946edf86e2112
Reviewed-on: http://gerrit.cloudera.org:8080/18927
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
This patch adds support for BINARY columns for all table formats with
the exception of Kudu.
In Hive the main difference between STRING and BINARY is that STRING is
assumed to be UTF8 encoded, while BINARY can be any byte array.
Some other differences in Hive:
- BINARY can only be cast to/from STRING.
- Only a small subset of built-in STRING functions support BINARY.
- In several file formats (e.g. text) BINARY is base64 encoded.
- No NDV is calculated during COMPUTE STATISTICS.
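To illustrate why text formats encode BINARY: raw bytes may contain field or line delimiters, or invalid UTF-8, so they cannot be written into a delimited text file verbatim. A minimal Python sketch of the round trip follows. It uses the standard base64 alphabet with padding, which is assumed here to match what Hive writes (not verified against Hive's code).

```python
import base64


def encode_binary_field(value: bytes) -> str:
    """Encode a BINARY value so it can sit safely in a delimited text file."""
    return base64.b64encode(value).decode("ascii")


def decode_binary_field(field: str) -> bytes:
    """Decode a base64 text field back into the original bytes."""
    return base64.b64decode(field, validate=True)


raw = b"\x00\xff\nhi"  # contains a NUL, a non-UTF-8 byte and a newline
field = encode_binary_field(raw)
print(field)  # AP8KaGk=
print(decode_binary_field(field) == raw)  # True
```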
As Impala doesn't treat STRINGs as UTF8, BINARY and STRING become nearly
identical, especially from the backend's perspective. For this reason,
BINARY is implemented a bit differently compared to other types:
while the frontend treats STRING and BINARY as two separate types, most
of the backend uses PrimitiveType::TYPE_STRING for BINARY too, e.g.
in SlotDesc. Only the following parts of backend need to differentiate
between STRING and BINARY:
- table scanners
- table writers
- HS2/Beeswax service
These parts have access to column metadata, which allows adding special
handling for BINARY.
Only a very few builtins are allowed for BINARY at the moment:
- length
- min/max/count
- coalesce and similar "selector" functions
Other STRING functions can only be used by casting to STRING first.
Adding support for more of these functions is straightforward: the
BINARY type simply has to be "connected" to the existing STRING
function's signature. Functions whose result depends on utf8_mode
need to ensure that BINARY always behaves as if utf8_mode=0 (for
example, length() is mapped to bytes() for BINARY, because length()
counts UTF-8 characters when utf8_mode=1).
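The utf8_mode rule for length() can be sketched as follows. This is a Python model of the semantics described above, not Impala's backend code. The function name and the string-valued column type are illustrative, and STRING input is assumed to be valid UTF-8.

```python
def length(value: bytes, column_type: str, utf8_mode: bool) -> int:
    """Model of length() for STRING and BINARY arguments.

    BINARY always counts bytes (as if utf8_mode=0, i.e. like bytes()).
    STRING counts UTF-8 characters only when utf8_mode is enabled.
    """
    if column_type == "BINARY" or not utf8_mode:
        return len(value)
    return len(value.decode("utf-8"))  # assumes valid UTF-8 input


word = "héllo".encode("utf-8")  # 5 characters, 6 bytes
print(length(word, "STRING", utf8_mode=True))   # 5
print(length(word, "STRING", utf8_mode=False))  # 6
print(length(word, "BINARY", utf8_mode=True))   # 6
```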
All kinds of UDFs (native, Hive legacy, Hive generic) support BINARY,
though in case of legacy Hive UDFs it is only supported if the argument
and return types are set explicitly to ensure backward compatibility.
See IMPALA-11340 for details.
The original plan was to behave as close to Hive as possible, but I
realized that Hive has more relaxed casting rules than Impala, which
led to STRING<->BINARY casts being necessary in more cases in Impala.
This was needed to disallow passing a BINARY to functions that expect
a STRING argument. An example of the difference is that in
INSERT ... VALUES () string literals need to be explicitly cast to
BINARY, while this is not needed in Hive.
Testing:
- Added functional.binary_tbl for all file formats (except Kudu)
to test scanning.
- Removed functional.unsupported_types and related tests, as now
Impala supports all (non-complex) types that Hive does.
- Added FE/EE tests mainly based on the ones added to the DATE type
Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582
Reviewed-on: http://gerrit.cloudera.org:8080/16066
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Hadoop provides hadoop-cloud-storage, which includes most of
the dependencies that Impala currently uses like hadoop-aws,
hadoop-azure, Knox's gateway-cloud-bindings, etc. Hadoop has
put in a lot of work to make sure that this package includes
the right version of dependencies (including shading some
dependencies for GCS). It seems like this is a more reliable
way to consume these dependencies.
This switches the Java build to use hadoop-cloud-storage
and removes the dependencies that it replaces. This eliminates
the need to control the version of oauth and GCS, as those
are determined by hadoop-cloud-storage.
Change-Id: I3a1631289f990513823c2b17eb9241cc1b5a7ffd
Reviewed-on: http://gerrit.cloudera.org:8080/18817
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds Ozone as an alternative to HDFS in the minicluster. Select it by
setting `export TARGET_FILESYSTEM=ozone`. With that flag,
run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot
because Ozone does not support HBase (HDDS-3589); snapshot loading
doesn't work yet primarily due to HDDS-5502.
Uses the o3fs interface because Ozone puts specific restrictions on
bucket names (no underscores, for instance), and it was a lot easier to
use an interface where everything is written to a single bucket than to
update all Impala's use of HDFS-style paths to make `test-warehouse` a
bucket inside a volume.
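A rough sketch of why o3fs sidesteps the naming problem: with o3fs, the bucket and volume are part of the URI authority (o3fs://bucket.volume/...). HDFS-style paths such as /test-warehouse/functional_parquet.db therefore become keys inside one bucket and never have to be valid bucket names. The bucket-name check below is a simplified S3-style approximation of Ozone's rules, and the volume and bucket names are hypothetical.

```python
import re

# Simplified S3-style bucket rule: 3-63 chars of lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit. Underscores
# are rejected. (An approximation, not Ozone's exact validator.)
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    return _BUCKET_NAME.fullmatch(name) is not None


def o3fs_uri(bucket: str, volume: str, path: str) -> str:
    """Build an o3fs URI that places an HDFS-style path inside one bucket."""
    if not is_valid_bucket_name(bucket):
        raise ValueError(f"invalid Ozone bucket name: {bucket!r}")
    return f"o3fs://{bucket}.{volume}/{path.lstrip('/')}"


# Hypothetical bucket "impala" in volume "vol1".
print(o3fs_uri("impala", "vol1", "/test-warehouse/functional_parquet.db"))
print(is_valid_bucket_name("functional_parquet"))  # False: underscore
```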
Specifies reduced Ozone client retries during shutdown where Ozone may
not be available.
Passes tests with FE_TEST=false BE_TEST=false.
Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be
Reviewed-on: http://gerrit.cloudera.org:8080/18738
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Loading new classes from the same jar in the constructor of UDFs
did not work in the catalog because the URLClassLoader was closed
too early. Extended the lifecycle of the class loader a bit to
let the catalog finish all initialisation.
Note that instantiating legacy Hive UDFs doesn't seem necessary in
the catalog, as all relevant information can be obtained from the
class. Generic UDFs do need to be instantiated to be able to call
initialize().
Testing:
- added new classes to load in test UDFs and loaded these
in constructor / initialize()
- ran the Hive UDF ee tests
Change-Id: If16e38b8fc3b2577a5d32104ea9e6948b9562e24
Reviewed-on: http://gerrit.cloudera.org:8080/18611
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Upgrades google-oauth-client and google-oauth-client-java6 to 1.33.3 to
address CVE-2021-22573. These are included as dependencies of
com.google.cloud.bigdataoss/gcs-connector, which does not yet have a
release that includes versions 1.33.3 or later.
Change-Id: I8d95913f26e6073373374e169ee045881f40f065
Reviewed-on: http://gerrit.cloudera.org:8080/18683
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala used to have one thrift compiler version to compile C++, Java,
and Python code.
Thrift serialization/deserialization is mostly compatible across
minor versions, so it is possible to use different thrift compiler
versions for different target languages. This is beneficial because
it allows Impala to upgrade separate components independently.
This patch implements the infrastructure change required to do so. It
replaces most of the 'THRIFT_*' environment and CMake variables with
'THRIFT_CPP_*', 'THRIFT_JAVA_*', and 'THRIFT_PY_*' to compile C++,
Java, and Python code respectively. All three still refer to the same
thrift version (thrift-0.11.0-p5).
Testing:
- Build Impala and pass core tests.
Change-Id: I56479dc69b79024d1a4d09211bbe88a61fa0c6a4
Reviewed-on: http://gerrit.cloudera.org:8080/18636
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As 4.1.0 has been released, this commit updates the master to 4.2.0.
This step needs to happen on each release. Related changes are:
IMPALA-10198, IMPALA-10057
Testing:
- Ran a build
Change-Id: Idab47eedb27ca4be42300dfc2eeb81eefe407b70
Reviewed-on: http://gerrit.cloudera.org:8080/18595
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Hive has 2 types of UDFs. This commit contains limited
support for the second generation UDFs called GenericUDFs.
The main limitations are as follows:
Decimal types are not supported. The Impala framework determines
the precision and scale of the decimal return type. However,
Hive GenericUDFs can choose their own return type based on the
parameters. Until this can be resolved, it is
safer to forbid decimals from being used. Note that this
limitation currently exists in the first generation of Hive Java
UDFs.
Complex types are not supported.
Functions are not extracted from the jar file. The first generation
of Hive UDFs allowed this because the method prototypes are
explicitly defined and can be determined at function creation time. For
GenericUDFs, the return types are determined based on the parameters
passed in when running a query.
For the same reason as above, GenericUDFs cannot be made permanent.
They will need to be recreated every time the server is restarted.
This is a severe limitation and will be resolved in the near future.
Change-Id: Ie6fd09120db413fade94410c83ebe8ff104013cd
Reviewed-on: http://gerrit.cloudera.org:8080/18295
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Updates gitignore for files generated during bootstrap_development.
Fixes deleting tracked files in be/src/thirdparty. Includes ignore rules
for past versions of shell dependencies and updates ignores for current
versions.
Change-Id: I03deba5e7fb151ef8e34039becdcc3fb47684084
Reviewed-on: http://gerrit.cloudera.org:8080/18499
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Some tests saw log spew that caused the INFO log files to
be filled with output like this:
E0903 02:25:39.453887 12060 TransactionKeepalive.java:137] Unexpected exception thrown
Java exception follows:
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: org/apache/impala/common/TransactionKeepalive$HeartbeatContext
at org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114)
at java.lang.Thread.run(Thread.java:748)
...
It turns out that the catalogd/impalad use a CLASSPATH in
tests that refers to fe/target/classes. The maven command
that runs frontend tests recompiles these classes and
causes the files in fe/target/classes to be deleted and
recreated. There are race conditions where this causes
the symptoms above.
This changes the CLASSPATH to use the frontend jars, which
are not impacted by the machinations on fe/target/classes.
To find the appropriate jar, set-classpath.sh needs to
know the Impala version. This adds IMPALA_VERSION in
bin/impala-config.sh to provide an easy-to-use
environment variable.
To make the versioning more uniform, this modifies
bin/save-version.sh to use this environment variable.
It also adds a check to make sure that the Java pom.xml
files use the same version as the environment variable.
It fails the build if the Java pom.xml files do not
match.
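The actual check lives in bin/validate-java-pom-versions.sh, whose contents are not shown here. As a hedged Python sketch of the same idea: read each pom's own version (or the version inherited from <parent>, since modules may omit it) and report the ones that differ from IMPALA_VERSION. The sample poms and the "stale/pom.xml" name are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def pom_version(pom_xml: str) -> Optional[str]:
    """Return the project's version, falling back to the <parent> version."""
    root = ET.fromstring(pom_xml)
    version = root.find("m:version", POM_NS)
    if version is None:
        version = root.find("m:parent/m:version", POM_NS)
    if version is None or version.text is None:
        return None
    return version.text.strip()


def mismatched_poms(poms: Dict[str, str], impala_version: str) -> List[str]:
    """Names of poms whose version differs from IMPALA_VERSION."""
    return [name for name, xml in poms.items()
            if pom_version(xml) != impala_version]


NS = 'xmlns="http://maven.apache.org/POM/4.0.0"'
poms = {
    "impala-parent/pom.xml": f"<project {NS}><version>4.2.0-SNAPSHOT</version></project>",
    "fe/pom.xml": f"<project {NS}><parent><version>4.2.0-SNAPSHOT</version></parent></project>",
    "stale/pom.xml": f"<project {NS}><version>4.1.0-SNAPSHOT</version></project>",
}
bad = mismatched_poms(poms, "4.2.0-SNAPSHOT")
if bad:
    print("pom versions do not match IMPALA_VERSION:", bad)
```

A real check would read IMPALA_VERSION from the environment and exit non-zero on a mismatch so that the build fails.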
Testing:
- Ran core jobs
- Checked the log file sizes on jobs
- Changed a Java pom.xml's version and verified that
bin/validate-java-pom-versions.sh fails
Change-Id: Id35544e446c5bf283c322d3fe2e7ad475cfa12eb
Reviewed-on: http://gerrit.cloudera.org:8080/18415
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
This upgrades the Spring framework to 5.3.18 to
address multiple CVEs:
- CVE-2022-22965
- CVE-2022-22950
- CVE-2021-22060
Testing:
- Ran core job
- Ran custom cluster tests in exhaustive mode
Change-Id: Ie1b299c5b24e70c9db6eb0ce37fee9e32908423e
Reviewed-on: http://gerrit.cloudera.org:8080/18405
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
This changes the Maven pom.xml files to use version
4.1.0-SNAPSHOT rather than 4.0.0-SNAPSHOT. In the
past, these versions were a fixed value, but that
changed with IMPALA-10198. This is a new step that
needs to happen on each release.
Testing:
- Ran a build
Change-Id: I10a589b4fbc15048199943a0e06d079f57840239
Reviewed-on: http://gerrit.cloudera.org:8080/18390
Reviewed-by: Tamas Mate <tmater@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This upgrades pac4j and several of its dependencies
(including xmlsec) to address CVEs in those components.
Specifically:
- pac4j 4.5.5 addresses CVE-2021-44878
- xmlsec 2.2.3 addresses CVE-2021-40690
- bcprov 1.68 addresses CVE-2020-15522
This also upgrades springframework to 5.2.9.RELEASE to
match the version for pac4j 4.5.5.
Testing:
- Ran core job
Change-Id: I8421d867dd0fce8eeaa6bc13a511ca3e8dd05723
Reviewed-on: http://gerrit.cloudera.org:8080/18348
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. These shim classes
implement methods which differ between cdp-hive-3 and apache-hive-3
and are used by the frontend code. At build time, based on the
environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is
added as a source using the fe/pom.xml build plugin.
Some code that directly uses Hive 4 APIs needs to be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile is used to exclude this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.
Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3
2. Ran full-suite of tests against HMS-3
3. Running full-tests against ASF-HMS-3 will need more work
supporting Tez in the mini-cluster (for dataloading) and HMS
transaction support. This will be an ongoing effort, and test failures
on ASF-Hive-3 will be fixed in additional sub-tasks.
Notes:
1. Patch uses a custom build of Apache Hive to be deployed in
mini-cluster. This build has the fixes for HIVE-21569, HIVE-20038.
This hack will be added to the build script in additional sub-tasks.
Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Reviewed-on: http://gerrit.cloudera.org:8080/17774
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Implements an abstraction layer to show files in a single directory.
This is the Impala side; the filesystem drivers are in HIVE-25569.
Suppose that the filesystem has a directory in which there are multiple
files:
hdfs://somedir/f1.txt
hdfs://somedir/f2.txt
For HMS-backed tables, the contents of a directory are considered to
be the table.
This patch enables a new file system wrapper 'sfs+' (sfs = single file
system) which provides a view of a single file in a directory. The '+'
indicates that this wrapper can be added on top of multiple underlying
file systems/object storage such as HDFS, S3 etc. The directory which
contains the file could be specified:
sfs+hdfs://somedir/f1.txt/#SINGLEFILE#
This will be a directory containing only the f1.txt and nothing else.
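The path convention in the example can be captured in a couple of string helpers. This is a sketch of the naming scheme only. The helper names are hypothetical, and the actual filesystem driver from HIVE-25569 is not modeled.

```python
from urllib.parse import urlsplit

SINGLE_FILE_MARKER = "#SINGLEFILE#"
SFS_PREFIX = "sfs+"


def single_file_dir(file_uri: str) -> str:
    """Wrap a file URI as an sfs+ directory that contains only that file."""
    if not urlsplit(file_uri).scheme:
        raise ValueError(f"expected a fully qualified URI, got {file_uri!r}")
    return f"{SFS_PREFIX}{file_uri.rstrip('/')}/{SINGLE_FILE_MARKER}"


def wrapped_file(sfs_uri: str) -> str:
    """Recover the underlying file URI from an sfs+ directory path."""
    suffix = "/" + SINGLE_FILE_MARKER
    if not (sfs_uri.startswith(SFS_PREFIX) and sfs_uri.endswith(suffix)):
        raise ValueError(f"not an sfs+ single-file path: {sfs_uri!r}")
    return sfs_uri[len(SFS_PREFIX):-len(suffix)]


print(single_file_dir("hdfs://somedir/f1.txt"))
# sfs+hdfs://somedir/f1.txt/#SINGLEFILE#
```

Because the '+' composes with any scheme, the same helper would produce sfs+s3a://... paths for object stores.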
This patch was tested locally with a custom build of Hive that also
included HIVE-25569.
Change-Id: I32be936243aa4c8320f5d06d2b7fbf98822f82e7
Reviewed-on: http://gerrit.cloudera.org:8080/17878
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Aman Sinha <amsinha@cloudera.com>
This patch adds support for COS (Cloud Object Storage). Using the
hadoop-cos, the implementation is similar to other remote FileSystems.
New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.
Follow-up:
- Support for caching COS file handles will be addressed in
IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
COS (IMPALA-10773).
Tests:
- Upload hdfs test data to a COS bucket. Modify all locations in HMS
DB to point to the COS bucket. Remove some hdfs caching params.
Run CORE tests.
Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Reviewed-on: http://gerrit.cloudera.org:8080/17503
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
ORC-189 and ORC-666 added support for a new timestamp type
'TIMESTAMP WITH LOCAL TIMEZONE' to the ORC library.
This patch adds support for reading such timestamps with Impala.
These are UTC-normalized timestamps, therefore we convert them
to the local timezone during scanning.
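Conceptually, the scanner shifts each UTC-normalized value into the local timezone. A minimal Python sketch follows, using fixed UTC offsets for simplicity. In practice the local timezone would be a named zone with DST rules (for example a zoneinfo.ZoneInfo instance). The function name is illustrative, not Impala's.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def utc_to_local(utc_value: datetime, local_tz: tzinfo) -> datetime:
    """Convert a naive, UTC-normalized timestamp to naive local time."""
    aware = utc_value.replace(tzinfo=timezone.utc)
    return aware.astimezone(local_tz).replace(tzinfo=None)


stored = datetime(2021, 1, 1, 12, 0)  # value as written, normalized to UTC
print(utc_to_local(stored, timezone(timedelta(hours=1))))   # 2021-01-01 13:00:00
print(utc_to_local(stored, timezone(timedelta(hours=-8))))  # 2021-01-01 04:00:00
```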
Testing:
* added test for CREATE TABLE LIKE ORC
* added scanner tests to test_scanners.py
Change-Id: Icb0c6a43ebea21f1cba5b8f304db7c4bd43967d9
Reviewed-on: http://gerrit.cloudera.org:8080/17347
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in the toolchain.
Most of the changes are related to replacing boost:: with std::
shared_ptr-s in cpp code (this is a continuation of a patch by Sahil).
The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.
The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.
Notable side effects:
- old logic to compile thrift for impala-shell with 0.11.0 was removed
- impala_shell's utf8 handling had to be updated as the new 0.11.0
compilation happens with no_utf8strings. This also made things a
bit faster, e.g. the following takes ~0.22s instead of ~0.25s:
shell/impala_shell.py \
-B -q "select * from functional_parquet.alltypes;" > /dev/null
- THRIFT-3921 changed the stream operators to print an enum's name
instead of its number, leading to slightly different messages
in some cases.
- "templates" was added to the thrift generator's parameters to avoid
a compilation issue (related to IMPALA-10600). I didn't notice any
change in compilation time. This option generates .tcc files with
templatized readers/writers for Thrift types. Currently we don't
use these, but they could potentially speed up (de)serialization.
Testing:
- ran Impyla's test suite with Python 2 and 3
- ran core tests
Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Reviewed-on: http://gerrit.cloudera.org:8080/17170
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When using a Maven mirror that uses a mirrorOf pattern, the order
of repositories in the pom.xml has a strong influence on whether the
build tries the mirror for a particular artifact. If an early
repository matches the mirrorOf condition, Maven may try the mirror
for all artifacts, even those that only exist in the s3 bucket.
This extra check can slow down the build, especially if the mirror
is slow to respond for unknown artifacts.
For Impala, the common case is for a mirror to cover everything
except the artifacts that come from the Kudu local repository or
the s3 bucket. To optimize for that case, this reorders the Maven
repositories to be in this order:
1. Local/S3 repositories
2. Regular repositories
3. Banned repositories
The repositories are otherwise unchanged.
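To see why the order matters, here is a simplified Python model of mirrorOf matching, following the syntax Maven documents for mirrors: `*`, `external:*`, comma-separated lists, and `!id` exclusions. It only computes the order in which URLs would be tried. Real Maven resolves each artifact against the repositories in order and stops at the first hit. The repository ids "kudu-local" and "central", all URLs, and the assumption that impala.cdp.repo is the s3 bucket's id are illustrative.

```python
from typing import List, Tuple
from urllib.parse import urlsplit


def is_external(url: str) -> bool:
    """external:* excludes file-based repositories and ones on localhost."""
    parts = urlsplit(url)
    return parts.scheme != "file" and parts.hostname not in ("localhost", "127.0.0.1")


def mirror_matches(pattern: str, repo_id: str, url: str) -> bool:
    """Simplified model of how a mirrorOf pattern selects repositories."""
    if pattern == "*" or pattern == repo_id:
        return True
    matched = False
    for token in (t.strip() for t in pattern.split(",")):
        if len(token) > 1 and token.startswith("!"):
            if token[1:] == repo_id:
                return False  # an explicit exclusion stops processing
        elif token == repo_id:
            return True
        elif token == "external:*" and is_external(url):
            matched = True  # keep going: a later "!id" may still exclude it
        elif token == "*":
            matched = True
    return matched


def lookup_order(repos: List[Tuple[str, str]], mirror_url: str, pattern: str) -> List[str]:
    """URLs in the order they would be tried, with mirrored repos replaced."""
    order: List[str] = []
    for repo_id, url in repos:
        target = mirror_url if mirror_matches(pattern, repo_id, url) else url
        if target not in order:
            order.append(target)
    return order


MIRROR = "https://mirror.example.com/maven"
PATTERN = "external:*,!impala.cdp.repo"
local_s3_first = [
    ("kudu-local", "file:///opt/kudu/java/repository"),
    ("impala.cdp.repo", "https://example-bucket.s3.amazonaws.com/maven"),
    ("central", "https://repo.maven.apache.org/maven2"),
]
print(lookup_order(local_s3_first, MIRROR, PATTERN))                   # local, s3, mirror
print(lookup_order(list(reversed(local_s3_first)), MIRROR, PATTERN))  # mirror first
```

With the local and S3 repositories listed first, artifacts that only exist there are found before the mirror is ever consulted.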
Testing:
- Ran an ordinary build
- Ran a build with a mirrorOf "external:*,!impala.cdp.repo" and verified
that the build went directly to the s3 bucket first.
Change-Id: I7046c7ec5391833e98ee6a463fb8c08b6a04cb26
Reviewed-on: http://gerrit.cloudera.org:8080/17020
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change bumps up the GBN to 11920537 which includes several
changes to Hive needed to support Catalogd's HMS endpoint for
supporting external frontends.
Additionally, it excludes some dependencies from the pom.xml
which are not uploaded by default to the toolchain.
After the GBN bump up Hive doesn't write '_orc_acid_version'
files and hence the FileMetadataLoaderTest needed to be
modified.
Change-Id: If88ceeaffc94e5bedf2c9953122109e20663f743
Reviewed-on: http://gerrit.cloudera.org:8080/17243
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for GCS (Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.
New flags for GCS:
- num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.
Follow-up:
- Support for spilling to GCS will be addressed in IMPALA-10561.
- Support for caching GCS file handles will be addressed in
IMPALA-10568.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
GCS (IMPALA-10562).
- Some tests are skipped due to issues introduced by /etc/hosts setting
on GCE instances (IMPALA-10563).
Tests:
- Compile and create hdfs test data on a GCE instance. Upload test data
to a GCS bucket. Modify all locations in HMS DB to point to the GCS
bucket. Remove some hdfs caching params. Run CORE tests.
- Compile and load snapshot data to a GCS bucket. Run CORE tests.
Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
- Add HIVE_VERSION_OVERRIDE, HIVE_STORAGE_API_VERSION_OVERRIDE,
HIVE_METASTORE_THRIFT_DIR_OVERRIDE, HIVE_HOME_OVERRIDE environment
variable support to impala-config.sh
- When used together with HIVE_SRC_DIR_OVERRIDE allows a user to
specify a locally compiled version of Hive for development and the
minicluster
- Hive jars are expected to have been installed into the local maven
repository
- Currently only version 3 of Hive is supported due to the absence of
API shims for Hive 4.0
Example:
~/hive $ mvn package install -Pdist -DskipTests
Example configuration:
export HIVE_VERSION_OVERRIDE=3.1.0-SNAPSHOT
export HIVE_STORAGE_API_VERSION_OVERRIDE=2.6.0
export HIVE_HOME_OVERRIDE=\
~/hive/packaging/target/apache-hive-3.1.0-SNAPSHOT-bin/apache-hive-3.1.0-SNAPSHOT-bin
export HIVE_SRC_DIR_OVERRIDE=~/hive
export HIVE_METASTORE_THRIFT_DIR_OVERRIDE=~/hive/standalone-metastore/src/main/thrift/
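The override pattern amounts to "use X_OVERRIDE if it is set, otherwise the default". A minimal Python model of that rule follows. How impala-config.sh actually treats an empty override is an assumption here (treated as unset), and the default value shown is a placeholder.

```python
import os
from typing import Mapping, Optional


def resolve_setting(name: str, default: str,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return $<name>_OVERRIDE if it is set and non-empty, else the default."""
    env = os.environ if env is None else env
    override = env.get(f"{name}_OVERRIDE")
    return override if override else default


dev_env = {"HIVE_VERSION_OVERRIDE": "3.1.0-SNAPSHOT"}
print(resolve_setting("HIVE_VERSION", "<toolchain default>", dev_env))  # 3.1.0-SNAPSHOT
print(resolve_setting("HIVE_HOME", "<toolchain default>", dev_env))     # <toolchain default>
```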
Change-Id: I21892c153c445e3a5d93f2bc8f5e0b799929dd34
Reviewed-on: http://gerrit.cloudera.org:8080/17094
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala depended on springframework 4.3.19 through pac4j
since IMPALA-10496.
Testing:
- used dependency-check-maven plugin to check that the CVEs
related to springframework disappear
Change-Id: I81a2b00a0dd1b1560fa97a13ccf4cf6bb69b4b51
Reviewed-on: http://gerrit.cloudera.org:8080/17112
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A flaw was found in FasterXML Jackson Databind, where it did not have
entity expansion secured properly.
This patch bumps up jackson-databind to 2.10.5.1. It also updates slf4j
to 1.7.30.
Testing:
- Built Impala on a local machine as a clean build. Verified that the
new versions of the jar files jackson-databind-2.10.5.1.jar,
slf4j-api-1.7.30.jar, and slf4j-log4j12-1.7.30.jar appear in
fe/target/build-classpath.txt.
Change-Id: Ie7b84a90fec955dbaebd36b63294229b05eb00d8
Reviewed-on: http://gerrit.cloudera.org:8080/17085
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The bulk of the SAML2 related code is done on Java side because:
- There is already an implementation for Hive on review (HIVE-24543).
- The only SAML lib for C++ seems to be OpenSaml, which seemed
quite hard to use and is a heavy dependency.
Doing authentication in Java needed some plumbing, as the hs2-http
port is served by C++ code and HTTP-related processing happens in
THttpServer/THttpTransport, which is not a "real" web server, just
a simple http implementation that processes the headers and passes
content to the thrift service.
- Http headers (and in one case body) are inspected and if it is
SAML related, the http request is wrapped in TWrappedHttpRequest
and sent to the Frontend. The Frontend processes it and returns
a TWrappedHttpResponse with the info to return to the client.
- After the last SAML message (with the bearer token) we generate
an auth cookie in C++ (which can be validated in C++), so later
requests in the session don't need to call into Java.
SAML auth can work alongside LDAP and Kerberos - for each hs2-http
request the path and the http headers are inspected to decide
whether it is SAML related, and if not, then we fall back to other
auth mechanisms. This "mixed mode" has no tests yet, so I consider it
experimental.
Planned followup work:
- It would be great to import the logic implemented in Hive instead
of copy-pasting most of it. I plan to do this in a followup commit,
as this needs changes on the Hive side too.
- Adding more tests will be much easier once we have an hs2-http
client that supports SAML. See IMPALA-10496 for Impyla support.
- Currently the debug webserver does not support SAML auth.
Implementing SAML for the webserver is problematic on the statestore
which doesn't have a Frontend.
Testing:
- Added EE tests that use Python's urllib2 to send SAML
requests to Impala. Impala works slightly differently
during tests (saml2_ee_test_mode=true).
Change-Id: Ia0c026cba1b90e7ff6ec5ae49be78b0d1edd8dfa
Reviewed-on: http://gerrit.cloudera.org:8080/16833
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HIVE-19064 introduced additional lexer classes that are required during
runtime. This commit adds the missing HiveLexer lexer classes to the
shared-deps. Without these classes queries such as 'select 1 as "``"'
would fail with 'NoClassDefFoundError'.
Testing:
- added a misc.test to verify that the classes are available and that
IMPALA-9641 is fixed by HIVE-19064
Change-Id: I6e3a00335983f26498c1130ab9f109f6e67256f5
Reviewed-on: http://gerrit.cloudera.org:8080/17019
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
With newer versions of Iceberg, TestIcebergTable::test_create_iceberg_tables
fails with ClassNotFoundException for org.apache.hive.hadoop.common.type.Date.
This adds that missing location to the impala-minimal-hive-exec.
Testing:
- Ran TestIcebergTable::test_create_iceberg_tables with newer Iceberg
Change-Id: I3fc33ff17489c2bd54d2ec8798ec7a3e5cfb051c
Reviewed-on: http://gerrit.cloudera.org:8080/17005
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This uses a new version of the native toolchain where Kudu
now uses the commit hash as the version for its jars.
This means that IMPALA_KUDU_VERSION is the same as
IMPALA_KUDU_JAVA_VERSION, so this consolidates everything
to use IMPALA_KUDU_VERSION. This also eliminates SNAPSHOT
versions for the Kudu jars.
Kudu changed one error message, so this updates the impacted
tests.
Testing:
- Ran a core job
Change-Id: I1a6c9676f4521d6709393143d3e82533486164d3
Reviewed-on: http://gerrit.cloudera.org:8080/16686
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Newer versions of Hive shade guava, which means that they require
the presence of artifacts in org/apache/hive/com/google. To
support these newer versions, this adds that path to the inclusions
for impala-minimal-hive-exec.
Testing:
- Tested with a newer version of Hive that has the shading
and verified that Impala starts up and functions.
Change-Id: I87ac089fdacc6fc5089ed68be92dedce514050b9
Reviewed-on: http://gerrit.cloudera.org:8080/16614
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for setting the version of Java
artifacts through "mvn versions:set". It changes
the modules to inherit the version from the parent
pom.
Previously, we used a mix of 0.1-SNAPSHOT and
1.0-SNAPSHOT. This now uses 4.0.0-SNAPSHOT across the
board. With each release, we can use "mvn versions:set"
to update the versions. The only exception is the
Hive UDF code that we build for testing. This remains
at version 1.0 to avoid test changes.
Testing:
- Ran core job
- Added build-all-flag-combinations.sh case that
does "mvn versions:set" and runs a build
Change-Id: I661b32e1e445169bac2ffe4f9474f14090031743
Reviewed-on: http://gerrit.cloudera.org:8080/16559
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This changes all existing Java code to be submodules under
a single root pom. The root pom is impala-parent/pom.xml
with minor changes to add submodules.
This avoids most of the weird CMake/maven interactions,
because there is now a single maven invocation for all
the Java code.
This moves all the Java projects other than fe into
a top level java directory. fe is left where it is
to avoid disruption (but still is compiled via the
java directory's root pom). Various pieces of code
that reference the old locations are updated.
Based on research, there are two options for dealing
with the shaded dependencies. The first is to have an
entirely separate Maven project with a separate Maven
invocation. In this case, the consumers of the shaded
jars will see the reduced set of transitive dependencies.
The second is to have the shaded dependencies as modules
with a single Maven invocation. The consumer would see
all of the original transitive dependencies and need to
exclude them all. See MSHADE-206/MNG-5899. This chooses
the second.
This only moves code around and does not focus on version
numbers or making "mvn versions:set" work.
Testing:
- Ran a core job
- Verified existing maven commands from fe/ directory still work
- Compared the *-classpath.txt files from fe and executor-deps
and verified they are the same except for paths
Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a
Reviewed-on: http://gerrit.cloudera.org:8080/16500
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>