Commit Graph

11 Commits

Author SHA1 Message Date
Zoltan Borok-Nagy
7b2bb13ecc IMPALA-10810: Bump json-smart from 2.3 to 2.4.7
I noticed that our json-smart dependency is stale and we could
pick up a newer version.

This patch upgrades it to 2.4.7 which is the newest version at
the time of writing.

Change-Id: I6b43f606f40e172aa267b55c564fa64d68515bd5
Reviewed-on: http://gerrit.cloudera.org:8080/17702
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-07-22 23:05:34 +00:00
Csaba Ringhofer
94f67a3432 IMPALA-7825: Upgrade Thrift version to 0.11.0
Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in the toolchain.

Most of the changes are related to replacing boost:: with std::
shared_ptr-s in cpp code (this is a continuation of patch by Sahil).

The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.

The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.

Notable side effects:
- old logic to compile thrift for impala-shell with 0.11.0 was removed
- impala_shell's utf8 handling had to be updated as the new 0.11.0
  compilation happens with no_utf8strings. This also made things a
  bit faster, e.g the following is ~0.22s instead of ~0.25
  shell/impala_shell.py \
    -B -q "select * from functional_parquet.alltypes;" > /dev/null
- THRIFT-3921 changed the stream operators to print an enum's name
  instead of its number, leading to slightly different messages
  in some cases.
- "templates" was added to the thift generator's parameters to avoid
  a compilation issue (related to IMPALA-10600). I didn't notice any
  change in compilation time. This option generated .tcc files with
  templetized readers/writers for Thrift types. Currently we don't
  use these, but they could potentially speed up (de)serialization.

Testing:
- ran Impyla's test suite with Python 2 and 3
- ran core tests

Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Reviewed-on: http://gerrit.cloudera.org:8080/17170
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-27 13:36:54 +00:00
Joe McDonnell
267f4d67f4 IMPALA-10455: Reorder Maven repositories for cleaner mirror semantics
When using a Maven mirror that uses a mirrorOf pattern, the order
of repositories in the pom.xml has a strong influence on whether the
build tries the mirror for a particular artifact. If an early
repository matches the mirrorOf condition, Maven may try the mirror
for all artifacts, even those that only exist in the s3 bucket.
This extra check can slow down the build, especially if the mirror
is slow to respond for unknown artifacts.

For Impala, the common case is for a mirror to cover everything
except the artifacts that come from the Kudu local repository or
the s3 bucket. To optimize for that case, this reorders the Maven
repositories to be in this order:
1. Local/S3 repositories
2. Regular repositories
3. Banned repositories
The repositories are otherwise unchanged.

Testing:
 - Ran an ordinary build
 - Ran a build with a mirrorOf "external:*,!impala.cdp.repo" and verified
   that the build went directly to the s3 bucket first.

Change-Id: I7046c7ec5391833e98ee6a463fb8c08b6a04cb26
Reviewed-on: http://gerrit.cloudera.org:8080/17020
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-08 21:38:35 +00:00
stiga-huang
2dfc68d852 IMPALA-7712: Support Google Cloud Storage
This patch adds support for GCS(Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to be 16.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Support for caching GCS file handles will be addressed in
   IMPALA-10568.
 - test_concurrent_inserts and test_failing_inserts in
   test_acid_stress.py are skipped due to slow file listing on
   GCS (IMPALA-10562).
 - Some tests are skipped due to issues introduced by /etc/hosts setting
   on GCE instances (IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-13 11:20:08 +00:00
John Sherman
a29d06db53 IMPALA-9218: Add support for locally compiled Hive
- Add HIVE_VERSION_OVERRIDE, HIVE_STORAGE_API_VERSION_OVERRIDE,
  HIVE_METASTORE_THRIFT_DIR_OVERRIDE, HIVE_HOME_OVERRIDE environment
  variable support to impala-config.sh
- When used together with HIVE_SRC_DIR_OVERRIDE allows a user to
  specify a locally compiled version of Hive for development and the
  minicluster
- Hive jars are expected to have been installed into the local maven
  repository
- Currently only version 3 of Hive is supported due to the absence of
  API shims for Hive 4.0
Example:
  ~/hive $ mvn package install -Pdist -DskipTests

Example configuration:
export HIVE_VERSION_OVERRIDE=3.1.0-SNAPSHOT
export HIVE_STORAGE_API_VERSION_OVERRIDE=2.6.0
export HIVE_HOME_OVERRIDE=\
~/hive/packaging/target/apache-hive-3.1.0-SNAPSHOT-bin/apache-hive-3.1.0-SNAPSHOT-bin
export HIVE_SRC_DIR_OVERRIDE=~/hive
export HIVE_METASTORE_THRIFT_DIR_OVERRIDE=~/hive/standalone-metastore/src/main/thrift/

Change-Id: I21892c153c445e3a5d93f2bc8f5e0b799929dd34
Reviewed-on: http://gerrit.cloudera.org:8080/17094
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-12 03:15:44 +00:00
Csaba Ringhofer
20d4df3651 IMPALA-10496: Bump springframework dependency to 4.3.29
Impala depended on springframework 4.3.19 through pac4j
since IMPALA-10496.

Testing:
- used dependency-check-maven plugin to check that the CVEs
  related to springframework disappear

Change-Id: I81a2b00a0dd1b1560fa97a13ccf4cf6bb69b4b51
Reviewed-on: http://gerrit.cloudera.org:8080/17112
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-24 02:12:10 +00:00
wzhou-code
4a65fcfbe5 IMPALA-10516: Bump up the versions of jackson databind and slf4j
A flaw was found in FasterXML Jackson Databind, where it did not have
entity expansion secured properly.

This patch bumps up jackson databind to 2.10.5.1. It also changes slf4j
to 1.7.30.

Testing:
 - Built Impala on local machine as clean build. Verified that new
   versions of jar files jackson-databind-2.10.5.1.jar,
   slf4j-api-1.7.30.jar, and slf4j-log4j12-1.7.30.jar were built in
   fe/target/build-classpath.txt.

Change-Id: Ie7b84a90fec955dbaebd36b63294229b05eb00d8
Reviewed-on: http://gerrit.cloudera.org:8080/17085
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-19 01:33:59 +00:00
Csaba Ringhofer
08cb4d36d2 IMPALA-10496: SAML implementation in Impala
The bulk of the SAML2 related code is done on Java side because:
- There is already an implementation for Hive on review (HIVE-24543).
- The only SAML lib for c++ seems to be OpenSaml, which is seemed
  quite hard to use and a heavy dependency.

Doing authentication in Java needed some plumbing, as the hs2-http
port is listened to in c++ and http related processing happens in
THttpServer/THttpTransport, which is not a "real" web server, just
a simple http implementation that processes the headers and passes
content to the thrift service.
- Http headers (and in one case body) are inspected and if it is
  SAML related, the http request is wrapped in TWrappedHttpRequest
  and sent to the Frontend. The Frontend processes it and returns
  a TWrappedHttpResponse with the info to return to the client.
- After the last SAML message (with the bearer token) we generate
  an auth cookie in c++ (which can be validated in c++),  so later
  requests in the session don't need to call to Java.

SAML auth can work alongside LDAP and Kerberos - for each hs2-http
request the path and the http headers are inspected to decide
whether it is SAML related, and if not, then we fallback to other
auth mechanisms. This "mixed mode" has no tests yet, so I consider it
experimental.

Planned followup work:
- It would be great to import the logic implemented in Hive instead
  of copy-pasting most of it. I plan to do this in a followup commit,
  as this needs changes on the Hive side too.
- Adding more tests will be much easier once we will have a hs2-http
  client that supports SAML. See IMPALA-10496 for Impyla support.
- Currently the debug webserver does not support SAML auth.
  Implementing SAML for the webserver is problematic on the statestore
  which doesn't have a Frontend.

Testing:
- Added EE tests that use Python's urllib2 to sent SAML
  requests to Impala. Impala works slightly differently
  during tests (saml2_ee_test_mode=true).

Change-Id: Ia0c026cba1b90e7ff6ec5ae49be78b0d1edd8dfa
Reviewed-on: http://gerrit.cloudera.org:8080/16833
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-17 22:52:05 +00:00
Joe McDonnell
4b654e7c97 IMPALA-10058: Use commit hash as version for Kudu java artifacts
This uses a new version of the native toolchain where Kudu
now uses the commit hash as the version for its jars.
This means that IMPALA_KUDU_VERSION is the same as
IMPALA_KUDU_JAVA_VERSION, so this consolidates everything
to use IMPALA_KUDU_VERSION. This also eliminates SNAPSHOT
versions for the Kudu jars.

Kudu changed one error message, so this updates the impacted
tests.

Testing:
 - Ran a core job

Change-Id: I1a6c9676f4521d6709393143d3e82533486164d3
Reviewed-on: http://gerrit.cloudera.org:8080/16686
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-03 20:05:20 +00:00
Joe McDonnell
97792c4bad IMPALA-10198 (part 2): Add support for mvn versions:set
This adds support for setting the version of Java
artifacts through "mvn versions:set". It changes
the modules to inherit the version from the parent
pom.

Previously, we used a mix of 0.1-SNAPSHOT and
1.0-SNAPSHOT. This now uses 4.0.0-SNAPSHOT across the
board. With each release, we can use "mvn versions:set"
to update the versions. The only exception is the
Hive UDF code that we build for testing. This remains
at version 1.0 to avoid test changes.

Testing:
 - Ran core job
 - Added build-all-flag-combinations.sh case that
   does "mvn versions:set" and runs a build

Change-Id: I661b32e1e445169bac2ffe4f9474f14090031743
Reviewed-on: http://gerrit.cloudera.org:8080/16559
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 19:30:13 +00:00
Joe McDonnell
97856478ec IMPALA-10198 (part 1): Unify Java in a single java/ directory
This changes all existing Java code to be submodules under
a single root pom. The root pom is impala-parent/pom.xml
with minor changes to add submodules.

This avoids most of the weird CMake/maven interactions,
because there is now a single maven invocation for all
the Java code.

This moves all the Java projects other than fe into
a top level java directory. fe is left where it is
to avoid disruption (but still is compiled via the
java directory's root pom). Various pieces of code
that reference the old locations are updated.

Based on research, there are two options for dealing
with the shaded dependencies. The first is to have an
entirely separate Maven project with a separate Maven
invocation. In this case, the consumers of the shaded
jars will see the reduced set of transitive dependencies.
The second is to have the shaded dependencies as modules
with a single Maven invocation. The consumer would see
all of the original transitive dependencies and need to
exclude them all. See MSHADE-206/MNG-5899. This chooses
the second.

This only moves code around and does not focus on version
numbers or making "mvn versions:set" work.

Testing:
 - Ran a core job
 - Verified existing maven commands from fe/ directory still work
 - Compared the *-classpath.txt files from fe and executor-deps
   and verified they are the same except for paths

Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a
Reviewed-on: http://gerrit.cloudera.org:8080/16500
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2020-10-15 19:30:13 +00:00