Configure separate compile and link pools for ninja. Configures link
parallelism based on expected memory use, which can be reduced by
setting IMPALA_MINIMAL_DEBUG_INFO=true or IMPALA_SPLIT_DEBUG_INFO=true.
Adds IMPALA_MAKE_CMD to simplify using the ninja build tool for all make
operations in scripts. Installs ninja on Ubuntu. Adds a '-make' option to
buildall.sh to force using 'make'.
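As a rough illustration of sizing link parallelism from expected memory
use, here is a minimal sketch. The helper name link_jobs and the per-link
memory budget are assumptions for illustration, not the values or logic
used in the actual CMake configuration.

```shell
# Hypothetical helper: derive a link-job limit from available memory.
# Both arguments are in GB; the per-link figure is an assumed budget.
link_jobs() {
  total_gb=$1
  per_link_gb=$2
  jobs=$(( total_gb / per_link_gb ))
  # Always allow at least one link job, even on small machines.
  if [ "$jobs" -lt 1 ]; then
    jobs=1
  fi
  echo "$jobs"
}
```

A smaller per-link budget (for example with split or minimal debug info)
yields more concurrent link jobs for the same amount of memory.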
Adds MOLD_JOBS=1 to avoid overloading the system when trying 'mold' and
linking test binaries. However, 'mold' is not selected as the default
due to test failures around SASL/GSSAPI (see IMPALA-14527).
Switches bin/jenkins/all-tests.sh to use ninja and removes the guard in
bootstrap_development.sh limiting IMPALA_BUILD_THREADS as it's no longer
needed with ninja.
SKIP_BE_TEST_PATTERN in run-backend-tests is effectively unused (it only
applies with TARGET_FILESYSTEM=local), so this patch does not attempt to
make it work with ninja.
Tested with local 'IMPALA_SPLIT_DEBUG_INFO=true buildall.sh -skiptests'
with default (make) and IMPALA_MAKE_CMD=ninja.
Change-Id: I0952dc19ace5c9c42bed0d2ffb61499656c0a2db
Reviewed-on: http://gerrit.cloudera.org:8080/23572
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Pranav Lodha <pranav.lodha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When the environment variable USE_APACHE_HIVE is set to true, Impala is
built against Apache Hive 3.x. To better distinguish it from Apache Hive
2.x later, this patch renames USE_APACHE_HIVE to USE_APACHE_HIVE_3.
Additionally, to make it easier to reference different versions of the
Hive MetastoreShim, the major version of Hive is now included in the
environment variable IMPALA_HIVE_DIST_TYPE.
Change-Id: I11b5fe1604b6fc34469fb357c98784b7ad88574d
Reviewed-on: http://gerrit.cloudera.org:8080/21724
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Removes IMPALA_JAVA_HOME_OVERRIDE and updates version selection. In
order of priority:
1. If IMPALA_JDK_VERSION is set, use the OS JDK version from a known
location. This is primarily used when also installing the JDK as part
of automated builds.
2. If JAVA_HOME is set, use it.
3. Look for the system default JDK.
The IMPALA_JDK_VERSION variable is no longer modified to avoid issues
when sourcing impala-config.sh multiple times. JAVA_HOME will be
modified if IMPALA_JDK_VERSION is set; both must be unset to restore
using the system default Java.
If switching between JDKs, now prefer setting JAVA_HOME. If relying on
system Java, unset JAVA_HOME after e.g. update-java-alternatives.
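The selection order above can be sketched as follows. The function name
select_java_home and the Ubuntu-style JDK directory are assumptions used
only for illustration; the real script probes its own set of known
locations.

```shell
# Sketch of the JDK selection order. Arguments:
#   $1  value of IMPALA_JDK_VERSION (may be empty)
#   $2  value of JAVA_HOME (may be empty)
#   $3  path of the system default JDK
select_java_home() {
  jdk_version=$1
  java_home=$2
  system_default=$3
  if [ -n "$jdk_version" ]; then
    # 1. An explicit version wins; the path layout here is assumed.
    echo "/usr/lib/jvm/java-${jdk_version}-openjdk-amd64"
  elif [ -n "$java_home" ]; then
    # 2. Otherwise honor an existing JAVA_HOME.
    echo "$java_home"
  else
    # 3. Fall back to the system default JDK.
    echo "$system_default"
  fi
}
```

This also shows why both variables must be unset to get back to the
system default: either one being set short-circuits the fallback.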
The detected Java version is set in IMPALA_JAVA_TARGET, which is used to
add Java 9+ options and configure the Java compilation target.
Eliminates IMPALA_JDK_VERSION_NUM as its value was always identical to
IMPALA_JAVA_TARGET.
Stops printing from impala-config-java.sh. It made the output from
impala-config.sh look strange, and the decisions can all be clearly
determined from impala-config.sh printed variables later or the packages
installed in bootstrap_system.sh.
Fixes JAVA_HOME in bootstrap_build.sh on ARM64 systems.
Change-Id: I68435ca69522f8310221a0f3050f13d86568b9da
Reviewed-on: http://gerrit.cloudera.org:8080/23434
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch, USE_APACHE_COMPONENTS overwrote all USE_APACHE_*
variables, but we should support using specific Apache components.
After this patch, if USE_APACHE_COMPONENTS is not false, the
USE_APACHE_{HADOOP,HBASE,HIVE,TEZ,RANGER} variables are all set to true.
Otherwise, the individual values of
USE_APACHE_{HADOOP,HBASE,HIVE,TEZ,RANGER} are used.
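A minimal sketch of the resulting behavior, assuming that an unset
variable is treated as false. The function name resolve_apache_flags is
hypothetical and this is not the actual code in impala-config.sh.

```shell
# Sketch: apply USE_APACHE_COMPONENTS to the per-component variables.
resolve_apache_flags() {
  for component in HADOOP HBASE HIVE TEZ RANGER; do
    var="USE_APACHE_${component}"
    if [ "${USE_APACHE_COMPONENTS:-false}" != "false" ]; then
      # The umbrella switch forces every component to true.
      eval "$var=true"
    else
      # Keep the caller's value; default to false when unset (assumed).
      eval "$var=\${$var:-false}"
    fi
  done
}
```

With USE_APACHE_COMPONENTS=false and USE_APACHE_HIVE=true, only Hive is
switched to the Apache build, which matches the tested configuration.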
Test:
- Built and ran a test cluster with setting USE_APACHE_HIVE=true
and USE_APACHE_COMPONENTS=false.
Change-Id: I33791465a3b238b56f82d749e3dbad8215f3b3bc
Reviewed-on: http://gerrit.cloudera.org:8080/23211
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The goals and non-goals of this patch could be summarized as follows.
Goals:
- Add changes to the minicluster configuration that allow a non-default
version of Ranger (possibly built locally) to run in the context of
the minicluster, and to be used as the authorization server by
Impala.
- Switch to the new constructor when instantiating
RangerAccessRequestImpl. This resolves IMPALA-12985 and also makes
Impala compatible with Apache Ranger if RangerAccessRequestImpl from
Apache Ranger is consumed.
- Prepare Ranger and Impala patches as supplemental material to verify
what authorization-related tests could be passed if Apache Ranger is
the authorization provider. Merging IMPALA-12921_addendum.diff to
the Impala repository is not in the scope of this patch in that the
diff file changes the behavior of Impala and thus more discussion is
required if we'd like to merge it in the future.
Non-goals:
- Set up any automation for building Ranger from source.
- Pass all Impala authorization-related tests with a non-default
version of Ranger.
Instructions on running Impala with locally built Ranger:
Suppose the Ranger project is under the folder $RANGER_SRC_DIR. The
following command builds Apache Ranger and is included here for easy
reference. By default, the compressed tarball is produced under
$RANGER_SRC_DIR/target.
mvn clean compile -B -nsu -DskipCheck=true -Dcheckstyle.skip=true \
package install -DskipITs -DskipTests -Dmaven.javadoc.skip=true
After building Ranger, we need to build Impala's Java code so that
Impala's Java code could consume the locally produced Ranger classes. We
will need to export the following environment variables before building
Impala. This prevents bootstrap_toolchain.py from trying to download the
compressed Ranger tarball.
1. export RANGER_VERSION_OVERRIDE=\
$(mvn -f $RANGER_SRC_DIR/pom.xml -q help:evaluate \
-Dexpression=project.version -DforceStdout)
2. export RANGER_HOME_OVERRIDE=$RANGER_SRC_DIR/target/\
ranger-${RANGER_VERSION_OVERRIDE}-admin
It then suffices to execute the following to point
Impala to the locally built Ranger server before starting Impala.
1. source $IMPALA_HOME/bin/impala-config.sh
2. tar zxv -f $RANGER_SRC_DIR/target/\
ranger-${IMPALA_RANGER_VERSION}-admin.tar.gz \
-C $RANGER_SRC_DIR/target/
3. $IMPALA_HOME/bin/create-test-configuration.sh
4. $IMPALA_HOME/bin/create-test-configuration.sh \
-create_ranger_policy_db
5. $IMPALA_HOME/testdata/bin/run-ranger.sh
(run-all.sh has to be executed instead if other underlying services
have not been started)
6. $IMPALA_HOME/testdata/bin/setup-ranger.sh
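Before running step 2 above, it can help to confirm that the Ranger
build actually produced the admin tarball. A minimal sketch, where the
helper names are hypothetical and the path layout is taken from the
steps above:

```shell
# Hypothetical helper: compute the expected admin tarball path.
ranger_tarball_path() {
  src_dir=$1
  version=$2
  echo "${src_dir}/target/ranger-${version}-admin.tar.gz"
}

# Hypothetical pre-flight check: print the tarball path if it exists,
# otherwise report the missing file on stderr and return non-zero.
check_ranger_tarball() {
  tarball=$(ranger_tarball_path "$1" "$2")
  if [ ! -f "$tarball" ]; then
    echo "missing: $tarball" >&2
    return 1
  fi
  echo "$tarball"
}
```

For example, after sourcing impala-config.sh:
check_ranger_tarball "$RANGER_SRC_DIR" "$IMPALA_RANGER_VERSION"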
Testing:
- Manually verified that we could point Impala to a locally built
Apache Ranger on the master branch (with tip being
https://github.com/apache/ranger/commit/4abb993).
- Manually verified that with RANGER-4771.diff and
IMPALA-12921_addendum.diff, only 3 authorization-related tests
failed. They failed because the resource type of 'storage-type' is
not supported in Apache Ranger yet and thus the test cases added in
IMPALA-10436 could fail.
- Manually verified that the log files of Apache and CDP Ranger's Admin
server could be created under ${RANGER_LOG_DIR} after we start the
Ranger service.
- Verified that this patch passed the core tests when CDP Ranger is
used.
Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27
Reviewed-on: http://gerrit.cloudera.org:8080/21160
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
impala-python currently gets its Thrift from the toolchain
by adding the appropriate Thrift toolchain directories to
the PYTHONPATH. This is a problem when switching to Python 3,
because the toolchain Thrift was built with Python 2 and
this can produce complicated bugs. In general, it is also
not a good idea to get Python dependencies from the toolchain.
This switches to installing Thrift into the impala-python
virtualenv, which lets the different Python versions have
their own copy of compiled files.
Testing:
- Ran a core job
Change-Id: Ib36e8a1ce8d446b69b08e81ea458f95c158e28f5
Reviewed-on: http://gerrit.cloudera.org:8080/21046
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
bin/set-pythonpath.sh includes $HIVE_HOME/lib/py. This is a
historical artifact that is no longer needed today. Impala
should not be getting Python code directly from Hive. As a
cleanup, this removes $HIVE_HOME/lib/py from the
PYTHONPATH.
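A shell that sourced the old script may still carry the entry. As a
minimal sketch (the helper name remove_path_entry is hypothetical and is
not part of this patch), an entry can be dropped from a colon-separated
path like this:

```shell
# Hypothetical helper: remove one entry from a colon-separated path list.
remove_path_entry() {
  path_list=$1
  unwanted=$2
  result=""
  old_ifs=$IFS
  IFS=:
  for entry in $path_list; do
    # Skip empty entries and the one being removed.
    if [ -n "$entry" ] && [ "$entry" != "$unwanted" ]; then
      result="${result:+$result:}$entry"
    fi
  done
  IFS=$old_ifs
  echo "$result"
}
```

For example:
PYTHONPATH=$(remove_path_entry "$PYTHONPATH" "$HIVE_HOME/lib/py")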
Testing:
- Ran a core job
Change-Id: I56d1ae3b1433d6240159f20da4680888b5f37357
Reviewed-on: http://gerrit.cloudera.org:8080/19689
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'. The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.
This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.
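The precedence between the two variables can be sketched as below. The
function name describe_jdk_choice and its output format are hypothetical;
only the accepted values and the precedence come from this patch.

```shell
# Sketch: IMPALA_JAVA_HOME_OVERRIDE wins over IMPALA_JDK_VERSION.
#   $1  value of IMPALA_JAVA_HOME_OVERRIDE (may be empty)
#   $2  value of IMPALA_JDK_VERSION (defaults to 'system')
describe_jdk_choice() {
  override=$1
  version=${2:-system}
  if [ -n "$override" ]; then
    echo "override:$override"
    return 0
  fi
  case "$version" in
    system) echo "system" ;;
    8|11)   echo "jdk:$version" ;;
    *)
      echo "unsupported IMPALA_JDK_VERSION: $version" >&2
      return 1
      ;;
  esac
}
```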
This also updates the versions of Maven plugins related to the build.
Source and target releases are still set to Java 8 compatibility.
Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with
JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
run-all-tests.sh
Testing: ran test suite with Java 11
This reverts the revert commit 1b6011c, restoring these changes minus
code to update IMPALA_JDK_VERSION based on $JAVA -version as that could
break subsequent sourcing of impala-config.sh.
Change-Id: Ie16504ad5738b1f228f97044afd3d9017ccc6c53
Reviewed-on: http://gerrit.cloudera.org:8080/19928
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala used to have one thrift compiler version to compile C++, Java,
and Python code.
Thrift serialization/deserialization is mostly compatible between minor
versions, so it is possible to use different thrift compiler versions
for different target languages. This is beneficial because it allows
Impala to upgrade separate components independently.
This patch implements the infrastructure change required to do so. It
replaces most of the 'THRIFT_*' environment variables and CMake
variables with 'THRIFT_CPP_*', 'THRIFT_JAVA_*', and 'THRIFT_PY_*' to
compile C++, Java, and Python code respectively. All three still refer
to the same thrift version (thrift-0.11.0-p5).
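As an illustration of the split, the per-language settings might look
like the sketch below. The exact variable names are assumptions that
follow the 'THRIFT_CPP_*' / 'THRIFT_JAVA_*' / 'THRIFT_PY_*' pattern and
may not match the names used in impala-config.sh.

```shell
# Illustrative names only: one version variable per target language.
# All three point at the same version for now, so behavior is unchanged.
THRIFT_COMMON_VERSION="0.11.0-p5"
THRIFT_CPP_VERSION="$THRIFT_COMMON_VERSION"
THRIFT_JAVA_VERSION="$THRIFT_COMMON_VERSION"
THRIFT_PY_VERSION="$THRIFT_COMMON_VERSION"
export THRIFT_CPP_VERSION THRIFT_JAVA_VERSION THRIFT_PY_VERSION
```

A later change could then bump one of the three without touching the
other two.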
Testing:
- Build Impala and pass core tests.
Change-Id: I56479dc69b79024d1a4d09211bbe88a61fa0c6a4
Reviewed-on: http://gerrit.cloudera.org:8080/18636
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
- Add HIVE_VERSION_OVERRIDE, HIVE_STORAGE_API_VERSION_OVERRIDE,
HIVE_METASTORE_THRIFT_DIR_OVERRIDE, HIVE_HOME_OVERRIDE environment
variable support to impala-config.sh
- When used together with HIVE_SRC_DIR_OVERRIDE allows a user to
specify a locally compiled version of Hive for development and the
minicluster
- Hive jars are expected to have been installed into the local maven
repository
- Currently only version 3 of Hive is supported due to the absence of
API shims for Hive 4.0
Example:
~/hive $ mvn package install -Pdist -DskipTests
Example configuration:
export HIVE_VERSION_OVERRIDE=3.1.0-SNAPSHOT
export HIVE_STORAGE_API_VERSION_OVERRIDE=2.6.0
export HIVE_HOME_OVERRIDE=\
~/hive/packaging/target/apache-hive-3.1.0-SNAPSHOT-bin/apache-hive-3.1.0-SNAPSHOT-bin
export HIVE_SRC_DIR_OVERRIDE=~/hive
export HIVE_METASTORE_THRIFT_DIR_OVERRIDE=~/hive/standalone-metastore/src/main/thrift/
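To avoid repeating the version string, the Hive home path can be derived
from the source directory and version. A minimal sketch, assuming the
packaging layout shown in the example configuration above; the helper
name hive_home_for is hypothetical:

```shell
# Hypothetical helper: build the HIVE_HOME_OVERRIDE path from the Hive
# source directory and version, following the layout shown above.
hive_home_for() {
  src_dir=$1
  version=$2
  echo "${src_dir}/packaging/target/apache-hive-${version}-bin/apache-hive-${version}-bin"
}
```

For example:
export HIVE_HOME_OVERRIDE=$(hive_home_for "$HIVE_SRC_DIR_OVERRIDE" "$HIVE_VERSION_OVERRIDE")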
Change-Id: I21892c153c445e3a5d93f2bc8f5e0b799929dd34
Reviewed-on: http://gerrit.cloudera.org:8080/17094
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala 4 decided to drop Sentry support in favor of Ranger. This
removes Sentry support and related tests. It retires startup
flags related to Sentry and does the first round of removing
obsolete code. This does not adjust documentation to remove
references to Sentry, and other dead code will be removed
separately.
Some issues came up when implementing this. Here is a summary
of how this patch resolves them:
1. authorization_provider currently defaults to "sentry", but
"ranger" requires extra parameters to be set. This changes the
default value of authorization_provider to "", which translates
internally to the noop policy that does no authorization.
2. These flags are Sentry specific and are now retired:
- authorization_policy_provider_class
- sentry_catalog_polling_frequency_s
- sentry_config
3. The authorization_factory_class may be obsolete now that
there is only one authorization policy, but this leaves it
in place.
4. Sentry is the last component using CDH_COMPONENTS_HOME, so
that is removed. There are still Maven dependencies coming
from the CDH_BUILD_NUMBER repository, so that is not removed.
5. To make the transition easier, testdata/bin/kill-sentry-service.sh
is not removed and it is still called from testdata/bin/kill-all.sh.
Testing:
- Core job passes
Change-Id: I8e99c15936d6d250cf258e3a1dcba11d3eb4661e
Reviewed-on: http://gerrit.cloudera.org:8080/15833
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala 4 moved to using CDP versions for components, which involves
adopting Hive 3. This removes the old code supporting CDH components
and Hive 2. Specifically, it does the following:
1. Remove USE_CDP_HIVE and default to the values from USE_CDP_HIVE=true.
USE_CDP_HIVE now has no effect on the Impala environment. This also
means that bin/jenkins/build-all-flag-combinations.sh no longer
includes USE_CDP_HIVE=false as a configuration.
2. Remove USE_CDH_KUDU and default to getting Kudu from the
native toolchain.
3. Ban IMPALA_HIVE_MAJOR_VERSION<3 and remove related code, including
the IMPALA_HIVE_MAJOR_VERSION=2 maven profile in fe/pom.xml.
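The ban in item 3 amounts to a guard along these lines. This is a sketch
only: the function name, the default of 3 when the value is empty, and
the message wording are assumptions, not the actual check.

```shell
# Sketch: reject Hive major versions older than 3.
require_hive_major_version() {
  major=${1:-3}
  if [ "$major" -lt 3 ]; then
    echo "IMPALA_HIVE_MAJOR_VERSION=$major is not supported; use 3 or later" >&2
    return 1
  fi
  return 0
}
```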
There is a fair amount of code that still references the Hive major
version. Upstream Hive is now working on Hive 4, so there is a high
likelihood that we'll need some code to deal with that transition.
This leaves some code (such as maven profiles) and test logic in
place.
Change-Id: Id85e849beaf4e19dda4092874185462abd2ec608
Reviewed-on: http://gerrit.cloudera.org:8080/15869
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>