To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3
This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
doesn't have a main function, it removes the hash-bang and makes
sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
(or python to python3) with minimal changes.
3. The cm-api pypi package (which doesn't support Python 3) has been
replaced by the cm-client pypi package and interfaces have changed.
Rather than migrating the code (which hasn't been used in years), this
deletes the old code and stops installing cm-api into the virtualenv.
The code can be restored and revamped if there is any interest in
interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
bit-rotted. Some pieces can be run manually, but it can't be fully
verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces locations of impala-python in comments / documentation /
READMEs.
6. kazoo (used for interacting with HBase) needed to be upgraded to a
version that supports Python 3. The newest version of kazoo requires
upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
needing other upgrades.
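As a rough illustration of step 1, the check for "has a Python 2 style hash-bang but no main function" could be sketched as below. The regular expressions and function names are assumptions made for this example and are not taken from the patch; the main-function test here only looks for the usual `__main__` guard.

```python
import os
import re
import stat

# Matches a first line such as "#!/bin/env impala-python" or "#!/usr/bin/python",
# but not the python3 variants (the trailing \b rejects "python3").
SHEBANG = re.compile(r"^#!.*\b(impala-python|python)\b")
MAIN_GUARD = re.compile(r"""^if\s+__name__\s*==\s*['"]__main__['"]\s*:""", re.MULTILINE)


def shebang_without_main(source):
    """Return True if the text starts with a Python 2 style shebang but has no main guard."""
    first_line = source.split("\n", 1)[0]
    return bool(SHEBANG.match(first_line)) and not MAIN_GUARD.search(source)


def is_executable(path):
    """Return True if any execute bit is set on the file."""
    return bool(os.stat(path).st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
```

Files for which shebang_without_main() returns True are candidates for dropping the hash-bang, and is_executable() can then confirm that the execute bit has been cleared.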
The two remaining uses of impala-python are:
- bin/cmake_aux/create_virtualenv.sh
- bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.
The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python).
Testing:
- Ran core job
- Ran build + dataload on Centos 7, Redhat 8
- Manual testing of individual scripts (except some bitrotted areas like the
random query generator)
Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Update the following elements of the Impala build environment to enable
builds on Ubuntu 24.04:
- Recognize and handle (where necessary) Ubuntu 24.04 in various
bootstrap scripts (bootstrap_system.sh, bootstrap_toolchain.py, etc.)
- Bump IMPALA_TOOLCHAIN_ID to an official toolchain build that contains
Ubuntu 24.04-specific binary packages
- Bump binutils to 2.42, and
- Bump the GDB version to 12.1-p1, as required by the new toolchain
version
- Update unique_ptr usage syntax in be/src/util/webserver-test.cc to
compensate for new GLIBC function prototypes:
System headers in Ubuntu 24.04 adopted attributes on several widely
used function prototypes. Such attributes are not considered to be part
of the function's signature during template evaluation, so GCC throws a
warning when such a function is passed as a template argument, which
breaks the build, as warnings are treated as errors.
webserver-test.cc uses pclose() as the deleter for a unique_ptr in a
utility function. This patch encapsulates pclose() and its attributes in
an explicit specialization for std::default_delete<>, "hiding" the
attributes inside a functor.
The particular solution was inspired by Anton-V-K's proposal in
https://gist.github.com/t-mat/5849549
This commit builds on an earlier patch for the same purpose by Michael
Smith: https://gerrit.cloudera.org/c/23058/
Change-Id: Ia4454b0c359dbf579e6ba2f9f9c44cfa3f1de0d2
Reviewed-on: http://gerrit.cloudera.org:8080/23384
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
IMPALA-13920 added a java and javac version check to bootstrap_system.sh.
This patch moves that check after the 'sudo yum install' command so that
Impala builds on Redhat machines work.
Testing:
Built Impala using Jenkins on a Redhat 8.6 machine.
Change-Id: I25b26c146bf13138741272cd73727e7244462249
Reviewed-on: http://gerrit.cloudera.org:8080/22772
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-11941 allowed building Impala and running tests with Java 17,
but it still uses Java 8 for minicluster components (e.g. Hadoop) and
skips several tests that would restart Hive. It should be possible to
use 17 for everything to be able to deprecate Java 8.
This patch mainly fixes Yarn+Hive+Tez startup issues with java 17 by
setting JAVA_TOOL_OPTIONS.
Another issue fixed is KuduHMSIntegrationTest: this test fails to
restart Kudu due to a bug in OpenJDK (see IMPALA-13856). The current
fix is to remove LD_PRELOAD to avoid loading libjsig (similarly to
the case when MINICLUSTER_JAVA_HOME is set). This works, but it
would be nice to clean up this area in a future patch.
Testing:
- ran exhaustive tests with Java 17
- ran core tests with default Java 8
Change-Id: If58b64a21d14a4a55b12dfe9ea0b9c3d5fe9c9cf
Reviewed-on: http://gerrit.cloudera.org:8080/22705
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
These tests failed in various ways depending on OS/openssl version.
One issue identified is that the certificates contained CN=*, while a
wildcard subject should look like *.<domain>. Recreated wildcard
certs with *.impala.test common name and added some host names
that match them in bootstrap_system.sh.
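To illustrate why CN=* is a problem while *.impala.test works, here is a simplified sketch of wildcard host name matching. It assumes the wildcard may only replace the entire leftmost label; real TLS libraries apply further rules, and the host names used with it are hypothetical.

```python
def wildcard_matches(pattern, hostname):
    """Simplified wildcard check: '*' may only stand in for the whole leftmost label."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    # A bare "*" (or "*.tld") has no real domain to anchor it, so it only
    # matches if the two strings are literally identical.
    if len(pattern_labels) < 3 or len(pattern_labels) != len(host_labels):
        return pattern.lower() == hostname.lower()
    if pattern_labels[0] != "*":
        return pattern_labels == host_labels
    # The wildcard covers exactly one non-empty label; the rest must be equal.
    return bool(host_labels[0]) and pattern_labels[1:] == host_labels[1:]
```

Under this rule a hypothetical host1.impala.test matches *.impala.test, whereas a bare * matches no real host name.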
Removed the @xfail from the tests as my expectation is that they
should work on all supported OS.
Tested on
- Ubuntu 20.04 / OpenSSL 1.1.1f
- Ubuntu 22.04 / OpenSSL 3.0.2
- RHEL 7.9 / OpenSSL 1.0.2k
- RHEL 8.6 / OpenSSL 1.1.1k
- Rocky 9.2 / OpenSSL 3.2.2
Change-Id: Ieedf682d06bdb6f8f68a5f77e41175e895b77ca9
Reviewed-on: http://gerrit.cloudera.org:8080/22569
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
postgresql initialization can fail if run a second time:
sudo service postgresql initdb
3 ERROR: Data directory /var/lib/pgsql/data is not empty!
This can lead to skipping the rest of the script - a
quick fix is to deal with postgresql at the end.
Change-Id: I55e862ebe3b823e4aeaaa656d5536b6317b5e19c
Reviewed-on: http://gerrit.cloudera.org:8080/22550
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Two permission issues caused this dataload step to fail:
- Lack of X permission on home directory (seems Linux-specific).
- LOAD statement has no right to use \tmp for some reason - using
\LOAD instead solves this. I don't know what postgres/configuration
change caused this.
Testing:
- dataload and ext-data-source related tests passed on Rocky Linux 9.5
Change-Id: I3829116f4c6d6f6cba2da824cd9f31259a15ca1b
Reviewed-on: http://gerrit.cloudera.org:8080/22383
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Ubuntu 20.04 introduced a bug in their Python 2.7's tarfile
functionality with 2.7.18-1~20.04.5. See
https://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/2089071
This breaks the Impala build with a message like
"tarfile.ReadError: invalid header".
The bug has an attached tarfile.py with a workaround for the
issue. This changes bootstrap_system.sh and bootstrap_build.sh to
detect the bad tarfile.py and replace it with the patched tarfile.py.
Since this is comparing the hash, this will become a no-op once
Ubuntu fixes the issue.
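The hash-based detection can be sketched roughly as follows. This is an illustration only: the use of SHA-256 and the function names are assumptions, and the actual scripts may use a different digest or tool.

```python
import hashlib
import shutil


def file_sha256(path):
    """Return the hex SHA-256 digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replace_if_known_bad(installed, patched, known_bad_sha256):
    """Copy `patched` over `installed` only when `installed` has the known-bad hash."""
    if file_sha256(installed) != known_bad_sha256:
        return False  # Already fixed or a different version: leave it alone.
    shutil.copyfile(patched, installed)
    return True
```

Because the replacement only happens on an exact hash match, a fixed tarfile.py shipped by the distribution is left untouched.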
Testing:
- Ran a build on Ubuntu 20.04
Change-Id: I1d0691611cf53ae6dd1099b97f0aa15b450e0996
Reviewed-on: http://gerrit.cloudera.org:8080/22088
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Updates IMPALA_BUILD_THREADS to bound it based on guideline of 2 GB
memory per core during builds. Computes cores and memory from cgroup
limits if applicable; memory is used as a bound on physical memory, as
sometimes cgroups will report a larger limit than available physical
memory.
Uses IMPALA_BUILD_THREADS for load-data.
Adds a default in case USER is unset during bootstrap, which can occur
in devcontainer.
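A minimal sketch of the bounding logic, assuming the inputs (core count, physical memory, optional cgroup memory limit) have already been read. The function names are hypothetical and the real computation lives in shell.

```python
GIB = 1024 ** 3


def effective_memory(physical_bytes, cgroup_limit_bytes=None):
    """Use the cgroup limit only when it is set and below physical memory."""
    if cgroup_limit_bytes is None:
        return physical_bytes
    return min(physical_bytes, cgroup_limit_bytes)


def bounded_build_threads(cores, mem_bytes, bytes_per_thread=2 * GIB):
    """Cap parallelism at one thread per 2 GB of memory, with a floor of one."""
    by_memory = mem_bytes // bytes_per_thread
    return max(1, min(cores, by_memory))
```

For example, a 16-core machine with 8 GB of usable memory would be limited to 4 build threads under this guideline.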
Change-Id: I87994d0464073fe2d91bc2f7c2592c012e42de71
Reviewed-on: http://gerrit.cloudera.org:8080/21200
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Maven version 3.9.7 consumed an upgraded version of the resolver
plugin that contains a fix around file locking. Issues with locking
files are seen occasionally on builds.
This patch consumes Maven 3.9.8 since it is the latest version
available at this time.
Testing was performed by running only the download code in a Redhat 8
docker container.
Change-Id: I509dd94799b99bf637a583eadc2905bc32a87c87
Reviewed-on: http://gerrit.cloudera.org:8080/21674
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Andrew Sherman <asherman@cloudera.com>
IMPALA-12212 upgraded Maven to 3.9.2 to gain access to the parallel
dependency resolver in the 3.9.x line. The Maven project has published
several new releases since 3.9.2, fixing various issues with the new
resolver, and also fixing problems with concurrent access to the
local Maven cache.
Pick up the latest version to gain access to these new fixes.
Change-Id: I726618d084f4f0737f5b876879a90c17b0c3777c
Reviewed-on: http://gerrit.cloudera.org:8080/21332
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
On RedHat 8, RpcMgrKerberizedTest cases fail with
Jan 09 14:47:03 msmith.vpc.cloudera.com krb5kdc[609624](info): TGS_REQ
(1 etypes {aes128-cts-hmac-sha1-96(17)}) 127.0.0.1: LOOKING_UP_SERVER:
authtime 0, etypes {rep=UNSUPPORTED:(0)}
impala-test/msmith.vpc.cloudera.com@KRBTEST.COM for
impala-test/msmith@KRBTEST.COM, Server not found in Kerberos database
This happens because bootstrap_system.sh adds an entry to /etc/hosts to
resolve 127.0.0.1 to hostname and puts the short hostname first. During
negotiation, Kudu RPC will call GetFQDN to retrieve the FQDN, which for
our tests running on localhost returns the short hostname.
Fixes RpcMgrKerberizedTest by swapping the order of entries added to
/etc/hosts so the FQDN comes first. This is consistent with the example
provided in https://man7.org/linux/man-pages/man5/hosts.5.html.
Avoids 'hostname -f'; on RedHat it's identical to 'hostname', and on
Ubuntu it causes this test to fail.
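The ordering follows the hosts(5) layout of address, canonical name, then aliases. A small sketch of building such a line, with a hypothetical helper name and example host names:

```python
def hosts_line(address, fqdn, short_name):
    """Build an /etc/hosts line with the canonical (fully qualified) name first."""
    names = [fqdn]
    if short_name != fqdn:
        names.append(short_name)  # Aliases follow the canonical name.
    return "{} {}".format(address, " ".join(names))
```

With the FQDN listed first, a lookup of the canonical name for 127.0.0.1 yields the fully qualified name rather than the short alias.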
Change-Id: I1eb24f9faec766e388d793408aedecdc92107185
Reviewed-on: http://gerrit.cloudera.org:8080/20876
Reviewed-by: Alexey Serbin <alexey@apache.org>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
With RHEL 8 on AWS Graviton instances,
dfs.datanode.max.locked.memory=64000 is insufficient to run
query_test/test_hdfs_caching.py::TestHdfsCaching::test_table_is_cached.
Sets dfs.datanode.max.locked.memory based on 'ulimit -l', and sets
memlock to 64MB in bootstrap_system.sh to match modern defaults and
provide space for future HDFS caching tests.
New setting can be seen in admin output like
node-1 will use ports DATANODE_PORT=31002, DATANODE_HTTP_PORT=31012,
DATANODE_IPC_PORT=31022, DATANODE_HTTPS_PORT=31032,
DATANODE_CLIENT_PORT=31042, NODEMANAGER_PORT=31102,
NODEMANAGER_LOCALIZER_PORT=31122, NODEMANAGER_WEBUI_PORT=31142,
KUDU_TS_RPC_PORT=31202, and KUDU_TS_WEBUI_PORT=31302;
DATANODE_LOCKED_MEM=65536000
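The value above is consistent with a memlock limit of 64000 expressed in units of 1024 bytes, which is how bash's 'ulimit -l' reports it. A sketch of that conversion follows; the function name and the handling of "unlimited" are assumptions for illustration.

```python
def locked_mem_bytes(ulimit_l_output):
    """Convert `ulimit -l` output (units of 1024 bytes in bash) to bytes.

    Returns None for "unlimited", leaving the choice of cap to the caller.
    """
    value = ulimit_l_output.strip()
    if value == "unlimited":
        return None
    return int(value) * 1024
```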
Change-Id: I7722ddd0c7fbd9bbd1979503952b7522b808194a
Reviewed-on: http://gerrit.cloudera.org:8080/20623
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Currently the Postgres server in the Impala mini-cluster only accepts
local connections. In dockerised-tests, Impala daemons and the Postgres
server run on different hosts, so the Postgres server is not accessible
to Impala coordinators.
This patch changes the configuration of the Postgres server so that it
accepts remote connections from hosts in the same subnetwork.
Enables query_test/test_ext_data_sources.py for dockerised-tests.
Testing:
- Passed dockerised-tests.
- Passed regular core-tests.
Change-Id: I7dfaf38bf9178cb2ec7ef15c79c17a5ab1e1c6dc
Reviewed-on: http://gerrit.cloudera.org:8080/20634
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Pre-built toolchains are identified by a TOOLCHAIN_BUILD_ID. This commit
adds an aarch64 (64-bit ARM) native-toolchain build, separate from the
x86_64 native-toolchain build, with its own environment variable set in
impala-config.sh. bootstrap_toolchain.py selects which version to use
based on 'uname -m'.
impala-config.sh also verifies that IMPALA_TOOLCHAIN_BUILD_ID_AARCH64
and IMPALA_TOOLCHAIN_BUILD_ID_X86_64 were produced from the same
native-toolchain ref by checking the 2nd token of the build ID.
Updates package version to include the architecture tag to match how
native-toolchain now names them.
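The selection and consistency check might look roughly like this. It is a sketch under assumptions: the build IDs are treated as hyphen-separated tokens whose 2nd token identifies the native-toolchain ref, and the IDs in the usage note are made up.

```python
def select_build_id(machine, build_id_x86_64, build_id_aarch64):
    """Pick the toolchain build ID for the value reported by `uname -m`."""
    if machine == "aarch64":
        return build_id_aarch64
    if machine == "x86_64":
        return build_id_x86_64
    raise ValueError("unsupported architecture: " + machine)


def same_toolchain_ref(build_id_a, build_id_b, sep="-"):
    """Compare the 2nd token of two build IDs (assumed to encode the git ref)."""
    return build_id_a.split(sep)[1] == build_id_b.split(sep)[1]
```

In Python, platform.machine() returns the same string as 'uname -m' on Linux, so select_build_id(platform.machine(), "101-abc1234", "17-abc1234") would pick the ID for the current host.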
Testing:
- successfully built on ARM, and tests passed (exceptions noted in
IMPALA-12490)
Change-Id: I9bfa7125dbc647b33041c5572d97b7f7ccad6258
Reviewed-on: http://gerrit.cloudera.org:8080/20519
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
If NATIVE_TOOLCHAIN_HOME is set, that will be used to provide the native
toolchain instead of the default in IMPALA_TOOLCHAIN. Overrides
IMPALA_TOOLCHAIN_PACKAGES_HOME and sets SKIP_TOOLCHAIN_BOOTSTRAP=true.
Adds IMPALA_TOOLCHAIN_REPO, IMPALA_TOOLCHAIN_BRANCH, and
IMPALA_TOOLCHAIN_COMMIT_HASH so everything is clear about what toolchain
is used for this Impala commit.
If NATIVE_TOOLCHAIN_HOME does not yet exist, buildall.sh will clone the
repo and checkout the commit hash mentioned above before building.
Also skips downloading Kudu if SKIP_TOOLCHAIN_BOOTSTRAP is true as Kudu
is built from native-toolchain. Normalizes aarch64 logic, which skipped
Kudu because it would always build native-toolchain locally.
Change-Id: I3a9e51b7f54c738d8cc01b32428ac88a344de376
Reviewed-on: http://gerrit.cloudera.org:8080/20267
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
The Impala system preparation script bin/bootstrap_system.sh may
run multiple times on a system. Jenkins-based precommit runs may
reuse the worker node, or a developer could just run the script
one more time.
During such a run Maven's current version is downloaded and symlinked
into /usr/local/bin. However, the script was not prepared for an already
existing symlink there, and failed if it found one. This is especially
painful for Jenkins-based runs, where such a failure fails the whole
build.
This patch fixes this annoying failure by adding -f to the `ln` command
to disregard any existing symlink.
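For readers less familiar with `ln -sf`, the behaviour relied on here is roughly "remove whatever link is already there, then create the new one". A Python sketch of that idea, with a hypothetical helper name and without handling the case where the destination is a directory:

```python
import os


def force_symlink(target, link_path):
    """Create `link_path` pointing at `target`, replacing an existing link or file."""
    # lexists() is True even for a dangling symlink, which exists() would miss.
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(target, link_path)
```

Calling it twice with different targets leaves the link pointing at the second target instead of failing on the second call.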
Change-Id: Ic057103dd770b22dfe27902d435692f54cbb9d3d
Reviewed-on: http://gerrit.cloudera.org:8080/20305
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Maven 3.9.x offers a new dependency resolver, HttpClient, which allows
downloading project dependencies in parallel.
This patch bumps the Maven version installed by bootstrap_system.sh to
v3.9.2, and adds the flags enabling the new resolver to download
dependencies (including POM files) in parallel. Parallelism is set to
10 threads.
The flags are added to a project-specific Maven setting file in the
newly created java/.mvn directory. The settings file is added to the
RAT exclusion list in bin/rat_exclude_files.txt.
The --show-version flag is added for debugging purposes.
The same flags are added to the JAMM subproject as well.
The new resolver in Maven 3.9 has also changed the warning message
emitted for missing component checksums, so the new warning string
is added to the filter in bin/mvn-quiet.sh
Unfortunately Maven 3.9 has also changed the way it responds to missing
checksum files: the resolver now emits a stack trace when checksums
cannot be determined, and missing checksums are not explicitly ignored.
Detailed documentation for the new Maven resolver in Maven 3.9.0+ is
located at:
https://maven.apache.org/guides/mini/guide-resolver-transport.html
resolver configuration reference:
https://maven.apache.org/resolver/configuration.html
Tests:
- verified in a core-mode test run with Maven 3.9.2 installed
- verified in a local build using an earlier version of Maven
to verify that the new default setting does not cause regressions
with the old dependency resolver.
Change-Id: I75d05215effc724f5bd471646fb352f37443e185
Reviewed-on: http://gerrit.cloudera.org:8080/20142
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
This patch is based on a previous patch contributed by Shant Hovsepian:
https://gerrit.cloudera.org/c/16612/
It adds a new option, -package, to buildall.sh for building a package
for the current OS type (e.g. CentOS/Ubuntu). You can also use
"make/ninja package" to build the package. Scripts for launching the
services and the required configuration files are also added.
Tests:
- Built on Ubuntu 18.04/20.04 and CentOS 7 using
./buildall.sh -noclean -skiptests -release -package
- Deployed the RPM package on a CDP cluster. Verified the scripts.
- Deployed the DEB package on a docker container. Verified the scripts.
Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82
Reviewed-on: http://gerrit.cloudera.org:8080/18939
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for Redhat 9 / Ubuntu 22. It updates
to a newer toolchain that has those builds, and it adds
supporting code in bootstrap_system.sh.
Redhat 9 and Ubuntu 22 use python = python3, which requires
various changes to build scripts and tests. Ubuntu 22 uses
Python 3.10, which deprecates certain ssl.PROTOCOL_TLS* constants, so
this adapts test_client_ssl.py to that change until it
can be fully addressed in IMPALA-12219.
Various OpenSSL methods have been deprecated. As a workaround
until these can be addressed properly, this specifies
-Wno-deprecated-declarations. This can be removed once the
code is adapted to the non-deprecated APIs in IMPALA-12226.
Impala crashes with tcmalloc errors unless we update to a newer
gperftools, so this moves to gperftools 2.10. gperftools changed
the default for tcmalloc.aggressive_memory_decommit to off, so
this adapts our code to set it for backend tests. The gperftools
upgrade does not show any performance regression:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(42) | parquet / none / none | 3.08 | -0.64% | 2.20 | -0.37% |
+----------+-----------------------+---------+------------+------------+----------------+
With newer Python versions, the impala-virtualenv command
fails to create a Python 3 virtualenv. This switches to
using Python 3's builtin venv command for Python >=3.6.
Kudu needed a newer version and LLVM required a couple patches.
Testing:
- Ran a core job on Ubuntu 22 and Redhat 9. The tests run
to completion without crashing. There are test failures
that will be addressed in follow-up JIRAs.
- Ran dockerised tests on Ubuntu 22.
- Ran dockerised tests on Ubuntu 20 and Rocky 8.5.
Change-Id: If1fcdb2f8c635ecd6dc7a8a1db81f5f389c78b86
Reviewed-on: http://gerrit.cloudera.org:8080/20073
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This removes a few stray lsb_release references in distcc
scripts and the install_docker.sh script. It then removes
the redhat-lsb package from the list of installed packages.
Testing:
- Ran a build on Rocky 8.5
- Ran dockerised tests on Ubuntu 20
Change-Id: I9d84e9ab8076fd8cc4727a5da118d9a747d4a005
Reviewed-on: http://gerrit.cloudera.org:8080/20071
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Experimental builds on systems based on Red Hat Linux v8.x revealed that
snappy-devel is installed only for RedHat-based systems in
bin/bootstrap_system.sh (the script that preps a workstation for local
Impala development). The same library was not installed for Ubuntu
variants -- probably because Impala has been using Snappy from the
toolchain for a long time now.
Subsequent tests revealed that the build and dataload phases can
complete successfully on Red Hat Linux v8.6 even in the absence of this
package, so this patch removes the installation of snappy-devel during
system preparation.
Tested by running bin/bootstrap_system.sh on a newly minted private VM
instance running RedHat Linux 8.6, then running
buildall.sh -skiptests -format -testdata
successfully.
Change-Id: I6b14e09fa78d51a387a066eb04495f758430fa9d
Reviewed-on: http://gerrit.cloudera.org:8080/20021
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'. The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.
This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.
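The precedence between the two variables can be sketched as below. The dictionary of versioned JDK locations and the paths in the usage note are stand-ins for the directory search the real script performs.

```python
def resolve_java_home(env, versioned_homes, system_java_home):
    """Pick JAVA_HOME: explicit override first, then the requested JDK version.

    `versioned_homes` maps a version string such as "8" or "11" to a directory;
    the real script searches OS-specific locations instead of using a dict.
    """
    override = env.get("IMPALA_JAVA_HOME_OVERRIDE")
    if override:
        return override
    version = env.get("IMPALA_JDK_VERSION", "system")
    if version == "system":
        return system_java_home
    if version not in versioned_homes:
        raise ValueError("no JDK found for IMPALA_JDK_VERSION=" + version)
    return versioned_homes[version]
```

For instance, resolve_java_home({"IMPALA_JDK_VERSION": "11"}, {"11": "/hypothetical/jdk-11"}, "/hypothetical/system-jdk") returns the Java 11 directory, while adding IMPALA_JAVA_HOME_OVERRIDE to the environment takes priority over both.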
This also updates the versions of Maven plugins related to the build.
Source and target releases are still set to Java 8 compatibility.
Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with
JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
run-all-tests.sh
Testing: ran test suite with Java 11
This reverts the revert commit 1b6011c, restoring these changes minus
code to update IMPALA_JDK_VERSION based on $JAVA -version as that could
break subsequent sourcing of impala-config.sh.
Change-Id: Ie16504ad5738b1f228f97044afd3d9017ccc6c53
Reviewed-on: http://gerrit.cloudera.org:8080/19928
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'. The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.
This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.
This also updates the versions of Maven plugins related to the build.
Source and target releases are still set to Java 8 compatibility.
Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with
JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
run-all-tests.sh
Testing: ran test suite with Java 11
Change-Id: I15d309e2092c12d7fdd2c99b727f3a8eed8bc07a
Reviewed-on: http://gerrit.cloudera.org:8080/19539
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This is required because ARM builds of the toolchain binaries are not
(yet) present in s3://native-toolchain, so ARM builds have to build the
toolchain locally, before being able to build Impala.
The toolchain's Python build failed before this patch, because it missed
the libreadline-dev package, which is needed for Python's readline support.
This patch adds it and its libncurses dependency.
Change-Id: I1bf6193027d691d3ded727cb59424c5dc9963ea9
Reviewed-on: http://gerrit.cloudera.org:8080/19835
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
IMPALA-9999 upgrades GCC to version 10.4, which generates a new gcov
format that the current gcovr version (3.4) can't parse. This patch
upgrades gcovr to the latest Python2-compatible version (4.2). Also adds
Jinja2, MarkupSafe and lxml as the required dependent packages. The
development packages of libxml2 and libxslt are also added in
bootstrap_system.sh and bootstrap_build.sh.
This patch also fixes a failure due to the gcov executable not found in
PATH.
Tests:
- Verified builds on Ubuntu 16.04 and CentOS 7.9
- Verified coverage_helper.sh works after this patch
Change-Id: I9458fa0dc97d69f88a4e8a3313dc9440215dfd52
Reviewed-on: http://gerrit.cloudera.org:8080/19226
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In IMPALA-11492, ExprTest.Utf8MaskTest was failing on some
configurations because the en_US.UTF-8 locale was missing. Since the
Docker images don't contain en_US.UTF-8, they are subject
to the same bug. This was confirmed by adding test cases
to the test_utf8_strings.py end-to-end test and running it
in the dockerized tests.
This adds the appropriate language pack to the list of packages
installed for the Docker build.
Testing:
- This adds end-to-end tests to test_utf8_strings.py covering the
same cases that were failing in ExprTest.Utf8MaskTest. They
failed without the added language packs, and now succeed.
Change-Id: I353f257b3cb6d45f7d0a28f7d5319fdb457e6e3d
Reviewed-on: http://gerrit.cloudera.org:8080/19080
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Machines that don't have en_US.UTF-8 installed see
issues when running ExprTest.Utf8MaskTest.
This currently impacts the Docker-based tests.
This installs the appropriate language packs
to have en_US.UTF-8 installed.
Testing:
- Ran docker-based tests and verified that
ExprTest.Utf8MaskTest passes.
Change-Id: I1b8696190e4713bda787e773d48943b5dfc6335e
Reviewed-on: http://gerrit.cloudera.org:8080/18875
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
bootstrap_system.sh prepopulates the .m2 directory by downloading an
800M+ m2_archive.tar.gz to speed up the subsequent Maven packaging
process, although this is not strictly necessary. Depending on the
network environment, downloading the archive can be slow.
This adds an environment variable 'PREPOPULATE_M2_REPOSITORY' to control
whether to prepopulate the m2 directory or not, which is true by
default.
Testing:
- manually run './bin/bootstrap_system.sh' and expect to see the log
'>>> Populating m2 directory...' and 'Downloading m2 archive from ...',
and the terminal returned after a while.
- manually run 'PREPOPULATE_M2_REPOSITORY=true ./bin/bootstrap_system.sh'
and expect to see the log '>>> Populating m2 directory...' and
'Downloading m2 archive from ...', and the terminal returned after a while.
- manually run 'PREPOPULATE_M2_REPOSITORY=false ./bin/bootstrap_system.sh'
and expect to see the log ">>> Skip populating m2 directory", and
the terminal returned immediately.
Change-Id: Ie3ac55099f326e2abe2f7dc66c08ad7023cb6baf
Reviewed-on: http://gerrit.cloudera.org:8080/18740
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
TestKrpcSocket uses netstat as part of its test. netstat
is provided by the net-tools package on Ubuntu and Centos.
This adds that as a dependency in bootstrap_system.sh.
Docker-based tests had been hitting this test failure,
because they start from a clean docker image.
Testing:
- Ran docker-based tests and TestKrpcSocket now passes
Change-Id: I9ad704e408d4ca4741178d4ea7a857bf30d4cfb6
Reviewed-on: http://gerrit.cloudera.org:8080/18774
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Build Python 3 eggs for the shell tarball so it works with both Python 2
and Python 3. The impala-shell script selects eggs based on the
available Python version.
Inlines thrift for impala-shell so we can easily build Python 2 and
Python 3 versions, consistent with other libraries. The impala-shell
version should always be at least as new as IMPALA_THRIFT_PY_VERSION.
Thrift 0.13.0+ wraps all exceptions during TSocket read/write operations
in TTransportException. Specifically socket.error that we got as raw
exceptions are now wrapped. Unwraps them before raising to preserve
prior behavior.
A specific Python version can be selected with IMPALA_PYTHON_EXECUTABLE;
otherwise it will use 'python', and if unavailable try 'python3'.
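The interpreter fallback described here can be sketched as follows. This is an illustration in Python rather than the launcher's actual shell logic, and the injectable `which` parameter exists only to make the sketch easy to exercise.

```python
import shutil


def pick_python(env, which=shutil.which):
    """Return the interpreter to run: explicit choice, else python, else python3."""
    explicit = env.get("IMPALA_PYTHON_EXECUTABLE")
    if explicit:
        return explicit
    for candidate in ("python", "python3"):
        if which(candidate):
            return candidate
    raise RuntimeError("no Python interpreter found on PATH")
```

On a system that only ships python3, pick_python(os.environ) falls through to 'python3' unless IMPALA_PYTHON_EXECUTABLE names something else.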
Adds tests for impala-shell tarball with Python 3.
Change-Id: I94f86de9e2a6303151c2f0e6454b5f629cbc9444
Reviewed-on: http://gerrit.cloudera.org:8080/18653
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala started adding Ubuntu 20.04 support in various places.
This patch extends bootstrap_config.sh for Ubuntu 20.04 coverage:
1. The runtime version check error message is updated to claim support
for Ubuntu 20.04.
2. Kudu needs libtinfo.5.so on Ubuntu 20.04 for the minicluster binaries.
bin/bootstrap_system.sh now installs it when running on Ubuntu 20.04.
3. The OpenJDK default version reset to JDK 8 is extended to Ubuntu 20.04.
Tested by running the code using docker/test-with-docker.py using
--base-image=ubuntu:20.04 and observing that Kudu was able to start in
the minicluster. The test runs completed, but there were test failures,
for which separate tickets will be filed.
Change-Id: I212f6df3657cf9d621a0669573e1e511eae13662
Reviewed-on: http://gerrit.cloudera.org:8080/17240
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Centos 8.3 changed package repo ID capitalization from MixedCase
to all lowercase. On Centos 8 snappy-devel is installed from the
PowerTools repo, which is not enabled by default, so the install command
has to enable it temporarily using the repo ID.
The capitalization change broke bootstrap_system.sh, failing builds
on Centos 8.
The patch changes the `dnf install` call to use a glob pattern
for the PowerTools repo ID to cover the naming conventions in all
Centos 8.x releases.
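To illustrate how one glob can cover both spellings, here is a small sketch using Python's fnmatch module. The pattern below is a hypothetical example, not necessarily the one used in bootstrap_system.sh, and dnf's own glob matching is assumed to behave like shell-style globbing.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern covering both the MixedCase and lowercase repo IDs.
POWERTOOLS_GLOB = "[Pp]ower[Tt]ools"


def matches_powertools(repo_id):
    """Return True if repo_id matches the glob, compared case-sensitively."""
    return fnmatchcase(repo_id, POWERTOOLS_GLOB)
```

With this pattern, both "PowerTools" and "powertools" match, while unrelated repo IDs such as "appstream" do not.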
Change-Id: I224beb1189ce25ae66ecd78d70757537e117805a
Reviewed-on: http://gerrit.cloudera.org:8080/16844
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Increase the apt install retry count to 30 in bootstrap_system.sh,
because the installs have been timing out frequently.
Also add handling that waits for apt's lock-frontend.
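The retry-and-wait behavior can be sketched like this. It is written in Python for illustration, while the real logic lives in a shell script. The names run_with_retries and wait_for_lock, the `is_locked` predicate, and the delay values are all hypothetical.

```python
import time


def wait_for_lock(is_locked, poll_seconds=1.0, sleep=time.sleep):
    """Block until is_locked() reports that the lock is free."""
    while is_locked():
        sleep(poll_seconds)


def run_with_retries(command, is_locked, attempts=30, delay_seconds=1.0,
                     sleep=time.sleep):
    """Run command() up to `attempts` times; return True on the first success."""
    for attempt in range(1, attempts + 1):
        # Do not start an install while another process holds the lock.
        wait_for_lock(is_locked, sleep=sleep)
        if command():
            return True
        if attempt < attempts:
            sleep(delay_seconds)
    return False
```

Here `command` stands for one install attempt that returns True on success, and `is_locked` for a check of whether apt's frontend lock is currently held.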
Change-Id: Id664dd66874ac65d6b78e630c974a6a563408147
Reviewed-on: http://gerrit.cloudera.org:8080/16751
Reviewed-by: Jim Apple <jbapple@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Ubuntu 20.04
This is a minor amendment to a previously merged change with
ChangeId I4f592f60881fd8f34e2bf393a76f5a921505010a, to address
additional review comments. In particular, the original commit
referred to Ubuntu 20.4 whereas it should have used Ubuntu 20.04.
Change-Id: I7db302b4f1d57ec9aa2100d7589d5e814db75947
Reviewed-on: http://gerrit.cloudera.org:8080/16241
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This includes the following changes:
1. Build native-toolchain locally via a script on the aarch64 platform.
2. Change the version numbers of some native-toolchain libraries.
3. Split SKIP_TOOLCHAIN_BOOTSTRAP and DOWNLOAD_CDH_COMPONENTS into two
separate settings, because aarch64 only needs to download the CDP
components and does not need to download the toolchain.
4. Download the Hadoop aarch64 native libraries, which the Impala build needs.
With this commit, on Ubuntu 18.04 aarch64 it is enough to run
bin/bootstrap_development.sh, just as on x86.
Change-Id: I769668c834ab0dd504a822ed9153186778275d59
Reviewed-on: http://gerrit.cloudera.org:8080/16065
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Ubuntu 20.04
This work addresses a current limitation of the Impala development
environment: Ubuntu 20.04 is not supported. The fix modifies
bootstrap_system.sh and bootstrap_toolchain.py to specifically
allow bootstrapping the development environment on a machine
running Ubuntu 20.04. Limited use shows that the environment is useful
and stable, similar to the one running on Ubuntu 18.04.
Testing on a box running Ubuntu 20.04:
1. Successfully bootstrapped the entire Impala development environment
2. Interacted with the environment through the following tools:
gdb
jdb
clang-format
impalad GUI
vim
3. Ran all tests
Limitations found with the Ubuntu 20.04 environment:
1. gdb in Impala toolchain is not compatible with Impala C++ test
code ${IMPALA_HOME}/be/build/latest/service\
/unifiedbetests (invoked by ${IMPALA_HOME}/be/build/latest/\
scheduling/admission-controller-test) and reports the following
error, after attaching to the test process.
BFD (GNU Binutils) 2.25.51 internal error, aborting at elf64-x86-64.c
line 5583 in elf_x86_64_get_plt_sym_val
Change-Id: I4f592f60881fd8f34e2bf393a76f5a921505010a
Reviewed-on: http://gerrit.cloudera.org:8080/16238
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Ant released a new version in May 2020, which made the URL in
bootstrap_system.sh obsolete. At the same time Apache created new rules
for the download locations, moving older releases to archive.apache.org.
This patch changes the download URLs for Maven and Ant to point to the
stable locations at archive.apache.org. These locations don't change
when a new version of a project is released, so downloads pulling a
specific version will not be affected by a new release. At the same time
new releases are stored at the archive site as well, so this location
works for all versions.
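As an illustration of version-pinned download locations, the sketch below builds archive URLs from a version string. The directory layout reflects how archive.apache.org is commonly organized and should be treated as an assumption; the helper names and any version passed in are examples, not values taken from bootstrap_system.sh.

```python
ARCHIVE_BASE = "https://archive.apache.org/dist"


def maven_url(version):
    """Build the archive URL for a Maven binary tarball (assumed layout)."""
    major = version.split(".")[0]
    return "{0}/maven/maven-{1}/{2}/binaries/apache-maven-{2}-bin.tar.gz".format(
        ARCHIVE_BASE, major, version)


def ant_url(version):
    """Build the archive URL for an Ant binary tarball (assumed layout)."""
    return "{0}/ant/binaries/apache-ant-{1}-bin.tar.gz".format(
        ARCHIVE_BASE, version)
```

Because the URL depends only on the requested version, a newer upstream release does not change where an older tarball is found.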
Change-Id: I1875f260b931ef096fc91a4723f91310225c55c9
Reviewed-on: http://gerrit.cloudera.org:8080/16062
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This removes Impala-lzo from the Impala development environment.
Impala-lzo is not built as part of the Impala build. The LZO plugin
is no longer loaded. LZO tables are not loaded during dataload,
and LZO is no longer tested.
This removes some obsolete scan APIs that were only used by Impala-lzo.
With this commit, Impala-lzo would require code changes to build
against Impala.
The plugin infrastructure is not removed, and this leaves some
LZO support code in place. If someone were to decide to revive
Impala-lzo, they would still be able to load it as a plugin
and get the same functionality as before. This plugin support
may be removed later.
Testing:
- Dryrun of GVO
- Modified TestPartitionMetadataUncompressedTextOnly's
test_unsupported_text_compression() to add LZO case
Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Reviewed-on: http://gerrit.cloudera.org:8080/15814
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This adds a script to find an appropriate m2 archive
tarball, download it, and use it to prepopulate the
~/.m2 directory.
The script uses the JSON interface for Jenkins to search through
the all-build-options-ub1604 builds on jenkins.impala.io to
find one that:
1. Is building the "master" branch
2. Has the m2_archive.tar.gz
Then, it downloads the m2 archive and uses it to populate ~/.m2.
It does not overwrite or remove any files already in ~/.m2.
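The "populate without overwriting" step could look roughly like the sketch below. This is not the code of populate_m2_directory.py; the function name extract_without_overwrite is hypothetical, and the sketch only handles regular files.

```python
import os
import shutil
import tarfile


def extract_without_overwrite(archive_path, dest_dir):
    """Extract regular files into dest_dir, skipping files that already exist.

    Returns the list of member names that were written.
    """
    dest_root = os.path.realpath(dest_dir)
    written = []
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Skip members that would land outside dest_dir.
            if os.path.commonpath([dest_root, target]) != dest_root:
                continue
            # Never replace a file the developer already has.
            if os.path.exists(target):
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(member.name)
    return written
```

Existing files are left untouched, so a partially populated ~/.m2 only gains the entries it was missing.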
The build scripts that call populate_m2_directory.py do not
rely on the script succeeding. They will continue even if
the script fails.
This also modifies the build-all-flag-combinations.sh script
to only build the m2 archive if the GENERATE_M2_ARCHIVE
environment variable is true. GENERATE_M2_ARCHIVE=true will
clear out the ~/.m2 directory to build an accurate m2 archive.
Precommit jobs will use GENERATE_M2_ARCHIVE=false, which
will allow them to use the m2 archive to speed up the build.
Testing:
- Ran gerrit-verify-dryrun
- Tested locally
Change-Id: I5065658d8c0514550927161855b0943fa7b3a402
Reviewed-on: http://gerrit.cloudera.org:8080/15735
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This removes LD_LIBRARY_PATH and LD_PRELOAD from the
developer's shell and cleans it up. With the preceding
change, toolchain utilities like clang can be run without
a special LD_LIBRARY_PATH.
This fixes a bug where libjvm.so was registered as a
static instead of a shared library. Registering it as a
shared library adds its location to the RUNPATH variable
in the binary, which provides a default search location
that can be overridden by LD_LIBRARY_PATH.
Impala binaries don't have the rpath baked in for some
libraries, including Impala-lzo, libgcc and libstdc++,
so we still need to set LD_LIBRARY_PATH when running
those. That is solved with wrapper scripts that set
the environment variables only when invoking those
binaries, e.g. starting a daemon or running a backend
test. I added three scripts because there were three sets
of environment variables. The scripts are:
* run-binary.sh: just sets LD_LIBRARY_PATH
* run-jvm-binary.sh: sets LD_LIBRARY_PATH and CLASSPATH
* start-daemon.sh: sets LD_LIBRARY_PATH and CLASSPATH and
kerberos-related environment variables.
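The wrapper idea, setting the variable only for the child process and not for the developer's shell, can be sketched in Python as follows. The real wrappers are shell scripts; the function names and the library directories passed in are hypothetical.

```python
import os
import subprocess


def env_with_library_path(lib_dirs, base_env=None):
    """Return a copy of the environment with lib_dirs prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    parts = list(lib_dirs)
    if env.get("LD_LIBRARY_PATH"):
        parts.append(env["LD_LIBRARY_PATH"])
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env


def run_binary(argv, lib_dirs):
    """Run argv with the adjusted environment; the caller's environment is untouched."""
    return subprocess.run(argv, env=env_with_library_path(lib_dirs)).returncode
```

Only the launched process sees the modified LD_LIBRARY_PATH, which mirrors the goal of keeping the developer's shell clean.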
The binaries, in almost all cases, work fine without
those tweaks, because libstdc++ and libgcc are picked
up along with libkuduclient.so from the toolchain (they
are in the same directory). I decided to leave good enough
alone here. run-binary.sh and friends can be used in
any remaining edge cases to run binaries.
An alternative to the 3 scripts would be to have an
uber-script that set all the variables, but I felt
that it was better to be specific about what
each binary needed. Cleaning the LD_LIBRARY_PATH
mess up has given me a distaste for scattershot
setting of environment variables. I am open to
revisiting this.
Testing:
* Ran tests on centos 7
* Manually tested that my dev env with
LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu continued
to work (for now). All ubuntu 16.04 and 18.04 dev
envs that were set up with bootstrap_development.sh
will be in this state.
Change-Id: I61c83e6cca6debb87a12135e58ee501244bc9603
Reviewed-on: http://gerrit.cloudera.org:8080/14494
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
CentOS 8.1 is a new major version of the CentOS family.
It is now stable and popular enough to start supporting it for Impala
development.
Prepare a raw CentOS 8.1 system to support Impala development and testing.
This should work on a standalone computer, on a virtual machine,
or inside a Docker container.
Details:
- snappy-devel moved to the PowerTools repo, so it needs to be installed
from there
- CentOS 8 has no default Python version. The bootstrap script installs
(or configures) Python2 with pip2, then makes them the default via the
"alternatives" mechanism. The installer is adaptive: it performs only
the necessary steps, so it works in various environments.
The installer logic is also shared between bin/bootstrap_system.sh and
docker/entrypoint.sh
- The toolchain package tag "ec2-centos-8" is added to
bootstrap_toolchain.py
- For some unknown reason, when the downloaded Maven tarball is extracted
in a Docker-based test, the "bin" and "boot" directories are created
with owner-only permissions. The 'impdev' user then has no access to the
maven executable, which breaks the build.
This patch forcibly restores the correct permissions on these
directories; this is a no-op when the extraction happens correctly.
- TOOLCHAIN_ID is bumped to a build that already has CentOS 8 binaries.
- Centos8-specific bootstrap code was added to the Docker-based tests.
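The Maven permission fix from the list above can be sketched as an idempotent operation. This is an illustration in Python, not the shell code from the patch; the function name make_dirs_accessible and the exact permission bits are assumptions.

```python
import os
import stat

# Read and execute (traverse) bits for group and others.
READ_EXEC_ALL = stat.S_IRGRP | stat.S_IXGRP | stat.S_IROTH | stat.S_IXOTH


def make_dirs_accessible(root, names=("bin", "boot")):
    """Add group/other read+execute bits to the named directories under root.

    Returns the names whose mode actually changed, so a correctly extracted
    tree yields an empty list.
    """
    changed = []
    for name in names:
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue
        mode = stat.S_IMODE(os.stat(path).st_mode)
        wanted = mode | READ_EXEC_ALL
        if wanted != mode:
            os.chmod(path, wanted)
            changed.append(name)
    return changed
```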
Tested:
- ran the Docker-based tests with --base-image=centos:8 to verify the following build
phases are successful:
* system prep
* build
* dataload
and that tests can start. Passing all tests was not a requirement for this step,
although plausible test results (i.e. not all of the tests fail) were.
- ran the Docker-based tests to verify nonregression with --base-image set to the
following: centos:7, ubuntu:16.04, ubuntu:18.04.
On centos:7 and ubuntu:16.04 the only failure was IMPALA-9097 (BE tests fail without
the minicluster running); ubuntu:18.04 showed the same failures as the current upstream
code.
- passed a core-mode test run on private infrastructure on Centos 7.4
- ran buildall.sh in core mode manually inside a Docker container, simulating a developer
workflow (prep-build-dataload-test). There were several observed test failures, but
the workflow itself was run to completion with no problems.
Change-Id: I3df5d48eca7a10219264e3604a4f05f072188e6e
Reviewed-on: http://gerrit.cloudera.org:8080/15623
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Automatically assume IMPALA_HOME is the source directory
in a couple of places.
Delete the cache_tables.py script and the MINI_DFS_BASE_DATA_DIR
config var, both of which had bit-rotted and were unused.
Allow setting IMPALA_CLUSTER_NODES_DIR to put the minicluster
nodes, most importantly the data, in a different location, e.g.
on a different filesystem.
Testing:
I set up a dev environment using this code and was able to
load data and run some tests.
Change-Id: Ibd8b42a6d045d73e3ea29015aa6ccbbde278eec7
Reviewed-on: http://gerrit.cloudera.org:8080/15687
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>