Commit Graph

82 Commits

Author SHA1 Message Date
Joe McDonnell
1913ab46ed IMPALA-14501: Migrate most scripts from impala-python to impala-python3
To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3
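
The effect of the chained grep filters above can be sketched in Python. This is only an illustration of the filtering logic, not code from the patch; the function name and the sample paths in the usage note are hypothetical.

```python
# Substrings that mark a line as already migrated or as shared infrastructure.
# These mirror the `grep -v` patterns in the first command above.
EXCLUDED = ("impala-python3", "impala-python-common", "init-impala-python")


def remaining_python2_refs(grep_lines):
    """Return the lines that mention impala-python but none of the excluded names.

    Like `grep -v`, the exclusion applies to the whole line, so a line that
    mentions both impala-python and impala-python3 is dropped.
    """
    return [
        line
        for line in grep_lines
        if "impala-python" in line and not any(name in line for name in EXCLUDED)
    ]
```

For example, given one line ending in `impala-python` and one ending in `impala-python3`, only the first is returned as a location that still needs migrating.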

This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
   doesn't have a main function, it removes the hash-bang and makes
   sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
   (or python to python3) with minimal changes.
3. The cm-api pypi package (which doesn't support Python 3) has been
   replaced by the cm-client pypi package and interfaces have changed.
   Rather than migrating the code (which hasn't been used in years), this
   deletes the old code and stops installing cm-api into the virtualenv.
   The code can be restored and revamped if there is any interest in
   interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
   bit-rotted. Some pieces can be run manually, but it can't be fully
   verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces locations of impala-python in comments / documentation /
   READMEs.
6. kazoo (used for interacting with HBase) needed to be upgraded to a
   version that supports Python 3. The newest version of kazoo requires
   upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
   needing other upgrades.

The two remaining uses of impala-python are:
 - bin/cmake_aux/create_virtualenv.sh
 - bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.

The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python).

Testing:
 - Ran core job
 - Ran build + dataload on Centos 7, Redhat 8
 - Manual testing of individual scripts (except some bitrotted areas like the
   random query generator)

Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-10-22 16:30:17 +00:00
Laszlo Gaal
89d2b23509 IMPALA-14139: Enable Impala builds on Ubuntu 24.04
Update the following elements of the Impala build environment to enable
builds on Ubuntu 24.04:

- Recognize and handle (where necessary) Ubuntu 24.04 in various
  bootstrap scripts (bootstrap_system.sh, bootstrap_toolchain.py, etc.)
- Bump IMPALA_TOOLCHAIN_ID to an official toolchain build that contains
  Ubuntu 24.04-specific binary packages
- Bump binutils to 2.42, and
- Bump the GDB version to 12.1-p1, as required by the new toolchain
  version
- Update unique_ptr usage syntax in be/src/util/webserver-test.cc to
  compensate for new GLIBC function prototypes:

System headers in Ubuntu 24.04 adopted attributes on several widely
used function prototypes. Such attributes are not considered to be part
of the function's signature during template evaluation, so GCC throws a
warning when such a function is passed as a template argument, which
breaks the build, as warnings are treated as errors.

webserver-test.cc uses pclose() as the deleter for a unique_ptr in a
utility function. This patch encapsulates pclose() and its attributes in
an explicit specialization for std::default_delete<>, "hiding" the
attributes inside a functor.

The particular solution was inspired by Anton-V-K's proposal in
https://gist.github.com/t-mat/5849549

This commit builds on an earlier patch for the same purpose by Michael
Smith: https://gerrit.cloudera.org/c/23058/

Change-Id: Ia4454b0c359dbf579e6ba2f9f9c44cfa3f1de0d2
Reviewed-on: http://gerrit.cloudera.org:8080/23384
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-09-15 16:10:42 +00:00
Csaba Ringhofer
d630d6f8af IMPALA-13802: Ignore error during postgres init
This is a normal error in case postgres init was already called
on the machine.

Change-Id: I67ce40e4c12a7318ad7deb7e796b3e5bd5b4bfd4
Reviewed-on: http://gerrit.cloudera.org:8080/22989
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-06-06 14:11:52 +00:00
Joe McDonnell
b21e0f031b IMPALA-14125: Avoid downloading maven from archive.apache.org
Some builds failed because archive.apache.org was unreachable.
This gets the maven download from the native toolchain s3 bucket
instead.

Change-Id: Ib1eec38f12209abf88c1cf2976db47d0ca04ab5b
Reviewed-on: http://gerrit.cloudera.org:8080/22973
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-05 23:07:30 +00:00
Riza Suminto
bfa4402c13 IMPALA-13956: Move java version check after yum install
IMPALA-13920 added a java and javac version check to bootstrap_system.sh.
This patch moves that check after the 'sudo yum install' command so that
Impala builds on Redhat machines can work.

Testing:
Built Impala using Jenkins on a Redhat 8.6 machine.

Change-Id: I25b26c146bf13138741272cd73727e7244462249
Reviewed-on: http://gerrit.cloudera.org:8080/22772
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-14 11:57:00 +00:00
Csaba Ringhofer
6f2d9a24d8 IMPALA-13920: Allow running minicluster with Java 17
IMPALA-11941 allowed building Impala and running tests with Java 17,
but it still uses Java 8 for minicluster components (e.g. Hadoop) and
skips several tests that would restart Hive. It should be possible to
use 17 for everything to be able to deprecate Java 8.

This patch mainly fixes Yarn+Hive+Tez startup issues with java 17 by
setting JAVA_TOOL_OPTIONS.

Another issue fixed is KuduHMSIntegrationTest: this test fails to
restart Kudu due to a bug in OpenJDK (see IMPALA-13856). The current
fix is to remove LD_PRELOAD to avoid loading libjsig (similarly to
the case when MINICLUSTER_JAVA_HOME is set). This works, but it
would be nice to clean up this area in a future patch.

Testing:
- ran exhaustive tests with Java 17
- ran core tests with default Java 8

Change-Id: If58b64a21d14a4a55b12dfe9ea0b9c3d5fe9c9cf
Reviewed-on: http://gerrit.cloudera.org:8080/22705
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2025-04-04 17:50:01 +00:00
Csaba Ringhofer
e49ed3d243 IMPALA-13790: Fix test_wildcard_san_ssl / test_wildcard_ssl
These tests failed in various ways depending on OS/openssl version.
An issue identified is that the certificates contained CN=* while
wildcard subject should be like *.<domain>. Recreated wildcard
certs with *.impala.test common name and added some host names
that match them in bootstrap_system.sh.
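
Why CN=* fails while *.impala.test works can be illustrated with a simplified left-most-label wildcard check. This is a sketch of the general rule only, and real TLS libraries differ in the details; the function name and the hostnames in the usage note are hypothetical.

```python
def wildcard_matches(pattern, hostname):
    """Return True if a certificate name pattern matches hostname.

    Only a whole left-most "*" label is treated as a wildcard, and it
    matches exactly one DNS label.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    first = pattern_labels[0]
    # A wildcard needs a domain part after it: a bare "*" is rejected.
    if first == "*" and len(pattern_labels) < 2:
        return False
    # The wildcard covers one label, so the label counts must agree.
    if len(pattern_labels) != len(host_labels):
        return False
    if first != "*" and first != host_labels[0]:
        return False
    return pattern_labels[1:] == host_labels[1:]
```

Under this rule, `wildcard_matches("*.impala.test", "host1.impala.test")` is true, while a bare `*` pattern matches nothing.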

Removed the @xfail from the tests as my expectation is that they
should work on all supported OS.

Tested on
- Ubuntu 20.04 / OpenSSL 1.1.1f
- Ubuntu 22.04 / OpenSSL 3.0.2
- RHEL 7.9     / OpenSSL 1.0.2k
- RHEL 8.6     / OpenSSL 1.1.1k
- Rocky 9.2    / OpenSSL 3.2.2

Change-Id: Ieedf682d06bdb6f8f68a5f77e41175e895b77ca9
Reviewed-on: http://gerrit.cloudera.org:8080/22569
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-06 19:42:24 +00:00
Csaba Ringhofer
a4f303ea55 IMPALA-13802: move postgresql init to the end of bin/bootstrap_system.sh
postgresql initialization can fail if run a second time:
sudo service postgresql initdb
3 ERROR: Data directory /var/lib/pgsql/data is not empty!

This can lead to skipping the rest of the script - a
quick fix is to deal with postgresql at the end.

Change-Id: I55e862ebe3b823e4aeaaa656d5536b6317b5e19c
Reviewed-on: http://gerrit.cloudera.org:8080/22550
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-27 19:26:05 +00:00
Csaba Ringhofer
988d353e02 IMPALA-13693: Fix load-ext-data-sources.sh on Rocky 9.5
Two permission issues caused this dataload step to fail:
- Lack of X permission on home directory (seems linux specific).
- LOAD statement has no right to use \tmp for some reason - using
  \LOAD instead solves this. I don't know what postgres/configuration
  change caused this.

Testing:
- dataload and ext-data-source related tests passed on Rocky Linux 9.5

Change-Id: I3829116f4c6d6f6cba2da824cd9f31259a15ca1b
Reviewed-on: http://gerrit.cloudera.org:8080/22383
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2025-01-25 11:52:02 +00:00
Joe McDonnell
8dd935a98f IMPALA-13558: Workaround Python 2 tarfile issue by patching tarfile.py
Ubuntu 20.04 introduced a bug in their Python 2.7's tarfile
functionality with 2.7.18-1~20.04.5. See
https://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/2089071
This breaks the Impala build with a message like
"tarfile.ReadError: invalid header".

The bug has an attached tarfile.py with a workaround for the
issue. This changes bootstrap_system.sh and bootstrap_build.sh to
detect the bad tarfile.py and replace it with the patched tarfile.py.
Since this is comparing the hash, this will become a no-op once
Ubuntu fixes the issue.
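
The detect-and-replace step described above can be sketched as follows. This is a minimal illustration, not the code from the patch: the use of SHA-256, the function names, and the parameters are all assumptions.

```python
import hashlib
import shutil


def file_sha256(path):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replace_if_known_bad(target, known_bad_sha256, patched):
    """Copy patched over target only when target matches the known-bad hash.

    Returns True if the file was replaced. Once the distribution ships a
    fixed file, its hash no longer matches and this does nothing.
    """
    if file_sha256(target) != known_bad_sha256:
        return False
    shutil.copyfile(patched, target)
    return True
```

Keying the replacement on the exact hash of the broken file is what makes the workaround safe to leave in place after a fix is released.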

Testing:
 - Ran a build on Ubuntu 20.04

Change-Id: I1d0691611cf53ae6dd1099b97f0aa15b450e0996
Reviewed-on: http://gerrit.cloudera.org:8080/22088
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-21 07:55:48 +00:00
Michael Smith
131f0c74a3 IMPALA-12939: Bound IMPALA_BUILD_THREADS for cgroups and memory
Updates IMPALA_BUILD_THREADS to bound it based on a guideline of 2 GB
memory per core during builds. Computes cores and memory from cgroup
limits if applicable; memory is used as a bound on physical memory, as
sometimes cgroups will report a larger limit than available physical
memory.
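
The bounding rule can be sketched as below. This is an illustration under stated assumptions rather than the logic in impala-config.sh: the function and parameter names are hypothetical, and reading the actual cgroup limits is left out.

```python
GIB = 1024 ** 3


def bounded_build_threads(cpu_count, physical_mem_bytes, cgroup_mem_bytes=None,
                          gb_per_thread=2):
    """Bound the build thread count by both cores and available memory."""
    # A cgroup may report a larger limit than the machine actually has,
    # so physical memory is used as an upper bound on the cgroup limit.
    mem_bytes = physical_mem_bytes
    if cgroup_mem_bytes is not None:
        mem_bytes = min(cgroup_mem_bytes, physical_mem_bytes)
    by_memory = mem_bytes // (gb_per_thread * GIB)
    # Never go below one thread, even on very small machines.
    return max(1, min(cpu_count, by_memory))
```

With 16 cores and 8 GiB of memory this yields 4 threads, since memory rather than core count is the limiting factor.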

Uses IMPALA_BUILD_THREADS for load-data.

Adds a default in case USER is unset during bootstrap, which can occur
in devcontainer.

Change-Id: I87994d0464073fe2d91bc2f7c2592c012e42de71
Reviewed-on: http://gerrit.cloudera.org:8080/21200
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2024-09-26 17:00:05 +00:00
jasonmfehr
af5f3acd67 IMPALA-13300: Upgrade Maven to 3.9.8
Maven version 3.9.7 consumed an upgraded version of the resolver
plugin that contains a fix around file locking. Issues with locking
files are seen occasionally on builds.

This patch consumes Maven 3.9.8 since it is the latest version
available at this time.

Testing was performed by running only the download code in a Redhat 8
docker container.

Change-Id: I509dd94799b99bf637a583eadc2905bc32a87c87
Reviewed-on: http://gerrit.cloudera.org:8080/21674
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Andrew Sherman <asherman@cloudera.com>
2024-08-14 17:41:08 +00:00
Laszlo Gaal
d83b48cf72 IMPALA-13014: Upgrade Maven to 3.9.6
IMPALA-12212 upgraded Maven to 3.9.2 to gain access to the parallel
dependency resolver in the 3.9.x line. The Maven project has published
several new releases since 3.9.2, fixing various issues with the new
resolver, and also fixing problems with concurrent access to the
local Maven cache.

Pick up the latest version to gain access to these new fixes.

Change-Id: I726618d084f4f0737f5b876879a90c17b0c3777c
Reviewed-on: http://gerrit.cloudera.org:8080/21332
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-07-14 03:19:52 +00:00
Michael Smith
a7d7336531 IMPALA-12566: Fix RpcMgrKerberizedTest on RedHat 8
On RedHat 8, RpcMgrKerberizedTest cases fail with

  Jan 09 14:47:03 msmith.vpc.cloudera.com krb5kdc[609624](info): TGS_REQ
  (1 etypes {aes128-cts-hmac-sha1-96(17)}) 127.0.0.1: LOOKING_UP_SERVER:
  authtime 0, etypes {rep=UNSUPPORTED:(0)}
  impala-test/msmith.vpc.cloudera.com@KRBTEST.COM for
  impala-test/msmith@KRBTEST.COM, Server not found in Kerberos database

This happens because bootstrap_system.sh adds an entry to /etc/hosts to
resolve 127.0.0.1 to hostname and puts the short hostname first. During
negotiation, Kudu RPC will call GetFQDN to retrieve the FQDN, which for
our tests running on localhost returns the short hostname.

Fixes RpcMgrKerberizedTest by swapping the order of entries added to
/etc/hosts so the FQDN comes first. This is consistent with the example
provided in https://man7.org/linux/man-pages/man5/hosts.5.html.
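
The ordering matters because the first name on an /etc/hosts line is treated as the canonical hostname. A small sketch of building and reading such a line follows; the function names are hypothetical, as are the hostnames in the usage note.

```python
def hosts_line(ip, fqdn, short_name):
    """Build an /etc/hosts line with the fully qualified name first."""
    return f"{ip} {fqdn} {short_name}"


def canonical_name(line):
    """Return the first hostname on an /etc/hosts line, or None if absent."""
    # Drop any trailing comment, then split into the address and the names.
    fields = line.split("#", 1)[0].split()
    return fields[1] if len(fields) > 1 else None
```

For instance, `canonical_name(hosts_line("127.0.0.1", "devbox.example.com", "devbox"))` returns the FQDN, whereas the old ordering would have returned the short name.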

Avoids 'hostname -f'; on RedHat it's identical to 'hostname', and on
Ubuntu it causes this test to fail.

Change-Id: I1eb24f9faec766e388d793408aedecdc92107185
Reviewed-on: http://gerrit.cloudera.org:8080/20876
Reviewed-by: Alexey Serbin <alexey@apache.org>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2024-01-18 00:00:01 +00:00
Michael Smith
2af924d4e5 IMPALA-12516: Set HDFS limit based on memlock
With RHEL 8 on AWS Graviton instances,
dfs.datanode.max.locked.memory=64000 is insufficient to run
query_test/test_hdfs_caching.py::TestHdfsCaching::test_table_is_cached.

Sets dfs.datanode.max.locked.memory based on 'ulimit -l', and sets
memlock to 64MB in bootstrap_system.sh to match modern defaults and
provide space for future HDFS caching tests.

New setting can be seen in admin output like

  node-1 will use ports DATANODE_PORT=31002, DATANODE_HTTP_PORT=31012,
  DATANODE_IPC_PORT=31022, DATANODE_HTTPS_PORT=31032,
  DATANODE_CLIENT_PORT=31042, NODEMANAGER_PORT=31102,
  NODEMANAGER_LOCALIZER_PORT=31122, NODEMANAGER_WEBUI_PORT=31142,
  KUDU_TS_RPC_PORT=31202, and KUDU_TS_WEBUI_PORT=31302;
  DATANODE_LOCKED_MEM=65536000
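
The value in the output above equals 64000 × 1024, which is consistent with converting a 'ulimit -l' value reported in KiB to bytes. A sketch of that conversion follows; the function name and the handling of "unlimited" are assumptions, not taken from the patch.

```python
def datanode_locked_mem_bytes(ulimit_l_output, fallback_bytes=64000 * 1024):
    """Convert `ulimit -l` output (KiB, or "unlimited") to a byte count."""
    value = ulimit_l_output.strip()
    if value == "unlimited":
        # Assumption: fall back to a fixed value when no limit is set.
        return fallback_bytes
    return int(value) * 1024
```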

Change-Id: I7722ddd0c7fbd9bbd1979503952b7522b808194a
Reviewed-on: http://gerrit.cloudera.org:8080/20623
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-11-07 01:11:23 +00:00
wzhou-code
d9f1271c96 IMPALA-12530: Make Postgres server to accept remote connections from same subnet
Currently Postgres server in Impala mini-cluster only accepts local
connections. In dockerised-tests, Impala daemons and Postgres server
are running on different hosts so Postgres server is not accessible
for Impala coordinators.

This patch changes the configurations of Postgres server to make it
accept remote connections from hosts in the same sub network.
Enables query_test/test_ext_data_sources.py for dockerised-tests.

Testing:
 - Passed dockerised-tests.
 - Passed regular core-tests.

Change-Id: I7dfaf38bf9178cb2ec7ef15c79c17a5ab1e1c6dc
Reviewed-on: http://gerrit.cloudera.org:8080/20634
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-10-31 16:40:03 +00:00
Michael Smith
1599360678 IMPALA-12354: Add aarch64 native-toolchain build
Pre-built toolchains are identified by a TOOLCHAIN_BUILD_ID. This commit
adds an aarch64 (64-bit ARM) native-toolchain build, separate from the
x86_64 native-toolchain build, with its own environment variable set in
impala-config.sh. bootstrap_toolchain.py selects which version to use
based on 'uname -m'.

impala-config.sh also verifies that IMPALA_TOOLCHAIN_BUILD_ID_AARCH64
and IMPALA_TOOLCHAIN_BUILD_ID_X86_64 were produced from the same
native-toolchain ref by checking the 2nd token of the build ID.
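
The architecture selection and the consistency check can be sketched as follows. The build ID format is not shown in this message, so the "-" separator, the function names, and the sample IDs in the usage note are all hypothetical.

```python
import platform


def select_build_id(build_ids, machine=None):
    """Pick the toolchain build ID for the given machine architecture."""
    # platform.machine() reports the same value as `uname -m` on Linux.
    machine = machine if machine is not None else platform.machine()
    if machine not in build_ids:
        raise ValueError(f"no toolchain build for architecture {machine!r}")
    return build_ids[machine]


def same_toolchain_ref(build_id_a, build_id_b, sep="-"):
    """Return True if the 2nd token of both build IDs is identical."""
    tokens_a = build_id_a.split(sep)
    tokens_b = build_id_b.split(sep)
    if len(tokens_a) < 2 or len(tokens_b) < 2:
        raise ValueError("build ID has fewer than two tokens")
    return tokens_a[1] == tokens_b[1]
```

With made-up IDs such as `"101-abc123"` and `"57-abc123"`, the check passes because both share the second token `abc123`.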

Updates package version to include the architecture tag to match how
native-toolchain now names them.

Testing:
- successfully built on ARM, and tests passed (exceptions noted in
  IMPALA-12490)

Change-Id: I9bfa7125dbc647b33041c5572d97b7f7ccad6258
Reviewed-on: http://gerrit.cloudera.org:8080/20519
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-10-17 21:12:41 +00:00
Michael Smith
4be517e150 IMPALA-12441: Simplify local toolchain development
If NATIVE_TOOLCHAIN_HOME is set, that will be used to provide the native
toolchain instead of the default in IMPALA_TOOLCHAIN. Overrides
IMPALA_TOOLCHAIN_PACKAGES_HOME and sets SKIP_TOOLCHAIN_BOOTSTRAP=true.

Adds IMPALA_TOOLCHAIN_REPO, IMPALA_TOOLCHAIN_BRANCH, and
IMPALA_TOOLCHAIN_COMMIT_HASH so everything is clear about what toolchain
is used for this Impala commit.

If NATIVE_TOOLCHAIN_HOME does not yet exist, buildall.sh will clone the
repo and checkout the commit hash mentioned above before building.

Also skips downloading Kudu if SKIP_TOOLCHAIN_BOOTSTRAP is true as Kudu
is built from native-toolchain. Normalizes aarch64 logic, which skipped
Kudu because it would always build native-toolchain locally.

Change-Id: I3a9e51b7f54c738d8cc01b32428ac88a344de376
Reviewed-on: http://gerrit.cloudera.org:8080/20267
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-09-14 17:46:37 +00:00
Laszlo Gaal
ed4642180c IMPALA-12331: Overwrite previous Maven installation if exists
The Impala system preparation script bin/bootstrap_system.sh may
run multiple times on a system. Jenkins-based precommit runs may
reuse the worker node, or a developer could just run the script
one more time.

During such a run Maven's current version is downloaded and symlinked
into /usr/local/bin. However, the script was not prepared for an already
existing symlink there, and failed if it found one. This is especially
painful for Jenkins-based runs, where such a failure fails the whole
build.

This patch fixes this annoying failure by adding -f to the `ln` command
to disregard any existing symlink.
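
The same idempotent behaviour can be sketched in Python. This is an illustration of replacing an existing symlink, not code from the patch; the function name and the temporary-name scheme are hypothetical.

```python
import os


def force_symlink(target, link_path):
    """Create or replace a symlink, similar in effect to `ln -sf`."""
    # Create the new link under a temporary name first...
    tmp_path = f"{link_path}.tmp.{os.getpid()}"
    os.symlink(target, tmp_path)
    # ...then rename it over any existing link. The rename replaces the
    # old link itself rather than following it.
    os.replace(tmp_path, link_path)
```

Calling it twice with different targets leaves the link pointing at the second target instead of failing on the second call.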

Change-Id: Ic057103dd770b22dfe27902d435692f54cbb9d3d
Reviewed-on: http://gerrit.cloudera.org:8080/20305
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-08-03 02:27:33 +00:00
Laszlo Gaal
ee069687fc IMPALA-12212: Bump Maven to 3.9.2, pull dependencies in parallel
Maven 3.9.x offers a new dependency resolver, HttpClient, which allows
downloading project dependencies in parallel.

This patch bumps the Maven version installed by bootstrap_system.sh to
v3.9.2, and adds the flags enabling the new resolver to download
dependencies (including POM files) in parallel. Parallelism is set to
10 threads.

The flags are added to a project-specific Maven setting file in the
newly created java/.mvn directory. The settings file is added to the
RAT exclusion list in bin/rat_exclude_files.txt.

The --show-version flag is added for debugging purposes.

The same flags are added to the JAMM subproject as well.

The new resolver in Maven 3.9 has also changed the warning message
emitted for missing component checksums, so the new warning string
is added to the filter in bin/mvn-quiet.sh
Unfortunately Maven 3.9 has also changed the way it responds to missing
checksum files: the resolver now emits a stack trace when checksums
cannot be determined, and missing checksums are not explicitly ignored.

Detailed documentation for the new Maven resolver in Maven 3.9.0+ is
located at:
https://maven.apache.org/guides/mini/guide-resolver-transport.html
resolver configuration reference:
https://maven.apache.org/resolver/configuration.html

Tests:
- verified in a core-mode test run with Maven 3.9.2 installed
- verified in a local build using an earlier version of Maven
  to verify that the new default setting does not cause regressions
  with the old dependency resolver.

Change-Id: I75d05215effc724f5bd471646fb352f37443e185
Reviewed-on: http://gerrit.cloudera.org:8080/20142
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-07-24 18:50:34 +00:00
stiga-huang
8d0ab2b684 IMPALA-10262: RPM/DEB Packaging Support
This patch is based on a previous patch contributed by Shant Hovsepian:
https://gerrit.cloudera.org/c/16612/

It adds a new option, -package, to buildall.sh for building a package
for the current OS type (e.g. CentOS/Ubuntu). You can also use
"make/ninja package" to build the package. Scripts for launching the
services and the required configuration files are also added.

Tests:
 - Built on Ubuntu 18.04/20.04 and CentOS 7 using
   ./buildall.sh -noclean -skiptests -release -package
 - Deployed the RPM package on a CDP cluster. Verified the scripts.
 - Deployed the DEB package on a docker container. Verified the scripts.

Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82
Reviewed-on: http://gerrit.cloudera.org:8080/18939
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-16 11:13:23 +00:00
Joe McDonnell
234d641d7b IMPALA-11961/IMPALA-12207: Add Redhat 9 / Ubuntu 22 support
This adds support for Redhat 9 / Ubuntu 22. It updates
to a newer toolchain that has those builds, and it adds
supporting code in bootstrap_system.sh.

Redhat 9 and Ubuntu 22 use python = python3, which requires
various changes to build scripts and tests. Ubuntu 22 uses
Python 3.10, which deprecates certain ssl.PROTOCOL_TLS, so
this adapts test_client_ssl.py to that change until it
can be fully addressed in IMPALA-12219.

Various OpenSSL methods have been deprecated. As a workaround
until these can be addressed properly, this specifies
-Wno-deprecated-declarations. This can be removed once the
code is adapted to the non-deprecated APIs in IMPALA-12226.

Impala crashes with tcmalloc errors unless we update to a newer
gperftools, so this moves to gperftools 2.10. gperftools changed
the default for tcmalloc.aggressive_memory_decommit to off, so
this adapts our code to set it for backend tests. The gperftools
upgrade does not show any performance regression:

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(42) | parquet / none / none | 3.08    | -0.64%     | 2.20       | -0.37%         |
+----------+-----------------------+---------+------------+------------+----------------+

With newer Python versions, the impala-virtualenv command
fails to create a Python 3 virtualenv. This switches to
using Python 3's builtin venv command for Python >=3.6.
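
The version-based choice can be sketched as below. This is an assumption-laden illustration: the function name is hypothetical, and the argument list shown for the impala-virtualenv fallback is a guess rather than its real interface.

```python
import sys


def virtualenv_command(env_dir, python="python3", version_info=None):
    """Return the command used to create a virtualenv in env_dir."""
    if version_info is None:
        version_info = sys.version_info
    if tuple(version_info[:2]) >= (3, 6):
        # Python 3's builtin venv module.
        return [python, "-m", "venv", env_dir]
    # Hypothetical fallback invocation for older interpreters.
    return ["impala-virtualenv", env_dir]
```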

Kudu needed a newer version and LLVM required a couple patches.

Testing:
 - Ran a core job on Ubuntu 22 and Redhat 9. The tests run
   to completion without crashing. There are test failures
   that will be addressed in follow-up JIRAs.
 - Ran dockerised tests on Ubuntu 22.
 - Ran dockerised tests on Ubuntu 20 and Rocky 8.5.

Change-Id: If1fcdb2f8c635ecd6dc7a8a1db81f5f389c78b86
Reviewed-on: http://gerrit.cloudera.org:8080/20073
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-06-21 05:21:01 +00:00
Joe McDonnell
6222785ef5 IMPALA-12179 (part 3): Remove remaining lsb_release references
This removes a few stray lsb_release references in distcc
scripts and the install_docker.sh script. It then removes
the redhat-lsb package from the list of installed packages.

Testing:
 - Ran a build on Rocky 8.5
 - Ran dockerised tests on Ubuntu 20

Change-Id: I9d84e9ab8076fd8cc4727a5da118d9a747d4a005
Reviewed-on: http://gerrit.cloudera.org:8080/20071
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-06-15 16:22:15 +00:00
Laszlo Gaal
4a153792de IMPALA-12185: Don't install snappy-devel for Red Hat derivatives
Experimental builds on systems based on Red Hat Linux v8.x revealed that
snappy-devel is installed only for RedHat-based systems in
bin/bootstrap_system.sh (the script that preps a workstation for local
Impala development). The same library was not installed for Ubuntu
variants -- probably because Impala has been using Snappy from the
toolchain for a long time now.

Subsequent tests revealed that the build and dataload phases can
complete successfully on Red Hat Linux v8.6 even in the absence of this
package, so this patch removes the installation of snappy-devel during
system preparation.

Tested by running bin/bootstrap_system.sh on a newly minted private VM
instance running RedHat Linux 8.6, then running
buildall.sh -skiptests -format -testdata
successfully.

Change-Id: I6b14e09fa78d51a387a066eb04495f758430fa9d
Reviewed-on: http://gerrit.cloudera.org:8080/20021
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-10 19:46:42 +00:00
Michael Smith
683bef1ca4 IMPALA-11253: Support testing with Java 11 (take 2)
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'.  The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.

This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.
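
The precedence between the two variables can be sketched as follows. The function name, the parameters, and the idea of passing candidate directories as a dictionary are assumptions; the real search directories are not listed in this message.

```python
import os


def resolve_java_home(env, candidates_by_version, system_java_home):
    """Resolve JAVA_HOME from the override, the requested version, or the system."""
    # IMPALA_JAVA_HOME_OVERRIDE is preferred over IMPALA_JDK_VERSION.
    override = env.get("IMPALA_JAVA_HOME_OVERRIDE")
    if override:
        return override
    version = env.get("IMPALA_JDK_VERSION", "system")
    if version == "system":
        return system_java_home
    # Otherwise ignore the system java and search version-specific directories.
    for candidate in candidates_by_version.get(version, []):
        if os.path.isdir(candidate):
            return candidate
    raise RuntimeError(f"no Java {version} installation found")
```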

This also updates the versions of Maven plugins related to the build.

Source and target releases are still set to Java 8 compatibility.

Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with

  JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
    CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
    run-all-tests.sh

Testing: ran test suite with Java 11

This reverts the revert commit 1b6011c, restoring these changes minus
code to update IMPALA_JDK_VERSION based on $JAVA -version as that could
break subsequent sourcing of impala-config.sh.

Change-Id: Ie16504ad5738b1f228f97044afd3d9017ccc6c53
Reviewed-on: http://gerrit.cloudera.org:8080/19928
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-25 16:04:29 +00:00
Michael Smith
1b6011c6a0 Revert "IMPALA-11253: Support testing with Java 11"
This reverts commit ee6395db76 as it is
not flexible enough at detecting Java automatically in likely build
environments.

Change-Id: I836c9f7fd10740b15f7e40b2e7f889ac7ee61fc3
Reviewed-on: http://gerrit.cloudera.org:8080/19908
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-05-21 14:00:14 +00:00
Michael Smith
ee6395db76 IMPALA-11253: Support testing with Java 11
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'.  The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.

This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.

This also updates the versions of Maven plugins related to the build.

Source and target releases are still set to Java 8 compatibility.

Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with

  JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
    CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
    run-all-tests.sh

Testing: ran test suite with Java 11

Change-Id: I15d309e2092c12d7fdd2c99b727f3a8eed8bc07a
Reviewed-on: http://gerrit.cloudera.org:8080/19539
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-05-19 22:32:00 +00:00
Laszlo Gaal
93d4ce532d Install libreadline-dev for toolchain builds on ARM hosts
This is required because ARM builds of the toolchain binaries are not
(yet) present in s3://native-toolchain, so ARM builds have to build the
toolchain locally, before being able to build Impala.

The toolchain's Python build failed before this patch, because it missed
the libreadline-dev package, which is needed for Python's readline support.
This patch adds it and its libncurses dependency.

Change-Id: I1bf6193027d691d3ded727cb59424c5dc9963ea9
Reviewed-on: http://gerrit.cloudera.org:8080/19835
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-05-04 17:09:18 +00:00
stiga-huang
45ea094fa2 IMPALA-11716: Bump up gcovr version to 4.2
IMPALA-9999 upgrades the GCC version to 10.4 which generates new gcov
format that the current gcovr version (3.4) can't parse. This patch
upgrades gcovr to the latest Python2-compatible version (4.2). Also adds
Jinja2, MarkupSafe and lxml as the required dependent packages. The
development packages of libxml2 and libxslt are also added in
bootstrap_system.sh and bootstrap_build.sh.

This patch also fixes a failure due to the gcov executable not found in
PATH.

Tests:
 - Verified builds on Ubuntu 16.04 and CentOS 7.9
 - Verified coverage_helper.sh works after this patch

Change-Id: I9458fa0dc97d69f88a4e8a3313dc9440215dfd52
Reviewed-on: http://gerrit.cloudera.org:8080/19226
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-11-11 00:09:05 +00:00
Joe McDonnell
11e66523d6 IMPALA-11526: Install en_US.UTF-8 locale into docker images
In IMPALA-11492, ExprTest.Utf8MaskTest was failing on some
configurations because the en_US.UTF-8 locale was missing. Since the
Docker images don't contain en_US.UTF-8, they are subject
to the same bug. This was confirmed by adding test cases
to the test_utf8_strings.py end-to-end test and running it
in the dockerized tests.

This adds the appropriate language pack to the list of packages
installed for the Docker build.

Testing:
 - This adds end-to-end tests to test_utf8_strings.py covering the
   same cases that were failing in ExprTest.Utf8MaskTest. They
   failed without the added language packs, and now succeed.

Change-Id: I353f257b3cb6d45f7d0a28f7d5319fdb457e6e3d
Reviewed-on: http://gerrit.cloudera.org:8080/19080
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
2022-10-11 20:30:50 +00:00
Joe McDonnell
14b9fb97b5 IMPALA-11492: Add langpacks-en (centos) and language-pack-en (Ubuntu)
Machines that don't have en_US.UTF-8 installed see
issues when running ExprTest.Utf8MaskTest.
This currently impacts the Docker-based tests.
This installs the appropriate language packs
to have en_US.UTF-8 installed.

Testing:
 - Ran docker-based tests and verified that
   ExprTest.Utf8MaskTest passes.

Change-Id: I1b8696190e4713bda787e773d48943b5dfc6335e
Reviewed-on: http://gerrit.cloudera.org:8080/18875
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
2022-08-22 15:38:57 +00:00
yx91490
7da39c58e7 IMPALA-11440: Remove the unnecessary installation of Apache Ant
Testing:
 - run existing CI jobs.

Change-Id: I4f744c8a1d0ec9a7e7fbe00d42bc50b1cbd68b08
Reviewed-on: http://gerrit.cloudera.org:8080/18741
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-07-28 15:46:26 +00:00
yx91490
bf4320e55d IMPALA-11439: Add option to control whether prepopulate m2 directory or not.
bootstrap_system.sh prepopulates the .m2 directory by downloading an
800M+ m2_archive.tar.gz to speed up the subsequent Maven packaging
process, although this is not strictly necessary. Depending on the
network environment, the download of the archive file is not
necessarily fast.

This adds an environment variable 'PREPOPULATE_M2_REPOSITORY' to control
whether to prepopulate the m2 directory, which is true by
default.
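
A minimal sketch of the toggle described above, written in Python for
illustration only; the real check lives in the shell script
bootstrap_system.sh. The helper name should_prepopulate_m2 and the rule
that only the literal string 'true' enables the step are assumptions.

```python
import os


def should_prepopulate_m2(env=None):
    """Return True when the m2 archive should be downloaded.

    Hypothetical helper. It mirrors the documented default of 'true' for
    PREPOPULATE_M2_REPOSITORY. Treating only the literal string 'true' as
    enabled is an assumption, not something the commit confirms.
    """
    env = os.environ if env is None else env
    return env.get("PREPOPULATE_M2_REPOSITORY", "true") == "true"


if __name__ == "__main__":
    # The two log lines are the ones quoted in the commit's test notes.
    if should_prepopulate_m2():
        print(">>> Populating m2 directory...")
    else:
        print(">>> Skip populating m2 directory")
```

Defaulting to 'true' keeps existing callers unchanged; only users who
explicitly opt out skip the download.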

Testing:
- manually run './bin/bootstrap_system.sh' and expect to see the log
  '>>> Populating m2 directory...' and 'Downloading m2 archive from ...',
  and the terminal returns after a while.
- manually run 'PREPOPULATE_M2_REPOSITORY=true ./bin/bootstrap_system.sh'
  and expect to see the log '>>> Populating m2 directory...' and
  'Downloading m2 archive from ...', and the terminal returns after a while.
- manually run 'PREPOPULATE_M2_REPOSITORY=false ./bin/bootstrap_system.sh'
  and expect to see the log ">>> Skip populating m2 directory", and
  the terminal returns immediately.

Change-Id: Ie3ac55099f326e2abe2f7dc66c08ad7023cb6baf
Reviewed-on: http://gerrit.cloudera.org:8080/18740
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-07-28 15:27:04 +00:00
Joe McDonnell
b157f1a01a IMPALA-11451: Add net-tools dependency to bootstrap_system.sh
TestKrpcSocket uses netstat as part of its test. netstat
is provided by the net-tools package on Ubuntu and Centos.
This adds that as a dependency in bootstrap_system.sh.
Docker-based tests had been hitting this test failure,
because they start from a clean docker image.

Testing:
 - Ran docker-based tests and TestKrpcSocket now passes

Change-Id: I9ad704e408d4ca4741178d4ea7a857bf30d4cfb6
Reviewed-on: http://gerrit.cloudera.org:8080/18774
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-07-25 22:30:04 +00:00
Michael Smith
64b324ac40 IMPALA-11389: Include Python 3 eggs in tarball
Build Python 3 eggs for the shell tarball so it works with both Python 2
and Python 3. The impala-shell script selects eggs based on the
available Python version.

Inlines thrift for impala-shell so we can easily build Python 2 and
Python 3 versions, consistent with other libraries. The impala-shell
version should always be at least as new as IMPALA_THRIFT_PY_VERSION.

Thrift 0.13.0+ wraps all exceptions during TSocket read/write operations
in TTransportException. Specifically socket.error that we got as raw
exceptions are now wrapped. Unwraps them before raising to preserve
prior behavior.

A specific Python version can be selected with IMPALA_PYTHON_EXECUTABLE;
otherwise it will use 'python', and if unavailable try 'python3'.
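
The selection order above can be sketched as follows. This is an
illustration in Python, not the impala-shell launcher itself; the
function name pick_python and the injectable `which` parameter are
assumptions made so the logic can be exercised without touching PATH.

```python
import os
import shutil


def pick_python(env=None, which=shutil.which):
    """Choose an interpreter in the order the commit describes.

    IMPALA_PYTHON_EXECUTABLE wins when set; otherwise 'python' is tried,
    then 'python3'. Returns None when nothing is found.
    """
    env = os.environ if env is None else env
    explicit = env.get("IMPALA_PYTHON_EXECUTABLE")
    if explicit:
        return explicit
    for candidate in ("python", "python3"):
        path = which(candidate)
        if path:
            return path
    return None
```

For example, pick_python({"IMPALA_PYTHON_EXECUTABLE": "/opt/py/bin/python3"})
returns the explicit path without consulting PATH at all.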

Adds tests for impala-shell tarball with Python 3.

Change-Id: I94f86de9e2a6303151c2f0e6454b5f629cbc9444
Reviewed-on: http://gerrit.cloudera.org:8080/18653
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-07-14 23:52:04 +00:00
Michael Smith
181fd94068 IMPALA-8373: Test impala-shell with python3
Sets up a python3 virtualenv, installs impala-shell into it, and runs
tests.

Change-Id: I8e123aecd53a7ded44a7da7eb8c8b853cebbfc56
Reviewed-on: http://gerrit.cloudera.org:8080/18588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-06-13 17:13:42 +00:00
Laszlo Gaal
ec391ab25c IMPALA-10618: Update bootstrap_system for Ubuntu 20.04
Impala started adding Ubuntu 20.04 support in various places.
This patch extends bootstrap_config.sh for Ubuntu 20.04 coverage:

1. The runtime version check error message is updated to claim support
   for Ubuntu 20.04.

2. Kudu needs libtinfo.so.5 on Ubuntu 20.04 for the minicluster binaries.
   bin/bootstrap_system.sh now installs it when running on Ubuntu 20.04.

3. The OpenJDK default version reset to JDK 8 is extended to Ubuntu 20.04.

Tested by running the code using docker/test-with-docker.py using
--base-image=ubuntu:20.04 and observing that Kudu was able to start in
the minicluster. The test runs completed, but there were test failures,
for which separate tickets will be filed.

Change-Id: I212f6df3657cf9d621a0669573e1e511eae13662
Reviewed-on: http://gerrit.cloudera.org:8080/17240
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-31 14:24:41 +00:00
Laszlo Gaal
eceec36f69 IMPALA-10385: Fix RPM repo ID capitalization for Centos 8.3
Centos 8.3 changed package repo ID capitalization from MixedCase
to all lowercase. On Centos 8 snappy-devel is installed from the
PowerTools repo, which is not enabled by default, so the install command
has to enable it temporarily using the repo ID.
The capitalization change broke bootstrap_system.sh, failing builds
on Centos 8.

The patch changes the `dnf install` call to use a glob pattern
for the PowerTools repo ID to cover the naming conventions in all
Centos 8.x releases.
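
To illustrate how one glob can cover both capitalizations, here is a
small Python sketch. The pattern shown is hypothetical; the commit does
not quote the glob that bootstrap_system.sh actually passes to dnf.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern: accepts both "PowerTools" and "powertools".
POWERTOOLS_GLOB = "[Pp]ower[Tt]ools"


def matches_powertools(repo_id):
    """Case-sensitive glob match of a repo ID against the pattern."""
    return fnmatchcase(repo_id, POWERTOOLS_GLOB)


if __name__ == "__main__":
    for repo_id in ("PowerTools", "powertools", "AppStream"):
        print(repo_id, matches_powertools(repo_id))
```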

Change-Id: I224beb1189ce25ae66ecd78d70757537e117805a
Reviewed-on: http://gerrit.cloudera.org:8080/16844
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-11 19:24:52 +00:00
zhaorenhai
48113bcffc IMPALA-10329 Change apt install retry times to 30
Change the number of apt install retries to 30 in bootstrap_system.sh,
because the install has been timing out frequently of late.
Also add a step that waits for apt's lock-frontend.
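
A bounded retry loop of the kind described can be sketched as below.
This is a generic Python illustration, not the shell code in
bootstrap_system.sh; the limit of 30 comes from the commit, while the
function name, the delay, and the injectable `sleep` are assumptions.

```python
import time


def retry(action, attempts=30, delay_seconds=0.0, sleep=time.sleep):
    """Call `action` until it succeeds or `attempts` calls have failed.

    `action` is any callable that raises on failure. The exception from
    the final attempt is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            # Pause before the next attempt (e.g. while a lock is held).
            sleep(delay_seconds)
```

In practice `action` might wrap a subprocess call that raises when the
package manager exits with a non-zero status.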

Change-Id: Id664dd66874ac65d6b78e630c974a6a563408147
Reviewed-on: http://gerrit.cloudera.org:8080/16751
Reviewed-by: Jim Apple <jbapple@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-20 07:44:35 +00:00
Qifan Chen
61a020d0f8 IMPALA-10007: Impala development environment does not support
Ubuntu 20.04

This is a minor amendment to a previously merged change with
ChangeId I4f592f60881fd8f34e2bf393a76f5a921505010a, to address
additional review comments. In particular, the original commit
referred to Ubuntu 20.4 whereas it should have used Ubuntu 20.04.

Change-Id: I7db302b4f1d57ec9aa2100d7589d5e814db75947
Reviewed-on: http://gerrit.cloudera.org:8080/16241
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-22 05:03:32 +00:00
huangtianhua
6ff3707c7f IMPALA-10090 Pull newest code of native-toolchain before build it
If native-toolchain exists, we should pull the newest code
before building it.

This change fixes an error introduced by
I2da3ffce7abb88190be0a5ea0e2cf603f98ee15e

Change-Id: I71e23fc8a6d37d8d09565dfa38318c7dbef25c45
Reviewed-on: http://gerrit.cloudera.org:8080/16434
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-11 13:15:47 +00:00
huangtianhua
6aaea3216c IMPALA-10090 Pull newest code of native-toolchain before build it
If native-toolchain exists, we should pull the newest code
before building it.

Change-Id: I2da3ffce7abb88190be0a5ea0e2cf603f98ee15e
Reviewed-on: http://gerrit.cloudera.org:8080/16402
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-08 23:35:41 +00:00
zhaorenhai
0098113d95 IMPALA-10090: Create aarch64 development environment on ubuntu 18.04
This includes the following changes:
1. Build native-toolchain locally by script on the aarch64 platform.
2. Change the version numbers of some native-toolchain libraries.
3. Split SKIP_TOOLCHAIN_BOOTSTRAP and DOWNLOAD_CDH_COMPONENTS into two
   separate settings, because aarch64 only needs to download the cdp
   components, not the toolchain.
4. Download the hadoop aarch64 native libs, which the Impala build needs.

With this commit, on the aarch64 version of ubuntu 18.04,
one only needs to run bin/bootstrap_development.sh, just as on x86.

Change-Id: I769668c834ab0dd504a822ed9153186778275d59
Reviewed-on: http://gerrit.cloudera.org:8080/16065
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-02 06:47:30 +00:00
Qifan Chen
ebb5599aaa IMPALA-10007: Impala development environment does not support
Ubuntu 20.4

This work addresses the current limitation in the Impala development
environment in that Ubuntu 20.4 is not supported. The fix modifies
bootstrap_system.sh and bootstrap_toolchain.py to specifically
allow the bootstrapping of the development environment on a machine
running Ubuntu 20.4. Limited use shows that the environment is useful
and stable, similar to the one running on Ubuntu 18.4.

Testing on a box running Ubuntu 20.4:
1. Successfully bootstrapped the entire Impala development environment
2. Interacted with the environment through the following tools:
    gdb
    jdb
    clang-format
    impalad GUI
    vim
3. Ran all tests

Limitations found with Ubuntu 20.4 environment.
1. gdb in Impala toolchain is not compatible with Impala C++ test
   code ${IMPALA_HOME}/be/build/latest/service\
   /unifiedbetests (invoked by ${IMPALA_HOME}/be/build/latest/\
   scheduling/admission-controller-test) and reports the following
   error, after attaching to the test process.

   BFD (GNU Binutils) 2.25.51 internal error, aborting at elf64-x86-64.c
   line 5583 in elf_x86_64_get_plt_sym_val

Change-Id: I4f592f60881fd8f34e2bf393a76f5a921505010a
Reviewed-on: http://gerrit.cloudera.org:8080/16238
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-07-25 04:28:20 +00:00
Laszlo Gaal
839cd0ba5a IMPALA-9845: Point Maven and Ant downloads to stable locations
Ant released a new version in May 2020, which made the URL in
bootstrap_system.sh obsolete. At the same time Apache created new rules
for the download locations, moving older releases to archive.apache.org.

This patch changes the download URLs for Maven and Ant to point to the
stable locations at archive.apache.org. These locations don't change
when a new version of a project is released, so downloads pulling a
specific version will not be affected by a new release. At the same time
new releases are stored at the archive site as well, so this location
works for all versions.
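
As an illustration of version-pinned downloads from the archive site,
the following Python sketch builds URLs under archive.apache.org. The
helper names and the example version numbers are assumptions; the
commit does not state which versions bootstrap_system.sh pins.

```python
ARCHIVE_BASE = "https://archive.apache.org/dist"


def maven_url(version):
    """URL of a Maven binary tarball on the archive site."""
    major = version.split(".")[0]
    return (f"{ARCHIVE_BASE}/maven/maven-{major}/{version}/binaries/"
            f"apache-maven-{version}-bin.tar.gz")


def ant_url(version):
    """URL of an Ant binary tarball on the archive site."""
    return f"{ARCHIVE_BASE}/ant/binaries/apache-ant-{version}-bin.tar.gz"


if __name__ == "__main__":
    # Example versions only; they are not taken from the commit.
    print(maven_url("3.5.4"))
    print(ant_url("1.9.14"))
```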

Change-Id: I1875f260b931ef096fc91a4723f91310225c55c9
Reviewed-on: http://gerrit.cloudera.org:8080/16062
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-23 18:47:53 +00:00
Joe McDonnell
f15a311065 IMPALA-9709: Remove Impala-lzo from the development environment
This removes Impala-lzo from the Impala development environment.
Impala-lzo is not built as part of the Impala build. The LZO plugin
is no longer loaded. LZO tables are not loaded during dataload,
and LZO is no longer tested.

This removes some obsolete scan APIs that were only used by Impala-lzo.
With this commit, Impala-lzo would require code changes to build
against Impala.

The plugin infrastructure is not removed, and this leaves some
LZO support code in place. If someone were to decide to revive
Impala-lzo, they would still be able to load it as a plugin
and get the same functionality as before. This plugin support
may be removed later.

Testing:
 - Dryrun of GVO
 - Modified TestPartitionMetadataUncompressedTextOnly's
   test_unsupported_text_compression() to add LZO case

Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Reviewed-on: http://gerrit.cloudera.org:8080/15814
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2020-06-15 23:42:12 +00:00
Joe McDonnell
fb282852ef IMPALA-9107 (part 2): Add script to use the m2 archive tarball
This adds a script to find an appropriate m2 archive
tarball, download it, and use it to prepopulate the
~/.m2 directory.

The script uses the JSON interface for Jenkins to search through
the all-build-options-ub1604 builds on jenkins.impala.io to
find one that:
1. Is building the "master" branch
2. Has the m2_archive.tar.gz
Then, it downloads the m2 archive and uses it to populate ~/.m2.
It does not overwrite or remove any files already in ~/.m2.

The build scripts that call populate_m2_directory.py do not
rely on the script succeeding. They will continue even if
the script fails.
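
The build-selection step can be sketched roughly as follows. This is
not the code of populate_m2_directory.py; it assumes the builds have
already been fetched as JSON, and the parameter name IMPALA_REPO_BRANCH
used to detect the branch is a guess made for illustration.

```python
M2_ARCHIVE_NAME = "m2_archive.tar.gz"


def build_branch(build, param_name="IMPALA_REPO_BRANCH"):
    """Return the branch parameter of a Jenkins build dict, or None.

    `param_name` is hypothetical; the real job may use another name.
    """
    for action in build.get("actions", []):
        for param in (action or {}).get("parameters", []):
            if param.get("name") == param_name:
                return param.get("value")
    return None


def has_m2_archive(build):
    """True when the build lists the m2 archive among its artifacts."""
    return any(artifact.get("fileName") == M2_ARCHIVE_NAME
               for artifact in build.get("artifacts", []))


def find_usable_build(builds):
    """First build that is on 'master' and has the archive, else None."""
    for build in builds:
        if build_branch(build) == "master" and has_m2_archive(build):
            return build
    return None
```

Returning None rather than raising matches the stated intent that
callers continue even when no suitable archive is found.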

This also modifies the build-all-flag-combinations.sh script
to only build the m2 archive if the GENERATE_M2_ARCHIVE
environment variable is true. GENERATE_M2_ARCHIVE=true will
clear out the ~/.m2 directory to build an accurate m2 archive.
Precommit jobs will use GENERATE_M2_ARCHIVE=false, which
will allow them to use the m2 archive to speed up the build.

Testing:
 - Ran gerrit-verify-dryrun
 - Tested locally

Change-Id: I5065658d8c0514550927161855b0943fa7b3a402
Reviewed-on: http://gerrit.cloudera.org:8080/15735
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-12 01:54:16 +00:00
Tim Armstrong
c43c03c5ee IMPALA-3926: part 2: avoid setting LD_LIBRARY_PATH
This removes LD_LIBRARY_PATH and LD_PRELOAD from the
developer's shell and cleans it up. With the preceding
change, toolchain utilities like clang can be run without
a special LD_LIBRARY_PATH.

This fixes a bug where libjvm.so was registered as a
static instead of a shared library, which adds it to the
RUNPATH variable in the binary, which provides a default
search location that can be overridden by LD_LIBRARY_PATH.

Impala binaries don't have the rpath baked in for some
libraries, including Impala-lzo, libgcc and libstdc++,
so we still need to set LD_LIBRARY_PATH when running
those. That is solved with wrapper scripts that set
the environment variables only when invoking those
binaries, e.g. starting a daemon or running a backend
test. I added three scripts because there were 3 sets
of environment variables. The scripts are:
* run-binary.sh: just sets LD_LIBRARY_PATH
* run-jvm-binary.sh: sets LD_LIBRARY_PATH and CLASSPATH
* start-daemon.sh: sets LD_LIBRARY_PATH and CLASSPATH and
  kerberos-related environment variables.
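
The wrapper idea, setting LD_LIBRARY_PATH for one child process rather
than for the whole shell, can be sketched in Python as below. The real
wrappers are the shell scripts listed above; the function names and the
example directory are assumptions.

```python
import os
import subprocess


def child_env(library_dirs, base_env=None):
    """Return a copy of the environment with LD_LIBRARY_PATH extended.

    The caller's environment is left untouched; any existing
    LD_LIBRARY_PATH value is kept after the new directories.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH")
    parts = list(library_dirs) + ([existing] if existing else [])
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env


def run_binary(argv, library_dirs):
    """Run `argv` with the extended environment and return its exit code."""
    return subprocess.run(argv, env=child_env(library_dirs)).returncode
```

Because only the child sees the variable, other tools started from the
same shell keep resolving libraries through their default search paths.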

The binaries, in almost all cases, work fine without
those tweaks, because libstdc++ and libgcc are picked
up along with libkuduclient.so from the toolchain (they
are in the same directory). I decided to leave good enough
alone here. run-binary.sh and friends can be used in
any remaining edge cases to run binaries.

An alternative to the 3 scripts would be to have an
uber-script that set all the variables, but I felt
that it was better to be specific about what
each binary needed. Cleaning the LD_LIBRARY_PATH
mess up has given me a distaste for scattershot
setting of environment variables. I am open to
revisiting this.

Testing:
* Ran tests on centos 7
* Manually tested that my dev env with
 LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu continued
 to work (for now). All ubuntu 16.04 and 18.04 dev
 envs that were set up with bootstrap_development.sh
 will be in this state.

Change-Id: I61c83e6cca6debb87a12135e58ee501244bc9603
Reviewed-on: http://gerrit.cloudera.org:8080/14494
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-07 08:50:44 +00:00
Laszlo Gaal
34018f6275 IMPALA-9629: Add CentOS 8.1 support to bootstrap_system.sh
CentOS 8.1 is a new major version of the CentOS family.
It is now stable and popular enough to start supporting it for Impala
development.

Prepare a raw CentOS 8.1 system to support Impala development and testing.
This should work on a standalone computer, on a virtual machine,
or inside a Docker container.

Details:
- snappy-devel moved to the PowerTools repo, so it needs to be installed
  from there
- CentOS 8 has no default Python version. The bootstrap script installs
  (or configures) Python2 with pip2, then makes them the default via the
  "alternatives" mechanism. The installer is adaptive: it performs only
  the necessary steps, so it works in various environments.
  The installer logic is also shared between bin/bootstrap_system.sh and
  docker/entrypoint.sh
- The toolchain package tag "ec2-centos-8" is added to
  bootstrap_toolchain.py
- For some unknown reason, when the downloaded Maven tarball is extracted
  in a Docker-based test, the "bin" and "boot" directories are created
  with owner-only permissions. The 'impdev' user has no access to the
  maven executable, which then breaks the build.
  This patch forcibly restores the correct permissions on these
  directories; this is a no-op when the extraction happens correctly.
- TOOLCHAIN_ID is bumped to a build that already has CentOS 8 binaries.
- Centos8-specific bootstrap code was added to the Docker-based tests.
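
The Maven permission fix mentioned in the list above can be illustrated
with a short Python sketch. The actual change is in the bootstrap shell
code; the function name and the 0o755 mode are assumptions, since the
commit only says the "correct permissions" are restored.

```python
import os
import stat


def restore_dir_permissions(maven_home, subdirs=("bin", "boot"), mode=0o755):
    """Make the named subdirectories accessible to all users.

    Directories that do not exist are skipped. When the mode is already
    correct the chmod changes nothing.
    """
    for name in subdirs:
        path = os.path.join(maven_home, name)
        if os.path.isdir(path):
            os.chmod(path, mode)


def dir_mode(path):
    """Return only the permission bits of `path`."""
    return stat.S_IMODE(os.stat(path).st_mode)
```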

Tested:
- ran the Docker-based tests with --base-image=centos:8 to verify the following build
  phases are successful:
  * system prep
  * build
  * dataload
  and that tests can start. Passing all tests was not a requirement for this step,
  although plausible test results (i.e. not all of the tests fail) were.

- ran the Docker-based tests to verify nonregression with --base-image set to the
  following: centos:7, ubuntu:16.04, ubuntu:18.04.
  On centos:7 and ubuntu:16.04 the only failure was IMPALA-9097 (BE tests fail without
  the minicluster running); ubuntu:18.04 showed the same failures as the current upstream
  code.

- passed a core-mode test run on private infrastructure on Centos 7.4

- ran buildall.sh in core mode manually inside a Docker container, simulating a developer
  workflow (prep-build-dataload-test). There were several observed test failures, but
  the workflow itself was run to completion with no problems.

Change-Id: I3df5d48eca7a10219264e3604a4f05f072188e6e
Reviewed-on: http://gerrit.cloudera.org:8080/15623
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-15 17:23:43 +00:00
Tim Armstrong
5989900ae8 IMPALA-9618: fix some usability issues with dev env
Automatically assume IMPALA_HOME is the source directory
in a couple of places.

Delete the cache_tables.py script and MINI_DFS_BASE_DATA_DIR
config var, both of which had bit-rotted and were unused.

Allow setting IMPALA_CLUSTER_NODES_DIR to put the minicluster
nodes, most importantly the data, in a different location, e.g.
on a different filesystem.

Testing:
I set up a dev environment using this code and was able to
load data and run some tests.

Change-Id: Ibd8b42a6d045d73e3ea29015aa6ccbbde278eec7
Reviewed-on: http://gerrit.cloudera.org:8080/15687
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-09 08:01:24 +00:00