117 Commits

Author SHA1 Message Date
Joe McDonnell
5eea4f6f79 IMPALA-14559: Ship calcite-planner jar in Impala packages
This adds the java/impala-package Maven project to make it easier
to ship / test the Calcite planner. impala-package has a dependency
on impala-frontend and calcite-planner, so its classpath requires
no extra work when constructing the classpath.

An additional cleanup is that this no longer puts the
impala-frontend-*-tests.jar on the classpath by default. This requires
updating the query event hooks test, as it relies on that jar being
present.

This does not change the default value for the use_calcite_planner
query option, so there is no change in behavior.

Testing:
 - Ran a core job
 - Built docker images and OS packages locally

Change-Id: I81dec2a5b59e279229a735c8bb1a23c77111a793
Reviewed-on: http://gerrit.cloudera.org:8080/23497
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-21 03:36:12 +00:00
Joe McDonnell
1913ab46ed IMPALA-14501: Migrate most scripts from impala-python to impala-python3
To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3

This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
   doesn't have a main function, it removes the hash-bang and makes
   sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
   (or python to python3) with minimal changes.
3. The cm-api pypi package (which doesn't support Python 3) has been
   replaced by the cm-client pypi package and interfaces have changed.
   Rather than migrating the code (which hasn't been used in years), this
   deletes the old code and stops installing cm-api into the virtualenv.
   The code can be restored and revamped if there is any interest in
   interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
   bit-rotted. Some pieces can be run manually, but it can't be fully
   verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces locations of impala-python in comments / documentation /
   READMEs.
6. kazoo (used for interacting with HBase) needed to be upgraded to a
   version that supports Python 3. The newest version of kazoo requires
   upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
   needing other upgrades.

The two remaining uses of impala-python are:
 - bin/cmake_aux/create_virtualenv.sh
 - bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.

The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python)

Testing:
 - Ran core job
 - Ran build + dataload on Centos 7, Redhat 8
 - Manual testing of individual scripts (except some bitrotted areas like the
   random query generator)

Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-10-22 16:30:17 +00:00
Riza Suminto
02fb5e1ccb IMPALA-13937: (Addendum) Replace diff with manual bash script
diff requires BASE_IMAGE to have diffutils preinstalled. However, not
every BASE_IMAGE has it preinstalled. This patch replaces the diff
invocation with a manual bash script.

Testing:
Completed build with ubi8:latest that does not have diffutils
preinstalled.

Change-Id: I58e9ef7c344caffd198664e3f9683f54ce2c1914
Reviewed-on: http://gerrit.cloudera.org:8080/22898
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-14 21:23:36 +00:00
Laszlo Gaal
cc703b3c3a IMPALA-13937: Use simpler chmod syntax to set +t on /var/tmp in Docker build
Some Docker base images contain basic Unix utilities implemented by
Busybox instead of the usual linux-coreutils package. The chmod command
in the Busybox implementation seems to ignore certain syntax variants:
the current invocation for setting the sticky bit (+t) on /var/tmp got
silently ignored, while chmod indicated success, returning 0 to the
calling script.

This patch changes the chmod call to a slightly simpler syntax, which was
tested to be understood by Busybox and coreutils both; and adds a simple
inline check to assert that the directories required by Kerberos
- exist
- and have the required ownership and permission structure.

The assertion fails the Docker build if setting up /tmp and /var/tmp in
a Kerberos-compatible way did not succeed.
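
The kind of assertion described above can be sketched in Python. This is
only an illustration: the real check is an inline step in the Docker build,
the commit does not say which chmod syntax it adopted, and the helper name
has_sticky_bit and the 1777 mode used here are assumptions.

```python
import os
import stat
import tempfile


def has_sticky_bit(path):
    """Return True if path is an existing directory with the sticky bit set."""
    try:
        st = os.stat(path)
    except OSError:
        return False
    return stat.S_ISDIR(st.st_mode) and bool(st.st_mode & stat.S_ISVTX)


# Demonstrate on a scratch directory rather than the real /var/tmp.
scratch = tempfile.mkdtemp()
os.chmod(scratch, 0o1777)   # sticky bit plus rwx for everyone
sticky_after_chmod = has_sticky_bit(scratch)
os.chmod(scratch, 0o777)    # same permissions without the sticky bit
sticky_after_clear = has_sticky_bit(scratch)
os.rmdir(scratch)
```

Checking the resulting mode, instead of trusting chmod's exit status, is what
catches the case where the command reports success without changing anything.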

Change-Id: I20c52dc70fb73337efcd6d12652bf99c3c473ff9
Reviewed-on: http://gerrit.cloudera.org:8080/22811
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-09 15:33:08 +00:00
Laszlo Gaal
b17f22048a IMPALA-14029: Add Kerberos utilities to Docker image build
The Kerberos utility package was missing from the OS package list of
the Docker container build when the base image was detected as being
a hardened Wolfi-based image. This prevented Impala coordinators from
renewing their Kerberos tickets in containerized and Kerberized
environments.

This patch adds the Kerberos utility package to the list of installed
packages for such minimal containers.

Change-Id: I84f295ac8ae4c000868abff0342b922beb141b5b
Reviewed-on: http://gerrit.cloudera.org:8080/22854
Reviewed-by: Norbert Luksa <norbert.luksa@cloudera.com>
Tested-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
2025-05-09 15:33:08 +00:00
Laszlo Gaal
e6078b4281 IMPALA-13825: Extend Docker container build to custom base images
Downstream system vendors, users and customers have lately expressed
interest in consuming Impala in containerized forms, taking advantage of
various specialized, hardened container base image offerings, like
container offerings based on the Wolfi project by Chainguard;
see: https://github.com/wolfi-dev.

This patch enables Impala container images to be built on top of custom
base images, and adds an implementation example that uses the publicly
available Wolfi base image.

Building a customized Docker image follows a hybrid approach. Instead of
replicating the complete Impala build process inside a Wolfi container
for a fully native binary build, it relies on an existing build platform
that is compatible with the binary packages available inside the custom
container image. For Wolfi the Impala binaries are supplied by the
Red Hat 9 build of Impala. This is made possible by the fact that major
library dependencies of Impala have the same versions on Wolfi OS and
Red Hat 9, so binaries built on Red Hat 9 can be run on Wolfi
with no changes.

The binaries produced by the regular build process are then installed
into a Docker image built on top of an explicitly specified custom base
image. The selection of a custom base image is controlled by two
environment variables:
- USE_CUSTOM_IMPALA_BASE_IMAGE (boolean):
  If set to 'true', triggers the use of the custom image.
  When set to 'false' or left unspecified, the Docker base image is
  selected by the existing logic of matching the build platform's
  operating system.
- IMPALA_CUSTOM_DOCKER_BASE (string): specifies the URI of the base image
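
The selection rule for the two variables can be sketched as below. This is
a hedged illustration, not the build's actual code: select_base_image and
platform_default are hypothetical names, and raising an error when the
custom URI is missing is an assumption.

```python
def select_base_image(environ, platform_default):
    """Pick the Docker base image from the two environment variables.

    platform_default stands in for the image chosen by the existing
    OS-matching logic (hypothetical parameter).
    """
    use_custom = environ.get("USE_CUSTOM_IMPALA_BASE_IMAGE", "false") == "true"
    if not use_custom:
        return platform_default
    custom = environ.get("IMPALA_CUSTOM_DOCKER_BASE")
    if not custom:
        # Assumption: a missing URI is treated as a configuration error.
        raise ValueError("IMPALA_CUSTOM_DOCKER_BASE must be set")
    return custom
```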

These environment variables can be overridden from the environment,
from impala-config-branch.sh, or impala-config-local.sh.
They are reported at the end of bin/impala-config.sh where important
environment variables are listed. They are also added to the list of
variables in bin/jenkins/dockerized-impala-preserve-vars.py to ensure
that they can be used in the context of Jenkins jobs as well.

The unified script that installs Impala's required dependencies into the
container image is extended for Wolfi to handle APK packages.

A new script is added to install Bash in the Docker image if it is
missing. Impala build scripts (including the scripts used during Docker
image builds) as well as container startup scripts require Bash,
but minimal container base images usually omit it, favoring a smaller
alternative.

To improve the debugging experience for a containerized Impala
minicluster, the minicluster starter script bin/start-impala-cluster.py
is extended with the following features:

- synchronizes every launched container's timezone to the host.
  This is needed for Iceberg time-travel tests, which create timestamped
  Iceberg metadata items in the impalad context inside a container, but
  check creation/modification times of the same items in the test scripts
  running on the host, outside the containers. The test scripts have
  the implicit expectation that the same local time is shared across
  all these contexts, but this is not necessarily true if the host
  where the tests are running is set to a timezone other than UTC.

  Time synchronization is achieved by injecting the TZ environment
  variable into the container, holding the name of the timezone used
  on the host. The timezone name is taken either from the host's TZ
  variable (if set), or from the host's /etc/localtime symlink,
  checking the name of the timezone file it points to.
  In case /etc/localtime is not a symlink (and TZ is not set on the
  host), the host's /etc/localtime file is mounted into the container.

- sets up a directory for each container to collect the Java VM's error
  files (hs_err_pidNNNN.log) from the containers.

- adds the --mount_sources command line parameter, which mounts the
  complete $IMPALA_HOME subtree into the container at
  /opt/impala/sources to make source code available inside the container
  for easier debugging.
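
The timezone lookup order described in the first item above can be sketched
as follows. This is a minimal sketch, not the code in
bin/start-impala-cluster.py: resolve_host_timezone is a hypothetical name,
and deriving the zone name from the part of the path after "zoneinfo/" is
an assumption about how the symlink target is interpreted.

```python
import os


def resolve_host_timezone(environ, localtime_path="/etc/localtime"):
    """Return the host's timezone name, or None if it cannot be derived.

    Order of preference: the TZ variable, then the target of the
    localtime symlink. None tells the caller to fall back to mounting
    the localtime file into the container.
    """
    tz = environ.get("TZ")
    if tz:
        return tz
    if os.path.islink(localtime_path):
        target = os.path.realpath(localtime_path)
        marker = "zoneinfo/"
        if marker in target:
            # e.g. /usr/share/zoneinfo/Europe/Budapest -> Europe/Budapest
            return target.split(marker, 1)[1]
    return None
```

A caller would pass os.environ and, when a name is returned, inject it into
the container as TZ.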

Tested by running core-mode tests in the following environments:
- Regular run (impalad running natively on the platform) on Ubuntu 20.04
- Regular run on Rocky Linux 9.2
- Dockerised run (impalad instances running in their individual
  containers) using Ubuntu 20.04 containers
- Dockerised run (impalad instances running in their individual
  containers) using Rocky Linux 9.2 containers
- Dockerised run (impalad instances running in their individual
  containers) using Wolfi's wolfi-base containers

Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
Reviewed-on: http://gerrit.cloudera.org:8080/22583
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-28 13:40:38 +00:00
Laszlo Gaal
2a03499c61 IMPALA-13458: Fix installing curl on Red Hat variants for dockerised tests
Red Hat 8 and 9 as well as their variants (e.g. Rocky Linux) preinstall
the curl-minimal package as a prerequisite for their package manager.
Unfortunately this conflicts with the installation of the full-blown
curl package when the Impala daemon Docker images are built during a
dockerised test run. The failure is caused by the two packages having
slightly different version numbers.

Fix this the same way as in bootstrap_system.sh: add the --allowerasing
flag to the yum command line to let yum/DNF substitute the full curl
version for the preinstalled curl-minimal package.

Tested by executing dockerised tests on Rocky Linux 9.2

Change-Id: I30fa0f13a77ef2a939a1b754014a78c171443c71
Reviewed-on: http://gerrit.cloudera.org:8080/21944
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-10 06:36:45 +00:00
Riza Suminto
2f5aef64a5 IMPALA-13617: Rename c_last_review_date to c_last_review_date_sk
TPC-DS v2.11.0, section 2.4.7, renames column customer.c_last_review_date
to customer.c_last_review_date_sk to align with other surrogate key
columns. impala-tpcds-kit has been modified to reflect this column name
change in
086d7113c8
However, the tpcds dataset schema in Impala test data remains unchanged.

This patch applies the same rename to align more closely with TPC-DS
v2.11.0. It contains no data type adjustment because such an adjustment
requires larger changes.

customer_multiblock_page_index.parquet added by IMPALA-10310 is
regenerated to follow the new schema of table customer. The SQL used to
create the file is ordered more specifically over both
c_current_cdemo_sk and c_customer_sk columns. The associated test
assertion in parquet-page-index.test is also updated.

A workaround in test_file_parser.py added by IMPALA-13543 is now removed
after this change is applied.

Testing:
- Pass core tests.

Change-Id: Ie446b3c534cb8f6f54265cd9b2f705cad91dd4ac
Reviewed-on: http://gerrit.cloudera.org:8080/22223
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-12-20 06:20:37 +00:00
gaurav1086
61e90e9e90 IMPALA-13182: Support uploading additional jars
This patch enables adding custom jars from the
absolute path: /opt/impala/aux-jars to the CLASSPATH.

Steps:
1. Download the jars into the /opt/impala/aux-jars directory
2. Restart impala cluster.

Testing:
* Tested manually: Added jar files in /opt/impala/aux-jars
  before impala start. After starting impala, asserted that
  the new jars were appended to the value of CLASSPATH as
  printed in the impalad logs.
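
The effect on CLASSPATH can be sketched as below. This is an illustration
under assumptions: append_aux_jars is a hypothetical helper, and both the
"*.jar" pattern and the sorted order are choices made here for
determinism, not details taken from the patch.

```python
import glob
import os


def append_aux_jars(classpath, aux_dir="/opt/impala/aux-jars"):
    """Append every jar found in aux_dir to an existing classpath string."""
    jars = sorted(glob.glob(os.path.join(aux_dir, "*.jar")))
    parts = [p for p in classpath.split(os.pathsep) if p]
    return os.pathsep.join(parts + jars)
```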

Change-Id: Ica5fa4c0cd1a5c938f331f3a4bba85d4910db90e
Reviewed-on: http://gerrit.cloudera.org:8080/21556
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-09-12 23:57:20 +00:00
stiga-huang
daa7f8ad88 IMPALA-13328: Fix missing krb5-config in building impala_quickstart_client docker image
Building the impala_quickstart_client docker image failed because
krb5-config was not found. It is installed by the libkrb5-dev package.
This patch adds the package to fix the build failure. It also improves
docker/publish_images_to_apache.sh to skip nonexistent images (usually
because they were not built), and updates the quickstart_hms image to be
based on Ubuntu 18.04.

Also fixes an issue where docker/CMakeLists.txt doesn't dump all the
image names to docker/docker-images.txt.

Tests:
 - Verified the quickstart images on MacOS.

Change-Id: Ieaa9878fa9cd9902ac883866c82e224889940615
Reviewed-on: http://gerrit.cloudera.org:8080/21725
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-08-29 03:09:29 +00:00
Andrew Sherman
78b9b09a16 IMPALA-13076 Add pstack and jstack to Impala Redhat docker images
When the Impala docker images are deployed in production environments,
it can be hard to add debugging tools at runtime. Two of the most
useful diagnostic tools are jstack and pstack, which can be used to
print Java and native stack traces. Install these tools into Redhat
images which are the most commonly used in production.

To install pstack we install gdb
To install jstack we install a development jdk on top of the headless
jdk.

Extend the install_os_packages.sh script to add an argument to
--install-debug-tools to set the level of diagnostic tools to install.
The possible arguments are:
  none - install no extra tools
  basic - install pstack and jstack
  full - install more debugging tools.

In a Centos 8.5 build, the size of an impalad_coord_exec image increased
from 1.74GB to 1.85GB, as reported by 'docker image list'.

What other tools might be added?
- Installing perf is tricky as in a container perf requires an
  installation specific to the underlying linux kernel image, which is
  hard to predict at build time.
- Installing pprof is hard as installation seems to require compiling
  from sources. Clearly there are many options and we cannot install
  everything.

TESTING

Built release and debug docker images, and used jstack and pstack in a
running container to print Impala's stacks.

Change-Id: I25e6827b86564a9c0fc25678e4a194ee8e0be0e9
Reviewed-on: http://gerrit.cloudera.org:8080/21433
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-06-21 22:40:28 +00:00
Michael Smith
a7d7336531 IMPALA-12566: Fix RpcMgrKerberizedTest on RedHat 8
On RedHat 8, RpcMgrKerberizedTest cases fail with

  Jan 09 14:47:03 msmith.vpc.cloudera.com krb5kdc[609624](info): TGS_REQ
  (1 etypes {aes128-cts-hmac-sha1-96(17)}) 127.0.0.1: LOOKING_UP_SERVER:
  authtime 0, etypes {rep=UNSUPPORTED:(0)}
  impala-test/msmith.vpc.cloudera.com@KRBTEST.COM for
  impala-test/msmith@KRBTEST.COM, Server not found in Kerberos database

This happens because bootstrap_system.sh adds an entry to /etc/hosts to
resolve 127.0.0.1 to hostname and puts the short hostname first. During
negotiation, Kudu RPC will call GetFQDN to retrieve the FQDN, which for
our tests running on localhost returns the short hostname.

Fixes RpcMgrKerberizedTest by swapping the order of entries added to
/etc/hosts so the FQDN comes first. This is consistent with the example
provided in https://man7.org/linux/man-pages/man5/hosts.5.html.

Avoids 'hostname -f'; on RedHat it's identical to 'hostname', and on
Ubuntu it causes this test to fail.
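
The ordering that the fix produces can be sketched as below. This is an
illustration only: bootstrap_system.sh is a shell script, and hosts_line is
a hypothetical helper that just shows the FQDN-first layout.

```python
def hosts_line(ip, fqdn):
    """Build an /etc/hosts entry with the FQDN first and the short name after."""
    short = fqdn.split(".", 1)[0]
    names = [fqdn] if short == fqdn else [fqdn, short]
    return "{0} {1}".format(ip, " ".join(names))
```

For the host in the log above, hosts_line("127.0.0.1",
"msmith.vpc.cloudera.com") yields
"127.0.0.1 msmith.vpc.cloudera.com msmith".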

Change-Id: I1eb24f9faec766e388d793408aedecdc92107185
Reviewed-on: http://gerrit.cloudera.org:8080/20876
Reviewed-by: Alexey Serbin <alexey@apache.org>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2024-01-18 00:00:01 +00:00
Michael Smith
472dea5c3c IMPALA-12355: Make utility_entrypoint arch-agnostic
Updates utility_entrypoint.sh for the impala_profile_tool image to
detect the correct JVM native library paths based on a glob, as they're
architecture-specific. Follows the pattern established in
daemon_entrypoint.sh, except impala_profile_tool only uses Java 8 on
Ubuntu.

Expected output

  $ docker run --entrypoint bash -i impala_profile_tool_debug /opt/impala/bin/utility_entrypoint.sh
  LD_LIBRARY_PATH: /opt/impala/lib:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server
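
The glob-based detection can be sketched in Python. The entrypoint itself
is a shell script, so this is only an illustration: build_ld_library_path is
a hypothetical name, and the glob pattern is inferred from the sample
output above, not copied from the script.

```python
import glob
import os


def build_ld_library_path(impala_lib, jvm_root="/usr/lib/jvm"):
    """Join the Impala lib dir with any architecture-specific JVM server dirs."""
    # The two "*" components absorb the architecture name (amd64, arm64, ...).
    pattern = os.path.join(
        jvm_root, "java-8-openjdk-*", "jre", "lib", "*", "server")
    server_dirs = sorted(d for d in glob.glob(pattern) if os.path.isdir(d))
    return os.pathsep.join([impala_lib] + server_dirs)
```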

Change-Id: I8e6b781bef52e60072ff02f4098d5ad9405aa2be
Reviewed-on: http://gerrit.cloudera.org:8080/20629
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-11-03 16:51:00 +00:00
stiga-huang
8d0ab2b684 IMPALA-10262: RPM/DEB Packaging Support
This patch is based on a previous patch contributed by Shant Hovsepian:
https://gerrit.cloudera.org/c/16612/

It adds a new option, -package, to buildall.sh for building a package
for the current OS type (e.g. CentOS/Ubuntu). You can also use
"make/ninja package" to build the package. Scripts for launching the
services and the required configuration files are also added.

Tests:
 - Built on Ubuntu 18.04/20.04 and CentOS 7 using
   ./buildall.sh -noclean -skiptests -release -package
 - Deployed the RPM package on a CDP cluster. Verified the scripts.
 - Deployed the DEB package on a docker container. Verified the scripts.

Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82
Reviewed-on: http://gerrit.cloudera.org:8080/18939
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-16 11:13:23 +00:00
Michael Smith
3b0705ba63 IMPALA-11941: Support Java 17 in Impala
Enables building for Java 17 - and particularly using Java 17 in
containers - but won't run a minicluster fully with Java 17 as some
projects (Hadoop) don't yet support it.

Starting with Java 15, ehcache.sizeof encounters
UnsupportedOperationException: can't get field offset on a hidden class
in class members pointing to capturing lambda functions. Java 17 also
introduces new modules that need to be added to add-opens. Both of these
pose problems for continued use of ehcache.

Adds https://github.com/jbellis/jamm as a new cache weigher for Java
15+. We build from HEAD as an external project until Java 17 support is
released (https://github.com/jbellis/jamm/issues/44). Adds the
'java_weigher' option to select 'sizeof' or 'jamm'; defaults to 'auto',
which uses jamm for Java 15+ and sizeof for everything else. Also adds
metrics for viewing cache weight results.
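
The 'auto' rule can be sketched as below. This is a hedged illustration of
the selection logic only: choose_weigher is a hypothetical name, and
rejecting unknown values with an error is an assumption.

```python
def choose_weigher(option, java_major_version):
    """Resolve the java_weigher option to 'sizeof' or 'jamm'."""
    if option in ("sizeof", "jamm"):
        return option
    if option == "auto":
        # jamm for Java 15+, sizeof for everything else.
        return "jamm" if java_major_version >= 15 else "sizeof"
    raise ValueError("unknown java_weigher value: %r" % (option,))
```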

Adds JAVA_HOME/lib/server to LD_LIBRARY_PATH in run-jvm-binary to
simplify switching between JDK versions for testing. You can now
- export IMPALA_JDK_VERSION=11
- source bin/impala-config.sh
- start-impala-cluster.py
and have Impala running a different JDK (11) version.

Retains add-opens calls that are still necessary due to dependencies'
use of lambdas for jamm, and all others for ehcache. Add-opens are still
required as a fallback, as noted in
https://github.com/jbellis/jamm#object-graph-crawling. We catch the
exceptions jamm and ehcache throw - CannotAccessFieldException,
UnsupportedOperationException - to avoid crashing Impala, and add it to
the list of banned log messages (as we should add-opens when we find
them).

Testing:
- container test run with Java 11 and 17 (excludes custom cluster)
- manual custom_cluster/test_local_catalog.py +
  test_banned_log_messages.py run with Java 11 and 17 (Java 8 build)
- full Java 11 build (passed except IMPALA-12184)
- add test catalog cache entry size metrics fit reasonable bounds
- add unit test for utility to find jamm jar file in classpath

Change-Id: Ic378896f572e030a3a019646a96a32a07866a737
Reviewed-on: http://gerrit.cloudera.org:8080/19863
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-24 10:11:54 +00:00
Joe McDonnell
234d641d7b IMPALA-11961/IMPALA-12207: Add Redhat 9 / Ubuntu 22 support
This adds support for Redhat 9 / Ubuntu 22. It updates
to a newer toolchain that has those builds, and it adds
supporting code in bootstrap_system.sh.

Redhat 9 and Ubuntu 22 use python = python3, which requires
various changes to build scripts and tests. Ubuntu 22 uses
Python 3.10, which deprecates certain ssl.PROTOCOL_TLS* constants, so
this adapts test_client_ssl.py to that change until it
can be fully addressed in IMPALA-12219.

Various OpenSSL methods have been deprecated. As a workaround
until these can be addressed properly, this specifies
-Wno-deprecated-declarations. This can be removed once the
code is adapted to the non-deprecated APIs in IMPALA-12226.

Impala crashes with tcmalloc errors unless we update to a newer
gperftools, so this moves to gperftools 2.10. gperftools changed
the default for tcmalloc.aggressive_memory_decommit to off, so
this adapts our code to set it for backend tests. The gperftools
upgrade does not show any performance regression:

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(42) | parquet / none / none | 3.08    | -0.64%     | 2.20       | -0.37%         |
+----------+-----------------------+---------+------------+------------+----------------+

With newer Python versions, the impala-virtualenv command
fails to create a Python 3 virtualenv. This switches to
using Python 3's builtin venv command for Python >=3.6.

Kudu needed a newer version and LLVM required a couple patches.

Testing:
 - Ran a core job on Ubuntu 22 and Redhat 9. The tests run
   to completion without crashing. There are test failures
   that will be addressed in follow-up JIRAs.
 - Ran dockerised tests on Ubuntu 22.
 - Ran dockerised tests on Ubuntu 20 and Rocky 8.5.

Change-Id: If1fcdb2f8c635ecd6dc7a8a1db81f5f389c78b86
Reviewed-on: http://gerrit.cloudera.org:8080/20073
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-06-21 05:21:01 +00:00
Joe McDonnell
9c6df6a691 IMPALA-12179 (part 1): Remove dependency on lsb_release for docker CMake
Newer operating systems like Redhat 9 do not supply
lsb_release as an official package. The /etc/os-release
file provides the same information in a more convenient
form. CMake 3.22 added support for reading those
/etc/os-release values directly via cmake_host_system_information().

This changes docker/CMakeLists.txt to use the new CMake
cmake_host_system_information() APIs to get values from
/etc/os-release. This removes the lsb_release code.
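
For readers unfamiliar with the file, the KEY=value format of
/etc/os-release can be sketched with a small Python parser. The patch
itself relies on CMake's cmake_host_system_information(); parse_os_release
and the sample text are illustrative assumptions.

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)   # strips the optional shell-style quotes
        values[key] = parts[0] if parts else ""
    return values


# Hypothetical sample content, not copied from a real system.
sample = 'NAME="Rocky Linux"\nVERSION_ID="9.2"\nID=rocky\n'
info = parse_os_release(sample)
```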

Testing:
 - Ran a docker build locally and verified it detected
   the distribution / version correctly

Change-Id: I04afd2b1c923f1331f7234d53a105a17956e3e18
Reviewed-on: http://gerrit.cloudera.org:8080/20069
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-06-15 16:22:15 +00:00
Michael Smith
3346d070ad IMPALA-11260: (Addendum) Restrict add-opens to Java 9+
Restricts jvm_automatic_add_opens to only apply to Java 9+ where the
option exists. Previously it would also include it in Java 8, which
caused the JVM to ignore all options in JAVA_TOOL_OPTIONS.

Tests for Java version by running $JAVA_HOME/bin/java -version (or
"java" if JAVA_HOME is unset) and parsing version from the first line.
All JVM implementations are expected to include the version in a quoted
string, such as "1.8.0_42" and "11.0.1".
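
Parsing the quoted version string can be sketched as below. This is a
minimal sketch, not Impala's implementation: java_major_version is a
hypothetical name, and the text around the quoted string in the examples
is an assumption about typical "java -version" output.

```python
import re


def java_major_version(first_line):
    """Extract the major Java version from the first line of 'java -version'.

    Handles both the legacy "1.8.0_42" scheme (major version 8) and the
    newer "11.0.1" scheme (major version 11). Returns None if no quoted
    version string is found.
    """
    match = re.search(r'"(\d+)(?:\.(\d+))?', first_line)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))   # "1.8..." means Java 8
    return first
```

With this, a caller would only add the add-opens options when the result
is 9 or higher.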

Also added add-opens flags for frontend tests.
test_no_inaccessible_objects detected this in a test run.

Testing:
- manually confirmed -agentlib options are present with both Java
  8 and Java 11.
- promoted test_jvm_mem_tracking to run in all strategies, as it's fast
  and ensures JAVA_TOOL_OPTIONS is honored.

Change-Id: I85953e685f6bbbd213afd93f389066e82f193ddf
Reviewed-on: http://gerrit.cloudera.org:8080/19939
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-04 00:38:22 +00:00
Michael Smith
879afbab1f IMPALA-11260: Add add-opens to JAVA_TOOL_OPTIONS on startup
During Impala startup, before starting the JVM (by calling libhdfs),
adds add-opens calls to JAVA_TOOL_OPTIONS to ensure Ehcache has access
to non-public members so it can accurately calculate object size.

This effectively circumvents new security precautions in Java 9+.

Use '--jvm_automatic_add_opens=false' to disable it.

Tested with Java 11

  JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
    CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
    run-all-tests.sh

Change-Id: I47a6533b2aa94593d9348e8e3606633f06a111e8
Reviewed-on: http://gerrit.cloudera.org:8080/19845
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-19 22:32:00 +00:00
Michael Smith
c8a21c51ef IMPALA-12081: Produce multiple Java docker images
This changes the docker image build code so that both Java 8 and Java 11
images can be built in the same build. Specifically, it introduces new
Make targets for Java 11 docker images in addition to the regular Java 8
targets. The "docker_images" and "docker_debug_images" targets continue
to behave the same way and produce Java 8 images of the same name. The
"docker_java11_images" and "docker_debug_java11_images" produce the
daemon docker images for Java 11.

Preserves IMPALA_DOCKER_USE_JAVA11 for selecting Java 11 images when
starting a cluster with container images.

Change-Id: Ic2b124267c607242bc2fd6c8cd6486293a938f50
Reviewed-on: http://gerrit.cloudera.org:8080/19722
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-19 22:19:24 +00:00
Michael Smith
7d07192e89 IMPALA-9627: Use universal_newlines for Python 3
Fixes subprocess.check_output calls for Python 3 using
universal_newlines=True.
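
A short illustration of why the flag matters on Python 3. The child
command here is a stand-in chosen for the example, not one of the calls
touched by the patch.

```python
import subprocess
import sys

# Run a trivial child process with the current interpreter.
cmd = [sys.executable, "-c", "print('hello')"]

# Without the flag, Python 3 returns bytes.
raw = subprocess.check_output(cmd)

# With universal_newlines=True the output is decoded to str, so string
# operations written for Python 2 keep working.
text = subprocess.check_output(cmd, universal_newlines=True)
first_word = text.strip().split()[0]
```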

Change-Id: I3dae9113635cf23ae02f1f630de311e64119c456
Reviewed-on: http://gerrit.cloudera.org:8080/19812
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-28 23:28:49 +00:00
Michael Smith
0a42185d17 IMPALA-9627: Update utility scripts for Python 3 (part 2)
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.

Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.

Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after the period when they were printed).

Removes out-of-date deploy.py and various Python 2.6 workarounds.

Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py

Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-04-26 18:52:23 +00:00
Joe McDonnell
b57f56f3f8 IMPALA-12039 (addendum): Verify presence of pgrep during docker build
As a followup to the fix for IMPALA-12039, this verifies the
presence of pgrep at docker build time as well as at daemon
startup time.
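
A presence check of this kind can be sketched in Python. The actual
verification happens in the Docker build and the daemon startup scripts;
missing_tools is a hypothetical helper shown only to illustrate the idea of
failing early.

```python
import shutil


def missing_tools(names, which=shutil.which):
    """Return the tools from names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]
```

A build step could call missing_tools(["pgrep"]) and abort if the returned
list is not empty.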

Testing:
 - Build docker images locally
 - Ran Redhat 8 dockerised tests

Change-Id: I67e000b64cf6c1ab2225745f6b95b7a5e7ac3d36
Reviewed-on: http://gerrit.cloudera.org:8080/19713
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-11 00:44:40 +00:00
Abhishek Rawat
3e0a422c2e IMPALA-12039: graceful shutdown doesn't work in redhat docker image
'pgrep' was missing in the redhat docker image and as a result the
graceful shutdown script (bin/graceful_shutdown_backends.sh) was
terminating the impalad immediately without waiting for the
'shutdown_grace_period_s' grace period. Since there wasn't a long enough
time window for cluster membership changes to propagate to the
coordinator, it was scheduling query fragments on already deleted
executors and queries were failing.

Built an ubuntu 20 image and it had the 'pgrep' utility already
installed.

Testing:
- Built redhat 8 image and manually tested graceful shutdown in a
docker container.

Change-Id: I91ffc1fe3e022ce7f7507b2bd79a3e2c3851956d
Reviewed-on: http://gerrit.cloudera.org:8080/19711
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-09 21:42:50 +00:00
Joe McDonnell
aa4050b4d9 IMPALA-11976: Fix use of deprecated functions/fields removed in Python 3
Python 3 moved several things around or removed deprecated
functions / fields:
 - sys.maxint was removed, but sys.maxsize provides similar functionality
 - long was removed, but int provides the same range
 - file() was removed, but open() already provided the same functionality
 - Exception.message was removed, but str(exception) is equivalent
 - Some encodings (like hex) were moved to codecs.encode()
 - string.letters -> string.ascii_letters
 - string.lowercase -> string.ascii_lowercase
 - string.strip was removed

This fixes all of those locations. Python 3 also has slightly different
rounding behavior from round(), so this changes round() to use future's
builtins.round() to get the Python 3 behavior.
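
The replacements listed above look like this on Python 3. The values are
made up for illustration and are not taken from the Impala code.

```python
import codecs
import string
import sys

largest = sys.maxsize                     # replaces sys.maxint
big = 2 ** 80                             # int covers the old long range
hex_bytes = codecs.encode(b"abc", "hex")  # replaces "abc".encode("hex")
letters = string.ascii_letters            # replaces string.letters
lower = string.ascii_lowercase            # replaces string.lowercase
stripped = "  padded  ".strip()           # replaces string.strip(s)

try:
    raise ValueError("bad input")
except ValueError as e:
    message = str(e)                      # replaces e.message

# Python 3 rounds exact halves to the nearest even integer.
rounded = [round(0.5), round(1.5), round(2.5)]
```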

This fixes the following pylint warnings:
 - file-builtin
 - long-builtin
 - invalid-str-codec
 - round-builtin
 - deprecated-string-function
 - sys-max-int
 - exception-message-attribute

Testing:
 - Ran core tests

Change-Id: I094cd7fd06b0d417fc875add401d18c90d7a792f
Reviewed-on: http://gerrit.cloudera.org:8080/19591
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
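
A minimal sketch of the division distinction described above. It is
written for Python 3, where true division is already the default; on
Python 2 the file would need to begin with the __future__ import shown in
the comment. The function names are hypothetical and not from the patch.

```python
# On Python 2, the first statement of the file would be:
#   from __future__ import absolute_import, division, print_function

def midpoint_index(num_rows):
    # An index must be an integer, so use floor division explicitly.
    return num_rows // 2

def average(total, count):
    # True division: the result is not rounded down to an integer.
    return total / count
```

Using '//' wherever an integer is required keeps the result the same on
both interpreters, which is the kind of conversion the message describes
for indices and record counts.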

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
ba3518366a IMPALA-11952 (part 4): Fix odds and ends: Octals, long, lambda, etc.
There are a variety of small python 3 syntax differences:
 - Octal constants need to start with 0o rather than just 0
 - Long constants are not supported (i.e. numbers ending with L)
 - Lambda syntax is slightly different
 - The 'ur' string mode is no longer supported
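
For illustration only, the Python 3 forms of these constructs could look
like the sketch below. The message does not say which lambda difference
was involved; the tuple-parameter case shown here is an assumption.

```python
# Octal constants need the 0o prefix (0755 is a syntax error in Python 3).
mode = 0o755

# Long constants such as 10L are gone; plain int literals cover the range.
big = 10000000000000000000

# Assumed example of the lambda change: tuple parameters such as
# "lambda (k, v): v" are not allowed, so index into the argument instead.
pairs = [("b", 2), ("a", 1)]
by_value = sorted(pairs, key=lambda kv: kv[1])

# The ur"" prefix is gone; r"" is enough because str is already unicode.
pattern = r"\d+"
```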

Testing:
 - check-python-syntax.sh now passes

Change-Id: Ie027a50ddf6a2a0db4b34ec9b49484ce86947f20
Reviewed-on: http://gerrit.cloudera.org:8080/19554
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-02-28 17:11:50 +00:00
Joe McDonnell
c71de994b0 IMPALA-11952 (part 1): Fix except syntax
Python 3 does not support this old except syntax:

except Exception, e:

Instead, it needs to be:

except Exception as e:

This uses impala-futurize to fix all locations of
the old syntax.
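
A small self-contained example of the new form (parse_port is a
hypothetical helper, not a function from the Impala code base):

```python
def parse_port(text):
    """Return the port as an int, or an error string if it is not numeric."""
    try:
        return int(text)
    except ValueError as e:  # Python 2 only: "except ValueError, e:"
        return "invalid port: {0}".format(e)
```

The "as" form is also accepted by Python 2.6 and later, so the rewritten
code runs on both interpreters.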

Testing:
 - The check-python-syntax.sh no longer shows errors
   for except syntax.

Change-Id: I1737281a61fa159c8d91b7d4eea593177c0bd6c9
Reviewed-on: http://gerrit.cloudera.org:8080/19551
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-02-28 17:11:50 +00:00
Joe McDonnell
52956bae14 IMPALA-11741: Verify that 'hostname' is installed in Docker images
Some deployments rely on having the 'hostname' utility
installed in Impala's Docker image (e.g. for constructing
daemon startup arguments). Most distributions include it
by default, but Redhat UBI8 does not.

This adds 'hostname' to the list of installed packages
for both Ubuntu and the Redhat family. This also verifies
that 'hostname' runs properly.

Testing:
 - Verified that this adds hostname for UBI8 images

Change-Id: I5a760680294a3ad7e74e843d3f4c06cd38819e88
Reviewed-on: http://gerrit.cloudera.org:8080/19273
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-11-23 23:50:51 +00:00
Joe McDonnell
1899b2e34b IMPALA-11703: Set appropriate permissions on /var/tmp in Docker build
Impala will fail to start if the permissions on /var/tmp do
not have the sticky bit set (i.e. +t). Some Redhat UBI images
do not set the sticky bit (+t) on /tmp and /var/tmp. This
sets the sticky bit on those directories during Docker build.
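
The check and the fix can be sketched in Python as follows. This is an
illustration of the permission bit involved, not the command used in the
Docker build, and the helper names are hypothetical.

```python
import os
import stat

def has_sticky_bit(path):
    """Return True if the sticky bit (+t) is set on path."""
    return bool(os.stat(path).st_mode & stat.S_ISVTX)

def add_sticky_bit(path):
    """Set the sticky bit while keeping the existing permission bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_ISVTX)
```

Calling has_sticky_bit("/var/tmp") on a base image is a quick way to tell
whether it is one of the affected images.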

Testing:
 - Verified that the sticky bit is set on one of the affected
   base images and that Impala can start up

Change-Id: I7ff32a035f40cb41d3a8dc80a07fd9924f41b942
Reviewed-on: http://gerrit.cloudera.org:8080/19222
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-11-08 06:21:38 +00:00
Joe McDonnell
11e66523d6 IMPALA-11526: Install en_US.UTF-8 locale into docker images
In IMPALA-11492, ExprTest.Utf8MaskTest was failing on some
configurations because the en_US.UTF-8 locale was missing. Since the
Docker images don't contain en_US.UTF-8, they are subject
to the same bug. This was confirmed by adding test cases
to the test_utf8_strings.py end-to-end test and running it
in the dockerized tests.

This adds the appropriate language pack to the list of packages
installed for the Docker build.

Testing:
 - This adds end-to-end tests to test_utf8_strings.py covering the
   same cases that were failing in ExprTest.Utf8MaskTest. They
   failed without the added language packs, and now succeed.

Change-Id: I353f257b3cb6d45f7d0a28f7d5319fdb457e6e3d
Reviewed-on: http://gerrit.cloudera.org:8080/19080
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
2022-10-11 20:30:50 +00:00
Joe McDonnell
3d269e465e IMPALA-11634: Provide an option to use Java 11 for docker images
Currently, Docker images install Java 8 for Impala's use. This
adds the IMPALA_DOCKER_USE_JAVA11 environment variable. When
set to true, this installs Java 11 rather than Java 8. It
defaults to false. The daemon_entrypoint.sh script is modified
to detect Java 11 correctly. As a workaround for IMPALA-11260,
this appends a list of "--add-opens" statements to JAVA_TOOL_OPTIONS
when running with Java 11.

Testing:
 - Ran a set of dockerized tests on Rocky 8.5 with Java 11

Change-Id: Icc1dbd3f6a2279840218dc1da2b60077e211a328
Reviewed-on: http://gerrit.cloudera.org:8080/19031
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-10-11 20:30:50 +00:00
Joe McDonnell
3962ae1972 IMPALA-8770: Support building Docker images on Redhat-based distributions
Currently, Impala supports building and testing Docker
images on Ubuntu. This extends that same support to
Redhat-based distributions:
1. This splits out the Docker build's OS package
   installation into a separate install_os_packages.sh
   script. This script detects the OS and calls apt
   or yum as appropriate. The script takes the argument
   --install-debug-tools, which installs extra tools
   like iproute2 and ping. This defaults to true for debug
   images and false for release images.
2. This modifies daemon_entrypoint.sh to detect the
   OS and set LD_LIBRARY_PATH appropriately to account
   for different locations of Java.
3. This modifies docker/setup_build_context.py to
   handle different locations of libkudu_client.so
   and add extra sanity checks on various libraries
   found via globs.
4. This modifies bin/jenkins/dockerized-*.sh test
   infrastructure to be able to install docker on
   either Ubuntu or Redhat. It also changes the exit
   logic to collect the container logs.
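
The OS detection in item 1 could work along the following lines. The real
install_os_packages.sh is a shell script and the message does not describe
how it detects the OS, so this Python sketch is hypothetical: it assumes
the ID and ID_LIKE fields of an os-release file are used, and the list of
distribution ids is illustrative.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip('"')
    return info

def package_manager(os_release_text):
    """Pick apt or yum from the ID and ID_LIKE fields (illustrative only)."""
    info = parse_os_release(os_release_text)
    ids = [info.get("ID", "")] + info.get("ID_LIKE", "").split()
    if any(i in ("ubuntu", "debian") for i in ids):
        return "apt"
    if any(i in ("rhel", "centos", "rocky", "fedora") for i in ids):
        return "yum"
    raise ValueError("unsupported distribution: {0}".format(info.get("ID")))
```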

Developers can override the base image for Redhat 7
and Redhat 8 builds via the IMPALA_REDHAT7_DOCKER_BASE
and IMPALA_REDHAT8_DOCKER_BASE environment variables.
These default to open source Redhat equivalents
(Centos 7.9 and Rocky 8.5 respectively), but they are
also known to work with Redhat UBI images.

Testing:
 - Ran dockerised testing on Rocky 8.5 via the
   rocky-8.5-dockerised-tests job.
 - Ran GVO
 - Ran a Docker build on Centos7 with UBI7 as the base image

Change-Id: Ibaff2560ef971ac2c2231a8e43921164ea1d2f4d
Reviewed-on: http://gerrit.cloudera.org:8080/19006
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-10-11 20:30:50 +00:00
Laszlo Gaal
68650057a1 Speed up default configuration for Docker-based tests
Docker-based parallelized test runs have proven themselves to be quite a
bit faster than regular core or exhaustive mode builds. While regular
sequential builds have also enjoyed shorter runtimes recently,
Docker-based parallel builds still enjoy a speed advantage.

Scheduling the parallel build segments is currently driven from the
test driver script test-with-docker.py, and the order in which the
segments are considered is currently hard-coded. The ordering was
originally devised experimentally, by timing several test runs, then
ordering the test segments based on expected duration, from longest
to shortest.
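
The longest-first ordering described above can be sketched as below. The
durations are hypothetical placeholders rather than measurements from the
patch, FE_TEST is an assumed segment name, and schedule_order is not a
function from test-with-docker.py.

```python
def schedule_order(expected_minutes):
    """Return segment names ordered from longest to shortest expected run."""
    return sorted(expected_minutes, key=expected_minutes.get, reverse=True)

# Hypothetical durations in minutes, for illustration only.
example = {"BE_TEST": 40, "FE_TEST": 75, "EE_TEST_PARALLEL": 60}
order = schedule_order(example)
```

Starting the longest segments first keeps them from being the last ones
still running when the shorter segments have already finished.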

The average wall-clock run times for various test segments have changed
since this original ordering was committed: FE tests have gotten
significantly longer, while upgrading the default worker instance
type shortened the serial phase(s) of E2E tests.

This patch makes two changes to achieve a shorter overall run time for
the Docker-based tests:
1. Reorders the default scheduling order of the test segments, based
   on currently measured durations
2. Increases the default suite concurrency for execution hosts:
   bumps suite concurrency from 4 to 5 for machines with memory sizes
   between 96 and 140 GBs (the currently used worker size)

The latter change is also based on measurements: memory usage reports for
total peak memory (RSS) and peak memory (RSS) per test segment both
showed significant amounts of unused memory on the current default
worker instance size (having 32 CPUs and 128 GB of RAM).
Experiments showed that this machine size can reliably handle five
concurrent containerized test sessions with some safety margin remaining,
so the patch increases the default concurrency for this machine
category.

With both changes applied the duration of a core-mode test run with
default settings is reduced from 2h45 to 2h25 (on average).

Tested by running the Docker-based default test suite in core mode,
with Ubuntu 16.04 and Rocky Linux 8.5 base images.

Change-Id: Ifb609bcfb10e9f9b281cc6b375c36c9638db168b
Reviewed-on: http://gerrit.cloudera.org:8080/19038
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-09-29 21:53:54 +00:00
Michael Smith
f6151b0aa1 IMPALA-11585: Build quickstart_client with Ubuntu 20
Ubuntu 20.04 only provides the python3-pip package. Update building
quickstart_client to use python3-pip on Ubuntu 20.04.

Change-Id: Ife89b7db88dd58e96ba1b3e3972ca97204332dd4
Reviewed-on: http://gerrit.cloudera.org:8080/18984
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-09-26 23:10:19 +00:00
Joe McDonnell
d0cfdd139f IMPALA-10199: Add Ubuntu 20 toolchain configuration
Ubuntu 20 has been using the toolchain from Ubuntu 18.
Since Ubuntu 20 has been added to the toolchain, this
switches Impala to use a toolchain with Ubuntu 20 support
and uses the Ubuntu 20 bits. This is expected to help
with IMPALA-10962.

Testing:
 - Ran a core build on Ubuntu 20

Change-Id: If2394b668ef3c56b1a4c0773fd5e4ff92be4a846
Reviewed-on: http://gerrit.cloudera.org:8080/18559
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-05-24 20:42:04 +00:00
Joe McDonnell
51b537a3cd IMPALA-11244: Run the minicluster for docker-based BE tests
As an optimization, the docker-based tests didn't run
the minicluster for BE tests. Some BE tests now require
the minicluster (DiskIoMgrTest.WriteToRemote*), so this
cannot work with the optimization.

This changes the docker-based tests to start the minicluster
for the BE tests.

Testing:
 - Ran a docker-based test job

Change-Id: I784a63a02886852e10ccca7c118c22ff7d38b8a3
Reviewed-on: http://gerrit.cloudera.org:8080/18414
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-05-10 00:19:18 +00:00
stiga-huang
35375b3287 IMPALA-2019(part-4): Add UTF-8 support for case conversion functions
There are 3 builtin case conversion string functions: upper(), lower(),
and initcap(). Previously they only convert English alphabetic
characters. This patch adds support to deal with Unicode characters.

There are many corner cases in case conversion depending on the locale
and context. E.g.
1) Case conversion is locale-sensitive.
Turkish has 4 letter "I"s. English has only two, a lowercase dotted i
and an uppercase dotless I. Turkish has lowercase and uppercase forms of
both dotted and dotless I. So simply converting "i" to "I" for upper
case is wrong in Turkish:
    +-------+--------+---------+
    |       | Dotted | Dotless |
    +-------+--------+---------+
    | Upper | İ      | I       |
    +-------+--------+---------+
    | Lower | i      | ı       |
    +-------+--------+---------+

2) Case conversion may change a string's length.
The German word "grüßen" should be converted to "GRÜSSEN" in upper case:
the letter "ß" should be converted to "SS".

3) Case conversion is context-sensitive.
The Greek word "ὈΔΥΣΣΕΎΣ" should be converted to "ὀδυσσεύς", where the
Greek letter "Σ" is converted to "σ" or to "ς", depending on its
position in the word.

The above cases will be the focus of follow-up JIRAs. This patch adds the
initial implementation of UTF-8 aware case conversion functions.

--------
Implementation:
In UTF-8 mode (turned on by set UTF8_MODE=true) of these functions, the
bytes in strings are converted to wide characters using std::mbrtowc().
Each wide character (wchar_t) will then be converted using std::towupper
or std::towlower correspondingly. We then convert them back to multi
bytes using std::wcrtomb().

Note that these builtins are locale aware. If impalad is launched
without a UTF-8 aware locale, e.g. LC_ALL="C", these builtins can't
recognize non-ascii characters, which will return unexpected results.
Thus we modify our docker images to set LC_ALL="C.UTF-8" instead of "C".
This patch also logs the current locale when launching impala daemons
for better debugging. We will support customized locale in IMPALA-11080.
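
To show why a character-by-character conversion cannot cover corner case
2, here is a small Python sketch. It is an illustration only, not Impala's
C++ implementation: it upper-cases one character at a time and keeps only
one-to-one mappings, then compares the result with a full-string
conversion.

```python
def upper_per_character(text):
    """Upper-case one character at a time, keeping only 1:1 mappings.

    This mimics a per-character conversion that cannot change the length
    of the string.
    """
    out = []
    for ch in text:
        mapped = ch.upper()
        out.append(mapped if len(mapped) == 1 else ch)
    return "".join(out)

word = "grüßen"
per_char = upper_per_character(word)   # the sharp s is left unchanged
full = word.upper()                    # full mapping expands it to "SS"
```

The per-character result keeps the original length, while the full
mapping is one character longer.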

Test:
 - Add BE unit tests and e2e tests.

Change-Id: I443e89d46f4638ce85664b021666bc4f03ee8abd
Reviewed-on: http://gerrit.cloudera.org:8080/17785
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-02-15 18:40:59 +00:00
Zoltan Garaguly
45d3eddc05 IMPALA-8680: Docker-based tests fail to archive the minicluster component logs
Inside docker container copy logs of cluster components hdfs, yarn, kudu
from folder testdata/cluster/cdh<version-number>/node-<node-id>/var/log/
to folder logs/cluster/

Testing:
 - running docker-based tests and checked that minicluster logs are preserved and archived
 - tested that minicluster logs also get copied when something goes wrong during the build

Change-Id: I23e25d42992cec47c593dc388bcf0bcef828c05e
Reviewed-on: http://gerrit.cloudera.org:8080/15898
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-08-31 06:58:34 +00:00
Vihang Karajgaonkar
5a9dcd108d IMPALA-8795: Turn on events processing by default
This commit turns on events processing by default. The default
polling interval is set to 1 second, which can be overridden by
setting hms_event_polling_interval_s to a non-default value.

With event polling turned on by default, this patch also
moves the test_event_processing.py to tests/metadata instead
of custom cluster test. Some tests within test_event_processing.py
which needed non-default configurations were moved to
tests/custom_cluster/test_events_custom_configs.py.

Additionally, some other tests were modified to take into account
the automatic ability of Impala to detect newly added tables
from hive.

Testing done:
1. Ran exhaustive tests by turning on the events processing multiple
times.
2. Ran exhaustive tests by disabling events processing.
3. Ran dockerized tests.

Change-Id: I9a8b1871a98b913d0ad8bb26a104a296b6a06122
Reviewed-on: http://gerrit.cloudera.org:8080/17612
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
2021-08-09 17:22:31 +00:00
John Sherman
ca17e307ab IMPALA-10550: Add External Frontend service port
- If external_fe_port flag is >0, spins up a new HS2 compatible
  service port
- Added enable_external_fe_support option to start-impala-cluster.py
  - which when detected will start impala clusters with
  external_fe_port on 21150-21152
- Modify impalad_coordinator Dockerfile to expose external frontend
  port at 21150
- The intent of this commit is to separate external frontend
  connections from normal hs2 connections
  - This allows different security policy to be applied to
  each type of connection. The external_fe_port should be considered
  a privileged service and should only be exposed to an external
  frontend that does user authentication and does authorization
  checks on generated plans

Change-Id: I991b5b05e12e37d8739e18ed1086bbb0228acc40
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/17125
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-03 22:46:05 +00:00
Tim Armstrong
79bee3befb IMPALA-10469: push quickstart to apache repo
This adds a script, docker/publish_images_to_apache.sh,
that allows uploading images to the apache/impala docker hub
repo, prefixed with a version string. E.g. with the following
commands:

  ninja docker_images quickstart_docker_images
  ./docker/publish_images_to_apache.sh -v 81d5377c2

The uploaded images can then be used for the quickstart cluster,
as documented in docker/README.

Updated docs for quickstart to use a prefix from apache/impala

Remove IMPALA_QUICKSTART_VERSION, which doesn't interact well with
the tagging since the image name and version are now encoded in the
tag.

Fix an incorrect image name added to docker-images.txt:
impala_profile_tool_image.

Testing:
Ran Impala quickstart with data loading using instructions in README.

  export IMPALA_QUICKSTART_IMAGE_PREFIX="apache/impala:81d5377c2-"
  docker network create -d bridge quickstart-network
  export QUICKSTART_IP=$(docker network inspect quickstart-network -f '{{(index .IPAM.Config 0).Gateway}}')
  export QUICKSTART_LISTEN_ADDR=$QUICKSTART_IP

  docker-compose -f docker/quickstart.yml \
      -f docker/quickstart-kudu-minimal.yml \
      -f docker/quickstart-load-data.yml up -d

  docker run --network=quickstart-network -it \
       ${IMPALA_QUICKSTART_IMAGE_PREFIX}impala_quickstart_client
       impala-shell

Change-Id: I535d77e565b73d732ae511d7525193467086c76a
Reviewed-on: http://gerrit.cloudera.org:8080/17030
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-10 06:56:45 +00:00
Tim Armstrong
93d4348b54 IMPALA-10389: impala-profile-tool container
Add a build step for an impala-profile-tool docker image
that makes it easy to run the binary on any system.

This container is automatically built as part of the
docker build.

This sets up a new build context that doesn't pull in all of
the same dependencies or depend on the Java build

Testing:

  cat logs/cluster/profiles/* | \
    docker run -i impala_profile_tool

I uploaded a build of the container to dockerhub too:

  timgarmstrong/impala_profile_tool

Change-Id: I36915cd686ab930dcc934bc0c81bff8c16d46714
Reviewed-on: http://gerrit.cloudera.org:8080/17015
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-05 11:22:55 +00:00
Tim Armstrong
eb85c6eeca IMPALA-9793: Impala quickstart cluster with docker-compose
What works:
* A single node cluster can be started up with docker-compose
* HMS data is stored in Derby database in a docker volume
* Filesystem data is stored in a shared docker volume, using the
  localfs support in the Hadoop client.
* A Kudu cluster with a single master can be optionally added on
  to the Impala cluster.
* TPC-DS data can be loaded automatically by a data loading container.

We need to set up a docker network called quickstart-network,
purely because docker-compose insists on generating network names
with underscores, which are part of the FQDN and end up causing
problems with Java's URL parsing, which rejects these technically
invalid domain names.
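
The hostname rule behind this problem can be sketched as follows. The
check follows the usual letters-digits-hyphen rule for DNS labels; it is
an illustration written for this summary, not Java's URL parser, and the
example names in the usage note are hypothetical.

```python
import re

# One DNS label: letters, digits and hyphens, not starting or ending
# with a hyphen, at most 63 characters.
_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")

def is_valid_hostname(name):
    """Check a hostname against the letters-digits-hyphen rule for labels."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL.match(label) for label in name.split("."))
```

A name such as "impalad.quickstart-network" passes this check, while one
ending in a generated network name like "docker_default" does not, because
of the underscore.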

How to run:

Instructions for running the quickstart cluster are in
docker/README.md.

How to build containers:

  ./buildall.sh -release -noclean -notests -ninja
  ninja quickstart_hms_image quickstart_client_image docker_images

How to upload containers to dockerhub:

  IMPALA_QUICKSTART_IMAGE_PREFIX=timgarmstrong/
  for i in impalad_coord_exec impalad_coordinator statestored \
           impalad_executor catalogd impala_quickstart_client \
           impala_quickstart_hms
  do
    docker tag $i ${IMPALA_QUICKSTART_IMAGE_PREFIX}$i
    docker push ${IMPALA_QUICKSTART_IMAGE_PREFIX}$i
  done

I pushed containers built from commit f260cce22, which
was branched from 6cb7cecacf on master.

Misc other stuff:
* Added more metadata to all images.

TODO:
* Test and instructions to run against Kudu quickstart
* Upload latest version of containers before merging.

Change-Id: Ifc0b862af40a368381ada7ec2a355fe4b0aa778c
Reviewed-on: http://gerrit.cloudera.org:8080/15966
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-26 11:22:08 +00:00
Laszlo Gaal
6d4756da01 IMPALA-10448: Build impala-profile-tool early for Docker-based tests
impala-profile-tool is a new dependency for end-to-end tests.
The tool is built together with all the other backend tests
(so the buildall.sh flag '-notests' can turn off building it), but it is
actually used in the parallel phase of end-to-end tests.

This is a problem for Docker-based builds for the following reasons:
- Docker-based tests run BE, FE and various phases of the EE test in
  separate Docker containers for parallel executions
- Test binaries are only built inside the container running BE tests to
  cut down on the build time and the size of the Docker image that all test
  containers are based on.
- This means that the EE_TEST_PARALLEL container will miss the tool
  required for running the tests designed to test it.

The solution is to build the tool early, at the end of the build phase
running in the build container. There is already another such tool built
there (parquet-reader) for a similar reason, so just add
impala-profile-tool to the same 'make' command there.

Tested by running BE_TEST and EE_TEST_PARALLEL phases in a Docker-based
build.

Change-Id: I60e78ea883f3057c59a345feca38ef08a7f6a0b8
Reviewed-on: http://gerrit.cloudera.org:8080/16965
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-22 20:39:59 +00:00
Thomas Tauber-Marshall
91adb33b22 IMPALA-9975 (part 2): Introduce new admission control daemon
A recent patch (IMPALA-9930) introduces a new admission control rpc
service, which can be configured to perform admission control for
coordinators. In that patch, the admission service runs in an impalad.

This patch separates the service out to run in a new daemon, called
the admissiond. It also integrates this new daemon with the build
infrastructure around Docker.

Some notable changes:
- Adds a new class, AdmissiondEnv, which performs the same function
  for the admissiond as ExecEnv does for impalads.
- The '/admission' http endpoint is exposed on the admissiond's webui
  if the admission control service is in use, otherwise it is exposed
  on coordinator impalad's webuis.
- start-impala-cluster.py takes a new flag --enable_admission_service
  which configures the minicluster to have an admissiond with all
  coordinators using it for admission control.
- Coordinators are now configured to use the admission service by
  specifying the startup flag --admission_service_host. This is
  intended to mirror the configuration of the statestored/catalogd
  location.

Testing:
- Existing tests for the admission control service are modified to run
  with an admissiond.
- Manually ran start-impala-cluster.py with --enable_admission_service
  and --docker_network to verify Docker integration.

Change-Id: Id677814b31e9193035e8cf0d08aba0ce388a0ad9
Reviewed-on: http://gerrit.cloudera.org:8080/16891
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-13 06:03:37 +00:00
Bikramjeet Vig
8542924fca IMPALA-10373: Run impala docker containers with uid/gid 1000
The convention in Linux is that anything below 1000 is reserved
for system accounts, services, and other special accounts, and
regular user UIDs and GIDs start at 1000. This ensures that the
'impala' user that runs the impala executable inside the
docker container gets assigned uid and gid 1000.

Testing:
Manually tested by running the docker container and checking the user.

Change-Id: I51b846ca5fb2c55ac1707b9581cee18447467b41
Reviewed-on: http://gerrit.cloudera.org:8080/16807
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-03 00:59:12 +00:00
Joe McDonnell
cfa8a7a5e5 IMPALA-10278: Use full libraries for impalad_executor Docker container
This backs out the piece of IMPALA-10016 that used a pared-down
set of libraries for the impalad_executor. That pared-down
set was missing org.apache.impala.common.JniUtil, which
prevented the impalad_executor container from starting up.

Testing:
 - Ran a docker core job with one coord_exec and two executors
   and it was able to startup where it wouldn't before

Change-Id: Ieecca61cd3c11f446b922a04fdeb5fd0c90fc971
Reviewed-on: http://gerrit.cloudera.org:8080/16640
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-23 21:20:44 +00:00
Joe McDonnell
97792c4bad IMPALA-10198 (part 2): Add support for mvn versions:set
This adds support for setting the version of Java
artifacts through "mvn versions:set". It changes
the modules to inherit the version from the parent
pom.

Previously, we used a mix of 0.1-SNAPSHOT and
1.0-SNAPSHOT. This now uses 4.0.0-SNAPSHOT across the
board. With each release, we can use "mvn versions:set"
to update the versions. The only exception is the
Hive UDF code that we build for testing. This remains
at version 1.0 to avoid test changes.

Testing:
 - Ran core job
 - Added build-all-flag-combinations.sh case that
   does "mvn versions:set" and runs a build

Change-Id: I661b32e1e445169bac2ffe4f9474f14090031743
Reviewed-on: http://gerrit.cloudera.org:8080/16559
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-15 19:30:13 +00:00
Joe McDonnell
97856478ec IMPALA-10198 (part 1): Unify Java in a single java/ directory
This changes all existing Java code to be submodules under
a single root pom. The root pom is impala-parent/pom.xml
with minor changes to add submodules.

This avoids most of the weird CMake/maven interactions,
because there is now a single maven invocation for all
the Java code.

This moves all the Java projects other than fe into
a top level java directory. fe is left where it is
to avoid disruption (but still is compiled via the
java directory's root pom). Various pieces of code
that reference the old locations are updated.

Based on research, there are two options for dealing
with the shaded dependencies. The first is to have an
entirely separate Maven project with a separate Maven
invocation. In this case, the consumers of the shaded
jars will see the reduced set of transitive dependencies.
The second is to have the shaded dependencies as modules
with a single Maven invocation. The consumer would see
all of the original transitive dependencies and need to
exclude them all. See MSHADE-206/MNG-5899. This chooses
the second.

This only moves code around and does not focus on version
numbers or making "mvn versions:set" work.

Testing:
 - Ran a core job
 - Verified existing maven commands from fe/ directory still work
 - Compared the *-classpath.txt files from fe and executor-deps
   and verified they are the same except for paths

Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a
Reviewed-on: http://gerrit.cloudera.org:8080/16500
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2020-10-15 19:30:13 +00:00