This patch enables adding custom jars from the absolute path
/opt/impala/aux-jars to the CLASSPATH.
Steps:
1. Download the jars into the /opt/impala/aux-jars directory.
2. Restart the Impala cluster.
Testing:
* Tested manually: added jar files to /opt/impala/aux-jars
before starting Impala. After starting Impala, verified that
the new jars were appended to the value of CLASSPATH as
printed in the impalad logs.
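A minimal sketch of the CLASSPATH behavior described above. The real change presumably lives in Impala's startup scripts; the Python helper `classpath_with_aux_jars` is hypothetical and only illustrates appending every jar in the aux-jars directory to an existing CLASSPATH.

```python
import glob
import os


def classpath_with_aux_jars(classpath, aux_dir="/opt/impala/aux-jars"):
    """Return classpath with every *.jar in aux_dir appended (hypothetical helper)."""
    # sorted() gives a deterministic order; raw glob order is filesystem-dependent.
    jars = sorted(glob.glob(os.path.join(aux_dir, "*.jar")))
    # Drop empty entries so an unset CLASSPATH does not yield a leading separator.
    entries = [entry for entry in classpath.split(os.pathsep) if entry]
    return os.pathsep.join(entries + jars)
```

Non-jar files in the directory are ignored because only `*.jar` is globbed.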
Change-Id: Ica5fa4c0cd1a5c938f331f3a4bba85d4910db90e
Reviewed-on: http://gerrit.cloudera.org:8080/21556
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Building the impala_quickstart_client docker image failed because
krb5-config was not found. It's installed by the libkrb5-dev package, so
this patch adds that package to fix the build failure. Also improves
docker/publish_images_to_apache.sh to skip nonexistent images (usually
because they were not built). Updates the quickstart_hms image to be
based on Ubuntu 18.04.
Also fixes an issue where docker/CMakeLists.txt didn't dump all the
image names to docker/docker-images.txt.
Tests:
- Verified the quickstart images on MacOS.
Change-Id: Ieaa9878fa9cd9902ac883866c82e224889940615
Reviewed-on: http://gerrit.cloudera.org:8080/21725
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When the Impala docker images are deployed in production environments,
it can be hard to add debugging tools at runtime. Two of the most
useful diagnostic tools are jstack and pstack, which can be used to
print Java and native stack traces. Install these tools into Redhat
images, which are the most commonly used in production.
To install pstack we install gdb.
To install jstack we install a development JDK on top of the headless
JDK.
Extend the install_os_packages.sh script so that --install-debug-tools
takes an argument that sets the level of diagnostic tools to install.
The possible arguments are:
none - install no extra tools
basic - install pstack and jstack
full - install more debugging tools.
In a Centos 8.5 build, the size of an impalad_coord_exec image increased
from 1.74GB to 1.85GB, as reported by 'docker image list'.
What other tools might be added?
- Installing perf is tricky as in a container perf requires an
installation specific to the underlying linux kernel image, which is
hard to predict at build time.
- Installing pprof is hard as installation seems to require compiling
from source. Clearly there are many options and we cannot install
everything.
TESTING
Built release and debug docker images, and used jstack and pstack in a
running container to print Impala's stacks.
Change-Id: I25e6827b86564a9c0fc25678e4a194ee8e0be0e9
Reviewed-on: http://gerrit.cloudera.org:8080/21433
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
On RedHat 8, RpcMgrKerberizedTest cases fail with
Jan 09 14:47:03 msmith.vpc.cloudera.com krb5kdc[609624](info): TGS_REQ
(1 etypes {aes128-cts-hmac-sha1-96(17)}) 127.0.0.1: LOOKING_UP_SERVER:
authtime 0, etypes {rep=UNSUPPORTED:(0)}
impala-test/msmith.vpc.cloudera.com@KRBTEST.COM for
impala-test/msmith@KRBTEST.COM, Server not found in Kerberos database
This happens because bootstrap_system.sh adds an entry to /etc/hosts to
resolve 127.0.0.1 to hostname and puts the short hostname first. During
negotiation, Kudu RPC will call GetFQDN to retrieve the FQDN, which for
our tests running on localhost returns the short hostname.
Fixes RpcMgrKerberizedTest by swapping the order of entries added to
/etc/hosts so the FQDN comes first. This is consistent with the example
provided in https://man7.org/linux/man-pages/man5/hosts.5.html.
Avoids 'hostname -f'; on RedHat it's identical to 'hostname', and on
Ubuntu it causes this test to fail.
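A small sketch of why the order matters. Per hosts(5), each line is an IP address, then the canonical hostname, then optional aliases, so whichever name comes first is what lookups treat as canonical. The helpers and host names below are hypothetical; bootstrap_system.sh does this in shell.

```python
def hosts_line(ip, fqdn, short_name):
    """Build an /etc/hosts entry: IP, then the canonical name (FQDN), then aliases."""
    return "{} {} {}".format(ip, fqdn, short_name)


def canonical_name(line):
    """Return the canonical hostname of an /etc/hosts line (first name after the IP)."""
    fields = line.split("#", 1)[0].split()  # ignore trailing comments
    return fields[1] if len(fields) > 1 else None
```

With the short name listed first, `canonical_name` returns the short name, which mirrors the GetFQDN behavior that broke the Kerberos lookup.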
Change-Id: I1eb24f9faec766e388d793408aedecdc92107185
Reviewed-on: http://gerrit.cloudera.org:8080/20876
Reviewed-by: Alexey Serbin <alexey@apache.org>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Updates utility_entrypoint.sh for the impala_profile_tool image to
detect the correct JVM native library paths based on a glob, as they're
architecture-specific. Follows the pattern established in
daemon_entrypoint.sh, except impala_profile_tool only uses Java 8 on
Ubuntu.
Expected output:
$ docker run --entrypoint bash -i impala_profile_tool_debug /opt/impala/bin/utility_entrypoint.sh
LD_LIBRARY_PATH: /opt/impala/lib:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server
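The glob idea, sketched in Python rather than the shell used by utility_entrypoint.sh. The path layout follows the amd64 example above, where the architecture appears twice. `find_jvm_server_dir` is a hypothetical name.

```python
import glob
import os


def find_jvm_server_dir(jvm_root="/usr/lib/jvm"):
    """Return the directory containing libjvm.so for a Java 8 JRE, for any architecture."""
    # The arch appears twice: java-8-openjdk-<arch>/jre/lib/<arch>/server.
    pattern = os.path.join(jvm_root, "java-8-openjdk-*", "jre", "lib", "*",
                           "server", "libjvm.so")
    matches = sorted(glob.glob(pattern))
    if len(matches) != 1:
        # Zero or several candidates means the guess is ambiguous; fail loudly.
        raise RuntimeError("expected exactly one libjvm.so, found: %r" % matches)
    return os.path.dirname(matches[0])
```

The returned directory is what would be appended to LD_LIBRARY_PATH.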
Change-Id: I8e6b781bef52e60072ff02f4098d5ad9405aa2be
Reviewed-on: http://gerrit.cloudera.org:8080/20629
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
This patch is based on a previous patch contributed by Shant Hovsepian:
https://gerrit.cloudera.org/c/16612/
It adds a new option, -package, to buildall.sh for building a package
for the current OS type (e.g. CentOS/Ubuntu). You can also use
"make/ninja package" to build the package. Scripts for launching the
services and the required configuration files are also added.
Tests:
- Built on Ubuntu 18.04/20.04 and CentOS 7 using
./buildall.sh -noclean -skiptests -release -package
- Deployed the RPM package on a CDP cluster. Verified the scripts.
- Deployed the DEB package on a docker container. Verified the scripts.
Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82
Reviewed-on: http://gerrit.cloudera.org:8080/18939
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Enables building for Java 17, and particularly using Java 17 in
containers, but a minicluster won't fully run with Java 17 because some
projects (Hadoop) don't yet support it.
Starting with Java 15, ehcache.sizeof encounters
UnsupportedOperationException: can't get field offset on a hidden class
in class members pointing to capturing lambda functions. Java 17 also
introduces new modules that need to be added to add-opens. Both of these
pose problems for continued use of ehcache.
Adds https://github.com/jbellis/jamm as a new cache weigher for Java
15+. We build from HEAD as an external project until Java 17 support is
released (https://github.com/jbellis/jamm/issues/44). Adds the
'java_weigher' option to select 'sizeof' or 'jamm'; defaults to 'auto',
which uses jamm for Java 15+ and sizeof for everything else. Also adds
metrics for viewing cache weight results.
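The 'auto' resolution rule for 'java_weigher', sketched in Python. Impala implements this in its Java frontend; the function name and the error on unknown values are assumptions.

```python
def choose_weigher(java_weigher, java_major):
    """Resolve the java_weigher option ('sizeof', 'jamm' or 'auto') to a concrete weigher."""
    if java_weigher in ("sizeof", "jamm"):
        return java_weigher  # explicit choice always wins
    if java_weigher != "auto":
        raise ValueError("unknown java_weigher: %r" % java_weigher)
    # ehcache.sizeof fails on hidden classes from Java 15 on, so prefer jamm there.
    return "jamm" if java_major >= 15 else "sizeof"
```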
Adds JAVA_HOME/lib/server to LD_LIBRARY_PATH in run-jvm-binary to
simplify switching between JDK versions for testing. You can now
- export IMPALA_JDK_VERSION=11
- source bin/impala-config.sh
- start-impala-cluster.py
and have Impala run with a different JDK version (11).
Retains add-opens calls that are still necessary due to dependencies'
use of lambdas for jamm, and all others for ehcache. Add-opens are still
required as a fallback, as noted in
https://github.com/jbellis/jamm#object-graph-crawling. We catch the
exceptions jamm and ehcache throw (CannotAccessFieldException,
UnsupportedOperationException) to avoid crashing Impala, and add them to
the list of banned log messages (since we should add the corresponding
add-opens when we find them).
Testing:
- container test run with Java 11 and 17 (excludes custom cluster)
- manual custom_cluster/test_local_catalog.py +
test_banned_log_messages.py run with Java 11 and 17 (Java 8 build)
- full Java 11 build (passed except IMPALA-12184)
- added a test that catalog cache entry size metrics fit reasonable bounds
- added a unit test for the utility that finds the jamm jar file in the classpath
Change-Id: Ic378896f572e030a3a019646a96a32a07866a737
Reviewed-on: http://gerrit.cloudera.org:8080/19863
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for Redhat 9 / Ubuntu 22. It updates
to a newer toolchain that has those builds, and it adds
supporting code in bootstrap_system.sh.
Redhat 9 and Ubuntu 22 use python = python3, which requires
various changes to build scripts and tests. Ubuntu 22 uses
Python 3.10, which deprecates ssl.PROTOCOL_TLS (among other
ssl.PROTOCOL_* constants), so this adapts test_client_ssl.py
to that change until it can be fully addressed in IMPALA-12219.
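For reference, a non-deprecated way to build a client TLS context on Python 3.10+ uses `ssl.PROTOCOL_TLS_CLIENT`. This sketch does not reproduce what test_client_ssl.py actually does; the helper name and the TLS 1.2 floor are assumptions.

```python
import ssl


def make_client_context(cafile=None):
    """Create a TLS client context without the deprecated ssl.PROTOCOL_TLS constant."""
    # PROTOCOL_TLS_CLIENT turns on check_hostname and CERT_REQUIRED by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cafile is not None:
        ctx.load_verify_locations(cafile=cafile)
    else:
        ctx.load_default_certs()  # fall back to the system CA store
    return ctx
```

`ssl.create_default_context()` is a shorter alternative with similar defaults.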
Various OpenSSL methods have been deprecated. As a workaround
until these can be addressed properly, this specifies
-Wno-deprecated-declarations. This can be removed once the
code is adapted to the non-deprecated APIs in IMPALA-12226.
Impala crashes with tcmalloc errors unless we update to a newer
gperftools, so this moves to gperftools 2.10. gperftools changed
the default for tcmalloc.aggressive_memory_decommit to off, so
this adapts our code to set it for backend tests. The gperftools
upgrade does not show any performance regression:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(42) | parquet / none / none | 3.08 | -0.64% | 2.20 | -0.37% |
+----------+-----------------------+---------+------------+------------+----------------+
With newer Python versions, the impala-virtualenv command
fails to create a Python 3 virtualenv. This switches to
using Python 3's builtin venv command for Python >=3.6.
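A minimal sketch of creating an environment with the standard-library `venv` module. The helper name and the `with_pip=False` choice are assumptions; impala-virtualenv's real invocation may differ.

```python
import sys
import venv


def create_py3_env(path):
    """Create a Python 3 virtual environment with the stdlib venv module."""
    if sys.version_info < (3, 6):
        raise RuntimeError("the venv code path expects Python >= 3.6")
    # with_pip=False avoids network/ensurepip work; pip can be bootstrapped later.
    venv.create(path, with_pip=False, clear=True)
    return path
```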
Kudu needed a newer version, and LLVM required a couple of patches.
Testing:
- Ran a core job on Ubuntu 22 and Redhat 9. The tests run
to completion without crashing. There are test failures
that will be addressed in follow-up JIRAs.
- Ran dockerised tests on Ubuntu 22.
- Ran dockerised tests on Ubuntu 20 and Rocky 8.5.
Change-Id: If1fcdb2f8c635ecd6dc7a8a1db81f5f389c78b86
Reviewed-on: http://gerrit.cloudera.org:8080/20073
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Newer operating systems like Redhat 9 do not supply
lsb_release as an official package. The /etc/os-release
file provides the same information in a more convenient
form. CMake 3.22 added support for reading those
/etc/os-release values directly via cmake_host_system_information().
This changes docker/CMakeLists.txt to use the new CMake
cmake_host_system_information() APIs to get values from
/etc/os-release. This removes the lsb_release code.
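The /etc/os-release format is simple KEY=value lines with optional shell-style quoting. A Python sketch of the parsing that cmake_host_system_information() now does for the build follows; the function is hypothetical (Python 3.10+ also ships `platform.freedesktop_os_release()`).

```python
import shlex


def parse_os_release(text):
    """Parse /etc/os-release style KEY=value lines into a dict."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = line.split("=", 1)
        # shlex strips the optional single/double quotes around values.
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info
```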
Testing:
- Ran a docker build locally and verified it detected
the distribution / version correctly
Change-Id: I04afd2b1c923f1331f7234d53a105a17956e3e18
Reviewed-on: http://gerrit.cloudera.org:8080/20069
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Restricts jvm_automatic_add_opens to only apply to Java 9+ where the
option exists. Previously it would also include it in Java 8, which
caused the JVM to ignore all options in JAVA_TOOL_OPTIONS.
Tests for Java version by running $JAVA_HOME/bin/java -version (or
"java" if JAVA_HOME is unset) and parsing version from the first line.
All JVM implementations are expected to include the version in a quoted
string, such as "1.8.0_42" and "11.0.1".
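A sketch of the version parsing described above: take the first quoted string on the first line of `java -version` output (which the JVM prints to stderr) and map "1.x" to x. The function name is hypothetical.

```python
import re


def java_major_version(version_output):
    """Return the major Java version from `java -version` output."""
    first_line = version_output.splitlines()[0]
    match = re.search(r'"([^"]+)"', first_line)
    if match is None:
        raise ValueError("no quoted version in: %r" % first_line)
    parts = match.group(1).split(".")
    major = int(re.match(r"\d+", parts[0]).group(0))
    # Java 8 and earlier report themselves as 1.x, e.g. "1.8.0_42".
    if major == 1 and len(parts) > 1:
        major = int(re.match(r"\d+", parts[1]).group(0))
    return major
```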
Also added add-opens flags for frontend tests.
test_no_inaccessible_objects detected this in a test run.
Testing:
- manually confirmed -agentlib options are present with both Java
8 and Java 11.
- promoted test_jvm_mem_tracking to run in all strategies, as it's fast
and ensures JAVA_TOOL_OPTIONS is honored.
Change-Id: I85953e685f6bbbd213afd93f389066e82f193ddf
Reviewed-on: http://gerrit.cloudera.org:8080/19939
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
During Impala startup, before starting the JVM (by calling libhdfs),
this adds add-opens calls to JAVA_TOOL_OPTIONS to ensure Ehcache has access
to non-public members so it can accurately calculate object size.
This effectively circumvents new security precautions in Java 9+.
Use '--jvm_automatic_add_opens=false' to disable it.
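A Python sketch of extending JAVA_TOOL_OPTIONS with `--add-opens` flags. It also includes the Java 9+ guard that, per this log, was added later, since Java 8 rejects the flag and then ignores JAVA_TOOL_OPTIONS. Impala does this in C++; the helper and the package list are assumptions.

```python
def with_add_opens(java_tool_options, java_major, packages):
    """Append --add-opens=<module>/<package>=ALL-UNNAMED flags for Java 9+ only."""
    if java_major < 9:
        # Java 8 does not understand --add-opens; leave the options untouched.
        return java_tool_options
    flags = ["--add-opens=%s=ALL-UNNAMED" % pkg for pkg in packages]
    # filter(None, ...) drops an empty existing value so no leading space appears.
    return " ".join(filter(None, [java_tool_options] + flags))
```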
Tested with Java 11
JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
run-all-tests.sh
Change-Id: I47a6533b2aa94593d9348e8e3606633f06a111e8
Reviewed-on: http://gerrit.cloudera.org:8080/19845
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This changes the docker image build code so that both Java 8 and Java 11
images can be built in the same build. Specifically, it introduces new
Make targets for Java 11 docker images in addition to the regular Java 8
targets. The "docker_images" and "docker_debug_images" targets continue
to behave the same way and produce Java 8 images of the same name. The
"docker_java11_images" and "docker_debug_java11_images" produce the
daemon docker images for Java 11.
Preserves IMPALA_DOCKER_USE_JAVA11 for selecting Java 11 images when
starting a cluster with container images.
Change-Id: Ic2b124267c607242bc2fd6c8cd6486293a938f50
Reviewed-on: http://gerrit.cloudera.org:8080/19722
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.
Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.
Fixes an impala-shell tip that was supposed to be two tips (and was
printed with no space after the period).
Removes out-of-date deploy.py and various Python 2.6 workarounds.
Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py
Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
As a followup to the fix for IMPALA-12039, this verifies the
presence of pgrep at docker build time as well as at daemon
startup time.
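The check amounts to "fail fast if a required utility is not on PATH". A Python sketch of that idea follows. The real check is in the Docker build and entrypoint scripts; these helpers are hypothetical.

```python
import shutil
import sys


def missing_tools(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def require_tools(tools):
    """Exit with an error (e.g. during an image build) if any required tool is absent."""
    missing = missing_tools(tools)
    if missing:
        sys.exit("missing required tools: " + ", ".join(missing))
```

For example, `require_tools(["pgrep"])` would stop a build on an image without procps.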
Testing:
- Build docker images locally
- Ran Redhat 8 dockerised tests
Change-Id: I67e000b64cf6c1ab2225745f6b95b7a5e7ac3d36
Reviewed-on: http://gerrit.cloudera.org:8080/19713
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
'pgrep' was missing from the Redhat docker image, and as a result the
graceful shutdown script (bin/graceful_shutdown_backends.sh) terminated
the impalad immediately without waiting for the
'shutdown_grace_period_s' grace period. Since there wasn't enough
time for cluster membership changes to propagate to the
coordinator, it scheduled query fragments on already deleted
executors, and queries failed.
A freshly built Ubuntu 20 image already had the 'pgrep' utility
installed.
Testing:
- Built redhat 8 image and manually tested graceful shutdown in a
docker container.
Change-Id: I91ffc1fe3e022ce7f7507b2bd79a3e2c3851956d
Reviewed-on: http://gerrit.cloudera.org:8080/19711
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Python 3 moved several things around or removed deprecated
functions / fields:
- sys.maxint was removed, but sys.maxsize provides similar functionality
- long was removed, but int provides the same range
- file() was removed, but open() already provided the same functionality
- Exception.message was removed, but str(exception) is equivalent
- Some encodings (like hex) were moved to codecs.encode()
- string.letters -> string.ascii_letters
- string.lowercase -> string.ascii_lowercase
- string.strip was removed
This fixes all of those locations. Python 3 also has slightly different
rounding behavior from round(), so this changes round() to use future's
builtins.round() to get the Python 3 behavior.
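The replacements listed above, collected into one runnable Python 3 snippet. Names are illustrative. On Python 2 the `builtins` import comes from the 'future' package; on Python 3 it is the standard library.

```python
import codecs
import string
import sys

from builtins import round  # Python 3 rounding: halves go to the even neighbor

BIG = sys.maxsize                               # was sys.maxint
count = int("12345678901234567890")             # was long(...)
letters = string.ascii_letters                  # was string.letters
lower = string.ascii_lowercase                  # was string.lowercase
hex_bytes = codecs.encode(b"\x01\xab", "hex")   # was '...'.encode('hex')
stripped = "  padded  ".strip()                 # was string.strip(s)


def error_text(exc):
    return str(exc)                             # was exc.message


def read_first_line(path):
    with open(path) as f:                       # was file(path)
        return f.readline()
```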
This fixes the following pylint warnings:
- file-builtin
- long-builtin
- invalid-str-codec
- round-builtin
- deprecated-string-function
- sys-max-int
- exception-message-attribute
Testing:
- Ran core tests
Change-Id: I094cd7fd06b0d417fc875add401d18c90d7a792f
Reviewed-on: http://gerrit.cloudera.org:8080/19591
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
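The division change in practice: with the `__future__` imports, `/` is true division on Python 2 as well as Python 3, so code that needs an integer (indices, record counts) has to say `//` explicitly. The functions are illustrative only.

```python
from __future__ import absolute_import, division, print_function


def average(total, count):
    # True division: 7 / 2 == 3.5 on Python 2 (with the import above) and Python 3.
    return total / count


def middle_index(items):
    # Indices must stay integers, so use floor division explicitly.
    return len(items) // 2
```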
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
There are a variety of small python 3 syntax differences:
- Octal constants need to start with 0o rather than just 0
- Long constants are not supported (i.e. numbers ending with L)
- Lambda syntax is slightly different (e.g. tuple parameter unpacking such
as lambda (k, v): ... is no longer allowed)
- The 'ur' string mode is no longer supported
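The Python 3 spellings of the items above, assuming the lambda difference refers to tuple parameter unpacking (PEP 3113). Values are illustrative.

```python
MODE = 0o755                 # was 0755
BIG = 10000000000            # was 10000000000L
pairs = [("a", 2), ("b", 1)]
# Tuple parameters are gone: was sorted(pairs, key=lambda (k, v): v)
by_value = sorted(pairs, key=lambda kv: kv[1])
raw = r"C:\temp\new"         # no 'ur' prefix; Python 3 str literals are already unicode
```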
Testing:
- check-python-syntax.sh now passes
Change-Id: Ie027a50ddf6a2a0db4b34ec9b49484ce86947f20
Reviewed-on: http://gerrit.cloudera.org:8080/19554
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Python 3 does not support this old except syntax:
except Exception, e:
Instead, it needs to be:
except Exception as e:
This uses impala-futurize to fix all locations of
the old syntax.
Testing:
- The check-python-syntax.sh no longer shows errors
for except syntax.
Change-Id: I1737281a61fa159c8d91b7d4eea593177c0bd6c9
Reviewed-on: http://gerrit.cloudera.org:8080/19551
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Some deployments rely on having the 'hostname' utility
installed in Impala's Docker image (e.g. for constructing
daemon startup arguments). Most distributions include it
by default, but Redhat UBI8 does not.
This adds 'hostname' to the list of installed packages
for both Ubuntu and the Redhat family. This also verifies
that 'hostname' runs properly.
Testing:
- Verified that this adds hostname for UBI8 images
Change-Id: I5a760680294a3ad7e74e843d3f4c06cd38819e88
Reviewed-on: http://gerrit.cloudera.org:8080/19273
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala will fail to start if /var/tmp does not have the sticky bit set
(i.e. +t). Some Redhat UBI images do not set the sticky bit (+t) on /tmp
and /var/tmp. This sets the sticky bit on those directories during the
Docker build.
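What `chmod +t` does during the build, as a Python sketch. The helper name is hypothetical; the Dockerfile change itself is presumably a plain chmod.

```python
import os
import stat


def ensure_sticky(path):
    """Set the sticky bit (+t) on a directory if it is missing; return whether it is set."""
    mode = os.stat(path).st_mode
    if not mode & stat.S_ISVTX:
        # Keep the existing permission bits and add only S_ISVTX.
        os.chmod(path, stat.S_IMODE(mode) | stat.S_ISVTX)
    return bool(os.stat(path).st_mode & stat.S_ISVTX)
```

For /tmp and /var/tmp the expected result is mode 1777 (drwxrwxrwt).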
Testing:
- Verified that the sticky bit is set on one of the affected
base images and that Impala can start up
Change-Id: I7ff32a035f40cb41d3a8dc80a07fd9924f41b942
Reviewed-on: http://gerrit.cloudera.org:8080/19222
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In IMPALA-11492, ExprTest.Utf8MaskTest was failing on some
configurations because the en_US.UTF-8 locale was missing. Since the
Docker images don't contain en_US.UTF-8, they are subject
to the same bug. This was confirmed by adding test cases
to the test_utf8_strings.py end-to-end test and running it
in the dockerized tests.
This adds the appropriate language pack to the list of packages
installed for the Docker build.
Testing:
- This adds end-to-end tests to test_utf8_strings.py covering the
same cases that were failing in ExprTest.Utf8MaskTest. They
failed without the added language packs, and now succeed.
Change-Id: I353f257b3cb6d45f7d0a28f7d5319fdb457e6e3d
Reviewed-on: http://gerrit.cloudera.org:8080/19080
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Currently, Docker images install Java 8 for Impala's use. This
adds the IMPALA_DOCKER_USE_JAVA11 environment variable. When
set to true, this installs Java 11 rather than Java 8. It
defaults to false. The daemon_entrypoint.sh script is modified
to detect Java 11 correctly. As a workaround for IMPALA-11260,
this appends a list of "--add-opens" statements to JAVA_TOOL_OPTIONS
when running with Java 11.
Testing:
- Ran a set of dockerized tests on Rocky 8.5 with Java 11
Change-Id: Icc1dbd3f6a2279840218dc1da2b60077e211a328
Reviewed-on: http://gerrit.cloudera.org:8080/19031
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Currently, Impala supports building and testing Docker
images on Ubuntu. This extends that same support to
Redhat-based distributions:
1. This splits out the Docker build's OS package
installation into a separate install_os_packages.sh
script. This script detects the OS and calls apt
or yum as appropriate. The script takes the argument
--install-debug-tools, which installs extra tools
like iproute2 and ping. This defaults to true for debug
images and false for release images.
2. This modifies daemon_entrypoint.sh to detect the
OS and set LD_LIBRARY_PATH appropriately to account
for different locations of Java.
3. This modifies docker/setup_build_context.py to
handle different locations of libkudu_client.so
and add extra sanity checks on various libraries
found via globs.
4. This modifies bin/jenkins/dockerized-*.sh test
infrastructure to be able to install docker on
either Ubuntu or Redhat. It also changes the exit
logic to collect the container logs.
Developers can override the base image for Redhat 7
and Redhat 8 builds via the IMPALA_REDHAT7_DOCKER_BASE
and IMPALA_REDHAT8_DOCKER_BASE environment variables.
These default to open source Redhat equivalents
(Centos 7.9 and Rocky 8.5 respectively), but they are
also known to work with Redhat UBI images.
Testing:
- Ran dockerised testing on Rocky 8.5 via the
rocky-8.5-dockerised-tests job.
- Ran GVO
- Ran a Docker build on Centos7 with UBI7 as the base image
Change-Id: Ibaff2560ef971ac2c2231a8e43921164ea1d2f4d
Reviewed-on: http://gerrit.cloudera.org:8080/19006
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Docker-based parallelized test runs have proven themselves to be quite a
bit faster than regular core or exhaustive mode builds. While regular
sequential builds have also enjoyed shorter runtimes recently,
Docker-based parallel builds still enjoy a speed advantage.
Scheduling the parallel build segments is currently driven from the
test driver script test-with-docker.py, and the order in which the
segments are considered is currently hard-coded. The ordering was
originally devised experimentally, by timing several test runs, then
ordering the test segments based on expected duration, from longest
to shortest.
The average wall-clock run times for various test segments have changed
since this original ordering was committed: FE tests have gotten
significantly longer, while upgrading the default worker instance
type shortened the serial phase(s) of E2E tests.
This patch makes two changes to achieve a shorter overall run time for
the Docker-based tests:
1. Reorders the default scheduling order of the test segments, based
on currently measured durations
2. Increases the default suite concurrency for execution hosts:
bumps suite concurrency from 4 to 5 for machines with memory sizes
between 96 and 140 GBs (the currently used worker size)
The latter change is also based on measurements: memory usage reports for
total peak memory (RSS) and peak memory (RSS) per test segment both
showed significant amounts of unused memory on the current default
worker instance size (having 32 CPUs and 128 GB of RAM).
Experiments showed that this machine size can reliably handle five
concurrent containerized test sessions with some safety margin remaining,
so the patch increases the default concurrency for this machine
category.
With both changes applied, the duration of a core-mode test run with
default settings is reduced from 2h45 to 2h25 (on average).
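Why starting the longest segments first helps: with a greedy scheduler that hands each next segment to the least-loaded slot, long segments started early no longer dominate the tail of the run. This is a generic longest-first sketch, not test-with-docker.py's actual code (which uses a hard-coded order). Segment names and durations are hypothetical.

```python
import heapq


def schedule_segments(durations, slots):
    """Greedy longest-first assignment of segments to concurrent slots.

    durations: dict of segment name -> expected minutes.
    Returns (segments per slot, estimated wall-clock minutes).
    """
    heap = [(0, i) for i in range(slots)]  # (current load, slot index)
    assignment = [[] for _ in range(slots)]
    # Longest segments first, so short ones fill the gaps at the end.
    for name, minutes in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, slot = heapq.heappop(heap)   # least-loaded slot
        assignment[slot].append(name)
        heapq.heappush(heap, (load + minutes, slot))
    return assignment, max(load for load, _ in heap)
```

Raising the concurrency (more slots) shortens the estimate only while memory allows it, which is the trade-off measured above.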
Tested by running the Docker-based default test suite in core mode,
with Ubuntu 16.04 and Rocky Linux 8.5 base images.
Change-Id: Ifb609bcfb10e9f9b281cc6b375c36c9638db168b
Reviewed-on: http://gerrit.cloudera.org:8080/19038
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Ubuntu 20 has been using the toolchain from Ubuntu 18.
Since Ubuntu 20 has been added to the toolchain, this
switches Impala to use a toolchain with Ubuntu 20 support
and uses the Ubuntu 20 bits. This is expected to help
with IMPALA-10962.
Testing:
- Ran a core build on Ubuntu 20
Change-Id: If2394b668ef3c56b1a4c0773fd5e4ff92be4a846
Reviewed-on: http://gerrit.cloudera.org:8080/18559
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As an optimization, the docker-based tests didn't run
the minicluster for BE tests. Some BE tests now require
the minicluster (DiskIoMgrTest.WriteToRemote*), so this
cannot work with the optimization.
This changes the docker-based tests to start the minicluster
for the BE tests.
Testing:
- Ran a docker-based test job
Change-Id: I784a63a02886852e10ccca7c118c22ff7d38b8a3
Reviewed-on: http://gerrit.cloudera.org:8080/18414
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
There are 3 builtin case conversion string functions: upper(), lower(),
and initcap(). Previously they only converted English alphabetic
characters. This patch adds support for Unicode characters.
There are many corner cases in case conversion depending on the locale
and context. E.g.
1) Case conversion is locale-sensitive.
Turkish has 4 letter "I"s. English has only two, a lowercase dotted i
and an uppercase dotless I. Turkish has lowercase and uppercase forms of
both dotted and dotless I. So simply converting "i" to "I" for upper
case is wrong in Turkish:
+-------+--------+---------+
| | Dotted | Dotless |
+-------+--------+---------+
| Upper | İ | I |
+-------+--------+---------+
| Lower | i | ı |
+-------+--------+---------+
2) Case conversion may change a string's length.
The German word "grüßen" should be converted to "GRÜSSEN" in upper case:
the letter "ß" should be converted to "SS".
3) Case conversion is context-sensitive.
The Greek word "ὈΔΥΣΣΕΎΣ" should be converted to "ὀδυσσεύς", where the
Greek letter "Σ" is converted to "σ" or to "ς", depending on its
position in the word.
The above cases will be the focus of follow-up JIRAs. This patch adds the
initial implementation of UTF-8 aware case conversion functions.
--------
Implementation:
In UTF-8 mode (turned on by 'set UTF8_MODE=true'), these functions
convert the bytes in strings to wide characters using std::mbrtowc().
Each wide character (wchar_t) is then converted using std::towupper
or std::towlower correspondingly. We then convert them back to multibyte
characters using std::wcrtomb().
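The limits of a one-character-in, one-character-out mapping (what towupper/towlower provide) can be shown in Python. Python's `str.upper()`/`str.lower()` apply Unicode full case mapping, including ß to SS and the final-sigma rule. `per_char_upper` is only an approximation of a towupper-style mapping, not Impala's code.

```python
def per_char_upper(s):
    """Upper-case each code point alone; keep it if its upper case is not a single code point.

    Roughly mimics a one-wchar-to-one-wchar mapping such as towupper().
    """
    out = []
    for ch in s:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def per_char_lower(s):
    """Lower-case each code point in isolation, so no word context is available."""
    return "".join(ch.lower() for ch in s)
```

Per character, "grüßen" keeps its ß and the last Σ of ΟΔΥΣΣΕΥΣ becomes σ instead of ς; whole-string `upper()`/`lower()` get both right.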
Note that these builtins are locale aware. If impalad is launched
without a UTF-8 aware locale, e.g. LC_ALL="C", these builtins can't
recognize non-ASCII characters and will return unexpected results.
Thus we modify our docker images to set LC_ALL="C.UTF-8" instead of "C".
This patch also logs the current locale when launching impala daemons
for better debugging. We will support customized locale in IMPALA-11080.
Test:
- Add BE unit tests and e2e tests.
Change-Id: I443e89d46f4638ce85664b021666bc4f03ee8abd
Reviewed-on: http://gerrit.cloudera.org:8080/17785
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Inside the docker container, copies the logs of the cluster components
(HDFS, YARN, Kudu) from the folder
testdata/cluster/cdh<version-number>/node-<node-id>/var/log/
to the folder logs/cluster/.
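A Python sketch of the copy step. The per-node subdirectory under logs/cluster/ is an assumption made so that node logs don't overwrite each other. The helper name is hypothetical, and `dirs_exist_ok` needs Python 3.8+.

```python
import glob
import os
import shutil


def copy_cluster_logs(impala_home, dest_rel="logs/cluster"):
    """Copy each minicluster node's var/log tree to <impala_home>/logs/cluster/<node>."""
    pattern = os.path.join(impala_home, "testdata", "cluster", "cdh*",
                           "node-*", "var", "log")
    copied = []
    for log_dir in sorted(glob.glob(pattern)):
        # .../node-<id>/var/log -> node-<id>
        node = os.path.basename(os.path.dirname(os.path.dirname(log_dir)))
        dest = os.path.join(impala_home, dest_rel, node)
        # dirs_exist_ok lets a second run (e.g. after a failed build) merge in place.
        shutil.copytree(log_dir, dest, dirs_exist_ok=True)
        copied.append(dest)
    return copied
```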
Testing:
- ran docker-based tests and checked that minicluster logs are preserved and archived
- checked that minicluster logs are also copied when something goes wrong during the build
Change-Id: I23e25d42992cec47c593dc388bcf0bcef828c05e
Reviewed-on: http://gerrit.cloudera.org:8080/15898
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit turns on events processing by default. The default
polling interval is set to 1 second, which can be overridden by
setting hms_event_polling_interval_s to a non-default value.
Since event polling is now turned on by default, this patch also
moves test_event_processing.py to tests/metadata instead
of the custom cluster tests. Some tests within test_event_processing.py
which needed non-default configurations were moved to
tests/custom_cluster/test_events_custom_configs.py.
Additionally, some other tests were modified to take into account
Impala's ability to automatically detect tables newly added
from Hive.
Testing done:
1. Ran exhaustive tests by turning on the events processing multiple
times.
2. Ran exhaustive tests by disabling events processing.
3. Ran dockerized tests.
Change-Id: I9a8b1871a98b913d0ad8bb26a104a296b6a06122
Reviewed-on: http://gerrit.cloudera.org:8080/17612
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
- If external_fe_port flag is >0, spins up a new HS2 compatible
service port
- Added enable_external_fe_support option to start-impala-cluster.py
- which when detected will start impala clusters with
external_fe_port on 21150-21152
- Modify impalad_coordinator Dockerfile to expose external frontend
port at 21150
- The intent of this commit is to separate external frontend
connections from normal hs2 connections
- This allows different security policies to be applied to
each type of connection. The external_fe_port should be considered
a privileged service and should only be exposed to an external
frontend that does user authentication and does authorization
checks on generated plans
Change-Id: I991b5b05e12e37d8739e18ed1086bbb0228acc40
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/17125
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds a script, docker/publish_images_to_apache.sh,
that allows uploading images to the apache/impala docker hub
repo, prefixed with a version string. E.g. with the following
commands:
ninja docker_images quickstart_docker_images
./docker/publish_images_to_apache.sh -v 81d5377c2
The uploaded images can then be used for the quickstart cluster,
as documented in docker/README.
Updated docs for quickstart to use a prefix from apache/impala.
Removed IMPALA_QUICKSTART_VERSION, which doesn't interact well with
the tagging since the image name and version are now encoded in the
tag.
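The tag scheme can be illustrated with a small Python sketch: with IMPALA_QUICKSTART_IMAGE_PREFIX set to "apache/impala:81d5377c2-", as in the testing steps, each local image name is appended to the prefix to form the full image reference. The helper names are hypothetical; publish_images_to_apache.sh is a shell script and may build the tags differently.

```python
# Hypothetical illustration of the tag layout; not the script's own code.
def apache_image_prefix(version, repo="apache/impala"):
    """Value to use for IMPALA_QUICKSTART_IMAGE_PREFIX for a given version."""
    return "%s:%s-" % (repo, version)


def apache_image_ref(version, image, repo="apache/impala"):
    """Full reference: both the version and the image name live in the tag."""
    return apache_image_prefix(version, repo) + image


if __name__ == "__main__":
    print(apache_image_ref("81d5377c2", "impala_quickstart_client"))
```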
Fix an incorrect image name added to docker-images.txt:
impala_profile_tool_image.
Testing:
Ran Impala quickstart with data loading using instructions in README.
export IMPALA_QUICKSTART_IMAGE_PREFIX="apache/impala:81d5377c2-"
docker network create -d bridge quickstart-network
export QUICKSTART_IP=$(docker network inspect quickstart-network -f '{{(index .IPAM.Config 0).Gateway}}')
export QUICKSTART_LISTEN_ADDR=$QUICKSTART_IP
docker-compose -f docker/quickstart.yml \
-f docker/quickstart-kudu-minimal.yml \
-f docker/quickstart-load-data.yml up -d
docker run --network=quickstart-network -it \
${IMPALA_QUICKSTART_IMAGE_PREFIX}impala_quickstart_client
impala-shell
Change-Id: I535d77e565b73d732ae511d7525193467086c76a
Reviewed-on: http://gerrit.cloudera.org:8080/17030
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Add a build step for an impala-profile-tool docker image
that makes it easy to run the binary on any system.
This container is automatically built as part of the
docker build.
This sets up a new build context that doesn't pull in all of
the same dependencies or depend on the Java build.
Testing:
cat logs/cluster/profiles/* | \
docker run -i impala_profile_tool
I uploaded a build of the container to dockerhub too:
timgarmstrong/impala_profile_tool
Change-Id: I36915cd686ab930dcc934bc0c81bff8c16d46714
Reviewed-on: http://gerrit.cloudera.org:8080/17015
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
What works:
* A single node cluster can be started up with docker-compose
* HMS data is stored in Derby database in a docker volume
* Filesystem data is stored in a shared docker volume, using the
localfs support in the Hadoop client.
* A Kudu cluster with a single master can be optionally added on
to the Impala cluster.
* TPC-DS data can be loaded automatically by a data loading container.
We need to set up a docker network called quickstart-network
purely because docker-compose insists on generating network names
with underscores. These names become part of the FQDN and cause
problems with Java's URL parsing, which rejects such technically
invalid domain names.
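To make the underscore problem concrete, here is a Python sketch of an RFC 1123-style hostname check: each label may contain only letters, digits, and hyphens, and may not start or end with a hyphen. It only approximates what Java's URL parsing rejects, and the generated network name in the example is hypothetical.

```python
import re

# One DNS label per RFC 1123: 1-63 letters, digits, or hyphens, not
# starting or ending with a hyphen. Underscores are not allowed.
_LABEL_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def is_valid_hostname(name):
    """Return True if every dot-separated label is a valid hostname label."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL_RE.match(label) for label in name.rstrip(".").split("."))


if __name__ == "__main__":
    # Hypothetical docker-compose generated name vs. the explicit network.
    print(is_valid_hostname("impalad.docker_default"))
    print(is_valid_hostname("impalad.quickstart-network"))
```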
How to run:
Instructions for running the quickstart cluster are in
docker/README.md.
How to build containers:
./buildall.sh -release -noclean -notests -ninja
ninja quickstart_hms_image quickstart_client_image docker_images
How to upload containers to dockerhub:
IMPALA_QUICKSTART_IMAGE_PREFIX=timgarmstrong/
for i in impalad_coord_exec impalad_coordinator statestored \
impalad_executor catalogd impala_quickstart_client \
impala_quickstart_hms
do
docker tag $i ${IMPALA_QUICKSTART_IMAGE_PREFIX}$i
docker push ${IMPALA_QUICKSTART_IMAGE_PREFIX}$i
done
I pushed containers built from commit f260cce22, which
was branched from 6cb7cecacf on master.
Misc other stuff:
* Added more metadata to all images.
TODO:
* Test and instructions to run against Kudu quickstart
* Upload latest version of containers before merging.
Change-Id: Ifc0b862af40a368381ada7ec2a355fe4b0aa778c
Reviewed-on: http://gerrit.cloudera.org:8080/15966
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
impala-profile-tool is a new dependency for end-to-end tests.
The tool is built together with all the other backend tests
(so the buildall.sh flag '-notests' can turn off building it), but it is
actually used in the parallel phase of end-to-end tests.
This is a problem for Docker-based builds for the following reasons:
- Docker-based tests run BE, FE and various phases of the EE test in
separate Docker containers for parallel execution
- Test binaries are only built inside the container running BE tests to
cut down on the build time and the size of the Docker image that all test
containers are based on.
- As a result, the EE_TEST_PARALLEL container lacks the tool
required by the tests designed to exercise it.
The solution is to build the tool early, at the end of the build phase
running in the build container. There is already another such tool built
there (parquet-reader) for a similar reason, so just add
impala-profile-tool to the same 'make' command there.
Tested by running BE_TEST and EE_TEST_PARALLEL phases in a Docker-based
build.
Change-Id: I60e78ea883f3057c59a345feca38ef08a7f6a0b8
Reviewed-on: http://gerrit.cloudera.org:8080/16965
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A recent patch (IMPALA-9930) introduces a new admission control rpc
service, which can be configured to perform admission control for
coordinators. In that patch, the admission service runs in an impalad.
This patch separates the service out to run in a new daemon, called
the admissiond. It also integrates this new daemon with the build
infrastructure around Docker.
Some notable changes:
- Adds a new class, AdmissiondEnv, which performs the same function
for the admissiond as ExecEnv does for impalads.
- The '/admission' http endpoint is exposed on the admissiond's webui
if the admission control service is in use, otherwise it is exposed
on coordinator impalad's webuis.
- start-impala-cluster.py takes a new flag --enable_admission_service
which configures the minicluster to have an admissiond with all
coordinators using it for admission control.
- Coordinators are now configured to use the admission service by
specifying the startup flag --admission_service_host. This is
intended to mirror the configuration of the statestored/catalogd
location.
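The configuration rules above can be summarized in a short Python sketch: coordinators get --admission_service_host only when the admission service is enabled, and the '/admission' endpoint lives on the admissiond in that case, otherwise on the coordinators. Both functions and their arguments are hypothetical illustrations, not code from start-impala-cluster.py or the daemons.

```python
# Hypothetical sketch of the configuration rules; not Impala's code.
def coordinator_admission_args(enable_admission_service, admissiond_host):
    """Extra startup flags for each coordinator.

    Mirrors how statestored/catalogd locations are configured: a single
    host flag pointing at the admissiond.
    """
    if not enable_admission_service:
        return []
    return ["--admission_service_host=%s" % admissiond_host]


def admission_endpoint_hosts(use_admission_service, coordinators, admissiond=None):
    """Daemons whose webui exposes the '/admission' endpoint."""
    if use_admission_service:
        return [admissiond]
    return list(coordinators)


if __name__ == "__main__":
    print(coordinator_admission_args(True, "admissiond"))
    print(admission_endpoint_hosts(False, ["coord-1", "coord-2"]))
```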
Testing:
- Existing tests for the admission control service are modified to run
with an admissiond.
- Manually ran start-impala-cluster.py with --enable_admission_service
and --docker_network to verify Docker integration.
Change-Id: Id677814b31e9193035e8cf0d08aba0ce388a0ad9
Reviewed-on: http://gerrit.cloudera.org:8080/16891
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The convention in Linux is that UIDs and GIDs below 1000 are reserved
for system accounts, services, and other special accounts, while
regular user UIDs and GIDs are 1000 and above. This ensures that the
'impala' user that runs the impala executable inside the
docker container is assigned UID and GID 1000.
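One way to check the result inside a running container is to look up the 'impala' account and compare its IDs to the 1000 threshold. The Python sketch below uses the standard pwd module; the function names are hypothetical and are not part of the image.

```python
import pwd


def is_regular_account(uid, gid, first_regular_id=1000):
    """True if both IDs fall in the range used for regular users."""
    return uid >= first_regular_id and gid >= first_regular_id


def impala_user_ids(username="impala"):
    """Return (uid, gid) for the account; raises KeyError if it is absent."""
    entry = pwd.getpwnam(username)
    return entry.pw_uid, entry.pw_gid


if __name__ == "__main__":
    try:
        uid, gid = impala_user_ids()
        print(uid, gid, is_regular_account(uid, gid))
    except KeyError:
        print("no 'impala' user on this machine")
```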
Testing:
Manually tested by running the docker container and checking the user.
Change-Id: I51b846ca5fb2c55ac1707b9581cee18447467b41
Reviewed-on: http://gerrit.cloudera.org:8080/16807
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This backs out the piece of IMPALA-10016 that used a pared-down
set of libraries for the impalad_executor. That pared-down
set was missing org.apache.impala.common.JniUtil, which
prevented the impalad_executor container from starting up.
Testing:
- Ran a docker core job with one coord_exec and two executors,
and it was able to start up where it couldn't before
Change-Id: Ieecca61cd3c11f446b922a04fdeb5fd0c90fc971
Reviewed-on: http://gerrit.cloudera.org:8080/16640
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for setting the version of Java
artifacts through "mvn versions:set". It changes
the modules to inherit the version from the parent
pom.
Previously, we used a mix of 0.1-SNAPSHOT and
1.0-SNAPSHOT. This now uses 4.0.0-SNAPSHOT across the
board. With each release, we can use "mvn versions:set"
to update the versions. The only exception is the
Hive UDF code that we build for testing. This remains
at version 1.0 to avoid test changes.
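A module inherits its version from the parent pom when it declares a <parent> element but no <version> element of its own, which is what lets a single "mvn versions:set" update every module. The Python sketch below checks that property for a pom.xml string with xml.etree.ElementTree; it is an illustration only, not part of the Impala build, and the sample coordinates are hypothetical.

```python
import xml.etree.ElementTree as ET

POM_NS = "{http://maven.apache.org/POM/4.0.0}"


def _child(elem, tag):
    """Find a direct child, with or without the Maven POM namespace."""
    found = elem.find(POM_NS + tag)
    return found if found is not None else elem.find(tag)


def inherits_parent_version(pom_xml):
    """True if the module has a <parent> and no <version> of its own."""
    project = ET.fromstring(pom_xml)
    return _child(project, "parent") is not None and _child(project, "version") is None


# Hypothetical module pom; the version is only given inside <parent>.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.example</groupId>
    <artifactId>example-parent</artifactId>
    <version>4.0.0-SNAPSHOT</version>
  </parent>
  <artifactId>example-module</artifactId>
</project>"""

if __name__ == "__main__":
    print(inherits_parent_version(SAMPLE_POM))
```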
Testing:
- Ran core job
- Added build-all-flag-combinations.sh case that
does "mvn versions:set" and runs a build
Change-Id: I661b32e1e445169bac2ffe4f9474f14090031743
Reviewed-on: http://gerrit.cloudera.org:8080/16559
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This changes all existing Java code to be submodules under
a single root pom. The root pom is impala-parent/pom.xml
with minor changes to add submodules.
This avoids most of the weird CMake/maven interactions,
because there is now a single maven invocation for all
the Java code.
This moves all the Java projects other than fe into
a top level java directory. fe is left where it is
to avoid disruption (but still is compiled via the
java directory's root pom). Various pieces of code
that reference the old locations are updated.
Based on research, there are two options for dealing
with the shaded dependencies. The first is to have an
entirely separate Maven project with a separate Maven
invocation. In this case, the consumers of the shaded
jars will see the reduced set of transitive dependencies.
The second is to have the shaded dependencies as modules
with a single Maven invocation. The consumer would see
all of the original transitive dependencies and need to
exclude them all. See MSHADE-206/MNG-5899. This chooses
the second.
This only moves code around and does not focus on version
numbers or making "mvn versions:set" work.
Testing:
- Ran a core job
- Verified existing maven commands from fe/ directory still work
- Compared the *-classpath.txt files from fe and executor-deps
and verified they are the same except for paths
Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a
Reviewed-on: http://gerrit.cloudera.org:8080/16500
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This wraps each command executed by CMake with a wrapper that
generates a JUnitXML file if the command fails. If the command
succeeds, the wrapper does nothing. The wrapper applies to C++
compilation, linking, and custom shell commands (such as
building the frontend via maven). It does not apply to failures
coming from CMake itself. It can be disabled by setting
DISABLE_CMAKE_JUNITXML.
The command output can include Unicode (e.g. smart quotes for
g++), so this also updates generate_junitxml.py to handle
Unicode.
The wrapper interacts poorly with add_custom_command/add_custom_target
CMake commands that use 'cd directory && do_something', so this
switches those locations (in /docker) to use CMake's WORKING_DIRECTORY.
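The wrapper behavior described above can be sketched in Python: run the command, pass its output through, and only on a non-zero exit write a one-testcase JUnitXML report. This is a hypothetical simplification, not the actual wrapper or generate_junitxml.py; in particular, here DISABLE_CMAKE_JUNITXML is checked inside the wrapper, whereas the change above skips the wrapper entirely when it is set. Decoding with errors="replace" is one assumed way to tolerate Unicode such as g++ smart quotes.

```python
import os
import subprocess
import sys
import tempfile
import xml.etree.ElementTree as ET


def write_failure_junitxml(path, name, cmd, returncode, output):
    """Write a JUnitXML report with a single failed testcase."""
    suite = ET.Element("testsuite", name="build", tests="1", failures="1")
    case = ET.SubElement(suite, "testcase", classname="build", name=name)
    failure = ET.SubElement(case, "failure", message="exit code %d" % returncode)
    failure.text = "$ %s\n%s" % (" ".join(cmd), output)
    ET.ElementTree(suite).write(path, encoding="utf-8", xml_declaration=True)


def run_wrapped(cmd, junitxml_path, name="build_step"):
    """Run cmd; on failure emit JUnitXML unless DISABLE_CMAKE_JUNITXML is set.

    On success nothing is written, so successful builds are unaffected.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = proc.stdout.decode("utf-8", errors="replace")
    sys.stdout.write(output)
    if proc.returncode != 0 and not os.environ.get("DISABLE_CMAKE_JUNITXML"):
        write_failure_junitxml(junitxml_path, name, cmd, proc.returncode, output)
    return proc.returncode


if __name__ == "__main__":
    report = os.path.join(tempfile.mkdtemp(), "failed_step.xml")
    rc = run_wrapped([sys.executable, "-c", "import sys; sys.exit(3)"], report)
    print(rc, os.path.exists(report))
```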
Testing:
- Verified it does not impact a successful build (including with
ccache and/or distcc).
- Verified it generates JUnitXML for C++ and Java compilation
failures.
- Verified it doesn't use the wrapper when DISABLE_CMAKE_JUNITXML
is set.
Change-Id: If71f2faf3ab5052b56b38f1b291fee53c390ce23
Reviewed-on: http://gerrit.cloudera.org:8080/12668
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Maven Changes:
Splits out all executor specific jar files into a separate pom file
under mvn-deps/executor-deps. The new pom file lists out all executor
specific jar files. fe/pom.xml has a dependency on
mvn-deps/executor-deps/pom.xml so that all executor specific jars are
still built as part of the fe/ build. mvn-deps/executor-deps/pom.xml
writes out a build-classpath.txt file that contains all dependencies in
the pom.xml file (similar to what is already done in fe/pom.xml).
Docker Build Changes:
setup_build_context.py was changed to leverage the aforementioned Maven
changes. The script still symlinks all dependencies into the lib/ folder,
but also creates an exec-lib/ and statestore-lib/ folder. The exec-lib/
folder contains all dependencies necessary to run Impala Executors, but
excludes any dependencies that are Coordinator specific. The
statestore-lib/ folder excludes all jar files entirely since it does not
run an embedded JVM.
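The folder split can be sketched as set operations over the dependency list. In this Python sketch, the function name and sample file names are hypothetical, and treating non-jar files as shared by all three folders is an assumption made for illustration; setup_build_context.py works from the build-classpath.txt files and creates symlinks rather than returning lists.

```python
# Hypothetical sketch of the lib/, exec-lib/, statestore-lib/ split.
def plan_lib_dirs(all_deps, executor_jars):
    """Return {folder: sorted file names}.

    all_deps: every dependency file symlinked into lib/.
    executor_jars: jars listed for executors (executor-deps classpath).
    Assumption: non-jar files are needed by every daemon.
    """
    jars = {d for d in all_deps if d.endswith(".jar")}
    non_jars = set(all_deps) - jars
    return {
        "lib": sorted(all_deps),
        # Executors: drop Coordinator-specific jars.
        "exec-lib": sorted(non_jars | (jars & set(executor_jars))),
        # The statestore runs no embedded JVM, so it gets no jars at all.
        "statestore-lib": sorted(non_jars),
    }


if __name__ == "__main__":
    print(plan_lib_dirs(["a.jar", "b.jar", "libfoo.so"], ["a.jar"]))
```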
The docker/CMakeLists.txt was modified to support the new library layout
created by setup_build_context.py. Prior to this patch, only the build
for the Impala base image had access to the dependencies created by
setup_build_context.py. This patch changes the build logic so all images
have access to the dependencies. This does increase build time because
the build context has to be copied and sent to the Docker daemon for
each image build.
Docker Image Changes:
The copy command for the lib/ folder was removed from the impala_base
Dockerfile and a corresponding copy command was added to each daemon
Docker image. This allows each daemon image to only copy in the
dependencies it actually requires to run.
Other:
* Deleted the hive-3 profile since Impala 4.0 only supports hive-3 builds
* Moved shaded-deps into the mvn-deps folder
Overall, this decreases the size of the impalad_executor image by 120 MB,
and the statestored image by 700 MB.
impalad_coord_exec and impalad_coordinator images are now 771 MB, and
impalad_executor images are 651 MB.
Further improvements might be possible by decreasing the number of
transitive dependencies in mvn-deps/executor-deps/pom.xml. Moreover,
any new Coordinator specific jar files will not be included in the
Executor image.
Testing:
* Ran core tests
Change-Id: I899859a38d8ccab890de889a49ef132a89289dfd
Reviewed-on: http://gerrit.cloudera.org:8080/16320
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Sahil Takiar <stakiar@cloudera.com>
Strip debug symbols from libkudu_client.so and libstdc++.so. The same
technique used to strip debug symbols from impalad binaries is used.
This decreases the Docker image sizes by about 100 MB.
Test:
* Ran Dockerized tests
Change-Id: I61fdf47041bd96248ecb48ae57dde143de2da294
Reviewed-on: http://gerrit.cloudera.org:8080/16263
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A prior change increased the suite concurrency for the
docker-based tests on machines with 140+GB of memory.
This new rung should also bump the parallel test
concurrency (i.e. for parallel EE tests). This sets
the parallel test concurrency to 12 for this rung
(which is what we use for the 95GB-140GB rung).
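The memory "rungs" work like a descending threshold table: a machine gets the setting of the highest rung it qualifies for. The Python sketch below fills in only the two rungs mentioned in this change (both with a parallel concurrency of 12); the table layout, function name, and the None fallback for lower memory are hypothetical and not taken from test-with-docker.py. For example, an m5.12xlarge has 192 GiB of memory and lands on the 140+ GB rung.

```python
# Hypothetical rung table: (min_memory_gb, parallel_test_concurrency),
# highest rung first. Only the rungs named in this change are listed.
PARALLEL_CONCURRENCY_RUNGS = [
    (140, 12),  # new 140+ GB rung
    (95, 12),   # existing 95GB-140GB rung
]


def parallel_test_concurrency(memory_gb, rungs=PARALLEL_CONCURRENCY_RUNGS):
    """Concurrency for the highest rung the machine qualifies for, or None
    when memory is below every rung listed here (lower rungs not shown)."""
    for min_gb, concurrency in rungs:
        if memory_gb >= min_gb:
            return concurrency
    return None


if __name__ == "__main__":
    print(parallel_test_concurrency(192))
```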
Testing:
- Ran test-with-docker.py on a m5.12xlarge
Change-Id: Ib7299abd585da9ba1a838640dadc0bef9c72a39b
Reviewed-on: http://gerrit.cloudera.org:8080/16326
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
The shutdown script should not abort if it can't write
a log - it should continue to try and shut down impala.
The entrypoint script should abort with an explicit
error if the log directory isn't writable by the
current user.
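The two policies differ on purpose: startup should fail fast and loudly, while shutdown should be best-effort about logging. The Python sketch below illustrates both; the real entrypoint and shutdown scripts are not shown here and may be shell scripts, so the function names and message text are hypothetical.

```python
import os
import sys
import tempfile


def check_log_dir_writable(log_dir):
    """Entrypoint-style check: report an explicit error if not writable."""
    if not (os.path.isdir(log_dir) and os.access(log_dir, os.W_OK | os.X_OK)):
        sys.stderr.write("ERROR: log directory %s is not writable by uid %d\n"
                         % (log_dir, os.getuid()))
        return False
    return True


def try_write_shutdown_log(path, message):
    """Shutdown-style logging: a failed write must not stop the shutdown."""
    try:
        with open(path, "a") as f:
            f.write(message + "\n")
        return True
    except OSError:
        return False


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    if not check_log_dir_writable(log_dir):
        sys.exit(1)  # abort startup with the error printed above
    try_write_shutdown_log(os.path.join(log_dir, "shutdown.log"), "shutting down")
```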
Change-Id: If32d6eef75422b51f8877478bbfb1a709c02f756
Reviewed-on: http://gerrit.cloudera.org:8080/16237
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
This adds a flag --use_resolved_hostname, which replaces
--hostname with a resolved IP on startup. This is useful
for containerized environments where the hostname -> IP
mapping can be very dynamic.
This flag is used by default in the dockerized minicluster.
This also fixes a bug in the test code that incorrectly
identified command line flags. Specifically, it only checked
the suffix, so it confused use_resolved_hostname with hostname.
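The flag-matching bug is easy to reproduce. In this Python sketch (hypothetical helpers, not the actual test code), suffix matching treats "--use_resolved_hostname=true" as a "hostname" flag, while comparing the whole flag name does not.

```python
def has_flag_buggy(args, flag):
    """Buggy: suffix match, so 'hostname' also matches
    '--use_resolved_hostname=true'."""
    return any(a.split("=", 1)[0].endswith(flag) for a in args)


def has_flag(args, flag):
    """Fixed: compare the full flag name after stripping leading dashes."""
    return any(a.split("=", 1)[0].lstrip("-") == flag for a in args)


if __name__ == "__main__":
    args = ["--use_resolved_hostname=true"]
    print(has_flag_buggy(args, "hostname"), has_flag(args, "hostname"))
```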
Change-Id: I0d5cb9c68c60ce8dc838cde9dcf1c590017f5c9a
Reviewed-on: http://gerrit.cloudera.org:8080/16108
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
This removes Impala-lzo from the Impala development environment.
Impala-lzo is not built as part of the Impala build. The LZO plugin
is no longer loaded. LZO tables are not loaded during dataload,
and LZO is no longer tested.
This removes some obsolete scan APIs that were only used by Impala-lzo.
With this commit, Impala-lzo would require code changes to build
against Impala.
The plugin infrastructure is not removed, and this leaves some
LZO support code in place. If someone were to decide to revive
Impala-lzo, they would still be able to load it as a plugin
and get the same functionality as before. This plugin support
may be removed later.
Testing:
- Dryrun of GVO
- Modified TestPartitionMetadataUncompressedTextOnly's
test_unsupported_text_compression() to add LZO case
Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Reviewed-on: http://gerrit.cloudera.org:8080/15814
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>