This changes the docker image build code so that both Java 8 and Java 11
images can be built in the same build. Specifically, it introduces new
Make targets for Java 11 docker images in addition to the regular Java 8
targets. The "docker_images" and "docker_debug_images" targets continue
to behave the same way and produce Java 8 images of the same name. The
"docker_java11_images" and "docker_debug_java11_images" produce the
daemon docker images for Java 11.
Preserves IMPALA_DOCKER_USE_JAVA11 for selecting Java 11 images when
starting a cluster with container images.
Change-Id: Ic2b124267c607242bc2fd6c8cd6486293a938f50
Reviewed-on: http://gerrit.cloudera.org:8080/19722
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Python 3 moved several things around or removed deprecated
functions / fields:
- sys.maxint was removed, but sys.maxsize provides similar functionality
- long was removed, but int provides the same range
- file() was removed, but open() already provided the same functionality
- Exception.message was removed, but str(exception) is equivalent
- Some encodings (like hex) were moved to codecs.encode()
- string.letters -> string.ascii_letters
- string.lowercase -> string.ascii_lowercase
- string.strip was removed
This fixes all of those locations. Python 3 also has slightly different
rounding behavior from round(), so this changes round() to use future's
builtins.round() to get the Python 3 behavior.
This fixes the following pylint warnings:
- file-builtin
- long-builtin
- invalid-str-codec
- round-builtin
- deprecated-string-function
- sys-max-int
- exception-message-attribute
Testing:
- Ran cores tests
Change-Id: I094cd7fd06b0d417fc875add401d18c90d7a792f
Reviewed-on: http://gerrit.cloudera.org:8080/19591
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Currently, Impala supports building and testing Docker
images on Ubuntu. This extends that same support to
Redhat-based distributions:
1. This splits out the Docker build's OS package
installation into a separate install_os_packages.sh
script. This script detects the OS and calls apt
or yum as appropriate. The script takes the argument
--install-debug-tools, which installs extra tools
like iproute2 and ping. This defaults to true for debug
images and false for release images.
2. This modifies daemon_entrypoint.sh to detect the
OS and set LD_LIBRARY_PATH appropriate to account
for different locations of Java.
3. This modifies docker/setup_build_context.py to
handle different locations of libkudu_client.so
and add extra sanity checks on various libraries
found via globs.
4. This modifies bin/jenkins/dockerized-*.sh test
infrastructure to be able to install docker on
either Ubuntu or Redhat. It also changes the exit
logic to collect the container logs.
Developers can override the base image for Redhat 7
and Redhat 8 builds via the IMPALA_REDHAT7_DOCKER_BASE
and IMPALA_REDHAT8_DOCKER_BASE environment variables.
These default to open source Redhat equivalents
(Centos 7.9 and Rocky 8.5 respectively), but they are
also known to work with Redhat UBI images.
Testing:
- Ran dockerised testing on Rocky 8.5 via the
rocky-8.5-dockerised-tests job.
- Ran GVO
- Ran a Docker build on Centos7 with UBI7 as the base image
Change-Id: Ibaff2560ef971ac2c2231a8e43921164ea1d2f4d
Reviewed-on: http://gerrit.cloudera.org:8080/19006
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Add a build step for an impala-profile-tool docker image
that makes it easy to run the binary on any system.
This container is automatically built as part of the
docker build.
This sets up a new build context that doesn't pull in all of
the same dependencies or depend on the Java build
Testing:
cat logs/cluster/profiles/* | \
docker run -i impala_profile_tool
I uploaded a build of the container to dockerhub too:
timgarmstrong/impala_profile_tool
Change-Id: I36915cd686ab930dcc934bc0c81bff8c16d46714
Reviewed-on: http://gerrit.cloudera.org:8080/17015
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for setting the version of Java
artifacts through "mvn versions:set". It changes
the modules to inherit the version from the parent
pom.
Previously, we used a mix of 0.1-SNAPSHOT and
1.0-SNAPSHOT. This now uses 4.0.0-SNAPSHOT across the
board. With each release, we can use "mvn versions:set"
to update the versions. The only exception is the
Hive UDF code that we build for testing. This remains
at version 1.0 to avoid test changes.
Testing:
- Ran core job
- Added build-all-flag-combinations.sh case that
does "mvn versions:set" and runs a build
Change-Id: I661b32e1e445169bac2ffe4f9474f14090031743
Reviewed-on: http://gerrit.cloudera.org:8080/16559
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This changes all existing Java code to be submodules under
a single root pom. The root pom is impala-parent/pom.xml
with minor changes to add submodules.
This avoids most of the weird CMake/maven interactions,
because there is now a single maven invocation for all
the Java code.
This moves all the Java projects other than fe into
a top level java directory. fe is left where it is
to avoid disruption (but still is compiled via the
java directory's root pom). Various pieces of code
that reference the old locations are updated.
Based on research, there are two options for dealing
with the shaded dependencies. The first is to have an
entirely separate Maven project with a separate Maven
invocation. In this case, the consumers of the shaded
jars will see the reduced set of transitive dependencies.
The second is to have the shaded dependencies as modules
with a single Maven invocation. The consumer would see
all of the original transitive dependencies and need to
exclude them all. See MSHADE-206/MNG-5899. This chooses
the second.
This only moves code around and does not focus on version
numbers or making "mvn versions:set" work.
Testing:
- Ran a core job
- Verified existing maven commands from fe/ directory still work
- Compared the *-classpath.txt files from fe and executor-deps
and verified they are the same except for paths
Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a
Reviewed-on: http://gerrit.cloudera.org:8080/16500
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Maven Changes:
Splits out all executor specific jar files into a separate pom file
under mvn-deps/executor-deps. The new pom file lists out all executor
specific jar files. fe/pom.xml has a dependency on
mvn-deps/executor-deps/pom.xml so that all executor specific jars are
still built as part of the fe/ build. mvn-deps/executor-deps/pom.xml
writes out a build-classpath.txt file that contains all dependencies in
the pom.xml file (similar to what is already done in fe/pom.xml).
Docker Build Changes:
setup_build_context.py was changed to leverage the aformentioned Maven
changes. The script still symlinks all dependencies into the lib/ folder,
but also creates an exec-lib/ and statestore-lib/ folder. The exec-lib/
folder contains all dependencies necessary to run Impala Executors, but
excludes any dependencies that are Coordinator specific. The
statestore-lib/ folder excludes all jar files entirely since it does not
run an embedded JVM.
The docker/CMakeLists.txt was modified to support the new library layout
created by setup_build_context.py. Prior to this patch only the build
for the Impala base image has access to the dependencies created by
setup_build_context.py. This patch changes the build logic so all images
have access to the dependencies. This does increase build time because
the built context has to be copied and sent to the Docker daemon for
each image build.
Docker Image Changes:
The copy command for the lib/ folder was removed from the impala_base
Dockerfile and a corresponding copy command was added to each daemon
Docker image. This allows each daemon image to only copy in the
dependencies it actually requires to run.
Other:
* Deleted the hive-3 profile since Impala 4.0 only supports hive-3 builds
* Moved shaded-deps into the mvn-deps folder
Overall, this decreases the size of the impalad_executor image by 120 MB,
and the statestored image by 700 MB.
impalad_coordinator and impalad_coordinator images are now 771 MB, and
impalad_executor images are 651MB.
Further improvements might be possible by decreasing the number of
transitive dependencies in mvn-deps/executor-deps/pom.xml. Moreover,
any new Coordinator specific jar files will not be included in the
Executor image.
Testing:
* Ran core tests
Change-Id: I899859a38d8ccab890de889a49ef132a89289dfd
Reviewed-on: http://gerrit.cloudera.org:8080/16320
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Sahil Takiar <stakiar@cloudera.com>
Strip debug symbols from libkudu_client.so and libstdc++.so. The same
technique used to strip debug symbols from impalad binaries is used.
This decreases the Docker image sizes by about 100 MB.
Test:
* Ran Dockerized tests
Change-Id: I61fdf47041bd96248ecb48ae57dde143de2da294
Reviewed-on: http://gerrit.cloudera.org:8080/16263
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The locations for native-toolchain packages in IMPALA_TOOLCHAIN
currently do not include the compiler version. This means that
the toolchain can't distinguish between native-toolchain packages
built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause
issues when switching back and forth between branches.
This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment
variable, which is a location inside IMPALA_TOOLCHAIN that would
hold native-toolchain packages. Currently, it is set to the same
as IMPALA_TOOLCHAIN, so there is no difference in behavior.
This lays the groundwork to add the compiler version to this
path when switching to GCC7.
Testing:
- The only impediment to building with
IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is
Impala-lzo. With a custom Impala-lzo, compilation succeeds.
Either Impala-lzo will be fixed or it will be removed.
- Core tests
Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b
Reviewed-on: http://gerrit.cloudera.org:8080/15991
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This removes a few transitive dependencies that
don't appear to be needed at runtime.
This also removes the frontend test jar. The inclusion
of that jar was masking an issue where some configs
were not accessible from within the container, because
they were symlinks to paths on the host.
Testing:
Ran dockerized tests in precommit.
Ran regular tests with CDP hive.
Change-Id: I030e7cd28e29cd4e077c0b4addd4d14a8599eed6
Reviewed-on: http://gerrit.cloudera.org:8080/15753
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
* Build scripts are generalised to have different targets for release
and debug images.
* Added new targets for the debug images: docker_debug_images,
statestored_debug images. The release images still have the
same names.
* Separate build contexts are set up for the different base
images.
* The debug or release base image can be specified as the FROM
for the daemon images.
* start-impala-cluster.py picks the correct images for the build type
Future work:
We would like to generalise this to allow building from
non-ubuntu-16.04 base images. This probably requires another
layer of dockerfiles to specify a base image for impala_base
with the required packages installed.
Change-Id: I32d2e19cb671beacceebb2642aba01191bd7a244
Reviewed-on: http://gerrit.cloudera.org:8080/13905
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds a helper script to initiate graceful daemon shutdown
via the signaling mechanism. It also includes that helper script in the
docker containers.
Testing: This change adds a test to verify that the script works as
expected. In addition, I manually verified that the script gets added to
the containers and that calling it inside the container will cause a
shutdown as expected.
Change-Id: I877483a385cd0747f69b82a6488de203a4029599
Reviewed-on: http://gerrit.cloudera.org:8080/13912
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
* Symlink impalad/catalog/statestored inside container.
This doesn't seem to really save any space - there's
some kind of deduplication going on.
* Don't include libfesupport.so, which shouldn't be needed.
* strip debug symbols from the binary.
* Only include the libkuduclient.so libraries for Kudu
This shaves ~1.1GB from the image size- 250MB as a result
of the impalad binary changes and the remainder from the
Kudu changs.
Change-Id: I95ff479bedd3b93e6569e72f03f42acd9dba8b14
Reviewed-on: http://gerrit.cloudera.org:8080/13487
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This changes the FE pom to generate a build classpath file in the
target/ directory. Then, bin/set-classpath.sh uses this file to generate
the classpath to start the cluster. This replaces the former approach of
including all of the jars found in target/dependency/
The advantage of this is that a clean build is no longer required when
switching artifact versions. Prior to this patch, if you changed an
artifact version and rebuilt, both the old and new artifact would be
left in the target/dependency/ directory and pollute the classpath.
This doesn't fully remove the target/dependency/ directory, because its
existence is likely important for downstream packaging of Impala. We can
likely assume that such packaging always does a clean build.
This also changes the set-classpath script to no longer load jars from
testdata/target/dependency/ since it appears that directory doesn't
actually get created during the build.
Change-Id: I103a1da10a54c7525ba7fb584d942ba1cb9fcb94
Reviewed-on: http://gerrit.cloudera.org:8080/13185
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Todd Lipcon <todd@apache.org>
The docker containers currently have minicluster configs baked into
them. This is not necessary any more since the /opt/impala/conf
directory is mounted to point at the up-to-date configs, so there's
no reason to include configs in the container.
Testing:
Confirmed that I could build containers, start up a minicluster and run
queries.
Change-Id: I6d77f79620514187a5c45483e9051bd8c40dfc9e
Reviewed-on: http://gerrit.cloudera.org:8080/13104
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This builds an impala_base container that has all of the build artifacts
required to run the impala processes, then builds impalad, catalogd and
statestore containers based on that with the right ports exposed.
The images are based on the Ubuntu 16.04 image to align with the
most common development environment.
The container build process is integrated with CMake and is designed
to integrate with the rest of the build so that the container build
depends on the artifacts that will go into the container. You can
build the images with the following command, which will create
images called "impala_base", "impalad", "catalogd" and
"statestored":
ninja -j $IMPALA_BUILD_THREADS docker_images
The images need some refinement to be truly useful. The following
will be done in future patches:
* IMPALA-7947 - integrate with start-impala-cluster.py to
automatically create docker network with containers running on it
* Mechanism to pass in command-line flags
* Mechanisms to update the various config files to point to the
docker host rather than "localhost", which doesn't point to
the right thing inside the container.
* Mechanisms to set mem_limit, JVM heap sizes, etc, automatically.
Testing:
Manually started up the containers connected to a user-defined bridge
network, tweaked the configurations to point to the HMS/HDFS/etc
running on my host. I then used "docker ps" to figure out the
port mappings for beeswax and debug webserver.
Confirmed that I could run a query and access debug pages:
$ impala-shell.sh -i localhost:32860 -q "select coordinator()"
Starting Impala Shell without Kerberos authentication
Opened TCP connection to localhost:32860
Connected to localhost:32860
Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build
d7870fe03645490f95bd5ffd4a2177f90eb2f3c0)
Query: select coordinator()
Query submitted at: 2018-12-11 15:51:04 (Coordinator:
http://8063e77ce999:25000)
Query progress can be monitored at:
http://8063e77ce999:25000/query_plan?query_id=1b4d03f0f0f1fcfb:b0b37e5000000000
+---------------+
| coordinator() |
+---------------+
| 8063e77ce999 |
+---------------+
Fetched 1 row(s) in 0.11s
Change-Id: Ifea707aa3cc23e4facda8ac374160c6de23ffc4e
Reviewed-on: http://gerrit.cloudera.org:8080/12074
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>