mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Files

Andrew Sherman c3fff37236 IMPALA-13076 Add pstack and jstack to Impala Redhat docker images

When the Impala docker images are deployed in production environments,
it can be hard to add debugging tools at runtime. Two of the most
useful diagnostic tools are jstack and pstack, which can be used to
print Java and native stack traces. Install these tools into Redhat
images, which are the most commonly used in production.

To install pstack we install gdb.
To install jstack we install a development JDK on top of the headless
JDK.

Extend the install_os_packages.sh script to add an argument to
--install-debug-tools to set the level of diagnostic tools to install.
The possible arguments are:
  none - install no extra tools
  basic - install pstack and jstack
  full - install more debugging tools.

In a CentOS 8.5 build, the size of an impalad_coord_exec image increased
from 1.74GB to 1.85GB, as reported by ‘docker image list’.

What other tools might be added?
- Installing perf is tricky as in a container perf requires an
  installation specific to the underlying linux kernel image, which is
  hard to predict at build time.
- Installing pprof is hard as installation seems to require compiling
  from sources. Clearly there are many options and we cannot install
  everything.

TESTING

Built release and debug docker images, and used jstack and pstack in a
running container to print Impala's stacks.

Change-Id: I25e6827b86564a9c0fc25678e4a194ee8e0be0e9
Reviewed-on: http://gerrit.cloudera.org:8080/21433
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>

2024-08-05 10:04:10 +08:00

admissiond

IMPALA-8770: Support building Docker images on Redhat-based distributions

2022-10-11 20:30:50 +00:00

catalogd

IMPALA-8770: Support building Docker images on Redhat-based distributions

2022-10-11 20:30:50 +00:00

impala_base

IMPALA-13076 Add pstack and jstack to Impala Redhat docker images

2024-08-05 10:04:10 +08:00

impala_profile_tool

IMPALA-13076 Add pstack and jstack to Impala Redhat docker images

2024-08-05 10:04:10 +08:00

impalad_coord_exec

IMPALA-8770: Support building Docker images on Redhat-based distributions

2022-10-11 20:30:50 +00:00

impalad_coordinator

IMPALA-8770: Support building Docker images on Redhat-based distributions

2022-10-11 20:30:50 +00:00

impalad_executor

IMPALA-8770: Support building Docker images on Redhat-based distributions

2022-10-11 20:30:50 +00:00

quickstart_client

IMPALA-11585: Build quickstart_client with Ubuntu 20

2022-09-26 23:10:19 +00:00

quickstart_conf

IMPALA-9793: Impala quickstart cluster with docker-compose

2021-01-26 11:22:08 +00:00

quickstart_hms

IMPALA-9793: Impala quickstart cluster with docker-compose

2021-01-26 11:22:08 +00:00

statestored

IMPALA-8770: Support building Docker images on Redhat-based distributions

2022-10-11 20:30:50 +00:00

annotate.py

IMPALA-9627: Update utility scripts for Python 3 (part 2)

2023-04-26 18:52:23 +00:00

CMakeLists.txt

IMPALA-13076 Add pstack and jstack to Impala Redhat docker images

2024-08-05 10:04:10 +08:00

configure_test_network.sh

IMPALA-7995: part 2: Jenkins script to automate e2e tests

2019-04-13 02:54:19 +00:00

daemon_entrypoint.sh

IMPALA-11941: Support Java 17 in Impala

2023-06-24 10:11:54 +00:00

docker-build.sh

IMPALA-9793: Impala quickstart cluster with docker-compose

2021-01-26 11:22:08 +00:00

entrypoint.sh

IMPALA-12566: Fix RpcMgrKerberizedTest on RedHat 8

2024-01-18 00:00:01 +00:00

install_os_packages.sh

IMPALA-13076 Add pstack and jstack to Impala Redhat docker images

2024-08-05 10:04:10 +08:00

monitor.py

IMPALA-9627: Use universal_newlines for Python 3

2023-04-28 23:28:49 +00:00

publish_images_to_apache.sh

IMPALA-11585: Build quickstart_client with Ubuntu 20

2022-09-26 23:10:19 +00:00

push-images.sh

Add support to tag docker images when pushing them

2019-07-26 18:22:05 +00:00

quickstart-kudu-minimal.yml

IMPALA-9793: Impala quickstart cluster with docker-compose

2021-01-26 11:22:08 +00:00

quickstart-load-data.yml

IMPALA-10469: push quickstart to apache repo

2021-02-10 06:56:45 +00:00

quickstart.yml

IMPALA-10469: push quickstart to apache repo

2021-02-10 06:56:45 +00:00

README.md

IMPALA-10469: push quickstart to apache repo

2021-02-10 06:56:45 +00:00

setup_build_context.py

IMPALA-12081: Produce multiple Java docker images

2023-05-19 22:19:24 +00:00

test-with-docker.py

IMPALA-9627: Use universal_newlines for Python 3

2023-04-28 23:28:49 +00:00

timeline.html.template

Prettify the timeline produced by test-with-docker.py

2018-10-09 19:12:50 +00:00

utility_entrypoint.sh

IMPALA-12355: Make utility_entrypoint arch-agnostic

2023-11-03 16:51:00 +00:00

README.md

test-with-docker.py runs the Impala build and tests inside of Docker containers, parallelizing the test execution across test suites. See that file for more details.

This directory also contains infrastructure to build impala_base, catalogd, statestored, impalad_coordinator, impalad_executor and impalad_coord_exec container images from the output of an Impala build. The containers can be built via the CMake target docker_images. See CMakeLists.txt for the build targets.

Docker Quickstart with docker-compose

Various docker-compose files in this directory provide a convenient way to run a basic Impala service with a single Impala Daemon and minimal set of supporting services. A Hive MetaStore service is used to manage metadata. All filesystem data is stored in Docker volumes. The default storage location for tables is in the impala-quickstart-warehouse volume, i.e. if you create a table in Impala, it will be stored in that volume by default.

Prerequisites:

A docker network called quickstart-network must be created, and the QUICKSTART_IP and QUICKSTART_LISTEN_ADDR environment variables must be set.

docker network create -d bridge quickstart-network
export QUICKSTART_IP=$(docker network inspect quickstart-network -f '{{(index .IPAM.Config 0).Gateway}}')
export QUICKSTART_LISTEN_ADDR=$QUICKSTART_IP

If you want the cluster to be open to connections from other hosts, set QUICKSTART_LISTEN_ADDR to 0.0.0.0 so that it listens on all interfaces:

export QUICKSTART_LISTEN_ADDR=0.0.0.0
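
Because the compose files depend on both variables, it can help to check them before starting anything. The following is a minimal sketch of such a check; `check_quickstart_env` is a hypothetical helper name and is not part of the Impala repository.

```shell
# Hypothetical helper (not part of the Impala repo): verify that the two
# variables required by the quickstart are set and non-empty.
check_quickstart_env() {
  missing=0
  if [ -z "${QUICKSTART_IP:-}" ]; then
    echo "QUICKSTART_IP is not set" >&2
    missing=1
  fi
  if [ -z "${QUICKSTART_LISTEN_ADDR:-}" ]; then
    echo "QUICKSTART_LISTEN_ADDR is not set" >&2
    missing=1
  fi
  # Return 0 only when both variables are present.
  return "$missing"
}
```

One possible use is to gate the start command on it, e.g. `check_quickstart_env && docker-compose -f docker/quickstart.yml up -d`.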

You can optionally set IMPALA_QUICKSTART_IMAGE_PREFIX to pull prebuilt images from a DockerHub repo. For example, the following will use images like apache/impala:81d5377c2-impalad_coordinator:

  export IMPALA_QUICKSTART_IMAGE_PREFIX="apache/impala:81d5377c2-"

Leave IMPALA_QUICKSTART_IMAGE_PREFIX unset to use images built from a local Impala dev environment.
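
The example image name above suggests that the prefix is simply prepended to the image name. A small sketch of that composition, assuming plain string concatenation; `quickstart_image` is a hypothetical helper used only for illustration:

```shell
# Hypothetical helper: print the image reference for a quickstart image name.
# Assumes the prefix is concatenated directly in front of the image name.
quickstart_image() {
  # With the prefix unset or empty, the bare name refers to a locally built image.
  printf '%s%s\n' "${IMPALA_QUICKSTART_IMAGE_PREFIX:-}" "$1"
}
```

With the prefix from the example set, `quickstart_image impalad_coordinator` prints `apache/impala:81d5377c2-impalad_coordinator`. With the prefix unset, it prints the bare name `impalad_coordinator`.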

Starting the cluster:

To start the base quickstart cluster without Kudu:

  docker-compose -f docker/quickstart.yml up -d

To load data in the background into Parquet and Kudu formats:

  docker-compose -f docker/quickstart.yml -f docker/quickstart-kudu-minimal.yml \
                 -f docker/quickstart-load-data.yml up -d

To follow the data loading process, you can use the docker logs command, e.g.:

  docker logs -f docker_data-loader_1
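
If you switch between the two start modes often, the file arguments can be kept in one place. This is only a sketch: the function name `quickstart_compose_files` and the mode names `base` and `load-data` are made up here, while the file paths are the ones used in the commands above.

```shell
# Hypothetical wrapper: print the docker-compose "-f" arguments for a mode.
quickstart_compose_files() {
  case "$1" in
    base)
      # Base cluster without Kudu.
      echo "-f docker/quickstart.yml"
      ;;
    load-data)
      # Base cluster plus Kudu and the background data loader.
      echo "-f docker/quickstart.yml -f docker/quickstart-kudu-minimal.yml -f docker/quickstart-load-data.yml"
      ;;
    *)
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}
```

The output is meant to be word-split by the shell, so leave the substitution unquoted: `docker-compose $(quickstart_compose_files load-data) up -d`.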

Connecting to the cluster:

The Impala service can be reached at $QUICKSTART_IP. If you set QUICKSTART_LISTEN_ADDR=0.0.0.0, you can also connect to it on localhost or your machine's host name.

Connecting with containerized impala-shell:

  docker run --network=quickstart-network -it \
     ${IMPALA_QUICKSTART_IMAGE_PREFIX}impala_quickstart_client impala-shell

Or with a pre-installed impala-shell:

  impala-shell -i ${QUICKSTART_IP}
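
Which address to pass to `impala-shell -i` depends on the listen address, as described above. A minimal sketch of that choice follows; `quickstart_shell_host` is a hypothetical helper, not something shipped with Impala.

```shell
# Hypothetical helper: print a host that impala-shell can connect to.
quickstart_shell_host() {
  if [ "${QUICKSTART_LISTEN_ADDR:-}" = "0.0.0.0" ]; then
    # Listening on all interfaces, so localhost works from the host machine.
    echo "localhost"
  else
    # Otherwise only the docker network address is reachable.
    echo "${QUICKSTART_IP:?QUICKSTART_IP must be set}"
  fi
}
```

It could then be used as `impala-shell -i "$(quickstart_shell_host)"`.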

Accessing the Warehouse volume

If you want to directly interact with the contents of the warehouse in the impala-quickstart-warehouse Docker volume or copy data from the host into the quickstart warehouse, you can mount the volume in another container. E.g. to run an Ubuntu 18.04 container with the warehouse directory mounted at /user/hive/warehouse and your home directory mounted at /host_dir, you can run the following command:

docker run -v ~:/host_dir -v docker_impala-quickstart-warehouse:/user/hive/warehouse \
    -it ubuntu:18.04 /bin/bash

In the container, you can find the external and managed tablespaces stored in the impala-quickstart-warehouse volume, for example:

root@377747c68bfa:/# ls /user/hive/warehouse/external/tpcds_raw/
call_center       customer_demographics   inventory  store_returns  web_sales
catalog_page      date_dim                item       store_sales    web_site
catalog_returns   dbgen_version           promotion  time_dim
catalog_sales     generated               reason     warehouse
customer          household_demographics  ship_mode  web_page
customer_address  income_band             store      web_returns
root@377747c68bfa:/# head -n2 /user/hive/warehouse/external/tpcds_raw/time_dim/time_dim.dat
0|AAAAAAAABAAAAAAA|0|0|0|0|AM|third|night||
1|AAAAAAAACAAAAAAA|1|0|0|1|AM|third|night||

It is then possible to copy data files from the host into an external table. In impala-shell, create an external table:

create external table quickstart_example(s string)
stored as textfile
location '/user/hive/warehouse/external/quickstart_example';

Then in the host and container shells, create a text file and copy it into the external table directory.

# On host:
echo 'hello world' > ~/hw.txt

# In container:
cp /host_dir/hw.txt /user/hive/warehouse/external/quickstart_example

You can then refresh the table to pick up the data file and query the table:

refresh quickstart_example;
select * from quickstart_example;
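
To repeat these steps for other tables, the create statement can be generated from a table name. The sketch below assumes the same single string column and the same warehouse layout as the example above; `external_table_sql` is a hypothetical helper name.

```shell
# Hypothetical helper: print a create statement like the one above for a
# given table name, using the same column and warehouse location pattern.
external_table_sql() {
  table="$1"
  cat <<EOF
create external table ${table}(s string)
stored as textfile
location '/user/hive/warehouse/external/${table}';
EOF
}
```

The generated statement can be passed to impala-shell with its `-q` option, for example `impala-shell -i "${QUICKSTART_IP}" -q "$(external_table_sql quickstart_example)"`.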

Environment Variable Overrides:

The following environment variables influence the behaviour of the various quickstart docker compose files.

KUDU_QUICKSTART_VERSION - defaults to latest, can be overridden to a different tag to use different Kudu images.
IMPALA_QUICKSTART_IMAGE_PREFIX - defaults to using local images, change to a different prefix to pick up prebuilt images.
QUICKSTART_LISTEN_ADDR - can be set to either $QUICKSTART_IP to listen on only the docker network interface, or 0.0.0.0 to listen on all interfaces.

Publishing Quickstart Docker Images (for developers)

To publish the images, you need to build them locally, then run publish_images_to_apache.sh to tag and push them to a Docker repository. For example, to tag the images with the current commit hash and upload them to the default apache/impala Docker repository, you can run the following commands:

cd $IMPALA_HOME
IMAGE_VERSION=$(git rev-parse --short HEAD)
./buildall.sh -release -noclean -ninja -skiptests -notests
ninja docker_images quickstart_docker_images
./docker/publish_images_to_apache.sh -v ${IMAGE_VERSION} -

For official Impala releases you will want to use the release version instead.
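
One way to cover both cases is to fall back to the commit hash only when no release version is given. This is a sketch under that assumption; `resolve_image_version` is a hypothetical helper and is not part of publish_images_to_apache.sh.

```shell
# Hypothetical helper: use the given release version if there is one,
# otherwise fall back to the short hash of the current commit.
resolve_image_version() {
  if [ -n "${1:-}" ]; then
    echo "$1"
  else
    # Must be run from inside the Impala git checkout.
    git rev-parse --short HEAD
  fi
}
```

For example, `IMAGE_VERSION=$(resolve_image_version)` reproduces the commit-hash tagging shown above, while passing a release version as the first argument uses that version instead.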
