This adds a script, docker/publish_images_to_apache.sh,
that allows uploading images to the apache/impala docker hub
repo, prefixed with a version string. E.g. with the following
commands:
ninja docker_images quickstart_docker_images
./docker/publish_images_to_apache.sh -v 81d5377c2
The uploaded images can then be used for the quickstart cluster,
as documented in docker/README.
Updated docs for quickstart to use a prefix from apache/impala.
Remove IMPALA_QUICKSTART_VERSION, which doesn't interact well with
the tagging since the image name and version are now encoded in the
tag.
Fix an incorrect image name added to docker-images.txt:
impala_profile_tool_image.
Testing:
Ran Impala quickstart with data loading using instructions in README.
export IMPALA_QUICKSTART_IMAGE_PREFIX="apache/impala:81d5377c2-"
docker network create -d bridge quickstart-network
export QUICKSTART_IP=$(docker network inspect quickstart-network -f '{{(index .IPAM.Config 0).Gateway}}')
export QUICKSTART_LISTEN_ADDR=$QUICKSTART_IP
docker-compose -f docker/quickstart.yml \
-f docker/quickstart-kudu-minimal.yml \
-f docker/quickstart-load-data.yml up -d
docker run --network=quickstart-network -it \
${IMPALA_QUICKSTART_IMAGE_PREFIX}impala_quickstart_client impala-shell
Change-Id: I535d77e565b73d732ae511d7525193467086c76a
Reviewed-on: http://gerrit.cloudera.org:8080/17030
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Docker-related scripts for Impala
test-with-docker.py runs the Impala build and tests inside of Docker
containers, parallelizing the test execution across test suites. See that file
for more details.
This also contains infrastructure to build impala_base, catalogd,
statestored, impalad_coordinator, impalad_executor and
impalad_coord_exec container images from the output of an Impala build.
The containers can be built via the CMake target docker_images. See
CMakeLists.txt for the build targets.
Docker Quickstart with docker-compose
Various docker-compose files in this directory provide a convenient way to run
a basic Impala service with a single Impala Daemon and minimal set of supporting
services. A Hive MetaStore service is used to manage metadata. All filesystem data
is stored in Docker volumes. The default storage location for tables is in the
impala-quickstart-warehouse volume, i.e. if you create a table in Impala, it will
be stored in that volume by default.
Prerequisites:
A docker network called quickstart-network must be created, and the QUICKSTART_IP and
QUICKSTART_LISTEN_ADDR environment variables must be set.
docker network create -d bridge quickstart-network
export QUICKSTART_IP=$(docker network inspect quickstart-network -f '{{(index .IPAM.Config 0).Gateway}}')
export QUICKSTART_LISTEN_ADDR=$QUICKSTART_IP
If you want the cluster to accept connections from other hosts, set
QUICKSTART_LISTEN_ADDR to listen on all interfaces instead:
export QUICKSTART_LISTEN_ADDR=0.0.0.0
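A quick check before starting the cluster can catch a missing variable early. The following is a minimal sketch, not part of the Impala repository: check_quickstart_env is a hypothetical helper name, and it only verifies that the two environment variables are non-empty. It does not verify that the quickstart-network Docker network exists.

```shell
# Hypothetical helper (not an Impala script): verify that the environment
# variables required by the quickstart compose files are set and non-empty.
check_quickstart_env() {
  missing=0
  for var in QUICKSTART_IP QUICKSTART_LISTEN_ADDR; do
    # eval gives a POSIX-portable indirect lookup of the variable named in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "error: $var is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Report the result without aborting the current shell.
if check_quickstart_env; then
  echo "quickstart environment looks complete"
else
  echo "set the variables listed above before running docker-compose"
fi
```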
You can optionally set IMPALA_QUICKSTART_IMAGE_PREFIX to pull prebuilt images from a DockerHub repo.
For example, the following will use images like apache/impala:81d5377c2-impalad_coordinator:
export IMPALA_QUICKSTART_IMAGE_PREFIX="apache/impala:81d5377c2-"
Leave IMPALA_QUICKSTART_IMAGE_PREFIX unset to use images built from a local Impala dev environment.
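To see how the prefix combines with an image name, the sketch below prints the resulting image references. It assumes the prefix is simply concatenated with the image name, as in the containerized impala-shell command in this README. quickstart_image_ref is a hypothetical helper, and the image names in the loop are taken from this document rather than read from the compose files.

```shell
# Hypothetical helper (not an Impala script): print the image reference that
# results from prepending IMPALA_QUICKSTART_IMAGE_PREFIX to an image name.
quickstart_image_ref() {
  # $1 is an image name such as impalad_coordinator.
  printf '%s%s\n' "${IMPALA_QUICKSTART_IMAGE_PREFIX:-}" "$1"
}

# Example prefix from this README; with the variable unset, the bare local
# image name is printed instead.
IMPALA_QUICKSTART_IMAGE_PREFIX="apache/impala:81d5377c2-"
for image in catalogd statestored impalad_coord_exec impala_quickstart_client; do
  quickstart_image_ref "$image"
done
```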
Starting the cluster:
To start the base quickstart cluster without Kudu:
docker-compose -f docker/quickstart.yml up -d
To load data in the background into Parquet and Kudu formats:
docker-compose -f docker/quickstart.yml -f docker/quickstart-kudu-minimal.yml \
-f docker/quickstart-load-data.yml up -d
To follow the data loading process, you can use the docker logs command, e.g.:
docker logs -f docker_data-loader_1
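The two start-up variants above differ only in which compose files are passed with -f. As an illustration, the sketch below prints the command for each variant instead of running it, so the file list can be reviewed first. quickstart_compose_cmd and its mode names (base, load-data) are hypothetical and exist only for this example.

```shell
# Hypothetical helper (not an Impala script): print the docker-compose
# command for a chosen quickstart variant without executing it.
quickstart_compose_cmd() {
  # $1: "base" for the cluster without Kudu,
  #     "load-data" to add Kudu and the background data load.
  files="-f docker/quickstart.yml"
  case "$1" in
    base) ;;
    load-data)
      files="$files -f docker/quickstart-kudu-minimal.yml -f docker/quickstart-load-data.yml"
      ;;
    *)
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
  echo "docker-compose $files up -d"
}

quickstart_compose_cmd base
quickstart_compose_cmd load-data
```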
Connecting to the cluster:
You can connect to the Impala service at $QUICKSTART_IP or, if you set
QUICKSTART_LISTEN_ADDR=0.0.0.0, on localhost or your machine's host name.
Connecting with containerized impala-shell:
docker run --network=quickstart-network -it \
${IMPALA_QUICKSTART_IMAGE_PREFIX}impala_quickstart_client impala-shell
Or with a pre-installed impala-shell:
impala-shell -i ${QUICKSTART_IP}
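The address rule described above can be written as a small helper. This is a sketch only: quickstart_connect_host is a hypothetical name, the gateway address 172.18.0.1 is a placeholder, and the helper picks localhost for the 0.0.0.0 case even though your machine's host name would work as well.

```shell
# Hypothetical helper (not an Impala script): choose the address to pass to
# impala-shell -i based on how the cluster was told to listen.
quickstart_connect_host() {
  if [ "${QUICKSTART_LISTEN_ADDR:-}" = "0.0.0.0" ]; then
    # Listening on all interfaces, so localhost is reachable.
    echo "localhost"
  else
    # Listening only on the docker network interface.
    echo "${QUICKSTART_IP:-}"
  fi
}

# Example with placeholder values, run in a subshell so that the caller's
# real settings are left untouched.
(
  QUICKSTART_IP=172.18.0.1
  QUICKSTART_LISTEN_ADDR=0.0.0.0
  echo "impala-shell -i $(quickstart_connect_host)"
)
```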
Accessing the Warehouse volume
If you want to directly interact with the contents of the warehouse in the
impala-quickstart-warehouse Docker volume or copy data from the host into the
quickstart warehouse, you can mount the volume in another container. E.g. to run
an Ubuntu 18.04 container with the warehouse directory mounted at
/user/hive/warehouse and your home directory mounted at /host_dir, you
can run the following command:
docker run -v ~:/host_dir -v docker_impala-quickstart-warehouse:/user/hive/warehouse \
-it ubuntu:18.04 /bin/bash
In the container, you can find the external and managed tablespaces stored in
the impala-quickstart-warehouse volume, for example:
root@377747c68bfa:/# ls /user/hive/warehouse/external/tpcds_raw/
call_center customer_demographics inventory store_returns web_sales
catalog_page date_dim item store_sales web_site
catalog_returns dbgen_version promotion time_dim
catalog_sales generated reason warehouse
customer household_demographics ship_mode web_page
customer_address income_band store web_returns
root@377747c68bfa:/# head -n2 /user/hive/warehouse/external/tpcds_raw/time_dim/time_dim.dat
0|AAAAAAAABAAAAAAA|0|0|0|0|AM|third|night||
1|AAAAAAAACAAAAAAA|1|0|0|1|AM|third|night||
It is then possible to copy data files from the host into an external table. In impala-shell, create an external table:
create external table quickstart_example(s string)
stored as textfile
location '/user/hive/warehouse/external/quickstart_example';
Then in the host and container shells, create a text file and copy it into the external table directory.
# On host:
echo 'hello world' > ~/hw.txt
# In container:
cp /host_dir/hw.txt /user/hive/warehouse/external/quickstart_example
You can then refresh the table to pick up the data file and query the table:
refresh quickstart_example;
select * from quickstart_example;
Environment Variable Overrides:
The following environment variables influence the behaviour of the various quickstart docker compose files.
KUDU_QUICKSTART_VERSION - defaults to latest; can be overridden with a different tag to use different Kudu images.
IMPALA_QUICKSTART_IMAGE_PREFIX - defaults to using local images; change to a different prefix to pick up prebuilt images.
QUICKSTART_LISTEN_ADDR - can be set to either $QUICKSTART_IP to listen on only the docker network interface, or 0.0.0.0 to listen on all interfaces.
Publishing Quickstart Docker Images (for developers)
To publish the images, build them locally and then run publish_images_to_apache.sh
to tag and push them to a Docker repository. For example, to tag the images with the
current commit hash and upload them to the default apache/impala Docker repository,
you can run the following commands:
cd $IMPALA_HOME
IMAGE_VERSION=$(git rev-parse --short HEAD)
./buildall.sh -release -noclean -ninja -skiptests -notests
ninja docker_images quickstart_docker_images
./docker/publish_images_to_apache.sh -v ${IMAGE_VERSION}
For official Impala releases you will want to use the release version instead.
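To preview the tags that a given version string would produce, a dry run can help. The sketch below is not the publish_images_to_apache.sh script and makes no claim about its internals. print_publish_cmds is a hypothetical helper that assumes the apache/impala:<version>-<image> naming shown earlier in this README, and it only prints docker commands without running them.

```shell
# Hypothetical dry-run helper (not publish_images_to_apache.sh): print the
# tag and push commands for each local image under a version prefix.
print_publish_cmds() {
  # $1: version string; remaining arguments: local image names.
  version="$1"
  shift
  for image in "$@"; do
    target="apache/impala:${version}-${image}"
    echo "docker image tag ${image} ${target}"
    echo "docker push ${target}"
  done
}

print_publish_cmds 81d5377c2 impalad_coordinator impala_quickstart_client
```

Reviewing the printed commands before publishing makes it easy to confirm that the version string is the intended one, for instance a release version rather than a commit hash.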