This refactors start-impala-cluster.py to allow multiple implementations
of the minicluster operations like start and stop. There are now
two classes implementing the same set of operations:
MiniClusterOperations and DockerMiniClusterOperations. The docker
version starts and stops the containers added in IMPALA-7948.
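As a rough illustration of the refactoring described above, the sketch below
shows two classes exposing the same operations, with the caller picking one
depending on whether a docker network was given. Only the class names
MiniClusterOperations and DockerMiniClusterOperations come from this change;
the method names, constructor arguments, and the make_operations helper are
hypothetical, and the method bodies only record what they would do.

```python
class MiniClusterOperations(object):
    """Stand-in for the implementation that runs daemons as host processes."""

    def __init__(self):
        self.log = []

    def start_cluster(self, cluster_size):
        # Hypothetical: the real code would launch processes on the host.
        self.log.append("start %d host processes" % cluster_size)

    def kill_all_daemons(self):
        self.log.append("stop host processes")


class DockerMiniClusterOperations(object):
    """Stand-in for the implementation that starts and stops containers."""

    def __init__(self, network_name):
        self.network_name = network_name
        self.log = []

    def start_cluster(self, cluster_size):
        # Hypothetical: the real code would start one container per daemon.
        self.log.append("start %d containers on network %s"
                        % (cluster_size, self.network_name))

    def kill_all_daemons(self):
        self.log.append("stop containers on network %s" % self.network_name)


def make_operations(docker_network=None):
    """Pick an implementation; callers only use the shared method names."""
    if docker_network is None:
        return MiniClusterOperations()
    return DockerMiniClusterOperations(docker_network)
```

Because both classes accept the same calls, code that drives the cluster
does not need to know which one it was handed.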
With some configuration (see instructions below), the containers can
connect back to services (HDFS, HMS, Kudu, Sentry, etc.) running on the
host. Config generation was modified so that services can optionally
communicate via the docker bridge network rather than loopback
(the host's loopback interface is not accessible to the containers).
Notes:
* I improved the container build to regenerate containers when cluster
  configs are regenerated (previously the containers could have stale
  configs).
* Switched from CMD to ENTRYPOINT to allow passing in arguments to
  "docker run" without clobbering the default args.
* Python 2.6 is not supported for this code path. This only affects
  CentOS 6, which has limited support for docker anyway.
* I deferred implementing wait_for_cluster(), since the existing
  code requires surgery to abstract out assumptions about locating
  processes and web UI ports; see IMPALA-7988.
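The CMD to ENTRYPOINT note above can be modelled in a few lines. This is a
simplified sketch of Docker's exec-form rule that arguments given to
"docker run" replace CMD but are appended after ENTRYPOINT; it ignores the
shell form and the --entrypoint override. The function name, the "daemon"
binary, and both flags are hypothetical placeholders, not taken from the
Impala images.

```python
def effective_command(entrypoint, cmd, run_args):
    """Return the argv a container would run under the exec-form rules."""
    # Arguments given to "docker run" replace CMD but never ENTRYPOINT.
    tail = run_args if run_args else cmd
    return list(entrypoint) + list(tail)


# Hypothetical default command line and extra argument.
DEFAULT = ["daemon", "--default_flag=1"]
EXTRA = ["--extra_flag=2"]

# Defaults declared as CMD: the extra argument clobbers them entirely.
print(effective_command([], DEFAULT, EXTRA))
# ['--extra_flag=2']

# Defaults declared as ENTRYPOINT: the extra argument is appended.
print(effective_command(DEFAULT, [], EXTRA))
# ['daemon', '--default_flag=1', '--extra_flag=2']
```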
How to use:
===========
Create a docker network to use for internal cluster communication,
e.g.:
  docker network create -d bridge --gateway=172.17.0.1 \
      --subnet=172.17.0.1/16 impala-cluster
Add the gateway address of the docker network you created to
impala-config-local.sh, e.g.:
  export INTERNAL_LISTEN_HOST=172.17.0.1
  export DEFAULT_FS=hdfs://${INTERNAL_LISTEN_HOST}:20500
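Before editing impala-config-local.sh, it can help to sanity-check that the
gateway address really lies inside the subnet given to "docker network
create". The sketch below uses only the Python 3 standard library; the
helper names check_network and default_fs are hypothetical, while the
addresses and port 20500 are the example values from the steps above.

```python
import ipaddress


def check_network(gateway, subnet):
    """Return the normalized network, or raise if the gateway is outside it."""
    # strict=False accepts a subnet written with host bits set, such as
    # 172.17.0.1/16, and normalizes it to the network address.
    network = ipaddress.ip_network(subnet, strict=False)
    address = ipaddress.ip_address(gateway)
    if address not in network:
        raise ValueError("%s is not inside %s" % (gateway, network))
    return network


def default_fs(internal_listen_host, port=20500):
    """Build the DEFAULT_FS value from the listen host, as in the export."""
    return "hdfs://%s:%d" % (internal_listen_host, port)


print(check_network("172.17.0.1", "172.17.0.1/16"))  # 172.17.0.0/16
print(default_fs("172.17.0.1"))  # hdfs://172.17.0.1:20500
```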
Regenerate configs and docker images:
  . bin/impala-config.sh
  ./bin/create-test-configuration.sh
  ninja -j $IMPALA_BUILD_THREADS docker_images
Restart the minicluster and Impala services to pick up the config:
  ./testdata/bin/run-all.sh
  start-impala-cluster.py --docker_network impala-cluster
You can then connect with impala-shell and run some queries. You will
likely run into issues, particularly if running against an existing
data load, since "localhost" or "127.0.0.1" gets baked into HMS
table definitions and, inside a container, refers to the container
itself rather than the host.
Testing:
Ran exhaustive tests (not using Docker) to make sure I didn't break
anything.
Change-Id: I5975cced33fa93df43101dd47d19b8af12e93d11
Reviewed-on: http://gerrit.cloudera.org:8080/12095
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>