This commit makes a few additions and improvements to
test-with-docker. In general, I'm adding workloads (e.g., exhaustive,
rat-check), tuning memory settings and parallelism, and trying to
speed things up.
Bug fixes:
* Embarrassingly, I was still skipping thrift-server-test in the backend
tests. This was a mistake in handling feedback from my last review.
* I made the timeline a little bit taller to clip less.
Adding workloads:
* I added the RAT licensing check.
* I added exhaustive runs. This led me to model the suites a little
more in Python, with a class representing a suite and carrying its
associated data (a sketch follows below). It's not perfect and still
coupled with the entrypoint.sh shell script, but it feels workable.
As part of adding exhaustive tests, I had to re-work the timeout
handling, since different suites now meaningfully have different
timeouts.
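Roughly, the suite modeling looks like the sketch below. This is only
an illustration; the class, field, and suite names here are
assumptions, not necessarily what ended up in the code:

    # Illustrative sketch of modeling a test suite in Python. The real
    # class is still coupled to entrypoint.sh; names and defaults here
    # are made up for illustration.
    class Suite(object):
      def __init__(self, name, timeout_minutes=120, shards=1):
        self.name = name
        # Exhaustive runs take much longer, so each suite carries its
        # own timeout rather than sharing a single global timeout.
        self.timeout_minutes = timeout_minutes
        # Number of py.test shards to split this suite into.
        self.shards = shards

    ALL_SUITES = [
        Suite("BE_TEST", timeout_minutes=60),
        Suite("RAT_CHECK", timeout_minutes=20),
        Suite("EE_TEST_SERIAL_EXHAUSTIVE", timeout_minutes=240),
        Suite("EE_TEST_PARALLEL", timeout_minutes=120, shards=4),
    ]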
Speed ups:
* To speed up test runs, I added a mechanism to split py.test suites
into multiple shards via a py.test argument (see the sketch after this
list). This involved a little bit of work in conftest.py and exposing
$RUN_CUSTOM_CLUSTER_TESTS_ARGS in run-all-tests.sh. Furthermore, I
moved a bit more of the logic for managing the list of suites into
Python.
* Doing the full build with "-notests" and only building
the backend tests in the relevant target that needs them. This speeds
up "docker commit" significantly by removing about 20GB from the
container. I had to indicate that expr-codegen-test depends on
expr-codegen-test-ir, a dependency that was previously missing.
* I sped up copying the Kudu data: previously I did both a move and a
copy; now I do a move followed by a move. One of the moves crosses
filesystems and is therefore slow, but this halves the amount of data
copied.
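The sharding mechanism in conftest.py is roughly shaped like the
sketch below. The option name and hashing scheme here are assumptions
for illustration:

    # Sketch of py.test shard support in conftest.py, invoked e.g. as:
    #   py.test --shard=1/4 ...
    # The option name and hash are illustrative, not the real ones.
    import zlib

    def pytest_addoption(parser):
      parser.addoption("--shard", default=None,
                       help="Run only shard i of n, given as 'i/n'.")

    def pytest_collection_modifyitems(config, items):
      shard = config.getoption("--shard")
      if not shard:
        return
      index, total = [int(x) for x in shard.split("/")]
      # Keep only the tests whose stable hash falls into this shard.
      items[:] = [item for item in items
                  if zlib.crc32(item.nodeid.encode()) % total == index - 1]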
Memory usage:
* I tweaked the memlimit_gb settings to have a higher default. I've been
fighting empirically to have the tests run well on c4.8xlarge and
m4.10xlarge.
The more memory a minicluster and test suite run uses, the fewer
parallel suites we can run. By observing the peak processes at the
tail of a run (with a new "memory_usage" function that uses a
ps/sort/awk trick; a sketch of the idea follows at the end of this
section) and by observing peak container total_rss, I found that
several JVMs were running without an Xmx setting. I added Xms/Xmx
settings in a few places:
* Impalads other than the first do very little JVM work, so setting
an Xmx keeps them small, even in the parallel tests.
* Datanodes do work, but they essentially were never garbage
collecting, because JVM defaults let them use up to 1/4th
the machine memory. (I observed this based on RSS at the
end of the run; nothing fancier.) Adding Xms/Xmx settings
helped.
* Similarly, I piped the settings through to HBase.
A few daemons still run without resource limitations, but they don't
seem to be a problem.
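For reference, the idea behind the "memory_usage" helper is roughly
the following (the real helper is a small ps/sort/awk shell function;
this Python rendering and its names are illustrative only):

    # Rank processes by RSS and print the total, mirroring the
    # ps/sort/awk trick. Note that summing RSS over-counts shared
    # pages; it's only meant as a rough signal.
    import subprocess

    def memory_usage(top_n=20):
      out = subprocess.check_output(["ps", "-eo", "rss=,comm="]).decode()
      procs = []
      for line in out.splitlines():
        rss_kb, _, comm = line.strip().partition(" ")
        procs.append((int(rss_kb), comm.strip()))
      procs.sort(reverse=True)
      total_gb = sum(rss for rss, _ in procs) / (1024.0 ** 2)
      print("Total RSS: %.1f GB" % total_gb)
      for rss_kb, comm in procs[:top_n]:
        print("%8d MB  %s" % (rss_kb // 1024, comm))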
Change-Id: I43fe124f00340afa21ad1eeb6432d6d50151ca7c
Reviewed-on: http://gerrit.cloudera.org:8080/10123
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Allows running the tests that make up the "core" suite in about 2 hours.
By comparison, https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/buildTimeTrend
tends to run in about 3.5 hours.
This commit:
* Adds "echo" statements in a few places, to facilitate timing.
* Adds --skip-parallel/--skip-serial flags to run-tests.py,
and exposes them in run-all-tests.sh.
* Marks TestRuntimeFilters as a serial test. This test runs
queries that need > 1GB of memory, and, combined with
other tests running in parallel, can kill the parallel test
suite.
* Adds "test-with-docker.py", which runs a full build, data load,
and executes tests inside of Docker containers, generating
a timeline at the end. In short, one container is used
to do the build and data load, and then this container is
re-used to run various tests in parallel. All logs are
left on the host system.
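At a very high level, the container lifecycle looks roughly like the
sketch below. The image, container, and log-path names are made up;
the real orchestration in test-with-docker.py is considerably more
involved:

    # Rough shape of the orchestration; names below are illustrative.
    import subprocess

    def run(*cmd):
      subprocess.check_call(list(cmd))

    # 1. One container does the full build and data load...
    run("docker", "run", "--name", "impala-build",
        "-v", "/tmp/impala-logs/build:/logs",
        "impala-dev-base", "/entrypoint.sh", "build")
    # 2. ...and is snapshotted, so the built binaries and loaded test
    #    data can be reused.
    run("docker", "commit", "impala-build", "impala-dev-built")
    # 3. The committed image is re-used to run test suites in
    #    parallel, each container writing its logs back to the host.
    for suite in ("BE_TEST", "EE_TEST_SERIAL", "EE_TEST_PARALLEL"):
      run("docker", "run", "-d", "--name", "impala-" + suite.lower(),
          "-v", "/tmp/impala-logs/%s:/logs" % suite.lower(),
          "impala-dev-built", "/entrypoint.sh", "test", suite)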
Besides the obvious win of getting test results more quickly, this
commit serves as an example of how to get various bits of Impala
development working inside of Docker containers. For example, Kudu
relies on atomic rename of directories, which isn't available in most
Docker filesystems, and entrypoint.sh works around it.
In addition, the timeline generated by the build suggests where further
optimizations can be made. Most obviously, dataload eats up a precious
~30-50 minutes, on a largely idle machine.
This work is significantly CPU- and memory-hungry. It was developed on
a 32-core, 120GB-RAM Google Compute Engine machine. I've worked out
parallelism configurations such that it runs nicely with 60GB of RAM
(c4.8xlarge) and with over 100GB (e.g., m4.10xlarge, which has 160GB).
There is some simple logic to guess reasonable values for the knobs
(sketched below), and the knobs can also be set explicitly. By and
large, EC2 and GCE price machines linearly, so, if CPU usage can be
kept up, it's not wasteful to run on bigger machines.
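The knob-guessing is on the order of the following sketch; the actual
heuristic in test-with-docker.py differs, and the names and thresholds
below are assumptions:

    # Illustrative guess at suite concurrency from host resources.
    import multiprocessing
    import os

    def guess_concurrency(memlimit_gb=8):
      total_mem_gb = (os.sysconf("SC_PAGE_SIZE") *
                      os.sysconf("SC_PHYS_PAGES")) / (1024.0 ** 3)
      cpus = multiprocessing.cpu_count()
      # Each parallel suite needs roughly a minicluster's worth of
      # memory, so memory is usually the binding constraint.
      return max(1, min(cpus // 4, int(total_mem_gb // memlimit_gb)))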
Change-Id: I82052ef31979564968effef13a3c6af0d5c62767
Reviewed-on: http://gerrit.cloudera.org:8080/9085
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>