The test 'test_jvm_pause_monitor_logs_entries' stops and starts an
impalad, and confirms that the JVM pause monitor detects the pause
by looking for a specific message in the log. In one test run, the test
failed to find the correct message after sleeping for 1.2 seconds.
Because the test notes the last message that it sees in the log, we can
observe that the test would have found the correct message if it had
waited for just a few more milliseconds.
This change increases the time that the test waits to 2 seconds.
TESTING:
Ran end-to-end tests cleanly and checked that
test_jvm_pause_monitor_logs_entries ran OK.
Change-Id: I735c0c0ecfd3a9099c9cef332c5e79854bec7b8d
Reviewed-on: http://gerrit.cloudera.org:8080/12475
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Following up on IMPALA-6857, it's useful for monitoring tools to see whether
the pause monitor is getting triggered, and to see other GC metrics.
The Java side here, and the Thrift side, were easy enough.
However, the Impala metric implementation here caused us to call into
the frontend to read through the JMX memory beans 72 times, because each
call to GetValue() fetched all of the data for its pool. This structure
made it hard to add additional, non-pool metrics, and it felt wasteful.
To combat this, I added a 10-second cache for the metrics fetched from
the Frontend, so the counters will typically re-use the same data.
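The cache is just a timestamp check wrapped around the call into the
frontend. Here is a minimal sketch of the idea; the type and function
names (TJvmMetrics, FetchJvmMetricsFromFrontend) are hypothetical
stand-ins for the real Thrift struct and JNI call:

#include <chrono>
#include <mutex>

// Hypothetical stand-in for the Thrift struct returned by the frontend.
struct TJvmMetrics {
  long long gc_count = 0;
  long long gc_time_ms = 0;
};

// Hypothetical stand-in for the JNI call into the Java frontend, which
// reads the JMX beans and fills in the Thrift struct.
TJvmMetrics FetchJvmMetricsFromFrontend() { return TJvmMetrics(); }

// Serves metric reads from a cached copy, refreshing it at most once
// every 10 seconds so the individual counters re-use the same data.
class CachedJvmMetrics {
 public:
  TJvmMetrics Get() {
    std::lock_guard<std::mutex> l(lock_);
    auto now = std::chrono::steady_clock::now();
    if (!valid_ || now - last_fetch_ >= std::chrono::seconds(10)) {
      cached_ = FetchJvmMetricsFromFrontend();
      last_fetch_ = now;
      valid_ = true;
    }
    return cached_;
  }

 private:
  std::mutex lock_;
  bool valid_ = false;
  std::chrono::steady_clock::time_point last_fetch_;
  TJvmMetrics cached_;
};

Each GetValue() then reads its field from the struct returned by Get().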
There are five metrics here, and to avoid yet another enum class, I used
C++ lambdas to capture which field of the Thrift object I care about. If
folks like the approach, I think it can simplify away the enums for the
pool metrics as well.
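Roughly, each metric holds a small function that pulls its one field out
of the (cached) Thrift struct. A sketch of that pattern, with made-up
metric names and fields rather than the real ones:

#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the cached Thrift struct.
struct TJvmMetrics {
  int64_t pause_count = 0;
  int64_t gc_count = 0;
  int64_t gc_time_ms = 0;
};

// Each metric carries a lambda that extracts the one field it reports,
// instead of switching on yet another enum.
struct JvmMetricDef {
  std::string name;
  std::function<int64_t(const TJvmMetrics&)> extract;
};

std::vector<JvmMetricDef> MakeJvmMetricDefs() {
  return {
      {"jvm.pause-count", [](const TJvmMetrics& m) { return m.pause_count; }},
      {"jvm.gc-count", [](const TJvmMetrics& m) { return m.gc_count; }},
      {"jvm.gc-time-ms", [](const TJvmMetrics& m) { return m.gc_time_ms; }},
  };
}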
I measured the cost of calling into the metrics code by
looping the metrics-gathering 100 times and looking at CPU
time for the process using this script:
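# The pid is found via fuser on the impalad web port (25000); fields 14
# and 15 of /proc/<pid>/stat are utime and stime, in clock ticks
# (typically 100 per second).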
START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }')
for i in $(seq 100); do
curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null
done
END_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }')
echo $START_CPU $END_CPU $(($END_CPU - $START_CPU))
On a release build on my development machine, gathering metrics 100
times took 0.16 cpu seconds without this change and 0.07 cpu seconds
with this change. The measurement accuracy here is 0.01 seconds (I
spot-checked this using the cpuacct cgroup infrastructure, which gives
you nanos, but it was more painful to script), but this convinces me
that this is a net improvement.
Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Reviewed-on: http://gerrit.cloudera.org:8080/11468
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Pause monitor:
=============
This commit adds a stripped-down version of Hadoop's JvmPauseMonitor
class (https://bit.ly/2O6qSwm). The core implementation is borrowed
from the hadoop-common project and the Hadoop dependencies are removed:
- Removed dependency on AbstractService.
- Not relying on Hadoop's Configuration object for reading confs.
- Switched to Guava's implementation of Stopwatch.
This utility class can detect both GC and non-GC pauses. In the case of
GC pauses, the GC metrics for the pause period are logged.
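The detection works by having a thread sleep for a short, fixed interval
and measure how long the sleep actually took: if the overshoot exceeds a
threshold, a pause is reported, and for GC pauses the per-pool GC counts
and times are logged (as in the sample output below). A minimal sketch
of that sleep-and-measure loop, in C++ rather than the Java of the real
class, with assumed interval and threshold values and without the
per-pool GC reporting:

#include <chrono>
#include <iostream>
#include <thread>

// Sketch of the sleep-and-measure loop only; interval and threshold are
// assumed values, and the GC-pool diffing done by the real (Java) class
// is omitted.
void PauseMonitorLoop() {
  using namespace std::chrono;
  const auto kSleepInterval = milliseconds(500);
  const auto kPauseThreshold = milliseconds(1000);
  while (true) {
    auto start = steady_clock::now();
    std::this_thread::sleep_for(kSleepInterval);
    auto extra = duration_cast<milliseconds>(steady_clock::now() - start)
        - kSleepInterval;
    if (extra > kPauseThreshold) {
      std::cout << "Detected pause in JVM or host machine (eg GC): pause of "
                << "approximately " << extra.count() << "ms" << std::endl;
    }
  }
}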
Sample Output:
=============
Detected pause in JVM or host machine (eg GC): pause of approximately
2356ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=2241ms
GC pool 'PS Scavenge' had collection(s): count=3 time=352ms
Detected pause in JVM or host machine (eg GC): pause of approximately
1964ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=2082ms
GC pool 'PS Scavenge' had collection(s): count=1 time=251ms
Detected pause in JVM or host machine (eg GC): pause of approximately
2120ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=2454ms
Detected pause in JVM or host machine (eg GC): pause of approximately
2238ms
GC pool 'PS MarkSweep' had collection(s): count=5 time=13464ms
Detected pause in JVM or host machine (eg GC): pause of approximately
2233ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=2733ms
JMX Metrics:
============
JMX metrics are now emitted for the Impala and Catalog JVMs at the web
endpoint /jmx:
- Impalad: http(s)://<impalad-host>:25000/jmx
- Catalogd: http(s)://<catalogd-host>:25020/jmx
Misc:
====
Renamed JvmMetric -> JvmMemoryMetric to make the intent clearer. It
doesn't relate to the functionality of the patch in any way.
Testing:
=======
- Tested it manually with kill -SIGSTOP/-SIGCONT <pid>. Made sure that
the non-GC JVM pauses are logged.
- The class's functionality is tested manually by invoking its main()
- Injected a memory leak into the Catalog server code and made sure the
GC pauses are detected.
Change-Id: I30d897b7e063846ad6d8f88243e2c04264da0341
Reviewed-on: http://gerrit.cloudera.org:8080/10998
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>