IMPALA-12053: Expose event-processor error message in WebUI

When the event processor goes into the ERROR or NEEDS_INVALIDATE state,
the only way to get detailed information is to check the logs, which is
inconvenient when triaging failures. This patch exposes the error message
in the /events WebUI. The message includes a timestamp string and the
stack trace of the exception.
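The catalogd-side change is not shown in this excerpt, so the exact message format is not visible here. As a rough illustration of a message made of a timestamp string followed by an exception's stack trace, here is a minimal Python sketch. The function name `format_error_msg` and the "Error happened at" prefix are hypothetical and are not taken from catalogd.

```python
import traceback
from datetime import datetime


def format_error_msg(exc, now=None):
    """Hypothetical helper: returns a timestamp line followed by the full stack
    trace of `exc`. Not the format catalogd actually emits."""
    if now is None:
        now = datetime.now()
    # format_exception() returns a list of strings, one per traceback chunk.
    stack = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return "Error happened at {0}\n{1}".format(now.strftime("%Y-%m-%d %H:%M:%S"), stack)


if __name__ == "__main__":
    try:
        raise RuntimeError("failed to process event")
    except RuntimeError as e:
        print(format_error_msg(e))
```

Keeping the timestamp on its own first line makes it easy to see when the processor failed before reading the trace.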

This patch makes the /events page visible. It also modifies the test
code of EventProcessorUtils.wait_for_synced_event_id() to print the error
message if the event processor is down.
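The error-reporting pattern in wait_for_synced_event_id() can be sketched in isolation. This is a simplified sketch, not the actual test utility: the three getter callables, the default timeout and the polling interval are hypothetical parameters, injected so that the loop can run without a live catalogd.

```python
import time


def wait_for_synced_event_id(target_event_id, get_synced_id, get_status,
                             get_error_msg, timeout=10.0, interval=0.1):
    """Polls until the synced event id reaches target_event_id.

    get_synced_id, get_status and get_error_msg are hypothetical stand-ins for
    the real metric and WebUI scrapers.
    """
    end_time = time.time() + timeout
    while True:
        if get_synced_id() >= target_event_id:
            return
        status = get_status()
        if status not in ["ACTIVE", "PAUSED"]:
            # Include the error message so the test output explains the failure.
            raise Exception("Event processor is not working. Status: {0}. Error msg: {1}"
                            .format(status, get_error_msg()))
        if time.time() >= end_time:
            raise Exception("Timed out waiting for event id {0}".format(target_event_id))
        time.sleep(interval)
```

Passing lambdas that return a fixed status such as "ERROR" is enough to check that the raised exception carries the error message.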

A trivial bug where lastProcessedEvent is not updated (IMPALA-11588) is
also fixed in this patch. The variable is refactored into a member of
the class so that internal methods can update it before processing each
event.
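A minimal Python sketch of the refactoring idea, assuming a hypothetical `EventProcessor` class and handler callback (the real event processor code is not shown in this excerpt). Because the last processed event is an instance field rather than a local variable, the internal per-event method can set it before handling each event, and the value stays readable after a failure.

```python
class EventProcessor(object):
    """Hypothetical, heavily simplified processor illustrating a member field
    that internal methods update before processing each event."""

    def __init__(self, handler):
        self._handler = handler
        self.last_processed_event = None
        self.status = "ACTIVE"

    def process_events(self, events):
        try:
            for event in events:
                self._process_one(event)
        except Exception:
            self.status = "ERROR"
            raise

    def _process_one(self, event):
        # Set before handling, so a failing event is still recorded. A local
        # variable in process_events() could not be updated from here.
        self.last_processed_event = event
        self._handler(event)
```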

Some metrics, e.g. latest-event-id, latest-event-time-ms and
last-synced-event-time-ms, were not shown on the /events page. This
patch adds them, and also adds a new metric, event-processing-delay-ms,
which is latest-event-time-ms minus last-synced-event-time-ms.
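The new metric is a simple difference of two timestamps. The sketch below computes it from a dict of metric values; the dict input and the None-on-missing behaviour are assumptions for illustration, not how catalogd computes or reports the metric.

```python
def event_processing_delay_ms(metrics):
    """Returns latest-event-time-ms minus last-synced-event-time-ms.

    `metrics` is assumed to map metric names to integer millisecond
    timestamps; returns None if either metric is missing.
    """
    latest = metrics.get("latest-event-time-ms")
    synced = metrics.get("last-synced-event-time-ms")
    if latest is None or synced is None:
        return None
    return latest - synced
```

For example, with latest-event-time-ms = 1684831858000 and last-synced-event-time-ms = 1684831855500, the delay is 2500 ms.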

Tests:
 - Manually injected code to fail the event processor and verified the
   WebUI.
 - Ran metadata/test_event_processing.py while the event processor was
   in the ERROR state and verified that the error message shows up in
   the test output.

Change-Id: I077375422bc3d24eed57c95c6b05ac408228f083
Reviewed-on: http://gerrit.cloudera.org:8080/19916
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Author: stiga-huang, 2023-05-23 16:50:58 +08:00
Committed by: Impala Public Jenkins
Parent: 683bef1ca4
Commit: 1cf8f5065a
6 changed files with 75 additions and 17 deletions

@@ -60,7 +60,9 @@ class EventProcessorUtils(object):
         break
       status = EventProcessorUtils.get_event_processor_status()
       if status not in ["ACTIVE", "PAUSED"]:
-        raise Exception("Event processor is not working. Status: {0}".format(status))
+        error_msg = EventProcessorUtils.get_event_processor_error_msg()
+        raise Exception("Event processor is not working. Status: {0}. Error msg: {1}"
+                        .format(status, error_msg))
       made_progress = current_synced_id > last_synced_id
       if t >= end_time:
         raise Exception(
@@ -115,6 +117,17 @@ class EventProcessorUtils(object):
     pairs = [strip_pair(kv.split(':')) for kv in metrics if kv]
     return dict(pairs)

+  @staticmethod
+  def get_event_processor_error_msg():
+    """Scrapes the catalog's /events webpage and return the error message (if exists) of
+    the event processor"""
+    response = requests.get("%s/events?json" % EventProcessorUtils.DEFAULT_CATALOG_URL)
+    assert response.status_code == requests.codes.ok
+    res_json = json.loads(response.text)
+    if "event_processor_error_msg" in res_json:
+      return res_json["event_processor_error_msg"].strip()
+    return None
+
   @staticmethod
   def get_int_metric(metric_key, default_val=None):
     """Returns the int value of event processor metric from the /events catalogd debug
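One way to unit-test the parsing done by get_event_processor_error_msg() without a running catalogd is to split the JSON handling from the HTTP request. The sketch below mirrors that parsing logic; the name `parse_event_processor_error_msg` is hypothetical, and it takes the raw response text instead of fetching it.

```python
import json


def parse_event_processor_error_msg(events_page_json):
    """Hypothetical helper: extracts the stripped "event_processor_error_msg"
    field from a /events?json response body, or returns None if it is absent."""
    res_json = json.loads(events_page_json)
    if "event_processor_error_msg" in res_json:
        return res_json["event_processor_error_msg"].strip()
    return None
```

With this split, the network-facing function only needs to fetch the page and pass `response.text` to the parser.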