At present, the metastore event processor is single threaded.
Notification events are processed sequentially, with a maximum of 1000
events fetched and processed in a single batch. Multiple locks are used
to address the concurrency issues that may arise when catalog DDL
operation processing and metastore event processing try to
access/update the catalog objects concurrently. Waiting for a lock or
for the file metadata loading of a table can slow event processing and
delay the events that follow it, even though those events may not
depend on the previous event. Altogether it can take a very long time
to synchronize all the HMS events.
With the enable_hierarchical_event_processing flag, the existing
metastore event processing is turned into multi-level event processing.
It is not enabled by default. The idea is to segregate events based on
their dependencies, maintain the order of events as they occur within
each dependency, and process them independently as much as possible.
The following 3 main classes represent the three-level threaded event
processing (a minimal dispatch sketch follows the list):
1. EventExecutorService
   It provides the methods to initialize, start, clear, stop and
   process metastore events in hierarchical mode. It is instantiated
   from MetastoreEventsProcessor and its methods are invoked from
   MetastoreEventsProcessor. Upon receiving an event to process,
   EventExecutorService queues the event to the appropriate
   DbEventExecutor for processing.
2. DbEventExecutor
   An instance of this class has an execution thread and manages the
   events of multiple databases with DbProcessors. A DbProcessor
   instance is maintained to store the context of each database within
   the DbEventExecutor. On each scheduled execution, the input events
   on a DbProcessor are segregated to the appropriate TableProcessors,
   and the database events that are eligible for processing are
   processed.
   Once a DbEventExecutor is assigned to a database, a DbProcessor is
   created, and the subsequent events belonging to that database are
   queued to the same DbEventExecutor thread for further processing.
   Hence, linearizability is ensured in dealing with the events within
   a database. Each DbEventExecutor instance has a fixed list of
   TableEventExecutors.
3. TableEventExecutor
   An instance of this class has an execution thread and processes the
   events of multiple tables with TableProcessors. A TableProcessor
   instance is maintained to store the context of each table within a
   TableEventExecutor. On each scheduled execution, events from the
   TableProcessors are processed.
   Once a TableEventExecutor is assigned to a table, a TableProcessor
   is created, and the subsequent table events are processed by the
   same TableEventExecutor thread. Hence, linearizability is guaranteed
   in processing the events of a particular table.
- All the events of a table are processed in the same order in which
  they occurred.
- Events of different tables are processed in parallel when those
  tables are assigned to different TableEventExecutors.
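
A minimal Java sketch of this dispatch scheme (hypothetical names, not
the actual Impala classes): a database name is pinned to one db-level
executor and, within it, a table name is pinned to one single-threaded
executor, so per-database and per-table order is preserved while
unrelated tables proceed in parallel.

  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  class DbExecutorSketch {
    // One single-threaded queue per table-level executor: FIFO per queue.
    private final List<ExecutorService> tableExecutors = new ArrayList<>();

    DbExecutorSketch(int tableExecutorsPerDb) {
      for (int i = 0; i < tableExecutorsPerDb; i++) {
        tableExecutors.add(Executors.newSingleThreadExecutor());
      }
    }

    void submit(String tableName, Runnable event) {
      // A table always hashes to the same executor, so its events are
      // linearized.
      int idx = Math.floorMod(tableName.hashCode(), tableExecutors.size());
      tableExecutors.get(idx).submit(event);
    }
  }

  class EventDispatchSketch {
    private final List<DbExecutorSketch> dbExecutors = new ArrayList<>();

    EventDispatchSketch(int numDbExecutors, int tableExecutorsPerDb) {
      for (int i = 0; i < numDbExecutors; i++) {
        dbExecutors.add(new DbExecutorSketch(tableExecutorsPerDb));
      }
    }

    void dispatch(String dbName, String tableName, Runnable event) {
      // A database always maps to the same db-level executor.
      int idx = Math.floorMod(dbName.hashCode(), dbExecutors.size());
      dbExecutors.get(idx).submit(tableName, event);
    }
  }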
The following new events are added:
1. DbBarrierEvent
   This event wraps a database event. It is used to synchronize all
   the TableProcessors belonging to a database before processing the
   database event. It acts as a barrier that blocks the processing of
   the table events that occurred after the database event until the
   database event is processed on the DbProcessor (see the latch-based
   sketch after this list).
2. RenameTableBarrierEvent
   This event wraps an alter table event for a rename. It is used to
   synchronize the source and target TableProcessors to process the
   rename table event. It ensures that the source TableProcessor
   removes the table first and only then allows the target
   TableProcessor to create the renamed table.
3. PseudoCommitTxnEvent and PseudoAbortTxnEvent
   CommitTxnEvent and AbortTxnEvent can involve multiple tables in a
   transaction, and processing these events modifies multiple table
   objects. Pseudo events are introduced so that one pseudo event is
   created for each table involved in the transaction, and these
   pseudo events are processed independently on the respective
   TableProcessors.
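
The DbBarrierEvent behavior can be pictured with a small latch-based
Java sketch (illustrative only, assuming a fixed set of participating
TableProcessors; not the actual implementation):

  import java.util.concurrent.CountDownLatch;

  class DbBarrierEventSketch {
    private final Runnable databaseEvent;
    private final CountDownLatch tableProcessorsArrived;
    private final CountDownLatch databaseEventDone = new CountDownLatch(1);

    DbBarrierEventSketch(Runnable databaseEvent, int numTableProcessors) {
      this.databaseEvent = databaseEvent;
      this.tableProcessorsArrived = new CountDownLatch(numTableProcessors);
    }

    // DbProcessor thread: wait for every TableProcessor to reach the
    // barrier, process the database event, then release them.
    void processOnDbProcessor() throws InterruptedException {
      tableProcessorsArrived.await();
      databaseEvent.run();
      databaseEventDone.countDown();
    }

    // Each TableProcessor thread: signal arrival at the barrier, then
    // block until the database event has been processed.
    void reachedByTableProcessor() throws InterruptedException {
      tableProcessorsArrived.countDown();
      databaseEventDone.await();
    }
  }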
The following new flags are introduced:
1. enable_hierarchical_event_processing
To enable the hierarchical event processing on catalogd.
2. num_db_event_executors
To set the number of database level event executors.
3. num_table_event_executors_per_db_event_executor
To set the number of table level event executors within a
database event executor.
4. min_event_processor_idle_ms
   To set the minimum time to retain idle DbProcessors and
   TableProcessors on the database event executors and table event
   executors, respectively, when they have no events to process.
5. max_outstanding_events_on_executors
   To set the maximum number of outstanding events that can be pending
   on the event executors.
Also changed the hms_event_polling_interval_s type from int to double
to support millisecond-precision intervals.
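
For illustration, hierarchical processing could be enabled with a
catalogd command line such as the following (the flag names are from
this change; the values are examples only, not recommended defaults):

  catalogd --hms_event_polling_interval_s=0.5 \
      --enable_hierarchical_event_processing=true \
      --num_db_event_executors=4 \
      --num_table_event_executors_per_db_event_executor=2 \
      --min_event_processor_idle_ms=600000 \
      --max_outstanding_events_on_executors=100000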
TODOs:
1. We need to redefine the lag in the hierarchical processing mode.
2. We need a mechanism to capture the actual event processing time in
   hierarchical processing mode. Currently, with
   enable_hierarchical_event_processing set to true, lastSyncedEventId_
   and lastSyncedEventTimeSecs_ are updated upon event dispatch to
   EventExecutorService for processing on the respective DbEventExecutor
   and/or TableEventExecutor. So lastSyncedEventId_ and
   lastSyncedEventTimeSecs_ do not actually mean the events are
   processed.
3. Hierarchical processing mode currently has a mechanism to show the
   total number of outstanding events on all the db and table executors
   at the moment. Observability needs to be enhanced further for this
   mode.
Filed a jira [IMPALA-13801] to fix them.
Testing:
- Executed existing end-to-end tests.
- Added FE and end-to-end tests with enable_hierarchical_event_processing.
- Added event processing performance tests.
- Executed the existing tests with hierarchical processing mode
  enabled. lastSyncedEventId_ is now also used in the new
  sync_hms_events_wait_time_s feature (IMPALA-12152). Some tests fail
  when hierarchical processing mode is enabled because
  lastSyncedEventId_ does not actually mean an event is processed in
  this mode. This needs to be fixed/verified with the above jira
  [IMPALA-13801].
Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6
Reviewed-on: http://gerrit.cloudera.org:8080/21031
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="impala_metadata">
|
||
|
||
<title>Metadata Management</title>
|
||
|
||
<prolog>
|
||
<metadata>
|
||
<data name="Category" value="Impala"/>
|
||
<data name="Category" value="Configuring"/>
|
||
<data name="Category" value="Administrators"/>
|
||
<data name="Category" value="Developers"/>
|
||
</metadata>
|
||
</prolog>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
This topic describes various knobs you can use to control how Impala manages its metadata
|
||
in order to improve performance and scalability.
|
||
</p>
|
||
|
||
<p outputclass="toc inpage"/>
|
||
|
||
</conbody>

  <concept id="on_demand_metadata">

    <title>On-demand Metadata</title>

    <conbody>

      <p>
        In previous versions of Impala, every coordinator kept a replica of all the cache in
        <codeph>catalogd</codeph>, consuming large memory on each coordinator with no option to
        evict. Metadata always propagated through the <codeph>statestored</codeph> and suffered
        from head-of-line blocking, for example, one user loading a big table blocking another
        user loading a small table.
      </p>

      <p>
        With this new feature, the coordinators pull metadata as needed from
        <codeph>catalogd</codeph> and cache it locally. The cached metadata gets evicted
        automatically under memory pressure.
      </p>

      <p>
        The granularity of on-demand metadata fetches is now at the partition level between the
        coordinator and <codeph>catalogd</codeph>. Common use cases like add/drop partitions do
        not trigger unnecessary serialization/deserialization of large metadata.
      </p>

      <p>
        This feature is disabled by default.
      </p>

      <p>
        The feature can be used in either of the following modes.
        <dl>
          <dlentry>
            <dt>
              Metadata on-demand mode
            </dt>
            <dd>
              In this mode, all coordinators use the metadata on-demand.
            </dd>
            <dd>
              Set the following on <codeph>catalogd</codeph>:
              <codeblock>--catalog_topic_mode=minimal</codeblock>
            </dd>
            <dd>
              Set the following on all <codeph>impalad</codeph> coordinators:
              <codeblock>--use_local_catalog=true</codeblock>
            </dd>
          </dlentry>
          <dlentry>
            <dt>
              Mixed mode
            </dt>
            <dd>
              In this mode, only some coordinators are enabled to use the metadata on-demand.
            </dd>
            <dd>
              We recommend that you use the mixed mode only for testing local catalog’s impact
              on heap usage.
            </dd>
            <dd>
              Set the following on <codeph>catalogd</codeph>:
              <codeblock>--catalog_topic_mode=mixed</codeblock>
            </dd>
            <dd>
              Set the following on <codeph>impalad</codeph> coordinators with metadata
              on-demand:
              <codeblock>--use_local_catalog=true</codeblock>
            </dd>
          </dlentry>
          <dlentry>
            <dt>Flags related to <codeph>use_local_catalog</codeph></dt>
            <dd>When <codeph>use_local_catalog</codeph> is enabled or set to <codeph>True</codeph>
              on the <codeph>impalad</codeph> coordinators, the following flags configure various
              parameters as described below. It is not recommended to change the default values
              of these flags.
            </dd>
            <dd>
              <ul>
                <li>The flag <codeph>local_catalog_cache_mb</codeph> (defaults to -1) configures the
                  size of the catalog cache within each coordinator. With the default of -1, the
                  cache is auto-configured to 60% of the configured Java heap size. Note that the
                  Java heap size is distinct from and typically smaller than the overall Impala
                  memory limit.</li>
                <li>The flag <codeph>local_catalog_cache_expiration_s</codeph> (defaults to 3600)
                  configures the expiration time of the catalog cache within each
                  <codeph>impalad</codeph>. Even if the configured cache capacity has not been
                  reached, items are removed from the cache if they have not been accessed in the
                  defined amount of time.</li>
                <li>The flag <codeph>local_catalog_max_fetch_retries</codeph> (defaults to 40)
                  configures the maximum number of retries allowed for queries to fetch a metadata
                  object from the <codeph>impalad</codeph> coordinator's local catalog
                  cache.</li>
              </ul>
            </dd>
          </dlentry>
        </dl>
      </p>
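
      <p>
        For example, a coordinator could be started with an explicit cache size and
        expiration (the values below are illustrative only, not recommendations):
        <codeblock>--use_local_catalog=true --local_catalog_cache_mb=4096 --local_catalog_cache_expiration_s=1800</codeblock>
      </p>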

    </conbody>

  </concept>

  <concept id="auto_invalidate_metadata">

    <title>Automatic Invalidation of Metadata Cache</title>

    <conbody>

      <p>
        To keep the size of metadata bounded, <codeph>catalogd</codeph> periodically scans all
        the tables and invalidates those not recently used. There are two types of
        configurations for <codeph>catalogd</codeph> and <codeph>impalad</codeph>.
      </p>

      <dl>
        <dlentry>
          <dt>
            Time-based cache invalidation
          </dt>
          <dd>
            <codeph>Catalogd</codeph> invalidates tables that have not been used within the
            specified time period (in seconds).
          </dd>
          <dd>
            The <codeph>‑‑invalidate_tables_timeout_s</codeph> flag needs to be
            applied to both <codeph>impalad</codeph> and <codeph>catalogd</codeph>.
          </dd>
        </dlentry>
        <dlentry>
          <dt>
            Memory-based cache invalidation
          </dt>
          <dd>
            When the memory pressure reaches 60% of JVM heap size after a Java garbage
            collection in <codeph>catalogd</codeph>, Impala invalidates 10% of the least
            recently used tables.
          </dd>
          <dd>
            The <codeph>‑‑invalidate_tables_on_memory_pressure</codeph> flag needs
            to be applied to both <codeph>impalad</codeph> and <codeph>catalogd</codeph>.
          </dd>
        </dlentry>
      </dl>
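
      <p>
        For example, to invalidate tables that have been idle for six hours (an
        illustrative value), set the following on both <codeph>impalad</codeph> and
        <codeph>catalogd</codeph>:
        <codeblock>--invalidate_tables_timeout_s=21600</codeblock>
      </p>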

      <p>
        Automatic invalidation of metadata provides more stability with lower chances of running
        out of memory, but the feature could potentially cause performance issues and may
        require tuning.
      </p>

    </conbody>

  </concept>

  <concept id="auto_poll_hms_notification">

    <title>Automatic Invalidation/Refresh of Metadata</title>

    <conbody>

      <p>
        When tools such as Hive and Spark are used to process the raw data ingested into Hive
        tables, new HMS metadata (database, tables, partitions) and filesystem metadata (new
        files in existing partitions/tables) is generated. In previous versions of Impala, in
        order to pick up this new information, Impala users needed to manually issue an
        <codeph>INVALIDATE</codeph> or <codeph>REFRESH</codeph> command.
      </p>

      <p>
        When automatic invalidate/refresh of metadata is enabled, <codeph>catalogd</codeph>
        polls Hive Metastore (HMS) notification events at a configurable interval and processes
        the following changes:
      </p>

      <note>
        This is a preview feature in <keyword keyref="impala33_full"/> and <keyword keyref="impala40"/>.
        It is generally available and enabled by default from <keyword keyref="impala41"/> onwards.
      </note>

      <ul>
        <li>
          Invalidates the tables when it receives the <codeph>ALTER TABLE</codeph> event.
        </li>

        <li>
          Refreshes the partition when it receives the <codeph>ALTER</codeph>,
          <codeph>ADD</codeph>, or <codeph>DROP</codeph> partition events.
        </li>

        <li>
          Adds the tables or databases when it receives the <codeph>CREATE TABLE</codeph> or
          <codeph>CREATE DATABASE</codeph> events.
        </li>

        <li>
          Removes the tables from <codeph>catalogd</codeph> when it receives the <codeph>DROP
          TABLE</codeph> or <codeph>DROP DATABASE</codeph> events.
        </li>

        <li>
          Refreshes the table and partitions when it receives the <codeph>INSERT</codeph>
          events.
          <p>
            If the table is not loaded at the time of processing the <codeph>INSERT</codeph>
            event, the event processor does not need to refresh the table and skips it.
          </p>
        </li>

        <li>
          Changes the database and updates <codeph>catalogd</codeph> when it receives the
          <codeph>ALTER DATABASE</codeph> events. The following changes are supported. This
          event does not invalidate the tables in the database.
          <ul>
            <li>
              Change the database properties
            </li>

            <li>
              Change the comment on the database
            </li>

            <li>
              Change the owner of the database
            </li>

            <li>
              Change the default location of the database
              <p>
                Changing the default location of the database does not move the tables of that
                database to the new location. Only the new tables which are created subsequently
                use the default location of the database if it is not provided in the
                create table statement.
              </p>
            </li>
          </ul>
        </li>
      </ul>

      <p>
        This feature is controlled by the
        <codeph>‑‑hms_event_polling_interval_s</codeph> flag. Start the
        <codeph>catalogd</codeph> with the <codeph>‑‑hms_event_polling_interval_s</codeph>
        flag set to a positive double value to enable the feature and set the polling frequency
        in seconds. We recommend a value between 1.0 and 5.0 seconds.
      </p>
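
      <p>
        For example, to poll the HMS notification events every two seconds (an
        illustrative value within the recommended range), start the
        <codeph>catalogd</codeph> with:
        <codeblock>--hms_event_polling_interval_s=2.0</codeblock>
      </p>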

      <p>
        The following use cases are not supported:
      </p>

      <ul>
        <li>
          When you bypass HMS and add or remove data in a table by adding files directly on the
          filesystem, HMS does not generate the <codeph>INSERT</codeph> event, and the event
          processor will not invalidate the corresponding table or refresh the corresponding
          partition.
          <p>
            It is recommended that you use the <codeph>LOAD DATA</codeph> command to do the data
            load in such cases, so that the event processor can act on the events generated by
            the <codeph>LOAD</codeph> command.
          </p>
        </li>

        <li>
          The Spark API that saves data to a specified location does not generate events in HMS,
          thus is not supported. For example:
          <codeblock>Seq((1, 2)).toDF("i", "j").write.save("/user/hive/warehouse/spark_etl.db/customers/date=01012019")</codeblock>
        </li>
      </ul>

      <p>
        This feature is turned off by default with the
        <codeph>‑‑hms_event_polling_interval_s</codeph> flag set to
        <codeph>0</codeph>.
      </p>

    </conbody>
<concept id="configure_event_based_metadata_sync">
|
||
|
||
<title>Configure HMS for Event Based Automatic Metadata Sync</title>
|
||
|
||
<conbody>
|
||
|
||
<p>
|
||
To use the HMS event based metadata sync:
|
||
</p>
|
||
|
||
<ol>
|
||
<li>
|
||
Add the following entries to the <codeph>hive-site.xml</codeph> of the Hive
|
||
Metastore service.
|
||
<codeblock> <property>
|
||
<name>hive.metastore.transactional.event.listeners</name>
|
||
<value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
|
||
</property>
|
||
<property>
|
||
<name>hive.metastore.dml.events</name>
|
||
<value>true</true>
|
||
</property></codeblock>
|
||
</li>
|
||
|
||
<li>
|
||
Save <codeph>hive-site.xml</codeph>.
|
||
</li>
|
||
|
||
<li>
|
||
Set the <codeph>hive.metastore.dml.events</codeph> configuration key to
|
||
<codeph>true</codeph> in HiveServer2 service's <codeph>hive-site.xml</codeph>. This
|
||
configuration key needs to be set to <codeph>true</codeph> in both Hive services,
|
||
HiveServer2 and Hive Metastore.
|
||
</li>
|
||
|
||
<li>
|
||
If applicable, set the <codeph>hive.metastore.dml.events</codeph> configuration key
|
||
to <codeph>true</codeph> in <codeph>hive-site.xml</codeph> used by the Spark
|
||
applications (typically, <codeph>/etc/hive/conf/hive-site.xml</codeph>) so that the
|
||
<codeph>INSERT</codeph> events are generated when the Spark application inserts data
|
||
into existing tables and partitions.
|
||
</li>
|
||
|
||
<li>
|
||
Restart the HiveServer2, Hive Metastore, and Spark (if applicable) services.
|
||
</li>
|
||
</ol>
|
||
|
||
</conbody>
|
||
|
||
</concept>

    <concept id="disable_event_based_metadata_sync">

      <title>Disable Event Based Automatic Metadata Sync</title>

      <conbody>

        <p>
          When the <codeph>‑‑hms_event_polling_interval_s</codeph> flag is set to a non-zero
          value for your <codeph>catalogd</codeph>, the event-based automatic invalidation is
          enabled for all databases and tables. If you wish to have fine-grained control over
          which tables or databases need to be synced using events, you can use the
          <codeph>impala.disableHmsSync</codeph> property to disable the event processing at the
          table or database level.
        </p>

        <p>
          When you add the <codeph>DBPROPERTIES</codeph> or <codeph>TBLPROPERTIES</codeph> with
          the <codeph>impala.disableHmsSync</codeph> key, the HMS event based sync is turned on
          or off. The value of the <codeph>impala.disableHmsSync</codeph> property determines if
          the event processing needs to be disabled for a particular table or database.
        </p>

        <ul>
          <li>
            If <codeph>'impala.disableHmsSync'='true'</codeph>, the events for that table or
            database are ignored and not synced with HMS.
          </li>

          <li>
            If <codeph>'impala.disableHmsSync'='false'</codeph> or if
            <codeph>impala.disableHmsSync</codeph> is not set, the automatic sync with HMS is
            enabled if the <codeph>‑‑hms_event_polling_interval_s</codeph> global flag is
            set to non-zero.
          </li>
        </ul>

        <ul>
          <li>
            To disable the event based HMS sync for a new database, set the
            <codeph>impala.disableHmsSync</codeph> database property in Hive, as Impala
            currently does not support setting database properties:
            <codeblock>CREATE DATABASE <name> WITH DBPROPERTIES ('impala.disableHmsSync'='true');</codeblock>
          </li>

          <li>
            To enable or disable the event based HMS sync for a table at creation time:
            <codeblock>CREATE TABLE <name> ... TBLPROPERTIES ('impala.disableHmsSync'='true' | 'false');</codeblock>
          </li>

          <li>
            To change the event based HMS sync at the table level:
            <codeblock>ALTER TABLE <name> SET TBLPROPERTIES ('impala.disableHmsSync'='true' | 'false');</codeblock>
          </li>
        </ul>

        <p>
          When both table and database level properties are set, the table level property takes
          precedence. If the table level property is not set, then the database level property
          is used to evaluate if the event needs to be processed or not.
        </p>

        <p>
          If the property is changed from <codeph>true</codeph> (meaning events are skipped) to
          <codeph>false</codeph> (meaning events are not skipped), you need to issue a manual
          <codeph>INVALIDATE METADATA</codeph> command to reset the event processor because it
          doesn't know how many events have been skipped in the past and cannot know if the
          object in the event is the latest. In such a case, the status of the event processor
          changes to <codeph>NEEDS_INVALIDATE</codeph>.
        </p>

      </conbody>

    </concept>

    <concept id="event_processor_metrics">

      <title>Metrics for Event Based Automatic Metadata Sync</title>

      <conbody>

        <p>
          You can use the web UI of the <codeph>catalogd</codeph> to check the state of the
          automatic invalidate event processor.
        </p>

        <p>
          Under the web UI, there are two pages that present the metrics for the HMS event
          processor that is responsible for the event based automatic metadata sync.
          <ul>
            <li>
              <b>/metrics#events</b>
            </li>

            <li>
              <b>/events</b>
              <p>
                This provides a detailed view of the metrics of the event processor, including
                min, max, mean, and median of the durations and rate metrics for all the
                counters listed on the <b>/metrics#events</b> page.
              </p>
            </li>
          </ul>
        </p>

      </conbody>

      <concept id="concept_gch_xzm_1hb">

        <title>/metrics#events Page</title>

        <conbody>

          <p>
            The <b>/metrics#events</b> page provides the following metrics about the HMS event
            processor.
          </p>

          <table id="events-tbl">
            <tgroup cols="2">
              <colspec colnum="1" colname="col1" colwidth="1*"/>
              <colspec colnum="2" colname="col3" colwidth="2.58*"/>
              <thead>
                <row>
                  <entry>
                    Name
                  </entry>
                  <entry>
                    Description
                  </entry>
                </row>
              </thead>
              <tbody>
                <row>
                  <entry>
                    events-processor.avg-events-fetch-duration
                  </entry>
                  <entry>
                    Average duration to fetch a batch of events and process it.
                  </entry>
                </row>
                <row>
                  <entry>
                    events-processor.avg-events-process-duration
                  </entry>
                  <entry>
                    Average time taken to process a batch of events received from the Metastore.
                  </entry>
                </row>
                <row>
                  <entry>
                    events-processor.events-received
                  </entry>
                  <entry>
                    Total number of the Metastore events received.
                  </entry>
                </row>
                <row>
                  <entry>
                    events-processor.events-received-15min-rate
                  </entry>
                  <entry>
                    Exponentially weighted moving average (EWMA) of the number of events
                    received in the last 15 min.
                    <p>
                      This rate of events can be used to determine if there are spikes in event
                      processor activity during certain hours of the day.
                    </p>
                  </entry>
                </row>
                <row>
                  <entry>
                    events-processor.events-received-1min-rate
                  </entry>
                  <entry>
                    Exponentially weighted moving average (EWMA) of the number of events
                    received in the last 1 min.
                    <p>
                      This rate of events can be used to determine if there are spikes in event
                      processor activity during certain hours of the day.
                    </p>
                  </entry>
                </row>
                <row>
                  <entry>
                    events-processor.events-received-5min-rate
                  </entry>
                  <entry>
                    Exponentially weighted moving average (EWMA) of the number of events
                    received in the last 5 min.
                    <p>
                      This rate of events can be used to determine if there are spikes in event
                      processor activity during certain hours of the day.
                    </p>
                  </entry>
                </row>
                <row>
                  <entry>
                    events-processor.events-skipped
                  </entry>
                  <entry>
                    Total number of the Metastore events skipped.
                    <p>
                      Events can be skipped based on certain flags at the table and database
                      level. You can use this metric to make decisions, such as:
                      <ul>
                        <li>
                          If most of the events are being skipped, see if you might just turn
                          off the event processing.
                        </li>

                        <li>
                          If most of the events are not skipped, see if you need to add flags
                          on certain databases.
                        </li>
                      </ul>
                    </p>
                  </entry>
                </row>
                <row>
                  <entry>
                    events-processor.outstanding-event-count
                  </entry>
                  <entry>
                    Total number of outstanding events to be processed on the db event
                    executors and table event executors when the
                    <codeph>--enable_hierarchical_event_processing</codeph> flag is
                    <codeph>true</codeph>.
                  </entry>
                </row>
                <row>
                  <entry>
                    events-processor.status
                  </entry>
                  <entry>
                    Metastore event processor status to see if there are events being received
                    or not. Possible states are:
                    <ul>
                      <li>
                        <codeph>PAUSED</codeph>
                        <p>
                          The event processor is paused because the catalog is being reset
                          concurrently.
                        </p>
                      </li>

                      <li>
                        <codeph>ACTIVE</codeph>
                        <p>
                          The event processor is scheduled at a given frequency.
                        </p>
                      </li>

                      <li>
                        <codeph>ERROR</codeph>
                        <p>
                          The event processor is in error state and event processing has
                          stopped. Needs a manual <codeph>INVALIDATE</codeph> command to reset
                          the state.
                        </p>
                      </li>

                      <li>
                        <codeph>NEEDS_INVALIDATE</codeph>
                        <p>
                          The event processor could not resolve certain events and needs a
                          manual <codeph>INVALIDATE</codeph> command to reset the state.
                        </p>
                      </li>

                      <li>
                        <codeph>STOPPED</codeph>
                        <p>
                          The event processing has been shut down. No events will be processed.
                        </p>
                      </li>

                      <li>
                        <codeph>DISABLED</codeph>
                        <p>
                          The event processor is not configured to run.
                        </p>
                      </li>
                    </ul>
                  </entry>
                </row>
              </tbody>
            </tgroup>
          </table>

        </conbody>

      </concept>

    </concept>

  </concept>

</concept>