Currently, the metastore event processor is single-threaded. Notification
events are processed sequentially, with at most 1000 events fetched and
processed in a single batch. Multiple locks are used to address the
concurrency issues that may arise when catalog DDL operation processing
and metastore event processing try to access or update catalog objects
concurrently. Waiting for a lock, or for the file metadata of a table to
load, can slow down the processing of one event and delay the events
that follow it, even when those events do not depend on the delayed
event. As a result, synchronizing all the HMS events can take a very
long time.
This change turns the existing metastore event processing into
multi-level event processing, controlled by the
enable_hierarchical_event_processing flag. It is not enabled by default.
The idea is to segregate events based on their dependencies, preserve
the order in which events occurred within each dependency, and process
independent events in parallel as much as possible.
The following three main classes represent the three levels of threaded
event processing.
1. EventExecutorService
It provides the methods to initialize, start, clear, stop, and process
metastore events in hierarchical mode. It is instantiated by
MetastoreEventsProcessor, which also invokes its methods. Upon
receiving an event to process, EventExecutorService queues the event
to the appropriate DbEventExecutor for processing.
2. DbEventExecutor
An instance of this class has an execution thread and manages the
events of multiple databases with DbProcessors. A DbProcessor instance
is maintained to store the context of each database within the
DbEventExecutor. On each scheduled execution, the input events on a
DbProcessor are segregated to the appropriate TableProcessors for
processing, and the database events that are eligible for processing
are processed.
Once a DbEventExecutor is assigned to a database, a DbProcessor is
created, and subsequent events belonging to that database are queued
to the same DbEventExecutor thread for further processing. Hence,
linearizability is ensured when dealing with events within the
database. Each instance of DbEventExecutor has a fixed list of
TableEventExecutors.
3. TableEventExecutor
An instance of this class has an execution thread and processes the
events of multiple tables with TableProcessors. A TableProcessor
instance is maintained to store the context of each table within a
TableEventExecutor. On each scheduled execution, events from the
TableProcessors are processed.
Once a TableEventExecutor is assigned to a table, a TableProcessor is
created, and subsequent events of that table are processed by the same
TableEventExecutor thread. Hence, linearizability is guaranteed when
processing the events of a particular table.
- All the events of a table are processed in the same order in which
they occurred.
- Events of different tables are processed in parallel when those
tables are assigned to different TableEventExecutors.
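The sticky assignment described above can be sketched as follows. This is a minimal illustration, not Impala's code: the class StickyAssigner, its methods, and the least-loaded selection policy are all assumptions made for the example. The only property taken from the description is that a key (a database or a table) stays on the executor it was first assigned to, which is what keeps its events in order.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Impala: pins each key (a database or a
// table name) to one executor slot the first time the key is seen.
class StickyAssigner {
  private final int[] load;  // number of keys currently pinned to each slot
  private final Map<String, Integer> assignment = new HashMap<>();

  StickyAssigner(int numExecutors) {
    if (numExecutors <= 0) {
      throw new IllegalArgumentException("need at least one executor");
    }
    load = new int[numExecutors];
  }

  // Returns the slot for the key. A new key goes to the least-loaded slot
  // (an assumed policy); a known key always returns to its original slot,
  // so its events are handled by one thread in arrival order.
  synchronized int executorFor(String key) {
    Integer slot = assignment.get(key);
    if (slot != null) return slot;
    int best = 0;
    for (int i = 1; i < load.length; i++) {
      if (load[i] < load[best]) best = i;
    }
    load[best]++;
    assignment.put(key, best);
    return best;
  }

  // Releases the key, for example once its processor has been idle long
  // enough to be removed.
  synchronized void release(String key) {
    Integer slot = assignment.remove(key);
    if (slot != null) load[slot]--;
  }
}
```

With two slots, the first two tables land on different slots and can be processed in parallel, while repeated lookups for the same table keep returning the same slot.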
The following new events are added:
1. DbBarrierEvent
This event wraps a database event. It is used to synchronize all the
TableProcessors belonging to the database before the database event
is processed. It acts as a barrier that blocks the processing of table
events that occurred after the database event until the database event
has been processed on the DbProcessor.
2. RenameTableBarrierEvent
This event wraps an alter table event for rename. It is used to
synchronize the source and target TableProcessors to
process the rename table event. It ensures the source
TableProcessor removes the table first and then allows the target
TableProcessor to create the renamed table.
3. PseudoCommitTxnEvent and PseudoAbortTxnEvent
CommitTxnEvent and AbortTxnEvent can involve multiple tables in
a transaction and processing these events modifies multiple table
objects. Pseudo events are introduced such that a pseudo event is
created for each table involved in the transaction and these
pseudo events are processed independently at respective
TableProcessors.
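The barrier idea behind DbBarrierEvent can be pictured with a shared countdown. The sketch below is only an illustration under assumptions: the class DbBarrier and its methods are hypothetical, and the source does not say how the real event tracks the TableProcessors it waits for.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Impala's DbBarrierEvent: a countdown shared by
// all table processors of one database.
class DbBarrier {
  private final AtomicInteger pendingTableProcessors;
  private volatile boolean dbEventProcessed = false;

  DbBarrier(int tableProcessorCount) {
    pendingTableProcessors = new AtomicInteger(tableProcessorCount);
  }

  // Called by a table processor when the barrier reaches the head of its
  // queue, meaning all of its earlier table events are done.
  void arrive() {
    pendingTableProcessors.decrementAndGet();
  }

  // The database event may run only after every table processor arrived.
  boolean readyForDbEvent() {
    return pendingTableProcessors.get() <= 0;
  }

  // Called by the database-level processor after it has applied the
  // wrapped database event.
  void markDbEventProcessed() {
    if (!readyForDbEvent()) {
      throw new IllegalStateException("table processors still pending");
    }
    dbEventProcessed = true;
  }

  // Table events queued behind the barrier stay blocked until this is true.
  boolean tableEventsMayResume() {
    return dbEventProcessed;
  }
}
```

A similar shared counter could serve the pseudo commit and abort events, where the last per-table pseudo event to finish would mark the whole transaction event as done; that is likewise an assumption rather than something the source states.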
The following new flags are introduced:
1. enable_hierarchical_event_processing
Enables hierarchical event processing on catalogd.
2. num_db_event_executors
Sets the number of database-level event executors.
3. num_table_event_executors_per_db_event_executor
Sets the number of table-level event executors within a database
event executor.
4. min_event_processor_idle_ms
Sets the minimum time to retain idle db processors and table
processors on the database event executors and table event executors
respectively, when they do not have events to process.
5. max_outstanding_events_on_executors
Sets the limit on the maximum number of outstanding events to process
on the event executors.
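A limit such as max_outstanding_events_on_executors is commonly enforced with a shared counter that the dispatcher consults before handing over more work. The sketch below shows that pattern only; OutstandingEventLimiter is a hypothetical name, and what the event processor actually does when the limit is reached is not stated in this change description.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: caps the number of events that were dispatched to
// executors but have not finished processing yet.
class OutstandingEventLimiter {
  // Plays the role of max_outstanding_events_on_executors in this example.
  private final int maxOutstanding;
  private final AtomicInteger outstanding = new AtomicInteger();

  OutstandingEventLimiter(int maxOutstanding) {
    this.maxOutstanding = maxOutstanding;
  }

  // Tries to reserve room for one event. Returns false when the limit is
  // reached, in which case the caller should hold the event back.
  boolean tryDispatch() {
    while (true) {
      int current = outstanding.get();
      if (current >= maxOutstanding) return false;
      if (outstanding.compareAndSet(current, current + 1)) return true;
    }
  }

  // Called by an executor after it finishes processing an event.
  void onProcessed() {
    outstanding.decrementAndGet();
  }

  int outstanding() {
    return outstanding.get();
  }
}
```

The compare-and-set loop keeps the check and the increment atomic, so concurrent dispatchers cannot push the count past the limit.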
Changed the type of hms_event_polling_interval_s from int to double to
support polling intervals with millisecond precision.
TODOs:
1. The lag needs to be redefined for the hierarchical processing mode.
2. A mechanism is needed to capture the actual event processing time
in hierarchical processing mode. Currently, with
enable_hierarchical_event_processing set to true, lastSyncedEventId_
and lastSyncedEventTimeSecs_ are updated when an event is dispatched
to EventExecutorService for processing on the respective
DbEventExecutor and/or TableEventExecutor. So lastSyncedEventId_ and
lastSyncedEventTimeSecs_ do not actually mean that the events have
been processed.
3. Hierarchical processing mode currently has a mechanism to show the
total number of outstanding events on all the db and table executors
at the moment. Observability needs to be enhanced further for this
mode.
Filed a Jira (IMPALA-13801) to fix them.
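The gap described in TODO 2 is the difference between the last dispatched event id and the last id up to which everything has really been processed. The sketch below only illustrates that difference; SyncedEventTracker is a hypothetical class, and it is not the fix planned under IMPALA-13801.

```java
import java.util.TreeSet;

// Hypothetical tracker that keeps "dispatched" and "processed" apart.
class SyncedEventTracker {
  private final TreeSet<Long> inFlight = new TreeSet<>();
  private long lastDispatched = -1;

  // Called when an event is handed to an executor.
  synchronized void onDispatch(long eventId) {
    inFlight.add(eventId);
    lastDispatched = Math.max(lastDispatched, eventId);
  }

  // Called when an executor has finished the event.
  synchronized void onProcessed(long eventId) {
    inFlight.remove(eventId);
  }

  synchronized long lastDispatchedEventId() {
    return lastDispatched;
  }

  // Greatest id such that every dispatched event with an id up to and
  // including it has been processed. Returns -1 before any dispatch.
  synchronized long lastFullyProcessedEventId() {
    return inFlight.isEmpty() ? lastDispatched : inFlight.first() - 1;
  }
}
```

For example, if events 10, 11 and 12 are dispatched and only 12 has finished, the last dispatched id is 12 while the fully processed id is still 9, which is why a waiter that relies on the dispatched id can observe stale catalog state.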
Testing:
- Executed existing end to end tests.
- Added fe and end-to-end tests with enable_hierarchical_event_processing.
- Added event processing performance tests.
- Executed the existing tests with hierarchical processing mode
enabled. lastSyncedEventId_ is now also used by the new
sync_hms_events_wait_time_s feature (IMPALA-12152). Some tests fail when
hierarchical processing mode is enabled because lastSyncedEventId_ does
not actually mean that the event has been processed in this mode. This
needs to be fixed/verified with the above Jira (IMPALA-13801).
Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6
Reviewed-on: http://gerrit.cloudera.org:8080/21031
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Generating HTML or a PDF of Apache Impala Documentation
Prerequisites
Make sure that you have a recent version of a Java JDK installed and that your JAVA_HOME environment variable is set. This procedure has been tested with JDK 1.8.0. See Setting JAVA_HOME at the end of these instructions.
Download Docs Source
There are two ways to obtain the docs sources.
- Clone the whole repository. Open a terminal window and run the following commands to get the whole Impala repository from Git and go to the docs folder:
      git clone https://gitbox.apache.org/repos/asf/impala.git
      cd <local_directory>
      git checkout master
      cd docs/
  Where master is the branch where Impala documentation source files are uploaded.
- Clone only the docs directory. Open a terminal window and run the following commands to get only the Impala documentation source files from Git:
      git init impala_docs
      cd impala_docs
      git remote add origin https://gitbox.apache.org/repos/asf/impala.git
      git sparse-checkout set docs/
      git pull origin master
      cd docs/
  You'll see only the 'docs/' sub-directory is downloaded.
Download DITA Open Toolkit
- Download the DITA Open Toolkit version 2.3.3 from the DITA Open Toolkit web site:
  https://github.com/dita-ot/dita-ot/releases/download/2.3.3/dita-ot-2.3.3.zip
  Note: A DITA-OT 2.3.3 User Guide is included in the toolkit. Look for userguide.pdf in the doc directory of the toolkit after you extract it. For example, if you extract the toolkit package to the /Users/<username>/DITA-OT directory on Mac OS, you will find userguide.pdf at the following location: /Users/<username>/DITA-OT/doc/userguide.pdf
Add dita Executable to Your PATH
- Identify the directory into which you extracted DITA-OT. For this exercise, we'll assume it's /Users/<username>/DITA-OT
- Find your .bash_profile. On Mac OS X, it is probably /Users/<username>/.bash_profile.
- Edit your <path_to_bash_profile>/.bash_profile file and add the following lines to the end of the file.
      # Add dita to path
      export PATH="/Users/<username>/DITA-OT/bin:$PATH"
  Save the file.
- Open a new terminal, or run source <path_to_bash_profile>/.bash_profile.
- Verify dita is in your PATH. A command like which dita should print the location of the dita executable, like:
      $ which dita
      /Users/<username>/DITA-OT/bin/dita
Verify dita Executable Can Run
In a terminal, try dita --help. You should get brief usage, like:
Usage: dita -i <file> -f <name> [options]
   or: dita -install [<file>]
   or: dita -uninstall <id>
   or: dita -help
   or: dita -version
Arguments:
  -i, -input <file>      input file
  -f, -format <name>     output format (transformation type)
  -install [<file>]      install plug-in from a ZIP file or reload plugins
  -uninstall <id>        uninstall plug-in with the ID
  -h, -help              print this message
  -version               print version information and exit
Options:
  -o, -output <dir>      output directory
  -filter <file>         filter and flagging file
  -t, -temp <dir>        temporary directory
  -v, -verbose           verbose logging
  -d, -debug             print debugging information
  -l, -logfile <file>    use given file for log
  -D<property>=<value>   use value for given property
  -propertyfile <name>   load all properties from file with -D
                         properties taking precedence
If you don't get this, or you get an error, see Setting JAVA_HOME and Troubleshooting at the end of these instructions.
Oneshot Docs Build
The easiest way to build the docs is to run make from the docs/
directory corresponding to your git clone. It takes about 1 minute.
This works because make uses the provided Makefile to call dita with
the appropriate arguments.
The docs will end up in docs/build (both HTML and PDF).
Details, Advanced Usage
- In the directory where you cloned the Impala documentation files, you will find the following important configuration files in the docs subdirectory. These files are used to convert the XML source you downloaded from the Apache site to PDF and HTML:
  - impala.ditamap: Tells the DITA Open Toolkit what topics to include in the Impala User/Administration Guide. This guide also includes the Impala SQL Reference.
  - impala_html.ditaval: Further defines what topics to include in the Impala HTML output.
  - impala_pdf.ditaval: Further defines what topics to include in the Impala PDF output.
- Run one of the following commands, depending on what you want to generate:
  - To generate HTML output of the Impala User and Administration Guide, which includes the Impala SQL Reference, run the following command:
        dita -input <path_to_impala.ditamap> -format html5 \
          -output <path_to_build_output_directory> \
          -filter <path_to_impala_html.ditaval>
  - To generate PDF output of the Impala User and Administration Guide, which includes the Impala SQL Reference, run the following command:
        dita -input <path_to_impala.ditamap> -format pdf \
          -output <path_to_build_output_directory> \
          -filter <path_to_impala_pdf.ditaval>
  Note: For a description of all command-line options, see the DITA Open Toolkit User Guide in the doc directory of your downloaded DITA Open Toolkit.
Setting JAVA_HOME
Set your JAVA_HOME environment variable to tell your computer where to find the Java executable file. For example, to set your JAVA_HOME environment on Mac OS X when you have the 1.8.0_101 version of the Java Development Kit (JDK) installed and you are using the Bash version 3.2 shell, perform the following steps:
- Find your .bash_profile. On Mac OS X, it is probably /Users/<username>/.bash_profile. Edit your <path_to_bash_profile>/.bash_profile file and add the following lines to the end of the file.
      # Set JAVA_HOME
      JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home
      export JAVA_HOME
  Where jdk1.8.0_101.jdk is the version of JDK that you have installed. For example, if you have installed jdk1.8.0_102.jdk, you would use that value instead.
- Open a new terminal, or run source <path_to_bash_profile>/.bash_profile.
- Test to make sure you have set your JAVA_HOME correctly:
  - Open a terminal window and type:
        $JAVA_HOME/bin/java -version
  - Press return. If you see something like the following:
        java version "1.8.0_101"
        Java(TM) 2 Runtime Environment, Standard Edition (build 1.8.0_101-b06-284)
        Java HotSpot (TM) Client VM (build 1.8.0_101-133, mixed mode, sharing)
    Then you've successfully set your JAVA_HOME environment variable to the binary stored in /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home.
    Note: The exact version and build number on your system may differ. The point is you want a message like the above.
Troubleshooting
Ant
If you're trying to use DITA-OT to build the docs and you get an exception like this:
java.lang.NoSuchMethodError: org.apache.tools.ant.Main: method <init>()V not found
at org.dita.dost.invoker.Main.<init>(Main.java:418)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:379)
at org.apache.tools.ant.launch.Launcher.run(Launcher.java:279)
at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
... your CLASSPATH may be interfering with DITA-OT's ability to find
the proper Ant. While you're free to fix the CLASSPATH yourself, it
may be easier just to run
unset CLASSPATH
and try again. This will use the libraries and Ant provided by the DITA-OT package.