PartitionDeltaUpdater has two subclasses, PartNameBasedDeltaUpdater and
PartBasedDeltaUpdater. They are used when reloading the metadata of a table.
Their constructors invoke HMS RPCs which can be slow and should be
tracked in the catalog timeline.
This patch adds the missing timeline items for those HMS RPCs.
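As a rough illustration only (not the actual patch code), the idea is to record a
catalog timeline item right after each potentially slow HMS call; the Timeline class
and the method names below are assumptions made for this sketch:
  import java.util.ArrayList;
  import java.util.List;

  // Stand-in for Impala's catalog timeline abstraction.
  class Timeline {
    private final List<String> events = new ArrayList<>();
    private final long startNs = System.nanoTime();
    void markEvent(String label) {
      events.add(String.format("%s: %.3fms", label, (System.nanoTime() - startNs) / 1e6));
    }
  }

  class PartitionDeltaUpdaterSketch {
    // Hypothetical constructor-time work: fetch partition metadata from HMS, then
    // mark the timeline so slow RPCs become visible in the catalog timeline.
    PartitionDeltaUpdaterSketch(Timeline timeline) {
      fetchPartitionsFromHms();
      timeline.markEvent("Fetched partition metadata from HMS");
    }
    private void fetchPartitionsFromHms() { /* slow HMS RPC in the real code */ }
  }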
Tests:
- Added e2e tests
Change-Id: Id231c2b15869aac2dae3258817954abf119da802
Reviewed-on: http://gerrit.cloudera.org:8080/22917
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_ranger.py is a custom cluster test consisting of 41 test methods.
Each test method requires a minicluster restart. With IMPALA-13503, we can
reorganize the TestRanger class into 3 separate test classes:
TestRangerIndependent, TestRangerLegacyCatalog, and
TestRangerLocalCatalog. Both TestRangerLegacyCatalog and
TestRangerLocalCatalog can keep the same minicluster without
restarting it in between.
Testing:
- Run and pass test_ranger.py in exhaustive mode.
- Confirmed that no test is missing after reorganization.
Change-Id: I01ff2b3e98fccfffa8bcdfe1177be98634363b56
Reviewed-on: http://gerrit.cloudera.org:8080/22905
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_rename_drop fails with an NPE after IMPALA-14042. This is because
CatalogServiceCatalog.renameTable() returns null when it cannot find the
database of oldTableName. This patch changes renameTable() to return
Pair.create(null, null) for that scenario.
Refactor test_rename_drop slightly to ensure that invalidating the
renamed table and dropping it are successful.
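A minimal sketch of the null-handling change, with a local Pair class standing in for
the one used by the catalog code (method and variable names are illustrative):
  // Returning a Pair whose elements are null, instead of a null Pair, lets callers
  // inspect result.first/result.second without hitting an NPE.
  class Pair<A, B> {
    final A first;
    final B second;
    Pair(A first, B second) { this.first = first; this.second = second; }
    static <A, B> Pair<A, B> create(A a, B b) { return new Pair<>(a, b); }
  }

  class RenameTableSketch {
    Pair<Object, Object> renameTable(Object oldDb, Object oldTable) {
      if (oldDb == null) {
        // Before: "return null;" here made callers that dereference the pair fail
        // with a NullPointerException. After: signal "not found" with null elements.
        return Pair.create(null, null);
      }
      return Pair.create(new Object(), new Object());
    }
  }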
Testing:
- Add checkNotNull precondition in
CatalogOpExecutor.alterTableOrViewRename()
- Increase the catalogd_table_rename_delay delay to 6s to ensure that the DROP
  query arrives at catalogd before renameTable() is called. Manually
  observed that no NPE is shown anymore.
Change-Id: I7a421a71cf3703290645e85180de8e9d5e86368a
Reviewed-on: http://gerrit.cloudera.org:8080/22899
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
cancel_query_and_validate_state is a helper method used to test query
cancellation with concurrent fetches. It still uses the beeswax client by
default.
This patch changes the test method to use the HS2 protocol by default. The
changes include the following:
1. Set TGetOperationStatusResp.operationState to
TOperationState::ERROR_STATE if returning abnormally.
2. Use separate MinimalHS2Client for
(execute_async, fetch, get_runtime_profile) vs cancel vs close.
Cancellation through KILL QUERY still instantiates a new
ImpylaHS2Connection client.
3. Implement the required missing methods in MinimalHS2Client.
4. Change the MinimalHS2Client logging pattern to match the other clients.
Testing:
Pass test_cancellation.py and TestResultSpoolingCancellation in core
exploration mode. Also fix default_test_protocol to HS2 for these tests.
Change-Id: I626a1a06eb3d5dc9737c7d4289720e1f52d2a984
Reviewed-on: http://gerrit.cloudera.org:8080/22853
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
The maxRowsInHeaps calculation may overflow because it uses simple
multiplication. This patch fixes the bug by calculating it with
checkedMultiply(). A broader refactoring will be done by IMPALA-14071.
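For illustration only (the variable names below are made up, and the real patch may
handle overflow differently), Guava's LongMath.checkedMultiply throws an
ArithmeticException instead of silently wrapping around:
  import com.google.common.math.LongMath;

  class MaxRowsInHeapsSketch {
    static long maxRowsInHeaps(long rowsPerHeap, long numHeaps) {
      // Before: rowsPerHeap * numHeaps could silently wrap to a negative value for
      // large inputs, breaking downstream comparisons against row counts.
      try {
        return LongMath.checkedMultiply(rowsPerHeap, numHeaps);
      } catch (ArithmeticException e) {
        // Saturate on overflow in this sketch; a safe upper bound either way.
        return Long.MAX_VALUE;
      }
    }
  }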
Testing:
Add an ee test, TestTopNHighNdv, that exercises the issue.
Change-Id: Ic6712b94f4704fd8016829b2538b1be22baaf2f7
Reviewed-on: http://gerrit.cloudera.org:8080/22896
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TestConcurrentRename.test_rename_drop has been flaky because the
INVALIDATE query may arrive ahead of the ALTER TABLE RENAME query. This
patch deflakes it by replacing the sleep with an admission control wait and a
catalog version check. The first INVALIDATE query will only start after the
catalog version has increased since the CREATE TABLE query.
Testing:
Looped the test 50x and all runs passed.
Change-Id: I2539d5755aae6d375400b9a1289a658d0e7ba888
Reviewed-on: http://gerrit.cloudera.org:8080/22876
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
The 'transient_lastDdlTime' table property was not updated for Iceberg
tables before this change. Now it is updated after DDL operations,
including DROP PARTITION.
Renaming an Iceberg table is an exception:
Iceberg does not keep track of the table name in the metadata files,
so there is no Iceberg transaction to change it.
The table name is a concept that exists only in the catalog.
If we rename the table, we only edit our catalog entry, but the metadata
stored on the file system - the table's state - does not change.
Therefore renaming an Iceberg table does not change the
'transient_lastDdlTime' table property because rename is a
catalog-level operation for Iceberg tables, and not table-level.
Testing:
- added managed and non-managed Iceberg table DDL tests to
test_last_ddl_update.py
Change-Id: I7e5f63b50bd37c80faf482c4baf4221be857c54b
Reviewed-on: http://gerrit.cloudera.org:8080/22831
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for JS code analysis and linting to webUI
scripts using ESLint.
Support for enforcing code style and quality is particularly beneficial,
as the codebase for client-side scripts is consistently growing.
This has been implemented to work alongside other code style enforcement
rules present within 'critique-gerrit-review.py', which runs on the
existing jenkins job 'gerrit-auto-critic', to produce gerrit comments.
In the case of webUI scripts, ESLint's code analysis and linting checks
are performed to produce these comments.
As a shared NodeJS installation can be used for JS tests as well as
linting, a separate common script "bin/nodejs/setup_nodejs.sh"
has been added for assisting with the NodeJS installation.
To ensure quicker run times for the jenkins job, the NodeJS tarball is
cached within the "${HOME}/.cache" directory after the initial installation.
ESLint's packages and dependencies are cached using NPM's own package
management and are also cached locally.
NodeJS and ESLint dependencies are retrieved and executed only if the
patchset changes any ".js" files, keeping the overhead minimal.
After analysis, comments are generated for all the violations according
to the specified rules.
A custom formatter has been added to extract, format and filter the
violations in JSON form.
These generated code style violations are formatted into the required
JSON form according to gerrit's REST API, similar to comments generated
by flake8. These are then posted to gerrit as comments
on the respective patchset from jenkins over SSH.
The following code style and quality rules have been added using ESLint.
- Disallow unused variables
- Enforce strict equality (=== and !==)
- Require curly braces for all control statements (if, while, etc.)
- Enforce semicolons at the end of statements
- Enforce double quotes for strings
- Set maximum line length to 90
- Disallow `var`, use `let` or `const`
- Prefer `const` where possible
- Disallow multiple empty lines
- Enforce spacing around infix operators (eg. +, =)
- Disallow the use of undeclared variables
- Require parentheses around arrow function arguments
- Require a space before blocks
- Enforce consistent spacing inside braces
- Disallow shadowing variables declared in the outer scope
- Disallow constant conditions in if statements, loops, etc
- Disallow unnecessary parentheses in expressions
- Disallow duplicate arguments in function definitions
- Disallow duplicate keys in object literals
- Disallow unreachable code after return, throw, continue, etc
- Disallow reassigning function parameters
- Require functions to always consistently return or not return at all
- Enforce consistent use of dot notation wherever possible
- Enforce spacing around the colon in object literal properties
- Disallow optional chaining in contexts where undefined values are not allowed
The required linting packages have been added as dependencies in the
"www/scripts" directory.
All the test scripts and related dependencies have been moved to -
$IMPALA_HOME/tests/webui/js_tests.
All the custom ESLint formatter scripts and related dependencies
have been moved to -
$IMPALA_HOME/tests/webui/linting.
A combination of NodeJS's 'prefix' argument and the NODE_PATH environment
variable is used to separate the dependencies and the webUI scripts.
Running the tests from a remote directory (i.e. tests/webui) is supported
by modifying the required base paths.
The JS scripts need to be updated according to these linting rules,
as per IMPALA-13986.
Change-Id: Ieb3d0a9221738e2ac6fefd60087eaeee4366e33f
Reviewed-on: http://gerrit.cloudera.org:8080/21970
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch modifies test_change_parquet_column_type to let it pass
regardless of the test JDK version. The assertion is changed from a
string match to a regex.
Testing:
Run and pass the test with both JDK8 and JDK17.
Change-Id: I5bd3eebe7b1e52712033dda488f0c19882207f9d
Reviewed-on: http://gerrit.cloudera.org:8080/22874
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
With this patch Impala skips reloading Iceberg tables when the metadata
JSON file is the same, as this means that the table is essentially
unchanged.
This can help in situations when the event processor is lagging behind
and we have an Iceberg table that is updated frequently. Imagine the
case when Impala gets 100 events for an Iceberg table. In this case,
after processing the first event, our internal representation of
the Iceberg table is already up-to-date; there is no need to do the
reload 100 times.
We cannot use the internal icebergApiTable_'s metadata location,
as the following statement might silently refresh the metadata
in 'current()':
icebergApiTable_.operations().current().metadataFileLocation()
To guarantee that we check against the actually loaded metadata,
this patch introduces a new member to store the metadata location.
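A minimal sketch of the idea, with illustrative member and method names (the real
field in the patch may be named differently):
  class IcebergTableReloadSketch {
    // Location of the metadata JSON file the currently loaded state was built from.
    private String loadedMetadataLocation_;

    // If the metadata JSON file is unchanged, the table content is unchanged, so the
    // (potentially expensive) reload can be skipped.
    boolean needsReload(String newMetadataLocation) {
      return !newMetadataLocation.equals(loadedMetadataLocation_);
    }

    // Record the location corresponding to the loaded state. Asking the Iceberg API's
    // current() for it would not work here, as current() may silently refresh.
    void onReloadFinished(String metadataLocation) {
      loadedMetadataLocation_ = metadataLocation;
    }
  }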
Testing
* added e2e tests for REFRESH, also for event processing
Change-Id: I16727000cb11d1c0591875a6542d428564dce664
Reviewed-on: http://gerrit.cloudera.org:8080/22432
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com>
ImpalaTestSuite.__restore_query_options() attempts to restore the client's
configuration with what it understands as the "default" query options.
Since IMPALA-13930, ImpalaConnection.get_default_configuration() parses
the default query options from TQueryOption fields. Therefore, it might
not respect the server's defaults that come from the --default_query_options
flag.
ImpalaTestSuite.__restore_query_options() should simply unset any
configuration that was previously set, by running a SET query like this:
SET query_option="";
This patch also changes execute_query_using_vector() to simply unset the
client's configuration.
Follow-up cleanup will be tracked through IMPALA-14060.
Testing:
Run and pass test_queries.py::TestQueries.
Change-Id: I884986b9ecbcabf0b34a7346220e6ea4142ca923
Reviewed-on: http://gerrit.cloudera.org:8080/22862
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-11604 (part 2) changes how many instances to create in
Scheduler::CreateInputCollocatedInstances. This works when the left
child fragment of a parent fragment is distributed across nodes.
However, if the left child fragment instance is limited to only 1
node (the case of an UNPARTITIONED fragment), the scheduler might
over-parallelize the parent fragment by scheduling too many instances on
a single node.
This patch attempts to mitigate the issue in two ways. First, it adds
bounding logic in PlanFragment.traverseEffectiveParallelism() to lower
parallelism further if the left (probe) side of the child fragment is
not well distributed across nodes.
Second, it adds TQueryExecRequest.max_parallelism_per_node to relay
information from Analyzer.getMaxParallelismPerNode() to the scheduler.
With this information, the scheduler can do additional sanity checks to
prevent Scheduler::CreateInputCollocatedInstances from
over-parallelizing a fragment. Note that this sanity check can also cap
MAX_FS_WRITERS option under a similar scenario.
Added a ScalingVerdict enum and TRACE-log it to show the scaling decision
steps.
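For illustration only (the names requestedInstances, maxParallelismPerNode, and
numNodes are assumptions, not the actual scheduler code), the sanity check amounts
to bounding the instance count by what the nodes are allowed to run:
  class ParallelismCapSketch {
    // Cap the parent fragment's instance count so a single node never ends up with
    // more instances than the per-node parallelism limit allows.
    static int capInstances(int requestedInstances, int maxParallelismPerNode,
        int numNodes) {
      int upperBound = maxParallelismPerNode * numNodes;
      return Math.min(requestedInstances, upperBound);
    }
  }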
Testing:
- Add a planner test and an e2e test that exercise the corner case under
  the COMPUTE_PROCESSING_COST=1 option.
- Manually comment the bounding logic in traverseEffectiveParallelism()
and confirm that the scheduler's sanity check still enforces the
bounding.
Change-Id: I65223b820c9fd6e4267d57297b1466d4e56829b3
Reviewed-on: http://gerrit.cloudera.org:8080/22840
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch attempts to stabilize TestFetch by using HS2 as the test protocol.
test_rows_sent_counters is modified to use the default hs2_client.
test_client_fetch_time_stats and test_client_fetch_time_stats_incomplete
are modified to use MinimalHS2Connection, which has a simpler fetch
mechanism (ImpylaHS2Connection always fetches 10240 rows at a
time).
Implemented the minimal functions needed to wait for the finished state and
pull the runtime profile in MinimalHS2Connection.
Testing:
Looped the test 50 times and all runs passed.
Change-Id: I52651df37a318357711d26d2414e025cce4185c3
Reviewed-on: http://gerrit.cloudera.org:8080/22847
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The HS2 NULL_TYPE should be implemented using TStringValue.
However, due to an incompatibility with the Hive JDBC driver implementation
at the time, Impala chose to implement the NULL type using TBoolValue (see
IMPALA-914, IMPALA-1370).
HIVE-4172 might be the root cause for that decision. Today, the Hive
JDBC driver (org.apache.hive.jdbc.HiveDriver) does not have that issue anymore,
as shown in this reproduction after applying this patch:
./bin/run-jdbc-client.sh -q "select null" -t NOSASL
Using JDBC Driver Name: org.apache.hive.jdbc.HiveDriver
Connecting to: jdbc:hive2://localhost:21050/;auth=noSasl
Executing: select null
----[START]----
NULL
----[END]----
Returned 1 row(s) in 0.343s
Thus, we can reimplement NULL_TYPE using TStringValue to match
HiveServer2 behavior.
Testing:
- Pass core tests.
Change-Id: I354110164b360013d9893f1eb4398c3418f80472
Reviewed-on: http://gerrit.cloudera.org:8080/22852
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds a startup flag so that by default the catalog server
will not consolidate the grant/revoke requests sent to the Ranger server
when there are multiple columns involved in the GRANT/REVOKE statement.
Testing:
- Added 2 end-to-end tests to make sure the grant/revoke requests
sent to the Ranger server would be consolidated only when the flag
is explicitly added when we start the catalog server.
Change-Id: I4defc59c048be1112380c3a7254ffa8655eee0af
Reviewed-on: http://gerrit.cloudera.org:8080/22833
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes an issue where EXEC_TIME_LIMIT_S was inaccurately
enforced by including the planning time in its countdown. The timer
for EXEC_TIME_LIMIT_S is now started only after the coordinator
reaches the "Ready to start on the backends" state, ensuring that
this time limit applies strictly to the execution phase.
This patch also adds a DebugAction PLAN_CREATE in the planning phase
for testing purposes.
Tests:
Passed core tests.
Adds an ee testcase query_test/test_exec_time_limit.py.
Change-Id: I825e867f1c9a39a9097d1c97ee8215281a009d7d
Reviewed-on: http://gerrit.cloudera.org:8080/22837
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
At present, the metastore event processor is single-threaded. Notification
events are processed sequentially, with a maximum limit of 1000 events
fetched and processed in a single batch. Multiple locks are used to
address the concurrency issues that may arise when catalog DDL
operation processing and metastore event processing try to
access/update the catalog objects concurrently. Waiting for a lock or
for file metadata loading of a table can slow the event processing and can
affect the processing of the events following it, even though those events
may not depend on the previous event. Altogether it can take a very
long time to synchronize all the HMS events.
The existing metastore event processing is turned into multi-level
event processing with the enable_hierarchical_event_processing flag. It
is not enabled by default. The idea is to segregate the events based on
their dependencies, maintain the order of events as they occur within
a dependency, and process them independently as much as possible.
The following 3 main classes represent the three-level threaded event
processing (an illustrative dispatch sketch follows this list).
1. EventExecutorService
   It provides the necessary methods to initialize, start, clear, and
   stop metastore event processing in hierarchical mode. It is
   instantiated from MetastoreEventsProcessor and its methods are
   invoked from MetastoreEventsProcessor. Upon receiving an event to
   process, EventExecutorService queues the event to the appropriate
   DbEventExecutor for processing.
2. DbEventExecutor
   An instance of this class has an execution thread and manages events
   of multiple databases with DbProcessors. A DbProcessor instance
   is maintained to store the context of each database within the
   DbEventExecutor. On each scheduled execution, input events on a
   DbProcessor are segregated to the appropriate TableProcessors for
   event processing, and the database events that are eligible for
   processing are processed.
   Once a DbEventExecutor is assigned to a database, a DbProcessor
   is created, and the subsequent events belonging to the database
   are queued to the same DbEventExecutor thread for further processing.
   Hence, linearizability is ensured when dealing with events within
   the database. Each instance of DbEventExecutor has a fixed list
   of TableEventExecutors.
3. TableEventExecutor
   An instance of this class has an execution thread and processes
   events of multiple tables with TableProcessors. A TableProcessor
   instance is maintained to store the context of each table within
   a TableEventExecutor. On each scheduled execution, events from
   TableProcessors are processed.
   Once a TableEventExecutor is assigned to a table, a TableProcessor
   is created, and the subsequent table events are processed by the same
   TableEventExecutor thread. Hence, linearizability is guaranteed
   in processing events of a particular table.
   - All the events of a table are processed in the same order they
     have occurred.
   - Events of different tables are processed in parallel when those
     tables are assigned to different TableEventExecutors.
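The following sketch is purely illustrative (it is not the patch's code; plain JDK
single-thread executors stand in for the executor classes): routing every event
through an executor chosen by its db name serializes events of the same db while
letting different dbs proceed in parallel, and the same keying idea applies one
level down for tables.
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  class HierarchicalDispatchSketch {
    private final ExecutorService[] dbExecutors;

    HierarchicalDispatchSketch(int numDbExecutors) {
      dbExecutors = new ExecutorService[numDbExecutors];
      for (int i = 0; i < numDbExecutors; i++) {
        // One single-threaded executor per slot preserves per-key event order.
        dbExecutors[i] = Executors.newSingleThreadExecutor();
      }
    }

    void dispatch(String dbName, Runnable processEvent) {
      // The same db always maps to the same executor, so its events stay ordered;
      // events of distinct dbs can be processed concurrently.
      int idx = Math.floorMod(dbName.hashCode(), dbExecutors.length);
      dbExecutors[idx].submit(processEvent);
    }
  }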
The following new events are added:
1. DbBarrierEvent
   This event wraps a database event. It is used to synchronize all
   the TableProcessors belonging to the database before processing the
   database event. It acts as a barrier to restrict the processing
   of table events that occurred after the database event until the
   database event is processed on the DbProcessor.
2. RenameTableBarrierEvent
This event wraps an alter table event for rename. It is used to
synchronize the source and target TableProcessors to
process the rename table event. It ensures the source
TableProcessor removes the table first and then allows the target
TableProcessor to create the renamed table.
3. PseudoCommitTxnEvent and PseudoAbortTxnEvent
   CommitTxnEvent and AbortTxnEvent can involve multiple tables in
   a transaction, and processing these events modifies multiple table
   objects. Pseudo events are introduced such that a pseudo event is
   created for each table involved in the transaction, and these
   pseudo events are processed independently at the respective
   TableProcessors.
The following new flags are introduced:
1. enable_hierarchical_event_processing
To enable the hierarchical event processing on catalogd.
2. num_db_event_executors
To set the number of database level event executors.
3. num_table_event_executors_per_db_event_executor
To set the number of table level event executors within a
database event executor.
4. min_event_processor_idle_ms
To set the minimum time to retain idle db processors and table
processors on the database event executors and table event
executors respectively, when they do not have events to process.
5. max_outstanding_events_on_executors
   To set the maximum number of outstanding events to process on the
   event executors.
Changed the hms_event_polling_interval_s type from int to double to support
millisecond-precision intervals.
TODOs:
1. We need to redefine the lag in the hierarchical processing mode.
2. Need a mechanism to capture the actual event processing time
   in hierarchical processing mode. Currently, with
   enable_hierarchical_event_processing set to true, lastSyncedEventId_ and
   lastSyncedEventTimeSecs_ are updated upon event dispatch to
   EventExecutorService for processing on the respective DbEventExecutor
   and/or TableEventExecutor. So lastSyncedEventId_ and
   lastSyncedEventTimeSecs_ don't actually mean the events are processed.
3. Hierarchical processing mode currently has a mechanism to show the
   total number of outstanding events on all the db and table executors
   at the moment. Observability needs to be enhanced further for this mode.
Filed a jira [IMPALA-13801] to fix them.
Testing:
- Executed existing end-to-end tests.
- Added fe and end-to-end tests with enable_hierarchical_event_processing.
- Added event processing performance tests.
- Have executed the existing tests with hierarchical processing
  mode enabled. lastSyncedEventId_ is now used in the new feature of
  sync_hms_events_wait_time_s (IMPALA-12152) as well. Some tests fail when
  hierarchical processing mode is enabled because lastSyncedEventId_ does
  not actually mean the event is processed in this mode. This needs to be
  fixed/verified with the above jira [IMPALA-13801].
Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6
Reviewed-on: http://gerrit.cloudera.org:8080/21031
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_hms_event_sync_basic is not a simple test; it actually tests
several kinds of statements in sequence.
This refactors it into smaller parallel tests so there are more
concurrent HMS events to be processed, making it easier to reveal bugs.
Renamed some tests to use shorter names.
Tests:
- Ran all parallel tests of TestEventSyncWaiting 32 times.
Change-Id: I8a2be548697f6259961b83dc91230306f38e03ad
Reviewed-on: http://gerrit.cloudera.org:8080/22829
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, Impala always assumes that the data in the binary columns of
JSON tables is base64 encoded. However, before HIVE-21240, Hive wrote
binary data to JSON tables without base64 encoding it, instead writing
it as escaped strings. After HIVE-21240, Hive defaults to base64
encoding binary data when writing to JSON tables and introduces the
serde property 'json.binary.format' to indicate the encoding method of
binary data in JSON tables.
To maintain consistency with Hive and avoid correctness issues caused by
reading data in an incorrect manner, this patch also introduces the
serde property 'json.binary.format' to specify the reading method for
binary data in JSON tables. Currently, this property supports reading in
either base64 or rawstring formats, same as Hive.
Additionally, this patch introduces a query option 'json_binary_format'
to achieve the same effect. This query option will only take effect for
JSON tables where the serde property 'json.binary.format' is not set.
The reading format of binary columns in JSON tables can be configured
globally by setting the 'default_query_options'. It should be noted that
the default value of 'json_binary_format' is 'NONE', and Impala will
prohibit reading binary columns of JSON tables that either have
no 'json.binary.format' set while 'json_binary_format' is 'NONE', or
have an invalid 'json.binary.format' value set. An error message is
provided to avoid using an incorrect format without the user noticing.
Testing:
- Enabled existing binary type E2E tests for JSON tables
- Added new E2E test for 'json.binary.format'
Change-Id: Idf61fa3afc0f33caa63fbc05393e975733165e82
Reviewed-on: http://gerrit.cloudera.org:8080/22289
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When a db/table is removed in the catalog cache, catalogd assigns it a
new catalog version and puts it into the deleteLog. This is used by the
catalog update thread to collect deletion updates. Once the catalog
update thread collects a range of updates, it triggers a GC in the
deleteLog to clear items older than the last sent catalog version. The
deletions will eventually be broadcast by the statestore to all the
coordinators.
However, waitForHmsEvent requests are also consumers of the deleteLog
and could be impacted by these GCs. waitForHmsEvent is a catalogd RPC
used by coordinators when a query wants to wait until the related
metadata is in sync with HMS. The response of waitForHmsEvent returns
the latest metadata, including the deletions on related dbs/tables.
If the related deletions in the deleteLog are GCed just before the
waitForHmsEvent request collects the results, they will be missing from
the response. The coordinator might keep using stale metadata of
non-existent dbs/tables.
This is a quick fix for the issue that postpones deleteLog GC for a
configurable number of topic updates, similar to what we have done on
the TopicUpdateLog. A thorough fix might need to carefully choose the
version to GC, or let impalad wait for the deletions from the statestore
to arrive.
A new flag, catalog_delete_log_ttl, is added for this. The deleteLog
items can survive for catalog_delete_log_ttl catalog updates. The
default is 60, so a deletion can survive for at least 120s. It should be
safe enough, i.e. the GCed deletions must have arrived on the impalad
side after 60 rounds of catalog updates; otherwise that is an abnormal
impalad that already has other more severe issues, e.g. lots of stale
tables due to metadata out of sync with catalogd.
Note that postponing deleteLog GCs might increase the memory
consumption. But since most of its memory is used by db/table/partition
names, the memory usage should still be trivial compared to other
metadata like file descriptors and incremental stats in live catalog
objects.
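A rough sketch of the postponed GC, under the assumption (names are illustrative)
that each GC candidate remembers the topic-update round in which it was sent and is
only dropped catalog_delete_log_ttl rounds later:
  import java.util.Map;
  import java.util.TreeMap;

  class DeleteLogGcSketch {
    private final int ttlRounds;  // e.g. --catalog_delete_log_ttl=60
    // last-sent catalog version -> topic-update round in which it was sent
    private final TreeMap<Long, Long> sentVersions = new TreeMap<>();
    private long currentRound = 0;

    DeleteLogGcSketch(int ttlRounds) { this.ttlRounds = ttlRounds; }

    void onTopicUpdateCollected(long lastSentVersion) {
      currentRound++;
      sentVersions.put(lastSentVersion, currentRound);
      // Instead of immediately GCing everything <= lastSentVersion, only GC versions
      // whose topic update went out at least ttlRounds rounds ago, so concurrent
      // waitForHmsEvent requests can still see recent deletions.
      long gcUpTo = -1;
      for (Map.Entry<Long, Long> e : sentVersions.entrySet()) {
        if (currentRound - e.getValue() >= ttlRounds) gcUpTo = e.getKey();
      }
      if (gcUpTo >= 0) {
        removeDeleteLogItemsNotGreaterThan(gcUpTo);
        sentVersions.headMap(gcUpTo, true).clear();
      }
    }

    private void removeDeleteLogItemsNotGreaterThan(long version) {
      // In the real code: drop deleteLog entries with catalog version <= 'version'.
    }
  }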
This patch also removed some unused imports.
Tests:
- Added e2e test with a debug action to reproduce the issue. Ran the
test 100 times. Without the fix, it consistently fails when runs for
2-3 times.
Change-Id: I2441440bca2b928205dd514047ba742a5e8bf05e
Reviewed-on: http://gerrit.cloudera.org:8080/22816
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TestAllowIncompleteData.test_too_many_files depends on
tpch_parquet.lineitem having exactly 3 data files. This is not true in
erasure coding builds, in which tpch_parquet.lineitem has only 2 data
files.
This fixes the test to use dedicated tables created in the test.
Change-Id: I28cec8ec4bc59f066aa15a7243b7163639706cc7
Reviewed-on: http://gerrit.cloudera.org:8080/22824
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Handles the error "Table/view rename succeeded in the Hive Metastore,
but failed in Impala's Catalog Server" rather than failing the table
rename. This error happens when the catalog state catches up to the alter
event from our alter_table RPC to HMS before we call renameTable explicitly
in the catalog. The catalog can update independently due to a concurrent
'invalidate metadata' call.
In that case we use the oldTbl definition we already have - updated from
the delete log if possible - and fetch the new table definition with
invalidateTable to continue, automating most of the work that the error
message suggested users do via 'invalidate metadata <tbl>'.
Updated the test_concurrent_ddls test to remove handle_rename_failure
and ran the tests a dozen times. Adds concurrency tests with
simultaneous rename and invalidate metadata that previously would fail.
Change-Id: Ic2a276b6e5ceb35b7f3ce788cc47052387ae8980
Reviewed-on: http://gerrit.cloudera.org:8080/22807
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In beeswax, all statements with the exception of USE print
'Fetched X row(s) in Ys', while in HS2 some statements (REFRESH,
INVALIDATE METADATA) do not print it. While these statements always
return 0 rows, the amount of time spent on the statement can be
useful.
This patch modifies impala-shell to let it print the elapsed time for
such a query, even if the query is not expected to return result metadata.
Added the --beeswax_compat_num_rows option in impala-shell. It defaults to
False. If this option is set (True), 'Fetched 0 row(s) in' will be
printed for all Impala protocols, just like beeswax. One exception for
this is the USE query, which will remain silent.
Testing:
- Added test_beeswax_compat_num_rows in test_shell_interactive.py.
- Pass test_shell_interactive.py.
Change-Id: Id76ede98c514f73ff1dfa123a0d951e80e7508b4
Reviewed-on: http://gerrit.cloudera.org:8080/22813
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When calling planFiles() on an Iceberg table, it can give us some
metrics like total planning time, number of data/delete files and
manifests, how many of these could be skipped etc.
This change integrates these metrics into the query profile, under the
"Frontend" section. These metrics are per-table, so if multiple tables
are scanned for the query there will be multiple sections in the
profile.
Note that we only have these metrics for a table if Iceberg needs to be
used for planning for that table, e.g. if a predicate is pushed down to
Iceberg or if there is time travel. For tables where Iceberg was not
used in planning, the profile will contain a short note describing this.
To facilitate pairing the metrics with scans, the metrics header
references the plan node responsible for the scan. This will always be
the top level node for the scan, so it can be a SCAN node, a JOIN node
or a UNION node depending on whether the table has delete files.
Testing:
- added EE tests in iceberg-scan-metrics.tests
- added a test in PlannerTest.java that asserts on the number of
metrics; if it changes in a new Iceberg release, the test will fail
and we can update our reporting
Change-Id: I080ee8eafc459dad4d21356ac9042b72d0570219
Reviewed-on: http://gerrit.cloudera.org:8080/22501
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
When a db is missing in the catalog cache, the waitForHmsEvent request
currently just checks whether there are pending database events on it,
assuming that processing the CREATE_DATABASE event will add the db with
the table list. However, that's wrong since processing the CREATE_DATABASE
event just adds the db with an empty table list. We should still wait
for pending events on the underlying tables.
Tests:
- Added an e2e test which consistently fails when running concurrently with
  other tests in TestEventSyncWaiting without the fix.
Change-Id: I3fe74fcf0bf4dbac4a3584f6603279c0a2730b0c
Reviewed-on: http://gerrit.cloudera.org:8080/22817
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
Adds a test that multiple slow concurrent alters complete in parallel
rather than being executed sequentially.
The test creates tables, and ensures Impala's catalog is up-to-date for
their creation so that starting ALTER TABLE will be fast. Then starts
two ALTER TABLE RENAMES asynchronously - with debug_action ensuring each
takes at least 5 seconds - and waits for them to finish.
Verifies that concurrent alters are no longer blocked on "Got catalog
version read lock" and complete in less than 10 seconds.
Change-Id: I87d077aaa295943a16e6da60a2755dd289f3a132
Reviewed-on: http://gerrit.cloudera.org:8080/22804
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch aims to extract existing verifications on DEBUG_ACTION query
option format onto pre-planner stage SetQueryOption(), in order to
prevent failures on execution stage. Also, it localizes verification
code for two existing types of debug actions.
There are two types of debug actions, global e.g. 'RECVR_ADD_BATCH:FAIL'
and ExecNode debug actions, e.g. '0:GETNEXT:FAIL'. Two types are
implemented independently in source code, both having verification code
intertwined with execution. In addition, global debug actions subdivide
into C++ and Java, the two being more or less synchronized though.
In case of global debug actions, most of the code inside existing
DebugActionImpl() consists of verification, therefore it makes sense to
make a wrapper around it for separating out the execution code.
Things are worse for ExecNode debug actions, where verification code
consists of two parts, one in DebugOptions() constructor and another one
in ExecNode::ExecDebugActionImpl(). Additionally, some verification in the
constructor produces warnings, while ExecDebugActionImpl() verification
either fails on a DCHECK() or (in a single case) returns an error. For
this case, a reasonable solution seems to be simply calling the
constructor for a temporary object and extracting the verification code from
ExecNode::ExecDebugActionImpl(). This has the drawback of the
same warning being produced twice.
Finally, having extracted verification code for both types, logic in
impala::SetQueryOption() combines the two verification mechanisms.
Note: In the long run, it is better to write a single verification
routine for both Global and ExecNode debug actions, ideally as part of a
general unification of the two existing debug_action mechanisms. With
this in mind, the current patch intends to preserve current behavior,
while avoiding complex refactoring.
Change-Id: I53816aba2c79b556688d3b916883fee7476fdbb5
Reviewed-on: http://gerrit.cloudera.org:8080/22734
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds some profile counters for memory allocation and free in
MemPools, which are useful to detect tcmalloc contention.
The following counters are added:
- Thread level page faults: TotalThreadsMinorPageFaults,
TotalThreadsMajorPageFaults.
- MemPool counters for tuple_mem_pool and aux_mem_pool of the scratch
batch in columnar scanners:
- ScratchBatchMemAllocDuration
- ScratchBatchMemFreeDuration
- ScratchBatchMemAllocBytes
- MemPool counters for data_page_pool of ParquetColumnChunkReader
- ParquetDataPagePoolAllocBytes
- ParquetDataPagePoolAllocDuration
- ParquetDataPagePoolFreeBytes
- ParquetDataPagePoolFreeDuration
- MemPool counters for the fragment level RowBatch
- RowBatchMemPoolAllocDuration
- RowBatchMemPoolAllocBytes
- RowBatchMemPoolFreeDuration
- RowBatchMemPoolFreeBytes
- Duration in HdfsColumnarScanner::GetCollectionMemory() which includes
memory allocation for collection values and memcpy when doubling the
tuple buffer:
- MaterializeCollectionGetMemTime
Here is an example of a memory-bound query:
Fragment Instance
- RowBatchMemPoolAllocBytes: 0 (Number of samples: 0)
- RowBatchMemPoolAllocDuration: 0.000ns (Number of samples: 0)
- RowBatchMemPoolFreeBytes: (Avg: 719.25 KB (736517) ; Min: 4.00 KB (4096) ; Max: 4.12 MB (4321922) ; Sum: 1.93 GB (2069615013) ; Number of samples: 2810)
- RowBatchMemPoolFreeDuration: (Avg: 132.027us ; Min: 0.000ns ; Max: 21.999ms ; Sum: 370.997ms ; Number of samples: 2810)
- TotalStorageWaitTime: 47.999ms
- TotalThreadsInvoluntaryContextSwitches: 2 (2)
- TotalThreadsMajorPageFaults: 0 (0)
- TotalThreadsMinorPageFaults: 549.63K (549626)
- TotalThreadsTotalWallClockTime: 9s646ms
- TotalThreadsSysTime: 1s508ms
- TotalThreadsUserTime: 1s791ms
- TotalThreadsVoluntaryContextSwitches: 8.85K (8852)
- TotalTime: 9s648ms
...
HDFS_SCAN_NODE (id=0):
- ParquetDataPagePoolAllocBytes: (Avg: 2.36 MB (2477480) ; Min: 4.00 KB (4096) ; Max: 4.12 MB (4321922) ; Sum: 1.02 GB (1090091508) ; Number of samples: 440)
- ParquetDataPagePoolAllocDuration: (Avg: 1.263ms ; Min: 0.000ns ; Max: 39.999ms ; Sum: 555.995ms ; Number of samples: 440)
- ParquetDataPagePoolFreeBytes: (Avg: 1.28 MB (1344350) ; Min: 4.00 KB (4096) ; Max: 1.53 MB (1601012) ; Sum: 282.06 MB (295757000) ; Number of samples: 220)
- ParquetDataPagePoolFreeDuration: (Avg: 1.927ms ; Min: 0.000ns ; Max: 19.999ms ; Sum: 423.996ms ; Number of samples: 220)
- ScratchBatchMemAllocBytes: (Avg: 486.33 KB (498004) ; Min: 4.00 KB (4096) ; Max: 512.00 KB (524288) ; Sum: 1.19 GB (1274890240) ; Number of samples: 2560)
- ScratchBatchMemAllocDuration: (Avg: 1.936ms ; Min: 0.000ns ; Max: 35.999ms ; Sum: 4s956ms ; Number of samples: 2560)
- ScratchBatchMemFreeDuration: 0.000ns (Number of samples: 0)
- DecompressionTime: 1s396ms
- MaterializeCollectionGetMemTime: 4s899ms
- MaterializeTupleTime: 6s656ms
- ScannerIoWaitTime: 47.999ms
- TotalRawHdfsOpenFileTime: 0.000ns
- TotalRawHdfsReadTime: 360.997ms
- TotalTime: 9s254ms
The fragment instance took 9s648ms to finish, of which 370.997ms was spent
releasing memory of the final RowBatch. The majority of the time is
spent in the scan node (9s254ms). Mostly it's DecompressionTime +
MaterializeTupleTime + ScannerIoWaitTime + TotalRawHdfsReadTime. The
majority is MaterializeTupleTime (6s656ms).
ScratchBatchMemAllocDuration shows that invoking std::malloc() in
materializing the scratch batches took 4s956ms overall.
MaterializeCollectionGetMemTime shows that allocating memory for
collections and copying memory in doubling the tuple buffer took
4s899ms. So materializing the collections took most of the time.
Note that DecompressionTime (1s396ms) also includes memory allocation
duration tracked by the sum of ParquetDataPagePoolAllocDuration
(555.995ms). So memory allocation also takes a significant portion of
time here.
The other observation is TotalThreadsTotalWallClockTime is much higher
than TotalThreadsSysTime + TotalThreadsUserTime and there is a large
number of TotalThreadsVoluntaryContextSwitches. So the thread is waiting
for resources (e.g. lock) for a long duration. In the above case, it's
waiting for locks in tcmalloc memory allocation (need off-cpu flame
graph to reveal this).
Implementation of MemPool counters
Add MemPoolCounters in MemPool to track malloc/free duration and bytes.
Note that the counters are not updated in the destructor since it's expected
that all chunks are freed or transferred before calling the destructor.
MemPool is widely used in the code base. This patch only exposes MemPool
counters in three places:
- the scratch batch in columnar scanners
- the ParquetColumnChunkReader of parquet scanners
- the final RowBatch reset by FragmentInstanceState
This patch also moves GetCollectionMemory() from HdfsScanner to
HdfsColumnarScanner since it's only used by parquet and orc scanners.
PrettyPrint of SummaryStatsCounter is updated to also show the sum of
the values if they are not speeds or percentages.
Tests
- tested in manually reproducing the memory-bound queries
- ran perf-AB-test on tpch (sf=42) and didn't see significant
performance change
- added e2e tests
- updated expected files of observability/test_profile_tool.py because
  SummaryStatsCounter now prints the sum in most cases. Also updated
  get_bytes_summary_stats_counter and
  test_get_bytes_summary_stats_counter accordingly.
Change-Id: I982315d96e6de20a3616f3bd2a2b4866d1ff4710
Reviewed-on: http://gerrit.cloudera.org:8080/22062
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TestConcurrentDdls has several exceptions it considers acceptable for
testing; it would accept the query failure and continue with other
cases. That was fine for existing queries, but if an ALTER RENAME fails
subsequent queries will also fail because the table does not have the
expected name.
With IMPALA-13631, there are three exception cases we need to handle:
1. "Table/view rename succeeded in the Hive Metastore, but failed in
Impala's Catalog Server" happens when the HMS alter_table RPC
succeeds but local catalog has changed. INVALIDATE METADATA on the
target table is sufficient to bring things in sync.
2. "CatalogException: Table ... was modified while operation was in
progress, aborting execution" can safely be retried.
3. "Couldn't retrieve the catalog topic update for the SYNC_DDL
operation" happens when SYNC_DDL=1 and the DDL runs on a stale table
object that's removed from the cache by a global INVALIDATE.
Adds --max_wait_time_for_sync_ddl_s=10 in catalogd_args for the last
exception to occur. Otherwise the query will just time out.
Tested by running test_concurrent_ddls.py 15 times. The 1st exception
previously would show up within 3-4 runs, while the 2nd exception
happens pretty much every run.
Change-Id: I04d071b62e4f306466a69ebd9e134a37d4327b77
Reviewed-on: http://gerrit.cloudera.org:8080/22802
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
stress_catalog_init_delay_ms does not exist in RELEASE builds, causing a
KeyError in impala_cluster.py. This patch fixes it by specifying a default
value when inspecting the ImpaladService.get_flag_current_values() return
value.
Testing:
Ran start-impala-cluster.py in a RELEASE build and it works.
Change-Id: Ia4400a7e711d21d23cc37878f18f2e0389b741b0
Reviewed-on: http://gerrit.cloudera.org:8080/22803
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Catalog versionLock is a lock used to synchronize reads/writes of
catalogVersion. It can be used to perform atomic bulk catalog operations
since catalogVersion cannot change externally while the lock is being
held. All other catalog operations will be blocked if the current thread
holds the lock. So it shouldn't be held for a long time, especially when
the current thread is invoking external RPCs for a table.
CatalogOpExecutor.alterTable() is one place that could hold the lock for
a long time. If the ALTER operation is a RENAME, it holds the lock until
alterTableOrViewRename() finishes. HMS RPCs are invoked in this method
to perform the operation, which might take an unpredictable amount of
time. The motivation for holding this lock is that RENAME is implemented
as a DROP + ADD in the catalog cache, so this operation can be atomic.
However, that doesn't mean we need the lock before operating on the cache
in CatalogServiceCatalog.renameTable(). We actually acquire the lock again
in this method, so there is no need to keep holding the lock while
invoking HMS RPCs.
This patch removes holding the lock in alterTableOrViewRename().
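A minimal sketch of the change in lock scope, using a JDK read-write lock as a
stand-in for the catalog's versionLock (the method names are illustrative, not the
actual catalog code):
  import java.util.concurrent.locks.ReentrantReadWriteLock;

  class RenameLockScopeSketch {
    private final ReentrantReadWriteLock versionLock = new ReentrantReadWriteLock();

    // Before: the lock was held across the potentially slow HMS RPC, blocking all
    // other catalog operations for an unpredictable amount of time.
    void renameHoldingLock() {
      versionLock.writeLock().lock();
      try {
        alterTableInHms();       // slow external RPC
        renameInCatalogCache();  // fast in-memory DROP + ADD
      } finally {
        versionLock.writeLock().unlock();
      }
    }

    // After: the HMS RPC runs without the lock; only the in-memory cache mutation
    // (which re-acquires the lock internally in the real code) is the atomic section.
    void renameWithoutHoldingLock() {
      alterTableInHms();
      versionLock.writeLock().lock();
      try {
        renameInCatalogCache();
      } finally {
        versionLock.writeLock().unlock();
      }
    }

    private void alterTableInHms() { /* HMS alter_table RPC in the real code */ }
    private void renameInCatalogCache() { /* remove old entry, add new entry */ }
  }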
Tests
- Added e2e test for concurrent rename operations.
- Also added some rename operations in test_concurrent_ddls.py
Change-Id: Ie5f443b1e167d96024b717ce70ca542d7930cb4b
Reviewed-on: http://gerrit.cloudera.org:8080/22789
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
getPartialCatalogObject is a catalogd RPC used by local catalog mode
coordinators to fetch metadata on-demand from catalogd.
For a table with a huge number (e.g. 6M) of files, catalogd might hit an
OOM by exceeding the JVM array limit when serializing the response of
a getPartialCatalogObject request for all partitions (thus all files).
This patch adds a new flag, catalog_partial_fetch_max_files, to define
the max number of file descriptors allowed in a response of
getPartialCatalogObject. Catalogd will truncate the response at the
partition level when it's too big, and only return a subset of the
requested partitions. The coordinator should send new requests to fetch the
remaining partitions. Note that it's possible that the table metadata
changes between the requests. The coordinator will detect the catalog
version change and throw an InconsistentMetadataFetchException for the
planner to replan the query. This is an existing mechanism for other
kinds of table metadata.
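A sketch of the truncation idea only (the generic types and names below are made up,
and the actual response building differs): stop adding partitions once the running
file-descriptor count would exceed the configured limit.
  import java.util.ArrayList;
  import java.util.List;

  class PartialFetchTruncationSketch {
    // Returns the prefix of the requested partitions whose total file count fits
    // under maxFiles (--catalog_partial_fetch_max_files); the coordinator is
    // expected to re-request the partitions that were left out.
    static <P> List<P> selectPartitions(List<P> requestedParts, List<Integer> numFiles,
        long maxFiles) {
      List<P> selected = new ArrayList<>();
      long totalFiles = 0;
      for (int i = 0; i < requestedParts.size(); i++) {
        totalFiles += numFiles.get(i);
        // Always include at least one partition so the coordinator makes progress,
        // then truncate at partition granularity once the limit is reached.
        if (!selected.isEmpty() && totalFiles > maxFiles) break;
        selected.add(requestedParts.get(i));
      }
      return selected;
    }
  }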
Here are some metrics on the number of files in a single response and
the corresponding byte array size and duration of serializing that response:
* 1000000: 371.71MB, 1s487ms
* 2000000: 744.51MB, 4s035ms
* 3000000: 1.09GB, 6s643ms
* 4000000: 1.46GB, duration not measured due to GC pauses
* 5000000: 1.82GB, duration not measured due to GC pauses
* 6000000: >2GB (hit OOM)
1000000 is chosen as the default value for now. We can tune it in the
future.
Tests:
- Added custom-cluster test
- Ran e2e tests in local-catalog mode with
catalog_partial_fetch_max_files=1000 so the new codes are used.
Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
Reviewed-on: http://gerrit.cloudera.org:8080/22559
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds NaN, Infinity, and boolean parsing in ImpylaHS2ResultSet
to match the beeswax results. TestQueriesJsonTables is changed to test
all client protocols.
Testing:
Run and pass TestQueriesJsonTables.
Change-Id: I739a88e9dfa418d3a3c2d9d4181b4add34bc6b93
Reviewed-on: http://gerrit.cloudera.org:8080/22785
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
TestAdmissionController.test_user_loads_rules is flaky: the last query,
which should exceed the user quota, sometimes does not fail. The test
executes queries in a round-robin fashion across all impalads. These
impalads are expected to synchronize the user quota counts through
statestore updates.
This patch attempts to deflake the test by raising the heartbeat wait
time from 1 heartbeat period to 3 heartbeat periods. It also changes the
to-be-rejected query to a fast version of SLOW_QUERY (without the sleep)
so the test can fail fast if it is not rejected.
Testing:
Looped the test 50 times and all runs passed.
Change-Id: Ib2ae8e1c2edf174edbf0e351d3c2ed06a0539f08
Reviewed-on: http://gerrit.cloudera.org:8080/22787
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
ImpalaConnection.execute and ImpalaConnection.execute_async have a 'user'
parameter to set a specific user to run the query. This is mainly a legacy
of BeeswaxConnection, which allows using 1 client to run queries under
different usernames.
BeeswaxConnection and ImpylaHS2Connection actually allow specifying one
user per client. Doing so simplifies user-specific tests such as
test_ranger.py, which often instantiates separate clients for the admin user
and a regular user. There is no need to specify the 'user' parameter anymore
when calling execute() or execute_async(), reducing potential bugs
from forgetting to set it or setting it to an incorrect value.
This patch applies the one-user-per-client practice as much as possible for
test_ranger.py, test_authorization.py, and test_admission_controller.py.
Unused code and pytest fixtures are removed. A few flake8 issues are
addressed too. Their default_test_protocol() is overridden to return
'hs2'.
ImpylaHS2Connection.execute() and ImpylaHS2Connection.execute_async()
are slightly modified to assume ImpylaHS2Connection.__user if the 'user'
parameter is None. BeeswaxConnection remains unchanged.
Extend ImpylaHS2ResultSet.__convert_result_value() to lower-case boolean
return values to match the beeswax results.
Testing:
Run and pass all modified tests in exhaustive exploration.
Change-Id: I20990d773f3471c129040cefcdff1c6d89ce87eb
Reviewed-on: http://gerrit.cloudera.org:8080/22782
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
hs2_parquet_constraint and hs2_text_constraint are meant to extend the test
vector dimension to also test a non-default test protocol (other than
beeswax), but limit it to only run against the 'parquet/none' or 'text/none'
format respectively.
This patch modifies these constraints into
default_protocol_or_parquet_constraint and
default_protocol_or_text_constraint respectively, such that full file
format coverage happens for the default_test_protocol configuration and is
limited for the other protocols. Drop hs2_parquet_constraint entirely
from test_utf8_strings.py because that test is already constrained to a
single 'parquet/none' file format.
The num-modified-rows validation in date-fileformat-support.test and
date-partitioning.test is changed to check the NumModifiedRows counter
from the profile.
Fix TestQueriesJsonTables to always run with the beeswax protocol because
its assertions rely on beeswax-specific return values.
Run impala-isort and fix a few flake8 issues in the modified test files.
Testing:
Run and pass the affected test files using exhaustive exploration and
env var DEFAULT_TEST_PROTOCOL=hs2. Confirmed that full file format
coverage happens for the hs2 protocol. Note that
DEFAULT_TEST_PROTOCOL=beeswax is still the default.
Change-Id: I8be0a628842e29a8fcc036180654cd159f6a23c8
Reviewed-on: http://gerrit.cloudera.org:8080/22775
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
waitForHmsEvent is a catalogd RPC for coordinators to send the requested
db/table names to catalogd and wait until it's safe (i.e. no stale
metadata) to start analyzing the statement. The wait time is configured
by the query option sync_hms_events_wait_time_s. Currently, when this option
is enabled, catalogd waits until it syncs to the latest HMS event
regardless of what the query is.
This patch reduces waiting by only checking related events and waiting
until the last related event has been processed. In the ideal case, if
there are no related pending events, the query doesn't need to
wait.
Related pending events are determined as follows:
- For queries that need the db list, i.e. SHOW DATABASES, check pending
CREATE/ALTER/DROP_DATABASE events on all dbs. ALTER_DATABASE events
are checked in case the ownership changes and impacts visibility.
- For db statements like SHOW FUNCTIONS, CREATE/ALTER/DROP DATABASE,
check pending CREATE/ALTER/DROP events on that db.
- For db statements that require the table list, i.e. SHOW TABLES,
also check CREATE_TABLE, DROP_TABLE events under that db.
- For table statements,
- check all database events on related db names.
- If there are loaded transactional tables, check all the pending
COMMIT_TXN, ABORT_TXN events. Note that these events might modify
multiple transactional tables and we don't know their table names
until they are processed. To be safe, wait for all transactional
events.
- For all the other table names,
- if they are all missing/unloaded in the catalog, check all the
pending CREATE_TABLE, DROP_TABLE events on them for their
existence.
- Otherwise, some of them are loaded, check all the table events on
them. Note that we can fetch events on multiple tables under the
same db in a single fetch.
If the statement has a SELECT part, views will be expanded so underlying
tables will be checked as well. For performance, this feature assumes
that views won't be changed to tables, and vice versa. This is a rare
use case in regular jobs. Users should use INVALIDATE for such a case.
This patch leverages the HMS API to fetch events of several tables under
the same db in batch. MetastoreEventsProcessor.MetaDataFilter is
improved for this.
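An illustrative sketch of the core idea (the PendingEvent shape and method names are
assumptions, not the patch's classes): from the pending HMS events, keep only those
that touch the query's db/table names, and wait for the newest such event id rather
than for the global latest event id.
  import java.util.List;
  import java.util.Set;

  class RelatedEventWaitSketch {
    static class PendingEvent {
      long eventId;
      String dbName;
      String tableName;  // null for database-level events
    }

    // Returns the highest event id the query must wait for, or -1 if none of the
    // pending events are related to the given db/table names.
    static long lastRelatedEventId(List<PendingEvent> pendingEvents,
        Set<String> relatedDbs, Set<String> relatedTables) {
      long lastRelated = -1;
      for (PendingEvent e : pendingEvents) {
        boolean related = relatedDbs.contains(e.dbName)
            || (e.tableName != null
                && relatedTables.contains(e.dbName + "." + e.tableName));
        if (related) lastRelated = Math.max(lastRelated, e.eventId);
      }
      return lastRelated;
    }
  }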
Tests:
- Added test for multiple tables in a single query.
- Added test with views.
- Added test for transactional tables.
- Ran CORE tests.
Change-Id: Ic033b7e197cd19505653c3ff80c4857cc474bcfc
Reviewed-on: http://gerrit.cloudera.org:8080/22571
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In HA mode, CatalogD initialization can fail to complete within a
reasonable time. Log messages showed that CatalogD is blocked trying to
acquire "CatalogServer.catalog_lock_" when calling
CatalogServer::UpdateActiveCatalogd() during statestore subscriber
registration. catalog_lock_ was held by GatherCatalogUpdatesThread, which
was calling GetCatalogDelta(), which waits for the Java lock versionLock_,
which was held by the thread doing CatalogServiceCatalog.reset().
This patch removes the catalog reset in the JniCatalog constructor. In turn,
catalogd-server.cc is now responsible for triggering the metadata
reset (Invalidate Metadata) only if:
1. It is the active CatalogD, and
2. The gathering thread has collected the first topic update, or CatalogD is
   set with a catalog_topic_mode other than "minimal".
The latter prerequisite ensures that coordinators are not
blocked waiting for a full topic update in on-demand metadata mode. This
is all managed by a new thread method, TriggerResetMetadata, that monitors
and triggers the initial metadata reset.
Note that this is a behavior change in on-demand catalog
mode (catalog_topic_mode=minimal). Previously, on-demand catalog mode
would send the full database list in its first catalog topic update. This
behavior change is OK since coordinators can request metadata on-demand.
After this patch, catalog-server.active-status and the /healthz page can
turn to true and OK respectively even if the very first metadata reset
is still ongoing. Observers that care about having fully populated
metadata should check other metrics such as catalog.num-db,
catalog.num-tables, or the /catalog page content.
Updated the start-impala-cluster.py readiness check to wait for at least 1
table to be seen by coordinators, except during create-load-data.sh
execution (there is no table yet) and when use_local_catalog=true (the local
catalog cache does not start with any table). Modified the startup flag
checking from reading the actual command line args to reading the
'/varz?json' page of the daemon. Cleaned up impala_service.py to fix some
flake8 issues.
Slightly updated TestLocalCatalogCompactUpdates::test_restart_catalogd so
that the unique_database cleanup is successful.
Testing:
- Refactor test_catalogd_ha.py to reduce repeated code, use the
  unique_database fixture, and additionally validate the /healthz page of
  both the active and standby catalogd. Changed it to test using the hs2
  protocol by default.
- Run and pass test_catalogd_ha.py and test_concurrent_ddls.py.
- Pass core tests.
Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
Reviewed-on: http://gerrit.cloudera.org:8080/22634
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
The coordinator uses collectTableRefs() to collect the table names used by a
statement. For ResetMetadataStmt, used by the REFRESH and INVALIDATE METADATA
commands, it's intentional not to return the table name in
collectTableRefs() to avoid triggering unnecessary table metadata
loading. However, when this method is used for the HMS event sync
feature, we do want to know what the table is, so that catalogd can return
the latest metadata for it after waiting for the HMS events to be synced.
This bug leads to REFRESH/INVALIDATE not waiting for HMS ALTER ownership
events to be synced. REFRESH/INVALIDATE statements might unexpectedly
fail or succeed due to stale ownership info in coordinators.
To avoid changing the existing logic of collectTableRefs(), this patch
uses getTableName() directly for REFRESH statements since we know it's a
single-table statement. There are other kinds of such single-table
statements, like DROP TABLE. To be generic, this patch introduces a new
interface, SingleTableStmt, for all such statements that have a single
table name. If a statement is a SingleTableStmt, we use getTableName()
directly instead of collectTableRefs() in collectRequiredObjects().
This improves coordinator in collecting table names for single-table
statements. E.g. "DROP TABLE mydb.foo" previously has two candidate
table names - "mydb.foo" and "default.mydb" (assuming the session db is
"default"). Now it just collects "mydb.foo". Catalogd can return less
metadata in the response.
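The dispatch can be illustrated with a small Python sketch (the real code
is Java in the frontend; the names below only loosely mirror it):

  class SingleTableStmt:
      """Marker for statements that reference exactly one table (e.g. REFRESH, DROP TABLE)."""
      def get_table_name(self):
          raise NotImplementedError

  class RefreshStmt(SingleTableStmt):
      def __init__(self, table_name):
          self._table_name = table_name
      def get_table_name(self):
          return self._table_name

  def collect_table_refs(stmt, session_db):
      # Stand-in for the generic, broader table-ref collection logic.
      return []

  def collect_candidate_tables(stmt, session_db="default"):
      if isinstance(stmt, SingleTableStmt):
          # Exactly one candidate, e.g. just "mydb.foo" for "DROP TABLE mydb.foo".
          return [stmt.get_table_name()]
      # Otherwise fall back to the generic collection, which may also yield
      # extra candidates such as "default.mydb".
      return collect_table_refs(stmt, session_db)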
Tests:
- Added FE tests for collectRequiredObjects() where coordinators
collect db/table names.
- Added authorization tests on altering the ownership in Hive and
running queries in Impala.
Change-Id: I813007e9ec42392d0f6d3996331987c138cc4fb8
Reviewed-on: http://gerrit.cloudera.org:8080/22743
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
An equivalent of ImpalaBeeswaxResult.schema is not implemented in
ImpylaHS2ResultSet. However, the column_labels and column_types fields are
implemented in both.
This patch removes usage of ImpalaBeeswaxResult.schema and replaces it
with either the column_labels or column_types field. Tests that used to
access ImpalaBeeswaxResult.schema are migrated to use the HS2 protocol by
default. Also fixes flake8 issues in the modified test files.
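A hedged before/after of such a migrated assertion (the column name and
type below are purely illustrative):

  def check_first_column(result):
      # Before (Beeswax-only, no longer used):
      #     assert result.schema.fieldSchemas[0].name == "id"
      # After: works against result objects from either client, since both
      # expose column_labels and column_types.
      assert result.column_labels[0].lower() == "id"
      assert result.column_types[0].upper() == "INT"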
Testing:
Run and pass modified test files in exhaustive exploration.
Change-Id: I060fe2d3cded1470fd09b86675cb22442c19fbee
Reviewed-on: http://gerrit.cloudera.org:8080/22776
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated python code, except that it
uses the impala_thrift_gen package rather than impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.
This patches all of the thrift files to add the Python namespace, and
includes code to apply the same patching to the third-party thrift
files (hive_metastore.thrift, fb303.thrift).
Putting all the generated Python into a package makes it easier to
understand where the imports are getting code from. When the subsequent
change rearranges the shell code, the thrift-generated code can stay in a
separate directory.
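The effect on imports, roughly (the submodule path below is an assumption
used only for illustration; the impala_thrift_gen package name is the real
one):

  # Old, top-level style (generated modules sat directly on the Python path):
  #     from TCLIService import TCLIService
  # New, package-qualified style:
  from impala_thrift_gen.TCLIService import TCLIService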
This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.
Testing:
- Ran a core job
Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
When enable_insert_events is set to true (the default), Impala fires HMS
INSERT events for each INSERT statement. Preparing the data of the
InsertEvents actually takes time since it fetches checksums of all the
new files. This patch adds a catalog timeline item to reveal this step.
Before this patch, the duration of "Got Metastore client" before "Fired
Metastore events" could be long:
Catalog Server Operation: 65.762ms
- Got catalog version read lock: 12.724us (12.724us)
- Got catalog version write lock and table write lock: 224.572us (211.848us)
- Got Metastore client: 418.346us (193.774us)
- Got Metastore client: 29.001ms (28.583ms) <---- Unexpected long
- Fired Metastore events: 52.665ms (23.663ms)
After this patch, the timeline shows that what actually takes the time is
"Prepared InsertEvent data":
Catalog Server Operation: 61.597ms
- Got catalog version read lock: 7.129us (7.129us)
- Got catalog version write lock and table write lock: 114.476us (107.347us)
- Got Metastore client: 200.040us (85.564us)
- Prepared InsertEvent data: 25.335ms (25.135ms)
- Got Metastore client: 25.342ms (7.009us)
- Fired Metastore events: 46.625ms (21.283ms)
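A rough sketch of how the new timeline item could be checked from a query
profile in an e2e test (the helper and attribute names are assumptions):

  def assert_insert_event_timeline_item(client, table_name):
      result = client.execute("insert into %s values (1)" % table_name)
      # The 'runtime_profile' attribute is an assumed accessor for this sketch.
      assert "Prepared InsertEvent data" in result.runtime_profile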
Tests:
- Added e2e test
Change-Id: Iaef1cae7e8ca1c350faae8666ab1369717736978
Reviewed-on: http://gerrit.cloudera.org:8080/22778
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TestHmsIntegration.test_change_parquet_column_type fails in exhaustive
mode due to missing int parsing introduced by IMPALA-13920.
This patch adds the missing int parsing. It also fixes flake8 issues
in test_hms_integration.py, including an unused vector fixture.
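A hypothetical illustration of the kind of parsing that was missing (the
actual code and values in test_hms_integration.py differ from this
sketch):

  # Values read back through the test harness arrive as strings, so they
  # must be parsed before being compared to integers.
  raw_value = "1"
  assert int(raw_value) == 1   # explicit int parsing instead of "1" == 1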
Testing:
Run and pass test_hms_integration.py in exhaustive mode.
Change-Id: If5fb9f96b4087e86b0ebaac7135e14b7a14936ea
Reviewed-on: http://gerrit.cloudera.org:8080/22774
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch changes the way tests validate the number of inserted rows:
from checking the Beeswax-specific result to checking the NumModifiedRows
counter in the query profile.
Removes skipping over the HS2 protocol in test_chars.py and refactors
test_date_queries.py a bit to reduce test skipping. Added HS2_TYPES in
tests that require it and fixed some flake8 issues.
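Validating the row count from the profile can be sketched like this (the
counter name comes from the text above; the exact profile formatting and
the helper are assumptions):

  import re

  def get_num_modified_rows(runtime_profile):
      """Extracts the first NumModifiedRows value found in a runtime profile string."""
      match = re.search(r"NumModifiedRows:\s*(\d+)", runtime_profile)
      return int(match.group(1)) if match else None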
Testing:
Run and pass all affected tests.
Change-Id: I96eae9967298f75b2c9e4d0662fcd4a62bf5fffc
Reviewed-on: http://gerrit.cloudera.org:8080/22770
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
Before this patch, ImpylaHS2Connection unconditionally opened a
cursor (and HS2 session) as it connected, followed by running a "SET
ALL" query to populate the default query options.
This patch changes the behavior of ImpylaHS2Connection to open the
default cursor only when a query is first issued. This helps preserve
assertions for tests that are sensitive to client connections, like
IMPALA-13925. Default query options are now parsed from a newly
instantiated TQueryOptions object rather than issuing a "SET ALL" query or
making a BeeswaxService.get_default_configuration() RPC.
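A minimal sketch of the lazy-cursor pattern (not the actual
ImpylaHS2Connection code; impyla's DB-API connection.cursor() is the only
real call assumed here):

  class LazyCursorConnection:
      """Wraps an impyla DB-API connection and opens the cursor on first use."""
      def __init__(self, conn):
          self._conn = conn      # already-established impyla connection
          self._cursor = None    # HS2 session/cursor not opened yet

      def _get_cursor(self):
          if self._cursor is None:
              self._cursor = self._conn.cursor()  # opens the HS2 session lazily
          return self._cursor

      def execute(self, sql):
          cursor = self._get_cursor()
          cursor.execute(sql)
          return cursor.fetchall()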
Fix test_query_profile_contains_query_compilation_metadata_cached_event
slightly by setting the 'sync_ddl' option because the test is flaky
without it.
Tweak test_max_hs2_sessions_per_user to run queries so that sessions
will open.
Deduplicate test cases between utc-timestamp-functions.test and
local-timestamp-functions.test. Rename TestUtcTimestampFunctions to
TestTimestampFunctions, and expand it to also test
local-timestamp-functions.test and
file-formats-with-local-tz-conversion.test. The table_format is now
constrained to 'text/none' because it is unnecessary to permute other
table formats.
Deprecate the 'use_local_tz_for_unix_timestamp_conversions' flag in favor
of the query option with the same name. Filed IMPALA-13953 to update the
documentation of the 'use_local_tz_for_unix_timestamp_conversions'
flag/option.
Testing:
Run and pass a few pytests such as:
test_admission_controller.py
test_observability.py
test_runtime_filters.py
test_session_expiration.py
test_set.py
Change-Id: I9d5e3e5c11ad386b7202431201d1a4cff46cbff5
Reviewed-on: http://gerrit.cloudera.org:8080/22731
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>