This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated Python code, except that this
change uses the impala_thrift_gen package rather than
impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.
This patches all of the Thrift files to add the Python namespace,
and includes code to apply the same patching to the thirdparty
Thrift files (hive_metastore.thrift, fb303.thrift).
Putting all of the generated Python code into a package makes it
easier to understand where imports are getting code from. When the
subsequent change rearranges the shell code, the thrift-generated
code can stay in a separate directory.
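As an illustration (the module name here is just an example, not an
exhaustive list of affected imports), imports of generated code change
roughly like this:
  # Before the change: generated module imported from the top level.
  from TCLIService import TCLIService
  # After the change: generated module imported from the new package.
  from impala_thrift_gen.TCLIService import TCLIService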
This uses isort to sort the imports of the affected Python files
with the provided .isort.cfg file, and adds an impala-isort shell
script to make isort easy to run.
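A minimal sketch of running the sorting from Python (isort 5 picks up
the nearest .isort.cfg automatically; the file path below is just an
example):
  import isort
  # Sort imports in place using the repository's .isort.cfg settings.
  isort.file("shell/impala_shell.py")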
Testing:
- Ran a core job
Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.
All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.
The behavior only changes in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 for why the workload
affects exploration_strategy(). An example of an affected test is
TestCatalogHMSFailures, which was skipped in both core and
exhaustive runs before this change.
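A minimal sketch of the new default (the class below is stripped down
to just this method; the real ImpalaTestSuite has many more members):
  class ImpalaTestSuite(object):

    @classmethod
    def get_workload(cls):
      # Default workload for all suites; suites that genuinely need a
      # different workload keep their own override.
      return 'functional-query'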
get_workload() functions that return a workload other than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking those tests is out of scope for this patch.
Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
It's a common scenario to run Impala queries after dependent
external changes are done, e.g. running COMPUTE STATS on a table after
Hive/Spark jobs ingest new data into it. Currently, it's unsafe to
run the Impala queries immediately after the Hive/Spark jobs finish
since EventProcessor might have a long lag in applying the HMS events.
Note that running REFRESH/INVALIDATE on the table can also solve the
problem, but one of the motivations of EventProcessor is to get rid of
such Impala-specific commands.
This patch adds a mechanism to let query planning wait until the
metadata is synced up. Two new query options are added:
- SYNC_HMS_EVENTS_WAIT_TIME_S configures the timeout in seconds for
waiting. It's 0 by default, which disables the waiting mechanism.
- SYNC_HMS_EVENTS_STRICT_MODE controls the behavior when we can't wait
for the metadata to be synced up, e.g. when the waiting times out or
EventProcessor is in the ERROR state. It defaults to false (non-strict
mode). In strict mode, the coordinator fails the query. In non-strict
mode, the coordinator starts planning with a warning message in the
profile (and in client output if the client consumes the get_log
results, e.g. in impala-shell).
Example usage - query the table after inserting into dynamic partitions
in Hive. We don't know which partitions were modified, so running
REFRESH in Impala is inefficient since it reloads all partitions.
hive> insert into tbl partition(p) select * from tbl2;
impala> set sync_hms_events_wait_time_s=300;
impala> select * from tbl;
With this new feature, catalogd reloads the updated partitions based
on HMS events, which is more efficient than REFRESH. The wait time can
be set to the largest event-processing lag that has been observed in
the cluster. Note that the lag of event processing is shown as the
"Lag time" on the /events page of the catalogd WebUI and as
"events-processor.lag-time" on the /metrics page. Users can monitor it
to get a sense of the lag.
Some timeline items are added to the query profile for this waiting
mechanism, e.g.
A successful wait:
Query Compilation: 937.279ms
- Synced events from Metastore: 909.162ms (909.162ms)
- Metadata of all 1 tables cached: 911.005ms (1.843ms)
- Analysis finished: 919.600ms (8.595ms)
A failed wait:
Query Compilation: 1s321ms
- Continuing without syncing Metastore events: 40.883ms (40.883ms)
- Metadata load started: 41.618ms (735.633us)
Added a histogram metric, impala-server.wait-for-hms-event-durations-ms,
to track the duration of this waiting.
--------
Implementation
A new catalogd RPC, WaitForHmsEvent, is added to the CatalogService API
so that the coordinator can wait until catalogd has processed the
latest event at the time this RPC is triggered. Query planning starts
or fails after this RPC returns. The RPC request contains the potential
dbs/tables that are required by the query. Catalogd records the latest
event id when it receives the RPC. Once the last synced event id
reaches that id, catalogd returns the catalog updates to the
coordinator in the RPC response. Until then, the RPC thread stays in a
waiting loop that sleeps for a configurable interval, controlled by a
hidden flag, hms_event_sync_sleep_interval_ms (defaults to 100).
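The waiting loop itself lives in catalogd's Java code; the Python-style
sketch below only illustrates the idea, and all names in it are made up
for illustration:
  import time

  def wait_for_hms_event_sync(get_last_synced_event_id, target_event_id,
                              timeout_s, sleep_interval_ms=100):
    # Poll until the last synced event id reaches the target recorded
    # when the RPC arrived, or give up once the client's timeout
    # expires. The caller then fails the query (strict mode) or adds a
    # warning and continues (non-strict mode).
    deadline = time.time() + timeout_s
    while get_last_synced_event_id() < target_event_id:
      if time.time() >= deadline:
        return False
      time.sleep(sleep_interval_ms / 1000.0)
    return True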
Entry-point functions
- Frontend#waitForHmsEvents()
- CatalogServiceCatalog#waitForHmsEvent()
Some statements don't need to wait for HMS events, e.g. CREATE/DROP
ROLE statements. This patch adds an overridable method,
requiresHmsMetadata(), in each Statement to mark whether it can skip
the HMS event sync.
Test side changes:
- Some test code uses EventProcessorUtils.wait_for_event_processing()
to wait for HMS events to be synced up before running a query. Such
tests are updated to just use the new query options in the query, as
sketched below.
- Note that we still need wait_for_event_processing() in tests that
verify metrics after HMS events are synced up.
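A rough sketch of the test-side pattern (table name, option value and
helper signature are illustrative):
  # Instead of calling EventProcessorUtils.wait_for_event_processing(),
  # let planning wait for the HMS events of the queried table.
  self.execute_query(
      "select count(*) from {0}".format(test_tbl),
      query_options={"sync_hms_events_wait_time_s": "30"})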
--------
Limitation
Currently, UPDATE_TBL_COL_STAT_EVENT, UPDATE_PART_COL_STAT_EVENT and
OPEN_TXN events are ignored by the event processor. If the latest event
happens to be of one of these types and no further events arrive, the
last synced event id can never reach the latest event id. The last
synced event id needs to be fixed to also account for ignored events
(IMPALA-13623).
The current implementation waits for the event id captured when the
WaitForHmsEvent RPC is received on the catalogd side. We can improve
this by leveraging HIVE-27499 to efficiently detect whether the given
dbs/tables have unsynced events and wait only for the *largest* event
id among them. Dbs/tables without unsynced events don't need to block
query planning. However, this only works for non-transactional tables.
Transactional tables might be modified by COMMIT_TXN or ABORT_TXN
events, which don't carry the table names. So even with HIVE-27499, we
can't determine whether a transactional table has pending events.
IMPALA-13684 will target improving this for non-transactional tables.
Tests
- Add a test to verify that planning waits until catalogd is synced
with the HMS changes.
- Add a test for the error handling when HMS event processing is
disabled
Change-Id: I36ac941bb2c2217b09fcfa2eb567b011b38efa2a
Reviewed-on: http://gerrit.cloudera.org:8080/20131
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TestEventProcessingError.test_event_processor_error_global_invalidate
creates a partitioned table with partition year=2024. The test expects
the output of DESCRIBE FORMATTED to contain the string "2024". It
happened to work in 2024 only because the "CreateTime" field in the
output contained "2024" in its timestamp. Now that we are in 2025, the
test fails permanently.
This fixes the test to check the output of SHOW PARTITIONS instead.
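A hedged sketch of the new check (helper names follow the common test
client API; the actual assertion may differ):
  result = self.client.execute("show partitions {0}".format(tbl_name))
  # The partition key value year=2024 is stable, unlike CreateTime.
  assert "2024" in result.get_data()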
Tests
- Verified the test locally.
Change-Id: I0b17fd1f90a9bc00d027527661ff675e61ba0b1a
Reviewed-on: http://gerrit.cloudera.org:8080/22287
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
CI failed for Ozone and S3 builds due to 5 test failures in
TestEventProcessingError. Those tests run INSERT and ANALYZE TABLE
COMPUTE STATISTICS queries on Hive and failed to start the Tez client.
Those tests were introduced with IMPALA-12832.
Testing:
- Executed end to end tests
Change-Id: Idb8422fbb494cd74def5ff6926aea82d0981cc82
Reviewed-on: http://gerrit.cloudera.org:8080/21172
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The event processor goes to the error state before it attempts the
global invalidate. It remains in the error state only for a very short
period of time. If wait_for_synced_event_id() obtains the event
processor status during this period, it can see the status as ERROR.
This test was introduced with IMPALA-12832.
Testing:
- Tested manually. Added a sleep in the code for testing so that the
event processor remains in the error state a little longer.
Change-Id: I787cff4cc9f9df345cd715c02b51b8d93a150edf
Reviewed-on: http://gerrit.cloudera.org:8080/21169
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
At present, a failure in event processing requires a manual INVALIDATE
METADATA. This patch implicitly invalidates the table upon failures in
processing table events when the new
'invalidate_metadata_on_event_processing_failure' flag is set. A new
'invalidate_global_metadata_on_event_processing_failure' flag is also
added to automatically run a global INVALIDATE METADATA when the event
processor goes to a non-active state.
Note: Also introduced a config,
'inject_process_event_failure_event_types', for automated tests to
simulate event processor failures. This config specifies which event
types can be intentionally failed and should only be used for testing
purposes. IMPALA-12851 is needed as a prerequisite.
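For illustration only (the event type, flag values and test name are
examples, not the exact test setup), the failure injection can be
combined with the new flags in a custom cluster test roughly like this:
  @CustomClusterTestSuite.with_args(
      catalogd_args="--inject_process_event_failure_event_types=ALTER_TABLE "
                    "--invalidate_metadata_on_event_processing_failure=true")
  def test_failure_triggers_table_invalidate(self, unique_database):
    # Trigger an ALTER TABLE event from Hive, let its processing fail
    # via the injected failure, then verify the table is implicitly
    # invalidated and the event processor stays active.
    ...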
Testing:
- Added end-to-end tests that mimic failures in the event processor and
verified that the event processor stays active
- Added a unit test to verify the 'auto_global_invalidate_metadata'
config
- Passed FE tests
Co-Authored-by: Sai Hemanth Gantasala <saihemanth@cloudera.com>
Change-Id: Ia67fc04c995802d3b6b56f79564bf0954b012c6c
Reviewed-on: http://gerrit.cloudera.org:8080/21065
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>