
11 Commits

Author SHA1 Message Date
Riza Suminto
55feffb41b IMPALA-13850 (part 1): Wait until CatalogD active before resetting
In HA mode, CatalogD initialization can fail to complete within a
reasonable time. Log messages showed that CatalogD was blocked trying to
acquire "CatalogServer.catalog_lock_" when calling
CatalogServer::UpdateActiveCatalogd() during statestore subscriber
registration. catalog_lock_ was held by GatherCatalogUpdatesThread,
which was calling GetCatalogDelta(); that call was waiting for the Java
lock versionLock_, which was held by the thread running
CatalogServiceCatalog.reset().

This patch removes the catalog reset from the JniCatalog constructor. In
turn, catalogd-server.cc is now responsible for triggering the metadata
reset (Invalidate Metadata) only if:

1. It is the active CatalogD, and
2. The gathering thread has collected the first topic update, or CatalogD
   is set with a catalog_topic_mode other than "minimal".

The latter prerequisite ensures that no coordinator is blocked waiting
for a full topic update in on-demand metadata mode. This is all managed
by a new thread method, TriggerResetMetadata, that monitors and triggers
the initial metadata reset.
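The trigger condition above can be sketched in Python as a predicate plus a polling loop. This is a minimal sketch, not the C++ TriggerResetMetadata: the helper names, the get_state/reset callbacks, and the polling interval are all hypothetical.

```python
import threading


def should_trigger_reset(is_active, first_update_collected, catalog_topic_mode):
    """Return True once the initial metadata reset may start."""
    if not is_active:
        # Only the active CatalogD resets metadata.
        return False
    # In "minimal" (on-demand) mode, wait for the first topic update so
    # coordinators are not left waiting for a full topic update.
    return first_update_collected or catalog_topic_mode != "minimal"


def trigger_reset_metadata(get_state, reset, stop, poll_interval_s=0.1):
    """Poll get_state() until the reset may start, then call reset() once.

    get_state() returns (is_active, first_update_collected, topic_mode);
    stop is a threading.Event that ends the loop early.
    Returns True if reset() ran, False if stopped first.
    """
    while not stop.is_set():
        if should_trigger_reset(*get_state()):
            reset()
            return True
        stop.wait(poll_interval_s)
    return False


if __name__ == "__main__":
    done = threading.Event()
    worker = threading.Thread(
        target=trigger_reset_metadata,
        args=(lambda: (True, False, "full"), lambda: print("reset"), done))
    worker.start()
    worker.join()
```

Running the loop on its own thread keeps the caller (here, the main thread) free, which is the property the patch needs: subscriber registration must not wait on the reset.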

Note that this is a behavior change in on-demand catalog
mode (catalog_topic_mode=minimal). Previously, on-demand catalog mode
would send the full database list in its first catalog topic update. This
behavior change is OK since coordinators can request metadata on demand.

After this patch, catalog-server.active-status and the /healthz page can
turn into true and OK, respectively, even if the very first metadata
reset is still ongoing. Observers that care about fully populated
metadata should check other metrics such as catalog.num-db,
catalog.num-tables, or the /catalog page content.

Updated the start-impala-cluster.py readiness check to wait until
coordinators see at least 1 table, except during create-load-data.sh
execution (there are no tables yet) and when use_local_catalog=true (the
local catalog cache does not start with any table). Startup flag
checking now reads the daemon's '/varz?json' page instead of its actual
command-line args. Cleaned up impala_service.py to fix some flake8
issues.
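A minimal sketch of the readiness rule described above, assuming the daemon flags have already been fetched from '/varz?json' and parsed into a dict (the page's JSON layout is not shown here). min_tables_expected() and is_ready() are hypothetical names, not functions from start-impala-cluster.py.

```python
def min_tables_expected(flags, loading_data=False):
    """Minimum table count a coordinator must report before it is ready.

    flags: dict of startup flags (assumed already parsed from /varz?json).
    loading_data: True while create-load-data.sh is running.
    """
    if loading_data:
        # No tables exist yet during data loading.
        return 0
    if str(flags.get("use_local_catalog", "false")).lower() == "true":
        # The local catalog cache does not start with any table.
        return 0
    return 1


def is_ready(num_tables, flags, loading_data=False):
    # num_tables would come from a metric such as catalog.num-tables.
    return num_tables >= min_tables_expected(flags, loading_data)
```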

Slightly updated TestLocalCatalogCompactUpdates::test_restart_catalogd so
that the unique_database cleanup succeeds.

Testing:
- Refactor test_catalogd_ha.py to reduce repeated code, use
  unique_database fixture, and additionally validate /healthz page of
  both active and standby catalogd. Changed it to use the hs2
  protocol by default.
- Run and pass test_catalogd_ha.py and test_concurrent_ddls.py.
- Pass core tests.

Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
Reviewed-on: http://gerrit.cloudera.org:8080/22634
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-17 01:59:54 +00:00
Riza Suminto
00dc79adf6 IMPALA-13907: Remove reference to create_beeswax_client
This patch replaces references to create_beeswax_client() with
create_hs2_client() or vector-based client creation, in preparation for
migrating tests to hs2.

test_session_expiration_with_queued_query is changed to use impala.dbapi
directly from Impyla due to a limitation in ImpylaHS2Connection.

TestAdmissionControllerRawHS2 is migrated to use hs2 as the default test
protocol.

Modified test_query_expiration.py to set query options through the
client instead of a SET query. test_query_expiration is slightly
modified due to a behavior difference in hs2 ImpylaHS2Connection.

Removed remaining references to BeeswaxConnection.QueryState.

Fixed a bug in ImpylaHS2Connection.wait_for_finished_timeout().

Fixed some easy flake8 issues caught through this command:
git show HEAD --name-only | grep '^tests.*py' \
  | xargs -I {} impala-flake8 {} \
  | grep -e U100 -e E111 -e E301 -e E302 -e E303 -e F...

Testing:
- Pass exhaustive tests.

Change-Id: I1d84251835d458cc87fb8fedfc20ee15aae18d51
Reviewed-on: http://gerrit.cloudera.org:8080/22700
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-29 18:37:45 +00:00
Riza Suminto
8324201acd IMPALA-13847: Remove beeswax-specific way to obtain query id
With IMPALA-13682 merged, obtaining the query id can be done via
ImpalaConnection.handle_id(), which works for the beeswax, hs2, and
hs2-http protocols. This patch applies that change.
ImpalaTestSuite.wait_for_progress() is refactored a bit to make the
client parameter required.

Testing:
- Run and pass the affected tests.

Change-Id: I0a2bac1011f5a0e058f88f973ac403cce12d2b86
Reviewed-on: http://gerrit.cloudera.org:8080/22606
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-12 07:14:19 +00:00
Riza Suminto
71feb617e4 IMPALA-13835: Remove reference to protocol-specific states
With IMPALA-13682 merged, checking for query state can be done via
wait_for_impala_state(), wait_for_any_impala_state(), and other helper
methods of ImpalaConnection. This patch removes all references to
protocol-specific states such as BeeswaxService.QueryState.
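The point of the protocol-agnostic helpers is that a test polls plain state names instead of protocol-specific enums. A hypothetical polling helper in that spirit (not the actual ImpalaConnection method; the state strings are placeholders) could look like:

```python
import time


def wait_for_any_state(get_state, expected_states, timeout_s, poll_s=0.05,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until it returns one of expected_states.

    get_state hides whether the client speaks beeswax, hs2 or hs2-http,
    so callers compare plain state names. clock and sleep are injectable
    to make the helper testable. Returns the observed state, or raises
    TimeoutError once timeout_s has elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in expected_states:
            return state
        if clock() >= deadline:
            raise TimeoutError("state %s not in %s after %ss"
                               % (state, sorted(expected_states), timeout_s))
        sleep(poll_s)
```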

Also fixed flake8 errors and unused variables in the modified test files.

Testing:
- Run and pass all affected tests.

Change-Id: Id6b56024fbfcea1ff005c34cd146d16e67cb6fa1
Reviewed-on: http://gerrit.cloudera.org:8080/22586
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-09 00:04:05 +00:00
Xuebin Su
242095ac8a IMPALA-13729: Accept error messages not starting with prompt
Previously, error_msg_expected() only accepted error messages starting
with the following error prompt:
```
Query <query_id> failed:\n
```
However, for some tests using the Beeswax protocol, the error prompt may
appear in the middle of the error message instead of at its beginning.

Therefore, this patch adapts error_msg_expected() to accept error
messages not starting with the error prompt.

The error_msg_expected() function is renamed to error_msg_startswith()
to better describe its behavior.
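A minimal sketch of the relaxed check, assuming Impala query ids print as two 16-digit hex numbers joined by ':'. This is a hypothetical re-implementation, not the test-suite code: it finds the prompt anywhere in the message and compares what follows it.

```python
import re

# Assumed prompt format: "Query <16 hex>:<16 hex> failed:\n".
_PROMPT_RE = re.compile(r"Query [0-9a-f]{16}:[0-9a-f]{16} failed:\n")


def error_msg_startswith(actual_msg, expected_msg=""):
    """True if the text right after the prompt starts with expected_msg.

    The prompt may appear at the start or in the middle of actual_msg.
    """
    match = _PROMPT_RE.search(actual_msg)
    if match is None:
        return False
    return actual_msg[match.end():].startswith(expected_msg)
```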

Change-Id: Iac3e68bcc36776f7fd6cc9c838dd8da9c3ecf58b
Reviewed-on: http://gerrit.cloudera.org:8080/22468
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-02-26 15:29:36 +00:00
Xuebin Su
ad868b9947 IMPALA-13115: Add query id to error messages
This patch adds the query id to the error messages in both

- the result of the `get_log()` RPC, and
- the error message in an RPC response

before they are returned to the client, so that users can easily
identify the failed queries on the client side.

To achieve this, the query id of the thread debug info is set in the
RPC handler method, and is retrieved from the thread debug info each
time the error reporting function or `get_log()` gets called.
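The per-thread query id pattern can be approximated in Python with threading.local. This is an analogy under stated assumptions, not Impala's C++ thread debug info: the hypothetical handler stores the id before doing work, and the error formatter reads it back. The "Query <query_id> failed:" prefix follows the prompt format quoted in IMPALA-13729 in this log.

```python
import threading

# Stand-in for per-thread debug info: each thread sees its own query_id.
_debug_info = threading.local()


def set_thread_query_id(query_id):
    _debug_info.query_id = query_id


def format_error(msg):
    """Prefix msg with the current thread's query id, if one is set."""
    query_id = getattr(_debug_info, "query_id", None)
    if query_id is None:
        return msg
    return "Query %s failed:\n%s" % (query_id, msg)


def handle_rpc(query_id, work):
    """Hypothetical RPC handler: record the query id first, so any error
    reported while running work() carries it."""
    set_thread_query_id(query_id)
    try:
        return work()
    except Exception as e:
        return format_error(str(e))
    finally:
        set_thread_query_id(None)
```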

Due to the change in the error message format, some checks in
impala-shell.py are adapted to keep them valid.

Testing:
- Added helper function `error_msg_expected()` to check whether an
  error message is expected. It is stricter than only using the `in`
  operator.
- Added helper function `error_msg_equal()` to check if two error
  messages are equal regardless of the query ids.
- Various test cases are adapted to match the new error message format.
- `ImpalaBeeswaxException`, which is used in tests only, is simplified
  so that it has the same error message format as the exceptions for
  HS2.
- Added an assertion to the case of killing and restarting a worker
  in the custom cluster test to ensure that the query id is in
  the error message in the client log retrieved with `get_log()`.
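A sketch of how error_msg_equal() could compare messages regardless of query ids: replace each id with a placeholder before comparing. The regex assumes the 16-hex:16-hex id format and is not taken from the Impala test code.

```python
import re

# Assumed query id format: 16 hex digits, ':', 16 hex digits.
_QUERY_ID_RE = re.compile(r"[0-9a-f]{16}:[0-9a-f]{16}")


def error_msg_equal(msg1, msg2):
    """True if the messages are identical once query ids are masked."""
    return (_QUERY_ID_RE.sub("<query_id>", msg1)
            == _QUERY_ID_RE.sub("<query_id>", msg2))
```

Masking, rather than a plain `in` check, keeps the comparison strict on everything except the id.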

Change-Id: I67e659681e36162cad1d9684189106f8eedbf092
Reviewed-on: http://gerrit.cloudera.org:8080/21587
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-08-08 14:11:04 +00:00
Joe McDonnell
eb66d00f9f IMPALA-11974: Fix lazy list operators for Python 3 compatibility
Python 3 changes list operators such as range, map, and filter
to be lazy. Some code that expects these operators to produce a list
immediately will fail. For example:

Python 2:
range(0,5) == [0,1,2,3,4]
True

Python 3:
range(0,5) == [0,1,2,3,4]
False

The fix is to wrap these call sites with list(), i.e.:

Python 3:
list(range(0,5)) == [0,1,2,3,4]
True

Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).
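A short Python 3 illustration of why the list() wrapping matters: a lazy map object can only be consumed once, and a range never compares equal to a list. The function names are illustrative only.

```python
def squares_lazy(n):
    # On Python 3, map() returns a one-shot iterator, not a list.
    return map(lambda x: x * x, range(n))


def squares_eager(n):
    # The same kind of fix as in this commit: wrap the call in list().
    return list(map(lambda x: x * x, range(n)))


lazy = squares_lazy(4)
print(list(lazy))  # [0, 1, 4, 9]
print(list(lazy))  # [] on Python 3: the iterator is already exhausted
print(list(range(0, 5)) == [0, 1, 2, 3, 4])  # True
```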

Most of the changes were done via these futurize fixes:
 - libfuturize.fixes.fix_xrange_with_import
 - lib2to3.fixes.fix_map
 - lib2to3.fixes.fix_filter

This eliminates the pylint warnings:
 - xrange-builtin
 - range-builtin-not-iterating
 - map-builtin-not-iterating
 - zip-builtin-not-iterating
 - filter-builtin-not-iterating
 - reduce-builtin
 - deprecated-itertools-function

Testing:
 - Ran core job

Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
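A minimal sketch of the two behaviors the future imports enable. Under `division`, '/' always returns a float, so code that needs an integer (an index, a record count) must switch to '//'. The helper names are hypothetical.

```python
from __future__ import absolute_import, division, print_function


def middle_index(items):
    # With true division, len(items) / 2 is a float (2.5 for 5 items) and
    # cannot be used as a list index, so use floor division instead.
    return len(items) // 2


def mean(values):
    # Here true division is the intended result: mean([1, 2]) == 1.5.
    return sum(values) / len(values)


print(middle_index([10, 20, 30, 40, 50]))  # 2
print(mean([1, 2]))  # 1.5
```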

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
wzhou-code
6bb3b88d05 IMPALA-9180 (part 1): Remove legacy ImpalaInternalService
The legacy Thrift based Impala internal service has been deprecated
and can be removed now.

This patch removes ImpalaInternalService. All infrastructure around it
is cleaned up, except for one place: the be_port flag.
StatestoreSubscriber::subscriber_id contains be_port, but we cannot
change the format of subscriber_id now. This remaining be_port issue
will be fixed in a succeeding patch (part 4).
TQueryCtx.coord_address is changed to TQueryCtx.coord_hostname since the
port in TQueryCtx.coord_address is set as be_port and is now unused.
Also renamed TQueryCtx.coord_krpc_address to TQueryCtx.coord_ip_address.

Testing:
 - Passed the exhaustive test.
 - Passed Quasar-L0 test.

Change-Id: I5fa83c8009590124dded4783f77ef70fa30119e6
Reviewed-on: http://gerrit.cloudera.org:8080/16291
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-30 22:41:00 +00:00
wzhou-code
9d43cfdaee IMPALA-5746: Cancel all queries scheduled by failed coordinators
Executors now register for cluster membership updates. When coordinators
are absent from the active cluster membership list, the executor cancels
all running fragments of the queries scheduled by those inactive
coordinators, since it cannot send results back to inactive/failed
coordinators. This lets executors quickly release the resources
allocated to those fragments.
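The executor-side decision can be sketched as a set-membership check. This is a hypothetical Python model, not the C++ executor code: running_fragments and active_coordinators are assumed inputs standing in for the executor's fragment registry and the latest membership update.

```python
def fragments_to_cancel(running_fragments, active_coordinators):
    """Return ids of fragments whose coordinator left the membership.

    running_fragments: dict mapping fragment id -> coordinator that
    scheduled its query. active_coordinators: set of coordinators in the
    latest cluster membership update. Sorted for deterministic output.
    """
    return sorted(frag_id for frag_id, coord in running_fragments.items()
                  if coord not in active_coordinators)
```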

Testing:
- Added new test case TestProcessFailures::test_kill_coordinator
  and ran it with the following command:
    ./bin/impala-py.test tests/custom_cluster/test_process_failures.py::TestProcessFailures::test_kill_coordinator \
      --exploration_strategy=exhaustive
- Passed the core test.

Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd
Reviewed-on: http://gerrit.cloudera.org:8080/16215
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-07-31 21:39:08 +00:00
Sahil Takiar
fc19e70cbc IMPALA-5534: Fix and enable experimental failure tests
Moves test_catalog_hms_failures.py and test_process_failures.py from
the experimental tests to the custom cluster tests.
catalog_service/test_hms_failure.py is combined with
custom_cluster/test_catalog_hms_failure.py as well in order to unify all
tests for HMS failures. Several modifications to the tests were
necessary to get them working again, but for the most part, the logic of
the tests remained the same. A few additional fault tolerance tests
(e.g. TestHiveMetaStoreFailure::test_hms_client_retries) were added as
well. The overall goal is to increase the process failure test coverage
for all components: impalads, statestore, catalogd, HMS, etc.

test_restart_catalogd in test_process_failures.py fails due to
IMPALA-9848, so it is skipped for now.

Testing:
* Ran new tests locally

Change-Id: I9dbb98017fb6c40cea349e7c63a35c325cbbc288
Reviewed-on: http://gerrit.cloudera.org:8080/16157
Reviewed-by: Sahil Takiar <stakiar@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-07-14 00:03:14 +00:00