Commit Graph

10 Commits

Author SHA1 Message Date
Joe McDonnell
c5a0ec8bdf IMPALA-11980 (part 1): Put all thrift-generated python code into the impala_thrift_gen package
This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated python code, except that it
uses the impala_thrift_gen package rather than impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.

This patches all of the thrift files to add the python namespace.
This has code to apply the patching to the thirdparty thrift
files (hive_metastore.thrift, fb303.thrift) to do the same.

Putting all the generated python into a package makes it easier
to understand where the imports are getting code. When the
subsequent change rearranges the shell code, the thrift generated
code can stay in a separate directory.

This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.

Testing:
 - Ran a core job

Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-15 17:03:02 +00:00
Csaba Ringhofer
f98b697c7b IMPALA-13929: Make 'functional-query' the default workload in tests
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.

All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.

The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 on why workload affected
exploration_strategy. An example for affected test is
TestCatalogHMSFailures which was skipped both in core and exhaustive
runs before this change.

get_workload() functions that return a different workload than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking these tests is out of scope in this patch.

Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-08 07:12:55 +00:00
Riza Suminto
95f353ac4a IMPALA-13507: Allow disabling glog buffering via with_args fixture
We have plenty of custom_cluster tests that assert against content of
Impala daemon log files while the process is still running using
assert_log_contains() and it's wrappers. The method specifically mention
about disabling glog buffering ('-logbuflevel=-1'), but not all
custom_cluster tests do that. This often result in flaky test that hard
to triage and often neglected if it does not frequently run in core
exploration.

This patch adds boolean param 'disable_log_buffering' into
CustomClusterTestSuite.with_args for test to declare intention to
inspect log files in live minicluster. If it is True, start minicluster
with '-logbuflevel=-1' for all daemons. If it is False, log WARNING on
any calls to assert_log_contains().

There are several complex custom_cluster tests that left unchanged and
print out such WARNING logs, such as:
- TestQueryLive
- TestQueryLogTableBeeswax
- TestQueryLogOtherTable
- TestQueryLogTableHS2
- TestQueryLogTableAll
- TestQueryLogTableBufferPool
- TestStatestoreRpcErrors
- TestWorkloadManagementInitWait
- TestWorkloadManagementSQLDetails

This patch also fixed some small flake8 issues on modified tests.

There is a flakiness sign at test_query_live.py where test query is
submitted to coordinator and fail because sys.impala_query_live table
has not exist yet from coordinator's perspective. This patch modify
test_query_live.py to wait for few seconds until sys.impala_query_live
is queryable.

Testing:
- Pass custom_cluster tests in exhaustive exploration.

Change-Id: I56fb1746b8f3cea9f3db3514a86a526dffb44a61
Reviewed-on: http://gerrit.cloudera.org:8080/22015
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-05 04:49:05 +00:00
Michael Smith
8d4497be09 IMPALA-13252: Consistently use PrintId to print TUniqueId
Some logging formats TUniqueId inconsistently by relying on the Thrift
to_string/toString generated printers. This makes it difficult to track
a specific query through logs.

Adds operator<<(ostream, TUniqueId) to simplify logging TUniqueId
correctly, uses PrintId instead of toString in Java, and adds a verifier
to test_banned_log_messages to ensure TUniqueId is not printed in logs.

Change-Id: If01bf20a240debbbd4c0a22798045ea03f17b28e
Reviewed-on: http://gerrit.cloudera.org:8080/21606
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-07-27 04:26:44 +00:00
wzhou-code
ae95c43eda IMPALA-12286: Make CatalogD HA robust
IMPALA-12155 added support for CatalogD HA. The statestore assigns the
roles for the catalogd in the HA pair. When an active catalogd is
elected, statestore sends RPCs to coordinators and catalogds to notify
the new active catalogd. But the RPCs could fail due to network
conditions. To avoid missing the notification of new active catalogd,
this patch makes statestore to resend RPCs to subscribers if there are
RPC failures.

This patch also rename the metric "catalog-server.ha-active-status" to
"catalog-server.active-status" since this metric is also set for
catalogd when CatalogD HA is not enabled.

As discussed in IMPALA-12267, catalogd need to re-generate its Catalog
Service ID when it becomes active.

Testing:
 - Added unit-test cases for CatalogD HA with simulated RPC failures.
 - Passed core tests.

Change-Id: Ibdfea022031c3cc1cbaf4ad52e947720a5d5630f
Reviewed-on: http://gerrit.cloudera.org:8080/20220
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2023-07-21 07:34:32 +00:00
wzhou-code
a82830896b IMPALA-12150: Use protocol version to isolate cluster components
Some Thrift request/response structs in CatalogService were changed to
add new variables in the middle, which caused cross version
incompatibility issue for CatalogService.

Impala cluster membership is managed by the statestore. During upgrade
scenarios where different versions of Impala daemons are upgraded one
at a time, the upgraded daemons have incompatible message formats.
Even through protocol versions numbers were already defined for
Statestore and Catalog Services, they were not used. The Statestore and
Catalog server did not check the protocol version in the requests, which
allowed incompatible Impala daemons to join one cluster. This causes
unexpected query failures during rolling upgrade.

We need a way to detect this and enforce that some rules are followed:
 - Statestore refuses the registration requests from incompatible
   subscribers.
 - Catalog server refuses the requests from incompatible clients.
 - Scheduler assigns tasks to a group of compatible executors.

This patch isolate Impala daemons into separate clusters based on
protocol versions of Statestore service to prevent incompatible Impala
daemons from communicating with each other. It covers the Thrift RPC
communications between catalogd and coordinators, and communication
between statestore and its subscribers (executor, coordinators,
catalogd and admissiond). This change should work for future upgrade.

Following changes were made:
 - Bump StatestoreServiceVersion and CatalogServiceVersion to V2 for
   all requests of Statestore and Catalog services.
 - Update the request and response structs in CatalogService to ensure
   each Thrift request struct has protocol version and each Thrift
   response struct has returned status.
 - Update the request and response struct in StatestoreService to
   ensure each Thrift request struct has protocol version and each
   Thrift response struct has returned status.
 - Add subscriber type so that statestore could distinguish different
   types of subscribers.
 - Statestore checks protocol version for registration requests from
   subscribers. It refuses the requests with incompatible version.
 - Catalog server checks protocol version for Catalog service APIs, and
   returns error for requests with incompatible version.
 - Catalog daemon sends its address and the protocol version of Catalog
   service when it registers to statestore, statestore forwards the
   address and the protocol version of Catalog service to all
   subscribers during registration.
 - Add UpdateCatalogd API for StatestoreSubscriber service so that the
   coordinators could receive the address and the protocol version of
   Catalog service from statestore if the coordinators register to
   statestore before catalog daemon.
 - Add GetProtocolVersion API for Statestore service so that the
   subscribers can check the protocol version of statestore before
   calling RegisterSubscriber API.
 - Add starting flag tolerate_statestore_startup_delay. It is off by
   default. When it's enabled, the subscriber is able to tolerate
   the delay of the statestore's availability. The subscriber's
   process will not exit if it cannot register with the specified
   statestore on startup. But instead it enter into Recovery mode,
   it will loop, sleep and retry till it successfully register with
   the statestore. This flag should be enabled during rolling upgrade.

CatalogServiceVersion is defined in CatalogService.thrift. In future,
if we make non backward version compatible changes in the request or
response structures for CatalogService APIs, we need to bump the
protocol version of Catalog service.
StatestoreServiceVersion is defined in StatestoreService.thrift.
Similarly if we make non backward version compatible changes in the
request or response structures for StatestoreService APIs, we need
to bump the protocol version of Statestore service.

Message formats for KRPC communications between coordinators and
executors, and between admissiond and coordinators are defined
in proto files under common/protobuf. If we make non backward version
compatible changes in these structures, we need to bump the
protocol version of Statestore service.

Testing:
 - Added end-to-end unit tests.
 - Passed the core tests.
 - Ran manual test to verify old version of executors cannot register
   with new version of statestore, and new version of executors cannot
   register with old version of statestore.

Change-Id: If61506dab38c4d1c50419c1b3f7bc4f9ee3676bc
Reviewed-on: http://gerrit.cloudera.org:8080/19959
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-21 04:24:16 +00:00
Joe McDonnell
eb66d00f9f IMPALA-11974: Fix lazy list operators for Python 3 compatibility
Python 3 changes list operators such as range, map, and filter
to be lazy. Some code that expects the list operators to happen
immediately will fail. e.g.

Python 2:
range(0,5) == [0,1,2,3,4]
True

Python 3:
range(0,5) == [0,1,2,3,4]
False

The fix is to wrap locations with list(). i.e.

Python 3:
list(range(0,5)) == [0,1,2,3,4]
True

Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).

Most of the changes were done via these futurize fixes:
 - libfuturize.fixes.fix_xrange_with_import
 - lib2to3.fixes.fix_map
 - lib2to3.fixes.fix_filter

This eliminates the pylint warnings:
 - xrange-builtin
 - range-builtin-not-iterating
 - map-builtin-not-iterating
 - zip-builtin-not-iterating
 - filter-builtin-not-iterating
 - reduce-builtin
 - deprecated-itertools-function

Testing:
 - Ran core job

Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Michael Ho
41426077fb IMPALA-9026: Use resolved IP address for statestore subscriber
This change adds a flag (--statestore_subscriber_use_resolved_address)
which, if set to true, allows statestore subscribers to use its
resolved IP address instead of its hostname as the heartbeat
address which statestore sends heartbeats / updates to.

This flag is useful in certain situation in which the subscriber's
DNS entry may not be present for a valid reason (e.g. a Kubernetes
pod whose readiness probe returns false). An example is that there
are multiple Impala coordinators but only one of them will be active
at a time (for admission control reason) and the rest will serve
as backup. In which case, we still want the backup coordinators to
receive updates from statestore but not serve any queries.

Change-Id: Ieb8302dec0e52beb9f0b88306a51c38ff42a63a2
Reviewed-on: http://gerrit.cloudera.org:8080/14388
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-10-29 23:11:47 +00:00
Zoram Thanga
f97634fcf9 IMPALA-2642: Fix a potential deadlock in statestore
The statestored can deadlock if the number of subscribers has
reached STATESTORE_MAX_SUBSCRIBERS, because the DoSubscriberUpdate()
method calls OfferUpdate(), while holding subscribers_lock_, which
also tries to take the same lock in this situation.

Fix the issue by moving out the call to acquire subscribers_lock_ from
OfferUpdate(), and depend on the callers to take it. We also make
the maximum number of statestore subscribers a start-up time tuneable,
to allow us to test the limit more easily.

Testing: The problem is easily reproduced by lowering the value of
STATESTORE_MAX_SUBSCRIBERS to 3, and then launching a mini cluster
with 3 impalads. Without the fix, the statestored becomes completely
deadlocked.

A new EE test has been added to exercise this scenario. The test
verifies that statestored correctly rejects new subscription
requests when the limit it reached.

Change-Id: I5d49dede221ce1f50ec299643b5532c61f93f0c6
Reviewed-on: http://gerrit.cloudera.org:8080/9038
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
2018-01-31 21:27:00 +00:00