13 Commits

Author SHA1 Message Date
Joe McDonnell
c5a0ec8bdf IMPALA-11980 (part 1): Put all thrift-generated python code into the impala_thrift_gen package
This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated python code, except that it
uses the impala_thrift_gen package rather than impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.

This patches all of the thrift files to add the python namespace.
This has code to apply the patching to the thirdparty thrift
files (hive_metastore.thrift, fb303.thrift) to do the same.
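
As a rough illustration of the kind of patching described above, the sketch below adds a Thrift `namespace py` line to the text of a .thrift file. The helper name add_py_namespace, its parameters, and the `impala_thrift_gen.<module>` naming are assumptions made for this example; the actual patch may place or name the namespace differently.

```python
import re


def add_py_namespace(thrift_source, module_name, package="impala_thrift_gen"):
    """Return thrift_source with a 'namespace py <package>.<module_name>' line.

    Hypothetical helper: an existing 'namespace py' line is replaced,
    otherwise the new line is placed in front of the file contents.
    """
    namespace_line = "namespace py {0}.{1}".format(package, module_name)
    pattern = re.compile(r"^namespace\s+py\s+\S+[ \t]*$", re.MULTILINE)
    if pattern.search(thrift_source):
        return pattern.sub(namespace_line, thrift_source, count=1)
    return namespace_line + "\n" + thrift_source
```

With a namespace like this in place, generated modules would be imported through the package, which is what makes the origin of each import easy to see.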

Putting all the generated python into a package makes it easier
to understand where the imports are getting code. When the
subsequent change rearranges the shell code, the thrift generated
code can stay in a separate directory.

This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.

Testing:
 - Ran a core job

Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-15 17:03:02 +00:00
Csaba Ringhofer
f98b697c7b IMPALA-13929: Make 'functional-query' the default workload in tests
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.

All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.

The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 on why workload affected
exploration_strategy. An example for affected test is
TestCatalogHMSFailures which was skipped both in core and exhaustive
runs before this change.

get_workload() functions that return a different workload than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking these tests is out of scope in this patch.
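
The effect of moving get_workload() into the base class can be pictured with the minimal sketch below. The class names are stand-ins invented for this example (they are not the real ImpalaTestSuite hierarchy), and the use of a classmethod is an assumption.

```python
class BaseSuiteSketch(object):
    """Hypothetical stand-in for the base test suite."""

    @classmethod
    def get_workload(cls):
        # Default inherited by every suite that does not override it.
        return 'functional-query'


class FunctionalSuiteSketch(BaseSuiteSketch):
    # No override needed any more: the default already applies.
    pass


class TpchSuiteSketch(BaseSuiteSketch):
    @classmethod
    def get_workload(cls):
        # Suites with a different workload keep their explicit override.
        return 'tpch'
```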

Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-08 07:12:55 +00:00
Riza Suminto
b9c074d283 IMPALA-13926: Remove teardown in TestWorkloadManagementInitNoWait
test_start_invalid_version and test_start_unknown_version of
TestWorkloadManagementInitNoWait hit assertion error during teardown.
These tests are set up with expect_startup_fail=True.

This patch attempts to deflake these tests by removing
teardown_method() and moving wait_for_wm_idle() to the end of tests that
are not set up with expect_startup_fail=True.

Testing:
Looped TestWorkloadManagementInitNoWait 10 times; all runs passed.

Change-Id: I03c748dc100f5447820a9a77c527facb832521d6
Reviewed-on: http://gerrit.cloudera.org:8080/22725
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-03 04:50:53 +00:00
Riza Suminto
ea8f74a6ac IMPALA-13861: Standardize workload management tests
This patch standardizes tests against workload management tables
(sys.impala_query_log and sys.impala_query_live) to use a common
superclass named WorkloadManagementTestSuite. The setup_method of this
superclass waits for workload management init completion
(wait_for_wm_init_complete()), while the teardown_method waits until
impala-server.completed-queries.queued metric reaches
0 (wait_for_wm_idle()).
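
The idle wait described above amounts to polling a metric until it reaches an expected value. The sketch below shows that pattern in isolation; wait_for_value and read_metric are hypothetical names for this example and do not reflect the signature of the real wait_for_metric_value() helper.

```python
import time


def wait_for_value(read_metric, expected, timeout_s=60.0, interval_s=0.5):
    """Poll read_metric() until it returns expected; return seconds waited.

    Hypothetical helper: read_metric is any zero-argument callable that
    returns the current metric value.
    """
    start = time.monotonic()
    while True:
        value = read_metric()
        elapsed = time.monotonic() - start
        if value == expected:
            # Report the total wait time, which helps when tuning timeouts.
            print("metric reached {0} after {1:.2f}s".format(expected, elapsed))
            return elapsed
        if elapsed >= timeout_s:
            raise AssertionError(
                "metric is {0}, expected {1} after {2:.2f}s".format(
                    value, expected, elapsed))
        time.sleep(interval_s)
```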

test_query_log.py and test_workload_mgmt_sql_details.py are refactored
to extend from WorkloadManagementTestSuite. Tests to assert the query
log table flush behavior are grouped together in TestQueryLogTableFlush.
test_workload_mgmt_sql_details.py::TestWorkloadManagementSQLDetails now
uses 1 minicluster instance for all tests.

test_workload_mgmt_init.py does not extend from
WorkloadManagementTestSuite because it tests cluster start and
restart scenarios. This patch only adds wait_for_wm_idle() in
teardown_method where it makes sense to do so.

test_query_live.py does not extend from WorkloadManagementTestSuite
because most of its test methods require a long
--query_log_write_interval_s so that DML queries from the workload
management worker do not disturb sys.impala_query_live.

The workload_mgmt parameter in CustomClusterTestSuite.with_args() is
standardized to set up appropriate default flags in cluster_setup()
rather than passing it down to _start_impala_cluster():
IMPALAD_ARGS
  --enable_workload_mgmt=true --query_log_write_interval_s=1 \
  --shutdown_grace_period_s=0 --shutdown_deadline_s=60
and CATALOGD_ARGS
  --enable_workload_mgmt=true

Note that the IMPALAD_ARGS and CATALOGD_ARGS flags added by the
workload_mgmt and impalad_graceful_shutdown parameters can still be
overridden with different values by explicitly adding them in the
impalad_args and catalogd_args parameters. Setting workload_mgmt=True
now automatically enables graceful shutdown for the test. Thus,
impalad_graceful_shutdown=True is now removed.
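
One way to picture the override behavior is a merge in which explicitly passed flags replace defaults of the same name. The sketch below is only an illustration: merge_flags is a hypothetical helper, and the real test framework may resolve duplicates differently (for example, by appending the explicit flags and letting the last occurrence win). It relies on dict insertion order, so it assumes Python 3.7 or later.

```python
# Default impalad flags listed in the commit message for workload_mgmt=True.
WORKLOAD_MGMT_IMPALAD_DEFAULTS = [
    ("enable_workload_mgmt", "true"),
    ("query_log_write_interval_s", "1"),
    ("shutdown_grace_period_s", "0"),
    ("shutdown_deadline_s", "60"),
]


def merge_flags(default_flags, explicit_args):
    """Merge default (name, value) pairs with an explicit flag string.

    Hypothetical helper: a flag named in explicit_args replaces the default
    with the same name; unknown flags are appended.
    """
    merged = dict(default_flags)
    for token in explicit_args.split():
        name, sep, value = token.lstrip("-").partition("=")
        merged[name] = value if sep else None
    parts = []
    for name, value in merged.items():
        if value is None:
            parts.append("--" + name)
        else:
            parts.append("--{0}={1}".format(name, value))
    return " ".join(parts)
```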

With the beeswax protocol deprecated, this patch also changes the
protocol under test from beeswax to hs2. TestQueryLogTableBeeswax is now
renamed to TestQueryLogTableBasic.

Additionally, print total wait time in wait_for_metric_value().

Testing:
- Run modified tests and pass.

Change-Id: Iecf6452fa963304e263805ebeb017c843d17dd16
Reviewed-on: http://gerrit.cloudera.org:8080/22617
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-21 22:31:11 +00:00
jasonmfehr
b3b2dbaca3 IMPALA-13772: Fix Workload Management DMLs Timeouts
The insert DMLs executed by workload management to add rows to the
completed queries Iceberg table time out after 10 seconds because
that is the default FETCH_ROWS_TIMEOUT_MS value. If the DML queues up
in admission control, this timeout will quickly cause the DML to be
cancelled. The fix is to set the FETCH_ROWS_TIMEOUT_MS query option
to 0 for the workload management insert DMLs.

Even though the workload management DMLs do not retrieve any rows,
the FETCH_ROWS_TIMEOUT_MS value still applies because the internal
server functions call into the client request state's
ExecQueryOrDmlRequest() function which starts query execution and
immediately returns. Then, the BlockOnWait function in
impala-server.cc is called. This function times out based on the
FETCH_ROWS_TIMEOUT_MS value.

A new coordinator startup flag 'query_log_dml_exec_timeout_s' is
added to specify the EXEC_TIME_LIMIT_S query option on the workload
management insert DML statements. This flag ensures the DMLs will
time out if they do not complete in a reasonable timeframe.

While adding the new coordinator startup flag, a bug in the
internal-server code was discovered. This bug caused a return status
of 'ok' even when the query exec time limit was reached and the query
cancelled. This bug has also been fixed.

Testing:
  1. Added new custom cluster test that simulates a busy cluster where
       the workload management DML queues for longer than 10 seconds.
  2. Existing tests in test_query_log and test_admission_controller
       passed.
  3. One internal-server-test ctest was modified to assert for a
       returned status of error when a query is cancelled.
  4. Added a new custom cluster test that asserts the workload
       management DML is cancelled based on the value of the new
       coordinator startup flag.

Change-Id: I0cc7fbce40eadfb253d8cff5cbb83e2ad63a979f
Reviewed-on: http://gerrit.cloudera.org:8080/22511
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-26 03:12:31 +00:00
Andrew Sherman
35c6a0b76d IMPALA-13726 Add admission control slots to /queries page in webui
When tracking resource utilization it is useful to see how many
admission control slots are being used by queries. Add the slots used
by coordinator and executors to the webui queries tables. For
implementation reasons this entails also adding these fields to the
query history and live query tables.

The executor admission control slots are calculated by looking at a
single executor backend. In theory this single number could be
misleading but in practice queries are expected to have symmetrical
slots across executors.

Bump the schema number for the query history schema, and add some new
tests.

Change-Id: I057493b7767902a417dfeb75cdaeffd452d66789
Reviewed-on: http://gerrit.cloudera.org:8080/22443
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-05 03:53:18 +00:00
Riza Suminto
3005092332 IMPALA-13668: Add default_test_protocol parameter to py.test
ImpalaTestSuite.client is always initialized as a beeswax client, and
many tests use it directly rather than going through helper methods such
as execute_query().

This patch adds a default_test_protocol parameter to conftest.py. It
controls whether ImpalaTestSuite.client is initialized to
'beeswax_client', 'hs2_client', or 'hs2_http_client'. This parameter
still defaults to 'beeswax'.
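
The selection described above can be sketched as a lookup from protocol name to client attribute. Everything here is illustrative: SuiteSketch is a stand-in class, the string placeholders replace real client objects, and the protocol names 'hs2' and 'hs2-http' are assumptions (the message only confirms 'beeswax' as the default).

```python
# Assumed mapping from protocol name to the attribute holding that client.
CLIENT_ATTR_BY_PROTOCOL = {
    'beeswax': 'beeswax_client',
    'hs2': 'hs2_client',
    'hs2-http': 'hs2_http_client',
}


class SuiteSketch(object):
    """Hypothetical stand-in for a test suite holding one client per protocol."""

    def __init__(self, default_test_protocol='beeswax'):
        # Placeholders; a real suite would hold live connections here.
        self.beeswax_client = 'beeswax connection'
        self.hs2_client = 'hs2 connection'
        self.hs2_http_client = 'hs2-http connection'
        # self.client aliases whichever client the parameter selects.
        attr = CLIENT_ATTR_BY_PROTOCOL[default_test_protocol]
        self.client = getattr(self, attr)
```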

This patch also adds the helper methods
'default_client_protocol_dimension',
'beeswax_client_protocol_dimension', and 'hs2_client_protocol_dimension'
for convenience and traceability.

It also reduces the number of places where test methods manually
override ImpalaTestSuite.client. They are replaced by a combination of
ImpalaTestSuite.create_impala_clients and
ImpalaTestSuite.close_impala_clients.

Testing:
- Pass core tests.

Change-Id: I9165ea220b2c83ca36d6e68ef3b88b128310af23
Reviewed-on: http://gerrit.cloudera.org:8080/22336
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-24 12:19:02 +00:00
jasonmfehr
490f90c65e IMPALA-13536: Fix Workload Management Init Tests Issues
Several problems with the workload management code and
test_workload_mgmt_init.py tests have been uncovered by the Ozone
tests.

* test_create_on_version_1_0_0 - Test comment said it ran on 10
      nodes, test configuration specified 1 node. Fix was to modify
      the test configuration.
* test_create_on_version_1_1_0 - Test comment said it ran on 10
      nodes, test configuration specified 1 node. Fix was to modify
      the test configuration.
* test_invalid_* - All four of these tests run the same internal
      function to execute the test. This internal function was not
      waiting long enough for the expected failure to appear. The
      fixed internal function waits longer for the expected failure.

Additionally, the @CustomClusterTestSuite annotation has a new option
named 'log_symlinks', which, if set to True will resolve all daemon
log symlinks and output their actual paths to the log. Failed tests
can then be easily traced to the exact log files for that test.
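
Resolving log symlinks to their real paths can be done with the standard library alone. The sketch below is a minimal illustration, not the code behind 'log_symlinks'; the function name resolve_log_symlinks and the idea of returning a dict are assumptions for this example.

```python
import os


def resolve_log_symlinks(log_dir):
    """Map each symlink name in log_dir to the real path it points to.

    Hypothetical helper: regular files are skipped, only symlinks are
    reported.
    """
    resolved = {}
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if os.path.islink(path):
            resolved[name] = os.path.realpath(path)
    return resolved
```

Printing the returned mapping at test start would tie a generic name such as a daemon's INFO link to the specific file written during that test.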

The existing workload management tests in testdata have been expanded
to also assert the expected table properties are present.

Modified tests passed on Ozone builds both with and without erasure
coding enabled.

Change-Id: Ie3f34088d1d925f30abb63471387e6fdb62b95a7
Reviewed-on: http://gerrit.cloudera.org:8080/22119
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-12-11 01:01:46 +00:00
jasonmfehr
41c145f5ad IMPALA-13536: Fix Workload Management Init with Catalog HA
When running an Impala cluster with catalogd HA enabled, the standby
catalogd would go into a loop waiting for the first catalog update to
arrive, repeatedly logging the same error and never joining the server
thread defined in catalogd-main.cc.

Before this patch, when the standby daemon became active, the first
catalogd update was finally received, and the workload management
initialization process ran a second time in the newly active daemon
because this daemon saw that it was active.

This patch modifies the catalogd workload management initialization
code so it waits until the active catalogd has been determined. At
that point, the standby daemon skips workload management
initialization while the active daemon runs it after it receives the
first catalog update.

Testing was accomplished by modifying the workload management
initialization custom cluster tests to assert that the init process
is not re-run when a catalogd switches from standby to active and
also to remove the assumption that the first catalogd was active. The
test_catalog_ha test was deleted since all its assertions are handled
by the setup_method of the new TestWorkloadManagementCatalogHA class.

Ozone tests with and without erasure coding were also run and passed.

Change-Id: Id3797a0a9cf0b8ae844d9b7d46b607d93824f69a
Reviewed-on: http://gerrit.cloudera.org:8080/22118
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-12-10 19:23:18 +00:00
Jason Fehr
9e05ffcaaf IMPALA-13505: Fix NPE in Calcite Planner
Fixes the NullPointerException occurring when using the Calcite
planner with
test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q8.
The NPE was thrown from the Planner where it generates the list of
columns in the query for use in the profile and workload management.

Testing was accomplished by manually running the impacted test
and by adding a new custom cluster test that replicates the failing test.

Change-Id: I4d282120e596fd39a569d1ce9b25024f4f174dd0
Reviewed-on: http://gerrit.cloudera.org:8080/22033
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-07 03:11:47 +00:00
Jason Fehr
a46625a3c0 IMPALA-12737: (addendum) Turn Off Log Buffering in Workload Management Init Tests
Fixes the issue where custom cluster workload management tests do not
disable glog log buffering in tests that wait for specific messages
to be logged from the coordinators and catalogs.

By default, logs are buffered up to 30 seconds. This buffering can
cause unnecessary test slowness while the tests wait longer than
needed for the expected log message to be flushed and can also cause
flakiness where the tests do not find the expected log message before
the timeout expires.

Change-Id: I03ac0f0f00c93fe785db131278a706e3f5e975c2
Reviewed-on: http://gerrit.cloudera.org:8080/22021
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-05 23:16:20 +00:00
Riza Suminto
95f353ac4a IMPALA-13507: Allow disabling glog buffering via with_args fixture
We have plenty of custom_cluster tests that assert against the content
of Impala daemon log files while the process is still running, using
assert_log_contains() and its wrappers. The method specifically mentions
disabling glog buffering ('-logbuflevel=-1'), but not all
custom_cluster tests do that. This often results in flaky tests that are
hard to triage and are often neglected if they do not run frequently in
core exploration.

This patch adds a boolean parameter 'disable_log_buffering' to
CustomClusterTestSuite.with_args so a test can declare its intention to
inspect log files in a live minicluster. If it is True, the minicluster
is started with '-logbuflevel=-1' for all daemons. If it is False, a
WARNING is logged on any call to assert_log_contains().
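
The two branches described above can be sketched as follows. This is a simplified illustration under stated assumptions: build_daemon_args and log_assertion_is_reliable are hypothetical helpers, not functions from CustomClusterTestSuite; only the '-logbuflevel=-1' flag comes from the commit message.

```python
import logging

LOG = logging.getLogger("custom_cluster_sketch")


def build_daemon_args(base_args, disable_log_buffering):
    """Append the glog flag that turns off buffering when requested."""
    if disable_log_buffering:
        return (base_args + " -logbuflevel=-1").strip()
    return base_args


def log_assertion_is_reliable(disable_log_buffering):
    """Warn when a log assertion runs against a cluster with buffered logs."""
    if not disable_log_buffering:
        LOG.warning("Asserting on daemon logs while glog buffering is "
                    "enabled; expected messages may not be flushed yet.")
        return False
    return True
```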

Several complex custom_cluster tests are left unchanged and will
print such WARNING logs, including:
- TestQueryLive
- TestQueryLogTableBeeswax
- TestQueryLogOtherTable
- TestQueryLogTableHS2
- TestQueryLogTableAll
- TestQueryLogTableBufferPool
- TestStatestoreRpcErrors
- TestWorkloadManagementInitWait
- TestWorkloadManagementSQLDetails

This patch also fixes some small flake8 issues in the modified tests.

There is a sign of flakiness in test_query_live.py, where a test query
is submitted to the coordinator and fails because the
sys.impala_query_live table does not exist yet from the coordinator's
perspective. This patch modifies test_query_live.py to wait a few
seconds until sys.impala_query_live is queryable.

Testing:
- Pass custom_cluster tests in exhaustive exploration.

Change-Id: I56fb1746b8f3cea9f3db3514a86a526dffb44a61
Reviewed-on: http://gerrit.cloudera.org:8080/22015
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-05 04:49:05 +00:00
jasonmfehr
7b6ccc644b IMPALA-12737: Query columns in workload management tables.
Adds "Select Columns", "Where Columns", "Join Columns", "Aggregate
Columns", and "OrderBy Columns" to the query profile and the workload
management active/completed queries tables. These fields are
presented as comma-separated strings containing the fully qualified
column name in the format database.table_name.column_name. Aggregate
columns include all columns in the order by and having clauses.
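
The field format described above can be illustrated with a small sketch. format_columns is a hypothetical helper for this example, and the sample table and column names are made up; how the real code orders or de-duplicates columns is not stated in the message.

```python
def format_columns(columns):
    """Join (database, table, column) tuples into one comma-separated string.

    Each entry is rendered as database.table_name.column_name.
    """
    return ",".join(
        "{0}.{1}.{2}".format(db, table, column)
        for db, table, column in columns)
```

For instance, the pair `("tpch", "orders", "o_orderkey")` and `("tpch", "lineitem", "l_orderkey")` would be rendered as `tpch.orders.o_orderkey,tpch.lineitem.l_orderkey`.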

Since new columns are being added, the workload management init
process is also being modified to allow for one-way upgrades of the
table schemas if necessary.  Additionally, workload management can be
set up to run under a schema version that is not the latest. This
ability will be useful during troubleshooting. To enable these
upgrades, the workload management initialization that manages the
structure of the tables has been moved to the catalogd.

The changes in this patch must be backwards compatible so that Impala
clusters running previous workload management code can co-exist with
Impala clusters running this workload management code. To enable that
backwards compatibility, a new table property named
'wm_schema_version' is now used to track the schema version of the
workload management tables. Thus, the old property 'schema_version'
will always be set to '1.0.0' since modifying that property value
causes Impala running previous workload management code to error at
startup.

Testing accomplished by
* Adding/updating workload and custom cluster tests to assert the new
  columns and the workload management upgrade process.
* JUnit tests added to verify the new workload management columns are
  being correctly parsed.
* GTests added to ensure the workload management columns are
  correctly defined and in the correct order.

Change-Id: I78f3670b067c0c192ee8a212fba95466fbcb51d7
Reviewed-on: http://gerrit.cloudera.org:8080/21142
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2024-10-31 17:06:43 +00:00