6 Commits

Author SHA1 Message Date
Joe McDonnell
c5a0ec8bdf IMPALA-11980 (part 1): Put all thrift-generated python code into the impala_thrift_gen package
This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated python code, except that it
uses the impala_thrift_gen package rather than impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.

This patches all of the thrift files to add the python namespace.
This has code to apply the patching to the thirdparty thrift
files (hive_metastore.thrift, fb303.thrift) to do the same.
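
As a rough illustration of the namespace patching described above, the
sketch below inserts a "namespace py" line into thrift source text. The
helper name add_py_namespace and the per-file module naming are
assumptions made for illustration, not the code from this patch.

```python
import re

PACKAGE = "impala_thrift_gen"  # package name taken from the commit message


def add_py_namespace(thrift_source, module_name):
    """Return thrift_source with a 'namespace py' line for module_name.

    Hypothetical helper: the real patch may place or format the line
    differently.
    """
    namespace_line = "namespace py {0}.{1}".format(PACKAGE, module_name)
    lines = thrift_source.splitlines()
    # Leave the source alone if a python namespace is already declared.
    if any(re.match(r"\s*namespace\s+py\s", line) for line in lines):
        return thrift_source
    # Insert after the last existing namespace line, or at the top if
    # the file declares no namespaces at all.
    insert_at = 0
    for i, line in enumerate(lines):
        if re.match(r"\s*namespace\s+\w+\s", line):
            insert_at = i + 1
    lines.insert(insert_at, namespace_line)
    return "\n".join(lines) + "\n"
```

Calling the helper a second time on its own output returns the text
unchanged, so the patching step can be re-run safely.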

Putting all the generated python into a package makes it easier
to understand where the imports are getting code. When the
subsequent change rearranges the shell code, the thrift generated
code can stay in a separate directory.

This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.

Testing:
 - Ran a core job

Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-15 17:03:02 +00:00
Riza Suminto
ea8f74a6ac IMPALA-13861: Standardize workload management tests
This patch standardizes tests against workload management tables
(sys.impala_query_log and sys.impala_query_live) to use a common
superclass named WorkloadManagementTestSuite. The setup_method of this
superclass waits for workload management init completion
(wait_for_wm_init_complete()), while the teardown_method waits until
impala-server.completed-queries.queued metric reaches
0 (wait_for_wm_idle()).
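
The setup/teardown pattern described above can be sketched roughly as
follows. FakeCluster, wait_until, and the class name
WorkloadManagementTestSuiteSketch are hypothetical stand-ins so the
example runs on its own; only the metric name and the two wait method
names come from this patch.

```python
import time

# Metric name taken from the commit message.
QUEUED_METRIC = "impala-server.completed-queries.queued"


class FakeCluster(object):
    """Hypothetical stand-in for a running minicluster."""

    def __init__(self, queued):
        self.wm_initialized = True
        self._metrics = {QUEUED_METRIC: queued}

    def get_metric(self, name):
        # Simulate the worker draining one queued record per poll.
        value = self._metrics[name]
        self._metrics[name] = max(0, value - 1)
        return value


def wait_until(predicate, timeout_s, interval_s=0.01):
    """Poll predicate until it returns True or timeout_s elapses."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if predicate():
            return
        time.sleep(interval_s)
    raise AssertionError("condition not met within {0}s".format(timeout_s))


class WorkloadManagementTestSuiteSketch(object):
    cluster = None  # assigned by the (omitted) cluster startup code

    def wait_for_wm_init_complete(self, timeout_s=5):
        wait_until(lambda: self.cluster.wm_initialized, timeout_s)

    def wait_for_wm_idle(self, timeout_s=5):
        wait_until(
            lambda: self.cluster.get_metric(QUEUED_METRIC) == 0, timeout_s)

    def setup_method(self, method):
        # Do not start a test before workload management is initialized.
        self.wait_for_wm_init_complete()

    def teardown_method(self, method):
        # Do not tear down while completed queries are still queued.
        self.wait_for_wm_idle()
```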

test_query_log.py and test_workload_mgmt_sql_details.py are refactored
to extend from WorkloadManagementTestSuite. Tests to assert the query
log table flush behavior are grouped together in TestQueryLogTableFlush.
test_workload_mgmt_sql_details.py::TestWorkloadManagementSQLDetails now
uses 1 minicluster instance for all tests.

test_workload_mgmt_init.py does not extend from
WorkloadManagementTestSuite because it tests cluster start and
restart scenarios. This patch only adds wait_for_wm_idle() to
teardown_method where it makes sense to do so.

test_query_live.py does not extend from WorkloadManagementTestSuite
because most of its test methods require a long
--query_log_write_interval_s so that DML queries from the workload
management worker do not disturb sys.impala_query_live.

The workload_mgmt parameter in CustomClusterTestSuite.with_args() is
standardized to set up appropriate default flags in cluster_setup()
rather than passing it down to _start_impala_cluster():
IMPALAD_ARGS
  --enable_workload_mgmt=true --query_log_write_interval_s=1 \
  --shutdown_grace_period_s=0 --shutdown_deadline_s=60
and CATALOGD_ARGS
  --enable_workload_mgmt=true

Note that the IMPALAD_ARGS and CATALOGD_ARGS flags added by the
workload_mgmt and impalad_graceful_shutdown parameters can still be
overridden by explicitly setting them in the impalad_args and
catalogd_args parameters. Setting workload_mgmt=True now automatically
enables graceful shutdown for the test. Thus,
impalad_graceful_shutdown=True is now removed.
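
One way to picture the default-plus-override behavior is the sketch
below. The merge_flags helper is hypothetical and is not the code in
CustomClusterTestSuite; the default flag values are the ones listed
above.

```python
from collections import OrderedDict

WM_IMPALAD_DEFAULTS = [
    "--enable_workload_mgmt=true",
    "--query_log_write_interval_s=1",
    "--shutdown_grace_period_s=0",
    "--shutdown_deadline_s=60",
]
WM_CATALOGD_DEFAULTS = ["--enable_workload_mgmt=true"]


def merge_flags(defaults, explicit_args):
    """Merge flags so explicitly passed values override the defaults.

    Hypothetical helper: flags are keyed by name, and a later occurrence
    of the same name replaces the earlier one.
    """
    merged = OrderedDict()
    for flag in list(defaults) + explicit_args.split():
        name = flag.lstrip("-").partition("=")[0]
        merged[name] = flag
    return " ".join(merged.values())
```

For example, merge_flags(WM_IMPALAD_DEFAULTS,
"--query_log_write_interval_s=30") keeps the other three defaults and
replaces only the write interval.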

With the beeswax protocol deprecated, this patch also changes the protocol
under test from beeswax to hs2. TestQueryLogTableBeeswax is now renamed
to TestQueryLogTableBasic.

Additionally, print total wait time in wait_for_metric_value().

Testing:
- Ran the modified tests; all passed.

Change-Id: Iecf6452fa963304e263805ebeb017c843d17dd16
Reviewed-on: http://gerrit.cloudera.org:8080/22617
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-21 22:31:11 +00:00
jasonmfehr
b3b2dbaca3 IMPALA-13772: Fix Workload Management DMLs Timeouts
The insert DMLs executed by workload management to add rows to the
completed queries Iceberg table time out after 10 seconds because
that is the default FETCH_ROWS_TIMEOUT_MS value. If the DML queues up
in admission control, this timeout will quickly cause the DML to be
cancelled. The fix is to set the FETCH_ROWS_TIMEOUT_MS query option
to 0 for the workload management insert DMLs.

Even though the workload management DMLs do not retrieve any rows,
the FETCH_ROWS_TIMEOUT_MS value still applies because the internal
server functions call into the client request state's
ExecQueryOrDmlRequest() function which starts query execution and
immediately returns. Then, the BlockOnWait function in
impala-server.cc is called. This function times out based on the
FETCH_ROWS_TIMEOUT_MS value.

A new coordinator startup flag 'query_log_dml_exec_timeout_s' is
added to specify the EXEC_TIME_LIMIT_S query option on the workload
management insert DML statements. This flag ensures the DMLs will
time out if they do not complete in a reasonable timeframe.
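
The two query options discussed above can be summarized in a small
sketch. The function wm_insert_query_options is hypothetical (the real
logic lives in the coordinator's C++ code); only the option names and
the value 0 come from this commit message.

```python
def wm_insert_query_options(query_log_dml_exec_timeout_s):
    """Return the query options for a workload management insert DML.

    Hypothetical helper for illustration only.
    """
    return {
        # 0 turns off the fetch rows timeout, so a DML that queues in
        # admission control is not cancelled after the 10 second default.
        "FETCH_ROWS_TIMEOUT_MS": "0",
        # Bounds total execution time using the new startup flag's value.
        "EXEC_TIME_LIMIT_S": str(query_log_dml_exec_timeout_s),
    }
```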

While adding the new coordinator startup flag, a bug in the
internal-server code was discovered. This bug caused a return status
of 'ok' even when the query exec time limit was reached and the query
cancelled. This bug has also been fixed.

Testing:
  1. Added new custom cluster test that simulates a busy cluster where
       the workload management DML queues for longer than 10 seconds.
  2. Existing tests in test_query_log and test_admission_controller
       passed.
  3. One internal-server-test ctest was modified to assert for a
       returned status of error when a query is cancelled.
  4. Added a new custom cluster test that asserts the workload
       management DML is cancelled based on the value of the new
       coordinator startup flag.

Change-Id: I0cc7fbce40eadfb253d8cff5cbb83e2ad63a979f
Reviewed-on: http://gerrit.cloudera.org:8080/22511
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-26 03:12:31 +00:00
Jason Fehr
9e05ffcaaf IMPALA-13505: Fix NPE in Calcite Planner
Fixes the NullPointerException occurring when using the Calcite
planner with
test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q8.
The NPE was thrown from the Planner where it generates the list of
columns in the query for use in the profile and workload management.

Testing was accomplished by manually running the impacted test
and by adding a new custom cluster test that replicates the failing
test.

Change-Id: I4d282120e596fd39a569d1ce9b25024f4f174dd0
Reviewed-on: http://gerrit.cloudera.org:8080/22033
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-07 03:11:47 +00:00
Jason Fehr
a46625a3c0 IMPALA-12737: (addendum) Turn Off Log Buffering in Workload Management Init Tests
Fixes the issue where custom cluster workload management tests do not
disable glog log buffering in tests that wait for specific messages
to be logged from the coordinators and catalogs.

By default, logs are buffered up to 30 seconds. This buffering can
cause unnecessary test slowness while the tests wait longer than
needed for the expected log message to be flushed and can also cause
flakiness where the tests do not find the expected log message before
the timeout expires.

Change-Id: I03ac0f0f00c93fe785db131278a706e3f5e975c2
Reviewed-on: http://gerrit.cloudera.org:8080/22021
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-05 23:16:20 +00:00
jasonmfehr
7b6ccc644b IMPALA-12737: Query columns in workload management tables.
Adds "Select Columns", "Where Columns", "Join Columns", "Aggregate
Columns", and "OrderBy Columns" to the query profile and the workload
management active/completed queries tables. These fields are
presented as comma-separated strings containing the fully qualified
column name in the format database.table_name.column_name. Aggregate
columns include all columns in the order by and having clauses.

Since new columns are being added, the workload management init
process is also being modified to allow for one-way upgrades of the
table schemas if necessary.  Additionally, workload management can be
set up to run under a schema version that is not the latest. This
ability will be useful during troubleshooting. To enable these
upgrades, the workload management initialization that manages the
structure of the tables has been moved to the catalogd.

The changes in this patch must be backwards compatible so that Impala
clusters running previous workload management code can co-exist with
Impala clusters running this workload management code. To enable that
backwards compatibility, a new table property named
'wm_schema_version' is now used to track the schema version of the
workload management tables. Thus, the old property 'schema_version'
will always be set to '1.0.0' since modifying that property value
causes Impala running previous workload management code to error at
startup.
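
The property handling described above might look roughly like the
sketch below. The helper names, the version "1.1.0", and the decision
to raise an error on a downgrade are assumptions for illustration; only
the property names and the pinned '1.0.0' value come from this commit
message.

```python
LEGACY_PROP = "schema_version"   # always pinned to '1.0.0'
WM_PROP = "wm_schema_version"    # tracks the actual schema version


def parse_version(text):
    """Turn a version string such as '1.0.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def plan_table_properties(current_props, target_version):
    """Return the table properties to set for target_version.

    Hypothetical helper. Tables created by older code have no
    wm_schema_version property, so they are treated as version 1.0.0.
    """
    current = current_props.get(WM_PROP, "1.0.0")
    if parse_version(target_version) < parse_version(current):
        # Upgrades are one-way; rejecting a downgrade is an assumption.
        raise ValueError(
            "cannot downgrade from {0} to {1}".format(current, target_version))
    return {LEGACY_PROP: "1.0.0", WM_PROP: target_version}
```

Keeping the legacy property fixed is what lets clusters running the
previous code start up against the same tables.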

Testing accomplished by
* Adding/updating workload and custom cluster tests to assert the new
  columns and the workload management upgrade process.
* JUnit tests added to verify the new workload management columns are
  being correctly parsed.
* GTests added to ensure the workload management columns are
  correctly defined and in the correct order.

Change-Id: I78f3670b067c0c192ee8a212fba95466fbcb51d7
Reviewed-on: http://gerrit.cloudera.org:8080/21142
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2024-10-31 17:06:43 +00:00