impala

mirror of https://github.com/apache/impala.git synced 2025-12-21 10:58:31 -05:00

Author	SHA1	Message	Date
Steve Carlin	048b5689fd	IMPALA-14080: Support LocalFsTable table types in Calcite planner. IMPALA-13947 changes the use_local_catalog default to true. This causes failures when the use_calcite_planner query option is set to true. The Calcite planner was only handling HdfsTable table types. It will now handle LocalFsTable table types as well. Currently, if the table's row count is missing, the Calcite planner loads all partitions and iterates over them to estimate it. This is inefficient in local catalog mode and ideally should happen later, after partition pruning. Follow-up work is needed to improve this. Testing: Re-enable local catalog mode in TestCalcitePlanner.test_calcite_frontend and TestWorkloadManagementSQLDetailsCalcite.test_tpcds_8_decimal. Co-authored-by: Riza Suminto Change-Id: Ic855779aa64d11b7a8b19dd261c0164e65604e44 Reviewed-on: http://gerrit.cloudera.org:8080/23341 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-27 03:17:13 +00:00
Riza Suminto	1cead45114	IMPALA-13947: Test local catalog mode by default Local catalog mode has been the default and has worked well in downstream Impala for over 5 years. This patch turns on local catalog mode by default (--catalog_topic_mode=minimal and --use_local_catalog=true) as the preferred mode going forward. Implemented LocalCatalog.setIsReady() to facilitate using local catalog mode for FE tests. Some FE tests fail due to behavior differences in local catalog mode like IMPALA-7539. This is probably OK since Impala now largely hands over FileSystem permission checks to Apache Ranger. The following custom cluster tests are pinned to evaluate under legacy catalog mode because their behavior changed in local catalog mode: TestCalcitePlanner.test_calcite_frontend, TestCoordinators.test_executor_only_lib_cache, TestMetadataReplicas, TestTupleCacheCluster, TestWorkloadManagementSQLDetailsCalcite.test_tpcds_8_decimal. At TestHBaseHmsColumnOrder.test_hbase_hms_column_order, set the --use_hms_column_order_for_hbase_tables=true flag for both impalad and catalogd to get consistent column order in either local or legacy catalog mode. Changed TestCatalogRpcErrors.test_register_subscriber_rpc_error assertions to be more fine-grained by matching individual query ids. Moved most of the test methods from TestRangerLegacyCatalog to TestRangerLocalCatalog, except for some that do need to run in legacy catalog mode. Also renamed TestRangerLocalCatalog to TestRangerDefaultCatalog. The table ownership issue in local catalog mode remains unresolved (see IMPALA-8937). Testing: Pass exhaustive tests. Change-Id: Ie303e294972d12b98f8354bf6bbc6d0cb920060f Reviewed-on: http://gerrit.cloudera.org:8080/23080 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-06 21:42:24 +00:00
jfehr	742d8d05f5	IMPALA-14090: Move Some Stable Custom Cluster Tests to Exhaustive Moves several custom cluster tests out of core and into exhaustive only. The tests were chosen based on their stability, lack of recent modifications, and coverage of rare/corner cases. Testing was accomplished by running both core and exhaustive tests and manually verifying the tests were or were not skipped as expected. Change-Id: If99c015a0cb5d95b1607ca2be48d2dea04194f81 Reviewed-on: http://gerrit.cloudera.org:8080/22963 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-06-02 07:53:37 +00:00
Joe McDonnell	c5a0ec8bdf	IMPALA-11980 (part 1): Put all thrift-generated python code into the impala_thrift_gen package This puts all of the thrift-generated python code into the impala_thrift_gen package. This is similar to what Impyla does for its thrift-generated python code, except that it uses the impala_thrift_gen package rather than impala._thrift_gen. This is a preparatory patch for fixing the absolute import issues. This patches all of the thrift files to add the python namespace. This has code to apply the patching to the thirdparty thrift files (hive_metastore.thrift, fb303.thrift) to do the same. Putting all the generated python into a package makes it easier to understand where the imports are getting code. When the subsequent change rearranges the shell code, the thrift generated code can stay in a separate directory. This uses isort to sort the imports for the affected Python files with the provided .isort.cfg file. This also adds an impala-isort shell script to make it easy to run. Testing: - Ran a core job Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0 Reviewed-on: http://gerrit.cloudera.org:8080/20169 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-04-15 17:03:02 +00:00
Riza Suminto	ea8f74a6ac	IMPALA-13861: Standardize workload management tests This patch standardizes tests against workload management tables (sys.impala_query_log and sys.impala_query_live) to use a common superclass named WorkloadManagementTestSuite. The setup_method of this superclass waits for workload management init completion (wait_for_wm_init_complete()), while the teardown_method waits until the impala-server.completed-queries.queued metric reaches 0 (wait_for_wm_idle()). test_query_log.py and test_workload_mgmt_sql_details.py are refactored to extend from WorkloadManagementTestSuite. Tests to assert the query log table flush behavior are grouped together in TestQueryLogTableFlush. test_workload_mgmt_sql_details.py::TestWorkloadManagementSQLDetails now uses 1 minicluster instance for all tests. test_workload_mgmt_init.py does not extend from WorkloadManagementTestSuite because it is testing cluster start and restart scenarios. This patch only adds wait_for_wm_idle() at teardown_method where it makes sense to do so. test_query_live.py does not extend from WorkloadManagementTestSuite because most of its test methods require a long --query_log_write_interval_s so that DML queries from the workload management worker do not disturb sys.impala_query_live. The workload_mgmt parameter in CustomClusterTestSuite.with_args() is standardized to set up appropriate default flags in cluster_setup() rather than passing it down to _start_impala_cluster(): IMPALAD_ARGS --enable_workload_mgmt=true --query_log_write_interval_s=1 \ --shutdown_grace_period_s=0 --shutdown_deadline_s=60 and CATALOGD_ARGS --enable_workload_mgmt=true Note that the IMPALAD_ARGS and CATALOGD_ARGS flags added by the workload_mgmt and impalad_graceful_shutdown parameters can still be overridden with different values by explicitly adding them in the impalad_args and catalogd_args parameters. Setting workload_mgmt=True now automatically enables graceful shutdown for the test. Thus, impalad_graceful_shutdown=True is now removed. 
With beeswax protocol deprecated, this patch also changes the protocol under test from beeswax to hs2. TestQueryLogTableBeeswax is now renamed to TestQueryLogTableBasic. Additionally, print total wait time in wait_for_metric_value(). Testing: - Run modified tests and pass. Change-Id: Iecf6452fa963304e263805ebeb017c843d17dd16 Reviewed-on: http://gerrit.cloudera.org:8080/22617 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-03-21 22:31:11 +00:00
jasonmfehr	b3b2dbaca3	IMPALA-13772: Fix Workload Management DMLs Timeouts The insert DMLs executed by workload management to add rows to the completed queries Iceberg table time out after 10 seconds because that is the default FETCH_ROWS_TIMEOUT_MS value. If the DML queues up in admission control, this timeout will quickly cause the DML to be cancelled. The fix is to set the FETCH_ROWS_TIMEOUT_MS query option to 0 for the workload management insert DMLs. Even though the workload management DMLs do not retrieve any rows, the FETCH_ROWS_TIMEOUT_MS value still applies because the internal server functions call into the client request state's ExecQueryOrDmlRequest() function, which starts query execution and immediately returns. Then, the BlockOnWait function in impala-server.cc is called. This function times out based on the FETCH_ROWS_TIMEOUT_MS value. A new coordinator startup flag 'query_log_dml_exec_timeout_s' is added to specify the EXEC_TIME_LIMIT_S query option on the workload management insert DML statements. This flag ensures the DMLs will time out if they do not complete in a reasonable timeframe. While adding the new coordinator startup flag, a bug in the internal-server code was discovered. This bug caused a return status of 'ok' even when the query exec time limit was reached and the query cancelled. This bug has also been fixed. Testing: 1. Added a new custom cluster test that simulates a busy cluster where the workload management DML queues for longer than 10 seconds. 2. Existing tests in test_query_log and test_admission_controller passed. 3. One internal-server-test ctest was modified to assert a returned status of error when a query is cancelled. 4. Added a new custom cluster test that asserts the workload management DML is cancelled based on the value of the new coordinator startup flag. 
Change-Id: I0cc7fbce40eadfb253d8cff5cbb83e2ad63a979f Reviewed-on: http://gerrit.cloudera.org:8080/22511 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-26 03:12:31 +00:00
Jason Fehr	9e05ffcaaf	IMPALA-13505: Fix NPE in Calcite Planner Fixes the NullPointerException occurring when using the Calcite planner with test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q8. The NPE was thrown from the Planner where it generates the list of columns in the query for use in the profile and workload management. Testing was accomplished by manually running the impacted test and with a new custom cluster test that replicates the failing test. Change-Id: I4d282120e596fd39a569d1ce9b25024f4f174dd0 Reviewed-on: http://gerrit.cloudera.org:8080/22033 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-11-07 03:11:47 +00:00
Jason Fehr	a46625a3c0	IMPALA-12737: (addendum) Turn Off Log Buffering in Workload Management Init Tests Fixes the issue where custom cluster workload management tests do not disable glog log buffering in tests that wait for specific messages to be logged from the coordinators and catalogs. By default, logs are buffered up to 30 seconds. This buffering can cause unnecessary test slowness while the tests wait longer than needed for the expected log message to be flushed and can also cause flakiness where the tests do not find the expected log message before the timeout expires. Change-Id: I03ac0f0f00c93fe785db131278a706e3f5e975c2 Reviewed-on: http://gerrit.cloudera.org:8080/22021 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-11-05 23:16:20 +00:00
jasonmfehr	7b6ccc644b	IMPALA-12737: Query columns in workload management tables. Adds "Select Columns", "Where Columns", "Join Columns", "Aggregate Columns", and "OrderBy Columns" to the query profile and the workload management active/completed queries tables. These fields are presented as comma-separated strings containing the fully qualified column name in the format database.table_name.column_name. Aggregate columns include all columns in the order by and having clauses. Since new columns are being added, the workload management init process is also being modified to allow for one-way upgrades of the table schemas if necessary. Additionally, workload management can be set up to run under a schema version that is not the latest. This ability will be useful during troubleshooting. To enable these upgrades, the workload management initialization that manages the structure of the tables has been moved to the catalogd. The changes in this patch must be backwards compatible so that Impala clusters running previous workload management code can co-exist with Impala clusters running this workload management code. To enable that backwards compatibility, a new table property named 'wm_schema_version' is now used to track the schema version of the workload management tables. Thus, the old property 'schema_version' will always be set to '1.0.0' since modifying that property value causes Impala running previous workload management code to error at startup. Testing accomplished by * Adding/updating workload and custom cluster tests to assert the new columns and the workload management upgrade process. * JUnit tests added to verify the new workload management columns are being correctly parsed. * GTests added to ensure the workload management columns are correctly defined and in the correct order. 
Change-Id: I78f3670b067c0c192ee8a212fba95466fbcb51d7 Reviewed-on: http://gerrit.cloudera.org:8080/21142 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2024-10-31 17:06:43 +00:00

9 Commits