impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 09:58:28 -05:00

Author	SHA1	Message	Date
stiga-huang	d4644b0381	IMPALA-13607, IMPALA-14490: Deflake test_cache_valid_on_nontransactional_table_ddls When catalogd runs with --start_hms_server=true, it services all the HMS endpoints so that any HMS compatible client would be able to use catalogd as a metadata cache. For all the DDL/DML requests, catalogd just delegates them to HMS APIs without reloading related metadata in the cache. For read requests like get_table_req, catalogd serves them from its cache which could be stale. There is a flag, invalidate_hms_cache_on_ddls, to decide whether to explicitly invalidate the table when catalogd delegates a DDL/DML on the table to HMS. test_cache_valid_on_nontransactional_table_ddls is a test verifying that when invalidate_hms_cache_on_ddls=false, the cache is not updated so should have stale metadata. However, there are HMS events generated from invoking the HMS APIs. Even when invalidate_hms_cache_on_ddls=false, catalogd can still update its cache when processing the corresponding HMS events. The test fails when its check is done after catalogd applies the event (so the cache is up-to-date). If the check is done before that, the test passes. This patch deflakes the test by explicitly disabling event processing. Also updates the description of invalidate_hms_cache_on_ddls to mention the impact of event processing. Tests: - Ran the test locally 100 times. Change-Id: Ib1ffc11a793899a0dbdb009bf2ac311117f2318e Reviewed-on: http://gerrit.cloudera.org:8080/23792 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-17 22:22:53 +00:00
Daniel Vanko	9d112dae23	IMPALA-14536: Fix CONVERT TO ICEBERG to not throw exception on Iceberg tables Previously, running ALTER TABLE <table> CONVERT TO ICEBERG on an Iceberg table produced an error. This patch fixes that, so the statement will do nothing when called on an Iceberg table and return with 'Table has already been migrated.' message. This is achieved by adding a new flag to StatementBase to signal when a statement ends up NO_OP, if that's true, the new TStmtType::NO_OP will be set as TExecRequest's type and noop_result can be used to set result from Frontend-side. Tests: * extended fe and e2e tests Change-Id: I41ecbfd350d38e4e3fd7b813a4fc27211d828f73 Reviewed-on: http://gerrit.cloudera.org:8080/23699 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Peter Rozsa <prozsa@cloudera.com>	2025-12-12 15:35:28 +00:00
Xuebin Su	d54b75ccf1	IMPALA-14619: Reset levels_readahead_ for late materialization Previously, `BaseScalarColumnReader::levels_readahead_` was not reset when the reader did not do page filtering. If a query selected the last row containing a collection value in a row group, `levels_readahead_` would be set and would not be reset when advancing to the next row group without page filtering. As a result, trying to skip collection values at the start of the next row group would cause a check failure. This patch fixes the failure by resetting `levels_readahead_` in `BaseScalarColumnReader::Reset()`, which is always called when advancing to the next row group. `levels_readahead_` is also moved out of the "Members used for page filtering" section as the variable is also used in late materialization. Testing: - Added an E2E test for the fix. Change-Id: Idac138ffe4e1a9260f9080a97a1090b467781d00 Reviewed-on: http://gerrit.cloudera.org:8080/23779 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-12 15:12:50 +00:00
Sai Hemanth Gantasala	1684c2d9da	IMPALA-14131: Add flag to configure the default value of 'impala.disableHmsSync' FEATURE: Implement global 'disable_hms_sync_by_default' flag for event processing. This change introduces a new catalogd startup flag, `disable_hms_sync_by_default`, to simplify skipping/processing events. Problem: Disabling event processing globally requires tedious process of setting 'impala.disableHmsSync' property on every database and table, especially if few specific tables requires sync up of events. Solution: The new flag provides a global default for the 'impala.disableHmsSync' property. Behavior: - If `disable_hms_sync_by_default` is true (the intended default-off state), event processing is skipped for all tables/databases unless the property "impala.disableHmsSync"="false" is explicitly set. - This allows users to easily keep event processing off by default and opt-in specific databases or tables to start syncing. - The check order is: table-property > db-property > global default. - HMS polling remains independent and unaffected by this flag. Change-Id: I4ee617aed48575502d9cf5cf2cbea6ec897d6839 Reviewed-on: http://gerrit.cloudera.org:8080/23487 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-11 05:10:32 +00:00
Csaba Ringhofer	780e6683a2	IMPALA-14573: port critical geospatial functions to c++ (part 1) This commit contains the simpler parts from https://gerrit.cloudera.org/#/c/20602 This mainly means accessors for the header of the binary format and bounding box check (st_envIntersects). New tests for not yet covered functions / overloads are also added. For details of the binary format see be/src/exprs/geo/shape-format.h Differences from the PR above: Only a subset of functions are added. The criteria was: 1. the native function must be fully compatible with the Java version* 2. must not rely on (de)serializing the full geometry 3. the function must be tested 1 implies 2 because (de)serialization is not implemented yet in the original patch for >2d geometries, which would break compatibility for the Java version for ZYZ/XYM/XYZM geometries. *: there are 2 known differences: 1. NULL handling: the Java functions return error instead of NULL when getting a NULL parameter 2. st_envIntersects() doesn't check if the SRID matches - the Java library looks inconsistant about this Because the native functions are fairly safe replacements for the Java ones, they are always used when geospatial_library=HIVE_ESRI. Change-Id: I0ff950a25320549290a83a3b1c31ce828dd68e3c Reviewed-on: http://gerrit.cloudera.org:8080/23700 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-06 07:50:23 +00:00
jichen0919	7e29ac23da	IMPALA-14092 Part2: Support querying of paimon data table via JNI This patch mainly implement the querying of paimon data table through JNI based scanner. Features implemented: - support column pruning. The partition pruning and predicate push down will be submitted as the third part of the patch. We implemented this by treating the paimon table as normal unpartitioned table. When querying paimon table: - PaimonScanNode will decide paimon splits need to be scanned, and then transfer splits to BE do the jni-based scan operation. - We also collect the required columns that need to be scanned, and pass the columns to Scanner for column pruning. This is implemented by passing the field ids of the columns to BE, instead of column position to support schema evolution. - In the original implementation, PaimonJniScanner will directly pass paimon row object to BE, and call corresponding paimon row field accessor, which is a java method to convert row fields to impala row batch tuples. We find it is slow due to overhead of JVM method calling. To minimize the overhead, we refashioned the implementation, the PaimonJniScanner will convert the paimon row batches to arrow recordbatch, which stores data in offheap region of impala JVM. And PaimonJniScanner will pass the arrow offheap record batch memory pointer to the BE backend. BE PaimonJniScanNode will directly read data from JVM offheap region, and convert the arrow record batch to impala row batch. The benchmark shows the later implementation is 2.x better than the original implementation. The lifecycle of arrow row batch is mainly like this: the arrow row batch is generated in FE,and passed to BE. After the record batch is imported to BE successfully, BE will be in charge of freeing the row batch. There are two free paths: the normal path, and the exception path. For the normal path, when the arrow batch is totally consumed by BE, BE will call jni to fetch the next arrow batch. For this case, the arrow batch is freed automatically. For the exceptional path, it happends when query is cancelled, or memory failed to allocate. For these corner cases, arrow batch is freed in the method close if it is not totally consumed by BE. Current supported impala data types for query includes: - BOOLEAN - TINYINT - SMALLINT - INTEGER - BIGINT - FLOAT - DOUBLE - STRING - DECIMAL(P,S) - TIMESTAMP - CHAR(N) - VARCHAR(N) - BINARY - DATE TODO: - Patches pending submission: - Support tpcds/tpch data-loading for paimon data table. - Virtual Column query support for querying paimon data table. - Query support with time travel. - Query support for paimon meta tables. - WIP: - Snapshot incremental read. - Complex type query support. - Native paimon table scanner, instead of jni based. Testing: - Create tests table in functional_schema_template.sql - Add TestPaimonScannerWithLimit in test_scanners.py - Add test_paimon_query in test_paimon.py. - Already passed the tpcds/tpch test for paimon table, due to the testing table data is currently generated by spark, and it is not supported by impala now, we have to do this since hive doesn't support generating paimon table for dynamic-partitioned tables. we plan to submit a separate patch for tpcds/tpch data loading and associated tpcds/tpch query tests. - JVM Offheap memory leak tests, have run looped tpch tests for 1 day, no obvious offheap memory increase is observed, offheap memory usage is within 10M. Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384 Reviewed-on: http://gerrit.cloudera.org:8080/23613 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-12-05 18:19:57 +00:00
Sai Hemanth Gantasala	bff814f079	IMPALA-14562: Enable Hierarchical event processing by default IMPALA-12709 Added support for hierarchical metastore event processing. This commit enables hierarchical event processing by default. hms_event_polling_interval_s can now be set to decimal value (eg: 0.5) to support millisecond precision interval. Along with that others configs can be fine tuned, such as: num_db_event_executors: To set the number of database level event executors. num_table_event_executors_per_db_event_executor: To set the number of table level event executors within a database event executor. min_event_processor_idle_ms: To set the minimum time to retain idle db processors and table processors. max_outstanding_events_on_executors: To set the limit of maximum outstanding events to process on event executors. Testing: - All the testing required to enable this flag is done in IMPALA-12709 and IMPALA-13801. Change-Id: Ie9a28f863ef17456817e0a335215450e514b1f5b Reviewed-on: http://gerrit.cloudera.org:8080/23687 Reviewed-by: <k.venureddy2103@gmail.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-04 07:02:18 +00:00
Riza Suminto	b77cd73c19	IMPALA-14604: Fix ASAN issue in hdfs-fs-cache.cc This patch fix heap-use-after-free issue around HdfsFsCache::GetConnection. The issue is resolved by changing copy access to read-only access of HdfsConnOptions parameter entries. Testing: - Pass tmp-file-mgr-test in ASAN build. Change-Id: I23ae03bf82191cd3cd99f8d4c7cbd99daaa0cfe8 Reviewed-on: http://gerrit.cloudera.org:8080/23742 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-04 04:10:08 +00:00
Arnab Karmakar	d4707ff197	IMPALA-13941: Add helper to format file permissions as UNIX-style string This change introduces a utility method FormatPermissions() that converts mode_t permission bits into a human-readable string (e.g., "drwxrwxrwt"). It correctly handles file type indicators, owner/group/other read-write-execute bits, and special bits such as setuid, setgid, and sticky. This improves log readability and debugging for file metadata-related operations by providing consistent, ls-style permission formatting. Testing: - Added unit tests validating permission string output for: - Regular files, directories, symlinks, sockets - All rwx combinations for user/group/other - setuid, setgid, and sticky bit behavior Change-Id: Ib53dbecd5c202e33b6e3b5cd3a372a77d8b1703a Reviewed-on: http://gerrit.cloudera.org:8080/23714 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-12-01 19:07:00 +00:00
jasonmfehr	336034debd	IMPALA-14480: Optional OpenTelemetry DCHECKs The code in span-manager.cc contains aggressive DCHECKS that rely on the query lifecycle to be deterministic. In reality, the query lifecycle is not completely deterministic due to multiple threads being involved in execution, result retrieval, query shutdown, etc. On debug builds only, a new flag named, otel_trace_exhaustive_dchecks will be available with a default of 'false'. If set to 'true', then optional DCHECKs will be enabled in the SpanManager class to enable identification of edge cases where the query lifecycle proceeds in an unexpected way. The DCHECKs that are controlled by the new flag are those that rely on a specific ordering of start/end child span and add child span event calls. Change-Id: Id6507f3f0e23ecf7c2bece9a6b6c2d86bfac1e57 Reviewed-on: http://gerrit.cloudera.org:8080/23518 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-26 04:48:46 +00:00
jasonmfehr	2ac5a24dc0	IMPALA-14455: Cleanup OpenTelemetry Tracing Startup Flags Fixes several issues with the OpenTelemetry tracing startup flags: 1. otel_trace_beeswax -- Removes this hidden flag which enabled tracing of queries submitted over Beeswax. Since this protocol is deprecated and no tests assert the traces generated by Beeswax queries, this flag was removed to eliminate an extra check when determining if OpenTelemetry tracing should be enabled. 2. otel_trace_tls_minimum_version -- Fixes parsing of this flag's value. This flag is in the format "tlsv1.2" or "tlsv1.3", but the OpenTelemetry C++ SDK expects the minimum TLS version to be in the format "1.2" or "1.3". The code now removes the "tlsv" prefix before passing the value to the OpenTelemetry C++ SDK. 3. otel_trace_tls_insecure_skip_verify -- Fixes the guidance to only set this flag to true in dev/testing. Adds ctest tests for the functions that configure the TraceProvider singleton to ensure startup flags are correctly parsed and applied. Modifies the http_exporter_config and init_otel_tracer function signatures in otel.cc to return the actual object they create instead of a Status since these functions only ever returned OK. Updates the OpenTelemetry collector docker-compose file to support the collector receiving traces over both HTTP and HTTPS. This setup is used to manually smoke test the integration from Impala to an OpenTelemetry collector. Change-Id: Ie321fa37c0fd260f783dc6cf47924d53a06d82ea Reviewed-on: http://gerrit.cloudera.org:8080/23440 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-11-24 23:46:57 +00:00
Steve Carlin	a6bb0c7c45	IMPALA-14408: Use regular path for Calcite planner instead of CalciteJniFrontend When the --use_calcite_planner=true option is set at the server level, the queries will no longer go through CalciteJniFrontend. Instead, they will go through the regular JniFrontend, which is the path that is used when the query option for "use_calcite_planner" is set. The CalciteJniFrontend will be removed in a later commit. This commit also enables fallback to the original planner when an unsupported feature exception is thrown. This needed to be added to allow the tests to run properly. During initial database load, there are queries that access complex columns which throws the unsupported exception. Change-Id: I732516ca8f7ea64f73484efd67071910c9b62c8f Reviewed-on: http://gerrit.cloudera.org:8080/23523 Reviewed-by: Steve Carlin <scarlin@cloudera.com> Tested-by: Steve Carlin <scarlin@cloudera.com>	2025-11-20 21:08:48 +00:00
Gabriella Gyorgyevics	c4c9adf592	IMPALA-14386: Add benchmarks for Byte Stream Split encoding This patch adds benchmarks to the Byte Stream Split encoding. It compares different ways to use the decoder. I added benchmarks for the following comparisons: * Compile VS Runtime initialized decoder * Float VS Int VS Double VS Long VS 6 and 11 byte size types * Repeating VS Sequential VS Random ordered data * Decoding one by one VS in batch VS with stride (!= byte_size) * Small VS Medium (10x small) VS Large (100x small) stride Conclusions: * Passing the byte size as a template parameter is almost 5 times as fast as passing it in the constructor. * The size of the type heavily influences the speed * The data variation doesn't influence the speed at all * Reading values in batch is much faster than one-by-one * The stride sizes have a small influence on the speed For more details and graphs, go to https://docs.google.com/spreadsheets/d/129LwvR6gpZInlRhlVWktn6Haugwo_fnloAAYfI0Qp2s Change-Id: I708af625348b0643aa3f37525b8a6e74f0c47057 Reviewed-on: http://gerrit.cloudera.org:8080/23401 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-19 17:49:42 +00:00
Arnab Karmakar	068158e495	IMPALA-12401: Support more info types for HS2 GetInfo() API This patch adds support for 40+ additional TGetInfoType values in the HiveServer2 GetInfo() API, improving ODBC/JDBC driver compatibility. Previously, only 3 info types were supported (CLI_SERVER_NAME, CLI_DBMS_NAME, CLI_DBMS_VER). The implementation follows the ODBC CLI specification and matches the behavior of Hive's GetInfo implementation where applicable. Testing: - Added unit tests in test_hs2.py for new info types - Tests verify correct return values and data types for each info type Change-Id: I1ce5f2b9dcc2e4633b4679b002f57b5b4ea3e8bf Reviewed-on: http://gerrit.cloudera.org:8080/23528 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2025-11-17 19:32:50 +00:00
Xuebin Su	e4a508529c	IMPALA-14544: Fix use-after-poison for Kudu arrays This patch fixes the use-after-poison error caused by using the memory in the MemPool after calling `MemPool::Clear()` when reading Kudu arrays. Testing: - The ASAN build passed the core tests. Change-Id: I9b729fc6003e64856ea0e197b1e3c74dad7247a1 Reviewed-on: http://gerrit.cloudera.org:8080/23668 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-12 22:38:32 +00:00
Arnab Karmakar	760eb4f2fa	IMPALA-13066: Extend SHOW CREATE TABLE to include stats and partitions Adds a new WITH STATS option to the SHOW CREATE TABLE statement to emit additional SQL statements for recreating table statistics and partitions. When specified, Impala outputs: - Base CREATE TABLE statement. - ALTER TABLE ... SET TBLPROPERTIES for table-level stats. - ALTER TABLE ... SET COLUMN STATS for all non-partition columns, restoring column stats. - For partitioned tables: - ALTER TABLE ... ADD PARTITION statements to recreate partitions. - Per-partition ALTER TABLE ... PARTITION (...) SET TBLPROPERTIES to restore partition-level stats. Partition output is limited by the PARTITION_LIMIT query option (default 1000). Setting PARTITION_LIMIT=0 includes all partitions and emits a warning if the limit is exceeded. Tests added to verify correctness of emitted statements. Default behavior of SHOW CREATE TABLE remains unchanged for compatibility. Change-Id: I87950ae9d9bb73cb2a435cf5bcad076df1570dc2 Reviewed-on: http://gerrit.cloudera.org:8080/23536 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-12 06:11:37 +00:00
Xuebin Su	6b6f7e614d	IMPALA-14472: Add create/read support for ARRAY column of Kudu Initial implementation of KUDU-1261 (array column type) recently merged in upstream Apache Kudu repository. This patch add initial Impala support for working with Kudu tables having array type columns. Unlike rows, the elements of a Kudu array are stored in a different format than Impala. Instead of per-row bit flag for NULL info, values and NULL bits are stored in separate arrays. The following types of queries are not supported in this patch: - (IMPALA-14538) Queries that reference an array column as a table, e.g. ```sql SELECT item FROM kudu_array.array_int; ``` - (IMPALA-14539) Queries that create duplicate collection slots, e.g. ```sql SELECT array_int FROM kudu_array AS t, t.array_int AS unnested; ``` Testing: - Add some FE tests in AnalyzeDDLTest and AnalyzeKuduDDLTest. - Add EE test test_kudu.py::TestKuduArray. Since Impala does not support inserting complex types, including array, the data insertion part of the test is achieved through custom C++ code kudu-array-inserter.cc that insert into Kudu via Kudu C++ client. It would be great if we could migrate it to Python so that it can be moved to the same file as the test (IMPALA-14537). - Pass core tests. Co-authored-by: Riza Suminto Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310 Reviewed-on: http://gerrit.cloudera.org:8080/23493 Reviewed-by: Alexey Serbin <alexey@apache.org> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-08 06:41:07 +00:00
Yida Wu	f2f297a00f	IMPALA-14533: Fix crash in ASAN/TSAN builds due to nullptr TcmallocMetric::BYTES_IN_USE Impala uses SanitizerMallocMetric::BYTES_ALLOCATED instead of TcmallocMetric::BYTES_IN_USE in ASAN or TSAN builds. However, the admissiond logic in IMPALA-14493 still uses uninitialized TcmallocMetric::BYTES_IN_USE under these builds, leading to a nullptr crash. To fix this issue, we will use SanitizerMallocMetric::BYTES_ALLOCATED instead for ASAN and TSAN builds in admission controller, which is the same logic in memory-metrics.cc to use a different metric for those builds. Tests: Passed ASAN and TSAN builds testing. Passed core tests. Change-Id: Ic4fbdc134ea302f7302d177d073eb49136ba775c Reviewed-on: http://gerrit.cloudera.org:8080/23646 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-11-06 21:56:53 +00:00
Abhishek Rawat	a8618c6a65	IMPALA-10204: Make AdmitQuery params more efficient The admission request may contain the lineage graphs and other stuff that the admission control service doesn't need. For example, currently the admission controller service would hold onto the full TQueryExecRequest object for the entire lifetime of a query, even after the admission decision was complete. This led to unnecessary memory consumption. This commit introduces two optimizations for reducing the memory footprint: 1. A lightweight copy of TQueryExecRequest is now created on the client side before sending to the admission control service. Fields that are not required for admission decisions (e.g., query_plan, lineage_graph) are cleared from this copy. 2. The AdmissionState now uses a unique_ptr to manage the TQueryExecRequest. This allows the object's memory to be explicitly released as soon as the query schedule is generated and the request object is no longer needed. During a customized high concurrent TPCDS run, without the change, the peak memory usage in admissiond was around 2GB. With this change, it required less than half that memory. Tests: Passed exhaustive tests. Change-Id: I1ba5e8818336bd1fc3ad604a0acee5eb7a1116c4 Reviewed-on: http://gerrit.cloudera.org:8080/23546 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Abhishek Rawat <arawat@cloudera.com>	2025-10-23 14:33:57 +00:00
Yida Wu	1bc7cdbff6	IMPALA-14493: Cap memory usage of global admission service The global admission service can experience OOM errors under high concurrency because its process memory tracker is inaccurate and doesn't account for all memory allocations. Ensuring memory tracker accurately accounts for every allocation could be difficult, this patch uses a simpler solution to introduce a hard memory cap using tcmalloc statistics, which accurately reflect the true process memory usage. If a new query is submitted while tcmalloc memory usage is over the process limit, the query will be rejected immediately to protect from OOM. Adds a new flag enable_admission_service_mem_safeguard allowing this feature to be enabled or disabled. By default, this feature is turned on Tests: Added test test_admission_service_low_mem_limit. Passed exhaustive tests. Change-Id: I2ee2c942a73fcd69358851fc2fdc0fc4fe531c73 Reviewed-on: http://gerrit.cloudera.org:8080/23542 Reviewed-by: Abhishek Rawat <arawat@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-23 12:13:11 +00:00
stiga-huang	ff8bb33b91	IMPALA-12870: Tag query id for Java pool threads Logs from Java threads running in ExecutorService are missing the query id which is stored in the C++ thread-local ThreadDebugInfo variable. This patch adds JNI calls for Java threads to manage the ThreadDebugInfo variable. Currently two thread pools are changed: - MissingTable loading pool in StmtMetadataLoader.parallelTableLoad(). - Table loading pool in TableLoadingMgr. MissingTable loading pool only lives within the parallelTableLoad() method. So we initialize ThreadDebugInfo with the queryId at the beginning of the thread and delete it at the end of the thread. Note that a thread might be reused to load different tables, but they all belong to the same query. Table loading pool is a long running pool in catalogd that never shut down. Threads in it is used to load tables triggered by different queries. We initialize ThreadDebugInfo as the above but update it when the thread starts loading table for a different query id, and reset it when the loading is done. The query id is passed down from the catalogd RPC request headers. Tests: - Added e2e test to verify the logs. - Ran existing CORE tests. Change-Id: I83cca55edc72de35f5e8c5422efc104e6aa894c1 Reviewed-on: http://gerrit.cloudera.org:8080/23558 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-23 03:35:29 +00:00
Joe McDonnell	1913ab46ed	IMPALA-14501: Migrate most scripts from impala-python to impala-python3 To remove the dependency on Python 2, existing scripts need to use python3 rather than python. These commands find those locations (for impala-python and regular python): git grep impala-python \| grep -v impala-python3 \| grep -v impala-python-common \| grep -v init-impala-python git grep bin/python \| grep -v python3 This removes or switches most of these locations by various means: 1. If a python file has a #!/bin/env impala-python (or python) but doesn't have a main function, it removes the hash-bang and makes sure that the file is not executable. 2. Most scripts can simply switch from impala-python to impala-python3 (or python to python3) with minimal changes. 3. The cm-api pypi package (which doesn't support Python 3) has been replaced by the cm-client pypi package and interfaces have changed. Rather than migrating the code (which hasn't been used in years), this deletes the old code and stops installing cm-api into the virtualenv. The code can be restored and revamped if there is any interest in interacting with CM clusters. 4. This switches tests/comparison over to impala-python3, but this code has bit-rotted. Some pieces can be run manually, but it can't be fully verified with Python 3. It shouldn't hold back the migration on its own. 5. This also replaces locations of impala-python in comments / documentation / READMEs. 6. kazoo (used for interacting with HBase) needed to be upgraded to a version that supports Python 3. The newest version of kazoo requires upgrades of other component versions, so this uses kazoo 2.8.0 to avoid needing other upgrades. The two remaining uses of impala-python are: - bin/cmake_aux/create_virtualenv.sh - bin/impala-env-versioned-python These will be removed separately when we drop Python 2 support completely. In particular, these are useful for testing impala-shell with Python 2 until we stop supporting Python 2 for impala-shell. The docker-based tests still use /usr/bin/python, but this can be switched over independently (and doesn't impact impala-python) Testing: - Ran core job - Ran build + dataload on Centos 7, Redhat 8 - Manual testing of individual scripts (except some bitrotted areas like the random query generator) Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc Reviewed-on: http://gerrit.cloudera.org:8080/23468 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-10-22 16:30:17 +00:00
stiga-huang	f0a781806f	IMPALA-14494: Tag catalogd logs of GetPartialCatalogObject requests with correct query ids Catalogd logs of GetPartialCatalogObject requests are not tagged with correct query ids. Instead, the query id that is previously using that thread is printed in the logs. This is fixed by using ScopedThreadContext which resets the query id at the end of the RPC code. Add DCHECKs to make sure ThreadDebugInfo is initialized before being used in Catalog methods. An instance is added in CatalogdMain() for this. This patch also adds the query id in GetPartialCatalogObject requests so catalogd can tag the responding thread with it. Some codes are copied from Michael Smith's patch: https://gerrit.cloudera.org/c/22738/ Tested by enabling TRACE logging in org.apache.impala.common.JniUtil to verify logs of GetPartialCatalogObject requests. I20251014 09:39:39.685225 342587 JniUtil.java:165] 964e37e9303d6f8a:eab7096000000000] getPartialCatalogObject request: Getting partial catalog object of CATALOG_SERVICE_ID I20251014 09:39:39.690346 342587 JniUtil.java:176] 964e37e9303d6f8a:eab7096000000000] Finished getPartialCatalogObject request: Getting partial catalog object of CATALOG_SERVICE_ID. Time spent: 5ms I20251014 09:39:39.699471 342587 JniUtil.java:165] 964e37e9303d6f8a:eab7096000000000] getPartialCatalogObject request: Getting partial catalog object of DATABASE:functional I20251014 09:39:39.701821 342587 JniUtil.java:176] 964e37e9303d6f8a:eab7096000000000] Finished getPartialCatalogObject request: Getting partial catalog object of DATABASE:functional. Time spent: 2ms I20251014 09:39:39.711462 341074 TAcceptQueueServer.cpp:368] New connection to server CatalogService from client <Host: 127.0.0.1 Port: 42084> I20251014 09:39:39.719146 342588 JniUtil.java:165] 964e37e9303d6f8a:eab7096000000000] getPartialCatalogObject request: Getting partial catalog object of TABLE:functional.alltypestiny Change-Id: Ie63363ac60e153e3a69f2a4cf6a0f4ce10701674 Reviewed-on: http://gerrit.cloudera.org:8080/23535 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-16 07:06:29 +00:00
Zoltan Borok-Nagy	bfae4d0b32	IMPALA-14496: Impala crashes when it writes multiple delete files per partition in a single DELETE operation Impala crashes when it needs to write multiple delete files per partition in a single DELETE operation. It is because IcebergBufferedDeleteSink has its own DmlExecState object, but sometimes the methods in TableSinkBase use the RuntimeState's DmlExecState object. I.e. it can happen that we add a partition to the IcebergBufferedDeleteSink's DmlExecState, but later we expect to find it in the RuntimeState's DmlExecState. This patch adds new methods to TableSinkBase that are specific for writing delete files, and they always take a DmlExecState object as a parameter. They are now used by IcebergBufferedDeleteSink. Testing * added e2e tests Change-Id: I46266007a6356e9ff3b63369dd855aff1396bb72 Reviewed-on: http://gerrit.cloudera.org:8080/23537 Reviewed-by: Mihaly Szjatinya <mszjat@pm.me> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-15 19:58:37 +00:00
Riza Suminto	1008decc07	IMPALA-14447: Parallelize table loading in getMissingTables() StmtMetadataLoader.getMissingTables() load missing tables in serial manner. In local catalog mode, large number of serial table loading can incur significant round trip latency to CatalogD. This patch parallelize the table loading by using executor service to lookup and gather all non-null FeTables from given TableName set. Modify LocalCatalog.loadDbs() and LocalDb.loadTableNames() slightly to make it thread-safe. Change FrontendProfile.Scope to support nested scope referencing the same FrontendProfile instance. Added new flag max_stmt_metadata_loader_threads to control the maximum number of threads to use for loading table metadata during query compilation. It is deafult to 8 threads per query compilation. If there is only one table to load, max_stmt_metadata_loader_threads set to 1, or RejectedExecutionException raised, fallback to load table serially. Testing: Run and pass few tests such as test_catalogd_ha.py, test_concurrent_ddls.py, and test_observability.py. Add FE tests CatalogdMetaProviderTest.testProfileParallelLoad. Manually run following query and observe parallel loading by setting TRACE level log in CatalogdMetaProvider.java. use functional; select count() from alltypesnopart union select count() from alltypessmall union select count() from alltypestiny union select count() from alltypesagg; Change-Id: I97a5165844ae846b28338d62e93a20121488d79f Reviewed-on: http://gerrit.cloudera.org:8080/23436 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-13 12:53:47 +00:00
jasonmfehr	c0b3580754	IMPALA-14372: Output OpenTelemetry SDK Logs to Impala Logs Emits log messages from the OpenTelemetry SDK to the Impalad DEBUG, INFO, WARNING, and ERROR logs. Previously, these SDK log messages were dropped. Modifies the function of the 'otel_debug' startup flag. This flag defaults to 'false' which causes log messages from the SDK to be dropped. When set to 'true', log messages from the OpenTelemetry SDK will be sent to the Impala logging system. The overall glog level is applied to all messages sent from the OpenTelemetry SDK, thus DEBUG SDK logs will not appear in the Impalad logs unless the glog level is greater than or equal to 2. When a trace is successfully sent to the OpenTelemetry collector, zero log lines are generated. When a trace cannot be sent, local testing showed 12 lines with a total size around 3k were written between the impalad.ERROR and impalad.WARNING log files. The request body is not included in these log messages unless the glog level is greater than or equal to 2 thus log message size will not grow or shrink based on the size of the trace(s). This patch also removes the completely useless 'LoggingInstrumentation' class. Previously, the 'otel_debug' flag caused this class to log messages, but those messages provided no insightful information. Generated-by: Github Copilot (Claude Sonnet 3.7) Change-Id: I41aba21f46233e6430eede9606be1e791071717a Reviewed-on: http://gerrit.cloudera.org:8080/23418 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-08 00:00:01 +00:00
Joe McDonnell	762fe0a4f5	IMPALA-14473: Fix absolute path logic for sorting scan ranges oldest to newest When IMPALA-14462 added tie-breaking logic to ScanRangeOldestToNewestComparator, it relied on absolute path being unset if the relative path is set. However, the code always sets absolute path and uses an empty string to indicate whether it is set. This caused the tie-breaking logic to see two unrelated scan ranges as equal, triggering a DCHECK when running query_test/test_tuple_cache_tpc_queries.py. The fix is to rearrange the logic to check whether the relative path is not empty rather than checking whether the absolute path is set. Testing: - Ran query_test/test_tuple_cache_tpc_queries.py - Ran custom_cluster/test_tuple_cache.py Change-Id: I449308f4a0efdca7fc238e3dda24985a2931dd37 Reviewed-on: http://gerrit.cloudera.org:8080/23495 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Yida Wu <wydbaggio000@gmail.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-10-07 17:02:43 +00:00
pranav.lodha	a77fec6391	IMPALA-13661: Support parallelism above JDBC tables for joins/aggregates Impala's planner generates a single-fragment, single- threaded scan node for queries on JDBC tables because table statistics are not properly available from the external JDBC source. As a result, even large JDBC tables are executed serially, causing suboptimal performance for joins, aggregations, and scans over millions of rows. This patch enables Impala to estimate the number of rows in a JDBC table by issuing a COUNT() query at query preparation time. The estimation is returned via TPrepareResult.setNum_rows_estimate() and propagated into DataSourceScanNode. The scan node then uses this cardinality to drive planner heuristics such as join order, fragment parallelization, and scanner thread selection. The design leverages the existing JDBC accessor layer: - JdbcDataSource.prepare() constructs the configuration and invokes GenericJdbcDatabaseAccessor.getTotalNumberOfRecords(). - The accessor wraps the underlying query in: SELECT COUNT() FROM (<query>) tmptable ensuring correctness for both direct table scans and parameterized query strings. - The result is captured as num_rows_estimate, which is then applied during computeStats() in DataSourceScanNode. With accurate (or approximate) row counts, the planner can now: - Assign multiple scanner threads to JDBC scan nodes instead of falling back to a single-thread plan. - Introduce exchange nodes where beneficial, parallelizing data fetches across multiple JDBC connections. - Produce better join orders by comparing JDBC row cardinalities against native Impala tables. - Avoid severe underestimation that previously defaulted to wrong table statistics, leading to degenerate plans. For a sample join query mentioned in the test file, these are the improvements: Before Optimization: - Cardinality fixed at 1 for all JDBC scans - Single fragment, single thread per query - Max per-host resource reservation: ~9.7 MB, 1 thread - No EXCHANGE or MERGING EXCHANGE operators - No broadcast distribution; joins executed serially - Example query runtime: ~77s SCAN JDBC A \ HASH JOIN \ SCAN JDBC B \ HASH JOIN \ SCAN JDBC C \ TOP-N -> ROOT After Optimization: - Cardinality derived from COUNT(*) (e.g. 150K, 1.5M rows) - Multiple fragments per scan, 7 threads per query - Max per-host resource reservation: ~123 MB, 7 threads - Plans include EXCHANGE and MERGING EXCHANGE operators - Broadcast joins on small sides, improving parallelism - Example query runtime: ~38s (~2x faster) SCAN JDBC A --> EXCHANGE(SND) --+ \ EXCHANGE(RCV) -> HASH JOIN(BCAST) --+ SCAN JDBC B --> EXCHANGE(SND) ----/ \ HASH JOIN(BCAST) --+ SCAN JDBC C --> EXCHANGE(SND) ------------------------------------------/ \ TOP-N \ MERGING EXCHANGE -> ROOT Also added a new backend configuration flag --min_jdbc_scan_cardinality (default: 10) to provide a lower bound for scan node cardinality estimates during planning. This flag is propagated from BE to FE via TBackendGflags and surfaced through BackendConfig, ensuring the planner never produces unrealistically low cardinality values. TODO: Add a query option for this optimization to avoid extra JDBC round trip for smaller queries (IMPALA-14417). Testing: All cases of Planner tests are written in jdbc-parallel.test. Some basic metrics are also mentioned in the commit message. Change-Id: If47d29bdda5b17a1b369440f04d4e209d12133d9 Reviewed-on: http://gerrit.cloudera.org:8080/23112 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>	2025-10-04 15:42:38 +00:00
Yida Wu	63dee74712	IMPALA-14466: Remote client should not cache admissiond's IP when retrying AdmitQuery RPC The remote admission client's retry logic for AdmitQuery RPC did not handle cases where the admissiond restarts with a new IP address. The client would use the old proxy and retry against the old, stale ip, causing queries to time out. This change fixes the issue by adding the GetProxy() call inside the retry loop. This forces the client to re-resolve the admissiond's network address on each retry attempt, allowing it to discover the new endpoint and successfully reconnect. Tests: Passed admissiond related exhaustive ee tests. Since automatically change hosts might be difficult, manually test to change the /etc/hosts with following steps: 1. Start with --admission_service_host=localhost. 2. Change the 'localhost' in /etc/hosts to an inaccessible IP, like 127.0.0.2. 3. Submit a query, it will block in the retry logic. 4. While the query is blocked, change 'localhost' in /etc/hosts back to 127.0.0.1. 5. The query succeeded. Change-Id: I5857de84ce69902b902099f668e87d747f944aff Reviewed-on: http://gerrit.cloudera.org:8080/23472 Reviewed-by: Abhishek Rawat <arawat@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-27 03:59:47 +00:00
Zoltan Borok-Nagy	4e2a81161a	IMPALA-14468: Don't generate errors during InitWorkloadManagement() when everything goes fine When Workload management is used first, CatalogD reports error "Table not found: sys.impala_query_log". (also for sys.impala_query_live) It is because during InitWorkloadManagement() we issue a ResetMetadata() request against sys.impala_query_log to retrieve its schema version. If the request fails with TableNotFound, we create the table. In other words, the current initialization of workload management generates error messages even when everything is going fine, and this can confuse users. Instead of calling ResetMetadata() we can test the existence of the workload management tables (sys.impala_query_log and sys.impala_query_live) first. Testing * tested manually that the error logs disappear Change-Id: Ic7f7c92bda57d9fdc2185bf4ef8fd4f09aea0879 Reviewed-on: http://gerrit.cloudera.org:8080/23470 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-27 00:25:07 +00:00
Venu Reddy	ebbc67cf40	IMPALA-13801: Support greatest synced event with hierarchical metastore event processing It is a follow-up jira/commit to IMPALA-12709. IMPALA-12152 and IMPALA-12785 are affected when hierarchical metastore event processing feature is enabled. Following changes are incorporated with this patch: 1. Added creationTime_ and dispatchTime_ fields in MetastoreEvent class to store the current time in millisec. They are used to calculate: a) Event dispatch time(time between a MetastoreEvent object creation and when event is moved to inProgressLog_ of EventExecutorService after dispatching it to a DbEventExecutor). b) Event schedule delays incurred at DbEventExecutors and TableEventExecutors(time between an event moved to EventExecutorService's inProgressLog_ and before start of processing event at appropriate DbEventExecutor and TableEventExecutor). c) Event process time from EventExecutorService point of view(time spent in inProgressLog_ before it is moved to processedLog_). Logs are added to show the event dispatch time, schedule delays, process time from EventExecutorService point of view for each event. Also a log is added to show the time taken for event's processIfEnabled(). 2. Added isDelimiter_ field in MetastoreEvent class to indicate whether it is a delimiter event. It is set only when hierarchical event processing is enabled. Delimiter is a kind of metastore event that do not require event processing. Delimeter event can be: a) A CommitTxnEvent that do not have any write event info for a given transaction. b) An AbortTxnEvent that do not have write ids for a given transaction. c) An IgnoredEvent. An event is determined and marked as delimiter in EventExecutorService#dispatch(). They are not queued to a DbEventExecutor for processing. They are just maintained in the inProgressLog_ to preserve continuity and correctness in synchronization tracking. The delimiter events are removed from inProgressLog_ when their preceding non-delimiter metastore event is removed from inProgressLog_. 3. Greatest synced event id is computed based on the dispatched events(inProgressLog_) and processed events(processedLog_) tree maps. Greatest synced event is the latest event such that all events with id less than or equal to the latest event are definitely synced. 4. Lag is calculated as difference between latest event time on HMS and the greatest synced event time. It is shown in the log. 5. Greatest synced event id is used in IMPALA-12152 changes. When greatest synced event id becomes greater than or equal to waitForEventId, all the required events are definitely synced. 6. Event processor is paused gracefully when paused with command in IMPALA-12785. This ensures that all the fetched events from HMS in current batch are processed before the event processor is fully paused. It is necessary to process the current batch of events because, certain events like AllocWriteIdEvent, AbortTxnEvent and CommitTxnEvent update table write ids in catalog upon metastore event object creation. And the table write ids are later updated to appropriate table object during their event process. Can lead to inconsistent state of write ids on table objects when paused abruptly in the middle of current batch of event processing. 7. Added greatest synced event id and event time in events processor metrics. And updated description of lag, pending events, last synced event id and event time metrics. 8. Atomically update the event queue and increment outstanding event count in enqueue methods of both DbProcessor and TableProcessor so that respective process methods do not process the event until event is added to queue and outstanding event count is incremented. Otherwise, event can get processed, outstanding event count gets decremented before it is incremented in enqueue method. 9. Refactored DbEventExecutor, DbProcessor, TableEventExecutor and TableProcessor classes to propapage the exception occurred along with event during event processing. EventProcessException is a wrapper added to hold reference to event being processed and exception occurred. 10.Added AcidTableWriteInfo helper class to store table, writeids and partitions for the transaction id received in CommitTxnEvent. Testing: - Added new tests and executed existing end to end tests. - Have executed the existing tests with hierarchical event processing enabled. Change-Id: I26240f36aaf85125428dc39a66a2a1e4d3197e85 Reviewed-on: http://gerrit.cloudera.org:8080/22997 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2025-09-26 10:53:46 +00:00
Joe McDonnell	775f73f03e	IMPALA-14462: Fix tie-breaking for sorting scan ranges oldest to newest TestTupleCacheFullCluster.test_scan_range_distributed is flaky on s3 builds. The addition of a single file is changing scheduling significantly even with scan ranges sorted oldest to newest. This is because modification times on S3 have a granularity of one second. Multiple files have the same modification time, and the fix for IMPALA-13548 did not properly break ties for sorting. This adds logic to break ties for files with the same modification time. It compares the path (absolute path or relative path + partition) as well as the offset within the file. These should be enough to break all conceivable ties, as it is not possible to have two scan ranges with the same file at the same offset. In debug builds, this does additional validation to make sure that when a != b, comp(a, b) != comp(b, a). The test requires that adding a single file to the table changes exactly one cache key. If that final file has the same modification time as an existing file, scheduling may still mix up the files and change more than one cache key, even with tie-breaking. This adds a sleep just before generating the final file to guarantee that it gets a newer modification time. Testing: - Ran TestTupleCacheFullCluster.test_scan_range_distributed for 15 iterations on S3 Change-Id: I3f2e40d3f975ee370c659939da0374675a28cd38 Reviewed-on: http://gerrit.cloudera.org:8080/23458 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-09-25 23:30:28 +00:00
Joe McDonnell	16c350ce5a	IMPALA-14271: Reapply the core piece of IMPALA-6984 IMPALA-6984 changed the behavior to cancel backends when the query reaches the RETURNED_RESULTS state. This ran into a regression on large clusters where a query would end up waiting 10 seconds. IMPALA-10047 reverted the core piece of the change. For tuple caching, we found that a scan node can get stuck waiting for a global runtime filter. It turns out that the coordinator will not send out global runtime filters if the query is in a terminal state. Tuple caching was causing queries to reach the RETURNED_RESULTS phase before the runtime filter could be sent out. Reenabling the core part of IMPALA-6984 sends out a cancel as soon as the query transitions to RETURNED_RESULTS and wakes up any fragment instances waiting on runtime filters. The underlying cause of IMPALA-10047 is a tangle of locks that causes us to exhaust the RPC threads. The coordinator is holding a lock on the backend state while it sends the cancel synchronously. Other backends that complete during that time run Coordinator::BackendState::LogFirstInProgress(), which iterates through backend states to find the first that is not done. The check to see if a backend state is done takes a lock on the backend state. The problem case is that the coordinator may be sending a cancel to a backend on itself. In that case, it needs an RPC thread on the coordinator to be available to process the cancel. If all of the RPC threads are processing updates, they can all call LogFirstInProgress() and get stuck on the backend state lock for the coordinator's fragment. In that case, it becomes a temporary deadlock as the cancel can't be processed and the coordinator won't release the lock. It only gets resolved by the RPC timing out. To resolve this, this changes the Cancel() method to drop the lock while doing the CancelQueryFInstances RPC. It reacquires the lock when it finishes the RPC. Testing: - Hand tested with 10 impalads and control_service_num_svc_threads=1 Without the fix, it reproduces easily after reverting IMPALA-10047. With the fix, it doesn't reproduce. Change-Id: Ia058b03c72cc4bb83b0bd0a19ff6c8c43a647974 Reviewed-on: http://gerrit.cloudera.org:8080/23264 Reviewed-by: Yida Wu <wydbaggio000@gmail.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-23 03:58:35 +00:00
Joe McDonnell	e05d92cb3d	IMPALA-13548: Schedule scan ranges oldest to newest for tuple caching Scheduling does not sort scan ranges by modification time. When a new file is added to a table, its order in the list of scan ranges is not based on modification time. Instead, it is based on which partition it belongs to and what its filename is. A new file that is added early in the list of scan ranges can cause cascading differences in scheduling. For tuple caching, this means that multiple runtime cache keys could change due to adding a single file. To minimize that disruption, this adds the ability to sort the scan ranges by modification time and schedule scan ranges oldest to newest. This enables it for scan nodes that feed into tuple cache nodes (similar to deterministic scan range assignment). Testing: - Modified TestTupleCacheFullCluster::test_scan_range_distributed to have stricter checks about how many cache keys change after an insert (only one should change) - Modified TupleCacheTest#testDeterministicScheduling to verify that oldest to newest scheduling is also enabled. Change-Id: Ia4108c7a00c6acf8bbfc036b2b76e7c02ae44d47 Reviewed-on: http://gerrit.cloudera.org:8080/23228 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-22 20:51:46 +00:00
Zoltan Borok-Nagy	9abbcf82ca	IMPALA-14451: Log if memory-based admission is skipped When admission control is enabled, but max memory for pool is not configured, Impala skips memory-based admission completely, i.e. it doesn't even take available host memory into account. This behavior can lead to admitting many queries with large memory consumption, potentially causing query failures due to memory exhaustion. Fixing the above behavior might cause regressions in some workloads, so this patch just adds a new log message which makes it clear why a query gets admitted, and also mentions possible failures. Change-Id: Ib98482abc0fbcb53552adfd89cf6d157b17527fd Reviewed-on: http://gerrit.cloudera.org:8080/23438 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-22 16:55:17 +00:00
Joe McDonnell	ca356a8df5	IMPALA-13437 (part 2): Implement cost-based tuple cache placement This changes the default behavior of the tuple cache to consider cost when placing the TupleCacheNodes. It tries to pick the best locations within a budget. First, it eliminates unprofitable locations via a threshold. Next, it ranks the remaining locations by their profitability. Finally, it picks the best locations in rank order until it reaches the budget. The threshold is based on the ratio of processing cost for regular execution versus the processing cost for reading from the cache. If the ratio is below the threshold, the location is eliminated. The threshold is specified by the tuple_cache_required_cost_reduction_factor query option. This defaults to 3.0, which means that the cost of reading from the cache must be less than 1/3 the cost of computing the value normally. A higher value makes this more restrictive about caching locations, which pushes in the direction of lower overhead. The ranking is based on the cost reduction per byte. This is given by the formula: (regular processing cost - cost to read from cache) / estimated serialized size This prefers locations with small results or high reduction in cost. The budget is based on the estimated serialized size per node. This limits the total caching that a query will do. A higher value allows more caching, which can increase the overhead on the first run of a query. A lower value is less aggressive and can limit the overhead at the expense of less caching. This uses a per-node limit as the limit should scale based on the size of the executor group as each executor brings extra capacity. The budget is specified by the tuple_cache_budget_bytes_per_executor. The old behavior to place the tuple cache at all eligible locations is still available via the tuple_cache_placement_policy query option. The default is the cost_based policy described above, but the old behavior is available via the all_eligible policy. This is useful for correctness testing (and the existing tuple cache test cases). This changes the explain plan output: - The hash trace is only enabled at VERBOSE level. This means that the regular profile will not contain the hash trace, as the regular profile uses EXTENDED. - This adds additional information at VERBOSE to display the cost information for each plan node. This can help trace why a particular location was not picked. Testing: - This adds a TPC-DS planner test with tuple caching enabled (based on the existing TpcdsCpuCostPlannerTest) - This modifies existing tests to adapt to changes in the explain plan output Change-Id: Ifc6e7b95621a7937d892511dc879bf7c8da07cdc Reviewed-on: http://gerrit.cloudera.org:8080/23219 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-18 21:02:51 +00:00
Zoltan Borok-Nagy	850f2cf361	IMPALA-14443: Fix potential memory leak in TmpFileMgr Clang static analyzer found a potential memory leak in TmpFileMgr. In some cases we forget the deletion of a newly created TmpFileRemote object. This patch replaces the raw pointer with a unique_ptr. Change-Id: I5a516eab1a946e7368c6059f8d1cc430d2ee19e9 Reviewed-on: http://gerrit.cloudera.org:8080/23431 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-16 19:16:46 +00:00
jasonmfehr	2efd22bd73	IMPALA-14403: Fix OpenTelemetry TLS Detection Fixes the code that detects if the OpenTelemetry collector is using TLS. Previously, the code only worked properly when the collector URL was all lowercase. Also removes unnecessary checks that could cause TLS to be enabled even when the collector URL scheme was not https. Testing accomplished by adding new ctest tests. Change-Id: I3bf74f1353545d280575cdb94cf135e55c580ec7 Reviewed-on: http://gerrit.cloudera.org:8080/23397 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-16 05:26:42 +00:00
jasonmfehr	ec809fc16c	IMPALA-14433: Fix OpenTelemetry Tracing Deadlock All functions in the SpanManager class operate under the assumption that child_span_mu_ in the SpanManager class will be locked before the ClientRequestState lock. However, the ImpalaServer::ExecuteInternal function takes the ClientRequestState lock before calling SpanManager::EndChildSpanPlanning. If another function in the SpanManager class has already taken the child_span_mu_ lock and is waiting for the ClientRequestState lock, a deadlock occurs. This issue was found by running end-to-end tests with OpenTelemetry tracing enabled and a release buildof Impala. Testing accomplished by re-running the end-to-end tests with OpenTelemetry tracing enabled and verifying that the deadlock no longer occurs. Change-Id: I7b43dba794cfe61d283bdd476e4056b9304d8947 Reviewed-on: http://gerrit.cloudera.org:8080/23422 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-16 02:43:29 +00:00
Laszlo Gaal	89d2b23509	IMPALA-14139: Enable Impala builds on Ubuntu 24.04 Update the following elements of the Impala build environment to enable builds on Ubuntu 24.04: - Recognize and handle (where necessary) Ubuntu 24.04 in various bootstrap scripts (bootstrap_system.sh, bootstrap_toolchain.py, etc.) - Bump IMPALA_TOOLCHAIN_ID to an official toolchain build that contains Ubuntu 24.04-specific binary packages - Bump binutils to 2.42, and - Bump the GDB version to 12.1-p1, as required by the new toolchain version - Update unique_ptr usage syntax in be/src/util/webserver-test.cc to compensate for new GLIBC funtion prototypes: System headers in Ubuntu 24.04 adopted attributes on several widely used function prototypes. Such attributes are not considered to be part of the function's signature during template evaluation, so GCC throws a warning when such a function is passed as a template argument, which breaks the build, as warnings are treated as errors. webserver-test.cc uses pclose() as the deleter for a unique_ptr in a utility function. This patch encapsulates pclose() and its attributes in an explicit specialization for std::default_delete<>, "hiding" the attributes inside a functor. The particular solution was inspired by Anton-V-K's proposal in https://gist.github.com/t-mat/5849549 This commit builds on an earlier patch for the same purpose by Michael Smith: https://gerrit.cloudera.org/c/23058/ Change-Id: Ia4454b0c359dbf579e6ba2f9f9c44cfa3f1de0d2 Reviewed-on: http://gerrit.cloudera.org:8080/23384 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-09-15 16:10:42 +00:00
Riza Suminto	7fabd27096	IMPALA-14411: enable_workload_mgmt should work with V2 profile Impalad crash (hitting DCHECK) when both enable_workload_mgmt and gen_experimental_profile enabled. This is because lambda function process_exec_profile expect "Averaged Fragment" node exist in query profile. But it is actually not exist in V2 query profile. This patch fix the issue by gathering ScratchBytesWritten, ScannerIoWaitTime, and DataCacheHitBytes counters differently in V2 profile. Testing: - Add TestQueryLogTableHS2::test_with_experimental_profile. - Manually start minicluster with both enable_workload_mgmt and gen_experimental_profile flag enabled. Run few queries and confirm no crash happen. Also verify that the columns of sys.impala_query_log that summarize the scan node counters are correct. Change-Id: Iccb4ad9279b0d66479b1e7816ffc732028e71734 Reviewed-on: http://gerrit.cloudera.org:8080/23396 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-11 09:45:38 +00:00
jichen0919	826c8cf9b0	IMPALA-14081: Support create/drop paimon table for impala This patch mainly implement the creation/drop of paimon table through impala. Supported impala data types: - BOOLEAN - TINYINT - SMALLINT - INTEGER - BIGINT - FLOAT - DOUBLE - STRING - DECIMAL(P,S) - TIMESTAMP - CHAR(N) - VARCHAR(N) - BINARY - DATE Syntax for creating paimon table: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name ( [col_name data_type ,...] [PRIMARY KEY (col1,col2)] ) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)] STORED AS PAIMON [LOCATION 'hdfs_path'] [TBLPROPERTIES ( 'primary-key'='col1,col2', 'file.format' = 'orc/parquet', 'bucket' = '2', 'bucket-key' = 'col3', ]; Two types of paimon catalogs are supported. (1) Create table with hive catalog: CREATE TABLE paimon_hive_cat(userid INT,movieId INT) STORED AS PAIMON; (2) Create table with hadoop catalog: CREATE [EXTERNAL] TABLE paimon_hadoop_cat STORED AS PAIMON TBLPROPERTIES('paimon.catalog'='hadoop', 'paimon.catalog_location'='/path/to/paimon_hadoop_catalog', 'paimon.table_identifier'='paimondb.paimontable'); SHOW TABLE STAT/SHOW COLUMN STAT/SHOW PARTITIONS/SHOW FILES statements are also supported. TODO: - Patches pending submission: - Query support for paimon data files. - Partition pruning and predicate push down. - Query support with time travel. - Query support for paimon meta tables. - WIP: - Complex type query support. - Virtual Column query support for querying paimon data table. - Native paimon table scanner, instead of jni based. Testing: - Add unit test for paimon impala type conversion. - Add unit test for ToSqlTest.java. - Add unit test for AnalyzeDDLTest.java. - Update default_file_format TestEnumCase in be/src/service/query-options-test.cc. - Update test case in testdata/workloads/functional-query/queries/QueryTest/set.test. - Add test cases in metadata/test_show_create_table.py. - Add custom test test_paimon.py. Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef Reviewed-on: http://gerrit.cloudera.org:8080/22914 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-09-10 21:24:49 +00:00
Zoltan Borok-Nagy	711797e7fb	IMPALA-14349: Encode FileDescriptors in time in loading Iceberg Tables With this patch we create Iceberg file descriptors from LocatedFileStatus objects during IcebergFileMetadataLoader's parallelListing(). This has the following benefits: * We parallelize the creation of Iceberg file descriptor objects * We don't need to maintain a large hash map with all the LocatedFileStatus objects at once. Now we only need to keep a few LocatedFileStatus objects per partition in memory while we are converting them to Iceberg file descriptors. I.e., the GC is free to destroy the LocatedFileStatus objects we don't use anymore. This patch retires startup flag 'iceberg_reload_new_files_threshold'. Since IMPALA-13254 we only list partitions that have new data files, and we load them in parallel, i.e. efficient incremental table loading is already covered. From that point the startup flag only added unnecessary code complexity. Measurements I created two tables (from tpcds.store_sales) to measure table loading times for large tables: Table #1: PARTITIONED BY SPEC(ss_item_sk, BUCKET(5, ss_sold_time_sk)) partitions: 107818 files: 754726 Table #2: PARTITIONED BY SPEC(ss_item_sk) partitions: 18000 files: 504224 Time taken in IcebergFileMetadataLoader.load() during full table reload: +----------+-------+------+---------+ \| \| Base \| New \| Speedup \| +----------+-------+------+---------+ \| Table #1 \| 17.3s \| 8.1s \| 2.14 \| \| Table #2 \| 7.8s \| 4.3s \| 1.8 \| +----------+-------+------+---------+ I measured incremental table loading only for Table #2 (since there are more files per partition this is the worse scenario for the new code, as it only uses file listings, and each new file were created in a separate partition) Time taken in IcebergFileMetadataLoader.load() during incremental table reload: +------------+------+------+---------+ \| #new files \| Base \| New \| Speedup \| +------------+------+------+---------+ \| 1 \| 1.4s \| 1.6s \| 0.9 \| \| 100 \| 1.5s \| 1.9s \| 0.8 \| \| 200 \| 1.5s \| 1.5s \| 1 \| +------------+------+------+---------+ We lose a few tenths of a second, but I think the simplified code justifies it. Testing: * some tests were updated because we we don't have startup flag 'iceberg_reload_new_files_threshold' anymore Change-Id: Ia1c2a7119d76db7ce7c43caec2ccb122a014851b Reviewed-on: http://gerrit.cloudera.org:8080/23363 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-09 20:34:01 +00:00
Mihaly Szjatinya	4577cab3e8	IMPALA-13806: Avoid per-function std::locale creation A new std::locale is constructed at each mask*() function call while in UTF8_MODE. Instead use a static local object. Change-Id: I9a611ba1b175b0ab1c8f0d1de3b2439be70a68f7 Reviewed-on: http://gerrit.cloudera.org:8080/23380 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2025-09-09 07:14:54 +00:00
Daniel Vanko	321429eac6	IMPALA-14237: Fix Iceberg partition values encoding This patch modifies the string overload of IcebergFunctions::TruncatePartitionTransform so that it always handles strings as UTF-8-encoded ones, because the Iceberg specification states that that strings are UTF-8 encoded. Also, for an Iceberg table UrlEncode is called in not the Hive-compatible way, rather than the standard way, similar to Java's URLEncoder.encode() (which the Iceberg API also uses) to conform with existing practices by Hive, Spark and Trino. This included a change in the set of characters which are not escaped to follow the URL Standard's application/x-www-form-urlencoded format. [1] Also renamed it from ShouldNotEscape to IsUrlSafe for better readability. Testing: * add and extend e2e tests to check partitions with Unicode characters * add be tests to coding-util-test.cc [1]: https://url.spec.whatwg.org/#application-x-www-form-urlencoded-percent-encode-set Change-Id: Iabb39727f6dd49b76c918bcd6b3ec62532555755 Reviewed-on: http://gerrit.cloudera.org:8080/23190 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-08 18:54:07 +00:00
Yida Wu	e486f3e3c3	IMPALA-14385: Fix crashes using sha2() in FIPS CentOS 7 This commit fixes a crash in the sha2() function that occurs when Impala is run on a FIPS enabled OS, particularly CentOS 7. Running sha2() with 384 or 512-bit lengths would cause the impalad to crash with an OpenSSL assertion failure: "Low level API call to digest SHA384 forbidden in FIPS mode!" The root cause was the direct use of low-level OpenSSL API calls like SHA384(), SHA512(). OpenSSL 1.0 (used in RHEL/CentOS 7) is particularly strict and forbids these calls in FIPS mode, causing the module to terminate the process. This patch changes to use the high-level, FIPS compliant EVP_Digest API to perform the hash in sha2() function implementation. Tests: Ran sha2() in FIPS enabled CentOs 7 after the change and succeeded. Passed exhaustive tests. Change-Id: I694532350285534fd935c92b7a78bed91ded3cb5 Reviewed-on: http://gerrit.cloudera.org:8080/23373 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-04 21:00:50 +00:00
Yida Wu	c78f7a7290	IMPALA-14392: Fix a crash in AdmissionD in GetQueryStatus PrintId() The GetQueryStatus RPC could crash due to a use-after-free error when accessing the reqest. When a query was rejected, the function would call RespondAndReleaseRpc(), which can free the "req" object, and then attempt to access req->query_id() for logging and to delete the entry from the admission_state_map_. This commit fixes the crash by moving the call to RespondAndReleaseRpc() to the end of the function. This change aligns this function with others like AdmissionControlService::ReleaseQuery(), which also deletes from admission_state_map_ before RespondAndReleaseRpc(), it ensures that all logic is completed before the RPC resources are released. Tests: Reproduced the issue by running 100 times TestAdmissionControllerStressWithACService::test_mem_limit, and after the change, it can successfully run 100 times. Passed exhaustive tests. Change-Id: I688954c5c671671cc2dc669ecfdf2405476302d7 Reviewed-on: http://gerrit.cloudera.org:8080/23379 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-09-04 15:54:03 +00:00
stiga-huang	0dfed88861	IMPALA-14383: Fix crash in casting timestamp string with timezone offsets to DATE Timestamp string can have a timezone offset at its end, e.g. "2025-08-31 06:23:24.9392129 +08:00" has "+08:00" as the timezone offset. When casting strings to DATE type, we try to find the default format by matching the separators, i.e. '-', ':', ' ', etc in SimpleDateFormatTokenizer::GetDefaultFormatContext(). The one that matches this example is DEFAULT_DATE_TIME_CTX[] which represents the default date/time context for "yyyy-MM-dd HH:mm:ss.SSSSSSSSS". The fractional part at the end can have length from 0 to 9, matching DEFAULT_DATE_TIME_CTX[0] to DEFAULT_DATE_TIME_CTX[9] respectively. When calculating which item in DEFAULT_DATE_TIME_CTX is the matched format, we use the index as str_len - 20 where 20 is the length of "yyyy-MM-dd HH:mm:ss.". This causes the index overflow if the string length is larger than 29. A wild pointer is returned from GetDefaultFormatContext(), leading crash in following codes. This patch fixes the issue by adding a check to make sure the string length is smaller than the max length of the default date time format, i.e. DEFAULT_DATE_TIME_FMT_LEN (29). Longer strings will use DateTimeFormatContext created lazily. Note that this just fixes the crash. Converting timestamp strings with timezone offset at the end to DATE type is not supported yet and will be followed up in IMPALA-14391. Tests - Added e2e tests on constant expressions. Also added a test table with such timestamp strings and added test on it. Change-Id: I36d73f4a71432588732b2284ac66552f75628a62 Reviewed-on: http://gerrit.cloudera.org:8080/23371 Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-03 20:54:59 +00:00
jasonmfehr	789991c6cc	IMPALA-13237: [Patch 8] - OpenTelemetry Traces for DML/DDL Queries and Handle Leading Comments Trace DML/DDL Queries * Adds tracing for alter, compute, create, delete, drop, insert, invalidate metadata, and with queries. * Stops tracing beeswax queries since that protocol is deprecated. * Adds Coordinator attribute to Init and Root spans for identifying where the query is running. Comment Handling * Corrects handling of leading comments, both inline and full line. Previously, queries with comments before the first keyword were always ignored. * Adds be ctest tests for determining whether or not a query should be traced. General Improvements * Handles the case where the first query keyword is followed by a newline character or an inline comment (without or with spaces between). * Corrects traces for errored/cancelled queries. These cases short-circuit the normal query processing code path and have to be handled accordingly. * Ends the root span when the query ends instead of waiting for the ClientRequestState to go out of scope. This change removes use-after-free issues caused by reading from ClientRequestState when the SpanManager went out of scope during that object's dtor. * Simplified minimum tls version handling because the validators on the ssl_minimum_version eliminate invalid values that previously had to be accounted for. * Removes the unnecessary otel_trace_enabled() function. * Fixes IMPALA-14314 by waiting for the full trace to be written to the output file before asserting that trace. Testing * Full test suite passed. * ASAN/TSAN builds passed. * Adds new ctest test. * Adds custom cluster tests to assert traces for the new supported query types. * Adds custom cluster tests to assert traces for errored and cancelled queries. Generated-by: Github Copilot (Claude Sonnet 3.7) Change-Id: Ie9e83d7f761f3d629f067e0a0602224e42cd7184 Reviewed-on: http://gerrit.cloudera.org:8080/23279 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-09-03 04:38:36 +00:00
jasonmfehr	8d663765d7	IMPALA-14382: Fix Null Pointer Dereference Fixes a potential null pointer dereference when log level >= 2. Adds 'build' as a valid EE test helper directory as VSCode creates this directory. Tested locally by running test_scanners from the query_test EE test suite using a release build of Impala and log level 2. Minidumps were not generated during this test run but were generated during the same test run without this fix applied. Generated-by: Github Copilot (Claude Sonnet 3.7) Change-Id: I91660aa84407c17ffb7cd3c721d4f3f0a844d61d Reviewed-on: http://gerrit.cloudera.org:8080/23365 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-03 01:14:53 +00:00

1 2 3 4 5 ...

5599 Commits