impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Riza Suminto	b581e45286	IMPALA-14606: (addendum) Install Python 3 for RHEL8 The first IMPALA-14606 commit failed to set up Python 3 on a fresh RHEL8 machine. This was not caught before because I tested using downstream Jenkins, which reused a RHEL8 machine previously set up with Python 2. This patch fixes the issue by skipping the pip install of argparse, which broke the script, and running setup_python3 instead on RHEL8 machines. Testing: - Ran full bootstrap_system.sh and buildall.sh on a fresh RHEL8 machine. Change-Id: I6df0a534175404fe96d32eeb1e7bf0aa9ca204cd Reviewed-on: http://gerrit.cloudera.org:8080/23772 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Tested-by: Riza Suminto <riza.suminto@cloudera.com>	2025-12-10 17:22:32 +00:00
Riza Suminto	3ed2a82a95	IMPALA-14606: Stop building impala-shell for Python 2 This patch stops setting up and building impala-shell for Python 2. A more thorough cleanup will be done in the future. Testing: Passed build and test/shell/ on RHEL8. Change-Id: Ic7d59b283f4e2f011880ff6221d550b52714a538 Reviewed-on: http://gerrit.cloudera.org:8080/23750 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-10 04:40:46 +00:00
Noemi Pap-Takacs	1bddbefb2d	IMPALA-14580: Document Iceberg table repair functionality Testing: built docs locally Change-Id: I67a861a56269648c5f8c2e9697861bf95587f731 Reviewed-on: http://gerrit.cloudera.org:8080/23738 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Daniel Vanko <dvanko@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>	2025-12-08 13:17:21 +00:00
Csaba Ringhofer	780e6683a2	IMPALA-14573: port critical geospatial functions to c++ (part 1) This commit contains the simpler parts from https://gerrit.cloudera.org/#/c/20602 This mainly means accessors for the header of the binary format and the bounding box check (st_envIntersects). New tests for not yet covered functions / overloads are also added. For details of the binary format see be/src/exprs/geo/shape-format.h Differences from the PR above: Only a subset of functions is added. The criteria were: 1. the native function must be fully compatible with the Java version* 2. must not rely on (de)serializing the full geometry 3. the function must be tested 1 implies 2 because (de)serialization is not yet implemented in the original patch for >2d geometries, so relying on it would break compatibility with the Java version for XYZ/XYM/XYZM geometries. *: there are 2 known differences: 1. NULL handling: the Java functions return error instead of NULL when getting a NULL parameter 2. st_envIntersects() doesn't check if the SRID matches - the Java library looks inconsistent about this Because the native functions are fairly safe replacements for the Java ones, they are always used when geospatial_library=HIVE_ESRI. Change-Id: I0ff950a25320549290a83a3b1c31ce828dd68e3c Reviewed-on: http://gerrit.cloudera.org:8080/23700 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-06 07:50:23 +00:00
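The bounding-box test behind st_envIntersects() reduces to an interval-overlap check on each axis. The sketch below assumes a hypothetical `Envelope` struct holding already-decoded min/max coordinates; it does not reproduce the header accessors in be/src/exprs/geo/shape-format.h, it treats touching boxes as intersecting, and, like the native function described above, it ignores the SRID.

```cpp
#include <cassert>

// Hypothetical axis-aligned bounding box with already-decoded coordinates.
struct Envelope {
  double xmin, ymin, xmax, ymax;
};

// Two envelopes intersect when their extents overlap on both axes.
// Intervals are treated as closed, so boxes that only touch still intersect.
bool EnvelopesIntersect(const Envelope& a, const Envelope& b) {
  return a.xmin <= b.xmax && b.xmin <= a.xmax &&
         a.ymin <= b.ymax && b.ymin <= a.ymax;
}
```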
Laszlo Gaal	fe41448780	IMPALA-14603: Force Java alternative after setup on Rocky and Red Hat Linux Impala allows various Java versions to be selected for its build and runtime environment when bin/bootstrap_system.sh is used to set up the environment. Unfortunately this setup failed to affect the current Java JRE and compiler tools on Red Hat Linux and compatibles (e.g. Rocky Linux), because bootstrap_system.sh failed to set up the requested version in the "alternatives" subsystem. The same failure was not observed on Ubuntu versions; on that platform `update_java_alternatives` was correctly run for the same purpose. This patch adds calls to `alternatives` to set the JRE and JDK environments to the requested version. This benefits automated test runs in Impala's pre- and post-commit environments as well as individual workstation setups. Change-Id: I8972fb35b232830c6d8cf1125a7a8223547bd206 Reviewed-on: http://gerrit.cloudera.org:8080/23741 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-05 21:42:55 +00:00
jichen0919	7e29ac23da	IMPALA-14092 Part2: Support querying of paimon data table via JNI This patch mainly implements querying of paimon data tables through a JNI-based scanner. Features implemented: - support column pruning. The partition pruning and predicate push down will be submitted as the third part of the patch. We implemented this by treating the paimon table as a normal unpartitioned table. When querying a paimon table: - PaimonScanNode decides which paimon splits need to be scanned, and then transfers the splits to BE to do the JNI-based scan operation. - We also collect the required columns that need to be scanned, and pass the columns to the Scanner for column pruning. This is implemented by passing the field ids of the columns to BE, instead of column positions, to support schema evolution. - In the original implementation, PaimonJniScanner directly passed the paimon row object to BE, which called the corresponding paimon row field accessor, a Java method, to convert row fields to impala row batch tuples. We found it was slow due to the overhead of JVM method calls. To minimize the overhead, we reworked the implementation: PaimonJniScanner converts the paimon row batches to an arrow record batch, which stores data in the off-heap region of the impala JVM, and passes the memory pointer of the off-heap arrow record batch to the BE. BE PaimonJniScanNode then reads data directly from the JVM off-heap region and converts the arrow record batch to an impala row batch. The benchmark shows the latter implementation is 2.x times faster than the original implementation. The lifecycle of the arrow row batch is as follows: the arrow row batch is generated in FE and passed to BE. After the record batch is imported to BE successfully, BE is in charge of freeing the row batch. There are two free paths: the normal path and the exception path. For the normal path, when the arrow batch is totally consumed by BE, BE calls JNI to fetch the next arrow batch; in this case, the arrow batch is freed automatically. The exception path happens when the query is cancelled or memory allocation fails. For these corner cases, the arrow batch is freed in the method close if it is not totally consumed by BE. Currently supported impala data types for query include: - BOOLEAN - TINYINT - SMALLINT - INTEGER - BIGINT - FLOAT - DOUBLE - STRING - DECIMAL(P,S) - TIMESTAMP - CHAR(N) - VARCHAR(N) - BINARY - DATE TODO: - Patches pending submission: - Support tpcds/tpch data-loading for paimon data table. - Virtual Column query support for querying paimon data table. - Query support with time travel. - Query support for paimon meta tables. - WIP: - Snapshot incremental read. - Complex type query support. - Native paimon table scanner, instead of jni based. Testing: - Created test tables in functional_schema_template.sql - Added TestPaimonScannerWithLimit in test_scanners.py - Added test_paimon_query in test_paimon.py. - Already passed the tpcds/tpch tests for paimon tables. However, the test table data is currently generated by spark, which impala does not support yet (this is necessary because hive doesn't support generating paimon tables for dynamic-partitioned tables), so we plan to submit a separate patch for tpcds/tpch data loading and the associated tpcds/tpch query tests. - JVM off-heap memory leak tests: ran looped tpch tests for 1 day; no obvious off-heap memory increase was observed, and off-heap memory usage stayed within 10M. Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384 Reviewed-on: http://gerrit.cloudera.org:8080/23613 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-12-05 18:19:57 +00:00
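The ownership rule described above (BE frees each imported batch either when it fetches the next one or in close()) can be modeled with a release callback, similar in spirit to the one the Arrow C Data Interface attaches to each exported array. A minimal C++ sketch; `ImportedBatch`, `BatchConsumer` and the `fetch_next` callable are hypothetical stand-ins, not Impala's PaimonJniScanNode API.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for an imported record batch. Like the Arrow C Data
// Interface, it carries a release callback the consumer must invoke once.
struct ImportedBatch {
  int num_rows = 0;
  void (*release)(ImportedBatch*) = nullptr;  // nullptr means nothing to free
};

// Hypothetical consumer that owns at most one imported batch at a time.
class BatchConsumer {
 public:
  // `fetch_next` stands in for the JNI call that returns the next batch;
  // a batch with zero rows signals end of data.
  explicit BatchConsumer(std::function<ImportedBatch()> fetch_next)
      : fetch_next_(std::move(fetch_next)) {}
  BatchConsumer(const BatchConsumer&) = delete;
  BatchConsumer& operator=(const BatchConsumer&) = delete;
  ~BatchConsumer() { Close(); }

  // Normal path: the current batch has been fully consumed, so free it
  // before importing the next one.
  bool NextBatch() {
    ReleaseCurrent();
    current_ = fetch_next_();
    return current_.num_rows > 0;
  }

  // Exception path (cancellation, allocation failure) and shutdown:
  // free whatever batch is still held, consumed or not.
  void Close() { ReleaseCurrent(); }

 private:
  void ReleaseCurrent() {
    if (current_.release != nullptr) {
      current_.release(&current_);
      current_.release = nullptr;  // guards against a double free
    }
  }

  std::function<ImportedBatch()> fetch_next_;
  ImportedBatch current_;
};
```

Clearing `release` after calling it makes `Close()` idempotent, so the destructor can safely run it again after an explicit close.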
jasonmfehr	f8d21f34bc	IMPALA-14602: Fix Flaky OTel Test Tests in test_otel_trace.py that rely on queries being queued assume a first long-running query will be started before a second query. These tests are most likely flaky because the first long-running query is executed asynchronously and is immediately followed by the execution of a second query. During slower builds (such as ASAN), the first query may not yet be in the running state when the second query is started. This patch adds a check on the first query to ensure it is running before starting the second query. Change-Id: I9e77ec70b4668f0daed2ab9411f8f6c52ddccb2a Reviewed-on: http://gerrit.cloudera.org:8080/23743 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-04 08:05:24 +00:00
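The fix boils down to polling the first query's state until it reports running, bounded by a deadline so that a stuck query fails the test instead of hanging it. The real change lives in Python test code; this C++ sketch is language-neutral, and `is_ready` stands in for a hypothetical "query state is RUNNING" check.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls `is_ready` until it returns true or `timeout` elapses.
// Returns whether the condition was observed before the deadline.
bool WaitUntil(const std::function<bool()>& is_ready,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds interval = std::chrono::milliseconds(50)) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (is_ready()) return true;
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::sleep_for(interval);
  }
}
```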
Sai Hemanth Gantasala	bff814f079	IMPALA-14562: Enable Hierarchical event processing by default IMPALA-12709 added support for hierarchical metastore event processing. This commit enables hierarchical event processing by default. hms_event_polling_interval_s can now be set to a decimal value (e.g. 0.5) to support millisecond-precision intervals. Along with that, other configs can be fine-tuned, such as: num_db_event_executors: To set the number of database level event executors. num_table_event_executors_per_db_event_executor: To set the number of table level event executors within a database event executor. min_event_processor_idle_ms: To set the minimum time to retain idle db processors and table processors. max_outstanding_events_on_executors: To set the limit of maximum outstanding events to process on event executors. Testing: - All the testing required to enable this flag is done in IMPALA-12709 and IMPALA-13801. Change-Id: Ie9a28f863ef17456817e0a335215450e514b1f5b Reviewed-on: http://gerrit.cloudera.org:8080/23687 Reviewed-by: <k.venureddy2103@gmail.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-04 07:02:18 +00:00
Riza Suminto	b77cd73c19	IMPALA-14604: Fix ASAN issue in hdfs-fs-cache.cc This patch fixes a heap-use-after-free issue around HdfsFsCache::GetConnection. The issue is resolved by accessing the HdfsConnOptions parameter entries read-only instead of copying them. Testing: - Passed tmp-file-mgr-test in ASAN build. Change-Id: I23ae03bf82191cd3cd99f8d4c7cbd99daaa0cfe8 Reviewed-on: http://gerrit.cloudera.org:8080/23742 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-04 04:10:08 +00:00
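A common way to hit this bug class is keeping a pointer into a temporary copy: iterating a container by value and storing `c_str()` of the copy leaves a dangling pointer once the copy is destroyed. The sketch below illustrates that general pattern with a hypothetical `ConnOptions` map; it is not the actual hdfs-fs-cache.cc code, and the buggy variant appears only as a comment.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical option map; not Impala's HdfsConnOptions type.
using ConnOptions = std::map<std::string, std::string>;

// Buggy variant (comment only, since dereferencing the result is UB):
//   for (auto entry : options) {            // `entry` is a per-iteration copy
//     keys.push_back(entry.first.c_str());  // pointer into that copy
//   }                                       // copy destroyed: pointer dangles
//
// Fixed variant: bind each entry by const reference, so the returned pointers
// refer to strings owned by `options`. They stay valid as long as `options`
// is alive and unmodified.
std::vector<const char*> CollectKeys(const ConnOptions& options) {
  std::vector<const char*> keys;
  for (const auto& entry : options) {
    keys.push_back(entry.first.c_str());
  }
  return keys;
}
```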
ttttttz	5d1f1e0180	IMPALA-14183: Rename the environment variable USE_APACHE_HIVE to USE_APACHE_HIVE_3 When the environment variable USE_APACHE_HIVE is set to true, Impala is built for Apache Hive 3.x. To better distinguish it from Apache Hive 2.x later, rename USE_APACHE_HIVE to USE_APACHE_HIVE_3. Additionally, to facilitate referencing different versions of the Hive MetastoreShim, the major version of Hive has been added to the environment variable IMPALA_HIVE_DIST_TYPE. Change-Id: I11b5fe1604b6fc34469fb357c98784b7ad88574d Reviewed-on: http://gerrit.cloudera.org:8080/21724 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-03 13:38:45 +00:00
Riza Suminto	0f8f54de20	IMPALA-14595: Fix Ozone trash path after IMPALA-12893 IMPALA-12893 upgraded CDP_BUILD_NUMBER to 71942734, which upgraded Ozone to version 1.4.0.7.3.1.500-182. This newer Ozone version no longer includes WAREHOUSE_PREFIX in its trash path. This patch fixes the broken tests in test_ddl.py by updating the expected trash path. Testing: Ran and passed metadata/test_ddl.py in an Ozone environment. Change-Id: If1271a399d4eb82fed9b073b99d9a7b2c18a03b1 Reviewed-on: http://gerrit.cloudera.org:8080/23734 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-03 01:48:21 +00:00
Riza Suminto	4f3251e19a	IMPALA-14582: Deflake test_show_create_table_with_stats test_show_create_table_with_stats is flaky because inconsistent metadata is not handled/retried correctly on the coordinator side. This patch deflakes it by retrying if InconsistentMetadataFetchException is caught. This patch also fixes some flake8 warnings in test_show_create_table.py, including the unused 'vector' parameter in several tests. Testing: Looped and passed test_show_create_table_with_stats 10 times. Change-Id: I397b9502d92bfd756929be8e851661fd9246dd5e Reviewed-on: http://gerrit.cloudera.org:8080/23728 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-02 09:23:40 +00:00
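Retrying on a transient metadata error generally follows the loop below: retry only the specific exception type, cap the number of attempts, and rethrow the last error when the cap is reached. The actual test is Python and catches InconsistentMetadataFetchException; this C++ sketch uses a hypothetical `InconsistentMetadataError` type instead.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the coordinator's transient metadata error.
struct InconsistentMetadataError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Runs `fn` at most `max_attempts` times (always at least once), retrying
// only while it throws InconsistentMetadataError. Other exceptions propagate
// immediately; the last transient error is rethrown once attempts run out.
template <typename Fn>
auto RetryOnInconsistentMetadata(Fn fn, int max_attempts) -> decltype(fn()) {
  for (int attempt = 1;; ++attempt) {
    try {
      return fn();
    } catch (const InconsistentMetadataError&) {
      if (attempt >= max_attempts) throw;
    }
  }
}
```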
Peter Rozsa	d67ab6f11f	IMPALA-14569: (addendum) Fix 'partitions' row matching IMPALA-14569 introduced a test that asserts for a profile row like 'HDFS partitions' and it's possible for test environments to run on a different storage system. This change omits the storage type from the row_regex. Change-Id: If9b223f2be2dfe7be8724423fefdfb56ffeeba6e Reviewed-on: http://gerrit.cloudera.org:8080/23727 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Riza Suminto <riza.suminto@cloudera.com>	2025-12-01 23:06:47 +00:00
Arnab Karmakar	d4707ff197	IMPALA-13941: Add helper to format file permissions as UNIX-style string This change introduces a utility method FormatPermissions() that converts mode_t permission bits into a human-readable string (e.g., "drwxrwxrwt"). It correctly handles file type indicators, owner/group/other read-write-execute bits, and special bits such as setuid, setgid, and sticky. This improves log readability and debugging for file metadata-related operations by providing consistent, ls-style permission formatting. Testing: - Added unit tests validating permission string output for: - Regular files, directories, symlinks, sockets - All rwx combinations for user/group/other - setuid, setgid, and sticky bit behavior Change-Id: Ib53dbecd5c202e33b6e3b5cd3a372a77d8b1703a Reviewed-on: http://gerrit.cloudera.org:8080/23714 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-12-01 19:07:00 +00:00
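A minimal sketch of ls-style formatting for a POSIX `mode_t`, covering the cases listed above (file type, rwx triplets, setuid/setgid/sticky). The function name `FormatModeString` is hypothetical and the body is an illustration, not Impala's FormatPermissions().

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>

// Formats `mode` the way `ls -l` prints it, e.g. "drwxrwxrwt".
std::string FormatModeString(mode_t mode) {
  std::string s(10, '-');

  // File type indicator; regular files keep '-'.
  if (S_ISDIR(mode)) s[0] = 'd';
  else if (S_ISLNK(mode)) s[0] = 'l';
  else if (S_ISSOCK(mode)) s[0] = 's';
  else if (S_ISFIFO(mode)) s[0] = 'p';
  else if (S_ISCHR(mode)) s[0] = 'c';
  else if (S_ISBLK(mode)) s[0] = 'b';

  // rwx triplets for user, group and other.
  const mode_t bits[9] = {S_IRUSR, S_IWUSR, S_IXUSR, S_IRGRP, S_IWGRP,
                          S_IXGRP, S_IROTH, S_IWOTH, S_IXOTH};
  const char chars[3] = {'r', 'w', 'x'};
  for (int i = 0; i < 9; ++i) {
    if (mode & bits[i]) s[i + 1] = chars[i % 3];
  }

  // Special bits take over the matching execute slot: lowercase when the
  // execute bit is also set, uppercase when it is not.
  if (mode & S_ISUID) s[3] = (mode & S_IXUSR) ? 's' : 'S';
  if (mode & S_ISGID) s[6] = (mode & S_IXGRP) ? 's' : 'S';
  if (mode & S_ISVTX) s[9] = (mode & S_IXOTH) ? 't' : 'T';
  return s;
}
```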
Peter Rozsa	6cf21464b4	IMPALA-14569: Fix IllegalStateException in partition pruning on type mismatch This fixes an IllegalStateException in HdfsPartitionPruner when evaluating 'IN' predicates whose operands are of two different but compatible types, for example DATE and STRING: date_col in (<date as string>). Previously, 'canEvalUsingPartitionMd' did not check if the slot type matched the literal type. This caused the frontend to attempt invalid comparisons via 'LiteralExpr.compareTo', leading to IllegalStateException or incorrect pruning. The fix ensures 'canEvalUsingPartitionMd' returns false on type mismatches, deferring evaluation to the backend where proper casting occurs. Testing: - Added regression test in hdfs-partition-pruning.test. Change-Id: Idc226a628c8df559329a060cb963b81e27e21eda Reviewed-on: http://gerrit.cloudera.org:8080/23706 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-27 02:48:28 +00:00
jichen0919	685745f785	IMPALA-14579: Bump up paimon version to 1.3.1 for CVE-2025-46762 This patch mainly fixes CVE-2025-46762 by bumping up the paimon version to 1.3.1. Background: The following PR: https://github.com/apache/incubator-paimon/pull/6363 was merged by the paimon community and is included since paimon-1.3.0, so impala needs to upgrade paimon to 1.3.0 or later to fix the CVE as well. Testing: - All paimon-related tests passed. Change-Id: Ie8052f71a5e2a4e39b0ac39b6d349e55f10092bc Reviewed-on: http://gerrit.cloudera.org:8080/23717 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-26 16:55:30 +00:00
jasonmfehr	336034debd	IMPALA-14480: Optional OpenTelemetry DCHECKs The code in span-manager.cc contains aggressive DCHECKs that rely on the query lifecycle being deterministic. In reality, the query lifecycle is not completely deterministic due to multiple threads being involved in execution, result retrieval, query shutdown, etc. On debug builds only, a new flag named otel_trace_exhaustive_dchecks will be available with a default of 'false'. If set to 'true', then optional DCHECKs will be enabled in the SpanManager class to enable identification of edge cases where the query lifecycle proceeds in an unexpected way. The DCHECKs that are controlled by the new flag are those that rely on a specific ordering of start/end child span and add child span event calls. Change-Id: Id6507f3f0e23ecf7c2bece9a6b6c2d86bfac1e57 Reviewed-on: http://gerrit.cloudera.org:8080/23518 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-26 04:48:46 +00:00
Noemi Pap-Takacs	fdad9d3204	IMPALA-13725: Add Iceberg table repair functionalities In some cases users delete files directly from storage without going through the Iceberg API, e.g. they remove old partitions. This corrupts the table, and makes queries that try to read the missing files fail. This change introduces a repair statement that deletes the dangling references of missing files from the metadata. Note that the table cannot be repaired if there are missing delete files, because Iceberg's DeleteFiles API, which is used to execute the operation, only allows removing data files. Testing: - E2E - HDFS - S3, Ozone - analysis Change-Id: I514403acaa3b8c0a7b2581d676b82474d846d38e Reviewed-on: http://gerrit.cloudera.org:8080/23512 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-25 13:03:52 +00:00
jasonmfehr	2ac5a24dc0	IMPALA-14455: Cleanup OpenTelemetry Tracing Startup Flags Fixes several issues with the OpenTelemetry tracing startup flags: 1. otel_trace_beeswax -- Removes this hidden flag which enabled tracing of queries submitted over Beeswax. Since this protocol is deprecated and no tests assert the traces generated by Beeswax queries, this flag was removed to eliminate an extra check when determining if OpenTelemetry tracing should be enabled. 2. otel_trace_tls_minimum_version -- Fixes parsing of this flag's value. This flag is in the format "tlsv1.2" or "tlsv1.3", but the OpenTelemetry C++ SDK expects the minimum TLS version to be in the format "1.2" or "1.3". The code now removes the "tlsv" prefix before passing the value to the OpenTelemetry C++ SDK. 3. otel_trace_tls_insecure_skip_verify -- Fixes the guidance to only set this flag to true in dev/testing. Adds ctest tests for the functions that configure the TraceProvider singleton to ensure startup flags are correctly parsed and applied. Modifies the http_exporter_config and init_otel_tracer function signatures in otel.cc to return the actual object they create instead of a Status since these functions only ever returned OK. Updates the OpenTelemetry collector docker-compose file to support the collector receiving traces over both HTTP and HTTPS. This setup is used to manually smoke test the integration from Impala to an OpenTelemetry collector. Change-Id: Ie321fa37c0fd260f783dc6cf47924d53a06d82ea Reviewed-on: http://gerrit.cloudera.org:8080/23440 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-11-24 23:46:57 +00:00
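The prefix handling described for otel_trace_tls_minimum_version can be sketched as follows. The helper name is hypothetical, matching is case-sensitive in this sketch, and values that already lack the "tlsv" prefix are passed through unchanged.

```cpp
#include <cassert>
#include <string>

// Converts a flag value such as "tlsv1.2" into the "1.2" form that the
// OpenTelemetry C++ SDK expects, per the commit description.
std::string ToOtelTlsVersion(const std::string& flag_value) {
  const std::string prefix = "tlsv";
  if (flag_value.compare(0, prefix.size(), prefix) == 0) {
    return flag_value.substr(prefix.size());
  }
  return flag_value;
}
```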
Daniel Vanko	3d22c7fe05	IMPALA-12209: Always include format-version in DESCRIBE FORMATTED and SHOW CREATE TABLE for Iceberg tables HiveCatalog does not include format-version for Iceberg tables in the table's parameters, therefore the output of SHOW CREATE TABLE may not replicate the original table. This patch makes sure to add it to both the SHOW CREATE TABLE and DESCRIBE FORMATTED/EXTENDED output. Additionally, adds ICEBERG_DEFAULT_FORMAT_VERSION variable to E2E tests, deducing it from the IMPALA_ICEBERG_VERSION environment variable. If Iceberg version is at least 1.4, default format-version is 2, before 1.4 it's 1. This way tests can work with multiple Iceberg versions. Testing: * updated show-create-table.test and show-create-table-with-stats.test for Iceberg tables * added format-version checks to multiple DESCRIBE FORMATTED tests Change-Id: I991edf408b24fa73e8a8abe64ac24929aeb8e2f8 Reviewed-on: http://gerrit.cloudera.org:8080/23514 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-24 21:48:17 +00:00
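The version rule used for ICEBERG_DEFAULT_FORMAT_VERSION (format-version 2 from Iceberg 1.4 on, 1 before) can be sketched as below. The real variable is computed in the Python E2E test setup; this C++ helper is hypothetical, only inspects the leading "major.minor" of the version string, and falls back to 1 when the string cannot be parsed.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Returns the default Iceberg format-version for a given Iceberg release.
int DefaultIcebergFormatVersion(const std::string& iceberg_version) {
  int major = 0, minor = 0;
  // Only "major.minor" matters; trailing components are ignored.
  if (std::sscanf(iceberg_version.c_str(), "%d.%d", &major, &minor) != 2) {
    return 1;  // assumption: unparsable versions use the older default
  }
  return (major > 1 || (major == 1 && minor >= 4)) ? 2 : 1;
}
```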
Csaba Ringhofer	f6ceca2b4d	IMPALA-14571: increase planner cost of java functions The main motivation is to evaluate expensive geospatial functions (which are Java functions) last in predicates. Java functions have a major overhead anyway from the JNI call, so bumping all Java function costs seems beneficial. Note that currently geospatial functions are the only built-in Java functions. Change-Id: I11d1652d76092ec60af18a33502dacc25b284fcc Reviewed-on: http://gerrit.cloudera.org:8080/22733 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-24 16:52:59 +00:00
Csaba Ringhofer	f12bb87d42	IMPALA-14081: (addendum) add ';' to CREATE part in dataload The missing ';' can cause problems for the next created table. Change-Id: I719872de23941bf81289340ce246d25ee113223a Reviewed-on: http://gerrit.cloudera.org:8080/23704 Reviewed-by: Daniel Vanko <dvanko@cloudera.com> Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2025-11-21 12:29:48 +00:00
Joe McDonnell	5eea4f6f79	IMPALA-14559: Ship calcite-planner jar in Impala packages This adds the java/impala-package Maven project to make it easier to ship / test the Calcite planner. impala-package has a dependency on impala-frontend and calcite-planner, so its classpath requires no extra work when constructing the classpath. An additional cleanup is that this no longer puts the impala-frontend-*-tests.jar on the classpath by default. This requires updating the query event hooks test, as it relies on that jar being present. This does not change the default value for the use_calcite_planner query option, so there is no change in behavior. Testing: - Ran a core job - Built docker images and OS packages locally Change-Id: I81dec2a5b59e279229a735c8bb1a23c77111a793 Reviewed-on: http://gerrit.cloudera.org:8080/23497 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-21 03:36:12 +00:00
Zoltan Borok-Nagy	5ea4dc342e	IMPALA-14565: Update Apache component versions after CDP_BUILD_NUMBER bump to 71942734 CDP_BUILD_NUMBER was bumped to 71942734 which upgraded Iceberg to version 1.5.2. We should update our Apache component dependencies (not just Iceberg) accordingly. Change-Id: Ic353bbef64a59365b708a20bd0d5ed502cb6d44e Reviewed-on: http://gerrit.cloudera.org:8080/23678 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-21 01:40:05 +00:00
Steve Carlin	e67b627858	IMPALA-14408: (addendum) Log Calcite exception in profile This addendum logs the exception thrown in the runtime profile under the CalciteFailureReason key. Testing: test_ranger.py uses this. Change-Id: Ia18a52c488f9c73d51690997b277fd8e918c645f Reviewed-on: http://gerrit.cloudera.org:8080/23686 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-20 21:08:48 +00:00
Steve Carlin	a6bb0c7c45	IMPALA-14408: Use regular path for Calcite planner instead of CalciteJniFrontend When the --use_calcite_planner=true option is set at the server level, the queries will no longer go through CalciteJniFrontend. Instead, they will go through the regular JniFrontend, which is the path that is used when the query option for "use_calcite_planner" is set. The CalciteJniFrontend will be removed in a later commit. This commit also enables fallback to the original planner when an unsupported feature exception is thrown. This needed to be added to allow the tests to run properly. During initial database load, there are queries that access complex columns and throw the unsupported exception. Change-Id: I732516ca8f7ea64f73484efd67071910c9b62c8f Reviewed-on: http://gerrit.cloudera.org:8080/23523 Reviewed-by: Steve Carlin <scarlin@cloudera.com> Tested-by: Steve Carlin <scarlin@cloudera.com>	2025-11-20 21:08:48 +00:00
Riza Suminto	64c4abe6ed	IMPALA-14547: Bumping Kudu version to pickup KUDU-3716 Redhat 9 environments recently switched to OpenSSL 3.5.1. On those machines, the Kudu minicluster fails to start up with a CSR signature verification error. KUDU-3716 fixed this issue. This patch updates the Toolchain and Kudu versions to pick up KUDU-3716. Testing: Passed data loading in Redhat 9. Change-Id: I7262267939a9f08650af85443240950afbb3323f Reviewed-on: http://gerrit.cloudera.org:8080/23697 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-20 15:16:57 +00:00
Joe McDonnell	3ce0004c12	IMPALA-14512: Remove dependency on sh python package This modifies bin/single_node_perf_run.py to stop using the sh python package. It replaces sh with calls to subprocess. It stops installing sh for both the Python 2 and 3 virtualenvs. Testing: - Ran perf-AB-test job with it and examined the logs Change-Id: Ic5f9316a5d83c5c0dc37d4a94c55b6a655765fe3 Reviewed-on: http://gerrit.cloudera.org:8080/23600 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Jason Fehr <jfehr@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-20 03:29:48 +00:00
Joe McDonnell	001263f58a	IMPALA-14514: Handle serializing bytes in bin/run-workload.py On python 3, when Impyla receives a result with a string that is not valid UTF-8, it returns that as bytes. TPC-DS Q30 on scale 20 has a result that contains invalid UTF-8, so bin/run-workload.py can fail while trying to dump this to JSON. This modifies CustomJSONEncoder to handle serializing bytes by converting it to a string with invalid unicode handled with backslashes. Testing: - Ran bin/run-workload.py against TPC-DS scale 20 Change-Id: Ibe31c656de4fc65f8580c7b3b49bf655b8a5ecea Reviewed-on: http://gerrit.cloudera.org:8080/23602 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Jason Fehr <jfehr@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-11-20 03:29:48 +00:00
Gabriella Gyorgyevics	c4c9adf592	IMPALA-14386: Add benchmarks for Byte Stream Split encoding This patch adds benchmarks to the Byte Stream Split encoding. It compares different ways to use the decoder. I added benchmarks for the following comparisons: * Compile VS Runtime initialized decoder * Float VS Int VS Double VS Long VS 6 and 11 byte size types * Repeating VS Sequential VS Random ordered data * Decoding one by one VS in batch VS with stride (!= byte_size) * Small VS Medium (10x small) VS Large (100x small) stride Conclusions: * Passing the byte size as a template parameter is almost 5 times as fast as passing it in the constructor. * The size of the type heavily influences the speed * The data variation doesn't influence the speed at all * Reading values in batch is much faster than one-by-one * The stride sizes have a small influence on the speed For more details and graphs, go to https://docs.google.com/spreadsheets/d/129LwvR6gpZInlRhlVWktn6Haugwo_fnloAAYfI0Qp2s Change-Id: I708af625348b0643aa3f37525b8a6e74f0c47057 Reviewed-on: http://gerrit.cloudera.org:8080/23401 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-19 17:49:42 +00:00
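For context, Byte Stream Split stores byte k of every value contiguously in stream k, so for N values of B bytes, byte k of value i sits at offset k * N + i. The sketch below contrasts a decoder whose byte size is a template parameter with one that takes it at runtime, the first comparison in the benchmark above. Both functions are hypothetical and omit the batching and stride handling of Impala's decoder.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Byte size fixed at compile time: the inner loop has a constant trip count.
template <int B>
void DecodeBssFixed(const uint8_t* data, int64_t num_values, uint8_t* out) {
  for (int64_t i = 0; i < num_values; ++i) {
    for (int k = 0; k < B; ++k) {
      out[i * B + k] = data[k * num_values + i];  // gather byte k of value i
    }
  }
}

// Same decoding, with the byte size known only at runtime.
void DecodeBssRuntime(const uint8_t* data, int64_t num_values, int byte_size,
                      uint8_t* out) {
  for (int64_t i = 0; i < num_values; ++i) {
    for (int k = 0; k < byte_size; ++k) {
      out[i * byte_size + k] = data[k * num_values + i];
    }
  }
}
```

With `B` known at compile time, the constant trip count gives the compiler room to unroll or vectorize the inner loop; that is one plausible reason for the gap the benchmark reports, not a measured explanation.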
m-sanjana19	134c28d445	IMPALA-13788: [DOCS] Docs for query options SYNC_HMS_EVENTS_WAIT_TIME_S and SYNC_HMS_EVENTS_STRICT_MODE The commit documents query options SYNC_HMS_EVENTS_WAIT_TIME_S and SYNC_HMS_EVENTS_STRICT_MODE Url: https://impala.apache.org/docs/build/html/topics/impala_set.html Change-Id: Ia11663c5e84794d4bca658124cde59bf97aa7158 Reviewed-on: http://gerrit.cloudera.org:8080/23592 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Reviewed-by: Jason Fehr <jfehr@cloudera.com>	2025-11-19 07:42:54 +00:00
Steve Carlin	54c0074b33	IMPALA-14405 ADDENDUM: Catch exception for bad column names This commit is a fix on top of IMPALA-14405 for the Calcite planner. The original commit matches column names from the expression in the select clause. For instance, if the query is "select 1 + 1", the label in impala-shell will be "1 + 1". It accomplished this by retrieving the string from the SqlNode object through the MySql dialect. However, when the expression doesn't succeed in the MySql dialect, an AssertionError gets thrown, causing the query to fail. We don't want the query to fail, we just want to go back to using the Calcite expression, e.g. EXPR$0. This occurred with this specific query: "select timestamp_col + interval 3 nanoseconds" So now the exception is caught and the default label name is used. Eventually we should try to match what Impala has, but this is a harder problem to fix. Change-Id: I6c4d76a25fb2486eb1ef19485bce7888d45d282f Reviewed-on: http://gerrit.cloudera.org:8080/23665 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2025-11-18 21:34:29 +00:00
Zoltan Borok-Nagy	454cb07e7c	IMPALA-14556: Move Hive ACID stress tests to exhaustive tests Currently Hive ACID stress tests run with "core" exploration strategy. It was important to get instant feedback about this feature when this was actively developed. Since then development activity around Hive ACID decreased significantly, as focus shifted towards Iceberg. This patch moves Hive ACID tests to exhaustive tests where they will be still executed regularly, but won't slow down pre-commit tests. Change-Id: Id7181fea62e2e3f8fcf7897a70e54a1708ef3f3e Reviewed-on: http://gerrit.cloudera.org:8080/23677 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-18 16:20:16 +00:00
Arnab Karmakar	a2a11dec62	IMPALA-13263: Add single-argument overload for ST_ConvexHull() Implemented a single-argument version of ST_ConvexHull() to align with PostGIS behavior and simplify usage across geometry types. Testing: Added new tests in test_geospatial_functions.py for ST_ConvexHull(), which previously had no test coverage, to verify correctness across supported geometry types. Change-Id: Idb17d98f5e75929ec0143aa16195a84dd6e50796 Reviewed-on: http://gerrit.cloudera.org:8080/23604 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2025-11-18 10:26:04 +00:00
Steve Carlin	52334ba426	IMPALA-14421: Calcite planner: case statement returning wrong types for char, varchar The 'case' function resolver in the original Impala planner has a quirk in it which caused issues in the Calcite planner. The function resolver for the original planner resolves all case statements with the "boolean" version. Later on, in the analysis of the CaseExpr, the proper types are assessed and the necessary casting is added. The Calcite planner follows a similar path. The resolver always returns boolean as well and the coerce nodes module determines the proper return type for the case statement. Two other related issues are also fixed here: Literal strings should be treated as type STRING instead of CHAR(X), but a null literal should not be changed from a CHAR(x) to a STRING. This broke a 'case' test in the test framework where the columns were non-literals with type char(x), and the return value was a "null" which should not have forced a cast to string. A cast from a varchar to a varchar should be ignored. Testing: Added a test to calcite.test. Ensured the existing cast test in test_chars.py passed. Ran through the Jenkins Calcite testing framework. Change-Id: I82d657f4bfce432c458ee8198188dadf9f23f2ef Reviewed-on: http://gerrit.cloudera.org:8080/23560 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-18 07:47:39 +00:00
Peter Rozsa	8eb1d87edc	IMPALA-14272: Add extra flags option for coverage_helper.sh This change adds an optional flag to coverage_helper.sh script that accepts additional parameters for the wrapped gcovr call. Tests: - manually validated that the script has the original behaviour if the newly added flag is not set, also if it's set, the parameters are pushed down correctly. Change-Id: Iea26c9967b62b06ded6a0cb4c0346f0e789beb80 Reviewed-on: http://gerrit.cloudera.org:8080/23290 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Peter Rozsa <prozsa@cloudera.com>	2025-11-18 07:12:28 +00:00
Arnab Karmakar	068158e495	IMPALA-12401: Support more info types for HS2 GetInfo() API This patch adds support for 40+ additional TGetInfoType values in the HiveServer2 GetInfo() API, improving ODBC/JDBC driver compatibility. Previously, only 3 info types were supported (CLI_SERVER_NAME, CLI_DBMS_NAME, CLI_DBMS_VER). The implementation follows the ODBC CLI specification and matches the behavior of Hive's GetInfo implementation where applicable. Testing: - Added unit tests in test_hs2.py for new info types - Tests verify correct return values and data types for each info type Change-Id: I1ce5f2b9dcc2e4633b4679b002f57b5b4ea3e8bf Reviewed-on: http://gerrit.cloudera.org:8080/23528 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2025-11-17 19:32:50 +00:00
Riza Suminto	f2243b76b5	IMPALA-14557: Fix flaky test_show_files_partition TestIcebergTable.test_show_files_partition is unstable because files are alphanumerically sorted and the order between a random UUID and "delete-*" is not guaranteed. This patch fixes the flakiness by specifying VERIFY_IS_SUBSET and using a negative lookahead on the word "delete" to detect valid Iceberg data files. Testing: - Looped test_show_files_partition 50 times and it passed every time. Before this patch, it could fail in fewer than 10 loops. Change-Id: I6243585a5b7ab7cf7c95d5a9530ce2f2825c550e Reviewed-on: http://gerrit.cloudera.org:8080/23680 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-11-17 17:13:19 +00:00
Michael Smith	166b39547e	IMPALA-14553: Run schema eval concurrently The majority of the time spent in generate-schema-statements.py is in eval_section for schema operations that shell out, often uploading files via the hadoop CLI or generating data files. These operations should be independent. Runs eval_section at the beginning so we don't repeat it for each row in test_vectors, and executes the calls in parallel via a ThreadPool. Defaults to NUM_CONCURRENT_TESTS threads because the underlying operations already have some concurrency of their own (such as HDFS mirroring writes). Also collects existing tables into a set to optimize lookup. Reduces generate-schema-statements.py runtime by ~60%, from 2m30s to 1m. Confirmed that contents of logs/data_loading/sql/functional are identical. Change-Id: I2a78d05fd6a0005c83561978713237da2dde6af2 Reviewed-on: http://gerrit.cloudera.org:8080/23627 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-11-17 16:34:22 +00:00
Steve Carlin	bc99705252	IMPALA-13902: Calcite planner: Implement is_spool_query_results The is_spool_query_results query option is now supported in Calcite. The returnAtMostOneRow method is now implemented to support this. PlanRootSink is refactored to extract the sanitizing of query options (a new method, sanitizeSpoolingOptions()) out of PlanRootSink.computeResourceProfile(). The bulk of the memory bounding calculation is also extracted into a new class, SpoolingMemoryBound. Added "sleep" to ImpalaOperatorTable.java since some EE tests related to result spooling call the sleep() function. Changed ImpalaPlanRel to extend the RelNode interface. A sanity test has been added to calcite.test, but the bulk of the testing will be done through the Impala test framework when it is enabled. Testing: - Pass FE tests PlannerTest#testResultSpooling, TpcdsCpuCostPlannerTest, and all Java tests under the calcite-planner project. - Pass query_test/test_result_spooling.py and custom_cluster/test_result_spooling.py. Co-authored-by: Riza Suminto Change-Id: I5b9bf49e2874ee12de212b892bd898c296774c6f Reviewed-on: http://gerrit.cloudera.org:8080/23562 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-16 02:33:02 +00:00
Riza Suminto	898e03e9d5	IMPALA-14552: (addendum) Fix bad testcase in show-create-table.test The original IMPALA-14552 patch passed precommit tests before IMPALA-12893: (part 2) (`275f03f`) was merged. As a consequence, it did not catch a missing comma in the updated show-create-table.test. This patch adds that missing comma. Testing: Pass metadata/test_show_create_table.py Change-Id: Ib06e690a81e6b0ca483b3647cc59c73802a0a7b7 Reviewed-on: http://gerrit.cloudera.org:8080/23673 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-15 21:34:44 +00:00
Zoltan Borok-Nagy	6810368c10	IMPALA-14552: test_show_create_table should be more strict with TBLPROPERTIES contents Currently we use this regex to parse the contents of TBLPROPERTIES: kv_regex = "'([^\']+)'\\s=\\s'([^\']+)'" kv_results = dict(re.findall(kv_regex, map_match.group(1))) This allows strings like: 'X'='Y'='Z' 'X'='Z'$'A'='B' This means it's easy to write strings in .test files that are not valid SQL. This patch adds a few extra checks to validate the TBLPROPERTIES contents. Change-Id: I94110f50720c01dc7807ee56c794d235f4990282 Reviewed-on: http://gerrit.cloudera.org:8080/23671 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-11-14 23:58:47 +00:00
Mihaly Szjatinya	087b715a2b	IMPALA-14108: Add support for SHOW FILES IN table PARTITION for Iceberg tables This patch implements partition filtering support for the SHOW FILES statement on Iceberg tables, based on the functionality added in IMPALA-12243. Prior to this change, the syntax resulted in a NullPointerException. Key changes: - Added ShowFilesStmt.analyzeIceberg() to validate and transform partition expressions using IcebergPartitionExpressionRewriter and IcebergPartitionPredicateConverter. After that, it collects matching file paths using IcebergUtil.planFiles(). - Added FeIcebergTable.Utils.getIcebergTableFilesFromPaths() to accept pre-filtered file lists from the analysis phase. - Enhanced the TShowFilesParams thrift struct with an optional selected_files field to pass pre-filtered file paths from frontend to backend. Testing: - Analyzer tests for negative cases: non-existent partitions, invalid expressions, non-partition columns, unsupported transforms. - Analyzer tests for positive cases: all transform types, complex expressions. - Authorization tests for non-filtered and filtered syntaxes. - E2E tests covering every partition transform type with various predicates. - Schema evolution and rollback scenarios. The implementation follows AlterTableDropPartition's pattern where the analysis phase performs validation/metadata retrieval and the execution phase handles result formatting and display. Change-Id: Ibb9913e078e6842861bdbb004ed5d67286bd3152 Reviewed-on: http://gerrit.cloudera.org:8080/23455 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-14 21:43:10 +00:00
Zoltan Borok-Nagy	275f03f10d	IMPALA-12893: (part 2): Upgrade Iceberg to version 1.5.2 This patch updates CDP_BUILD_NUMBER to 71942734 in order to upgrade Iceberg to 1.5.2. This patch updates some tests so they pass with Iceberg 1.5.2. The behavior changes of Iceberg 1.5.2 are (compared to 1.3.1): * Iceberg V2 tables are created by default * Metadata tables have a different schema * Parquet compression is explicitly set for new tables (even for ORC tables) * Sequence numbers are assigned a bit differently Updated the tests where needed. Code changes to accommodate the above behavior changes: * SHOW CREATE TABLE adds 'format-version'='1' for Iceberg V1 tables * CREATE TABLE statements don't throw errors when Parquet compression is set for ORC tables Change-Id: Ic4f9ed3f7ee9f686044023be938d6b1d18c8842e Reviewed-on: http://gerrit.cloudera.org:8080/23670 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-14 01:27:45 +00:00
Xuebin Su	e4a508529c	IMPALA-14544: Fix use-after-poison for Kudu arrays This patch fixes the use-after-poison error caused by using the memory in the MemPool after calling `MemPool::Clear()` when reading Kudu arrays. Testing: - The ASAN build passed the core tests. Change-Id: I9b729fc6003e64856ea0e197b1e3c74dad7247a1 Reviewed-on: http://gerrit.cloudera.org:8080/23668 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-12 22:38:32 +00:00
Joe McDonnell	5f91838ada	IMPALA-14545: Don't use absolute hdfs paths for JDBC table driver.url After IMPALA-13661 merged, S3PlannerTest.testDataSourceTables has been failing with an error trying to fetch the JDBC driver for functional.jdbc_decimal_tbl. This particular table's definition uses a path like 'hdfs://localhost:20500/test-warehouse/...' which explicitly depends on HDFS rather than relying on the default filesystem. Changing this to use a path like '/test-warehouse/...' without the HDFS dependency fixes the S3PlannerTest. This changes create-ext-data-source-table.sql to a template using WAREHOUSE_LOCATION_PREFIX and replaces that variable before executing it. This is important for Ozone, as Ozone uses a WAREHOUSE_LOCATION_PREFIX set to the Ozone volume. Testing: - Ran S3 and regular HDFS fe tests Change-Id: I3f2c86fcc6c1dee75d7d9a9be04468cb197ae13c Reviewed-on: http://gerrit.cloudera.org:8080/23658 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-12 22:17:44 +00:00
Arnab Karmakar	760eb4f2fa	IMPALA-13066: Extend SHOW CREATE TABLE to include stats and partitions Adds a new WITH STATS option to the SHOW CREATE TABLE statement to emit additional SQL statements for recreating table statistics and partitions. When specified, Impala outputs: - Base CREATE TABLE statement. - ALTER TABLE ... SET TBLPROPERTIES for table-level stats. - ALTER TABLE ... SET COLUMN STATS for all non-partition columns, restoring column stats. - For partitioned tables: - ALTER TABLE ... ADD PARTITION statements to recreate partitions. - Per-partition ALTER TABLE ... PARTITION (...) SET TBLPROPERTIES to restore partition-level stats. Partition output is limited by the PARTITION_LIMIT query option (default 1000). Setting PARTITION_LIMIT=0 includes all partitions and emits a warning if the limit is exceeded. Tests added to verify correctness of emitted statements. Default behavior of SHOW CREATE TABLE remains unchanged for compatibility. Change-Id: I87950ae9d9bb73cb2a435cf5bcad076df1570dc2 Reviewed-on: http://gerrit.cloudera.org:8080/23536 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-12 06:11:37 +00:00
ttttttz	75c639c9cd	IMPALA-14498: Fix a bug in initial code review checks When conducting a code review using flake8-diff, the check may fail in some code sections due to the use of non-raw strings. This patch modifies one such instance so that the initial code review check passes. Although the check now passes, other instances of non-raw strings may remain. Change-Id: I71889a117c64500bab13928971a2bce063a72cd4 Reviewed-on: http://gerrit.cloudera.org:8080/23656 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2025-11-12 01:05:10 +00:00
Michael Smith	d09940b5dd	IMPALA-13563: Cleanup logging Cleans up calls to logDebug and a few other locations: - exit early if producing the debug message input is expensive - use slf4j parameterized logging - standardize on logDebug handling isDebugEnabled checks Change-Id: I32e1c62511c292d36aa879c60ae3d91ed4f65697 Reviewed-on: http://gerrit.cloudera.org:8080/22090 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-11 05:29:58 +00:00
Xuebin Su	6b6f7e614d	IMPALA-14472: Add create/read support for ARRAY column of Kudu The initial implementation of KUDU-1261 (array column type) was recently merged in the upstream Apache Kudu repository. This patch adds initial Impala support for working with Kudu tables that have array type columns. Unlike rows, the elements of a Kudu array are stored in a different format than in Impala: instead of a per-row bit flag for NULL info, values and NULL bits are stored in separate arrays. The following types of queries are not supported in this patch: - (IMPALA-14538) Queries that reference an array column as a table, e.g. ```sql SELECT item FROM kudu_array.array_int; ``` - (IMPALA-14539) Queries that create duplicate collection slots, e.g. ```sql SELECT array_int FROM kudu_array AS t, t.array_int AS unnested; ``` Testing: - Add some FE tests in AnalyzeDDLTest and AnalyzeKuduDDLTest. - Add EE test test_kudu.py::TestKuduArray. Since Impala does not support inserting complex types, including array, the data insertion part of the test is achieved through custom C++ code kudu-array-inserter.cc that inserts into Kudu via the Kudu C++ client. It would be great if we could migrate it to Python so that it can be moved to the same file as the test (IMPALA-14537). - Pass core tests. Co-authored-by: Riza Suminto Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310 Reviewed-on: http://gerrit.cloudera.org:8080/23493 Reviewed-by: Alexey Serbin <alexey@apache.org> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-08 06:41:07 +00:00
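IMPALA-14552 does not list the exact extra checks it added to test_show_create_table. As a hedged illustration only, the sketch below keeps the original kv_regex for extracting pairs, but first requires the whole TBLPROPERTIES body to be a comma-separated list of 'key' = 'value' pairs, so an input like 'X' = 'Y' = 'Z' is rejected instead of silently producing a partial dict. The function name parse_tblproperties and the comma-separated layout are assumptions, and values containing escaped quotes are not handled.

```python
import re

# kv_regex from the commit message, rewritten as an equivalent raw string.
KV_REGEX = r"'([^']+)'\s=\s'([^']+)'"

# Hypothetical strict check: the entire body must be one or more 'k' = 'v'
# pairs separated by commas, with nothing else in between.
_PAIR = r"'[^']+'\s=\s'[^']+'"
_STRICT_BODY = re.compile(r"\s*{p}(?:\s*,\s*{p})*\s*".format(p=_PAIR))


def parse_tblproperties(body):
    """Return TBLPROPERTIES as a dict, or raise ValueError if body is malformed."""
    if _STRICT_BODY.fullmatch(body) is None:
        raise ValueError("invalid TBLPROPERTIES contents: %r" % body)
    return dict(re.findall(KV_REGEX, body))


# The lenient regex alone accepts malformed input and drops the extra part:
print(dict(re.findall(KV_REGEX, "'X' = 'Y' = 'Z'")))  # {'X': 'Y'}
print(parse_tblproperties("'a' = 'b', 'c' = 'd'"))    # {'a': 'b', 'c': 'd'}
```

Validating the whole string with fullmatch before calling findall is what closes the gap: findall only reports the pieces that match and skips everything else.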
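The IMPALA-14557 fix combines order-insensitive verification with a negative lookahead. A minimal sketch of that idea follows. verify_is_subset is a simplified, hypothetical stand-in for the test framework's VERIFY_IS_SUBSET mode, and the partition path and .parq file names are made up for illustration. Each expected pattern only has to match some actual row, so the relative order of UUID-named data files and delete-* files no longer matters, while (?!delete) stops the data-file pattern from matching a delete file.

```python
import re

# Hypothetical expected rows: one data file and one delete file in partition id=1.
# (?!delete) rejects file names that start with "delete".
DATA_FILE = r".*/id=1/(?!delete)[^/]+\.parq"
DELETE_FILE = r".*/id=1/delete-[^/]+\.parq"


def verify_is_subset(expected_patterns, actual_rows):
    """True if every expected pattern fully matches at least one actual row."""
    return all(any(re.fullmatch(p, row) for row in actual_rows)
               for p in expected_patterns)


rows = ["/wh/t/data/id=1/delete-0a1b.parq", "/wh/t/data/id=1/9f8e7d-0001.parq"]
print(verify_is_subset([DATA_FILE, DELETE_FILE], rows))        # True
print(verify_is_subset([DATA_FILE, DELETE_FILE], rows[::-1]))  # True
print(verify_is_subset([DATA_FILE], rows[:1]))                 # False
```

The last call shows why the lookahead matters: without it, a pattern such as `.*/id=1/[^/]+\.parq` would also match the delete file and the check would pass even when no data file is listed.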
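The IMPALA-14553 approach (evaluate each independent schema section once up front, in parallel, and reuse the result for every test vector) can be sketched with the standard library's multiprocessing.pool.ThreadPool. The function precompute_sections, its arguments, and the fallback thread count are hypothetical; only eval_section, NUM_CONCURRENT_TESTS and the use of a ThreadPool come from the commit message.

```python
import os
from multiprocessing.pool import ThreadPool


def precompute_sections(sections, eval_section, num_threads=None):
    """Evaluate every distinct section once, concurrently, and cache the results.

    eval_section is assumed to be independent per section (e.g. it shells out
    to upload files), so running the calls in parallel threads is safe.
    """
    if num_threads is None:
        # Hypothetical fallback of 4 threads when NUM_CONCURRENT_TESTS is unset.
        num_threads = int(os.environ.get("NUM_CONCURRENT_TESTS", "4"))
    unique = list(dict.fromkeys(sections))  # de-duplicate, keep first-seen order
    pool = ThreadPool(num_threads)
    try:
        results = pool.map(eval_section, unique)  # blocks until all calls finish
    finally:
        pool.close()
        pool.join()
    return dict(zip(unique, results))
```

Each row of test_vectors can then look up the cached result instead of calling eval_section again. Threads rather than processes fit here because the work is dominated by waiting on external commands, not by Python bytecode. In the same spirit, keeping existing table names in a set makes each membership check O(1) on average instead of a list scan.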
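IMPALA-14498 does not show the offending line. As a general illustration of why non-raw strings trip flake8, consider a backslash sequence such as \d. It is not a valid Python string escape, so pycodestyle reports W605 (invalid escape sequence), and Python 3.12 and later emit a SyntaxWarning. A raw string passes the pattern to re unchanged. The pattern and input below are made-up examples.

```python
import re

# "\d+" (non-raw) only works because Python keeps unknown escapes as-is;
# flake8/pycodestyle flags it as W605 "invalid escape sequence '\d'".
# Either double the backslash or, more readably, use a raw string:
ESCAPED = "\\d+"
RAW = r"\d+"

assert ESCAPED == RAW  # both are the three characters: backslash, 'd', '+'
print(re.findall(RAW, "part-7 of 42"))  # ['7', '42']
```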
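IMPALA-14472 notes that a Kudu array keeps its values and its NULL bits in separate arrays, rather than using a per-row NULL flag as Impala does. The sketch below merges the two into a single list. The bitmap convention used here (bit i, least-significant bit first within each byte, set means element i is non-NULL) is an assumption for illustration and not a statement about Kudu's actual format, and decode_array is a hypothetical name.

```python
from typing import List, Optional, Sequence


def decode_array(values: Sequence[int], validity: bytes,
                 length: int) -> List[Optional[int]]:
    """Combine a values array and a separate validity bitmap into one list.

    Assumed convention: bit i (LSB-first within each byte) set => element i
    is non-NULL; the value stored at a NULL position is ignored.
    """
    result: List[Optional[int]] = []
    for i in range(length):
        is_valid = (validity[i // 8] >> (i % 8)) & 1
        result.append(values[i] if is_valid else None)
    return result


print(decode_array([10, 0, 30], bytes([0b101]), 3))  # [10, None, 30]
```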

12352 Commits