impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 09:58:28 -05:00

Author	SHA1	Message	Date
stiga-huang	68a9630adc	IMPALA-14284: Log the actual log files instead of symlinks in start-impala-cluster.py It's not that easy to find log files of a custom-cluster test. All custom-cluster tests use the same log dir and the test output just shows the symlink of the log files, e.g. "Starting State Store logging to .../logs/custom_cluster_tests/statestored.INFO". This patch prints the actual log file names after the cluster launches. An example output: 15:17:19 MainThread: Starting State Store logging to /tmp/statestored.INFO 15:17:19 MainThread: Starting Catalog Service logging to /tmp/catalogd.INFO 15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad.INFO 15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad_node1.INFO 15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad_node2.INFO ... 15:17:24 MainThread: Total wait: 2.54s 15:17:24 MainThread: Actual log file names: 15:17:24 MainThread: statestored.INFO -> statestored.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094348 15:17:24 MainThread: catalogd.INFO -> catalogd.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094368 15:17:24 MainThread: impalad.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094466 15:17:24 MainThread: impalad_node1.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094468 15:17:24 MainThread: impalad_node2.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094470 15:17:24 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 executors). Tests - Ran the script locally. - Ran a failed custom-cluster test and verified the actual file names are printed in the output. Change-Id: Id76c0a8bdfb221ab24ee315e2e273abca4257398 Reviewed-on: http://gerrit.cloudera.org:8080/23781 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2025-12-18 11:18:41 +00:00
Riza Suminto	d4992d532b	Revert "IMPALA-14454: Exclude log4j 2 dependencies" This reverts commit `52b87fcefd`. The original commit caused an issue when Impala is deployed together with Apache Atlas. Coordinator failed to start with error message: java.lang.NoClassDefFoundError: org/apache/logging/log4j/core/Layout Solved minor conflict in impala-config.sh due to IMPALA-14478 applied after IMPALA-14454. Change-Id: I77127db8d833c675c18c30eb3d6542ca906cd2a9 Reviewed-on: http://gerrit.cloudera.org:8080/23788 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-16 00:26:34 +00:00
Michael Smith	c3dc7f9667	IMPALA-13147: Limit concurrency of link jobs Configure separate compile and link pools for ninja. Configures link parallelism based on expected memory use, which can be reduced by setting IMPALA_MINIMAL_DEBUG_INFO=true or IMPALA_SPLIT_DEBUG_INFO=true. Adds IMPALA_MAKE_CMD to simplify using the ninja build tool for all make operations in scripts. Install ninja on Ubuntu. Adds a '-make' option to buildall.sh to force using 'make'. Adds MOLD_JOBS=1 to avoid overloading the system when trying 'mold' and linking test binaries. However 'mold' is not selected as the default due to test failures around SASL/GSSAPI (see IMPALA-14527). Switches bin/jenkins/all-tests.sh to use ninja and removes the guard in bootstrap_development.sh limiting IMPALA_BUILD_THREADS as it's no longer needed with ninja. SKIP_BE_TEST_PATTERN in run-backend-tests is unused (only used with TARGET_FILESYSTEM=local) so I don't attempt to make it work with ninja. Tested with local 'IMPALA_SPLIT_DEBUG_INFO=true buildall.sh -skiptests' with default (make) and IMPALA_MAKE_CMD=ninja. Change-Id: I0952dc19ace5c9c42bed0d2ffb61499656c0a2db Reviewed-on: http://gerrit.cloudera.org:8080/23572 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Pranav Lodha <pranav.lodha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-15 21:43:07 +00:00
jichen0919	bf517d3323	IMPALA-14610: Bump up arrow version to 15.0.0 This patch bumps the arrow version to 15.0.0 and uses the latest toolchain to fix the arrow JNI loading issue on Linux aarch64 environments. Background: We have fixed the JNI loading issue for aarch64 environments on the native toolchain side in IMPALA-14609. We also need to bump the arrow version to 15.0.0 and use that toolchain to fix the issue. Testing: Built the new toolchain and passed the paimon tests in an aarch64 environment. Change-Id: I7b8dd6ab43cf05b4339880ecec0d1f48e44ef294 Reviewed-on: http://gerrit.cloudera.org:8080/23756 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>	2025-12-11 16:12:42 +00:00
Riza Suminto	b581e45286	IMPALA-14606: (addendum) Install Python 3 for RHEL8 The first IMPALA-14606 commit failed to set up Python 3 on a fresh RHEL8 machine. This was not caught earlier because I tested using downstream Jenkins, which reuses a RHEL8 machine that was previously set up with Python 2. This patch fixes the issue by skipping the pip install of argparse that broke the script and running setup_python3 instead on RHEL8 machines. Testing: - Ran full bootstrap_system.sh and buildall.sh on a fresh RHEL8 machine. Change-Id: I6df0a534175404fe96d32eeb1e7bf0aa9ca204cd Reviewed-on: http://gerrit.cloudera.org:8080/23772 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Tested-by: Riza Suminto <riza.suminto@cloudera.com>	2025-12-10 17:22:32 +00:00
Riza Suminto	3ed2a82a95	IMPALA-14606: Stop building impala-shell for Python 2 This patch stops setting up and building impala-shell for Python 2. A more thorough cleanup will be done in the future. Testing: Passed build and test/shell/ on RHEL8. Change-Id: Ic7d59b283f4e2f011880ff6221d550b52714a538 Reviewed-on: http://gerrit.cloudera.org:8080/23750 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-10 04:40:46 +00:00
Laszlo Gaal	fe41448780	IMPALA-14603: Force Java alternative after setup on Rocky and Red Hat Linux Impala allows various Java versions to be selected for its build and runtime environment when bin/bootstrap_system.sh is used to set up the environment. Unfortunately this setup failed to affect the current Java JRE and compiler tools on Red Hat Linux and compatibles (e.g. Rocky Linux), because bootstrap_system.sh failed to set up the requested version in the "alternatives" subsystem. The same failure was not observed on Ubuntu versions; on that platform `update_java_alternatives` was correctly run for the same purpose. This patch adds calls to `alternatives` to set the JRE and JDK environments to the requested version. This benefits automated test runs in Impala's pre- and post-commit environments as well as individual workstation setups. Change-Id: I8972fb35b232830c6d8cf1125a7a8223547bd206 Reviewed-on: http://gerrit.cloudera.org:8080/23741 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-05 21:42:55 +00:00
jichen0919	7e29ac23da	IMPALA-14092 Part2: Support querying of paimon data table via JNI This patch mainly implements querying of paimon data tables through a JNI-based scanner. Features implemented: - support column pruning. The partition pruning and predicate push down will be submitted as the third part of the patch. We implemented this by treating the paimon table as a normal unpartitioned table. When querying a paimon table: - PaimonScanNode will decide which paimon splits need to be scanned, and then transfer the splits to BE to do the JNI-based scan operation. - We also collect the required columns that need to be scanned, and pass the columns to Scanner for column pruning. This is implemented by passing the field ids of the columns to BE, instead of column positions, to support schema evolution. - In the original implementation, PaimonJniScanner would directly pass the paimon row object to BE, and call the corresponding paimon row field accessor, which is a Java method, to convert row fields to impala row batch tuples. We found it slow due to the overhead of JVM method calls. To minimize the overhead, we reworked the implementation: the PaimonJniScanner converts the paimon row batches to an arrow recordbatch, which stores data in the offheap region of the impala JVM, and passes the arrow offheap record batch memory pointer to the BE backend. BE PaimonJniScanNode will directly read data from the JVM offheap region, and convert the arrow record batch to an impala row batch. The benchmark shows the latter implementation is 2.x better than the original implementation. The lifecycle of the arrow row batch is mainly like this: the arrow row batch is generated in FE, and passed to BE. After the record batch is imported to BE successfully, BE will be in charge of freeing the row batch. There are two free paths: the normal path, and the exception path. For the normal path, when the arrow batch is totally consumed by BE, BE will call JNI to fetch the next arrow batch. 
In this case, the arrow batch is freed automatically. The exceptional path happens when the query is cancelled or memory allocation fails. In these corner cases, the arrow batch is freed in the close method if it has not been totally consumed by BE. Currently supported impala data types for queries include: - BOOLEAN - TINYINT - SMALLINT - INTEGER - BIGINT - FLOAT - DOUBLE - STRING - DECIMAL(P,S) - TIMESTAMP - CHAR(N) - VARCHAR(N) - BINARY - DATE TODO: - Patches pending submission: - Support tpcds/tpch data-loading for paimon data table. - Virtual Column query support for querying paimon data table. - Query support with time travel. - Query support for paimon meta tables. - WIP: - Snapshot incremental read. - Complex type query support. - Native paimon table scanner, instead of jni based. Testing: - Create tests table in functional_schema_template.sql - Add TestPaimonScannerWithLimit in test_scanners.py - Add test_paimon_query in test_paimon.py. - Already passed the tpcds/tpch tests for paimon tables. The test table data is currently generated by Spark, which is not supported by Impala now; we have to do this since Hive doesn't support generating paimon tables for dynamic-partitioned tables. We plan to submit a separate patch for tpcds/tpch data loading and associated tpcds/tpch query tests. - JVM offheap memory leak tests: ran looped tpch tests for 1 day; no obvious offheap memory increase was observed, and offheap memory usage stayed within 10M. Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384 Reviewed-on: http://gerrit.cloudera.org:8080/23613 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-12-05 18:19:57 +00:00
ttttttz	5d1f1e0180	IMPALA-14183: Rename the environment variable USE_APACHE_HIVE to USE_APACHE_HIVE_3 When the environment variable USE_APACHE_HIVE is set to true, build Impala for adapting to Apache Hive 3.x. In order to better distinguish it from Apache Hive 2.x later, rename USE_APACHE_HIVE to USE_APACHE_HIVE_3. Additionally, to facilitate referencing different versions of the Hive MetastoreShim, the major version of Hive has been added to the environment variable IMPALA_HIVE_DIST_TYPE. Change-Id: I11b5fe1604b6fc34469fb357c98784b7ad88574d Reviewed-on: http://gerrit.cloudera.org:8080/21724 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-03 13:38:45 +00:00
jichen0919	685745f785	IMPALA-14579: Bump up paimon version to 1.3.1 for CVE-2025-46762 This patch fixes CVE-2025-46762 by bumping the paimon version to 1.3.1. Background: The following PR: https://github.com/apache/incubator-paimon/pull/6363 has been merged by the paimon community since paimon-1.3.0, so Impala needs to upgrade paimon to 1.3.0 or later to fix the CVE as well. Testing: - All paimon related tests passed. Change-Id: Ie8052f71a5e2a4e39b0ac39b6d349e55f10092bc Reviewed-on: http://gerrit.cloudera.org:8080/23717 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-26 16:55:30 +00:00
Joe McDonnell	5eea4f6f79	IMPALA-14559: Ship calcite-planner jar in Impala packages This adds the java/impala-package Maven project to make it easier to ship / test the Calcite planner. impala-package has a dependency on impala-frontend and calcite-planner, so its classpath requires no extra work when constructing the classpath. An additional cleanup is that this no longer puts the impala-frontend-*-tests.jar on the classpath by default. This requires updating the query event hooks test, as it relies on that jar being present. This does not change the default value for the use_calcite_planner query option, so there is no change in behavior. Testing: - Ran a core job - Built docker images and OS packages locally Change-Id: I81dec2a5b59e279229a735c8bb1a23c77111a793 Reviewed-on: http://gerrit.cloudera.org:8080/23497 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-21 03:36:12 +00:00
Zoltan Borok-Nagy	5ea4dc342e	IMPALA-14565: Update Apache component versions after CDP_BUILD_NUMBER bump to 71942734 CDP_BUILD_NUMBER was bumped to 71942734 which upgraded Iceberg to version 1.5.2. We should update our Apache component dependencies (not just Iceberg) accordingly. Change-Id: Ic353bbef64a59365b708a20bd0d5ed502cb6d44e Reviewed-on: http://gerrit.cloudera.org:8080/23678 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-21 01:40:05 +00:00
Steve Carlin	a6bb0c7c45	IMPALA-14408: Use regular path for Calcite planner instead of CalciteJniFrontend When the --use_calcite_planner=true option is set at the server level, the queries will no longer go through CalciteJniFrontend. Instead, they will go through the regular JniFrontend, which is the path that is used when the query option for "use_calcite_planner" is set. The CalciteJniFrontend will be removed in a later commit. This commit also enables fallback to the original planner when an unsupported feature exception is thrown. This needed to be added to allow the tests to run properly. During initial database load, there are queries that access complex columns which throws the unsupported exception. Change-Id: I732516ca8f7ea64f73484efd67071910c9b62c8f Reviewed-on: http://gerrit.cloudera.org:8080/23523 Reviewed-by: Steve Carlin <scarlin@cloudera.com> Tested-by: Steve Carlin <scarlin@cloudera.com>	2025-11-20 21:08:48 +00:00
Riza Suminto	64c4abe6ed	IMPALA-14547: Bumping Kudu version to pickup KUDU-3716 Redhat 9 environments recently switched to OpenSSL 3.5.1. On those machines, the Kudu minicluster fails to start up with a CSR signature verification error. KUDU-3716 fixed this issue. This patch updates the toolchain and Kudu version to pick up KUDU-3716. Testing: Passed data loading on Redhat 9. Change-Id: I7262267939a9f08650af85443240950afbb3323f Reviewed-on: http://gerrit.cloudera.org:8080/23697 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-20 15:16:57 +00:00
Joe McDonnell	3ce0004c12	IMPALA-14512: Remove dependency on sh python package This modifies bin/single_node_perf_run.py to stop using the sh python package. It replaces sh with calls to subprocess. It stops installing sh for both the Python 2 and 3 virtualenvs. Testing: - Ran perf-AB-test job with it and examined the logs Change-Id: Ic5f9316a5d83c5c0dc37d4a94c55b6a655765fe3 Reviewed-on: http://gerrit.cloudera.org:8080/23600 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Jason Fehr <jfehr@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-20 03:29:48 +00:00
Joe McDonnell	001263f58a	IMPALA-14514: Handle serializing bytes in bin/run-workload.py On python 3, when Impyla receives a result with a string that is not valid UTF-8, it returns that as bytes. TPC-DS Q30 on scale 20 has a result that contains invalid UTF-8, so bin/run-workload.py can fail while trying to dump this to JSON. This modifies CustomJSONEncoder to handle serializing bytes by converting it to a string with invalid unicode handled with backslashes. Testing: - Ran bin/run-workload.py against TPC-DS scale 20 Change-Id: Ibe31c656de4fc65f8580c7b3b49bf655b8a5ecea Reviewed-on: http://gerrit.cloudera.org:8080/23602 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Jason Fehr <jfehr@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-11-20 03:29:48 +00:00
Peter Rozsa	8eb1d87edc	IMPALA-14272: Add extra flags option for coverage_helper.sh This change adds an optional flag to coverage_helper.sh script that accepts additional parameters for the wrapped gcovr call. Tests: - manually validated that the script has the original behaviour if the newly added flag is not set, also if it's set, the parameters are pushed down correctly. Change-Id: Iea26c9967b62b06ded6a0cb4c0346f0e789beb80 Reviewed-on: http://gerrit.cloudera.org:8080/23290 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Peter Rozsa <prozsa@cloudera.com>	2025-11-18 07:12:28 +00:00
Zoltan Borok-Nagy	275f03f10d	IMPALA-12893: (part 2): Upgrade Iceberg to version 1.5.2 This patch updates CDP_BUILD_NUMBER to 71942734 in order to upgrade Iceberg to 1.5.2. This patch updates some tests so they pass with Iceberg 1.5.2. The behavior changes of Iceberg 1.5.2 are (compared to 1.3.1): * Iceberg V2 tables are created by default * Metadata tables have different schema * Parquet compression is explicitly set for new tables (even for ORC tables) * Sequence numbers are assigned a bit differently Updated the tests where needed. Code changes to accommodate the above behavior changes: * SHOW CREATE TABLE adds 'format-version'='1' for Iceberg V1 tables * CREATE TABLE statements don't throw errors when Parquet compression is set for ORC tables Change-Id: Ic4f9ed3f7ee9f686044023be938d6b1d18c8842e Reviewed-on: http://gerrit.cloudera.org:8080/23670 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-14 01:27:45 +00:00
Xuebin Su	6b6f7e614d	IMPALA-14472: Add create/read support for ARRAY column of Kudu An initial implementation of KUDU-1261 (array column type) was recently merged in the upstream Apache Kudu repository. This patch adds initial Impala support for working with Kudu tables having array type columns. Unlike rows, the elements of a Kudu array are stored in a different format than Impala. Instead of a per-row bit flag for NULL info, values and NULL bits are stored in separate arrays. The following types of queries are not supported in this patch: - (IMPALA-14538) Queries that reference an array column as a table, e.g. ```sql SELECT item FROM kudu_array.array_int; ``` - (IMPALA-14539) Queries that create duplicate collection slots, e.g. ```sql SELECT array_int FROM kudu_array AS t, t.array_int AS unnested; ``` Testing: - Add some FE tests in AnalyzeDDLTest and AnalyzeKuduDDLTest. - Add EE test test_kudu.py::TestKuduArray. Since Impala does not support inserting complex types, including array, the data insertion part of the test is achieved through custom C++ code kudu-array-inserter.cc that inserts into Kudu via the Kudu C++ client. It would be great if we could migrate it to Python so that it can be moved to the same file as the test (IMPALA-14537). - Pass core tests. Co-authored-by: Riza Suminto Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310 Reviewed-on: http://gerrit.cloudera.org:8080/23493 Reviewed-by: Alexey Serbin <alexey@apache.org> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-08 06:41:07 +00:00
Michael Smith	8ed6d5c3ba	IMPALA-14530: Use minimal debug info in Jenkins Uses IMPALA_MINIMAL_DEBUG_INFO=true in Jenkins build-all-flag-combinations.sh to reduce memory usage during linking and avoid OOM kills. This script uses -skiptests to build all test binaries, but doesn't run them, so debug info is not needed. Change-Id: I4605b98d8d197e07c2eaac8218ff985c798875ed Reviewed-on: http://gerrit.cloudera.org:8080/23641 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-06 16:09:56 +00:00
Riza Suminto	0572dba245	IMPALA-14529: Bumping Kudu version to pickup latest KUDU-1261 patch This commit bumps the Impala toolchain to pick up the latest Kudu version, up to commit 60f5e5267b92c39485a66121d3ce3cc7ef57b0e0 (KUDU-1261 make ArrayCellMetadataView::Init() more robust). Change-Id: I68009e5fefd053882f5504cd2520bacb189a1b04 Reviewed-on: http://gerrit.cloudera.org:8080/23631 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Jason Fehr <jfehr@cloudera.com> Tested-by: Riza Suminto <riza.suminto@cloudera.com>	2025-11-05 16:41:51 +00:00
Michael Smith	599b89306d	IMPALA-13145: Upgrade mold to 2.40.4 Upgrades mold to the latest release. Change-Id: If926b8065cccc4c9038c064c274b6ba97fdc2888 Reviewed-on: http://gerrit.cloudera.org:8080/23582 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-27 15:05:01 +00:00
Michael Smith	1152eef9bb	IMPALA-14501: (Addendum) Fix single node perf run Fixes open in generate_profile_files to read binary with Python 3, matching generate_profile_file. Change-Id: Ibd815e7eb989d7a2bcf52cadfcde4f355c18a148 Reviewed-on: http://gerrit.cloudera.org:8080/23596 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-10-25 17:31:06 +00:00
Joe McDonnell	1913ab46ed	IMPALA-14501: Migrate most scripts from impala-python to impala-python3 To remove the dependency on Python 2, existing scripts need to use python3 rather than python. These commands find those locations (for impala-python and regular python): git grep impala-python \| grep -v impala-python3 \| grep -v impala-python-common \| grep -v init-impala-python git grep bin/python \| grep -v python3 This removes or switches most of these locations by various means: 1. If a python file has a #!/bin/env impala-python (or python) but doesn't have a main function, it removes the hash-bang and makes sure that the file is not executable. 2. Most scripts can simply switch from impala-python to impala-python3 (or python to python3) with minimal changes. 3. The cm-api pypi package (which doesn't support Python 3) has been replaced by the cm-client pypi package and interfaces have changed. Rather than migrating the code (which hasn't been used in years), this deletes the old code and stops installing cm-api into the virtualenv. The code can be restored and revamped if there is any interest in interacting with CM clusters. 4. This switches tests/comparison over to impala-python3, but this code has bit-rotted. Some pieces can be run manually, but it can't be fully verified with Python 3. It shouldn't hold back the migration on its own. 5. This also replaces locations of impala-python in comments / documentation / READMEs. 6. kazoo (used for interacting with HBase) needed to be upgraded to a version that supports Python 3. The newest version of kazoo requires upgrades of other component versions, so this uses kazoo 2.8.0 to avoid needing other upgrades. The two remaining uses of impala-python are: - bin/cmake_aux/create_virtualenv.sh - bin/impala-env-versioned-python These will be removed separately when we drop Python 2 support completely. In particular, these are useful for testing impala-shell with Python 2 until we stop supporting Python 2 for impala-shell. 
The docker-based tests still use /usr/bin/python, but this can be switched over independently (and doesn't impact impala-python) Testing: - Ran core job - Ran build + dataload on Centos 7, Redhat 8 - Manual testing of individual scripts (except some bitrotted areas like the random query generator) Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc Reviewed-on: http://gerrit.cloudera.org:8080/23468 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-10-22 16:30:17 +00:00
Michael Smith	98f993da43	IMPALA-14478: Add CDP ORC build Adds CDP_ORC_JAVA_VERSION so we can build and test with Apache or CDP versions of ORC. Change-Id: Id9ba78051aff9c9129c244b1734b6f8a523858b5 Reviewed-on: http://gerrit.cloudera.org:8080/23506 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-10-08 23:34:55 +00:00
Riza Suminto	3d61c5ea9f	IMPALA-14476: Workaround TSAN issue in KuduClient Since the toolchain was bumped to pick up Kudu's array column feature (KUDU-1261), Impala's TSAN builds on the master branch consistently break during dataload with a data race detected by TSAN. The source of the data race lies within libkudu_client.so and only triggers if the Impala build machine has both IPv4 and IPv6 addresses associated with localhost. Until the exact root cause is found and fixed, this patch works around the TSAN issue by fixing the KUDU_MASTER_HOSTS env var to 127.0.0.1. Testing: Ran a TSAN build and confirmed no data race error is emitted. Change-Id: I511ab625d18c6007567083557fcdf98980a6ac6f Reviewed-on: http://gerrit.cloudera.org:8080/23507 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Riza Suminto <riza.suminto@cloudera.com>	2025-10-08 14:40:50 +00:00
Riza Suminto	a2e4463fbc	IMPALA-14471: Bump up KUDU_VERSION to pick up complex types This patch updates the Impala toolchain Kudu to 16689973a to pick up the Kudu array column feature (KUDU-1261). Change-Id: Ib151d4ea6852e8ba8ae92697bd6806a074e37159 Reviewed-on: http://gerrit.cloudera.org:8080/23492 Reviewed-by: Alexey Serbin <alexey@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-10-04 06:07:09 +00:00
Joe McDonnell	e1b3c1445e	IMPALA-13472: Bump toolchain to fix minidump stacks on ARM Minidump stack resolution does not work on Redhat8 ARM64. Redhat8 ARM64 uses 64KB pages, and the Breakpad library does not properly handle collecting stacks for that configuration. Breakpad rounds off the stack pointer to the nearest page boundary below the stack pointer, then collects up to 32KB of stack memory. With a top-down stack, this means it is collecting some memory that is not used by the stack. With 64KB pages, the memory it collects usually doesn't contain any stack contents. This picks up a toolchain with Breakpad patched to fix this. The patch stops rounding the stack pointer to the nearest page. Instead, it adjusts the stack pointer to account for the red zone (128 bytes on x86_64) and then rounds to the nearest 1KB boundary below the stack pointer. Testing: - Produced and resolved minidumps on multiple build types for x86_64 and ARM64 (release, debug, asan, ubsan) Change-Id: I4fbd91abfbddfd8355d27ae9d9b86b70a9ce0409 Reviewed-on: http://gerrit.cloudera.org:8080/23465 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-25 23:44:31 +00:00
Michael Smith	52b87fcefd	IMPALA-14454: Exclude log4j 2 dependencies While we use reload4j, we can safely exclude log4j 2 dependencies to reduce the size of our artifacts. Change-Id: Ic060bdd969a6e5cd01646376b27c7355ce841819 Reviewed-on: http://gerrit.cloudera.org:8080/23439 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-09-24 18:04:06 +00:00
Michael Smith	e5afebc0c1	IMPALA-14450: (Addendum) Fix other numeric comparison Fixes set-impala-java-tool-options.sh: line 25: ((: 1.8: syntax error: invalid arithmetic operator (error token is ".8") Double parentheses - ((...)) - only support integer arithmetic. I can't find any standard way to do decimal comparison in shells, so switch to extract Java major version as an integer and compare that. OpenJDK 8 has always considered "-target 1.8" and "-target 8" equivalent https://github.com/openjdk/jdk/blob/jdk8-b01/langtools/src/share/classes/com/sun/tools/javac/jvm/Target.java#L105 so maven target can be set to 8 when IMPALA_JAVA_TARGET is 8. Change-Id: I15cdd1859be51d3708f1c348e898831df2a92b13 Reviewed-on: http://gerrit.cloudera.org:8080/23452 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-23 03:42:29 +00:00
Michael Smith	5137bb94ac	IMPALA-14446: Clean up pom.xml Cleans up repetitive patterns in pom.xml. Centralize plugin configuration in pluginManagement. Replace inline maven-compiler-plugin configuration with newer maven.compiler.release and update to latest plugin version. Centralize common dependencies in dependencyManagement, including exclusions when appropriate. Remove exclusions that are no longer relevant. Compared before and after with dependency:tree; only difference is that commons-cli now comes from hadoop and jersey-serv{let,er} are effectively excluded; all versions matched. Also ensured USE_APACHE_COMPONENTS=true compiles. Adds com.amazonaws:aws-java-sdk-bundle to exclusion checking to ensure it's not accidentally included alongside impala-minimal-s3a-aws-sdk. Removes missed io.netty exclusion from IMPALA-12816. Updates commons-dbcp2 to 2.12.0 to match Hive. Change-Id: If96649840e23036b4a73ee23e8d12516497994f0 Reviewed-on: http://gerrit.cloudera.org:8080/23432 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-23 02:50:22 +00:00
Laszlo Gaal	57eb5f653b	IMPALA-14449, IMPALA-14269: Fix Red Hat / Rocky 9 builds, ORC buffer overflow Downstream error reports pointed out that the toolchain version picked up for IMPALA-14139 contains toolchain binaries for Red Hat 9 (and compatibles) that require at least the 9.5 minor version because of OpenSSL library requirements. This was caused by the toolchain binary build process not using package repo pinning for the redhat9 build container definition, which caused the container process to install "latest" packages, in this case packages released in Rocky / Red Hat 9.5. This patch bumps the toolchain ID to a version in which the redhat9 binaries were produced in a build container "moved back in time" to the 9.2 release by pinning the package repos to the Rocky Linux 9.2 state, using the Rocky Vault. The patch also picks up a buffer overflow mitigation for the ORC library. Change-Id: I5c6921afdc69a4a6644b619de6b8d4e4cc69e601 Reviewed-on: http://gerrit.cloudera.org:8080/23448 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-22 19:54:25 +00:00
Michael Smith	8a80ede69b	IMPALA-14450: (Addendum) Fix numeric comparison Fix shell comparison to use string equality so it works for all POSIX shells instead of just zsh. Change-Id: If9b9ed7f59e71d024ec674bb30c57274567fb2a3 Reviewed-on: http://gerrit.cloudera.org:8080/23444 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-09-19 19:20:30 +00:00
Csaba Ringhofer	0e30792023	IMPALA-14444: Upgrade bouncycastle to 1.79 Change-Id: Ib20c840be2811467716c8de5d2f816a0e5531eb4 Reviewed-on: http://gerrit.cloudera.org:8080/23437 Reviewed-by: Peter Rozsa <prozsa@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-19 15:04:46 +00:00
Michael Smith	d217b9ecc6	IMPALA-14450: Simplify Java version selection Removes IMPALA_JAVA_HOME_OVERRIDE and updates version selection. In order of priority: 1. If IMPALA_JDK_VERSION is set, use the OS JDK version from a known location. This is primarily used when also installing the JDK as part of automated builds. 2. If JAVA_HOME is set, use it. 3. Look for the system default JDK. The IMPALA_JDK_VERSION variable is no longer modified to avoid issues when sourcing impala-config.sh multiple times. JAVA_HOME will be modified if IMPALA_JDK_VERSION is set; both must be unset to restore using the system default Java. If switching between JDKs, now prefer setting JAVA_HOME. If relying on system Java, unset JAVA_HOME after e.g. update-java-alternatives. The detected Java version is set in IMPALA_JAVA_TARGET, which is used to add Java 9+ options and configure the Java compilation target. Eliminates IMPALA_JDK_VERSION_NUM as its value was always identical to IMPALA_JAVA_TARGET. Stops printing from impala-config-java.sh. It made the output from impala-config.sh look strange, and the decisions can all be clearly determined from impala-config.sh printed variables later or the packages installed in bootstrap_system.sh. Fixes JAVA_HOME in bootstrap_build.sh on ARM64 systems. Change-Id: I68435ca69522f8310221a0f3050f13d86568b9da Reviewed-on: http://gerrit.cloudera.org:8080/23434 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-19 01:51:47 +00:00
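The priority order in the IMPALA-14450 message can be sketched in a few lines. This is only an illustration of the three-step selection, not the shell logic in impala-config-java.sh: the function name and the two callables (`jdk_home_for_version`, `system_default_jdk`) are hypothetical stand-ins for "the OS JDK from a known location" and "the system default JDK".

```python
def select_java_home(env, jdk_home_for_version, system_default_jdk):
    """Pick a JAVA_HOME following the priority order from the commit message.

    env is a dict of environment variables. The two callables are
    hypothetical helpers supplied by the caller; the real script resolves
    these locations itself.
    """
    # 1. IMPALA_JDK_VERSION wins: use the OS JDK for that version.
    jdk_version = env.get("IMPALA_JDK_VERSION")
    if jdk_version:
        return jdk_home_for_version(jdk_version)
    # 2. Otherwise honor an explicit JAVA_HOME.
    java_home = env.get("JAVA_HOME")
    if java_home:
        return java_home
    # 3. Otherwise fall back to the system default JDK.
    return system_default_jdk()
```

Because step 1 takes precedence over step 2, both variables have to be unset before the system default is used again, which matches the note in the message.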
pranav.lodha	0513c071b4	IMPALA-14151: Update jackson.core Bump IMPALA_JACKSON_VERSION from 2.15.3 to 2.18.1 as a part of maintenance upgrade to pick up fixes and improvements in the 2.18.x line. Change-Id: I7b63d8d58011c0dd1c00c72da386ec1b0fbc4d82 Reviewed-on: http://gerrit.cloudera.org:8080/23102 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-09-17 23:50:05 +00:00
Laszlo Gaal	89d2b23509	IMPALA-14139: Enable Impala builds on Ubuntu 24.04 Update the following elements of the Impala build environment to enable builds on Ubuntu 24.04: - Recognize and handle (where necessary) Ubuntu 24.04 in various bootstrap scripts (bootstrap_system.sh, bootstrap_toolchain.py, etc.) - Bump IMPALA_TOOLCHAIN_ID to an official toolchain build that contains Ubuntu 24.04-specific binary packages - Bump binutils to 2.42, and - Bump the GDB version to 12.1-p1, as required by the new toolchain version - Update unique_ptr usage syntax in be/src/util/webserver-test.cc to compensate for new GLIBC function prototypes: System headers in Ubuntu 24.04 adopted attributes on several widely used function prototypes. Such attributes are not considered to be part of the function's signature during template evaluation, so GCC throws a warning when such a function is passed as a template argument, which breaks the build, as warnings are treated as errors. webserver-test.cc uses pclose() as the deleter for a unique_ptr in a utility function. This patch encapsulates pclose() and its attributes in an explicit specialization for std::default_delete<>, "hiding" the attributes inside a functor. The particular solution was inspired by Anton-V-K's proposal in https://gist.github.com/t-mat/5849549 This commit builds on an earlier patch for the same purpose by Michael Smith: https://gerrit.cloudera.org/c/23058/ Change-Id: Ia4454b0c359dbf579e6ba2f9f9c44cfa3f1de0d2 Reviewed-on: http://gerrit.cloudera.org:8080/23384 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-09-15 16:10:42 +00:00
jichen0919	826c8cf9b0	IMPALA-14081: Support create/drop paimon table for impala This patch mainly implement the creation/drop of paimon table through impala. Supported impala data types: - BOOLEAN - TINYINT - SMALLINT - INTEGER - BIGINT - FLOAT - DOUBLE - STRING - DECIMAL(P,S) - TIMESTAMP - CHAR(N) - VARCHAR(N) - BINARY - DATE Syntax for creating paimon table: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name ( [col_name data_type ,...] [PRIMARY KEY (col1,col2)] ) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)] STORED AS PAIMON [LOCATION 'hdfs_path'] [TBLPROPERTIES ( 'primary-key'='col1,col2', 'file.format' = 'orc/parquet', 'bucket' = '2', 'bucket-key' = 'col3', ]; Two types of paimon catalogs are supported. (1) Create table with hive catalog: CREATE TABLE paimon_hive_cat(userid INT,movieId INT) STORED AS PAIMON; (2) Create table with hadoop catalog: CREATE [EXTERNAL] TABLE paimon_hadoop_cat STORED AS PAIMON TBLPROPERTIES('paimon.catalog'='hadoop', 'paimon.catalog_location'='/path/to/paimon_hadoop_catalog', 'paimon.table_identifier'='paimondb.paimontable'); SHOW TABLE STAT/SHOW COLUMN STAT/SHOW PARTITIONS/SHOW FILES statements are also supported. TODO: - Patches pending submission: - Query support for paimon data files. - Partition pruning and predicate push down. - Query support with time travel. - Query support for paimon meta tables. - WIP: - Complex type query support. - Virtual Column query support for querying paimon data table. - Native paimon table scanner, instead of jni based. Testing: - Add unit test for paimon impala type conversion. - Add unit test for ToSqlTest.java. - Add unit test for AnalyzeDDLTest.java. - Update default_file_format TestEnumCase in be/src/service/query-options-test.cc. - Update test case in testdata/workloads/functional-query/queries/QueryTest/set.test. - Add test cases in metadata/test_show_create_table.py. - Add custom test test_paimon.py. 
Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef Reviewed-on: http://gerrit.cloudera.org:8080/22914 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-09-10 21:24:49 +00:00
Abhishek Rawat	f4c0c396ff	IMPALA-14175: Generate impala-udf-devel package using the build script Added '-udf_devel_package' option to buildall.sh. This generates impala-udf-devel rpm which includes udf headers and static libraries - ImpalaUdf-retail.a and ImpalaUdf-debug.a. Testing: - Tested that rpm is generated using build script: ./buildall.sh -release_and_debug -notests -udf_devel_package - Tested that the rpm is also generated using standalone script: ./bin/make-impala-udf-devel-rpm.sh - Generated impala-udf-devel package and tested compiling impala_udf_samples: https://github.com/cloudera/impala-udf-samples Change-Id: I5b85df9c3f680a7e5551f067a97a5650daba9b50 Reviewed-on: http://gerrit.cloudera.org:8080/23060 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-09 22:42:05 +00:00
Michael Smith	db92c88a4c	IMPALA-13417: Run mvn clean on all Java projects Runs mvn clean on all Java subprojects - instead of just ext-data-source - to avoid build failures when files from other versions of the code and dependencies are left behind. Change-Id: I8cf540f90adbff327de98f900059bfa3bbc8ef22 Reviewed-on: http://gerrit.cloudera.org:8080/23374 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-04 04:15:18 +00:00
Riza Suminto	28cff4022d	IMPALA-14333: Run impala-py.test using Python3 Running exhaustive tests with env var IMPALA_USE_PYTHON3_TESTS=true reveals some tests that require adjustment. This patch made such adjustment, which mostly revolves around encoding differences and string vs bytes type in Python3. This patch also switch the default to run pytest with Python3 by setting IMPALA_USE_PYTHON3_TESTS=true. The following are the details: Change hash() function in conftest.py to crc32() to produce deterministic hash. Hash randomization is enabled by default since Python 3.3 (see https://docs.python.org/3/reference/datamodel.html#object.__hash__). This cause test sharding (like --shard_tests=1/2) produce inconsistent set of tests per shard. Always restart minicluster during custom cluster tests if --shard_tests argument is set, because test order may change and affect test correctness, depending on whether running on fresh minicluster or not. Moved one test case from delimited-latin-text.test to test_delimited_text.py for easier binary comparison. Add bytes_to_str() as a utility function to decode bytes in Python3. This is often needed when inspecting the return value of subprocess.check_output() as a string. Implement DataTypeMetaclass.__lt__ to substitute DataTypeMetaclass.__cmp__ that is ignored in Python3 (see https://peps.python.org/pep-0207/). Fix WEB_CERT_ERR difference in test_ipv6.py. Fix trivial integer parsing in test_restart_services.py. Fix various encoding issues in test_saml2_sso.py, test_shell_commandline.py, and test_shell_interactive.py. Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1. Switch to binary comparison in test_iceberg.py where needed. Specify text mode when calling tempfile.NamedTemporaryFile(). Simplify create_impala_shell_executable_dimension to skip testing dev and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. 
The reason is that several UTF-8 related tests in test_shell_commandline.py break in Python3 pytest + Python2 impala-shell combo. This skipping already happen automatically in build OS without system Python2 available like RHEL9 (IMPALA_SYSTEM_PYTHON2 env var is empty). Removed unused vector argument and fixed some trivial flake8 issues. Several test logic require modification due to intermittent issue in Python3 pytest. These include: Add _run_query_with_client() in test_ranger.py to allow reusing a single Impala client for running several queries. Ensure clients are closed when the test is done. Mark several tests in test_ranger.py with SkipIfFS.hive because they run queries through beeline + HiveServer2, but Ozone and S3 build environment does not start HiveServer2 by default. Increase the sleep period from 0.1 to 0.5 seconds per iteration in test_statestore.py and mark TestStatestore to execute serially. This is because TServer appears to shut down more slowly when run concurrently with other tests. Handle the deprecation of Thread.setDaemon() as well. Always force_restart=True each test method in TestLoggingCore, TestShellInteractiveReconnect, and TestQueryRetries to prevent them from reusing minicluster from previous test method. Some of these tests destruct minicluster (kill impalad) and will produce minidump if metrics verifier for next tests fail to detect healthy minicluster state. Testing: Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true. Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4 Reviewed-on: http://gerrit.cloudera.org:8080/23319 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-03 10:01:29 +00:00
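The switch from hash() to crc32() described in IMPALA-14333 matters because Python 3 randomizes string hashes per process, so two runs can assign the same test to different shards. A minimal sketch of deterministic sharding follows; the function names and the 1-based shard convention are assumptions for illustration, not the code in conftest.py.

```python
import zlib


def shard_index(test_id, num_shards):
    """Map a test id to a shard in [0, num_shards) using CRC32.

    zlib.crc32 depends only on the input bytes, so the result is the same
    in every process, unlike the built-in hash() on str in Python 3.
    """
    return zlib.crc32(test_id.encode("utf-8")) % num_shards


def select_shard(test_ids, shard, num_shards):
    """Return the tests belonging to a 1-based shard, e.g. 1 of 2.

    The 1-based numbering mirrors the --shard_tests=1/2 form quoted in the
    commit message; how the real option is parsed is not shown here.
    """
    return [t for t in test_ids if shard_index(t, num_shards) == shard - 1]
```

Every test lands in exactly one shard, and repeated runs select the same subset, which is the property the commit message is after.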
Sai Hemanth Gantasala	b67a9cecb3	IMPALA-13593: Enable event processor to consume ALTER_PARTITIONS events from metastore HIVE-27746 introduced ALTER_PARTITIONS event type which is an optimization of reducing the bulk ALTER_PARTITION events into a single event. The components version is updated to pick up this change. It would be a good optimization to include this in Impala so that the number of events consumed by event processor would be significantly reduced and help event processor to catch up with events quickly. This patch enables the ability to consume ALTER_PARTITIONS event. The downside of this patch is that, there is no before_partitions object in the event message. This can cause partitions to be refreshed even on trivial changes to them. HIVE-29141 will address this concern. Testing: - Added an end-to-end test to verify consuming the ALTER_PARTITIONS event. Also, bigger time outs were added in this test as there was flakiness observed while looping this test several times. Change-Id: I009a87ef5e2c331272f9e2d7a6342cc860e64737 Reviewed-on: http://gerrit.cloudera.org:8080/22554 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2025-08-28 06:53:32 +00:00
Laszlo Gaal	ad7888898b	IMPALA-13223: Fix bootstrap-build.sh for platforms without Python2 bin/bootstrap-build.sh did not distinguish between various version of the Ubuntu platform, and attempted to install unversioned Python packages (python-dev and python-setuptools) even on newer versions that don't support Python 2 any longer (e.g. Ubuntu 22.04 and 24.04). On older Ubuntu versions these packages are still useful, so at this point it is not feasible just to drop them. This patch makes these packages optional: they are added to the list of packages to be installed only if they actually exist for the platform. The patch also extends the package list with some basic packages that are needed when bin/bootstrap_build.sh is run inside an Ubuntu 22.04 Docker container. Tests: ran a compile-only build on Ubuntu 20.04 (still has Python 2) and on Ubuntu 22.04 (does not support Python 2 any more). Change-Id: I94ade35395afded4e130b79eab8c27c6171b50d6 Reviewed-on: http://gerrit.cloudera.org:8080/21800 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-21 22:03:52 +00:00
Daniel Becker	991c0d5cf3	IMPALA-14326: Update commons-lang3 to version 3.18.0 Update commons-lang3 from version 3.17.0 to 3.18.0. Testing: - Core tests passed. Change-Id: Ie3f2e4ac7232e3f2e2c1c6c6a62225564faaaf4a Reviewed-on: http://gerrit.cloudera.org:8080/23324 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-21 16:13:03 +00:00
Riza Suminto	9fc941b611	IMPALA-14327: Update load-data.py and run-workload.py to use HS2 load-data.py is used for dataloading while run-workload.py is used for running perf-AB-test. This patch change the script from using beeswax protocol to HS2 protocol. Testing: Run data loading and perf-AB-test-ub2004 based on this patch. Change-Id: I1c3727871b8b2e75c3f10ceabfbe9cb96e36ead3 Reviewed-on: http://gerrit.cloudera.org:8080/23309 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-20 07:20:29 +00:00
Riza Suminto	14ff597e2f	IMPALA-14289: Suppress data race in ThreadTokenAvailableCb TSAN build in RHEL9 hit a data race issue in HdfsScanNode::ThreadTokenAvailableCb from timed_mutex + try_lock_for usage. It seems to be a known false-positive in ThreadSanitizer: https://github.com/google/sanitizers/issues/1620 https://github.com/llvm/llvm-project/issues/142370 This patch suppresses the TSAN error in ThreadTokenAvailableCb. Testing: Pass dataloading and BE tests in TSAN in RHEL9. Change-Id: I87950cdc3fedc8d80adeb788c6d29791db58242a Reviewed-on: http://gerrit.cloudera.org:8080/23281 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-12 21:52:57 +00:00
jasonmfehr	2ad6f818a5	IMPALA-13237: [Patch 5] - Implement OpenTelemetry Traces for Select Queries Tracking Adds representation of Impala select queries using OpenTelemetry traces. Each Impala query is represented as its own individual OpenTelemetry trace. The one exception is retried queries which will have an individual trace for each attempt. These traces consist of a root span and several child spans. Each child span has the root as its parent. No child span has another child span as its parent. Each child span represents one high-level query lifecycle stage. Each child span also has span attributes that further describe the state of the query. Child spans: 1. Init 2. Submitted 3. Planning 4. Admission Control 5. Query Execution 6. Close Each child span contains a mix of universal attributes (available on all spans) and query phase specific attributes. For example, the "ErrorMsg" attribute, present on all child spans, is the error message (if any) at the end of that particular query phase. One example of a child span specific attribute is "QueryType" on the Planning span. Since query type is first determined during query planning, the "QueryType" attribute is present on the Planning span and has a value of "QUERY" (since only selects are supported). Since queries can run for lengthy periods of time, the Init span communicates the beginning of a query along with global query attributes. For example, span attributes include query id, session id, sql, user, etc. Once the query has closed, the root span is closed. Testing accomplished with new custom cluster tests. Generated-by: Github Copilot (GPT-4.1, Claude Sonnet 3.7) Change-Id: Ie40b5cd33274df13f3005bf7a704299ebfff8a5b Reviewed-on: http://gerrit.cloudera.org:8080/22924 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-12 04:11:06 +00:00
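The trace shape described in IMPALA-13237 (one root span, six child spans that all point at the root, and a shared "ErrorMsg" attribute) can be modeled with plain data classes. This sketch deliberately does not use the OpenTelemetry SDK; the `Span` class, `build_query_trace`, and the root span name are hypothetical, while the child span names and the "ErrorMsg" and "QueryType" attributes come from the message.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Child span names, in lifecycle order, as listed in the commit message.
CHILD_SPAN_NAMES = (
    "Init", "Submitted", "Planning",
    "Admission Control", "Query Execution", "Close",
)


@dataclass
class Span:
    name: str
    parent: Optional["Span"] = None
    attributes: Dict[str, str] = field(default_factory=dict)


def build_query_trace(query_id: str) -> List[Span]:
    """Return [root, child1, ..., child6] for a single query attempt."""
    # The root span name is an assumption made for this sketch.
    root = Span(name="query " + query_id)
    children = []
    for name in CHILD_SPAN_NAMES:
        # Every child has the root as its parent and never another child.
        child = Span(name=name, parent=root, attributes={"ErrorMsg": ""})
        if name == "Planning":
            # Query type is first known during planning; only selects
            # are traced, so the value is "QUERY".
            child.attributes["QueryType"] = "QUERY"
        children.append(child)
    return [root] + children
```

A retried query would simply call `build_query_trace` once per attempt, giving each attempt its own trace as the message describes.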
zhangyifan27	f0757418c8	IMPALA-14257: Support set USE_APACHE_* when USE_APACHE_COMPONENTS=false Before this patch, USE_APACHE_COMPONENTS overwrite all USE_APACHE_* variables, but we should support using specific apache components. After this patch, if USE_APACHE_COMPONENTS is not false, USE_APACHE_ {HADOOP,HBASE,HIVE,TEZ,RANGER} variable will be set true. Otherwise, we should use the value of USE_APACHE_{HADOOP,HBASE,HIVE,TEZ,RANGER}. Test: - Built and ran a test cluster with setting USE_APACHE_HIVE=true and USE_APACHE_COMPONENTS=false. Change-Id: I33791465a3b238b56f82d749e3dbad8215f3b3bc Reviewed-on: http://gerrit.cloudera.org:8080/23211 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-11 12:44:26 +00:00
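The resolution rule in IMPALA-14257 can be sketched as follows: when USE_APACHE_COMPONENTS is anything other than false, every per-component flag is forced to true; otherwise each per-component value is used as given. The function name is hypothetical, and treating an unset variable as "false" is an assumption of this sketch rather than something the message states.

```python
COMPONENTS = ("HADOOP", "HBASE", "HIVE", "TEZ", "RANGER")


def resolve_apache_flags(env):
    """Return the effective USE_APACHE_<COMPONENT> values as a dict.

    env is a dict of environment variables holding "true"/"false" strings.
    Unset variables default to "false" here (an assumption).
    """
    use_all = env.get("USE_APACHE_COMPONENTS", "false") != "false"
    resolved = {}
    for name in COMPONENTS:
        key = "USE_APACHE_" + name
        # The umbrella flag overrides; otherwise keep the individual value.
        resolved[key] = "true" if use_all else env.get(key, "false")
    return resolved
```

With `USE_APACHE_HIVE=true` and `USE_APACHE_COMPONENTS=false`, the configuration exercised in the commit's test, only the Hive flag resolves to true.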
jasonmfehr	19f662301c	IMPALA-14214: [Addendum] - Ensure IMPALA_TOOLCHAIN_COMMIT_HASH Matches Build IDs Adds verification code to ensure the IMPALA_TOOLCHAIN_COMMIT_HASH environment variable matches the commit hash in the IMPALA_TOOLCHAIN_BUILD_ID_AARCH64 and IMPALA_TOOLCHAIN_BUILD_ID_X86_64 environment variables. Generated-by: Github Copilot (Claude Sonnet 3.7) Change-Id: I348698356a014413875f6b8b54a005bf89b9793a Reviewed-on: http://gerrit.cloudera.org:8080/23243 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-05 06:14:28 +00:00
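A consistency check of the kind this addendum describes might look like the sketch below. It assumes, without confirmation from the message, that each build ID embeds the commit hash as a substring; the function name is hypothetical, and the verification actually added to the build scripts may work differently.

```python
BUILD_ID_VARS = (
    "IMPALA_TOOLCHAIN_BUILD_ID_AARCH64",
    "IMPALA_TOOLCHAIN_BUILD_ID_X86_64",
)


def mismatched_build_ids(env):
    """Return the build-ID variable names that do not contain the hash.

    Assumption: a build ID is consistent when the value of
    IMPALA_TOOLCHAIN_COMMIT_HASH appears somewhere inside it.
    """
    commit_hash = env.get("IMPALA_TOOLCHAIN_COMMIT_HASH", "")
    if not commit_hash:
        # An empty hash cannot match anything meaningful.
        return list(BUILD_ID_VARS)
    return [k for k in BUILD_ID_VARS if commit_hash not in env.get(k, "")]
```

An empty result means both architectures point at the same toolchain commit; a non-empty one names the variables to fix.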
jasonmfehr	7b7e7709aa	IMPALA-14214: Correct IMPALA_TOOLCHAIN_COMMIT_HASH Fixes the default value of the IMPALA_TOOLCHAIN_COMMIT_HASH environment variable to be the correct hash. Change-Id: I98824f363334a15e4f91c0b3f51fa09a5d15c241 Reviewed-on: http://gerrit.cloudera.org:8080/23233 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2025-08-04 01:22:23 +00:00

1574 Commits