1574 Commits

Author SHA1 Message Date
stiga-huang
68a9630adc IMPALA-14284: Log the actual log files instead of symlinks in start-impala-cluster.py
It's not that easy to find log files of a custom-cluster test. All
custom-cluster tests use the same log dir and the test output just shows
the symlink of the log files, e.g. "Starting State Store logging to
.../logs/custom_cluster_tests/statestored.INFO".

This patch prints the actual log file names after the cluster launches.
An example output:

15:17:19 MainThread: Starting State Store logging to /tmp/statestored.INFO
15:17:19 MainThread: Starting Catalog Service logging to /tmp/catalogd.INFO
15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad.INFO
15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad_node1.INFO
15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad_node2.INFO
...
15:17:24 MainThread: Total wait: 2.54s
15:17:24 MainThread: Actual log file names:
15:17:24 MainThread: statestored.INFO -> statestored.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094348
15:17:24 MainThread: catalogd.INFO -> catalogd.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094368
15:17:24 MainThread: impalad.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094466
15:17:24 MainThread: impalad_node1.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094468
15:17:24 MainThread: impalad_node2.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094470
15:17:24 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 executors).

Tests
 - Ran the script locally.
 - Ran a failed custom-cluster test and verified the actual file names
   are printed in the output.
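
The symlink-to-file mapping shown in the example output can be sketched roughly as follows. This is a minimal illustration, not the code in start-impala-cluster.py: the helper name `actual_log_names`, the temporary directory, and the log file name are all made up for the demo.

```python
import os
import tempfile


def actual_log_names(log_dir, link_names):
    """Map each symlink name to the base name of the file it points to."""
    resolved = {}
    for name in link_names:
        link_path = os.path.join(log_dir, name)
        # realpath follows the symlink chain to the final target.
        resolved[name] = os.path.basename(os.path.realpath(link_path))
    return resolved


# Demo with a throwaway directory and a made-up log file name.
demo_dir = tempfile.mkdtemp()
target = os.path.join(demo_dir, "statestored.host.user.log.INFO.20251216-151719.1")
open(target, "w").close()
os.symlink(target, os.path.join(demo_dir, "statestored.INFO"))

demo_result = actual_log_names(demo_dir, ["statestored.INFO"])
for link, real in demo_result.items():
    print("%s -> %s" % (link, real))
```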

Change-Id: Id76c0a8bdfb221ab24ee315e2e273abca4257398
Reviewed-on: http://gerrit.cloudera.org:8080/23781
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
2025-12-18 11:18:41 +00:00
Riza Suminto
d4992d532b Revert "IMPALA-14454: Exclude log4j 2 dependencies"
This reverts commit 52b87fcefd.

The original commit caused an issue when Impala is deployed together
with Apache Atlas. The coordinator failed to start with the error message:

java.lang.NoClassDefFoundError: org/apache/logging/log4j/core/Layout

Resolved a minor conflict in impala-config.sh because IMPALA-14478 was applied
after IMPALA-14454.

Change-Id: I77127db8d833c675c18c30eb3d6542ca906cd2a9
Reviewed-on: http://gerrit.cloudera.org:8080/23788
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-12-16 00:26:34 +00:00
Michael Smith
c3dc7f9667 IMPALA-13147: Limit concurrency of link jobs
Configure separate compile and link pools for ninja. Configures link
parallelism based on expected memory use, which can be reduced by
setting IMPALA_MINIMAL_DEBUG_INFO=true or IMPALA_SPLIT_DEBUG_INFO=true.

Adds IMPALA_MAKE_CMD to simplify using the ninja build tool for all make
operations in scripts. Install ninja on Ubuntu. Adds a '-make' option to
buildall.sh to force using 'make'.

Adds MOLD_JOBS=1 to avoid overloading the system when trying 'mold' and
linking test binaries. However 'mold' is not selected as the default
due to test failures around SASL/GSSAPI (see IMPALA-14527).

Switches bin/jenkins/all-tests.sh to use ninja and removes the guard in
bootstrap_development.sh limiting IMPALA_BUILD_THREADS as it's no longer
needed with ninja.

SKIP_BE_TEST_PATTERN in run-backend-tests is unused (only used with
TARGET_FILESYSTEM=local) so I don't attempt to make it work with ninja.

Tested with local 'IMPALA_SPLIT_DEBUG_INFO=true buildall.sh -skiptests'
with default (make) and IMPALA_MAKE_CMD=ninja.

Change-Id: I0952dc19ace5c9c42bed0d2ffb61499656c0a2db
Reviewed-on: http://gerrit.cloudera.org:8080/23572
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Pranav Lodha <pranav.lodha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-12-15 21:43:07 +00:00
jichen0919
bf517d3323 IMPALA-14610: Bump up arrow version to 15.0.0
This patch bumps the Arrow version to 15.0.0 and uses the
latest toolchain to fix the Arrow JNI loading issue on Linux
aarch64 environments.

Background:
The JNI loading issue on aarch64 was fixed on the native toolchain
side in IMPALA-14609. Impala also needs to bump the Arrow version
to 15.0.0 and pick up that toolchain to complete the fix.

Testing:
Built a new toolchain and passed the Paimon tests in an aarch64
environment.

Change-Id: I7b8dd6ab43cf05b4339880ecec0d1f48e44ef294
Reviewed-on: http://gerrit.cloudera.org:8080/23756
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2025-12-11 16:12:42 +00:00
Riza Suminto
b581e45286 IMPALA-14606: (addendum) Install Python 3 for RHEL8
The first IMPALA-14606 commit failed to set up Python 3 on a fresh RHEL8
machine. This was not caught earlier because I tested using downstream
Jenkins, which reuses RHEL8 machines that were previously set up with Python 2.

This patch fixes the issue by skipping the pip install of argparse that broke
the script and running setup_python3 instead on RHEL8 machines.

Testing:
- Ran full bootstrap_system.sh and buildall.sh on a fresh RHEL8 machine.

Change-Id: I6df0a534175404fe96d32eeb1e7bf0aa9ca204cd
Reviewed-on: http://gerrit.cloudera.org:8080/23772
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-12-10 17:22:32 +00:00
Riza Suminto
3ed2a82a95 IMPALA-14606: Stop building impala-shell for Python 2
This patch stops setting up and building impala-shell for Python 2.
A more thorough cleanup will be done in the future.

Testing:
Passed build and test/shell/ on RHEL8.

Change-Id: Ic7d59b283f4e2f011880ff6221d550b52714a538
Reviewed-on: http://gerrit.cloudera.org:8080/23750
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-12-10 04:40:46 +00:00
Laszlo Gaal
fe41448780 IMPALA-14603: Force Java alternative after setup on Rocky and Red Hat Linux
Impala allows various Java versions to be selected for its build and
runtime environment when bin/bootstrap_system.sh is used to set up the
environment. Unfortunately this setup failed to affect the current Java
JRE and compiler tools on Red Hat Linux and compatibles (e.g. Rocky
Linux), because bootstrap_system.sh failed to set up the requested
version in the "alternatives" subsystem. The same failure was not
observed on Ubuntu, where `update-java-alternatives` was correctly
run for the same purpose.

This patch adds calls to `alternatives` to set the JRE and JDK
environments to the requested version. This benefits automated test runs
in Impala's pre- and post-commit environments as well as individual
workstation setups.

Change-Id: I8972fb35b232830c6d8cf1125a7a8223547bd206
Reviewed-on: http://gerrit.cloudera.org:8080/23741
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-12-05 21:42:55 +00:00
jichen0919
7e29ac23da IMPALA-14092 Part2: Support querying of paimon data table via JNI
This patch implements querying of Paimon data tables
through a JNI-based scanner.

Features implemented:
- Column pruning.
Partition pruning and predicate pushdown will be submitted
as the third part of the patch.

We implemented this by treating the Paimon table as a normal
unpartitioned table. When querying a Paimon table:
- PaimonScanNode decides which Paimon splits need to be scanned,
  and then transfers the splits to BE to do the JNI-based scan operation.

- We also collect the required columns that need to be scanned,
  and pass the columns to the scanner for column pruning. This is
  implemented by passing the field ids of the columns to BE,
  instead of column positions, to support schema evolution.
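
  The reason field ids survive schema evolution better than column positions can be illustrated with a small sketch. The schemas, field ids, and helper names below are hypothetical and are not taken from the Paimon or Impala code.

  ```python
  # Hypothetical schemas as (field id, column name) pairs. A field id stays
  # fixed for the lifetime of a column, while its position can change.
  old_schema = [(1, "id"), (2, "name"), (3, "price")]
  # After schema evolution: "name" was dropped and "qty" was added in front.
  new_schema = [(4, "qty"), (1, "id"), (3, "price")]


  def project_by_field_id(schema, wanted_ids):
      """Return the column names in `schema` whose field id is requested."""
      by_id = dict(schema)
      return [by_id[fid] for fid in wanted_ids if fid in by_id]


  def project_by_position(schema, positions):
      """Return column names by ordinal position (fragile across evolution)."""
      return [schema[pos][1] for pos in positions]


  # The query needs "id" and "price": field ids 1 and 3, which were at
  # positions 0 and 2 in the old schema.
  by_id_result = project_by_field_id(new_schema, [1, 3])
  by_pos_result = project_by_position(new_schema, [0, 2])
  print(by_id_result)
  print(by_pos_result)
  ```

  Projecting by field id still selects the intended columns after the schema change, while the stale positions pick up the newly added column instead.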

- In the original implementation, PaimonJniScanner passed the
  Paimon row object directly to BE, which called the corresponding
  Paimon row field accessor, a Java method, to convert row fields to
  Impala row batch tuples. We found this slow due to the overhead of
  JVM method calls.
  To minimize the overhead, we reworked the implementation:
  PaimonJniScanner converts the Paimon row batches to
  Arrow record batches, which store data in the off-heap region of
  the Impala JVM, and passes the Arrow off-heap
  record batch memory pointer to BE.
  The BE PaimonJniScanNode reads data directly from the JVM off-heap
  region and converts the Arrow record batch to an Impala row batch.

  The benchmark shows the latter implementation is 2.x better
  than the original implementation.

  The lifecycle of an Arrow record batch is as follows:
  the Arrow record batch is generated in FE and passed to BE.
  After the record batch is imported to BE successfully,
  BE is in charge of freeing it.
  There are two free paths: the normal path and the
  exception path. On the normal path, when the Arrow batch
  is fully consumed by BE, BE calls JNI to fetch the next Arrow
  batch. In this case, the Arrow batch is freed automatically.
  The exception path is taken when the query is cancelled or a memory
  allocation fails. In these corner cases, the Arrow batch is freed in the
  close method if it was not fully consumed by BE.

Impala data types currently supported for queries:
- BOOLEAN
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- STRING
- DECIMAL(P,S)
- TIMESTAMP
- CHAR(N)
- VARCHAR(N)
- BINARY
- DATE

TODO:
    - Patches pending submission:
        - Support tpcds/tpch data-loading
          for paimon data table.
        - Virtual Column query support for querying
          paimon data table.
        - Query support with time travel.
        - Query support for paimon meta tables.
    - WIP:
        - Snapshot incremental read.
        - Complex type query support.
        - Native paimon table scanner, instead of
          jni based.

Testing:
    - Created test tables in functional_schema_template.sql.
    - Added TestPaimonScannerWithLimit in test_scanners.py.
    - Added test_paimon_query in test_paimon.py.
    - Passed the tpcds/tpch tests for Paimon tables. The test table
      data is currently generated by Spark, which is not supported
      by Impala now. This was necessary because Hive
      doesn't support generating Paimon tables for dynamic-partitioned
      tables. We plan to submit a separate patch for tpcds/tpch data
      loading and the associated tpcds/tpch query tests.
    - JVM off-heap memory leak tests: ran looped tpch tests for
      1 day with no obvious off-heap memory increase observed;
      off-heap memory usage stayed within 10M.

Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Reviewed-on: http://gerrit.cloudera.org:8080/23613
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-12-05 18:19:57 +00:00
ttttttz
5d1f1e0180 IMPALA-14183: Rename the environment variable USE_APACHE_HIVE to USE_APACHE_HIVE_3
When the environment variable USE_APACHE_HIVE is set to true, Impala is
built to work with Apache Hive 3.x. To better distinguish it
from Apache Hive 2.x later, this patch renames USE_APACHE_HIVE to
USE_APACHE_HIVE_3.
Additionally, to facilitate referencing different versions of the Hive
MetastoreShim, the major version of Hive has been added to the environment
variable IMPALA_HIVE_DIST_TYPE.

Change-Id: I11b5fe1604b6fc34469fb357c98784b7ad88574d
Reviewed-on: http://gerrit.cloudera.org:8080/21724
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-12-03 13:38:45 +00:00
jichen0919
685745f785 IMPALA-14579: Bump up paimon version to 1.3.1 for CVE-2025-46762
This patch fixes CVE-2025-46762 by bumping the Paimon
version to 1.3.1.

Background:
The following PR: https://github.com/apache/incubator-paimon/pull/6363
has been merged by the Paimon community since paimon-1.3.0, so
Impala needs to upgrade Paimon to 1.3.0 or later to fix the
CVE as well.

Testing:
- All Paimon-related tests passed.

Change-Id: Ie8052f71a5e2a4e39b0ac39b6d349e55f10092bc
Reviewed-on: http://gerrit.cloudera.org:8080/23717
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-26 16:55:30 +00:00
Joe McDonnell
5eea4f6f79 IMPALA-14559: Ship calcite-planner jar in Impala packages
This adds the java/impala-package Maven project to make it easier
to ship / test the Calcite planner. impala-package has a dependency
on impala-frontend and calcite-planner, so its classpath requires
no extra work when constructing the classpath.

An additional cleanup is that this no longer puts the
impala-frontend-*-tests.jar on the classpath by default. This requires
updating the query event hooks test, as it relies on that jar being
present.

This does not change the default value for the use_calcite_planner
query option, so there is no change in behavior.

Testing:
 - Ran a core job
 - Built docker images and OS packages locally

Change-Id: I81dec2a5b59e279229a735c8bb1a23c77111a793
Reviewed-on: http://gerrit.cloudera.org:8080/23497
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-21 03:36:12 +00:00
Zoltan Borok-Nagy
5ea4dc342e IMPALA-14565: Update Apache component versions after CDP_BUILD_NUMBER bump to 71942734
CDP_BUILD_NUMBER was bumped to 71942734 which upgraded Iceberg to
version 1.5.2. We should update our Apache component dependencies
(not just Iceberg) accordingly.

Change-Id: Ic353bbef64a59365b708a20bd0d5ed502cb6d44e
Reviewed-on: http://gerrit.cloudera.org:8080/23678
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-21 01:40:05 +00:00
Steve Carlin
a6bb0c7c45 IMPALA-14408: Use regular path for Calcite planner instead of CalciteJniFrontend
When the --use_calcite_planner=true option is set at the server level,
the queries will no longer go through CalciteJniFrontend. Instead, they
will go through the regular JniFrontend, which is the path that is used
when the query option for "use_calcite_planner" is set.

The CalciteJniFrontend will be removed in a later commit.

This commit also enables fallback to the original planner when an unsupported
feature exception is thrown. This was needed to allow the tests to run
properly: during initial database load, some queries access complex
columns, which throws the unsupported feature exception.

Change-Id: I732516ca8f7ea64f73484efd67071910c9b62c8f
Reviewed-on: http://gerrit.cloudera.org:8080/23523
Reviewed-by: Steve Carlin <scarlin@cloudera.com>
Tested-by: Steve Carlin <scarlin@cloudera.com>
2025-11-20 21:08:48 +00:00
Riza Suminto
64c4abe6ed IMPALA-14547: Bumping Kudu version to pickup KUDU-3716
Red Hat 9 environments recently switched to OpenSSL 3.5.1. On those
machines, the Kudu minicluster fails to start up with a CSR signature
verification error. KUDU-3716 fixed this issue.

This patch updates the toolchain and Kudu version to pick up KUDU-3716.

Testing:
Passed data loading on Red Hat 9.

Change-Id: I7262267939a9f08650af85443240950afbb3323f
Reviewed-on: http://gerrit.cloudera.org:8080/23697
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-20 15:16:57 +00:00
Joe McDonnell
3ce0004c12 IMPALA-14512: Remove dependency on sh python package
This modifies bin/single_node_perf_run.py to stop using the sh
python package. It replaces sh with calls to subprocess. It
stops installing sh for both the Python 2 and 3 virtualenvs.
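
Replacing an sh-style call with subprocess can look roughly like the sketch below. The helper name `run_and_capture` is hypothetical and the command is a stand-in; this is not the code in bin/single_node_perf_run.py.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command, raise on non-zero exit, and return its stdout as text."""
    return subprocess.check_output(argv, universal_newlines=True)


# With the sh package, a call such as sh.git("rev-parse", "HEAD") returns the
# command output. Here the Python interpreter itself is the stand-in command.
out = run_and_capture([sys.executable, "-c", "print('hello')"])
print(out.strip())
```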

Testing:
 - Ran perf-AB-test job with it and examined the logs

Change-Id: Ic5f9316a5d83c5c0dc37d4a94c55b6a655765fe3
Reviewed-on: http://gerrit.cloudera.org:8080/23600
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-20 03:29:48 +00:00
Joe McDonnell
001263f58a IMPALA-14514: Handle serializing bytes in bin/run-workload.py
On python 3, when Impyla receives a result with a string that is
not valid UTF-8, it returns that as bytes. TPC-DS Q30 on scale 20
has a result that contains invalid UTF-8, so bin/run-workload.py
can fail while trying to dump this to JSON.

This modifies CustomJSONEncoder to handle serializing bytes by
converting it to a string with invalid unicode handled with
backslashes.
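
A minimal sketch of this kind of encoder, assuming Python 3. The class name `BytesFriendlyEncoder` is a hypothetical stand-in for CustomJSONEncoder, and the sample row is made up.

```python
import json


class BytesFriendlyEncoder(json.JSONEncoder):
    """Hypothetical stand-in for the CustomJSONEncoder named above."""

    def default(self, obj):
        if isinstance(obj, bytes):
            # Invalid UTF-8 sequences become \xNN escapes instead of raising.
            return obj.decode("utf-8", errors="backslashreplace")
        return super().default(obj)


# Made-up result row: valid UTF-8 followed by one invalid byte (0xff).
row = {"col": b"caf\xc3\xa9 \xff"}
encoded = json.dumps(row, cls=BytesFriendlyEncoder)
print(encoded)
```

The `default` hook is only called for objects that json cannot serialize on its own, so regular strings and numbers are unaffected.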

Testing:
 - Ran bin/run-workload.py against TPC-DS scale 20

Change-Id: Ibe31c656de4fc65f8580c7b3b49bf655b8a5ecea
Reviewed-on: http://gerrit.cloudera.org:8080/23602
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-11-20 03:29:48 +00:00
Peter Rozsa
8eb1d87edc IMPALA-14272: Add extra flags option for coverage_helper.sh
This change adds an optional flag to coverage_helper.sh script that
accepts additional parameters for the wrapped gcovr call.

Tests:
 - manually validated that the script has the original behaviour if the
newly added flag is not set, also if it's set, the parameters are pushed
down correctly.

Change-Id: Iea26c9967b62b06ded6a0cb4c0346f0e789beb80
Reviewed-on: http://gerrit.cloudera.org:8080/23290
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Peter Rozsa <prozsa@cloudera.com>
2025-11-18 07:12:28 +00:00
Zoltan Borok-Nagy
275f03f10d IMPALA-12893: (part 2): Upgrade Iceberg to version 1.5.2
This patch updates CDP_BUILD_NUMBER to 71942734 in order to
upgrade Iceberg to 1.5.2.

This patch updates some tests so they pass with Iceberg 1.5.2. The
behavior changes of Iceberg 1.5.2 are (compared to 1.3.1):
 * Iceberg V2 tables are created by default
 * Metadata tables have different schema
 * Parquet compression is explicitly set for new tables (even for ORC
   tables)
 * Sequence numbers are assigned a bit differently

Updated the tests where needed.

Code changes to accommodate the above behavior changes:
 * SHOW CREATE TABLE adds 'format-version'='1' for Iceberg V1 tables
 * CREATE TABLE statements don't throw errors when Parquet compression
   is set for ORC tables

Change-Id: Ic4f9ed3f7ee9f686044023be938d6b1d18c8842e
Reviewed-on: http://gerrit.cloudera.org:8080/23670
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-14 01:27:45 +00:00
Xuebin Su
6b6f7e614d IMPALA-14472: Add create/read support for ARRAY column of Kudu
An initial implementation of KUDU-1261 (array column type) was recently
merged into the upstream Apache Kudu repository. This patch adds initial
Impala support for working with Kudu tables that have array type columns.

Unlike rows, the elements of a Kudu array are stored in a different
format than in Impala: instead of a per-row bit flag for NULL info, values
and NULL bits are stored in separate arrays.
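
The idea of keeping values and NULL information in two parallel arrays can be illustrated as follows. This is only a conceptual sketch with a made-up helper name; it does not reflect Kudu's actual in-memory or wire format.

```python
def merge_values_and_validity(values, is_valid):
    """Combine parallel value and validity arrays into one list, using None for NULLs."""
    if len(values) != len(is_valid):
        raise ValueError("values and validity arrays must have the same length")
    return [v if ok else None for v, ok in zip(values, is_valid)]


# Made-up array with three elements; the second one is NULL, so its slot in
# the values array holds a placeholder that must be ignored.
elements = merge_values_and_validity([10, 0, 30], [True, False, True])
print(elements)
```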

The following types of queries are not supported in this patch:
- (IMPALA-14538) Queries that reference an array column as a table, e.g.
  ```sql
  SELECT item FROM kudu_array.array_int;
  ```
- (IMPALA-14539) Queries that create duplicate collection slots, e.g.
  ```sql
  SELECT array_int FROM kudu_array AS t, t.array_int AS unnested;
  ```

Testing:
- Add some FE tests in AnalyzeDDLTest and AnalyzeKuduDDLTest.
- Add EE test test_kudu.py::TestKuduArray.
  Since Impala does not support inserting complex types, including
  arrays, the data insertion part of the test is done through
  custom C++ code, kudu-array-inserter.cc, that inserts into Kudu via
  the Kudu C++ client. It would be great if we could migrate it to Python so
  that it can be moved to the same file as the test (IMPALA-14537).
- Pass core tests.

Co-authored-by: Riza Suminto

Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310
Reviewed-on: http://gerrit.cloudera.org:8080/23493
Reviewed-by: Alexey Serbin <alexey@apache.org>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-08 06:41:07 +00:00
Michael Smith
8ed6d5c3ba IMPALA-14530: Use minimal debug info in Jenkins
Uses IMPALA_MINIMAL_DEBUG_INFO=true in Jenkins
build-all-flag-combinations.sh to reduce memory usage during linking and
avoid OOM kills. This script uses -skiptests to build all test binaries,
but doesn't run them, so debug info is not needed.

Change-Id: I4605b98d8d197e07c2eaac8218ff985c798875ed
Reviewed-on: http://gerrit.cloudera.org:8080/23641
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-06 16:09:56 +00:00
Riza Suminto
0572dba245 IMPALA-14529: Bumping Kudu version to pickup latest KUDU-1261 patch
This commit bumps the Impala toolchain to pick up the latest Kudu version up to
commit 60f5e5267b92c39485a66121d3ce3cc7ef57b0e0 (KUDU-1261 make
ArrayCellMetadataView::Init() more robust).

Change-Id: I68009e5fefd053882f5504cd2520bacb189a1b04
Reviewed-on: http://gerrit.cloudera.org:8080/23631
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-11-05 16:41:51 +00:00
Michael Smith
599b89306d IMPALA-13145: Upgrade mold to 2.40.4
Upgrades mold to the latest release.

Change-Id: If926b8065cccc4c9038c064c274b6ba97fdc2888
Reviewed-on: http://gerrit.cloudera.org:8080/23582
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-10-27 15:05:01 +00:00
Michael Smith
1152eef9bb IMPALA-14501: (Addendum) Fix single node perf run
Fixes open in generate_profile_files to read binary with Python 3,
matching generate_profile_file.

Change-Id: Ibd815e7eb989d7a2bcf52cadfcde4f355c18a148
Reviewed-on: http://gerrit.cloudera.org:8080/23596
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-10-25 17:31:06 +00:00
Joe McDonnell
1913ab46ed IMPALA-14501: Migrate most scripts from impala-python to impala-python3
To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3

This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
   doesn't have a main function, it removes the hash-bang and makes
   sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
   (or python to python3) with minimal changes.
3. The cm-api pypi package (which doesn't support Python 3) has been
   replaced by the cm-client pypi package and interfaces have changed.
   Rather than migrating the code (which hasn't been used in years), this
   deletes the old code and stops installing cm-api into the virtualenv.
   The code can be restored and revamped if there is any interest in
   interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
   bit-rotted. Some pieces can be run manually, but it can't be fully
   verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces locations of impala-python in comments / documentation /
   READMEs.
6. kazoo (used for interacting with HBase) needed to be upgraded to a
   version that supports Python 3. The newest version of kazoo requires
   upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
   needing other upgrades.

The two remaining uses of impala-python are:
 - bin/cmake_aux/create_virtualenv.sh
 - bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.

The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python)

Testing:
 - Ran core job
 - Ran build + dataload on Centos 7, Redhat 8
 - Manual testing of individual scripts (except some bitrotted areas like the
   random query generator)

Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-10-22 16:30:17 +00:00
Michael Smith
98f993da43 IMPALA-14478: Add CDP ORC build
Adds CDP_ORC_JAVA_VERSION so we can build and test with Apache or CDP
versions of ORC.

Change-Id: Id9ba78051aff9c9129c244b1734b6f8a523858b5
Reviewed-on: http://gerrit.cloudera.org:8080/23506
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-10-08 23:34:55 +00:00
Riza Suminto
3d61c5ea9f IMPALA-14476: Workaround TSAN issue in KuduClient
Since the toolchain was bumped to pick up Kudu's array column
feature (KUDU-1261), Impala's TSAN builds on the master branch
consistently break during dataload with a data race detected by TSAN.

The source of the data race lies within libkudu_client.so and is only
triggered if the Impala build machine has both IPv4 and IPv6 addresses
associated with localhost. Until the exact root cause is found and fixed,
this patch works around the TSAN issue by fixing the KUDU_MASTER_HOSTS env
var to 127.0.0.1.

Testing:
Ran a TSAN build and confirmed that no data race error is emitted.

Change-Id: I511ab625d18c6007567083557fcdf98980a6ac6f
Reviewed-on: http://gerrit.cloudera.org:8080/23507
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-10-08 14:40:50 +00:00
Riza Suminto
a2e4463fbc IMPALA-14471: Bump up KUDU_VERSION to pick up complex types
This patch updates the Impala toolchain Kudu to 16689973a
to pick up the Kudu array column feature (KUDU-1261).

Change-Id: Ib151d4ea6852e8ba8ae92697bd6806a074e37159
Reviewed-on: http://gerrit.cloudera.org:8080/23492
Reviewed-by: Alexey Serbin <alexey@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-10-04 06:07:09 +00:00
Joe McDonnell
e1b3c1445e IMPALA-13472: Bump toolchain to fix minidump stacks on ARM
Minidump stack resolution does not work on Redhat8 ARM64.
Redhat8 ARM64 uses 64KB pages, and the Breakpad library does
not properly handle collecting stacks for that configuration.
Breakpad rounds off the stack pointer to the nearest page
boundary below the stack pointer, then collects up to 32KB of
stack memory. With a top-down stack, this means it is collecting
some memory that is not used by the stack. With 64KB pages,
the memory it collects usually doesn't contain any stack contents.

This picks up a toolchain with Breakpad patched to fix this. The
patch stops rounding the stack pointer to the nearest page.
Instead, it adjusts the stack pointer to account for the red
zone (128 bytes on x86_64) and then rounds to the nearest 1KB
boundary below the stack pointer.
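
The two rounding strategies can be compared with a small arithmetic sketch. The stack pointer value is made up and the function names are hypothetical; the actual fix lives in the patched Breakpad C++ code.

```python
RED_ZONE_BYTES = 128      # x86_64 red zone size, as stated above
ALIGNMENT = 1024          # 1KB boundary
MAX_COLLECTED = 32 * 1024  # up to 32KB of stack memory is collected


def old_stack_start(sp, page_size):
    """Previous behavior: round the stack pointer down to a page boundary."""
    return sp & ~(page_size - 1)


def new_stack_start(sp):
    """Patched behavior: skip the red zone, then round down to 1KB."""
    return (sp - RED_ZONE_BYTES) & ~(ALIGNMENT - 1)


sp = 0x7FFF1234C678   # made-up stack pointer
old_start = old_stack_start(sp, 64 * 1024)
new_start = new_stack_start(sp)

# With 64KB pages the old start can be more than 32KB below the stack
# pointer, so the collected window ends before any live stack memory.
print(hex(old_start), sp - old_start, sp - old_start > MAX_COLLECTED)
print(hex(new_start), sp - new_start)
```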

Testing:
 - Produced and resolved minidumps on multiple build types for
   x86_64 and ARM64 (release, debug, asan, ubsan)

Change-Id: I4fbd91abfbddfd8355d27ae9d9b86b70a9ce0409
Reviewed-on: http://gerrit.cloudera.org:8080/23465
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-25 23:44:31 +00:00
Michael Smith
52b87fcefd IMPALA-14454: Exclude log4j 2 dependencies
While we use reload4j, we can safely exclude log4j 2 dependencies to
reduce the size of our artifacts.

Change-Id: Ic060bdd969a6e5cd01646376b27c7355ce841819
Reviewed-on: http://gerrit.cloudera.org:8080/23439
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-09-24 18:04:06 +00:00
Michael Smith
e5afebc0c1 IMPALA-14450: (Addendum) Fix other numeric comparison
Fixes

    set-impala-java-tool-options.sh: line 25: ((: 1.8: syntax error:
    invalid arithmetic operator (error token is ".8")

Double parentheses - ((...)) - only support integer arithmetic. I can't
find any standard way to do decimal comparison in shells, so switch to
extracting the Java major version as an integer and comparing that.
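
The extraction logic can be sketched as below. The script itself is a shell script; this Python function with a hypothetical name only illustrates the idea of reducing a version string to an integer major version.

```python
def java_major_version(version):
    """Return the major version as an int for strings like '1.8', '8', '17.0.2'."""
    parts = version.split(".")
    # Legacy scheme: '1.8' means Java 8.
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    return int(parts[0])


for sample in ["1.8", "8", "11", "17.0.2"]:
    print(sample, java_major_version(sample))
```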

OpenJDK 8 has always considered "-target 1.8" and "-target 8" equivalent
https://github.com/openjdk/jdk/blob/jdk8-b01/langtools/src/share/classes/com/sun/tools/javac/jvm/Target.java#L105
so maven target can be set to 8 when IMPALA_JAVA_TARGET is 8.

Change-Id: I15cdd1859be51d3708f1c348e898831df2a92b13
Reviewed-on: http://gerrit.cloudera.org:8080/23452
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-23 03:42:29 +00:00
Michael Smith
5137bb94ac IMPALA-14446: Clean up pom.xml
Cleans up repetitive patterns in pom.xml.

Centralize plugin configuration in pluginManagement. Replace inline
maven-compiler-plugin configuration with newer maven.compiler.release
and update to latest plugin version.

Centralize common dependencies in dependencyManagement, including
exclusions when appropriate. Remove exclusions that are no longer
relevant.

Compared before and after with dependency:tree; only difference is that
commons-cli now comes from hadoop and jersey-serv{let,er} are
effectively excluded; all versions matched. Also ensured
USE_APACHE_COMPONENTS=true compiles.

Adds com.amazonaws:aws-java-sdk-bundle to exclusion checking to ensure
it's not accidentally included alongside impala-minimal-s3a-aws-sdk.

Removes missed io.netty exclusion from IMPALA-12816.

Updates commons-dbcp2 to 2.12.0 to match Hive.

Change-Id: If96649840e23036b4a73ee23e8d12516497994f0
Reviewed-on: http://gerrit.cloudera.org:8080/23432
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-23 02:50:22 +00:00
Laszlo Gaal
57eb5f653b IMPALA-14449, IMPALA-14269: Fix Red Hat / Rocky 9 builds, ORC buffer overflow
Downstream error reports pointed out that the toolchain version picked
up for IMPALA-14139 contains toolchain binaries for Red Hat 9 (and
compatibles) that require at least the 9.5 minor version because of
OpenSSL library requirements. This was caused by the toolchain binary
build process not using package repo pinning for the redhat9 build
container definition, which caused the container process to install
"latest" packages, in this case packages released in Rocky / Red Hat
9.5.

This patch bumps the toolchain ID to a version in which the redhat9
binaries were produced in a build container "moved back in time" to the
9.2 release by pinning the package repos to the Rocky Linux 9.2 state,
using the Rocky Vault.

The patch also picks up a buffer overflow mitigation for the ORC
library.

Change-Id: I5c6921afdc69a4a6644b619de6b8d4e4cc69e601
Reviewed-on: http://gerrit.cloudera.org:8080/23448
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-22 19:54:25 +00:00
Michael Smith
8a80ede69b IMPALA-14450: (Addendum) Fix numeric comparison
Fix shell comparison to use string equality so it works for all POSIX
shells instead of just zsh.

Change-Id: If9b9ed7f59e71d024ec674bb30c57274567fb2a3
Reviewed-on: http://gerrit.cloudera.org:8080/23444
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-09-19 19:20:30 +00:00
Csaba Ringhofer
0e30792023 IMPALA-14444: Upgrade bouncycastle to 1.79
Change-Id: Ib20c840be2811467716c8de5d2f816a0e5531eb4
Reviewed-on: http://gerrit.cloudera.org:8080/23437
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-19 15:04:46 +00:00
Michael Smith
d217b9ecc6 IMPALA-14450: Simplify Java version selection
Removes IMPALA_JAVA_HOME_OVERRIDE and updates version selection. In
order of priority:
1. If IMPALA_JDK_VERSION is set, use the OS JDK version from a known
   location. This is primarily used when also installing the JDK as part
   of automated builds.
2. If JAVA_HOME is set, use it.
3. Look for the system default JDK.
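
The priority order above can be sketched as follows. This is an
illustration only, written in Python rather than the shell used by
impala-config-java.sh; select_java_home and both lookup helpers are
hypothetical names, and the paths are made up.

```python
def select_java_home(env, os_jdk_home_for, system_default_home):
    """Return (java_home, source) following the priority order above.

    env is a dict of environment variables. os_jdk_home_for and
    system_default_home are hypothetical callables standing in for the
    real lookups.
    """
    jdk_version = env.get("IMPALA_JDK_VERSION")
    if jdk_version:
        # 1. An explicit JDK version wins, even if JAVA_HOME is also set.
        return os_jdk_home_for(jdk_version), "IMPALA_JDK_VERSION"
    java_home = env.get("JAVA_HOME")
    if java_home:
        # 2. Otherwise honor JAVA_HOME.
        return java_home, "JAVA_HOME"
    # 3. Fall back to the system default JDK.
    return system_default_home(), "system default"


# Made-up lookup helpers, for illustration only.
def fake_os_jdk_home(version):
    return "/hypothetical/jvm/java-%s" % version


def fake_system_default():
    return "/hypothetical/jvm/default"


print(select_java_home({"JAVA_HOME": "/opt/jdk"},
                       fake_os_jdk_home, fake_system_default))
```

Because IMPALA_JDK_VERSION is checked first, both variables have to be
unset before the sketch falls through to the system default.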

The IMPALA_JDK_VERSION variable is no longer modified to avoid issues
when sourcing impala-config.sh multiple times. JAVA_HOME will be
modified if IMPALA_JDK_VERSION is set; both must be unset to restore
using the system default Java.

If switching between JDKs, now prefer setting JAVA_HOME. If relying on
system Java, unset JAVA_HOME after e.g. update-java-alternatives.

The detected Java version is set in IMPALA_JAVA_TARGET, which is used to
add Java 9+ options and configure the Java compilation target.

Eliminates IMPALA_JDK_VERSION_NUM as its value was always identical to
IMPALA_JAVA_TARGET.

Stops printing from impala-config-java.sh. It made the output from
impala-config.sh look strange, and the decisions can all be clearly
determined from impala-config.sh printed variables later or the packages
installed in bootstrap_system.sh.

Fixes JAVA_HOME in bootstrap_build.sh on ARM64 systems.

Change-Id: I68435ca69522f8310221a0f3050f13d86568b9da
Reviewed-on: http://gerrit.cloudera.org:8080/23434
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-19 01:51:47 +00:00
pranav.lodha
0513c071b4 IMPALA-14151: Update jackson.core
Bump IMPALA_JACKSON_VERSION from 2.15.3 to 2.18.1
as a part of maintenance upgrade to pick up
fixes and improvements in the 2.18.x line.

Change-Id: I7b63d8d58011c0dd1c00c72da386ec1b0fbc4d82
Reviewed-on: http://gerrit.cloudera.org:8080/23102
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-09-17 23:50:05 +00:00
Laszlo Gaal
89d2b23509 IMPALA-14139: Enable Impala builds on Ubuntu 24.04
Update the following elements of the Impala build environment to enable
builds on Ubuntu 24.04:

- Recognize and handle (where necessary) Ubuntu 24.04 in various
  bootstrap scripts (bootstrap_system.sh, bootstrap_toolchain.py, etc.)
- Bump IMPALA_TOOLCHAIN_ID to an official toolchain build that contains
  Ubuntu 24.04-specific binary packages
- Bump binutils to 2.42, and
- Bump the GDB version to 12.1-p1, as required by the new toolchain
  version
- Update unique_ptr usage syntax in be/src/util/webserver-test.cc to
  compensate for new GLIBC function prototypes:

System headers in Ubuntu 24.04 adopted attributes on several widely
used function prototypes. Such attributes are not considered to be part
of the function's signature during template evaluation, so GCC throws a
warning when such a function is passed as a template argument, which
breaks the build, as warnings are treated as errors.

webserver-test.cc uses pclose() as the deleter for a unique_ptr in a
utility function. This patch encapsulates pclose() and its attributes in
an explicit specialization for std::default_delete<>, "hiding" the
attributes inside a functor.

The particular solution was inspired by Anton-V-K's proposal in
https://gist.github.com/t-mat/5849549

This commit builds on an earlier patch for the same purpose by Michael
Smith: https://gerrit.cloudera.org/c/23058/

Change-Id: Ia4454b0c359dbf579e6ba2f9f9c44cfa3f1de0d2
Reviewed-on: http://gerrit.cloudera.org:8080/23384
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-09-15 16:10:42 +00:00
jichen0919
826c8cf9b0 IMPALA-14081: Support create/drop paimon table for impala
This patch mainly implements the creation and dropping of paimon tables
through impala.

Supported impala data types:
- BOOLEAN
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- STRING
- DECIMAL(P,S)
- TIMESTAMP
- CHAR(N)
- VARCHAR(N)
- BINARY
- DATE

Syntax for creating paimon table:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(
[col_name data_type ,...]
[PRIMARY KEY (col1,col2)]
)
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
STORED AS PAIMON
[LOCATION 'hdfs_path']
[TBLPROPERTIES (
'primary-key'='col1,col2',
'file.format' = 'orc/parquet',
'bucket' = '2',
'bucket-key' = 'col3'
)];

Two types of paimon catalogs are supported.

(1) Create table with hive catalog:

CREATE TABLE paimon_hive_cat(userid INT,movieId INT)
STORED AS PAIMON;

(2) Create table with hadoop catalog:

CREATE [EXTERNAL] TABLE paimon_hadoop_cat
STORED AS PAIMON
TBLPROPERTIES('paimon.catalog'='hadoop',
'paimon.catalog_location'='/path/to/paimon_hadoop_catalog',
'paimon.table_identifier'='paimondb.paimontable');

SHOW TABLE STATS/SHOW COLUMN STATS/SHOW PARTITIONS/SHOW FILES
statements are also supported.

TODO:
    - Patches pending submission:
        - Query support for paimon data files.
        - Partition pruning and predicate push down.
        - Query support with time travel.
        - Query support for paimon meta tables.
    - WIP:
        - Complex type query support.
        - Virtual Column query support for querying
          paimon data table.
        - Native paimon table scanner, instead of
          jni based.
Testing:
    - Add unit test for paimon impala type conversion.
    - Add unit test for ToSqlTest.java.
    - Add unit test for AnalyzeDDLTest.java.
    - Update default_file_format TestEnumCase in
      be/src/service/query-options-test.cc.
    - Update test case in
      testdata/workloads/functional-query/queries/QueryTest/set.test.
    - Add test cases in metadata/test_show_create_table.py.
    - Add custom test test_paimon.py.

Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef
Reviewed-on: http://gerrit.cloudera.org:8080/22914
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-09-10 21:24:49 +00:00
Abhishek Rawat
f4c0c396ff IMPALA-14175: Generate impala-udf-devel package using the build script
Added '-udf_devel_package' option to buildall.sh. This generates
impala-udf-devel rpm which includes udf headers and static libraries -
ImpalaUdf-retail.a and ImpalaUdf-debug.a.

Testing:
- Tested that rpm is generated using build script:
 ./buildall.sh -release_and_debug -notests -udf_devel_package
- Tested that the rpm is also generated using standalone script:
 ./bin/make-impala-udf-devel-rpm.sh
- Generated impala-udf-devel package and tested compiling
impala_udf_samples:
https://github.com/cloudera/impala-udf-samples

Change-Id: I5b85df9c3f680a7e5551f067a97a5650daba9b50
Reviewed-on: http://gerrit.cloudera.org:8080/23060
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-09 22:42:05 +00:00
Michael Smith
db92c88a4c IMPALA-13417: Run mvn clean on all Java projects
Runs mvn clean on all Java subprojects - instead of just ext-data-source
- to avoid build failures when files from other versions of the code and
dependencies are left behind.

Change-Id: I8cf540f90adbff327de98f900059bfa3bbc8ef22
Reviewed-on: http://gerrit.cloudera.org:8080/23374
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-04 04:15:18 +00:00
Riza Suminto
28cff4022d IMPALA-14333: Run impala-py.test using Python3
Running exhaustive tests with env var IMPALA_USE_PYTHON3_TESTS=true
reveals some tests that require adjustment. This patch makes those
adjustments, which mostly revolve around encoding differences and the
string vs. bytes types in Python3. This patch also switches the default
to run pytest with Python3 by setting IMPALA_USE_PYTHON3_TESTS=true. The
following are the details:

Change hash() function in conftest.py to crc32() to produce
deterministic hash. Hash randomization is enabled by default since
Python 3.3 (see
https://docs.python.org/3/reference/datamodel.html#object.__hash__).
This causes test sharding (like --shard_tests=1/2) to produce an inconsistent
set of tests per shard. Always restart minicluster during custom cluster
tests if --shard_tests argument is set, because test order may change
and affect test correctness, depending on whether running on fresh
minicluster or not.
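
A minimal sketch of why crc32() gives stable shards, assuming tests are
assigned to shards by hashing their id. shard_for and select_shard are
hypothetical names, not the functions in conftest.py, and the 1-based
shard index is an assumption based on the --shard_tests=1/2 example.

```python
import zlib


def shard_for(test_id, num_shards):
    """Map a test id to a 1-based shard number using CRC32.

    Unlike the built-in hash() on str, zlib.crc32() returns the same
    value in every interpreter run, so the mapping does not depend on
    PYTHONHASHSEED.
    """
    return zlib.crc32(test_id.encode("utf-8")) % num_shards + 1


def select_shard(test_ids, shard, num_shards):
    """Return the tests that belong to the given 1-based shard."""
    return [t for t in test_ids if shard_for(t, num_shards) == shard]


# Made-up test ids, for illustration only.
tests = ["test_a.py::test_one", "test_a.py::test_two", "test_b.py::test_x"]
first = select_shard(tests, 1, 2)
second = select_shard(tests, 2, 2)
print(first, second)
```

Every test lands in exactly one shard, and rerunning the script in a new
process yields the same split.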

Moved one test case from delimited-latin-text.test to
test_delimited_text.py for easier binary comparison.

Add bytes_to_str() as a utility function to decode bytes in Python3.
This is often needed when inspecting the return value of
subprocess.check_output() as a string.
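
A sketch of what such a helper could look like. The signature and the
UTF-8 default below are assumptions, not the actual bytes_to_str() added
by this patch.

```python
import subprocess
import sys


def bytes_to_str(value, encoding="utf-8"):
    """Decode bytes to str; pass str through unchanged.

    Assumed behavior for illustration. In Python3,
    subprocess.check_output() returns bytes unless text mode is
    requested, so callers that compare against str need to decode.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Run a child Python process and inspect its output as a string.
raw = subprocess.check_output([sys.executable, "-c", "print('ok')"])
text = bytes_to_str(raw).strip()
print(type(raw).__name__, repr(text))
```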

Implement DataTypeMetaclass.__lt__ to replace
DataTypeMetaclass.__cmp__, which is ignored in Python3 (see
https://peps.python.org/pep-0207/).
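
The idea can be sketched as follows. DataTypeMeta and the three classes
are hypothetical stand-ins, and ordering by class name is an assumption
made only to keep the example small.

```python
class DataTypeMeta(type):
    """Hypothetical metaclass that makes the classes themselves sortable."""

    def __lt__(cls, other):
        # Python3 never calls __cmp__; sorted() and list.sort() only
        # need __lt__ to be defined on the type of the items being
        # sorted, which here is the metaclass.
        if not isinstance(other, DataTypeMeta):
            return NotImplemented
        return cls.__name__ < other.__name__


class DataType(metaclass=DataTypeMeta):
    pass


class Int(DataType):
    pass


class Float(DataType):
    pass


class Char(DataType):
    pass


ordered = sorted([Int, Float, Char])
print([c.__name__ for c in ordered])
```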

Fix WEB_CERT_ERR difference in test_ipv6.py.

Fix trivial integer parsing in test_restart_services.py.

Fix various encoding issues in test_saml2_sso.py,
test_shell_commandline.py, and test_shell_interactive.py.

Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1.

Switch to binary comparison in test_iceberg.py where needed.

Specify text mode when calling tempfile.NamedTemporaryFile().
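
For context, NamedTemporaryFile() defaults to mode "w+b", so writing a
str fails in Python3 unless text mode is requested. A minimal sketch,
with a made-up suffix and file content:

```python
import tempfile

# mode="w+" opens the file in text mode, so str can be written directly.
with tempfile.NamedTemporaryFile(mode="w+", suffix=".sql",
                                 encoding="utf-8") as f:
    f.write("select 1;\n")
    f.flush()
    f.seek(0)
    content = f.read()

print(repr(content))
```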

Simplify create_impala_shell_executable_dimension to skip testing dev
and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. The reason
is that several UTF-8 related tests in test_shell_commandline.py break
in the Python3 pytest + Python2 impala-shell combo. This skipping already
happens automatically on build OSes without a system Python2 available, like
RHEL9 (IMPALA_SYSTEM_PYTHON2 env var is empty).

Removed unused vector argument and fixed some trivial flake8 issues.

Several tests require logic changes due to intermittent issues in
Python3 pytest. These include:

Add _run_query_with_client() in test_ranger.py to allow reusing a single
Impala client for running several queries. Ensure clients are closed
when the test is done. Mark several tests in test_ranger.py with
SkipIfFS.hive because they run queries through beeline + HiveServer2,
but the Ozone and S3 build environments do not start HiveServer2 by
default.

Increase the sleep period from 0.1 to 0.5 seconds per iteration in
test_statestore.py and mark TestStatestore to execute serially. This is
because TServer appears to shut down more slowly when run concurrently
with other tests. Handle the deprecation of Thread.setDaemon() as well.

Always set force_restart=True for each test method in TestLoggingCore,
TestShellInteractiveReconnect, and TestQueryRetries to prevent them from
reusing the minicluster from a previous test method. Some of these tests
destroy the minicluster (kill impalad) and will produce a minidump if the
metrics verifier for the next test fails to detect a healthy minicluster
state.

Testing:
Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true.

Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4
Reviewed-on: http://gerrit.cloudera.org:8080/23319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-03 10:01:29 +00:00
Sai Hemanth Gantasala
b67a9cecb3 IMPALA-13593: Enable event processor to consume ALTER_PARTITIONS events
from metastore

HIVE-27746 introduced the ALTER_PARTITIONS event type, an optimization
that collapses bulk ALTER_PARTITION events into a single event. The
components version is updated to pick up this change. Consuming this
event in Impala significantly reduces the number of events the event
processor has to handle and helps it catch up with events quickly.

This patch enables the ability to consume ALTER_PARTITIONS events. The
downside of this patch is that there is no before_partitions object in
the event message. This can cause partitions to be refreshed even on
trivial changes to them. HIVE-29141 will address this concern.

Testing:
- Added an end-to-end test to verify consuming the ALTER_PARTITIONS
event. Larger timeouts were also added in this test because flakiness
was observed while looping this test several times.

Change-Id: I009a87ef5e2c331272f9e2d7a6342cc860e64737
Reviewed-on: http://gerrit.cloudera.org:8080/22554
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-08-28 06:53:32 +00:00
Laszlo Gaal
ad7888898b IMPALA-13223: Fix bootstrap-build.sh for platforms without Python2
bin/bootstrap-build.sh did not distinguish between various versions of
the Ubuntu platform, and attempted to install unversioned Python
packages (python-dev and python-setuptools) even on newer versions
that don't support Python 2 any longer (e.g. Ubuntu 22.04 and 24.04).
On older Ubuntu versions these packages are still useful, so at this
point it is not feasible just to drop them.

This patch makes these packages optional: they are added to the list of
packages to be installed only if they actually exist for the platform.
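
The "install only if it exists" rule can be sketched like this. The
real script is shell; packages_to_install, the availability check, and
the required package name are hypothetical, while python-dev and
python-setuptools are the optional packages named above.

```python
def packages_to_install(required, optional, is_available):
    """Return required packages plus the optional ones that exist.

    is_available is a hypothetical callable standing in for whatever
    query the platform's package manager offers.
    """
    return list(required) + [p for p in optional if is_available(p)]


# Made-up availability data for two platforms, for illustration only.
older_release = {"python-dev", "python-setuptools"}
newer_release = set()

optional = ["python-dev", "python-setuptools"]
print(packages_to_install(["build-essential"], optional,
                          older_release.__contains__))
print(packages_to_install(["build-essential"], optional,
                          newer_release.__contains__))
```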

The patch also extends the package list with some basic packages that
are needed when bin/bootstrap_build.sh is run inside an Ubuntu 22.04
Docker container.

Tests: ran a compile-only build on Ubuntu 20.04 (still has Python 2) and
on Ubuntu 22.04 (does not support Python 2 any more).

Change-Id: I94ade35395afded4e130b79eab8c27c6171b50d6
Reviewed-on: http://gerrit.cloudera.org:8080/21800
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-21 22:03:52 +00:00
Daniel Becker
991c0d5cf3 IMPALA-14326: Update commons-lang3 to version 3.18.0
Update commons-lang3 from version 3.17.0 to 3.18.0.

Testing:
 - Core tests passed.

Change-Id: Ie3f2e4ac7232e3f2e2c1c6c6a62225564faaaf4a
Reviewed-on: http://gerrit.cloudera.org:8080/23324
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-21 16:13:03 +00:00
Riza Suminto
9fc941b611 IMPALA-14327: Update load-data.py and run-workload.py to use HS2
load-data.py is used for dataloading while run-workload.py is used for
running perf-AB-test. This patch changes both scripts from using the
beeswax protocol to the HS2 protocol.

Testing:
Ran data loading and perf-AB-test-ub2004 based on this patch.

Change-Id: I1c3727871b8b2e75c3f10ceabfbe9cb96e36ead3
Reviewed-on: http://gerrit.cloudera.org:8080/23309
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-20 07:20:29 +00:00
Riza Suminto
14ff597e2f IMPALA-14289: Suppress data race in ThreadTokenAvailableCb
TSAN build in RHEL9 hit a data race issue in
HdfsScanNode::ThreadTokenAvailableCb from timed_mutex + try_lock_for
usage. It seems to be a known false-positive in ThreadSanitizer:

https://github.com/google/sanitizers/issues/1620
https://github.com/llvm/llvm-project/issues/142370

This patch suppresses the TSAN error in ThreadTokenAvailableCb.

Testing:
Pass dataloading and BE tests in TSAN in RHEL9.

Change-Id: I87950cdc3fedc8d80adeb788c6d29791db58242a
Reviewed-on: http://gerrit.cloudera.org:8080/23281
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-12 21:52:57 +00:00
jasonmfehr
2ad6f818a5 IMPALA-13237: [Patch 5] - Implement OpenTelemetry Traces for Select Queries Tracking
Adds representation of Impala select queries using OpenTelemetry
traces.

Each Impala query is represented as its own individual OpenTelemetry
trace. The one exception is retried queries which will have an
individual trace for each attempt. These traces consist of a root span
and several child spans. Each child span has the root as its parent.
No child span has another child span as its parent. Each child span
represents one high-level query lifecycle stage. Each child span also
has span attributes that further describe the state of the query.

Child spans:
  1. Init
  2. Submitted
  3. Planning
  4. Admission Control
  5. Query Execution
  6. Close
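
The trace shape described above can be modeled with plain data classes.
This sketch does not use the OpenTelemetry SDK; the Span class,
build_query_trace, the root span name, and the query id are all
hypothetical. The child span names and the "ErrorMsg" and "QueryType"
attributes come from this description.

```python
from dataclasses import dataclass, field
from typing import Optional

CHILD_SPAN_NAMES = ["Init", "Submitted", "Planning", "Admission Control",
                    "Query Execution", "Close"]


@dataclass
class Span:
    name: str
    parent: Optional["Span"] = None
    attributes: dict = field(default_factory=dict)


def build_query_trace(query_id):
    """Build one root span with six child spans, all parented to the root."""
    root = Span(name="query-" + query_id)  # root span name is made up
    children = [Span(name=n, parent=root, attributes={"ErrorMsg": ""})
                for n in CHILD_SPAN_NAMES]
    # QueryType is first known during planning, so it lives on that span.
    planning = children[CHILD_SPAN_NAMES.index("Planning")]
    planning.attributes["QueryType"] = "QUERY"
    return root, children


root, children = build_query_trace("hypothetical-query-id")
print([c.name for c in children])
```

The hierarchy is flat: every child points at the root and no child has
another child as its parent.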

Each child span contains a mix of universal attributes (available on
all spans) and query phase specific attributes. For example, the
"ErrorMsg" attribute, present on all child spans, is the error
message (if any) at the end of that particular query phase. One
example of a child span specific attribute is "QueryType" on the
Planning span. Since query type is first determined during query
planning, the "QueryType" attribute is present on the Planning span
and has a value of "QUERY" (since only selects are supported).

Since queries can run for lengthy periods of time, the Init span
communicates the beginning of a query along with global query
attributes. For example, span attributes include query id, session
id, sql, user, etc.

Once the query has closed, the root span is closed.

Testing accomplished with new custom cluster tests.

Generated-by: Github Copilot (GPT-4.1, Claude Sonnet 3.7)
Change-Id: Ie40b5cd33274df13f3005bf7a704299ebfff8a5b
Reviewed-on: http://gerrit.cloudera.org:8080/22924
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-12 04:11:06 +00:00
zhangyifan27
f0757418c8 IMPALA-14257: Support set USE_APACHE_* when USE_APACHE_COMPONENTS=false
Before this patch, USE_APACHE_COMPONENTS overwrote all USE_APACHE_*
variables, but we should support using specific apache components.

After this patch, if USE_APACHE_COMPONENTS is not false, the
USE_APACHE_{HADOOP,HBASE,HIVE,TEZ,RANGER} variables will be set to true.
Otherwise, the existing values of
USE_APACHE_{HADOOP,HBASE,HIVE,TEZ,RANGER} are used.
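
The resolution rule can be sketched as follows. This is an illustration
in Python, not the shell logic in impala-config.sh;
resolve_apache_flags is a hypothetical name, and treating an unset
variable as "false" is an assumption.

```python
COMPONENTS = ("HADOOP", "HBASE", "HIVE", "TEZ", "RANGER")


def resolve_apache_flags(env):
    """Return the effective USE_APACHE_<component> values as a dict."""
    # Assumption: an unset USE_APACHE_COMPONENTS behaves like "false".
    use_all = env.get("USE_APACHE_COMPONENTS", "false") != "false"
    resolved = {}
    for component in COMPONENTS:
        key = "USE_APACHE_" + component
        # The umbrella switch forces true; otherwise keep each value.
        resolved[key] = "true" if use_all else env.get(key, "false")
    return resolved


flags = resolve_apache_flags({"USE_APACHE_COMPONENTS": "false",
                              "USE_APACHE_HIVE": "true"})
print(flags)
```

The usage line mirrors the configuration exercised in the test below:
only Hive is switched to the Apache build.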

Test:
 - Built and ran a test cluster with setting USE_APACHE_HIVE=true
and USE_APACHE_COMPONENTS=false.

Change-Id: I33791465a3b238b56f82d749e3dbad8215f3b3bc
Reviewed-on: http://gerrit.cloudera.org:8080/23211
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-11 12:44:26 +00:00
jasonmfehr
19f662301c IMPALA-14214: [Addendum] - Ensure IMPALA_TOOLCHAIN_COMMIT_HASH Matches Build IDs
Adds verification code to ensure the IMPALA_TOOLCHAIN_COMMIT_HASH
environment variable matches the commit hash in the
IMPALA_TOOLCHAIN_BUILD_ID_AARCH64 and
IMPALA_TOOLCHAIN_BUILD_ID_X86_64 environment variables.
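
A sketch of such a check, under the assumption that each build ID ends
with "-" followed by a prefix of the commit hash. That format, the
function name, and all values below are illustrative assumptions, not
taken from the patch.

```python
BUILD_ID_VARS = ("IMPALA_TOOLCHAIN_BUILD_ID_AARCH64",
                 "IMPALA_TOOLCHAIN_BUILD_ID_X86_64")


def mismatched_build_ids(env):
    """Return the build ID variables whose hash suffix disagrees.

    Assumes each build ID looks like "<build number>-<hash prefix>".
    """
    commit_hash = env["IMPALA_TOOLCHAIN_COMMIT_HASH"]
    bad = []
    for var in BUILD_ID_VARS:
        # Take the part after the last "-" as the hash prefix.
        build_hash = env[var].rsplit("-", 1)[-1]
        if not commit_hash.startswith(build_hash):
            bad.append(var)
    return bad


# Made-up values, for illustration only.
env = {
    "IMPALA_TOOLCHAIN_COMMIT_HASH": "abc123def4567890",
    "IMPALA_TOOLCHAIN_BUILD_ID_AARCH64": "101-abc123def4",
    "IMPALA_TOOLCHAIN_BUILD_ID_X86_64": "102-ffff000000",
}
print(mismatched_build_ids(env))
```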

Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: I348698356a014413875f6b8b54a005bf89b9793a
Reviewed-on: http://gerrit.cloudera.org:8080/23243
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-05 06:14:28 +00:00
jasonmfehr
7b7e7709aa IMPALA-14214: Correct IMPALA_TOOLCHAIN_COMMIT_HASH
Fixes the default value of the IMPALA_TOOLCHAIN_COMMIT_HASH
environment variable to be the correct hash.

Change-Id: I98824f363334a15e4f91c0b3f51fa09a5d15c241
Reviewed-on: http://gerrit.cloudera.org:8080/23233
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
2025-08-04 01:22:23 +00:00