615 Commits

Author SHA1 Message Date
Riza Suminto
3ed2a82a95 IMPALA-14606: Stop building impala-shell for Python 2
This patch stops setting up and building impala-shell for Python 2.
A more thorough cleanup will be done in the future.

Testing:
Passed the build and test/shell/ on RHEL8.

Change-Id: Ic7d59b283f4e2f011880ff6221d550b52714a538
Reviewed-on: http://gerrit.cloudera.org:8080/23750
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-12-10 04:40:46 +00:00
jichen0919
7e29ac23da IMPALA-14092 Part2: Support querying of paimon data table via JNI
This patch implements querying of paimon data tables through a
JNI-based scanner.

Features implemented:
- Column pruning.
Partition pruning and predicate push-down will be submitted
as the third part of the patch.

We implemented this by treating the paimon table as a normal
unpartitioned table. When querying a paimon table:
- PaimonScanNode decides which paimon splits need to be scanned,
  and then transfers the splits to BE to do the JNI-based scan.

- We also collect the columns that need to be scanned and pass
  them to the scanner for column pruning. This is implemented by
  passing the field ids of the columns to BE, instead of the column
  positions, in order to support schema evolution.

- In the original implementation, PaimonJniScanner passed the
  paimon row object directly to BE, which called the corresponding
  paimon row field accessor, a Java method, to convert row fields to
  impala row batch tuples. We found this slow due to the overhead of
  JVM method calls.
  To minimize the overhead, we reworked the implementation:
  PaimonJniScanner now converts the paimon row batches to an
  arrow record batch, which stores data in the off-heap region of
  the impala JVM, and passes the memory pointer of the off-heap
  arrow record batch to BE.
  The BE PaimonJniScanNode reads data directly from the JVM off-heap
  region and converts the arrow record batch to an impala row batch.

  The benchmark shows the latter implementation is 2.x faster
  than the original implementation.

  The lifecycle of an arrow record batch is as follows:
  the arrow record batch is generated in FE and passed to BE.
  After the record batch is imported into BE successfully,
  BE is in charge of freeing it.
  There are two free paths: the normal path and the
  exceptional path. On the normal path, when the arrow batch
  is fully consumed by BE, BE calls JNI to fetch the next arrow
  batch; in this case the consumed batch is freed automatically.
  The exceptional path is taken when the query is cancelled or a memory
  allocation fails. In these corner cases, the arrow batch is freed in the
  close method if it has not been fully consumed by BE.
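
  As a rough illustration of the two free paths, the Python sketch below
  models a consumer that releases each batch when it fetches the next one
  and releases any partly consumed batch in close(). The names ArrowBatch,
  BatchConsumer and release() are hypothetical stand-ins for illustration;
  they are not the actual PaimonJniScanner or BE classes.

```python
class ArrowBatch:
    """Hypothetical stand-in for an imported Arrow record batch."""

    def __init__(self, rows):
        self.rows = rows
        self.released = False

    def release(self):
        # Stands in for freeing the off-heap memory behind the batch.
        self.released = True


class BatchConsumer:
    """Models the BE side: it owns the current batch once it is imported."""

    def __init__(self, batches):
        self._source = iter(batches)
        self._current = None

    def next_batch(self):
        # Normal path: the previous batch is fully consumed, so it is
        # released before the next one is fetched.
        self._release_current()
        self._current = next(self._source, None)
        return self._current

    def close(self):
        # Exceptional path (cancellation, allocation failure): release the
        # batch that was imported but not fully consumed.
        self._release_current()

    def _release_current(self):
        if self._current is not None and not self._current.released:
            self._current.release()
        self._current = None


first, second = ArrowBatch([1, 2]), ArrowBatch([3])
consumer = BatchConsumer([first, second])
consumer.next_batch()   # import the first batch
consumer.next_batch()   # fetching the second batch frees the first
consumer.close()        # query cancelled: the second batch is freed here
```

  In this model every imported batch is released exactly once, whichever
  path ends the scan.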

Currently supported impala data types for queries:
- BOOLEAN
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- STRING
- DECIMAL(P,S)
- TIMESTAMP
- CHAR(N)
- VARCHAR(N)
- BINARY
- DATE

TODO:
    - Patches pending submission:
        - Support tpcds/tpch data-loading
          for paimon data table.
        - Virtual Column query support for querying
          paimon data table.
        - Query support with time travel.
        - Query support for paimon meta tables.
    - WIP:
        - Snapshot incremental read.
        - Complex type query support.
        - Native paimon table scanner, instead of
          jni based.

Testing:
    - Create tests table in functional_schema_template.sql
    - Add TestPaimonScannerWithLimit in test_scanners.py
    - Add test_paimon_query in test_paimon.py.
    - Already passed the tpcds/tpch tests for paimon tables. The test
      table data is currently generated by spark, which is not
      supported by impala now; we have to do this because hive
      doesn't support generating paimon tables for dynamic-partitioned
      tables. We plan to submit a separate patch for tpcds/tpch data
      loading and the associated tpcds/tpch query tests.
    - JVM off-heap memory leak tests: ran looped tpch tests for
      1 day. No obvious off-heap memory increase was observed, and
      off-heap memory usage stayed within 10M.

Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Reviewed-on: http://gerrit.cloudera.org:8080/23613
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-12-05 18:19:57 +00:00
ttttttz
5d1f1e0180 IMPALA-14183: Rename the environment variable USE_APACHE_HIVE to USE_APACHE_HIVE_3
When the environment variable USE_APACHE_HIVE is set to true, Impala is
built against Apache Hive 3.x. To better distinguish it from
Apache Hive 2.x later, rename USE_APACHE_HIVE to USE_APACHE_HIVE_3.
Additionally, to make it easier to reference different versions of the Hive
MetastoreShim, the major version of Hive has been added to the environment
variable IMPALA_HIVE_DIST_TYPE.

Change-Id: I11b5fe1604b6fc34469fb357c98784b7ad88574d
Reviewed-on: http://gerrit.cloudera.org:8080/21724
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-12-03 13:38:45 +00:00
jichen0919
685745f785 IMPALA-14579: Bump up paimon version to 1.3.1 for CVE-2025-46762
This patch fixes CVE-2025-46762 by bumping the paimon
version to 1.3.1.

Background:
The following PR: https://github.com/apache/incubator-paimon/pull/6363
has been merged by the paimon community and is included since paimon-1.3.0.
Impala therefore needs to upgrade paimon to 1.3.0 or later to pick up the
CVE fix as well.

Testing:
- All paimon-related tests passed.

Change-Id: Ie8052f71a5e2a4e39b0ac39b6d349e55f10092bc
Reviewed-on: http://gerrit.cloudera.org:8080/23717
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-26 16:55:30 +00:00
Zoltan Borok-Nagy
5ea4dc342e IMPALA-14565: Update Apache component versions after CDP_BUILD_NUMBER bump to 71942734
CDP_BUILD_NUMBER was bumped to 71942734 which upgraded Iceberg to
version 1.5.2. We should update our Apache component dependencies
(not just Iceberg) accordingly.

Change-Id: Ic353bbef64a59365b708a20bd0d5ed502cb6d44e
Reviewed-on: http://gerrit.cloudera.org:8080/23678
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-21 01:40:05 +00:00
Riza Suminto
64c4abe6ed IMPALA-14547: Bumping Kudu version to pickup KUDU-3716
Redhat 9 environments recently switched to OpenSSL 3.5.1. On those
machines, the Kudu minicluster fails to start up with a CSR signature
verification error. KUDU-3716 fixed this issue.

This patch updates the toolchain and Kudu versions to pick up KUDU-3716.

Testing:
Passed data loading on Redhat 9.

Change-Id: I7262267939a9f08650af85443240950afbb3323f
Reviewed-on: http://gerrit.cloudera.org:8080/23697
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-20 15:16:57 +00:00
Zoltan Borok-Nagy
275f03f10d IMPALA-12893: (part 2): Upgrade Iceberg to version 1.5.2
This patch updates CDP_BUILD_NUMBER to 71942734 in order to
upgrade Iceberg to 1.5.2.

This patch updates some tests so they pass with Iceberg 1.5.2. The
behavior changes of Iceberg 1.5.2 are (compared to 1.3.1):
 * Iceberg V2 tables are created by default
 * Metadata tables have different schema
 * Parquet compression is explicitly set for new tables (even for ORC
   tables)
 * Sequence numbers are assigned a bit differently

Updated the tests where needed.

Code changes to accommodate the above behavior changes:
 * SHOW CREATE TABLE adds 'format-version'='1' for Iceberg V1 tables
 * CREATE TABLE statements don't throw errors when Parquet compression
   is set for ORC tables

Change-Id: Ic4f9ed3f7ee9f686044023be938d6b1d18c8842e
Reviewed-on: http://gerrit.cloudera.org:8080/23670
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-14 01:27:45 +00:00
Riza Suminto
0572dba245 IMPALA-14529: Bumping Kudu version to pickup latest KUDU-1261 patch
This commit bumps the Impala toolchain to pick up the latest Kudu version up to
commit 60f5e5267b92c39485a66121d3ce3cc7ef57b0e0 (KUDU-1261 make
ArrayCellMetadataView::Init() more robust).

Change-Id: I68009e5fefd053882f5504cd2520bacb189a1b04
Reviewed-on: http://gerrit.cloudera.org:8080/23631
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-11-05 16:41:51 +00:00
Michael Smith
599b89306d IMPALA-13145: Upgrade mold to 2.40.4
Upgrades mold to the latest release.

Change-Id: If926b8065cccc4c9038c064c274b6ba97fdc2888
Reviewed-on: http://gerrit.cloudera.org:8080/23582
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-10-27 15:05:01 +00:00
Michael Smith
98f993da43 IMPALA-14478: Add CDP ORC build
Adds CDP_ORC_JAVA_VERSION so we can build and test with Apache or CDP
versions of ORC.

Change-Id: Id9ba78051aff9c9129c244b1734b6f8a523858b5
Reviewed-on: http://gerrit.cloudera.org:8080/23506
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-10-08 23:34:55 +00:00
Riza Suminto
3d61c5ea9f IMPALA-14476: Workaround TSAN issue in KuduClient
Since the toolchain was bumped to pick up Kudu's array column
feature (KUDU-1261), Impala's TSAN builds on the master branch
consistently break during dataload with a data race detected by TSAN.

The source of the data race lies within libkudu_client.so and it only triggers
if the Impala build machine has both IPv4 and IPv6 addresses associated with
localhost. Until the exact root cause is found and fixed, this patch
works around the TSAN issue by fixing the KUDU_MASTER_HOSTS env var to
127.0.0.1.

Testing:
Ran a TSAN build and confirmed that no data race error is emitted.

Change-Id: I511ab625d18c6007567083557fcdf98980a6ac6f
Reviewed-on: http://gerrit.cloudera.org:8080/23507
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-10-08 14:40:50 +00:00
Riza Suminto
a2e4463fbc IMPALA-14471: Bump up KUDU_VERSION to pick up complex types
This patch updates the Impala toolchain Kudu to 16689973a
to pick up the Kudu array column feature (KUDU-1261).

Change-Id: Ib151d4ea6852e8ba8ae92697bd6806a074e37159
Reviewed-on: http://gerrit.cloudera.org:8080/23492
Reviewed-by: Alexey Serbin <alexey@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-10-04 06:07:09 +00:00
Joe McDonnell
e1b3c1445e IMPALA-13472: Bump toolchain to fix minidump stacks on ARM
Minidump stack resolution does not work on Redhat8 ARM64.
Redhat8 ARM64 uses 64KB pages, and the Breakpad library does
not properly handle collecting stacks for that configuration.
Breakpad rounds off the stack pointer to the nearest page
boundary below the stack pointer, then collects up to 32KB of
stack memory. With a top-down stack, this means it is collecting
some memory that is not used by the stack. With 64KB pages,
the memory it collects usually doesn't contain any stack contents.

This picks up a toolchain with Breakpad patched to fix this. The
patch stops rounding the stack pointer to the nearest page.
Instead, it adjusts the stack pointer to account for the red
zone (128 bytes on x86_64) and then rounds to the nearest 1KB
boundary below the stack pointer.
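
The arithmetic described above can be sketched in Python. This is only an
illustration of the rounding rules from this message, not Breakpad's code;
the function names and the example stack pointer are made up for the sketch.

```python
RED_ZONE_BYTES = 128           # red zone size on x86_64, per the text above
MAX_CAPTURE_BYTES = 32 * 1024  # Breakpad collects up to 32KB of stack


def page_rounded_start(sp, page_size):
    """Old behavior: round the stack pointer down to a page boundary."""
    return sp & ~(page_size - 1)


def red_zone_rounded_start(sp, red_zone=RED_ZONE_BYTES, align=1024):
    """New behavior: step below the red zone, then round down to 1KB."""
    return (sp - red_zone) & ~(align - 1)


def captures_stack(start, sp, max_bytes=MAX_CAPTURE_BYTES):
    """True if [start, start + max_bytes) reaches the live stack at sp."""
    return start + max_bytes > sp


# Example stack pointer that sits 0x9678 bytes into a 64KB page.
sp = 0x7FFD12349678

old_start = page_rounded_start(sp, 64 * 1024)
new_start = red_zone_rounded_start(sp)

print(hex(old_start), captures_stack(old_start, sp))
print(hex(new_start), captures_stack(new_start, sp))
```

With this example pointer the page-rounded window ends below the stack
pointer, so it holds no live stack, while the 1KB-rounded window starts
just below the stack pointer and covers it.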

Testing:
 - Produced and resolved minidumps on multiple build types for
   x86_64 and ARM64 (release, debug, asan, ubsan)

Change-Id: I4fbd91abfbddfd8355d27ae9d9b86b70a9ce0409
Reviewed-on: http://gerrit.cloudera.org:8080/23465
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-25 23:44:31 +00:00
Michael Smith
52b87fcefd IMPALA-14454: Exclude log4j 2 dependencies
While we use reload4j, we can safely exclude log4j 2 dependencies to
reduce the size of our artifacts.

Change-Id: Ic060bdd969a6e5cd01646376b27c7355ce841819
Reviewed-on: http://gerrit.cloudera.org:8080/23439
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-09-24 18:04:06 +00:00
Michael Smith
e5afebc0c1 IMPALA-14450: (Addendum) Fix other numeric comparison
Fixes

    set-impala-java-tool-options.sh: line 25: ((: 1.8: syntax error:
    invalid arithmetic operator (error token is ".8")

Double parentheses - ((...)) - only support integer arithmetic. I can't
find any standard way to do decimal comparison in shells, so switch to
extracting the Java major version as an integer and comparing that.

OpenJDK 8 has always considered "-target 1.8" and "-target 8" equivalent
https://github.com/openjdk/jdk/blob/jdk8-b01/langtools/src/share/classes/com/sun/tools/javac/jvm/Target.java#L105
so maven target can be set to 8 when IMPALA_JAVA_TARGET is 8.
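
The actual fix lives in a shell script. Purely as an illustration of the
parsing rule, a Python sketch of turning a version string such as "1.8" or
"17" into an integer major version might look like this (the function name
is hypothetical):

```python
def java_major_version(version):
    """Return the major version as an int for strings like '1.8' or '17'."""
    parts = version.split(".")
    # Legacy scheme: '1.8' means Java 8, so the major version is the
    # second field. Newer schemes put the major version first.
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    return int(parts[0])


for raw in ("1.8", "8", "11", "17.0.2"):
    print(raw, "->", java_major_version(raw))
```

Comparing the resulting integers avoids any need for decimal comparison.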

Change-Id: I15cdd1859be51d3708f1c348e898831df2a92b13
Reviewed-on: http://gerrit.cloudera.org:8080/23452
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-23 03:42:29 +00:00
Michael Smith
5137bb94ac IMPALA-14446: Clean up pom.xml
Cleans up repetitive patterns in pom.xml.

Centralize plugin configuration in pluginManagement. Replace inline
maven-compiler-plugin configuration with newer maven.compiler.release
and update to latest plugin version.

Centralize common dependencies in dependencyManagement, including
exclusions when appropriate. Remove exclusions that are no longer
relevant.

Compared before and after with dependency:tree; only difference is that
commons-cli now comes from hadoop and jersey-serv{let,er} are
effectively excluded; all versions matched. Also ensured
USE_APACHE_COMPONENTS=true compiles.

Adds com.amazonaws:aws-java-sdk-bundle to exclusion checking to ensure
it's not accidentally included alongside impala-minimal-s3a-aws-sdk.

Removes missed io.netty exclusion from IMPALA-12816.

Updates commons-dbcp2 to 2.12.0 to match Hive.

Change-Id: If96649840e23036b4a73ee23e8d12516497994f0
Reviewed-on: http://gerrit.cloudera.org:8080/23432
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-23 02:50:22 +00:00
Laszlo Gaal
57eb5f653b IMPALA-14449, IMPALA-14269: Fix Red Hat / Rocky 9 builds, ORC buffer overflow
Downstream error reports pointed out that the toolchain version picked
up for IMPALA-14139 contains toolchain binaries for Red Hat 9 (and
compatibles) that require at least the 9.5 minor version because of
OpenSSL library requirements. This was caused by the toolchain binary
build process not using package repo pinning for the redhat9 build
container definition, which caused the container process to install
"latest" packages, in this case packages released in Rocky / Red Hat
9.5.

This patch bumps the toolchain ID to a version in which the redhat9
binaries were produced in a build container "moved back in time" to the
9.2 release by pinning the package repos to the Rocky Linux 9.2 state,
using the Rocky Vault.

The patch also picks up a buffer overflow mitigation for the ORC
library.

Change-Id: I5c6921afdc69a4a6644b619de6b8d4e4cc69e601
Reviewed-on: http://gerrit.cloudera.org:8080/23448
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-22 19:54:25 +00:00
Michael Smith
8a80ede69b IMPALA-14450: (Addendum) Fix numeric comparison
Fix shell comparison to use string equality so it works for all POSIX
shells instead of just zsh.

Change-Id: If9b9ed7f59e71d024ec674bb30c57274567fb2a3
Reviewed-on: http://gerrit.cloudera.org:8080/23444
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-09-19 19:20:30 +00:00
Csaba Ringhofer
0e30792023 IMPALA-14444: Upgrade bouncycastle to 1.79
Change-Id: Ib20c840be2811467716c8de5d2f816a0e5531eb4
Reviewed-on: http://gerrit.cloudera.org:8080/23437
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-19 15:04:46 +00:00
Michael Smith
d217b9ecc6 IMPALA-14450: Simplify Java version selection
Removes IMPALA_JAVA_HOME_OVERRIDE and updates version selection. In
order of priority:
1. If IMPALA_JDK_VERSION is set, use the OS JDK version from a known
   location. This is primarily used when also installing the JDK as part
   of automated builds.
2. If JAVA_HOME is set, use it.
3. Look for the system default JDK.
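
The priority order above can be sketched as a small Python function. The
real logic is in the shell config scripts; the helper names and the path
layout below are assumptions made only for this example.

```python
def select_java_home(env, os_jdk_home, system_default_home):
    """Pick a JAVA_HOME following the priority order listed above.

    env                 -- mapping of environment variables
    os_jdk_home         -- hypothetical lookup: JDK version -> install path
    system_default_home -- hypothetical path of the system default JDK
    """
    jdk_version = env.get("IMPALA_JDK_VERSION")
    if jdk_version:
        # 1. An explicit IMPALA_JDK_VERSION wins over JAVA_HOME.
        return os_jdk_home(jdk_version)
    if env.get("JAVA_HOME"):
        # 2. Otherwise honor the caller's JAVA_HOME.
        return env["JAVA_HOME"]
    # 3. Fall back to the system default JDK.
    return system_default_home


def example_os_jdk_home(version):
    # Illustrative path layout only; real locations depend on the OS.
    return "/usr/lib/jvm/java-%s-openjdk" % version


default_home = "/usr/lib/jvm/default-java"  # made-up system default
print(select_java_home({"IMPALA_JDK_VERSION": "17", "JAVA_HOME": "/opt/jdk8"},
                       example_os_jdk_home, default_home))
print(select_java_home({"JAVA_HOME": "/opt/jdk8"},
                       example_os_jdk_home, default_home))
print(select_java_home({}, example_os_jdk_home, default_home))
```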

The IMPALA_JDK_VERSION variable is no longer modified to avoid issues
when sourcing impala-config.sh multiple times. JAVA_HOME will be
modified if IMPALA_JDK_VERSION is set; both must be unset to restore
using the system default Java.

If switching between JDKs, now prefer setting JAVA_HOME. If relying on
system Java, unset JAVA_HOME after e.g. update-java-alternatives.

The detected Java version is set in IMPALA_JAVA_TARGET, which is used to
add Java 9+ options and configure the Java compilation target.

Eliminates IMPALA_JDK_VERSION_NUM as its value was always identical to
IMPALA_JAVA_TARGET.

Stops printing from impala-config-java.sh. It made the output from
impala-config.sh look strange, and the decisions can all be clearly
determined from impala-config.sh printed variables later or the packages
installed in bootstrap_system.sh.

Fixes JAVA_HOME in bootstrap_build.sh on ARM64 systems.

Change-Id: I68435ca69522f8310221a0f3050f13d86568b9da
Reviewed-on: http://gerrit.cloudera.org:8080/23434
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-19 01:51:47 +00:00
pranav.lodha
0513c071b4 IMPALA-14151: Update jackson.core
Bump IMPALA_JACKSON_VERSION from 2.15.3 to 2.18.1
as part of a maintenance upgrade to pick up
fixes and improvements in the 2.18.x line.

Change-Id: I7b63d8d58011c0dd1c00c72da386ec1b0fbc4d82
Reviewed-on: http://gerrit.cloudera.org:8080/23102
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2025-09-17 23:50:05 +00:00
Laszlo Gaal
89d2b23509 IMPALA-14139: Enable Impala builds on Ubuntu 24.04
Update the following elements of the Impala build environment to enable
builds on Ubuntu 24.04:

- Recognize and handle (where necessary) Ubuntu 24.04 in various
  bootstrap scripts (bootstrap_system.sh, bootstrap_toolchain.py, etc.)
- Bump IMPALA_TOOLCHAIN_ID to an official toolchain build that contains
  Ubuntu 24.04-specific binary packages
- Bump binutils to 2.42
- Bump the GDB version to 12.1-p1, as required by the new toolchain
  version
- Update unique_ptr usage syntax in be/src/util/webserver-test.cc to
  compensate for new GLIBC function prototypes:

System headers in Ubuntu 24.04 adopted attributes on several widely
used function prototypes. Such attributes are not considered to be part
of the function's signature during template evaluation, so GCC throws a
warning when such a function is passed as a template argument, which
breaks the build, as warnings are treated as errors.

webserver-test.cc uses pclose() as the deleter for a unique_ptr in a
utility function. This patch encapsulates pclose() and its attributes in
an explicit specialization for std::default_delete<>, "hiding" the
attributes inside a functor.

The particular solution was inspired by Anton-V-K's proposal in
https://gist.github.com/t-mat/5849549

This commit builds on an earlier patch for the same purpose by Michael
Smith: https://gerrit.cloudera.org/c/23058/

Change-Id: Ia4454b0c359dbf579e6ba2f9f9c44cfa3f1de0d2
Reviewed-on: http://gerrit.cloudera.org:8080/23384
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-09-15 16:10:42 +00:00
jichen0919
826c8cf9b0 IMPALA-14081: Support create/drop paimon table for impala
This patch implements creating and dropping paimon tables
through impala.

Supported impala data types:
- BOOLEAN
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- STRING
- DECIMAL(P,S)
- TIMESTAMP
- CHAR(N)
- VARCHAR(N)
- BINARY
- DATE

Syntax for creating paimon table:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(
[col_name data_type ,...]
[PRIMARY KEY (col1,col2)]
)
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
STORED AS PAIMON
[LOCATION 'hdfs_path']
[TBLPROPERTIES (
'primary-key'='col1,col2',
'file.format' = 'orc/parquet',
'bucket' = '2',
'bucket-key' = 'col3'
)];

Two types of paimon catalogs are supported.

(1) Create table with hive catalog:

CREATE TABLE paimon_hive_cat(userid INT,movieId INT)
STORED AS PAIMON;

(2) Create table with hadoop catalog:

CREATE [EXTERNAL] TABLE paimon_hadoop_cat
STORED AS PAIMON
TBLPROPERTIES('paimon.catalog'='hadoop',
'paimon.catalog_location'='/path/to/paimon_hadoop_catalog',
'paimon.table_identifier'='paimondb.paimontable');

SHOW TABLE STATS/SHOW COLUMN STATS/SHOW PARTITIONS/SHOW FILES
statements are also supported.

TODO:
    - Patches pending submission:
        - Query support for paimon data files.
        - Partition pruning and predicate push down.
        - Query support with time travel.
        - Query support for paimon meta tables.
    - WIP:
        - Complex type query support.
        - Virtual Column query support for querying
          paimon data table.
        - Native paimon table scanner, instead of
          jni based.
Testing:
    - Add unit test for paimon impala type conversion.
    - Add unit test for ToSqlTest.java.
    - Add unit test for AnalyzeDDLTest.java.
    - Update default_file_format TestEnumCase in
      be/src/service/query-options-test.cc.
    - Update test case in
      testdata/workloads/functional-query/queries/QueryTest/set.test.
    - Add test cases in metadata/test_show_create_table.py.
    - Add custom test test_paimon.py.

Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef
Reviewed-on: http://gerrit.cloudera.org:8080/22914
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-09-10 21:24:49 +00:00
Riza Suminto
28cff4022d IMPALA-14333: Run impala-py.test using Python3
Running exhaustive tests with env var IMPALA_USE_PYTHON3_TESTS=true
reveals some tests that require adjustment. This patch makes those
adjustments, which mostly revolve around encoding differences and the string
vs bytes types in Python3. This patch also switches the default to run
pytest with Python3 by setting IMPALA_USE_PYTHON3_TESTS=true. The
following are the details:

Change the hash() function in conftest.py to crc32() to produce a
deterministic hash. Hash randomization is enabled by default since
Python 3.3 (see
https://docs.python.org/3/reference/datamodel.html#object.__hash__).
This causes test sharding (like --shard_tests=1/2) to produce an inconsistent
set of tests per shard. Always restart the minicluster during custom cluster
tests if the --shard_tests argument is set, because test order may change
and affect test correctness, depending on whether the tests run on a fresh
minicluster or not.
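
A minimal sketch of the idea, assuming tests are sharded by a checksum of
their id; the helper names below are hypothetical and are not the actual
conftest.py code:

```python
import zlib


def shard_for(test_id, num_shards):
    """Map a test id to a 0-based shard index with a stable checksum."""
    # zlib.crc32 depends only on the input bytes, so every process
    # computes the same value, unlike the randomized built-in hash().
    return zlib.crc32(test_id.encode("utf-8")) % num_shards


def select_shard(test_ids, shard, num_shards):
    """Return the tests for shard `shard` of `num_shards` (1-based)."""
    return [t for t in test_ids if shard_for(t, num_shards) == shard - 1]


tests = ["test_a.py::test_one", "test_a.py::test_two", "test_b.py::test_x"]
first = select_shard(tests, 1, 2)
second = select_shard(tests, 2, 2)
```

Because the mapping is deterministic, the shards from separate runs always
add up to the full test list with no overlap.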

Moved one test case from delimited-latin-text.test to
test_delimited_text.py for easier binary comparison.

Add bytes_to_str() as a utility function to decode bytes in Python3.
This is often needed when inspecting the return value of
subprocess.check_output() as a string.
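
As an illustration, a helper along these lines could look as follows. The
signature and the default encoding are assumptions; the message above names
bytes_to_str() but does not spell them out.

```python
import subprocess
import sys


def bytes_to_str(value, encoding="utf-8"):
    """Decode bytes to str; return str input unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# check_output() returns bytes in Python 3 unless text mode is requested.
raw = subprocess.check_output([sys.executable, "-c", "print('ok')"])
text = bytes_to_str(raw).strip()
```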

Implement DataTypeMetaclass.__lt__ to substitute
DataTypeMetaclass.__cmp__ that is ignored in Python3 (see
https://peps.python.org/pep-0207/).
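
A small sketch of why __lt__ on the metaclass is enough for sorting classes
in Python3. The class names and the ordering rule (by class name) are
illustrative only and are not taken from the actual DataTypeMetaclass.

```python
class DataTypeMeta(type):
    """Metaclass that orders classes by name (illustrative rule)."""

    def __lt__(cls, other):
        # Python 3 only uses rich comparisons; __cmp__ is never consulted.
        if not isinstance(other, DataTypeMeta):
            return NotImplemented
        return cls.__name__ < other.__name__


class Int(metaclass=DataTypeMeta):
    pass


class Char(metaclass=DataTypeMeta):
    pass


class Float(metaclass=DataTypeMeta):
    pass


# sorted() relies on __lt__, which is looked up on the metaclass here.
ordered = sorted([Int, Float, Char])
```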

Fix WEB_CERT_ERR difference in test_ipv6.py.

Fix trivial integer parsing in test_restart_services.py.

Fix various encoding issues in test_saml2_sso.py,
test_shell_commandline.py, and test_shell_interactive.py.

Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1.

Switch to binary comparison in test_iceberg.py where needed.

Specify text mode when calling tempfile.NamedTemporaryFile().

Simplify create_impala_shell_executable_dimension to skip testing dev
and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. The reason
is that several UTF-8 related tests in test_shell_commandline.py break
in the Python3 pytest + Python2 impala-shell combo. This skipping already
happens automatically on build OSes without a system Python2 available, like
RHEL9 (IMPALA_SYSTEM_PYTHON2 env var is empty).

Removed unused vector argument and fixed some trivial flake8 issues.

Several tests require logic modifications due to intermittent issues in
Python3 pytest. These include:

Add _run_query_with_client() in test_ranger.py to allow reusing a single
Impala client for running several queries. Ensure clients are closed
when the test is done. Mark several tests in test_ranger.py with
SkipIfFS.hive because they run queries through beeline + HiveServer2,
but Ozone and S3 build environments do not start HiveServer2 by
default.

Increase the sleep period from 0.1 to 0.5 seconds per iteration in
test_statestore.py and mark TestStatestore to execute serially. This is
because TServer appears to shut down more slowly when run concurrently
with other tests. Handle the deprecation of Thread.setDaemon() as well.

Always set force_restart=True for each test method in TestLoggingCore,
TestShellInteractiveReconnect, and TestQueryRetries to prevent them from
reusing the minicluster from a previous test method. Some of these tests
destroy the minicluster (kill impalad) and will produce a minidump if the
metrics verifier for the next test fails to detect a healthy minicluster state.

Testing:
Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true.

Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4
Reviewed-on: http://gerrit.cloudera.org:8080/23319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-03 10:01:29 +00:00
Sai Hemanth Gantasala
b67a9cecb3 IMPALA-13593: Enable event processor to consume ALTER_PARTITIONS events
from metastore

HIVE-27746 introduced the ALTER_PARTITIONS event type, an
optimization that collapses bulk ALTER_PARTITION events into a single
event. The components version is updated to pick up this change.
Consuming this event in Impala significantly reduces the
number of events consumed by the event processor and helps it
catch up with events quickly.

This patch enables consuming the ALTER_PARTITIONS event. The
downside of this patch is that there is no before_partitions object in
the event message. This can cause partitions to be refreshed even on
trivial changes to them. HIVE-29141 will address this concern.

Testing:
- Added an end-to-end test to verify consuming the ALTER_PARTITIONS
event. Also, larger timeouts were added in this test as
flakiness was observed while looping this test several times.

Change-Id: I009a87ef5e2c331272f9e2d7a6342cc860e64737
Reviewed-on: http://gerrit.cloudera.org:8080/22554
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-08-28 06:53:32 +00:00
Daniel Becker
991c0d5cf3 IMPALA-14326: Update commons-lang3 to version 3.18.0
Update commons-lang3 from version 3.17.0 to 3.18.0.

Testing:
 - Core tests passed.

Change-Id: Ie3f2e4ac7232e3f2e2c1c6c6a62225564faaaf4a
Reviewed-on: http://gerrit.cloudera.org:8080/23324
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-21 16:13:03 +00:00
jasonmfehr
2ad6f818a5 IMPALA-13237: [Patch 5] - Implement OpenTelemetry Traces for Select Queries Tracking
Adds representation of Impala select queries using OpenTelemetry
traces.

Each Impala query is represented as its own individual OpenTelemetry
trace. The one exception is retried queries which will have an
individual trace for each attempt. These traces consist of a root span
and several child spans. Each child span has the root as its parent.
No child span has another child span as its parent. Each child span
represents one high-level query lifecycle stage. Each child span also
has span attributes that further describe the state of the query.

Child spans:
  1. Init
  2. Submitted
  3. Planning
  4. Admission Control
  5. Query Execution
  6. Close

Each child span contains a mix of universal attributes (available on
all spans) and query phase specific attributes. For example, the
"ErrorMsg" attribute, present on all child spans, is the error
message (if any) at the end of that particular query phase. One
example of a child span specific attribute is "QueryType" on the
Planning span. Since query type is first determined during query
planning, the "QueryType" attribute is present on the Planning span
and has a value of "QUERY" (since only selects are supported).
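
The trace shape described above can be modeled with plain data classes.
This sketch does not use the OpenTelemetry SDK. The span names, "ErrorMsg"
and "QueryType" come from this message, while the Span and QueryTrace
classes, the "root" span name and the "QueryId" attribute key are
assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

CHILD_SPAN_NAMES = [
    "Init", "Submitted", "Planning",
    "Admission Control", "Query Execution", "Close",
]


@dataclass
class Span:
    name: str
    parent: Optional["Span"] = None
    attributes: Dict[str, str] = field(default_factory=dict)
    ended: bool = False


@dataclass
class QueryTrace:
    root: Span
    children: List[Span] = field(default_factory=list)


def build_trace(query_id):
    """Build one trace: a root span plus one child per lifecycle stage."""
    root = Span(name="root", attributes={"QueryId": query_id})
    trace = QueryTrace(root=root)
    for name in CHILD_SPAN_NAMES:
        # Every child hangs directly off the root; children never nest.
        child = Span(name=name, parent=root, attributes={"ErrorMsg": ""})
        if name == "Planning":
            # Phase-specific attribute: query type is known after planning.
            child.attributes["QueryType"] = "QUERY"
        child.ended = True
        trace.children.append(child)
    # The root span is closed only after the query itself has closed.
    root.ended = True
    return trace


trace = build_trace("example-query-id")
```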

Since queries can run for lengthy periods of time, the Init span
communicates the beginning of a query along with global query
attributes. For example, span attributes include query id, session
id, sql, user, etc.

Once the query has closed, the root span is closed.

Testing accomplished with new custom cluster tests.

Generated-by: Github Copilot (GPT-4.1, Claude Sonnet 3.7)
Change-Id: Ie40b5cd33274df13f3005bf7a704299ebfff8a5b
Reviewed-on: http://gerrit.cloudera.org:8080/22924
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-12 04:11:06 +00:00
zhangyifan27
f0757418c8 IMPALA-14257: Support set USE_APACHE_* when USE_APACHE_COMPONENTS=false
Before this patch, USE_APACHE_COMPONENTS overrode all USE_APACHE_*
variables, but we should support using specific Apache components.

After this patch, if USE_APACHE_COMPONENTS is not false, the
USE_APACHE_{HADOOP,HBASE,HIVE,TEZ,RANGER} variables are set to true.
Otherwise, the existing values of
USE_APACHE_{HADOOP,HBASE,HIVE,TEZ,RANGER} are used.
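
The rule can be sketched in Python as follows. The real logic is in the
shell config; treating unset variables as "false" is an assumption of this
sketch, and the function name is hypothetical.

```python
COMPONENTS = ("HADOOP", "HBASE", "HIVE", "TEZ", "RANGER")


def resolve_apache_flags(env):
    """Return the effective USE_APACHE_<component> settings.

    Assumes unset variables default to "false"; the message above does
    not state the defaults.
    """
    use_all = env.get("USE_APACHE_COMPONENTS", "false") != "false"
    flags = {}
    for name in COMPONENTS:
        key = "USE_APACHE_" + name
        if use_all:
            # The umbrella switch forces every component to Apache.
            flags[key] = "true"
        else:
            # Otherwise each component keeps its own setting.
            flags[key] = env.get(key, "false")
    return flags


flags = resolve_apache_flags(
    {"USE_APACHE_COMPONENTS": "false", "USE_APACHE_HIVE": "true"})
```

This mirrors the tested configuration below: only Hive comes from Apache
while the umbrella switch stays off.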

Test:
 - Built and ran a test cluster with setting USE_APACHE_HIVE=true
and USE_APACHE_COMPONENTS=false.

Change-Id: I33791465a3b238b56f82d749e3dbad8215f3b3bc
Reviewed-on: http://gerrit.cloudera.org:8080/23211
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-11 12:44:26 +00:00
jasonmfehr
19f662301c IMPALA-14214: [Addendum] - Ensure IMPALA_TOOLCHAIN_COMMIT_HASH Matches Build IDs
Adds verification code to ensure the IMPALA_TOOLCHAIN_COMMIT_HASH
environment variable matches the commit hash in the
IMPALA_TOOLCHAIN_BUILD_ID_AARCH64 and
IMPALA_TOOLCHAIN_BUILD_ID_X86_64 environment variables.

Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: I348698356a014413875f6b8b54a005bf89b9793a
Reviewed-on: http://gerrit.cloudera.org:8080/23243
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-05 06:14:28 +00:00
jasonmfehr
7b7e7709aa IMPALA-14214: Correct IMPALA_TOOLCHAIN_COMMIT_HASH
Fixes the default value of the IMPALA_TOOLCHAIN_COMMIT_HASH
environment variable to be the correct hash.

Change-Id: I98824f363334a15e4f91c0b3f51fa09a5d15c241
Reviewed-on: http://gerrit.cloudera.org:8080/23233
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
2025-08-04 01:22:23 +00:00
jasonmfehr
5bad0daf72 IMPALA-14214: Compile OpenTelemetry-cpp Against STDLIB
Consumes the new toolchain builds that compiled the OpenTelemetry-cpp
SDK libraries against the standard C++ library instead of the SDK's
nostd translation layer.

Change-Id: Icf06710d5f7987f43cb8bae5450b657f251f199b
Reviewed-on: http://gerrit.cloudera.org:8080/23192
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Jason Fehr <jfehr@cloudera.com>
2025-07-22 15:43:41 +00:00
jasonmfehr
fe1a78d16e IMPALA-13235: [Patch 3 of 5] - Consume OpenTelemetry C++ SDK
Adds the OpenTelemetry C++ SDK version 1.20.0 from the toolchain into
the cmake files for consumption during builds.

Testing was accomplished by building locally and in Jenkins.

Generated-by: Github Copilot (GPT-4.1)
Change-Id: Ib30123f79270e3f11233e28a2a34725e7d455f5e
Reviewed-on: http://gerrit.cloudera.org:8080/23101
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-07-11 23:39:23 +00:00
jfehr
3475e6b506 IMPALA-13235: Consume Latest Toolchain Builds
Switches to the toolchain builds that contain the
OpenTelemetry C++ SDK.

Change-Id: I9b844c27e5b732055a38613f03a1546b3d4491cc
Reviewed-on: http://gerrit.cloudera.org:8080/23046
Reviewed-by: gaurav singh <gsingh@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-06-25 15:42:29 +00:00
pranav.lodha
078677e67a IMPALA-14149: Update guava from 28.1-jre to 32.1.2-jre
Update guava from 28.1-jre to 32.1.2-jre due to the following CVEs:
CVE-2020-8908, CVE-2023-2976.

Change-Id: I4e8bb7c7963ae7c52a8f12fa8529122e662c5def
Reviewed-on: http://gerrit.cloudera.org:8080/23029
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-23 04:02:16 +00:00
pranav.lodha
a5b651660a IMPALA-14150: Update slf4j-api from 2.0.3 to 2.0.13
Updating slf4j to the latest version.

Change-Id: I55ec1414b6a0b0452f1baabe582cdadb465eaab5
Reviewed-on: http://gerrit.cloudera.org:8080/23030
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-18 03:04:11 +00:00
Riza Suminto
58245f3706 IMPALA-14143: Remove unshaded Hbase jars from AUX_CLASSPATH
HBase jars are added into AUX_CLASSPATH in impala-config.sh so that Hive
can write into HBase. Newer Hive versions already include the
hbase-shaded-mapreduce jar, so it is not necessary to add the
unshaded jars to AUX_CLASSPATH. Adding the unshaded jars can lead to
conflicts in downstream builds.

Testing:
- Ran and passed dataload.
- Passed custom_cluster/test_hbase_hms_column_order.py and
  query_test/test_hbase_queries.py.

Change-Id: I4caf37571a8bc2543bbc58071e5cb7046f216fa9
Reviewed-on: http://gerrit.cloudera.org:8080/23022
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-06-16 14:34:36 +00:00
Joe McDonnell
2560487700 IMPALA-13952: Update curl version to 8.14.1
This bumps the curl version to the latest (8.14.1) to resolve
some minor CVEs. See https://curl.se/docs/security.html

This also incorporates a newer toolchain with the fix for
IMPALA-14129, bumping the patch level on hadoop-client.

Testing:
 - Ran precommit

Change-Id: Ia488b381f0cd9f4e6d239d265a897be1ab96915e
Reviewed-on: http://gerrit.cloudera.org:8080/23013
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-06-13 17:19:01 +00:00
Joe McDonnell
935a5e2b8d IMPALA-14134: Switch to newer versions of zlib / cloudflare zlib
This moves from zlib 1.2.13 to zlib 1.3.1 and bumps cloudflare
zlib to a newer version. This does not require any update to the
toolchain, because these newer versions were already present.

Testing:
 - Ran a perf-AB-test with no major difference in performance

Change-Id: I09ec358ea49198485d53e85eae7d0b61beac3308
Reviewed-on: http://gerrit.cloudera.org:8080/22993
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2025-06-11 18:31:11 +00:00
Daniel Becker
067b25e526 IMPALA-14067: Bump glog version to 0.6.0 in Impala
Some minor changes were needed on the Impala side because of changes in
glog (for example some variables and function parameters were changed
from signed to unsigned integer types).

Testing:
 - passed exhaustive DEBUG tests
 - core ASAN tests

Change-Id: Ifbe341265fd7aa7be8fe304b9fda31b4470237cf
Reviewed-on: http://gerrit.cloudera.org:8080/22906
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-22 04:20:40 +00:00
Joe McDonnell
1157d6e10f IMPALA-13479: Patch gperftools to remove 1GB limit on thread caches
Upstream gperftools does not allow setting tcmalloc.max_total_thread_cache_bytes
to greater than 1GB. This moves to a new toolchain that has patched
gperftools to remove this limitation and allow setting
tcmalloc.max_total_thread_cache_bytes > 1GB. This also reads back the
value from tcmalloc and prints a warning if it doesn't match what we set.

Testing:
 - Set tcmalloc_max_total_thread_cache_bytes to 2GB and verified that
   the warning message doesn't appear. On unpatched versions of
   gperftools, the warning message does appear.

Change-Id: If78c8734c704090c12737a8c2a8456b73ea4b8e8
Reviewed-on: http://gerrit.cloudera.org:8080/22834
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-05-01 16:41:09 +00:00
Laszlo Gaal
e6078b4281 IMPALA-13825: Extend Docker container build to custom base images
Downstream system vendors, users and customers have lately expressed
interest in consuming Impala in containerized forms, taking advantage of
various specialized, hardened container base image offerings, like
container offerings based on the Wolfi project by Chainguard;
see: https://github.com/wolfi-dev.

This patch enables Impala container images to be built on top of custom
base images, and adds an implementation example that uses the publicly
available Wolfi base image.

Building a customized Docker image follows a hybrid approach. Instead of
replicating the complete Impala build process inside a Wolfi container
for a fully native binary build, it relies on an existing build platform
that is compatible with the binary packages available inside the custom
container image. For Wolfi the Impala binaries are supplied by the
Red Hat 9 build of Impala. This is made possible by the fact that major
library dependencies of Impala have the same versions on Wolfi OS and
Red Hat 9, so binaries built on Red Hat 9 can be run on Wolfi
with no changes.

The binaries produced by the regular build process are then installed
into a Docker image built on top of an explicitly specified custom base
image. The selection of a custom base image is controlled by two
environment variables:
- USE_CUSTOM_IMPALA_BASE_IMAGE (boolean):
  If set to 'true', triggers the use of the custom image.
  When set to 'false' or left unspecified, the Docker base image is
  selected by the existing logic of matching the build platform's
  operating system.
- IMPALA_CUSTOM_DOCKER_BASE (string): specifies the URI of the base image
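
The interplay of the two variables can be sketched in Python as below. This is not the code from the patch. The function name, the `platform_default_image` parameter (standing in for the existing OS-matching logic), and the error raised for a missing URI are assumptions.

```python
def select_docker_base_image(env, platform_default_image):
    """Pick the Docker base image according to the two variables above.

    `env` is a dict of environment variables. `platform_default_image` is a
    hypothetical stand-in for whatever the existing OS-matching logic returns.
    """
    flag = env.get("USE_CUSTOM_IMPALA_BASE_IMAGE", "false").strip().lower()
    if flag != "true":
        # 'false' or unspecified: keep the existing platform-based choice.
        return platform_default_image
    custom = env.get("IMPALA_CUSTOM_DOCKER_BASE", "").strip()
    if not custom:
        # Assumed behavior: a custom build without a base image URI is an error.
        raise ValueError(
            "USE_CUSTOM_IMPALA_BASE_IMAGE is true but "
            "IMPALA_CUSTOM_DOCKER_BASE is not set")
    return custom
```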

These environment variables can be overridden from the environment,
from impala-config-branch.sh, or impala-config-local.sh.
They are reported at the end of bin/impala-config.sh where important
environment variables are listed. They are also added to the list of
variables in bin/jenkins/dockerized-impala-preserve-vars.py to ensure
that they can be used in the context of Jenkins jobs as well.

The unified script that installs Impala's required dependencies into the
container image is extended for Wolfi to handle APK packages.

A new script is added to install Bash in the Docker image if it is
missing. Impala build scripts (including the scripts used during Docker
image builds) as well as container startup scripts require Bash,
but minimal container base images usually omit it, favoring a smaller
alternative.

To improve the debugging experience for a containerized Impala
minicluster, the minicluster starter script bin/start-impala-cluster.py
is extended with the following features:

- synchronizes every launched container's timezone to the host.
  This is needed for Iceberg time-travel tests, which create timestamped
  Iceberg metadata items in the impalad context inside a container, but
  check creation/modification times of the same items in the test scripts
  running on the host, outside the containers. The test scripts have
  the implicit expectation that the same local time is shared across
  all these contexts, but this is not necessarily true if the host
  where tests are running is set to a timezone other than UTC.

  Time synchronization is achieved by injecting the TZ environment
  variable into the container, holding the name of the timezone used
  on the host. The timezone name is taken either from the host's TZ
  variable (if set), or from the host's /etc/localtime symlink,
  checking the name of the timezone file it points to.
  In case /etc/localtime is not a symlink (and TZ is not set on the
  host), the host's /etc/localtime file is mounted into the container.

- sets up a directory for each container to collect the Java VM's error
  files (hs_err_pidNNNN.log) from the containers.

- adds the --mount_sources command line parameter, which mounts the
  complete $IMPALA_HOME subtree into the container at
  /opt/impala/sources to make source code available inside the container
  for easier debugging.
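
The timezone lookup order described in the first item above (TZ variable, then the /etc/localtime symlink, then mounting the file) can be sketched as follows. This is an illustration, not the code in bin/start-impala-cluster.py. The function name is hypothetical, and deriving the zone name from the path component after "zoneinfo/" is an assumption about how the symlink target is interpreted.

```python
import os


def timezone_docker_args(env, localtime_path="/etc/localtime"):
    """Return extra `docker run` arguments that propagate the host timezone."""
    tz = env.get("TZ")
    if tz:
        # 1. The host's TZ variable wins if it is set.
        return ["-e", "TZ=" + tz]
    if os.path.islink(localtime_path):
        # 2. Otherwise derive the zone name from the symlink target,
        #    e.g. .../zoneinfo/Europe/Budapest -> Europe/Budapest.
        target = os.path.realpath(localtime_path)
        marker = "zoneinfo/"
        idx = target.find(marker)
        if idx != -1:
            return ["-e", "TZ=" + target[idx + len(marker):]]
    # 3. Not a symlink (or, as an assumption, an unrecognized target):
    #    mount the host's file into the container instead.
    return ["-v", localtime_path + ":/etc/localtime"]
```

The returned list is meant to be appended to the container launch command. `-e` and `-v` are the standard `docker run` options for environment variables and bind mounts.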

Tested by running core-mode tests in the following environments:
- Regular run (impalad running natively on the platform) on Ubuntu 20.04
- Regular run on Rocky Linux 9.2
- Dockerised run (impalad instances running in their individual
  containers) using Ubuntu 20.04 containers
- Dockerised run (impalad instances running in their individual
  containers) using Rocky Linux 9.2 containers
- Dockerised run (impalad instances running in their individual
  containers) using Wolfi's wolfi-base containers

Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
Reviewed-on: http://gerrit.cloudera.org:8080/22583
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-28 13:40:38 +00:00
Peter Rozsa
1f70269392 IMPALA-13838: Update Impala version to 5.0.0-SNAPSHOT
Change-Id: I9c5a2d817b30e14333feeb5b2de3e0c40795723f
Reviewed-on: http://gerrit.cloudera.org:8080/22596
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-08 14:13:48 +00:00
Pranav Lodha
4c549d79f2 IMPALA-12992: Support for Hive JDBC Storage handler tables
This patch adds support for JDBC tables created by the Hive JDBC
Storage handler. This is done by making the JDBC table properties
compatible with Impala: the properties are translated when the table
is loaded, and the translated form is kept only within the Impala
cluster, i.e. it is not written back to HMS.

Impala includes JDBC drivers for PostgreSQL and MySQL
making 'driver.url' not mandatory in such cases. The
Impala JDBC driver is still required for Impala-to-Impala
JDBC connections. Additionally, Hive allows adding database
driver JARs at runtime via Beeline, enabling users to
dynamically include JDBC driver JARs. However, Impala does
not support adding database driver JARs at runtime,
making the driver.url field still useful
in cases where additional drivers are needed.

'hive.sql.query' property is not handled in this patch.
It'll be covered in a separate jira.

Testing: End-to-end tests are included in
test_ext_data_sources.py.

Change-Id: I1674b93a02f43df8c1a449cdc54053cc80d9c458
Reviewed-on: http://gerrit.cloudera.org:8080/22134
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-27 11:44:38 +00:00
Michael Smith
2506e849c6 IMPALA-13753: Support Hadoop 3.4
Add org.apache.kerby.kerb-simplekdc as a test dependency and update
upstream Hadoop dependency to 3.4.1.

Change-Id: I4fbce9f783ac1d07a27011d0bfd5f1af988203e0
Reviewed-on: http://gerrit.cloudera.org:8080/22473
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-24 09:43:21 +00:00
Michael Smith
88067c576b IMPALA-13740: Update velocity-engine-core to 2.4.1
Updates velocity-engine-core - required by pac4j - to 2.4.1 to avoid
including a shaded version of commons-io vulnerable to CVE-2024-47554.

Change-Id: I76624851d6f51d1b9d4dd61fc488932a51e9cba0
Reviewed-on: http://gerrit.cloudera.org:8080/22454
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
2025-02-06 16:31:39 +00:00
Laszlo Gaal
5f4321373a IMPALA-13662: Bump the ARM toolchain to support ARM builds for RHEL 9
Pick up a new binary build of the current toolchain version for ARM.

The toolchain version is identical, the only difference is that the new
build added binaries for Rocky/RHEL 9 to the already supported OS
versions, reaching the same level of Impala build support as
Rocky/RHEL 8.

Tested by building Impala for RHEL 9 for both Intel and ARM on private
infrastructure.

Change-Id: I5fd2e8c3187cb7829de55d6739cf5d68a09a2ed3
Reviewed-on: http://gerrit.cloudera.org:8080/22323
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-10 22:49:31 +00:00
Michael Smith
740ee28eb1 IMPALA-13618: Move to commons-lang3
Updates from commons-lang (2.6) to commons-lang3.

Switches getFullStackTrace to getStackTrace. getFullStackTrace is not
present in lang3, and https://issues.apache.org/jira/browse/LANG-904
suggests that getFullStackTrace existed for handling chained exceptions
in older Java runtimes.

Change-Id: Ie16af2692858f6a571cc1e5b85ecba3806da8d7e
Reviewed-on: http://gerrit.cloudera.org:8080/22228
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-09 07:36:39 +00:00
Michael Smith
30ffc2f493 IMPALA-13619: Update commons-lang3 to 3.17.0
Updates commons-lang3 - used by Thrift and Orc - to 3.17.0, and
provides the IMPALA_COMMONS_LANG3_VERSION environment variable to
override the version.

Change-Id: I4005f8aef1cf66a32840cd0b510cd7faf597f5f2
Reviewed-on: http://gerrit.cloudera.org:8080/22227
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2024-12-18 18:26:13 +00:00
Joe McDonnell
aefd1b0920 IMPALA-13551: Produce the shell tarball by pip installing impala-shell
Currently, the shell tarball maintains its own packaging code
and directory layout. This is very complicated and currently has
several Python packages directly checked into our repository.

To simplify it, this changes the shell tarball to be based on
pip installing the pypi package. Specifically, the new directory
structure for an unpacked shell tarball is:
impala-shell-4.5.0-SNAPSHOT/
  impala-shell
  install_py${PYTHON_VERSION}/
  install_py${ANOTHER_PYTHON_VERSION}/
For example, install_py2.7 is the Python 2.7 pip install of impala-shell.
install_py3.8 is a Python 3.8 pip install of impala-shell. This means
that the impala-shell script simply picks the install for the
specified version of python and uses that pip install directory.
To make this more consistent across different Linux distributions, this
upgrades pip in the virtualenv to the latest.
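
The directory selection described above can be sketched in Python as below. The real impala-shell launcher is a separate script, and this is not its code. The function name and the error raised for a missing directory are assumptions. Only the install_py${PYTHON_VERSION} naming comes from the layout shown above.

```python
import os
import sys


def pip_install_dir(tarball_root, version_info=None):
    """Return the install_py<major>.<minor> directory for a Python version.

    `tarball_root` is the unpacked tarball directory. `version_info` defaults
    to the running interpreter's version.
    """
    if version_info is None:
        version_info = sys.version_info
    name = "install_py{0}.{1}".format(version_info[0], version_info[1])
    path = os.path.join(tarball_root, name)
    if not os.path.isdir(path):
        # Assumed behavior: fail clearly if no install exists for this Python.
        raise RuntimeError("no pip install found for this Python: " + name)
    return path
```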

With this, ext-py and pkg_resources.py can be removed.

This requires rearranging the shell build code. Specifically, this splits
out the code that generates impala_build_version.py so that it can run
before generating the pypi package. The shell tarball now has a dependency
on the pypi package and must run after it.

This builds on Michael Smith's work from IMPALA-11399.

Testing:
 - Ran shell tests locally
 - Built on Centos 7, Redhat 8 & 9, Ubuntu 20 & 22, SLES 15

Change-Id: Ifbb66ab2c5bc7180221f98d9bf5e38d62f4ac036
Reviewed-on: http://gerrit.cloudera.org:8080/20171
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-12-17 22:52:01 +00:00
Joe McDonnell
8d5adfd0ba IMPALA-13123: Add option to run tests with Python 3
This introduces the IMPALA_USE_PYTHON3_TESTS environment variable
to select whether to run tests using the toolchain Python 3.
This is an experimental option, so it defaults to false,
continuing to run tests with Python 2.

This fixes a first batch of Python 2 vs 3 issues:
 - Deciding whether to open a file in bytes mode or text mode
 - Adapting to APIs that operate on bytes in Python 3 (e.g. codecs)
 - Eliminating 'basestring' and 'unicode' locations in tests/ by using
   the recommendations from future
   ( https://python-future.org/compatible_idioms.html#basestring and
     https://python-future.org/compatible_idioms.html#unicode )
 - Uses impala-python3 for bin/start-impala-cluster.py
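
The first three items can be illustrated with the short sketch below, which runs on both Python 2 and Python 3. It is not taken from the patch. It uses a plain try/except fallback rather than the python-future imports the patch follows, and all function names are hypothetical.

```python
import codecs

# 'basestring' exists only on Python 2, so fall back to 'str' on Python 3.
try:
    string_types = (basestring,)  # noqa: F821 (Python 2 only)
except NameError:
    string_types = (str,)


def is_string(value):
    """Portable replacement for isinstance(value, basestring)."""
    return isinstance(value, string_types)


def read_text(path, encoding="utf-8"):
    """Open in bytes mode and decode explicitly, so both versions agree."""
    with open(path, "rb") as f:
        return f.read().decode(encoding)


def to_hex(data):
    """codecs.encode() returns bytes on Python 3, so decode to get text."""
    return codecs.encode(data, "hex").decode("ascii")
```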

All fixes leave the Python 2 path working normally.

Testing:
 - Ran an exhaustive run with Python 2 to verify nothing broke
 - Verified that the new environment variable works and that
   it uses Python 3 from the toolchain when specified

Change-Id: I177d9b8eae9b99ba536ca5c598b07208c3887f8c
Reviewed-on: http://gerrit.cloudera.org:8080/21474
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-12-17 07:28:51 +00:00