This patch mainly implements querying of paimon data tables
through a JNI-based scanner.
Features implemented:
- Support for column pruning.
Partition pruning and predicate pushdown will be submitted
as the third part of this patch series.
We implemented this by treating the paimon table as a normal
unpartitioned table. When querying a paimon table:
- PaimonScanNode decides which paimon splits need to be scanned,
and then transfers the splits to the BE for the JNI-based scan
operation.
- We also collect the required columns that need to be scanned,
and pass them to the scanner for column pruning. This is
implemented by passing the field ids of the columns to the BE,
instead of column positions, in order to support schema evolution.
- In the original implementation, PaimonJniScanner passed paimon
row objects directly to the BE and called the corresponding paimon
row field accessors, i.e. Java methods that convert row fields
into impala row batch tuples. We found this to be slow due to the
overhead of JVM method calls.
To minimize that overhead, we reworked the implementation:
PaimonJniScanner now converts the paimon row batches to Arrow
record batches, which store data in the off-heap region of the
impala JVM, and passes the memory pointer of the off-heap Arrow
record batch to the BE.
The BE PaimonJniScanNode reads data directly from the JVM off-heap
region and converts the Arrow record batch into an impala row batch.
Benchmarks show the new implementation is about 2x faster
than the original one.
The lifecycle of an Arrow record batch is as follows:
the batch is generated in the FE and passed to the BE.
Once the record batch has been imported into the BE successfully,
the BE is in charge of freeing it.
There are two free paths: the normal path and the
exceptional path. On the normal path, when the Arrow batch
has been fully consumed, the BE calls into the JVM via JNI to fetch
the next Arrow batch; in this case, the previous batch is freed
automatically. The exceptional path is taken when the query is
cancelled or a memory allocation fails. In these corner cases, the
Arrow batch is freed in the close() method if it has not been fully
consumed by the BE.
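As a rough illustration of this hand-off, the following minimal C++
sketch (hypothetical names, not Impala's actual code) shows how the
BE side can import the off-heap batch exported by the JVM through the
Arrow C Data Interface:

  #include <cstdint>
  #include <memory>
  #include "arrow/c/bridge.h"       // arrow::ImportRecordBatch
  #include "arrow/record_batch.h"
  #include "arrow/result.h"

  // Hypothetical sketch: the JNI scanner returns the addresses of the
  // ArrowArray/ArrowSchema structs the JVM side filled in off-heap.
  arrow::Result<std::shared_ptr<arrow::RecordBatch>> ImportJvmBatch(
      int64_t array_addr, int64_t schema_addr) {
    auto* c_array = reinterpret_cast<struct ArrowArray*>(array_addr);
    auto* c_schema = reinterpret_cast<struct ArrowSchema*>(schema_addr);
    // ImportRecordBatch takes over the structs' release callbacks, so
    // the off-heap memory is freed when the returned batch is destroyed
    // -- matching the "normal path" above. The exceptional path must
    // release any batch that was imported but not fully consumed.
    return arrow::ImportRecordBatch(c_array, c_schema);
  }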
Currently supported impala data types for queries include:
- BOOLEAN
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- STRING
- DECIMAL(P,S)
- TIMESTAMP
- CHAR(N)
- VARCHAR(N)
- BINARY
- DATE
TODO:
- Patches pending submission:
- Support tpcds/tpch data-loading
for paimon data table.
- Virtual Column query support for querying
paimon data table.
- Query support with time travel.
- Query support for paimon meta tables.
- WIP:
- Snapshot incremental read.
- Complex type query support.
- Native paimon table scanner, instead of the JNI-based one.
Testing:
- Create test tables in functional_schema_template.sql
- Add TestPaimonScannerWithLimit in test_scanners.py
- Add test_paimon_query in test_paimon.py.
- Already passed the tpcds/tpch tests for paimon tables. The test
table data is currently generated by spark, because generating it
is not yet supported by impala, and hive doesn't support generating
paimon tables with dynamic partitioning. We plan to submit a
separate patch for tpcds/tpch data loading and the associated
tpcds/tpch query tests.
- JVM off-heap memory leak tests: ran looped tpch tests for
1 day; no obvious off-heap memory increase was observed, and
off-heap memory usage stayed within 10MB.
Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Reviewed-on: http://gerrit.cloudera.org:8080/23613
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
When the environment variable USE_APACHE_HIVE is set to true, Impala
is built against Apache Hive 3.x. To better distinguish this from
Apache Hive 2.x later, rename USE_APACHE_HIVE to USE_APACHE_HIVE_3.
Additionally, to facilitate referencing different versions of the Hive
MetastoreShim, the major version of Hive has been added to the environment
variable IMPALA_HIVE_DIST_TYPE.
Change-Id: I11b5fe1604b6fc34469fb357c98784b7ad88574d
Reviewed-on: http://gerrit.cloudera.org:8080/21724
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Redhat 9 environments recently switched to OpenSSL 3.5.1. On those
machines, the Kudu minicluster fails to start up with CSR signature
verification error. KUDU-3716 fixed this issue.
This patch updates the toolchain and Kudu versions to pick up KUDU-3716.
Testing:
Passed data loading in Redhat 9.
Change-Id: I7262267939a9f08650af85443240950afbb3323f
Reviewed-on: http://gerrit.cloudera.org:8080/23697
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch updates CDP_BUILD_NUMBER to 71942734 in order to
upgrade Iceberg to 1.5.2.
This patch updates some tests so they pass with Iceberg 1.5.2. The
behavior changes of Iceberg 1.5.2 are (compared to 1.3.1):
* Iceberg V2 tables are created by default
* Metadata tables have different schema
* Parquet compression is explicitly set for new tables (even for ORC
tables)
* Sequence numbers are assigned a bit differently
Updated the tests where needed.
Code changes to accommodate the above behavior changes:
* SHOW CREATE TABLE adds 'format-version'='1' for Iceberg V1 tables
* CREATE TABLE statements don't throw errors when Parquet compression
is set for ORC tables
Change-Id: Ic4f9ed3f7ee9f686044023be938d6b1d18c8842e
Reviewed-on: http://gerrit.cloudera.org:8080/23670
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit bumps the Impala toolchain to pick up the latest Kudu
version, up to commit 60f5e5267b92c39485a66121d3ce3cc7ef57b0e0
(KUDU-1261: make ArrayCellMetadataView::Init() more robust).
Change-Id: I68009e5fefd053882f5504cd2520bacb189a1b04
Reviewed-on: http://gerrit.cloudera.org:8080/23631
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
Since the toolchain was bumped to pick up Kudu's array column
feature (KUDU-1261), Impala's TSAN builds on the master branch
consistently break during dataload with a data race detected by TSAN.
The source of the data race lies within libkudu_client.so, and it
only triggers if the Impala build machine has both IPv4 and IPv6
addresses associated with localhost. Until the exact root cause is
found and fixed, this patch works around the TSAN issue by pinning
the KUDU_MASTER_HOSTS env var to 127.0.0.1.
Testing:
Run TSAN build and confirm no data race error is emitted.
Change-Id: I511ab625d18c6007567083557fcdf98980a6ac6f
Reviewed-on: http://gerrit.cloudera.org:8080/23507
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
Minidump stack resolution does not work on Redhat8 ARM64.
Redhat8 ARM64 uses 64KB pages, and the Breakpad library does
not properly handle collecting stacks for that configuration.
Breakpad rounds off the stack pointer to the nearest page
boundary below the stack pointer, then collects up to 32KB of
stack memory. With a top-down stack, this means it is collecting
some memory that is not used by the stack. With 64KB pages,
the memory it collects usually doesn't contain any stack contents.
This picks up a toolchain with Breakpad patched to fix this. The
patch stops rounding the stack pointer to the nearest page.
Instead, it adjusts the stack pointer to account for the red
zone (128 bytes on x86_64) and then rounds to the nearest 1KB
boundary below the stack pointer.
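To make the rounding arithmetic concrete, here is an illustrative
C++ sketch (not Breakpad's actual code) of the adjusted capture base:

  #include <cstdint>

  constexpr uintptr_t kRedZoneBytes = 128;       // x86_64 red zone
  constexpr uintptr_t kRoundingBoundary = 1024;  // 1KB, not the page size

  // Step below the red zone, then round down to the nearest 1KB
  // boundary, so only a bounded amount of non-stack memory is captured
  // even on systems with 64KB pages.
  uintptr_t StackCaptureBase(uintptr_t stack_pointer) {
    return (stack_pointer - kRedZoneBytes) & ~(kRoundingBoundary - 1);
  }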
Testing:
- Produced and resolved minidumps on multiple build types for
x86_64 and ARM64 (release, debug, asan, ubsan)
Change-Id: I4fbd91abfbddfd8355d27ae9d9b86b70a9ce0409
Reviewed-on: http://gerrit.cloudera.org:8080/23465
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Cleans up repetitive patterns in pom.xml.
Centralize plugin configuration in pluginManagement. Replace inline
maven-compiler-plugin configuration with newer maven.compiler.release
and update to latest plugin version.
Centralize common dependencies in dependencyManagement, including
exclusions when appropriate. Remove exclusions that are no longer
relevant.
Compared before and after with dependency:tree; only difference is that
commons-cli now comes from hadoop and jersey-serv{let,er} are
effectively excluded; all versions matched. Also ensured
USE_APACHE_COMPONENTS=true compiles.
Adds com.amazonaws:aws-java-sdk-bundle to exclusion checking to ensure
it's not accidentally included alongside impala-minimal-s3a-aws-sdk.
Removes missed io.netty exclusion from IMPALA-12816.
Updates commons-dbcp2 to 2.12.0 to match Hive.
Change-Id: If96649840e23036b4a73ee23e8d12516497994f0
Reviewed-on: http://gerrit.cloudera.org:8080/23432
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Downstream error reports pointed out that the toolchain version picked
up for IMPALA-14139 contains toolchain binaries for Red Hat 9 (and
compatibles) that require at least the 9.5 minor version because of
OpenSSL library requirements. This was caused by the toolchain binary
build process not using package repo pinning for the redhat9 build
container definition, which caused the container process to install
"latest" packages, in this case packages released in Rocky / Red Hat
9.5.
This patch bumps the toolchain ID to a version in which the redhat9
binaries were produced in a build container "moved back in time" to the
9.2 release by pinning the package repos to the Rocky Linux 9.2 state,
using the Rocky Vault.
The patch also picks up a buffer overflow mitigation for the ORC
library.
Change-Id: I5c6921afdc69a4a6644b619de6b8d4e4cc69e601
Reviewed-on: http://gerrit.cloudera.org:8080/23448
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Removes IMPALA_JAVA_HOME_OVERRIDE and updates version selection. In
order of priority:
1. If IMPALA_JDK_VERSION is set, use the OS JDK version from a known
location. This is primarily used when also installing the JDK as part
of automated builds.
2. If JAVA_HOME is set, use it.
3. Look for the system default JDK.
The IMPALA_JDK_VERSION variable is no longer modified to avoid issues
when sourcing impala-config.sh multiple times. JAVA_HOME will be
modified if IMPALA_JDK_VERSION is set; both must be unset to restore
using the system default Java.
If switching between JDKs, now prefer setting JAVA_HOME. If relying on
system Java, unset JAVA_HOME after e.g. update-java-alternatives.
The detected Java version is set in IMPALA_JAVA_TARGET, which is used to
add Java 9+ options and configure the Java compilation target.
Eliminates IMPALA_JDK_VERSION_NUM as its value was always identical to
IMPALA_JAVA_TARGET.
Stops printing from impala-config-java.sh. It made the output from
impala-config.sh look strange, and the decisions can all be clearly
determined from impala-config.sh printed variables later or the packages
installed in bootstrap_system.sh.
Fixes JAVA_HOME in bootstrap_build.sh on ARM64 systems.
Change-Id: I68435ca69522f8310221a0f3050f13d86568b9da
Reviewed-on: http://gerrit.cloudera.org:8080/23434
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Update the following elements of the Impala build environment to enable
builds on Ubuntu 24.04:
- Recognize and handle (where necessary) Ubuntu 24.04 in various
bootstrap scripts (bootstrap_system.sh, bootstrap_toolchain.py, etc.)
- Bump IMPALA_TOOLCHAIN_ID to an official toolchain build that contains
Ubuntu 24.04-specific binary packages
- Bump binutils to 2.42, and
- Bump the GDB version to 12.1-p1, as required by the new toolchain
version
- Update unique_ptr usage syntax in be/src/util/webserver-test.cc to
compensate for new GLIBC function prototypes:
System headers in Ubuntu 24.04 adopted attributes on several widely
used function prototypes. Such attributes are not considered to be part
of the function's signature during template evaluation, so GCC throws a
warning when such a function is passed as a template argument, which
breaks the build, as warnings are treated as errors.
webserver-test.cc uses pclose() as the deleter for a unique_ptr in a
utility function. This patch encapsulates pclose() and its attributes in
an explicit specialization for std::default_delete<>, "hiding" the
attributes inside a functor.
The particular solution was inspired by Anton-V-K's proposal in
https://gist.github.com/t-mat/5849549
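A minimal sketch of that pattern (simplified from the gist, not the
exact Impala code):

  #include <cstdio>
  #include <memory>

  namespace std {
  // Explicit specialization that hides pclose() -- and the attributes
  // on its prototype -- inside a functor, so no attributed function
  // type is ever used as a template argument.
  template <>
  struct default_delete<FILE> {
    void operator()(FILE* f) const {
      if (f != nullptr) pclose(f);
    }
  };
  }  // namespace std

  std::unique_ptr<FILE> RunCommand(const char* cmd) {
    // The deleter defaults to the specialization above; no function
    // pointer appears in the unique_ptr's type.
    return std::unique_ptr<FILE>(popen(cmd, "r"));
  }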
This commit builds on an earlier patch for the same purpose by Michael
Smith: https://gerrit.cloudera.org/c/23058/
Change-Id: Ia4454b0c359dbf579e6ba2f9f9c44cfa3f1de0d2
Reviewed-on: http://gerrit.cloudera.org:8080/23384
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
This patch mainly implements the creation/dropping of paimon tables
through impala.
Supported impala data types:
- BOOLEAN
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- STRING
- DECIMAL(P,S)
- TIMESTAMP
- CHAR(N)
- VARCHAR(N)
- BINARY
- DATE
Syntax for creating paimon table:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(
[col_name data_type ,...]
[PRIMARY KEY (col1,col2)]
)
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
STORED AS PAIMON
[LOCATION 'hdfs_path']
[TBLPROPERTIES (
'primary-key'='col1,col2',
'file.format' = 'orc/parquet',
'bucket' = '2',
'bucket-key' = 'col3'
)];
Two types of paimon catalogs are supported.
(1) Create table with hive catalog:
CREATE TABLE paimon_hive_cat(userid INT,movieId INT)
STORED AS PAIMON;
(2) Create table with hadoop catalog:
CREATE [EXTERNAL] TABLE paimon_hadoop_cat
STORED AS PAIMON
TBLPROPERTIES('paimon.catalog'='hadoop',
'paimon.catalog_location'='/path/to/paimon_hadoop_catalog',
'paimon.table_identifier'='paimondb.paimontable');
SHOW TABLE STATS/SHOW COLUMN STATS/SHOW PARTITIONS/SHOW FILES
statements are also supported.
TODO:
- Patches pending submission:
- Query support for paimon data files.
- Partition pruning and predicate push down.
- Query support with time travel.
- Query support for paimon meta tables.
- WIP:
- Complex type query support.
- Virtual Column query support for querying
paimon data table.
- Native paimon table scanner, instead of the JNI-based one.
Testing:
- Add unit test for paimon impala type conversion.
- Add unit test for ToSqlTest.java.
- Add unit test for AnalyzeDDLTest.java.
- Update default_file_format TestEnumCase in
be/src/service/query-options-test.cc.
- Update test case in
testdata/workloads/functional-query/queries/QueryTest/set.test.
- Add test cases in metadata/test_show_create_table.py.
- Add custom test test_paimon.py.
Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef
Reviewed-on: http://gerrit.cloudera.org:8080/22914
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Running exhaustive tests with the env var IMPALA_USE_PYTHON3_TESTS=true
revealed some tests that require adjustment. This patch makes those
adjustments, which mostly revolve around encoding differences and the
string vs. bytes types in Python3. This patch also switches the default
to running pytest with Python3 by setting IMPALA_USE_PYTHON3_TESTS=true.
The details follow:
Change the hash() function in conftest.py to crc32() to produce a
deterministic hash. Hash randomization is enabled by default since
Python 3.3 (see
https://docs.python.org/3/reference/datamodel.html#object.__hash__).
This causes test sharding (like --shard_tests=1/2) to produce an
inconsistent set of tests per shard. Always restart the minicluster
during custom cluster tests if the --shard_tests argument is set,
because test order may change and affect test correctness, depending
on whether the tests run on a fresh minicluster or not.
Moved one test case from delimited-latin-text.test to
test_delimited_text.py for easier binary comparison.
Add bytes_to_str() as a utility function to decode bytes in Python3.
This is often needed when inspecting the return value of
subprocess.check_output() as a string.
Implement DataTypeMetaclass.__lt__ to substitute
DataTypeMetaclass.__cmp__, which is ignored in Python3 (see
https://peps.python.org/pep-0207/).
Fix WEB_CERT_ERR difference in test_ipv6.py.
Fix trivial integer parsing in test_restart_services.py.
Fix various encoding issues in test_saml2_sso.py,
test_shell_commandline.py, and test_shell_interactive.py.
Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1.
Switch to binary comparison in test_iceberg.py where needed.
Specify text mode when calling tempfile.NamedTemporaryFile().
Simplify create_impala_shell_executable_dimension to skip testing the
dev and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. The
reason is that several UTF-8 related tests in test_shell_commandline.py
break in the Python3 pytest + Python2 impala-shell combo. This skipping
already happens automatically on build OSes without system Python2
available, like RHEL9 (the IMPALA_SYSTEM_PYTHON2 env var is empty).
Removed unused vector argument and fixed some trivial flake8 issues.
Several tests' logic required modification due to intermittent issues
in Python3 pytest. These include:
Add _run_query_with_client() in test_ranger.py to allow reusing a single
Impala client for running several queries. Ensure clients are closed
when the test is done. Mark several tests in test_ranger.py with
SkipIfFS.hive because they run queries through beeline + HiveServer2,
but the Ozone and S3 build environments do not start HiveServer2 by
default.
Increase the sleep period from 0.1 to 0.5 seconds per iteration in
test_statestore.py and mark TestStatestore to execute serially. This is
because TServer appears to shut down more slowly when run concurrently
with other tests. Handle the deprecation of Thread.setDaemon() as well.
Always set force_restart=True for each test method in TestLoggingCore,
TestShellInteractiveReconnect, and TestQueryRetries to prevent them from
reusing the minicluster from a previous test method. Some of these tests
destroy the minicluster (kill impalad) and will produce minidumps if the
metrics verifier for the next tests fails to detect a healthy
minicluster state.
Testing:
Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true.
Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4
Reviewed-on: http://gerrit.cloudera.org:8080/23319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Consume ALTER_PARTITIONS events from metastore.
HIVE-27746 introduced the ALTER_PARTITIONS event type, which is an
optimization that reduces bulk ALTER_PARTITION events into a single
event. The components version is updated to pick up this change. It is
a good optimization to include in Impala, as the number of events
consumed by the event processor is significantly reduced, helping the
event processor catch up with events quickly.
This patch enables the ability to consume the ALTER_PARTITIONS event.
The downside of this patch is that there is no before_partitions object
in the event message, which can cause partitions to be refreshed even on
trivial changes to them. HIVE-29141 will address this concern.
Testing:
- Added an end-to-end test to verify consuming the ALTER_PARTITIONS
event. Also, bigger timeouts were added in this test, as flakiness
was observed while looping the test several times.
Change-Id: I009a87ef5e2c331272f9e2d7a6342cc860e64737
Reviewed-on: http://gerrit.cloudera.org:8080/22554
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Adds representation of Impala select queries using OpenTelemetry
traces.
Each Impala query is represented as its own individual OpenTelemetry
trace. The one exception is retried queries which will have an
individual trace for each attempt. These traces consist of a root span
and several child spans. Each child span has the root as its parent.
No child span has another child span as its parent. Each child span
represents one high-level query lifecycle stage. Each child span also
has span attributes that further describe the state of the query.
Child spans:
1. Init
2. Submitted
3. Planning
4. Admission Control
5. Query Execution
6. Close
Each child span contains a mix of universal attributes (available on
all spans) and query phase specific attributes. For example, the
"ErrorMsg" attribute, present on all child spans, is the error
message (if any) at the end of that particular query phase. One
example of a child span specific attribute is "QueryType" on the
Planning span. Since query type is first determined during query
planning, the "QueryType" attribute is present on the Planning span
and has a value of "QUERY" (since only selects are supported).
Since queries can run for lengthy periods of time, the Init span
communicates the beginning of a query along with global query
attributes. For example, span attributes include query id, session
id, sql, user, etc.
Once the query has closed, the root span is closed.
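The overall trace shape can be sketched with the public
opentelemetry-cpp API as follows (span and attribute names mirror the
description above, not necessarily Impala's actual code):

  #include <string_view>
  #include "opentelemetry/trace/provider.h"

  namespace trace = opentelemetry::trace;

  void EmitQueryLifecycleTrace() {
    auto tracer = trace::Provider::GetTracerProvider()->GetTracer("impala");
    // The root span represents the whole query and is closed last.
    auto root = tracer->StartSpan("select-query");
    trace::StartSpanOptions options;
    options.parent = root->GetContext();
    // Each lifecycle stage is a direct child of the root span; no child
    // has another child as its parent.
    for (const char* stage : {"Init", "Submitted", "Planning",
                              "Admission Control", "Query Execution",
                              "Close"}) {
      auto child = tracer->StartSpan(stage, options);
      child->SetAttribute("ErrorMsg", "");          // universal attribute
      if (std::string_view(stage) == "Planning") {
        child->SetAttribute("QueryType", "QUERY");  // phase-specific
      }
      child->End();
    }
    root->End();
  }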
Testing accomplished with new custom cluster tests.
Generated-by: Github Copilot (GPT-4.1, Claude Sonnet 3.7)
Change-Id: Ie40b5cd33274df13f3005bf7a704299ebfff8a5b
Reviewed-on: http://gerrit.cloudera.org:8080/22924
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch, USE_APACHE_COMPONENTS overwrote all USE_APACHE_*
variables, but we should support using specific Apache components.
After this patch, if USE_APACHE_COMPONENTS is not false, the USE_APACHE_
{HADOOP,HBASE,HIVE,TEZ,RANGER} variables are set to true. Otherwise,
the individual values of USE_APACHE_{HADOOP,HBASE,HIVE,TEZ,RANGER}
are used.
Test:
- Built and ran a test cluster with setting USE_APACHE_HIVE=true
and USE_APACHE_COMPONENTS=false.
Change-Id: I33791465a3b238b56f82d749e3dbad8215f3b3bc
Reviewed-on: http://gerrit.cloudera.org:8080/23211
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds verification code to ensure the IMPALA_TOOLCHAIN_COMMIT_HASH
environment variable matches the commit hash in the
IMPALA_TOOLCHAIN_BUILD_ID_AARCH64 and
IMPALA_TOOLCHAIN_BUILD_ID_X86_64 environment variables.
Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: I348698356a014413875f6b8b54a005bf89b9793a
Reviewed-on: http://gerrit.cloudera.org:8080/23243
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Consumes the new toolchain builds that compiled the OpenTelemetry-cpp
SDK libraries against the standard C++ library instead of the SDK's
nostd translation layer.
Change-Id: Icf06710d5f7987f43cb8bae5450b657f251f199b
Reviewed-on: http://gerrit.cloudera.org:8080/23192
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Jason Fehr <jfehr@cloudera.com>
Adds the OpenTelemetry C++ SDK version 1.20.0 from the toolchain into
the cmake files for consumption during builds.
Testing was accomplished by building locally and in Jenkins.
Generated-by: Github Copilot (GPT-4.1)
Change-Id: Ib30123f79270e3f11233e28a2a34725e7d455f5e
Reviewed-on: http://gerrit.cloudera.org:8080/23101
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HBase jars are added to AUX_CLASSPATH in impala-config.sh so that Hive
can write into HBase. Newer Hive versions already include the
hbase-shaded-mapreduce jar, so it is not necessary to add the unshaded
jars to AUX_CLASSPATH; doing so can lead to conflicts in downstream
builds.
Testing:
- Run and pass dataload.
- Pass custom_cluster/test_hbase_hms_column_order.py and
query_test/test_hbase_queries.py.
Change-Id: I4caf37571a8bc2543bbc58071e5cb7046f216fa9
Reviewed-on: http://gerrit.cloudera.org:8080/23022
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
This moves from zlib 1.2.13 to zlib 1.3.1 and bumps cloudflare
zlib to a newer version. This does not require any update to the
toolchain, because these newer versions were already present.
Testing:
- Ran a perf-AB-test with no major difference in performance
Change-Id: I09ec358ea49198485d53e85eae7d0b61beac3308
Reviewed-on: http://gerrit.cloudera.org:8080/22993
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Some minor changes were needed on the Impala side because of changes in
glog (for example some variables and function parameters were changed
from signed to unsigned integer types).
Testing:
- passed exhaustive DEBUG tests
- core ASAN tests
Change-Id: Ifbe341265fd7aa7be8fe304b9fda31b4470237cf
Reviewed-on: http://gerrit.cloudera.org:8080/22906
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Upstream gperftools does not allow setting tcmalloc.max_total_thread_cache_bytes
to greater than 1GB. This moves to a new toolchain that has patched
gperftools to remove this limitation and allow setting
tcmalloc.max_total_thread_cache_bytes > 1GB. This also reads back the
value from tcmalloc and prints a warning if it doesn't match what we set.
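A sketch of this set-then-verify pattern against the gperftools
MallocExtension API (illustrative, not Impala's exact code):

  #include <cstddef>
  #include <iostream>
  #include <gperftools/malloc_extension.h>

  void SetAndVerifyThreadCacheBytes(size_t requested) {
    MallocExtension* ext = MallocExtension::instance();
    ext->SetNumericProperty("tcmalloc.max_total_thread_cache_bytes",
                            requested);
    size_t actual = 0;
    ext->GetNumericProperty("tcmalloc.max_total_thread_cache_bytes",
                            &actual);
    // An unpatched gperftools silently clamps values above 1GB, so the
    // read-back value will not match the requested one.
    if (actual != requested) {
      std::cerr << "Warning: requested " << requested
                << " bytes, tcmalloc reports " << actual << std::endl;
    }
  }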
Testing:
- Set tcmalloc_max_total_thread_cache_bytes to 2GB and verified that
the warning message doesn't appear. On unpatched versions of
gperftools, the warning message does appear.
Change-Id: If78c8734c704090c12737a8c2a8456b73ea4b8e8
Reviewed-on: http://gerrit.cloudera.org:8080/22834
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Downstream system vendors, users and customers have lately expressed
interest in consuming Impala in containerized forms, taking advantage of
various specialized, hardened container base image offerings, like
container offerings based on the Wolfi project by Chainguard;
see: https://github.com/wolfi-dev.
This patch enables Impala container images to be built on top of custom
base images, and adds an implementation example that uses the publicly
available Wolfi base image.
Building a customized Docker image follows a hybrid approach. Instead of
replicating the complete Impala build process inside a Wolfi container
for a fully native binary build, it relies on an existing build platform
that is compatible with the binary packages available inside the custom
container image. For Wolfi the Impala binaries are supplied by the
Red Hat 9 build of Impala. This is made possible by the fact that major
library dependencies of Impala have the same versions on Wolfi OS and
Red Hat 9, so binaries built on Red Hat 9 can be run on Wolfi
with no changes.
The binaries produced by the regular build process are then installed
into a Docker image built on top of an explicitly specified custom base
image. The selection of a custom base image is controlled by two
environment variables:
- USE_CUSTOM_IMPALA_BASE_IMAGE (boolean):
If set to 'true', triggers the use of the custom image.
When set to 'false' or left unspecified, the Docker base image is
selected by the existing logic of matching the build platform's
operating system.
- IMPALA_CUSTOM_DOCKER_BASE (string): specifies the URI of the base image
These environment variables can be overridden from the environment,
from impala-config-branch.sh, or impala-config-local.sh.
They are reported at the end of bin/impala-config.sh where important
environment variables are listed. They are also added to the list of
variables in bin/jenkins/dockerized-impala-preserve-vars.py to ensure
that they can be used in the context of Jenkins jobs as well.
The unified script that installs Impala's required dependencies into the
container image is extended for Wolfi to handle APK packages.
A new script is added to install Bash in the Docker image if it is
missing. Impala build scripts (including the scripts used during Docker
image builds) as well as container startup scripts require Bash,
but minimal container base images usually omit it, favoring a smaller
alternative.
To improve the debugging experience for a containerized Impala
minicluster, the minicluster starter script bin/start-impala-cluster.py
is extended with the following features:
- synchronizes every launched container's timezone to the host's.
This is needed for the Iceberg time-travel tests, which create
timestamped Iceberg metadata items in the impalad context inside a
container, but check creation/modification times of the same items in
the test scripts running on the host, outside the containers. The test
scripts implicitly expect that the same local time is shared across
all these contexts, which is not necessarily true if the host where
the tests are running is set to a timezone other than UTC.
Time synchronization is achieved by injecting the TZ environment
variable into the container, holding the name of the timezone used
on the host. The timezone name is taken either from the host's TZ
variable (if set), or from the host's /etc/localtime symlink,
checking the name of the timezone file it points to.
In case /etc/localtime is not a symlink (and TZ is not set on the
host), the host's /etc/localtime file is mounted into the container.
- sets up a directory for each container to collect the Java VM error
files (hs_err_pidNNNN.log) from the containers.
- adds the --mount_sources command line parameter, which mounts the
complete $IMPALA_HOME subtree into the container at
/opt/impala/sources to make source code available inside the container
for easier debugging.
Tested by running core-mode tests in the following environments:
- Regular run (impalad running natively on the platform) on Ubuntu 20.04
- Regular run on Rocky Linux 9.2
- Dockerised run (impalad instances running in their individual
containers) using Ubuntu 20.04 containers
- Dockerised run (impalad instances running in their individual
containers) using Rocky Linux 9.2 containers
- Dockerised run (impalad instances running in their individual
containers) using Wolfi's wolfi-base containers
Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
Reviewed-on: http://gerrit.cloudera.org:8080/22583
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is an enhancement request to support JDBC tables
created by the Hive JDBC storage handler. This is essentially
done by making the JDBC table properties compatible with
Impala: the properties are translated when the table is loaded,
and the translation is maintained only in the Impala cluster,
i.e. it is not written back to HMS.
Impala includes JDBC drivers for PostgreSQL and MySQL,
making 'driver.url' optional in such cases. The
Impala JDBC driver is still required for Impala-to-Impala
JDBC connections. Additionally, Hive allows adding database
driver JARs at runtime via Beeline, enabling users to
dynamically include JDBC driver JARs. However, Impala does
not support adding database driver JARs at runtime,
making the driver.url field still useful
in cases where additional drivers are needed.
The 'hive.sql.query' property is not handled in this patch.
It will be covered in a separate jira.
Testing: End-to-end tests are included in
test_ext_data_sources.py.
Change-Id: I1674b93a02f43df8c1a449cdc54053cc80d9c458
Reviewed-on: http://gerrit.cloudera.org:8080/22134
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Pick up a new binary build of the current toolchain version for ARM.
The toolchain version is identical; the only difference is that the new
build added binaries for Rocky/RHEL 9 to the already supported OS
versions, reaching the same level of Impala build support as
Rocky/RHEL 8.
Tested by building Impala for RHEL9 for Intel and ARM both on private
infrastructure.
Change-Id: I5fd2e8c3187cb7829de55d6739cf5d68a09a2ed3
Reviewed-on: http://gerrit.cloudera.org:8080/22323
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, the shell tarball maintains its own packaging code
and directory layout. This is very complicated, and several Python
packages are directly checked into our repository.
To simplify it, this changes the shell tarball to be based on
pip installing the pypi package. Specifically, the new directory
structure for an unpacked shell tarball is:
impala-shell-4.5.0-SNAPSHOT/
impala-shell
install_py${PYTHON_VERSION}/
install_py${ANOTHER_PYTHON_VERSION}/
For example, install_py2.7 is the Python 2.7 pip install of impala-shell.
install_py3.8 is a Python 3.8 pip install of impala-shell. This means
that the impala-shell script simply picks the install for the
specified version of python and uses that pip install directory.
To make this more consistent across different Linux distributions, this
upgrades pip in the virtualenv to the latest.
With this, ext-py and pkg_resources.py can be removed.
This requires rearranging the shell build code. Specifically, this splits
out the code that generates impala_build_version.py so that it can run
before generating the pypi package. The shell tarball now has a dependency
on the pypi package and must run after it.
This builds on Michael Smith's work from IMPALA-11399.
Testing:
- Ran shell tests locally
- Built on Centos 7, Redhat 8 & 9, Ubuntu 20 & 22, SLES 15
Change-Id: Ifbb66ab2c5bc7180221f98d9bf5e38d62f4ac036
Reviewed-on: http://gerrit.cloudera.org:8080/20171
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This introduces the IMPALA_USE_PYTHON3_TESTS environment variable
to select whether to run tests using the toolchain Python 3.
This is an experimental option, so it defaults to false,
continuing to run tests with Python 2.
This fixes a first batch of Python 2 vs 3 issues:
- Deciding whether to open a file in bytes mode or text mode
- Adapting to APIs that operate on bytes in Python 3 (e.g. codecs)
- Eliminating 'basestring' and 'unicode' locations in tests/ by using
the recommendations from future
( https://python-future.org/compatible_idioms.html#basestring and
https://python-future.org/compatible_idioms.html#unicode )
- Uses impala-python3 for bin/start-impala-cluster.py
All fixes leave the Python 2 path working normally.
Testing:
- Ran an exhaustive run with Python 2 to verify nothing broke
- Verified that the new environment variable works and that
it uses Python 3 from the toolchain when specified
Change-Id: I177d9b8eae9b99ba536ca5c598b07208c3887f8c
Reviewed-on: http://gerrit.cloudera.org:8080/21474
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>