impala

mirror of https://github.com/apache/impala.git synced 2025-12-29 09:04:47 -05:00

Author	SHA1	Message	Date
Joe McDonnell	5eea4f6f79	IMPALA-14559: Ship calcite-planner jar in Impala packages This adds the java/impala-package Maven project to make it easier to ship / test the Calcite planner. impala-package has a dependency on impala-frontend and calcite-planner, so its classpath requires no extra work when constructing the classpath. An additional cleanup is that this no longer puts the impala-frontend-*-tests.jar on the classpath by default. This requires updating the query event hooks test, as it relies on that jar being present. This does not change the default value for the use_calcite_planner query option, so there is no change in behavior. Testing: - Ran a core job - Built docker images and OS packages locally Change-Id: I81dec2a5b59e279229a735c8bb1a23c77111a793 Reviewed-on: http://gerrit.cloudera.org:8080/23497 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-21 03:36:12 +00:00
Michael Smith	52b87fcefd	IMPALA-14454: Exclude log4j 2 dependencies While we use reload4j, we can safely exclude log4j 2 dependencies to reduce the size of our artifacts. Change-Id: Ic060bdd969a6e5cd01646376b27c7355ce841819 Reviewed-on: http://gerrit.cloudera.org:8080/23439 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-09-24 18:04:06 +00:00
Michael Smith	5137bb94ac	IMPALA-14446: Clean up pom.xml Cleans up repetitive patterns in pom.xml. Centralize plugin configuration in pluginManagement. Replace inline maven-compiler-plugin configuration with newer maven.compiler.release and update to latest plugin version. Centralize common dependencies in dependencyManagement, including exclusions when appropriate. Remove exclusions that are no longer relevant. Compared before and after with dependency:tree; only difference is that commons-cli now comes from hadoop and jersey-serv{let,er} are effectively excluded; all versions matched. Also ensured USE_APACHE_COMPONENTS=true compiles. Adds com.amazonaws:aws-java-sdk-bundle to exclusion checking to ensure it's not accidentally included alongside impala-minimal-s3a-aws-sdk. Removes missed io.netty exclusion from IMPALA-12816. Updates commons-dbcp2 to 2.12.0 to match Hive. Change-Id: If96649840e23036b4a73ee23e8d12516497994f0 Reviewed-on: http://gerrit.cloudera.org:8080/23432 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-23 02:50:22 +00:00
jichen0919	826c8cf9b0	IMPALA-14081: Support create/drop paimon table for impala This patch mainly implements the creation/drop of paimon tables through impala. Supported impala data types: - BOOLEAN - TINYINT - SMALLINT - INTEGER - BIGINT - FLOAT - DOUBLE - STRING - DECIMAL(P,S) - TIMESTAMP - CHAR(N) - VARCHAR(N) - BINARY - DATE Syntax for creating paimon table: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name ( [col_name data_type ,...] [PRIMARY KEY (col1,col2)] ) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)] STORED AS PAIMON [LOCATION 'hdfs_path'] [TBLPROPERTIES ( 'primary-key'='col1,col2', 'file.format' = 'orc/parquet', 'bucket' = '2', 'bucket-key' = 'col3' )]; Two types of paimon catalogs are supported. (1) Create table with hive catalog: CREATE TABLE paimon_hive_cat(userid INT,movieId INT) STORED AS PAIMON; (2) Create table with hadoop catalog: CREATE [EXTERNAL] TABLE paimon_hadoop_cat STORED AS PAIMON TBLPROPERTIES('paimon.catalog'='hadoop', 'paimon.catalog_location'='/path/to/paimon_hadoop_catalog', 'paimon.table_identifier'='paimondb.paimontable'); SHOW TABLE STAT/SHOW COLUMN STAT/SHOW PARTITIONS/SHOW FILES statements are also supported. TODO: - Patches pending submission: - Query support for paimon data files. - Partition pruning and predicate push down. - Query support with time travel. - Query support for paimon meta tables. - WIP: - Complex type query support. - Virtual Column query support for querying paimon data table. - Native paimon table scanner, instead of jni based. Testing: - Add unit test for paimon impala type conversion. - Add unit test for ToSqlTest.java. - Add unit test for AnalyzeDDLTest.java. - Update default_file_format TestEnumCase in be/src/service/query-options-test.cc. - Update test case in testdata/workloads/functional-query/queries/QueryTest/set.test. - Add test cases in metadata/test_show_create_table.py. - Add custom test test_paimon.py. Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef Reviewed-on: http://gerrit.cloudera.org:8080/22914 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-09-10 21:24:49 +00:00
Riza Suminto	35aa2e2add	IMPALA-14187: Add IMPALA_JAVA_TARGET env var Impala is preparing to switch to JDK17 for Java compilation by default. While the source version might remain in 1.8 for longer, we should experiment with targeting binary version 17. This patch adds IMPALA_JAVA_TARGET env var to control target binary version. It is initialized in impala-config-java.sh, depending on value of IMPALA_JDK_VERSION env var. Testing: Pass data load and FE tests with IMPALA_JDK_VERSION=17. Change-Id: If194d87c542d416b878661403c32c6adc2930199 Reviewed-on: http://gerrit.cloudera.org:8080/23096 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-06-27 00:41:57 +00:00
Zoltan Borok-Nagy	bd3486c051	IMPALA-13586: Initial support for Iceberg REST Catalogs This patch adds initial support for Iceberg REST Catalogs. This means now it's possible to run an Impala cluster without the Hive Metastore, and without the Impala CatalogD. Impala Coordinators can directly connect to an Iceberg REST server and fetch metadata for databases and tables from there. The support is read-only, i.e. DDL and DML statements are not supported yet. This was initially developed in the context of a company Hackathon program, i.e. it was a team effort that I squashed into a single commit and polished the code a bit. The Hackathon team members were: * Daniel Becker * Gabor Kaszab * Kurt Deschler * Peter Rozsa * Zoltan Borok-Nagy The Iceberg REST Catalog support can be configured via a Java properties file, the location of it can be specified via: --catalog_config_dir: Directory of configuration files Currently only one configuration file can be in the directory as we only support a single Catalog at a time. The following properties are mandatory in the config file: * connector.name=iceberg * iceberg.catalog.type=rest * iceberg.rest-catalog.uri The first two properties can only be 'iceberg' and 'rest' for now, they are needed for extensibility in the future. Moreover, Impala Daemons need to specify the following flags to connect to an Iceberg REST Catalog: --use_local_catalog=true --catalogd_deployed=false Testing * e2e added to test basic functionality against a custom-built Iceberg REST server that delegates to HadoopCatalog under the hood * Further testing, e.g. Ranger tests are expected in subsequent commits TODO: * manual testing against Polaris / Lakekeeper, we could add automated tests in a later patch Change-Id: I1722b898b568d2f5689002f2b9bef59320cb088c Reviewed-on: http://gerrit.cloudera.org:8080/22353 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-04-02 20:04:12 +00:00
Peter Rozsa	1f70269392	IMPALA-13838: Update Impala version to 5.0.0-SNAPSHOT Change-Id: I9c5a2d817b30e14333feeb5b2de3e0c40795723f Reviewed-on: http://gerrit.cloudera.org:8080/22596 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-03-08 14:13:48 +00:00
Michael Smith	88067c576b	IMPALA-13740: Update velocity-engine-core to 2.4.1 Updates velocity-engine-core - required by pac4j - to 2.4.1 to avoid including a shaded version of commons-io vulnerable to CVE-2024-47554. Change-Id: I76624851d6f51d1b9d4dd61fc488932a51e9cba0 Reviewed-on: http://gerrit.cloudera.org:8080/22454 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Peter Rozsa <prozsa@cloudera.com>	2025-02-06 16:31:39 +00:00
Michael Smith	740ee28eb1	IMPALA-13618: Move to commons-lang3 Updates from commons-lang (2.6) to commons-lang3. Switches getFullStackTrace to getStackTrace. getFullStackTrace is not present in lang3, and https://issues.apache.org/jira/browse/LANG-904 suggests that getFullStackTrace existed for handling chained exceptions in older Java runtimes. Change-Id: Ie16af2692858f6a571cc1e5b85ecba3806da8d7e Reviewed-on: http://gerrit.cloudera.org:8080/22228 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-01-09 07:36:39 +00:00
Michael Smith	30ffc2f493	IMPALA-13619: Update commons-lang3 to 3.17.0 Updates commons-lang3 - used by Thrift and Orc - to 3.17.0, and provides the IMPALA_COMMONS_LANG3_VERSION environment variable to override the version. Change-Id: I4005f8aef1cf66a32840cd0b510cd7faf597f5f2 Reviewed-on: http://gerrit.cloudera.org:8080/22227 Reviewed-by: Peter Rozsa <prozsa@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2024-12-18 18:26:13 +00:00
Daniel Becker	b05b408f17	IMPALA-13247: Support Reading Puffin files for the current snapshot This change adds support for reading NDV statistics from Puffin files when they are available for the current snapshot. Puffin files or blobs that were written for other snapshots than the current one are ignored. Because this behaviour is different from what we have for HMS stats and may therefore be unintuitive for users, reading Puffin stats is disabled by default; set the "--disable_reading_puffin_stats" startup flag to false to enable it. When Puffin stats reading is enabled, the NDV values read from Puffin files take precedence over NDV values stored in the HMS. This is because we only read Puffin stats for the current snapshot, so these values are always up-to-date, while the values in the HMS may be stale. Note that it is currently not possible to drop Puffin stats from Impala. For this reason, this patch also introduces two ways of disabling the reading of Puffin stats: - globally, with the aforementioned "--disable_reading_puffin_stats" startup flag: when it is set to true, Impala will never read Puffin stats - for specific tables, by setting the "impala.iceberg_disable_reading_puffin_stats" table property to true. Note that this change is only about reading Puffin files, Impala does not yet support writing them. Testing: - created the PuffinDataGenerator tool which can generate Puffin files and metadata.json files for different scenarios (e.g. all stats are in the same Puffin file; stats for different columns are in different Puffin files; some Puffin files are corrupt etc.). The generated files are under the "testdata/ice_puffin/generated" directory. - The new custom cluster test class 'test_iceberg_with_puffin.py::TestIcebergTableWithPuffinStats' uses the generated data to test various scenarios. - Added custom cluster tests that test the 'disable_reading_puffin_stats' startup flag. Change-Id: I50c1228988960a686d08a9b2942e01e366678866 Reviewed-on: http://gerrit.cloudera.org:8080/21605 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-10-19 22:14:59 +00:00
Michael Smith	4c3b5f94f1	IMPALA-13393: Remove old javax.el config Pinning javax.el was done when Impala still used Sentry. That was removed in IMPALA-9708, and Hbase now explicitly depends on a specific version. So this pin is no longer relevant. Change-Id: I5be3eeeacf2f6fb04bc5106902e1d11b3886d844 Reviewed-on: http://gerrit.cloudera.org:8080/21827 Tested-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2024-10-15 23:38:44 +00:00
Joe McDonnell	ae6a3b9ec0	IMPALA-13082: Use separate versions for jackson vs jackson-databind Sometimes there is a jackson-databind patch release without a corresponding release of other jackson libraries. For example, there is a jackson-databind 2.12.7.1, but jackson-core does not have an artifact with that version. To handle these scenarios, it is useful to have a separate version for jackson-databind vs other jackson libraries. This introduces IMPALA_JACKSON_VERSION (which currently matches IMPALA_JACKSON_DATABIND_VERSION) and uses this for non-databind jackson libraries. Testing: - Ran a local build Change-Id: I3055cb47986581793d947eaedb6a24b4dd92e3a6 Reviewed-on: http://gerrit.cloudera.org:8080/21719 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2024-08-26 22:52:25 +00:00
Yubi Lee	ba67660b3a	IMPALA-10408: Support build using Apache components Apache Impala uses many CDP components to build it. This patch provides a way to support building Apache Impala using Apache components. Change-Id: I8730dd182b367c9daa94303937ad249db72b1399 Reviewed-on: http://gerrit.cloudera.org:8080/18977 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-08-19 17:36:05 +00:00
Michael Smith	65927f4ba3	IMPALA-13301: Upgrade aircompressor to 0.27 Upgrades io.airlift.aircompressor to 0.27 to address CVE-2024-36114. Aircompressor is a dependency of Orc, however we tend to upgrade Orc more deliberately and synchronize C++ and Java upgrades. Aircompressor upgrades in Orc did not require any code changes, so manage this dependency directly to address the CVE. Change-Id: I6c56daa61d5ecbcb3a5f7fbd0665043bb49b469f Reviewed-on: http://gerrit.cloudera.org:8080/21677 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-08-16 03:01:58 +00:00
Michael Smith	22b59d27d0	IMPALA-13243: Update Dropwizard Metrics to 4.2.x Updates Dropwizard Metrics components to the latest 4.2.x release, 4.2.26. We directly use metrics-core, and metrics-jvm/metrics-json are imported via Hive (via https://github.com/joshelser/dropwizard-hadoop-metrics2). Dropwizard Metrics manually tested with these versions on https://github.com/joshelser/dropwizard-hadoop-metrics2/pull/8. Change-Id: Ie9bec7a7c23194604430531bd83b25c5969e888e Reviewed-on: http://gerrit.cloudera.org:8080/21599 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-07-23 05:22:59 +00:00
Zoltan Borok-Nagy	1324a6e6c9	IMPALA-13108: Update version to 4.5.0-SNAPSHOT Updated IMPALA_VERSION in impala-config.sh Executed the following for Java: cd java mvn versions:set -DnewVersion=4.5.0-SNAPSHOT Change-Id: Ie7803fe523406dbdd1ac066a35bb31d21765a244 Reviewed-on: http://gerrit.cloudera.org:8080/21460 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-05-29 23:47:05 +00:00
Peter Rozsa	7ad9400656	IMPALA-13044: Upgrade bouncycastle to 1.78 This patch upgrades bouncycastle to 1.78. As of bouncycastle:1.71, the -jdk15on artifact is no longer available, the artifact is changed to -jdk18on. Tests: - core tests ran Change-Id: I8372916ab79b863e7a07d22e8333abd54492fa29 Reviewed-on: http://gerrit.cloudera.org:8080/21371 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-05-03 00:09:15 +00:00
Joe McDonnell	d09c502490	IMPALA-13049: Add dependency management for log4j2 to use 2.18.0 Currently, there is no dependency management for the log4j2 version. Impala itself doesn't use log4j2. However, recently we encountered a case where one dependency brought in log4j-core 2.18.0 and another brought in log4j-api 2.17.1. log4j-core 2.18.0 relies on the existence of the ServiceLoaderUtil class from log4j-api 2.18.0. log4j-api 2.17.1 doesn't have this class, which causes class not found exceptions. This uses dependency management to set the log4j2 version to 2.18.0 for log4j-core and log4j-api to avoid any mismatch. Testing: - Ran a local build and verified that both log4j-core and log4j-api are using 2.18.0. Change-Id: Ib4f8485adadb90f66f354a5dedca29992c6d4e6f Reviewed-on: http://gerrit.cloudera.org:8080/21379 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Abhishek Rawat <arawat@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-05-01 02:37:49 +00:00
Steve Carlin	b39cd79ae8	IMPALA-12872: Use Calcite for optimization - part 1: simple queries This is the first commit to use the Calcite library to parse, analyze, and optimize queries. The hook for the planner is through an override of the JniFrontend. The CalciteJniFrontend class is the driver that walks through each of the Calcite steps which are as follows: CalciteQueryParser: Takes the string query and outputs an AST in the form of Calcite's SqlNode object. CalciteMetadataHandler: Iterate through the SqlNode from the previous step and make sure all essential table metadata is retrieved from catalogd. CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer. CalciteRelNodeConverter: Change the AST into a logical plan. In this first commit, the only logical nodes used are LogicalTableScan and LogicalProject. The LogicalTableScan will serve as the node that reads from an Hdfs Table and the LogicalProject will only project out the used columns in the query. In later versions, the LogicalProject will also handle function changes. CalciteOptimizer: This step is to optimize the query. In this cut, it will be a nop, but in later versions, it will perform logical optimizations via Calcite's rule mechanism. CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into Impala's PlanNode physical tree ExecRequestCreator: Implement the existing Impala steps that turn a Single Node Plan into a Distributed Plan. It will also create the TExecRequest object needed by the runtime server. Only some very basic queries will work with this commit. These include: select * from tbl <-- only needs the LogicalTableScan select c1 from tbl <-- Also uses the LogicalProject In the CalciteJniFrontend, there are some basic checks to make sure only select statements will get processed. Any non-query statement will revert back to the current Impala planner. In this iteration, any queries besides the minimal ones listed above will result in a caught exception which will then be run through the current Impala planner. The tests that do work can be found in calcite.test and run through the custom cluster test test_experimental_planner.py This iteration should support all types with the exception of complex types. Calcite does not have a STRING type, so the string type is represented as VARCHAR(MAXINT) similar to how Hive represents their STRING type. The ImpalaTypeConverter file is used to convert the Impala Type object to corresponding Calcite objects. Authorization is not yet working with this current commit. A Jira has been filed (IMPALA-13011) to deal with this. Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98 Reviewed-on: http://gerrit.cloudera.org:8080/21109 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2024-04-25 20:09:09 +00:00
wzhou-code	fc74ca672a	IMPALA-12378: Auto Ship JDBC Data Source This patch moves the source files of jdbc package to fe. Data source location is optional. Data source could be created without specifying HDFS location. Assume data source class is in the classpath and instance of data source class could be created with current class loader. Impala still tries to load the jar file of the data source at runtime if it's set in data source location. Testing: - Passed core test - Passed dockerised-tests Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f Reviewed-on: http://gerrit.cloudera.org:8080/20971 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>	2024-02-07 16:29:11 +00:00
Csaba Ringhofer	c14156eb3a	IMPALA-12746: Bump jackson.databind to 2.15.3 Also sets dependencyManagement to force using the same version for jackson-databind, jackson-core and jackson-annotations. This is needed because datagenerator depends on kitesdk, which would pull in a very old jackson-core version (2.3.1) and lead to build failures with the newer jackson.databind. Change-Id: I8440426da1395045cf149aca0044286015861e5f Reviewed-on: http://gerrit.cloudera.org:8080/20914 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-01-24 15:13:36 +00:00
Michael Smith	098ad53f65	IMPALA-12480: Use Hadoop version for hadoop-aliyun Uses the imported Hadoop version for the hadoop-aliyun module, which is a tool in the hadoop project. This allows us to exclude vulnerable versions of jdom that were previously included via hadoop-aliyun. Change-Id: I270f3895ec668d9fb907f35b04cad2f149e3d0de Reviewed-on: http://gerrit.cloudera.org:8080/20532 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-10-10 20:38:36 +00:00
Michael Smith	1cf5bc6e79	Update version to 4.4.0-SNAPSHOT Change-Id: I21c3b823c1b0db198d442d155c01d4cfd3a5c522 Reviewed-on: http://gerrit.cloudera.org:8080/20534 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-10-07 01:43:15 +00:00
Steve Carlin	bc83d46a9a	IMPALA-12424: Allow third party JniFrontend interface. This patch allows a third party to inject their own frontend class instead of using the default JniFrontend included in the project. The test case includes an interface that runs queries as normal except for the "select 1" query which gets changed to "select 42". Change-Id: I89e677da557b39232847644b6ff17510e2b3c3d5 Reviewed-on: http://gerrit.cloudera.org:8080/20459 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-09-08 20:20:56 +00:00
Michael Smith	7fb6a9a1d2	IMPALA-11941: (Addendum) Use released jamm 0.4.0 Switches to the 0.4.0 release of jamm, as building a shaded JAR from source was a temporary measure. Change-Id: I5b88b479580f7d0baff502ad9551d2764971babf Reviewed-on: http://gerrit.cloudera.org:8080/20237 Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-07-25 00:27:56 +00:00
Michael Smith	87fd844d3e	IMPALA-11941: (Addendum) Produce shaded copy of Jamm Produces a shaded copy of a pre-release jamm jar that supports Java 17. Building a copy of jamm and directly depending on it meant any consumer of Impala would have to provide their own build of it. Testing: ran custom_cluster/test_local_catalog.py with Java 8 and 17 Change-Id: Ida42d720a2639b65391c07a9237556311e04fac6 Reviewed-on: http://gerrit.cloudera.org:8080/20147 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-07-01 01:10:12 +00:00
yx91490	f4d306cbca	IMPALA-11629: Support for huawei OBS FileSystem This patch adds support for huawei OBS (Object Storage Service) FileSystem. The implementation is similar to other remote FileSystems. New flags for OBS: - num_obs_io_threads: Number of OBS I/O threads. Defaults to 16. Testing: - Upload hdfs test data to an OBS bucket. Modify all locations in HMS DB to point to the OBS bucket. Remove some hdfs caching params. Run CORE tests. Change-Id: I84a54dbebcc5b71e9bcdd141dae9e95104d98cb1 Reviewed-on: http://gerrit.cloudera.org:8080/19110 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-02-09 08:10:19 +00:00
Daniel Becker	a71e69f570	IMPALA-11792: Update Impala version to 4.3.0-SNAPSHOT As 4.2.0 has been released this commit updates the master to 4.3.0. This step needs to happen on each release. Testing: - Ran a build Change-Id: Iebedcfbc1fd8018391a6c78a9aca4a9d754780fa Reviewed-on: http://gerrit.cloudera.org:8080/19344 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-12-13 05:44:10 +00:00
Michael Smith	c3ec9272c5	IMPALA-11724: Use CDP Ozone in test environment Updates the test environment to default to the CDP build of Ozone, as the latest build of CDP Hive depends on pre-release features unavailable in Ozone 1.2.1. Apache Ozone 1.2 can still be used by setting USE_APACHE_OZONE=true. The latest CDP build also includes a version of Ozone based on ozone#master with a candidate version of 1.3.0. Both Apache and CDP therefore have builds of Ozone we can test with that use the new artifact names introduced in Ozone 1.2, so this patch cleans up setup that was only needed for Ozone versions prior to 1.2. Change-Id: I1177a1b820fe21adca9f8c1cc51ff73ee001d3f2 Reviewed-on: http://gerrit.cloudera.org:8080/19247 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2022-11-16 22:13:06 +00:00
yacai	c953426692	IMPALA-11683: Support Aliyun OSS File System This patch adds support for OSS (Aliyun Object Storage Service). Using the hadoop-aliyun, the implementation is similar to other remote FileSystems. Tests: - Prepare: Initialize OSS-related environment variables: OSS_ACCESS_KEY_ID, OSS_SECRET_ACCESS_KEY, OSS_ACCESS_ENDPOINT. Compile and create hdfs test data on an ECS instance. Upload test data to an OSS bucket. - Modify all locations in HMS DB to point to the OSS bucket. Remove some hdfs caching params. Run CORE tests. Change-Id: I267e6531da58e3ac97029fea4c5e075724587910 Reviewed-on: http://gerrit.cloudera.org:8080/19165 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-11-16 10:14:49 +00:00
Michael Smith	83c5e6e409	IMPALA-11670: Upgrade components, add envvars for override Upgrades guava to 31.1-jre and jackson-databind to 2.13.4.2 to address CVEs. Adds environment variables for commonly-updated components so they can be customized via the branch-specific impala-config-branch.sh in a way that allows both to be updated regularly without merge conflicts. Also updates httpcomponents.httpcore to 4.4.14 to be consistent with other httpcomponents libraries included transitively. Change-Id: I1c2c4481ca3f498abf302aa05361d950b1ed1216 Reviewed-on: http://gerrit.cloudera.org:8080/19147 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-10-19 15:54:00 +00:00
Michael Smith	22e5ca3d0a	IMPALA-11667: Clean up Java dependency exclusions Use dependencyManagement to simplify Java dependencies by directly controlling versions of transitive dependencies instead of using exclusions and direct inclusion. Dependency management specifies versions authoritatively, so redundant version declarations are also removed. Change-Id: I424a175135855dcbd38ae432ea111cca5f562633 Reviewed-on: http://gerrit.cloudera.org:8080/19146 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-10-19 15:54:00 +00:00
Michael Smith	a1fddf1022	IMPALA-11628: Switch to reload4j, update slf4j Switches from log4j 1.x to reload4j, a maintained fork. Updates slf4j to the latest version so we can include all CVE fixes. slf4j 2.0.x requires Java 8 and adds a backward-compatible fluent logging api. Neither seems like a problem for Impala. Bans all use of log4j 1.x so we only use reload4j. Change-Id: I5238b9c8247af3e0f4cb05c0b76a75bfee37f5c8 Reviewed-on: http://gerrit.cloudera.org:8080/19102 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-10-11 21:23:11 +00:00
wzhou-code	010f1b943c	IMPALA-11639: Upgrade Spring framework to 5.3.20 This patch upgrades the Spring framework to 5.3.20 to address multiple CVEs: - CVE-2022-22971 - CVE-2022-22968 - CVE-2022-22970 Testing: - Ran core job - Ran custom cluster tests in exhaustive mode Change-Id: I33f4f1d22fc27227e31d744658a17c16b61b9677 Reviewed-on: http://gerrit.cloudera.org:8080/19091 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-10-05 19:34:39 +00:00
Steve Carlin	4e813b7085	IMPALA-11528: Catalogd should start up with a corrupt Hive function. This commit handles the case for a specific kind of corrupt function within the Hive Metastore in the following situation: A valid Hive SQL function gets created in HMS. This UDF is written in Java and must derive from the "UDF" class. After creating this function in Impala, we then replace the underlying jar file with a class that does NOT derive from the "UDF" class. In this scenario, catalogd should reject the function and still start up gracefully. Before this commit, catalogd wasn't coming up. The reason was that the Hive function FunctionUtils.getUDFClassType() has a dependency on UDAF and was throwing a LinkageError exception, so we need to include the UDAF class in the shaded jar. Change-Id: I54e7a1df6d018ba6cf5ecf32dc9946edf86e2112 Reviewed-on: http://gerrit.cloudera.org:8080/18927 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tamas Mate <tmater@apache.org>	2022-09-13 14:48:31 +00:00
Joe McDonnell	7581cedd52	IMPALA-11394: Update jackson-databind to 2.12.6.1 This updates jackson-databind to address CVE-2020-36518. Testing: - Ran a core job Change-Id: I8db403a102097a22c48f5d9d42ced3b85930078f Reviewed-on: http://gerrit.cloudera.org:8080/18891 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-08-23 15:50:35 +00:00
Joe McDonnell	4845f36b4e	IMPALA-11207: Use hadoop-cloud-storage for Cloud dependencies Hadoop provides hadoop-cloud-storage, which includes most of the dependencies that Impala currently uses like hadoop-aws, hadoop-azure, Knox's gateway-cloud-bindings, etc. Hadoop has put in a lot of work to make sure that this package includes the right version of dependencies (including shading some dependencies for GCS). It seems like this is a more reliable way to consume these dependencies. This switches the Java build to use hadoop-cloud-storage and removes the dependencies that it replaces. This eliminates the need to control the version of oauth and GCS, as those are determined by hadoop-cloud-storage. Change-Id: I3a1631289f990513823c2b17eb9241cc1b5a7ffd Reviewed-on: http://gerrit.cloudera.org:8080/18817 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-08-15 21:11:42 +00:00
Michael Smith	830625b104	IMPALA-9442: Add Ozone to minicluster Adds Ozone as an alternative to hdfs in the minicluster. Select by setting `export TARGET_FILESYSTEM=ozone`. With that flag, run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot because Ozone does not support HBase (HDDS-3589); snapshot loading doesn't work yet primarily due to HDDS-5502. Uses the o3fs interface because Ozone puts specific restrictions on bucket names (no underscores, for instance), and it was a lot easier to use an interface where everything is written to a single bucket than to update all Impala's use of HDFS-style paths to make `test-warehouse` a bucket inside a volume. Specifies reduced Ozone client retries during shutdown where Ozone may not be available. Passes tests with FE_TEST=false BE_TEST=false. Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be Reviewed-on: http://gerrit.cloudera.org:8080/18738 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-08-03 16:58:20 +00:00
Michael Smith	00db9a27df	IMPALA-11407: Upgrade google-oauth-client to 1.33.3 Upgrades google-oauth-client and google-oauth-client-java6 to 1.33.3 to address CVE-2021-22573. These are included as dependencies of com.google.cloud.bigdataoss/gcs-connector, which does not yet have a release that includes versions 1.33.3 or later. Change-Id: I8d95913f26e6073373374e169ee045881f40f065 Reviewed-on: http://gerrit.cloudera.org:8080/18683 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-07-01 04:10:13 +00:00
Riza Suminto	06b1db4675	IMPALA-11369: Separate thrift compiler for different component Impala used to have one thrift compiler version to compile C++, Java, and Python code. Most Thrift serialization/deserialization between minor versions are compatible with each other. So it is possible to have different thrift compiler versions for different target codes. It is beneficial to do so because it will allow Impala to upgrade separate components independently. This patch implements the infrastructure change required to do so. It replaces most of the 'THRIFT_' environment variables and CMake variables with 'THRIFT_CPP_', 'THRIFT_JAVA_', and 'THRIFT_PY_' to compile C++, Java, and Python code accordingly. All three still refer to the same thrift version (thrift-0.11.0-p5). Testing: - Build Impala and pass core tests. Change-Id: I56479dc69b79024d1a4d09211bbe88a61fa0c6a4 Reviewed-on: http://gerrit.cloudera.org:8080/18636 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-21 02:40:59 +00:00
Tamas Mate	97d3b25be3	IMPALA-11338: Update IMPALA_VERSION to 4.2.0-SNAPSHOT As 4.1.0 has been released this commit updates the master to 4.2.0. This step needs to happen on each release, related changes are: IMPALA-10198, IMPALA-10057 Testing: - Ran a build Change-Id: Idab47eedb27ca4be42300dfc2eeb81eefe407b70 Reviewed-on: http://gerrit.cloudera.org:8080/18595 Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-07 22:50:50 +00:00
Joe McDonnell	3627b027fe	IMPALA-11229: Upgrade Spring framework to 5.3.18 This upgrades the Spring framework to 5.3.18 to address multiple CVEs: - CVE-2022-22965 - CVE-2022-22950 - CVE-2021-22060 Testing: - Ran core job - Ran custom cluster tests in exhaustive mode Change-Id: Ie1b299c5b24e70c9db6eb0ce37fee9e32908423e Reviewed-on: http://gerrit.cloudera.org:8080/18405 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tamas Mate <tmater@apache.org>	2022-04-13 15:12:25 +00:00
Joe McDonnell	26398855bf	IMPALA-10930: Bump the Java artifact versions to 4.1.0-SNAPSHOT This changes the Maven pom.xml files to use version 4.1.0-SNAPSHOT rather than 4.0.0-SNAPSHOT. In the past, these versions were a fixed value, but that changed with IMPALA-10198. This is a new step that needs to happen on each release. Testing: - Ran a build Change-Id: I10a589b4fbc15048199943a0e06d079f57840239 Reviewed-on: http://gerrit.cloudera.org:8080/18390 Reviewed-by: Tamas Mate <tmater@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-04-11 16:06:46 +00:00
Joe McDonnell	aa404b856f	IMPALA-11197/IMPALA-11149: Address CVEs in pac4j/xmlsec This upgrades pac4j and several of its dependencies (including xmlsec) to address CVEs in those components. Specifically: - pac4j 4.5.5 addresses CVE-2021-44878 - xmlsec 2.2.3 addresses CVE-2021-40690 - bcprov 1.68 addresses CVE-2020-15522 This also upgrades springframework to 5.2.9.RELEASE to match the version for pac4j 4.5.5. Testing: - Ran core job Change-Id: I8421d867dd0fce8eeaa6bc13a511ca3e8dd05723 Reviewed-on: http://gerrit.cloudera.org:8080/18348 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-03-24 15:49:47 +00:00
Fucun Chu	4186727fe6	IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. These shim classes implement methods that differ between cdp-hive-3 and apache-hive-3 and are used by front end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as source using the fe/pom.xml build plugin. Some code that directly uses Hive 4 APIs needs to be ignored in compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile is used to ignore this code; the profile is automatically activated based on IMPALA_HIVE_DIST_TYPE. Testing: 1. Code compiles and runs against both HMS-3 and ASF-HMS-3 2. Ran full-suite of tests against HMS-3 3. Running full-tests against ASF-HMS-3 will need more work supporting Tez in the mini-cluster (for dataloading) and HMS transaction support. This will be an on-going effort and test failures on ASF-Hive-3 will be fixed in additional sub-tasks. Notes: 1. Patch uses a custom build of Apache Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21569, HIVE-20038. This hack will be added to the build script in additional sub-tasks. Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Reviewed-on: http://gerrit.cloudera.org:8080/17774 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-02-27 06:36:19 +00:00
Fucun Chu	157086cb80	IMPALA-10771: Add Tencent COS support This patch adds support for COS(Cloud Object Storage). Using the hadoop-cos, the implementation is similar to other remote FileSystems. New flags for COS: - num_cos_io_threads: Number of COS I/O threads. Defaults to be 16. Follow-up: - Support for caching COS file handles will be addressed in IMPALA-10772. - test_concurrent_inserts and test_failing_inserts in test_acid_stress.py are skipped due to slow file listing on COS (IMPALA-10773). Tests: - Upload hdfs test data to a COS bucket. Modify all locations in HMS DB to point to the COS bucket. Remove some hdfs caching params. Run CORE tests. Change-Id: Idce135a7591d1b4c74425e365525be3086a39821 Reviewed-on: http://gerrit.cloudera.org:8080/17503 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-12-08 16:32:02 +00:00
Zoltan Borok-Nagy	7b2bb13ecc	IMPALA-10810: Bump json-smart from 2.3 to 2.4.7 I noticed that our json-smart dependency is stale and we could pick up a newer version. This patch upgrades it to 2.4.7 which is the newest version at the time of writing. Change-Id: I6b43f606f40e172aa267b55c564fa64d68515bd5 Reviewed-on: http://gerrit.cloudera.org:8080/17702 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-07-22 23:05:34 +00:00
Csaba Ringhofer	94f67a3432	IMPALA-7825: Upgrade Thrift version to 0.11.0 Before this patch Impala mainly used Thrift 0.9.3, but it was possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0 Thrift lib was already included in the toolchain. Most of the changes are related to replacing boost:: with std:: shared_ptr-s in cpp code (this is a continuation of patch by Sahil). The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as Impala's test framework relies on Impyla. A thrift_sasl release is also needed, because it currently pins Thrift version to 0.9.3 for Python 2. The current patch uses alpha releases from Impyla and thrift_sasl that use thrift 0.11.0. Notable side effects: - old logic to compile thrift for impala-shell with 0.11.0 was removed - impala_shell's utf8 handling had to be updated as the new 0.11.0 compilation happens with no_utf8strings. This also made things a bit faster, e.g. the following is ~0.22s instead of ~0.25s: shell/impala_shell.py \ -B -q "select * from functional_parquet.alltypes;" > /dev/null - THRIFT-3921 changed the stream operators to print an enum's name instead of its number, leading to slightly different messages in some cases. - "templates" was added to the thrift generator's parameters to avoid a compilation issue (related to IMPALA-10600). I didn't notice any change in compilation time. This option generates .tcc files with templatized readers/writers for Thrift types. Currently we don't use these, but they could potentially speed up (de)serialization. Testing: - ran Impyla's test suite with Python 2 and 3 - ran core tests Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Reviewed-on: http://gerrit.cloudera.org:8080/17170 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-04-27 13:36:54 +00:00
Joe McDonnell	267f4d67f4	IMPALA-10455: Reorder Maven repositories for cleaner mirror semantics When using a Maven mirror that uses a mirrorOf pattern, the order of repositories in the pom.xml has a strong influence on whether the build tries the mirror for a particular artifact. If an early repository matches the mirrorOf condition, Maven may try the mirror for all artifacts, even those that only exist in the s3 bucket. This extra check can slow down the build, especially if the mirror is slow to respond for unknown artifacts. For Impala, the common case is for a mirror to cover everything except the artifacts that come from the Kudu local repository or the s3 bucket. To optimize for that case, this reorders the Maven repositories to be in this order: 1. Local/S3 repositories 2. Regular repositories 3. Banned repositories The repositories are otherwise unchanged. Testing: - Ran an ordinary build - Ran a build with a mirrorOf "external:*,!impala.cdp.repo" and verified that the build went directly to the s3 bucket first. Change-Id: I7046c7ec5391833e98ee6a463fb8c08b6a04cb26 Reviewed-on: http://gerrit.cloudera.org:8080/17020 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-04-08 21:38:35 +00:00

58 Commits