impala

mirror of https://github.com/apache/impala.git synced 2026-02-03 09:00:39 -05:00

Author	SHA1	Message	Date
Arnab Karmakar	6a0eedf4af	IMPALA-13299: Support CREATE TABLE LIKE for Iceberg from HDFS sources This patch enables creating Iceberg tables from non-Iceberg HDFS source tables (Parquet, ORC, etc.) using CREATE TABLE LIKE with STORED BY ICEBERG. This provides a metadata-only operation to convert table schemas to Iceberg format without copying data. Supported source types: Parquet, ORC, Avro, Text, and other HDFS-based formats Not supported: Kudu tables, JDBC tables, Paimon tables Use case: This is particularly useful for Apache Hive 3.1 environments where CTAS (CREATE TABLE AS SELECT) with STORED BY ICEBERG is not supported - that feature requires Hive 4.0+. Users can use CREATE TABLE LIKE to create the Iceberg schema, then use INSERT INTO to migrate data. Testing: - Comprehensive tests covering schema conversion with various data types, partitioned and external tables, complex types (STRUCT, ARRAY, MAP) - Bidirectional conversion tests (non-Iceberg → Iceberg and reverse) - Hive interoperability tests verifying data round-trips correctly Change-Id: Id162f217e49e9f396419b09815b92eb7f351881e Reviewed-on: http://gerrit.cloudera.org:8080/23733 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-02-02 16:29:43 +00:00
Steve Carlin	593b0bfad3	IMPALA-13712: Calcite Planner - Enable constant folding Constant folding is enabled by this patch. Calcite does constant folding via the RexExecutor.reduce() method. However, we need to use Impala's constant folding algorithm to ensure that Impala expressions are folded. This is done through the derived class ImpalaRexExecutor and is called from the Simplify rules. The ImpalaRexExecutor calls an internal shuttle class which recursively walks through the RexNode which checks if portions of the expression can be constant folded. Some expressions are not folded due to various reasons: - We avoid folding 'cast(1.2 as double)' type expressions because folding this creates an inexact number, and this is problematic for partition pruning directory names on double columns which contain the exact number (1.2 in this case). - Interval expressions are skipped temporarily since the Expr class generated is not meant to be simplified. However, an Expr object that contains an IntervalExpr may be simplified. There is a special case that needed to be handled for a values query with different sized arguments across rows. In Calcite version 1.40 (not yet upgraded as of this commit), an extra cast is added around smaller strings to ensure the char(x) is the same size across all rows. However, this adds extra spaces to the string which causes results different from the original Impala planner. This must be caught before Calcite converts the abstract syntax tree into a RelNode logical tree. A special RexExecutor has been created to handle this which looks for char casts around a char literal and removes it. This is fine because the literal will be changed into a string in the "coercenodes" module. 
Change-Id: I98c21ef75b2f5f8e3390ff5de5fdf45d9645b326 Reviewed-on: http://gerrit.cloudera.org:8080/23723 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-01-28 20:32:12 +00:00
Steve Carlin	37a8007df0	IMPALA-14434: Calcite planner: implement partition key scan optimization Implemented the partition key scan optimization for Calcite planner Most of the code was already in place. Just needed to refactor some code in SingleNodePlanner to make it callable from Calcite and use the already created isPartitionKeyScan method in ImpalaHdfsScanRel. Testing was done by running the Impala e2e tests with the use_calcite_planner flag set to true. Change-Id: I7b5b8a8115f65f6be27a5be0e19f21eebab61a32 Reviewed-on: http://gerrit.cloudera.org:8080/23691 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2026-01-27 12:47:12 +00:00
Daniel Vanko	00c233cc4f	IMPALA-14692: Fix test_spilling_hash_join IMPALA-14680 mistakenly removed greedy regex patterns from query-impala-13138.test, but this test checks for the query results not for the query profile, which was modified in IMPALA-14680. Testing: * test_spilling_hash_join passed in exhaustive mode Change-Id: I709f81217f44c9377e4a1e8419787591ba7b7451 Reviewed-on: http://gerrit.cloudera.org:8080/23898 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-01-26 15:22:25 +00:00
Steve Carlin	2360a06e4a	IMPALA-14525: Calcite planner: Add support for RexSimplify RexSimplify is a class in Calcite that simplifies expressions into something more optimal. It was disabled up until this point because it converts IN clauses into a Calcite internal SEARCH object which isn't directly supported by Impala. This commit brings back the RexSimplify class. The SEARCH operator is now converted into an IN operator when RexNode objects are changed into Expr objects. Some notes about the changes that had to be made: - some small refactoring needed to be done in the Impala Expr objects. - RexSimplify is very stringent about operators that are nullable, as there is an assert when certain operators are checked. There is logic in the CoerceOperandShuttle that ensures the nullability is now set correctly. - Some duplicated logic at line 148 in CoerceOperandShuttle was removed, (existing logic in getReturnType) - The AnalyzedInPredicate subclass was created to avoid analysis done in InPredicate. - Removed ImpalaRexBuilder logic which avoided creation of the SEARCH op. - Created ImpalaRexSimplify which extends RexSimplify. RexSimplify causes regressions with NaN on comparisons with Double. For instance, "where not(my_col > 30)" changes to "where my_col <= 30": The first expression returns true when my_col is NaN and the second expression returns false. So ImpalaRexSimplify looks for the existence of any binary comparison operator with Double in it and avoids the simplification. - Added ImpalaRexUtil which copies the RexUtil.expandSearch() method that converts the SEARCH operator into non-search operators. The version here handles the conversion to the custom Impala IN operator. - Created an ImpalaCoreRules class. Even though RexSimplify is supported, it is important it is run through ImpalaRexSimplify. The RexSimplify is disabled for the SqlToRelNode converter and for all rules given by Calcite. 
ImpalaCoreRules also has the benefit of having one place where one can find all the rules used by Impala. - Created simplify rules for the filter condition, and the projects in the project object. - Changed the FilterSelectivityEstimator to get the selectivity for the SEARCH operator. - Added a couple of rules in the optimizer for a bug that was being exposed when enabling the SEARCH operator. The PROJECT_JOIN_TRANSPOSE was removed because it did not serve any purpose, as we transpose JOIN_PROJECT in the join phase. Some other rules were added to help with pushdown predicates like JOIN_DERIVE_IS_NOT_NULL_FILTER and JOIN_PUSH_EXPRESSIONS. And the Simplifier rules have also been added. - Some of the new rules caused many changes in the estimations of cardinality and memory. The one noticeable change was using IsNullPredicate for the IS_NULL and IS_NOT_NULL operators. Previously, these functions were using FunctionCallExpr, and the cardinality estimation was way off. - Fixed a small bug in RexLiteralConverter where a string literal was treated as a VARCHAR. A string literal should always be treated as a STRING. Change-Id: I44792688f361bf15affa565e5de5709f64dcf18c Reviewed-on: http://gerrit.cloudera.org:8080/23679 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Aman Sinha <amsinha@cloudera.com>	2026-01-26 00:29:12 +00:00
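The NaN caveat in the IMPALA-14525 entry above can be reproduced outside the planner. The following is a minimal Python sketch, not Impala or Calcite code: the function names are hypothetical, and it assumes IEEE 754 comparison semantics (every ordered comparison with NaN is false), which is consistent with the behaviour the commit message describes for Double columns.

```python
import math


def original_predicate(my_col: float, bound: float = 30.0) -> bool:
    """Models "where not(my_col > 30)" for a single Double value."""
    return not (my_col > bound)


def simplified_predicate(my_col: float, bound: float = 30.0) -> bool:
    """Models the simplified form "where my_col <= 30"."""
    return my_col <= bound


if __name__ == "__main__":
    # The two forms agree on ordinary numbers and disagree only on NaN.
    for value in (10.0, 30.0, 50.0, math.nan):
        print(value, original_predicate(value), simplified_predicate(value))
```

Because the two forms only diverge on NaN, skipping the simplification whenever a binary comparison involves a Double is a conservative way to keep results identical to the original planner.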
Balazs Hevele	df8c4be032	IMPALA-11609: split up resource-requirements.test Separated resource-requirements.test into several files based on file format. This makes it easier to diagnose an issue breaking the test, since it will only break a smaller test related to that file format. Also, running a specific test locally requires a data load only for the given file format. Change-Id: I50849539ffc58412e1e65008e319cc46a6bb9b79 Reviewed-on: http://gerrit.cloudera.org:8080/23875 Reviewed-by: Peter Rozsa <prozsa@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-01-21 17:40:35 +00:00
Csaba Ringhofer	4963ee6d22	IMPALA-14629: Implement st_point(double,double) in c++ Adds a native implementation for the common 2d constructor while keeping the more complex st_point(string) in Java. Tried using boost's WKT parser but making it work with >2d geometries seemed non-trivial. Also removes the following overloads: st_point(double,double,double) st_point(double,double,double,double) st_pointz(double,double,double,double) These conflict with overloads that have different semantics in PostGIS, see HIVE-29395 for details. Testing: - no test changes are needed - the remaining constructors are covered while the removed ones were not used at all Change-Id: I927413f92cf4d4e9a995f7024de0ec2e3b584b6d Reviewed-on: http://gerrit.cloudera.org:8080/23874 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-01-19 15:22:37 +00:00
Fang-Yu Rao	8e3c99b801	IMPALA-14085: Implement GRANT/REVOKE ROLE TO/FROM a user As a follow-up to IMPALA-10211, this patch adds the support for the following role-related statements. 1. GRANT ROLE <role_name> TO USER <user_name>. 2. REVOKE ROLE <role_name> FROM USER <user_name>. 3. SHOW ROLE GRANT USER <user_name>. Testing: - Extended the test file grant_revoke.test to additionally cover the scenario in which the grantee/revokee is a user. - Added an end-to-end test to briefly verify the statements SHOW ROLE GRANT USER <user_name> and SHOW CURRENT ROLES are working as expected. Change-Id: Ie5a16aeb4bbf8637ad326a1ec3d5fce1b196d73f Reviewed-on: http://gerrit.cloudera.org:8080/23815 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Fang-Yu Rao <fangyu.rao@cloudera.com>	2026-01-17 18:37:48 +00:00
Daniel Vanko	df84e777d6	IMPALA-14321: Add BINARY partition transform to Iceberg tables With this change we add support for IDENTITY, TRUNCATE and BUCKET partition transformation functions with binary parameter to Iceberg tables. Flatbuffer schema has changed, because when reading a string, flatbuffers tries to enforce UTF-8 encoding, which fails for arbitrary binary data. FbIcebergDataFile's raw_partition_fields is an array of ubyte arrays from now on. Testing: - Added TestBinary() in iceberg-functions-test.cc with truncate width edge cases - Extended iceberg-partitioned-insert-*.test files with binary_col partition tests - Verified partition pruning works correctly for BINARY predicates (NumFileMetadataRead metrics) Generated-by: Github Copilot (Claude Sonnet 4.5) Change-Id: I5fd1ef382aa064dad55445dea00fbd39caeca1d3 Reviewed-on: http://gerrit.cloudera.org:8080/23783 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-01-16 14:48:37 +00:00
Surya Hebbar	97d766577d	IMPALA-14680: Improve row regex search syntax in runtime profile tests Currently, the runtime profile tests contain row regex searches which try to find matches by comparing the regex line by line. This form of search is inefficient. So, while updating the tests for the aggregated profile IMPALA-9846, this performance is being improved by accumulating row regexes together, then searching the entire profile at once. In order to support this improvement, we need to correct the current `row_regex` syntax being used. The current tests use greedy regex like "." at the beginning and end of `row_regex` searches. Using greedy regex in this way consumes more resources and is redundant for the current implementation. To fix this, these additional greedy regex characters(i.e. `.`,`.+`) are being removed or replaced across all the runtime profile tests. Change-Id: I1460c2d22b03c06aa43c85f78fa9e05cec2775ec Reviewed-on: http://gerrit.cloudera.org:8080/23864 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2026-01-16 07:11:16 +00:00
Peter Rozsa	a146d91aa7	IMPALA-14666: Fix invalid input handling for aes_decrypt aes_decrypt with AES_128_GCM and AES_256_GCM modes subtracts the AES block size from the length of the input, which produces a negative number if the input text is shorter than the block size. This change adds a check for GCM mode and reports an error if the input is shorter than the block size. Tests: - new test cases added to encryption_exprs_errors.test Change-Id: I8e23c2682b851082479a52d754b74f35fe0734c7 Reviewed-on: http://gerrit.cloudera.org:8080/23839 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-01-12 23:49:47 +00:00
Steve Carlin	83036b13e5	IMPALA-14487: Calcite planner: handle escaped double quote character Added the double quote character to the fix added for IMPALA-13525 to handle escaped characters. Change-Id: Ic65fbb4546ae071a9442c0b4884254c15b268087 Reviewed-on: http://gerrit.cloudera.org:8080/23695 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Peter Rozsa <prozsa@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-01-11 22:28:53 +00:00
Csaba Ringhofer	c96b7b082d	IMPALA-14576, IMPALA-14577: add rewrite rules for geospatial relations Apply rewrites to geospatial relations like st_intersects. 3 rewrites are added: 1. NormalizeGeospatialRelationsRule moves const arguments to the first position (this can be useful as the current Java implementation optimizes const first arguments, see IMPALA-14575): st_intersects(geom_col, ST_Polygon(1,1, 1,4, 4,4, 4,1)) -> st_intersects(ST_Polygon(1,1, 1,4, 4,4, 4,1), geom_col) 2. AddEnvIntersectsRule adds st_envintersects() before relations that can only be true when the bounding rectangles intersect. This is useful as st_envintersects() has a native implementation (IMPALA-14573): st_intersects(geom1, geom2) -> st_envintersects(geom1, geom2) AND st_intersects(geom1, geom2) 3. PointEnvIntersectsRule replaces bounding rect (envelope) intersection on geometries from st_point with predicates directly on coordinates: st_envintersects(CONST_GEOM, st_point(x, y)) -> x >= st_minx(CONST_GEOM) AND y >= st_miny(CONST_GEOM) AND x <= st_maxx(CONST_GEOM) AND y <= st_maxy(CONST_GEOM) Note that AddEnvIntersectsRule is only valid in planar geometry (the relation functions in HIVE_ESRI are all planar). Rules 2 and 3 are not applied if the cost of the child expression is above some threshold. AddEnvIntersectsRule needed a new type of expression rewrite ("non-idempotent rules") that runs rules only once to avoid triggering the rules multiple times on the same input predicate. Other changes: - Changed handling of malformed geometries in c++ functions from error to warning. This is consistent with handling in Java. Change-Id: Id65f646db6f1c89a74253e9ff755c39c400328be Reviewed-on: http://gerrit.cloudera.org:8080/23719 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-01-09 22:55:34 +00:00
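The third rewrite in the IMPALA-14576 entry above (PointEnvIntersectsRule) works because the envelope of a point is a degenerate rectangle, so the envelope intersection test collapses to four coordinate comparisons. The following is a minimal Python sketch of that equivalence in planar geometry. It is a model for illustration, not Impala's implementation: the `Envelope` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Axis-aligned bounding rectangle (hypothetical helper type)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def env_intersects(a: Envelope, b: Envelope) -> bool:
    """Generic test: two rectangles intersect if they overlap on both axes."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def point_envelope(x: float, y: float) -> Envelope:
    """The envelope of a point has identical min and max corners."""
    return Envelope(x, y, x, y)


def rewritten_predicate(const_env: Envelope, x: float, y: float) -> bool:
    """Coordinate-only form modelled on the rewrite in the commit message."""
    return (x >= const_env.min_x and y >= const_env.min_y
            and x <= const_env.max_x and y <= const_env.max_y)


if __name__ == "__main__":
    env = Envelope(1.0, 1.0, 4.0, 4.0)
    for x, y in [(2.0, 3.0), (1.0, 4.0), (0.5, 2.0), (5.0, 5.0)]:
        generic = env_intersects(env, point_envelope(x, y))
        print((x, y), generic, rewritten_predicate(env, x, y))
```

The coordinate form avoids constructing a geometry for every row, which is presumably why the rule only needs the min/max accessors of the constant geometry.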
ttttttz	ee67ede314	IMPALA-12349: Support Apache Hive 2.x in Impala Like IMPALA-10871, this patch adds a MetastoreShim to support Apache Hive 2.x. At build time, one of the three shims is added as source by the fe/pom.xml build plugin, based on the environment variable IMPALA_HIVE_DIST_TYPE. The Hive-related dependencies in fe/pom.xml are selected based on the environment variable IMPALA_HIVE_MAJOR_VERSION. There are some duplicate classes under the compat-apache-hive-2 directory, e.g. fe/src/compat-apache-hive-2/java/org/apache/impala/catalog/events/MetastoreEvents.java duplicates fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java The class in compat-apache-hive-2 is a simplified version that works with Apache Hive 2.x. So we don't need to extract lots of Hive-dependent code in MetastoreEvents.java into the metastore shim. Due to this, the build process simply removes the original source code when building on Apache Hive 2. Additionally, it should be noted that all the code in the fe/src/compat-apache-hive-2/java/org/apache/hadoop/hive directory comes from Apache Hive 3.x, original source: https://github.com/apache/hive/blob/branch-3.1 In order to reduce unnecessary intrusion into the code, all tests are skipped when building with Apache Hive 2.x. To build Impala for Apache Hive 2.x, set the following environment variables before `source bin/impala-config.sh`: export USE_APACHE_COMPONENTS=true export USE_APACHE_HIVE_2=true TODO: - IMPALA-14581: Support testing related to Apache Hive 2 in the minicluster. Testing: - Compile using the -package option to obtain the package. After deployment, perform all types of query tests, including SELECT, INSERT, CREATE TABLE, ALTER TABLE, COMPUTE STATS, etc. In addition, comprehensive testing has been conducted on the metadata auto-synchronization functionality. 
The tests confirm that all event types are supported except for AlterDatabaseEvent, AllocWriteIdEvent, AbortTxnEvent, PseudoAbortTxnEvent, and CommitCompactionEvent. It is worth noting that these unsupported events are not generated in Apache Hive 2, so their lack of processing support does not impact the functionality. Change-Id: Ib5f104dc8d131835b8118b9d54077471db65681c Reviewed-on: http://gerrit.cloudera.org:8080/21760 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-01-06 16:02:01 +00:00
Peter Rozsa	57954760f9	IMPALA-14160: add ugsync-util's jar to Hive's classpath at startup Hive Metastore's Ranger plugin introduced a runtime dependency for Ranger's ugsync-util module, this patch changes the classpath for Hive's startup script by including ugsync-util's jar. Change-Id: If03f99448a871711f8a04ad5eb775e02ecffabd4 Reviewed-on: http://gerrit.cloudera.org:8080/23578 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-01-03 00:29:41 +00:00
Michael Smith	3a5a6f612a	IMPALA-14638: Schedule union of iceberg metadata scanner to coordinator On clusters with dedicated coordinators and executors the Iceberg metadata scanner fragment(s) must be scheduled to coordinators. IMPALA-12809 ensured this for most plans, but if the Iceberg metadata scanner is part of a union of unpartitioned fragments a new fragment is created for the union that subsumes existing fragments and loses the coordinatorOnly flag. Fixes cases where a multi-fragment plan includes a union of iceberg metadata scans by setting coordinatorOnly on the new union fragment. Adds new planner and runtime tests for this case. Change-Id: If2f19945037b4a7a6433cd9c6e7e2b352fae7356 Reviewed-on: http://gerrit.cloudera.org:8080/23803 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-29 15:44:04 +00:00
Zoltan Borok-Nagy	85d77b908b	IMPALA-13756: Fix Iceberg V2 count(*) optimization for complex queries We optimize plain count(*) queries on Iceberg tables the following way. The original plan is an AGGREGATE COUNT(*) over a UNION ALL with two children: a SCAN of all datafiles without deletes, and an ANTI JOIN of a SCAN of datafiles with a SCAN of deletes. It is rewritten to ArithmeticExpr: LHS + RHS, where LHS is the record_count of all datafiles without deletes and RHS is an AGGREGATE COUNT(*) over the ANTI JOIN of a SCAN of datafiles with a SCAN of deletes. This optimization consists of two parts: 1 Rewriting the count(*) expression to count(*) + "record_count" (of data files without deletes) 2 In IcebergScanPlanner we only need to construct the right side of the original UNION ALL operator, i.e. the ANTI JOIN of a SCAN of datafiles with a SCAN of deletes. SelectStmt decides whether we can do the count(*) optimization, and if so, does the following: 1: SelectStmt sets 'TotalRecordsNumV2' in the analyzer, then during the expression rewrite phase the CountStarToConstRule rewrites the count(*) to count(*) + record_count 2: SelectStmt sets "OptimizeCountStarForIcebergV2" in the query context then IcebergScanPlanner creates the plan accordingly. This mechanism works for simple queries, but in complex queries it can turn on the count(*) optimization in IcebergScanPlanner for all Iceberg V2 tables, even if only one subquery enables the count(*) optimization during analysis. This patch changes the following: 1: We introduce IcebergV2CountStarAccumulator which we use instead of the ArithmeticExpr. So after the rewrite we still know if the count(*) optimization should be enabled for the planner. 2: Instead of using the query context, we pass the information to the IcebergScanPlanner via the TableRef object. Testing * e2e tests Change-Id: I1940031298eb634aa82c3d32bbbf16bce8eaf874 Reviewed-on: http://gerrit.cloudera.org:8080/23705 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>	2025-12-19 17:53:50 +00:00
Zoltan Borok-Nagy	6649b92cb2	IMPALA-14635: We should not check for exact file sizes in iceberg-metadata-tables.test The Impala version string is written into the Parquet footer. This means in our tests we shouldn't check for exact file sizes of tables written during data loading/testing. Change-Id: I589ade5f81879ede54ff41466b77b5db3349a14f Reviewed-on: http://gerrit.cloudera.org:8080/23802 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-18 15:58:41 +00:00
Michael Smith	c3dc7f9667	IMPALA-13147: Limit concurrency of link jobs Configure separate compile and link pools for ninja. Configures link parallelism based on expected memory use, which can be reduced by setting IMPALA_MINIMAL_DEBUG_INFO=true or IMPALA_SPLIT_DEBUG_INFO=true. Adds IMPALA_MAKE_CMD to simplify using the ninja build tool for all make operations in scripts. Install ninja on Ubuntu. Adds a '-make' option to buildall.sh to force using 'make'. Adds MOLD_JOBS=1 to avoid overloading the system when trying 'mold' and linking test binaries. However 'mold' is not selected as the default due to test failures around SASL/GSSAPI (see IMPALA-14527). Switches bin/jenkins/all-tests.sh to use ninja and removes the guard in bootstrap_development.sh limiting IMPALA_BUILD_THREADS as it's no longer needed with ninja. SKIP_BE_TEST_PATTERN in run-backend-tests is unused (only used with TARGET_FILESYSTEM=local) so I don't attempt to make it work with ninja. Tested with local 'IMPALA_SPLIT_DEBUG_INFO=true buildall.sh -skiptests' with default (make) and IMPALA_MAKE_CMD=ninja. Change-Id: I0952dc19ace5c9c42bed0d2ffb61499656c0a2db Reviewed-on: http://gerrit.cloudera.org:8080/23572 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Pranav Lodha <pranav.lodha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-15 21:43:07 +00:00
Daniel Vanko	9d112dae23	IMPALA-14536: Fix CONVERT TO ICEBERG to not throw exception on Iceberg tables Previously, running ALTER TABLE <table> CONVERT TO ICEBERG on an Iceberg table produced an error. This patch fixes that, so the statement will do nothing when called on an Iceberg table and return with the message 'Table has already been migrated.' This is achieved by adding a new flag to StatementBase to signal when a statement ends up as a no-op. If the flag is set, the new TStmtType::NO_OP is set as the TExecRequest's type, and noop_result can be used to set the result from the frontend side. Tests: * extended fe and e2e tests Change-Id: I41ecbfd350d38e4e3fd7b813a4fc27211d828f73 Reviewed-on: http://gerrit.cloudera.org:8080/23699 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Peter Rozsa <prozsa@cloudera.com>	2025-12-12 15:35:28 +00:00
Xuebin Su	d54b75ccf1	IMPALA-14619: Reset levels_readahead_ for late materialization Previously, `BaseScalarColumnReader::levels_readahead_` was not reset when the reader did not do page filtering. If a query selected the last row containing a collection value in a row group, `levels_readahead_` would be set and would not be reset when advancing to the next row group without page filtering. As a result, trying to skip collection values at the start of the next row group would cause a check failure. This patch fixes the failure by resetting `levels_readahead_` in `BaseScalarColumnReader::Reset()`, which is always called when advancing to the next row group. `levels_readahead_` is also moved out of the "Members used for page filtering" section as the variable is also used in late materialization. Testing: - Added an E2E test for the fix. Change-Id: Idac138ffe4e1a9260f9080a97a1090b467781d00 Reviewed-on: http://gerrit.cloudera.org:8080/23779 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-12 15:12:50 +00:00
Nandor Kollar	65639f16b9	IMPALA-12330: Allow setting format-version in ALTER TABLE CONVERT TO This change allows modifying the format version table property in ALTER TABLE CONVERT TO statements. It adds verification for the property value too: only 1 or 2 is supported as of now. Change-Id: Iaed207feb83a277a1c2f81dcf58c42f0721c0865 Reviewed-on: http://gerrit.cloudera.org:8080/23721 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Peter Rozsa <prozsa@cloudera.com>	2025-12-12 08:18:08 +00:00
Arnab Karmakar	ddd82e02b9	IMPALA-14065: Support WHERE clause in SHOW PARTITIONS statement This patch extends the SHOW PARTITIONS statement to allow an optional WHERE clause that filters partitions based on partition column values. The implementation adds support for various comparison operators, IN lists, BETWEEN clauses, IS NULL, and logical AND/OR expressions involving partition columns. Non-partition columns, subqueries, and analytic expressions in the WHERE clause are not allowed and will result in an analysis error. New analyzer tests have been added to AnalyzeDDLTest#TestShowPartitions to verify correct parsing, semantic validation, and error handling for supported and unsupported cases. Testing: - Added new unit tests in AnalyzeDDLTest for valid and invalid WHERE clause cases. - Verified functional tests covering partition filtering behavior. Change-Id: I2e2a14aabcea3fb17083d4ad6f87b7861113f89e Reviewed-on: http://gerrit.cloudera.org:8080/23566 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-11 15:36:08 +00:00
Csaba Ringhofer	780e6683a2	IMPALA-14573: port critical geospatial functions to c++ (part 1) This commit contains the simpler parts from https://gerrit.cloudera.org/#/c/20602 This mainly means accessors for the header of the binary format and the bounding box check (st_envIntersects). New tests for not yet covered functions / overloads are also added. For details of the binary format see be/src/exprs/geo/shape-format.h Differences from the PR above: Only a subset of functions is added. The criteria were: 1. the native function must be fully compatible with the Java version* 2. must not rely on (de)serializing the full geometry 3. the function must be tested 1 implies 2 because (de)serialization is not implemented yet in the original patch for >2d geometries, which would break compatibility with the Java version for XYZ/XYM/XYZM geometries. *: there are 2 known differences: 1. NULL handling: the Java functions return an error instead of NULL when getting a NULL parameter 2. st_envIntersects() doesn't check if the SRID matches - the Java library looks inconsistent about this Because the native functions are fairly safe replacements for the Java ones, they are always used when geospatial_library=HIVE_ESRI. Change-Id: I0ff950a25320549290a83a3b1c31ce828dd68e3c Reviewed-on: http://gerrit.cloudera.org:8080/23700 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-06 07:50:23 +00:00
jichen0919	7e29ac23da	IMPALA-14092 Part2: Support querying of paimon data table via JNI This patch mainly implements querying of paimon data tables through a JNI based scanner. Features implemented: - support column pruning. Partition pruning and predicate push down will be submitted as the third part of the patch. We implemented this by treating the paimon table as a normal unpartitioned table. When querying a paimon table: - PaimonScanNode decides which paimon splits need to be scanned, and then transfers the splits to the BE to do the jni-based scan operation. - We also collect the required columns that need to be scanned, and pass the columns to the Scanner for column pruning. This is implemented by passing the field ids of the columns to the BE, instead of column positions, to support schema evolution. - In the original implementation, PaimonJniScanner directly passed the paimon row object to the BE, and the corresponding paimon row field accessor, which is a java method, was called to convert row fields to impala row batch tuples. We found it slow due to the overhead of JVM method calls. To minimize the overhead, we reworked the implementation: PaimonJniScanner converts the paimon row batches to an arrow recordbatch, which stores data in the offheap region of the impala JVM, and passes the arrow offheap record batch memory pointer to the BE. The BE PaimonJniScanNode directly reads data from the JVM offheap region and converts the arrow record batch to an impala row batch. The benchmark shows the latter implementation is 2.x better than the original implementation. The lifecycle of the arrow row batch is mainly like this: the arrow row batch is generated in the FE and passed to the BE. After the record batch is imported to the BE successfully, the BE is in charge of freeing the row batch. There are two free paths: the normal path, and the exception path. For the normal path, when the arrow batch is totally consumed by the BE, the BE will call jni to fetch the next arrow batch. 
For this case, the arrow batch is freed automatically. The exceptional path happens when the query is cancelled or memory allocation fails. For these corner cases, the arrow batch is freed in the close method if it is not totally consumed by the BE. Currently supported impala data types for queries include: - BOOLEAN - TINYINT - SMALLINT - INTEGER - BIGINT - FLOAT - DOUBLE - STRING - DECIMAL(P,S) - TIMESTAMP - CHAR(N) - VARCHAR(N) - BINARY - DATE TODO: - Patches pending submission: - Support tpcds/tpch data-loading for paimon data table. - Virtual Column query support for querying paimon data table. - Query support with time travel. - Query support for paimon meta tables. - WIP: - Snapshot incremental read. - Complex type query support. - Native paimon table scanner, instead of jni based. Testing: - Create tests table in functional_schema_template.sql - Add TestPaimonScannerWithLimit in test_scanners.py - Add test_paimon_query in test_paimon.py. - Already passed the tpcds/tpch tests for paimon tables. The test table data is currently generated by Spark, which is not supported by Impala now; we have to do this since Hive doesn't support generating paimon tables for dynamic-partitioned tables. We plan to submit a separate patch for tpcds/tpch data loading and associated tpcds/tpch query tests. - JVM offheap memory leak tests: looped tpch tests were run for 1 day, no obvious offheap memory increase was observed, and offheap memory usage stayed within 10M. Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384 Reviewed-on: http://gerrit.cloudera.org:8080/23613 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-12-05 18:19:57 +00:00
ttttttz	5d1f1e0180	IMPALA-14183: Rename the environment variable USE_APACHE_HIVE to USE_APACHE_HIVE_3 When the environment variable USE_APACHE_HIVE is set to true, build Impala for adapting to Apache Hive 3.x. In order to better distinguish it from Apache Hive 2.x later, rename USE_APACHE_HIVE to USE_APACHE_HIVE_3. Additionally, to facilitate referencing different versions of the Hive MetastoreShim, the major version of Hive has been added to the environment variable IMPALA_HIVE_DIST_TYPE. Change-Id: I11b5fe1604b6fc34469fb357c98784b7ad88574d Reviewed-on: http://gerrit.cloudera.org:8080/21724 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-03 13:38:45 +00:00
Peter Rozsa	d67ab6f11f	IMPALA-14569: (addendum) Fix 'partitions' row matching IMPALA-14569 introduced a test that asserts for a profile row like 'HDFS partitions' and it's possible for test environments to run on a different storage system. This change omits the storage type from the row_regex. Change-Id: If9b223f2be2dfe7be8724423fefdfb56ffeeba6e Reviewed-on: http://gerrit.cloudera.org:8080/23727 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Riza Suminto <riza.suminto@cloudera.com>	2025-12-01 23:06:47 +00:00
Peter Rozsa	6cf21464b4	IMPALA-14569: Fix IllegalStateException in partition pruning on type mismatch This fixes an IllegalStateException in HdfsPartitionPruner when evaluating 'IN' predicates whose operands consist of two compatible types, for example DATE and STRING: date_col in (<date as string>). Previously, 'canEvalUsingPartitionMd' did not check if the slot type matched the literal type. This caused the frontend to attempt invalid comparisons via 'LiteralExpr.compareTo', leading to IllegalStateException or incorrect pruning. The fix ensures 'canEvalUsingPartitionMd' returns false on type mismatches, deferring evaluation to the backend where proper casting occurs. Testing: - Added regression test in hdfs-partition-pruning.test. Change-Id: Idc226a628c8df559329a060cb963b81e27e21eda Reviewed-on: http://gerrit.cloudera.org:8080/23706 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-27 02:48:28 +00:00
jasonmfehr	2ac5a24dc0	IMPALA-14455: Cleanup OpenTelemetry Tracing Startup Flags Fixes several issues with the OpenTelemetry tracing startup flags: 1. otel_trace_beeswax -- Removes this hidden flag which enabled tracing of queries submitted over Beeswax. Since this protocol is deprecated and no tests assert the traces generated by Beeswax queries, this flag was removed to eliminate an extra check when determining if OpenTelemetry tracing should be enabled. 2. otel_trace_tls_minimum_version -- Fixes parsing of this flag's value. This flag is in the format "tlsv1.2" or "tlsv1.3", but the OpenTelemetry C++ SDK expects the minimum TLS version to be in the format "1.2" or "1.3". The code now removes the "tlsv" prefix before passing the value to the OpenTelemetry C++ SDK. 3. otel_trace_tls_insecure_skip_verify -- Fixes the guidance to only set this flag to true in dev/testing. Adds ctest tests for the functions that configure the TraceProvider singleton to ensure startup flags are correctly parsed and applied. Modifies the http_exporter_config and init_otel_tracer function signatures in otel.cc to return the actual object they create instead of a Status since these functions only ever returned OK. Updates the OpenTelemetry collector docker-compose file to support the collector receiving traces over both HTTP and HTTPS. This setup is used to manually smoke test the integration from Impala to an OpenTelemetry collector. Change-Id: Ie321fa37c0fd260f783dc6cf47924d53a06d82ea Reviewed-on: http://gerrit.cloudera.org:8080/23440 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-11-24 23:46:57 +00:00
Daniel Vanko	3d22c7fe05	IMPALA-12209: Always include format-version in DESCRIBE FORMATTED and SHOW CREATE TABLE for Iceberg tables HiveCatalog does not include format-version for Iceberg tables in the table's parameters, therefore the output of SHOW CREATE TABLE may not replicate the original table. This patch makes sure to add it to both the SHOW CREATE TABLE and DESCRIBE FORMATTED/EXTENDED output. Additionally, adds ICEBERG_DEFAULT_FORMAT_VERSION variable to E2E tests, deducing it from the IMPALA_ICEBERG_VERSION environment variable. If the Iceberg version is at least 1.4, the default format-version is 2; before 1.4 it's 1. This way tests can work with multiple Iceberg versions. Testing: * updated show-create-table.test and show-create-table-with-stats.test for Iceberg tables * added format-version checks to multiple DESCRIBE FORMATTED tests Change-Id: I991edf408b24fa73e8a8abe64ac24929aeb8e2f8 Reviewed-on: http://gerrit.cloudera.org:8080/23514 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-24 21:48:17 +00:00
Csaba Ringhofer	f6ceca2b4d	IMPALA-14571: increase planner cost of java functions The main motivation is to evaluate expensive geospatial functions (which are Java functions) last in predicates. Java functions have a major overhead anyway from the JNI call, so bumping all Java function costs seems beneficial. Note that currently geospatial functions are the only built-in Java functions. Change-Id: I11d1652d76092ec60af18a33502dacc25b284fcc Reviewed-on: http://gerrit.cloudera.org:8080/22733 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-24 16:52:59 +00:00
Csaba Ringhofer	f12bb87d42	IMPALA-14081: (addendum) add ';' to CREATE part in dataload The missing ';' can cause problems for the next created table. Change-Id: I719872de23941bf81289340ce246d25ee113223a Reviewed-on: http://gerrit.cloudera.org:8080/23704 Reviewed-by: Daniel Vanko <dvanko@cloudera.com> Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2025-11-21 12:29:48 +00:00
Steve Carlin	54c0074b33	IMPALA-14405 ADDENDUM: Catch exception for bad column names This commit is a fix on top of IMPALA-14405 for the Calcite planner. The original commit matches column names from the expression in the select clause. For instance, if the query is "select 1 + 1", the label in impala-shell will be "1 + 1". It accomplished this by retrieving the string from the SqlNode object through the MySql dialect. However, when the expression doesn't succeed in the MySql dialect, an AssertionError gets thrown, causing the query to fail. We don't want the query to fail, we just want to go back to using the Calcite expression, e.g. EXPR$0. This occurred with this specific query: "select timestamp_col + interval 3 nanoseconds" So now the exception is caught and the default label name is used. Eventually we should try to match what Impala has, but this is a harder problem to fix. Change-Id: I6c4d76a25fb2486eb1ef19485bce7888d45d282f Reviewed-on: http://gerrit.cloudera.org:8080/23665 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2025-11-18 21:34:29 +00:00
Arnab Karmakar	a2a11dec62	IMPALA-13263: Add single-argument overload for ST_ConvexHull() Implemented a single-argument version of ST_ConvexHull() to align with PostGIS behavior and simplify usage across geometry types. Testing: Added new tests in test_geospatial_functions.py for ST_ConvexHull(), which previously had no test coverage, to verify correctness across supported geometry types. Change-Id: Idb17d98f5e75929ec0143aa16195a84dd6e50796 Reviewed-on: http://gerrit.cloudera.org:8080/23604 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2025-11-18 10:26:04 +00:00
Steve Carlin	52334ba426	IMPALA-14421: Calcite planner: case statement returning wrong types for char, varchar The 'case' function resolver in the original Impala planner has a quirk in it which caused issues in the Calcite planner. The function resolver for the original planner resolves all case statements with the "boolean" version. Later on, in the analysis of the CaseExpr, the proper types are assessed and the necessary casting is added. The Calcite planner follows a similar path. The resolver always returns boolean as well and the coerce nodes module determines the proper return type for the case statement. Two other related issues are also fixed here: Literal strings should be treated as type STRING instead of CHAR(X), but a null literal should not be changed from a CHAR(x) to a STRING. This broke a 'case' test in the test framework where the columns were non-literals with type char(x), and the return value was a "null" which should not have forced a cast to string. A cast from a varchar to a varchar should be ignored. Testing: Added a test to calcite.test. Ensured the existing cast test in test_chars.py passed. Ran through the Jenkins Calcite testing framework. Change-Id: I82d657f4bfce432c458ee8198188dadf9f23f2ef Reviewed-on: http://gerrit.cloudera.org:8080/23560 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-18 07:47:39 +00:00
Riza Suminto	f2243b76b5	IMPALA-14557: Fix flaky test_show_files_partition TestIcebergTable.test_show_files_partition is unstable because files are alphanumerically sorted and the order between a random UUID and "delete-*" is not guaranteed. This patch fixes the flakiness by specifying VERIFY_IS_SUBSET and using a negative lookahead on the word "delete" to detect valid Iceberg data files. Testing: - Loop and pass test_show_files_partition 50 times. Before, it could fail in fewer than 10 loops. Change-Id: I6243585a5b7ab7cf7c95d5a9530ce2f2825c550e Reviewed-on: http://gerrit.cloudera.org:8080/23680 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-11-17 17:13:19 +00:00
Michael Smith	166b39547e	IMPALA-14553: Run schema eval concurrently The majority of time spent in generate-schema-statements.py is in eval_section for schema operations that shell out, often uploading files via the hadoop CLI or generating data files. These operations should be independent. Runs eval_section at the beginning so we don't repeat it for each row in test_vectors, and executes them in parallel via a ThreadPool. Defaults to NUM_CONCURRENT_TESTS threads because the underlying operations have some concurrency to them (such as HDFS mirroring writes). Also collects existing tables into a set to optimize lookup. Reduces generate-schema-statements by ~60%, from 2m30s to 1m. Confirmed that contents of logs/data_loading/sql/functional are identical. Change-Id: I2a78d05fd6a0005c83561978713237da2dde6af2 Reviewed-on: http://gerrit.cloudera.org:8080/23627 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-11-17 16:34:22 +00:00
Steve Carlin	bc99705252	IMPALA-13902: Calcite planner: Implement is_spool_query_results The is_spool_query_results query option is now supported in Calcite. The returnAtMostOneRow method is now implemented to support this. PlanRootSink is refactored to extract sanitizing query options (a new method sanitizeSpoolingOptions()) out of PlanRootSink.computeResourceProfile(). The bulk of the memory bounding calculation is also extracted out to a new class SpoolingMemoryBound. Added "sleep" in ImpalaOperatorTable.java since some EE tests related to result spooling call the sleep() function. Changed ImpalaPlanRel to extend the RelNode interface. A sanity test has been added to calcite.test, but the bulk of the testing will be done through the Impala test framework when it is enabled. Testing: - Pass FE tests PlannerTest#testResultSpooling, TpcdsCpuCostPlannerTest, and all java tests under calcite-planner project. - Pass query_test/test_result_spooling.py and custom_cluster/test_result_spooling.py. Co-authored-by: Riza Suminto Change-Id: I5b9bf49e2874ee12de212b892bd898c296774c6f Reviewed-on: http://gerrit.cloudera.org:8080/23562 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-16 02:33:02 +00:00
Riza Suminto	898e03e9d5	IMPALA-14552: (addendum) Fix bad testcase in show-create-table.test The original IMPALA-14552 patch passed precommit tests before IMPALA-12893: (part 2) (`275f03f`) merged. As a consequence, it did not catch a missing comma in the updated show-create-table.test. This patch adds that missing comma. Testing: Pass metadata/test_show_create_table.py Change-Id: Ib06e690a81e6b0ca483b3647cc59c73802a0a7b7 Reviewed-on: http://gerrit.cloudera.org:8080/23673 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-15 21:34:44 +00:00
Mihaly Szjatinya	087b715a2b	IMPALA-14108: Add support for SHOW FILES IN table PARTITION for Iceberg tables This patch implements partition filtering support for the SHOW FILES statement on Iceberg tables, based on the functionality added in IMPALA-12243. Prior to this change, the syntax resulted in a NullPointerException. Key changes: - Added ShowFilesStmt.analyzeIceberg() to validate and transform partition expressions using IcebergPartitionExpressionRewriter and IcebergPartitionPredicateConverter. After that, it collects matching file paths using IcebergUtil.planFiles(). - Added FeIcebergTable.Utils.getIcebergTableFilesFromPaths() to accept pre-filtered file lists from the analysis phase. - Enhanced TShowFilesParams thrift struct with optional selected_files field to pass pre-filtered file paths from frontend to backend. Testing: - Analyzer tests for negative cases: non-existent partitions, invalid expressions, non-partition columns, unsupported transforms. - Analyzer tests for positive cases: all transform types, complex expressions. - Authorization tests for non-filtered and filtered syntaxes. - E2E tests covering every partition transform type with various predicates. - Schema evolution and rollback scenarios. The implementation follows AlterTableDropPartition's pattern where the analysis phase performs validation/metadata retrieval and the execution phase handles result formatting and display. Change-Id: Ibb9913e078e6842861bdbb004ed5d67286bd3152 Reviewed-on: http://gerrit.cloudera.org:8080/23455 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-14 21:43:10 +00:00
Zoltan Borok-Nagy	275f03f10d	IMPALA-12893: (part 2): Upgrade Iceberg to version 1.5.2 This patch updates CDP_BUILD_NUMBER to 71942734 in order to upgrade Iceberg to 1.5.2. This patch updates some tests so they pass with Iceberg 1.5.2. The behavior changes of Iceberg 1.5.2 are (compared to 1.3.1): * Iceberg V2 tables are created by default * Metadata tables have different schema * Parquet compression is explicitly set for new tables (even for ORC tables) * Sequence numbers are assigned a bit differently Updated the tests where needed. Code changes to accommodate the above behavior changes: * SHOW CREATE TABLE adds 'format-version'='1' for Iceberg V1 tables * CREATE TABLE statements don't throw errors when Parquet compression is set for ORC tables Change-Id: Ic4f9ed3f7ee9f686044023be938d6b1d18c8842e Reviewed-on: http://gerrit.cloudera.org:8080/23670 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-14 01:27:45 +00:00
Joe McDonnell	5f91838ada	IMPALA-14545: Don't use absolute hdfs paths for JDBC table driver.url After IMPALA-13661 merged, S3PlannerTest.testDataSourceTables has been failing with an error trying to fetch the JDBC driver for functional.jdbc_decimal_tbl. This particular table's definition uses a path like 'hdfs://localhost:20500/test-warehouse/...' which explicitly depends on HDFS rather than relying on the default filesystem. Changing this to use a path like '/test-warehouse/...' without the HDFS dependency fixes the S3PlannerTest. This changes create-ext-data-source-table.sql to a template using WAREHOUSE_LOCATION_PREFIX and replaces that variable before executing it. This is important for Ozone, as Ozone uses a WAREHOUSE_LOCATION_PREFIX set to the Ozone volume. Testing: - Ran S3 and regular HDFS fe tests Change-Id: I3f2c86fcc6c1dee75d7d9a9be04468cb197ae13c Reviewed-on: http://gerrit.cloudera.org:8080/23658 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-12 22:17:44 +00:00
Arnab Karmakar	760eb4f2fa	IMPALA-13066: Extend SHOW CREATE TABLE to include stats and partitions Adds a new WITH STATS option to the SHOW CREATE TABLE statement to emit additional SQL statements for recreating table statistics and partitions. When specified, Impala outputs: - Base CREATE TABLE statement. - ALTER TABLE ... SET TBLPROPERTIES for table-level stats. - ALTER TABLE ... SET COLUMN STATS for all non-partition columns, restoring column stats. - For partitioned tables: - ALTER TABLE ... ADD PARTITION statements to recreate partitions. - Per-partition ALTER TABLE ... PARTITION (...) SET TBLPROPERTIES to restore partition-level stats. Partition output is limited by the PARTITION_LIMIT query option (default 1000). Setting PARTITION_LIMIT=0 includes all partitions and emits a warning if the limit is exceeded. Tests added to verify correctness of emitted statements. Default behavior of SHOW CREATE TABLE remains unchanged for compatibility. Change-Id: I87950ae9d9bb73cb2a435cf5bcad076df1570dc2 Reviewed-on: http://gerrit.cloudera.org:8080/23536 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-12 06:11:37 +00:00
Xuebin Su	6b6f7e614d	IMPALA-14472: Add create/read support for ARRAY column of Kudu An initial implementation of KUDU-1261 (array column type) recently merged in the upstream Apache Kudu repository. This patch adds initial Impala support for working with Kudu tables having array type columns. Unlike rows, the elements of a Kudu array are stored in a different format than Impala. Instead of a per-row bit flag for NULL info, values and NULL bits are stored in separate arrays. The following types of queries are not supported in this patch: - (IMPALA-14538) Queries that reference an array column as a table, e.g. ```sql SELECT item FROM kudu_array.array_int; ``` - (IMPALA-14539) Queries that create duplicate collection slots, e.g. ```sql SELECT array_int FROM kudu_array AS t, t.array_int AS unnested; ``` Testing: - Add some FE tests in AnalyzeDDLTest and AnalyzeKuduDDLTest. - Add EE test test_kudu.py::TestKuduArray. Since Impala does not support inserting complex types, including array, the data insertion part of the test is achieved through custom C++ code kudu-array-inserter.cc that inserts into Kudu via the Kudu C++ client. It would be great if we could migrate it to Python so that it can be moved to the same file as the test (IMPALA-14537). - Pass core tests. Co-authored-by: Riza Suminto Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310 Reviewed-on: http://gerrit.cloudera.org:8080/23493 Reviewed-by: Alexey Serbin <alexey@apache.org> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-08 06:41:07 +00:00
Riza Suminto	671a7fcada	IMPALA-14529: (addendum) Fix kudu_create.test Kudu throws a different error message after IMPALA-14529. This patch adjusts the error message in kudu_create.test to let the test pass. Testing: Pass TestDdlStatements.test_create_kudu and TestKuduHMSIntegration.test_create_managed_kudu_tables. Change-Id: Iff4cd08f46626d03b1f0800828e5872b83f522ca Reviewed-on: http://gerrit.cloudera.org:8080/23648 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-11-06 22:42:34 +00:00
Steve Carlin	62bf609942	IMPALA-14414: Calcite planner: Added new code to handle nan/inf The current code works for NaN and Inf, but it breaks when upgrading to v1.40. This commit changes the code to handle these when we do the upgrade to 1.40 and adds a basic test into the calcite.test to ensure that when the upgrade happens, it does not break. Change-Id: I8593a4942a2fe785a0c77134b78a9d97257225fc Reviewed-on: http://gerrit.cloudera.org:8080/23561 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-05 12:55:39 +00:00
Riza Suminto	f34dea9b6f	IMPALA-14522: Fix test_paimon_show_stats after DST ends Test failed due to a mismatch on "Last Creation Time" matching. This patch fixes the assertion with a simple regex. Testing: Pass test_paimon.py. Change-Id: I6855c0014111cef18318cdc4904782097a070ced Reviewed-on: http://gerrit.cloudera.org:8080/23619 Reviewed-by: Mihaly Szjatinya <mszjat@pm.me> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-03 21:25:42 +00:00
jichen0919	541fb3f405	IMPALA-14092 Part1: Prohibit Unsupported Operation for paimon table This patch prohibits unsupported operations against paimon tables. All unsupported operations are checked in the analyze stage in order to avoid mis-operation. Currently only the CREATE/DROP statements are supported; the prohibition will be removed later after the corresponding operation is truly supported. TODO: - Patches pending submission: - Support jni based query for paimon data table. - Support tpcds/tpch data-loading for paimon data table. - Virtual Column query support for querying paimon data table. - Query support with time travel. - Query support for paimon meta tables. Testing: - Add unit test for AnalyzeDDLTest.java. - Add unit test for AnalyzerTest.java. - Add test_paimon_negative and test_paimon_query in test_paimon.py. Change-Id: Ie39fa4836cb1be1b1a53aa62d5c02d7ec8fdc9d7 Reviewed-on: http://gerrit.cloudera.org:8080/23530 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-23 23:06:08 +00:00
Michael Smith	ea0ef5a799	IMPALA-14511: Fix pgrep to avoid warning kill-all.sh tries to find a process named mini-impalad-cluster, which results in an (ignored) error: pgrep: pattern that searches for process name longer than 15 characters will result in zero matches Try `pgrep -f' option to match against the complete command line. This was accidentally changed from mini-impala-cluster in 2015. Neither term is used anymore, so this process name will never exist. Remove it to fix the error. Change-Id: Id1340e85cbcd3b699b333316da618774cb4e9dcd Reviewed-on: http://gerrit.cloudera.org:8080/23586 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-10-23 22:00:36 +00:00
pranav.lodha	7f77176970	IMPALA-13869: Support for 'hive.sql.query' property for Hive JDBC tables This patch adds support for the hive.sql.query table property in Hive JDBC tables accessed through Impala. Impala has support for Hive JDBC tables using the hive.sql.table property, which limits users to simple table access. However, many use cases demand the ability to expose complex joins, filters, aggregations, or derived columns as external views. The hive.sql.query property specifies a custom SQL query that returns a virtual table (subquery) instead of pointing to a physical table. These use cases cannot be achieved with just the hive.sql.table property. This change allows Impala to: • Interact with views or complex queries defined on external systems without needing schema-level access to base tables. • Expose materialized logic (such as filters, joins, or transformations) via Hive to Impala consumers in a secure, abstracted way. • Better align with data virtualization use cases where physical data location and structure should be hidden from the querying engine. This patch also lays the groundwork for future enhancements such as predicate pushdown and performance optimizations for Hive JDBC tables backed by queries. Testing: End-to-end tests are included in test_ext_data_sources.py. Change-Id: I039fcc1e008233a3eeed8d09554195fdb8c8706b Reviewed-on: http://gerrit.cloudera.org:8080/22865 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-23 21:34:29 +00:00

3226 Commits