impala

mirror of https://github.com/apache/impala.git synced 2026-02-03 09:00:39 -05:00

Author	SHA1	Message	Date
Steve Carlin	9220b9699f	IMPALA-14710: Fixed flaky test in TestReduceExprShuttle The charset system property is set in CalciteCompilerFactory. It seems that if this test is run via mvn clean install test ...it runs fine, but if it is called via mvn clean install -Dtest=TestReduceExprShuttle#testFoldConcatString ... the static initializer isn't called. This could be fixed either by importing CalciteCompilerFactory or by explicitly adding the static initializer to this class. The latter was chosen because it would be awkward to import a Java class only for its static initializer. The downside is that this duplicates code. Change-Id: Iecb124f43bd7090411bdf1bb8203c15d75158154 Reviewed-on: http://gerrit.cloudera.org:8080/23919 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2026-01-30 00:10:50 +00:00
Steve Carlin	593b0bfad3	IMPALA-13712: Calcite Planner - Enable constant folding Constant folding is enabled by this patch. Calcite does constant folding via the RexExecutor.reduce() method. However, we need to use Impala's constant folding algorithm to ensure that Impala expressions are folded. This is done through the derived class ImpalaRexExecutor, which is called from the Simplify rules. The ImpalaRexExecutor calls an internal shuttle class that recursively walks the RexNode tree and checks whether portions of the expression can be constant folded. Some expressions are not folded for various reasons: - We avoid folding 'cast(1.2 as double)' type expressions because folding them produces an inexact number, which is problematic for partition pruning on double columns, whose directory names contain the exact number (1.2 in this case). - Interval expressions are skipped temporarily since the generated Expr class is not meant to be simplified. However, an Expr object that contains an IntervalExpr may be simplified. A special case needed to be handled for a VALUES query with differently sized arguments across rows. In Calcite version 1.40 (not yet upgraded as of this commit), an extra cast is added around smaller strings to ensure the char(x) is the same size across all rows. However, this adds extra spaces to the string, which produces results that differ from the original Impala planner. This must be caught before Calcite converts the abstract syntax tree into a RelNode logical tree. A special RexExecutor has been created to handle this; it looks for char casts around a char literal and removes them. This is fine because the literal will be changed into a string in the "coercenodes" module. 
Change-Id: I98c21ef75b2f5f8e3390ff5de5fdf45d9645b326 Reviewed-on: http://gerrit.cloudera.org:8080/23723 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-01-28 20:32:12 +00:00
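The folding behavior described in IMPALA-13712 can be sketched on a toy expression tree. This is a minimal Python sketch, not Impala's ImpalaRexExecutor: the `Lit`/`Col`/`Call` classes, the operator names, and `fold` are hypothetical. It folds constant sub-expressions bottom-up and, as the commit describes, leaves `cast(<decimal literal> as double)` unfolded, because the resulting double is not exactly 1.2.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Union


@dataclass(frozen=True)
class Lit:
    value: object  # a constant, e.g. 1, "ab", Decimal("1.2")


@dataclass(frozen=True)
class Col:
    name: str  # a column reference; never foldable


@dataclass(frozen=True)
class Call:
    op: str
    args: tuple


Expr = Union[Lit, Col, Call]

# Hypothetical operator table: op name -> Python evaluator.
OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "concat": lambda a, b: a + b,
    "cast_double": lambda a: float(a),
}


def fold(e: Expr) -> Expr:
    """Recursively fold constant sub-expressions, children first."""
    if not isinstance(e, Call):
        return e
    args = tuple(fold(a) for a in e.args)
    # Skip cast(<exact decimal literal> as double): folding would replace the
    # exact literal with an inexact binary double.
    if (e.op == "cast_double" and isinstance(args[0], Lit)
            and isinstance(args[0].value, Decimal)):
        return Call(e.op, args)
    if e.op in OPS and all(isinstance(a, Lit) for a in args):
        return Lit(OPS[e.op](*(a.value for a in args)))
    return Call(e.op, args)
```

For example, `fold(Call("+", (Col("x"), Call("*", (Lit(2), Lit(3))))))` folds only the constant product and returns `Call("+", (Col("x"), Lit(6)))`. Python's `Decimal(1.2)` shows that the double nearest to 1.2 is not exactly 1.2, which is why the cast is left alone.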
Steve Carlin	37a8007df0	IMPALA-14434: Calcite planner: implement partition key scan optimization Implemented the partition key scan optimization for the Calcite planner. Most of the code was already in place; it just needed some refactoring in SingleNodePlanner to make it callable from Calcite, plus use of the already created isPartitionKeyScan method in ImpalaHdfsScanRel. Testing was done by running the Impala e2e tests with the use_calcite_planner flag set to true. Change-Id: I7b5b8a8115f65f6be27a5be0e19f21eebab61a32 Reviewed-on: http://gerrit.cloudera.org:8080/23691 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2026-01-27 12:47:12 +00:00
Steve Carlin	2360a06e4a	IMPALA-14525: Calcite planner: Add support for RexSimplify RexSimplify is a class in Calcite that simplifies expressions into more optimal forms. It was disabled up until this point because it converts IN clauses into a Calcite internal SEARCH object, which isn't directly supported by Impala. This commit brings back the RexSimplify class. The SEARCH operator is now converted into an IN operator when RexNode objects are changed into Expr objects. Some notes about the changes that had to be made: - Some small refactoring needed to be done in the Impala Expr objects. - RexSimplify is very strict about operator nullability, asserting on it when certain operators are checked. Logic in the CoerceOperandShuttle now ensures the nullability is set correctly. - Some duplicated logic at line 148 in CoerceOperandShuttle was removed (the same logic already exists in getReturnType). - The AnalyzedInPredicate subclass was created to avoid the analysis done in InPredicate. - Removed ImpalaRexBuilder logic which avoided creation of the SEARCH op. - Created ImpalaRexSimplify, which extends RexSimplify. RexSimplify causes regressions with NaN on comparisons with Double. For instance, "where not(my_col > 30)" changes to "where my_col <= 30": the first expression returns true when my_col is NaN and the second expression returns false. So ImpalaRexSimplify looks for any binary comparison operator involving Double and avoids the simplification. - Added ImpalaRexUtil, which copies the RexUtil.expandSearch() method that converts the SEARCH operator into non-search operators. The version here handles the conversion to the custom Impala IN operator. - Created an ImpalaCoreRules class. Even though RexSimplify is supported, it is important that it runs through ImpalaRexSimplify, so RexSimplify is disabled for the SqlToRelNode converter and for all rules given by Calcite. 
ImpalaCoreRules also has the benefit of providing one place to find all the rules used by Impala. - Created simplify rules for the filter condition and for the expressions in the project. - Changed the FilterSelectivityEstimator to compute the selectivity for the SEARCH operator. - Added a couple of rules in the optimizer for a bug that was exposed when enabling the SEARCH operator. PROJECT_JOIN_TRANSPOSE was removed because it did not serve any purpose, as JOIN_PROJECT is transposed in the join phase. Some other rules, such as JOIN_DERIVE_IS_NOT_NULL_FILTER and JOIN_PUSH_EXPRESSIONS, were added to help with predicate pushdown. The Simplifier rules have also been added. - Some of the new rules caused many changes in the cardinality and memory estimates. One noticeable change was using IsNullPredicate for the IS_NULL and IS_NOT_NULL operators. Previously, these functions used FunctionCallExpr, and the cardinality estimation was far off. - Fixed a small bug in RexLiteralConverter where a string literal was treated as a VARCHAR. A string literal should always be treated as a STRING. Change-Id: I44792688f361bf15affa565e5de5709f64dcf18c Reviewed-on: http://gerrit.cloudera.org:8080/23679 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Aman Sinha <amsinha@cloudera.com>	2026-01-26 00:29:12 +00:00
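The NaN regression described in IMPALA-14525 can be reproduced with plain IEEE-754 comparisons. This Python sketch evaluates both predicate forms and adds a hypothetical guard in the spirit of ImpalaRexSimplify; the function names and the type-name strings are assumptions, and treating FLOAT like DOUBLE is this sketch's own assumption (the commit only mentions Double).

```python
import math


def original_pred(x: float, c: float) -> bool:
    """WHERE NOT (my_col > c), evaluated with IEEE-754 comparisons."""
    return not (x > c)


def simplified_pred(x: float, c: float) -> bool:
    """WHERE my_col <= c, the form a NaN-unaware simplifier would produce."""
    return x <= c


# Hypothetical guard: only allow the NOT(a > b) -> a <= b rewrite when the
# compared type cannot hold NaN. Including FLOAT is an assumption.
NAN_CAPABLE_TYPES = {"DOUBLE", "FLOAT"}


def can_rewrite_negated_comparison(operand_type: str) -> bool:
    return operand_type.upper() not in NAN_CAPABLE_TYPES


if __name__ == "__main__":
    for x in (10.0, 50.0, math.nan):
        print(x, original_pred(x, 30.0), simplified_pred(x, 30.0))
    # 10.0 True True / 50.0 False False / nan True False
```

The two forms agree on every ordinary value and diverge only for NaN, because every ordered comparison with NaN is false; that is why a type-based guard, rather than a value check, is needed at planning time.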
Steve Carlin	83036b13e5	IMPALA-14487: Calcite planner: handle escaped double quote character Added the double quote character to the fix added for IMPALA-13525 to handle escaped characters. Change-Id: Ic65fbb4546ae071a9442c0b4884254c15b268087 Reviewed-on: http://gerrit.cloudera.org:8080/23695 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Peter Rozsa <prozsa@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-01-11 22:28:53 +00:00
ttttttz	ee67ede314	IMPALA-12349: Support Apache Hive 2.x in Impala Like IMPALA-10871, this patch adds a MetastoreShim to support Apache Hive 2.x. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the three shims is added as a source directory by a build plugin in fe/pom.xml. The Hive-related dependencies in fe/pom.xml are selected based on the environment variable IMPALA_HIVE_MAJOR_VERSION. There are some duplicate classes under the compat-apache-hive-2 directory, e.g. fe/src/compat-apache-hive-2/java/org/apache/impala/catalog/events/MetastoreEvents.java duplicates fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java. The class in compat-apache-hive-2 is a simplified version that works with Apache Hive 2.x, so we don't need to extract a lot of Hive-dependent code from MetastoreEvents.java into the metastore shim. Because of this, the build process simply removes the original source file when building on Apache Hive 2. Additionally, note that all the code in the fe/src/compat-apache-hive-2/java/org/apache/hadoop/hive directory comes from Apache Hive 3.x (original source: https://github.com/apache/hive/blob/branch-3.1). To avoid unnecessary intrusion into the code, all tests are skipped when building with Apache Hive 2.x. To build Impala for Apache Hive 2.x, set the following environment variables before running `source bin/impala-config.sh`: `export USE_APACHE_COMPONENTS=true` and `export USE_APACHE_HIVE_2=true`. TODO: - IMPALA-14581: Support testing related to Apache Hive 2 in the minicluster. Testing: - Compiled using the -package option to obtain the package. After deployment, ran all types of query tests, including SELECT, INSERT, CREATE TABLE, ALTER TABLE, COMPUTE STATS, etc. In addition, comprehensive testing has been conducted on the metadata auto-synchronization functionality. 
The tests confirm that all event types are supported except for AlterDatabaseEvent, AllocWriteIdEvent, AbortTxnEvent, PseudoAbortTxnEvent, and CommitCompactionEvent. It is worth noting that these unsupported events are not generated in Apache Hive 2, so their lack of processing support does not impact the functionality. Change-Id: Ib5f104dc8d131835b8118b9d54077471db65681c Reviewed-on: http://gerrit.cloudera.org:8080/21760 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2026-01-06 16:02:01 +00:00
Riza Suminto	d4992d532b	Revert "IMPALA-14454: Exclude log4j 2 dependencies" This reverts commit `52b87fcefd`. The original commit caused an issue when Impala is deployed together with Apache Atlas: the coordinator failed to start with the error message: java.lang.NoClassDefFoundError: org/apache/logging/log4j/core/Layout Resolved a minor conflict in impala-config.sh caused by IMPALA-14478 being applied after IMPALA-14454. Change-Id: I77127db8d833c675c18c30eb3d6542ca906cd2a9 Reviewed-on: http://gerrit.cloudera.org:8080/23788 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-16 00:26:34 +00:00
Arnab Karmakar	ddd82e02b9	IMPALA-14065: Support WHERE clause in SHOW PARTITIONS statement This patch extends the SHOW PARTITIONS statement to allow an optional WHERE clause that filters partitions based on partition column values. The implementation adds support for various comparison operators, IN lists, BETWEEN clauses, IS NULL, and logical AND/OR expressions involving partition columns. Non-partition columns, subqueries, and analytic expressions in the WHERE clause are not allowed and will result in an analysis error. New analyzer tests have been added to AnalyzeDDLTest#TestShowPartitions to verify correct parsing, semantic validation, and error handling for supported and unsupported cases. Testing: - Added new unit tests in AnalyzeDDLTest for valid and invalid WHERE clause cases. - Verified functional tests covering partition filtering behavior. Change-Id: I2e2a14aabcea3fb17083d4ad6f87b7861113f89e Reviewed-on: http://gerrit.cloudera.org:8080/23566 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-11 15:36:08 +00:00
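The two checks described in IMPALA-14065 (reject references to non-partition columns at analysis time, then filter partitions by their key values) can be modeled in a few lines. This is a hypothetical Python sketch of the semantics only, not Impala's analyzer: partitions are dicts of partition-column values, and the WHERE clause is represented by a Python callable plus the list of columns it references.

```python
from typing import Callable, Dict, Iterable, List, Optional

Partition = Dict[str, Optional[object]]


def show_partitions(partitions: Iterable[Partition],
                    partition_cols: Iterable[str],
                    referenced_cols: Iterable[str],
                    predicate: Callable[[Partition], bool]) -> List[Partition]:
    """Filter partitions by a WHERE predicate over partition columns only."""
    # Analysis step: any reference to a non-partition column is an error.
    bad = sorted(set(referenced_cols) - set(partition_cols))
    if bad:
        raise ValueError(f"WHERE clause references non-partition columns: {bad}")
    # Evaluation step: keep partitions whose key values satisfy the predicate.
    return [p for p in partitions if predicate(p)]
```

For example, with partitions keyed by `year` and `month`, a callable such as `lambda p: p["year"] == 2024 and p["month"] in (1, 2)` plays the role of an `=`/`IN` predicate, `p["month"] is None` plays the role of `IS NULL`, and referencing a non-partition column raises before any partition is examined.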
jichen0919	7e29ac23da	IMPALA-14092 Part2: Support querying of paimon data table via JNI This patch mainly implements querying of paimon data tables through a JNI-based scanner. Features implemented: - Support for column pruning. Partition pruning and predicate pushdown will be submitted as the third part of the patch. We implemented this by treating the paimon table as a normal unpartitioned table. When querying a paimon table: - PaimonScanNode decides which paimon splits need to be scanned, and then transfers the splits to the BE to do the JNI-based scan operation. - We also collect the required columns that need to be scanned and pass them to the scanner for column pruning. This is implemented by passing the field ids of the columns to the BE, instead of column positions, to support schema evolution. - In the original implementation, PaimonJniScanner directly passed the paimon row object to the BE, which called the corresponding paimon row field accessor (a Java method) to convert row fields to impala row batch tuples. We found this slow due to the overhead of JVM method calls. To minimize the overhead, we reworked the implementation: PaimonJniScanner now converts the paimon row batches to an arrow record batch, which stores data in the offheap region of the impala JVM, and passes the memory pointer of the offheap record batch to the BE. The BE PaimonJniScanNode reads the data directly from the JVM offheap region and converts the arrow record batch to an impala row batch. The benchmark shows the latter implementation is 2.x better than the original implementation. The lifecycle of the arrow row batch is as follows: the arrow row batch is generated in the FE and passed to the BE. After the record batch is imported into the BE successfully, the BE is in charge of freeing it. There are two free paths: the normal path and the exception path. In the normal path, when the arrow batch is fully consumed by the BE, the BE calls JNI to fetch the next arrow batch. 
In this case, the arrow batch is freed automatically. The exception path happens when the query is cancelled or memory fails to allocate. In these corner cases, the arrow batch is freed in the close method if it has not been fully consumed by the BE. Currently supported impala data types for queries include: - BOOLEAN - TINYINT - SMALLINT - INTEGER - BIGINT - FLOAT - DOUBLE - STRING - DECIMAL(P,S) - TIMESTAMP - CHAR(N) - VARCHAR(N) - BINARY - DATE TODO: - Patches pending submission: - Support tpcds/tpch data-loading for paimon data table. - Virtual Column query support for querying paimon data table. - Query support with time travel. - Query support for paimon meta tables. - WIP: - Snapshot incremental read. - Complex type query support. - Native paimon table scanner, instead of jni based. Testing: - Create test tables in functional_schema_template.sql - Add TestPaimonScannerWithLimit in test_scanners.py - Add test_paimon_query in test_paimon.py. - The tpcds/tpch tests already pass for paimon tables. Because the test table data is currently generated by Spark, which is not supported by impala now (Spark is needed because Hive doesn't support generating paimon tables for dynamic-partitioned tables), we plan to submit a separate patch for tpcds/tpch data loading and the associated tpcds/tpch query tests. - JVM offheap memory leak tests: ran looped tpch tests for 1 day; no obvious offheap memory increase was observed, and offheap memory usage stayed within 10M. Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384 Reviewed-on: http://gerrit.cloudera.org:8080/23613 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-12-05 18:19:57 +00:00
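The two free paths in IMPALA-14092 Part2 amount to an ownership rule: once a batch is handed to the BE, it is freed exactly once, either when it is fully consumed (before the next fetch) or in close() if it is still partially unread. The Python model below is a hypothetical sketch of that rule only; batches are plain dicts rather than Arrow record batches, and "freeing" just records the batch id.

```python
from typing import Iterable, List, Optional


class BatchReader:
    """Hypothetical model of the BE side of the FE -> BE batch handoff.

    Each batch is a dict {"id": ..., "rows": [...]}. Once handed over, the
    reader owns the batch and must free it exactly once.
    """

    def __init__(self, batches: Iterable[dict]):
        self._source = iter(batches)  # stands in for the JNI "fetch next" call
        self._current: Optional[dict] = None
        self._pos = 0
        self.freed: List[object] = []  # ids of freed batches, for inspection

    def _free_current(self) -> None:
        if self._current is not None:
            self.freed.append(self._current["id"])
            self._current = None

    def next_row(self):
        """Return the next row, or None when all batches are exhausted."""
        while self._current is None or self._pos >= len(self._current["rows"]):
            # Normal path: the current batch is fully consumed, so free it
            # before fetching the next one.
            self._free_current()
            nxt = next(self._source, None)
            if nxt is None:
                return None
            self._current, self._pos = nxt, 0
        row = self._current["rows"][self._pos]
        self._pos += 1
        return row

    def close(self) -> None:
        # Exception path (e.g. cancellation): free a partially consumed batch.
        # Calling close() twice is harmless because _current is reset.
        self._free_current()
```

Keeping a single `_free_current` helper for both paths is the design point: it makes a double free or a leak of the in-flight batch structurally hard, whichever path ends the scan.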
Joe McDonnell	5eea4f6f79	IMPALA-14559: Ship calcite-planner jar in Impala packages This adds the java/impala-package Maven project to make it easier to ship / test the Calcite planner. impala-package has a dependency on impala-frontend and calcite-planner, so constructing its classpath requires no extra work. An additional cleanup is that this no longer puts the impala-frontend-*-tests.jar on the classpath by default. This requires updating the query event hooks test, as it relies on that jar being present. This does not change the default value for the use_calcite_planner query option, so there is no change in behavior. Testing: - Ran a core job - Built docker images and OS packages locally Change-Id: I81dec2a5b59e279229a735c8bb1a23c77111a793 Reviewed-on: http://gerrit.cloudera.org:8080/23497 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-21 03:36:12 +00:00
Steve Carlin	e67b627858	IMPALA-14408: (addendum) Log Calcite exception in profile This addendum logs the thrown exception in the runtime profile under the CalciteFailureReason key. Testing: test_ranger.py uses this. Change-Id: Ia18a52c488f9c73d51690997b277fd8e918c645f Reviewed-on: http://gerrit.cloudera.org:8080/23686 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-20 21:08:48 +00:00
Steve Carlin	a6bb0c7c45	IMPALA-14408: Use regular path for Calcite planner instead of CalciteJniFrontend When the --use_calcite_planner=true option is set at the server level, the queries will no longer go through CalciteJniFrontend. Instead, they will go through the regular JniFrontend, which is the path that is used when the query option for "use_calcite_planner" is set. The CalciteJniFrontend will be removed in a later commit. This commit also enables fallback to the original planner when an unsupported feature exception is thrown. This needed to be added to allow the tests to run properly: during initial database load, there are queries that access complex columns, which throw the unsupported exception. Change-Id: I732516ca8f7ea64f73484efd67071910c9b62c8f Reviewed-on: http://gerrit.cloudera.org:8080/23523 Reviewed-by: Steve Carlin <scarlin@cloudera.com> Tested-by: Steve Carlin <scarlin@cloudera.com>	2025-11-20 21:08:48 +00:00
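The fallback in IMPALA-14408 follows a try-first pattern: plan with Calcite, and only on an unsupported-feature error record the reason and re-plan with the original planner. The Python sketch below is hypothetical (the exception class, `plan_query`, and the planner callables are stand-ins, not Impala's Java API); the `CalciteFailureReason` key comes from the IMPALA-14408 addendum row. Other exceptions deliberately propagate, so genuine Calcite bugs are not silently masked.

```python
from typing import Callable, Dict


class UnsupportedFeatureException(Exception):
    """Hypothetical stand-in for the planner's 'unsupported feature' error."""


def plan_query(query: str,
               calcite_plan: Callable[[str], str],
               original_plan: Callable[[str], str],
               profile: Dict[str, str]) -> str:
    """Try the Calcite planner first; fall back only on unsupported features."""
    try:
        return calcite_plan(query)
    except UnsupportedFeatureException as e:
        # Record why Calcite was skipped, then use the original planner.
        profile["CalciteFailureReason"] = str(e)
        return original_plan(query)
```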
Steve Carlin	54c0074b33	IMPALA-14405 ADDENDUM: Catch exception for bad column names This commit is a fix on top of IMPALA-14405 for the Calcite planner. The original commit derives column labels from the expressions in the select clause. For instance, if the query is "select 1 + 1", the label in impala-shell will be "1 + 1". It accomplished this by retrieving the string from the SqlNode object through the MySql dialect. However, when the expression cannot be converted to a string in the MySql dialect, an AssertionError gets thrown, causing the query to fail. We don't want the query to fail; we just want to fall back to the Calcite expression label, e.g. EXPR$0. This occurred with this specific query: "select timestamp_col + interval 3 nanoseconds" So now the exception is caught and the default label name is used. Eventually we should try to match the labels of the original Impala planner, but this is a harder problem to fix. Change-Id: I6c4d76a25fb2486eb1ef19485bce7888d45d282f Reviewed-on: http://gerrit.cloudera.org:8080/23665 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2025-11-18 21:34:29 +00:00
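The labeling rule from IMPALA-14405 and its addendum reduces to "use the expression's SQL text, else fall back to the Calcite-style `EXPR$<ordinal>` label". In this hypothetical Python sketch, expressions are plain strings and `to_sql` is a placeholder for unparsing through a SQL dialect; catching `AssertionError` mirrors the Java error named in the commit.

```python
from typing import Callable, List


def column_labels(exprs: List[object], to_sql: Callable[[object], str]) -> List[str]:
    """Label each select-list expression with its SQL text, falling back to
    EXPR$<ordinal> when the text cannot be produced."""
    labels = []
    for i, e in enumerate(exprs):
        try:
            labels.append(to_sql(e))
        except AssertionError:
            # An unparse failure must not fail the query; use the default label.
            labels.append(f"EXPR${i}")
    return labels
```

With a `to_sql` that fails only on the interval expression, `column_labels(["1 + 1", "timestamp_col + interval 3 nanoseconds"], to_sql)` yields `["1 + 1", "EXPR$1"]`.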
Steve Carlin	52334ba426	IMPALA-14421: Calcite planner: case statement returning wrong types for char, varchar The 'case' function resolver in the original Impala planner has a quirk that caused issues in the Calcite planner. The function resolver for the original planner resolves all case statements with the "boolean" version. Later on, in the analysis of the CaseExpr, the proper types are assessed and the necessary casting is added. The Calcite planner follows a similar path. The resolver always returns boolean as well, and the coerce nodes module determines the proper return type for the case statement. Two other related issues are also fixed here: Literal strings should be treated as type STRING instead of CHAR(X), but a null literal should not be changed from a CHAR(x) to a STRING. This broke a 'case' test in the test framework where the columns were non-literals of type char(x) and the return value was a "null", which should not have forced a cast to string. A cast from a varchar to a varchar should be ignored. Testing: Added a test to calcite.test. Ensured the existing cast test in test_chars.py passed. Ran through the Jenkins Calcite testing framework. Change-Id: I82d657f4bfce432c458ee8198188dadf9f23f2ef Reviewed-on: http://gerrit.cloudera.org:8080/23560 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-18 07:47:39 +00:00
Steve Carlin	bc99705252	IMPALA-13902: Calcite planner: Implement is_spool_query_results The is_spool_query_results query option is now supported in Calcite. The returnAtMostOneRow method is now implemented to support this. PlanRootSink is refactored to move the sanitizing of query options out of PlanRootSink.computeResourceProfile() into a new method, sanitizeSpoolingOptions(). The bulk of the memory bounding calculation is also extracted into a new class, SpoolingMemoryBound. Added "sleep" to ImpalaOperatorTable.java since some EE tests related to result spooling call the sleep() function. Changed ImpalaPlanRel to extend the RelNode interface. A sanity test has been added to calcite.test, but the bulk of the testing will be done through the Impala test framework when it is enabled. Testing: - Pass FE tests PlannerTest#testResultSpooling, TpcdsCpuCostPlannerTest, and all java tests under calcite-planner project. - Pass query_test/test_result_spooling.py and custom_cluster/test_result_spooling.py. Co-authored-by: Riza Suminto Change-Id: I5b9bf49e2874ee12de212b892bd898c296774c6f Reviewed-on: http://gerrit.cloudera.org:8080/23562 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-16 02:33:02 +00:00
Michael Smith	d09940b5dd	IMPALA-13563: Cleanup logging Cleans up calls to logDebug and a few other locations: - exit early if building the debug message input is expensive - use slf4j parameterized logging - normalize on logDebug handling the isDebugEnabled checks Change-Id: I32e1c62511c292d36aa879c60ae3d91ed4f65697 Reviewed-on: http://gerrit.cloudera.org:8080/22090 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-11 05:29:58 +00:00
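The three logging rules in IMPALA-13563 are general idioms. Python's standard `logging` module has direct analogs (deferred %-style arguments and `Logger.isEnabledFor`), so the sketch below uses it to illustrate them; it is not slf4j, and `log_debug` and `log_plan` are hypothetical names. `log_debug` centralizes the level check the way a logDebug helper would, and `log_plan` exits early so an expensive message input is never built when DEBUG is off.

```python
import logging


def log_debug(logger: logging.Logger, msg: str, *args) -> None:
    """Single place for the level check; args are formatted lazily by logging."""
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug(msg, *args)


def log_plan(logger: logging.Logger, nodes, render) -> None:
    # Exit early: rendering every node is expensive, so skip it entirely
    # unless DEBUG output is enabled.
    if not logger.isEnabledFor(logging.DEBUG):
        return
    log_debug(logger, "plan: %s", " -> ".join(render(n) for n in nodes))
```

Passing `"plan: %s"` plus an argument, rather than a pre-formatted string, is the Python counterpart of slf4j's `{}` placeholders: string formatting only happens if a handler actually emits the record.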
Steve Carlin	62bf609942	IMPALA-14414: Calcite planner: Added new code to handle nan/inf The current code works for NaN and Inf, but it breaks when upgrading to Calcite v1.40. This commit changes the code so that these values are still handled after the upgrade to 1.40, and adds a basic test to calcite.test to ensure that this handling does not break when the upgrade happens. Change-Id: I8593a4942a2fe785a0c77134b78a9d97257225fc Reviewed-on: http://gerrit.cloudera.org:8080/23561 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-05 12:55:39 +00:00
Steve Carlin	c67b19daf6	IMPALA-14405: Labels for Calcite expressions not matching original planner Calcite labels literal expressions as EXPR$<x>, which did not match the labels given by the Impala planner. For literal expressions such as "select 1 + 1", Impala names the column "1 + 1". The field names can be found in the abstract syntax tree, so they are now set within the CalciteRelNodeConverter before the logical tree is created. A small test was added to calcite.test as a basic sanity check, but more comprehensive tests will be run in the tests/shell module (e.g. in test_shell_commandline.py and test_shell_interactive), which contains tests for labels. Change-Id: Ibd3e6366a284f53807b4b2c42efafa279249c1ea Reviewed-on: http://gerrit.cloudera.org:8080/23516 Reviewed-by: Steve Carlin <scarlin@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-22 03:37:48 +00:00
Steve Carlin	420e357b95	IMPALA-13695: Calcite planner: fix for ndv with 2 args The NDV function was crashing when called with the "scale" arg. This requires special processing, which exists in FunctionCallExpr. The validation for this is now done in ImpalaNdvFunction, and the special calculation is done within ImpalaAggRel. This also fixes ndv for varchar types. The aggregation call within CoerceNodes was not differentiating between varchar and string. A cast to string function is needed in order to run the ndv function on a varchar column. Change-Id: I82419f77e043e9975865a042ffb8db75a26931f7 Reviewed-on: http://gerrit.cloudera.org:8080/23513 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-20 23:28:39 +00:00
Michael Smith	7fb986e47a	IMPALA-14504: Use shaded hbase, protobuf from Hadoop Switches to shaded HBase so it can include its own versions of dependencies. Note that hbase-client includes hbase-common and hbase-protocol. Excludes the older protobuf-java from mysql-connector so that protobuf comes from Hadoop instead. Allows orc-format 1.0, which is a dependency of future ORC releases. Change-Id: I386d03c3123ce1159abc54c505f60e0ae619f5fe Reviewed-on: http://gerrit.cloudera.org:8080/23553 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-17 01:47:18 +00:00
Steve Carlin	69813a8c40	IMPALA-14464: Calcite planner should allow semi-colon in statement The Calcite planner now handles a sql statement that has a semi-colon at the end. Note that impala-shell doesn't pass the semi-colon to the server, so this is only seen when the server is called directly. Change-Id: Ie690159cd03f28f6b793628aa946292af71b6970 Reviewed-on: http://gerrit.cloudera.org:8080/23517 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-17 00:59:44 +00:00
Riza Suminto	3560621931	IMPALA-14503: Log maven dependency when building frontend The Impala frontend has many dependencies, along with many dependency exclusion/inclusion rules. This patch logs the maven dependency tree to logs/mvn/mvn.log when the "make java" command is invoked. Testing: Manually ran "make java" from $IMPALA_HOME and verified that the dependency trees are logged to logs/mvn/mvn.log. Change-Id: I8cbe20faeab24bae708733d54996bd6c1dd97757 Reviewed-on: http://gerrit.cloudera.org:8080/23551 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-10-15 22:56:05 +00:00
Steve Carlin	cde4bc016c	IMPALA-14115: Calcite planner: Added top-n analytic PlanNode optimization. Impala has an optimization for analytic expressions that have a rank filter on top of the analytic expression. It can add a top-n plan node to reduce the number of rows examined. This is tested in tpcds query 67. The optimization logic relies on an unassigned rank conjunct within the analyzer while creating the analytic plan node. A slight reorganization of the code was needed to implement this optimization. The SlotRefs for the AnalyticInfo needed to be created a little earlier than in the previous commit. A small fix was made to normalize binary predicates. A non-normalized binary predicate prevents the optimization from being used. A call to checkAndApplyLimitPushdown is needed for some of the optimizations to kick in. A new AllProjectInfo internal class was created to hold the relationships between the Calcite RexNode objects and the Impala Analytic expressions. Also, IMPALA-14158 is fixed by this commit. The nullsFirst value was incorrect when the syntax was explicit in the query. A new Calcite planner test was added in the junit tests to ensure the optimization kicks in. The new test file is PlannerTest/calcite/limit-pushdown-analytic-calcite.test. This is a copy of the limit-pushdown-analytic.test file in its parent directory but with some modified results. Most of the differences are trivial, but IMPALA-14469 has been filed to deal with one optimization that is not yet applied: when the order by clause has a constant expression. Change-Id: Ie6fa6781db56771b13b0cf49bd236f776016bf8d Reviewed-on: http://gerrit.cloudera.org:8080/23317 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Aman Sinha <amsinha@cloudera.com>	2025-10-10 17:11:45 +00:00
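The reason a top-n node can sit below a rank filter is that rows outside the per-partition top n (plus ties) can never pass the filter. This Python sketch checks that equivalence at the row level, assuming a `rank() <= n` filter; it does not model Impala's PlanNodes or rank conjuncts, and both function names are hypothetical. Ties at the cutoff are kept because `rank()` gives tied rows the same rank.

```python
import heapq
from collections import defaultdict
from typing import Callable, Hashable, List, TypeVar

Row = TypeVar("Row")


def rank_filter(rows: List[Row], key: Callable[[Row], Hashable],
                order: Callable[[Row], object], n: int) -> List[Row]:
    """Reference semantics: WHERE rank() OVER (PARTITION BY key ORDER BY order) <= n."""
    out = []
    for r in rows:
        # rank() = 1 + number of same-partition rows that sort strictly before r
        before = sum(1 for s in rows if key(s) == key(r) and order(s) < order(r))
        if before + 1 <= n:
            out.append(r)
    return out


def partitioned_topn_with_ties(rows: List[Row], key: Callable[[Row], Hashable],
                               order: Callable[[Row], object], n: int) -> List[Row]:
    """Keep, per partition, the n smallest rows plus any rows tied with the n-th.

    Requires n >= 1. Rows outside this set can never have rank() <= n, so
    they can be dropped before the analytic function is evaluated.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(r)
    out = []
    for part in groups.values():
        cutoff = heapq.nsmallest(n, (order(r) for r in part))[-1]
        out.extend(r for r in part if order(r) <= cutoff)
    return out
```

`rank_filter` is quadratic and exists only as a readable reference; the point of the top-n variant is that it needs just the n-th smallest sort key per partition.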
Michael Smith	17b3f9ee88	IMPALA-14470: Migrate fair scheduler to slf4j Moves our fair scheduler code off commons-logging to use slf4j like the rest of Impala. Relies on the reload4j implementation to add an appender for message capture. Change-Id: Ia94d512f61c7e959c17e1139dceac31ad1a01bf2 Reviewed-on: http://gerrit.cloudera.org:8080/23478 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-01 04:59:58 +00:00
Steve Carlin	6aa4df4443	IMPALA-14105: Calcite planner: Runtime filters not being applied with outer joins Prior to this commit, outer join conjuncts were not being placed into the ValueTransfersGraph, which prevented them from being considered for runtime filters. This caused a slowdown in some tpcds queries. The conjuncts are now registered with the ImpalaJoinRel. The appropriate TableRef objects are picked up from the underlying plan nodes. Change-Id: I9e06d3f35a10f35ff8b57ba25dbab1bc6a35238a Reviewed-on: http://gerrit.cloudera.org:8080/23318 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-28 23:40:32 +00:00
Steve Carlin	a6dbd4015c	IMPALA-14106: Calcite planner: Register equivalent union expressions in value transfer graph This commit registers the equivalent union expressions in the value transfer graph when the physical union node is created for the Calcite planner. Change-Id: I4c858ae82a1cb7b89b0ae4e70205d8eeaeb28687 Reviewed-on: http://gerrit.cloudera.org:8080/23316 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-24 22:58:33 +00:00
Michael Smith	52b87fcefd	IMPALA-14454: Exclude log4j 2 dependencies Since we use reload4j, we can safely exclude log4j 2 dependencies to reduce the size of our artifacts. Change-Id: Ic060bdd969a6e5cd01646376b27c7355ce841819 Reviewed-on: http://gerrit.cloudera.org:8080/23439 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-09-24 18:04:06 +00:00
Michael Smith	5137bb94ac	IMPALA-14446: Clean up pom.xml Cleans up repetitive patterns in pom.xml. Centralize plugin configuration in pluginManagement. Replace inline maven-compiler-plugin configuration with newer maven.compiler.release and update to latest plugin version. Centralize common dependencies in dependencyManagement, including exclusions when appropriate. Remove exclusions that are no longer relevant. Compared before and after with dependency:tree; only difference is that commons-cli now comes from hadoop and jersey-serv{let,er} are effectively excluded; all versions matched. Also ensured USE_APACHE_COMPONENTS=true compiles. Adds com.amazonaws:aws-java-sdk-bundle to exclusion checking to ensure it's not accidentally included alongside impala-minimal-s3a-aws-sdk. Removes missed io.netty exclusion from IMPALA-12816. Updates commons-dbcp2 to 2.12.0 to match Hive. Change-Id: If96649840e23036b4a73ee23e8d12516497994f0 Reviewed-on: http://gerrit.cloudera.org:8080/23432 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-23 02:50:22 +00:00
Peter Rozsa	b0f1d49042	IMPALA-14016: Add multi-catalog support for local catalog mode This patch adds a new MetaProvider called MultiMetaProvider, which is capable of handling multiple MetaProviders at once, prioritizing one primary provider over multiple secondary providers. The primary provider handles some methods exclusively for deterministic behavior. In database listings, if one database name occurs multiple times, the contained tables are merged under that database name; if two of those databases contain a table with the same name, query analysis fails with an error. This change also modifies the local catalog implementation's initialization. If catalogd is deployed, then it instantiates the CatalogdMetaProvider and checks if the catalog configuration directory is set as a backend flag. If it's set, then it tries to load every configuration from the folder, and tries to instantiate the IcebergMetaProvider from those configs. If the instantiation fails, an error is reported to the logs, but the startup is not interrupted. Tests: - E2E tests for multi-catalog behavior - Unit test for ConfigLoader Change-Id: Ifbdd0f7085345e7954d9f6f264202699182dd1e1 Reviewed-on: http://gerrit.cloudera.org:8080/22878 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>	2025-09-19 15:03:59 +00:00
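The database-listing merge rule above can be sketched in Java as follows. This is a minimal illustration, assuming each provider is reduced to a map from database name to table names. `CatalogMergeSketch` and `mergeListings` are hypothetical names, not the actual MultiMetaProvider API, and the primary provider's exclusive handling of some methods is not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: each provider is modeled as db name -> table names.
// This is not the real MultiMetaProvider API.
final class CatalogMergeSketch {
  private CatalogMergeSketch() {}

  // Merges listings from the primary provider first, then the secondaries.
  // Tables under the same database name are unioned; a table name already
  // seen under that database (from any provider) is reported as an error.
  static Map<String, TreeSet<String>> mergeListings(
      Map<String, List<String>> primary,
      List<Map<String, List<String>>> secondaries) {
    List<Map<String, List<String>>> all = new ArrayList<>();
    all.add(primary);  // primary provider is consulted first
    all.addAll(secondaries);
    Map<String, TreeSet<String>> merged = new LinkedHashMap<>();
    for (Map<String, List<String>> provider : all) {
      for (Map.Entry<String, List<String>> db : provider.entrySet()) {
        TreeSet<String> tables =
            merged.computeIfAbsent(db.getKey(), k -> new TreeSet<>());
        for (String table : db.getValue()) {
          if (!tables.add(table)) {
            throw new IllegalStateException(
                "Ambiguous table " + db.getKey() + "." + table);
          }
        }
      }
    }
    return merged;
  }
}
```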
jichen0919	826c8cf9b0	IMPALA-14081: Support create/drop paimon table for impala This patch mainly implements creating and dropping paimon tables through impala. Supported impala data types: - BOOLEAN - TINYINT - SMALLINT - INTEGER - BIGINT - FLOAT - DOUBLE - STRING - DECIMAL(P,S) - TIMESTAMP - CHAR(N) - VARCHAR(N) - BINARY - DATE Syntax for creating a paimon table: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name ( [col_name data_type ,...] [PRIMARY KEY (col1,col2)] ) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)] STORED AS PAIMON [LOCATION 'hdfs_path'] [TBLPROPERTIES ( 'primary-key'='col1,col2', 'file.format' = 'orc/parquet', 'bucket' = '2', 'bucket-key' = 'col3' )]; Two types of paimon catalogs are supported. (1) Create table with hive catalog: CREATE TABLE paimon_hive_cat(userid INT,movieId INT) STORED AS PAIMON; (2) Create table with hadoop catalog: CREATE [EXTERNAL] TABLE paimon_hadoop_cat STORED AS PAIMON TBLPROPERTIES('paimon.catalog'='hadoop', 'paimon.catalog_location'='/path/to/paimon_hadoop_catalog', 'paimon.table_identifier'='paimondb.paimontable'); SHOW TABLE STATS/SHOW COLUMN STATS/SHOW PARTITIONS/SHOW FILES statements are also supported. TODO: - Patches pending submission: - Query support for paimon data files. - Partition pruning and predicate push down. - Query support with time travel. - Query support for paimon meta tables. - WIP: - Complex type query support. - Virtual Column query support for querying paimon data table. - Native paimon table scanner, instead of jni based. Testing: - Add unit test for paimon impala type conversion. - Add unit test for ToSqlTest.java. - Add unit test for AnalyzeDDLTest.java. - Update default_file_format TestEnumCase in be/src/service/query-options-test.cc. - Update test case in testdata/workloads/functional-query/queries/QueryTest/set.test. - Add test cases in metadata/test_show_create_table.py. - Add custom test test_paimon.py. 
Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef Reviewed-on: http://gerrit.cloudera.org:8080/22914 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-09-10 21:24:49 +00:00
Steve Carlin	8b057881c7	IMPALA-14102: [part 2] Fixed the JoinTranspose rule. This fix is needed before the join optimization fix can be committed. The JoinTranspose rule provided by Calcite had two issues: 1) For tpcds-q10 and q35, an exception was being thrown. There is a bug in the Calcite code when the Join-Project gets matched but the Join is of reltype SemiJoin. In this case, the Projects do not get created correctly and the exception gets thrown. 2) We only want to transpose a Project above a Join if there is another Join underneath the Project. The whole purpose is to be able to create adjacent Join RelNodes. We do not have to transpose the Project when it is not sandwiched between two Join nodes. It is preferable to keep it underneath the Join since the row width calculation would be affected (the Project may reduce the number of columns, thus reducing the row width). This commit extends the JoinProjectTranspose rule provided by Calcite to handle these additional restrictions. Change-Id: I7f62ec030fc8fbe36e6150bf96c9673c44b7da1b Reviewed-on: http://gerrit.cloudera.org:8080/23313 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-08 02:49:21 +00:00
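The two restrictions can be illustrated with a toy plan model. In this sketch the `ToyRel`, `ToyScan`, `ToyProject`, and `ToyJoin` classes are hypothetical stand-ins for Calcite's RelNode hierarchy. Declining to fire for semi joins is an assumption about how the first issue is avoided. Otherwise, the check allows a transpose only when a Project input sits directly on another Join.

```java
// Hypothetical toy plan model; these classes are not Calcite's RelNodes.
interface ToyRel {}

final class ToyScan implements ToyRel {}

final class ToyProject implements ToyRel {
  final ToyRel input;

  ToyProject(ToyRel input) {
    this.input = input;
  }
}

final class ToyJoin implements ToyRel {
  final ToyRel left;
  final ToyRel right;
  final boolean semiJoin;

  ToyJoin(ToyRel left, ToyRel right, boolean semiJoin) {
    this.left = left;
    this.right = right;
    this.semiJoin = semiJoin;
  }
}

final class JoinTransposeCheck {
  private JoinTransposeCheck() {}

  // True if the input is a Project sitting directly on another Join,
  // i.e. the Project is "sandwiched" between two Joins.
  static boolean isSandwiched(ToyRel input) {
    return input instanceof ToyProject
        && ((ToyProject) input).input instanceof ToyJoin;
  }

  // Assumption: the rule declines to fire for semi joins. Otherwise it fires
  // only if a side qualifies; a Project over a non-Join stays below the Join
  // so that its column pruning still reduces the estimated row width.
  static boolean shouldTranspose(ToyJoin join) {
    if (join.semiJoin) {
      return false;
    }
    return isSandwiched(join.left) || isSandwiched(join.right);
  }
}
```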
Fang-Yu Rao	1ff4e1b682	IMPALA-13767: Do not treat CTEs as names of actual tables This patch implements an additional check when collecting table names that are used in the given query. Specifically, for a table name that is not fully qualified, we make sure the derived fully qualified table name is not a common table expression (CTE) of a SqlWithItem in a WITH clause, since such CTEs are not actual tables. Testing: - Added a test in test_ranger.py to verify the issue is fixed. Change-Id: I3f51af42d64cdcff3c26ad5a96c7f53ebef431b3 Reviewed-on: http://gerrit.cloudera.org:8080/23209 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Fang-Yu Rao <fangyu.rao@cloudera.com>	2025-09-06 05:25:03 +00:00
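A minimal sketch of the check, assuming table references are plain strings and the CTE names visible in the query have already been collected. `TableNameCollector` is hypothetical. Nested WITH-clause scoping and case-insensitive matching are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: collect referenced table names while skipping
// unqualified names that refer to WITH-clause CTEs.
final class TableNameCollector {
  private TableNameCollector() {}

  static List<String> collectTables(
      List<String> refs, Set<String> cteNames, String currentDb) {
    List<String> tables = new ArrayList<>();
    for (String ref : refs) {
      if (ref.contains(".")) {
        tables.add(ref);  // already fully qualified: treated as a real table
      } else if (!cteNames.contains(ref)) {
        tables.add(currentDb + "." + ref);  // derive the fully qualified name
      }
      // Otherwise the unqualified name is a CTE, not an actual table.
    }
    return tables;
  }
}
```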
Steve Carlin	83499e5be7	IMPALA-14102: [part 1] Calcite Planner: optimize join rule This is part 1 of the commit for optimizing join rules for Calcite. This commit is just a copy of the LoptOptimizeJoinRule.java from Calcite v1.37 for subsequent modification. It serves as an unmodified starting point, so that the Impala-specific modifications to the rule, made in subsequent commits for IMPALA-14102, can easily be reviewed by comparing against it. Change-Id: I63daf6dacf0547a0488c1ecf0bc185b548e00d87 Reviewed-on: http://gerrit.cloudera.org:8080/23312 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-03 04:05:09 +00:00
Steve Carlin	e74495e656	IMPALA-14101: [part 2] Calcite planner: Add cost model calculations This commit adds the cost model and calculations to be used in the join optimizer rule. The ImpalaCost object implements the RelOptCost interface and contains values which contribute to a cost. The ImpalaCost object roughly mirrors the Calcite VolcanoCost object with some slight variations. The ImpalaCost object only looks at the cpu and io cost and ignores the rowCount cost. The rowCount cost is not needed because it is already baked into the cpu and io results. That is to say, we determine the cpu cost and io cost by using the rowCount cost. The ImpalaCost object is generated in the ImpalaRelMdNonCumulativeCost class which is called from Calcite for a given RelNode. The cost generated by this class uses the various inputs of the RelNode to calculate the cpu and io time for the given logical node. Note that this is a non-cumulative cost. A cumulative cost exists within Calcite as well, but there was no need to change the cumulative cost logic. The cost is used by the Calcite LoptOptimizeJoinRule when determining join ordering. It compares the costs of different join orderings and chooses the ordering with the lowest cost. With the current iteration, we only customize the costs for Impala for aggregates, table scans, and joins. A remaining TODO is to make the various cpu and io costs configurable. Change-Id: I1e52b0e11e9a6d5814b0313117dd9c56602f3ff5 Reviewed-on: http://gerrit.cloudera.org:8080/23311 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Riza Suminto <riza.suminto@cloudera.com>	2025-09-03 04:05:09 +00:00
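The comparison idea can be sketched with a simplified cost class. `SketchCost` is hypothetical and does not implement Calcite's `RelOptCost`. Combining cpu and io by plain addition is an assumption, since the commit does not say how the two are weighed.

```java
// Hypothetical, simplified stand-in for ImpalaCost. The real class
// implements Calcite's RelOptCost; this sketch does not.
final class SketchCost implements Comparable<SketchCost> {
  final double rowCount;  // carried along but ignored in comparisons
  final double cpu;
  final double io;

  SketchCost(double rowCount, double cpu, double io) {
    this.rowCount = rowCount;
    this.cpu = cpu;
    this.io = io;
  }

  // Assumption: cpu and io are combined by simple addition.
  double total() {
    return cpu + io;
  }

  // Only cpu and io decide which plan is cheaper.
  boolean isLt(SketchCost other) {
    return total() < other.total();
  }

  // Non-cumulative costs of a node and its inputs can be added together.
  SketchCost plus(SketchCost other) {
    return new SketchCost(rowCount + other.rowCount, cpu + other.cpu, io + other.io);
  }

  @Override
  public int compareTo(SketchCost other) {
    return Double.compare(total(), other.total());
  }
}
```

Because `rowCount` is ignored in the comparison, a plan with many more rows still wins if its cpu and io estimates are lower, consistent with the row count already being folded into those estimates.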
Fang-Yu Rao	0b9e2b2cd1	IMPALA-13011: Support authorization for Calcite in Impala This patch adds support for authorization when Calcite is the planner. Specifically, this patch focuses on the authorization of table-level and column-level privilege requests, including the case when a table is a regular view, whether or not the view was created by a superuser. Note that CalciteAnalysisDriver would throw an exception from analysis() if given a query that requires table masking, i.e., column masking or row filtering, since this feature is not yet supported by the Calcite planner. Moreover, we register the VIEW_METADATA privilege for each function involved in the given query. We hardcode the database associated with the function to 'BuiltinsDb', which is a bit hacky. This should no longer be necessary once each function can be associated with a database when we are using the Calcite planner. We may need to change Calcite's parser for this. The issue reported in IMPALA-13767 will be taken care of in a separate patch, and hence this patch could incorrectly register the privilege request for a common table expression (CTE) in a WITH clause, preventing a legitimate user from executing a query involving CTEs. Testing: - We manually verified that the patch could pass the test cases in AuthorizationStmtTest#testPrivilegeRequests() except for "with t as (select * from alltypes) select * from t", for which the fix will be provided via IMPALA-13767. - Added various tests in test_ranger.py. Change-Id: I9a7f7e4dc9a86a2da9e387832e552538e34029c1 Reviewed-on: http://gerrit.cloudera.org:8080/22716 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-03 00:15:22 +00:00
Steve Carlin	048b5689fd	IMPALA-14080: Support LocalFsTable table types in Calcite planner. IMPALA-13947 changes the use_local_catalog default to true. This causes failures when the use_calcite_planner query option is set to true. The Calcite planner was only handling HdfsTable table types. It will now handle LocalFsTable table types as well. Currently, if the table's row count is missing, the Calcite planner loads all partitions and estimates the row count by iterating over all of them. This is inefficient in local catalog mode and ideally should happen later, after partition pruning. Follow-up work is needed to improve this. Testing: Re-enabled local catalog mode in TestCalcitePlanner.test_calcite_frontend and TestWorkloadManagementSQLDetailsCalcite.test_tpcds_8_decimal Co-authored-by: Riza Suminto Change-Id: Ic855779aa64d11b7a8b19dd261c0164e65604e44 Reviewed-on: http://gerrit.cloudera.org:8080/23341 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-27 03:17:13 +00:00
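The current fallback can be sketched as follows. The sketch assumes that -1 marks a missing row count, which is Impala's convention for unknown stats, and that partitions without a count are skipped. `RowCountEstimator` is hypothetical. What it illustrates is that the fallback must visit every partition, which is what makes it expensive in local catalog mode.

```java
import java.util.List;

// Hypothetical sketch of the fallback described above; the names are
// illustrative, not Impala's FeTable/LocalFsTable API.
final class RowCountEstimator {
  static final long UNKNOWN = -1;

  private RowCountEstimator() {}

  // Uses the table-level row count when present; otherwise sums the known
  // per-partition row counts, which requires visiting every partition.
  static long estimateRows(long tableNumRows, List<Long> partitionNumRows) {
    if (tableNumRows != UNKNOWN) {
      return tableNumRows;
    }
    long total = 0;
    for (long rows : partitionNumRows) {
      if (rows != UNKNOWN) {
        total += rows;  // assumption: partitions without stats are skipped
      }
    }
    return total;
  }
}
```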
Steve Carlin	ff58c5d42f	IMPALA-14101: [part 1] Commit Cost file from Calcite This commit is just a copy of the VolcanoCost.java file from Calcite into this Impala repository. The file can be found here: https://github.com/apache/calcite/blob/calcite-1.37.0/core/src/main/... .../java/org/apache/calcite/plan/volcano/VolcanoCost.java The only differences between this file and the Calcite file are: 1) All VolcanoCost strings have been changed to ImpalaCost 2) The package name is an Impala package. This will make it easier to show the changes made for the Impala cost model change in IMPALA-14101. Change-Id: I864e20fb63c0ae4f2f88016128d2a68f39e17dfb Reviewed-on: http://gerrit.cloudera.org:8080/23310 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-20 21:01:29 +00:00
Steve Carlin	5244f6169e	IMPALA-14061: Calcite Planner: added Calcite rules This commit adds Calcite optimization rules to create more efficient plans. These rules should be considered a work in progress. These were tested against a 3TB tpcds database so they are fairly efficient as is, but we can make improvements as we see them along the way. Most of the changes have been added to the CalciteOptimizer file. There are several phases of rules that are applied, which are as follows: - expand nodes: These rules change the plan to a plan that can be handled by Impala. For instance, there are RelNodes such as "LogicalIntersect" which are not directly applicable to the Impala physical nodes so they need to be expanded. - coerce nodes: This module changes the nodes so they have the correct datatype values (e.g. literal strings in Calcite are char but need to be varchar for Impala) - optimize nodes: first pass on reordering the logical RelNodes. - join: Combines the join RelNodes into one "multiJoin" and then lets Calcite's join optimizer reorder the joins into a more optimal plan. A note on this: with this iteration, statistics are still not being applied. This will come in with later commits to make better plans. - post join optimize nodes: Reruns the optimize nodes phase since the join ordering may present new optimization opportunities - pre Impala commit: Extra massaging after optimization that is done at the end - conversion to Impala RelNodes: Maps Calcite RelNodes into Impala RelNodes which will then be mapped to Impala PlanNodes This commit also removes the "toCNF" rule. Calcite has multiple places where it creates a SEARCH operator via "simplifying" the RexNodes within various rules. This operator is not supported directly in Impala and we need to call "expandSearch" to handle this. 
Because Calcite does this under the covers in the rules, this has been fixed by overriding the RexBuilder (with ImpalaRexBuilder) and expanding the SEARCH operator whenever it is called (sidenote: we could have changed the rules that called simplify, but that would have resulted in too much code duplication). The toCNF rule was removed and replaced with a call within the CoerceOperandShuttle, which already manipulates all the RexNodes, so all that code is now in one place. Change-Id: I6671f7ed298a18965ef0b7a5fc10f4912333a52b Reviewed-on: http://gerrit.cloudera.org:8080/22870 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-20 12:03:16 +00:00
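To make the SEARCH expansion concrete, the toy expander below rewrites a SEARCH over a set of points and closed ranges into ordinary comparisons. It is only a sketch. The real code calls Calcite's `expandSearch`, and the `SearchExpander` and `Range` classes here are hypothetical, supporting only closed integer ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of expanding SEARCH(col, Sarg[...]) into comparisons.
// The real code relies on Calcite's expandSearch; these classes are
// hypothetical and handle only closed integer ranges and points.
final class SearchExpander {
  static final class Range {
    final long lo;
    final long hi;

    Range(long lo, long hi) {
      this.lo = lo;
      this.hi = hi;
    }
  }

  private SearchExpander() {}

  static String expand(String col, List<Range> ranges) {
    List<String> disjuncts = new ArrayList<>();
    for (Range r : ranges) {
      if (r.lo == r.hi) {
        disjuncts.add(col + " = " + r.lo);  // a point becomes an equality
      } else {
        // A closed range becomes a pair of bound checks.
        disjuncts.add("(" + col + " >= " + r.lo + " AND " + col + " <= " + r.hi + ")");
      }
    }
    return String.join(" OR ", disjuncts);
  }
}
```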
Steve Carlin	922443da46	IMPALA-14165: Type coercion code accidentally omitted from analysis In the first cut of the Calcite planner, the planner was standalone and ran its own JniFrontend. In the current version, the parsing, validating, and single node planning are called from the Impala framework. There is some code in the first cut regarding the "ImpalaTypeCoercionFactory" class which handles deriving the correct data type for various expressions, for instance (found in exprs.test): select count(*) from alltypesagg where 10.1 in (tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col) Without this patch, the query returns the following error: UDF ERROR: Decimal expression overflowed This code can be found in CalciteValidator.java, but was accidentally omitted from CalciteAnalysisDriver. Change-Id: I74c4c714504400591d1ec6313f040191613c25d9 Reviewed-on: http://gerrit.cloudera.org:8080/23039 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2025-08-10 17:00:54 +00:00
Steve Carlin	98b6c0208f	IMPALA-14094: Calcite planner: Use table and column statistics for optimization This commit enables the Calcite planner join optimization rule to make use of table and column statistics in Impala. The ImpalaRelMetadataProvider class provides the metadata classes to the rule optimizer. All the ImpalaRelMd* classes are extensions of Calcite Metadata classes. The ones overridden are: ImpalaRelMdRowCount: This provides the cardinality of a given type of RelNode. The default implementation in the RelMdRowCount is used for some of the RelNodes. The ones overridden are: TableScan: Gets the row count from the Table object. Filter: Calls the FilterSelectivityEstimator and adjusts the number of rows based on the selectivity of the filter condition. Join: Uses our own algorithm to determine the number of rows that will be created by the join condition using the JoinRelationInfo (more on this below). ImpalaRelMdDistinctRowCount: This provides the number of distinct rows returned by the RelNode. The default implementation in the RelMdDistinctRowCount is used for some of the RelNodes. The ones overridden are: TableScan: Uses the stats. If stats are not defined, all rows will be marked as distinct. Aggregate: For some reason, Calcite sometimes returns a number of distinct rows greater than the number of rows, which doesn't make sense. So this ensures the number of distinct rows never exceeds the number of rows. Filter: The number of distinct rows is reduced by the calculated selectivity. Join: same as aggregate. ImpalaRelMdRowSize: Provides the Impala interpreted size of the Calcite datatypes. ImpalaRelMdSelectivity: The selectivity is calculated within the RowCount. An initial attempt was made to use this class for selectivity, but it seemed rather clunky since the row counts and selectivity are very closely intertwined and the pruned row counts (a future commit) made this even more complicated. 
So the selectivity metadata is overridden to return full selectivity (1.0) for all our RelNodes. As mentioned above, the FilterSelectivityEstimator class tries to approximate the number of rows filtered out with the given condition. Some work still needs to be done to make this more in line with the Expr selectivities; a Jira will be filed for this. The JoinRelationInfo is the helper class that estimates the number of rows that will be output by the Join RelNode. The join condition is split into multiple conditions at each AND keyword. This first pass has some major flaws which need to be corrected, including: - Only equality conditions limit the number of rows. Non-equality conditions will be ignored. If there are only non-equality conditions, the cardinality will be the equivalent of a cross join. - Left joins take the maximum of the calculated join and the total number of rows on the left side. This can probably be improved upon if we find the matching rows provide a cardinality that is greater than one for each row. (Of course, right joins and outer joins have this same logic). Change-Id: I9d5bb50eb562c28e4b7c7a6529d140f98e77295c Reviewed-on: http://gerrit.cloudera.org:8080/23122 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2025-08-10 01:20:43 +00:00
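A minimal sketch of the estimates described above. The per-conjunct formula `|L| * |R| / max(ndvL, ndvR)` and the independence assumption across conjuncts are common textbook choices assumed here, not confirmed details of JoinRelationInfo. The left-join floor and the distinct-count cap follow the text. `JoinCardinalitySketch` is hypothetical.

```java
// Hypothetical sketch of equi-join cardinality estimation in the spirit of
// JoinRelationInfo. The |L| * |R| / max(ndvL, ndvR) formula per equality
// conjunct is an assumed textbook estimate, not confirmed Impala code.
final class JoinCardinalitySketch {
  private JoinCardinalitySketch() {}

  // Each element of ndvPairs holds {ndvLeft, ndvRight} for one equality
  // conjunct; conjuncts are assumed independent. Non-equality conjuncts are
  // simply not passed in, so with none left the result is a cross join.
  static double innerJoinRows(double leftRows, double rightRows, double[][] ndvPairs) {
    double rows = leftRows * rightRows;
    for (double[] ndv : ndvPairs) {
      rows /= Math.max(1.0, Math.max(ndv[0], ndv[1]));
    }
    return rows;
  }

  // Left outer joins keep every left row at least once.
  static double leftOuterJoinRows(double leftRows, double rightRows, double[][] ndvPairs) {
    return Math.max(innerJoinRows(leftRows, rightRows, ndvPairs), leftRows);
  }

  // Distinct row counts never exceed the row count.
  static double capDistinct(double distinct, double rows) {
    return Math.min(distinct, rows);
  }
}
```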
Steve Carlin	8fa383d9eb	IMPALA-14166: Calcite Planner: Ensure 'unsupported' functions are handled correctly There are some datasketches functions which return a Function object where the "isUnsupported" method returns true. This needs to be explicitly handled in the Calcite code as unsupported. Change-Id: Ic2c4a96005fc7571bde28643ea4cecda61839c77 Reviewed-on: http://gerrit.cloudera.org:8080/23041 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-07-05 23:50:44 +00:00
Riza Suminto	35aa2e2add	IMPALA-14187: Add IMPALA_JAVA_TARGET env var Impala is preparing to switch to JDK17 for Java compilation by default. While the source version might remain in 1.8 for longer, we should experiment with targeting binary version 17. This patch adds the IMPALA_JAVA_TARGET env var to control the target binary version. It is initialized in impala-config-java.sh, depending on the value of the IMPALA_JDK_VERSION env var. Testing: Passed data load and FE tests with IMPALA_JDK_VERSION=17. Change-Id: If194d87c542d416b878661403c32c6adc2930199 Reviewed-on: http://gerrit.cloudera.org:8080/23096 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-06-27 00:41:57 +00:00
Fang-Yu Rao	aba3a705a4	IMPALA-13982: Support regular views for Calcite planner in Impala Before this patch, the Calcite planner in Impala only supported inline views like 'temp' in the following query. select id from ( select * from functional.alltypes ) as temp; Regular views, on the other hand, were not supported. For instance, the Calcite planner in Impala did not support regular views like 'functional.alltypes_view' created via the following statement and hence queries against such regular views like "select id from functional.alltypes_view" were not supported. CREATE VIEW functional.alltypes_view AS SELECT * FROM functional.alltypes; This patch adds support for regular views to the Calcite planner by adding a ViewTable for each regular view in the given query when populating the Calcite schema. This is similar to how regular views are supported in PlannerTest#testView() at https://github.com/apache/calcite/blob/main/core/src/test/java/org/apache/calcite/tools/PlannerTest.java where the regular view to be tested is added in https://github.com/apache/calcite/blob/main/testkit/src/main/java/org/apache/calcite/test/CalciteAssert.java. We do not have to use or extend ViewTableMacro in Apache Calcite because the information about the data types returned from a regular view is already available in its respective FeTable. Therefore, there is no need to parse the SQL statement representing the regular view and collect the metadata of tables referenced by the regular view as done by ViewTableMacro. The patch supports the following cases, where 'functional.alltypes_view' is a regular view defined as "SELECT * FROM functional.alltypes". 1. select id from functional.alltypes_view. 2. select alltypes_view.id from functional.alltypes_view. 3. select functional.alltypes_view.id from functional.alltypes_view. Joining a regular view with an HDFS table like the following is also supported. 
select alltypestiny.id from functional.alltypes_view, functional.alltypestiny Note that after this patch, queries against regular views are supported only in the legacy catalog mode but not the local catalog mode. In fact, queries against HDFS tables in the local catalog mode are not supported yet by the Calcite planner either. We will deal with this in IMPALA-14080. Testing: - Added test cases mentioned above to calcite.test. This makes sure the test cases are supported when we start the Impala server with the flag of '--use_calcite_planner=true'. - Manually verified the test cases above are supported if we start the Impala server with the environment variable USE_CALCITE_PLANNER set to true and the query option use_calcite_planner set to 1. Change-Id: I600aae816727ae942fb221fae84c2aac63ae1893 Reviewed-on: http://gerrit.cloudera.org:8080/22883 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-06-24 22:30:15 +00:00
Steve Carlin	e9d7c152dc	IMPALA-13582: Calcite planner: return proper labels for columns The field names were not getting passed up to the output expressions. These are found on the RelNode row type object. The change is made in two different flows: The first flow is in CalciteSingleNodePlanner which gets hit when running from impala-shell and the use_calcite_planner query option is used. The second flow is in ExecRequestCreator and gets hit when running with the start-up option that loads a different JniFrontend jar. This mode will soon be deprecated, but is still used for testing purposes. Change-Id: I42818646d98f87d8744585010fc166f9d416aec1 Reviewed-on: http://gerrit.cloudera.org:8080/22117 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-06-05 21:49:30 +00:00
Steve Carlin	006f9ba589	IMPALA-14041: Enable planner tests This commit will enable some junit tests for the Calcite planner. To run these tests, use the following command from $IMPALA_HOME: (pushd java/calcite-planner && mvn -B -fae test -Dtest=TpcdsCpuCostPlannerTest) Change-Id: Idaab4e9068bb64e9a9ee12d83cd2b6b55b99b9bf Reviewed-on: http://gerrit.cloudera.org:8080/22864 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-06-03 00:27:34 +00:00
Zoltan Borok-Nagy	afa329fd89	IMPALA-13931: TestIcebergRestCatalog.test_rest_catalog_basic failed at setup There were several issues with test_rest_catalog_basic which made it fail in environments that used Ozone or S3. Missing dependency of Ozone and S3 classes: * This is resolved in iceberg-rest-catalog-test/pom.xml by adding a dependency to impala-executor-deps. Hadoop configuration was not initialized properly: * run-iceberg-rest-server.sh used Maven to run Iceberg REST Catalog in which case Maven is in charge of setting the CLASSPATH but the core-site/ozone-site/etc. config files were not on it, so the REST Catalog used a default Hadoop configuration that wasn't good for our environment. * To overcome the CLASSPATH problem now we create a runnable JAR in iceberg-rest-catalog-test/pom.xml and also generate the proper CLASSPATH during compilation. * run-iceberg-rest-server.sh now uses java -cp to run the REST Catalog. S3 builds threw NoSuchMethodException for the "create" method of ApacheHttpClientConfigurations: * The Iceberg library dynamically loads its HTTP client builders to work around an error, see details in https://github.com/apache/iceberg/issues/6715 * So the Iceberg library tries to dynamically load the "create" method of its own ApacheHttpClientConfigurations class but it fails with NoSuchMethodException. * The critical code is invoked from Impala's IcebergMetadataScanner's ScanMetadataTable() method which happens to be invoked through JNI from the C++ backend. * The context class loader of such threads is NULL, which means Java will use the bootstrap class loader to load classes and methods, but that doesn't have the proper resources on its classpath. * To overcome this issue we set the context class loader for the thread to the class loader that originally loaded the IcebergMetadataScanner class. 
Change-Id: I9dc0e30aeaff0b8de41426ba38506383b4af472c Reviewed-on: http://gerrit.cloudera.org:8080/22818 Reviewed-by: Jason Fehr <jfehr@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-05-09 17:01:56 +00:00
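The class-loader workaround follows a common JDK pattern: temporarily set the thread's context class loader, run the work, then restore the previous loader. `Thread#getContextClassLoader` and `Thread#setContextClassLoader` are standard JDK methods. The helper `ContextClassLoaderHelper` is hypothetical and is not the code in IcebergMetadataScanner.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: run a task with the thread's context class loader
// temporarily set to the loader that loaded a given class.
final class ContextClassLoaderHelper {
  private ContextClassLoaderHelper() {}

  static <T> T callWithLoaderOf(Class<?> owner, Callable<T> task) throws Exception {
    Thread current = Thread.currentThread();
    // May be null on threads attached from native code via JNI.
    ClassLoader previous = current.getContextClassLoader();
    current.setContextClassLoader(owner.getClassLoader());
    try {
      return task.call();
    } finally {
      current.setContextClassLoader(previous);  // restore for the caller
    }
  }
}
```

Restoring the previous loader in `finally` keeps the change scoped to the call, so other code running later on the same thread is unaffected.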
Steve Carlin	55804f7874	IMPALA-12959: Calcite planner: Implement count star optimization... IMPALA-13779: Handle partition key scan optimization IMPALA-13780: Handle full acid selects The 3 commits referenced here are somewhat related in that they all involve changes for the HdfsScanRel column layout and have been combined. For the optimizations, some infrastructure code was added. Information from the Aggregation RelNode is needed by the TableScan RelNode and vice versa. The mechanism for sending information to child RelNodes is the ParentPlanRelContext. The mechanism for sending information up to the parent is the NodeWithExprs object. If the conditions are met for the optimizations (equivalent to the conditions in the current Impala planner), the optimizations are applied. For count star optimization, the STAT_NUM_ROWS fake column is added to hold the information, and then the aggregate applies a sum_init_zero on this column. For partition key scan, if the conditions are met, the Impala HdfsScanNode is sent a flag in its constructor that handles the optimization. For acid selects, the SingleNodePlanner has code to handle the additional PlanNodes needed. Some code involving column number calculation was needed to deal with the extra columns that are present in a full acid table. One extra note: In HdfsScanNode, a Preconditions check was removed. This state check ensured that the countStarSlot only existed when the aggregation substitution map was set. This does not apply to Calcite, which does not use the substitution map to handle the count star optimization. Change-Id: I975beefedd2cceb34dad0f93343a46d1b7094c13 Reviewed-on: http://gerrit.cloudera.org:8080/22425 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-05-04 03:16:36 +00:00
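The count star rewrite can be modeled in a few lines, assuming each scanned file contributes one precomputed row count through the STAT_NUM_ROWS column. `CountStarSketch` is hypothetical. Initializing the sum to zero matters because `count(*)` over an empty input must return 0, whereas a plain SQL `sum` would return NULL.

```java
import java.util.List;

// Toy model of the count(*) rewrite: per-file row counts (the STAT_NUM_ROWS
// pseudo column) are summed starting from zero. Names are hypothetical.
final class CountStarSketch {
  private CountStarSketch() {}

  static long sumInitZero(List<Long> statNumRows) {
    long sum = 0;  // starting at zero makes an empty input return 0, not NULL
    for (Long rows : statNumRows) {
      if (rows != null) {
        sum += rows;  // NULL inputs are ignored, as in SQL aggregates
      }
    }
    return sum;
  }
}
```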
Steve Carlin	30979e7d30	IMPALA-13517: Support overloaded || operator The || operator is used for both "or" and "concat". A new Impala custom operator is created to handle both of them, treating the precedence of the operator as if it's an "or". The "or" is chosen if both parameters are null or boolean, as taken from logic in CompoundVerticalBarExpr. At convertlet time (when converting from SqlNode to RelNode), the real operator is placed into the RexNode. Change-Id: Iabaf02e84b769db1419bd96e1d1b30b8f83d56f5 Reviewed-on: http://gerrit.cloudera.org:8080/22105 Reviewed-by: Steve Carlin <scarlin@cloudera.com> Tested-by: Steve Carlin <scarlin@cloudera.com>	2025-05-01 19:33:45 +00:00
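The operand-type rule for choosing between "or" and "concat" can be sketched as below. Types are modeled as plain strings, and `VerticalBarResolver` is hypothetical. It is not Impala's operator code or CompoundVerticalBarExpr.

```java
// Hypothetical sketch of resolving an overloaded "||" once operand types
// are known: logical OR when both operands are BOOLEAN or NULL, otherwise
// string concatenation.
final class VerticalBarResolver {
  enum Kind { OR, CONCAT }

  private VerticalBarResolver() {}

  private static boolean isBoolOrNull(String type) {
    return type.equals("BOOLEAN") || type.equals("NULL");
  }

  static Kind resolve(String leftType, String rightType) {
    return isBoolOrNull(leftType) && isBoolOrNull(rightType) ? Kind.OR : Kind.CONCAT;
  }
}
```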
Steve Carlin	e473f034dc	IMPALA-13042: Calcite Planner; Enable partition pruning Enables partition pruning in the HdfsScan RelNode. The PrunedPartitionHelper is used to separate the conjuncts and is a wrapper around the HdfsPartitionPruner. Existing tests in the Impala test framework that check the runtime profile for the number of files read by the table scan are fixed by this commit. One small modification was made to the "preserveRootTypes" parameter in HdfsPartitionPruner. When it was set to false, the Calcite planner failed in one place. It makes sense for the main code line that the root type should not change, and this was tested on a full Jenkins run. Change-Id: I8c698b857555baeae347835b4a6b39d035f12405 Reviewed-on: http://gerrit.cloudera.org:8080/22409 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2025-05-01 00:07:03 +00:00
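One plausible way to separate the conjuncts is sketched below. It assumes a conjunct is usable for pruning only if every column it references is a partition column. `ConjunctSplitter` is hypothetical and models each conjunct only by its referenced columns. It is not the PrunedPartitionHelper or HdfsPartitionPruner API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: split conjuncts into those referencing only partition
// columns (candidates for pruning) and the rest (evaluated by the scan).
final class ConjunctSplitter {
  static final class Conjunct {
    final String sql;
    final Set<String> columns;  // columns referenced by the conjunct

    Conjunct(String sql, Set<String> columns) {
      this.sql = sql;
      this.columns = columns;
    }
  }

  private ConjunctSplitter() {}

  // result.get(0): partition conjuncts; result.get(1): remaining conjuncts.
  static List<List<Conjunct>> split(List<Conjunct> conjuncts, Set<String> partitionCols) {
    List<Conjunct> partition = new ArrayList<>();
    List<Conjunct> other = new ArrayList<>();
    for (Conjunct c : conjuncts) {
      if (!c.columns.isEmpty() && partitionCols.containsAll(c.columns)) {
        partition.add(c);
      } else {
        other.add(c);
      }
    }
    return List.of(partition, other);
  }
}
```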
Steve Carlin	3d24f45f9c	IMPALA-13796: Calcite planner: Improper casting for char on join condition For the following query: SELECT COUNT(*) from orders t1 LEFT OUTER JOIN orders t2 ON cast(t1.o_comment as char(120)) = cast(t2.o_comment as char(120)); The join condition uses the Function "=(CHAR,CHAR)". The function defined within Impala uses a wildcard for the length of the char (-1). Before the fix, the code decided that the char(120) needed a cast and cast it to char(1), which produced erroneous results. The fix is to make sure we don't cast from a char(x) to a char(-1). Change-Id: Ib9f44e3d5a7623a20d9841541bb496c1dee32d1e Reviewed-on: http://gerrit.cloudera.org:8080/22541 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2025-04-29 12:38:08 +00:00
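The guard can be reduced to a small predicate. `CharCastCheck` is hypothetical. It treats a signature length of -1 as the wildcard described above, so any CHAR(n) argument already matches and no cast is inserted.

```java
// Hypothetical sketch of the guard: a CHAR length of -1 in a function
// signature is a wildcard, so an argument of any CHAR(n) already matches
// and must not be cast (before the fix such a cast became CHAR(1)).
final class CharCastCheck {
  static final int WILDCARD_LEN = -1;

  private CharCastCheck() {}

  static boolean needsCast(int argCharLen, int signatureCharLen) {
    if (signatureCharLen == WILDCARD_LEN) {
      return false;  // any length matches the wildcard
    }
    return argCharLen != signatureCharLen;
  }
}
```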

194 Commits