impala

mirror of https://github.com/apache/impala.git synced 2025-12-23 11:55:25 -05:00

Author	SHA1	Message	Date
Steve Carlin	c67b19daf6	IMPALA-14405: Labels for Calcite expressions not matching original planner Calcite sets literal expressions to EXPR$<x> which did not match expressions given by the Impala planner. For literal expressions such as "select 1 + 1", Impala creates the column name as "1 + 1". The field names can be found in the abstract syntax tree, so they are not set within the CalciteRelNodeConverter before the logical tree is created. A small test was added to calcite.test for a basic sanity check, but more comprehensive tests will be run in the tests/shell module (e.g. in test_shell_commandline.py and test_shell_interactive) which contain tests for labels. Change-Id: Ibd3e6366a284f53807b4b2c42efafa279249c1ea Reviewed-on: http://gerrit.cloudera.org:8080/23516 Reviewed-by: Steve Carlin <scarlin@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-22 03:37:48 +00:00
Steve Carlin	420e357b95	IMPALA-13695: Calcite planner: fix for ndv with 2 args The NDV function was crashing when called with the "scale" arg. This requires special processing which exists in FunctionCallExpr. The validation for this is now done in ImpalaNdvFunction and the special calculation is done within ImpalaAggRel This also fixes ndv for varchar types. The aggregation call within CoerceNodes was not differentiating between varchar and string. A cast to string function is needed in order to run the ndv function on a varchar column. Change-Id: I82419f77e043e9975865a042ffb8db75a26931f7 Reviewed-on: http://gerrit.cloudera.org:8080/23513 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-20 23:28:39 +00:00
Michael Smith	7fb986e47a	IMPALA-14504: Use shaded hbase, protobuf from Hadoop Switches to shaded Hbase so it can include its own versions of dependencies. Note that hbase-client includes hbase-common, hbase-protocol. Excludes older protobuf-java from mysql-connector so we get it from Hadoop. Allows orc-format 1.0, which is a dependency in future ORC releases. Change-Id: I386d03c3123ce1159abc54c505f60e0ae619f5fe Reviewed-on: http://gerrit.cloudera.org:8080/23553 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-17 01:47:18 +00:00
Steve Carlin	69813a8c40	IMPALA-14464: Calcite planner should allow semi-colon in statement The Calcite planner now handles a sql statement that has a semi-colon at the end. Note that impala-shell doesn't pass the semi-colon into the server. This is only seen with a direct call to the server. Change-Id: Ie690159cd03f28f6b793628aa946292af71b6970 Reviewed-on: http://gerrit.cloudera.org:8080/23517 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-17 00:59:44 +00:00
Riza Suminto	3560621931	IMPALA-14503: Log maven dependency when building frontend Impala Frontend has plenty of dependencies, along with a multitude of dependency exclusion/inclusion rules. This patch adds a maven dependency tree log to logs/mvn/mvn.log when invoking the "make java" command. Testing: Manually run "make java" from $IMPALA_HOME and verify that the dependency trees are logged to logs/mvn/mvn.log. Change-Id: I8cbe20faeab24bae708733d54996bd6c1dd97757 Reviewed-on: http://gerrit.cloudera.org:8080/23551 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-10-15 22:56:05 +00:00
Steve Carlin	cde4bc016c	IMPALA-14115: Calcite planner: Added top-n analytic PlanNode optimization. Impala has an optimization for analytic expressions that have a rank filter on top of the analytic expression. It can add a top-n plan node to reduce the amount of rows examined. This is tested in tpcds query 67. The optimization logic relies on an unassigned rank conjunct within the analyzer while creating the analytic plan node. A slight reorganization of the code was needed to implement this optimization. The SlotRefs for the AnalyticInfo needed to be created a little earlier from where it was done in the previous commit. A small fix was made to normalize binary predicates. A non-normalized binary predicate prevents the optimization from being used. A call to the checkAndApplyLimitPushdown is needed for some of the optimizations to kick in. A new AllProjectInfo internal class was created to hold the relationships between the Calcite RexNode objects and the Impala Analytic expressions. Also, IMPALA-14158 is fixed by this commit. The nullsFirst value was incorrect when the syntax was explicit in the query. A new Calcite planner test was added in the junit tests to ensure the optimization kicks in. The new test file is in the PlannerTest/calcite/limit-pushdown-analytic-calcite.test file. This is a copy of the limit-pushdown-analytic.test file in its parent directory but with some modified results. Most of the differences are trivial, but IMPALA-14469 has been filed to deal with one optimization that did not get fixed, which is when the order by clause has a constant expression. Change-Id: Ie6fa6781db56771b13b0cf49bd236f776016bf8d Reviewed-on: http://gerrit.cloudera.org:8080/23317 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Aman Sinha <amsinha@cloudera.com>	2025-10-10 17:11:45 +00:00
Michael Smith	17b3f9ee88	IMPALA-14470: Migrate fair scheduler to slf4j Moves our fair scheduler code off commons-logging to use slf4j like the rest of Impala. Relies on the reload4j implementation to add an appender for message capture. Change-Id: Ia94d512f61c7e959c17e1139dceac31ad1a01bf2 Reviewed-on: http://gerrit.cloudera.org:8080/23478 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-01 04:59:58 +00:00
Steve Carlin	6aa4df4443	IMPALA-14105: Calcite planner: Runtime filters not being applied with outer joins Prior to this commit, outer join conjuncts were not being placed into the ValueTransfersGraph, which prevented them from being considered for runtime filters. This caused a slowdown in some tpcds queries. The conjuncts are now registered with the ImpalaJoinRel. The appropriate TableRef objects are picked up from the underlying plan nodes. Change-Id: I9e06d3f35a10f35ff8b57ba25dbab1bc6a35238a Reviewed-on: http://gerrit.cloudera.org:8080/23318 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-28 23:40:32 +00:00
Steve Carlin	a6dbd4015c	IMPALA-14106: Calcite planner: Register equivalent union expressions in value transfer graph This commit registers the equivalent union expressions in the value transfer graph when the physical union node is created for the Calcite planner. Change-Id: I4c858ae82a1cb7b89b0ae4e70205d8eeaeb28687 Reviewed-on: http://gerrit.cloudera.org:8080/23316 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-24 22:58:33 +00:00
Michael Smith	52b87fcefd	IMPALA-14454: Exclude log4j 2 dependencies While we use reload4j, we can safely exclude log4j 2 dependencies to reduce the size of our artifacts. Change-Id: Ic060bdd969a6e5cd01646376b27c7355ce841819 Reviewed-on: http://gerrit.cloudera.org:8080/23439 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-09-24 18:04:06 +00:00
Michael Smith	5137bb94ac	IMPALA-14446: Clean up pom.xml Cleans up repetitive patterns in pom.xml. Centralize plugin configuration in pluginManagement. Replace inline maven-compiler-plugin configuration with newer maven.compiler.release and update to latest plugin version. Centralize common dependencies in dependencyManagement, including exclusions when appropriate. Remove exclusions that are no longer relevant. Compared before and after with dependency:tree; only difference is that commons-cli now comes from hadoop and jersey-serv{let,er} are effectively excluded; all versions matched. Also ensured USE_APACHE_COMPONENTS=true compiles. Adds com.amazonaws:aws-java-sdk-bundle to exclusion checking to ensure it's not accidentally included alongside impala-minimal-s3a-aws-sdk. Removes missed io.netty exclusion from IMPALA-12816. Updates commons-dbcp2 to 2.12.0 to match Hive. Change-Id: If96649840e23036b4a73ee23e8d12516497994f0 Reviewed-on: http://gerrit.cloudera.org:8080/23432 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-23 02:50:22 +00:00
Peter Rozsa	b0f1d49042	IMPALA-14016: Add multi-catalog support for local catalog mode This patch adds a new MetaProvider called MultiMetaProvider, which is capable of handling multiple MetaProviders at once, prioritizing one primary provider over multiple secondary providers. The primary provider handles some methods exclusively for deterministic behavior. In database listings, if one database name occurs multiple times, the contained tables are merged under that database name; if two separate databases contain a table with the same name, query analysis fails with an error. This change also modifies the local catalog implementation's initialization. If catalogd is deployed, then it instantiates the CatalogdMetaProvider and checks if the catalog configuration directory is set as a backend flag. If it's set, then it tries to load every configuration from the folder, and tries to instantiate the IcebergMetaProvider from those configs. If the instantiation fails, an error is reported to the logs, but the startup is not interrupted. Tests: - E2E tests for multi-catalog behavior - Unit test for ConfigLoader Change-Id: Ifbdd0f7085345e7954d9f6f264202699182dd1e1 Reviewed-on: http://gerrit.cloudera.org:8080/22878 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>	2025-09-19 15:03:59 +00:00
jichen0919	826c8cf9b0	IMPALA-14081: Support create/drop paimon table for impala This patch mainly implements the creation and dropping of paimon tables through impala. Supported impala data types: - BOOLEAN - TINYINT - SMALLINT - INTEGER - BIGINT - FLOAT - DOUBLE - STRING - DECIMAL(P,S) - TIMESTAMP - CHAR(N) - VARCHAR(N) - BINARY - DATE Syntax for creating paimon table: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name ( [col_name data_type ,...] [PRIMARY KEY (col1,col2)] ) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)] STORED AS PAIMON [LOCATION 'hdfs_path'] [TBLPROPERTIES ( 'primary-key'='col1,col2', 'file.format' = 'orc/parquet', 'bucket' = '2', 'bucket-key' = 'col3' )]; Two types of paimon catalogs are supported. (1) Create table with hive catalog: CREATE TABLE paimon_hive_cat(userid INT,movieId INT) STORED AS PAIMON; (2) Create table with hadoop catalog: CREATE [EXTERNAL] TABLE paimon_hadoop_cat STORED AS PAIMON TBLPROPERTIES('paimon.catalog'='hadoop', 'paimon.catalog_location'='/path/to/paimon_hadoop_catalog', 'paimon.table_identifier'='paimondb.paimontable'); SHOW TABLE STATS/SHOW COLUMN STATS/SHOW PARTITIONS/SHOW FILES statements are also supported. TODO: - Patches pending submission: - Query support for paimon data files. - Partition pruning and predicate push down. - Query support with time travel. - Query support for paimon meta tables. - WIP: - Complex type query support. - Virtual Column query support for querying paimon data table. - Native paimon table scanner, instead of jni based. Testing: - Add unit test for paimon impala type conversion. - Add unit test for ToSqlTest.java. - Add unit test for AnalyzeDDLTest.java. - Update default_file_format TestEnumCase in be/src/service/query-options-test.cc. - Update test case in testdata/workloads/functional-query/queries/QueryTest/set.test. - Add test cases in metadata/test_show_create_table.py. - Add custom test test_paimon.py.
Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef Reviewed-on: http://gerrit.cloudera.org:8080/22914 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-09-10 21:24:49 +00:00
Steve Carlin	8b057881c7	IMPALA-14102: [part 2] Fixed the JoinTranspose rule. This fix is needed before the join optimization fix can be committed. The JoinTranspose rule provided by Calcite was having 2 issues: 1) For tpcds-q10 and q35, an exception was being thrown. There is a bug in the Calcite code when the Join-Project gets matched but the Join is of reltype SemiJoin. In this case, the Projects do not get created correctly and the exception gets thrown. 2) We only want to transpose a Project above a Join if there is an underlying Join underneath the Project. The whole purpose is to be able to create adjacent Join RelNodes. We do not have to transpose the Project when it is not sandwiched between two Join nodes. It is preferable to keep it underneath the Join since the row width calculation would be affected (the Project may reduce the number of columns, thus reducing the row width). This commit extends the given JoinProjectTranspose rule by Calcite and handles these added restrictions. Change-Id: I7f62ec030fc8fbe36e6150bf96c9673c44b7da1b Reviewed-on: http://gerrit.cloudera.org:8080/23313 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-08 02:49:21 +00:00
Fang-Yu Rao	1ff4e1b682	IMPALA-13767: Do not treat CTEs as names of actual tables This patch implements an additional check when collecting table names that are used in the given query. Specifically, for a table name that is not fully qualified, we make sure the derived fully qualified table name is not a common table expression (CTE) of a SqlWithItem in a WITH clause since such CTE's are not actual tables. Testing: - Added a test in test_ranger.py to verify the issue is fixed. Change-Id: I3f51af42d64cdcff3c26ad5a96c7f53ebef431b3 Reviewed-on: http://gerrit.cloudera.org:8080/23209 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Fang-Yu Rao <fangyu.rao@cloudera.com>	2025-09-06 05:25:03 +00:00
Steve Carlin	83499e5be7	IMPALA-14102: [part 1] Calcite Planner: optimize join rule This is part 1 of the commit for optimizing join rules for Calcite. This commit is just a copy of the LoptOptimizeJoinRule.java from Calcite v1.37 for subsequent modification. The purpose of this commit is to serve as a placeholder starting point so we can easily see the customized changes that are made by comparing Impala specific modifications for the rule which will be done in subsequent commits for IMPALA-14102. Change-Id: I63daf6dacf0547a0488c1ecf0bc185b548e00d87 Reviewed-on: http://gerrit.cloudera.org:8080/23312 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-03 04:05:09 +00:00
Steve Carlin	e74495e656	IMPALA-14101: [part 2] Calcite planner: Add cost model calculations This commit adds the cost model and calculations to be used in the join optimizer rule. The ImpalaCost object implements the RelOptCost interface and contains values which contribute to a cost. The ImpalaCost object roughly mirrors the Calcite VolcanoCost object with some slight variations. The ImpalaCost object only looks at the cpu and io cost and ignores the rowCount cost. The rowCount cost is not needed because it is already baked into the cpu and io results. That is to say, we determine the cpu cost and io cost by using the rowCount cost. The ImpalaCost object is generated in the ImpalaRelMdNonCumulativeCost class which is called from Calcite for a given RelNode. The cost generated by this object uses the various inputs of the RelNode to calculate the cpu and io time for the given logical node. Note that this is a non-cumulative cost. A cumulative cost exists within Calcite as well, but there was no need to change the cumulative cost logic. The cost is used by the Calcite LoptOptimizeJoinRule when determining join ordering. It will compare costs of different join ordering and choose the join ordering with a lower cost. With the current iteration, we only customize the costs for Impala for aggregates, table scans, and joins. A TODO in this commit is to allow various cpu and io costs to be configurable. Change-Id: I1e52b0e11e9a6d5814b0313117dd9c56602f3ff5 Reviewed-on: http://gerrit.cloudera.org:8080/23311 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Riza Suminto <riza.suminto@cloudera.com>	2025-09-03 04:05:09 +00:00
Fang-Yu Rao	0b9e2b2cd1	IMPALA-13011: Support authorization for Calcite in Impala This patch adds support for authorization when Calcite is the planner. Specifically, this patch focuses on the authorization of table-level and column-level privilege requests, including the case when a table is a regular view, whether the view was created by a superuser. Note that CalciteAnalysisDriver would throw an exception from analysis() if given a query that requires table masking, i.e., column masking or row filtering, since this feature is not yet supported by the Calcite planner. Moreover, we register the VIEW_METADATA privilege for each function involved in the given query. We hardcode the database associated with the function to 'BuiltinsDb', which is a bit hacky. We should not be doing this once each function could be associated with a database when we are using the Calcite planner. We may need to change Calcite's parser for this. The issue reported in IMPALA-13767 will be taken care of in another separate patch and hence this patch could incorrectly register the privilege request for a common table expression (CTE) in a WITH clause, preventing a legitimate user from executing a query involving CTE's. Testing: - We manually verified that the patch could pass the test cases in AuthorizationStmtTest#testPrivilegeRequests() except for "with t as (select * from alltypes) select * from t", for which the fix will be provided via IMPALA-13767. - Added various tests in test_ranger.py. Change-Id: I9a7f7e4dc9a86a2da9e387832e552538e34029c1 Reviewed-on: http://gerrit.cloudera.org:8080/22716 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-03 00:15:22 +00:00
Steve Carlin	048b5689fd	IMPALA-14080: Support LocalFsTable table types in Calcite planner. IMPALA-13947 changes the use_local_catalog default to true. This causes failures when the use_calcite_planner query option is set to true. The Calcite planner was only handling HdfsTable table types. It will now handle LocalFsTable table types as well. Currently, if the row count is missing from a table, the Calcite planner loads all partitions and iterates over them to estimate it. This is inefficient in local catalog mode and ideally should happen later, after partition pruning. Follow-up work is needed to improve this. Testing: Reenable local catalog mode in TestCalcitePlanner.test_calcite_frontend TestWorkloadManagementSQLDetailsCalcite.test_tpcds_8_decimal Co-authored-by: Riza Suminto Change-Id: Ic855779aa64d11b7a8b19dd261c0164e65604e44 Reviewed-on: http://gerrit.cloudera.org:8080/23341 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-27 03:17:13 +00:00
Steve Carlin	ff58c5d42f	IMPALA-14101: [part 1] Commit Cost file from Calcite This commit is just a copy of the VolcanoCost.java file from Calcite into this Impala repository. The file can be found here: https://github.com/apache/calcite/blob/calcite-1.37.0/core/src/main/... .../java/org/apache/calcite/plan/volcano/VolcanoCost.java The only differences between this file and the Calcite file are: 1) All VolcanoCost strings have been changed to ImpalaCost 2) The package name is an Impala package. This will make it easier to show the changes made for the Impala cost model change in IMPALA-14101. Change-Id: I864e20fb63c0ae4f2f88016128d2a68f39e17dfb Reviewed-on: http://gerrit.cloudera.org:8080/23310 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-20 21:01:29 +00:00
Steve Carlin	5244f6169e	IMPALA-14061: Calcite Planner: added Calcite rules This commit adds Calcite optimization rules to create more efficient plans. These rules should be considered a work in progress. These were tested against a 3TB tpcds database so they are fairly efficient as/is, but we can make improvements as we see them along the way. Most of the changes have been added to the CalciteOptimizer file. There are several phases of rules that are applied, which are as follows: - expand nodes: These rules change the plan to a plan that can be handled by Impala. For instance, there are RelNodes such as "LogicalIntersect" which are not directly applicable to the Impala physical nodes so they need to be expanded. - coerce nodes: This module changes the nodes so they have the correct datatype values (e.g. literal strings in Calcite are char but need to be varchar for Impala) - optimize nodes: first pass on reordering the logical RelNode ordering. - join: Squishes the join RelNodes together, pushes them into one "multiJoin" and then lets Calcite's join optimizer reorder the joins into a more optimal plan. A note on this: with this iteration, statistics are still not being applied. This will come in with later commits to make better plans. - post join optimize nodes: Reruns the optimize nodes since the join ordering may present new optimization opportunities - pre Impala commit: Extra massaging after optimization that is done at the end - conversion to Impala RelNodes: Maps Calcite RelNodes into Impala RelNodes which will then be mapped to Impala PlanNodes In addition to this general change, there is also a change with removing the "toCNF" rule. Calcite has multiple places where it creates a SEARCH operator via "simplifying" the RexNodes within various rules. This operator is not supported directly in Impala and we need to call "expandSearch" to handle this. 
Because Impala does this under the covers in the rules, this has been fixed by overriding the RexBuilder (with ImpalaRexBuilder) and expanding the SEARCH operator whenever it is called (sidenote: we could have changed the rules that called simplify, but that would have resulted in too much code duplication). The toCNF rule was removed and placed as a call within the CoerceOperandShuttle, which already manipulates all the RexNodes, so all that code is now in one place. Change-Id: I6671f7ed298a18965ef0b7a5fc10f4912333a52b Reviewed-on: http://gerrit.cloudera.org:8080/22870 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-08-20 12:03:16 +00:00
Steve Carlin	922443da46	IMPALA-14165: Type coercion code accidentally omitted from analysis On the first cut of creating the Calcite planner, the Calcite planner was standalone and ran its own JniFrontend. In the current version, the parsing, validating, and single node planning is called from the Impala framework. There is some code in the first cut regarding the "ImpalaTypeCoercionFactory" class which handles deriving the correct data type for various expressions, for instance (found in exprs.test): select count(*) from alltypesagg where 10.1 in (tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col) Without this patch, the query returns the following error: UDF ERROR: Decimal expression overflowed This code can be found in CalciteValidator.java, but was accidentally omitted from CalciteAnalysisDriver. Change-Id: I74c4c714504400591d1ec6313f040191613c25d9 Reviewed-on: http://gerrit.cloudera.org:8080/23039 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2025-08-10 17:00:54 +00:00
Steve Carlin	98b6c0208f	IMPALA-14094: Calcite planner: Use table and column statistics for optimization This commit enables the Calcite planner join optimization rule to make use of table and column statistics in Impala. The ImpalaRelMetadataProvider class provides the metadata classes to the rule optimizer. All the ImpalaRelMd* classes are extensions of Calcite Metadata classes. The ones overridden are: ImpalaRelMdRowCount: This provides the cardinality of a given type of RelNode. The default implementation in the RelMdRowCount is used for some of the RelNodes. The ones overridden are: TableScan: Gets the row count from the Table object. Filter: Calls the FilterSelectivityEstimator and adjusts the number of rows based on the selectivity of the filter condition. Join: Uses our own algorithm to determine the number of rows that will be created by the join condition using the JoinRelationInfo (more on this below). ImpalaRelMdDistinctRowCount: This provides the number of distinct rows returned by the RelNode. The default implementation in the RelMdDistinctRowCount is used for some of the RelNodes. The ones overridden are: TableScan: Uses the stats. If stats are not defined, all rows will be marked as distinct. Aggregate: For some reason, Calcite sometimes returns a number of distinct rows greater than the number of rows, which doesn't make sense. So this ensures the number of distinct rows never exceeds the number of rows. Filter: The number of distinct rows is reduced by the calculated selectivity. Join: same as aggregate. ImpalaRelMdRowSize: Provides the Impala interpreted size of the Calcite datatypes. ImpalaRelMdSelectivity: The selectivity is calculated within the RowCount. An initial attempt was made to use this class for selectivity, but it seemed rather clunky since the row counts and selectivity are very closely intertwined and the pruned row counts (a future commit) made this even more complicated.
So the selectivity metadata is overridden for all our RelNodes as full selectivity (1.0). As mentioned above, the FilterSelectivityEstimator class tries to approximate the number of rows filtered out with the given condition. Some work still needs to be done to make this more in line with the Expr selectivities; a Jira will be filed for this. The JoinRelationInfo is the helper class that estimates the number of rows that will be output by the Join RelNode. The join condition is split up into multiple conditions broken up by the AND keyword. This first pass has some major flaws which need to be corrected, including: - Only equality conditions limit the number of rows. Non-equality conditions will be ignored. If there are only non-equality conditions, the cardinality will be the equivalent of a cross join. - Left joins take the maximum of the calculated join and the total number of rows on the left side. This can probably be improved upon if we find the matching rows provide a cardinality that is greater than one for each row. (Of course, right joins and outer joins have this same logic). Change-Id: I9d5bb50eb562c28e4b7c7a6529d140f98e77295c Reviewed-on: http://gerrit.cloudera.org:8080/23122 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2025-08-10 01:20:43 +00:00
Steve Carlin	8fa383d9eb	IMPALA-14166: Calcite Planner: Ensure 'unsupported' functions are handled correctly There are some datasketches functions which return a Function object where the "isUnsupported" method returns true. This needs to be explicitly handled in the Calcite code as unsupported. Change-Id: Ic2c4a96005fc7571bde28643ea4cecda61839c77 Reviewed-on: http://gerrit.cloudera.org:8080/23041 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-07-05 23:50:44 +00:00
Riza Suminto	35aa2e2add	IMPALA-14187: Add IMPALA_JAVA_TARGET env var Impala is preparing to switch to JDK17 for Java compilation by default. While the source version might remain in 1.8 for longer, we should experiment with targeting binary version 17. This patch adds IMPALA_JAVA_TARGET env var to control target binary version. It is initialized in impala-config-java.sh, depending on value of IMPALA_JDK_VERSION env var. Testing: Pass data load and FE tests with IMPALA_JDK_VERSION=17. Change-Id: If194d87c542d416b878661403c32c6adc2930199 Reviewed-on: http://gerrit.cloudera.org:8080/23096 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-06-27 00:41:57 +00:00
Fang-Yu Rao	aba3a705a4	IMPALA-13982: Support regular views for Calcite planner in Impala Before this patch, the Calcite planner in Impala only supported inline views like 'temp' in the following query. select id from ( select * from functional.alltypes ) as temp; Regular views, on the other hand, were not supported. For instance, the Calcite planner in Impala did not support regular views like 'functional.alltypes_view' created via the following statement and hence queries against such regular views like "select id from functional.alltypes_view" were not supported. CREATE VIEW functional.alltypes_view AS SELECT * FROM functional.alltypes; This patch adds the support for regular views to the Calcite planner via adding a ViewTable for each regular view in the given query when populating the Calcite schema. This is similar to how regular views are supported in PlannerTest#testView() at https://github.com/apache/calcite/blob/main/core/src/test/java/org/apache/calcite/tools/PlannerTest.java where the regular view to be tested is added in https://github.com/apache/calcite/blob/main/testkit/src/main/java/org/apache/calcite/test/CalciteAssert.java. We do not have to use or extend ViewTableMacro in Apache Calcite because the information about the data types returned from a regular view is already available in its respective FeTable. Therefore, there is no need to parse the SQL statement representing the regular view and collect the metadata of tables referenced by the regular view as done by ViewTableMacro. The patch supports the following cases, where 'functional.alltypes_view' is a regular view defined as "SELECT * FROM functional.alltypes". 1. select id from functional.alltypes_view. 2. select alltypes_view.id from functional.alltypes_view. 3. select functional.alltypes_view.id from functional.alltypes_view. Joining a regular view with an HDFS table like the following is also supported. 
select alltypestiny.id from functional.alltypes_view, functional.alltypestiny Note that after this patch, queries against regular views are supported only in the legacy catalog mode but not the local catalog mode. In fact, queries against HDFS tables in the local catalog mode are not supported yet by the Calcite planner either. We will deal with this in IMPALA-14080. Testing: - Added test cases mentioned above to calcite.test. This makes sure the test cases are supported when we start the Impala server with the flag of '--use_calcite_planner=true'. - Manually verified the test cases above are supported if we start the Impala server with the environment variable USE_CALCITE_PLANNER set to true and the query option use_calcite_planner set to 1. Change-Id: I600aae816727ae942fb221fae84c2aac63ae1893 Reviewed-on: http://gerrit.cloudera.org:8080/22883 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-06-24 22:30:15 +00:00
Steve Carlin	e9d7c152dc	IMPALA-13582: Calcite planner: return proper labels for columns The field names were not getting passed up to the output expressions. These are found on the RelNode row type object. The change is made in two different flows: The first flow is in CalciteSingleNodePlanner which gets hit when running from impala-shell and the use_calcite_planner query option is used. The second flow is in ExecRequestCreator and gets hit when running with the start-up option that loads a different JniFrontend jar. This mode will soon be deprecated, but is still used for testing purposes. Change-Id: I42818646d98f87d8744585010fc166f9d416aec1 Reviewed-on: http://gerrit.cloudera.org:8080/22117 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-06-05 21:49:30 +00:00
Steve Carlin	006f9ba589	IMPALA-14041: Enable planner tests This commit will enable some junit tests for the Calcite planner. To run these tests, use the following command from $IMPALA_HOME: (pushd java/calcite-planner && mvn -B -fae test -Dtest=TpcdsCpuCostPlannerTest) Change-Id: Idaab4e9068bb64e9a9ee12d83cd2b6b55b99b9bf Reviewed-on: http://gerrit.cloudera.org:8080/22864 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-06-03 00:27:34 +00:00
Zoltan Borok-Nagy	afa329fd89	IMPALA-13931: TestIcebergRestCatalog.test_rest_catalog_basic failed at setup There were several issues with test_rest_catalog_basic which made it fail in environments that used Ozone or S3. Missing dependency of Ozone and S3 classes: * This is resolved in iceberg-rest-catalog-test/pom.xml by adding a dependency to impala-executor-deps. Hadoop configuration was not initialized properly: * run-iceberg-rest-server.sh used Maven to run Iceberg REST Catalog in which case Maven is in charge of setting the CLASSPATH but the core-site/ozone-site/etc. config files were not on it, so the REST Catalog used a default Hadoop configuration that wasn't good for our environment. * To overcome the CLASSPATH problem now we create a runnable JAR in iceberg-rest-catalog-test/pom.xml and also generate the proper CLASSPATH during compilation. * run-iceberg-rest-server.sh now uses java -cp to run the REST Catalog. S3 builds threw NoSuchMethodException for the "create" method of ApacheHttpClientConfigurations: * The Iceberg library dynamically loads its http client builders to work around an error, see details in https://github.com/apache/iceberg/issues/6715 * So the Iceberg lib dynamically wants to load the "create" method of its own ApacheHttpClientConfigurations class but it fails with NoSuchMethodException. * The critical code is invoked from Impala's IcebergMetadataScanner's ScanMetadataTable() method which happens to be invoked through JNI from the C++ backend. * The context class loader of such threads is NULL, which means Java will use the bootstrap class loader to load classes and methods, but that doesn't have the proper resources on its classpath. * To overcome this issue we set the context class loader for the thread to the class loader that originally loaded the IcebergMetadataScanner class. 
Change-Id: I9dc0e30aeaff0b8de41426ba38506383b4af472c Reviewed-on: http://gerrit.cloudera.org:8080/22818 Reviewed-by: Jason Fehr <jfehr@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-05-09 17:01:56 +00:00
Steve Carlin	55804f7874	IMPALA-12959: Calcite planner: Implement count star optimization... IMPALA-13779: Handle partition key scan optimization IMPALA-13780: Handle full acid selects The 3 commits referenced here are somewhat related in that they all involve changes for the HdfsScanRel column layout and have been combined. For the optimizations, some infrastructure code was added. Information from the Aggregation RelNode is needed by the TableScan RelNode and vice versa. The mechanism to send information to children RelNodes is by using the ParentPlanRelContext. The mechanism for sending information up to the parent is by using the NodeWithExprs object. If the conditions are met for the optimizations (equivalent to the conditions in the current Impala planner), the optimizations are applied. For count star optimization, the STAT_NUM_ROWS fake column is added to hold the information, and then the aggregate applies a sum_init_zero on this column. For partition key scan, if the conditions are met, the Impala HdfsScanNode is sent a flag in its constructor that handles the optimization. For acid selects, the SingleNodePlanner has code to handle the additional PlanNodes needed. Some code involving column number calculation was needed to deal with the extra columns that are present in a full acid table. One extra note: In HdfsScanNode, a Preconditions check was removed. This state check ensured that the countStarSlot only existed when the aggregation substitution map was set. This does not apply to Calcite which does not use the substitution map to handle the count star optimization. Change-Id: I975beefedd2cceb34dad0f93343a46d1b7094c13 Reviewed-on: http://gerrit.cloudera.org:8080/22425 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-05-04 03:16:36 +00:00
Steve Carlin	30979e7d30	IMPALA-13517: Support overloaded \|\| operator The \|\| operator is used for both "or" and "concat". A new Impala custom operator is created to handle both of them, treating the precedence of the operator as if it's an "or". The "or" is chosen if both parameters are null or boolean, as taken from logic in CompoundVerticalBarExpr. At convertlet time (when converting from SqlNode to RelNode), the real operator is placed into the RexNode. Change-Id: Iabaf02e84b769db1419bd96e1d1b30b8f83d56f5 Reviewed-on: http://gerrit.cloudera.org:8080/22105 Reviewed-by: Steve Carlin <scarlin@cloudera.com> Tested-by: Steve Carlin <scarlin@cloudera.com>	2025-05-01 19:33:45 +00:00
Steve Carlin	e473f034dc	IMPALA-13042: Calcite Planner; Enable partition pruning Enables partition pruning in the HdfsScan RelNode. The PrunedPartitionHelper is used to separate the conjuncts and is a wrapper around the HdfsPartitionPruner. There are tests that currently exist in the Impala test framework that check the runtime profile for the number of files read from the table scan which are fixed with this commit. One small modification was made to the "preserveRootTypes" parameter in HdfsPartitionPruner. When it was set to false, the Calcite planner failed in one place. It makes sense for the main code line that the root type should not change, and this was tested on a full Jenkins run. Change-Id: I8c698b857555baeae347835b4a6b39d035f12405 Reviewed-on: http://gerrit.cloudera.org:8080/22409 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2025-05-01 00:07:03 +00:00
Steve Carlin	3d24f45f9c	IMPALA-13796: Calcite planner: Improper casting for char on join condition For the following query: SELECT COUNT(*) from orders t1 LEFT OUTER JOIN orders t2 ON cast(t1.o_comment as char(120)) = cast(t2.o_comment as char(120)); The join condition uses the Function "=(CHAR,CHAR)". The function defined within Impala uses a wildcard for the length of the char (-1). Previous to the fix, the code detected that the char(120) needed casting, would cast it to a char(1), and this produced erroneous results. The fix is to make sure we don't cast from a char(x) to a char(-1). Change-Id: Ib9f44e3d5a7623a20d9841541bb496c1dee32d1e Reviewed-on: http://gerrit.cloudera.org:8080/22541 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2025-04-29 12:38:08 +00:00
Steve Carlin	706e1f026c	IMPALA-13657: Connect Calcite planner to Impala Frontend framework This commit adds the plumbing created by IMPALA-13653. The Calcite planner is now called from Impala's Frontend code via 4 hooks which are: - CalciteCompilerFactory: the factory class that creates the implementations of the parser, analysis, and single node planner hooks. - CalciteParsedStatement: The class which holds the Calcite SqlNode AST. - CalciteAnalysisDriver: The class that does the validation of the SqlNode AST - CalciteSingleNodePlanner: The class that converts the AST to a logical plan, optimizes it, and converts it into an Impala PlanNode physical plan. To run on Calcite, one needs to do two things: 1) set the USE_CALCITE_PLANNER env variable to true before starting the cluster. This adds the jar file into the path in the bin/setclasspath.sh file, which is not there by default at the time of this commit. 2) set the use_calcite_planner query option to true. This commit makes the CalciteJniFrontend class obsolete. Once the test cases are moved out of there, that class and others can be removed. Change-Id: I3b30571beb797ede827ef4d794b8daefb130ccb1 Reviewed-on: http://gerrit.cloudera.org:8080/22319 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-04-09 23:55:15 +00:00
Zoltan Borok-Nagy	bd3486c051	IMPALA-13586: Initial support for Iceberg REST Catalogs This patch adds initial support for Iceberg REST Catalogs. This means now it's possible to run an Impala cluster without the Hive Metastore, and without the Impala CatalogD. Impala Coordinators can directly connect to an Iceberg REST server and fetch metadata for databases and tables from there. The support is read-only, i.e. DDL and DML statements are not supported yet. This was initially developed in the context of a company Hackathon program, i.e. it was a team effort that I squashed into a single commit and polished the code a bit. The Hackathon team members were: * Daniel Becker * Gabor Kaszab * Kurt Deschler * Peter Rozsa * Zoltan Borok-Nagy The Iceberg REST Catalog support can be configured via a Java properties file, the location of it can be specified via: --catalog_config_dir: Directory of configuration files Currently only one configuration file can be in the directory as we only support a single Catalog at a time. The following properties are mandatory in the config file: * connector.name=iceberg * iceberg.catalog.type=rest * iceberg.rest-catalog.uri The first two properties can only be 'iceberg' and 'rest' for now, they are needed for extensibility in the future. Moreover, Impala Daemons need to specify the following flags to connect to an Iceberg REST Catalog: --use_local_catalog=true --catalogd_deployed=false Testing * e2e added to test basic functionality against a custom-built Iceberg REST server that delegates to HadoopCatalog under the hood * Further testing, e.g. 
Ranger tests are expected in subsequent commits TODO: * manual testing against Polaris / Lakekeeper, we could add automated tests in a later patch Change-Id: I1722b898b568d2f5689002f2b9bef59320cb088c Reviewed-on: http://gerrit.cloudera.org:8080/22353 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-04-02 20:04:12 +00:00
Peter Rozsa	1f70269392	IMPALA-13838: Update Impala version to 5.0.0-SNAPSHOT Change-Id: I9c5a2d817b30e14333feeb5b2de3e0c40795723f Reviewed-on: http://gerrit.cloudera.org:8080/22596 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-03-08 14:13:48 +00:00
Steve Carlin	5b4427ed1b	IMPALA-13587: Calcite planner: Outer join not aggregating nulls properly The following query is producing incorrect results: select t2.int_col y from alltypessmall t1 left outer join alltypestiny t2 on t1.int_col = t2.int_col group by 1 ... due to nulls not being aggregated properly on multiple nodes. This is because the value equivalency graph is being set for the join conjunct on an outer join. When a hash join partition node is being used, there is an optimization that skips the aggregation step that combines groups across nodes if, based on the value transfer graph, it deduces that all data for the partition column is being sent to the same node. The bug here is that even though an outer join is using an equi-conjunct, the left and right side are different when data is not found on the outer join side, where it becomes null. The fix is to avoid registering the equi-conjunct if the values are not always equal. Change-Id: I57e9d4ad4c4af5a4c268e43ac2937064dab6ffd7 Reviewed-on: http://gerrit.cloudera.org:8080/22138 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Steve Carlin <scarlin@cloudera.com>	2025-03-06 21:14:01 +00:00
Steve Carlin	b449381cc8	IMPALA-13575: Calcite planner: Fix exception when null is in values clause The following query was failing with an exception: select * from (values(0), (null)) The null type was not being assigned a type correctly. After this fix, the null type will be created as an AnalyzedNullLiteral with the correct type. Change-Id: I4e78fb0ed63b9525540ad537cfb7aabd8bbfe7ea Reviewed-on: http://gerrit.cloudera.org:8080/22109 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-03-06 21:13:06 +00:00
Steve Carlin	796c25fc57	IMPALA-13716: Calcite Planner: TupleIsNullPredicate fix for analytic functions There is some special logic to materialize the TupleIsNullPredicate functions that are created by join nodes for outer joins for analytic functions. This commit refactors some of the code in the current Impala planner and materializes them with the Analytic RelNode. An example query from the test framework that causes this issue is: select avg(g) over (order by f) af3 from alltypestiny t1 left outer join (select id as a, coalesce(bigint_col, 30) as f, bigint_col as g from alltypestiny) t2 on (t1.id = t2.a); Change-Id: Iaec363c2fa93a1e21bf74a40e5399e21ddd9bd60 Reviewed-on: http://gerrit.cloudera.org:8080/22411 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-26 16:22:26 +00:00
jasonmfehr	aac67a077e	IMPALA-13201: System Table Queries Execute When Admission Queues are Full Queries that run only against in-memory system tables are currently subject to the same admission control process as all other queries. Since these queries do not use any resources on executors, admission control does not need to consider the state of executors when deciding to admit these queries. This change adds a boolean configuration option 'onlyCoordinators' to the fair-scheduler.xml file for specifying a request pool only applies to the coordinators. When a query is submitted to a coordinator only request pool, then no executors are required to be running. Instead, all fragment instances are executed exclusively on the coordinators. A new member was added to the ClusterMembershipMgr::Snapshot struct to hold the ExecutorGroup of all coordinators. This object is kept up to date by processing statestore messages and is used when executing queries that either require the coordinators (such as queries against sys.impala_query_live) or that use an only coordinators request pool. Testing was accomplished by: 1. Adding cluster membership manager ctests to assert cluster membership manager correctly builds the list of non-quiescing coordinators. 2. RequestPoolService JUnit tests to assert the new optional <onlyCoords> config in the fair scheduler xml file is correctly parsed. 3. ExecutorGroup ctests modified to assert the new function. 4. Custom cluster admission controller tests to assert queries with a coordinator only request pool only run on the active coordinators. Change-Id: I5e0e64db92bdbf80f8b5bd85d001ffe4c8c9ffda Reviewed-on: http://gerrit.cloudera.org:8080/22249 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-14 04:27:11 +00:00
Steve Carlin	f50514a265	IMPALA-13525: Handle escaped characters in string literal Changed the parser to handle escaped characters. The method is in a new class called ParserUtil. The method was copied from Calcite's SqlParserUtil, but one change was needed. The Calcite method did not handle a backslash in front of a regex character. For Impala, if we detect the backslash in front of a regex character, we leave the character but remove the backslash. This is tested in exprs.test Change-Id: I9b0fbe591d1101350b2ba0f6ddb2967b819ee685 Reviewed-on: http://gerrit.cloudera.org:8080/22106 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-12 13:24:13 +00:00
Steve Carlin	c9e9f86c05	IMPALA-13571: Calcite Planner: Fix join parsing errors. This commit fixes two parsing errors related to joins. 1) The straight_join hint is now parsed correctly. The hint, however, will be ignored. IMPALA-13574 has been filed for this feature. 2) The anti-join and semi-join will no longer throw parse exceptions but will now throw unsupported exceptions. The jiras IMPALA-13572 and IMPALA-13573 have been filed for these features. IMPALA-13708: Throw unsupported message for complex column syntax. There is special complex column syntax where the column is treated like a table and looks like a table name when parsed. If we get an analysis error for an unfound table, we can see if it is actually a complex column and throw an "unsupported" error instead of giving a cryptic "table not found" message. Also made the support for types of tables to be a bit more generic and check if it's not an HdfsTable rather than only throw an unsupported exception if it's an Iceberg table. Change-Id: Icd0f68441c84b090ed2cb45de96ccee1054deef5 Reviewed-on: http://gerrit.cloudera.org:8080/22412 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-09 01:27:07 +00:00
Michael Smith	88067c576b	IMPALA-13740: Update velocity-engine-core to 2.4.1 Updates velocity-engine-core - required by pac4j - to 2.4.1 to avoid including a shaded version of commons-io vulnerable to CVE-2024-47554. Change-Id: I76624851d6f51d1b9d4dd61fc488932a51e9cba0 Reviewed-on: http://gerrit.cloudera.org:8080/22454 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Peter Rozsa <prozsa@cloudera.com>	2025-02-06 16:31:39 +00:00
Steve Carlin	4290eb7dc5	IMPALA-13576: Fix filter placement in the plan and related changes. This patch combines the following changes: - Filter above a Sort RelNode is now supported (IMPALA-13576) - A new rule is added to remove empty sort keys, as in the example SELECT '1' FROM alltypestiny ORDER BY 1 This also required bumping up the Calcite version to 1.37. (IMPALA-13578) - A parser fix to allow LIMIT clause in a subquery (IMPALA-13579) - Added optimization to push Filter past the Project RelNode (IMPALA-13577) This optimization needs to be added before "CoerceNodes" so that the filter does not get passed through a generated Project RelNode created for handling conversion of Calcite literal types to Impala literal types (see ImpalaProjectRel.isCoercedProjectForValues() method for details on that). Change-Id: Id075d8516f1fcff4e6402c2ab4b4992a174c8151 Reviewed-on: http://gerrit.cloudera.org:8080/22405 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Aman Sinha <amsinha@cloudera.com>	2025-02-06 16:31:21 +00:00
Steve Carlin	d5f43ff19a	IMPALA-13523: Decimal precision and scale needs to be in return type The inferred return type needs to contain a decimal precision and scale. The return type is calculated by taking the most compatible type of all the arguments. One query in the e2e tests that will be fixed because of this is (previously it was throwing an analysis exception): select appx_median(c1), appx_median(c2), appx_median(c3) from decimal_tiny; The CoerceOperandShuttle also ensures that if the return type is a decimal in certain cases, all the arguments for the function will be cast to that specific return type. The 2 cases here are: 1) when the function is a case statement, in which all cases need to be the same precision and scale 2) when the function contains varargs, in which case all the comparisons need to be of the same precision and scale. Change-Id: Ie10521b587a74930a01c08b711364f897bb2dc33 Reviewed-on: http://gerrit.cloudera.org:8080/22086 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Aman Sinha <amsinha@cloudera.com>	2025-02-05 15:37:42 +00:00
Steve Carlin	40747a7802	IMPALA-13524: Calcite planner: support for functions in exprs.test Specifically, this commit supports translate, the not operator (!), the null not equals operator and some "not regexp" functions. A little extra code had to be added for the not regexp functions because Calcite treats the "not" part as an attribute to the operator. Also, added support for "is distinct from" and "is not distinct from" operators which required a convertlet change to handle the conversion from SqlNodes to RexNodes. Change-Id: Ib8c5d5719a409a32ddb6946d1a87c77773f20820 Reviewed-on: http://gerrit.cloudera.org:8080/22104 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-04 13:01:46 +00:00
Steve Carlin	9b93ab8b55	IMPALA-13521: Calcite planner: Handle function problem with char params There is an issue in Calcite that it treats literals as a CHAR type whereas Impala treats a literal as a STRING type. This is fixed at coercion time. However, at validation time, it is possible that a function is passed an operand which originated as a literal type, but the information is lost by the time it hits the function. An example of this is in exprs.test: select from_unixtime(tmp.val, tmp.fmt) from (values (0 as val, 'yyyy-MM-dd HH:mm:ss' as fmt), (34304461, 'HH:mm:ss dd/yyyy/MM'), (68439783, 'yyyy\|\|MMM\|\|dd\|\|HHmmss')) as tmp The validator sees the literal in the values clause and creates a char type. When we validate the from_unixtime, Calcite propagates the char type into the function. This was failing since the from_unixtime function prototype only takes string params. The fix for this is to do 2 passes. Once with the char type and once with a string type if it detects any char types. If the type was actually a char type, it would fail later in the code after coercion is done (which would not change the char type to a string type). Change-Id: Icc07c6cacb81d02ba0659f2f3d8ececcc63f715e Reviewed-on: http://gerrit.cloudera.org:8080/22095 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-04 03:48:11 +00:00
Steve Carlin	8d74bfd18c	IMPALA-13520: Support in clause coercing Calcite has special processing for any in clause. It has a callback function that allows all the parameters to be coerced into their proper types. While there exists a mechanism to do coercion, in the CoerceNodes class, it only handles functions, and the in clause is handled in a special way in Calcite. So we use the Calcite mechanism to derive a common Impala type and coerce all the parameters. The CombineValuesNodesRule is also needed for this change. There is a test case in test_exprs.test where an in clause contains 10,000 params inside the IN clause (e.g. int_col IN (1, 2, 3, ..., 10000)). In this case, Calcite creates 10,000 Values RelNodes which takes way too long to process on the execution side. The rule combines all the Values RelNodes into one Values RelNode with many tuples, which Impala handles quickly when converted into the physical Impala PlanNode. Change-Id: I492845d623766b9182bca5eeca22eb3352ef2f3d Reviewed-on: http://gerrit.cloudera.org:8080/22408 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-01 13:56:52 +00:00
Steve Carlin	98b584a45f	IMPALA-13481: Add support for various agg and analytic functions Various functions were added. There were several issues for these functions: 1) The Calcite parser and/or validator was generating SqlNodes that weren't compatible with Impala. To fix this, the parsing had to be removed from the Parser.jj file and the functions were marked to use the ImpalaOperator rather than the Calcite operator. These functions include: trim, extract, regr, regexp, localtime, group_concat 2) The ntile, cume_dist, and percent_rank functions undergo a transformation in AnalyticExpr. To make this more clean for Calcite, the transformation now happens in the RewriteRexOverRule. 3) The "negative" operator had to be added to the custom operator table. The subtract was already added there, and all "-" operators need to be in the same table. 4) Various functions were added to function resolver where the Calcite function name was different from the Impala function name. Also added the test mentioned in IMPALA-13688 for cume_dist with duplicates. Change-Id: I57c69a60c63872b2964688f395b662a85698555e Reviewed-on: http://gerrit.cloudera.org:8080/21976 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-01-28 18:53:31 +00:00
Steve Carlin	1490018810	IMPALA-13522: Calcite Planner: Treat the "real" type as double Real type was being treated as a float. E2E test can be found in exprs.test where there is a cast to real. Specifically, this test... select count(*) from alltypesagg where double_col >= 20.2 and cast(double_col as double) = cast(double_col as real) ... was casting double_col as a double and returning the wrong result previous to this commit. Change-Id: I5f3cc0e50a4dfc0e28f39d81b591c1b458fd59ce Reviewed-on: http://gerrit.cloudera.org:8080/22087 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Aman Sinha <amsinha@cloudera.com>	2025-01-25 00:25:05 +00:00


177 Commits