Some datasketches functions return a Function object whose
"isUnsupported" method returns true. These need to be explicitly
handled in the Calcite code as unsupported.
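The explicit check can be sketched as follows. Note this is a minimal illustration: the Function interface, resolveFunction() helper, and exception type below are hypothetical stand-ins, not Impala's actual classes.

```java
// Hedged sketch: reject functions whose resolved Function object reports
// itself as unsupported, instead of silently planning with them.
public class UnsupportedFunctionCheck {
    // Hypothetical stand-in for Impala's Function object.
    interface Function {
        boolean isUnsupported();
        String getName();
    }

    // Hypothetical resolver used only to drive the example.
    static Function resolveFunction(String name, boolean unsupported) {
        return new Function() {
            public boolean isUnsupported() { return unsupported; }
            public String getName() { return name; }
        };
    }

    // The Calcite code must explicitly fail when isUnsupported() is true.
    static void checkSupported(Function fn) {
        if (fn.isUnsupported()) {
            throw new UnsupportedOperationException(
                "Function " + fn.getName() + " is not supported");
        }
    }

    public static void main(String[] args) {
        checkSupported(resolveFunction("supported_fn", false));
        try {
            checkSupported(resolveFunction("unsupported_fn", true));
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected");
        }
    }
}
```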
Change-Id: Ic2c4a96005fc7571bde28643ea4cecda61839c77
Reviewed-on: http://gerrit.cloudera.org:8080/23041
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Impala is preparing to switch to JDK17 for Java compilation by default.
While the source version might remain at 1.8 for longer, we should
experiment with targeting binary version 17.
This patch adds an IMPALA_JAVA_TARGET env var to control the target
binary version. It is initialized in impala-config-java.sh, depending
on the value of the IMPALA_JDK_VERSION env var.
Testing:
Pass data load and FE tests with IMPALA_JDK_VERSION=17.
Change-Id: If194d87c542d416b878661403c32c6adc2930199
Reviewed-on: http://gerrit.cloudera.org:8080/23096
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch, the Calcite planner in Impala only supported inline
views like 'temp' in the following query.
select id from (
select * from functional.alltypes
) as temp;
Regular views, on the other hand, were not supported. For instance, the
Calcite planner in Impala did not support regular views like
'functional.alltypes_view' created via the following statement and
hence queries against such regular views like
"select id from functional.alltypes_view" were not supported.
CREATE VIEW functional.alltypes_view
AS SELECT * FROM functional.alltypes;
This patch adds support for regular views to the Calcite planner
by adding a ViewTable for each regular view in the given query
when populating the Calcite schema. This is similar to how regular
views are supported in PlannerTest#testView() at
https://github.com/apache/calcite/blob/main/core/src/test/java/org/apache/calcite/tools/PlannerTest.java
where the regular view to be tested is added in
https://github.com/apache/calcite/blob/main/testkit/src/main/java/org/apache/calcite/test/CalciteAssert.java.
We do not have to use or extend ViewTableMacro in
Apache Calcite because the information about the data types
returned from a regular view is already available in its respective
FeTable. Therefore, there is no need to parse the SQL statement
representing the regular view and collect the metadata of tables
referenced by the regular view as done by ViewTableMacro.
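The idea can be sketched as follows; FeTable and SimpleViewTable below are hypothetical stand-ins for Impala's and Calcite's actual classes, shown only to illustrate taking the row type from catalog metadata rather than re-parsing the view SQL.

```java
// Hedged sketch: when populating the Calcite schema, a regular view exposes
// its row type directly from catalog metadata (Impala's FeTable), so the
// view's SQL never has to be parsed as Calcite's ViewTableMacro would do.
import java.util.LinkedHashMap;
import java.util.Map;

public class ViewRowTypeSketch {
    // Stand-in for an FeTable that already knows its column names and types.
    static class FeTable {
        final Map<String, String> columns = new LinkedHashMap<>();
        FeTable addColumn(String name, String type) {
            columns.put(name, type);
            return this;
        }
    }

    // Stand-in for a Calcite table whose row type comes from the FeTable.
    static class SimpleViewTable {
        final Map<String, String> rowType;
        SimpleViewTable(FeTable view) { this.rowType = view.columns; }
    }

    public static void main(String[] args) {
        FeTable view = new FeTable().addColumn("id", "INT")
                                    .addColumn("bool_col", "BOOLEAN");
        // No SQL parsing or metadata collection is needed here; the
        // column layout is already available on the FeTable.
        SimpleViewTable table = new SimpleViewTable(view);
        System.out.println(table.rowType.keySet());
    }
}
```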
The patch supports the following cases, where
'functional.alltypes_view' is a regular view defined as
"SELECT * FROM functional.alltypes".
1. select id from functional.alltypes_view.
2. select alltypes_view.id from functional.alltypes_view.
3. select functional.alltypes_view.id from functional.alltypes_view.
Joining a regular view with an HDFS table like the following is also
supported.
select alltypestiny.id
from functional.alltypes_view, functional.alltypestiny
Note that after this patch, queries against regular views are supported
only in the legacy catalog mode but not the local catalog mode. In
fact, queries against HDFS tables in the local catalog mode are not
supported yet by the Calcite planner either. We will deal with this in
IMPALA-14080.
Testing:
- Added test cases mentioned above to calcite.test. This makes sure
the test cases are supported when we start the Impala server with
the flag of '--use_calcite_planner=true'.
- Manually verified the test cases above are supported if we start
the Impala server with the environment variable USE_CALCITE_PLANNER
set to true and the query option use_calcite_planner set to 1.
Change-Id: I600aae816727ae942fb221fae84c2aac63ae1893
Reviewed-on: http://gerrit.cloudera.org:8080/22883
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
The field names were not getting passed up to the output expressions.
These are found on the RelNode row type object.
The change is made in two different flows:
The first flow is in CalciteSingleNodePlanner which gets hit when
running from impala-shell and the use_calcite_planner query option
is used.
The second flow is in ExecRequestCreator and gets hit when running
with the start-up option that loads a different JniFrontend jar. This
mode will soon be deprecated, but is still used for testing purposes.
Change-Id: I42818646d98f87d8744585010fc166f9d416aec1
Reviewed-on: http://gerrit.cloudera.org:8080/22117
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
This commit will enable some junit tests for the Calcite planner.
To run these tests, use the following command from $IMPALA_HOME:
(pushd java/calcite-planner && mvn -B -fae test -Dtest=TpcdsCpuCostPlannerTest)
Change-Id: Idaab4e9068bb64e9a9ee12d83cd2b6b55b99b9bf
Reviewed-on: http://gerrit.cloudera.org:8080/22864
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
There were several issues with test_rest_catalog_basic which made it
fail in environments that used Ozone or S3.
Missing dependencies on Ozone and S3 classes:
* This is resolved in iceberg-rest-catalog-test/pom.xml by adding
a dependency on impala-executor-deps
Hadoop configuration was not initialized properly:
* run-iceberg-rest-server.sh used Maven to run the Iceberg REST Catalog,
in which case Maven is in charge of setting the CLASSPATH, but the
core-site/ozone-site/etc. config files were not on it, so the
REST Catalog used a default Hadoop configuration that wasn't suitable
for our environment.
* To overcome the CLASSPATH problem, we now create a runnable JAR in
iceberg-rest-catalog-test/pom.xml and also generate the proper
CLASSPATH during compilation.
* run-iceberg-rest-server.sh now uses 'java -cp' to run the REST
Catalog.
S3 builds threw NoSuchMethodException for the "create" method of
ApacheHttpClientConfigurations:
* The Iceberg library dynamically loads its HTTP client builders
to work around an error, see details in
https://github.com/apache/iceberg/issues/6715
* The Iceberg library thus tries to dynamically load the "create"
method of its own ApacheHttpClientConfigurations class, but this
fails with NoSuchMethodException.
* The critical code is invoked from the ScanMetadataTable() method of
Impala's IcebergMetadataScanner, which happens to be invoked through
JNI from the C++ backend.
* The context class loader of such threads is null, which means
Java will use the bootstrap class loader to load classes and methods,
but that doesn't have the proper resources on its classpath.
* To overcome this issue we set the context class loader for the thread
to the class loader that originally loaded the IcebergMetadataScanner
class.
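The fix can be sketched with plain JDK APIs; the IcebergMetadataScanner class below is a stand-in for Impala's real one, and the spawned thread merely simulates a JNI-attached thread whose context class loader is null.

```java
// Hedged sketch: a JNI-attached thread can have a null context class loader,
// so before calling libraries (like Iceberg) that load classes reflectively,
// set it to the loader that originally loaded the scanner class.
public class ContextClassLoaderFix {
    // Stand-in for Impala's IcebergMetadataScanner.
    static class IcebergMetadataScanner {
        void scanMetadataTable() {
            Thread t = Thread.currentThread();
            if (t.getContextClassLoader() == null) {
                // Without this, reflective lookups fall back to the bootstrap
                // class loader, which cannot see the application's classpath.
                t.setContextClassLoader(
                    IcebergMetadataScanner.class.getClassLoader());
            }
            // ... reflective Iceberg calls would happen here ...
            System.out.println(t.getContextClassLoader() != null);
        }
    }

    public static void main(String[] args) throws Exception {
        Thread jniLike = new Thread(() ->
            new IcebergMetadataScanner().scanMetadataTable());
        jniLike.setContextClassLoader(null); // simulate a JNI-attached thread
        jniLike.start();
        jniLike.join();
    }
}
```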
Change-Id: I9dc0e30aeaff0b8de41426ba38506383b4af472c
Reviewed-on: http://gerrit.cloudera.org:8080/22818
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
IMPALA-13779: Handle partition key scan optimization
IMPALA-13780: Handle full acid selects
The 3 commits referenced here are somewhat related in that they all
involve changes for the HdfsScanRel column layout and have been
combined.
For the optimizations, some infrastructure code was added. Information
from the Aggregation RelNode is needed by the TableScan RelNode and
vice versa. The mechanism to send information to child RelNodes is
by using the ParentPlanRelContext. The mechanism for sending information
up to the parent is by using the NodeWithExprs object. If the conditions
are met for the optimizations (equivalent to the conditions in the current
Impala planner), the optimizations are applied.
For count star optimization, the STAT_NUM_ROWS fake column is added to hold
the information, and then the aggregate applies a sum_init_zero on this
column.
For partition key scan, if the conditions are met, the Impala HdfsScanNode
is sent a flag in its constructor that handles the optimization.
For acid selects, the SingleNodePlanner has code to handle the additional
PlanNodes needed. Some code involving column number calculation was needed
to deal with the extra columns that are present in a full acid table.
One extra note: In HdfsScanNode, a Preconditions check was removed.
This state check ensured that the countStarSlot only existed when the
aggregation substitution map was set. This does not apply to Calcite which
does not use the substitution map to handle the count star optimization.
Change-Id: I975beefedd2cceb34dad0f93343a46d1b7094c13
Reviewed-on: http://gerrit.cloudera.org:8080/22425
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The || operator is used for both "or" and "concat". A new
Impala custom operator is created to handle both of them,
treating the precedence of the operator as if it's an "or".
The "or" is chosen if both parameters are null or boolean, as
taken from logic in CompoundVerticalBarExpr.
At convertlet time (when converting from SqlNode to RelNode),
the real operator is placed into the RexNode.
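The type-based dispatch described above can be sketched as follows; the string-based type model is an illustrative stand-in, as the real logic lives in Impala's CompoundVerticalBarExpr and the Calcite convertlet.

```java
// Hedged sketch: treat || as logical OR when both operands are BOOLEAN or
// NULL, and as string concat otherwise. At convertlet time the resolved
// operator replaces the placeholder custom operator.
public class VerticalBarDispatch {
    enum SqlKind { OR, CONCAT }

    static boolean isBooleanOrNull(String type) {
        return type.equals("BOOLEAN") || type.equals("NULL");
    }

    // Decide which real operator the custom || operator stands for.
    static SqlKind resolve(String leftType, String rightType) {
        return (isBooleanOrNull(leftType) && isBooleanOrNull(rightType))
            ? SqlKind.OR : SqlKind.CONCAT;
    }

    public static void main(String[] args) {
        System.out.println(resolve("BOOLEAN", "NULL"));  // logical or
        System.out.println(resolve("STRING", "STRING")); // concatenation
    }
}
```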
Change-Id: Iabaf02e84b769db1419bd96e1d1b30b8f83d56f5
Reviewed-on: http://gerrit.cloudera.org:8080/22105
Reviewed-by: Steve Carlin <scarlin@cloudera.com>
Tested-by: Steve Carlin <scarlin@cloudera.com>
Enables partition pruning in the HdfsScan RelNode.
The PrunedPartitionHelper is used to separate the conjuncts and
is a wrapper around the HdfsPartitionPruner.
Existing tests in the Impala test framework that check the runtime
profile for the number of files read by the table scan are fixed by
this commit.
One small modification was made to the "preserveRootTypes" parameter
in HdfsPartitionPruner: when it was set to false, the Calcite planner
failed in one place. For the main code line it makes sense that the
root type should not change; this was verified with a full Jenkins run.
Change-Id: I8c698b857555baeae347835b4a6b39d035f12405
Reviewed-on: http://gerrit.cloudera.org:8080/22409
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Steve Carlin <scarlin@cloudera.com>
For the following query:
SELECT COUNT(*) from orders t1 LEFT OUTER JOIN orders t2
ON cast(t1.o_comment as char(120)) = cast(t2.o_comment as char(120));
The join condition uses the function "=(CHAR,CHAR)". The function
defined within Impala uses a wildcard (-1) for the length of the char.
Prior to the fix, the code detected that the char(120) needed casting,
cast it to a char(1), and this produced erroneous results.
The fix is to make sure we don't cast from a char(x) to a char(-1).
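The guard can be sketched as a small predicate; the wildcard constant and method names here are illustrative, not Impala's actual representation.

```java
// Hedged sketch: never insert a cast when the target CHAR length is the
// wildcard (-1) used by builtin function signatures like "=(CHAR,CHAR)".
public class CharCastCheck {
    static final int WILDCARD_CHAR_LEN = -1;

    // Cast only when the target is a concrete CHAR length that differs.
    static boolean needsCast(int fromCharLen, int toCharLen) {
        if (toCharLen == WILDCARD_CHAR_LEN) return false;
        return fromCharLen != toCharLen;
    }

    public static void main(String[] args) {
        // char(120) matched against the wildcard signature: no cast.
        System.out.println(needsCast(120, WILDCARD_CHAR_LEN));
        // char(120) to char(30): a cast is genuinely needed.
        System.out.println(needsCast(120, 30));
    }
}
```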
Change-Id: Ib9f44e3d5a7623a20d9841541bb496c1dee32d1e
Reviewed-on: http://gerrit.cloudera.org:8080/22541
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Steve Carlin <scarlin@cloudera.com>
This commit adds the plumbing created by IMPALA-13653. The Calcite
planner is now called from Impala's Frontend code via 4 hooks which
are:
- CalciteCompilerFactory: the factory class that creates
the implementations of the parser, analysis, and single node
planner hooks.
- CalciteParsedStatement: The class which holds the Calcite SqlNode
AST.
- CalciteAnalysisDriver: The class that does the validation of the
SqlNode AST
- CalciteSingleNodePlanner: The class that converts the AST to a
logical plan, optimizes it, and converts it into an Impala
PlanNode physical plan.
To run on Calcite, one needs to do two things:
1) set the USE_CALCITE_PLANNER env variable to true before starting
the cluster. This adds the jar file into the path in the
bin/setclasspath.sh file, which is not there by default at the time
of this commit.
2) set the use_calcite_planner query option to true.
This commit makes the CalciteJniFrontend class obsolete. Once the
test cases are moved out of there, that class and others can be
removed.
Change-Id: I3b30571beb797ede827ef4d794b8daefb130ccb1
Reviewed-on: http://gerrit.cloudera.org:8080/22319
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
This patch adds initial support for Iceberg REST Catalogs. This means
now it's possible to run an Impala cluster without the Hive Metastore,
and without the Impala CatalogD. Impala Coordinators can directly
connect to an Iceberg REST server and fetch metadata for databases and
tables from there. The support is read-only, i.e. DDL and DML statements
are not supported yet.
This was initially developed in the context of a company Hackathon
program, i.e. it was a team effort that I squashed into a single commit
and polished the code a bit.
The Hackathon team members were:
* Daniel Becker
* Gabor Kaszab
* Kurt Deschler
* Peter Rozsa
* Zoltan Borok-Nagy
The Iceberg REST Catalog support can be configured via a Java properties
file, the location of it can be specified via:
--catalog_config_dir: Directory of configuration files
Currently only one configuration file can be in the directory as we only
support a single Catalog at a time. The following properties are mandatory
in the config file:
* connector.name=iceberg
* iceberg.catalog.type=rest
* iceberg.rest-catalog.uri
The first two properties can only be 'iceberg' and 'rest' for now;
they exist for future extensibility.
Moreover, Impala Daemons need to specify the following flags to connect
to an Iceberg REST Catalog:
--use_local_catalog=true
--catalogd_deployed=false
Testing
* e2e test added to test basic functionality against a custom-built
Iceberg REST server that delegates to HadoopCatalog under the hood
* Further testing, e.g. Ranger tests, is expected in subsequent
commits
TODO:
* manual testing against Polaris / Lakekeeper, we could add automated
tests in a later patch
Change-Id: I1722b898b568d2f5689002f2b9bef59320cb088c
Reviewed-on: http://gerrit.cloudera.org:8080/22353
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The following query is producing incorrect results:
select t2.int_col y from alltypessmall t1 left outer join
alltypestiny t2 on t1.int_col = t2.int_col group by 1
... due to nulls not being aggregated properly on multiple nodes.
This is because the value equivalency graph is being set for the
join conjunct on an outer join. When a hash join partition node is
being used, there is an optimization that skips the aggregation step
that combines groups across nodes if, based on the value transfer
graph, it deduces that all data for the partition column is being
sent to the same node.
The bug here is that even though an outer join uses an
equi-conjunct, the two sides can differ: when no match is found on
the outer side, the value becomes null.
The fix is to avoid registering the equi-conjunct if the values are
not always equal.
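The fix amounts to a guard like the following sketch; JoinType and the string-based slot model are illustrative stand-ins, as the real value-transfer graph lives in Impala's analyzer.

```java
// Hedged sketch: register a value-transfer edge for an equi-conjunct only
// when the join guarantees the two sides are always equal, i.e. not for
// outer joins where the non-preserved side can become NULL.
import java.util.ArrayList;
import java.util.List;

public class ValueTransferRegistration {
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    static final List<String> valueTransfers = new ArrayList<>();

    static void maybeRegister(String lhs, String rhs, JoinType type) {
        // On an outer join, lhs = rhs does not hold for unmatched rows,
        // so the equivalence must not be recorded.
        if (type != JoinType.INNER) return;
        valueTransfers.add(lhs + "<->" + rhs);
    }

    public static void main(String[] args) {
        maybeRegister("t1.int_col", "t2.int_col", JoinType.LEFT_OUTER);
        maybeRegister("t1.id", "t2.id", JoinType.INNER);
        System.out.println(valueTransfers);
    }
}
```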
Change-Id: I57e9d4ad4c4af5a4c268e43ac2937064dab6ffd7
Reviewed-on: http://gerrit.cloudera.org:8080/22138
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Steve Carlin <scarlin@cloudera.com>
The following query was failing with an exception:
select * from (values(0), (null))
The null type was not being assigned a type correctly. After this fix,
the null type will be created as an AnalyzedNullLiteral with the correct
type.
Change-Id: I4e78fb0ed63b9525540ad537cfb7aabd8bbfe7ea
Reviewed-on: http://gerrit.cloudera.org:8080/22109
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
There is some special logic to materialize the TupleIsNullPredicate
functions that join nodes create for analytic functions over outer
joins. This commit refactors some of the code in the current Impala
planner and materializes them with the Analytic RelNode.
An example query from the test framework that causes this issue is:
select avg(g) over (order by f) af3
from alltypestiny t1
left outer join
(select
id as a, coalesce(bigint_col, 30) as f,
bigint_col as g
from alltypestiny) t2
on (t1.id = t2.a);
Change-Id: Iaec363c2fa93a1e21bf74a40e5399e21ddd9bd60
Reviewed-on: http://gerrit.cloudera.org:8080/22411
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Queries that run only against in-memory system tables are currently
subject to the same admission control process as all other queries.
Since these queries do not use any resources on executors, admission
control does not need to consider the state of executors when
deciding to admit these queries.
This change adds a boolean configuration option 'onlyCoordinators'
to the fair-scheduler.xml file for specifying that a request pool
applies only to the coordinators. When a query is submitted to a
coordinator-only request pool, no executors are required to be
running. Instead, all fragment instances are executed exclusively on
the coordinators.
A new member was added to the ClusterMembershipMgr::Snapshot struct
to hold the ExecutorGroup of all coordinators. This object is kept up
to date by processing statestore messages and is used when executing
queries that either require the coordinators (such as queries against
sys.impala_query_live) or that use a coordinator-only request pool.
Testing was accomplished by:
1. Adding cluster membership manager ctests to assert cluster
membership manager correctly builds the list of non-quiescing
coordinators.
2. RequestPoolService JUnit tests to assert the new optional
<onlyCoords> config in the fair scheduler xml file is correctly
parsed.
3. ExecutorGroup ctests modified to assert the new function.
4. Custom cluster admission controller tests to assert queries with a
coordinator-only request pool run only on the active coordinators.
Change-Id: I5e0e64db92bdbf80f8b5bd85d001ffe4c8c9ffda
Reviewed-on: http://gerrit.cloudera.org:8080/22249
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Changed the parser to handle escaped characters. The method
is in a new class called ParserUtil. The method was copied
from Calcite's SqlParserUtil, but one change was needed. The
Calcite method did not handle a backslash in front of a regex
character. For Impala, if we detect the backslash in front of
a regex character, we leave the character but remove the
backslash. This is tested in exprs.test.
Change-Id: I9b0fbe591d1101350b2ba0f6ddb2967b819ee685
Reviewed-on: http://gerrit.cloudera.org:8080/22106
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit fixes two parsing errors related to joins.
1) The straight_join hint is now parsed correctly. The hint, however,
will be ignored. IMPALA-13574 has been filed for this feature.
2) The anti-join and semi-join will no longer throw parse exceptions
but will now throw unsupported exceptions. The jiras IMPALA-13572
and IMPALA-13573 have been filed for these features.
IMPALA-13708: Throw unsupported message for complex column syntax.
There is special complex column syntax where the column is treated
like a table and looks like a table name when parsed. If we get an
analysis error for an unfound table, we can see if it is actually
a complex column and throw an "unsupported" error instead of giving
a cryptic "table not found" message.
Also made the support for table types a bit more generic: we now
throw an unsupported exception if the table is not an HdfsTable,
rather than only when it is an Iceberg table.
Change-Id: Icd0f68441c84b090ed2cb45de96ccee1054deef5
Reviewed-on: http://gerrit.cloudera.org:8080/22412
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch combines the following changes:
- Filter above a Sort RelNode is now supported (IMPALA-13576)
- A new rule is added to remove empty sort keys, as in the example
SELECT '1' FROM alltypestiny ORDER BY 1
This also required bumping up the Calcite version to 1.37. (IMPALA-13578)
- A parser fix to allow LIMIT clause in a subquery (IMPALA-13579)
- Added optimization to push Filter past the Project RelNode (IMPALA-13577)
This optimization needs to be added before "CoerceNodes" so that the filter
does not get passed through a generated Project RelNode created for handling
conversion of Calcite literal types to Impala literal types (see
ImpalaProjectRel.isCoercedProjectForValues() method for details on that).
Change-Id: Id075d8516f1fcff4e6402c2ab4b4992a174c8151
Reviewed-on: http://gerrit.cloudera.org:8080/22405
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
The inferred return type needs to contain a decimal precision
and scale. The return type is calculated by taking the most
compatible type of all the arguments.
One query in the e2e tests that is fixed by this (it previously
threw an analysis exception):
select appx_median(c1), appx_median(c2), appx_median(c3)
from decimal_tiny;
The CoerceOperandShuttle also ensures that if the return type
is a decimal in certain cases, all the arguments for the function
will be cast to that specific return type. The 2 cases here
are:
1) when the function is a case statement, in which all cases need
to be the same precision and scale
2) when the function contains varargs, in which case all the comparisons
need to be of the same precision and scale.
Change-Id: Ie10521b587a74930a01c08b711364f897bb2dc33
Reviewed-on: http://gerrit.cloudera.org:8080/22086
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Specifically, this commit supports translate, the not operator (!),
the null not equals operator and some "not regexp" functions.
A little extra code had to be added for the not regexp functions
because Calcite treats the "not" part as an attribute to the operator.
Also, added support for "is distinct from" and "is not distinct from"
operators which required a convertlet change to handle the conversion
from SqlNodes to RexNodes.
Change-Id: Ib8c5d5719a409a32ddb6946d1a87c77773f20820
Reviewed-on: http://gerrit.cloudera.org:8080/22104
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Calcite treats literals as a CHAR type, whereas Impala treats
literals as a STRING type. This is fixed at coercion time.
However, at validation time, it is possible that a function is passed
an operand which originated as a literal type, but the information is
lost by the time it hits the function. An example of this is in
exprs.test:
select from_unixtime(tmp.val, tmp.fmt) from (values
(0 as val, 'yyyy-MM-dd HH:mm:ss' as fmt),
(34304461, 'HH:mm:ss dd/yyyy/MM'),
(68439783, 'yyyy||MMM||dd||HHmmss')) as tmp
The validator sees the literal in the values clause and creates a char
type. When we validate the from_unixtime, Calcite propagates the char
type into the function. This was failing since the from_unixtime
function prototype only takes string params.
The fix for this is to do two passes: once with the char type, and
once with a string type if any char types are detected. If the type
was actually a char type, it would fail later in the code after
coercion is done (which would not change the char type to a string
type).
Change-Id: Icc07c6cacb81d02ba0659f2f3d8ececcc63f715e
Reviewed-on: http://gerrit.cloudera.org:8080/22095
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Calcite has special processing for the IN clause. It has a callback
function that allows all the parameters to be coerced into the proper
type. While a coercion mechanism exists in the CoerceNodes class, it
only handles functions, and the IN clause is handled in a special way
in Calcite.
So we use the Calcite mechanism to derive a common Impala type and
coerce all the parameters.
The CombineValuesNodesRule is also needed for this change. There is a
test case in test_exprs.test where an IN clause contains 10,000
params (e.g. int_col IN (1, 2, 3, ..., 10000)).
In this case, Calcite creates 10,000 Values RelNodes which takes way
too long to process on the execution side. The rule combines all
the Values RelNodes into one Values RelNode with many tuples, which
Impala handles quickly when converted into the physical Impala
PlanNode.
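The rule's effect can be sketched with a list-based model; the real rule operates on Calcite Values RelNodes, so the types and method below are purely illustrative.

```java
// Hedged sketch of the CombineValuesNodesRule idea: many single-tuple
// Values nodes are merged into one Values node holding all the tuples,
// which the execution side can convert into one physical node cheaply.
import java.util.ArrayList;
import java.util.List;

public class CombineValuesSketch {
    // Each inner list is the tuple set of one Values RelNode.
    static List<Integer> combine(List<List<Integer>> valuesNodes) {
        List<Integer> combined = new ArrayList<>();
        for (List<Integer> tuples : valuesNodes) combined.addAll(tuples);
        return combined;
    }

    public static void main(String[] args) {
        List<List<Integer>> nodes = new ArrayList<>();
        // Five one-tuple Values nodes, as Calcite would generate for
        // int_col IN (1, 2, 3, 4, 5).
        for (int i = 1; i <= 5; i++) nodes.add(List.of(i));
        System.out.println(combine(nodes)); // one node with five tuples
    }
}
```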
Change-Id: I492845d623766b9182bca5eeca22eb3352ef2f3d
Reviewed-on: http://gerrit.cloudera.org:8080/22408
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Various functions were added. There were several issues with
these functions:
1) The Calcite parser and/or validator was generating SqlNodes
that weren't compatible with Impala. To fix this, the parsing
had to be removed from the Parser.jj file and the functions were
marked to use the ImpalaOperator rather than the Calcite operator.
These functions include:
trim, extract, regr*, regexp*, localtime, group_concat
2) The ntile, cume_dist, and percent_rank functions undergo a
transformation in AnalyticExpr. To make this cleaner for Calcite,
the transformation now happens in the RewriteRexOverRule.
3) The "negative" operator had to be added to the custom operator table.
The subtract was already added there, and all "-" operators need
to be in the same table.
4) Various functions were added to the function resolver where the
Calcite function name differed from the Impala function name.
Also added the test mentioned in IMPALA-13688 for cume_dist with
duplicates.
Change-Id: I57c69a60c63872b2964688f395b662a85698555e
Reviewed-on: http://gerrit.cloudera.org:8080/21976
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The REAL type was being treated as a float. An E2E test can be found
in exprs.test where there is a cast to real. Specifically, this test...
select count(*) from alltypesagg
where double_col >= 20.2
and cast(double_col as double) = cast(double_col as real)
... was casting double_col as a double and returning the wrong result
previous to this commit.
Change-Id: I5f3cc0e50a4dfc0e28f39d81b591c1b458fd59ce
Reviewed-on: http://gerrit.cloudera.org:8080/22087
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Before this change, Puffin stats were only read from the current
snapshot. Now we also consider older snapshots, and for each column we
choose the most recent available stats. Note that this means that the
stats for different columns may come from different snapshots.
In case there are both HMS and Puffin stats for a column, the more
recent one will be used - for HMS stats we use the
'impala.lastComputeStatsTime' table property, and for Puffin stats we
use the snapshot timestamp to determine which is more recent.
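The per-column selection can be sketched as a timestamp comparison; the method and parameter names are illustrative, and the tie-breaking toward HMS stats is an assumption of this sketch rather than documented behavior.

```java
// Hedged sketch: for a column with both HMS and Puffin stats, pick
// whichever source is newer, comparing the HMS
// 'impala.lastComputeStatsTime' value against the Puffin snapshot's
// timestamp.
public class NewestStatsSketch {
    // Returns which stats source to use for one column.
    static String pickStatsSource(long hmsComputeStatsTimeMs,
                                  long puffinSnapshotTimeMs) {
        // Tie-breaking toward HMS is an assumption made for this sketch.
        return hmsComputeStatsTimeMs >= puffinSnapshotTimeMs
            ? "HMS" : "PUFFIN";
    }

    public static void main(String[] args) {
        System.out.println(pickStatsSource(2_000L, 1_000L));
        System.out.println(pickStatsSource(1_000L, 2_000L));
    }
}
```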
This commit also renames the startup flag 'disable_reading_puffin_stats'
to 'enable_reading_puffin_stats' and the table property
'impala.iceberg_disable_reading_puffin_stats' to
'impala.iceberg_read_puffin_stats' to make them more intuitive. The
default values are flipped to keep the same behaviour as before.
The documentation of Puffin reading is updated in
docs/topics/impala_iceberg.xml
Testing:
- updated existing test cases and added new ones in
test_iceberg_with_puffin.py
- reorganised the tests in TestIcebergTableWithPuffinStats in
test_iceberg_with_puffin.py: tests that modify table properties and
other state that other tests rely on are now run separately to
provide a clean environment for all tests.
Change-Id: Ia37abe8c9eab6d91946c8f6d3df5fb0889704a39
Reviewed-on: http://gerrit.cloudera.org:8080/22177
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The fair-scheduler file contains part of the configuration for Admission
Control. This change adds better error handling to the parsing of
this file. Where it is safe to do so, new exceptions are thrown; this
will cause Impala to refuse to start. This is consistent with other
serious configuration errors. Where new exceptions might cause problems
with existing configurations, or for less dangerous faults, new warnings
are written to the server log.
For the recently added User Quota configuration (IMPALA-12345), an
exception is now thrown when a duplicate snippet of configuration is
found.
New warning log messages are added for these cases:
- when a user quota at the leaf level is completely ignored because of
a user quota at the root level
- when there is no user ACL on a leaf level queue. This prevents any
queries from being submitted to the queue.
Change-Id: Idcd50442ce16e7c4346c6da1624216d694f6f44d
Reviewed-on: http://gerrit.cloudera.org:8080/22209
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The unit test `JdbcDataSourceTest.java` was originally
implemented using the H2 database, which is no longer
available in Impala's environment. The test code was
also outdated and erroneous.
This commit fixes the failure of JdbcDataSourceTest.java
and rewrites it against Postgres, ensuring compatibility
with Impala's current environment and alignment with the
JDBC and external data source APIs. Please note, this test
is moved to the fe folder to fix the "BackendConfig instance
not initialized" error.
To test this file, run the following command:
pushd fe && mvn -fae test -Dtest=JdbcDataSourceTest
Please note that the tests in JdbcDataSourceTest have a
dependency on previous tests and individual tests cannot be
run separately for this class.
Change-Id: Ie07173d256d73c88f5a6c041f087db16b6ff3127
Reviewed-on: http://gerrit.cloudera.org:8080/21805
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When Trino writes Puffin stats for a column, it includes the NDV as a
property (with key "ndv") in the "statistics" section of the
metadata.json file, in addition to the Theta sketch in the Puffin file.
When we are only reading the stats and not writing/updating them, it is
enough to read this property if it is present.
After this change, Impala only opens and reads a Puffin stats file if it
contains stats for at least one column for which the "ndv" property is
not set in the metadata.json file.
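The decision can be sketched as follows; the map-based model of the metadata.json "statistics" properties is illustrative, not Impala's actual representation.

```java
// Hedged sketch: open a Puffin stats file only if it covers at least one
// column whose NDV is not already published as the "ndv" property in the
// metadata.json file.
import java.util.List;
import java.util.Map;

public class PuffinReadDecision {
    // columnsInFile: columns the Puffin file has stats for.
    // ndvProps: columns whose "ndv" property is present in metadata.json.
    static boolean shouldReadPuffinFile(List<String> columnsInFile,
                                        Map<String, Long> ndvProps) {
        for (String col : columnsInFile) {
            if (!ndvProps.containsKey(col)) return true; // NDV missing: read
        }
        return false; // every covered column's NDV is already available
    }

    public static void main(String[] args) {
        Map<String, Long> ndv = Map.of("id", 7300L);
        System.out.println(shouldReadPuffinFile(List.of("id"), ndv));
        System.out.println(shouldReadPuffinFile(List.of("id", "int_col"), ndv));
    }
}
```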
Testing:
- added a test in test_iceberg_with_puffin.py that verifies that the
Puffin stats file is not read if the metadata.json file contains
the NDV property. It uses the newly added stats file with corrupt
datasketches: 'metadata_ndv_ok_sketches_corrupt.stats'.
Change-Id: I5e92056ce97c4849742db6309562af3b575f647b
Reviewed-on: http://gerrit.cloudera.org:8080/21959
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Calcite gives the intersect set operator higher precedence than the
except and union set operators. Impala treats all set operators as
having equal precedence (favoring left operators over right).
The following query was failing:
select 100 union select 101 intersect select 101
Calcite was returning 2 rows here, performing the intersect before
the union. Impala does the union first and returns one row.
To fix this, new custom operators were created for the set operators
where all set operators have equal precedence.
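The precedence difference can be illustrated with a small sketch that evaluates set operators left to right with equal precedence, as Impala does (all names here are illustrative, not the planner's actual code):

```python
def eval_left_assoc(first, ops):
    """Evaluate set operations with equal precedence, left to right.

    first: the initial row set; ops: list of (operator, row set) pairs
    applied in order.
    """
    result = set(first)
    for op, operand in ops:
        if op == "union":
            result |= operand
        elif op == "intersect":
            result &= operand
        elif op == "except":
            result -= operand
    return result

# select 100 union select 101 intersect select 101
# Impala: (100 union 101) intersect 101 -> one row.
impala_rows = eval_left_assoc({100}, [("union", {101}), ("intersect", {101})])
# Calcite's default: 100 union (101 intersect 101) -> two rows.
calcite_rows = {100} | ({101} & {101})
```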
Change-Id: Ic52661a30cc90534ea1a20868799edf9ceed13b6
Reviewed-on: http://gerrit.cloudera.org:8080/22052
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit allows support for milliseconds, microseconds, and
nanoseconds for Calcite.
The ImpalaSqlIntervalQualifier class extends the Calcite
SqlIntervalQualifier class which handles most datetime parts. Some
code was copied from the base class since some of its methods are
private, but in general the code handling the parts is exactly the same.
Change-Id: I392c3900c70e7754a35ef25fc720ba4a2f2e5dd6
Reviewed-on: http://gerrit.cloudera.org:8080/22029
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The decode operator is similar to the case operator. The tricky part
of supporting decode is that some of the operands are search
parameters and some are return parameters. The search parameters all
need to be compatible with each other. And the return parameters need
to be compatible with each other as well. Much of the code deals
with casting these parameters to compatible types.
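The operand grouping can be sketched as follows (a simplified illustration; the real code works on Calcite operands and Impala types):

```python
def split_decode_operands(operands):
    """Split decode() operands into search and return parameter groups.

    operands: [expr, search1, result1, search2, result2, ..., default?]
    Search params (including the leading expr) must be compatible with
    each other; return params (including the optional default) must be
    compatible with each other.
    """
    expr, rest = operands[0], list(operands[1:])
    default = [rest.pop()] if len(rest) % 2 == 1 else []
    searches = [expr] + rest[0::2]
    returns = rest[1::2] + default
    return searches, returns
```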
Change-Id: Ia3b68fda7cfa14799a41428e35d5bbc5984a801a
Reviewed-on: http://gerrit.cloudera.org:8080/22031
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Allow administrators to configure per user limits on queries that can
run in the Impala system.
In order to do this, there are two parts. Firstly we must track the
total counts of queries in the system on a per-user basis. Secondly
there must be a user model that allows rules that control per-user
limits on the number of queries that can be run.
In a Kerberos environment the user names that are used for both the user
model and at runtime are short user names, e.g. testuser when the
Kerberos principal is testuser/scm@EXAMPLE.COM.
TPoolStats (the data that is shared between Admission Control instances)
is extended to include a map from user name to a count of queries
running. This (along with some derived data structures) is updated when
queries are queued and when they are released from Admission Control.
This lifecycle is slightly different from other TPoolStats data which
usually tracks data about queries that are running. Queries can be
rejected because of user quotas at submission time. This is done for
two reasons: (1) queries can only be admitted from the front of the
queue and we do not want to block other queries due to quotas, and
(2) it is easy for users to understand what is going on when queries
are rejected at submission time.
Note that when running in configurations without an Admission Daemon,
Admission Control does not have perfect information about the
system and over-admission is possible for User-Level Admission Quotas
in the same way that it is for other Admission Control controls.
The User Model is implemented by extending the format of the
fair-scheduler.xml file. The rules controlling the per-user limits are
specified in terms of user or group names.
Two new elements 'userQueryLimit' and 'groupQueryLimit' can be added to
the fair-scheduler.xml file. These elements can be placed on the root
configuration, which applies to all pools, or on a pool configuration.
The 'userQueryLimit' element has 2 child elements: 'user'
and 'totalCount'. The 'user' element contains the short names of users,
and can be repeated, or have the value "*" for a wildcard name which
matches all users. The 'groupQueryLimit' element has 2 child
elements: 'group' and 'totalCount'. The 'group' element contains group
names.
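An illustrative fragment (the 'userQueryLimit' and 'groupQueryLimit' elements follow the description above; the surrounding queue layout, pool name, and counts are made-up examples and not taken from this change):

```xml
<allocations>
  <!-- Root-level rule: applies to all pools. -->
  <userQueryLimit>
    <user>*</user>
    <totalCount>10</totalCount>
  </userQueryLimit>
  <queue name="poolA">
    <!-- Pool-level rules: named users get a tighter limit here. -->
    <userQueryLimit>
      <user>alice</user>
      <user>bob</user>
      <totalCount>3</totalCount>
    </userQueryLimit>
    <groupQueryLimit>
      <group>analysts</group>
      <totalCount>5</totalCount>
    </groupQueryLimit>
  </queue>
</allocations>
```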
The root level rules and pool level rules must both pass for a new
query to be queued. The rules dictate a maximum number of queries that
a user can run. When evaluating rules at either the root level or the
pool level, evaluation stops at the first rule that matches the user.
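The evaluation order can be sketched like this (a simplified model with illustrative names; the real bookkeeping lives in Admission Control's shared pool statistics):

```python
def user_limit(rules, user):
    """Return the query limit for 'user' from an ordered rule list.

    rules: list of (user_pattern, total_count); "*" matches any user.
    Evaluation stops at the first matching rule; None means no rule
    matched, i.e. no limit applies at this level.
    """
    for pattern, total_count in rules:
        if pattern == "*" or pattern == user:
            return total_count
    return None

def may_queue(root_rules, pool_rules, user, running):
    """Both the root-level and the pool-level check must pass."""
    for rules in (root_rules, pool_rules):
        limit = user_limit(rules, user)
        if limit is not None and running >= limit:
            return False
    return True
```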
To support reading the 'userQueryLimit' and 'groupQueryLimit' fields,
the RequestPoolService is enhanced.
If user quotas are enabled for a pool then a list of the users with
running or queued queries in that pool is visible on the coordinator
webui admission control page.
More comprehensive documentation of the user model will be provided in
IMPALA-12943.
TESTING
New end-to-end tests are added to test_admission_controller.py, and
admission-controller-test is extended to provide unit tests for the
user model.
Change-Id: I4c33f3f2427db57fb9b6c593a4b22d5029549b41
Reviewed-on: http://gerrit.cloudera.org:8080/21616
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This fixes a regression caused by IMPALA-13516.
The validator finds the explicit_cast function when the cast to
timestamp function is present in the query. However, the validator
then tries to validate the explicit_cast function which was not
present in the operator table. Placing the operator in the table
fixes the issue.
Change-Id: Ib8577a06178435f5048d0a9721c16069ebe05743
Reviewed-on: http://gerrit.cloudera.org:8080/22057
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
There were some cast functions that were failing. There were
several reasons behind this. One reason was because Calcite
classifies all integers as an "int" even if they can be other
smaller types (e.g. tinyint). Normally this is handled by the
"CoerceNodes" portion, but it is impossible to tell the type if the
query had the phrase "select cast(1 as integer)" or "select 1",
since both would show up to CoerceNodes as "select 1:INT".
In order to handle this, an "explicit_cast" operator now exists and
is used when the cast function is parsed. The
explicit_cast operator has to be different from the "cast" Calcite
operator in order to avoid being optimized out in various portions
of the compilation.
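The ambiguity can be sketched as follows (illustrative Python, not the actual CoerceNodes code; the type names follow Impala's integer types):

```python
# Once a literal has been typed, "cast(1 as int)" and "1" both look like
# 1:INT, so the user's explicit target type must be recorded separately
# or coercion will shrink both to the narrowest fitting type.
def smallest_int_type(value):
    """Pick the narrowest Impala integer type that can hold 'value'."""
    if -128 <= value <= 127:
        return "TINYINT"
    if -32768 <= value <= 32767:
        return "SMALLINT"
    if -2147483648 <= value <= 2147483647:
        return "INT"
    return "BIGINT"

def node_type(value, explicit_cast_target=None):
    """An explicit cast wins; otherwise coercion guesses from the value."""
    return explicit_cast_target or smallest_int_type(value)
```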
Change-Id: I1edabc942de1c4030331bc29612c41b392cd8a05
Reviewed-on: http://gerrit.cloudera.org:8080/22034
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
The following SQL query in analytics.test ...
select lag(coalesce(505, 1 + NULL), 1) over (order by int_col desc)
from alltypestiny
... had a few issues:
1) The coalesce function needed a special operator. This function
derives its return type from a common type that works for all
parameters.
2) The function was not being saved when being reset. This is
needed for when resetAnalysisState is called.
3) createNullLiteral needed to be overridden for similar reasons.
The null literal type needs to be saved for when
resetAnalysisState is called.
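Issue 1 can be illustrated with a simplified common-type derivation (the widening order shown here is a toy subset of Impala's real type rules):

```python
# Toy numeric widening order, narrowest to widest.
WIDENING = ["TINYINT", "SMALLINT", "INT", "BIGINT", "DOUBLE"]

def coalesce_return_type(arg_types):
    """Derive a common return type that works for all coalesce() args.

    NULL arguments adapt to whatever the other arguments require; among
    the non-NULL arguments, the widest type wins.
    """
    non_null = [t for t in arg_types if t != "NULL"]
    if not non_null:
        return "NULL"
    return max(non_null, key=WIDENING.index)
```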
Change-Id: Ic54d955a73cec4b5f421099a74df4172a1b7dd8b
Reviewed-on: http://gerrit.cloudera.org:8080/22024
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This makes several changes to the Calcite planner to improve the
generated exceptions when there are errors:
1. When the Calcite parser produces SqlParseException, this is converted
to Impala's regular ParseException.
2. When the Calcite validation fails, it produces a CalciteContextException,
which is a wrapper around the real cause. This converts these validation
errors into AnalysisExceptions.
3. This produces UnsupportedFeatureException for non-HDFS table types like
Kudu, HBase, Iceberg, and views. It also produces
UnsupportedFeatureException for HDFS tables with complex types (which
otherwise would hit ClassCastException).
4. This changes exception handling in CalciteJniFrontend.java so it does
not convert exceptions to InternalException. The JNI code will print
the stacktrace for exceptions, so this drops the existing call to
print the exception stack trace.
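The translations in points 1 and 2 can be sketched with stand-in exception classes (the real classes are Calcite's SqlParseException/CalciteContextException and Impala's ParseException/AnalysisException; this Python model is illustrative only):

```python
class SqlParseException(Exception):
    """Stand-in for Calcite's parser exception."""

class CalciteContextException(Exception):
    """Stand-in for Calcite's validation wrapper around the real cause."""
    def __init__(self, cause):
        super().__init__(str(cause))
        self.cause = cause

class ParseException(Exception):
    """Stand-in for Impala's regular parse error."""

class AnalysisException(Exception):
    """Stand-in for Impala's analysis error."""

def translate(exc):
    """Convert Calcite exceptions into Impala's exception types."""
    if isinstance(exc, SqlParseException):
        return ParseException(str(exc))
    if isinstance(exc, CalciteContextException):
        # Unwrap to the real cause before reporting an analysis error.
        return AnalysisException(str(exc.cause))
    return exc
```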
Testing:
- Ran some end-to-end tests with a mode that continues past failures
and examined the output.
Change-Id: I6702ceac1d1d67c3d82ec357d938f12a6cf1c828
Reviewed-on: http://gerrit.cloudera.org:8080/21989
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This adds support for explain statements for the Calcite planner.
This also fixes up the Parser.jj file so that statements not
processed by Calcite planner will fail, like "describe" and other
non-select statements. The parser will now only handle "select"
and "explain" as the first keyword.
If the parser fails, we need to do an additional check within Impala.
We run the statement through the Impala parser and check the statement
type. If the statement type is anything other than SelectStmt,
we run the query within the original Impala planner. If it is a SelectStmt,
we fail the query because we want all select statements to go through
the Calcite parser.
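The fallback logic can be sketched as follows (the statement-type names follow Impala's AST classes; the routing function itself is hypothetical):

```python
def route_after_calcite_parse_failure(stmt_type):
    """Decide what to do when the Calcite parser rejects a statement.

    stmt_type: the statement class produced by running the Impala
    parser on the same text (e.g. "SelectStmt", "DescribeStmt").
    """
    if stmt_type == "SelectStmt":
        # All select statements must go through the Calcite parser, so
        # a Calcite parse failure on a select is a real failure.
        raise ValueError("select statement rejected by Calcite parser")
    # Anything else (describe and other non-select statements) runs on
    # the original Impala planner.
    return "original-planner"
```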
Change-Id: Iea6afaa1f1698a300ad047c8820691cf7e8eb44b
Reviewed-on: http://gerrit.cloudera.org:8080/21923
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Various small issues fixed including:
- There is a special operator dedicated to scalar functions not handled in
Calcite. The equivalent agg operator (ImpalaAggOperator) was created.
- Grouping_id function needs to be handled in a special way, calling
AggregateFunction.createRewrittenFunction
- A custom Avg operator was created to handle avg(TIMESTAMP) which isn't allowed
in Calcite.
- A custom Min/Max operator was created to handle min(NULL) and min(char types).
- The corr, covar_pop, and covar_samp functions use the default Impala function
resolver rather than the Calcite resolver.
Change-Id: I038127d6a2f228ae8d263e983b1906e99ae05f77
Reviewed-on: http://gerrit.cloudera.org:8080/21961
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Among the rules that were added:
The "Minus" RelNode is not handled directly by the physical node
translator and is changed into other nodes that are handled. This
was added in the ImpalaMinusToDistinctRule
The ExtractLiteralAgg rule compensates for the fact that a literal
value cannot be used directly in an agg.
The CalciteRelNodeConverter handles breaking down a SubQuery RelNode
into simpler RelNodes that can be optimized.
The pom.xml file was also changed. There is a Java bug in Java 8
that causes incremental compiles to fail, so we now do a full
compile for the Calcite planner.
Change-Id: I03a38aaa5c413b9b4d2f4c179de07935b672a031
Reviewed-on: http://gerrit.cloudera.org:8080/21941
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Added support for datetime interval operators. A dummy IntervalExpr
was added to support logical to physical representation. The
IntervalExpr is removed once the actual "+" operation for datetime
is created, but the translation framework is simplified by working
on the Calcite interval type in a separate temporary Expr.
Some logic was inserted into CoerceOperandShuttle. This object is
used to add casts when there are limitations in Calcite for numeric
types. When we have an interval, we always want to represent it as a
special type, so we do not want to introduce a cast.
Also, in ImpalaOperatorTable, the ability to use Impala functions over
Calcite functions is needed because Calcite translates the year
function into "extract", which causes some issues. Just using the
Impala signature works for us here.
Change-Id: I2b4afc3ab1d17ba1f168904a6ded052e1d62b3fe
Reviewed-on: http://gerrit.cloudera.org:8080/21946
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds support for reading NDV statistics from Puffin files
when they are available for the current snapshot. Puffin files or blobs
that were written for other snapshots than the current one are ignored.
Because this behaviour is different from what we have for HMS stats and
may therefore be unintuitive for users, reading Puffin stats is disabled
by default; set the "--disable_reading_puffin_stats" startup flag to
false to enable it.
When Puffin stats reading is enabled, the NDV values read from Puffin
files take precedence over NDV values stored in the HMS. This is because
we only read Puffin stats for the current snapshot, so these values are
always up-to-date, while the values in the HMS may be stale.
Note that it is currently not possible to drop Puffin stats from Impala.
For this reason, this patch also introduces two ways of disabling the
reading of Puffin stats:
- globally, with the aforementioned "--disable_reading_puffin_stats"
startup flag: when it is set to true, Impala will never read Puffin
stats
- for specific tables, by setting the
"impala.iceberg_disable_reading_puffin_stats" table property to
true.
Note that this change is only about reading Puffin files; Impala does
not yet support writing them.
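The resulting NDV selection can be sketched as follows (function and parameter names are illustrative, not Impala's actual code):

```python
def effective_ndv(puffin_ndv, hms_ndv, reading_disabled_globally,
                  disabled_for_table):
    """Pick the NDV value to use for a column.

    Puffin values are only read for the current snapshot, so when
    reading is enabled they take precedence over possibly stale HMS
    values. Either the global startup flag or the per-table property
    can disable Puffin reading.
    """
    puffin_enabled = not (reading_disabled_globally or disabled_for_table)
    if puffin_enabled and puffin_ndv is not None:
        return puffin_ndv
    return hms_ndv
```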
Testing:
- created the PuffinDataGenerator tool which can generate Puffin files
and metadata.json files for different scenarios (e.g. all stats are
in the same Puffin file; stats for different columns are in different
Puffin files; some Puffin files are corrupt etc.). The generated
files are under the "testdata/ice_puffin/generated" directory.
- The new custom cluster test class
'test_iceberg_with_puffin.py::TestIcebergTableWithPuffinStats' uses
the generated data to test various scenarios.
- Added custom cluster tests that test the
'disable_reading_puffin_stats' startup flag.
Change-Id: I50c1228988960a686d08a9b2942e01e366678866
Reviewed-on: http://gerrit.cloudera.org:8080/21605
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The tpcds queries contain some functions that require some
modifications that the general function resolver cannot handle.
These include:
- Some functions don't have the same name within Calcite. An example
of this is "is_not_null" which is "is_not_null_pred" in Impala.
- The grouping function returns a tinyint in Impala which is different
from Calcite.
- The params for functions that adjust the scale (e.g. ROUND) need to
handle casting of parameters in the Impala way which is different
from Calcite.
Also handled in this commit is turning on the identifier expansion in
the Calcite validator. This is needed to fix some of the tpcds queries
as well.
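The name translation can be sketched as a simple lookup (only the mapping mentioned above is shown; the real resolver handles many more cases):

```python
# Some Calcite operator names differ from the Impala builtin names, so
# resolution goes through a translation table first.
CALCITE_TO_IMPALA = {
    "is_not_null": "is_not_null_pred",  # example named in the text
}

def impala_function_name(calcite_name):
    """Map a Calcite operator name to the Impala builtin name."""
    name = calcite_name.lower()
    return CALCITE_TO_IMPALA.get(name, name)
```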
Change-Id: Id451357f2fb92d35e09b100751f0f4a49760a51c
Reviewed-on: http://gerrit.cloudera.org:8080/21947
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Changed the parser to use quotes in line with how Impala
treats quotes, including allowing single quotes, double quotes,
and back ticks for aliases, and also allowing the backslash
to be used as an escape character. This is in line with what
BigQuery uses in Calcite.
A couple of unit tests were added, but these will be tested more
extensively by the ParserTest frontend unit test when that gets
committed.
Also added VALUE as a nonreserved keyword, which is used in
the tpcds queries (along with the double quotes).
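The quoting rules can be sketched with a small unquoting helper (a simplified illustration of the lexing, not the actual Parser.jj code):

```python
def unquote(token):
    """Strip matching quotes and resolve backslash escapes.

    Accepts single quotes, double quotes, or back ticks as the
    delimiter; a backslash escapes the character that follows it.
    """
    quote = token[0]
    assert quote in ("'", '"', '`') and token[-1] == quote
    body, out, i = token[1:-1], [], 0
    while i < len(body):
        if body[i] == '\\' and i + 1 < len(body):
            out.append(body[i + 1])  # keep the escaped character as-is
            i += 2
        else:
            out.append(body[i])
            i += 1
    return ''.join(out)
```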
Change-Id: I67ebb19912714c240b99a42d9f2f02f78c189350
Reviewed-on: http://gerrit.cloudera.org:8080/21942
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>