impala

mirror of https://github.com/apache/impala.git synced 2025-12-20 02:20:11 -05:00

Author	SHA1	Message	Date
Steve Carlin	0ca42fafec	IMPALA-13511: Addendum, fixed wrong case statement This was changed in code review. Because this is for the Calcite planner, the unit tests aren't in place yet. Change-Id: I12583f571392513055d73b74001a021cfc2c9813 Reviewed-on: http://gerrit.cloudera.org:8080/22098 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2024-11-22 10:33:21 -08:00
Steve Carlin	265479f068	IMPALA-13540: Calcite planner: fix wrong results for set operators Calcite treats the intersect set operator with higher precedence when compared with the except and union set operators. Impala treats all the precedences equally (favoring left operators over right). The following query was failing select 100 union select 101 intersect select 101 Calcite was returning 2 rows here, performing the intersect before the union. Impala does the union first and returned one row. To fix this, new custom operators were created for the set operators where all set operators have equal precedence. Change-Id: Ic52661a30cc90534ea1a20868799edf9ceed13b6 Reviewed-on: http://gerrit.cloudera.org:8080/22052 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-11-21 22:32:14 +00:00
Steve Carlin	9a2496dd28	IMPALA-13511: Calcite planner: support sub-second datetime parts This commit allows support for milliseconds, microseconds, and nanoseconds for Calcite. The ImpalaSqlIntervalQualifier class extends the Calcite SqlIntervalQualifier class which handles most datetime parts. Some code was copied from the base class since some of the methods were private, but in general, the code handling the parts is exactly the same. Change-Id: I392c3900c70e7754a35ef25fc720ba4a2f2e5dd6 Reviewed-on: http://gerrit.cloudera.org:8080/22029 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-11-19 17:29:54 +00:00
Steve Carlin	3574fe89b0	IMPALA-13513: Support decode function The decode operator is similar to the case operator. The tricky part for supporting the decode is that some of the operands are search parameters and some are return parameters. The search parameters all need to be compatible with each other. And the return parameters need to be compatible with each other as well. Much of the code deals with casting these parameters to compatible types. Change-Id: Ia3b68fda7cfa14799a41428e35d5bbc5984a801a Reviewed-on: http://gerrit.cloudera.org:8080/22031 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-11-19 17:25:55 +00:00
Andrew Sherman	de6b902581	IMPALA-12345: Add user quotas to Admission Control Allow administrators to configure per user limits on queries that can run in the Impala system. In order to do this, there are two parts. Firstly we must track the total counts of queries in the system on a per-user basis. Secondly there must be a user model that allows rules that control per-user limits on the number of queries that can be run. In a Kerberos environment the user names that are used for both the user model and at runtime are short user names, e.g. testuser when the Kerberos principal is testuser/scm@EXAMPLE.COM TPoolStats (the data that is shared between Admission Control instances) is extended to include a map from user name to a count of queries running. This (along with some derived data structures) is updated when queries are queued and when they are released from Admission Control. This lifecycle is slightly different from other TPoolStats data which usually tracks data about queries that are running. Queries can be rejected because of user quotas at submission time. This is done for two reasons: (1) queries can only be admitted from the front of the queue and we do not want to block other queries due to quotas, and (2) it is easy for users to understand what is going on when queries are rejected at submission time. Note that when running in configurations without an Admission Daemon then Admission Control does not have perfect information about the system and over-admission is possible for User-Level Admission Quotas in the same way that it is for other Admission Control controls. The User Model is implemented by extending the format of the fair-scheduler.xml file. The rules controlling the per-user limits are specified in terms of user or group names. Two new elements ‘userQueryLimit’ and ‘groupQueryLimit’ can be added to the fair-scheduler.xml file. These elements can be placed on the root configuration, which applies to all pools, or the pool configuration. 
The ‘userQueryLimit’ element has 2 child elements: "user" and "totalCount". The 'user' element contains the short names of users, and can be repeated, or have the value "*" for a wildcard name which matches all users. The ‘groupQueryLimit’ element has 2 child elements: "group" and "totalCount". The 'group' element contains group names. The root level rules and pool level rules must both be passed for a new query to be queued. The rules dictate a maximum number of queries that can run by a user. When evaluating rules at either the root level, or at the pool level, when a rule matches a user then there is no more evaluation done. To support reading the ‘userQueryLimit’ and ‘groupQueryLimit’ fields the RequestPoolService is enhanced. If user quotas are enabled for a pool then a list of the users with running or queued queries in that pool is visible on the coordinator webui admission control page. More comprehensive documentation of the user model will be provided in IMPALA-12943 TESTING New end-to-end tests are added to test_admission_controller.py, and admission-controller-test is extended to provide unit tests for the user model. Change-Id: I4c33f3f2427db57fb9b6c593a4b22d5029549b41 Reviewed-on: http://gerrit.cloudera.org:8080/21616 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-11-16 06:38:38 +00:00
Steve Carlin	75e0121890	IMPALA-13541: Calcite planner, declare explicit_cast in operator table. This fixes a regression caused by IMPALA-13516. The validator finds the explicit_cast function when the cast to timestamp function is present in the query. However, the validator then tries to validate the explicit_cast function which was not present in the operator table. Placing the operator in the table fixes the issue. Change-Id: Ib8577a06178435f5048d0a9721c16069ebe05743 Reviewed-on: http://gerrit.cloudera.org:8080/22057 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2024-11-13 00:50:11 +00:00
Steve Carlin	b703e68cda	IMPALA-13516: Fix handling of cast functions There were some cast functions that were failing. There were several reasons behind this. One reason was because Calcite classifies all integers as an "int" even if they can be other smaller types (e.g. tinyint). Normally this is handled by the "CoerceNodes" portion, but it is impossible to tell the type if the query had the phrase "select cast(1 as integer)" or "select 1" since both would show up to CoerceNodes as "select 1:INT" In order to handle this an "explicit_cast" operator now exists and is used when the cast function is parsed within the commit. The explicit_cast operator has to be different from the "cast" Calcite operator in order to avoid being optimized out in various portions of the compilation. Change-Id: I1edabc942de1c4030331bc29612c41b392cd8a05 Reviewed-on: http://gerrit.cloudera.org:8080/22034 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2024-11-12 17:47:27 +00:00
Steve Carlin	17d1c0fa1e	IMPALA-13528: Calcite planner; handle unsupported query options Unsupported query options will be run through the original planner. Change-Id: Ic1cc23a7447c052e81a42141ec052b6af4ad5e4a Reviewed-on: http://gerrit.cloudera.org:8080/22041 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-11-09 05:12:32 +00:00
Steve Carlin	453c9519bc	IMPALA-13482: Bug fixes for lag/coalesce in analytic function. The following SQL query in analytics.test ... select lag(coalesce(505, 1 + NULL), 1) over (order by int_col desc) from alltypestiny ... had a couple of issues 1) The coalesce function needed a special operator. This function derives its return type from a common type that works for all parameters. 2) The function was not being saved when being reset. This is needed for when resetAnalysisState is called. 3) createNullLiteral needed to be overridden for similar reasons. The null literal type needs to be saved for when resetAnalysisState is called. Change-Id: Ic54d955a73cec4b5f421099a74df4172a1b7dd8b Reviewed-on: http://gerrit.cloudera.org:8080/22024 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-11-08 01:27:21 +00:00
Joe McDonnell	d0423a83ef	IMPALA-13495: Make exceptions from the Calcite planner easier to classify This makes several changes to the Calcite planner to improve the generated exceptions when there are errors: 1. When the Calcite parser produces SqlParseException, this is converted to Impala's regular ParseException. 2. When the Calcite validation fails, it produces a CalciteContextException, which is a wrapper around the real cause. This converts these validation errors into AnalysisExceptions. 3. This produces UnsupportedFeatureException for non-HDFS table types like Kudu, HBase, Iceberg, and views. It also produces UnsupportedFeatureException for HDFS tables with complex types (which otherwise would hit ClassCastException). 4. This changes exception handling in CalciteJniFrontend.java so it does not convert exceptions to InternalException. The JNI code will print the stacktrace for exceptions, so this drops the existing call to print the exception stack trace. Testing: - Ran some end-to-end tests with a mode that continues past failures and examined the output. Change-Id: I6702ceac1d1d67c3d82ec357d938f12a6cf1c828 Reviewed-on: http://gerrit.cloudera.org:8080/21989 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2024-11-07 23:25:13 +00:00
Steve Carlin	e8e8a5150e	IMPALA-13441: Support explain statements in Impala planner This adds support for explain statements for the Calcite planner. This also fixes up the Parser.jj file so that statements not processed by Calcite planner will fail, like "describe" and other non-select statements. The parser will now only handle "select" and "explain" as the first keyword. If the parser fails, we need to do an additional check within Impala. We run the statement through the Impala parser and check the statement type. If the statement type is anything other than SelectStmt, we run the query within the original Impala planner. If it is a SelectStmt, we fail the query because we want all select statements to go through the Calcite parser. Change-Id: Iea6afaa1f1698a300ad047c8820691cf7e8eb44b Reviewed-on: http://gerrit.cloudera.org:8080/21923 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-11-02 02:54:25 +00:00
Steve Carlin	33631047f8	IMPALA-13468: Fix various aggregation issues in aggregation.test Various small issues fixed including: - There is a special operator dedicated to scalar functions not handled in Calcite. The special agg operator equivalent was created (ImpalaAggOperator) - Grouping_id function needs to be handled in a special way, calling AggregateFunction.createRewrittenFunction - A custom Avg operator was created to handle avg(TIMESTAMP) which isn't allowed in Calcite. - A custom Min/Max operator was created to handle min(NULL) and min(char types). - The corr, covar_pop, and covar_samp functions use the default Impala function resolver rather than the Calcite resolver. Change-Id: I038127d6a2f228ae8d263e983b1906e99ae05f77 Reviewed-on: http://gerrit.cloudera.org:8080/21961 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2024-10-31 18:06:02 +00:00
Steve Carlin	3cf05fe21a	IMPALA-13461: Added rules to make tpcds queries work. Among the rules that were added: The "Minus" RelNode is not handled directly by the physical node translator and is changed into other nodes that are handled. This was added in the ImpalaMinusToDistinctRule The ExtractLiteralAgg rule compensates for the fact that a literal value cannot be used directly in an agg. The CalciteRelNodeConverter handles breaking down a SubQuery RelNode into simpler RelNodes that can be optimized. The pom.xml file was also changed. There is a java bug in java 8 that causes incremental compiles to fail. So we do a full compile for the Calcite planner now. Change-Id: I03a38aaa5c413b9b4d2f4c179de07935b672a031 Reviewed-on: http://gerrit.cloudera.org:8080/21941 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2024-10-21 18:02:32 +00:00
Steve Carlin	7b109ca45f	IMPALA-13457: Calcite planner: fix datetime/interval issues for tpcds queries Added support for datetime interval operators. A dummy IntervalExpr was added to support logical to physical representation. The IntervalExpr is removed once the actual "+" operation for datetime is created, but the translation framework is simplified by working on the Calcite interval type in a separate temporary Expr. Some logic was inserted into CoerceOperandShuttle. This object is used to add casts when there is limitation in Calcite for number types. When we have an interval, we always want to represent it as a special type so we do not want to introduce a cast. Also, in ImpalaOperatorTable, the ability to use Impala functions over Calcite functions are needed because Calcite translates the year function into "extract" which causes some issues. Just using the Impala signature works for us here. Change-Id: I2b4afc3ab1d17ba1f168904a6ded052e1d62b3fe Reviewed-on: http://gerrit.cloudera.org:8080/21946 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-10-21 18:00:37 +00:00
Daniel Becker	b05b408f17	IMPALA-13247: Support Reading Puffin files for the current snapshot This change adds support for reading NDV statistics from Puffin files when they are available for the current snapshot. Puffin files or blobs that were written for other snapshots than the current one are ignored. Because this behaviour is different from what we have for HMS stats and may therefore be unintuitive for users, reading Puffin stats is disabled by default; set the "--disable_reading_puffin_stats" startup flag to false to enable it. When Puffin stats reading is enabled, the NDV values read from Puffin files take precedence over NDV values stored in the HMS. This is because we only read Puffin stats for the current snapshot, so these values are always up-to-date, while the values in the HMS may be stale. Note that it is currently not possible to drop Puffin stats from Impala. For this reason, this patch also introduces two ways of disabling the reading of Puffin stats: - globally, with the aforementioned "--disable_reading_puffin_stats" startup flag: when it is set to true, Impala will never read Puffin stats - for specific tables, by setting the "impala.iceberg_disable_reading_puffin_stats" table property to true. Note that this change is only about reading Puffin files, Impala does not yet support writing them. Testing: - created the PuffinDataGenerator tool which can generate Puffin files and metadata.json files for different scenarios (e.g. all stats are in the same Puffin file; stats for different columns are in different Puffin files; some Puffin files are corrupt etc.). The generated files are under the "testdata/ice_puffin/generated" directory. - The new custom cluster test class 'test_iceberg_with_puffin.py::TestIcebergTableWithPuffinStats' uses the generated data to test various scenarios. - Added custom cluster tests that test the 'disable_reading_puffin_stats' startup flag. 
Change-Id: I50c1228988960a686d08a9b2942e01e366678866 Reviewed-on: http://gerrit.cloudera.org:8080/21605 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-10-19 22:14:59 +00:00
Steve Carlin	ff7a553324	IMPALA-13462: Added support for functions used in tpcds The tpcds queries contain some functions that require some modifications that the general function resolver cannot handle. These include: - Some functions don't have the same name within Calcite. An example of this is "is_not_null" which is "is_not_null_pred" in Impala. - The grouping function returns a tinyint in Impala which is different from Calcite. - The params for functions that adjust the scale (e.g. ROUND) need to handle casting of parameters in the Impala way which is different from Calcite. Also handled in this commit is turning on the identifier expansion in the Calcite validator. This is needed to fix some of the tpcds queries as well. Change-Id: Id451357f2fb92d35e09b100751f0f4a49760a51c Reviewed-on: http://gerrit.cloudera.org:8080/21947 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-10-19 02:31:02 +00:00
Steve Carlin	d2cb00cece	IMPALA-13456: Calcite planner: fix issues with quotes Changed the parser to use quotes that are in line with how Impala treats quotes, including allowing single quotes, double quotes, and back ticks for aliases, and also allowing the backslash to be used as an escape character. This is in line with what BigQuery uses in Calcite. A couple of unit tests were added, but these will be tested more extensively by the ParserTest frontend unit test when that gets committed. Also, added the VALUE as a nonreserved keyword which is used in the tpcds queries (along with the double quotes) Change-Id: I67ebb19912714c240b99a42d9f2f02f78c189350 Reviewed-on: http://gerrit.cloudera.org:8080/21942 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2024-10-18 17:13:43 +00:00
Steve Carlin	9008e1bfcb	IMPALA-13459: Handle duplicate table in same query When there are 2 references to the same table in a query, there needs to be a unique alias name used within the TableRef object. Code has been added to generate an alias. IMPALA-13460 has been filed because we should use the user provided alias name rather than a generated alias name. This is a little more difficult to implement because Calcite has a limitation in that their table object at validation time is equivalent to a FeTable in that there is only one object for the multiple tables. In order to fix IMPALA-13460, there is a Calcite bug that has to be fixed. We'd have to generate our own TableScan object underneath their LogicalTableScan that would hold an alias. This TableScan can be generated through their RelBuilder Factory object. But the current code creates the LogicalTableScan directly rather than go through a factory, so that would need to be fixed first. There are no unit tests attached to this Jira, but there are some tpcds queries that will start working when this gets committed. Change-Id: Ib9997bc642c320c2e26294d7d02a05bccbba6a0d Reviewed-on: http://gerrit.cloudera.org:8080/21945 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2024-10-18 17:12:45 +00:00
Steve Carlin	b195b12cab	IMPALA-13455: Put expressions in CNF form for performance. The optimizer now runs a rule to put expressions in conjunctive normal form. This commit will allow tpcds and tpch tests to run through without hanging, specifically queries 13 and 48. Change-Id: Iceca22f3b2d2b59ab21591f21c07650bbd8efb3c Reviewed-on: http://gerrit.cloudera.org:8080/21938 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2024-10-17 16:01:01 +00:00
Steve Carlin	42e3f22a68	IMPALA-13425: Iceberg tables crash server with Calcite planner This commit will prevent the server from crashing by catching Iceberg tables at compile time in the Calcite planner and failing the query. Change-Id: I4b7c889d4aa61888feeacb29366ec559000a943e Reviewed-on: http://gerrit.cloudera.org:8080/21902 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-10-17 00:09:48 +00:00
Steve Carlin	36d341c431	IMPALA-13430: Too many RelNodes created for "IN" literals The "withCreateValuesRel" false config parameter causes a "value" node to be created for every literal in an "in" clause. This slows down the compilation time and runtime massively. By removing this parameter (using the 'true' default), all literal values are placed within one Values RelNode. Changing this parameter exposed a bug in tpcds q8. The CoerceNodes module explicitly creates a Project node above a Values node when the values node contains a string literal. Unfortunately, a Calcite limitation prevents the string literal to be of type "string" but instead is of type "char(x)". Because of this limitation this Project hack was created. When converting Calcite RelNodes to Impala RelNodes, we "notify" the Values RelNode that it should ignore the row datatypes of the current Values RelNode and instead use the parent row datatypes. Change-Id: Ifc3d84c70af9cd4db44359c4ab7f0c9eb70738f5 Reviewed-on: http://gerrit.cloudera.org:8080/21911 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2024-10-16 18:12:36 +00:00
Joe McDonnell	b072f29552	IMPALA-13446: Bump CDP GBN to 58457853 to get Ranger improvements This updates the GBN to include the RANGER-4585 functionality to support multi-column policy creation. IMPALA-12554 will use this to create a single multi-column policy for a GRANT statement rather than many single-column policies. This fixes a few issues encountered during the upgrade: 1. This includes the fix for IMPALA-13433 to make test_sfs.py resilient to HMS versions that do not properly create the database directories. 2. This modified test_metadata_query_statements.py to use unique directories for the databases to avoid HMS bugs. 3. The version of Avro changed, which changed the version of Jackson and the package name of the JsonParseException. This adds code to tolerate both the old and new package name in the error message. 4. This includes the fix for IMPALA-13391 to exclude log4j-slf4j-impl from hadoop-cloud-storage. 5. This excludes an unnecessary org.cloudera.logredactor dependency. Testing: - Ran a core job Change-Id: I32727020a69a66c3af4f4096fe15bc81600e2215 Reviewed-on: http://gerrit.cloudera.org:8080/21921 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Fang-Yu Rao <fangyu.rao@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-10-16 16:35:23 +00:00
Michael Smith	4c3b5f94f1	IMPALA-13393: Remove old javax.el config Pinning javax.el was done when Impala still used Sentry. That was removed in IMPALA-9708, and Hbase now explicitly depends on a specific version. So this pin is no longer relevant. Change-Id: I5be3eeeacf2f6fb04bc5106902e1d11b3886d844 Reviewed-on: http://gerrit.cloudera.org:8080/21827 Tested-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2024-10-15 23:38:44 +00:00
Steve Carlin	32cec3fd80	IMPALA-13429: Calcite planner crashing with outer join The Calcite planner was crashing when there was an outer join and there was a conjunct that compared two columns within the same table. This conjunct needs to be placed in "other" conjuncts rather than "equi" conjuncts. Change-Id: I4ae2d257fa58f3a58079b6aa551c32ffda7d28cf Reviewed-on: http://gerrit.cloudera.org:8080/21908 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2024-10-15 20:09:23 +00:00
Steve Carlin	4289fa1a16	IMPALA-13197: (part 2) Added Analytic Expressions to Calcite Planner This commit contains fixes on top of the analytic expressions which fixes some of the tests in analytic-fns.test. The fixes include: - The AnalyzedAnalyticExpr object now calls "standardize" on AnalyticExpr which mutates AnalyticExpr into its final compiled form. - Added handling for sum_init_zero which is produced by Calcite. Note: this is only supported in Impala for BIGINT. An implementation is needed for Decimal and double (IMPALA-13435) - CastExpr needs to be analyzed. There is a quirk in the current Impala implementation that the parameters for CastExpr are not re-analyzed. So an explicit analyze is done when a CastExpr is encountered. - AnalyticExprs allow "count" with zero parameters - Certain analytic expressions use default window functions. The Calcite window operations will be ignored for these functions. Change-Id: I56529b13c545cdc9f96dd1c3bea9ef676e8c2755 Reviewed-on: http://gerrit.cloudera.org:8080/21897 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2024-10-10 21:00:20 +00:00
Steve Carlin	e6825b7ed4	IMPALA-13197: Implement Analytic Exprs for Calcite An analytic expression is represented with a RexOver type RexNode within Calcite. They will exist within the Project RelNode. If there are any RexOvers existing within the Project, then the ImpalaAnalyticRel RelNode gets created instead of the ImpalaProjectRel. Only bare bones test cases are included. There are quite a number of analytic expressions that will not work. The logic is included in the AnalyticExpr.standardize() method. Another commit will be needed to support all general analytic expressions and the tests within Impala will be used for testing purposes. Change-Id: Iba5060546a7568ba0cd315f546daa78d89b1c3c5 Reviewed-on: http://gerrit.cloudera.org:8080/21565 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2024-10-10 20:59:40 +00:00
Steve Carlin	3f0a5cc93b	IMPALA-13022: Added infrastructure for implicit casting of functions Ideally, at validation time, we would like Calcite to coerce function parameters and generate the proper return types that match Impala functions. Currently, Calcite comes close to hitting these needs, and any function that passes Calcite validation will be a valid function. However, while some of the types are close, we want the output of the Calcite optimizer to contain methods where all parameter types and return types exactly match what is expected by Impala at runtime. The bulk of this commit is to address this issue. There are a couple of other parts to this commit added that rely on proper types as well. This will be described as well further down in the commit message. The 'fixing' of the parameters and return types is done after optimization and before the Calcite RelNodes are translated into Impala RelNodes. The code responsible for this is under the coercenodes directory. The entry point for fixing these nodes is the method CoerceNodes.coerceNodes() which takes a RelNode input and produces a RelNode output. This method will journey through the RelNodes bottom up because a RelNode must be created with its inputs, so it makes sense to fix the inputs first. One problem within Calcite is how it generates RexLiterals. For the literal number 2, Calcite will generate an INTEGER type. However, the Impala output for this is a TINYINT. Another Calcite issue is for a string literal such as 'hello'. Calcite will generate a CHAR(5) type whereas Impala generates a string. These inconsistencies cause Impala to crash in certain situations. For instance, the query "select 'hello' union select 'goodbye'" generates a union with 2 input nodes. If we were to use the Calcite definitions, one would have type of char(5) and the other would have type of char(7), which would cause a crash. 
Also, eventually when CTAS statements are implemented, we need the types to match Impala's native type. Most of the Calcite RelNode types need some sort of correction due to these issues. Join, Filter and Project nodes may have expressions that need fixing. The CoerceOperandShuttle class helps navigating through the RexNode function structure for these. The Aggregate class may require an underlying Project class where the explanation is detailed in the processAggNode method. The Union node also will generate underlying Project nodes for casting. The Values node creates a Project above itself since Calcite will not allow generation of a RexLiteral of string type directly, so it needs to be cast. In addition to these changes, other changes that relied on casting issues have been added. The or/and/case operator is now supported. 'Or' and 'and' worked before, but not if there were more than 2 or/and conditions grouped together. Case requires all return types to match and required some special logic. These functions required a little extra support since they have variable arguments. Change-Id: I13a349673f185463276ad7ddb83f0cfc2d73218c Reviewed-on: http://gerrit.cloudera.org:8080/21335 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2024-09-25 20:45:12 +00:00
Steve Carlin	e3583cb200	IMPALA-13043: Implement Join Capability to the Calcite Planner This commit adds the ability to handle joins in the Calcite planner. Some items worth noting: There is extra handling in the ImpalaJoinRel class to deal with outer joins. The AnalyzedTupleIsNullExpr object is needed for processing which derives from TupleIsNullExpr. Normally, expressions are created in the CreateExprVisitor, but the join requires that the TupleIsNullExpr object is wrapped around the expressions retrieved from the inputs. The execution engine requires separation of the equijoin conditions and the non-equijoin conditions. Furthermore, the equijoin conditions are BinaryCompPredicates instead of normal FunctionCallExprs, so the AnalyzedBinaryCompExpr class had to be created. Special processing needed to be coded for runtime filter generators. The conditions needed to be added to the value transfer graph in order to enable the Impala planner logic to create and push these generators. The join also required some rules to be added to the optimizer. If a join is done through the "ON" clause, Calcite is able to place the join condition directly in the Join RelNode. However, if it is in the "WHERE" clause, Calcite creates a Filter RelNode and creates a cross join RelNode object. Therefore, in order to handle "WHERE" joins, we need to implement the rules in the optimizer. Change-Id: I5db097577907d79877f52feff2922000af074ecd Reviewed-on: http://gerrit.cloudera.org:8080/21239 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-09-05 21:15:45 +00:00
Steve Carlin	02469e723b	IMPALA-12964: Implement basic aggregation in the Calcite planner Basic aggregation functionality is now added to the Calcite planner. The implementation of aggregation was a little tricky on the conversion from the Aggregate RelNode to the Impala Agg PlanNode. The compilation in Impala requires some AggregateInfo structures which may set up multiple internal PlanNodes. Some parts of the Analyzer are used by AggregateInfo. This usage of Analyzer puts two design goals in conflict with each other, which are: 1) Remove dependency on the Analyzer since Calcite does all the parsing and validation 2) Avoid refactoring in the first major iteration of the Calcite planner. To resolve this, a SimplifiedAnalyzer class has been created which is injected into the AggregateInfo. Some methods of the Analyzer class are overridden to avoid the non-Calcite planner analysis. The SimplifiedAnalyzer overrides two aspects of the Analyzer: 1) "Having" filter conjuncts are going to be "unassigned conjuncts". After Calcite validates and optimizes the plan, the only filter conjuncts above the aggregation will be the "having" clause, so all these conjuncts will be used in the aggregate (sidenote: optimization rules have not been pushed yet to move filters underneath the aggregate, but that will come in a push in the near future). Once the aggregate has been changed to a PlanNode, we can clear out the unassigned conjuncts. 2) Because the Aggregate PlanNodes can have multiple layers, it may be responsible for creating some TupleDescriptors and SlotDescriptors for these PlanNodes. The SlotDescriptors need to be "materialized". In the non-Calcite planner, this is done through its planning process. In the Calcite planner, the materialization can happen immediately when the PlanNode is created. So the "addSlotDescriptor" is overridden to call the parent, but then to immediately materialize the SlotDescriptor. The rest of the ImpalaAggRel is hopefully self-explanatory. 
The groups, aggregates, and grouping sets are extracted from the RelNodes and used in the PlanNodes. The logic to set up multiple PlanNodes and the creation of MultiAggregateInfo and AggregateInfo objects are similar to what is used in the non-Calcite planner. Change-Id: Iacf0de8ba11f0d31d73d624f0c9a91db9997cfd5 Reviewed-on: http://gerrit.cloudera.org:8080/21238 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-08-28 01:08:33 +00:00
Joe McDonnell	ae6a3b9ec0	IMPALA-13082: Use separate versions for jackson vs jackson-databind Sometimes there is a jackson-databind patch release without a corresponding release of other jackson libraries. For example, there is a jackson-databind 2.12.7.1, but jackson-core does not have an artifact with that version. To handle these scenarios, it is useful to have a separate version for jackson-databind vs other jackson libraries. This introduces IMPALA_JACKSON_VERSION (which currently matches IMPALA_JACKSON_DATABIND_VERSION) and uses this for non-databind jackson libraries. Testing: - Ran a local build Change-Id: I3055cb47986581793d947eaedb6a24b4dd92e3a6 Reviewed-on: http://gerrit.cloudera.org:8080/21719 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2024-08-26 22:52:25 +00:00
Steve Carlin	9848cd84be	IMPALA-12954: Implement Sorting capability for Calcite planner The Sort RelNode is now supported. This includes limit and offset features as well. A minor bit of code was copied from the original planner, but it just involved decision making on which Sort PlanNode to call, so this probably doesn't need to be refactored. Change-Id: I747e107ed996862ef348f829deee47f0c0fc78d5 Reviewed-on: http://gerrit.cloudera.org:8080/21237 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-08-20 15:41:06 +00:00
Yubi Lee	ba67660b3a	IMPALA-10408: Support build using Apache components Apache Impala uses many CDP components to build it. This patch provides a way to support building Apache Impala using Apache components. Change-Id: I8730dd182b367c9daa94303937ad249db72b1399 Reviewed-on: http://gerrit.cloudera.org:8080/18977 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-08-19 17:36:05 +00:00
Michael Smith	65927f4ba3	IMPALA-13301: Upgrade aircompressor to 0.27 Upgrades io.airlift.aircompressor to 0.27 to address CVE-2024-36114. Aircompressor is a dependency of Orc, however we tend to upgrade Orc more deliberately and synchronize C++ and Java upgrades. Aircompressor upgrades in Orc did not require any code changes, so manage this dependency directly to address the CVE. Change-Id: I6c56daa61d5ecbcb3a5f7fbd0665043bb49b469f Reviewed-on: http://gerrit.cloudera.org:8080/21677 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-08-16 03:01:58 +00:00
Steve Carlin	b028da18c9	IMPALA-12947: Implement Calcite Planner Union and Value RelNodes This commit handles Union and Value RelNode operators. The Union RelNode is created within Calcite when there is a "union" clause. The Values RelNode is created when the lowest level does not come from a table, but instead comes from constant values. For example, the query "select 3" would create a Values RelNode with one literal value of 3. The PlanNode creation simulates what already exists within the Impala planner. There is no corresponding Values PlanNode. Instead, a Union is created with the values expressions serving as input expressions (hence the reason for combining these 2 RelNodes in the same commit). Other plan nodes used are the "SelectNode" and the "EmptySetNode". The EmptySetNode is used where there are no rows coming from the value node. While this cannot be simulated at this point, this will be needed when we start introducing optimization rules, and will be tested when we turn on the Impala test framework queries. The SelectNode is used for functions that are applied on top of the UnionNode. There is a major issue with this iteration of Union and Value nodes due to a Calcite issue. Calcite currently treats all string literals as "CHAR" type. This causes problems in the union operator if one tries to implement the following query: "select 'a' union select 'ab'", since the 2 types in the value clauses are CHAR(1) and CHAR(2) which do not match. This would cause an exception on the server. A future commit will fix this issue. Also of concern is that Calcite treats non-bigint constants as integers only. That is, 3, 257, 65539 are all considered of type INT. This will also be fixed in a later commit. Change-Id: Ibd989dbb5cf0df0fcc88f72dd579ce4fd713f547 Reviewed-on: http://gerrit.cloudera.org:8080/21211 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-08-15 21:10:24 +00:00
Michael Smith	22b59d27d0	IMPALA-13243: Update Dropwizard Metrics to 4.2.x Updates Dropwizard Metrics components to the latest 4.2.x release, 4.2.26. We directly use metrics-core, and metrics-jvm/metrics-json are imported via Hive (via https://github.com/joshelser/dropwizard-hadoop-metrics2). Dropwizard Metrics manually tested with these versions on https://github.com/joshelser/dropwizard-hadoop-metrics2/pull/8. Change-Id: Ie9bec7a7c23194604430531bd83b25c5969e888e Reviewed-on: http://gerrit.cloudera.org:8080/21599 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-07-23 05:22:59 +00:00
Steve Carlin	4c00cbff7e	IMPALA-13136: Refactor AnalyzedFunctionCallExpr (for Calcite) The analyze method is now called after the Expr is constructed. This code is more in line with the existing way that Impala constructs the Expr object. Change-Id: Ideb662d9c7536659cb558bf62baec29c82217aa2 Reviewed-on: http://gerrit.cloudera.org:8080/21525 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2024-06-20 17:14:04 +00:00
Steve Carlin	a6db27850a	IMPALA-12940: Added filtering capability for Calcite planner The Filter RelNode is now handled in the Calcite planner. The parsing and analysis is done by Calcite so there were no changes added to that portion. The ImpalaFilterRel class was created to handle the conversion of the Calcite LogicalFilter to create a filter condition within the Impala plan nodes. There is no explicit filter plan node in Impala. Instead, the filter condition attaches itself to an existing plan node. The filter condition gets passed into the children plan nodes through the ParentPlanRelContext. The ExprConjunctsConverter class is responsible for creating the filter Expr list that is used. The list contains separate AND conditions that are on the top level. Change-Id: If104bf1cd801d5ee92dd7e43d398a21a18be5d97 Reviewed-on: http://gerrit.cloudera.org:8080/21498 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2024-06-19 19:09:47 +00:00
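The idea in IMPALA-12940 of a filter list holding "separate AND conditions that are on the top level" can be illustrated with a minimal sketch. This is not the ExprConjunctsConverter code (which is Java inside Impala); the `Pred`, `And`, and `Or` classes and the `top_level_conjuncts` helper are hypothetical names made up for illustration, and the sketch only assumes that nested ANDs are flattened while anything below an OR is left intact.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pred:
    """A leaf predicate, kept as plain text (hypothetical stand-in)."""
    text: str


@dataclass(frozen=True)
class And:
    """Conjunction of sub-expressions (hypothetical stand-in)."""
    operands: Tuple[object, ...]


@dataclass(frozen=True)
class Or:
    """Disjunction of sub-expressions (hypothetical stand-in)."""
    operands: Tuple[object, ...]


def top_level_conjuncts(expr) -> List[object]:
    """Flatten nested ANDs at the top of the tree into a list.

    Recursion stops at the first non-AND node, so an AND that sits
    underneath an OR is not split out.
    """
    if isinstance(expr, And):
        conjuncts: List[object] = []
        for operand in expr.operands:
            conjuncts.extend(top_level_conjuncts(operand))
        return conjuncts
    return [expr]


# WHERE a > 1 AND (b = 2 AND (c = 3 OR (d = 4 AND e = 5)))
inner_or = Or((Pred("c = 3"), And((Pred("d = 4"), Pred("e = 5")))))
condition = And((Pred("a > 1"), And((Pred("b = 2"), inner_or))))
conjuncts = top_level_conjuncts(condition)
```

Here `conjuncts` holds three entries: the two simple predicates and the whole OR expression, which stays a single conjunct because its inner AND is not at the top level.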
Steve Carlin	141f38197b	IMPALA-12935: First pass on Calcite planner functions This commit handles the first pass on getting functions to work through the Calcite planner. Only basic functions will work with this commit. Implicit conversions for parameters are not yet supported. Custom UDFs are also not supported yet. The ImpalaOperatorTable is used at validation time to check for existence of the function name for Impala. At first, it will check Calcite operators for the existence of the function name (A TODO, IMPALA-13096, is that we need to remove non-supported names from the parser file). It is preferable to use the Calcite Operator since Calcite does some optimizations based on the Calcite Operator class. If the name is not found within the Calcite Operators, a check is done within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function. If found, a SqlOperator class is generated on the fly to handle this function. The validation process for Calcite includes a call into the operator method "inferReturnType". This method will validate that there exists a function that will handle the operands, and if so, return the "return type" of the function. In this commit, we will assume that the Calcite operators will match Impala functionality. In later commits, there will be overrides where we will use Impala validation for operators where Calcite's validation isn't good enough. After validation is complete, the functions will be in a Calcite format. After the rest of compilation (relnode conversion, optimization) is complete, the function needs to be converted back into Impala form (the Expr object) to eventually get it into its thrift request. In this commit, all functions are converted into Expr starting in the ImpalaProjectRel, since this is the RelNode where functions do their thing. The RexCallConverter and RexLiteralConverter get called via the CreateExprVisitor for this conversion. 
Since Calcite is providing the analysis portion of the planning, there is no need to go through Impala's Analyzer object. However, the Impala planner requires Expr objects to be analyzed. To get around this, the AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist which analyze the expression in the constructor. While this could potentially be combined with the existing FunctionCallExpr and NullLiteral objects, this fits in with the general plan to avoid changing "fe" Impala code as much as we can until much later in the commit cycle. Also, there will be other Analyzed*Expr classes created in the future, but this commit is intended for basic function call expressions only. One minor change to the parser is added with this commit. The Calcite parser does not acknowledge the "string" datatype, so this has been added here in Parser.jj and config.fmpp. Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88 Reviewed-on: http://gerrit.cloudera.org:8080/21357 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-06-07 17:57:14 +00:00
Zoltan Borok-Nagy	1324a6e6c9	IMPALA-13108: Update version to 4.5.0-SNAPSHOT Updated IMPALA_VERSION in impala-config.sh Executed the following for Java: cd java mvn versions:set -DnewVersion=4.5.0-SNAPSHOT Change-Id: Ie7803fe523406dbdd1ac066a35bb31d21765a244 Reviewed-on: http://gerrit.cloudera.org:8080/21460 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-05-29 23:47:05 +00:00
Sai Hemanth Gantasala	68f8a6a1df	IMPALA-12607: Bump the GBN and fetch events specific to the db/table from the metastore Bump the GBN to 49623641 to leverage HIVE-27499, so that Impala can directly fetch the latest events specific to the db/table from the metastore, instead of fetching the events from metastore and then filtering in the cache matching the DbName/TableName. Implementation Details: Currently when a DDL/DML is performed in Impala, we fetch all the events from metastore based on current eventId and then filter them in Impala which can be a bottleneck if the events count is huge. This can be optimized by including db name and/or table name in the notification event request object and then filter by event type in impala. This can provide performance boost on tables that generate a lot of events. Note: Also included ShowUtils class in hive-minimal-exec jar as it is required in the current build version Testing: 1) Did some tests in local cluster 2) Added a test case in MetaStoreEventsProcessorTest Change-Id: I6aecd5108b31c24e6e2c6f9fba6d4d44a3b00729 Reviewed-on: http://gerrit.cloudera.org:8080/20979 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-05-10 05:47:28 +00:00
Steve Carlin	2a3ce2071b	IMPALA-12934: Added Calcite parsing files to Impala Adding the framework to create our own parsing syntax for Impala using the base Calcite Parser.jj file. The Parser.jj file here was grabbed from Calcite 1.36. So with this commit, we are using the same parsing analysis as Calcite 1.36. Any changes made on top of the Parser.jj file or the config.fmpp file in the future are Impala specific changes, so a diff can be done from this commit to see all the Impala parsing changes. The config.fmpp file was grabbed from Calcite 1.36 default_config.fmpp. The Calcite intention of the config.fmpp file is to allow markup of variables in the Parser.jj file. So it is always preferable to modify the default_config.fmpp file when possible. Our version is grabbed from https://github.com/apache/calcite/blob/main/core/src/main/codegen/config.fmpp and slightly modified with the class name to make it compile for Impala. There's no unit test needed since there is no functional change. The Calcite planner will eventually make changes in the ".jj" file to support the differences between the Impala parser and the Calcite parser. Change-Id: If756b5ea8beb85661a30fb5d029e74ebb6719767 Reviewed-on: http://gerrit.cloudera.org:8080/21194 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2024-05-09 01:12:45 +00:00
Peter Rozsa	7ad9400656	IMPALA-13044: Upgrade bouncycastle to 1.78 This patch upgrades bouncycastle to 1.78. As of bouncycastle:1.71, the -jdk15on artifact is no longer available, the artifact is changed to -jdk18on. Tests: - core tests ran Change-Id: I8372916ab79b863e7a07d22e8333abd54492fa29 Reviewed-on: http://gerrit.cloudera.org:8080/21371 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-05-03 00:09:15 +00:00
Joe McDonnell	d09c502490	IMPALA-13049: Add dependency management for log4j2 to use 2.18.0 Currently, there is no dependency management for the log4j2 version. Impala itself doesn't use log4j2. However, recently we encountered a case where one dependency brought in log4j-core 2.18.0 and another brought in log4j-api 2.17.1. log4j-core 2.18.0 relies on the existence of the ServiceLoaderUtil class from log4j-api 2.18.0. log4j-api 2.17.1 doesn't have this class, which causes class not found exceptions. This uses dependency management to set the log4j2 version to 2.18.0 for log4j-core and log4j-api to avoid any mismatch. Testing: - Ran a local build and verified that both log4j-core and log4j-api are using 2.18.0. Change-Id: Ib4f8485adadb90f66f354a5dedca29992c6d4e6f Reviewed-on: http://gerrit.cloudera.org:8080/21379 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Abhishek Rawat <arawat@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-05-01 02:37:49 +00:00
Steve Carlin	b39cd79ae8	IMPALA-12872: Use Calcite for optimization - part 1: simple queries This is the first commit to use the Calcite library to parse, analyze, and optimize queries. The hook for the planner is through an override of the JniFrontend. The CalciteJniFrontend class is the driver that walks through each of the Calcite steps which are as follows: CalciteQueryParser: Takes the string query and outputs an AST in the form of Calcite's SqlNode object. CalciteMetadataHandler: Iterate through the SqlNode from the previous step and make sure all essential table metadata is retrieved from catalogd. CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer. CalciteRelNodeConverter: Change the AST into a logical plan. In this first commit, the only logical nodes used are LogicalTableScan and LogicalProject. The LogicalTableScan will serve as the node that reads from an Hdfs Table and the LogicalProject will only project out the used columns in the query. In later versions, the LogicalProject will also handle function changes. CalciteOptimizer: This step is to optimize the query. In this cut, it will be a nop, but in later versions, it will perform logical optimizations via Calcite's rule mechanism. CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into Impala's PlanNode physical tree ExecRequestCreator: Implement the existing Impala steps that turn a Single Node Plan into a Distributed Plan. It will also create the TExecRequest object needed by the runtime server. Only some very basic queries will work with this commit. These include: select * from tbl <-- only needs the LogicalTableScan select c1 from tbl <-- Also uses the LogicalProject In the CalciteJniFrontend, there are some basic checks to make sure only select statements will get processed. Any non-query statement will revert back to the current Impala planner. 
In this iteration, any queries besides the minimal ones listed above will result in a caught exception which will then be run through the current Impala planner. The tests that do work can be found in calcite.test and run through the custom cluster test test_experimental_planner.py This iteration should support all types with the exception of complex types. Calcite does not have a STRING type, so the string type is represented as VARCHAR(MAXINT) similar to how Hive represents their STRING type. The ImpalaTypeConverter file is used to convert the Impala Type object to corresponding Calcite objects. Authorization is not yet working with this current commit. A Jira has been filed (IMPALA-13011) to deal with this. Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98 Reviewed-on: http://gerrit.cloudera.org:8080/21109 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2024-04-25 20:09:09 +00:00
wzhou-code	fc74ca672a	IMPALA-12378: Auto Ship JDBC Data Source This patch moves the source files of jdbc package to fe. Data source location is optional. Data source could be created without specifying HDFS location. Assume data source class is in the classpath and instance of data source class could be created with current class loader. Impala still tries to load the jar file of the data source at runtime if it's set in data source location. Testing: - Passed core test - Passed dockerised-tests Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f Reviewed-on: http://gerrit.cloudera.org:8080/20971 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>	2024-02-07 16:29:11 +00:00
gaurav1086	f7a43b18aa	IMPALA-12503: Support date data type for predicates for external data source table This patch adds support for datatype date as predicates for external data sources. Testing: - Added tests for date predicates with operators: '=', '>', '<', '>=', '<=', '!=', 'BETWEEN'. Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4 Reviewed-on: http://gerrit.cloudera.org:8080/20915 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-02-05 21:28:00 +00:00
Csaba Ringhofer	c14156eb3a	IMPALA-12746: Bump jackson.databind to 2.15.3 Also sets dependencyManagement to force using the same version for jackson-databind, jackson-core and jackson-annotations. This is needed because datagenerator depends on kitesdk, which would pull in a very old jackson-core version (2.3.1) and lead to build failures with the newer jackson.databind. Change-Id: I8440426da1395045cf149aca0044286015861e5f Reviewed-on: http://gerrit.cloudera.org:8080/20914 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-01-24 15:13:36 +00:00
wzhou-code	f8e8cd0906	IMPALA-12642: Support query options for Impala external JDBC table This patch uses JDBC connection string to apply query options to the Impala server by setting the properties in "jdbc.properties" when creating JDBC external DataSource table. jdbc.properties are specified as comma-delimited key=value string, like "MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES=\"BLOOM,MIN_MAX\"". Fixed Impala to allow value of ENABLED_RUNTIME_FILTER_TYPES to have double quotes in the beginning and ending of string. jdbc.properties can be used for other databases like Postgres and MySQL to set additional properties. The test cases will be added in separate patch. Testing: - Added end-to-end tests for setting query options on Impala JDBC tables. - Passed core tests. Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992 Reviewed-on: http://gerrit.cloudera.org:8080/20837 Reviewed-by: Abhishek Rawat <arawat@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-01-17 23:16:42 +00:00
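The comma-delimited key=value format described for "jdbc.properties" in IMPALA-12642 has one wrinkle: a quoted value such as "BLOOM,MIN_MAX" contains a comma that must not be treated as a delimiter. A minimal sketch of quote-aware splitting follows. It is an illustration only, not the parser Impala uses; `split_properties` is a hypothetical helper name, and the sketch assumes double quotes are the only quoting mechanism and that values are kept verbatim, quotes included.

```python
def split_properties(props: str) -> dict:
    """Split a comma-delimited key=value string into a dict.

    Commas inside double quotes are treated as part of the value.
    Values are kept verbatim, including their surrounding quotes.
    """
    pieces = []
    current = []
    in_quotes = False
    for ch in props:
        if ch == '"':
            # Toggle quoted state; keep the quote character itself.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            # An unquoted comma ends the current key=value pair.
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current))

    result = {}
    for piece in pieces:
        piece = piece.strip()
        if not piece:
            continue
        key, sep, value = piece.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in property: {piece!r}")
        result[key.strip()] = value.strip()
    return result


# The example string from the commit message, with the quotes unescaped.
options = split_properties(
    'MEM_LIMIT=1000000000, ENABLED_RUNTIME_FILTER_TYPES="BLOOM,MIN_MAX"'
)
```

With this input, `options` has two keys, and the value of ENABLED_RUNTIME_FILTER_TYPES still carries its double quotes, which matches the commit's note that Impala was changed to accept quotes at the beginning and end of that option's value.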
Gaurav Singh	9a132bc436	IMPALA-12380: Securing dbcp.password for JDBC external data source In the current implementation of external JDBC data source, the user has to provide both the username and password in plain text which is not a good practice. This patch extends the functionality of existing implementation to either provide: a) username and password b) username or key and keystore If the user provides the password, then that password is used. However, if no password is provided and the user provides only the key/keystore, then it fetches the password from the secure jceks keystore. Testing: - Added unit test TestExtDataSourcesWithKeyStore Change-Id: Iec83a9b6e00456f0a1bbee747bd752b2cf9bf238 Reviewed-on: http://gerrit.cloudera.org:8080/20809 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-01-02 23:43:42 +00:00
wzhou-code	ec22a1e1ca	IMPALA-12502: Support Impala to Impala federation This patch adds support to read Impala tables in the Impala cluster through JDBC external data source. It also adds a new counter NumExternalDataSourceGetNext in profile for the total number of calls to ExternalDataSource::GetNext(). Setting query options for Impala will be supported in a following patch. Testing: - Added an end-to-end unit test to read Impala tables from Impala cluster through JDBC external data source. Manually ran the unit-test with Impala tables in Impala cluster on a remote host by setting $INTERNAL_LISTEN_HOST in jdbc.url as the ip address of the remote host on which an Impala cluster is running. - Added LDAP test for reading table through JDBC external data source with LDAP authentication. Manually ran the unit-test with Impala tables in a remote Impala cluster. - Passed core tests. Change-Id: I79ad3273932b658cb85c9c17cc834fa1b5fbd64f Reviewed-on: http://gerrit.cloudera.org:8080/20731 Reviewed-by: Abhishek Rawat <arawat@cloudera.com> Tested-by: Wenzhe Zhou <wzhou@cloudera.com>	2023-12-22 21:44:49 +00:00


121 Commits