3210 Commits

Author SHA1 Message Date
Riza Suminto
f7cf4f8446 IMPALA-14070: Use checkedMultiply in SortNode.java
The maxRowsInHeaps calculation may overflow because it uses simple
multiplication. This patch fixes the bug by calculating it using
checkedMultiply(). A broader refactoring will be done by IMPALA-14071.
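
The checked-multiplication pattern can be sketched with the JDK's
Math.multiplyExact, which behaves like the checkedMultiply() helper
referenced above (the maxRowsInHeaps signature below is a hypothetical
stand-in, not the actual SortNode.java code):

```java
public class CheckedMultiplySketch {
    // Hypothetical stand-in for the maxRowsInHeaps computation: plain
    // multiplication silently wraps on overflow, while a checked multiply
    // throws ArithmeticException so the caller can react.
    static long maxRowsInHeaps(long heapCapacity, long numHeaps) {
        return Math.multiplyExact(heapCapacity, numHeaps);
    }

    public static void main(String[] args) {
        long plain = Long.MAX_VALUE * 2L;  // wraps around to -2, no error
        System.out.println(plain);

        System.out.println(maxRowsInHeaps(1_000_000L, 1_000L));

        try {
            maxRowsInHeaps(Long.MAX_VALUE, 2L);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```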

Testing:
Added e2e test TestTopNHighNdv that exercises the issue.

Change-Id: Ic6712b94f4704fd8016829b2538b1be22baaf2f7
Reviewed-on: http://gerrit.cloudera.org:8080/22896
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-14 04:11:48 +00:00
Surya Hebbar
7ad7a86c0e IMPALA-13624: Implement textual representation for aggregate event sequences
This adds support for a summarized textual representation of timestamps
for the event sequences present in the aggregated profile.

With the verbose format present in profile V1 and V2, it becomes
difficult to analyze an event's timestamps across instances.

The event sequences are now displayed in a histogram format, based on
the number of timestamps present, in order to support an easier view
for skew analysis and other possible use cases.
(i.e. based on json_profile_event_timestamp_limit)

The summary generated from aggregated instance-level timestamps
(i.e. IMPALA-13304) is used to achieve this within the profile V2,
which covers the possibility of missing events.

Example,
  Verbosity::DEFAULT
  json_profile_event_timestamp_limit = 5 (default)

  Case #1, Number of instances exceeded limit
    Node Lifecycle Event Timeline Summary :
     - Open Started (4s880ms):
        Min: 2s312ms, Avg: 3s427ms, Max: 4s880ms, Count: 12
        HistogramCount: 4, 4, 0, 0, 4

  Case #2, Number of instances within the limit

    Node Lifecycle Event Timeline:
     - Open Started: 5s885ms, 1s708ms, 3s434ms
     - Open Finished: 5s885ms, 1s708ms, 3s435ms
     - First Batch Requested: 5s885ms, 1s708ms, 3s435ms
     - First Batch Returned: 6s319ms, 2s123ms, 3s570ms
     - Last Batch Returned: 7s878ms, 2s123ms, 3s570ms

With Verbosity::EXTENDED or more, all events and timestamps are printed
with full verbosity as before.

Tests:
For test_profile_tool.py, updated the generated outputs for text
and JSON profiles.

Change-Id: I4bcc0e2e7fccfa8a184cfa8a3a96d68bfe6035c0
Reviewed-on: http://gerrit.cloudera.org:8080/22245
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-05-14 00:21:54 +00:00
Surya Hebbar
bc0de10966 IMPALA-14069: Factor possibility of zero timestamps in aggregated event sequences
Currently, the missing event timestamps are substituted by zeros and
then reported (i.e. unreported_event_instance_idxs) within the event
sequences of the JSON profile. See IMPALA-13555 for more details.

Even with micro/nanosecond precision, some event timestamps are recorded
as zeros (i.e. Prepare Finished - 0ns).

The current implementation of aggregated event sequences was incorrectly
considering these zeros as substituted missing timestamps.

Although these can be distinguished from missing timestamps through
the exposed 'unreported_event_instance_idxs', it is more helpful
to represent missing values as negative values or constants (i.e. -1).

This representation is favorable for summary and visualization, and is
necessary for skipping missing values and maintaining alignment between
instance timestamps.

The patch also fixes null values in "info_strings" fields within
the JSON profile.

Fixed runtime-profile-test to consider negative values (i.e. -1) as missing
event timestamps, instead of 0.

Updated the generated profiles in testdata/impala-profiles.

Change-Id: I9f1efd2aad5f62084075cd8f9169ef72c66942b6
Reviewed-on: http://gerrit.cloudera.org:8080/22893
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-05-14 00:21:54 +00:00
Zoltan Borok-Nagy
afa329fd89 IMPALA-13931: TestIcebergRestCatalog.test_rest_catalog_basic failed at setup
There were several issues with test_rest_catalog_basic which made it
fail in environments that used Ozone or S3.

Missing dependencies on Ozone and S3 classes:
* This is resolved in iceberg-rest-catalog-test/pom.xml by adding
  a dependency to impala-executor-deps

Hadoop configuration was not initialized properly:
* run-iceberg-rest-server.sh used Maven to run Iceberg REST Catalog in
  which case Maven is in charge of setting the CLASSPATH but the
  core-site/ozone-site/etc. config files were not on it, so the
  REST Catalog used a default Hadoop configuration that wasn't good
  for our environment.
* To overcome the CLASSPATH problem, we now create a runnable JAR in
  iceberg-rest-catalog-test/pom.xml and also generate the proper
  CLASSPATH during compilation.
* run-iceberg-rest-server.sh now uses java -cp to run the REST
  Catalog

S3 builds threw NoSuchMethodException for the "create" method of
ApacheHttpClientConfigurations:
* The Iceberg library dynamically loads its HTTP client builders
  to work around an error, see details in
  https://github.com/apache/iceberg/issues/6715
* So the Iceberg lib dynamically tries to load the "create" method
  of its own ApacheHttpClientConfigurations class, but it fails
  with NoSuchMethodException.
* The critical code is invoked from Impala's IcebergMetadataScanner's
  ScanMetadataTable() method which happens to be invoked through
  JNI from the C++ backend.
* The context class loader of such threads is NULL, which means
  Java will use the bootstrap class loader to load classes and methods,
  but that doesn't have the proper resources on its classpath.
* To overcome this issue we set the context class loader for the thread
  to the class loader that originally loaded the IcebergMetadataScanner
  class.
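
The class-loader fix described above can be sketched as follows. This
is a generic version of the pattern; the actual Impala code sets the
loader inside IcebergMetadataScanner, and the helper name below is
illustrative:

```java
public class ContextClassLoaderSketch {
    // JNI-attached threads may have a null context class loader. Before
    // doing work that relies on dynamic class/method lookup, fall back to
    // the loader that originally loaded this class.
    static void ensureContextClassLoader() {
        Thread current = Thread.currentThread();
        if (current.getContextClassLoader() == null) {
            current.setContextClassLoader(
                ContextClassLoaderSketch.class.getClassLoader());
        }
    }

    public static void main(String[] args) throws Exception {
        Thread t = new Thread(() -> {
            // Simulate a JNI-attached thread with no context class loader.
            Thread.currentThread().setContextClassLoader(null);
            ensureContextClassLoader();
            System.out.println(
                Thread.currentThread().getContextClassLoader() != null);
        });
        t.start();
        t.join();
    }
}
```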

Change-Id: I9dc0e30aeaff0b8de41426ba38506383b4af472c
Reviewed-on: http://gerrit.cloudera.org:8080/22818
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-05-09 17:01:56 +00:00
Zoltan Borok-Nagy
04735598d6 IMPALA-13718: Skip reloading Iceberg tables when metadata JSON file is the same
With this patch Impala skips reloading Iceberg tables when the metadata
JSON file is the same, as this means that the table is essentially
unchanged.

This can help in situations when the event processor is lagging behind
and we have an Iceberg table that is updated frequently. Imagine the
case when Impala gets 100 events for an Iceberg table. In this case,
after processing the first event, our internal representation of
the Iceberg table is already up-to-date; there is no need to do the
reload 100 times.

We cannot use the internal icebergApiTable_'s metadata location,
as the following statement might silently refresh the metadata
in 'current()':

 icebergApiTable_.operations().current().metadataFileLocation()

To guarantee that we check against the actual loaded metadata
this patch introduces a new member to store the metadata location.
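
The skip-reload check can be sketched as below. The member and method
names are hypothetical; only the idea of comparing against a separately
stored metadata location comes from the patch:

```java
public class IcebergReloadSketch {
    // Store the metadata location of the actually loaded table state in
    // its own member, instead of reading it back from
    // icebergApiTable_.operations().current(), which may silently refresh.
    private String loadedMetadataLocation_;

    void load(String metadataLocation) {
        // ... actual table load elided ...
        loadedMetadataLocation_ = metadataLocation;
    }

    boolean needsReload(String eventMetadataLocation) {
        return !eventMetadataLocation.equals(loadedMetadataLocation_);
    }

    public static void main(String[] args) {
        IcebergReloadSketch tbl = new IcebergReloadSketch();
        tbl.load("s3://bucket/tbl/metadata/00001.metadata.json");
        // 100 events pointing at the same metadata JSON: only a
        // mismatching location triggers a reload.
        System.out.println(
            tbl.needsReload("s3://bucket/tbl/metadata/00001.metadata.json"));
        System.out.println(
            tbl.needsReload("s3://bucket/tbl/metadata/00002.metadata.json"));
    }
}
```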

Testing
 * added e2e tests for REFRESH, also for event processing

Change-Id: I16727000cb11d1c0591875a6542d428564dce664
Reviewed-on: http://gerrit.cloudera.org:8080/22432
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com>
2025-05-09 11:37:01 +00:00
Riza Suminto
3210ec58c5 IMPALA-14006: Bound max_instances in CreateInputCollocatedInstances
IMPALA-11604 (part 2) changes how many instances to create in
Scheduler::CreateInputCollocatedInstances. This works when the left
child fragment of a parent fragment is distributed across nodes.
However, if the left child fragment instance is limited to only 1
node (the case of UNPARTITIONED fragment), the scheduler might
over-parallelize the parent fragment by scheduling too many instances in
a single node.

This patch attempts to mitigate the issue in two ways. First, it adds
bounding logic in PlanFragment.traverseEffectiveParallelism() to lower
parallelism further if the left (probe) side of the child fragment is
not well distributed across nodes.

Second, it adds TQueryExecRequest.max_parallelism_per_node to relay
information from Analyzer.getMaxParallelismPerNode() to the scheduler.
With this information, the scheduler can do additional sanity checks to
prevent Scheduler::CreateInputCollocatedInstances from
over-parallelizing a fragment. Note that this sanity check can also cap
MAX_FS_WRITERS option under a similar scenario.

Added a ScalingVerdict enum and TRACE-log it to show the scaling
decision steps.
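
The scheduler-side sanity check can be sketched as a simple cap; the
method and parameter names here are illustrative, not the actual
Scheduler::CreateInputCollocatedInstances code:

```java
public class BoundInstancesSketch {
    // Hypothetical sketch of the bound: never schedule more instances
    // than max_parallelism_per_node allows across the hosting nodes.
    static int boundedInstances(int requested, int maxParallelismPerNode,
            int numNodes) {
        return Math.min(requested, maxParallelismPerNode * numNodes);
    }

    public static void main(String[] args) {
        // UNPARTITIONED child fragment: everything lands on 1 node, so
        // the per-node cap applies in full.
        System.out.println(boundedInstances(48, 12, 1));
        // Well-distributed child fragment: the cap is not hit.
        System.out.println(boundedInstances(48, 12, 4));
    }
}
```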

Testing:
- Add planner test and e2e test that exercise the corner case under
  COMPUTE_PROCESSING_COST=1 option.
- Manually comment the bounding logic in traverseEffectiveParallelism()
  and confirm that the scheduler's sanity check still enforces the
  bounding.

Change-Id: I65223b820c9fd6e4267d57297b1466d4e56829b3
Reviewed-on: http://gerrit.cloudera.org:8080/22840
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-07 03:34:15 +00:00
Riza Suminto
cb496104d9 IMPALA-14027: Implement HS2 NULL_TYPE using TStringValue
HS2 NULL_TYPE should be implemented using TStringValue.

However, due to an incompatibility with the Hive JDBC driver
implementation at the time, Impala chose to implement the NULL type
using TBoolValue (see IMPALA-914, IMPALA-1370).

HIVE-4172 might have been the root cause of that decision. Today, the
Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) does not have that
issue anymore, as shown in this reproduction after applying this patch:

./bin/run-jdbc-client.sh -q "select null" -t NOSASL
Using JDBC Driver Name: org.apache.hive.jdbc.HiveDriver
Connecting to: jdbc:hive2://localhost:21050/;auth=noSasl
Executing: select null
----[START]----
NULL
----[END]----
Returned 1 row(s) in 0.343s

Thus, we can reimplement NULL_TYPE using TStringValue to match
HiveServer2 behavior.

Testing:
- Pass core tests.

Change-Id: I354110164b360013d9893f1eb4398c3418f80472
Reviewed-on: http://gerrit.cloudera.org:8080/22852
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-06 19:41:17 +00:00
Riza Suminto
eaee02ffd4 IMPALA-14026: Migrate test files that assert Beeswax dml result.
This patch migrates the remaining test files under testdata/ that
verify beeswax-specific dml results. The RESULTS section is replaced by
a RUNTIME_PROFILE section that validates the Partition name,
NumModifiedRows, and the sum of NumModifiedRows if necessary.

Added HS2_TYPES section where necessary to let tests pass.

Testing:
Pass core tests.

Change-Id: I36efbe66b654a5af6c44710e55d0b755280ad3be
Reviewed-on: http://gerrit.cloudera.org:8080/22846
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-05-02 20:04:00 +00:00
Eyizoha
faf322dd41 IMPALA-12927: Support specifying format for reading JSON BINARY columns
Currently, Impala always assumes that the data in the binary columns of
JSON tables is base64 encoded. However, before HIVE-21240, Hive wrote
binary data to JSON tables without base64 encoding it, instead writing
it as escaped strings. After HIVE-21240, Hive defaults to base64
encoding binary data when writing to JSON tables and introduces the
serde property 'json.binary.format' to indicate the encoding method of
binary data in JSON tables.

To maintain consistency with Hive and avoid correctness issues caused by
reading data in an incorrect manner, this patch also introduces the
serde property 'json.binary.format' to specify the reading method for
binary data in JSON tables. Currently, this property supports reading in
either base64 or rawstring formats, same as Hive.

Additionally, this patch introduces a query option 'json_binary_format'
to achieve the same effect. This query option will only take effect for
JSON tables where the serde property 'json.binary.format' is not set.
The reading format of binary columns in JSON tables can be configured
globally by setting the 'default_query_options'. It should be noted
that the default value of 'json_binary_format' is 'NONE', and Impala
will prohibit reading binary columns of JSON tables that either have
no 'json.binary.format' set while 'json_binary_format' is 'NONE', or
have an invalid 'json.binary.format' value set, and will provide an
error message to avoid using an incorrect format without the user
noticing.

Testing:
  - Enabled existing binary type E2E tests for JSON tables
  - Added new E2E test for 'json.binary.format'

Change-Id: Idf61fa3afc0f33caa63fbc05393e975733165e82
Reviewed-on: http://gerrit.cloudera.org:8080/22289
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-29 16:16:12 +00:00
Daniel Becker
c1aac4b3a4 IMPALA-13873: Missing equivalence conjunct in aggregation node with inline views
Some queries involving plain (distinct) UNIONs miss conjuncts, leading
to incorrect results:

Example:
  WITH u1 AS (select 10 a, 10 b),
  t AS (select a, b, min(b) over (partition by a) min_b from u1 UNION
  select 10, 10, 20)
  select t.* from t where t.b = t.min_b;

Expected result:
  +----+----+-------+
  | a  | b  | min_b |
  +----+----+-------+
  | 10 | 10 | 10    |
  +----+----+-------+

Actual result:
  +----+----+-------+
  | a  | b  | min_b |
  +----+----+-------+
  | 10 | 10 | 10    |
  | 10 | 20 | 10    |
  +----+----+-------+

This is caused by MultiAggregateInfo assuming that conjuncts bound by
grouping slots that are produced by SlotRef grouping expressions are
already evaluated below the AggregationNode. However, this is not true
in all cases: with UNIONs, there may be conjuncts that are unassigned
below the AggregationNode.

This may happen if a conjunct cannot be pushed into all operands of a
UNION, because the source tuples in the operands do not contain all of
the slots referenced by the predicate. In the example above, it happens
in the first operand:
  select a, b, min(b) over (partition by a) min_b from u1
The source tuple, 'u1', contains only two slots ('a' and 'b'), but does
not contain a slot corresponding to 'min(b)' - therefore the predicate
't.b = t.min_b' is not bound by the tuple of 'u1'. In theory, the
predicate could still be evaluated directly after materialising the
tuple with 'min(b)', still inside the UNION operand, but Impala
currently does not work that way.

In these cases, the conjuncts need to be evaluated in the
AggregationNode (possibly in addition to some of the UNION operands).

This change fixes this problem by introducing a method in
MultiAggregateInfo: 'setConjunctsToKeep()', where the caller can pass a
list of conjuncts that will not be eliminated. This is called during the
planning of the UNION if there are unassigned conjuncts remaining.

Testing:
 - Added a PlannerTest and an EE test for the case where a conjunct
   was previously incorrectly removed from the AggregationNode.
 - Existing tests cover the case when conjuncts can be safely removed
   from an AggregationNode above a UnionNode because the conjuncts are
   pushed into all union operands, see for example
   https://github.com/apache/impala/blob/6f2d9a2/testdata/workloads/functional-planner/queries/PlannerTest/union.test#L3914

Change-Id: I67a59cd96d83181ce249fd6ca141906f549a09b3
Reviewed-on: http://gerrit.cloudera.org:8080/22746
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-29 02:04:55 +00:00
Riza Suminto
d8e18fac61 IMPALA-13991: Skip CROSS_JOIN rewrite if subquery is in disjunctive
Inside StmtRewriter.mergeExpr() there is an optimization that sets
JoinOperator.CROSS_JOIN under a certain scenario. This patch adds
criteria to SKIP that rewrite if the subquery comes from inside a
disjunctive expression, regardless of the joinConjunct value. If
joinConjunct is NOT NULL, the inlineView may be correlated through
that joinConjunct. If joinConjunct is NULL, then the expr is a (NOT)
EXISTS predicate. EXISTS within a disjunct is not supported yet (see
IMPALA-9931).

Testing:
- Add planner and query tests for the corner case.
  Before this patch, the query returned a wrong result.
- Fixed a wrong testcase in subquery-rewrite.test.

Change-Id: Iac0deb0b2fb1536684cce2e004156a20b769b9ab
Reviewed-on: http://gerrit.cloudera.org:8080/22815
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-29 02:03:19 +00:00
Gabor Kaszab
3c24706c72 IMPALA-13268: Integrate Iceberg ScanMetrics into Impala query profiles
When calling planFiles() on an Iceberg table, it can give us some
metrics like total planning time, number of data/delete files and
manifests, how many of these could be skipped etc.

This change integrates these metrics into the query profile, under the
"Frontend" section. These metrics are per-table, so if multiple tables
are scanned for the query there will be multiple sections in the
profile.

Note that we only have these metrics for a table if Iceberg needs to be
used for planning for that table, e.g. if a predicate is pushed down to
Iceberg or if there is time travel. For tables where Iceberg was not
used in planning, the profile will contain a short note describing this.

To facilitate pairing the metrics with scans, the metrics header
references the plan node responsible for the scan. This will always be
the top level node for the scan, so it can be a SCAN node, a JOIN node
or a UNION node depending on whether the table has delete files.

Testing:
 - added EE tests in iceberg-scan-metrics.tests
 - added a test in PlannerTest.java that asserts on the number of
   metrics; if it changes in a new Iceberg release, the test will fail
   and we can update our reporting

Change-Id: I080ee8eafc459dad4d21356ac9042b72d0570219
Reviewed-on: http://gerrit.cloudera.org:8080/22501
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
2025-04-28 08:54:30 +00:00
stiga-huang
7d6fe8c6c8 IMPALA-13487: Add profile counters for memory allocation in parquet scanners
This patch adds some profile counters for memory allocation and free in
MemPools, which are useful to detect tcmalloc contention.

The following counters are added:
 - Thread level page faults: TotalThreadsMinorPageFaults,
   TotalThreadsMajorPageFaults.
 - MemPool counters for tuple_mem_pool and aux_mem_pool of the scratch
   batch in columnar scanners:
    - ScratchBatchMemAllocDuration
    - ScratchBatchMemFreeDuration
    - ScratchBatchMemAllocBytes
 - MemPool counters for data_page_pool of ParquetColumnChunkReader
    - ParquetDataPagePoolAllocBytes
    - ParquetDataPagePoolAllocDuration
    - ParquetDataPagePoolFreeBytes
    - ParquetDataPagePoolFreeDuration
 - MemPool counters for the fragment level RowBatch
    - RowBatchMemPoolAllocDuration
    - RowBatchMemPoolAllocBytes
    - RowBatchMemPoolFreeDuration
    - RowBatchMemPoolFreeBytes
 - Duration in HdfsColumnarScanner::GetCollectionMemory() which includes
   memory allocation for collection values and memcpy when doubling the
   tuple buffer:
    - MaterializeCollectionGetMemTime

Here is an example of a memory-bound query:
  Fragment Instance
    - RowBatchMemPoolAllocBytes: 0 (Number of samples: 0)
    - RowBatchMemPoolAllocDuration: 0.000ns (Number of samples: 0)
    - RowBatchMemPoolFreeBytes: (Avg: 719.25 KB (736517) ; Min: 4.00 KB (4096) ; Max: 4.12 MB (4321922) ; Sum: 1.93 GB (2069615013) ; Number of samples: 2810)
    - RowBatchMemPoolFreeDuration: (Avg: 132.027us ; Min: 0.000ns ; Max: 21.999ms ; Sum: 370.997ms ; Number of samples: 2810)
    - TotalStorageWaitTime: 47.999ms
    - TotalThreadsInvoluntaryContextSwitches: 2 (2)
    - TotalThreadsMajorPageFaults: 0 (0)
    - TotalThreadsMinorPageFaults: 549.63K (549626)
    - TotalThreadsTotalWallClockTime: 9s646ms
      - TotalThreadsSysTime: 1s508ms
      - TotalThreadsUserTime: 1s791ms
    - TotalThreadsVoluntaryContextSwitches: 8.85K (8852)
    - TotalTime: 9s648ms
    ...
    HDFS_SCAN_NODE (id=0):
      - ParquetDataPagePoolAllocBytes: (Avg: 2.36 MB (2477480) ; Min: 4.00 KB (4096) ; Max: 4.12 MB (4321922) ; Sum: 1.02 GB (1090091508) ; Number of samples: 440)
      - ParquetDataPagePoolAllocDuration: (Avg: 1.263ms ; Min: 0.000ns ; Max: 39.999ms ; Sum: 555.995ms ; Number of samples: 440)
      - ParquetDataPagePoolFreeBytes: (Avg: 1.28 MB (1344350) ; Min: 4.00 KB (4096) ; Max: 1.53 MB (1601012) ; Sum: 282.06 MB (295757000) ; Number of samples: 220)
      - ParquetDataPagePoolFreeDuration: (Avg: 1.927ms ; Min: 0.000ns ; Max: 19.999ms ; Sum: 423.996ms ; Number of samples: 220)
      - ScratchBatchMemAllocBytes: (Avg: 486.33 KB (498004) ; Min: 4.00 KB (4096) ; Max: 512.00 KB (524288) ; Sum: 1.19 GB (1274890240) ; Number of samples: 2560)
      - ScratchBatchMemAllocDuration: (Avg: 1.936ms ; Min: 0.000ns ; Max: 35.999ms ; Sum: 4s956ms ; Number of samples: 2560)
      - ScratchBatchMemFreeDuration: 0.000ns (Number of samples: 0)
      - DecompressionTime: 1s396ms
      - MaterializeCollectionGetMemTime: 4s899ms
      - MaterializeTupleTime: 6s656ms
      - ScannerIoWaitTime: 47.999ms
      - TotalRawHdfsOpenFileTime: 0.000ns
      - TotalRawHdfsReadTime: 360.997ms
      - TotalTime: 9s254ms

The fragment instance took 9s648ms to finish, of which 370.997ms was
spent releasing memory of the final RowBatch. The majority of the time is
spent in the scan node (9s254ms). Mostly it's DecompressionTime +
MaterializeTupleTime + ScannerIoWaitTime + TotalRawHdfsReadTime. The
majority is MaterializeTupleTime (6s656ms).

ScratchBatchMemAllocDuration shows that invoking std::malloc() in
materializing the scratch batches took 4s956ms overall.
MaterializeCollectionGetMemTime shows that allocating memory for
collections and copying memory in doubling the tuple buffer took
4s899ms. So materializing the collections took most of the time.

Note that DecompressionTime (1s396ms) also includes memory allocation
duration tracked by the sum of ParquetDataPagePoolAllocDuration
(555.995ms). So memory allocation also takes a significant portion of
time here.

The other observation is TotalThreadsTotalWallClockTime is much higher
than TotalThreadsSysTime + TotalThreadsUserTime and there is a large
number of TotalThreadsVoluntaryContextSwitches. So the thread is waiting
for resources (e.g. lock) for a long duration. In the above case, it's
waiting for locks in tcmalloc memory allocation (need off-cpu flame
graph to reveal this).

Implementation of MemPool counters
Add MemPoolCounters in MemPool to track malloc/free duration and bytes.
Note that the counters are not updated in the destructor since it's
expected that all chunks are freed or transferred before calling the
destructor.

MemPool is widely used in the code base. This patch only exposes MemPool
counters in three places:
 - the scratch batch in columnar scanners
 - the ParquetColumnChunkReader of parquet scanners
 - the final RowBatch reset by FragmentInstanceState

This patch also moves GetCollectionMemory() from HdfsScanner to
HdfsColumnarScanner since it's only used by parquet and orc scanners.

PrettyPrint of SummaryStatsCounter is updated to also show the sum of
the values if they are not speeds or percentages.

Tests
 - tested in manually reproducing the memory-bound queries
 - ran perf-AB-test on tpch (sf=42) and didn't see significant
   performance change
 - added e2e tests
 - updated expected files of observability/test_profile_tool.py because
   SummaryStatsCounter now prints the sum in most cases. Also updated
   get_bytes_summary_stats_counter and
   test_get_bytes_summary_stats_counter accordingly.

Change-Id: I982315d96e6de20a3616f3bd2a2b4866d1ff4710
Reviewed-on: http://gerrit.cloudera.org:8080/22062
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-25 02:53:32 +00:00
Daniel Becker
bc0a92c5ed IMPALA-13963: Crash when setting 'write.parquet.page-size-bytes' to a higher value
When setting the Iceberg table property 'write.parquet.page-size-bytes'
to a higher value, inserting into the table crashes Impala:

  create table lineitem_iceberg_comment
  stored as iceberg
  tblproperties("write.parquet.page-size-bytes"="1048576")
  as select l_comment from tpch_parquet.lineitem;

The impala executors crash because of memory corruption caused by buffer
overflow in HdfsParquetTableWriter::ColumnWriter::ProcessValue(). Before
attempting to write the next value, it checks whether the total byte
size would exceed 'plain_page_size_', but the buffer into which it
writes ('values_buffer_') has length 'values_buffer_len_'.
'values_buffer_len_' is initialised in the constructor to
'DEFAULT_DATA_PAGE_SIZE', irrespective of the value of
'plain_page_size_'. However, it is intended to have at least the same
size, as can be seen from the check in ProcessValue() or the
GrowPageSize() method. The error does not usually surface because
'plain_page_size_' has the same default value, 'DEFAULT_DATA_PAGE_SIZE'.

'values_buffer_' is also used for DICTIONARY encoding, but that takes
care of growing it as necessary.

This change fixes the problem by initialising 'values_buffer_len_' to
the value of 'plain_page_size_' in the constructor.

This leads to exposing another bug: in BitWriter::PutValue(), when we
check whether the next element fits in the buffer, we multiply
'max_bytes_' by 8, which overflows because 'max_bytes_' is a 32-bit int.
This happens with values that we already use in our tests.

This patch changes the type of 'max_bytes_' to int64_t, so multiplying
it by 8 (converting from bytes to bits) is now safe.
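
The bytes-to-bits overflow is easy to reproduce. A minimal Java
analogue of the C++ 'max_bytes_ * 8' check (the buffer size here is
illustrative, not a value from the failing tests):

```java
public class BitWidthOverflowSketch {
    public static void main(String[] args) {
        // A ~300 MB buffer: fine as a byte count, but the bit count no
        // longer fits in a 32-bit signed int.
        int maxBytes = 300_000_000;

        int bits32 = maxBytes * 8;          // wraps to a negative value,
                                            // so the capacity check misbehaves
        long bits64 = (long) maxBytes * 8;  // widen first, as the int64_t
                                            // fix does

        System.out.println(bits32);  // -1894967296
        System.out.println(bits64);  // 2400000000
    }
}
```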

Testing:
  - Added an EE test in iceberg-insert.test that reproduced the error.

Change-Id: Icb94df8ac3087476ddf1613a1285297f23a54c76
Reviewed-on: http://gerrit.cloudera.org:8080/22777
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-04-23 11:18:14 +00:00
Noemi Pap-Takacs
6ec2fb0210 IMPALA-13972: TestIcebergRestCatalog.test_rest_catalog_basic should check erasure coding
TestIcebergRestCatalog.test_rest_catalog_basic used to expect
'NONE' as erasure coding policy in the table metadata.
Checking the actual value with regex enables us to run the test
on any file system with or without erasure coding.

Testing:
 - executed test_rest_catalog_basic with and without erasure coding
Change-Id: I8467d420513ab59916351d25c4787c46fb3cef88
Reviewed-on: http://gerrit.cloudera.org:8080/22801
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-23 02:35:11 +00:00
Zoltan Borok-Nagy
b19331b3d3 IMPALA-13968: Fix TestBinaryTypeInText.test_invalid_binary_type in ARM builds
On some platforms there's a bug in libSASL which makes sasl_decode64()
accept almost anything as input without reporting any errors.
See details here:
https://github.com/cyrusimap/cyrus-sasl/issues/619

TestBinaryTypeInText::test_invalid_binary_type uses a data file
that has two values that should be Base64 encoded but they aren't.
The test checks that Impala raises the corresponding errors.

The first value doesn't have a correct size, so Impala's
Base64DecodeBufLen() will reject it. The second one passes the
Base64DecodeBufLen() check so we will try to decode it with
Base64Decode() that uses sasl_decode64() under the hood. If
sasl_decode64() has the bug we won't report any errors for the
second value.

This patch changes the second value to one that still passes the
Base64DecodeBufLen() check, but that sasl_decode64() also rejects on
all platforms. To achieve this, the new value uses the special '='
character.
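
A strict decoder such as the JDK's illustrates the two rejection
classes the test relies on (the sample strings below are illustrative;
the actual values in the test data file are not reproduced here):

```java
import java.util.Base64;

public class Base64RejectSketch {
    // Returns true iff 's' is accepted by a strict RFC 4648 decoder.
    static boolean decodes(String s) {
        try {
            Base64.getDecoder().decode(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(decodes("aGVsbG8="));  // valid Base64
        System.out.println(decodes("Q"));         // impossible length
        System.out.println(decodes("QQ=A"));      // '=' in the middle
    }
}
```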

The bug in sasl_decode64() is also a known issue in Impala,
see IMPALA-9926.

Alternatively, we could create our own Base64 decoder, or use one
from a different library, but that is out of the scope of this patch
as it requires more thought and performance measurements.
IMPALA-13973 tracks this option.

Change-Id: Ifff99fd2839e75add49ee835323764615bd5139a
Reviewed-on: http://gerrit.cloudera.org:8080/22791
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-21 15:23:12 +00:00
Riza Suminto
648209b172 IMPALA-13967: Move away from setting user parameter in execute
ImpalaConnection.execute and ImpalaConnection.execute_async have a
'user' parameter to set the specific user to run the query. This is
mainly a legacy of BeeswaxConnection, which allows using one client to
run queries under different usernames.

BeeswaxConnection and ImpylaHS2Connection actually allow specifying one
user per client. Doing so simplifies user-specific tests such as
test_ranger.py, which often instantiates separate clients for the admin
user and a regular user. There is no need to specify the 'user'
parameter anymore when calling execute() or execute_async(), thus
reducing potential bugs from forgetting to set it or setting it with
an incorrect value.

This patch applies the one-user-per-client practice as much as possible
for test_ranger.py, test_authorization.py, and
test_admission_controller.py. Unused code and pytest fixtures are
removed. A few flake8 issues are addressed too. Their
default_test_protocol() is overridden to return 'hs2'.

ImpylaHS2Connection.execute() and ImpylaHS2Connection.execute_async()
are slightly modified to assume ImpylaHS2Connection.__user if the
'user' parameter is None. BeeswaxConnection remains unchanged.

Extended ImpylaHS2ResultSet.__convert_result_value() to lowercase
boolean return values to match the beeswax result.

Testing:
Run and pass all modified tests in exhaustive exploration.

Change-Id: I20990d773f3471c129040cefcdff1c6d89ce87eb
Reviewed-on: http://gerrit.cloudera.org:8080/22782
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-18 15:27:22 +00:00
Riza Suminto
182aa5066e IMPALA-13958: Revisit hs2_parquet_constraint and hs2_text_constraint
hs2_parquet_constraint and hs2_text_constraint are meant to extend the
test vector dimension to also test non-default test protocols (other
than beeswax), but limit them to only run against the 'parquet/none'
or 'text/none' format accordingly.

This patch modifies these constraints into
default_protocol_or_parquet_constraint and
default_protocol_or_text_constraint respectively, such that full file
format coverage happens for the default_test_protocol configuration and
limited coverage for the other protocols. Drop hs2_parquet_constraint
entirely from test_utf8_strings.py because that test is already
constrained to the single 'parquet/none' file format.

The num modified rows validation in date-fileformat-support.test and
date-partitioning.test is changed to check the NumModifiedRows counter
from the profile.

Fixed TestQueriesJsonTables to always run with the beeswax protocol
because its assertions rely on beeswax-specific return values.

Ran impala-isort and fixed a few flake8 issues in the modified test files.

Testing:
Ran and passed the affected test files using exhaustive exploration and
the env var DEFAULT_TEST_PROTOCOL=hs2. Confirmed that full file format
coverage happens for the hs2 protocol. Note that
DEFAULT_TEST_PROTOCOL=beeswax is still the default.

Change-Id: I8be0a628842e29a8fcc036180654cd159f6a23c8
Reviewed-on: http://gerrit.cloudera.org:8080/22775
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-17 22:50:58 +00:00
Riza Suminto
55feffb41b IMPALA-13850 (part 1): Wait until CatalogD active before resetting
In HA mode, CatalogD initialization can fail to complete within a
reasonable time. Log messages showed that CatalogD is blocked trying to
acquire "CatalogServer.catalog_lock_" when calling
CatalogServer::UpdateActiveCatalogd() during statestore subscriber
registration. catalog_lock_ was held by GatherCatalogUpdatesThread, which
was calling GetCatalogDelta() and waiting for the Java lock versionLock_,
held by the thread doing CatalogServiceCatalog.reset().

This patch removes the catalog reset in the JniCatalog constructor. In
turn, catalogd-server.cc is now responsible for triggering the metadata
reset (Invalidate Metadata) only if:

1. It is the active CatalogD, and
2. The gathering thread has collected the first topic update, or CatalogD
   is configured with a catalog_topic_mode other than "minimal".

The latter prerequisite ensures that coordinators are not blocked waiting
for a full topic update in on-demand metadata mode. This is all managed by
a new thread method, TriggerResetMetadata, that monitors for these
conditions and triggers the initial metadata reset.
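
The gating conditions above can be sketched as follows (a simplified
Python model, not the actual C++ implementation):

```python
def should_trigger_initial_reset(is_active_catalogd, first_topic_collected,
                                 catalog_topic_mode):
    # Only the active CatalogD may trigger the metadata reset.
    if not is_active_catalogd:
        return False
    # In "minimal" (on-demand) mode, wait for the first gathered topic
    # update so coordinators are not blocked on a full topic update.
    return first_topic_collected or catalog_topic_mode != 'minimal'
```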

Note that this is a behavior change in on-demand catalog
mode (catalog_topic_mode=minimal). Previously, on-demand catalog mode
will send full database list in its first catalog topic update. This
behavior change is OK since coordinator can request metadata on-demand.

After this patch, catalog-server.active-status and the /healthz page can
turn true and OK respectively even while the very first metadata reset
is still ongoing. Observers that care about having fully populated
metadata should check other metrics such as catalog.num-db,
catalog.num-tables, or the /catalog page content.

Updated the start-impala-cluster.py readiness check to wait for at least
one table to be seen by coordinators, except during create-load-data.sh
execution (there are no tables yet) and when use_local_catalog=true (the
local catalog cache does not start with any table). Changed startup flag
checking from reading the actual command line args to reading the
daemon's '/varz?json' page. Cleaned up impala_service.py to fix some
flake8 issues.

Slightly update TestLocalCatalogCompactUpdates::test_restart_catalogd so
that unique_database cleanup is successful.

Testing:
- Refactor test_catalogd_ha.py to reduce repeated code, use
  unique_database fixture, and additionally validate /healthz page of
  both active and standby catalogd. Changed it to test using hs2
  protocol by default.
- Run and pass test_catalogd_ha.py and test_concurrent_ddls.py.
- Pass core tests.

Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
Reviewed-on: http://gerrit.cloudera.org:8080/22634
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-17 01:59:54 +00:00
Joe McDonnell
c5a0ec8bdf IMPALA-11980 (part 1): Put all thrift-generated python code into the impala_thrift_gen package
This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated python code, except that it
uses the impala_thrift_gen package rather than impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.

This patches all of the thrift files to add the python namespace, and
includes code to apply the same patching to the thirdparty thrift
files (hive_metastore.thrift, fb303.thrift).
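
For illustration, the added directive in each thrift file looks
something like this (the module name here is an example):

```thrift
// Generated python code for this file will live under the
// impala_thrift_gen package.
namespace py impala_thrift_gen.ExampleModule
```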

Putting all the generated python into a package makes it easier
to understand where the imports are getting code. When the
subsequent change rearranges the shell code, the thrift generated
code can stay in a separate directory.

This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.

Testing:
 - Ran a core job

Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-15 17:03:02 +00:00
Riza Suminto
047cf9ff4d IMPALA-13954: Validate num inserted rows via NumModifiedRows counter
This patch changes the way tests validate the number of inserted rows,
from checking the beeswax-specific result to checking the NumModifiedRows
counter from the query profile.
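
A minimal sketch of that style of validation (the helper name and profile
format here are assumptions, not Impala's actual test utilities):

```python
import re

def assert_num_modified_rows(profile_text, expected):
    # Look up the NumModifiedRows counter in the query profile text.
    m = re.search(r'NumModifiedRows:\s*(\d+)', profile_text)
    assert m is not None, 'NumModifiedRows counter not found in profile'
    assert int(m.group(1)) == expected
```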

Remove skipping over the hs2 protocol in test_chars.py and refactor
test_date_queries.py a bit to reduce test skipping. Added HS2_TYPES in
tests that require it and fixed some flake8 issues.

Testing:
Run and pass all affected tests.

Change-Id: I96eae9967298f75b2c9e4d0662fcd4a62bf5fffc
Reviewed-on: http://gerrit.cloudera.org:8080/22770
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-11 20:31:50 +00:00
Riza Suminto
0ed4e869de IMPALA-13930: ImpylaHS2Connection should only open cursor as needed
Before this patch, ImpylaHS2Connection unconditionally opened a
cursor (and HS2 session) as it connected, followed by running a "SET
ALL" query to populate the default query options.

This patch changes the behavior of ImpylaHS2Connection to open the
default cursor only when querying is needed for the first time. This
helps preserve assertions for tests that are sensitive to client
connections, like IMPALA-13925. Default query options are now parsed from
a newly instantiated TQueryOptions object rather than issuing a "SET ALL"
query or making a BeeswaxService.get_default_configuration() RPC.
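
The lazy-open pattern can be sketched like this (class and method names
are illustrative, not the actual ImpylaHS2Connection code):

```python
class LazyHS2Connection:
    """Opens the default cursor only on first use instead of at connect time."""

    def __init__(self, open_cursor_fn):
        self._open_cursor_fn = open_cursor_fn  # e.g. opens a cursor + session
        self._cursor = None

    def cursor(self):
        if self._cursor is None:
            # First query: open the cursor (and the HS2 session) now.
            self._cursor = self._open_cursor_fn()
        return self._cursor
```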

Fix test_query_profile_contains_query_compilation_metadata_cached_event
slightly by setting the 'sync_ddl' option because the test is flaky
without it.

Tweak test_max_hs2_sessions_per_user to run queries so that sessions
will open.

Deduplicate test cases between utc-timestamp-functions.test and
local-timestamp-functions.test. Rename TestUtcTimestampFunctions to
TestTimestampFunctions, and expand it to also run
local-timestamp-functions.test and
file-formats-with-local-tz-conversion.test. The table_format is now
constrained to 'text/none' because it is unnecessary to permute other
table formats.

Deprecate the 'use_local_tz_for_unix_timestamp_conversions' flag in favor
of the query option with the same name. Filed IMPALA-13953 to update the
documentation of the 'use_local_tz_for_unix_timestamp_conversions'
flag/option.

Testing:
Run and pass a few pytests such as:
test_admission_controller.py
test_observability.py
test_runtime_filters.py
test_session_expiration.py
test_set.py

Change-Id: I9d5e3e5c11ad386b7202431201d1a4cff46cbff5
Reviewed-on: http://gerrit.cloudera.org:8080/22731
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-11 04:37:14 +00:00
Zoltan Borok-Nagy
2849708b87 IMPALA-13932: (addendum) Adds e2e test for IMPALA-13932
This patch adds e2e test for IMPALA-13932.

Change-Id: I07537b31717d568422ea042ad2aeef906b3cab2e
Reviewed-on: http://gerrit.cloudera.org:8080/22767
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-10 19:55:22 +00:00
Zoltan Borok-Nagy
6e4613442e IMPALA-13933: run-iceberg-rest-server.sh should use IMPALA_MAVEN_OPTIONS
run-iceberg-rest-server.sh should use IMPALA_MAVEN_OPTIONS.
The easiest way to achieve this is to invoke maven via bin/mvn-quiet.sh.

The missing options can cause maven failures.

Change-Id: Ib01ed2ef420b9e965dd8d0e495c370a1ddf323a8
Reviewed-on: http://gerrit.cloudera.org:8080/22736
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-10 15:32:02 +00:00
Peter Rozsa
5c14877b05 IMPALA-13932: Add file path and position-based duplicate check for IcebergMergeNode
IcebergMergeNode's duplicate checking mechanism was based on comparing
pointers to the target table's rows. This mechanism results in
false positives if a new row batch reuses the memory of the previous row
batch provided to the merge node. This change adds an additional check
that validates the file position and the file path as well.
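
In spirit, the strengthened check keys each target row by its stable
(file path, file position) identity rather than its memory address, which
a recycled row batch can reuse. A simplified Python model (not the actual
C++ code):

```python
def find_duplicate_targets(rows):
    # Each row is identified by (file_path, pos), which stays stable even
    # when row-batch memory is reused; pointer identity does not.
    seen, duplicates = set(), []
    for row in rows:
        key = (row['file_path'], row['pos'])
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```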

Change-Id: I71b47414321675958c05438ef3aeeb5df0128033
Reviewed-on: http://gerrit.cloudera.org:8080/22761
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2025-04-09 09:32:03 +00:00
Csaba Ringhofer
6f2d9a24d8 IMPALA-13920: Allow running minicluster with Java 17
IMPALA-11941 allowed building Impala and running tests with Java 17,
but it still uses Java 8 for minicluster components (e.g. Hadoop) and
skips several tests that would restart Hive. It should be possible to
use Java 17 for everything so that Java 8 can eventually be deprecated.

This patch mainly fixes Yarn+Hive+Tez startup issues with Java 17 by
setting JAVA_TOOL_OPTIONS.
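
For example (illustrative only; the exact flags Impala sets may differ),
JAVA_TOOL_OPTIONS is picked up by every JVM that starts, so add-opens
style flags reach Yarn, Hive and Tez without editing each component's
launch scripts:

```shell
# Any JVM started after this export will see the option.
export JAVA_TOOL_OPTIONS="--add-opens=java.base/java.lang=ALL-UNNAMED"
```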

Another issue fixed is KuduHMSIntegrationTest: this test fails to
restart Kudu due to a bug in OpenJDK (see IMPALA-13856). The current
fix is to remove LD_PRELOAD to avoid loading libjsig (similarly to
the case when MINICLUSTER_JAVA_HOME is set). This works, but it
would be nice to clean up this area in a future patch.

Testing:
- ran exhaustive tests with Java 17
- ran core tests with default Java 8

Change-Id: If58b64a21d14a4a55b12dfe9ea0b9c3d5fe9c9cf
Reviewed-on: http://gerrit.cloudera.org:8080/22705
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2025-04-04 17:50:01 +00:00
Daniel Vanko
841b8c32c8 IMPALA-12107: Throw AnalysisException for unsupported Kudu range-partitioning types
Change the Precondition check to throw an AnalysisException for illegal
key types in the PARTITION BY RANGE clause.

Testing:
 * add fe tests
 * add e2e tests

Change-Id: I3e3037318065b0f4437045a7e8dbb76639404167
Reviewed-on: http://gerrit.cloudera.org:8080/22542
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2025-04-04 12:01:01 +00:00
Zoltan Borok-Nagy
49aaaa2cd5 IMPALA-13927: Fix crash on invalid BINARY data in TEXT tables
BINARY data in text files is expected to be Base64 encoded.
TextConverter::WriteSlot has a bug when it decodes Base64 data:
it does not set the NULL-indicator bit for the slots of
the invalid BINARY values. Therefore Tuple::CopyStrings can
later try to copy invalid StringValue objects.

This patch fixes TextConverter::WriteSlot to set the NULL-indicator
bit in case of Base64 parse errors.
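
The essence of the fix can be modeled in a few lines (a Python stand-in
for the C++ TextConverter logic; names are illustrative):

```python
import base64
import binascii

def write_binary_slot(encoded_text):
    """Decode a Base64 BINARY value; on parse error, mark the slot NULL."""
    try:
        value = base64.b64decode(encoded_text, validate=True)
        return {'is_null': False, 'value': value}
    except (binascii.Error, ValueError):
        # The fix: set the NULL indicator so no invalid StringValue
        # is left behind for later copying.
        return {'is_null': True, 'value': None}
```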

Testing
 * e2e test added

Change-Id: I79b712e2abe8ce6ecfbce508fd9e4e93fd63c964
Reviewed-on: http://gerrit.cloudera.org:8080/22721
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-03 13:57:24 +00:00
Zoltan Borok-Nagy
bd3486c051 IMPALA-13586: Initial support for Iceberg REST Catalogs
This patch adds initial support for Iceberg REST Catalogs. This means
now it's possible to run an Impala cluster without the Hive Metastore,
and without the Impala CatalogD. Impala Coordinators can directly
connect to an Iceberg REST server and fetch metadata for databases and
tables from there. The support is read-only, i.e. DDL and DML statements
are not supported yet.

This was initially developed in the context of a company Hackathon
program, i.e. it was a team effort that I squashed into a single commit
and polished the code a bit.

The Hackathon team members were:
* Daniel Becker
* Gabor Kaszab
* Kurt Deschler
* Peter Rozsa
* Zoltan Borok-Nagy

The Iceberg REST Catalog support can be configured via a Java properties
file, the location of it can be specified via:
 --catalog_config_dir: Directory of configuration files

Currently only one configuration file can be in the directory as we only
support a single Catalog at a time. The following properties are mandatory
in the config file:
* connector.name=iceberg
* iceberg.catalog.type=rest
* iceberg.rest-catalog.uri

The first two properties can only be 'iceberg' and 'rest' for now; they
are needed for extensibility in the future.
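
Putting these together, a minimal configuration file might look like this
(the URI is an example value):

```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://localhost:8181/
```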

Moreover, Impala Daemons need to specify the following flags to connect
to an Iceberg REST Catalog:
 --use_local_catalog=true
 --catalogd_deployed=false

Testing
* e2e tests added to test basic functionality against a custom-built
  Iceberg REST server that delegates to HadoopCatalog under the hood
* Further testing, e.g. Ranger tests are expected in subsequent
  commits

TODO:
* manual testing against Polaris / Lakekeeper, we could add automated
  tests in a later patch

Change-Id: I1722b898b568d2f5689002f2b9bef59320cb088c
Reviewed-on: http://gerrit.cloudera.org:8080/22353
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-02 20:04:12 +00:00
Daniel Becker
d714798904 IMPALA-13609: Store Iceberg snapshot id for COMPUTE STATS
Currently, when COMPUTE STATS is run from Impala, we set the
'impala.lastComputeStatsTime' table property. Iceberg Puffin stats, on
the other hand, store the snapshot id for which the stats were
calculated. Although it is possible to retrieve the timestamp of a
snapshot, comparing these two values is error-prone, e.g. in the
following situation:

 - COMPUTE STATS calculation is running on snapshot N
 - snapshot N+1 is committed at time T
 - COMPUTE STATS finishes and sets 'impala.lastComputeStatsTime' at time
   T + Delta
 - some engine writes Puffin statistics for snapshot N+1

After this, HMS stats will appear to be more recent even though they
were calculated on snapshot N, while we have Puffin stats for snapshot
N+1.

To make comparisons easier, after this change, COMPUTE STATS sets a new
table property, 'impala.computeStatsSnapshotIds'. This property stores
the snapshot id for which stats have been computed, for each column. It
is a comma-separated list of values of the form
"fieldIdRangeStart[-fieldIdRangeEndIncl]:snapshotId". The fieldId part
may be a single value or a contiguous, inclusive range.
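
To make the format concrete, a small illustrative parser (not Impala's
actual Java implementation) might look like:

```python
def parse_compute_stats_snapshot_ids(value):
    """Parse "start[-endIncl]:snapshotId" entries into {field_id: snapshot_id}."""
    result = {}
    for entry in value.split(','):
        field_ids, snapshot_id = entry.split(':')
        if '-' in field_ids:
            start, end = (int(x) for x in field_ids.split('-'))
        else:
            start = end = int(field_ids)
        for field_id in range(start, end + 1):
            result[field_id] = int(snapshot_id)
    return result
```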

Storing the snapshot ids on a per-column basis is needed because COMPUTE
STATS can be set to calculate stats for only a subset of the columns,
and then a different subset in a subsequent run. The recency of the
stats will then be different for each column.

Storing the Iceberg field ids instead of column names makes the format
easier to handle as we do not need to take care of escaping special
characters.

The 'impala.computeStatsSnapshotIds' table property is deleted after
DROP STATS.

Note that this change does not yet modify how Impala chooses between
Puffin and HMS stats: that will be done in a separate change.

Testing:
 - Added tests in iceberg-compute-stats.test checking that
   'impala.computeStatsSnapshotIds' is set correctly and is deleted
   after DROP STATS
 - added unit tests in IcebergUtilTest.java that check the parsing and
   serialisation of the table property

Change-Id: Id9998b84c4fd20d1cf5e97a34f3553832ec70ae7
Reviewed-on: http://gerrit.cloudera.org:8080/22339
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-01 13:53:43 +00:00
Riza Suminto
b121a40d20 IMPALA-13909: Remove cursor fixture from custom_cluster/test_kudu.py
This patch removes the deprecated cursor fixture in
custom_cluster/test_kudu.py. It is replaced with an assert_num_row()
method that creates a fresh hs2 client. All test classes in
custom_cluster/test_kudu.py now extend CustomKuduTest and use the hs2
protocol as the default test protocol.

Testing:
- Run and pass custom_cluster/test_kudu.py

Change-Id: I046bf987dd16ecdf493d999e86191d85210f2de5
Reviewed-on: http://gerrit.cloudera.org:8080/22698
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-28 23:27:58 +00:00
Daniel Vanko
76a8efb155 IMPALA-13859: Add decimal to Kudu's supported primary key types
Add DECIMAL to KuduUtil.isSupportedKeyType because Kudu supports it as a
primary key type.

Testing:
 * add fe tests
 * add e2e tests

Change-Id: I0e8685fe89e4e4e511ab1d815c51ca5a021b4081
Reviewed-on: http://gerrit.cloudera.org:8080/22674
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2025-03-28 17:39:18 +00:00
Mihaly Szjatinya
356b7e5ddf IMPALA-11597: Unset impala.lastComputeStatsTime during DROP STATS
Removes 'impala.lastComputeStatsTime' property from table on DROP STATS
catalog operation. Does not affect the incremental variant:
'DROP INCREMENTAL STATS t PARTITION (partition_spec)'

Change-Id: Id743bc9779141df7a26a95a0ea201ef6fc6aeff9
Reviewed-on: http://gerrit.cloudera.org:8080/22474
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-14 18:06:39 +00:00
Yida Wu
1c6ff5f98d IMPALA-13812: Fail query for certain errors related to AI functions
The ai_generate_text() and ai_generate_text_default() functions
return error messages as a result (string), which could be
misleading in some cases. This patch fixes this issue by setting
the error in the context as a UDF error, causing the query to
fail in case of configuration-related errors or HTTP errors
when accessing the AI endpoint.

Tests:
Ran core tests.
Added custom testcase TestAIGenerateText for failure cases
with ai_generate_text_default().
Added testcase TestExprs.test_ai_generate_text_exprs for
failure cases with ai_generate_text().

Change-Id: I639e48e64d62f7990cf9a3c35a59a0ee3a2c64e0
Reviewed-on: http://gerrit.cloudera.org:8080/22588
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-14 05:21:25 +00:00
Zoltan Borok-Nagy
efc921a83c IMPALA-13836: Fix TestIcebergV2Table.test_missing_data_files failure in erasure-coding build
test_missing_data_files issues a DESCRIBE FORMATTED command to check
the attributes of a table that has missing data files. It also
checks the value of 'Erasure Coding Policy', which depends on the
HDFS configuration. This patch removes the check for this property.

Testing
 * Verified fix in build that has erasure-coding enabled

Change-Id: I645e8922cbb992424058a2256114c66ca8ec4c3c
Reviewed-on: http://gerrit.cloudera.org:8080/22619
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-13 21:02:50 +00:00
Riza Suminto
7a5adc15ca IMPALA-13333: Limit memory estimation if PlanNode can spill
SortNode, AggregationNode, and HashJoinNode (on the build side) can spill
to disk. However, their memory estimation does not consider this
capability and assumes they will hold all rows in memory. This causes
memory overestimation if cardinality is also overestimated. In reality,
the whole query execution on a single host is often subject to a much
lower memory upper bound and not allowed to exceed it.

This upper-bound is dictated by, but not limited to:
- MEM_LIMIT
- MEM_LIMIT_COORDINATORS
- MEM_LIMIT_EXECUTORS
- MAX_MEM_ESTIMATE_FOR_ADMISSION
- impala.admission-control.max-query-mem-limit.<pool_name>
  from admission control.

This patch adds SpillableOperator interface that defines an alternative
of either PlanNode.computeNodeResourceProfile() or
DataSink.computeResourceProfile() if a lower memory upper-bound can be
reasoned about from configs mentioned above. This interface is applied
to SortNode, AggregationNode, HashJoinNode, and JoinBuildSink.

The in-memory vs spill-to-disk bias is controlled through the
MEM_ESTIMATE_SCALE_FOR_SPILLING_OPERATOR option, a scale between
[0.0, 1.0] that controls the estimated peak memory of query operators
that have spill-to-disk capabilities. Setting the value closer to 1.0
biases the Planner towards keeping as many rows as possible in memory,
while setting it closer to 0.0 biases the Planner towards spilling rows
to disk under memory pressure. Note that lowering
MEM_ESTIMATE_SCALE_FOR_SPILLING_OPERATOR can make a query that was
previously rejected by the Admission Controller become admittable, but it
may also spill to disk more and run a higher risk of exhausting scratch
space.
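
The resulting bound can be sketched as follows (an illustrative formula,
not the exact Planner code; the upper bound comes from the configs listed
above):

```python
def bounded_operator_estimate(unbounded_estimate, mem_upper_bound,
                              num_instances_per_host, scale):
    """Clamp a spillable operator's estimate to a scaled share of the bound."""
    assert 0.0 <= scale <= 1.0
    per_instance_bound = (mem_upper_bound * scale) / num_instances_per_host
    # Never raise the estimate above what the operator would ask for anyway.
    return min(unbounded_estimate, per_instance_bound)
```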

There are some caveats on this memory bounding patch:
- It checks if spill-to-disk is enabled in the coordinator, but
  individual backend executors might not have it configured. Mismatch of
  spill-to-disk configs between the coordinator and backend executor,
  however, is rare and can be considered as misconfiguration.
- It does not check the actual total scratch space available to
  spill-to-disk. However, query execution will be forced to spill anyway
  if memory usage exceeds the three memory configs above. Raising
  MEM_LIMIT / MEM_LIMIT_EXECUTORS option can help increase memory
  estimation and increase the likelihood for the query to get assigned
  to a larger executor group set, which usually has a bigger total
  scratch space.
- The memory bound is divided evenly among all instances of a fragment
  kind on a single host. But in theory, they should be able to share
  and grow their memory usage independently beyond the memory estimate
  as long as max memory reservation is not set.
- This does not consider other memory-related configs such as
  clamp_query_mem_limit_backend_mem_limit or disable_pool_mem_limits
  flag. But the admission controller will still enforce them if set.

Testing:
- Pass FE and custom cluster tests with core exploration.

Change-Id: I290c4e889d4ab9e921e356f0f55a9c8b11d0854e
Reviewed-on: http://gerrit.cloudera.org:8080/21762
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-03-13 18:38:27 +00:00
Zoltan Borok-Nagy
8093c3fa6b IMPALA-13854: IcebergPositionDeleteChannel uses incorrect capacity
IcebergPositionDeleteChannel uses an incorrect capacity since
IMPALA-13509. It is set to -1, which means it collects delete records
until it runs out of memory. This patch moves the Channel's capacity
calculation from the Init() function to the constructor.

Testing
 * e2e test added

Change-Id: I207869c97a699d2706227285595ec7d7dbe1e249
Reviewed-on: http://gerrit.cloudera.org:8080/22616
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-13 14:40:36 +00:00
Zoltan Borok-Nagy
a49ff618f1 IMPALA-13853: Don't adjust Iceberg field IDs for data files that don't have complex types
In migrated Iceberg tables we can have data files with missing field
IDs. We assume that their schema corresponds to the table schema at the
point when the table migration happened. This means during runtime we
can generate the field ids. The logic is more complicated when there are
complex types in the table and the table is partitioned. In such cases
we need to do some adjustments during field ID generation, in which case
we verify that the file schema corresponds to the table schema.

These adjustments are not needed when the table doesn't have complex
types, hence we can be a bit more relaxed and skip schema verification,
because field ID generation for top-level columns are not affected.
This means Impala would still be able to read the table if there were
trivial schema changes before migration.

With this change we allow all data files that have a compatible schema
with the table schema, which was the case before IMPALA-13364. This
behavior is also aligned with Hive.

Testing:
 * e2e tests added for both Parquet and ORC files

Change-Id: Ib1f1d0cf36792d0400de346c83e999fa50c0fa67
Reviewed-on: http://gerrit.cloudera.org:8080/22610
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-12 12:51:16 +00:00
Peter Rozsa
167ced7844 IMPALA-13674: Enable MERGE statement for Iceberg tables with equality deletes
This change fixes the delete expression calculation for
IcebergMergeImpl. When an Iceberg table contains equality deletes, the
merge implementation now includes the data sequence number in the result
expressions, as the underlying tuple descriptor also includes it
implicitly. Without including this field, the row evaluation fails
because of the mismatched number of evaluators and slot descriptors.

Tests:
 - manually validated on an Iceberg table that contains equality delete
 - e2e test added

Change-Id: I60e48e2731a59520373dbb75104d75aae39a94c1
Reviewed-on: http://gerrit.cloudera.org:8080/22423
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-06 19:20:52 +00:00
Zoltan Borok-Nagy
d928815d1a IMPALA-13654: Tolerate missing data files of Iceberg tables
Before this patch we got a TableLoadingException for missing data files.
This means the IcebergTable will be in an incomplete state in Impala's
memory, and therefore we won't be able to do any operation on it.

We should continue table loading in such cases, and only throw an
exception for queries that are about to read the missing data files.

This way ROLLBACK / DROP PARTITION, and some SELECT statements should
still work.

If Impala is running in strict mode via the CatalogD flag
--iceberg_allow_datafiles_in_table_location_only, and an Iceberg table
has data files outside of the table location, we still raise an exception
and leave the table in an unloaded state. To retain this behavior, the
IOException we threw is replaced with a TableLoadingException, which fits
logic errors better anyway.

Testing
 * added e2e tests

Change-Id: If753619d8ee1b30f018e90157ff7bdbe5d7f1525
Reviewed-on: http://gerrit.cloudera.org:8080/22367
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-03 17:50:04 +00:00
Abhishek Rawat
7e8c79df53 IMPALA-13792: Cross compile AI functions
ai_generate_text() and ai_generate_text_default() are not
cross-compiled and so these could lead to undefined symbols when
codegen is enabled. This patch cross-compiles these functions.

Testing:
- Added e2e tests for ai_generate_text and ai_generate_text_default
functions and these are run with codegen enabled/disabled.

Change-Id: I454657d9f1345a36b269e6b837aaecf55a09add0
Reviewed-on: http://gerrit.cloudera.org:8080/22552
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-02 08:51:57 +00:00
Noemi Pap-Takacs
79f51b018f IMPALA-12588: Don't UPDATE rows that already have the desired value
When UPDATEing an Iceberg or Kudu table, we should change as few rows
as possible. In case of Iceberg tables it means writing as few new
data records and delete records as possible.
Therefore, if rows already have the new values we should just ignore
them. One way to achieve this is to add extra predicates, e.g.:

  UPDATE tbl SET k = 3 WHERE i > 4;
    ==>
  UPDATE tbl SET k = 3 WHERE i > 4 AND k IS DISTINCT FROM 3;

So we won't write new data/delete records for the rows that already have
the desired value.

Explanation on how to create extra predicates to filter out these rows:

If there are multiple assignments in the SET list, we can only skip
updating a row if all the mentioned values are already equal.
If either of the values needs to be updated, the entire row does.
Therefore we can think of the SET list as predicates connected with AND
and all of them need to be taken into consideration.
To negate this SET list, we have to negate the individual SET
assignments and connect them with OR.
Then add this new compound predicate to the original where predicates
with an AND (if there were none, just create a WHERE predicate from it).

                AND
              /     \
      original        OR
 WHERE predicate    /    \
                  !a       OR
                         /    \
                       !b     !c

This simple graph illustrates how the where predicate is rewritten.
(Considering an UPDATE statement that sets 3 columns.)
'!a', '!b' and '!c' are the negations of the individual assignments in
the SET list. So the extended WHERE predicate is:
(original WHERE predicate) AND (!a OR !b OR !c)
To handle NULL values correctly, we use IS DISTINCT FROM instead of
simply negating the assignment with operator '!='.
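
Applied to a SET list with multiple assignments, the rewrite sketched
above looks like this (an illustrative continuation of the earlier
example, not output copied from the planner):

```sql
UPDATE tbl SET k = 3, m = 5 WHERE i > 4;
  ==>
UPDATE tbl SET k = 3, m = 5
WHERE i > 4 AND (k IS DISTINCT FROM 3 OR m IS DISTINCT FROM 5);
```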

If the assignments contain UDFs, the result might be inconsistent
because of possible non-deterministic values or state in the UDFs,
therefore we should not rewrite the WHERE predicate at all.

Evaluating expressions can be expensive, therefore this optimization
can be limited or switched off entirely using the Query Option
SKIP_UNNEEDED_UPDATES_COL_LIMIT. By default, there is no filtering
if more than 10 assignments are in the SET list.

-------------------------------------------------------------------
Some performance measurements on a tpch lineitem table:

- predicates in HASH join, all updates can be skipped
Q1/[Q2] (Similar, but Q2 adds extra 4 items to the SET list):
update t set t.l_suppkey = s.l_suppkey,
[ t.l_partkey=s.l_partkey,
  t.l_quantity=s.l_quantity,
  t.l_returnflag=s.l_returnflag,
  t.l_shipmode=s.l_shipmode ]
from ice_lineitem t join ice_lineitem s
on t.l_orderkey=s.l_orderkey and t.l_linenumber=s.l_linenumber;

- predicates in HASH join, all rows need to be updated
Q3: update t set
 t.l_suppkey = s.l_suppkey,
 t.l_partkey=s.l_partkey,
 t.l_quantity=s.l_quantity,
 t.l_returnflag=s.l_returnflag,
 t.l_shipmode=concat(s.l_shipmode,' ')
 from ice_lineitem t join ice_lineitem s
 on t.l_orderkey=s.l_orderkey and t.l_linenumber=s.l_linenumber;

- predicates pushed down to the scanner, all rows updated
Q4/[Q5] (Similar, but Q5 adds 8 extra items to the SET list):
update ice_lineitem set
[ l_suppkey = l_suppkey + 0,
  l_partkey=l_partkey + 0,
  l_quantity=l_quantity,
  l_returnflag=l_returnflag,
  l_tax = l_tax,
  l_discount= l_discount,
  l_comment = l_comment,
  l_receiptdate = l_receiptdate, ]
l_shipmode=concat(l_shipmode,' ');

+=======+============+==========+======+
| Query | unfiltered | filtered | diff |
+=======+============+==========+======+
| Q1    |       4.1s |     1.9s | -54% |
+-------+------------+----------+------+
| Q2    |       4.2s |     2.1s | -50% |
+-------+------------+----------+------+
| Q3    |       4.3s |     4.7s | +10% |
+-------+------------+----------+------+
| Q4    |       3.0s |     3.0s | +0%  |
+-------+------------+----------+------+
| Q5    |       3.1s |     3.1s | +0%  |
+-------+------------+----------+------+

The results show that in the best case (we can skip all rows)
this change can cause significant perf improvement ~50%, since
0 rows were written. See Q1 and Q2.
If the predicates are evaluated in the join operator, but there were
no matches (worst case scenario) we can lose about 10%. (Q3)
If all the predicates can be pushed down to the scanners, the change
does not seem to cause significant difference (~0% in Q4 and Q5)
even if all rows have to be updated.

Testing:
 - Analysis
 - Planner
 - E2E
 - Kudu
 - Iceberg
 - testing the new query option: SKIP_UNNEEDED_UPDATES_COL_LIMIT
Change-Id: I926c80e8110de5a4615a3624a81a330f54317c8b
Reviewed-on: http://gerrit.cloudera.org:8080/22407
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-28 05:52:21 +00:00
Michael Smith
768527c89a IMPALA-13804: Use redacted statement in live table
Uses the redacted SQL statement in sys.impala_query_live, so it's
consistent with the profile and sys.impala_query_log.

Testing: added a test with redaction rules for live and log tables.

Change-Id: I9a72eeaea84981e96655aec6c67b5ef2cbbd3c3e
Reviewed-on: http://gerrit.cloudera.org:8080/22556
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-27 22:39:44 +00:00
Pranav Lodha
4c549d79f2 IMPALA-12992: Support for Hive JDBC Storage handler tables
This is an enhancement to support JDBC tables created by the
Hive JDBC Storage handler. It is essentially done by making the
JDBC table properties compatible with Impala: the properties are
translated when the table is loaded, and the translated form is
maintained only in the Impala cluster, i.e. it is not written
back to HMS.

Impala includes JDBC drivers for PostgreSQL and MySQL
making 'driver.url' not mandatory in such cases. The
Impala JDBC driver is still required for Impala-to-Impala
JDBC connections. Additionally, Hive allows adding database
driver JARs at runtime via Beeline, enabling users to
dynamically include JDBC driver JARs. However, Impala does
not support adding database driver JARs at runtime,
making the driver.url field still useful
in cases where additional drivers are needed.
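
The translation step can be sketched as a key mapping applied at table
load time. The property names on both sides of the mapping below are
simplified illustrations and may not match Impala's real mapping.

```python
# Sketch: translate Hive JDBC storage handler table properties into an
# Impala-compatible form, in memory only (the HMS copy is untouched).

HIVE_TO_IMPALA_KEYS = {
    "hive.sql.database.type": "database.type",
    "hive.sql.jdbc.url": "jdbc.url",
    "hive.sql.jdbc.driver": "jdbc.driver.class",
}

def translate_tbl_properties(hive_props):
    """Return an Impala-compatible copy of the table properties.

    The original dict (what HMS stores) is left untouched, mirroring
    'not written back to HMS'.
    """
    return {
        HIVE_TO_IMPALA_KEYS.get(key, key): value
        for key, value in hive_props.items()
    }

impala_props = translate_tbl_properties({
    "hive.sql.database.type": "POSTGRES",
    "hive.sql.jdbc.url": "jdbc:postgresql://db/tpch",
})
```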

'hive.sql.query' property is not handled in this patch.
It'll be covered in a separate jira.

Testing: End-to-end tests are included in
test_ext_data_sources.py.

Change-Id: I1674b93a02f43df8c1a449cdc54053cc80d9c458
Reviewed-on: http://gerrit.cloudera.org:8080/22134
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-27 11:44:38 +00:00
Riza Suminto
f59c0917fe IMPALA-13786: Skip rewriting expr of Hive auto-generated label
IMPALA-10836 introduced the SimplifyCastExprRule optimization to
simplify CAST expressions. However, applying this rewrite rule to
expressions referred to by Hive auto-generated labels has caused
an AnalysisException like the following:

AnalysisException: Could not resolve column/field reference:
'failing_view._c0'

Most likely, before IMPALA-10836, expressions referred to by Hive
auto-generated labels were never effectively rewritten, so the
ExprSubstitutionMap across multiple InlineViewRefs stayed intact.

This patch attempts to fix the issue by making any expression in a
SelectList that is mapped to a Hive auto-generated label ineligible
for any kind of expression rewrite.
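
The eligibility check can be sketched as below. Hive's auto-generated
column labels have the form _c0, _c1, ...; the helper name is mine,
not Impala's.

```python
# Sketch: an expression mapped to a Hive auto-generated label must keep
# its original form, so references like 'failing_view._c0' still
# resolve through the ExprSubstitutionMap.
import re

HIVE_AUTO_LABEL = re.compile(r"^_c\d+$")

def is_rewrite_eligible(select_item_label):
    """Return False for SelectList items with auto-generated labels."""
    return HIVE_AUTO_LABEL.match(select_item_label) is None

eligible = is_rewrite_eligible("total_price")
ineligible = is_rewrite_eligible("_c0")
```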

Also addressed some flake8 errors in test_views_compatibility.py.

Testing:
- Add test case in views-compatibility.test.
- Break test_view_compatibility_hive into 3 separate tests.
- Refactor test_views_compatibility.py to run both EXPLAIN and SELECT
  query over the test view.
- Pass test_views_compatibility.py in exhaustive exploration.
- Pass core tests.

Change-Id: I4b8bbd0afd6da0532bf2ef460989d4f01337d198
Reviewed-on: http://gerrit.cloudera.org:8080/22546
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-27 05:07:53 +00:00
Surya Hebbar
83452d640f IMPALA-13751: Fix runtime-profile-test failure since IMPALA-13304
This patch fixes the flaky tests added to the 'runtime-profile-test'
in IMPALA-13304.

The following possible causes have been fixed.
- Lexical errors present while defining example event sequence labels
- Edge cases providing invalid randomization ranges
- Improper re-allocation of instance timestamps
- Inconsistent sleep timers during event sequence generation

The test fixture's coverage for cases of complete and missing events
has been refactored into a common validation method.

Randomized tests have been kept to cover different scenarios of
missing events across instances for sorting and alignment.

To simulate missing events, the test fixture uses a Bernoulli
distribution with two possible probability values, 0.5 and 0.8.
This generates missing events with an approximate certainty of ~99.99%.

The test also runs successfully irrespective of whether missing
events are present.

Still, to ensure even greater certainty (>99.99%), the test fixture
now retries the setup a maximum of 3 times.
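
A back-of-the-envelope check of the ~99.99% figure: with a per-event
miss probability of 0.5, the chance that at least one of the event
slots goes missing approaches 1 quickly. The number of independent
event slots used below (n = 14) is an assumption for illustration, not
taken from the test fixture.

```python
# Sketch: probability that a Bernoulli(p_miss) drop hits at least one
# of n independent event slots, and the effect of retrying the setup.

def p_at_least_one_missing(p_miss, n_slots):
    """P(>= 1 missing event) = 1 - P(no slot is dropped)."""
    return 1.0 - (1.0 - p_miss) ** n_slots

single_run = p_at_least_one_missing(0.5, 14)
# Retrying the setup up to 3 times only fails if every run fails:
with_retries = 1.0 - (1.0 - single_run) ** 3
```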

The randomly generated values are also included as parameters within
the JUnitXML.

Updated observability tests with latest JSON profile expected outputs.

Tested locally with high parallelism and resource exhaustion using
GNU's parallel.

i.e. parallel -j 100 <test command> ::: {1..100}

Change-Id: If04744e215cac79f255b3d73c3e91e873c13749a
Reviewed-on: http://gerrit.cloudera.org:8080/22482
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-02-21 15:48:50 +00:00
Daniel Becker
14a1bd73e3 IMPALA-13770: Updating Iceberg tables with UDFs crashes Impala
When using a native UDF in the target value of an UPDATE statement or in
a filter predicate or target value of a MERGE statement, Impala crashes
with the following DCHECK:

  be/src/exprs/expr.cc:47 47 DCHECK(cache_entry_ == nullptr);

This DCHECK is in the destructor of Expr, and it fires because Close()
has not been called for the expression. In the UPDATE case this is
caused by MultiTableSinkConfig: it creates child DataSinkConfig objects
but does not call Close() on them, and consequently these child sink
configs do not call Close() on their output expressions.
In the MERGE case it is because various expressions are not closed in
IcebergMergeCase and IcebergMergeNode.

This patch fixes the issue by overriding Close() in MultiTableSinkConfig,
calling Close() on the child sinks as well as closing the expressions in
IcebergMergeCase and IcebergMergeNode.
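
The shape of the fix can be sketched as follows, using Python classes
as stand-ins for the C++ DataSinkConfig hierarchy (the real code is
C++; all class internals here are simplified illustrations).

```python
# Sketch: a parent sink config's Close() must propagate to its child
# sinks, which in turn close their output expressions; otherwise the
# DCHECK in Expr's destructor fires for an expression never closed.

class ScalarExpr:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True  # releases e.g. a UDF library cache entry

class DataSinkConfig:
    def __init__(self, output_exprs):
        self.output_exprs = output_exprs

    def close(self):
        for expr in self.output_exprs:
            expr.close()

class MultiTableSinkConfig(DataSinkConfig):
    def __init__(self, child_sinks):
        super().__init__(output_exprs=[])
        self.child_sinks = child_sinks

    def close(self):
        # Before the fix, child sinks were never closed here, leaking
        # their output expressions.
        for child in self.child_sinks:
            child.close()
        super().close()

expr = ScalarExpr()
MultiTableSinkConfig([DataSinkConfig([expr])]).close()
```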

Testing:
 - Added EE regression tests for the UPDATE and MERGE cases in
   iceberg-update-basic.test and iceberg-merge.test

Change-Id: Id86638c8d6d86062c68cc9d708ec9c7b0a4e95eb
Reviewed-on: http://gerrit.cloudera.org:8080/22508
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-21 15:10:37 +00:00
Zoltan Borok-Nagy
dacc447938 IMPALA-13768: Redundant Iceberg delete records are shuffled around which cause error "Invalid file path arrived at builder"
IcebergDeleteBuilder assumes that it should only receive delete
records for paths of data files that are scheduled for its
corresponding SCAN operator.

It is not true when any of the following happens:
* number of output channels in sender is 1
  (currently no DIRECTED mode, no filtering)
* hit bug in DIRECTED mode, see below
* single node plan is being used (no DIRECTED mode, no filtering)

With this patch, KrpcDataStreamSender::Send() will use DIRECTED mode
even if the number of output channels is 1. It also fixes the bug in
DIRECTED mode (which was due to an unused variable 'skipped_prev_row')
and simplifies the logic a bit.

The patch also relaxes the assumption in IcebergDeleteBuilder: it only
returns an error for dangling delete records when we are in a
distributed plan, where DIRECTED distribution mode of position delete
records can be assumed.

Testing
 * added e2e tests

Change-Id: I695c919c9a74edec768e413a02b2ef7dbfa0d6a5
Reviewed-on: http://gerrit.cloudera.org:8080/22500
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-20 14:05:57 +00:00
Riza Suminto
9cb9bae84e IMPALA-13758: Use context manager in ImpalaTestSuite.change_database
ImpalaTestSuite.change_database is responsible to point impala client to
database under test. However, it left client pointing to that database
after the test without reverting them back to default database. This
patch does the reversal by changing ImpalaTestSuite.change_database to
use context manager.

This patch change the behavior of execute_query_using_client() and
execute_query_async_using_client(). They used to change database
according to the given vector parameter, but not anymore after this
patch. In practice, this behavior change does not affect many tests
because most queries going through these functions already use fully
qualified table name. Going forward, querying through function other
than run_test_case() should try to use fully qualified table name as
much as possible.

Retain behavior of ImpalaTestSuite._get_table_location() since there are
considerable number of tests relies on it (changing database when
called).
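
The context-manager shape described above can be sketched as follows.
The ImpalaTestSuite internals are simplified here to a plain client
object with an execute() method; 'FakeClient' is a stand-in of mine,
and 'default' stands in for the default database.

```python
# Sketch: point the client at the database under test, and always
# revert to the default database when the test body finishes, even if
# it raises.
from contextlib import contextmanager

@contextmanager
def change_database(client, db_name):
    client.execute("use `%s`" % db_name)
    try:
        yield client
    finally:
        client.execute("use default")

class FakeClient:
    def __init__(self):
        self.current_db = "default"

    def execute(self, stmt):
        if stmt.startswith("use "):
            self.current_db = stmt[4:].strip("`")

client = FakeClient()
with change_database(client, "tpch_nested_parquet"):
    in_test_db = client.current_db
after_test_db = client.current_db
```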

Removed unused test fixtures and fixed several flake8 issues in modified
test files.

Testing:
- Moved nested-types-subplan-single-node.test. This allows the test
  framework to point to the right tpch_nested* database.
- Pass exhaustive test except IMPALA-13752 and IMPALA-13761. They will
  be fixed in separate patch.

Change-Id: I75bec7403cc302728a630efe3f95e852a84594e2
Reviewed-on: http://gerrit.cloudera.org:8080/22487
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-19 23:50:34 +00:00
Michael Smith
1b6395b8db IMPALA-13627: Handle legacy Hive timezone conversion
After HIVE-12191, Hive has 2 different methods of calculating timestamp
conversion from UTC to local timezone. When Impala has
convert_legacy_hive_parquet_utc_timestamps=true, it assumes times
written by Hive are in UTC and converts them to local time using tzdata,
which matches the newer method introduced by HIVE-12191.

Some dates convert differently between the two methods, such as
Asia/Kuala_Lumpur or Singapore prior to 1982 (also seen in HIVE-24074).
After HIVE-25104, Hive writes 'writer.zone.conversion.legacy' to
distinguish which method is being used. As a result there are three
different cases we have to handle:
1. Hive prior to 3.1 used what’s now called “legacy conversion” using
   SimpleDateFormat.
2. Hive 3.1.2 (with HIVE-21290) used a new Java API that’s based on
   tzdata and added metadata to identify the timezone.
3. Hive 4 supports both, and adds new file metadata to identify which
   one is used.

Adds handling for Hive files (identified by created_by=parquet-mr) where
we can infer the correct handling from Parquet file metadata:
1. if writer.zone.conversion.legacy is present (Hive 4), use it to
   determine whether to use a legacy conversion method compatible with
   Hive's legacy behavior, or convert using tzdata.
2. if writer.zone.conversion.legacy is not present but writer.time.zone
   is, we can infer it was written by Hive 3.1.2+ using new APIs.
3. otherwise it was likely written by an earlier Hive version.
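
The three-way decision above can be sketched as a function over the
Parquet key-value metadata of a parquet-mr file. The function name and
boolean parameter below are mine; only the metadata keys come from the
description above.

```python
# Sketch: decide whether to use Hive's legacy timestamp conversion for
# a file written by parquet-mr, based on its key-value metadata.

def use_legacy_conversion(file_metadata, use_legacy_hive_timestamp_conversion):
    legacy_flag = file_metadata.get("writer.zone.conversion.legacy")
    if legacy_flag is not None:
        # Case 1: Hive 4 recorded the conversion method explicitly.
        return legacy_flag == "true"
    if "writer.time.zone" in file_metadata:
        # Case 2: written by Hive 3.1.2+ using the new tzdata-based API.
        return False
    # Case 3: likely an older Hive; fall back to the query option.
    return use_legacy_hive_timestamp_conversion

hive4 = use_legacy_conversion({"writer.zone.conversion.legacy": "false"}, True)
hive312 = use_legacy_conversion({"writer.time.zone": "UTC"}, True)
old_hive = use_legacy_conversion({}, True)
```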

Adds a new CLI and query option - use_legacy_hive_timestamp_conversion -
to select what conversion method to use in the 3rd case above, when
Impala determines that the file was written by Hive older than 3.1.2.
Defaults to false to minimize changes in Impala's behavior and because
going through JNI is ~50x slower even when the results would not differ;
Hive defaults to true for its equivalent setting:
hive.parquet.timestamp.legacy.conversion.enabled.

Hive legacy-compatible conversion uses a Java method that would be
complicated to mimic in C++, doing

  DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
  formatter.setTimeZone(TimeZone.getTimeZone(timezone_string));
  java.util.Date date = formatter.parse(date_time_string);
  formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
  return formatter.format(date);

IMPALA-9385 added a check against a Timezone pointer in
FromUnixTimestamp. That dominates the time in FromUnixTimeNanos,
overriding any benchmark gains from IMPALA-7417. Moves FromUnixTime to
allow inlining, and switches to using UTCPTR in the benchmark - as
IMPALA-9385 did in most other code - to restore benchmark results.

Testing:
- Adds JVM conversion method to convert-timestamp-benchmark.
- Adds tests for several cases from Hive conversion tests.

Change-Id: I1271ed1da0b74366ab8315e7ec2d4ee47111e067
Reviewed-on: http://gerrit.cloudera.org:8080/22293
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-02-18 16:33:39 +00:00