3210 Commits

Author SHA1 Message Date
Riza Suminto
f7cf4f8446 IMPALA-14070: Use checkedMultiply in SortNode.java
The maxRowsInHeaps calculation may overflow because it uses simple
multiplication. This patch fixes the bug by calculating it using
checkedMultiply(). A broader refactoring will be done by IMPALA-14071.
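
The checked-multiplication pattern can be sketched with the JDK's
Math.multiplyExact, which behaves like the checkedMultiply() helper
referenced above (the maxRowsInHeaps signature below is a hypothetical
stand-in, not the actual SortNode.java code):

```java
public class CheckedMultiplySketch {
    // Hypothetical stand-in for the maxRowsInHeaps computation: plain
    // multiplication silently wraps on overflow, while a checked multiply
    // throws ArithmeticException so the caller can react.
    static long maxRowsInHeaps(long heapCapacity, long numHeaps) {
        return Math.multiplyExact(heapCapacity, numHeaps);
    }

    public static void main(String[] args) {
        long plain = Long.MAX_VALUE * 2L;  // wraps around to -2, no error
        System.out.println(plain);

        System.out.println(maxRowsInHeaps(1_000_000L, 1_000L));

        try {
            maxRowsInHeaps(Long.MAX_VALUE, 2L);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```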

Testing:
Added e2e test TestTopNHighNdv that exercises the issue.

Change-Id: Ic6712b94f4704fd8016829b2538b1be22baaf2f7
Reviewed-on: http://gerrit.cloudera.org:8080/22896
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-14 04:11:48 +00:00
Surya Hebbar
7ad7a86c0e IMPALA-13624: Implement textual representation for aggregate event sequences
This adds support for a summarized textual representation of timestamps
for the event sequences present in the aggregated profile.

With the verbose format present in profile V1 and V2, it becomes
difficult to analyze an event's timestamps across instances.

The event sequences are now displayed in a histogram format, based on
the number of timestamps present, in order to support an easier view
for skew analysis and other possible use cases.
(i.e. based on json_profile_event_timestamp_limit)

The summary generated from aggregated instance-level timestamps
(i.e. IMPALA-13304) is used to achieve this within the profile V2,
which covers the possibility of missing events.

Example,
  Verbosity::DEFAULT
  json_profile_event_timestamp_limit = 5 (default)

  Case #1, Number of instances exceeded limit
    Node Lifecycle Event Timeline Summary :
     - Open Started (4s880ms):
        Min: 2s312ms, Avg: 3s427ms, Max: 4s880ms, Count: 12
        HistogramCount: 4, 4, 0, 0, 4

  Case #2, Number of instances within the limit

    Node Lifecycle Event Timeline:
     - Open Started: 5s885ms, 1s708ms, 3s434ms
     - Open Finished: 5s885ms, 1s708ms, 3s435ms
     - First Batch Requested: 5s885ms, 1s708ms, 3s435ms
     - First Batch Returned: 6s319ms, 2s123ms, 3s570ms
     - Last Batch Returned: 7s878ms, 2s123ms, 3s570ms

With Verbosity::EXTENDED or more, all events and timestamps are printed
with full verbosity as before.

Tests:
For test_profile_tool.py, updated the generated outputs for text
and JSON profiles.

Change-Id: I4bcc0e2e7fccfa8a184cfa8a3a96d68bfe6035c0
Reviewed-on: http://gerrit.cloudera.org:8080/22245
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-05-14 00:21:54 +00:00
Surya Hebbar
bc0de10966 IMPALA-14069: Factor possibility of zero timestamps in aggregated event sequences
Currently, the missing event timestamps are substituted by zeros and
then reported (i.e. unreported_event_instance_idxs) within the event
sequences of the JSON profile. See IMPALA-13555 for more details.

Even with micro/nanosecond precision, some event timestamps are recorded
as zeros (i.e. Prepare Finished - 0ns).

The current implementation of aggregated event sequences was incorrectly
considering these zeros as substituted missing timestamps.

Although these can be distinguished from missing timestamps through
the exposed 'unreported_event_instance_idxs', it is more helpful
to represent missing values as negative values or constants (i.e. -1).

This representation is favorable for summary and visualization, and is
necessary for skipping missing values and maintaining alignment between
instance timestamps.

The patch also fixes null values in "info_strings" fields within
the JSON profile.

Fixed runtime-profile-test to consider negative values (i.e. -1) as missing
event timestamps, instead of 0.

Updated the generated profiles in testdata/impala-profiles.

Change-Id: I9f1efd2aad5f62084075cd8f9169ef72c66942b6
Reviewed-on: http://gerrit.cloudera.org:8080/22893
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-05-14 00:21:54 +00:00
Zoltan Borok-Nagy
afa329fd89 IMPALA-13931: TestIcebergRestCatalog.test_rest_catalog_basic failed at setup
There were several issues with test_rest_catalog_basic which made it
fail in environments that used Ozone or S3.

Missing dependencies on Ozone and S3 classes:
* This is resolved in iceberg-rest-catalog-test/pom.xml by adding
  a dependency to impala-executor-deps

Hadoop configuration was not initialized properly:
* run-iceberg-rest-server.sh used Maven to run Iceberg REST Catalog in
  which case Maven is in charge of setting the CLASSPATH but the
  core-site/ozone-site/etc. config files were not on it, so the
  REST Catalog used a default Hadoop configuration that wasn't good
  for our environment.
* To overcome the CLASSPATH problem, we now create a runnable JAR in
  iceberg-rest-catalog-test/pom.xml and also generate the proper
  CLASSPATH during compilation.
* run-iceberg-rest-server.sh now uses java -cp to run the REST
  Catalog

S3 builds threw NoSuchMethodException for the "create" method of
ApacheHttpClientConfigurations:
* The Iceberg library dynamically loads its HTTP client builders
  to work around an error, see details in
  https://github.com/apache/iceberg/issues/6715
* So the Iceberg lib dynamically tries to load the "create" method
  of its own ApacheHttpClientConfigurations class, but it fails
  with NoSuchMethodException.
* The critical code is invoked from Impala's IcebergMetadataScanner's
  ScanMetadataTable() method which happens to be invoked through
  JNI from the C++ backend.
* The context class loader of such threads is NULL, which means
  Java will use the bootstrap class loader to load classes and methods,
  but that doesn't have the proper resources on its classpath.
* To overcome this issue we set the context class loader for the thread
  to the class loader that originally loaded the IcebergMetadataScanner
  class.
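
The class-loader fix described above can be sketched as follows. This
is a generic version of the pattern; the actual Impala code sets the
loader inside IcebergMetadataScanner, and the helper name below is
illustrative:

```java
public class ContextClassLoaderSketch {
    // JNI-attached threads may have a null context class loader. Before
    // doing work that relies on dynamic class/method lookup, fall back to
    // the loader that originally loaded this class.
    static void ensureContextClassLoader() {
        Thread current = Thread.currentThread();
        if (current.getContextClassLoader() == null) {
            current.setContextClassLoader(
                ContextClassLoaderSketch.class.getClassLoader());
        }
    }

    public static void main(String[] args) throws Exception {
        Thread t = new Thread(() -> {
            // Simulate a JNI-attached thread with no context class loader.
            Thread.currentThread().setContextClassLoader(null);
            ensureContextClassLoader();
            System.out.println(
                Thread.currentThread().getContextClassLoader() != null);
        });
        t.start();
        t.join();
    }
}
```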

Change-Id: I9dc0e30aeaff0b8de41426ba38506383b4af472c
Reviewed-on: http://gerrit.cloudera.org:8080/22818
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-05-09 17:01:56 +00:00
Zoltan Borok-Nagy
04735598d6 IMPALA-13718: Skip reloading Iceberg tables when metadata JSON file is the same
With this patch Impala skips reloading Iceberg tables when the metadata
JSON file is the same, as this means that the table is essentially
unchanged.

This can help in situations when the event processor is lagging behind
and we have an Iceberg table that is updated frequently. Imagine the
case when Impala gets 100 events for an Iceberg table. In this case,
after processing the first event, our internal representation of
the Iceberg table is already up-to-date; there is no need to do the
reload 100 times.

We cannot use the internal icebergApiTable_'s metadata location,
as the following statement might silently refresh the metadata
in 'current()':

 icebergApiTable_.operations().current().metadataFileLocation()

To guarantee that we check against the actual loaded metadata
this patch introduces a new member to store the metadata location.
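
The skip-reload check can be sketched as below. The member and method
names are hypothetical; only the idea of comparing against a separately
stored metadata location comes from the patch:

```java
public class IcebergReloadSketch {
    // Store the metadata location of the actually loaded table state in
    // its own member, instead of reading it back from
    // icebergApiTable_.operations().current(), which may silently refresh.
    private String loadedMetadataLocation_;

    void load(String metadataLocation) {
        // ... actual table load elided ...
        loadedMetadataLocation_ = metadataLocation;
    }

    boolean needsReload(String eventMetadataLocation) {
        return !eventMetadataLocation.equals(loadedMetadataLocation_);
    }

    public static void main(String[] args) {
        IcebergReloadSketch tbl = new IcebergReloadSketch();
        tbl.load("s3://bucket/tbl/metadata/00001.metadata.json");
        // 100 events pointing at the same metadata JSON: only a
        // mismatching location triggers a reload.
        System.out.println(
            tbl.needsReload("s3://bucket/tbl/metadata/00001.metadata.json"));
        System.out.println(
            tbl.needsReload("s3://bucket/tbl/metadata/00002.metadata.json"));
    }
}
```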

Testing
 * added e2e tests for REFRESH, also for event processing

Change-Id: I16727000cb11d1c0591875a6542d428564dce664
Reviewed-on: http://gerrit.cloudera.org:8080/22432
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com>
2025-05-09 11:37:01 +00:00
Riza Suminto
3210ec58c5 IMPALA-14006: Bound max_instances in CreateInputCollocatedInstances
IMPALA-11604 (part 2) changes how many instances to create in
Scheduler::CreateInputCollocatedInstances. This works when the left
child fragment of a parent fragment is distributed across nodes.
However, if the left child fragment instance is limited to only 1
node (the case of UNPARTITIONED fragment), the scheduler might
over-parallelize the parent fragment by scheduling too many instances in
a single node.

This patch attempts to mitigate the issue in two ways. First, it adds
bounding logic in PlanFragment.traverseEffectiveParallelism() to lower
parallelism further if the left (probe) side of the child fragment is
not well distributed across nodes.

Second, it adds TQueryExecRequest.max_parallelism_per_node to relay
information from Analyzer.getMaxParallelismPerNode() to the scheduler.
With this information, the scheduler can do additional sanity checks to
prevent Scheduler::CreateInputCollocatedInstances from
over-parallelizing a fragment. Note that this sanity check can also cap
MAX_FS_WRITERS option under a similar scenario.

Added a ScalingVerdict enum and TRACE-log it to show the scaling
decision steps.
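
The scheduler-side sanity check can be sketched as a simple cap; the
method and parameter names here are illustrative, not the actual
Scheduler::CreateInputCollocatedInstances code:

```java
public class BoundInstancesSketch {
    // Hypothetical sketch of the bound: never schedule more instances
    // than max_parallelism_per_node allows across the hosting nodes.
    static int boundedInstances(int requested, int maxParallelismPerNode,
            int numNodes) {
        return Math.min(requested, maxParallelismPerNode * numNodes);
    }

    public static void main(String[] args) {
        // UNPARTITIONED child fragment: everything lands on 1 node, so
        // the per-node cap applies in full.
        System.out.println(boundedInstances(48, 12, 1));
        // Well-distributed child fragment: the cap is not hit.
        System.out.println(boundedInstances(48, 12, 4));
    }
}
```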

Testing:
- Add planner test and e2e test that exercise the corner case under
  COMPUTE_PROCESSING_COST=1 option.
- Manually comment the bounding logic in traverseEffectiveParallelism()
  and confirm that the scheduler's sanity check still enforces the
  bounding.

Change-Id: I65223b820c9fd6e4267d57297b1466d4e56829b3
Reviewed-on: http://gerrit.cloudera.org:8080/22840
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-07 03:34:15 +00:00
Riza Suminto
cb496104d9 IMPALA-14027: Implement HS2 NULL_TYPE using TStringValue
HS2 NULL_TYPE should be implemented using TStringValue.

However, due to an incompatibility with the Hive JDBC driver
implementation at the time, Impala chose to implement the NULL type
using TBoolValue (see IMPALA-914, IMPALA-1370).

HIVE-4172 might have been the root cause of that decision. Today, the
Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) does not have that
issue anymore, as shown in this reproduction after applying this patch:

./bin/run-jdbc-client.sh -q "select null" -t NOSASL
Using JDBC Driver Name: org.apache.hive.jdbc.HiveDriver
Connecting to: jdbc:hive2://localhost:21050/;auth=noSasl
Executing: select null
----[START]----
NULL
----[END]----
Returned 1 row(s) in 0.343s

Thus, we can reimplement NULL_TYPE using TStringValue to match
HiveServer2 behavior.

Testing:
- Pass core tests.

Change-Id: I354110164b360013d9893f1eb4398c3418f80472
Reviewed-on: http://gerrit.cloudera.org:8080/22852
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-06 19:41:17 +00:00
Riza Suminto
eaee02ffd4 IMPALA-14026: Migrate test files that assert Beeswax dml result.
This patch migrates the remaining test files under testdata/ that
verify beeswax-specific dml results. The RESULTS section is replaced by
a RUNTIME_PROFILE section that validates the Partition name,
NumModifiedRows, and the sum of NumModifiedRows if necessary.

Added HS2_TYPES section where necessary to let tests pass.

Testing:
Pass core tests.

Change-Id: I36efbe66b654a5af6c44710e55d0b755280ad3be
Reviewed-on: http://gerrit.cloudera.org:8080/22846
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-05-02 20:04:00 +00:00
Eyizoha
faf322dd41 IMPALA-12927: Support specifying format for reading JSON BINARY columns
Currently, Impala always assumes that the data in the binary columns of
JSON tables is base64 encoded. However, before HIVE-21240, Hive wrote
binary data to JSON tables without base64 encoding it, instead writing
it as escaped strings. After HIVE-21240, Hive defaults to base64
encoding binary data when writing to JSON tables and introduces the
serde property 'json.binary.format' to indicate the encoding method of
binary data in JSON tables.

To maintain consistency with Hive and avoid correctness issues caused by
reading data in an incorrect manner, this patch also introduces the
serde property 'json.binary.format' to specify the reading method for
binary data in JSON tables. Currently, this property supports reading in
either base64 or rawstring formats, same as Hive.

Additionally, this patch introduces a query option 'json_binary_format'
to achieve the same effect. This query option will only take effect for
JSON tables where the serde property 'json.binary.format' is not set.
The reading format of binary columns in JSON tables can be configured
globally by setting the 'default_query_options'. It should be noted
that the default value of 'json_binary_format' is 'NONE', and Impala
will prohibit reading binary columns of JSON tables that either have
no 'json.binary.format' set while 'json_binary_format' is 'NONE', or
have an invalid 'json.binary.format' value set, and will provide an
error message to avoid using an incorrect format without the user
noticing.

Testing:
  - Enabled existing binary type E2E tests for JSON tables
  - Added new E2E test for 'json.binary.format'

Change-Id: Idf61fa3afc0f33caa63fbc05393e975733165e82
Reviewed-on: http://gerrit.cloudera.org:8080/22289
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-29 16:16:12 +00:00
Daniel Becker
c1aac4b3a4 IMPALA-13873: Missing equivalence conjunct in aggregation node with inline views
Some queries involving plain (distinct) UNIONs miss conjuncts, leading
to incorrect results:

Example:
  WITH u1 AS (select 10 a, 10 b),
  t AS (select a, b, min(b) over (partition by a) min_b from u1 UNION
  select 10, 10, 20)
  select t.* from t where t.b = t.min_b;

Expected result:
  +----+----+-------+
  | a  | b  | min_b |
  +----+----+-------+
  | 10 | 10 | 10    |
  +----+----+-------+

Actual result:
  +----+----+-------+
  | a  | b  | min_b |
  +----+----+-------+
  | 10 | 10 | 10    |
  | 10 | 20 | 10    |
  +----+----+-------+

This is caused by MultiAggregateInfo assuming that conjuncts bound by
grouping slots that are produced by SlotRef grouping expressions are
already evaluated below the AggregationNode. However, this is not true
in all cases: with UNIONs, there may be conjuncts that are unassigned
below the AggregationNode.

This may happen if a conjunct cannot be pushed into all operands of a
UNION, because the source tuples in the operands do not contain all of
the slots referenced by the predicate. In the example above, it happens
in the first operand:
  select a, b, min(b) over (partition by a) min_b from u1
The source tuple, 'u1', contains only two slots ('a' and 'b'), but does
not contain a slot corresponding to 'min(b)' - therefore the predicate
't.b = t.min_b' is not bound by the tuple of 'u1'. In theory, the
predicate could still be evaluated directly after materialising the
tuple with 'min(b)', still inside the UNION operand, but Impala
currently does not work that way.

In these cases, the conjuncts need to be evaluated in the
AggregationNode (possibly in addition to some of the UNION operands).

This change fixes this problem by introducing a method in
MultiAggregateInfo: 'setConjunctsToKeep()', where the caller can pass a
list of conjuncts that will not be eliminated. This is called during the
planning of the UNION if there are unassigned conjuncts remaining.

Testing:
 - Added a PlannerTest and an EE test for the case where a conjunct
   was previously incorrectly removed from the AggregationNode.
 - Existing tests cover the case when conjuncts can be safely removed
   from an AggregationNode above a UnionNode because the conjuncts are
   pushed into all union operands, see for example
   https://github.com/apache/impala/blob/6f2d9a2/testdata/workloads/functional-planner/queries/PlannerTest/union.test#L3914

Change-Id: I67a59cd96d83181ce249fd6ca141906f549a09b3
Reviewed-on: http://gerrit.cloudera.org:8080/22746
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-29 02:04:55 +00:00
Riza Suminto
d8e18fac61 IMPALA-13991: Skip CROSS_JOIN rewrite if subquery is in disjunctive
Inside StmtRewriter.mergeExpr() there is an optimization that sets
JoinOperator.CROSS_JOIN under a certain scenario. This patch adds
criteria to SKIP that rewrite if the subquery comes from inside a
disjunctive expression, regardless of the joinConjunct value. If
joinConjunct is NOT NULL, the inlineView may be correlated through
that joinConjunct. If joinConjunct is NULL, then the expr is a (NOT)
EXISTS predicate. EXISTS within a disjunct is not supported yet (see
IMPALA-9931).

Testing:
- Add planner and query tests for the corner case.
  Before this patch, the query returned a wrong result.
- Fixed a wrong testcase in subquery-rewrite.test.

Change-Id: Iac0deb0b2fb1536684cce2e004156a20b769b9ab
Reviewed-on: http://gerrit.cloudera.org:8080/22815
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-29 02:03:19 +00:00
Gabor Kaszab
3c24706c72 IMPALA-13268: Integrate Iceberg ScanMetrics into Impala query profiles
When calling planFiles() on an Iceberg table, it can give us some
metrics like total planning time, number of data/delete files and
manifests, how many of these could be skipped etc.

This change integrates these metrics into the query profile, under the
"Frontend" section. These metrics are per-table, so if multiple tables
are scanned for the query there will be multiple sections in the
profile.

Note that we only have these metrics for a table if Iceberg needs to be
used for planning for that table, e.g. if a predicate is pushed down to
Iceberg or if there is time travel. For tables where Iceberg was not
used in planning, the profile will contain a short note describing this.

To facilitate pairing the metrics with scans, the metrics header
references the plan node responsible for the scan. This will always be
the top level node for the scan, so it can be a SCAN node, a JOIN node
or a UNION node depending on whether the table has delete files.

Testing:
 - added EE tests in iceberg-scan-metrics.tests
 - added a test in PlannerTest.java that asserts on the number of
   metrics; if it changes in a new Iceberg release, the test will fail
   and we can update our reporting

Change-Id: I080ee8eafc459dad4d21356ac9042b72d0570219
Reviewed-on: http://gerrit.cloudera.org:8080/22501
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
2025-04-28 08:54:30 +00:00
stiga-huang
7d6fe8c6c8 IMPALA-13487: Add profile counters for memory allocation in parquet scanners
This patch adds some profile counters for memory allocation and free in
MemPools, which are useful to detect tcmalloc contention.

The following counters are added:
 - Thread level page faults: TotalThreadsMinorPageFaults,
   TotalThreadsMajorPageFaults.
 - MemPool counters for tuple_mem_pool and aux_mem_pool of the scratch
   batch in columnar scanners:
    - ScratchBatchMemAllocDuration
    - ScratchBatchMemFreeDuration
    - ScratchBatchMemAllocBytes
 - MemPool counters for data_page_pool of ParquetColumnChunkReader
    - ParquetDataPagePoolAllocBytes
    - ParquetDataPagePoolAllocDuration
    - ParquetDataPagePoolFreeBytes
    - ParquetDataPagePoolFreeDuration
 - MemPool counters for the fragment level RowBatch
    - RowBatchMemPoolAllocDuration
    - RowBatchMemPoolAllocBytes
    - RowBatchMemPoolFreeDuration
    - RowBatchMemPoolFreeBytes
 - Duration in HdfsColumnarScanner::GetCollectionMemory() which includes
   memory allocation for collection values and memcpy when doubling the
   tuple buffer:
    - MaterializeCollectionGetMemTime

Here is an example of a memory-bound query:
  Fragment Instance
    - RowBatchMemPoolAllocBytes: 0 (Number of samples: 0)
    - RowBatchMemPoolAllocDuration: 0.000ns (Number of samples: 0)
    - RowBatchMemPoolFreeBytes: (Avg: 719.25 KB (736517) ; Min: 4.00 KB (4096) ; Max: 4.12 MB (4321922) ; Sum: 1.93 GB (2069615013) ; Number of samples: 2810)
    - RowBatchMemPoolFreeDuration: (Avg: 132.027us ; Min: 0.000ns ; Max: 21.999ms ; Sum: 370.997ms ; Number of samples: 2810)
    - TotalStorageWaitTime: 47.999ms
    - TotalThreadsInvoluntaryContextSwitches: 2 (2)
    - TotalThreadsMajorPageFaults: 0 (0)
    - TotalThreadsMinorPageFaults: 549.63K (549626)
    - TotalThreadsTotalWallClockTime: 9s646ms
      - TotalThreadsSysTime: 1s508ms
      - TotalThreadsUserTime: 1s791ms
    - TotalThreadsVoluntaryContextSwitches: 8.85K (8852)
    - TotalTime: 9s648ms
    ...
    HDFS_SCAN_NODE (id=0):
      - ParquetDataPagePoolAllocBytes: (Avg: 2.36 MB (2477480) ; Min: 4.00 KB (4096) ; Max: 4.12 MB (4321922) ; Sum: 1.02 GB (1090091508) ; Number of samples: 440)
      - ParquetDataPagePoolAllocDuration: (Avg: 1.263ms ; Min: 0.000ns ; Max: 39.999ms ; Sum: 555.995ms ; Number of samples: 440)
      - ParquetDataPagePoolFreeBytes: (Avg: 1.28 MB (1344350) ; Min: 4.00 KB (4096) ; Max: 1.53 MB (1601012) ; Sum: 282.06 MB (295757000) ; Number of samples: 220)
      - ParquetDataPagePoolFreeDuration: (Avg: 1.927ms ; Min: 0.000ns ; Max: 19.999ms ; Sum: 423.996ms ; Number of samples: 220)
      - ScratchBatchMemAllocBytes: (Avg: 486.33 KB (498004) ; Min: 4.00 KB (4096) ; Max: 512.00 KB (524288) ; Sum: 1.19 GB (1274890240) ; Number of samples: 2560)
      - ScratchBatchMemAllocDuration: (Avg: 1.936ms ; Min: 0.000ns ; Max: 35.999ms ; Sum: 4s956ms ; Number of samples: 2560)
      - ScratchBatchMemFreeDuration: 0.000ns (Number of samples: 0)
      - DecompressionTime: 1s396ms
      - MaterializeCollectionGetMemTime: 4s899ms
      - MaterializeTupleTime: 6s656ms
      - ScannerIoWaitTime: 47.999ms
      - TotalRawHdfsOpenFileTime: 0.000ns
      - TotalRawHdfsReadTime: 360.997ms
      - TotalTime: 9s254ms

The fragment instance took 9s648ms to finish, of which 370.997ms was
spent releasing memory of the final RowBatch. The majority of the time is
spent in the scan node (9s254ms). Mostly it's DecompressionTime +
MaterializeTupleTime + ScannerIoWaitTime + TotalRawHdfsReadTime. The
majority is MaterializeTupleTime (6s656ms).

ScratchBatchMemAllocDuration shows that invoking std::malloc() in
materializing the scratch batches took 4s956ms overall.
MaterializeCollectionGetMemTime shows that allocating memory for
collections and copying memory in doubling the tuple buffer took
4s899ms. So materializing the collections took most of the time.

Note that DecompressionTime (1s396ms) also includes memory allocation
duration tracked by the sum of ParquetDataPagePoolAllocDuration
(555.995ms). So memory allocation also takes a significant portion of
time here.

The other observation is TotalThreadsTotalWallClockTime is much higher
than TotalThreadsSysTime + TotalThreadsUserTime and there is a large
number of TotalThreadsVoluntaryContextSwitches. So the thread is waiting
for resources (e.g. lock) for a long duration. In the above case, it's
waiting for locks in tcmalloc memory allocation (need off-cpu flame
graph to reveal this).

Implementation of MemPool counters
Add MemPoolCounters in MemPool to track malloc/free duration and bytes.
Note that the counters are not updated in the destructor since it's
expected that all chunks are freed or transferred before calling the
destructor.

MemPool is widely used in the code base. This patch only exposes MemPool
counters in three places:
 - the scratch batch in columnar scanners
 - the ParquetColumnChunkReader of parquet scanners
 - the final RowBatch reset by FragmentInstanceState

This patch also moves GetCollectionMemory() from HdfsScanner to
HdfsColumnarScanner since it's only used by parquet and orc scanners.

PrettyPrint of SummaryStatsCounter is updated to also show the sum of
the values if they are not speeds or percentages.

Tests
 - tested in manually reproducing the memory-bound queries
 - ran perf-AB-test on tpch (sf=42) and didn't see significant
   performance change
 - added e2e tests
 - updated expected files of observability/test_profile_tool.py because
   SummaryStatsCounter now prints the sum in most cases. Also updated
   get_bytes_summary_stats_counter and
   test_get_bytes_summary_stats_counter accordingly.

Change-Id: I982315d96e6de20a3616f3bd2a2b4866d1ff4710
Reviewed-on: http://gerrit.cloudera.org:8080/22062
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-25 02:53:32 +00:00
Daniel Becker
bc0a92c5ed IMPALA-13963: Crash when setting 'write.parquet.page-size-bytes' to a higher value
When setting the Iceberg table property 'write.parquet.page-size-bytes'
to a higher value, inserting into the table crashes Impala:

  create table lineitem_iceberg_comment
  stored as iceberg
  tblproperties("write.parquet.page-size-bytes"="1048576")
  as select l_comment from tpch_parquet.lineitem;

The impala executors crash because of memory corruption caused by buffer
overflow in HdfsParquetTableWriter::ColumnWriter::ProcessValue(). Before
attempting to write the next value, it checks whether the total byte
size would exceed 'plain_page_size_', but the buffer into which it
writes ('values_buffer_') has length 'values_buffer_len_'.
'values_buffer_len_' is initialised in the constructor to
'DEFAULT_DATA_PAGE_SIZE', irrespective of the value of
'plain_page_size_'. However, it is intended to have at least the same
size, as can be seen from the check in ProcessValue() or the
GrowPageSize() method. The error does not usually surface because
'plain_page_size_' has the same default value, 'DEFAULT_DATA_PAGE_SIZE'.

'values_buffer_' is also used for DICTIONARY encoding, but that takes
care of growing it as necessary.

This change fixes the problem by initialising 'values_buffer_len_' to
the value of 'plain_page_size_' in the constructor.

This leads to exposing another bug: in BitWriter::PutValue(), when we
check whether the next element fits in the buffer, we multiply
'max_bytes_' by 8, which overflows because 'max_bytes_' is a 32-bit int.
This happens with values that we already use in our tests.

This patch changes the type of 'max_bytes_' to int64_t, so multiplying
it by 8 (converting from bytes to bits) is now safe.
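
The bytes-to-bits overflow is easy to reproduce. A minimal Java
analogue of the C++ 'max_bytes_ * 8' check (the buffer size here is
illustrative, not a value from the failing tests):

```java
public class BitWidthOverflowSketch {
    public static void main(String[] args) {
        // A ~300 MB buffer: fine as a byte count, but the bit count no
        // longer fits in a 32-bit signed int.
        int maxBytes = 300_000_000;

        int bits32 = maxBytes * 8;          // wraps to a negative value,
                                            // so the capacity check misbehaves
        long bits64 = (long) maxBytes * 8;  // widen first, as the int64_t
                                            // fix does

        System.out.println(bits32);  // -1894967296
        System.out.println(bits64);  // 2400000000
    }
}
```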

Testing:
  - Added an EE test in iceberg-insert.test that reproduced the error.

Change-Id: Icb94df8ac3087476ddf1613a1285297f23a54c76
Reviewed-on: http://gerrit.cloudera.org:8080/22777
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-04-23 11:18:14 +00:00
Noemi Pap-Takacs
6ec2fb0210 IMPALA-13972: TestIcebergRestCatalog.test_rest_catalog_basic should check erasure coding
TestIcebergRestCatalog.test_rest_catalog_basic used to expect
'NONE' as erasure coding policy in the table metadata.
Checking the actual value with regex enables us to run the test
on any file system with or without erasure coding.

Testing:
 - executed test_rest_catalog_basic with and without erasure coding
Change-Id: I8467d420513ab59916351d25c4787c46fb3cef88
Reviewed-on: http://gerrit.cloudera.org:8080/22801
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-23 02:35:11 +00:00
Zoltan Borok-Nagy
b19331b3d3 IMPALA-13968: Fix TestBinaryTypeInText.test_invalid_binary_type in ARM builds
On some platforms there's a bug in libSASL which makes sasl_decode64()
accept almost anything as input without reporting any errors.
See details here:
https://github.com/cyrusimap/cyrus-sasl/issues/619

TestBinaryTypeInText::test_invalid_binary_type uses a data file
that has two values that should be Base64 encoded but they aren't.
The test checks that Impala raises the corresponding errors.

The first value doesn't have a correct size, so Impala's
Base64DecodeBufLen() will reject it. The second one passes the
Base64DecodeBufLen() check so we will try to decode it with
Base64Decode() that uses sasl_decode64() under the hood. If
sasl_decode64() has the bug we won't report any errors for the
second value.

This patch changes the second value to one that still passes the
Base64DecodeBufLen() check, but that sasl_decode64() also rejects on
all platforms. To achieve this, the new value uses the special '='
character.
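
A strict decoder such as the JDK's illustrates the two rejection
classes the test relies on (the sample strings below are illustrative;
the actual values in the test data file are not reproduced here):

```java
import java.util.Base64;

public class Base64RejectSketch {
    // Returns true iff 's' is accepted by a strict RFC 4648 decoder.
    static boolean decodes(String s) {
        try {
            Base64.getDecoder().decode(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(decodes("aGVsbG8="));  // valid Base64
        System.out.println(decodes("Q"));         // impossible length
        System.out.println(decodes("QQ=A"));      // '=' in the middle
    }
}
```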

The bug in sasl_decode64() is also a known issue in Impala,
see IMPALA-9926.

Alternatively, we could create our own Base64 decoder, or use one
from a different library, but that is out of the scope of this patch
as it requires more thought and performance measurements.
IMPALA-13973 tracks this option.

Change-Id: Ifff99fd2839e75add49ee835323764615bd5139a
Reviewed-on: http://gerrit.cloudera.org:8080/22791
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-21 15:23:12 +00:00
Riza Suminto
648209b172 IMPALA-13967: Move away from setting user parameter in execute
ImpalaConnection.execute and ImpalaConnection.execute_async have a
'user' parameter to set the specific user to run the query. This is
mainly a legacy of BeeswaxConnection, which allows using one client to
run queries under different usernames.

BeeswaxConnection and ImpylaHS2Connection actually allow specifying one
user per client. Doing so simplifies user-specific tests such as
test_ranger.py, which often instantiates separate clients for the admin
user and a regular user. There is no need to specify the 'user'
parameter anymore when calling execute() or execute_async(), thus
reducing potential bugs from forgetting to set it or setting it with
an incorrect value.

This patch applies the one-user-per-client practice as much as possible
for test_ranger.py, test_authorization.py, and
test_admission_controller.py. Unused code and pytest fixtures are
removed. A few flake8 issues are addressed too. Their
default_test_protocol() is overridden to return 'hs2'.

ImpylaHS2Connection.execute() and ImpylaHS2Connection.execute_async()
are slightly modified to assume ImpylaHS2Connection.__user if the
'user' parameter is None. BeeswaxConnection remains unchanged.

Extended ImpylaHS2ResultSet.__convert_result_value() to lowercase
boolean return values to match the beeswax result.

Testing:
Run and pass all modified tests in exhaustive exploration.

Change-Id: I20990d773f3471c129040cefcdff1c6d89ce87eb
Reviewed-on: http://gerrit.cloudera.org:8080/22782
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-18 15:27:22 +00:00
Riza Suminto
182aa5066e IMPALA-13958: Revisit hs2_parquet_constraint and hs2_text_constraint
hs2_parquet_constraint and hs2_text_constraint are meant to extend the
test vector dimension to also test non-default test protocols (other
than beeswax), but limit them to only run against the 'parquet/none'
or 'text/none' format accordingly.

This patch modifies these constraints into
default_protocol_or_parquet_constraint and
default_protocol_or_text_constraint respectively, such that full file
format coverage happens for the default_test_protocol configuration and
limited coverage for the other protocols. Drop hs2_parquet_constraint
entirely from test_utf8_strings.py because that test is already
constrained to the single 'parquet/none' file format.

The num modified rows validation in date-fileformat-support.test and
date-partitioning.test is changed to check the NumModifiedRows counter
from the profile.

Fixed TestQueriesJsonTables to always run with the beeswax protocol
because its assertions rely on beeswax-specific return values.

Ran impala-isort and fixed a few flake8 issues in the modified test files.

Testing:
Ran and passed the affected test files using exhaustive exploration and
the env var DEFAULT_TEST_PROTOCOL=hs2. Confirmed that full file format
coverage happens for the hs2 protocol. Note that
DEFAULT_TEST_PROTOCOL=beeswax is still the default.

Change-Id: I8be0a628842e29a8fcc036180654cd159f6a23c8
Reviewed-on: http://gerrit.cloudera.org:8080/22775
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-17 22:50:58 +00:00
Riza Suminto
55feffb41b IMPALA-13850 (part 1): Wait until CatalogD active before resetting
In HA mode, CatalogD initialization can fail to complete within a
reasonable time. Log messages showed that CatalogD is blocked trying to
acquire "CatalogServer.catalog_lock_" when calling
CatalogServer::UpdateActiveCatalogd() during statestore subscriber
registration. catalog_lock_ was held by GatherCatalogUpdatesThread, which
was calling GetCatalogDelta() and waiting for the Java lock versionLock_,
held by the thread doing CatalogServiceCatalog.reset().

This patch removes the catalog reset in the JniCatalog constructor. In
turn, catalogd-server.cc is now responsible for triggering the metadata
reset (Invalidate Metadata) only if:

1. It is the active CatalogD, and
2. The gathering thread has collected the first topic update, or CatalogD
   is configured with a catalog_topic_mode other than "minimal".

The latter prerequisite ensures that coordinators are not blocked waiting
for a full topic update in on-demand metadata mode. This is all managed by
a new thread method, TriggerResetMetadata, that monitors for these
conditions and triggers the initial metadata reset.
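
The gating conditions above can be sketched as follows (a simplified
Python model, not the actual C++ implementation):

```python
def should_trigger_initial_reset(is_active_catalogd, first_topic_collected,
                                 catalog_topic_mode):
    # Only the active CatalogD may trigger the metadata reset.
    if not is_active_catalogd:
        return False
    # In "minimal" (on-demand) mode, wait for the first gathered topic
    # update so coordinators are not blocked on a full topic update.
    return first_topic_collected or catalog_topic_mode != 'minimal'
```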

Note that this is a behavior change in on-demand catalog
mode (catalog_topic_mode=minimal). Previously, on-demand catalog mode
will send full database list in its first catalog topic update. This
behavior change is OK since coordinator can request metadata on-demand.

After this patch, catalog-server.active-status and the /healthz page can
turn true and OK respectively even while the very first metadata reset
is still ongoing. Observers that care about having fully populated
metadata should check other metrics such as catalog.num-db,
catalog.num-tables, or the /catalog page content.

Updated the start-impala-cluster.py readiness check to wait for at least
one table to be seen by coordinators, except during create-load-data.sh
execution (there are no tables yet) and when use_local_catalog=true (the
local catalog cache does not start with any table). Changed startup flag
checking from reading the actual command line args to reading the
daemon's '/varz?json' page. Cleaned up impala_service.py to fix some
flake8 issues.

Slightly update TestLocalCatalogCompactUpdates::test_restart_catalogd so
that unique_database cleanup is successful.

Testing:
- Refactor test_catalogd_ha.py to reduce repeated code, use
  unique_database fixture, and additionally validate /healthz page of
  both active and standby catalogd. Changed it to test using hs2
  protocol by default.
- Run and pass test_catalogd_ha.py and test_concurrent_ddls.py.
- Pass core tests.

Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
Reviewed-on: http://gerrit.cloudera.org:8080/22634
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-17 01:59:54 +00:00
Joe McDonnell
c5a0ec8bdf IMPALA-11980 (part 1): Put all thrift-generated python code into the impala_thrift_gen package
This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated python code, except that it
uses the impala_thrift_gen package rather than impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.

This patches all of the thrift files to add the python namespace, and
includes code to apply the same patching to the thirdparty thrift
files (hive_metastore.thrift, fb303.thrift).
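
For illustration, the added directive in each thrift file looks
something like this (the module name here is an example):

```thrift
// Generated python code for this file will live under the
// impala_thrift_gen package.
namespace py impala_thrift_gen.ExampleModule
```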

Putting all the generated python into a package makes it easier
to understand where the imports are getting code. When the
subsequent change rearranges the shell code, the thrift generated
code can stay in a separate directory.

This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.

Testing:
 - Ran a core job

Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-15 17:03:02 +00:00
Riza Suminto
047cf9ff4d IMPALA-13954: Validate num inserted rows via NumModifiedRows counter
This patch changes the way tests validate the number of inserted rows,
from checking the beeswax-specific result to checking the NumModifiedRows
counter from the query profile.
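
A minimal sketch of that style of validation (the helper name and profile
format here are assumptions, not Impala's actual test utilities):

```python
import re

def assert_num_modified_rows(profile_text, expected):
    # Look up the NumModifiedRows counter in the query profile text.
    m = re.search(r'NumModifiedRows:\s*(\d+)', profile_text)
    assert m is not None, 'NumModifiedRows counter not found in profile'
    assert int(m.group(1)) == expected
```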

Remove skipping over the hs2 protocol in test_chars.py and refactor
test_date_queries.py a bit to reduce test skipping. Added HS2_TYPES in
tests that require it and fixed some flake8 issues.

Testing:
Run and pass all affected tests.

Change-Id: I96eae9967298f75b2c9e4d0662fcd4a62bf5fffc
Reviewed-on: http://gerrit.cloudera.org:8080/22770
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-11 20:31:50 +00:00
Riza Suminto
0ed4e869de IMPALA-13930: ImpylaHS2Connection should only open cursor as needed
Before this patch, ImpylaHS2Connection unconditionally opened a
cursor (and HS2 session) as it connected, followed by running a "SET
ALL" query to populate the default query options.

This patch changes the behavior of ImpylaHS2Connection to open the
default cursor only when querying is needed for the first time. This
helps preserve assertions for tests that are sensitive to client
connections, like IMPALA-13925. Default query options are now parsed from
a newly instantiated TQueryOptions object rather than issuing a "SET ALL"
query or making a BeeswaxService.get_default_configuration() RPC.
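
The lazy-open pattern can be sketched like this (class and method names
are illustrative, not the actual ImpylaHS2Connection code):

```python
class LazyHS2Connection:
    """Opens the default cursor only on first use instead of at connect time."""

    def __init__(self, open_cursor_fn):
        self._open_cursor_fn = open_cursor_fn  # e.g. opens a cursor + session
        self._cursor = None

    def cursor(self):
        if self._cursor is None:
            # First query: open the cursor (and the HS2 session) now.
            self._cursor = self._open_cursor_fn()
        return self._cursor
```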

Fix test_query_profile_contains_query_compilation_metadata_cached_event
slightly by setting the 'sync_ddl' option because the test is flaky
without it.

Tweak test_max_hs2_sessions_per_user to run queries so that sessions
will open.

Deduplicate test cases between utc-timestamp-functions.test and
local-timestamp-functions.test. Rename TestUtcTimestampFunctions to
TestTimestampFunctions, and expand it to also run
local-timestamp-functions.test and
file-formats-with-local-tz-conversion.test. The table_format is now
constrained to 'text/none' because it is unnecessary to permute other
table formats.

Deprecate the 'use_local_tz_for_unix_timestamp_conversions' flag in favor
of the query option with the same name. Filed IMPALA-13953 to update the
documentation of the 'use_local_tz_for_unix_timestamp_conversions'
flag/option.

Testing:
Run and pass a few pytests such as:
test_admission_controller.py
test_observability.py
test_runtime_filters.py
test_session_expiration.py
test_set.py

Change-Id: I9d5e3e5c11ad386b7202431201d1a4cff46cbff5
Reviewed-on: http://gerrit.cloudera.org:8080/22731
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-11 04:37:14 +00:00
Zoltan Borok-Nagy
2849708b87 IMPALA-13932: (addendum) Adds e2e test for IMPALA-13932
This patch adds e2e test for IMPALA-13932.

Change-Id: I07537b31717d568422ea042ad2aeef906b3cab2e
Reviewed-on: http://gerrit.cloudera.org:8080/22767
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-10 19:55:22 +00:00
Zoltan Borok-Nagy
6e4613442e IMPALA-13933: run-iceberg-rest-server.sh should use IMPALA_MAVEN_OPTIONS
run-iceberg-rest-server.sh should use IMPALA_MAVEN_OPTIONS.
The easiest way to achieve this is to invoke maven via bin/mvn-quiet.sh.

The missing options can cause maven failures.

Change-Id: Ib01ed2ef420b9e965dd8d0e495c370a1ddf323a8
Reviewed-on: http://gerrit.cloudera.org:8080/22736
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-10 15:32:02 +00:00
Peter Rozsa
5c14877b05 IMPALA-13932: Add file path and position-based duplicate check for IcebergMergeNode
IcebergMergeNode's duplicate checking mechanism was based on comparing
pointers to the target table's rows. This mechanism results in
false positives if a new row batch reuses the memory of the previous row
batch provided to the merge node. This change adds an additional check
that validates the file position and the file path as well.
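
In spirit, the strengthened check keys each target row by its stable
(file path, file position) identity rather than its memory address, which
a recycled row batch can reuse. A simplified Python model (not the actual
C++ code):

```python
def find_duplicate_targets(rows):
    # Each row is identified by (file_path, pos), which stays stable even
    # when row-batch memory is reused; pointer identity does not.
    seen, duplicates = set(), []
    for row in rows:
        key = (row['file_path'], row['pos'])
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```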

Change-Id: I71b47414321675958c05438ef3aeeb5df0128033
Reviewed-on: http://gerrit.cloudera.org:8080/22761
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2025-04-09 09:32:03 +00:00
Csaba Ringhofer
6f2d9a24d8 IMPALA-13920: Allow running minicluster with Java 17
IMPALA-11941 allowed building Impala and running tests with Java 17,
but it still uses Java 8 for minicluster components (e.g. Hadoop) and
skips several tests that would restart Hive. It should be possible to
use Java 17 for everything so that Java 8 can eventually be deprecated.

This patch mainly fixes Yarn+Hive+Tez startup issues with Java 17 by
setting JAVA_TOOL_OPTIONS.
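
For example (illustrative only; the exact flags Impala sets may differ),
JAVA_TOOL_OPTIONS is picked up by every JVM that starts, so add-opens
style flags reach Yarn, Hive and Tez without editing each component's
launch scripts:

```shell
# Any JVM started after this export will see the option.
export JAVA_TOOL_OPTIONS="--add-opens=java.base/java.lang=ALL-UNNAMED"
```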

Another issue fixed is KuduHMSIntegrationTest: this test fails to
restart Kudu due to a bug in OpenJDK (see IMPALA-13856). The current
fix is to remove LD_PRELOAD to avoid loading libjsig (similarly to
the case when MINICLUSTER_JAVA_HOME is set). This works, but it
would be nice to clean up this area in a future patch.

Testing:
- ran exhaustive tests with Java 17
- ran core tests with default Java 8

Change-Id: If58b64a21d14a4a55b12dfe9ea0b9c3d5fe9c9cf
Reviewed-on: http://gerrit.cloudera.org:8080/22705
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2025-04-04 17:50:01 +00:00
Daniel Vanko
841b8c32c8 IMPALA-12107: Throw AnalysisException for unsupported Kudu range-partitioning types
Change the Precondition check to throw an AnalysisException for illegal
key types in the PARTITION BY RANGE clause.

Testing:
 * add fe tests
 * add e2e tests

Change-Id: I3e3037318065b0f4437045a7e8dbb76639404167
Reviewed-on: http://gerrit.cloudera.org:8080/22542
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2025-04-04 12:01:01 +00:00
Zoltan Borok-Nagy
49aaaa2cd5 IMPALA-13927: Fix crash on invalid BINARY data in TEXT tables
BINARY data in text files is expected to be Base64 encoded.
TextConverter::WriteSlot has a bug when it decodes Base64 data:
it does not set the NULL-indicator bit for the slots of
the invalid BINARY values. Therefore Tuple::CopyStrings can
later try to copy invalid StringValue objects.

This patch fixes TextConverter::WriteSlot to set the NULL-indicator
bit in case of Base64 parse errors.
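
The essence of the fix can be modeled in a few lines (a Python stand-in
for the C++ TextConverter logic; names are illustrative):

```python
import base64
import binascii

def write_binary_slot(encoded_text):
    """Decode a Base64 BINARY value; on parse error, mark the slot NULL."""
    try:
        value = base64.b64decode(encoded_text, validate=True)
        return {'is_null': False, 'value': value}
    except (binascii.Error, ValueError):
        # The fix: set the NULL indicator so no invalid StringValue
        # is left behind for later copying.
        return {'is_null': True, 'value': None}
```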

Testing
 * e2e test added

Change-Id: I79b712e2abe8ce6ecfbce508fd9e4e93fd63c964
Reviewed-on: http://gerrit.cloudera.org:8080/22721
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-03 13:57:24 +00:00
Zoltan Borok-Nagy
bd3486c051 IMPALA-13586: Initial support for Iceberg REST Catalogs
This patch adds initial support for Iceberg REST Catalogs. This means
now it's possible to run an Impala cluster without the Hive Metastore,
and without the Impala CatalogD. Impala Coordinators can directly
connect to an Iceberg REST server and fetch metadata for databases and
tables from there. The support is read-only, i.e. DDL and DML statements
are not supported yet.

This was initially developed in the context of a company Hackathon
program, i.e. it was a team effort that I squashed into a single commit
and polished the code a bit.

The Hackathon team members were:
* Daniel Becker
* Gabor Kaszab
* Kurt Deschler
* Peter Rozsa
* Zoltan Borok-Nagy

The Iceberg REST Catalog support can be configured via a Java properties
file, the location of it can be specified via:
 --catalog_config_dir: Directory of configuration files

Currently only one configuration file can be in the directory as we only
support a single Catalog at a time. The following properties are mandatory
in the config file:
* connector.name=iceberg
* iceberg.catalog.type=rest
* iceberg.rest-catalog.uri

The first two properties can only be 'iceberg' and 'rest' for now; they
are needed for extensibility in the future.
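
Putting these together, a minimal configuration file might look like this
(the URI is an example value):

```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://localhost:8181/
```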

Moreover, Impala Daemons need to specify the following flags to connect
to an Iceberg REST Catalog:
 --use_local_catalog=true
 --catalogd_deployed=false

Testing
* e2e tests added to test basic functionality against a custom-built
  Iceberg REST server that delegates to HadoopCatalog under the hood
* Further testing, e.g. Ranger tests are expected in subsequent
  commits

TODO:
* manual testing against Polaris / Lakekeeper, we could add automated
  tests in a later patch

Change-Id: I1722b898b568d2f5689002f2b9bef59320cb088c
Reviewed-on: http://gerrit.cloudera.org:8080/22353
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-02 20:04:12 +00:00
Daniel Becker
d714798904 IMPALA-13609: Store Iceberg snapshot id for COMPUTE STATS
Currently, when COMPUTE STATS is run from Impala, we set the
'impala.lastComputeStatsTime' table property. Iceberg Puffin stats, on
the other hand, store the snapshot id for which the stats were
calculated. Although it is possible to retrieve the timestamp of a
snapshot, comparing these two values is error-prone, e.g. in the
following situation:

 - COMPUTE STATS calculation is running on snapshot N
 - snapshot N+1 is committed at time T
 - COMPUTE STATS finishes and sets 'impala.lastComputeStatsTime' at time
   T + Delta
 - some engine writes Puffin statistics for snapshot N+1

After this, HMS stats will appear to be more recent even though they
were calculated on snapshot N, while we have Puffin stats for snapshot
N+1.

To make comparisons easier, after this change, COMPUTE STATS sets a new
table property, 'impala.computeStatsSnapshotIds'. This property stores
the snapshot id for which stats have been computed, for each column. It
is a comma-separated list of values of the form
"fieldIdRangeStart[-fieldIdRangeEndIncl]:snapshotId". The fieldId part
may be a single value or a contiguous, inclusive range.
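
To make the format concrete, a small illustrative parser (not Impala's
actual Java implementation) might look like:

```python
def parse_compute_stats_snapshot_ids(value):
    """Parse "start[-endIncl]:snapshotId" entries into {field_id: snapshot_id}."""
    result = {}
    for entry in value.split(','):
        field_ids, snapshot_id = entry.split(':')
        if '-' in field_ids:
            start, end = (int(x) for x in field_ids.split('-'))
        else:
            start = end = int(field_ids)
        for field_id in range(start, end + 1):
            result[field_id] = int(snapshot_id)
    return result
```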

Storing the snapshot ids on a per-column basis is needed because COMPUTE
STATS can be set to calculate stats for only a subset of the columns,
and then a different subset in a subsequent run. The recency of the
stats will then be different for each column.

Storing the Iceberg field ids instead of column names makes the format
easier to handle as we do not need to take care of escaping special
characters.

The 'impala.computeStatsSnapshotIds' table property is deleted after
DROP STATS.

Note that this change does not yet modify how Impala chooses between
Puffin and HMS stats: that will be done in a separate change.

Testing:
 - Added tests in iceberg-compute-stats.test checking that
   'impala.computeStatsSnapshotIds' is set correctly and is deleted
   after DROP STATS
 - added unit tests in IcebergUtilTest.java that check the parsing and
   serialisation of the table property

Change-Id: Id9998b84c4fd20d1cf5e97a34f3553832ec70ae7
Reviewed-on: http://gerrit.cloudera.org:8080/22339
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-01 13:53:43 +00:00
Riza Suminto
b121a40d20 IMPALA-13909: Remove cursor fixture from custom_cluster/test_kudu.py
This patch removes the deprecated cursor fixture in
custom_cluster/test_kudu.py. It is replaced with an assert_num_row()
method that creates a fresh hs2 client. All test classes in
custom_cluster/test_kudu.py now extend CustomKuduTest and use the hs2
protocol as the default test protocol.

Testing:
- Run and pass custom_cluster/test_kudu.py

Change-Id: I046bf987dd16ecdf493d999e86191d85210f2de5
Reviewed-on: http://gerrit.cloudera.org:8080/22698
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-28 23:27:58 +00:00
Daniel Vanko
76a8efb155 IMPALA-13859: Add decimal to Kudu's supported primary key types
Add DECIMAL to KuduUtil.isSupportedKeyType because Kudu supports it as a
primary key type.

Testing:
 * add fe tests
 * add e2e tests

Change-Id: I0e8685fe89e4e4e511ab1d815c51ca5a021b4081
Reviewed-on: http://gerrit.cloudera.org:8080/22674
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2025-03-28 17:39:18 +00:00
Mihaly Szjatinya
356b7e5ddf IMPALA-11597: Unset impala.lastComputeStatsTime during DROP STATS
Removes 'impala.lastComputeStatsTime' property from table on DROP STATS
catalog operation. Does not affect the incremental variant:
'DROP INCREMENTAL STATS t PARTITION (partition_spec)'

Change-Id: Id743bc9779141df7a26a95a0ea201ef6fc6aeff9
Reviewed-on: http://gerrit.cloudera.org:8080/22474
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-14 18:06:39 +00:00
Yida Wu
1c6ff5f98d IMPALA-13812: Fail query for certain errors related to AI functions
The ai_generate_text() and ai_generate_text_default() functions
return error messages as a result (string), which could be
misleading in some cases. This patch fixes this issue by setting
the error in the context as a UDF error, causing the query to
fail in case of configuration-related errors or HTTP errors
when accessing the AI endpoint.

Tests:
Ran core tests.
Added custom testcase TestAIGenerateText for failure cases
with ai_generate_text_default().
Added testcase TestExprs.test_ai_generate_text_exprs for
failure cases with ai_generate_text().

Change-Id: I639e48e64d62f7990cf9a3c35a59a0ee3a2c64e0
Reviewed-on: http://gerrit.cloudera.org:8080/22588
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-14 05:21:25 +00:00
Zoltan Borok-Nagy
efc921a83c IMPALA-13836: Fix TestIcebergV2Table.test_missing_data_files failure in erasure-coding build
test_missing_data_files issues a DESCRIBE FORMATTED command to check
the attributes of a table that has missing data files. It also
checks the value of 'Erasure Coding Policy', which depends on the
HDFS configuration. This patch removes the check for this property.

Testing
 * Verified fix in build that has erasure-coding enabled

Change-Id: I645e8922cbb992424058a2256114c66ca8ec4c3c
Reviewed-on: http://gerrit.cloudera.org:8080/22619
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-13 21:02:50 +00:00
Riza Suminto
7a5adc15ca IMPALA-13333: Limit memory estimation if PlanNode can spill
SortNode, AggregationNode, and HashJoinNode (on the build side) can spill
to disk. However, their memory estimation does not consider this
capability and assumes they will hold all rows in memory. This causes
memory overestimation if cardinality is also overestimated. In reality,
the whole query execution on a single host is often subject to a much
lower memory upper bound and not allowed to exceed it.

This upper-bound is dictated by, but not limited to:
- MEM_LIMIT
- MEM_LIMIT_COORDINATORS
- MEM_LIMIT_EXECUTORS
- MAX_MEM_ESTIMATE_FOR_ADMISSION
- impala.admission-control.max-query-mem-limit.<pool_name>
  from admission control.

This patch adds SpillableOperator interface that defines an alternative
of either PlanNode.computeNodeResourceProfile() or
DataSink.computeResourceProfile() if a lower memory upper-bound can be
reasoned about from configs mentioned above. This interface is applied
to SortNode, AggregationNode, HashJoinNode, and JoinBuildSink.

The in-memory vs spill-to-disk bias is controlled through the
MEM_ESTIMATE_SCALE_FOR_SPILLING_OPERATOR option, a scale between
[0.0, 1.0] that controls the estimated peak memory of query operators
that have spill-to-disk capabilities. Setting the value closer to 1.0
biases the Planner towards keeping as many rows as possible in memory,
while setting it closer to 0.0 biases the Planner towards spilling rows
to disk under memory pressure. Note that lowering
MEM_ESTIMATE_SCALE_FOR_SPILLING_OPERATOR can make a query that was
previously rejected by the Admission Controller become admittable, but it
may also spill to disk more and run a higher risk of exhausting scratch
space.
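
The resulting bound can be sketched as follows (an illustrative formula,
not the exact Planner code; the upper bound comes from the configs listed
above):

```python
def bounded_operator_estimate(unbounded_estimate, mem_upper_bound,
                              num_instances_per_host, scale):
    """Clamp a spillable operator's estimate to a scaled share of the bound."""
    assert 0.0 <= scale <= 1.0
    per_instance_bound = (mem_upper_bound * scale) / num_instances_per_host
    # Never raise the estimate above what the operator would ask for anyway.
    return min(unbounded_estimate, per_instance_bound)
```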

There are some caveats on this memory bounding patch:
- It checks if spill-to-disk is enabled in the coordinator, but
  individual backend executors might not have it configured. Mismatch of
  spill-to-disk configs between the coordinator and backend executor,
  however, is rare and can be considered as misconfiguration.
- It does not check the actual total scratch space available to
  spill-to-disk. However, query execution will be forced to spill anyway
  if memory usage exceeds the three memory configs above. Raising
  MEM_LIMIT / MEM_LIMIT_EXECUTORS option can help increase memory
  estimation and increase the likelihood for the query to get assigned
  to a larger executor group set, which usually has a bigger total
  scratch space.
- The memory bound is divided evenly among all instances of a fragment
  kind on a single host. But in theory, they should be able to share
  and grow their memory usage independently beyond the memory estimate
  as long as max memory reservation is not set.
- This does not consider other memory-related configs such as
  clamp_query_mem_limit_backend_mem_limit or disable_pool_mem_limits
  flag. But the admission controller will still enforce them if set.

Testing:
- Pass FE and custom cluster tests with core exploration.

Change-Id: I290c4e889d4ab9e921e356f0f55a9c8b11d0854e
Reviewed-on: http://gerrit.cloudera.org:8080/21762
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-03-13 18:38:27 +00:00
Zoltan Borok-Nagy
8093c3fa6b IMPALA-13854: IcebergPositionDeleteChannel uses incorrect capacity
IcebergPositionDeleteChannel uses an incorrect capacity since
IMPALA-13509. It is set to -1, which means it collects delete records
until it runs out of memory. This patch moves the Channel's capacity
calculation from the Init() function to the constructor.

Testing
 * e2e test added

Change-Id: I207869c97a699d2706227285595ec7d7dbe1e249
Reviewed-on: http://gerrit.cloudera.org:8080/22616
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-13 14:40:36 +00:00
Zoltan Borok-Nagy
a49ff618f1 IMPALA-13853: Don't adjust Iceberg field IDs for data files that don't have complex types
In migrated Iceberg tables we can have data files with missing field
IDs. We assume that their schema corresponds to the table schema at the
point when the table migration happened. This means during runtime we
can generate the field ids. The logic is more complicated when there are
complex types in the table and the table is partitioned. In such cases
we need to do some adjustments during field ID generation, in which case
we verify that the file schema corresponds to the table schema.

These adjustments are not needed when the table doesn't have complex
types, hence we can be a bit more relaxed and skip schema verification,
because field ID generation for top-level columns are not affected.
This means Impala would still be able to read the table if there were
trivial schema changes before migration.

With this change we allow all data files that have a compatible schema
with the table schema, which was the case before IMPALA-13364. This
behavior is also aligned with Hive.

Testing:
 * e2e tests added for both Parquet and ORC files

Change-Id: Ib1f1d0cf36792d0400de346c83e999fa50c0fa67
Reviewed-on: http://gerrit.cloudera.org:8080/22610
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-12 12:51:16 +00:00
Peter Rozsa
167ced7844 IMPALA-13674: Enable MERGE statement for Iceberg tables with equality deletes
This change fixes the delete expression calculation for
IcebergMergeImpl. When an Iceberg table contains equality deletes, the
merge implementation now includes the data sequence number in the result
expressions, as the underlying tuple descriptor also includes it
implicitly. Without including this field, the row evaluation fails
because of the mismatched number of evaluators and slot descriptors.

Tests:
 - manually validated on an Iceberg table that contains equality delete
 - e2e test added

Change-Id: I60e48e2731a59520373dbb75104d75aae39a94c1
Reviewed-on: http://gerrit.cloudera.org:8080/22423
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-06 19:20:52 +00:00
Zoltan Borok-Nagy
d928815d1a IMPALA-13654: Tolerate missing data files of Iceberg tables
Before this patch we got a TableLoadingException for missing data files.
This means the IcebergTable will be in an incomplete state in Impala's
memory, and therefore we won't be able to do any operation on it.

We should continue table loading in such cases, and only throw an
exception for queries that are about to read the missing data files.

This way ROLLBACK / DROP PARTITION, and some SELECT statements should
still work.

If Impala is running in strict mode via the CatalogD flag
--iceberg_allow_datafiles_in_table_location_only, and an Iceberg table
has data files outside of the table location, we still raise an exception
and leave the table in an unloaded state. To retain this behavior, the
IOException we threw is replaced with a TableLoadingException, which fits
logic errors better anyway.

Testing
 * added e2e tests

Change-Id: If753619d8ee1b30f018e90157ff7bdbe5d7f1525
Reviewed-on: http://gerrit.cloudera.org:8080/22367
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-03 17:50:04 +00:00
Abhishek Rawat
7e8c79df53 IMPALA-13792: Cross compile AI functions
ai_generate_text() and ai_generate_text_default() are not
cross-compiled and so these could lead to undefined symbols when
codegen is enabled. This patch cross-compiles these functions.

Testing:
- Added e2e tests for ai_generate_text and ai_generate_text_default
functions and these are run with codegen enabled/disabled.

Change-Id: I454657d9f1345a36b269e6b837aaecf55a09add0
Reviewed-on: http://gerrit.cloudera.org:8080/22552
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-02 08:51:57 +00:00
Noemi Pap-Takacs
79f51b018f IMPALA-12588: Don't UPDATE rows that already have the desired value
When UPDATEing an Iceberg or Kudu table, we should change as few rows
as possible. In case of Iceberg tables it means writing as few new
data records and delete records as possible.
Therefore, if rows already have the new values we should just ignore
them. One way to achieve this is to add extra predicates, e.g.:

  UPDATE tbl SET k = 3 WHERE i > 4;
    ==>
  UPDATE tbl SET k = 3 WHERE i > 4 AND k IS DISTINCT FROM 3;

So we won't write new data/delete records for the rows that already have
the desired value.

Explanation on how to create extra predicates to filter out these rows:

If there are multiple assignments in the SET list, we can only skip
updating a row if all the mentioned values are already equal.
If either of the values needs to be updated, the entire row does.
Therefore we can think of the SET list as predicates connected with AND
and all of them need to be taken into consideration.
To negate this SET list, we have to negate the individual SET
assignments and connect them with OR.
Then add this new compound predicate to the original where predicates
with an AND (if there were none, just create a WHERE predicate from it).

                AND
              /     \
      original        OR
 WHERE predicate    /    \
                  !a       OR
                         /    \
                       !b     !c

This simple graph illustrates how the where predicate is rewritten.
(Considering an UPDATE statement that sets 3 columns.)
'!a', '!b' and '!c' are the negations of the individual assignments in
the SET list. So the extended WHERE predicate is:
(original WHERE predicate) AND (!a OR !b OR !c)
To handle NULL values correctly, we use IS DISTINCT FROM instead of
simply negating the assignment with operator '!='.
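
Applied to a SET list with multiple assignments, the rewrite sketched
above looks like this (an illustrative continuation of the earlier
example, not output copied from the planner):

```sql
UPDATE tbl SET k = 3, m = 5 WHERE i > 4;
  ==>
UPDATE tbl SET k = 3, m = 5
WHERE i > 4 AND (k IS DISTINCT FROM 3 OR m IS DISTINCT FROM 5);
```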

If the assignments contain UDFs, the result might be inconsistent
because of possible non-deterministic values or state in the UDFs,
therefore we should not rewrite the WHERE predicate at all.

Evaluating expressions can be expensive, therefore this optimization
can be limited or switched off entirely using the Query Option
SKIP_UNNEEDED_UPDATES_COL_LIMIT. By default, there is no filtering
if more than 10 assignments are in the SET list.

-------------------------------------------------------------------
Some performance measurements on a tpch lineitem table:

- predicates in HASH join, all updates can be skipped
Q1/[Q2] (Similar, but Q2 adds extra 4 items to the SET list):
update t set t.l_suppkey = s.l_suppkey,
[ t.l_partkey=s.l_partkey,
  t.l_quantity=s.l_quantity,
  t.l_returnflag=s.l_returnflag,
  t.l_shipmode=s.l_shipmode ]
from ice_lineitem t join ice_lineitem s
on t.l_orderkey=s.l_orderkey and t.l_linenumber=s.l_linenumber;

- predicates in HASH join, all rows need to be updated
Q3: update t set
 t.l_suppkey = s.l_suppkey,
 t.l_partkey=s.l_partkey,
 t.l_quantity=s.l_quantity,
 t.l_returnflag=s.l_returnflag,
 t.l_shipmode=concat(s.l_shipmode,' ')
 from ice_lineitem t join ice_lineitem s
 on t.l_orderkey=s.l_orderkey and t.l_linenumber=s.l_linenumber;

- predicates pushed down to the scanner, all rows updated
Q4/[Q5] (Similar, but Q5 adds 8 extra items to the SET list):
update ice_lineitem set
[ l_suppkey = l_suppkey + 0,
  l_partkey=l_partkey + 0,
  l_quantity=l_quantity,
  l_returnflag=l_returnflag,
  l_tax = l_tax,
  l_discount= l_discount,
  l_comment = l_comment,
  l_receiptdate = l_receiptdate, ]
l_shipmode=concat(l_shipmode,' ');

+=======+============+==========+======+
| Query | unfiltered | filtered | diff |
+=======+============+==========+======+
| Q1    |       4.1s |     1.9s | -54% |
+-------+------------+----------+------+
| Q2    |       4.2s |     2.1s | -50% |
+-------+------------+----------+------+
| Q3    |       4.3s |     4.7s | +10% |
+-------+------------+----------+------+
| Q4    |       3.0s |     3.0s | +0%  |
+-------+------------+----------+------+
| Q5    |       3.1s |     3.1s | +0%  |
+-------+------------+----------+------+

The results show that in the best case (we can skip all rows)
this change can cause significant perf improvement ~50%, since
0 rows were written. See Q1 and Q2.
If the predicates are evaluated in the join operator, but there were
no matches (worst case scenario) we can lose about 10%. (Q3)
If all the predicates can be pushed down to the scanners, the change
does not seem to cause significant difference (~0% in Q4 and Q5)
even if all rows have to be updated.

Testing:
 - Analysis
 - Planner
 - E2E
 - Kudu
 - Iceberg
 - testing the new query option: SKIP_UNNEEDED_UPDATES_COL_LIMIT
Change-Id: I926c80e8110de5a4615a3624a81a330f54317c8b
Reviewed-on: http://gerrit.cloudera.org:8080/22407
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-28 05:52:21 +00:00
Michael Smith
768527c89a IMPALA-13804: Use redacted statement in live table
Uses the redacted SQL statement in sys.impala_query_live, so it's
consistent with the profile and sys.impala_query_log.

Testing: added a test with redaction rules for live and log tables.

Change-Id: I9a72eeaea84981e96655aec6c67b5ef2cbbd3c3e
Reviewed-on: http://gerrit.cloudera.org:8080/22556
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-27 22:39:44 +00:00
Pranav Lodha
4c549d79f2 IMPALA-12992: Support for Hive JDBC Storage handler tables
This is an enhancement to support JDBC tables created by the
Hive JDBC Storage handler. It is essentially done by making the
JDBC table properties compatible with Impala: the properties are
translated when the table is loaded, and the translated form is
maintained only in the Impala cluster, i.e. it is not written
back to HMS.

Impala includes JDBC drivers for PostgreSQL and MySQL
making 'driver.url' not mandatory in such cases. The
Impala JDBC driver is still required for Impala-to-Impala
JDBC connections. Additionally, Hive allows adding database
driver JARs at runtime via Beeline, enabling users to
dynamically include JDBC driver JARs. However, Impala does
not support adding database driver JARs at runtime,
making the driver.url field still useful
in cases where additional drivers are needed.
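
The translation step can be sketched as a key mapping applied at table
load time. The property names on both sides of the mapping below are
simplified illustrations and may not match Impala's real mapping.

```python
# Sketch: translate Hive JDBC storage handler table properties into an
# Impala-compatible form, in memory only (the HMS copy is untouched).

HIVE_TO_IMPALA_KEYS = {
    "hive.sql.database.type": "database.type",
    "hive.sql.jdbc.url": "jdbc.url",
    "hive.sql.jdbc.driver": "jdbc.driver.class",
}

def translate_tbl_properties(hive_props):
    """Return an Impala-compatible copy of the table properties.

    The original dict (what HMS stores) is left untouched, mirroring
    'not written back to HMS'.
    """
    return {
        HIVE_TO_IMPALA_KEYS.get(key, key): value
        for key, value in hive_props.items()
    }

impala_props = translate_tbl_properties({
    "hive.sql.database.type": "POSTGRES",
    "hive.sql.jdbc.url": "jdbc:postgresql://db/tpch",
})
```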

'hive.sql.query' property is not handled in this patch.
It'll be covered in a separate jira.

Testing: End-to-end tests are included in
test_ext_data_sources.py.

Change-Id: I1674b93a02f43df8c1a449cdc54053cc80d9c458
Reviewed-on: http://gerrit.cloudera.org:8080/22134
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-27 11:44:38 +00:00
Riza Suminto
f59c0917fe IMPALA-13786: Skip rewriting expr of Hive auto-generated label
IMPALA-10836 introduced the SimplifyCastExprRule optimization to
simplify CAST expressions. However, applying this rewrite rule to
expressions referred to by Hive auto-generated labels has caused
an AnalysisException like the following:

AnalysisException: Could not resolve column/field reference:
'failing_view._c0'

Most likely, before IMPALA-10836, expressions referred to by Hive
auto-generated labels were never effectively rewritten, so the
ExprSubstitutionMap across multiple InlineViewRefs stayed intact.

This patch attempts to fix the issue by making any expression in a
SelectList that is mapped to a Hive auto-generated label ineligible
for any kind of expression rewrite.
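
The eligibility check can be sketched as below. Hive's auto-generated
column labels have the form _c0, _c1, ...; the helper name is mine,
not Impala's.

```python
# Sketch: an expression mapped to a Hive auto-generated label must keep
# its original form, so references like 'failing_view._c0' still
# resolve through the ExprSubstitutionMap.
import re

HIVE_AUTO_LABEL = re.compile(r"^_c\d+$")

def is_rewrite_eligible(select_item_label):
    """Return False for SelectList items with auto-generated labels."""
    return HIVE_AUTO_LABEL.match(select_item_label) is None

eligible = is_rewrite_eligible("total_price")
ineligible = is_rewrite_eligible("_c0")
```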

Also addressed some flake8 errors in test_views_compatibility.py.

Testing:
- Add test case in views-compatibility.test.
- Break test_view_compatibility_hive into 3 separate tests.
- Refactor test_views_compatibility.py to run both EXPLAIN and SELECT
  query over the test view.
- Pass test_views_compatibility.py in exhaustive exploration.
- Pass core tests.

Change-Id: I4b8bbd0afd6da0532bf2ef460989d4f01337d198
Reviewed-on: http://gerrit.cloudera.org:8080/22546
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-27 05:07:53 +00:00
Surya Hebbar
83452d640f IMPALA-13751: Fix runtime-profile-test failure since IMPALA-13304
This patch fixes the flaky tests added to the 'runtime-profile-test'
in IMPALA-13304.

The following possible causes have been fixed.
- Lexical errors present while defining example event sequence labels
- Edge cases providing invalid randomization ranges
- Improper re-allocation of instance timestamps
- Inconsistent sleep timers during event sequence generation

The test fixture's coverage for cases of complete and missing events
has been refactored into a common validation method.

Randomized tests have been kept to cover different scenarios of
missing events across instances for sorting and alignment.

To simulate missing events, the test fixture uses a Bernoulli
distribution with two possible probability values, 0.5 and 0.8.
This generates missing events with an approximate certainty of ~99.99%.

The test also runs successfully irrespective of whether missing
events are present.

Still, to ensure even greater certainty (>99.99%), the test fixture
now retries the setup a maximum of 3 times.
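
A back-of-the-envelope check of the ~99.99% figure: with a per-event
miss probability of 0.5, the chance that at least one of the event
slots goes missing approaches 1 quickly. The number of independent
event slots used below (n = 14) is an assumption for illustration, not
taken from the test fixture.

```python
# Sketch: probability that a Bernoulli(p_miss) drop hits at least one
# of n independent event slots, and the effect of retrying the setup.

def p_at_least_one_missing(p_miss, n_slots):
    """P(>= 1 missing event) = 1 - P(no slot is dropped)."""
    return 1.0 - (1.0 - p_miss) ** n_slots

single_run = p_at_least_one_missing(0.5, 14)
# Retrying the setup up to 3 times only fails if every run fails:
with_retries = 1.0 - (1.0 - single_run) ** 3
```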

The randomly generated values are also included as parameters within
the JUnitXML.

Updated observability tests with latest JSON profile expected outputs.

Tested locally with high parallelism and resource exhaustion using
GNU's parallel.

i.e. parallel -j 100 <test command> ::: {1..100}

Change-Id: If04744e215cac79f255b3d73c3e91e873c13749a
Reviewed-on: http://gerrit.cloudera.org:8080/22482
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-02-21 15:48:50 +00:00
Daniel Becker
14a1bd73e3 IMPALA-13770: Updating Iceberg tables with UDFs crashes Impala
When using a native UDF in the target value of an UPDATE statement or in
a filter predicate or target value of a MERGE statement, Impala crashes
with the following DCHECK:

  be/src/exprs/expr.cc:47 47 DCHECK(cache_entry_ == nullptr);

This DCHECK is in the destructor of Expr, and it fires because Close()
has not been called for the expression. In the UPDATE case this is
caused by MultiTableSinkConfig: it creates child DataSinkConfig objects
but does not call Close() on them, and consequently these child sink
configs do not call Close() on their output expressions.
In the MERGE case it is because various expressions are not closed in
IcebergMergeCase and IcebergMergeNode.

This patch fixes the issue by overriding Close() in MultiTableSinkConfig,
calling Close() on the child sinks as well as closing the expressions in
IcebergMergeCase and IcebergMergeNode.
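
The shape of the fix can be sketched as follows, using Python classes
as stand-ins for the C++ DataSinkConfig hierarchy (the real code is
C++; all class internals here are simplified illustrations).

```python
# Sketch: a parent sink config's Close() must propagate to its child
# sinks, which in turn close their output expressions; otherwise the
# DCHECK in Expr's destructor fires for an expression never closed.

class ScalarExpr:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True  # releases e.g. a UDF library cache entry

class DataSinkConfig:
    def __init__(self, output_exprs):
        self.output_exprs = output_exprs

    def close(self):
        for expr in self.output_exprs:
            expr.close()

class MultiTableSinkConfig(DataSinkConfig):
    def __init__(self, child_sinks):
        super().__init__(output_exprs=[])
        self.child_sinks = child_sinks

    def close(self):
        # Before the fix, child sinks were never closed here, leaking
        # their output expressions.
        for child in self.child_sinks:
            child.close()
        super().close()

expr = ScalarExpr()
MultiTableSinkConfig([DataSinkConfig([expr])]).close()
```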

Testing:
 - Added EE regression tests for the UPDATE and MERGE cases in
   iceberg-update-basic.test and iceberg-merge.test

Change-Id: Id86638c8d6d86062c68cc9d708ec9c7b0a4e95eb
Reviewed-on: http://gerrit.cloudera.org:8080/22508
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-21 15:10:37 +00:00
Zoltan Borok-Nagy
dacc447938 IMPALA-13768: Redundant Iceberg delete records are shuffled around which cause error "Invalid file path arrived at builder"
IcebergDeleteBuilder assumes that it should only receive delete
records for paths of data files that are scheduled for its
corresponding SCAN operator.

It is not true when any of the following happens:
* number of output channels in sender is 1
  (currently no DIRECTED mode, no filtering)
* hit bug in DIRECTED mode, see below
* single node plan is being used (no DIRECTED mode, no filtering)

With this patch, KrpcDataStreamSender::Send() will use DIRECTED mode
even if the number of output channels is 1. It also fixes the bug in
DIRECTED mode (which was due to an unused variable 'skipped_prev_row')
and simplifies the logic a bit.

The patch also relaxes the assumption in IcebergDeleteBuilder: it only
returns an error for dangling delete records when we are in a
distributed plan, where DIRECTED distribution mode of position delete
records can be assumed.

Testing
 * added e2e tests

Change-Id: I695c919c9a74edec768e413a02b2ef7dbfa0d6a5
Reviewed-on: http://gerrit.cloudera.org:8080/22500
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-20 14:05:57 +00:00
Riza Suminto
9cb9bae84e IMPALA-13758: Use context manager in ImpalaTestSuite.change_database
ImpalaTestSuite.change_database is responsible to point impala client to
database under test. However, it left client pointing to that database
after the test without reverting them back to default database. This
patch does the reversal by changing ImpalaTestSuite.change_database to
use context manager.

This patch change the behavior of execute_query_using_client() and
execute_query_async_using_client(). They used to change database
according to the given vector parameter, but not anymore after this
patch. In practice, this behavior change does not affect many tests
because most queries going through these functions already use fully
qualified table name. Going forward, querying through function other
than run_test_case() should try to use fully qualified table name as
much as possible.

Retain behavior of ImpalaTestSuite._get_table_location() since there are
considerable number of tests relies on it (changing database when
called).
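
The context-manager shape described above can be sketched as follows.
The ImpalaTestSuite internals are simplified here to a plain client
object with an execute() method; 'FakeClient' is a stand-in of mine,
and 'default' stands in for the default database.

```python
# Sketch: point the client at the database under test, and always
# revert to the default database when the test body finishes, even if
# it raises.
from contextlib import contextmanager

@contextmanager
def change_database(client, db_name):
    client.execute("use `%s`" % db_name)
    try:
        yield client
    finally:
        client.execute("use default")

class FakeClient:
    def __init__(self):
        self.current_db = "default"

    def execute(self, stmt):
        if stmt.startswith("use "):
            self.current_db = stmt[4:].strip("`")

client = FakeClient()
with change_database(client, "tpch_nested_parquet"):
    in_test_db = client.current_db
after_test_db = client.current_db
```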

Removed unused test fixtures and fixed several flake8 issues in modified
test files.

Testing:
- Moved nested-types-subplan-single-node.test. This allows the test
  framework to point to the right tpch_nested* database.
- Pass exhaustive test except IMPALA-13752 and IMPALA-13761. They will
  be fixed in separate patch.

Change-Id: I75bec7403cc302728a630efe3f95e852a84594e2
Reviewed-on: http://gerrit.cloudera.org:8080/22487
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-19 23:50:34 +00:00
Michael Smith
1b6395b8db IMPALA-13627: Handle legacy Hive timezone conversion
After HIVE-12191, Hive has 2 different methods of calculating timestamp
conversion from UTC to local timezone. When Impala has
convert_legacy_hive_parquet_utc_timestamps=true, it assumes times
written by Hive are in UTC and converts them to local time using tzdata,
which matches the newer method introduced by HIVE-12191.

Some dates convert differently between the two methods, such as
Asia/Kuala_Lumpur or Singapore prior to 1982 (also seen in HIVE-24074).
After HIVE-25104, Hive writes 'writer.zone.conversion.legacy' to
distinguish which method is being used. As a result there are three
different cases we have to handle:
1. Hive prior to 3.1 used what’s now called “legacy conversion” using
   SimpleDateFormat.
2. Hive 3.1.2 (with HIVE-21290) used a new Java API that’s based on
   tzdata and added metadata to identify the timezone.
3. Hive 4 supports both, and adds new file metadata to identify which
   one is used.

Adds handling for Hive files (identified by created_by=parquet-mr) where
we can infer the correct handling from Parquet file metadata:
1. if writer.zone.conversion.legacy is present (Hive 4), use it to
   determine whether to use a legacy conversion method compatible with
   Hive's legacy behavior, or convert using tzdata.
2. if writer.zone.conversion.legacy is not present but writer.time.zone
   is, we can infer it was written by Hive 3.1.2+ using new APIs.
3. otherwise it was likely written by an earlier Hive version.
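
The three-way decision above can be sketched as a function over the
Parquet key-value metadata of a parquet-mr file. The function name and
boolean parameter below are mine; only the metadata keys come from the
description above.

```python
# Sketch: decide whether to use Hive's legacy timestamp conversion for
# a file written by parquet-mr, based on its key-value metadata.

def use_legacy_conversion(file_metadata, use_legacy_hive_timestamp_conversion):
    legacy_flag = file_metadata.get("writer.zone.conversion.legacy")
    if legacy_flag is not None:
        # Case 1: Hive 4 recorded the conversion method explicitly.
        return legacy_flag == "true"
    if "writer.time.zone" in file_metadata:
        # Case 2: written by Hive 3.1.2+ using the new tzdata-based API.
        return False
    # Case 3: likely an older Hive; fall back to the query option.
    return use_legacy_hive_timestamp_conversion

hive4 = use_legacy_conversion({"writer.zone.conversion.legacy": "false"}, True)
hive312 = use_legacy_conversion({"writer.time.zone": "UTC"}, True)
old_hive = use_legacy_conversion({}, True)
```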

Adds a new CLI and query option - use_legacy_hive_timestamp_conversion -
to select what conversion method to use in the 3rd case above, when
Impala determines that the file was written by Hive older than 3.1.2.
Defaults to false to minimize changes in Impala's behavior and because
going through JNI is ~50x slower even when the results would not differ;
Hive defaults to true for its equivalent setting:
hive.parquet.timestamp.legacy.conversion.enabled.

Hive legacy-compatible conversion uses a Java method that would be
complicated to mimic in C++, doing

  DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
  formatter.setTimeZone(TimeZone.getTimeZone(timezone_string));
  java.util.Date date = formatter.parse(date_time_string);
  formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
  return formatter.format(date);

IMPALA-9385 added a check against a Timezone pointer in
FromUnixTimestamp. That dominates the time in FromUnixTimeNanos,
overriding any benchmark gains from IMPALA-7417. Moves FromUnixTime to
allow inlining, and switches to using UTCPTR in the benchmark - as
IMPALA-9385 did in most other code - to restore benchmark results.

Testing:
- Adds JVM conversion method to convert-timestamp-benchmark.
- Adds tests for several cases from Hive conversion tests.

Change-Id: I1271ed1da0b74366ab8315e7ec2d4ee47111e067
Reviewed-on: http://gerrit.cloudera.org:8080/22293
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-02-18 16:33:39 +00:00