Surya Hebbar 7ad7a86c0e IMPALA-13624: Implement textual representation for aggregate event sequences
This adds support for a summarized textual representation of timestamps
for the event sequences present in the aggregated profile.

With the verbose format present in profile V1 and V2, it becomes
difficult to analyze an event's timestamps across instances.

When the number of timestamps for an event exceeds
json_profile_event_timestamp_limit, the event sequence is now displayed
in a histogram format, which makes skew analysis and other possible use
cases easier.

The summary generated from aggregated instance-level timestamps
(i.e. IMPALA-13304) is used to achieve this within the profile V2,
which covers the possibility of missing events.

Example:
  Verbosity::DEFAULT
  json_profile_event_timestamp_limit = 5 (default)

  Case #1, Number of instances exceeded limit
    Node Lifecycle Event Timeline Summary :
     - Open Started (4s880ms):
        Min: 2s312ms, Avg: 3s427ms, Max: 4s880ms, Count: 12
        HistogramCount: 4, 4, 0, 0, 4

  Case #2, Number of instances within the limit

    Node Lifecycle Event Timeline:
     - Open Started: 5s885ms, 1s708ms, 3s434ms
     - Open Finished: 5s885ms, 1s708ms, 3s435ms
     - First Batch Requested: 5s885ms, 1s708ms, 3s435ms
     - First Batch Returned: 6s319ms, 2s123ms, 3s570ms
     - Last Batch Returned: 7s878ms, 2s123ms, 3s570ms

With Verbosity::EXTENDED or more, all events and timestamps are printed
with full verbosity as before.
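
The choice between the two cases above can be sketched in Python. This is
only an illustration, not the Impala implementation: the function names
bucket_counts and format_event are hypothetical, timestamps are plain
millisecond integers, and it assumes equal-width buckets between Min and
Max with the bucket count equal to the limit.

```python
from typing import List, Sequence


def bucket_counts(timestamps_ms: Sequence[int], num_buckets: int) -> List[int]:
    """Count timestamps in num_buckets equal-width buckets spanning [min, max].

    Assumption: equal-width bucketing; the real profile code may differ.
    """
    lo, hi = min(timestamps_ms), max(timestamps_ms)
    width = hi - lo
    counts = [0] * num_buckets
    for ts in timestamps_ms:
        # Identical timestamps all land in bucket 0; the maximum value is
        # clamped into the last bucket instead of overflowing past it.
        idx = 0 if width == 0 else min((ts - lo) * num_buckets // width,
                                       num_buckets - 1)
        counts[idx] += 1
    return counts


def format_event(label: str, timestamps_ms: Sequence[int], limit: int = 5) -> str:
    """Render one event either as a plain list or as a summary (hypothetical)."""
    if not timestamps_ms:
        raise ValueError("need at least one timestamp")
    if len(timestamps_ms) <= limit:
        # Within the limit: print every timestamp, one per instance.
        values = ", ".join(f"{ts}ms" for ts in timestamps_ms)
        return f"- {label}: {values}"
    # Over the limit: print min/avg/max/count plus the bucket counts.
    avg = sum(timestamps_ms) // len(timestamps_ms)
    hist = ", ".join(str(c) for c in bucket_counts(timestamps_ms, limit))
    return (
        f"- {label}:\n"
        f"    Min: {min(timestamps_ms)}ms, Avg: {avg}ms, "
        f"Max: {max(timestamps_ms)}ms, Count: {len(timestamps_ms)}\n"
        f"    HistogramCount: {hist}"
    )


if __name__ == "__main__":
    print(format_event("Open Started", [5885, 1708, 3434]))
    print(format_event("Open Started", [0, 10, 20, 30, 40, 100]))
```

In a summary like this, a histogram with counts concentrated at both ends
and empty buckets in the middle points to two groups of instances reaching
the event at clearly different times.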

Tests:
For test_profile_tool.py, updated the generated outputs for text
and JSON profiles.

Change-Id: I4bcc0e2e7fccfa8a184cfa8a3a96d68bfe6035c0
Reviewed-on: http://gerrit.cloudera.org:8080/22245
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-05-14 00:21:54 +00:00

impala_profile_log_tpcds_compute_stats:
An Impala profile log with the main query and child queries for
compute stats tpcds_parquet.store_sales.
This log has associated expected output files (with "expected" in the filename)
that were generated by impala-profile-tool with various output formats and verbosity
levels. These are used in test_profile_tool.py.


impala_profile_log_tpcds_compute_stats_v2:
An Impala profile log with the main query and child queries for
compute stats tpcds_parquet.store_sales from an impalad running with
--gen_experimental_profile=true.
This log has associated expected output files (with "expected" in the filename)
that were generated by impala-profile-tool with various output formats and verbosity
levels. These are used in test_profile_tool.py.