impala/tests/observability/test_profile_tool.py
Surya Hebbar a68717cac9 IMPALA-13304: Include aggregate instance-level metrics in JSON profile
The JSON representation of the aggregated profile
(`AggregatedRuntimeProfile`) excludes some instance-level metrics
(e.g. timeseries counters, event sequences, etc.) in order to keep
the profile size from growing rapidly.

In contrast, the traditional profile contains associated metrics
from the root profiles (`RuntimeProfile`s) of all instances as well as
the aggregated profile.

The experimental profile only contains the aggregated profile.
Hence, the instance-level metrics are not present.

This patch introduces some of the aggregated instance-level metrics
to the experimental profile's JSON representation without allowing
the profile size to grow rapidly.

It also provides insights into instances with unreported or missing events.

The following attributes have been exposed to the JSON form after grouping
or aggregation.

- Aggregated Event Sequences
- Aggregated Info Strings (e.g. Table Name)

The timestamps across instances for a particular event are grouped
when the number of instances is small and aggregated otherwise,
in order to maintain the profile size and facilitate analysis.

This behavior is controlled by the json_profile_event_timestamp_limit
option, which defaults to 5.

When the number of timestamps for an event exceeds this limit,
the timestamps are grouped into buckets of equal size.

These buckets are spans divided evenly between the minimum and
maximum timestamps for the event. With the default limit of 5,
this results in spans of 20%.

The following aggregates are then calculated for each of these buckets,
to provide a clear and efficient summary of the data.
* Maximum timestamp
* Minimum timestamp
* Average timestamp
* Total no. of instances

The aggregate metrics are calculated with minimal overhead through
assignment to a particular division without the need for sorting,
resulting in a time complexity of O(n) with only two passes through
the entire list of timestamps.
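
The bucketing described above can be sketched in Python as follows. This is only an illustration of the technique, not the C++ code added by the patch: the function name `summarize_event_timestamps` and its `limit` parameter are hypothetical, and the handling of empty buckets (count 0 with `None` values) is an assumption.

```python
def summarize_event_timestamps(timestamps, limit=5):
    """Group or aggregate one event's timestamps (hypothetical helper).

    'limit' plays the role of json_profile_event_timestamp_limit.
    """
    if len(timestamps) <= limit:
        # Few instances: keep the raw timestamps.
        return {"ts_list": list(timestamps)}

    # Pass 1: find the overall minimum and maximum.
    lo = hi = timestamps[0]
    for ts in timestamps:
        if ts < lo:
            lo = ts
        elif ts > hi:
            hi = ts
    span = hi - lo

    # One running value per metric and bucket; timestamps are never stored.
    mins = [None] * limit
    maxs = [None] * limit
    sums = [0] * limit
    counts = [0] * limit

    # Pass 2: assign each timestamp to its bucket without sorting.
    for ts in timestamps:
        idx = min((ts - lo) * limit // span, limit - 1) if span else 0
        if counts[idx] == 0:
            mins[idx] = maxs[idx] = ts
        else:
            mins[idx] = min(mins[idx], ts)
            maxs[idx] = max(maxs[idx], ts)
        sums[idx] += ts
        counts[idx] += 1

    # Assumption: empty buckets are reported with count 0 and None values.
    avgs = [sums[i] // counts[i] if counts[i] else None for i in range(limit)]
    return {"ts_stat": {"min": mins, "max": maxs, "avg": avgs, "count": counts}}


grouped = summarize_event_timestamps([3, 1, 2])
aggregated = summarize_event_timestamps([0, 10, 20, 30, 40, 50, 100])
```

With the default limit of 5, the second call splits the range 0 to 100 into five spans of 20 each and keeps only four running values per span.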

To further optimize performance, the aggregates are computed without
storing each division's timestamps. Only a single value per metric is
kept in memory instead of the entire range of values, and previously
allocated vectors are reused.

To copy the calculated values efficiently without reallocating
internally on each insertion, memory is preallocated for each array
of metrics using the RapidJSON library.

In the case of missing events, the timestamps are ordered and aligned
through the analysis of 'label_idxs'. If at least one instance contains
a complete set of events, all instances with missing timestamps are
ordered and aligned efficiently by referencing the reordering of labels.
Otherwise, the initial ordering and alignment are retained.

If any fragment instances report only a subset of events due to a failure
or error, those instances are reported, and only their unavailable
timestamps are skipped during the aggregate metrics calculation. The
available timestamps are still used.

The instances containing missing events are further recorded into the
"unreported_event_instance_idxs" field within the event sequence. These
indexes for instances are based on 'exec_params_' set during execution.
Please refer to IMPALA-13555 for further details.
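
As a rough illustration of how missing events might be handled, the sketch below collects the available timestamps per event and records which instances left an event unreported. It is a simplification under stated assumptions: the helper name `split_reported` and the dict-per-instance input are hypothetical, and it does not model the 'label_idxs' based reordering described above.

```python
def split_reported(per_instance_events, labels):
    """Collect available timestamps per label and list incomplete instances.

    per_instance_events: list indexed by instance, each a dict mapping an
    event label to its timestamp (hypothetical input shape).
    """
    per_label = {label: [] for label in labels}
    unreported_instance_idxs = []
    for idx, events in enumerate(per_instance_events):
        missing = False
        for label in labels:
            ts = events.get(label)
            if ts is None:
                # Skip only the unavailable timestamp.
                missing = True
            else:
                per_label[label].append(ts)
        if missing:
            unreported_instance_idxs.append(idx)
    return per_label, unreported_instance_idxs


# Illustrative data: instance 1 never reported "Open Finished".
instances = [
    {"Open Started": 10, "Open Finished": 20},
    {"Open Started": 12},
    {"Open Started": 11, "Open Finished": 25},
]
per_label, unreported = split_reported(
    instances, ["Open Started", "Open Finished"])
```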

All of the above logic has been encapsulated into the newly added
`ToJson` method within the `AggEventSequence` struct, prioritizing better
reuse and maintainability.

Structure of the `AggEventSequence` in JSON form -
{
  profile_name : <PLAN_NODE_NAME>,
  num_children : <NUM_CHILDREN>,
  node_metadata : <NODE_METADATA_OBJECT>,
  event_sequences :
  [{
    events : // An example event
    [{
      label : "Open Started",
      ts_list : [ 2257887941, <other instances' timestamps> ]
       // OR
      ts_stat :
      {
        min : [ 2257887941, ...<other divisions' minimum timestamps> ],
        max : [ 3257887941, ...<other divisions' maximum timestamps> ],
        avg : [ 2757887941, ...<other divisions' average timestamps> ],
        count : [ 2, ... <other counts of divisions' no. of instances> ]
      }
    }, <...other plan node's events>
    ],
    // This field is only included, if there are unreported events
    unreported_event_instance_idxs : [ 3, 5, 0 ]
  }],
  counters : <COUNTERS_OBJECT_ARRAY>,
  child_profiles : <CHILD_PROFILES>
}
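
A consumer of this structure has to handle both the grouped (`ts_list`) and the aggregated (`ts_stat`) form of an event. The following is a minimal sketch, assuming the emitted JSON uses quoted keys with the field names shown above; the function name `summarize_events` and the sample values are hypothetical.

```python
import json


def summarize_events(node):
    """Return {label: (earliest, latest, num_instances)} for one profile node."""
    summary = {}
    for seq in node.get("event_sequences", []):
        for event in seq.get("events", []):
            if "ts_list" in event:
                # Grouped form: one timestamp per instance.
                ts = event["ts_list"]
                summary[event["label"]] = (min(ts), max(ts), len(ts))
            else:
                # Aggregated form: one entry per division.
                stat = event["ts_stat"]
                summary[event["label"]] = (
                    min(stat["min"]), max(stat["max"]), sum(stat["count"]))
    return summary


# Illustrative input only; not taken from a real profile.
sample = json.loads("""
{
  "event_sequences": [{
    "events": [
      {"label": "Open Started", "ts_list": [5, 3, 9]},
      {"label": "Open Finished",
       "ts_stat": {"min": [10, 20], "max": [15, 30],
                   "avg": [12, 25], "count": [2, 3]}}
    ]
  }]
}
""")
summary = summarize_events(sample)
```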

Structure of `AggInfoStrings` in JSON form.
{
  profile_name : <PLAN_NODE_NAME>,
  num_children : <NUM_CHILDREN>,
  node_metadata : <NODE_METADATA_OBJECT>,
  "info_strings" :
  [{
    "key": "<info string's key>",
    "values": [<distinct info string values>]
  }],
  counters : <COUNTERS_OBJECT_ARRAY>,
  child_profiles : <CHILD_PROFILES>
}
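
The `info_strings` array can be flattened into a lookup table when analysing a profile. This is a small sketch assuming the key and value field names shown above; the helper name `info_strings_to_dict` and the sample table name are made up for illustration.

```python
def info_strings_to_dict(node):
    """Map each info string key to its list of distinct values."""
    return {
        entry["key"]: list(entry["values"])
        for entry in node.get("info_strings", [])
    }


# Illustrative node; the table name is invented.
node = {
    "profile_name": "<PLAN_NODE_NAME>",
    "info_strings": [
        {"key": "Table Name", "values": ["example_db.example_table"]},
    ],
}
info = info_strings_to_dict(node)
```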

Note: In the above structures, unlike a plan node's profile,
a fragment's profile does not contain the 'node_metadata' field.

Added unit tests for the serialization of aggregated metrics -
- Added tests for handling info strings in aggregated JSON profiles
- Introduced AggregatedEventSequenceToJsonTest fixture to validate
  event sequence serialization
- Added random profile generation for varied test conditions
- Covered scenarios for complete and missing events in both aggregated
  and grouped cases
- Ensured correct JSON structure for info strings and event sequences
- Ensured proper timestamp ordering and aggregation logic in serialized
  JSON profiles

Generated the latest expected JSON profile outputs from the
'impala-profile-tool' using the stored impala profile logs.

Added additional tests in tests/observability for profile v2's
JSON output, after inclusion of the new expected JSON profile
formats.

Change-Id: I49e18a7a7e1288e3e674e15b6fc86aad60a08214
Reviewed-on: http://gerrit.cloudera.org:8080/21683
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-07 11:54:35 +00:00


# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

from __future__ import absolute_import, division, print_function
import os.path
import tempfile
from subprocess import check_call

from tests.common.environ import impalad_basedir
from tests.common.base_test_suite import BaseTestSuite

IMPALA_HOME = os.environ['IMPALA_HOME']


def get_profile_path(filename):
  return os.path.join(IMPALA_HOME, 'testdata/impala-profiles/', filename)


class TestProfileTool(BaseTestSuite):

  def test_text_output(self):
    # Test text profiles with different verbosity levels.
    self._compare_profile_tool_output([],
        get_profile_path('impala_profile_log_tpcds_compute_stats'),
        get_profile_path('impala_profile_log_tpcds_compute_stats.expected.txt'))
    self._compare_profile_tool_output(['--profile_verbosity=default'],
        get_profile_path('impala_profile_log_tpcds_compute_stats'),
        get_profile_path('impala_profile_log_tpcds_compute_stats_default.expected.txt'))
    self._compare_profile_tool_output(['--profile_verbosity=extended'],
        get_profile_path('impala_profile_log_tpcds_compute_stats'),
        get_profile_path('impala_profile_log_tpcds_compute_stats_extended.expected.txt'))

  def test_text_output_profile_v2(self):
    # Test text profiles with different verbosity levels.
    self._compare_profile_tool_output(['--profile_verbosity=default'],
        get_profile_path('impala_profile_log_tpcds_compute_stats_v2'),
        get_profile_path(
          'impala_profile_log_tpcds_compute_stats_v2_default.expected.txt'))
    self._compare_profile_tool_output(['--profile_verbosity=extended'],
        get_profile_path('impala_profile_log_tpcds_compute_stats_v2'),
        get_profile_path(
          'impala_profile_log_tpcds_compute_stats_v2_extended.expected.txt'))

  def test_json_output(self):
    # Test JSON profiles with different verbosity levels.
    self._compare_profile_tool_output(['--profile_format=json'],
        get_profile_path('impala_profile_log_tpcds_compute_stats'),
        get_profile_path('impala_profile_log_tpcds_compute_stats.expected.json'))
    self._compare_profile_tool_output(['--profile_format=prettyjson'],
        get_profile_path('impala_profile_log_tpcds_compute_stats'),
        get_profile_path('impala_profile_log_tpcds_compute_stats.expected.pretty.json'))
    self._compare_profile_tool_output(['--profile_format=prettyjson',
        '--profile_verbosity=extended'],
        get_profile_path('impala_profile_log_tpcds_compute_stats'),
        get_profile_path(
          'impala_profile_log_tpcds_compute_stats_extended.expected.pretty.json'))

  def test_json_output_profile_v2(self):
    # Test JSON profiles with different verbosity levels.
    self._compare_profile_tool_output(['--profile_format=json'],
        get_profile_path('impala_profile_log_tpcds_compute_stats_v2'),
        get_profile_path('impala_profile_log_tpcds_compute_stats_v2.expected.json'))
    self._compare_profile_tool_output(['--profile_format=prettyjson',
        '--profile_verbosity=extended'],
        get_profile_path('impala_profile_log_tpcds_compute_stats_v2'),
        get_profile_path(
          'impala_profile_log_tpcds_compute_stats_v2_extended.expected.pretty.json'))

  def _compare_profile_tool_output(self, args, input_log, expected_output):
    """Run impala-profile-tool on input_log and compare it to the contents of the
    file at 'expected_output'."""
    with tempfile.NamedTemporaryFile() as tmp:
      with open(input_log, 'r') as f:
        check_call([os.path.join(IMPALA_HOME, "bin/run-binary.sh"),
                    os.path.join(impalad_basedir, 'util/impala-profile-tool')] + args,
                   stdin=f, stdout=tmp)
      check_call(['diff', expected_output, tmp.name])