OpenTelemetry has various Go packages split across several Go modules that often need to be carefully upgraded together. And in particular, we are using the "semconv" package in conjunction with the OpenTelemetry SDK's "resource" package in a way that requires that they both agree on which version of the OpenTelemetry Semantic Conventions are being followed.
To help avoid "dependency hell" situations when upgrading, this centralizes all of our direct calls into the OpenTelemetry SDK and tracing API into packages under internal/tracing, by exposing a few thin wrapper functions that other packages can use to access the same functionality indirectly. We only use a relatively small subset of the OpenTelemetry library surface area, so we don't need too many of these reexports and they should not represent a significant additional maintenance burden.
For the semconv and resource interaction in particular this also factors that out into a separate helper function with a unit test, so we should notice quickly whenever they become misaligned. This complements the end-to-end test previously added in opentofu/opentofu#3447 to give us faster feedback about this particular problem, while the end-to-end test has the broader scope of making sure there aren't any errors at all when initializing OpenTelemetry tracing.
Finally, this also replaces the constants we previously had in package traceaddrs with functions that return attribute.KeyValue values directly. This matches the API style used by the OpenTelemetry semconv packages, and makes the calls to these helpers from elsewhere in the system a little more concise.
Signed-off-by: Martin Atkins <mart@degeneration.co.uk>
OpenTofu Tracing Guide
This document describes how to use and implement tracing in OpenTofu Core using OpenTelemetry.
Background information on OpenTofu's tracing implementation is available in the OpenTelemetry Tracing RFC.
Warning
If you change which version of go.opentelemetry.io/otel/sdk we have selected in our go.mod, you must make sure that internal/tracing/traceattrs/semconv.go imports the same subpackage of go.opentelemetry.io/otel/semconv/* that is used by the selected version of go.opentelemetry.io/otel/sdk.
This is important because our tracing setup uses a blend of directly-constructed semconv attributes and attributes chosen indirectly through the resource package, and they must all be using the same version of the semantic conventions schema or there will be a "conflicting Schema URL" error at runtime.
(Problems of this sort should be detected both by a unit test in internal/tracing/traceattrs and an end-to-end test that executes OpenTofu with tracing enabled.)
Overview
OpenTofu provides distributed tracing capabilities via OpenTelemetry to help end users understand the execution flow and performance characteristics of OpenTofu operations. Tracing is particularly useful for:
- Debugging performance issues (e.g., "Why is my plan taking so long?")
- Understanding time spent in different operations
- Visualizing the execution flow across providers and modules
- Diagnosing issues in CI/CD pipelines
Tracing in OpenTofu is strictly opt-in and disabled by default. It's designed to have minimal overhead when disabled and to provide valuable insights when enabled.
Important
OpenTofu's tracing functionality refers only to OpenTelemetry traces for local debugging and analysis. No telemetry or usage data is sent to external servers, and no data leaves your environment unless you explicitly configure an external collector.
Enabling Tracing
To enable tracing in OpenTofu:
- Set the environment variable OTEL_TRACES_EXPORTER=otlp
- Configure the OpenTelemetry exporter using standard OpenTelemetry environment variables
Example configuration for a local Jaeger collector:
export OTEL_TRACES_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_INSECURE=true
For a complete list of configuration options, refer to the OpenTelemetry Documentation.
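As a rough illustration of the opt-in behavior, the following sketch checks the same environment variable. It assumes that tracing counts as enabled only when `OTEL_TRACES_EXPORTER` is exactly `otlp`; the `tracingEnabled` function is hypothetical, and OpenTofu's real initialization code may behave differently.

```go
package main

import (
	"fmt"
	"os"
)

// tracingEnabled is a hypothetical helper. It takes the lookup function
// as a parameter so the decision can be tested without touching the real
// process environment.
func tracingEnabled(getenv func(string) string) bool {
	return getenv("OTEL_TRACES_EXPORTER") == "otlp"
}

func main() {
	if tracingEnabled(os.Getenv) {
		fmt.Println("tracing enabled; endpoint:", os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT"))
	} else {
		fmt.Println("tracing disabled (the default)")
	}
}
```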
Quick Start with Jaeger
To quickly spin up a local Jaeger instance with OTLP support:
docker run -d --rm --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
-p 5778:5778 \
-p 9411:9411 \
jaegertracing/jaeger:2.5.0
Then configure OpenTofu as shown above and access the Jaeger UI at http://localhost:16686.
Adding Tracing to OpenTofu Code
Note
For Contributors: When adding tracing to OpenTofu, remember that the primary audience is end users who need to understand performance, not OpenTofu developers. Add spans sparingly to avoid polluting traces with too much detail.
Basic Span Creation
import (
"context"
"github.com/opentofu/opentofu/internal/tracing"
"github.com/opentofu/opentofu/internal/tracing/traceattrs"
)
func SomeFunction(ctx context.Context) error {
// Create a new span
ctx, span := tracing.Tracer().Start(ctx, "Human readable operation name",
tracing.SpanAttributes(
traceattrs.String("opentofu.some_attribute", "value"),
),
)
defer span.End()
// Optionally add additional attributes after the span is created, if
// they only need to appear in certain cases.
span.SetAttributes(traceattrs.String("opentofu.some_other_attribute", "value"))
// Use the more specific attribute-construction helpers from package
// traceattrs where they are relevant, to ensure we follow consistent
// semantic conventions for cross-cutting concerns.
span.SetAttributes(traceattrs.OpenTofuProviderAddress("hashicorp/aws"))
// Your function logic here. doWork is a hypothetical placeholder.
err := doWork(ctx)
// If an error occurs, record it on the span before returning.
if err != nil {
tracing.SetSpanError(span, err)
return err
}
return nil
}
OpenTelemetry has many different packages spread across a variety of different Go modules, and those different modules often need to be upgraded together to ensure consistent behavior and avoid errors at runtime.
Therefore we prefer to directly import go.opentelemetry.io/otel/* packages only from our packages under internal/tracing, and then reexport certain functions from our own packages so that we can manage all of the OpenTelemetry dependencies in a centralized place to minimize "dependency hell" problems when upgrading. Packages under go.opentelemetry.io/contrib/instrumentation/* are an exception because they tend to be more tightly-coupled to whatever they are instrumenting than to the other OpenTelemetry packages, and so it's better to import those from the same file that's importing whatever other package the instrumentation is being applied to.
Warning
Don't import go.opentelemetry.io/otel/semconv/* packages from anywhere except internal/tracing/traceattrs/semconv.go!
If you want to use standard OpenTelemetry semantic conventions from other packages, use them indirectly through reexports in package traceattrs instead, so we can make sure there's only one file in OpenTofu deciding which version of semconv we are currently depending on.
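The reexport pattern described above can be sketched without any OpenTelemetry imports. In this illustration `KeyValue` is a simplified stand-in for the SDK's `attribute.KeyValue`, and the attribute key `opentofu.provider.address` is a hypothetical name chosen for the example; the real helpers live in internal/tracing/traceattrs and may differ.

```go
package main

import "fmt"

// KeyValue is a simplified stand-in for attribute.KeyValue.
type KeyValue struct {
	Key   string
	Value string
}

// String mirrors the shape of a generic attribute helper: callers pass a
// key and a value and never import the underlying library themselves.
func String(key, value string) KeyValue {
	return KeyValue{Key: key, Value: value}
}

// OpenTofuProviderAddress fixes the attribute key in one place so every
// caller uses the same name. The key used here is an assumption.
func OpenTofuProviderAddress(addr string) KeyValue {
	return String("opentofu.provider.address", addr)
}

func main() {
	attr := OpenTofuProviderAddress("hashicorp/aws")
	fmt.Printf("%s=%s\n", attr.Key, attr.Value)
}
```

Because callers only see the wrapper functions, changing the underlying library version or attribute key touches one file instead of every call site.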
Tracing Conventions
Span Naming
- Use human-readable, action-oriented names that describe operations from a user perspective
- Prefer names like "Provider installation" over internal function names like "InstallProvider"
- Use consistent terminology from the OpenTofu CLI and documentation
- Span names should represent UX-level concepts, not internal code structure
Attributes
- Prefer standard OpenTelemetry semantic conventions where applicable, using helper functions from internal/tracing/traceattrs.
- Use OpenTofu-prefixed functions in internal/tracing/traceattrs for OpenTofu-specific cross-cutting concerns.
- It's okay to use one-off inline strings for attribute names specific to a single span, but make sure to still follow the OpenTelemetry attribute naming conventions and use the opentofu. prefix for anything that is not a standardized semantic convention.
- If a particular subsystem of OpenTofu has some repeated conventions for attribute names, consider creating unexported string constants or attribute construction helper functions in the same package to centralize those naming conventions.
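The naming guidance for one-off attributes can be approximated by a small check. This is a simplified sketch, not an official validator: the `validCustomAttrName` function is hypothetical, and the pattern only captures the `opentofu.` prefix plus lowercase, dot-separated, snake_case components.

```go
package main

import (
	"fmt"
	"regexp"
)

// customAttrName approximates the naming rules: the "opentofu" namespace
// followed by one or more dot-separated, lowercase snake_case components.
var customAttrName = regexp.MustCompile(`^opentofu(\.[a-z][a-z0-9_]*)+$`)

// validCustomAttrName reports whether name looks like an acceptable
// OpenTofu-specific attribute name under the simplified rules above.
func validCustomAttrName(name string) bool {
	return customAttrName.MatchString(name)
}

func main() {
	names := []string{
		"opentofu.some_attribute",
		"some_attribute",         // missing the opentofu. prefix
		"opentofu.SomeAttribute", // not lowercase snake_case
	}
	for _, name := range names {
		fmt.Printf("%-26s valid=%t\n", name, validCustomAttrName(name))
	}
}
```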
Error Handling
Use the tracing.SetSpanError helper to consistently record errors:
if err != nil {
tracing.SetSpanError(span, err)
return err
}
This helper supports various error types including standard errors, strings, and OpenTofu diagnostics.
Instrumentation Guidelines
- Focus on Key Operations: Instrument high-level operations that are meaningful to end users rather than every internal function.
- Include Valuable Context: Add attributes that help identify resources, modules, or operations.
- Respect Performance: Avoid expensive computations solely for tracing.
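One way to respect the performance guideline is to skip expensive attribute computation when a span is not being recorded. The sketch below assumes a span type that exposes an `IsRecording` method, as OpenTelemetry spans do; the `recorder` interface, `fakeSpan` type, and `attrIfRecording` function are hypothetical and declared locally so the example needs no third-party imports.

```go
package main

import "fmt"

// recorder mirrors the IsRecording method found on OpenTelemetry spans.
type recorder interface {
	IsRecording() bool
}

// fakeSpan is a test double standing in for a real span.
type fakeSpan struct{ recording bool }

func (s fakeSpan) IsRecording() bool { return s.recording }

// attrIfRecording calls the potentially expensive compute function only
// when the span is actually being recorded.
func attrIfRecording(span recorder, compute func() string) (string, bool) {
	if !span.IsRecording() {
		return "", false
	}
	return compute(), true
}

func main() {
	calls := 0
	summarize := func() string {
		calls++ // stands in for an expensive computation
		return "42 resources"
	}

	// With tracing disabled the computation is skipped entirely.
	attrIfRecording(fakeSpan{recording: false}, summarize)
	value, _ := attrIfRecording(fakeSpan{recording: true}, summarize)
	fmt.Println(value, "computed", calls, "time(s)")
}
```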