Skip to content

OpenTelemetry distributed tracing for Numaflow #3377

@adarsh0728

Description

@adarsh0728

Summary

Add end-to-end OpenTelemetry distributed tracing to Numaflow so that each message's journey through a pipeline (source → map → sink) is captured as a connected trace. This enables teams to attribute latency to either the Numaflow platform or user-defined functions (UDFs).

Span Hierarchy: Sequential Siblings

upstream producer (optional — W3C traceparent or B3 headers)
└── numaflow.platform.process                          (root — accurate duration via AckHandle)
    ├── numaflow.monovertex.source.read                (sibling)
    ├── numaflow.monovertex.map                        (sibling)
    │   └── udf.map.process                            (UDF, child of map)
    └── numaflow.monovertex.sink.write                 (sibling)
        └── udf.sink.process                           (UDF, child of sink.write)

For Pipeline topology, ISB read/write spans are added between vertices, and span names use numaflow.pipeline.* prefix instead of numaflow.monovertex.*.

Key Design Decisions

  • W3C Trace Context as internal propagation format via sys_metadata["tracing"]
  • Dual-key sys_metadata: "tracing" holds platform.process context (shared parent), "tracing_udf" holds stage-specific context for UDF parent
  • platform.process span stored in AckHandle for accurate duration (source read → ack)
  • Per-message sink.write spans using OTel API directly for accurate duration across batch UDF calls
  • Upstream trace stitching: B3 and W3C headers from upstream producers (Kafka, EventBus) are extracted and linked

Sub-Issues

Span Attributes

All platform spans carry:

Attribute Description
messaging.system = numaflow OTel semantic convention
messaging.operation.name Stage name (source.read, map, sink.write)
messaging.message.id Message offset
numaflow.topology monovertex or pipeline
numaflow.pipeline.name Pipeline/MonoVertex name
numaflow.vertex.name Vertex name (Pipeline only)

References

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions