Skip to content

Reserve already_shutdown as error.type value for SDK processor processed metrics#3710

Open
cijothomas wants to merge 8 commits into
open-telemetry:mainfrom
cijothomas:clarify-sdk-processor-error-type
Open

Reserve already_shutdown as error.type value for SDK processor processed metrics#3710
cijothomas wants to merge 8 commits into
open-telemetry:mainfrom
cijothomas:clarify-sdk-processor-error-type

Conversation

@cijothomas
Copy link
Copy Markdown
Member

@cijothomas cijothomas commented May 11, 2026

Reserves already_shutdown as an error.type value on otel.sdk.processor.span.processed and otel.sdk.processor.log.processed, used when a processor drops items because it has already been shut down. Applies to any SDK processor (not just batching).

Why the processor is the right place

TracerProvider.Shutdown only requires new GetTracer(...) calls to return a no-op; it doesn't invalidate pre-existing Tracers or affect in-flight Spans (span.End() calls SpanProcessor.OnEnd directly). SpanProcessor.Shutdown explicitly anticipates this: "subsequent calls to OnStart/OnEnd/ForceFlush are not allowed. SDKs SHOULD ignore these calls gracefully." So the drop is observed at the processor and recorded as error.type=already_shutdown.

Note: GetTracer(...) after shutdown returns a no-op Tracer — spans never enter the SDK, so no metric fires. Tracked as a follow-up.

Related:

@github-actions github-actions Bot added enhancement New feature or request area:otel labels May 11, 2026
@cijothomas cijothomas changed the title Clarify closed-set semantics of error.type for SDK processor metrics Reserve queue_full and already_shutdown as error.type values for SDK processor processed metrics May 11, 2026
@github-actions
Copy link
Copy Markdown

This PR has been labeled as stale due to lack of activity. It will be automatically closed if there is no further activity over the next 7 days.

@github-actions github-actions Bot added the Stale label May 26, 2026
@cijothomas cijothomas marked this pull request as ready for review May 26, 2026 19:03
@cijothomas cijothomas requested review from a team as code owners May 26, 2026 19:03
@cijothomas cijothomas changed the title Reserve queue_full and already_shutdown as error.type values for SDK processor processed metrics Reserve already_shutdown as error.type value for SDK processor processed metrics May 26, 2026
@github-actions github-actions Bot removed the Stale label May 27, 2026
@pellared
Copy link
Copy Markdown
Member

CC @dashpole

@pellared
Copy link
Copy Markdown
Member

It would be nice to have a PR with a prototype. Wouldn't the SDK itself also need such metric as some spans/logs may not even start being processed given the SDK was shutdown?

@cijothomas
Copy link
Copy Markdown
Member Author

It would be nice to have a PR with a prototype. Wouldn't the SDK itself also need such metric as some spans/logs may not even start being processed given the SDK was shutdown?

open-telemetry/opentelemetry-rust#3514

Is there a concept of shutting down an SDK? We have shutdown on each providers (which bubble it through to processors and exporters..), but there is nothing called sdk.shutdown. Can you clarify what is meant by SDK shutdown?

@pellared
Copy link
Copy Markdown
Member

We have shutdown on each providers (which bubble it through to processors and exporters..), but there is nothing called sdk.shutdown. Can you clarify what is meant by SDK shutdown?

I meant provider shutdown.

@cijothomas
Copy link
Copy Markdown
Member Author

We have shutdown on each providers (which bubble it through to processors and exporters..), but there is nothing called sdk.shutdown. Can you clarify what is meant by SDK shutdown?

I meant provider shutdown.

Got it. This part is already covered by spec. If a provider is shutdown, then new getTracer() returns noop, but existing tracers would pass the spans/logs to the processor. And processor spec says it should gracefully ignore these calls. So processor would be the most apt component to emit this metric.

@cijothomas
Copy link
Copy Markdown
Member Author

We have shutdown on each providers (which bubble it through to processors and exporters..), but there is nothing called sdk.shutdown. Can you clarify what is meant by SDK shutdown?

I meant provider shutdown.

Got it. This part is already covered by spec. If a provider is shutdown, then new getTracer() returns noop, but existing tracers would pass the spans/logs to the processor. And processor spec says it should gracefully ignore these calls. So processor would be the most apt component to emit this metric.

There is another gap you helped me realize! if provider.GetTracer(...) is called after shutdown — spec says return a no-op Tracer, so spans never enter the SDK and no existing metric fires. A good fix to bring visibility into that is probably a separate, small follow-up: a counter on the provider (e.g., otel.sdk.tracer.created with error.type=already_shutdown) so users get a signal when GetTracer returns no-op post-shutdown. Its good to be followed up, I'll open a tracking issue.

Copy link
Copy Markdown
Contributor

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this error type seems reasonable. I'm a bit concerned that our description for the error.type field in sdk telemetry will be really long if we enumerate all of the errors. But we can cross that bridge when we get there.

Comment thread model/otel/metrics.yaml Outdated
A low-cardinality description of the failure reason. SDK Batching Log Record Processors MUST use `queue_full` for log records dropped due to a full queue.
examples: ["queue_full"]
A low-cardinality description of the failure reason.
SDK Batching Log Record Processors MUST use `queue_full` for log records dropped due to a full queue.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
SDK Batching Log Record Processors MUST use `queue_full` for log records dropped due to a full queue.
note:
SDK Batching Log Record Processors MUST use `queue_full` for log records dropped due to a full queue.

It's getting long, let's keep brief short and move details to note.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to move the note to the overall metric level note. Please see if this is fine!

@lmolkova lmolkova moved this from Untriaged to Needs More Approval in Semantic Conventions Triage May 28, 2026
@lmolkova lmolkova moved this from Needs More Approval to Ready to be Merged in Semantic Conventions Triage May 28, 2026
@cijothomas
Copy link
Copy Markdown
Member Author

@JonasKunz Could you review?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:otel enhancement New feature or request

Projects

Status: Ready to be Merged

Development

Successfully merging this pull request may close these issues.

4 participants