Commit 95932d2

Address PR feedback on stable-by-default OTEP
- Clarify that instrumentation stability contracts apply to telemetry the library itself produces, not telemetry from third-party libraries it subscribes to (addresses nrcventura's auto-instrumentation concern)
- Replace dissolved Configuration SIG reference with note that a new project is needed (flagged by trask and jack-berg)
- Remove anecdotal quotes from motivation section per trask's suggestions
1 parent cad6dfe commit 95932d2

1 file changed

Lines changed: 6 additions & 4 deletions

oteps/4813-stable-by-default.md

@@ -6,11 +6,11 @@ This OTEP defines goals and acceptance criteria for making OpenTelemetry product

 OpenTelemetry has grown into a massive ecosystem supporting four telemetry signals across dozen programming languages. This growth has come with complexity that creates real barriers to production adoption.

-Community feedback consistently identifies several pain points. Experimental features break production deployments—users report configuration breaking between minor versions, silent failures in telemetry pipelines, and unexpected performance regressions that only appear at scale. As one practitioner noted: "The silent failure policy of OTEL makes flames shoot out of the top of my head."
+Community feedback consistently identifies several pain points. Experimental features break production deployments—users report configuration breaking between minor versions, silent failures in telemetry pipelines, and unexpected performance regressions that only appear at scale.

 Semantic convention changes destroy existing dashboards. When conventions change, users must update instrumentation across their entire infrastructure while simultaneously updating dashboards, alerts, and downstream tooling. Organizations report significant resistance from developers asked to coordinate these changes.

-Many instrumentation libraries are stuck on pre-release because they depend on experimental semantic conventions, even when the instrumentation API surface itself is mature and battle-tested. The "batteries not included" philosophy means users must assemble many components before achieving basic functionality. Documentation assumes expertise, and newcomers describe the experience as "overwhelming" with "no discoverability." Auto-instrumentation can add significant resource consumption that only becomes apparent at scale, with reports of "four times the CPU usage" compared to simpler alternatives. Users evaluating OpenTelemetry for production deployment need confidence in CVE response timelines, dependency hygiene, and supply chain security—areas where commitments are not well documented.
+Many instrumentation libraries are stuck on pre-release because they depend on experimental semantic conventions, even when the instrumentation API surface itself is mature and battle-tested. The "batteries not included" philosophy means users must assemble many components before achieving basic functionality. Documentation assumes expertise, and newcomers describe the experience as "overwhelming" with "no discoverability." Auto-instrumentation can add significant resource consumption that only becomes apparent at scale. Users evaluating OpenTelemetry for production deployment need confidence in CVE response timelines, dependency hygiene, and supply chain security—areas where commitments are not well documented.

 These all stem from the same problem: OpenTelemetry's default configuration prioritizes feature completeness over production readiness. This OTEP establishes the goals and workstreams needed to address this.

@@ -24,7 +24,7 @@ This OTEP aims to achieve six outcomes:

 - Stability information should be visible and consistent. Users should be able to easily determine the stability status of any component before adopting it, and this information should be presented consistently across all OpenTelemetry projects.

-- Instrumentation should be able to stabilize based on production readiness. The bar for a stable instrumentation library should be whether the instrumentation code itself is production-ready, not whether the semantic conventions it depends on have been finalized. However, once an instrumentation library stabilizes, any breaking change to its telemetry output must be treated as a breaking change requiring a major version bump.
+- Instrumentation should be able to stabilize based on production readiness. The bar for a stable instrumentation library should be whether the instrumentation code itself is production-ready, not whether the semantic conventions it depends on have been finalized. However, once an instrumentation library stabilizes, any breaking change to its telemetry output must be treated as a breaking change requiring a major version bump. This stability guarantee applies to telemetry that the instrumentation library itself produces. When an instrumentation library subscribes to telemetry emitted natively by a third-party library (e.g., auto-instrumentation that captures spans produced by an HTTP client's own OTel integration), the content of that telemetry is governed by the third-party library's release cycle, not the instrumentation library's stability contract.

 - Performance characteristics should be known. Users should be able to understand the overhead implications of OpenTelemetry before deploying to production, and maintainers should be able to detect regressions between releases.

@@ -44,7 +44,7 @@ There is no consistent mechanism across OpenTelemetry for users to opt into expe

 This workstream should result in a consistent pattern for experimental feature opt-in that works across SDKs, the Collector, and instrumentation libraries.

-The Configuration SIG is the natural owner for this work.
+A new project will be needed to drive this work.

 ### Workstream 2: Federated Schema and Stability

@@ -96,6 +96,8 @@ Distributions that currently enable experimental components by default will need

 Instrumentation library maintainers will be able to stabilize based on the production readiness of their code, without waiting for all upstream semantic conventions to stabilize. Once stable, they own the stability of their telemetry output—any breaking change to emitted telemetry requires a major version bump. They will need to clearly document which semantic conventions they use and provide migration guidance when conventions evolve.

+Note that this stability contract covers telemetry the instrumentation library itself produces. In cases where auto-instrumentation subscribes to telemetry emitted natively by a third-party library—for example, an HTTP client that directly uses OpenTelemetry APIs—the telemetry content is controlled by that library, not by the instrumentation package. The instrumentation library's stability commitment in this case is to its subscription surface (which telemetry sources it captures and how it processes them), not to the content of telemetry it does not control.
+
 ### On Users

 Users will experience a more predictable default installation. Those who depend on experimental features will need to explicitly opt in, which may require configuration changes during the transition period.
