General Availability for OpenTelemetry by tedsuo · Pull Request #3452 · open-telemetry/community

tedsuo · 2026-05-19T06:56:55Z

This meta-project is a re-organization of the work presented in the Stable-by-default OTEP. Since almost none of the required work involves changes to the spec, I'm moving it from an OTEP to the community repo, which is where we do the rest of our project planning.

The goal of this PR is to identify the remaining workstreams that are needed to complete delivery of the initial set of goals for OpenTelemetry: a telemetry system that delivers tracing, metrics, and logs from the most common software libraries and cloud infrastructures.

The term "stable by default" seemed to be a little confusing to some people. So we are trying a different term: "general availability" (GA). This is the term we used to use when describing OpenTelemetry as complete, so it seems appropriate to bring it back for this use case.

Currently, this is a first draft. If we agree that this scope of work is correct, we can merge this PR and move on to defining each individual workstream in more detail, as separate documents.

cijothomas · 2026-05-19T15:11:03Z

+# OpenTelemetry GA: Completing our initial scope of work
+
+This project file identifies the remaining workstreams needed to complete delivery of the
+initial scope of work for OpenTelemetry: A telemetry system that delivers tracing,


The title "OpenTelemetry GA" is ambiguous and likely to confuse the community. Many language SDKs and instrumentations have been declared v1.0/stable for years. Do we mean OpenTelemetry as a whole here? Let's clarify to avoid this confusion.

Clarify how? Can you suggest a new opening sentence?

Some wording suggestion:

This document identifies the remaining workstreams needed for the OpenTelemetry project as a whole to be considered generally available — i.e. an end-to-end platform where users can install, deploy, and operate tracing, metrics, and logs at scale using stable components.

Note: many individual components (language APIs, SDKs, and a growing set of instrumentation libraries) are already at v1.0 today. "Project GA" here is a higher-level milestone about the platform as a whole, not a per-component status.

cijothomas · 2026-05-19T15:12:24Z

+technically still in beta, but are recommended to be used in production. This is confusing, as
+OpenTelemetry also has components marked 0.X that are genuinely experimental and should not be
+used in production. Additionally, some end user organizations have rules that prohibit them
+from deploying 0.X software to production.


Let's also acknowledge that there are many components that are declared 1.0 stable.

In Go, version numbers carry special meaning (https://go.dev/doc/modules/version-numbers): moving from v0 to v1 carries stability requirements, and moving past v1 to v2 and beyond will take some alignment.

A module developer should increment this number past v1 only when necessary because the version upgrade represents significant disruption for developers whose code uses function in the upgraded module. This disruption includes backward-incompatible changes to the public API, as well as the need for developers using the module to update the package path wherever they import packages from the module.

So the problem for the Collector is that we conflate GA and stability with API stability, which involves a lot of small details.

cijothomas · 2026-05-19T15:13:30Z

+maintaining instrumentation.  The SemConv Tooling SIG is in charge of this project.
+
+* Weaver
+* AI coding


"AI coding" is a bit vague. Could we add a one-liner to clarify what we mean?

Definitely need more details here, but I want the SemConv Tooling SIG to take a look at it before I put too many words in their mouth 🙂

cijothomas · 2026-05-19T15:14:42Z

+* Move away from the “community contrib” model for critical instrumentation packages.
+* Deploy the new SemConv tooling across all language ecosystems.
+* Badges and other forms of recognition
+* Native instrumentation push to move instrumentation out of OpenTelemetry.


Native instrumentation: do we mean libraries/frameworks natively taking a dependency on the OpenTelemetry API and instrumenting themselves, without the need for us to make an instrumentation library package?

Yes. There are two directions a community/contrib instrumentation package could gain more support. One is that a trusted group within OpenTelemetry maintains the package. The second is that the library itself includes the instrumentation natively, so there is no need for OpenTelemetry to maintain a separate package.

In both cases, we've identified a lack of tooling as a barrier. It's difficult to write instrumentation that matches the semantic conventions without making mistakes. So, once we have better tooling, we have an opportunity to try to upstream instrumentation. This would be preferable to maintaining a separate instrumentation package.
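To make the tooling point concrete, here is a minimal sketch of the kind of check such tooling automates: comparing the attributes an instrumentation emits against an expected set. The expected set is a hypothetical two-attribute subset of the HTTP semantic conventions, and `validate_attributes` is an illustrative helper, not Weaver's API or registry format.

```python
# Hypothetical subset of the HTTP semantic conventions: attribute name -> expected type.
# Real tooling reads full registry definitions; this is only an illustration.
EXPECTED_ATTRIBUTES: dict[str, type] = {
    "http.request.method": str,
    "http.response.status_code": int,
}


def validate_attributes(emitted: dict[str, object], expected: dict[str, type]) -> list[str]:
    """Return human-readable problems; an empty list means the attributes match."""
    problems: list[str] = []
    # Check that every expected attribute is present with the right type.
    for name, expected_type in expected.items():
        if name not in emitted:
            problems.append(f"missing attribute: {name}")
        elif not isinstance(emitted[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}, "
                f"got {type(emitted[name]).__name__}"
            )
    # Flag attributes that are not part of the expected set (e.g. outdated names).
    for name in emitted:
        if name not in expected:
            problems.append(f"unknown attribute: {name}")
    return problems
```

For example, instrumentation still emitting the older `http.method` name with a string status code would be reported as one missing attribute, one type mismatch, and one unknown attribute: exactly the kind of mistake that is easy to make by hand.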

I'm not sure about framing tooling (or the lack of it) as the main barrier. If I'm a library owner deciding whether to take a direct dependency on the OTel API, my decision tree looks roughly like this:

1. API stability & long-term support: Is the OTel API in my language stable, and is there a guarantee of support for at least ~3 years? I can't take a dependency on something that might force me to react to breaking changes every year.

2. Performance: Does depending on the OTel API regress my library on the no-op path? Some cost when the SDK is enabled is acceptable; cost when telemetry is disabled is not, because it directly slows down every user of my library, whether they use OTel or not.

3. Semantic convention stability: Is the semantic convention I'm being asked to emit stable? If it churns, I either ship breaking changes downstream or carry compatibility layers/opt-in-out flags forever.

4. Tooling / validation: Then, is there tooling to validate that what I produce matches what I'm supposed to produce?

The current wording treats (4) as the primary blocker, but IMHO (1)–(3) come first for any library owner. I also don't recall library authors citing tooling as their blocker in prior discussions; the concerns I've consistently heard are stability and performance. If the SemConv Tooling SIG has data showing otherwise, it would be great to link it. (I only have anecdotal evidence, so I'm happy to correct my position once I learn more.)

For this workstream to actually deliver native instrumentation at scale, I think we need explicit commitments on (1)–(3) — API LTS guarantees, a no-op/hot-path performance commitment, and semconv stability — alongside the tooling work.
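The no-op concern in (2) can be illustrated with a minimal sketch, assuming the library guards attribute computation behind a recording check. `NoOpSpan`, `RecordingSpan`, `QueryDescriber`, and `run_query` are hypothetical stand-ins, not the OpenTelemetry API, although the OpenTelemetry specification does define a comparable `IsRecording` operation on spans.

```python
class NoOpSpan:
    """Hypothetical span returned when no SDK is installed: records nothing."""

    def is_recording(self) -> bool:
        return False

    def set_attribute(self, key: str, value: object) -> None:
        pass  # intentionally does nothing


class RecordingSpan:
    """Hypothetical span returned when an SDK is installed and recording."""

    def __init__(self) -> None:
        self.attributes: dict[str, object] = {}

    def is_recording(self) -> bool:
        return True

    def set_attribute(self, key: str, value: object) -> None:
        self.attributes[key] = value


class QueryDescriber:
    """Stands in for an expensive attribute computation; counts how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def operation_name(self, sql: str) -> str:
        self.calls += 1
        return sql.split()[0].upper()


def run_query(span, describer: QueryDescriber, sql: str) -> str:
    # Guard: when telemetry is disabled, skip computing attributes entirely,
    # so the no-op path costs one method call and a branch.
    if span.is_recording():
        span.set_attribute("db.operation.name", describer.operation_name(sql))
    return "ok"  # placeholder for the library's real work
```

With a `NoOpSpan`, the describer is never invoked; with a `RecordingSpan`, it runs once and the attribute is set. A library owner evaluating the dependency would want a commitment that the API keeps the disabled path this cheap.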

cijothomas · 2026-05-19T15:19:53Z

+The following projects are seen as important for the long term success of OpenTelemetry, but
+not actually necessary to deliver stable components that are deployable at scale.
+
+### Performance / Benchmarking


Performance is one of the key things we list in our mission, so listing it as out-of-scope reads a bit oddly. If this is a bandwidth call rather than a position change, could we say so explicitly?

@martincostello and I had discussed some work to help with this effort; hopefully we can help solve the bandwidth problem to some extent.

Yes I'm only putting it as out of scope because I feel like we can't ask maintainers to focus on ten things at once. We want maintainers to focus on instrumentation, packaging, and declarative config because those things are necessary for OpenTelemetry to be stable and manageable at scale.

Improving performance is important, but it is not necessary for stability or deployability. So I want to say that at this time, improving performance is something optional that individual SIGs can work on, but not something we are trying to standardize across all implementations by producing a set of standard benchmarks or something like that.

If, in the same way that the SemConv Tooling SIG has been working on tools for managing instrumentation, a group wanted to work on performance benchmarks, they could propose them to the community. In the past, this effort has always died because we have not found a useful way to define universal performance benchmarks, and SIGs have instead made better progress working on performance on a case-by-case basis, based on user complaints.

open-telemetry/opentelemetry-specification#5118 opens an OTEP to track cross-implementation performance centrally. It includes prototypes as well.

cijothomas · 2026-05-19T15:23:02Z

+are always learning. It's reasonable that we may want to revisit these designs, either to
+incorporate new developments within the industry, or to address fundamental performance issues.
+
+Let's finish shipping v1.0 first, before distracting ourselves with v2.0.


Similar to my comment on the opening line: this also assumes OTel is not v1.0 yet.
https://opentelemetry.io/status/#language-apis--sdks paints a different picture. Let's clarify that we mean OTel as a whole, not pieces of it, to avoid this confusion.

OTel as a whole is not "v1.0," that chart is too limited as it does not include the work listed in this document: instrumentation stability, packaging, deployment, and management at scale are still not completed.

While it is possible to install an SDK that is v1.0, you cannot reasonably instrument a real world application using only stable components. Today, if we shipped a "stable" distribution of OpenTelemetry in any language, that contained only components that are v1.0 or greater, it would contain no instrumentation and thus be completely useless in a real world scenario. A stable SDK simply isn't enough.

Furthermore, there's currently no way for an organization to deploy that stable distribution at scale, except for a couple of languages in a single environment: Kubernetes. These two hurdles, stability and deployability, are huge gaps, which is why they were flagged as part of the due diligence process done for OpenTelemetry's graduation.

cijothomas · 2026-05-19T15:26:19Z

+
+### Stability: Instrumentation
+
+The biggest barrier to general availability is unstable instrumentation.


Let's also list who owns this piece. I think we are moving to a model where the maintainers pick a set of key instrumentations and treat them like the core API/SDK? That was my understanding from the wording "move away from the community contrib model"!
(💯 agree with the change, comment is about making the ownership clear)

I completely agree. Actually, figuring out who owns instrumentation is the biggest challenge in all of this work, in my opinion. Currently, no one owns it. And we have no one available to own it: it's unfair to assume that the current SDK maintainers can also take on the work of maintaining instrumentation, even with better tooling.

So, an important part of this workstream is figuring out a new model that offers some kind of reward for organizations that put in the work needed to maintain all of this instrumentation.

> Actually, figuring out who owns instrumentation is the biggest challenge in all of this work, in my opinion. Currently, no one owns it.

Sharing my observations:

In my opinion, it is fair to ask each language's core repo maintainers/approvers to identify a list of libraries they deem important for their ecosystem and make themselves the owners of them. For years, OTel .NET maintained instrumentation libraries in the core repo itself. It also helps with validating things end to end.
I'm currently doing this in OTel Rust: we own the instrumentation library for the most important library ourselves (of course, there are other owners as well).

Rewarding organizations is a good idea, but we need a concrete plan.

tigrannajaryan · 2026-05-19T15:55:43Z

+
+As part of the [due diligence](https://github.com/cncf/toc/blob/main/projects/open-telemetry/otel-graduation-dd.md)
+for OpenTelemetry's graduation, a scope of work was identified as required in order
+for OpenTelemetry to be considered "stable" or "generally available."


How do we define "stable" for this project?

Does “stable” mean “doesn’t crash at runtime” or “doesn’t introduce breaking changes between releases”? Or is it both?

It would be nice to explicitly define what sort of stability goals we have.

My intention is that stable means both. Once something is v1.0:

* It is ready for production, meaning it doesn't crash or cause harm.

* It is supported, meaning we will fix bugs and issue security patches without having them mixed in with breaking changes.

TylerHelmuth · 2026-05-19T16:25:26Z

+
+### Stability: Collector v1.0
+
+Managed by Collector SIG, the OpenTelemetry Collector needs to complete its roadmap for v1.0.


Many Collector components' 1.0 releases have historically been blocked on stable semantic conventions, so that no breaking changes to names occur once a component is tagged 1.0. I recall a recent discussion about making it easier to update semconv in 1.0 components. Is that still in effect for the GA concept this document describes? It would be good to include a link to that decision somewhere, since it affects a lot of components (Collector or instrumentation).

Yes, we decided that we were being overly cautious by including data stability with v1.0. While changing the data should count as a breaking change, there's nothing wrong with issuing breaking data changes as a v2.0. Combining the two concepts as a requirement for v1.0 left us with no way of indicating that the code is safe to run in production.

But you're right, I don't think that this decision has been recorded in the spec. I believe this is the doc that needs to be updated: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/telemetry-stability.md

TylerHelmuth · 2026-05-19T16:26:18Z

+
+* The need for pod attribution and other manual configuration requirements that interfere
+  with deploying OpenTelemetry at scale.
+* All major languages supported.


Is this bullet specifically referring to auto-instrumentation? If so we should be explicit.

It is, the intention is that all languages that have an auto-instrumentation mechanism can be installed via both the Operator and via system packaging.

Suggested change

-* All major languages supported.
+* Auto-instrumentation supported for all major languages.

Do we have an existing definition for major language or is that distinction left up to the operator/packaging sig?

We usually mean "Java, .NET, NodeJS, Python, Ruby, PHP", as those are the popular languages that also have an auto-installation path. Maybe there's a better name than "major languages", which could be seen as rude. Go could also be on the list if OBI-based auto-instrumentation gets to a good spot. I'm not sure what is required for other languages such as Erlang.

TylerHelmuth · 2026-05-19T16:27:40Z

+* Language distributions for SDKs, plugins, and instrumentation
+* Declarative configuration for managing instrumentation and stability
+
+### Deployment: Kubernetes Operator v1.0


The Operator is the go-to k8s solution for many users, but many other users prefer to use the helm charts to install OpenTelemetry in Kubernetes. How do they fit into the GA picture?

My understanding is that our current approach is for the OTel Helm Charts to just install the Operator. Am I off base in that assumption? Do we need a better solution than that?

The opentelemetry-operator chart is one way to install the operator, and its 1.0 could be molded into this effort. But separately there is the opentelemetry-collector chart which installs the collector directly.

We have purposefully never taken a stance like "you should use the OpenTelemetry Operator to install a Collector in Kubernetes", because it's not accurate to always suggest the Operator.

I think it would make sense to handle the Helm charts via the respective efforts, though:

* opentelemetry-operator 1.0 with the Operator effort
* opentelemetry-collector 1.0 with the Collector effort

But I think it is dangerous to claim for GA that only the Operator needs 1.0, as that would be taking a strong stance that the Operator is the Official OpenTelemetry Way to install a Collector on Kubernetes.

If the GC/TC wants to take that stance, I think it's worth discussing further.

Well, if the opentelemetry-collector helm chart only installs the Collector, that's pretty limiting, isn't it? I'm definitely not opposed to listing it. But I'd like to get a better understanding about whether or not we should be expanding our helm offerings beyond these two options.

thompson-tomo · 2026-05-20T02:21:41Z

+NOTE: This is a meta-project. It describes a set of workstreams at a high level, so that we can
+agree upon the overall scope of work needed for OpenTelemetry to be considered GA or "generally
+available." For that reason, it is missing some sections that would normally be in a project
+file. In the future we should develop better road-mapping tools, but this is what we have today.


Suggested change

-NOTE: This is a meta-project. It describes a set of workstreams at a high level, so that we can
-agree upon the overall scope of work needed for OpenTelemetry to be considered GA or "generally
-available." For that reason, it is missing some sections that would normally be in a project
-file. In the future we should develop better road-mapping tools, but this is what we have today.
+> [!NOTE]
+> This is a meta-project. It describes a set of workstreams at a high level, so that we can
+> agree upon the overall scope of work needed for OpenTelemetry to be considered GA or "generally
+> available." For that reason, it is missing some sections that would normally be in a project
+> file. In the future we should develop better road-mapping tools, but this is what we have today.

cijothomas · 2026-05-20T03:18:32Z

+* Deployment: Packaging v1.0
+* Deployment: Kubernetes Operator v1.0
+* Security
+* Roadmaps & Project Management


Question: should OpenTelemetry's own self-observability be an explicit workstream in this GA plan?

If we're calling OTel GA (production-ready and deployable at scale), operators need to be able to answer basic questions like "Is my SDK/exporter dropping data?" or "Is my Collector silently failing to export?" Today, in most languages and components, you can't easily tell:

* The semantic conventions for OTel's self-instrumentation are still experimental.
* Very few SDKs/components implement them end-to-end.
* Silent data loss has come up repeatedly in issues, SIG discussions, and customer complaints.
* Users often discover missing telemetry days later, with no signal from OTel itself.

To operate OTel at scale, we need the system to tell you when it's unhealthy. Is this intentionally out of scope, covered implicitly under another workstream (Collector v1.0? Instrumentation?), or is it a gap worth calling out explicitly?

A concrete example I have always noticed: the OTel SDK's batch processor (the default with the OTLP exporter) drops telemetry when the exporter cannot keep up, and there is no standard way for an operator to know about this. The fix is to:

1. Stabilize the semantic conventions for internal telemetry/self-instrumentation.
2. Make sure all SDKs implement them. (I think only Java and Go implement this. I recently opened a PR to add it to Rust.)
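A minimal sketch of the failure mode described above, assuming a bounded queue in front of a slow exporter that drops new items when full. This is a simplification of a batch processor; `BoundedExportQueue` and `dropped_count` are hypothetical names, not any SDK's API. The point is that a drop counter may exist internally, but it only helps operators if it is exported as standardized self-telemetry.

```python
from collections import deque


class BoundedExportQueue:
    """Hypothetical bounded queue in front of a slow exporter.

    When the queue is full, new items are dropped instead of blocking the
    application, and each drop is counted so it could be reported.
    """

    def __init__(self, max_queue_size: int, max_batch_size: int) -> None:
        self.max_queue_size = max_queue_size
        self.max_batch_size = max_batch_size
        self._queue: deque = deque()
        self.dropped_count = 0  # the signal an operator needs to see

    def enqueue(self, item: object) -> bool:
        """Accept the item if there is room; otherwise drop it and count the drop."""
        if len(self._queue) >= self.max_queue_size:
            self.dropped_count += 1  # silent unless exported as self-telemetry
            return False
        self._queue.append(item)
        return True

    def next_batch(self) -> list:
        """Remove and return up to max_batch_size items for export."""
        batch = []
        while self._queue and len(batch) < self.max_batch_size:
            batch.append(self._queue.popleft())
        return batch
```

With `max_queue_size=3`, enqueuing five items while the exporter is stalled accepts the first three and drops the last two; nothing in the application's own output reveals the loss unless `dropped_count` is surfaced.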

I think that's reasonable. OpenTelemetry definitely isn't finished if we are still missing critical forms of self-observability. Possibly related is OpAMP management and health reporting for the SDKs.

reyang · 2026-05-20T15:22:15Z

I have a couple meta comments:

I feel the use of "General Availability" is confusing and misleading. Take Kubernetes as an example, there is no such thing as "Kubernetes as a project has reached General Availability", the GA term is used for specific features https://kubernetes.io/search/?q=GA.
There are many components which are already at v1.0 or even v2.0, if we put something like "this means finalizing the v1.0 roadmap for every component..." it would surprise many users.
The term "GA" has been used several times:
- https://opentelemetry.io/search/?q=GA#gsc.tab=0&gsc.q=GA&gsc.page=1
- In the spec, we already said:


tedsuo · 2026-05-26T04:43:08Z

@reyang I think you're correct about watching our language about v1.0 vs v2.0, I use v2.0 casually in one place and it isn't appropriate. I'll add some clarification.

As far as the term "General Availability" vs "Stable by Default" or something else... I don't know how much I want to bikeshed that 😅. But I will say, we have always used the term "General Availability" to mean exactly this: a component is stable, v1.0, ready for production and supported. "Stable" in the spec matches this meaning. The issue with most of the components in this roadmap is that they are "de facto" stable, meaning we tell people to run them in production but we haven't marked them as stable or issued a v1.0 for them.

So, given that no term is perfect, and given that we have always used GA in this manner, I'd like to stick with it. If people want to bikeshed, come up with a better term, and get buy-in from the community, maybe Slack is a better place to do that. I promise to change it if it looks like there's consensus, but if you don't mind, I'd like to keep the comment threads here focused on the content.

First draft of OTel GA

42c4dc5

tedsuo requested review from a team, alolita, austinlparker, jpkrohling, maryliag, mtwo, mx-psi, svrnm and trask as code owners May 19, 2026 06:56

tedsuo added the area/project-proposal Submitting a filled out project template label May 19, 2026

github-actions Bot added the triage:tc-inbox label May 19, 2026

cijothomas reviewed May 19, 2026


tedsuo added 2 commits May 19, 2026 08:46

list workstreams

c93b4b0

list packaging

5fca455

tigrannajaryan reviewed May 19, 2026


TylerHelmuth reviewed May 19, 2026


tedsuo mentioned this pull request May 19, 2026

OTEP: Stable by Default open-telemetry/opentelemetry-specification#4813


thompson-tomo reviewed May 20, 2026


cijothomas reviewed May 20, 2026


mx-psi reviewed May 22, 2026


Comment thread projects/otel-ga.md Outdated

Add Collector v1 rodamap

23fc839

Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>


