
Standalone mode for OpAMP k8s bridge #4986

Draft
owais wants to merge 3 commits into open-telemetry:main from owais:standalone-mode

owais (Contributor) commented Apr 22, 2026

Description:
This PR implements a basic standalone mode for the OpAMP bridge that works directly on collector configmap(s) instead of operator CRDs, so it does not depend on the operator. The existing ConfigApplier interface already covers almost everything we need, with some minor tweaks. The change is mostly a new implementation of the ConfigApplier interface that works directly with configmaps.

It watches configmaps with a specific label and reports them to the OpAMP server. Users can then remotely manage that config from the server.
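
As a rough illustration of the label-based selection described above, the sketch below filters a list of configmaps by the managed-by label from this PR's documentation. The `ConfigMap` struct and the `managedConfigMaps` function are hypothetical stand-ins for illustration, not the PR's types; the actual implementation presumably goes through a Kubernetes client with a label selector.

```go
package main

import (
	"fmt"
	"sort"
)

// Label key and value as given in the PR description.
const (
	managedByLabel = "app.kubernetes.io/managed-by"
	managedByValue = "opamp-bridge-standalone"
)

// ConfigMap is a hypothetical, simplified stand-in for a Kubernetes ConfigMap.
type ConfigMap struct {
	Namespace string
	Name      string
	Labels    map[string]string
	Data      map[string]string
}

// managedConfigMaps returns the "namespace/name" identifiers of every
// configmap that carries the managed-by label, sorted for stable output.
func managedConfigMaps(all []ConfigMap) []string {
	var keys []string
	for _, cm := range all {
		// Reading from a nil Labels map is safe and yields "".
		if cm.Labels[managedByLabel] == managedByValue {
			keys = append(keys, cm.Namespace+"/"+cm.Name)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	all := []ConfigMap{
		{Namespace: "obs", Name: "collector", Labels: map[string]string{managedByLabel: managedByValue}},
		{Namespace: "obs", Name: "unrelated"},
	}
	fmt.Println(managedConfigMaps(all))
}
```

Only the first configmap in `main` is selected, because the second one has no labels at all.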

Link to tracking Issue(s):

How to review:

  • internal/agent & internal/operator: these packages move things around a bit so that the new standalone mode can implement the existing ConfigApplier interface. No new features or changes to existing features in these packages.
  • internal/standalone contains all of the new feature code: a standalone client and an implementation of the ConfigApplier interface.
  • config/standalone-bridge contains the k8s manifests used to deploy the bridge in standalone mode.

Testing:

  • Tests added for the new implementation & updated existing tests where needed
  • Manually tested in a local kind cluster

Documentation:

Set the mode flag to standalone to run the bridge in standalone mode.

operator-opamp-bridge --config-file=/conf/config.yaml --mode=standalone

Add the following label to a configmap you want to manage with the bridge.

app.kubernetes.io/managed-by: opamp-bridge-standalone

The bridge will now report this config to the OpAMP server, and the server can push updates back. The bridge will trigger a rolling restart after applying a config.

owais force-pushed the standalone-mode branch 2 times, most recently from 50b5f7b to 3464209 (April 22, 2026 19:12)

github-actions Bot (Contributor) commented Apr 22, 2026

E2E Test Results

 34 files  259 suites   2h 16m 34s ⏱️
100 tests 100 ✅ 0 💤 0 ❌
263 runs  263 ✅ 0 💤 0 ❌

Results for commit 057f69d.

♻️ This comment has been updated with latest results.

owais force-pushed the standalone-mode branch 7 times, most recently from 3259be0 to 57e3079 (April 27, 2026 19:26)

owais (Contributor, Author) commented Apr 27, 2026

The failing VULN does not affect us. It only affects plugin installs when running Docker as a daemon, which we do not do. We only use it indirectly as an API.

more details: GHSA-pxq6-2prw-chj9

Patching is not simple, because Docker has deprecated the github.com/docker/docker package and replaced it with multiple smaller packages under github.com/moby/moby/*, and we don't use the docker package directly. The call path is:

go: github.com/open-telemetry/opentelemetry-operator/apis/v1beta1 imports
	go.opentelemetry.io/contrib/otelconf/v0.3.0 imports
	go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc imports
	google.golang.org/grpc imports
	google.golang.org/grpc/balancer/roundrobin imports
	google.golang.org/grpc/balancer/endpointsharding tested by
	google.golang.org/grpc/balancer/endpointsharding.test imports
	google.golang.org/grpc/internal/testutils/roundrobin imports
	
go: github.com/open-telemetry/opentelemetry-operator/cmd/otel-allocator/internal/watcher imports
	github.com/prometheus-operator/prometheus-operator/pkg/operator imports
	github.com/prometheus/prometheus/model/rulefmt imports
	github.com/prometheus/prometheus/promql tested by
	github.com/prometheus/prometheus/promql.test imports
	github.com/prometheus/prometheus/tsdb imports
	github.com/prometheus/prometheus/tsdb/index imports
	github.com/bboreham/go-loser: github.com/docker/docker@v29.3.1+incompatible: reading github.com/docker/docker/go.mod at revision v29.3.1: unknown revision v29.3.1
	
go: github.com/open-telemetry/opentelemetry-operator/cmd/otel-allocator/internal/config imports
	github.com/prometheus/prometheus/discovery/install imports
	github.com/prometheus/prometheus/discovery/scaleway imports
	github.com/scaleway/scaleway-sdk-go/api/instance/v1 tested by
	github.com/scaleway/scaleway-sdk-go/api/instance/v1.test imports
	github.com/scaleway/scaleway-sdk-go/internal/testhelpers/httprecorder imports
	github.com/scaleway/scaleway-sdk-go/vcr imports
	gopkg.in/dnaeon/go-vcr.v4/pkg/cassette: github.com/docker/docker@v29.3.1+incompatible: reading github.com/docker/docker/go.mod at revision v29.3.1: unknown revision v29.3.1
	
	
go: github.com/open-telemetry/opentelemetry-operator/cmd/otel-allocator/internal/config imports
	github.com/prometheus/prometheus/discovery/install imports
	github.com/prometheus/prometheus/discovery/gce imports
	google.golang.org/api/compute/v1 imports
	google.golang.org/api/internal/gensupport imports
	github.com/googleapis/gax-go/v2/callctx tested by
	github.com/googleapis/gax-go/v2/callctx.test imports
	google.golang.org/genproto/googleapis/type/color: github.com/docker/docker@v29.3.1+incompatible: reading github.com/docker/docker/go.mod at revision v29.3.1: unknown revision v29.3.1

So each of those packages will need to migrate to moby to pick up the security patch. I suggest we ignore this vuln, as it is essentially a false positive for us, and upgrading to a fixed moby version means waiting on close to a dozen OSS packages to migrate first.

migration details: https://github.com/moby/moby/releases/tag/docker-v29.0.0

owais force-pushed the standalone-mode branch 2 times, most recently from 2e90995 to f2c5727 (April 27, 2026 20:23)
owais marked this pull request as ready for review April 27, 2026 20:23
owais requested a review from a team as a code owner April 27, 2026 20:23
owais force-pushed the standalone-mode branch from f2c5727 to a4c54d2 (May 4, 2026 18:21)

jaronoff97 (Contributor) commented

This is at the top of my review list. Thank you very much for the contribution, and apologies for the delay; I was on PTO.

jaronoff97 self-requested a review May 5, 2026 16:48
owais force-pushed the standalone-mode branch 3 times, most recently from 9cbc35f to 054483e (May 5, 2026 19:50)

jaronoff97 (Contributor) left a comment

Some requested changes. Overall, I would like to see better test coverage and more thought around the code boundaries. I'm definitely in favor of this, but there are a few open questions I'd like to hear more about.

Comment thread cmd/operator-opamp-bridge/internal/agent/agent.go
Comment thread cmd/operator-opamp-bridge/internal/agent/kube_resource_key.go Outdated
case 2:
	return newKubeResourceKey(s[0], s[1]), nil
case 3:
	return newKubeResourceKeyWithKind(s[1], s[2], s[0]), nil
Contributor

We should validate what kind can be; I would imagine we should only accept cases where kind is either configmap or otelcol.

Contributor Author

I think there are other cases, such as health checks on pods, that don't have a kind. Proxy mode might also accept identifiers without one, but I'm not sure. We check config sent from the server and don't accept anything other than the supported types, so I think we should be OK, but I'm happy to add it.

Contributor Author

I implemented it, but it made things more complicated for no strong benefit. I think the resource key can stay agnostic and each bridge mode can validate what it passes in.
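
To make the trade-off in this thread concrete, here is a minimal sketch of a key parser with optional kind validation. It is not the PR's code: `resourceKey`, `parseKey`, and `allowedKinds` are hypothetical names, and the "/" separator, the namespace-then-name ordering, and the exact kind spellings are assumptions based on the snippet and the comment above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// resourceKey is a hypothetical stand-in for the bridge's kubeResourceKey.
type resourceKey struct {
	Kind      string // empty when the identifier carries no kind
	Namespace string
	Name      string
}

// allowedKinds lists the kinds the reviewer suggested accepting.
// The exact spellings are assumptions.
var allowedKinds = map[string]bool{"configmap": true, "otelcol": true}

// parseKey splits an identifier on "/" and accepts either
// "namespace/name" or "kind/namespace/name".
// When validateKind is true, unknown kinds are rejected; when false, the
// key stays agnostic and the caller is expected to validate the kind.
func parseKey(key string, validateKind bool) (resourceKey, error) {
	s := strings.Split(key, "/")
	switch len(s) {
	case 2:
		return resourceKey{Namespace: s[0], Name: s[1]}, nil
	case 3:
		if validateKind && !allowedKinds[strings.ToLower(s[0])] {
			return resourceKey{}, fmt.Errorf("unsupported kind %q", s[0])
		}
		return resourceKey{Kind: s[0], Namespace: s[1], Name: s[2]}, nil
	default:
		return resourceKey{}, errors.New("invalid key")
	}
}

func main() {
	for _, id := range []string{"obs/collector", "configmap/obs/collector", "secret/obs/collector"} {
		k, err := parseKey(id, true)
		fmt.Println(k, err)
	}
}
```

With `validateKind` set to false the parser behaves like the agnostic variant the author settled on, leaving kind checks to each bridge mode.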

Comment thread cmd/operator-opamp-bridge/internal/agent/kube_resource_key.go Outdated
Comment thread cmd/operator-opamp-bridge/internal/config/config.go
Comment thread cmd/operator-opamp-bridge/internal/standalone/client.go Outdated
Comment thread cmd/operator-opamp-bridge/internal/standalone/client.go Outdated

// triggerRollout patches the pod template of each workload in the list with a
// restart annotation, causing a rolling restart.
func (c *Client) triggerRollout(ctx context.Context, namespace string, workloads []string) error {
Contributor

Like above, I do think we should validate in main that we have permissions to do this.

Contributor Author

What is the goal here? Is it to fail fast and report this early or to avoid making these calls if we don't have the permissions? Rollout is an optional feature and users might not even enable it. Even if we do check for permissions, we'd have to do it in all namespaces as any configmap in any namespace could be marked to be managed by the user.

Comment thread cmd/operator-opamp-bridge/internal/standalone/schema.go
Comment thread config/standalone-bridge/configmap.yaml
owais force-pushed the standalone-mode branch from 054483e to e2ede27 (May 6, 2026 22:03)

owais (Contributor, Author) commented May 6, 2026

Addressed some things. Will add more e2e tests.

owais force-pushed the standalone-mode branch 5 times, most recently from edc4894 to fec3816 (May 7, 2026 06:01)
owais force-pushed the standalone-mode branch 3 times, most recently from 0612642 to 1180faf (May 7, 2026 18:10)
Introduces a new standalone mode that allows managing collector config from a remote server without requiring Operator CRDs.
Adds the standalone client, a plain collector instance type, Kubernetes RBAC and deployment manifests, and a --mode flag to select between operator (default) and standalone at startup.
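
A minimal sketch of how a `--mode` flag with an operator default could be parsed and validated, using only the Go standard library `flag` package. The `parseMode` function and the mode constants are hypothetical, and the real bridge may use a different flag library; `--config-file` is declared here only so that the command line from the PR description parses.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Mode names taken from the commit message: operator is the default.
const (
	modeOperator   = "operator"
	modeStandalone = "standalone"
)

// parseMode (hypothetical) reads --mode from args and rejects unknown values.
func parseMode(args []string) (string, error) {
	fs := flag.NewFlagSet("operator-opamp-bridge", flag.ContinueOnError)
	mode := fs.String("mode", modeOperator, "bridge mode: operator or standalone")
	// Declared only so the documented command line parses in this sketch.
	fs.String("config-file", "", "path to the bridge config file")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *mode {
	case modeOperator, modeStandalone:
		return *mode, nil
	default:
		return "", fmt.Errorf("unknown mode %q", *mode)
	}
}

func main() {
	mode, err := parseMode(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("running in", mode, "mode")
}
```

Validating the value at startup means a typo in the flag fails immediately instead of silently falling back to operator mode.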
owais force-pushed the standalone-mode branch from 1180faf to 73b0c64 (May 7, 2026 22:09)
owais marked this pull request as draft May 11, 2026 17:50

Development

Successfully merging this pull request may close these issues.

Standalone OpAMP-to-Kubernetes binary for updating Collector configs without the operator

3 participants