Commit edc4894

opamp-bridge: add standalone mode for ConfigMaps
Introduces a new standalone mode that allows managing collector config from a remote server without requiring Operator CRDs. Adds the standalone client, a plain collector instance type, Kubernetes RBAC and deployment manifests, and a --mode flag to select between operator (default) and standalone at startup.
1 parent: fc03802

33 files changed: 2290 additions & 191 deletions
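The `--mode` flag described in the commit message can be sketched with the standard `flag` package. This is a hypothetical approximation for illustration only; the bridge's actual flag wiring and helper names may differ.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// validateMode mirrors the described behavior: only "operator" (the
// default) and "standalone" are accepted. Hypothetical sketch, not the
// bridge's actual code.
func validateMode(mode string) error {
	switch mode {
	case "operator", "standalone":
		return nil
	default:
		return fmt.Errorf("invalid --mode %q: must be operator or standalone", mode)
	}
}

func main() {
	mode := flag.String("mode", "operator", "bridge mode: operator or standalone")
	flag.Parse()
	if err := validateMode(*mode); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("running in", *mode, "mode")
}
```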

.chloggen/standalone-mode.yaml (new file)

Lines changed: 18 additions & 0 deletions

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern (e.g. collector, target allocator, auto-instrumentation, opamp, github action)
component: opamp-bridge

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: OpAMP Bridge standalone mode

# One or more tracking issues related to the change
issues: [4913]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
  Standalone mode for the OpAMP Bridge allows users to manage collector configuration from a remote
  OpAMP server without needing to deploy the full OTel Operator.

Makefile

Lines changed: 21 additions & 1 deletion

@@ -333,6 +333,26 @@ deploy: install-gateway-api-crds set-image-controller
 undeploy: set-image-controller
 	$(KUSTOMIZE) build config/default | kubectl delete --ignore-not-found=$(ignore-not-found) -f -
 
+##@ Standalone OpAMP Bridge (no operator / CRDs required)
+
+# Deploy the standalone OpAMP bridge into the current Kubernetes context.
+# Does not require the operator, CRDs, or cert-manager.
+.PHONY: deploy-standalone-bridge
+deploy-standalone-bridge: kustomize
+	cd config/standalone-bridge && $(KUSTOMIZE) edit set image operator-opamp-bridge=${OPERATOROPAMPBRIDGE_IMG}
+	$(KUSTOMIZE) build config/standalone-bridge | kubectl apply -f -
+	kubectl rollout status deployment/otel-opamp-bridge-standalone -n opentelemetry-opamp-bridge --timeout=120s
+
+# Undeploy the standalone OpAMP bridge from the current Kubernetes context.
+.PHONY: undeploy-standalone-bridge
+undeploy-standalone-bridge: kustomize
+	$(KUSTOMIZE) build config/standalone-bridge | kubectl delete --ignore-not-found=true -f -
+
+# Build, load, and deploy the standalone bridge to a kind cluster.
+# Assumes a kind cluster is already running (use start-kind first).
+.PHONY: deploy-standalone-bridge-kind
+deploy-standalone-bridge-kind: load-image-operator-opamp-bridge deploy-standalone-bridge
+
 # Generates the released manifests
 .PHONY: release-artifacts
 release-artifacts: set-image-controller

@@ -427,7 +447,7 @@ e2e-multi-instrumentation: chainsaw
 # OpAMPBridge CR end-to-end tests
 .PHONY: e2e-opampbridge
 e2e-opampbridge: chainsaw
-	$(CHAINSAW) test --test-dir ./tests/e2e-opampbridge --report-name e2e-opampbridge
+	OPERATOROPAMPBRIDGE_IMG=$(OPERATOROPAMPBRIDGE_IMG) $(CHAINSAW) test --test-dir ./tests/e2e-opampbridge --report-name e2e-opampbridge
 
 # end-to-end-test for testing pdb support
 .PHONY: e2e-pdb

cmd/operator-opamp-bridge/README.md

Lines changed: 49 additions & 0 deletions

@@ -20,6 +20,55 @@ There are two main ways to install the OpAMP Bridge:
 
 ## Usage
 
+### Standalone mode
+
+Standalone mode lets the bridge manage Collector configuration stored in Kubernetes `ConfigMap` resources, without creating `OpenTelemetryCollector` CRDs. This is useful when the Collector workload is managed outside the operator, but the config still needs to be reported to and updated from an OpAMP server.
+
+Start the bridge with `mode: standalone` in its config file, or pass `--mode=standalone`:
+
+```yaml
+endpoint: "<OPAMP_SERVER_ENDPOINT>"
+mode: standalone
+capabilities:
+  AcceptsRemoteConfig: true
+  ReportsEffectiveConfig: true
+  ReportsRemoteConfig: true
+```
+
+In this mode, the bridge watches ConfigMaps labeled with `opentelemetry.io/managed-by: opamp-bridge-standalone`. Each managed ConfigMap is reported to the OpAMP server as `kind/namespace/name`, for example `configmap/default/collector-config`.
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: collector-config
+  namespace: default
+  labels:
+    opentelemetry.io/managed-by: opamp-bridge-standalone
+  annotations:
+    opentelemetry.io/opamp-rollout-target: Deployment/my-collector
+data:
+  collector.yaml: |
+    receivers:
+      otlp:
+        protocols:
+          grpc:
+          http:
+    exporters:
+      otlphttp:
+        endpoint: http://example-collector:4318
+    service:
+      pipelines:
+        traces:
+          receivers: [otlp]
+          exporters: [otlphttp]
+```
+
+The bridge will create a missing ConfigMap, but it will only update an existing ConfigMap if that managed-by label is present. Remote deletion is not supported in standalone mode.
+
+Standalone mode needs RBAC for ConfigMaps and for any workload kinds used as rollout targets. The repository includes a starter manifest at [`config/standalone-bridge/rbac.yaml`](../../config/standalone-bridge/rbac.yaml).
+
 ### OpAMPBridge CRD
 
 The [OpAMPBridge](../../docs/api/opampbridges.md) CRD is used to create an OpAMP Bridge instance.
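The `kind/namespace/name` identifier format described in the README addition can be illustrated with a small Go sketch. The helper below is hypothetical, written only to show the shape of the key; the bridge's real `internal/resourcekey` package may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKey is a hypothetical sketch of splitting a "kind/namespace/name"
// identifier (e.g. "configmap/default/collector-config") into its parts.
func parseKey(key string) (kind, namespace, name string, err error) {
	parts := strings.Split(key, "/")
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", "", "", fmt.Errorf("invalid key %q: want kind/namespace/name", key)
	}
	return parts[0], parts[1], parts[2], nil
}

func main() {
	kind, ns, name, err := parseKey("configmap/default/collector-config")
	if err != nil {
		panic(err)
	}
	fmt.Printf("kind=%s namespace=%s name=%s\n", kind, ns, name)
}
```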

cmd/operator-opamp-bridge/internal/agent/agent.go

Lines changed: 41 additions & 48 deletions

@@ -17,19 +17,18 @@ import (
 	"github.com/open-telemetry/opamp-go/client/types"
 	"github.com/open-telemetry/opamp-go/protobufs"
 	"k8s.io/utils/clock"
-	"sigs.k8s.io/yaml"
 
-	"github.com/open-telemetry/opentelemetry-operator/apis/v1beta1"
 	"github.com/open-telemetry/opentelemetry-operator/cmd/operator-opamp-bridge/internal/config"
 	"github.com/open-telemetry/opentelemetry-operator/cmd/operator-opamp-bridge/internal/metrics"
 	"github.com/open-telemetry/opentelemetry-operator/cmd/operator-opamp-bridge/internal/operator"
 	"github.com/open-telemetry/opentelemetry-operator/cmd/operator-opamp-bridge/internal/proxy"
+	"github.com/open-telemetry/opentelemetry-operator/cmd/operator-opamp-bridge/internal/resourcekey"
 )
 
 type Agent struct {
 	logger logr.Logger
 
-	appliedKeys map[kubeResourceKey]bool
+	appliedKeys map[resourcekey.Key]bool
 	clock       clock.Clock
 	startTime   uint64
 	lastHash    []byte

@@ -59,7 +58,7 @@ func NewAgent(logger logr.Logger, applier operator.ConfigApplier, cfg *config.Co
 		applier:             applier,
 		proxy:               p,
 		logger:              logger,
-		appliedKeys:         map[kubeResourceKey]bool{},
+		appliedKeys:         map[resourcekey.Key]bool{},
 		instanceId:          cfg.GetInstanceId(),
 		agentDescription:    cfg.GetDescription(),
 		remoteConfigEnabled: cfg.RemoteConfigEnabled(),

@@ -117,8 +116,8 @@ func (agent *Agent) generateCollectorPoolHealth() (map[string]*protobufs.Compone
 	healthMap := map[string]*protobufs.ComponentHealth{}
 	proxiesUsed := make(map[uuid.UUID]struct{}, len(agentsByHostName))
 	for _, col := range cols {
-		key := newKubeResourceKey(col.GetNamespace(), col.GetName())
-		podMap, err := agent.generateCollectorHealth(agent.getCollectorSelector(col), col.GetNamespace())
+		key := resourcekey.New(col.GetNamespace(), col.GetName(), "")
+		podMap, err := agent.generateCollectorHealth(col.GetSelectorLabels(), col.GetNamespace())
 		if err != nil {
 			return nil, err
 		}

@@ -132,7 +131,7 @@ func (agent *Agent) generateCollectorPoolHealth() (map[string]*protobufs.Compone
 				proxiesUsed[uid] = struct{}{}
 			}
 		}
-		podStartTime, err := timeToUnixNanoUnsigned(col.ObjectMeta.GetCreationTimestamp().Time)
+		podStartTime, err := timeToUnixNanoUnsigned(col.GetCreationTimestamp())
 		if err != nil {
 			return nil, err
 		}

@@ -143,7 +142,7 @@ func (agent *Agent) generateCollectorPoolHealth() (map[string]*protobufs.Compone
 		healthMap[key.String()] = &protobufs.ComponentHealth{
 			StartTimeUnixNano:  podStartTime,
 			StatusTimeUnixNano: statusTime,
-			Status:             col.Status.Scale.StatusReplicas,
+			Status:             col.GetStatusReplicas(),
 			ComponentHealthMap: podMap,
 			Healthy:            isPoolHealthy,
 		}

@@ -157,28 +156,6 @@ func (agent *Agent) generateCollectorPoolHealth() (map[string]*protobufs.Compone
 	return healthMap, nil
 }
 
-// getCollectorSelector destructures the collectors scale selector if present, it uses the labelmap from the operator.
-func (*Agent) getCollectorSelector(col v1beta1.OpenTelemetryCollector) map[string]string {
-	if col.Status.Scale.Selector != "" {
-		selMap := map[string]string{}
-		for kvPair := range strings.SplitSeq(col.Status.Scale.Selector, ",") {
-			kv := strings.Split(kvPair, "=")
-			// skip malformed pairs
-			if len(kv) != 2 {
-				continue
-			}
-			selMap[kv[0]] = kv[1]
-		}
-		return selMap
-	}
-	return map[string]string{
-		"app.kubernetes.io/managed-by": "opentelemetry-operator",
-		"app.kubernetes.io/instance":   fmt.Sprintf("%s.%s", col.GetNamespace(), col.GetName()),
-		"app.kubernetes.io/part-of":    "opentelemetry",
-		"app.kubernetes.io/component":  "opentelemetry-collector",
-	}
-}
-
 func (agent *Agent) generateCollectorHealth(selectorLabels map[string]string, namespace string) (map[string]*protobufs.ComponentHealth, error) {
 	statusTime, err := agent.getCurrentTimeUnixNano()
 	if err != nil {

@@ -190,7 +167,7 @@ func (agent *Agent) generateCollectorHealth(selectorLabels map[string]string, na
 	}
 	healthMap := map[string]*protobufs.ComponentHealth{}
 	for _, item := range pods.Items {
-		key := newKubeResourceKey(item.GetNamespace(), item.GetName())
+		key := resourcekey.New(item.GetNamespace(), item.GetName(), "")
 		healthy := true
 		if item.Status.Phase != "Running" {
 			healthy = false

@@ -348,15 +325,14 @@ func (agent *Agent) getEffectiveConfig(context.Context) (*protobufs.EffectiveCon
 	}
 	instanceMap := map[string]*protobufs.AgentConfigFile{}
 	for _, instance := range instances {
-		col := instance
-		marshaled, err := yaml.Marshal(&col)
-		if err != nil {
-			agent.logger.Error(err, "failed to marshal config")
-			return nil, err
+		body := instance.GetEffectiveConfig()
+		if body == nil {
+			agent.logger.Error(errors.New("nil effective config"), "failed to get effective config",
+				"name", instance.GetName(), "namespace", instance.GetNamespace())
+			continue
 		}
-		mapKey := newKubeResourceKey(instance.GetNamespace(), instance.GetName())
-		instanceMap[mapKey.String()] = &protobufs.AgentConfigFile{
-			Body:        marshaled,
+		instanceMap[instance.GetConfigMapKey().String()] = &protobufs.AgentConfigFile{
+			Body:        body,
 			ContentType: "yaml",
 		}
 	}

@@ -390,11 +366,11 @@ func (agent *Agent) initMeter(settings *protobufs.TelemetryConnectionSettings) {
 
 // applyRemoteConfig receives a remote configuration from a remote server of the following form:
 //
-//	map[name/namespace] -> collector CRD spec
+//	map[resource key] -> AgentConfigFile body
 //
-// For every key in the received remote configuration, the agent attempts to apply it to the connected
-// Kubernetes cluster. If an agent fails to apply a collector CRD, it will continue to the next entry. The agent will
-// store the received configuration hash regardless of application status as per the OpAMP spec.
+// For every key in the received remote configuration, the agent attempts to apply it via the configured
+// applier. If an entry fails to apply, the agent continues to the next entry. The agent stores the
+// received configuration hash regardless of application status, as per the OpAMP spec.
 //
 // INVARIANT: The caller must verify that config isn't nil _and_ the configuration has changed between calls.
 func (agent *Agent) applyRemoteConfig(config *protobufs.AgentRemoteConfig) (*protobufs.RemoteConfigStatus, error) {

@@ -404,22 +380,26 @@ func (agent *Agent) applyRemoteConfig(config *protobufs.AgentRemoteConfig) (*pro
 		if key == "" || len(file.Body) == 0 {
 			continue
 		}
-		colKey, err := kubeResourceFromKey(key)
+		colKey, err := resourcekey.Parse(key)
 		if err != nil {
 			errs = append(errs, err)
 			continue
 		}
-		err = agent.applier.Apply(colKey.name, colKey.namespace, file)
+		if err = agent.validateConfigMapKey(colKey); err != nil {
+			errs = append(errs, err)
+			continue
+		}
+		err = agent.applier.Apply(colKey.Name(), colKey.Namespace(), file)
 		if err != nil {
 			errs = append(errs, err)
 			continue
 		}
 		agent.appliedKeys[colKey] = true
 	}
 	// Check if anything was deleted
-	for collectorKey := range agent.appliedKeys {
-		if _, ok := config.Config.GetConfigMap()[collectorKey.String()]; !ok {
-			err := agent.applier.Delete(collectorKey.name, collectorKey.namespace)
+	for key := range agent.appliedKeys {
+		if _, ok := config.Config.GetConfigMap()[key.String()]; !ok {
+			err := agent.applier.Delete(key.Name(), key.Namespace())
 			if err != nil {
 				errs = append(errs, err)
 			}

@@ -440,6 +420,19 @@ func (agent *Agent) applyRemoteConfig(config *protobufs.AgentRemoteConfig) (*pro
 	}, nil
 }
 
+func (agent *Agent) validateConfigMapKey(key resourcekey.Key) error {
+	if agent.config.IsStandaloneMode() {
+		if key.Kind() != resourcekey.KindConfigMap {
+			return errors.New("standalone config key must use configmap kind")
+		}
+		return nil
+	}
+	if key.Kind() != resourcekey.KindOtelCol {
+		return errors.New("operator config key must use otelcol kind")
+	}
+	return nil
+}
+
 // Shutdown will stop the OpAMP client gracefully.
 func (agent *Agent) Shutdown() {
 	agent.logger.V(3).Info("Agent shutting down...")
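The apply-then-prune loop in `applyRemoteConfig` can be sketched generically. The helper below uses hypothetical names and plain string keys for illustration; the real agent works with `resourcekey.Key`, its configured applier, and OpAMP status reporting, and standalone mode does not support remote deletion per the README.

```go
package main

import (
	"errors"
	"fmt"
)

// syncRemoteConfig is a simplified sketch of the pattern: apply every
// non-empty entry from the server, remember which keys were applied, then
// delete any previously applied key the server stopped sending. Per-entry
// errors are collected so one failure doesn't stop the rest.
func syncRemoteConfig(
	remote map[string][]byte,
	applied map[string]bool,
	apply func(key string, body []byte) error,
	del func(key string) error,
) error {
	var errs []error
	for key, body := range remote {
		if key == "" || len(body) == 0 {
			continue // skip empty entries, as the agent does
		}
		if err := apply(key, body); err != nil {
			errs = append(errs, err)
			continue
		}
		applied[key] = true
	}
	// Prune anything previously applied that is no longer in the remote map.
	for key := range applied {
		if _, ok := remote[key]; !ok {
			if err := del(key); err != nil {
				errs = append(errs, err)
				continue
			}
			delete(applied, key)
		}
	}
	return errors.Join(errs...)
}

func main() {
	applied := map[string]bool{"configmap/default/old": true}
	remote := map[string][]byte{"configmap/default/new": []byte("receivers: {}")}
	err := syncRemoteConfig(remote, applied,
		func(key string, body []byte) error { fmt.Println("apply", key); return nil },
		func(key string) error { fmt.Println("delete", key); return nil },
	)
	fmt.Println("err:", err)
}
```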

cmd/operator-opamp-bridge/internal/agent/agent_test.go

Lines changed: 4 additions & 3 deletions

@@ -33,6 +33,7 @@ import (
 	"github.com/open-telemetry/opentelemetry-operator/cmd/operator-opamp-bridge/internal/config"
 	"github.com/open-telemetry/opentelemetry-operator/cmd/operator-opamp-bridge/internal/operator"
 	"github.com/open-telemetry/opentelemetry-operator/cmd/operator-opamp-bridge/internal/proxy"
+	"github.com/open-telemetry/opentelemetry-operator/cmd/operator-opamp-bridge/internal/resourcekey"
 )
 
 const (

@@ -45,9 +46,9 @@ const (
 	otherCollectorName = "other"
 	thirdCollectorName = "third"
 	emptyConfigHash    = ""
-	testCollectorKey   = testNamespace + "/" + testCollectorName
-	otherCollectorKey  = testNamespace + "/" + otherCollectorName
-	thirdCollectorKey  = otherCollectorName + "/" + thirdCollectorName
+	testCollectorKey   = resourcekey.KindOtelCol + "/" + testNamespace + "/" + testCollectorName
+	otherCollectorKey  = resourcekey.KindOtelCol + "/" + testNamespace + "/" + otherCollectorName
+	thirdCollectorKey  = resourcekey.KindOtelCol + "/" + otherCollectorName + "/" + thirdCollectorName
 
 	agentTestFileName     = "testdata/agent.yaml"
 	agentTestFileHttpName = "testdata/agenthttpbasic.yaml"

cmd/operator-opamp-bridge/internal/agent/kube_resource_key.go

Lines changed: 0 additions & 32 deletions
This file was deleted.
