
[Backport v1.27] Refactor experiment signals #3016

Merged
levan-m merged 1 commit into v1.27 from backport-2944-to-v1.27
May 14, 2026

dd-octo-sts (bot) commented May 14, 2026

Backport 3df38d6 from #2944.


What does this PR do?

Refactor experiment signals to make them more robust: instead of both the daemon and the controller writing to the DDA status, only the controller now writes to the DDA status, and the daemon modifies DDA annotations.
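The single-writer split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the operator's code: `Object`, `ExperimentStatus`, `experimentSignalKey`, `daemonSignal`, `reconcile`, and the `start`/`promote` phase mapping are all hypothetical, and annotations are mutated in memory rather than patched through the Kubernetes API. It only shows the ownership rule: the daemon writes annotations, and the controller is the sole writer of status.

```go
package main

import "fmt"

// experimentSignalKey is a hypothetical annotation key, for illustration only.
const experimentSignalKey = "example.com/experiment-signal"

// ExperimentStatus is a simplified, hypothetical stand-in for status.experiment.
type ExperimentStatus struct {
	Phase string
}

// Object is a simplified, hypothetical stand-in for a DatadogAgent resource.
type Object struct {
	Annotations map[string]string
	Status      ExperimentStatus
}

// daemonSignal is the daemon's only write path: it records the requested
// action as an annotation and never touches Status.
func daemonSignal(obj *Object, action string) {
	if obj.Annotations == nil {
		obj.Annotations = map[string]string{}
	}
	obj.Annotations[experimentSignalKey] = action
}

// reconcile is the controller's write path and the single writer of Status.
// It reads the signal annotation and derives the status from it.
func reconcile(obj *Object) {
	switch obj.Annotations[experimentSignalKey] {
	case "start":
		obj.Status.Phase = "running"
	case "promote":
		obj.Status.Phase = "promoted"
	}
}

func main() {
	var dda Object
	daemonSignal(&dda, "start")
	fmt.Printf("after daemon: %q\n", dda.Status.Phase) // after daemon: ""
	reconcile(&dda)
	fmt.Printf("after controller: %q\n", dda.Status.Phase) // after controller: "running"
}
```

Because only one component writes the status subresource, the daemon and the controller can no longer overwrite each other's status updates; the annotation acts as the hand-off point between them.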

Motivation

What inspired you to submit this pull request?

Additional Notes

Anything else we should know when reviewing?

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: vX.Y.Z
  • Cluster Agent: vX.Y.Z

Describe your test plan

Write here any instructions and details you may have to test your PR.

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label
  • All commits are signed (see: signing commits)

* refactor

* review fixes

* Review suggestions

* add cluster uid tag

* Fix version check

* Update default versions

* separate goroutine for acks

* Review suggestions

* Simplify refactor

* fix go.mod

* skip checking experiment ID on promote signal

* exclude fleet.datadoghq.com annotation from controller revision

---------

Co-authored-by: Paul Coignet <paul.coignet@datadoghq.com>
Co-authored-by: levan-m <116471169+levan-m@users.noreply.github.com>
Co-authored-by: Levan Machablishvili <levan.machablishvili@datadoghq.com>
(cherry picked from commit 3df38d6)
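The commit "exclude fleet.datadoghq.com annotation from controller revision" presumably keeps signal annotations written by the daemon from registering as a configuration change. A minimal sketch of that kind of filter, assuming the daemon's annotations share the `fleet.datadoghq.com/` prefix (the real code may match an exact key) and hashing only annotations for brevity; `revisionAnnotations`, `revisionHash`, and the `fleet.datadoghq.com/signal` key are hypothetical:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// fleetAnnotationPrefix is an assumption; the real filter may differ.
const fleetAnnotationPrefix = "fleet.datadoghq.com/"

// revisionAnnotations returns a copy of the annotations without fleet-owned keys.
func revisionAnnotations(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		if strings.HasPrefix(k, fleetAnnotationPrefix) {
			continue // daemon-written signals do not affect the revision
		}
		out[k] = v
	}
	return out
}

// revisionHash hashes the filtered annotations in a deterministic key order.
func revisionHash(annotations map[string]string) string {
	filtered := revisionAnnotations(annotations)
	keys := make([]string, 0, len(filtered))
	for k := range filtered {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s\n", k, filtered[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	base := map[string]string{"team": "fleet"}
	signalled := map[string]string{"team": "fleet", "fleet.datadoghq.com/signal": "promote"}
	fmt.Println(revisionHash(base) == revisionHash(signalled)) // true
}
```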
dd-octo-sts (bot) requested a review from a team as a code owner May 14, 2026 18:35
dd-octo-sts (bot) added the backport, bot, team/container-platform, team/container-autoscaling, and team/fleet labels May 14, 2026
dd-octo-sts (bot) requested reviews from teams as code owners May 14, 2026 18:35
dd-octo-sts (bot) added this to the v1.27.0 milestone May 14, 2026

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14f89d672f


```go
// reads: object identity, annotations, and status.experiment.
func (d *Daemon) forwardDDAStatusUpdate(obj any) {
	if dda, ok := obj.(*v2alpha1.DatadogAgent); ok {
		// Blocking send: waits whenever the statusUpdates buffer is full.
		d.statusUpdates <- newDDAStatusSnapshot(dda)
	}
}
```

P2: Make status update forwarding non-blocking

Sending snapshots with a blocking channel write here can stall the DatadogAgent informer under load: onStatusUpdate performs mutex-protected RC updates and Kubernetes client calls, so if processing lags and the 128-slot buffer fills, this handler blocks and back-pressures informer event delivery. In clusters with many DDA updates (or transient API slowness), that can delay or freeze task-state progression and other consumers of the informer stream; use a non-blocking enqueue/drop policy or a decoupled workqueue to avoid blocking informer callbacks.
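The non-blocking enqueue/drop policy the reviewer suggests can be sketched with a `select` that has a `default` branch. This is a minimal illustration, not the patch that was merged: `statusSnapshot`, `forwarder`, and the drop counter are hypothetical names standing in for the daemon's real types.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// statusSnapshot is a hypothetical stand-in for the snapshot the daemon forwards.
type statusSnapshot struct {
	Name string
}

// forwarder owns a buffered channel plus a counter of dropped snapshots.
type forwarder struct {
	updates chan statusSnapshot
	dropped atomic.Int64
}

// forward never blocks the caller (for example, an informer event handler):
// if the buffer is full, the snapshot is dropped and counted instead.
func (f *forwarder) forward(s statusSnapshot) bool {
	select {
	case f.updates <- s:
		return true
	default:
		f.dropped.Add(1)
		return false
	}
}

func main() {
	f := &forwarder{updates: make(chan statusSnapshot, 2)}
	for i := 0; i < 5; i++ {
		f.forward(statusSnapshot{Name: fmt.Sprintf("dda-%d", i)})
	}
	fmt.Println(len(f.updates), f.dropped.Load()) // 2 3
}
```

Dropping is only safe if a later event or an informer resync will carry the newer state. The other option the reviewer mentions, a decoupled workqueue such as client-go's, deduplicates keys that are already queued, so the consumer always re-reads the latest object instead of processing every intermediate snapshot.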


codecov-commenter commented May 14, 2026

Codecov Report

❌ Patch coverage is 79.56081% with 121 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.36%. Comparing base (849be07) to head (14f89d6).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/controller/datadogagent/experiment.go | 72.18% | 30 Missing and 12 partials ⚠️ |
| pkg/fleet/daemon_worker.go | 76.22% | 30 Missing and 4 partials ⚠️ |
| pkg/fleet/daemon_operations.go | 86.85% | 14 Missing and 14 partials ⚠️ |
| pkg/fleet/daemon.go | 64.86% | 12 Missing and 1 partial ⚠️ |
| pkg/remoteconfig/updater.go | 80.00% | 3 Missing ⚠️ |
| cmd/main.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files


```diff
@@            Coverage Diff             @@
##            v1.27    #3016      +/-   ##
==========================================
+ Coverage   40.76%   41.36%   +0.60%     
==========================================
  Files         332      334       +2     
  Lines       28199    28586     +387     
==========================================
+ Hits        11495    11826     +331     
- Misses      15929    15970      +41     
- Partials      775      790      +15     
```
| Flag | Coverage Δ |
|---|---|
| unittests | 41.36% <79.56%> (+0.60%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| internal/controller/datadogagent/revision.go | 78.46% <100.00%> (+0.33%) ⬆️ |
| pkg/fleet/experiment.go | 79.48% <100.00%> (-4.19%) ⬇️ |
| pkg/fleet/remote_config.go | 100.00% <100.00%> (ø) |
| cmd/main.go | 6.88% <0.00%> (ø) |
| pkg/remoteconfig/updater.go | 4.02% <80.00%> (+4.02%) ⬆️ |
| pkg/fleet/daemon.go | 69.94% <64.86%> (+4.48%) ⬆️ |
| pkg/fleet/daemon_operations.go | 86.85% <86.85%> (ø) |
| pkg/fleet/daemon_worker.go | 76.22% <76.22%> (ø) |
| internal/controller/datadogagent/experiment.go | 77.60% <72.18%> (-7.56%) ⬇️ |

Continue to review full report in Codecov by Sentry.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 849be07...14f89d6.


@datadog-prod-us1-3

Code Coverage


🛑 Gate Violations

🎯 1 Code Coverage issue detected

A Patch coverage percentage gate may be blocking this PR.

Patch coverage: 78.99% (threshold: 80.00%)

ℹ️ Info

🎯 Code Coverage (details)
Patch Coverage: 78.99%
Overall Coverage: 41.69% (+0.62%)


This comment will be updated automatically if new data arrives.
Commit SHA: 14f89d6

levan-m merged commit 881aa65 into v1.27 May 14, 2026
68 of 69 checks passed
levan-m deleted the backport-2944-to-v1.27 branch May 14, 2026 20:11
3 participants