[Backport v1.27] Refactor experiment signals#3016
* refactor
* review fixes
* Review suggestions
* add cluster uid tag
* Fix version check
* Update default versions
* separate goroutine for acks
* Review suggestions
* Simplify refactor
* fix go.mod
* skip checking experiment ID on promote signal
* exclude fleet.datadoghq.com annotation from controller revision

---------

Co-authored-by: Paul Coignet <paul.coignet@datadoghq.com>
Co-authored-by: levan-m <116471169+levan-m@users.noreply.github.com>
Co-authored-by: Levan Machablishvili <levan.machablishvili@datadoghq.com>

(cherry picked from commit 3df38d6)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 14f89d672f
// reads: object identity, annotations, and status.experiment.
func (d *Daemon) forwardDDAStatusUpdate(obj any) {
	if dda, ok := obj.(*v2alpha1.DatadogAgent); ok {
		d.statusUpdates <- newDDAStatusSnapshot(dda)
Make status update forwarding non-blocking
Sending snapshots with a blocking channel write here can stall the DatadogAgent informer under load: onStatusUpdate performs mutex-protected RC updates and Kubernetes client calls, so if processing lags and the 128-slot buffer fills, this handler blocks and back-pressures informer event delivery. In clusters with many DDA updates (or transient API slowness), that can delay or freeze task-state progression and other consumers of the informer stream; use a non-blocking enqueue/drop policy or a decoupled workqueue to avoid blocking informer callbacks.
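A minimal sketch of the non-blocking enqueue/drop policy suggested above, not the PR's actual code. The `statusSnapshot` type, `snapshotQueue`, and `tryEnqueue` are hypothetical names introduced for illustration; only the idea of a buffered channel of snapshots comes from the diff. Dropping is only reasonable if the consumer can recover from a missed snapshot (for example from a later update or an informer resync), which is an assumption here.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// statusSnapshot is a hypothetical stand-in for the value that
// newDDAStatusSnapshot returns in the PR.
type statusSnapshot struct {
	Name            string
	ResourceVersion string
}

// snapshotQueue (hypothetical) wraps a buffered channel and counts
// how many snapshots were dropped because the buffer was full.
type snapshotQueue struct {
	ch      chan statusSnapshot
	dropped atomic.Int64
}

func newSnapshotQueue(size int) *snapshotQueue {
	return &snapshotQueue{ch: make(chan statusSnapshot, size)}
}

// tryEnqueue never blocks: if the buffer is full, the snapshot is
// dropped and the drop counter is incremented.
func (q *snapshotQueue) tryEnqueue(s statusSnapshot) bool {
	select {
	case q.ch <- s:
		return true
	default:
		q.dropped.Add(1)
		return false
	}
}

func main() {
	q := newSnapshotQueue(2)
	for i := 0; i < 3; i++ {
		ok := q.tryEnqueue(statusSnapshot{Name: "datadog", ResourceVersion: fmt.Sprint(i)})
		fmt.Println("enqueued:", ok)
	}
	fmt.Println("dropped:", q.dropped.Load())
}
```

With a buffer of two and no consumer, the third call returns `false` and the drop counter reads 1, while the caller (here standing in for the informer callback) is never blocked. A rate-limited workqueue keyed by object name would be the alternative when no update may be lost.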
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## v1.27 #3016 +/- ##
==========================================
+ Coverage 40.76% 41.36% +0.60%
==========================================
Files 332 334 +2
Lines 28199 28586 +387
==========================================
+ Hits 11495 11826 +331
- Misses 15929 15970 +41
- Partials 775 790 +15
Backport 3df38d6 from #2944.
What does this PR do?
Refactors experiment signals for robustness. Instead of having both the daemon and the controller write to the DDA status, only the controller writes to the DDA status; the daemon modifies DDA annotations instead.
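The single-writer split described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the operator's implementation: the annotation key `example.com/experiment-signal`, the `agentObject` type, the signal values, and the phase names are all hypothetical.

```go
package main

import "fmt"

// signalAnnotation is a hypothetical key; the real annotation name is
// not given in this description.
const signalAnnotation = "example.com/experiment-signal"

// experimentStatus and agentObject are hypothetical stand-ins for the
// DDA status and the DDA object.
type experimentStatus struct {
	Phase string
}

type agentObject struct {
	Annotations map[string]string
	Experiment  experimentStatus
}

// daemonSignal records the requested signal as an annotation.
// The daemon never touches the status.
func daemonSignal(obj *agentObject, signal string) {
	if obj.Annotations == nil {
		obj.Annotations = map[string]string{}
	}
	obj.Annotations[signalAnnotation] = signal
}

// controllerReconcile is the only writer of the status: it reads the
// annotation and derives the phase from it.
func controllerReconcile(obj *agentObject) {
	switch obj.Annotations[signalAnnotation] {
	case "start":
		obj.Experiment.Phase = "running"
	case "promote":
		obj.Experiment.Phase = "promoted"
	}
}

func main() {
	obj := &agentObject{}
	daemonSignal(obj, "start")
	fmt.Printf("after daemon: phase=%q\n", obj.Experiment.Phase)
	controllerReconcile(obj)
	fmt.Printf("after controller: phase=%q\n", obj.Experiment.Phase)
}
```

Because the status has a single writer in this sketch, the daemon and the controller cannot overwrite each other's status updates; the annotation acts as the request and the status as the acknowledged state.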
Motivation
What inspired you to submit this pull request?
Additional Notes
Anything else we should know when reviewing?
Minimum Agent Versions
Are there minimum versions of the Datadog Agent and/or Cluster Agent required?
Describe your test plan
Write there any instructions and details you may have to test your PR.
Checklist
bug, enhancement, refactoring, documentation, tooling, and/or dependencies
qa/skip-qa label