[CONTP-1609] Auto-inject agent toleration when untaint controller is enabled#3086
44c716a to
31dae4d
Compare
Codex Review: Didn't find any major issues. Breezy!
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3086 +/- ##
==========================================
+ Coverage 43.64% 43.95% +0.31%
==========================================
Files 350 352 +2
Lines 30075 30289 +214
==========================================
+ Hits 13125 13313 +188
- Misses 16079 16100 +21
- Partials 871 876 +5
... and 4 files with indirect coverage changes
cd1e16f to
9303804
Compare
OliviaShoup
left a comment
thanks for the PR! left a comment with a minor suggestion
func podToleratesAgentNotReadyStartup(tolerations []corev1.Toleration) bool {
	taint := untaint.AgentNotReadyTaint()
	for i := range tolerations {
		if tolerations[i].ToleratesTaint(klog.Background(), &taint, false) {
No need to access klog directly; the logger can be passed in from the controller.
experimental.ApplyExperimentalOverrides(objLogger, ddai, podManagers)

if r.options.UntaintControllerEnabled {
	componentagent.EnsureAgentNotReadyStartupToleration(&podManagers.PodTemplateSpec().Spec)
Pass objLogger here; it already carries the relevant attributes. Alternatively, pass the context (check the start of the function) and create the logger inside the function with ctrl.LoggerFrom(ctx).WithValues, adding any extra attributes.
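A minimal sketch of the first option, where the caller hands its attribute-carrying logger to the helper. It uses the standard library's log/slog as a stand-in for the operator's logr logger, and ensureToleration, reconcileLog, and logHasAttr are hypothetical names for illustration, not functions from this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// ensureToleration is a hypothetical helper. It receives its logger from the
// caller instead of reaching for a process-wide logger.
func ensureToleration(logger *slog.Logger, tolerations []string, want string) []string {
	for _, t := range tolerations {
		if t == want {
			return tolerations
		}
	}
	logger.Info("injecting toleration", "toleration", want)
	return append(tolerations, want)
}

// reconcileLog builds a logger that carries an object attribute (the role
// objLogger plays in the review comment), runs the helper once, and returns
// the captured log output.
func reconcileLog(name string) string {
	var buf bytes.Buffer
	base := slog.New(slog.NewTextHandler(&buf, nil))
	objLogger := base.With("datadogagent", name)
	ensureToleration(objLogger, nil, "agent-not-ready")
	return buf.String()
}

// logHasAttr reports whether the captured output contains the given key=value pair.
func logHasAttr(out, attr string) bool {
	return strings.Contains(out, attr)
}

func main() {
	out := reconcileLog("datadog")
	fmt.Println(logHasAttr(out, "datadogagent=datadog")) // prints true
}
```

Because the attribute is attached once by the caller, every line the helper logs is tagged with the object being reconciled without the helper knowing about it.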
// EnsureAgentNotReadyStartupToleration appends the agent-not-ready Equal toleration
// when not already tolerated per Kubernetes toleration matching.
func EnsureAgentNotReadyStartupToleration(spec *corev1.PodSpec) {
	if spec == nil {
nit: the nil check is redundant. PodSpec is a struct field (not a pointer) in PodTemplateSpec, so the address passed in can't be nil here.
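The point about the value field can be illustrated with a small sketch. The types below are simplified local stand-ins that only mirror the shape of corev1.PodTemplateSpec (Spec held by value), and specOf is a hypothetical helper.

```go
package main

import "fmt"

// PodSpec and PodTemplateSpec are simplified stand-ins. As in corev1,
// Spec is stored by value rather than through a pointer.
type PodSpec struct {
	Tolerations []string
}

type PodTemplateSpec struct {
	Spec PodSpec
}

// specOf returns the address of the value field. For a non-nil template
// this address is never nil, so a callee does not need a nil guard.
func specOf(t *PodTemplateSpec) *PodSpec {
	return &t.Spec
}

func main() {
	tmpl := &PodTemplateSpec{}
	fmt.Println(specOf(tmpl) != nil) // prints true
}
```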
)

// AgentNotReadyTaintEffect is the effect for the agent-not-ready startup taint.
const AgentNotReadyTaintEffect = corev1.TaintEffectNoSchedule
nit: this doesn't have to be public and can be inlined.
wantTol := untaint.AgentNotReadyEqualToleration()
tt := testCase{
	name: "untaint controller enabled injects agent-not-ready toleration on node agent DS",
For completeness, it would be nice to have a case asserting that tolerations aren't added when the feature is disabled.
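A sketch of what such a pair of cases might look like, written as a plain table-driven check. It is an assumption-laden illustration: options, buildTolerations, and hasKey are hypothetical stand-ins, not the repository's testCase harness or DaemonSet builder; only the taint key comes from the PR description.

```go
package main

import "fmt"

// options is a hypothetical stand-in for the reconciler options.
type options struct {
	UntaintControllerEnabled bool
}

// Taint key taken from the PR description.
const agentNotReadyKey = "agent.datadoghq.com/not-ready"

// buildTolerations is a hypothetical stand-in for the builder: it copies the
// user-supplied keys and adds the startup key only when the flag is on.
func buildTolerations(opts options, user []string) []string {
	out := append([]string(nil), user...)
	if !opts.UntaintControllerEnabled {
		return out
	}
	if hasKey(out, agentNotReadyKey) {
		return out
	}
	return append(out, agentNotReadyKey)
}

// hasKey reports whether key is present in keys.
func hasKey(keys []string, key string) bool {
	for _, k := range keys {
		if k == key {
			return true
		}
	}
	return false
}

func main() {
	cases := []struct {
		name    string
		enabled bool
		want    bool
	}{
		{name: "enabled injects toleration", enabled: true, want: true},
		{name: "disabled leaves tolerations untouched", enabled: false, want: false},
	}
	for _, tc := range cases {
		got := hasKey(buildTolerations(options{UntaintControllerEnabled: tc.enabled}, nil), agentNotReadyKey)
		fmt.Printf("%s: pass=%v\n", tc.name, got == tc.want)
	}
}
```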
Co-authored-by: Olivia Shoup <116908616+OliviaShoup@users.noreply.github.com>
@levan-m thanks for the review 🙇 Addressed all your comments!
What does this PR do?
Updates the Datadog Agent controller to auto-inject the agent toleration when the untaint controller is enabled.
Relates to #2753
Motivation
For the untaint controller to work, the agent DaemonSet must tolerate the startup taint (agent.datadoghq.com/not-ready=presence:NoSchedule) — otherwise the agent can never schedule on tainted nodes, the taint is never removed, and all workloads are blocked indefinitely. Without auto-injection this is a silent foot-gun: enabling the flag without manually adding the toleration produces no error but a permanently deadlocked cluster.
When --untaintControllerEnabled=true, the operator automatically injects the toleration into the node agent DaemonSet spec, following the same pattern used by other feature flags that influence agent pod spec assembly (IntrospectionEnabled, DatadogAgentProfileEnabled).
Pass UntaintControllerEnabled through datadogagent.ReconcilerOptions
Inject toleration in the node agent DaemonSet builder (idempotent — no duplicate if user also sets it manually)
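The idempotent injection described above can be sketched roughly as follows. This is not the PR's implementation: Toleration, Taint, and PodSpec are simplified local stand-ins for the corev1 types, tolerates only approximates Kubernetes toleration matching, and ensureAgentNotReadyToleration is a hypothetical name. The taint key, value, and effect come from the Motivation section.

```go
package main

import "fmt"

// Toleration, Taint, and PodSpec are simplified stand-ins for the corev1
// types, defined locally so the sketch compiles without Kubernetes modules.
type Toleration struct {
	Key      string
	Operator string // "Equal" or "Exists"; empty is treated as "Equal"
	Value    string
	Effect   string // empty matches every effect
}

type Taint struct {
	Key, Value, Effect string
}

type PodSpec struct {
	Tolerations []Toleration
}

// Startup taint as given in the PR description.
var agentNotReady = Taint{
	Key:    "agent.datadoghq.com/not-ready",
	Value:  "presence",
	Effect: "NoSchedule",
}

// tolerates approximates Kubernetes toleration matching.
func tolerates(t Toleration, taint Taint) bool {
	if t.Effect != "" && t.Effect != taint.Effect {
		return false
	}
	if t.Key != "" && t.Key != taint.Key {
		return false
	}
	switch t.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return t.Value == taint.Value
	}
	return false
}

// ensureAgentNotReadyToleration appends an Equal toleration only when no
// existing toleration already covers the taint, so repeated calls are no-ops.
func ensureAgentNotReadyToleration(spec *PodSpec) {
	for _, t := range spec.Tolerations {
		if tolerates(t, agentNotReady) {
			return
		}
	}
	spec.Tolerations = append(spec.Tolerations, Toleration{
		Key:      agentNotReady.Key,
		Operator: "Equal",
		Value:    agentNotReady.Value,
		Effect:   agentNotReady.Effect,
	})
}

func main() {
	spec := &PodSpec{}
	ensureAgentNotReadyToleration(spec)
	ensureAgentNotReadyToleration(spec) // second call must not duplicate
	fmt.Println(len(spec.Tolerations))  // prints 1
}
```

Checking with matching semantics rather than exact equality means a broader user-supplied toleration (for example, an Exists toleration on the same key) also suppresses the injection.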
Additional Notes
Anything else we should know when reviewing?
Minimum Agent Versions
Are there minimum versions of the Datadog Agent and/or Cluster Agent required?
Describe your test plan
Follow the same testing instructions as #2753, but instead of adding the toleration manually to the Datadog Agent, verify that the agent gets the toleration automatically.
Checklist
bug, enhancement, refactoring, documentation, tooling, and/or dependencies
qa/skip-qa label