* Update Flaky Tests Management notifications documentation
Add notification types table and document the new "New flaky test detected"
notification type. Restructure the Receive notifications section for clarity.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Clarify code owners AND matching behavior for notifications
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix description for successful remediation notification
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Use "state" instead of "status" for flaky test states
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Replace "status" with "state" for flaky test states throughout
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update reference link
* Address PR review feedback
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Esther Kim <esther.kim@datadoghq.com>
content/en/tests/flaky_management/_index.md
28 additions & 12 deletions
@@ -19,24 +19,24 @@ further_reading:
 
 ## Overview
 
-The [Flaky Tests Management][1] page provides a centralized view to track, triage, and remediate flaky tests across your organization. You can view every test's status along with key impact metrics like number of pipeline failures, CI time wasted, and failure rate.
+The [Flaky Tests Management][1] page provides a centralized view to track, triage, and remediate flaky tests across your organization. You can view every test's state along with key impact metrics like number of pipeline failures, CI time wasted, and failure rate.
 
 From this UI, you can act on flaky tests to mitigate their impact. Quarantine or disable problematic tests to keep known flakes from breaking builds, and create cases and Jira issues to track work toward fixes.
 
 {{< img src="tests/flaky_management-2.png" alt="Overview of the Flaky Tests Management UI" style="width:100%;" >}}
 
-## Change a flaky test's status
+## Change a flaky test's state
 
-Use the status drop-down to change how a flaky test is handled in your CI pipeline. This can help reduce CI noise while retaining traceability and control. Available statuses are:
+Use the state drop-down to change how a flaky test is handled in your CI pipeline. This can help reduce CI noise while retaining traceability and control. Available states are:
 
-|Status| Description |
+|State | Description |
 | ----------- | ----------- |
 |**Active**| The test is known to be flaky and is running in CI. |
 |**Quarantined**| Keep the test running in the background, but failures don't affect CI status or break pipelines. This is useful for isolating flaky tests without blocking merges. Datadog tags test run events with `@test.test_management.is_quarantined:true` when quarantined. |
 |**Disabled**| Skip the test entirely in CI. Use this when a test is no longer relevant or needs to be temporarily removed from the pipeline. Datadog tags test run events with `@test.test_management.is_disabled:true` when disabled. |
-|**Fixed**| The test has passed consistently and is no longer flaky. If supported, use the [remediation flow](#confirm-fixes-for-flaky-tests) to confirm the fix and automatically apply this status after it is merged into the default branch. |
+|**Fixed**| The test has passed consistently and is no longer flaky. If supported, use the [remediation flow](#confirm-fixes-for-flaky-tests) to confirm the fix and automatically apply this state after it is merged into the default branch. |
 
-<div class="alert alert-info">Status actions have minimum version requirements for each programming language's instrumentation library. See <a href="#compatibility">Compatibility</a> for details.</div>
+<div class="alert alert-info">State actions have minimum version requirements for each programming language's instrumentation library. See <a href="#compatibility">Compatibility</a> for details.</div>
 
 ## Configure policies to automate the flaky test lifecycle
 
@@ -61,7 +61,7 @@ Configure automated Flaky Test Policies to govern how flaky tests are handled in
 <p>Toggle to allow flaky tests to be quarantined for this repository.</p>
 <p>Customize automation rules based on:</p>
 <ul>
-<li><strong>Time</strong>: Quarantine a test if its status is <code>Active</code> for a specified number of days. The rule is triggered every day at 12:15 UTC.</li>
+<li><strong>Time</strong>: Quarantine a test if its state is <code>Active</code> for a specified number of days. The rule is triggered every day at 12:15 UTC.</li>
 <li><strong>Branch</strong>: Quarantine an <code>Active</code> test if it flakes in one or more specified branches.</li>
 <li><strong>Failure rate</strong>: Quarantine an <code>Active</code> test if its failure rate over the last 7 days is greater or equal to the specified threshold. The rule is triggered every 15 minutes.</li>
 </ul>
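The three quarantine rules in this hunk can be read as a simple predicate over an `Active` test. The following is a minimal sketch of that reading, not Datadog's implementation: the `FlakyTest` fields, the `should_quarantine` function, and its parameter names are all hypothetical. It also assumes that any single matching rule is enough to quarantine and that the time rule fires once the test has been `Active` for at least the configured number of days; the documentation text does not confirm either detail.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class FlakyTest:
    # All fields are hypothetical names chosen for illustration.
    state: str                      # "Active", "Quarantined", "Disabled", or "Fixed"
    days_in_state: int              # days spent in the current state
    failure_rate_7d: float          # failure rate over the last 7 days, 0.0 to 1.0
    flaky_branches: FrozenSet[str]  # branches where the test has flaked


def should_quarantine(
    test: FlakyTest,
    max_active_days: Optional[int] = None,
    branches: FrozenSet[str] = frozenset(),
    failure_rate_threshold: Optional[float] = None,
) -> bool:
    """Return True if any configured rule would quarantine the test."""
    # Every quarantine rule in the docs applies to Active tests only.
    if test.state != "Active":
        return False
    # Time rule (assumption: "for N days" means at least N days).
    if max_active_days is not None and test.days_in_state >= max_active_days:
        return True
    # Branch rule: the test flaked in one or more of the specified branches.
    if branches and test.flaky_branches & branches:
        return True
    # Failure-rate rule: greater than or equal to the threshold.
    if failure_rate_threshold is not None and test.failure_rate_7d >= failure_rate_threshold:
        return True
    return False
```

Note that the failure-rate comparison is inclusive, matching the "greater or equal to the specified threshold" wording, so a test sitting exactly at the threshold is quarantined.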
@@ -73,7 +73,7 @@ Configure automated Flaky Test Policies to govern how flaky tests are handled in
 <p>Toggle to allow flaky tests to be disabled for this repository. You may want to do this after quarantining or to protect specific branches from flakiness.</p>
 <p>Customize automation rules based on:</p>
 <ul>
-<li><strong>Status and time</strong>: Disable a test if it has a specified status for a specified number of days. The rule is triggered every day at 12:30 UTC.</li>
+<li><strong>State and time</strong>: Disable a test if it has a specified state for a specified number of days. The rule is triggered every day at 12:30 UTC.</li>
 <li><strong>Branch</strong>: Disable an <code>Active</code> or <code>Quarantined</code> test if it flakes in one or more specified branches.</li>
 <li><strong>Failure rate</strong>: Disable an <code>Active</code> or <code>Quarantined</code> test if its failure rate over the last 7 days is greater or equal to the specified threshold. The rule is triggered every 15 minutes.</li>
 </ul>
@@ -85,7 +85,7 @@ Configure automated Flaky Test Policies to govern how flaky tests are handled in
 </tr>
 <tr>
 <td><strong>Fixed</strong></td>
-<td>If a flaky test no longer flakes for 30 days, it is automatically moved to Fixed status. This automation is default behavior and can't be customized.</td>
+<td>If a flaky test no longer flakes for 30 days, it is automatically moved to the Fixed state. This automation is default behavior and can't be customized.</td>
 </tr>
 </tbody>
 </table>
@@ -128,7 +128,7 @@ When you fix a flaky test, Test Optimization's remediation flow can confirm the
 - If all retries pass, marks the fix as **in progress** in the Flaky Tests Management UI, associates it with the branch used for the fix, and waits for that branch to be merged.
 - Tags the last test retry with `@test.test_management.attempt_to_fix_passed:true` in test run events.
 - Starts a 14-day [grace period](#grace-period-mechanism) to give time for the fix to propagate everywhere in the repository.
-- If any retry fails, keeps the test's current status (`Active`, `Quarantined`, or `Disabled`).
+- If any retry fails, keeps the test's current state (`Active`, `Quarantined`, or `Disabled`).
 - Tags the last test retry with `@test.test_management.attempt_to_fix_passed:false` in test run events.
 
 ### Track fixes that are in progress
@@ -201,9 +201,25 @@ Flaky Tests Management uses AI to automatically assign a root cause category to
 
 ## Receive notifications
 
-Set up notifications to track changes to your flaky tests. Whenever a user or a policy changes the state of a flaky test, a message is sent to your selected recipients. You can send notifications to email addresses or Slack channels (see the [Datadog Slack integration][5]), and route messages based on test code owners. If no code owners are specified, all selected recipients are notified of all flaky test changes in the repository. Configure notification for each repository from the [**Flaky Test Policies**][13] page in Software Delivery settings.
+Set up notifications to track changes to your flaky tests. Notifications are sent when:
+- A new flaky test is detected on the default branch of the repository.
+- A user or policy changes the state of a flaky test.
+- The remediation flow for a flaky test succeeds or fails.
 
-Notifications are not sent immediately; they are batched every few minutes to reduce noise.
+You can send notifications to email addresses or Slack channels (see the [Datadog Slack integration][5]), and route messages based on test code owners. When multiple code owners are specified, a flaky test must be owned by all specified code owners for the notification rule to match. If no code owners are specified, all selected recipients are notified of all flaky test changes in the repository. Configure notifications for each repository from the [**Flaky Test Policies**][13] page in Software Delivery settings.
+
+Notifications are bundled over a short period to reduce noise.
+
+### Notification types
+
+| Notification type | Description |
+|---|---|
+|**New flaky test detected**| A new flaky test is detected on the default branch of the repository. |
+|**Test quarantined**| A test is quarantined by an automated policy rule (time-based, branch-based, or failure rate). |
+|**Test disabled**| A test is disabled by an automated policy rule (time-based, branch-based, or failure rate). |
+|**Fix successful**| A test passes all retries in the remediation flow and is marked as "fix in progress". |
+|**Fix failed**| A test fails during the remediation flow. |
+|**Manual state change**| A user manually changes the state of a flaky test. |
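The code owners AND matching described in the added paragraph amounts to a subset check: every owner listed on the notification rule must also own the test. Here is a minimal sketch of that logic, assuming owners are plain strings such as team handles. The `rule_matches` function and the example handles are hypothetical and are not part of any Datadog API.

```python
from typing import Iterable


def rule_matches(rule_owners: Iterable[str], test_owners: Iterable[str]) -> bool:
    """Return True if a notification rule applies to a test.

    Hypothetical helper: a rule with no code owners matches every test in
    the repository; otherwise the test must be owned by ALL of the rule's
    code owners (AND semantics), not just one of them.
    """
    required = set(rule_owners)
    if not required:
        return True
    # Subset check: every required owner must appear among the test's owners.
    return required <= set(test_owners)


# A rule scoped to two teams only matches tests owned by both.
assert rule_matches(["@backend", "@payments"], ["@backend", "@payments", "@qa"])
assert not rule_matches(["@backend", "@payments"], ["@backend"])
# A rule with no code owners matches any test.
assert rule_matches([], ["@frontend"])
```

Under this reading, a rule meant to reach either of two teams would need two separate rules, one per owner, since a single rule listing both owners only matches tests that both teams own.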