Docker: Retain recordings for failed sessions only from session capabilities by VietND96 · Pull Request #3111 · SeleniumHQ/docker-selenium

VietND96 · 2026-04-10T23:52:38Z

Thanks for contributing to the Docker-Selenium project!
A PR well described will help maintainers to quickly review and merge it

Before submitting your PR, please check our contributing guidelines, applied for this repository.
Avoid large PRs, help reviewers by making them as simple and short as possible.

Description

Retain recordings for failed sessions only

In event-driven mode (SE_VIDEO_EVENT_DRIVEN=true, the default), the video service subscribes to the Grid's ZeroMQ event bus and reacts to session lifecycle events in real time. This enables a retain-on-failure strategy: record every session, but automatically discard the video when the session passes and only keep (and upload) recordings from sessions that fail.

Enable it globally with the environment variable:

SE_RETAIN_ON_FAILURE=true

A session is treated as failed when either of the following is true:

The test code fires a session event whose eventType contains a substring from SE_FAILURE_SESSION_EVENTS (default: :failed,:failure,:error,:aborted).
The session closes with an abnormal reason — TIMEOUT, NODE_REMOVED, or NODE_RESTARTED — instead of the normal QUIT_COMMAND.

Environment variable	Default	Description
`SE_RETAIN_ON_FAILURE`	`false`	Discard recordings of sessions that pass. Only recordings from failed sessions are retained on disk and queued for upload.
`SE_FAILURE_SESSION_EVENTS`	`:failed,:failure,:error,:aborted`	Comma-separated substrings. Any session event whose `eventType` contains one of these (case-insensitive) marks the session as failed.

The se:retainOnFailure session capability overrides the global container env var for a specific session. For example, to retain the recording of a single session regardless of the global setting:

options.set_capability('se:retainOnFailure', True)

`se:retainOnFailure` cap	`SE_RETAIN_ON_FAILURE` env	Effective behaviour
`true`	`false` (default)	Retain on failure for this session
`false`	`true`	Always retain for this session
absent	`true`	Retain on failure (global default)
absent	`false` (default)	Always retain (global default)

Firing session events from test code

The Session Event API lets test code push named events directly to the Grid. The video service listens for these events on the ZeroMQ bus and uses them to determine session failure.

Call driver.fire_session_event(eventType, payload) from your test. Any eventType that contains a configured failure substring (e.g. "test:failed" contains ":failed") marks the session as failed.

from selenium.webdriver.chrome.options import Options as ChromeOptions
from selenium import webdriver

options = ChromeOptions()
options.set_capability('se:name', 'checkout_flow')
options.set_capability('se:retainOnFailure', True)   # discard video if this session passes

driver = webdriver.Remote(options=options, command_executor="http://localhost:4444")

try:
    driver.get("https://selenium.dev")
    # ... test steps ...
except Exception as exc:
    # "test:failed" contains ":failed" — matches the default SE_FAILURE_SESSION_EVENTS
    driver.fire_session_event("test:failed", {"error": str(exc)})
    raise
finally:
    driver.quit()

Note: If the test catches an exception and still calls driver.quit() normally, the session close reason is QUIT_COMMAND (not abnormal). In that case, firing a failure event before quit() is the only way to mark the session as failed and prevent the recording from being discarded.

So, you can control the retain-on-failure strategy fully from test code via session capabilities and fire session event.

Motivation and Context

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist

I have read the contributing document.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have added tests to cover my changes.
All new and existing tests passed.

qodo-code-review · 2026-04-10T23:52:57Z

Review Summary by Qodo

Implement retain-on-failure strategy with session capability override

✨ Enhancement

Walkthroughs

Description

• Implement retain-on-failure strategy to discard passing session recordings
• Refactor upload logic to support artifact-agnostic queuing and processing
• Add session capability se:retainOnFailure for per-session override control
• Expand default failure event detection to include :error and :aborted
• Update environment variable naming for clarity: SE_UPLOAD_FAILURE_SESSION_ONLY →
  SE_RETAIN_ON_FAILURE

Diagram

flowchart LR
  SessionCreated["Session Created<br/>with capability"] -->|Check se:retainOnFailure| CapCheck{"Capability<br/>set?"}
  CapCheck -->|Yes| UseCap["Use capability<br/>value"]
  CapCheck -->|No| UseEnv["Use SE_RETAIN_ON_FAILURE<br/>env var"]
  UseCap --> SessionState["Store in<br/>SessionState"]
  UseEnv --> SessionState
  SessionState -->|Recording stops| FailureCheck{"Session<br/>failed?"}
  FailureCheck -->|Yes| Queue["Queue for upload"]
  FailureCheck -->|No| Discard["Discard video<br/>immediately"]
  Queue --> Upload["Upload artifact"]
  Discard --> Cleanup["Cleanup session"]

File Changes

1. Video/video_service.py ✨ Enhancement +88/-62

Refactor upload logic and implement retain-on-failure strategy

Video/video_service.py

2. Video/video_ready.py ✨ Enhancement +18/-6

Update video readiness check for event-driven mode

Video/video_ready.py

3. tests/SeleniumTests/__init__.py 🧪 Tests +12/-0

Add test capability and event firing for retain-on-failure

tests/SeleniumTests/init.py

View more (8)

4. .github/workflows/docker-test.yml ⚙️ Configuration changes +15/-0

Add retain-on-failure test matrix configurations

.github/workflows/docker-test.yml

5. ENV_VARIABLES.md 📝 Documentation +4/-4

Document retain-on-failure and updated environment variables

ENV_VARIABLES.md

6. Video/Dockerfile ⚙️ Configuration changes +2/-2

Update default environment variables for retain-on-failure

Video/Dockerfile

7. scripts/generate_list_env_vars/description.yaml 📝 Documentation +13/-11

Update environment variable descriptions in metadata

scripts/generate_list_env_vars/description.yaml

8. scripts/generate_list_env_vars/value.yaml 📝 Documentation +5/-5

Update environment variable default values in metadata

scripts/generate_list_env_vars/value.yaml

9. Makefile ⚙️ Configuration changes +2/-0

Add TEST_RETAIN_ON_FAILURE to test targets

Makefile

10. README.md 📝 Documentation +65/-1

Add comprehensive retain-on-failure usage documentation

README.md

11. tests/docker-compose-v3-event-driven-arm64.yml ⚙️ Configuration changes +2/-4

Update compose file to use new retain-on-failure variables

tests/docker-compose-v3-event-driven-arm64.yml

qodo-code-review · 2026-04-10T23:52:59Z

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (2) 📎 Requirement gaps (0) 🎨 UX Issues (0)

🐞\ ≡ Correctness (1) ⚙ Maintainability (1)

📘\ ☼ Reliability (2)

1. Unquoted $GITHUB_ENV redirection 📘 ☼

Description

The new workflow line appends to $GITHUB_ENV without quoting the path, which is brittle and can
break if the path contains spaces or special characters. This violates the requirement to quote
variables/paths in shell scripts to avoid word-splitting/globbing issues.

Code

.github/workflows/docker-test.yml[R259-260]

+          if [ -n "${TEST_RETAIN_ON_FAILURE}" ]; then
+              echo "TEST_RETAIN_ON_FAILURE=${TEST_RETAIN_ON_FAILURE}" >> $GITHUB_ENV

Evidence

PR Compliance ID 2 requires robust shell practices including quoting variables/paths. The added line
writes to $GITHUB_ENV via >> $GITHUB_ENV without quotes, which is explicitly called out as a
failure mode (unquoted paths causing word-splitting).

.github/workflows/docker-test.yml[259-260]
Best Practice: Learned patterns

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The workflow appends to `GITHUB_ENV` using an unquoted path (`>> $GITHUB_ENV`), which is brittle under shell word-splitting rules.

## Issue Context
GitHub Actions recommends treating `GITHUB_ENV` as a file path; quoting prevents failures if the path ever contains spaces/special characters.

## Fix Focus Areas
- .github/workflows/docker-test.yml[259-261]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. Failure event always fired 🐞 ≡

Description

SeleniumGenericTests.tearDown() fires a "test:failed" session event whenever
TEST_RETAIN_ON_FAILURE is enabled, even for passing tests, so sessions are always marked failed
and recordings will never be discarded. This makes CI and local runs unable to validate
retain-on-failure behavior and can cause unnecessary retention/uploads.

Code

tests/SeleniumTests/init.py[R143-148]

+            if TEST_RETAIN_ON_FAILURE:
+                self.driver.fire_session_event(
+                    "test:failed",
+                    {"test": f"{self._testMethodName} ({self.__class__.__name__})"},
+                )
            self.driver.quit()

Evidence
The test suite unconditionally emits a failure event in tearDown when TEST_RETAIN_ON_FAILURE is
true, and the video service considers any eventType containing :failed a failure by default,
setting has_failure_event=True which makes SessionState.is_failed return true.
tests/SeleniumTests/init.py[136-148]
Video/video_service.py[166-173]
Video/video_service.py[297-301]
Video/video_service.py[108-115]
Video/video_service.py[812-823]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`tests/SeleniumTests/__init__.py` fires `driver.fire_session_event("test:failed", ...)` in `tearDown()` whenever `TEST_RETAIN_ON_FAILURE` is enabled, regardless of whether the test actually failed. This forces the video service to mark all sessions as failed and prevents retain-on-failure from ever discarding videos for passing sessions.

### Issue Context
The video service treats event types containing `:failed` as failures by default, so emitting that event for successful tests invalidates the new retain-on-failure behavior and produces misleading CI artifacts.

### Fix Focus Areas
- tests/SeleniumTests/__init__.py[136-148]

### Suggested change
Update `tearDown()` to emit the failure session event **only when the test failed/error’d**. In `unittest`, you can inspect `self._outcome` / `self._outcome.result` (failures + errors) to determine whether the current test had an exception, and only then call `fire_session_event("test:failed", ...)` before `quit()`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. TEST_RETAIN_ON_FAILURE unquoted 📘 ☼

Description

The workflow passes TEST_RETAIN_ON_FAILURE into a multi-line shell command without quoting, which
can cause unintended word-splitting if the value is ever not a simple token. This is fragile shell
practice under the compliance requirement for robust quoting.

Code

.github/workflows/docker-test.yml[R283-286]

            USE_RANDOM_USER_ID=${{ matrix.use-random-user }} VERSION=${BRANCH} BUILD_DATE=${BUILD_DATE} \
            TEST_FIREFOX_INSTALL_LANG_PACKAGE=${TEST_FIREFOX_INSTALL_LANG_PACKAGE} SELENIUM_ENABLE_MANAGED_DOWNLOADS=${SELENIUM_ENABLE_MANAGED_DOWNLOADS} \
+            TEST_RETAIN_ON_FAILURE=${TEST_RETAIN_ON_FAILURE} \
            make ${{ matrix.test-strategy }}

Evidence

PR Compliance ID 2 requires quoting variables appropriately in shell contexts. The added line
injects ${TEST_RETAIN_ON_FAILURE} into a shell command without quotes, which is a common source of
brittle behavior.

.github/workflows/docker-test.yml[283-286]
Best Practice: Learned patterns

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`TEST_RETAIN_ON_FAILURE` is appended to a shell command line without quotes, risking word-splitting/globbing if its value changes.

## Issue Context
Even if currently set to boolean-like values, quoting is required for robust shell practices and future-proofing.

## Fix Focus Areas
- .github/workflows/docker-test.yml[283-286]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

4. Misleading capability example text 🐞 ⚙

Description

README says setting se:retainOnFailure=true will "retain the recording ... regardless of the
global setting", but the surrounding table and later example indicate true means
discard-on-pass/retain-on-failure. This can cause users to set the capability expecting the opposite
behavior.

Code

README.md[R869-873]

+The `se:retainOnFailure` session **capability** overrides the global container env var for a specific session. For example, to retain the recording of a single session regardless of the global setting:
+
+```python
+options.set_capability('se:retainOnFailure', True)
+```

Evidence

The sentence claims the capability retains recordings regardless of global configuration, but the
immediately following table defines se:retainOnFailure=true as retain-on-failure (discard on
pass), and the later code comment explicitly says it discards if the session passes.

README.md[869-880]
README.md[892-895]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The README text introducing `se:retainOnFailure` describes the example as retaining the recording regardless of global setting, but `se:retainOnFailure=true` actually enables retain-on-failure (discard on pass).

### Issue Context
Immediately below, the behavior table and a later code comment describe the correct semantics, so the intro sentence is the inconsistent part.

### Fix Focus Areas
- README.md[869-873]

### Suggested change
Rewrite the sentence to match the semantics, e.g.:
- "...to enable retain-on-failure for a single session regardless of the global setting" (for `true`),
OR change the example to `se:retainOnFailure=false` if the intent is to demonstrate "always retain" overriding a globally-enabled retain-on-failure.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

ⓘ The new review experience is currently in Beta. Learn more

qodo-code-review · 2026-04-10T23:58:15Z

+          if [ -n "${TEST_RETAIN_ON_FAILURE}" ]; then
+              echo "TEST_RETAIN_ON_FAILURE=${TEST_RETAIN_ON_FAILURE}" >> $GITHUB_ENV


1. Unquoted $github_env redirection 📘 Rule violation ☼ Reliability

The new workflow line appends to $GITHUB_ENV without quoting the path, which is brittle and can break if the path contains spaces or special characters. This violates the requirement to quote variables/paths in shell scripts to avoid word-splitting/globbing issues.

Agent Prompt

## Issue description The workflow appends to `GITHUB_ENV` using an unquoted path (`>> $GITHUB_ENV`), which is brittle under shell word-splitting rules. ## Issue Context GitHub Actions recommends treating `GITHUB_ENV` as a file path; quoting prevents failures if the path ever contains spaces/special characters. ## Fix Focus Areas - .github/workflows/docker-test.yml[259-261]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

qodo-code-review · 2026-04-10T23:58:15Z

+            if TEST_RETAIN_ON_FAILURE:
+                self.driver.fire_session_event(
+                    "test:failed",
+                    {"test": f"{self._testMethodName} ({self.__class__.__name__})"},
+                )
            self.driver.quit()


2. Failure event always fired 🐞 Bug ≡ Correctness

SeleniumGenericTests.tearDown() fires a "test:failed" session event whenever TEST_RETAIN_ON_FAILURE is enabled, even for passing tests, so sessions are always marked failed and recordings will never be discarded. This makes CI and local runs unable to validate retain-on-failure behavior and can cause unnecessary retention/uploads.

Agent Prompt

### Issue description `tests/SeleniumTests/__init__.py` fires `driver.fire_session_event("test:failed", ...)` in `tearDown()` whenever `TEST_RETAIN_ON_FAILURE` is enabled, regardless of whether the test actually failed. This forces the video service to mark all sessions as failed and prevents retain-on-failure from ever discarding videos for passing sessions. ### Issue Context The video service treats event types containing `:failed` as failures by default, so emitting that event for successful tests invalidates the new retain-on-failure behavior and produces misleading CI artifacts. ### Fix Focus Areas - tests/SeleniumTests/__init__.py[136-148] ### Suggested change Update `tearDown()` to emit the failure session event **only when the test failed/error’d**. In `unittest`, you can inspect `self._outcome` / `self._outcome.result` (failures + errors) to determine whether the current test had an exception, and only then call `fire_session_event("test:failed", ...)` before `quit()`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

…ilities Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>

qodo-code-review bot reviewed Apr 10, 2026

View reviewed changes

VietND96 force-pushed the retain-on-failure branch 2 times, most recently from 639ff8d to f606d8c Compare April 11, 2026 04:28

Docker: Retain recordings for failed sessions only from session capab…

a5fee57

…ilities Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>

VietND96 force-pushed the retain-on-failure branch from f606d8c to a5fee57 Compare April 11, 2026 18:03

Merge branch 'trunk' into retain-on-failure

ccdefeb

VietND96 merged commit 03f439f into trunk Apr 12, 2026
56 of 57 checks passed

VietND96 deleted the retain-on-failure branch April 12, 2026 04:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Docker: Retain recordings for failed sessions only from session capabilities#3111

Docker: Retain recordings for failed sessions only from session capabilities#3111
VietND96 merged 2 commits intotrunkfrom
retain-on-failure

VietND96 commented Apr 10, 2026 •

edited

Loading

Uh oh!

qodo-code-review bot commented Apr 10, 2026

Uh oh!

qodo-code-review bot commented Apr 10, 2026 •

edited

Loading

Uh oh!

qodo-code-review bot Apr 10, 2026

Uh oh!

qodo-code-review bot Apr 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		if [ -n "${TEST_RETAIN_ON_FAILURE}" ]; then
		echo "TEST_RETAIN_ON_FAILURE=${TEST_RETAIN_ON_FAILURE}" >> $GITHUB_ENV

Uh oh!

Conversation

VietND96 commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Retain recordings for failed sessions only

Firing session events from test code

Motivation and Context

Types of changes

Checklist

Uh oh!

qodo-code-review bot commented Apr 10, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-code-review bot commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

qodo-code-review bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

qodo-code-review bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

VietND96 commented Apr 10, 2026 •

edited

Loading

qodo-code-review bot commented Apr 10, 2026 •

edited

Loading