fix(tests): move shared test mixins to tests.common to fix DeviceContext leak by wanghan-iapcm · Pull Request #5344 · deepmodeling/deepmd-kit

wanghan-iapcm · 2026-03-26T05:51:34Z

Summary

- Move TestCaseSingleFrameWithNlist and get_tols from tests.pt.model to tests.common.test_mixins
- Update all pt_expt test imports to use tests.common.test_mixins directly
- Simplify tests/pt_expt/conftest.py and remove manual _pop_device_contexts() workarounds

Root cause

tests/pt/__init__.py calls torch.set_default_device("cuda:9999999") to enforce explicit device usage in pt tests. This pushes a DeviceContext onto the torch function mode stack. pt_expt descriptor/fitting tests imported TestCaseSingleFrameWithNlist from tests.pt.model.test_env_mat. Because importing a submodule first executes the __init__.py of each parent package, that import ran tests/pt/__init__.py and leaked the DeviceContext into pt_expt tests.

The leaked DeviceContext caused torch.zeros() calls (without explicit device=) inside AOTInductor's lowering pass and PyTorch's Adam optimizer to target cuda:9999999, crashing on CPU-only CI machines with AssertionError: Torch not compiled with CUDA enabled.
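The import side effect described above can be reproduced without torch. The following is a minimal sketch, not the project's code: the package name demo_pt, the helper get_answer, and the environment variable DEMO_DEFAULT_DEVICE are hypothetical stand-ins for tests.pt, the shared mixin, and the global DeviceContext.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path


def import_helper_and_report() -> dict:
    """Import demo_pt.model.helpers and report what the parent package did."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pt"
        (pkg / "model").mkdir(parents=True)
        # Stand-in for tests/pt/__init__.py: mutates global state at import time.
        (pkg / "__init__.py").write_text(
            "import os\nos.environ['DEMO_DEFAULT_DEVICE'] = 'cuda:9999999'\n"
        )
        (pkg / "model" / "__init__.py").write_text("")
        # Stand-in for the shared helper that other test suites want to reuse.
        (pkg / "model" / "helpers.py").write_text(
            "def get_answer():\n    return 42\n"
        )

        os.environ.pop("DEMO_DEFAULT_DEVICE", None)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Only the helper is requested, yet demo_pt/__init__.py runs first.
            helpers = importlib.import_module("demo_pt.model.helpers")
            return {
                "answer": helpers.get_answer(),
                "parent_imported": "demo_pt" in sys.modules,
                "leaked_device": os.environ.get("DEMO_DEFAULT_DEVICE"),
            }
        finally:
            # Undo the demo's global changes so the sketch leaves no residue.
            sys.path.remove(tmp)
            os.environ.pop("DEMO_DEFAULT_DEVICE", None)
            for name in [n for n in sys.modules if n.split(".")[0] == "demo_pt"]:
                del sys.modules[name]


report = import_helper_and_report()
print(report)
```

The caller asked only for the helper, but the parent package's global setting is in place afterwards, which is the same shape as the leak from tests/pt/__init__.py.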

Fix

The shared mixins (TestCaseSingleFrameWithNlist, get_tols) are pure numpy with no torch dependency. Moving them to tests/common/test_mixins.py lets pt_expt tests import them without touching the tests.pt package. The pt tests re-export from the common location for backward compatibility.
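As a rough illustration of why such a helper needs no torch import, here is a minimal sketch of a get_tols-style function. The tolerance values, the accepted precision names, and the allclose_scalar helper are assumptions made for this example; the actual contents of tests/common/test_mixins.py are not shown in this PR description.

```python
import math

# Hypothetical tolerance table: placeholder numbers for illustration only,
# not the values used by the project.
_TOLS = {
    "float64": (1e-10, 1e-10),
    "float32": (1e-4, 1e-4),
}


def get_tols(prec: str) -> tuple[float, float]:
    """Map a precision name to an (rtol, atol) pair."""
    try:
        return _TOLS[prec]
    except KeyError:
        raise ValueError(f"unknown precision: {prec!r}") from None


def allclose_scalar(a: float, b: float, prec: str) -> bool:
    """Compare two floats with the tolerances selected by prec."""
    rtol, atol = get_tols(prec)
    return math.isclose(a, b, rel_tol=rtol, abs_tol=atol)


print(get_tols("float32"))
print(allclose_scalar(1.0, 1.0 + 1e-6, "float32"))
print(allclose_scalar(1.0, 1.0 + 1e-6, "float64"))
```

Since nothing here touches a backend, a module of this kind can live in a backend-neutral package and be imported from any test suite without running backend-specific package initialization.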

Test plan

- pytest source/tests/pt_expt/descriptor/test_se_e2_a.py: passes, no DeviceContext leak
- pytest source/tests/pt/model/test_env_mat.py: passes (backward compat via re-export)
- pytest source/tests/pt/model/test_mlp.py: passes (backward compat via re-export)
- Broader pt_expt tests (descriptor, fitting, loss, utils) all pass
- Verified: import source.tests.pt_expt.descriptor.test_se_e2_a no longer creates DeviceContext

Summary by CodeRabbit

Tests
- Consolidated shared testing utilities into a centralized module referenced by many test suites for consistent setup and tolerance handling
- Simplified and reduced device-context cleanup behavior in test fixtures
- Updated numerous test imports to use the new shared testing utilities location

…ext leak pt_expt tests imported TestCaseSingleFrameWithNlist and get_tols from tests.pt.model, which triggered tests/pt/__init__.py's torch.set_default_device("cuda:9999999"). This pushed a DeviceContext onto the mode stack, causing torch.zeros() (without device=) to target a fake CUDA device — crashing on CPU-only machines. Fix: move the shared mixins (pure numpy, no torch dependency) to tests/common/test_mixins.py. The pt tests re-export from there for backward compat. pt_expt tests now import directly from tests.common, avoiding the tests.pt package entirely. Also simplify the pt_expt conftest.py and remove the manual _pop_device_contexts() call from test_change_bias.py.

coderabbitai · 2026-03-26T05:56:36Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fdba37c3-b62d-443e-8c3a-638fc80e6f6a

📥 Commits

Reviewing files that changed from the base of the PR and between 3ef1527 and 118f43a.

📒 Files selected for processing (1)

source/tests/pt_expt/conftest.py

📝 Walkthrough

Walkthrough

Centralized shared test utilities into source/tests/common/test_mixins.py (providing TestCaseSingleFrameWithNlist and get_tols), and updated many tests to import those utilities; also simplified device-context cleanup fixtures in source/tests/pt_expt/conftest.py and removed one _pop_device_contexts() call in a test setup.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Centralized Test Infrastructure `source/tests/common/test_mixins.py` | Added new module defining `TestCaseSingleFrameWithNlist` mixin (setUp builds small nlist-based test system) and `get_tols(prec)` (maps precision strings to rtol/atol). |
| Core test modules updated `source/tests/pt/model/test_env_mat.py`, `source/tests/pt/model/test_mlp.py` | Removed local definitions of `TestCaseSingleFrameWithNlist` and `get_tols`; replaced with imports from `...common.test_mixins`. |
| pt_expt descriptor tests `source/tests/pt_expt/descriptor/test_dpa1.py`, `source/tests/pt_expt/descriptor/test_dpa2.py`, `source/tests/pt_expt/descriptor/test_dpa3.py`, `source/tests/pt_expt/descriptor/test_hybrid.py`, `source/tests/pt_expt/descriptor/test_se_atten_v2.py`, `source/tests/pt_expt/descriptor/test_se_e2_a.py`, `source/tests/pt_expt/descriptor/test_se_r.py`, `source/tests/pt_expt/descriptor/test_se_t.py`, `source/tests/pt_expt/descriptor/test_se_t_tebd.py` | Updated imports to source `TestCaseSingleFrameWithNlist` and/or `get_tols` from `...common.test_mixins` instead of previous module locations; no test logic changes. |
| pt_expt fitting tests `source/tests/pt_expt/fitting/test_dipole_fitting.py`, `source/tests/pt_expt/fitting/test_dos_fitting.py`, `source/tests/pt_expt/fitting/test_ener_fitting.py`, `source/tests/pt_expt/fitting/test_invar_fitting.py`, `source/tests/pt_expt/fitting/test_polar_fitting.py`, `source/tests/pt_expt/fitting/test_property_fitting.py` | Switched `TestCaseSingleFrameWithNlist` imports to `...common.test_mixins`; no other changes. |
| pt_expt other tests `source/tests/pt_expt/loss/test_ener.py`, `source/tests/pt_expt/utils/test_exclusion_mask.py` | Switched `get_tols` / `TestCaseSingleFrameWithNlist` imports to `...common.test_mixins`; no logic changes. |
| Fixture and setup adjustments `source/tests/pt_expt/conftest.py`, `source/tests/pt_expt/test_change_bias.py` | Simplified device-context cleanup fixtures in `conftest.py` (removed per-test restore logic and reduced docstring); removed `_pop_device_contexts()` call from `test_change_bias.py` setup. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat: new backend pytorch exportable. #5194 — touches the same test helpers (TestCaseSingleFrameWithNlist, get_tols) and centralization/usage across tests.
feat(pt_expt): add dp change-bias support #5330 — modifies device-context cleanup in source/tests/pt_expt/conftest.py similar to the fixture changes here.
refact(dpmodel,pt_expt): fitting net #5207 — centralizes test helpers into source/tests/common/test_mixins.py and updates tests to import them.

Suggested reviewers

njzjz
iProzd

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly addresses the main change: moving shared test mixins from tests.pt.model to tests.common to resolve a DeviceContext leak in pt_expt tests. It accurately summarizes the key technical change and its purpose. |


coderabbitai

🧹 Nitpick comments (1)

source/tests/pt_expt/conftest.py (1)

21-35: Optional cleanup: remove unused return value from _pop_device_contexts.

Now that context restore is gone, the returned popped list is dead state and can be dropped for clarity.

Proposed simplification

```diff
-def _pop_device_contexts() -> list:
+def _pop_device_contexts() -> None:
     """Pop all stale DeviceContext modes from the torch function mode stack."""
-    popped = []
     while True:
         modes = _get_current_function_mode_stack()
         if not modes:
             break
         top = modes[-1]
         if isinstance(top, _device.DeviceContext):
             top.__exit__(None, None, None)
-            popped.append(top)
         else:
             break
-    return popped
+    return None
```

Also applies to: 39-40
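The control flow of the loop in the suggestion can be exercised without torch. This is a sketch under stated assumptions: FakeDeviceContext, OtherMode, and the plain list used as the stack are hypothetical stand-ins, and the stand-in's __exit__ is assumed to remove the context from the stack, which is what the loop above relies on for termination.

```python
class OtherMode:
    """Stand-in for any non-device mode that may sit on the stack."""


class FakeDeviceContext:
    """Stand-in for a device context; __exit__ removes it from its stack."""

    def __init__(self, stack: list) -> None:
        self._stack = stack

    def __exit__(self, exc_type, exc, tb) -> None:
        self._stack.remove(self)


def pop_device_contexts(stack: list) -> None:
    """Pop device contexts from the top of the stack until another mode is hit."""
    while stack and isinstance(stack[-1], FakeDeviceContext):
        stack[-1].__exit__(None, None, None)


stack: list = []
stack.append(FakeDeviceContext(stack))  # buried below another mode
stack.append(OtherMode())
stack.append(FakeDeviceContext(stack))  # stale contexts on top
stack.append(FakeDeviceContext(stack))

pop_device_contexts(stack)
print([type(m).__name__ for m in stack])
```

In this stand-in, only the contiguous run of device contexts at the top is removed; a device context buried under a different mode is left in place because the loop stops at the first non-device mode.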

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@source/tests/pt_expt/conftest.py` around lines 21 - 35, The function
_pop_device_contexts currently builds and returns a list named popped but that
return value is unused; remove the popped list and its return, turning the
function into a void operation that simply iterates the mode stack and calls
top.__exit__(None, None, None) for each _device.DeviceContext encountered;
update the function body to drop the popped variable and the final "return
popped". Apply the same simplification to the analogous code block at the other
occurrence noted (lines 39-40) so both places perform side-effect-only context
popping without returning unused state.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 53d4815d-fcdd-4e31-be0b-e8f7196010ed

📥 Commits

Reviewing files that changed from the base of the PR and between e97967b and 3ef1527.

📒 Files selected for processing (22)

source/tests/common/test_mixins.py
source/tests/pt/model/test_env_mat.py
source/tests/pt/model/test_mlp.py
source/tests/pt_expt/conftest.py
source/tests/pt_expt/descriptor/test_dpa1.py
source/tests/pt_expt/descriptor/test_dpa2.py
source/tests/pt_expt/descriptor/test_dpa3.py
source/tests/pt_expt/descriptor/test_hybrid.py
source/tests/pt_expt/descriptor/test_se_atten_v2.py
source/tests/pt_expt/descriptor/test_se_e2_a.py
source/tests/pt_expt/descriptor/test_se_r.py
source/tests/pt_expt/descriptor/test_se_t.py
source/tests/pt_expt/descriptor/test_se_t_tebd.py
source/tests/pt_expt/fitting/test_dipole_fitting.py
source/tests/pt_expt/fitting/test_dos_fitting.py
source/tests/pt_expt/fitting/test_ener_fitting.py
source/tests/pt_expt/fitting/test_invar_fitting.py
source/tests/pt_expt/fitting/test_polar_fitting.py
source/tests/pt_expt/fitting/test_property_fitting.py
source/tests/pt_expt/loss/test_ener.py
source/tests/pt_expt/test_change_bias.py
source/tests/pt_expt/utils/test_exclusion_mask.py

💤 Files with no reviewable changes (1)

source/tests/pt_expt/test_change_bias.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ef15274bc


codecov · 2026-03-26T06:31:32Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.28%. Comparing base (e97967b) to head (118f43a).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #5344      +/-   ##
==========================================
- Coverage   82.28%   82.28%   -0.01%
==========================================
  Files         797      797
  Lines       82100    82101       +1
  Branches     4003     4004       +1
==========================================
- Hits        67557    67555       -2
- Misses      13336    13337       +1
- Partials     1207     1209       +2
```


github-actions Bot added the Python label Mar 26, 2026

wanghan-iapcm marked this pull request as draft March 26, 2026 05:51

wanghan-iapcm added the Test CUDA label (Trigger test CUDA workflow) Mar 26, 2026

github-actions Bot removed the Test CUDA label (Trigger test CUDA workflow) Mar 26, 2026

dosubot Bot added the bug label Mar 26, 2026

coderabbitai Bot reviewed Mar 26, 2026

chatgpt-codex-connector Bot reviewed Mar 26, 2026

Comment thread source/tests/pt_expt/conftest.py

fix(tests): add session-scoped DeviceContext cleanup as safety net

118f43a

Reinstate session-scoped fixture to cover mixed test runs where tests/pt/__init__.py may push a DeviceContext before pt_expt tests.

wanghan-iapcm requested a review from njzjz March 26, 2026 07:01

wanghan-iapcm added the Test CUDA label (Trigger test CUDA workflow) Mar 26, 2026

wanghan-iapcm marked this pull request as ready for review March 26, 2026 07:01

github-actions Bot removed the Test CUDA label (Trigger test CUDA workflow) Mar 26, 2026

njzjz approved these changes Mar 27, 2026

njzjz added this pull request to the merge queue Mar 27, 2026

Merged via the queue into deepmodeling:master with commit a7e9fed Mar 28, 2026
73 checks passed

njzjz merged 2 commits into deepmodeling:master from wanghan-iapcm:fix-device-context-leak
