perf(pt2): optimize .pt2 C++ inference path #5407

Open
wanghan-iapcm wants to merge 36 commits into deepmodeling:master from wanghan-iapcm:perf-pt-expt-pt2-cpp

Conversation


@wanghan-iapcm wanghan-iapcm commented Apr 20, 2026

Summary

  • Replace CPU-side buildTypeSortedNlist with createNlistTensor(data, expected_nnei) — avoids distance computation and type sorting every step; model's compiled format_nlist handles this on-device
  • Export with do_atomic_virial=False by default — avoids 3 extra torch.autograd.grad backward passes; add --atomic-virial flag to dp convert-backend
  • Cache mapping_tensor as member variable, only rebuild when ago == 0
  • Store nnei and do_atomic_virial in .pt2 metadata for C++ to read at init
  • Make nnei dynamic in torch.export — compiled graph accepts variable-size neighbor lists via internal padding + sort branch
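The padded neighbor-list layout described in the first bullet can be sketched in plain Python (a hypothetical stand-in for the C++ `createNlistTensor`; names are illustrative):

```python
def create_nlist_rows(data, expected_nnei):
    # Jagged per-atom neighbor rows -> dense nloc x expected_nnei rows,
    # padded with -1 (the C++ helper then wraps this as a [1, nloc, nnei]
    # int32 tensor). No distance computation, no type sorting on the host.
    out = []
    for row in data:
        n = min(len(row), expected_nnei)
        out.append(list(row[:n]) + [-1] * (expected_nnei - n))
    return out


rows = create_nlist_rows([[3, 7], [1]], expected_nnei=4)
assert rows == [[3, 7, -1, -1], [1, -1, -1, -1]]
```

The host does no sorting at all; the compiled graph's `format_nlist` handles distance ordering on-device.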

Benchmark (V100-SXM2-16GB, 192-atom water, LAMMPS MD)

Before this PR

.pt2 was 9x slower than .pth due to CPU-side nlist sorting, baked-in atomic virial backward passes, and excessive clones:

| Atoms | .pth (ms/step) | .pt2 (ms/step) | .pt2/.pth |
| ----- | -------------- | -------------- | --------- |
| 192   | 11             | 97             | 8.8x      |

After this PR

DPA1 L0 (se_atten nlayer=0)

| Atoms | .pth (ms) | .pt2 (ms) | .pt2/.pth |
| ----- | --------- | --------- | --------- |
| 192   | 5.58      | 3.30      | 0.59x     |
| 384   | 6.73      | 5.39      | 0.80x     |
| 768   | 10.9      | 9.54      | 0.88x     |
| 1536  | 19.3      | 17.9      | 0.93x     |
| 3072  | 36.5      | 34.8      | 0.95x     |
| 6144  | 72.0      | 68.4      | 0.95x     |
| 12288 | 140       | 137       | 0.97x     |

DPA1 L2 (se_atten nlayer=2)

| Atoms | .pth (ms) | .pt2 (ms) | .pt2/.pth |
| ----- | --------- | --------- | --------- |
| 192   | 13.0      | 7.77      | 0.60x     |
| 384   | 22.3      | 13.6      | 0.61x     |
| 768   | 41.0      | 25.3      | 0.62x     |
| 1536  | 77.7      | 48.6      | 0.63x     |

DPA2 (repinit + repformer)

| Atoms | .pth (ms) | .pt2 (ms) | .pt2/.pth |
| ----- | --------- | --------- | --------- |
| 192   | 28.5      | 14.1      | 0.49x     |
| 384   | 34.6      | 25.0      | 0.72x     |
| 768   | 60.5      | 47.4      | 0.78x     |
| 1536  | 112.9     | 91.2      | 0.81x     |

More complex models benefit more from AOTInductor: DPA2 19-51% faster, DPA1 L2 37-40% faster, DPA1 L0 3-41% faster.

Test plan

  • All Python export/make_fx tests pass (74 tests)
  • All Python model tests pass
  • All C++ ctest pass (0 failures)
  • All 37 LAMMPS .pt2 tests pass
  • V100 benchmark confirms speedup

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added --atomic-virial command-line flag to enable atomic virial correction during model conversion and export operations
    • Models exported with this feature now include per-atom virial contributions for improved computational accuracy
    • Atomic virial support available for all exportable model formats

Han Wang added 30 commits April 16, 2026 00:51
The reshape-first approach changed the error from a custom ValueError
to a generic reshape error, breaking test_self_exception assertions.
- Remove unused _min_nbor_dist variables in main.py
- Remove unused _MULTITASK_JSON/_MULTITASK_SHAREFIT_JSON globals
- Fix duplicate unittest import (use unittest.mock.patch directly)
- Bind test socket to 127.0.0.1 instead of all interfaces
- Remove redundant nframes assignment in _compile_model
Under DDP self.wrapper is DistributedDataParallel which has no .model
attribute. Use .module to access the underlying ModelWrapper.
Cover the _compile_model DDP unwrap fix with single-task and multi-task
tests that enable_compile=True under 2-rank gloo DDP.
Replace aot_eager+padding+manual recompile with symbolic make_fx +
torch.compile(backend="inductor", dynamic=True). The compiled graph
natively handles varying nframes/nloc/nall so the per-batch padding
and runtime _recompile pass can be removed.

Use a trace-time nframes of 7 (prime) and reshape with -1 in dpmodel
(general_fitting, env_mat) to prevent PyTorch's symbolic tracer from
unifying the batch dim with numb_fparam / numb_aparam / ntypes /
dim_case_embd. Add TestCompiledVaryingNframesWithParams covering
collisions with fparam/aparam, and TestCompileCaseEmbdVaryingNframes
covering dim_case_embd > 0 with runtime nframes matching the embed dim.
…modeling#5393

Add silut/custom_silu support to _torch_activation using native torch
ops (sigmoid, tanh, where) so the custom silu stays traceable by
make_fx / torch.export. Cross-backend consistency tests cover multiple
thresholds across the silu/tanh branches, and a pt_expt unit file
exercises default/custom threshold, gradient flow, make_fx, and
torch.export.

Also port DescrptBlockRepformers accessor tests (get_rcut_smth,
get_env_protection). The underlying accessor methods already exist on
this branch; these tests guard against regressions.
Extend the compiled-vs-uncompiled assertions in TestCompiledConsistency
(single-task) and _check_compile_correctness (multi-task) to also cover
``atom_energy`` and the reduced ``virial``.  Atomic virial is intentionally
not exercised because training never sets ``do_atomic_virial=True``.
…deling#5393

Adds the remaining tests from PR deepmodeling#5393 that were not yet on this branch:
``test_training_loop_compiled_silu`` (silu activation under torch.compile)
and ``TestCompiledVaryingNatoms`` (compiled training across systems with
different atom counts). Also drops a stray unused ``threshold`` variable
in ``test_silut_below_threshold_is_silu`` to match the upstream PR.
Replace the two finite-loss smoke tests with a single test that builds
both trainers, syncs weights, and per-step asserts identical
predictions, loss, and per-parameter gradients (second-order through
the force loss). Also add a silu full-model consistency test and write
virial.npy in the small synthetic system so the virial passthrough is
exercised on every step. Factor the prediction/grad comparison loops
into shared helpers.
Parametrize TestCompiledVaryingNatoms over se_e2_a, DPA2 and DPA3 with
strict atol=rtol=1e-10 on float64 (machine epsilon).  DPA1 (se_atten)
is intentionally omitted: its compiled path is intermittently incorrect
(~20% of compiles produce grad diffs up to 0.67 at the first embedding
layer), and including it would have required masking the bug with a
loose tolerance.
Enable use_three_body=True on the DPA2 varying-natoms test so the
compiled three-body neighbor path is also covered.  three_body_rcut=3.0
matches the repformer rcut and is large enough to find neighbors in
the 6-atom small system (~2.75Å nearest-neighbor distance).
…ckward decomp

Revert (-1, nloc*nnei, ...) reshapes back to (nf, -1, ...) in
env_mat.py and general_fitting.py.  The -1-for-nf pattern breaks
zero-atom systems: numpy cannot infer -1 when other dims multiply
to zero (0/0), and torch.export shape assertions hit Mod(0,0).
Using nf is safe because _TRACE_NFRAMES=7 already prevents
symbolic-tracer specialisation during training compile.
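The zero-atom failure mode is easy to reproduce with plain NumPy (a minimal sketch of the two reshape patterns named above):

```python
import numpy as np

nf, nloc, nnei = 2, 0, 8  # a zero-atom system
x = np.zeros((nf * nloc * nnei, 3))

# (nf, -1, ...) is safe: the known dims multiply to nf * 3 = 6,
# so the -1 resolves to 0
assert x.reshape(nf, -1, 3).shape == (2, 0, 3)

# (-1, nloc * nnei, ...) breaks: the known dims multiply to 0,
# the inferred size would be 0 / 0, and NumPy raises ValueError
try:
    x.reshape(-1, nloc * nnei, 3)
    raised = False
except ValueError:
    raised = True
assert raised
```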

Add silu_backward decomposition table to make_fx in training.py
so inductor can compile second-order gradients through silu without
requiring a fused higher-order derivative kernel.
Replace assert with if/raise ValueError for user-facing config
validation (data_stat_protect, finetune branch/head checks).
Wrap train() in try/finally for destroy_process_group cleanup.
Add parents=True, exist_ok=True to stat_file mkdir.
Add strict=True to zip() calls. Fix minor test issues.
Match the dpmodel try/except pattern so shape mismatches produce
a clear error instead of a raw RuntimeError from torch.view.
DPA1's se_atten attention path produces incorrect force gradients under
inductor compile. Add get_numb_attn_layer() API and isinstance guard to
reject attn_layer>0 at compile time with a clear error message. DPA1
with attn_layer=0 compiles correctly and is now tested. The guard also
covers se_atten_v2 which inherits from DPA1.
DPA1 with se_atten attention layers compiles correctly under
inductor+dynamic — the _check_varying_natoms test passes at 1e-10
tolerance for attn_layer=2. The guard added in 1e694a3 was based on
an inconclusive diagnostic; replace the rejection test with a
compile-correctness test for DPA1 with attention.
The compile guard that used this method was removed in 6d39ddf.
Use ``pt/model/water`` instead of the ``pt/water`` symlink in
test_fitting_stat.py to avoid FileNotFoundError on CI.
Remove unused ``get_numb_attn_layer()`` from DescrptDPA1.
…faults

After Graph.erase_node() stale C-level prev/next pointers may remain
on neighbouring Node objects. Dynamo re-tracing can dereference them
and segfault. Rebuild into a fresh graph to eliminate stale pointers.
- max_fusion_size 8 → 64 to avoid scheduler timeouts on large descriptors
- add triton.mix_order_reduction=False for PyTorch <=2.11 bugs
  (pytorch/pytorch#174379, #178080, #179494)
- hardcode defaults, let user compile_options override per-key
DDPOptimizer splits the inner compiled graph at bucket boundaries,
producing subgraph outputs with symbolic integers that crash AOT
Autograd (pytorch/pytorch#134182).
max_fusion_size=64 causes DPA3 force divergence and
triton.mix_order_reduction=False causes DPA1-attention divergence
(both on CPU, float64, 1e-10 tolerance). Revert to max_fusion_size=8
and remove mix_order_reduction (kept as comment for GPU users).

Also revert test_fitting_stat.py path back to pt/water symlink which
was confirmed working at c2efbf1.
Remove os.environ["DEVICE"] = "cpu" from all worker functions and
replace hardcoded backend="gloo" with auto-detected _DDP_BACKEND
(nccl on CUDA, gloo on CPU). env.py now evaluates DEVICE naturally
based on hardware availability.
…links

Move torch._dynamo.config.optimize_ddp = False from module level into
_compile_model() so it only applies when compile is active. Resolve
symlinks in test_fitting_stat.py data paths for reliable CI access.
TestFparam.tearDown deleted the committed fparam.npy, breaking other
tests (e.g. test_fitting_stat) running later in the same CI shard.
Also revert the unnecessary .resolve() workaround in test_fitting_stat.
NCCL rejects two ranks sharing the same GPU device, causing all DDP
tests to fail on single-GPU CI runners. Skip the entire module when
the backend is NCCL and device_count < 2.
Three optimizations for .pt2 (AOTInductor) C++ inference:

1. Replace buildTypeSortedNlist with createNlistTensor: avoids expensive
   CPU-side distance computation and type-based sorting. The .pt2 model's
   compiled graph contains format_nlist which handles distance sorting
   on-device. The new createNlistTensor pads/truncates to the expected
   nnei (= sum(sel)) stored in model metadata, since torch.export
   specializes nnei as a static dimension.

2. Export with do_atomic_virial=False by default: avoids 3 extra
   torch.autograd.grad backward passes in atomic_virial_corr(). The
   reduced virial (energy_derv_c_redu) is still correct without the
   correction (it sums to zero). Add --atomic-virial flag to
   dp convert-backend for users who need per-atom virial (~2.5x cost).
   C++ raises error if atomic virial requested but model lacks it.

3. Cache mapping tensor: only rebuild when ago==0 (nlist updated),
   avoiding redundant CPU-to-device copies every step.

Also adds nnei and do_atomic_virial to .pt2 metadata for C++ to read.
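The ago-gated caching in point 3 follows this pattern (illustrative Python; the real member caches a torch tensor on the device, and the list copy below stands in for the host-to-device transfer):

```python
class CachedEvaluator:
    """Sketch of ago-gated rebuilds: device tensors are rebuilt only
    when LAMMPS signals a refreshed neighbor list (ago == 0)."""

    def __init__(self):
        self.mapping_tensor = None
        self.rebuilds = 0

    def step(self, ago, mapping):
        if ago == 0 or self.mapping_tensor is None:
            # stands in for from_blob(...).clone().to(device)
            self.mapping_tensor = list(mapping)
            self.rebuilds += 1
        return self.mapping_tensor


ev = CachedEvaluator()
ev.step(0, [0, 1, 2])
ev.step(1, [0, 1, 2])  # cache hit: no host-to-device copy
ev.step(0, [2, 1, 0])  # nlist refreshed: rebuild
assert ev.rebuilds == 2
```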

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7739fadaf0


Comment thread source/api_cc/src/commonPTExpt.h Outdated
Comment on lines +466 to +468
```cpp
int ncopy = std::min(static_cast<int>(data[ii].size()), expected_nnei);
for (int jj = 0; jj < ncopy; ++jj) {
  flat_data[static_cast<size_t>(ii) * expected_nnei + jj] = data[ii][jj];
```

P1 Badge Keep all neighbors for model-side sorting

createNlistTensor truncates each atom’s neighbor row to expected_nnei before inference, but forward_lower only performs distance-based sorting/truncation when input nlist.shape[-1] > sum(sel) (otherwise it passes through). In the new path, the model always receives exactly sum(sel) entries, so unsorted LAMMPS neighbor order is never re-sorted and atoms with more than sum(sel) neighbors can drop closer neighbors arbitrarily, changing energies/forces. This affects any system where raw neighbor counts exceed the selected neighbor budget.
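A toy illustration of the concern, with made-up neighbor indices and distances:

```python
# one atom's raw neighbor row in LAMMPS order: (neighbor index, distance)
row = [(7, 0.9), (3, 2.4), (5, 1.1)]
sum_sel = 2

naive = [i for i, _ in row[:sum_sel]]  # truncate without sorting
by_dist = [i for i, _ in sorted(row, key=lambda t: t[1])[:sum_sel]]

assert naive == [7, 3]     # keeps neighbor 3 at distance 2.4
assert by_dist == [7, 5]   # distance sort keeps the closer neighbor 5
```

Because the padded row is exactly `sum(sel)` wide, the model's pass-through branch never re-sorts it, so the naive truncation result is what the network sees.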


Collaborator Author


Fixed in b7509db — nlist nnei dimension is now dynamic in the .pt2 export. createNlistTensor pads to nnei+1 (never truncates), so the compiled graph's _format_nlist always enters the distance-sort branch on GPU.
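The pad-never-truncate rule can be sketched like this (`pad_rows` is a hypothetical Python stand-in for the updated C++ helper):

```python
def pad_rows(rows, nnei):
    # Pad every row with -1 to at least nnei + 1 columns and never
    # truncate, so the exported graph always sees width > sum(sel)
    # and takes the on-device distance-sort branch of _format_nlist.
    width = max(nnei + 1, max((len(r) for r in rows), default=0))
    return [list(r) + [-1] * (width - len(r)) for r in rows]


padded = pad_rows([[4, 2], [9, 1, 6, 3]], nnei=3)
assert all(len(r) > 3 for r in padded)  # strictly wider than sum(sel) = 3
assert padded == [[4, 2, -1, -1], [9, 1, 6, 3]]
```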


coderabbitai bot commented Apr 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

📝 Walkthrough

This PR introduces atomic virial correction support for PyTorch-exported models (.pt2 and .pte). It threads an atomic_virial flag from the CLI through the conversion API, serialization pipeline, and C++ inference layer while generalizing neighbor-list handling from type-sorted to padded format.

Changes

  • CLI & Conversion API (deepmd/main.py, deepmd/entrypoints/convert_backend.py): added the --atomic-virial CLI flag and extended the convert_backend() signature with a keyword-only atomic_virial: bool = False parameter; the function inspects the out_hook signature at runtime to conditionally pass the flag.
  • Serialization infrastructure (deepmd/pt_expt/utils/serialization.py): generalized shape-assertion stripping to all models; changed nlist padding to -1 columns based on nnei = sum(sel); added a model_nnei parameter to _build_dynamic_shapes; threaded do_atomic_virial through deserialize_to_file, _trace_and_export, and the file-type-specific functions; recorded do_atomic_virial in metadata.
  • C++ headers (source/api_cc/include/DeepPotPTExpt.h, source/api_cc/include/DeepSpinPTExpt.h): replaced the private members mixed_types and sel with do_atomic_virial (bool) and nnei (int); added mapping_tensor caching for the LAMMPS evaluation path.
  • C++ neighbor list (source/api_cc/src/commonPTExpt.h): removed the buildTypeSortedNlist template function; added createNlistTensor, which flattens jagged neighbor indices into a fixed-width [1, nloc, nnei] int32 tensor with -1 padding and no distance sorting.
  • C++ model inference (source/api_cc/src/DeepPotPTExpt.cc, source/api_cc/src/DeepSpinPTExpt.cc): replaced buildTypeSortedNlist calls with createNlistTensor; removed metadata reads of mixed_types/sel; added parsing of do_atomic_virial (default true) and nnei; added fail-fast checks that throw deepmd_exception when atomic output is requested but do_atomic_virial=false; the mapping tensor is now rebuilt only when ago == 0.
  • Model tracing (deepmd/pt_expt/model/make_model.py, deepmd/pt_expt/model/spin_model.py): forward_common_lower_exportable now concatenates additional -1 padding columns to the nlist inside the traced function; need_sorted_nlist_for_lower is temporarily overridden to force sorted format during tracing and restored in a finally block.
  • Miscellaneous Python (deepmd/pt_expt/train/training.py): removed a blank line between a warning log and the subsequent data-fetching call (formatting only).
  • Test model generation (source/tests/infer/gen_dpa1.py, gen_dpa2.py, gen_dpa3.py, gen_fparam_aparam.py, gen_model_devi.py, gen_sea.py, gen_spin.py, gen_spin_model_devi.py): updated .pt2 export calls to pass do_atomic_virial=True (or atomic_virial=True for convert_backend) so generated test artifacts include atomic virial handling.
  • Test infrastructure (source/tests/pt_expt/export_helpers.py, model/test_ener_model.py, model/test_export_pipeline.py, infer/test_deep_eval.py, source/tests/infer/case.py, source/tests/pt_expt/test_training.py): applied _strip_shape_assertions in export test helpers; extended _build_dynamic_shapes calls with model_nnei; Case.get_model() conditionally passes atomic_virial=True for .pte/.pt2 suffixes; test deserialization calls use do_atomic_virial=True; added a DPA1 attention descriptor config constant.

Sequence Diagram

```mermaid
sequenceDiagram
    participant CLI as CLI Interface
    participant API as convert_backend API
    participant Serialize as Serialization
    participant Export as Torch Export
    participant CPP as C++ Inference
    participant Model as Model Forward

    CLI->>API: --atomic-virial flag
    API->>Serialize: atomic_virial parameter
    Serialize->>Serialize: Inspect out_hook signature
    Serialize->>Export: Pad nlist with -1 columns
    Export->>Export: Strip shape assertions
    Export->>Export: Build dynamic shapes with model_nnei
    Export->>Export: Trace forward_common_lower_exportable
    Export->>Serialize: Record do_atomic_virial in metadata
    Serialize->>CPP: Load do_atomic_virial & nnei from metadata
    CPP->>CPP: createNlistTensor(data, nnei+1)
    CPP->>Model: Check: atomic requested && !do_atomic_virial?
    alt atomic==true && do_atomic_virial==false
        CPP->>CPP: Throw deepmd_exception
    else Valid state
        CPP->>Model: Forward with padded nlist
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes span multiple heterogeneous domains (CLI, Python serialization, C++ inference, model tracing, and test infrastructure), introduce new neighbor-list tensor construction logic, and require understanding of the export/tracing pipeline. While individual changes follow a consistent pattern of threading the atomic_virial flag, the breadth and interaction between C++ refactoring and Python serialization logic necessitate careful review.

Possibly related PRs

Suggested labels

LAMMPS, DeepPot, export

Suggested reviewers

  • njzjz
  • OutisLi
  • iProzd
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 44.44%, below the required 80.00% threshold. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check ✅: skipped because CodeRabbit's high-level summary is enabled.
  • Title check ✅: the title accurately describes the main objective of the PR, optimizing the .pt2 C++ inference path.
  • Linked Issues check ✅: skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅: skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
source/tests/pt_expt/test_training.py (1)

64-74: Use the new constant in test_compile_warns_dpa1_with_attention to eliminate duplication.

The descriptor dictionary defined inline at lines 1211-1221 in test_compile_warns_dpa1_with_attention has identical values to this new constant. Consider refactoring that test to use _DESCRIPTOR_DPA1_WITH_ATTN instead of the inline definition to follow the DRY principle.

♻️ Refactor the test to use the constant

At lines 1211-1221, replace the inline descriptor definition with the constant:

```diff
-        descriptor = {
-            "type": "dpa1",
-            "sel": 12,
-            "rcut_smth": 0.50,
-            "rcut": 3.00,
-            "neuron": [8, 16],
-            "axis_neuron": 4,
-            "attn_layer": 2,
-            "precision": "float64",
-            "seed": 1,
-        }
+        descriptor = _DESCRIPTOR_DPA1_WITH_ATTN
```
source/tests/infer/gen_spin.py (1)

112-115: Asymmetric atomic-virial setting across .pth and .pt2.

The .pth export at Line 112 omits atomic_virial=True, while the .pt2 export at Line 115 passes it. Parity verification downstream (Lines 186-195) compares energies/forces/force_mag but not virials, so this is unlikely to cause test failures. However, if future parity checks extend to (atomic) virials, the two exports would diverge. Consider passing the flag consistently, or add a brief comment noting the intentional asymmetry (i.e., .pth path always computes atomic virial regardless of flag).

deepmd/main.py (1)

918-925: LGTM. Flag name, action, default, and help text are consistent with the signature of convert_backend(atomic_virial: bool = False, ...); argparse will populate args.atomic_virial, which the downstream deepmd_main dispatch can forward as a kwarg.

One minor nit: the help text states "~2.5x inference cost," but the PR description quotes three extra torch.autograd.grad backward passes in atomic_virial_corr(). If that multiplier was measured on a specific model/device, consider softening to "~2-3x" or "significant" to avoid implying a universal figure; otherwise leave as is.

Actionable inline comments (2):
In `@deepmd/entrypoints/convert_backend.py`:
- Around line 35-42: The conversion currently silently ignores the
--atomic-virial flag when the backend hook (out_hook) does not accept
do_atomic_virial; change this to fail fast: after obtaining sig =
inspect.signature(out_hook) check if atomic_virial is True and
"do_atomic_virial" not in sig.parameters, then raise a clear error (e.g.,
SystemExit or ValueError) explaining that the target backend does not support
--atomic-virial rather than calling out_hook without the parameter; otherwise
proceed to call out_hook(OUTPUT, data, do_atomic_virial=atomic_virial) or
out_hook(OUTPUT, data) as before.

In `@source/api_cc/src/commonPTExpt.h`:
- Around line 461-472: createNlistTensor silently truncates rows longer than
expected_nnei, which can change neighbor selection versus a sort-then-truncate
strategy; modify createNlistTensor to validate each row size and fail fast
(e.g., throw or assert) if any data[ii].size() > expected_nnei instead of using
std::min truncation. Update the function that builds flat_data
(createNlistTensor) to check row length before copying, and include a clear
error message referencing expected_nnei and the offending row size; this keeps
callers (and upstream padding()/format_nlist logic) from accidentally producing
incorrect neighbor sets.



📥 Commits

Reviewing files that changed from the base of the PR and between 3f91293 and 19272c2.

📒 Files selected for processing (18)
  • deepmd/entrypoints/convert_backend.py
  • deepmd/main.py
  • deepmd/pt_expt/train/training.py
  • deepmd/pt_expt/utils/serialization.py
  • source/api_cc/include/DeepPotPTExpt.h
  • source/api_cc/include/DeepSpinPTExpt.h
  • source/api_cc/src/DeepPotPTExpt.cc
  • source/api_cc/src/DeepSpinPTExpt.cc
  • source/api_cc/src/commonPTExpt.h
  • source/tests/infer/gen_dpa1.py
  • source/tests/infer/gen_dpa2.py
  • source/tests/infer/gen_dpa3.py
  • source/tests/infer/gen_fparam_aparam.py
  • source/tests/infer/gen_model_devi.py
  • source/tests/infer/gen_sea.py
  • source/tests/infer/gen_spin.py
  • source/tests/infer/gen_spin_model_devi.py
  • source/tests/pt_expt/test_training.py
💤 Files with no reviewable changes (1)
  • deepmd/pt_expt/train/training.py


codecov bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.45%. Comparing base (3f91293) to head (711a1f4).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| source/api_cc/src/DeepSpinPTExpt.cc | 40.62% | 14 Missing and 5 partials ⚠️ |
| source/api_cc/src/DeepPotPTExpt.cc | 59.37% | 6 Missing and 7 partials ⚠️ |
| source/api_cc/src/commonPTExpt.h | 81.81% | 0 Missing and 2 partials ⚠️ |
| deepmd/entrypoints/convert_backend.py | 85.71% | 1 Missing ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##           master    #5407      +/-   ##
==========================================
- Coverage   80.46%   80.45%   -0.02%
==========================================
  Files         820      820
  Lines       86005    85994      -11
  Branches     4139     4139
==========================================
- Hits        69207    69185      -22
- Misses      15521    15535      +14
+ Partials     1277     1274      -3
```


Instead of truncating the neighbor list to sum(sel) in C++ (which
drops neighbors without distance sorting), make the nnei dimension
dynamic in torch.export.  The .pt2 model's compiled format_nlist
now sorts by distance on-device and truncates to sum(sel), matching
the .pth behavior.

C++ createNlistTensor pads to at least nnei+1 columns so the
format_nlist sort branch is always active.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
source/api_cc/src/DeepSpinPTExpt.cc (2)

322-326: Also cache firstneigh_tensor when ago != 0.

nlist_data.jlist is only mutated inside the if (ago == 0) block, so for subsequent MD steps with the same neighbor list this rebuilds the exact same [1, nloc, nnei+1] int32 tensor, materializes it on CPU, and copies it to the device every step. Same rationale as the mapping_tensor caching just above — worth extending to avoid the per-step H2D copy.

♻️ Proposed caching (mirrors mapping_tensor)
```diff
   if (ago == 0) {
     nlist_data.copy_from_nlist(lmp_list, nall - nghost);
     nlist_data.shuffle_exclude_empty(fwd_map);
     nlist_data.padding();
+    firstneigh_tensor = createNlistTensor(nlist_data.jlist, nnei + 1)
+                            .to(torch::kInt64)
+                            .to(device);
     // Rebuild mapping tensor only when nlist is updated (ago == 0).
     ...
   }
-  // Build raw nlist tensor ...
-  at::Tensor firstneigh_tensor = createNlistTensor(nlist_data.jlist, nnei + 1)
-                                     .to(torch::kInt64)
-                                     .to(device);
```
(Declare firstneigh_tensor as a member alongside mapping_tensor.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/api_cc/src/DeepSpinPTExpt.cc` around lines 322 - 326, The code
currently rebuilds and H2D-copies firstneigh_tensor each step even when ago !=
0; change firstneigh_tensor to be a persistent member (like mapping_tensor),
only recreate and assign it inside the block where ago == 0 (using
createNlistTensor(nlist_data.jlist, nnei + 1).to(torch::kInt64).to(device)), and
on subsequent steps reuse the cached member instead of recreating it; ensure the
member is cleared/updated when nlist_data.jlist actually changes and that usages
of firstneigh_tensor in functions downstream reference the member variable.

296-321: Optional: collapse the two mapping_tensor branches.

The only difference between the lmp_list.mapping and identity branches is how mapping[ii] is populated; the from_blob → clone → to(device) → assign to mapping_tensor tail is identical. Factoring out the tail would reduce duplication and the risk of future drift between the two paths.

♻️ Proposed
-    if (lmp_list.mapping) {
-      std::vector<std::int64_t> mapping(nall_real);
-      for (int ii = 0; ii < nall_real; ii++) {
-        mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii]]];
-      }
-      mapping_tensor =
-          torch::from_blob(mapping.data(), {1, nall_real}, int_option)
-              .clone()
-              .to(device);
-    } else {
-      std::vector<std::int64_t> mapping(nall_real);
-      for (int ii = 0; ii < nall_real; ii++) {
-        mapping[ii] = ii;
-      }
-      mapping_tensor =
-          torch::from_blob(mapping.data(), {1, nall_real}, int_option)
-              .clone()
-              .to(device);
-    }
+    std::vector<std::int64_t> mapping(nall_real);
+    if (lmp_list.mapping) {
+      for (int ii = 0; ii < nall_real; ii++) {
+        mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii]]];
+      }
+    } else {
+      for (int ii = 0; ii < nall_real; ii++) {
+        mapping[ii] = ii;
+      }
+    }
+    mapping_tensor =
+        torch::from_blob(mapping.data(), {1, nall_real}, int_option)
+            .clone()
+            .to(device);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/api_cc/src/DeepSpinPTExpt.cc` around lines 296 - 321, Collapse
duplication by building the std::vector<std::int64_t> mapping once and then
doing the torch::from_blob(...).clone().to(device) tail only once; specifically,
in the ago==0 block create mapping sized nall_real, fill it using either the
lmp_list.mapping path (mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii]]]) or
the identity path (mapping[ii] = ii) based on lmp_list.mapping, then call
torch::from_blob(mapping.data(), {1, nall_real}, int_option).clone().to(device)
to assign mapping_tensor; keep references to nall_real, bkw_map, fwd_map,
int_option and device unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/api_cc/src/DeepSpinPTExpt.cc`:
- Around line 412-418: The atomic-virial capability check (do_atomic_virial)
must be performed before invoking run_model to avoid wasted inference: in the
compute methods that handle atomic output (the branch gated by the atomic flag
in DeepSpinPTExpt::compute and the standalone path around the run_model call),
move the existing do_atomic_virial check to the top of the atomic-capable branch
immediately after you detect atomic==true (i.e., before building input tensors
and calling run_model), and remove the later redundant check that currently runs
after run_model returns; you may also consider performing this validation in
init() if convenient, but the minimal fix is to hoist the check above run_model
in the compute paths referencing run_model and do_atomic_virial.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4c50c5df-2c04-49b8-844c-05064e64a26b

📥 Commits

Reviewing files that changed from the base of the PR and between 19272c2 and b7509db.

📒 Files selected for processing (7)
  • deepmd/pt_expt/utils/serialization.py
  • source/api_cc/src/DeepPotPTExpt.cc
  • source/api_cc/src/DeepSpinPTExpt.cc
  • source/api_cc/src/commonPTExpt.h
  • source/tests/pt_expt/export_helpers.py
  • source/tests/pt_expt/model/test_ener_model.py
  • source/tests/pt_expt/model/test_export_pipeline.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/api_cc/src/DeepPotPTExpt.cc

Comment on lines +135 to +151
if (metadata.obj_val.count("do_atomic_virial")) {
  do_atomic_virial = metadata["do_atomic_virial"].as_bool();
} else {
  // Older models without this field were exported with do_atomic_virial=True
  do_atomic_virial = true;
}

// Read expected nnei (= sum(sel)) — the .pt2 graph has this dimension static.
if (metadata.obj_val.count("nnei")) {
  nnei = metadata["nnei"].as_int();
} else {
  // Fallback: compute from sel array
  nnei = 0;
  for (const auto& v : metadata["sel"].as_array()) {
    nnei += v.as_int();
  }
}
Contributor

@coderabbitai coderabbitai bot Apr 20, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether any prior .pt2 export path recorded a version/schema field,
# and whether the graph already contained format_nlist before this PR.
rg -nP --type=py -C3 '\b(schema_version|pt2_version|format_version|pte_version)\b'
rg -nP --type=py -C3 '\bformat_nlist\s*\(' 
rg -nP --type=py -C3 'extra/metadata\.json|metadata\.json'

Repository: deepmodeling/deepmd-kit

Length of output: 25950


🏁 Script executed:

# Check what metadata fields are being collected in serialization
sed -n '248,300p' deepmd/pt_expt/utils/serialization.py
rg -A 20 '_collect_metadata' deepmd/pt_expt/utils/serialization.py | head -50

Repository: deepmodeling/deepmd-kit

Length of output: 4166


🏁 Script executed:

# Check the DeepSpinPTExpt.cc around the metadata loading section
sed -n '120,170p' source/api_cc/src/DeepSpinPTExpt.cc

Repository: deepmodeling/deepmd-kit

Length of output: 1830


🏁 Script executed:

# Look for any version or compatibility checks in the C++ code
rg -n 'version|Version|schema|Schema' source/api_cc/src/DeepSpinPTExpt.cc

Repository: deepmodeling/deepmd-kit

Length of output: 49


🏁 Script executed:

# Check if there are any recent changes or git history related to format_nlist in this PR
git log --oneline --all -20 -- source/api_cc/src/DeepSpinPTExpt.cc 2>/dev/null || echo "Git history not available"

Repository: deepmodeling/deepmd-kit

Length of output: 208


🏁 Script executed:

# Search for where do_atomic_virial is being set in metadata during export
rg -n 'do_atomic_virial' deepmd/pt_expt/ -B 2 -A 2

Repository: deepmodeling/deepmd-kit

Length of output: 37990


🏁 Script executed:

# Check if do_atomic_virial is being set anywhere in serialization
rg -n 'do_atomic_virial' deepmd/ source/ -B 1 -A 1

Repository: deepmodeling/deepmd-kit

Length of output: 50380


🏁 Script executed:

# Check the full context of where metadata is prepared before export in serialization.py
sed -n '530,620p' deepmd/pt_expt/utils/serialization.py

Repository: deepmodeling/deepmd-kit

Length of output: 2737


🏁 Script executed:

# Look for nlist handling and format_nlist in the C++ inference code
rg -n 'format_nlist|createNlistTensor|buildTypeSortedNlist' source/api_cc/ -B 2 -A 2

Repository: deepmodeling/deepmd-kit

Length of output: 6902


🏁 Script executed:

# Check git history for when do_atomic_virial was added to metadata
cd source/api_cc/src && git log --all --oneline -p -- DeepSpinPTExpt.cc 2>/dev/null | grep -A 5 -B 5 "do_atomic_virial" | head -60 || echo "Git history limited"

Repository: deepmodeling/deepmd-kit

Length of output: 1541


🏁 Script executed:

# Look for any version/compatibility markers in the codebase
rg -n 'backward|compat|legacy|version' source/api_cc/src/DeepSpinPTExpt.cc source/api_cc/src/commonPTExpt.h -i -B 2 -A 2

Repository: deepmodeling/deepmd-kit

Length of output: 49


🏁 Script executed:

# Check if the compiled .pt2 graph definitely includes format_nlist or if it's optional
rg -n 'format_nlist' source/api_cc/src/ -A 3 -B 3 | head -100

Repository: deepmodeling/deepmd-kit

Length of output: 4510


🏁 Script executed:

# Check if there's any model version tracking being added
rg -n 'format_nlist' deepmd/pt_expt/utils/serialization.py -B 5 -A 5

Repository: deepmodeling/deepmd-kit

Length of output: 564


🏁 Script executed:

# Check when format_nlist was introduced to the pt_expt backend
git log --all --oneline --grep="format_nlist" 2>/dev/null | head -10 || echo "Limited git history"

Repository: deepmodeling/deepmd-kit

Length of output: 752


🏁 Script executed:

# Check if there are any tests or documentation about backward compatibility with old .pt2 files
rg -n 'backward|old.*model|legacy.*pt2|version.*pt2' --type=py source/ deepmd/ -i | head -20

Repository: deepmodeling/deepmd-kit

Length of output: 1723


🏁 Script executed:

# Check if there's any validation or version checking logic for .pt2 files before loading
rg -n 'load.*pt2|read.*pt2|version' source/api_cc/src/DeepSpinPTExpt.cc deepmd/pt_expt/infer/deep_eval.py -B 2 -A 2 | head -50

Repository: deepmodeling/deepmd-kit

Length of output: 1532


🏁 Script executed:

# Check if the trace-time padding ensures format_nlist will be in the compiled graph
sed -n '140,160p' deepmd/pt_expt/utils/serialization.py

Repository: deepmodeling/deepmd-kit

Length of output: 984


The default do_atomic_virial=true fallback is correct, but consider adding a version check for old .pt2 files lacking format_nlist.

The do_atomic_virial field is being correctly written to metadata in deepmd/pt_expt/utils/serialization.py (line 541), and the fallback to true for older models is appropriate. However, this PR introduces format_nlist into the compiled .pt2 graph (via trace-time padding at export time). Old .pt2 files exported before this change would not have format_nlist compiled in, yet the C++ runtime in DeepSpinPTExpt.cc (lines 323–326, 602–604) creates raw padded neighbor lists and passes them to the compiled graph assuming format_nlist exists. Loading such legacy models could produce silent incorrect results.

Consider adding a version or schema field to the metadata to distinguish pre- and post-format_nlist .pt2 files, and reject incompatible archives at load time rather than failing silently at inference.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/api_cc/src/DeepSpinPTExpt.cc` around lines 135 - 151, the loader
currently falls back to default do_atomic_virial and computes nnei but does not detect
older .pt2 graphs that lack the new format_nlist padding, which can cause silent
incorrect inference; update DeepSpinPTExpt.cc to read a metadata schema/version
or explicitly check for a "format_nlist" flag in metadata during load (before
creating/passing raw padded neighbor lists) and reject the archive with a clear
error if the graph was exported pre-format_nlist (or if metadata version <
required_version), referencing the existing symbols/do_atomic_virial, nnei, and
format_nlist to locate the logic that computes nnei and constructs neighbor
lists, so incompatible .pt2 files are detected and loading aborted rather than
proceeding silently.

Collaborator Author

Older .pt2 models are handled by fallback defaults: do_atomic_virial defaults to true (matching the old export behavior) and nnei defaults to sum(sel) when the metadata field is absent. There is no versioning scheme in the .pt2 format to do more than this.
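The fallback behavior described in this reply can be sketched as follows (an illustrative Python stand-in for the C++ loader logic; the helper name is hypothetical):

```python
def read_pt2_metadata(metadata: dict) -> tuple:
    """Sketch of the backward-compatibility defaults for .pt2 metadata.

    Older archives lack both fields: do_atomic_virial then defaults to
    True (matching the old export behavior) and nnei to sum(sel).
    """
    do_atomic_virial = metadata.get("do_atomic_virial", True)
    nnei = metadata.get("nnei", sum(metadata["sel"]))
    return do_atomic_virial, nnei


# A legacy archive with neither new field:
assert read_pt2_metadata({"sel": [46, 92]}) == (True, 138)
# A current archive with both fields recorded:
assert read_pt2_metadata(
    {"do_atomic_virial": False, "nnei": 138, "sel": [46, 92]}
) == (False, 138)
```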

Contributor

@wanghan-iapcm, understood — the fallback defaults cover the older model behavior as closely as the format allows. Accepting this as the intended backward-compatibility strategy.

(˳˘ ɜ˘)˳ ♡


✏️ Learnings added
Learnt from: wanghan-iapcm
Repo: deepmodeling/deepmd-kit PR: 5407
File: source/api_cc/src/DeepSpinPTExpt.cc:135-151
Timestamp: 2026-04-20T13:36:43.594Z
Learning: In `deepmd-kit`, the `.pt2` format has no versioning scheme. Backward compatibility for older `.pt2` models in `DeepSpinPTExpt.cc` and `DeepPotPTExpt.cc` is handled purely via metadata fallback defaults: `do_atomic_virial` defaults to `true` (matching old export behavior) and `nnei` defaults to `sum(sel)` when the `nnei` metadata field is absent. No version/schema field should be expected in `.pt2` metadata.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment thread source/api_cc/src/DeepSpinPTExpt.cc Outdated
The .pt2 model's compiled format_nlist expects n_nnei > sum(sel)
to trigger the sort branch.  Pad the nlist with one extra -1
column before calling the AOTInductor runner.
Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
deepmd/pt_expt/infer/deep_eval.py (1)

762-774: Deduplicate the nlist padding block.

The same 13-line padding snippet is repeated verbatim in _eval_model (Line 762-774) and _eval_model_spin (Line 914-926). Consider extracting a small helper (e.g. _pad_nlist_for_pt2(nlist_t)) to keep the two call sites in sync and document the "n_nnei > nnei triggers format_nlist sort branch" invariant in one place.

♻️ Proposed refactor
+    @staticmethod
+    def _pad_nlist_for_pt2(nlist_t: torch.Tensor) -> torch.Tensor:
+        """Pad nlist with one extra -1 column so n_nnei > nnei, ensuring
+        format_nlist's compiled sort branch executes in the .pt2 model."""
+        pad = -torch.ones(
+            (*nlist_t.shape[:2], 1), dtype=nlist_t.dtype, device=nlist_t.device
+        )
+        return torch.cat([nlist_t, pad], dim=-1)

Then at both call sites:

-            nlist_t = torch.cat(
-                [
-                    nlist_t,
-                    -torch.ones(
-                        (*nlist_t.shape[:2], 1),
-                        dtype=nlist_t.dtype,
-                        device=nlist_t.device,
-                    ),
-                ],
-                dim=-1,
-            )
+            nlist_t = self._pad_nlist_for_pt2(nlist_t)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/infer/deep_eval.py` around lines 762 - 774, Duplicate 13-line
padding of nlist_t in _eval_model and _eval_model_spin should be extracted to a
single helper to avoid drift: create a small function (e.g.
_pad_nlist_for_pt2(nlist_t)) that performs the torch.cat(...) padding of an
extra -1 column and document the invariant "n_nnei > nnei triggers format_nlist
sort branch" in that helper's docstring; then replace the inlined padding blocks
in both _eval_model and _eval_model_spin with calls to
_pad_nlist_for_pt2(nlist_t) so both call sites share the same implementation and
comment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 836fbd50-361a-4482-88ee-8625bd33bb92

📥 Commits

Reviewing files that changed from the base of the PR and between b7509db and eec2528.

📒 Files selected for processing (1)
  • deepmd/pt_expt/infer/deep_eval.py

…orted --atomic-virial

Move the do_atomic_virial check before run_model in both DeepPotPTExpt
and DeepSpinPTExpt so we fail fast without wasting GPU computation.
Also reject --atomic-virial flag in convert_backend when the output
backend doesn't support it.
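The fail-fast ordering in this commit can be sketched as follows (hypothetical Python names; the real code is the C++ compute path in DeepPotPTExpt/DeepSpinPTExpt):

```python
def compute_atomic(atomic, do_atomic_virial, run_model):
    """Reject unsupported atomic-virial requests before running inference."""
    if atomic and not do_atomic_virial:
        # Hoisted above run_model: no GPU work is wasted on a request
        # the exported graph cannot satisfy.
        raise RuntimeError(
            "Atomic virial is not available; re-export the model with "
            "--atomic-virial."
        )
    return run_model()


ran = []
out = compute_atomic(False, False, lambda: ran.append(1) or "ok")
assert out == "ok" and ran == [1]
try:
    compute_atomic(True, False, lambda: ran.append(2))
except RuntimeError:
    pass
assert ran == [1]  # the model never ran for the rejected request
```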
@wanghan-iapcm wanghan-iapcm marked this pull request as draft April 20, 2026 14:02
Han Wang added 2 commits April 21, 2026 06:50
Move nlist padding (+1 column of -1s) inside the `fn` closure in both
`make_model.forward_common_lower_exportable` and
`SpinModel.forward_common_lower_exportable`, making it part of the
traced graph.  This fixes proxy tensor shape mismatches from make_fx
and removes the need for external padding in deep_eval.py.

Also apply `_strip_shape_assertions` unconditionally (not just spin
models) to remove spurious torch.export guards like Ne(nnei, sum(sel)).
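The effect of folding the padding into the traced graph can be sketched with a plain-Python stand-in (the real closure pads torch tensors inside forward_common_lower_exportable; names here are illustrative):

```python
def make_traced_forward(nnei):
    """Return a forward fn whose nlist padding is part of the traced body."""
    def fn(nlist_rows):
        # The +1 column of -1s is emitted by the graph itself, so an
        # input arriving with exactly nnei columns still reaches
        # format_nlist with n_nnei = nnei + 1 > nnei and takes the
        # sort-and-truncate branch.
        return [row + [-1] for row in nlist_rows]
    return fn


fn = make_traced_forward(nnei=3)
out = fn([[0, 1, 2], [4, -1, -1]])
assert [len(r) for r in out] == [4, 4]
assert out[1] == [4, -1, -1, -1]
```

Because the padding is inside the traced function, callers such as deep_eval.py no longer need to pad externally.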

Export tests that verify atomic virial now pass `do_atomic_virial=True`
to `deserialize_to_file` so the exported model includes the correction.
The test_models.py tests compare per-atom virial against reference
values that include the atomic virial correction.  Since convert_backend
now defaults to atomic_virial=False for performance, test model
generation must explicitly request atomic_virial=True.
@wanghan-iapcm wanghan-iapcm marked this pull request as ready for review April 21, 2026 02:44
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Apr 21, 2026
@github-actions github-actions bot removed the Test CUDA Trigger test CUDA workflow label Apr 21, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0be31bce-b291-4412-90b0-dd19b2cabb28

📥 Commits

Reviewing files that changed from the base of the PR and between 217a587 and 711a1f4.

📒 Files selected for processing (6)
  • deepmd/pt_expt/model/make_model.py
  • deepmd/pt_expt/model/spin_model.py
  • deepmd/pt_expt/utils/serialization.py
  • source/tests/infer/case.py
  • source/tests/pt_expt/export_helpers.py
  • source/tests/pt_expt/infer/test_deep_eval.py
✅ Files skipped from review due to trivial changes (1)
  • source/tests/pt_expt/export_helpers.py

Comment on lines +188 to 210
    model_nnei: int = 1,
) -> tuple:
    """Build dynamic shape specifications for torch.export.

    Marks nframes, nloc, nall and nnei as dynamic dimensions so the exported
    program handles arbitrary frame, atom and neighbor counts.

    Parameters
    ----------
    *sample_inputs : torch.Tensor | None
        Sample inputs: either 6 tensors (non-spin) or 7 tensors (spin).
    has_spin : bool
        Whether the inputs include an extended_spin tensor.
    model_nnei : int
        The model's sum(sel). Used as the min for the dynamic nnei dim.

    Returns a tuple (not dict) to match positional args of the make_fx
    traced module, whose arg names may have suffixes like ``_1``.
    """
    nframes_dim = torch.export.Dim("nframes", min=1)
    nall_dim = torch.export.Dim("nall", min=1)
    nloc_dim = torch.export.Dim("nloc", min=1)
    nnei_dim = torch.export.Dim("nnei", min=max(2, model_nnei))

Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's read the serialization file to see the current state
wc -l deepmd/pt_expt/utils/serialization.py

Repository: deepmodeling/deepmd-kit

Length of output: 108


🏁 Script executed:

# Read the area around line 209 to understand the context
sed -n '185,215p' deepmd/pt_expt/utils/serialization.py

Repository: deepmodeling/deepmd-kit

Length of output: 1252


🏁 Script executed:

# Also check the other locations mentioned: 275-280 and 509-511
sed -n '275,280p' deepmd/pt_expt/utils/serialization.py
sed -n '509,511p' deepmd/pt_expt/utils/serialization.py

Repository: deepmodeling/deepmd-kit

Length of output: 395


🏁 Script executed:

# Search for how model_nnei and sel are used throughout the file
rg "model_nnei|sum.*sel" deepmd/pt_expt/utils/serialization.py -n -B 2 -A 2

Repository: deepmodeling/deepmd-kit

Length of output: 2246


🏁 Script executed:

# Check if there are any tests related to single-neighbor models or nnei constraints
fd "test.*serialization|serialization.*test" --type f -e py | head -20

Repository: deepmodeling/deepmd-kit

Length of output: 49


🏁 Script executed:

# Search for documentation or comments about the nnei constraint
rg "nnei.*1|single.*neighbor|sel.*==.*1" deepmd/ --type py -B 2 -A 2 | head -50

Repository: deepmodeling/deepmd-kit

Length of output: 3272


🏁 Script executed:

# Search for test cases or usage of models with sel=1 or single-neighbor
rg "sel.*1|single.*neighbor" deepmd/pt_expt/ --type py -l | head -10

Repository: deepmodeling/deepmd-kit

Length of output: 409


🏁 Script executed:

# Search for any error messages or issues related to nnei in test files
fd "test.*\.py" deepmd/pt_expt/ -type f | xargs grep -l "nnei\|export" | head -5

Repository: deepmodeling/deepmd-kit

Length of output: 237


🏁 Script executed:

# Check if there are any existing test models with sum(sel)==1
find deepmd -name "*.py" -type f | xargs grep -l "get_sel\|sum.*sel" | head -10

Repository: deepmodeling/deepmd-kit

Length of output: 536


🏁 Script executed:

# Search for PyTorch torch.export.Dim documentation or constraints in comments
rg "torch\.export\.Dim|torch\.export" deepmd/ --type py -B 3 -A 3 | head -80

Repository: deepmodeling/deepmd-kit

Length of output: 6144


🏁 Script executed:

# Look for any GitHub issues or TODO comments related to this
rg "TODO|FIXME|BUG|issue" deepmd/pt_expt/utils/serialization.py -B 2 -A 2

Repository: deepmodeling/deepmd-kit

Length of output: 239


🏁 Script executed:

# Search for test files related to export or serialization
find deepmd -path "*/test*" -name "*.py" -type f | xargs grep -l "export\|serialization" 2>/dev/null | head -10

Repository: deepmodeling/deepmd-kit

Length of output: 49


🏁 Script executed:

# Check if there are any sel configurations in test data or models
rg "sel\s*[=:].*\[1\]|sum.*sel.*==.*1" deepmd/ --type py | head -20

Repository: deepmodeling/deepmd-kit

Length of output: 314


🏁 Script executed:

# Look for descriptor initialization with single-neighbor sel
rg "sel\s*=\s*\[1\]" deepmd/ --type py

Repository: deepmodeling/deepmd-kit

Length of output: 49


🏁 Script executed:

# Check the forward_common_lower_exportable function mentioned in comments
rg "forward_common_lower_exportable" deepmd/pt_expt/ --type py -B 5 -A 10 | head -60

Repository: deepmodeling/deepmd-kit

Length of output: 4520


🏁 Script executed:

# Verify the padding logic more carefully around line 146-150
sed -n '140,160p' deepmd/pt_expt/utils/serialization.py

Repository: deepmodeling/deepmd-kit

Length of output: 916


Allow single-neighbor models through the dynamic nnei bound.

Line 209 enforces nnei >= 2 via min=max(2, model_nnei), but the default parameter (line 188) and padding logic (lines 146–150) both support nnei >= 1. When sum(model.get_sel()) == 1, the metadata correctly records nnei = 1 (line 279), and the C++/runtime side can legitimately pass an input with nnei = 1. However, the exported program rejects it before padding can add its internal column.

The compiled graph handles any nnei >= sum(sel) correctly (as stated in the code comments). Change the bound to min=max(1, model_nnei).

Required change
-    nnei_dim = torch.export.Dim("nnei", min=max(2, model_nnei))
+    nnei_dim = torch.export.Dim("nnei", min=max(1, model_nnei))

This change applies to all three locations: line 209 in _build_dynamic_shapes, which is called at line 510, and the associated metadata at lines 275–280.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
model_nnei: int = 1,
) -> tuple:
"""Build dynamic shape specifications for torch.export.
Marks nframes, nloc and nall as dynamic dimensions so the exported
program handles arbitrary frame and atom counts.
Marks nframes, nloc, nall and nnei as dynamic dimensions so the exported
program handles arbitrary frame, atom and neighbor counts.
Parameters
----------
*sample_inputs : torch.Tensor | None
Sample inputs: either 6 tensors (non-spin) or 7 tensors (spin).
has_spin : bool
Whether the inputs include an extended_spin tensor.
model_nnei : int
The model's sum(sel). Used as the min for the dynamic nnei dim.
Returns a tuple (not dict) to match positional args of the make_fx
traced module, whose arg names may have suffixes like ``_1``.
"""
nframes_dim = torch.export.Dim("nframes", min=1)
nall_dim = torch.export.Dim("nall", min=1)
nloc_dim = torch.export.Dim("nloc", min=1)
nnei_dim = torch.export.Dim("nnei", min=max(2, model_nnei))
model_nnei: int = 1,
) -> tuple:
"""Build dynamic shape specifications for torch.export.
Marks nframes, nloc, nall and nnei as dynamic dimensions so the exported
program handles arbitrary frame, atom and neighbor counts.
Parameters
----------
*sample_inputs : torch.Tensor | None
Sample inputs: either 6 tensors (non-spin) or 7 tensors (spin).
has_spin : bool
Whether the inputs include an extended_spin tensor.
model_nnei : int
The model's sum(sel). Used as the min for the dynamic nnei dim.
Returns a tuple (not dict) to match positional args of the make_fx
traced module, whose arg names may have suffixes like ``_1``.
"""
nframes_dim = torch.export.Dim("nframes", min=1)
nall_dim = torch.export.Dim("nall", min=1)
nloc_dim = torch.export.Dim("nloc", min=1)
nnei_dim = torch.export.Dim("nnei", min=max(1, model_nnei))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/utils/serialization.py` around lines 188 - 210, The dynamic
nnei lower bound is incorrectly forced to 2; update the min for the nnei
dimension from max(2, model_nnei) to max(1, model_nnei) inside
_build_dynamic_shapes (the code that constructs nnei_dim for torch.export) so
single-neighbor models are allowed; also update the corresponding metadata that
records the nnei minimum (the place that currently sets the metadata nnei bound
to 2) and any call sites or uses that assume the hard-coded 2 (including where
_build_dynamic_shapes is invoked) so they use the new max(1, model_nnei) logic
consistently.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 711a1f4ce3


model_file: str,
data: dict,
model_json_override: dict | None = None,
do_atomic_virial: bool = False,

P1: Preserve atomic virial output in default pt_expt exports

Changing deserialize_to_file to default do_atomic_virial=False makes every caller that does not explicitly override it emit .pte/.pt2 models without energy_derv_c, but pt_expt entry points like freeze/compress still call this API with defaults; this regresses existing workflows that relied on per-atom virial, which now fail at runtime when atomic=true is requested (the C++ path throws “Atomic virial is not available…”). Please keep the previous behavior for those entry points or add/pass through an explicit atomic-virial option there as well.

