add DeePMD property tools by zhaiwenxi · Pull Request #5471 · deepmodeling/deepmd-kit

zhaiwenxi · 2026-05-28T05:11:21Z

Summary

Add deepmd_property_tools for molecular property training and prediction.
Add modern pyproject.toml packaging with a CLI entry point.
Add pytest-based unit tests and GitHub Actions workflow.
Add small demo DATA files for examples.

Tests

python -m pytest tests -v

Summary by CodeRabbit

New Features
- Added property prediction and training tools for molecular property modeling via Python API and command-line interface.
- Introduced train and predict CLI commands for workflow automation.
- Added support for molecular structure files (MOL format) in training and prediction pipelines.
- Included comprehensive documentation on DPA3 fine-tuning hyperparameters and package usage.
Tests
- Added unit test suite covering CLI, training, prediction, and data handling workflows.

for more information, see https://pre-commit.ci

coderabbitai · 2026-05-28T05:15:30Z

📝 Walkthrough

Walkthrough

This PR adds deepmd_property_tools, a new package providing end-to-end workflows for training and inference on molecular property prediction. It includes data preparation from CSV+MOL or direct coordinates, DeepMD dataset generation, fine-tuning orchestration, inference pipelines, CLI interface, comprehensive tests, and example scripts.

Changes

Property Tools Training and Inference

Layer / File(s)	Summary
MOL file parsing and record building `deepmd/deepmd_property_tools/deepmd_property_tools/data/mol.py`, `tests/test_mol.py`	Element lookup, property value parsing, MOL coordinate extraction with fixed-width and whitespace-delimited format support, record builders for CSV/MOL and direct data with overlap detection and zero-filtering.
Data conversion and DeepMD frame building `deepmd/deepmd_property_tools/deepmd_property_tools/data/converter.py`, `data/__init__.py`	Converts parsed records into `dpdata.LabeledSystem` frames, writes `train/valid` mixed-npy directories, generates DeepMD input JSON, tracks failed rows, and returns comprehensive conversion metadata.
Configuration loading and defaults `deepmd/deepmd_property_tools/deepmd_property_tools/config/config_handler.py`, `config/default.json`, `config/__init__.py`	JSON configuration persistence with deep recursive merging, default DPA3 model/training/loss settings for property prediction fine-tuning.
PropertyTrain high-level interface `deepmd/deepmd_property_tools/deepmd_property_tools/train.py`	Validates training arguments, derives step counts from epochs, resolves finetune weights via `WeightHub`, merges batch/metric configurations, orchestrates `DataHub` + `Trainer` pipeline.
Trainer subprocess and checkpoint management `deepmd/deepmd_property_tools/deepmd_property_tools/tasks/trainer.py`, `tasks/__init__.py`, `tests/test_trainer.py`	Executes DeepMD training (single-process or `torchrun` multi-process), handles optional model freezing, discovers and selects newest checkpoints by modification time.
PropertyPredict initialization and config resolution `deepmd/deepmd_property_tools/deepmd_property_tools/predict.py`	Resolves model paths (preferring frozen models), loads `type_map` and `property_name` from `property_tools_config.json`, validates required parameters.
Predictor model evaluation and output `deepmd/deepmd_property_tools/deepmd_property_tools/predictor.py`, `tests/test_predict.py`	Standardizes atoms/coordinates into padded `float32`/`int32` tensors, evaluates `PropertyModel`, writes predictions to CSV with auto-incremented run indices and multi-output support.
PropertyModel and DeepProperty wrapper `deepmd/deepmd_property_tools/deepmd_property_tools/models/property_model.py`, `models/__init__.py`	Lightweight wrapper around DeepMD `DeepProperty` with `no_jit=True`, exposing unified `eval()` interface.
DataHub train/inference mode dispatcher `deepmd/deepmd_property_tools/deepmd_property_tools/data/datahub.py`	Unified entry point that branches to training (via `prepare_property_data`) or inference (`predict_records_from_data`) based on `is_train` flag.
CLI entrypoint and utility modules `deepmd/deepmd_property_tools/deepmd_property_tools/cli.py`, `utils/`, `tests/test_cli.py`	Argparse-based CLI with `train` and `predict` subcommands; logging setup, metrics (MAE/RMSE), and directory utilities.
WeightHub pretrained model resolution `deepmd/deepmd_property_tools/deepmd_property_tools/weights/weighthub.py`, `weights/__init__.py`	Registry-based model name resolution supporting local paths and built-in model lookup with caching.
Package initialization and build config `deepmd/deepmd_property_tools/deepmd_property_tools/__init__.py`, `pyproject.toml`, `MANIFEST.in`	Main package re-exports, PEP 621 metadata with `deepmd-kit[torch]==3.1.3` dependency, console script entry, JSON data inclusion.
Unit tests for core components `deepmd/deepmd_property_tools/tests/test_*.py`	Test coverage for CLI argument handling, config deep-merge, MOL parsing robustness, checkpoint selection, subprocess command generation, and training parameter validation.
Example training and prediction scripts `deepmd/deepmd_property_tools/train_property_20.py`, `predict_property_20.py`	Runnable end-to-end workflows demonstrating DPA3 fine-tuning with learning-rate scheduling and subsequent inference on test data.
Molecular structure test data (40 MOL files) `deepmd/deepmd_property_tools/DATA/mol_convert/id*.mol`	RDKit V2000 MOL format structures (17–57 atoms) with 3D coordinates and formal charges for validation and example workflows.
Package README and finetune hyperparameter guide `deepmd/deepmd_property_tools/README.md`, `DPA3_finetune_hyperparameters.md`	User documentation covering installation, Python/CLI usage, data format expectations, and detailed guidance on inheriting vs customizing DPA3 parameters during fine-tuning.
GitHub Actions test workflow `.github/workflows/property_tools_tests.yml`	CI pipeline running pytest on `push`/`pull_request` for changes under `deepmd/deepmd_property_tools/` with Python 3.10 and dependencies.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

Examples

Suggested reviewers

njzjz
iProzd

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 2.94% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'add DeePMD property tools' directly and clearly describes the main change: the addition of a new property tools package to the DeePMD project.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (4)

deepmd/deepmd_property_tools/README.md (1)

15-20: ⚡ Quick win

Add a single-test pytest example for faster local iteration.

This section currently only shows running the full test set. Please also document a single-test invocation pattern so contributors follow the repo’s preferred quick loop.

As per coding guidelines, "**/tests/**/*.py: Use pytest for testing single test cases ... instead of full test suite (60+ minutes)".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/deepmd_property_tools/README.md` around lines 15 - 20, Update the "For
local development with tests" section in README.md to include a quick
single-test invocation example for faster iteration: add a short sentence
showing how to run an individual test file or a single test case using pytest by
specifying the test file path and optional test nodeid (using pytest's ::
notation) or by using -k to filter, and recommend using -q for concise output;
reference the repository's test pattern "**/tests/**/*.py" and place this
example immediately after the existing full-suite commands.

deepmd/deepmd_property_tools/deepmd_property_tools/data/mol.py (1)

264-281: ⚡ Quick win

Add strict=True to zip() for consistency with the length check.

The length validation on line 272 ensures matching lengths, but adding strict=True provides defense-in-depth and aligns with Ruff B905.

♻️ Proposed fix

-    for idx, (symbols, coords, target) in enumerate(zip(atoms, coordinates, targets)):
+    for idx, (symbols, coords, target) in enumerate(zip(atoms, coordinates, targets, strict=True)):

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/deepmd_property_tools/deepmd_property_tools/data/mol.py` around lines
264 - 281, In records_from_direct_data, the zip over atoms, coordinates, and
targets should use strict=True to match the earlier length validation and
satisfy Ruff B905; update the enumerate(zip(atoms, coordinates, targets)) call
in the records_from_direct_data function to pass strict=True (i.e.,
enumerate(zip(atoms, coordinates, targets, strict=True))) so any length mismatch
raises immediately.

.github/workflows/property_tools_tests.yml (1)

18-18: ⚡ Quick win

Pin GitHub Actions to commit SHAs for supply-chain security.

The workflow uses unpinned action references (actions/checkout@v4, actions/setup-python@v5) which reference mutable tags. Pinning to specific commit SHAs prevents potential supply-chain attacks if tags are moved or compromised.

🔒 Pin actions to commit hashes

     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11  # v4.1.1

       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c  # v5.0.0
         with:

Also consider adding persist-credentials: false to the checkout step to prevent credential leakage:

-      - uses: actions/checkout@v4
+      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11  # v4.1.1
+        with:
+          persist-credentials: false

Also applies to: 21-21

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/property_tools_tests.yml at line 18, Replace mutable
action tags with pinned commit SHAs and disable credential persistence: update
the checkout usage (currently referenced as actions/checkout@v4) to the
corresponding actions/checkout@<commit-sha> and update setup-python (currently
actions/setup-python@v5) to actions/setup-python@<commit-sha>; also add
persist-credentials: false to the checkout step to avoid credential leakage.
Locate the workflow steps that reference actions/checkout and
actions/setup-python and substitute the tags with the official commit SHAs from
the actions repositories, then add the persist-credentials: false key under the
checkout step.

deepmd/deepmd_property_tools/tests/test_predict.py (1)

60-65: 💤 Low value

Consider a more reliable approach for timestamp ordering.

Using time.sleep(0.01) to ensure file modification time differences can be fragile on filesystems with coarse timestamp resolution or under heavy system load.

Alternative approaches

Consider explicitly setting file modification times using os.utime() or Path.touch() with specific timestamps:

 def test_predict_directory_uses_latest_checkpoint(tmp_path: Path) -> None:
     old_checkpoint = tmp_path / "model.ckpt-1.pt"
     old_checkpoint.write_text("old", encoding="utf-8")
-    time.sleep(0.01)
     latest_checkpoint = tmp_path / "model.ckpt-2.pt"
     latest_checkpoint.write_text("new", encoding="utf-8")
+    # Ensure old_checkpoint has earlier mtime
+    import os
+    os.utime(old_checkpoint, (0, 0))
     (tmp_path / "property_tools_config.json").write_text(

Or rely on checkpoint numbering logic rather than file timestamps if the implementation supports it.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/deepmd_property_tools/tests/test_predict.py` around lines 60 - 65, The
test test_predict_directory_uses_latest_checkpoint is relying on time.sleep to
order old_checkpoint and latest_checkpoint which is fragile; replace the
sleep-based approach by explicitly setting file modification times (e.g., use
os.utime or Path.touch with a specified times argument) on old_checkpoint and
latest_checkpoint (or alternatively rely on checkpoint numbering logic if the
code under test uses filename semantics) so the test deterministically makes
model.ckpt-2.pt the latest; update the setup that writes "old"/"new" to ensure
the timestamps are set explicitly rather than waiting.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/deepmd_property_tools/DATA/mol_convert/id5.mol`:
- Around line 1-121: The file id5.mol contains the same molecule data as id4.mol
(same atoms/bonds/charges with only the title "id_5" changed); remove or replace
id5.mol with the intended unique molecule data (or update your dataset
generation/deduplication logic so files like id5.mol are not created as
duplicates of id4.mol). Search for the identifier/title "id_5" and the matching
atom/bond/charge blocks (e.g., the M  CHG block and the same bond list) to
locate the duplicate, then either swap in the correct .mol content or
deduplicate at generation time to avoid biased/duplicated examples.

In `@deepmd/deepmd_property_tools/deepmd_property_tools/data/datahub.py`:
- Around line 47-62: When is_train is True, ensure property_col is a non-None
str before calling prepare_property_data: in the DataHub constructor (or the
method where this code lives) validate that property_col is not None and raise a
clear ValueError if it is, or set a sensible default string (e.g., "Property")
before passing it to prepare_property_data; reference the symbols property_col,
is_train, and prepare_property_data to locate and fix the call site so
prepare_property_data always receives a str.

In `@deepmd/deepmd_property_tools/deepmd_property_tools/predictor.py`:
- Around line 55-67: The zip(...) usage silently truncates mismatched sequences
(e.g., atoms vs coordinates, rows vs y_pred, predict_cols vs pred) causing data
loss; before each zip, validate lengths explicitly and raise a clear ValueError
if they differ (e.g., check len(atoms) == len(coordinates), len(rows) ==
len(y_pred), len(predict_cols) == len(pred)), include identifying context in the
message, then proceed with the existing loops that populate coords/atom_types
and consume y_pred/pred so no samples/columns are silently dropped; apply these
checks around the blocks that reference atoms/coordinates and
rows/y_pred/predict_cols/pred.

In `@deepmd/deepmd_property_tools/deepmd_property_tools/tasks/trainer.py`:
- Around line 28-30: The code sets self.save_path (and other path attributes
like self.input_path) as Path(save_path) but later run() changes the working
directory, causing relative paths to resolve incorrectly; change the constructor
to immediately resolve all incoming path attributes (e.g., call
Path(save_path).resolve() and Path(input_path).resolve() for any similar fields
set around lines where save_path/input_path are assigned) so that self.save_path
and related attributes are absolute before run() mutates cwd, and ensure any
other places assigning paths (the blocks around lines 38-39, 44-60, 66-77, 88)
are updated to use resolved Paths as well.

In `@deepmd/deepmd_property_tools/predict_property_20.py`:
- Around line 13-17: The code currently hardcodes MODEL_PATH = ROOT /
"exp_property_20" / "model.ckpt-10.pt", which couples the script to one
checkpoint; change it to accept an experiment directory (or path) and let
PropertyPredict or its loader resolve the correct file (e.g., frozen_model.pth
or latest checkpoint). Modify the usage of MODEL_PATH and any instantiation of
PropertyPredict to accept a directory/path variable (e.g., exp_dir or model_dir)
instead of a specific filename, and update the existence check to verify the
directory and delegate selecting the actual checkpoint file to PropertyPredict
(or a helper like PropertyPredict.load_checkpoint) so the script works with
frozen_model.pth or the latest checkpoint automatically.

In `@deepmd/deepmd_property_tools/train_property_20.py`:
- Around line 38-40: The finetuning invocation sets finetune=PRETRAINED_MODEL
while disabling the pretrained alignment via use_pretrain_script=False, which
can produce descriptor/config mismatches; update the call that includes
finetune, PRETRAINED_MODEL and input_updates to enable the pretrained-structure
alignment by setting use_pretrain_script=True (or removing the override so the
default alignment runs) so the model descriptors/config match the pretrained
weights during load.

---

Nitpick comments:
In @.github/workflows/property_tools_tests.yml:
- Line 18: Replace mutable action tags with pinned commit SHAs and disable
credential persistence: update the checkout usage (currently referenced as
actions/checkout@v4) to the corresponding actions/checkout@<commit-sha> and
update setup-python (currently actions/setup-python@v5) to
actions/setup-python@<commit-sha>; also add persist-credentials: false to the
checkout step to avoid credential leakage. Locate the workflow steps that
reference actions/checkout and actions/setup-python and substitute the tags with
the official commit SHAs from the actions repositories, then add the
persist-credentials: false key under the checkout step.

In `@deepmd/deepmd_property_tools/deepmd_property_tools/data/mol.py`:
- Around line 264-281: In records_from_direct_data, the zip over atoms,
coordinates, and targets should use strict=True to match the earlier length
validation and satisfy Ruff B905; update the enumerate(zip(atoms, coordinates,
targets)) call in the records_from_direct_data function to pass strict=True
(i.e., enumerate(zip(atoms, coordinates, targets, strict=True))) so any length
mismatch raises immediately.

In `@deepmd/deepmd_property_tools/README.md`:
- Around line 15-20: Update the "For local development with tests" section in
README.md to include a quick single-test invocation example for faster
iteration: add a short sentence showing how to run an individual test file or a
single test case using pytest by specifying the test file path and optional test
nodeid (using pytest's :: notation) or by using -k to filter, and recommend
using -q for concise output; reference the repository's test pattern
"**/tests/**/*.py" and place this example immediately after the existing
full-suite commands.

In `@deepmd/deepmd_property_tools/tests/test_predict.py`:
- Around line 60-65: The test test_predict_directory_uses_latest_checkpoint is
relying on time.sleep to order old_checkpoint and latest_checkpoint which is
fragile; replace the sleep-based approach by explicitly setting file
modification times (e.g., use os.utime or Path.touch with a specified times
argument) on old_checkpoint and latest_checkpoint (or alternatively rely on
checkpoint numbering logic if the code under test uses filename semantics) so
the test deterministically makes model.ckpt-2.pt the latest; update the setup
that writes "old"/"new" to ensure the timestamps are set explicitly rather than
waiting.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3b5f2302-3bde-4f08-abe0-e6307f9eed60

📥 Commits

Reviewing files that changed from the base of the PR and between a7c597d and db05969.

⛔ Files ignored due to path filters (1)

deepmd/deepmd_property_tools/DATA/dataset_demo.csv is excluded by !**/*.csv

📒 Files selected for processing (75)

.github/workflows/property_tools_tests.yml
deepmd/deepmd_property_tools/DATA/mol_convert/id0.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id1.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id10.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id11.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id12.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id13.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id14.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id15.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id16.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id17.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id18.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id19.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id2.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id20.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id21.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id22.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id23.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id24.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id25.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id26.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id27.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id28.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id29.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id3.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id30.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id31.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id32.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id33.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id34.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id35.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id36.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id37.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id38.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id39.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id4.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id5.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id6.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id7.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id8.mol
deepmd/deepmd_property_tools/DATA/mol_convert/id9.mol
deepmd/deepmd_property_tools/DPA3_finetune_hyperparameters.md
deepmd/deepmd_property_tools/MANIFEST.in
deepmd/deepmd_property_tools/README.md
deepmd/deepmd_property_tools/deepmd_property_tools/__init__.py
deepmd/deepmd_property_tools/deepmd_property_tools/cli.py
deepmd/deepmd_property_tools/deepmd_property_tools/config/__init__.py
deepmd/deepmd_property_tools/deepmd_property_tools/config/config_handler.py
deepmd/deepmd_property_tools/deepmd_property_tools/config/default.json
deepmd/deepmd_property_tools/deepmd_property_tools/data/__init__.py
deepmd/deepmd_property_tools/deepmd_property_tools/data/converter.py
deepmd/deepmd_property_tools/deepmd_property_tools/data/datahub.py
deepmd/deepmd_property_tools/deepmd_property_tools/data/mol.py
deepmd/deepmd_property_tools/deepmd_property_tools/models/__init__.py
deepmd/deepmd_property_tools/deepmd_property_tools/models/property_model.py
deepmd/deepmd_property_tools/deepmd_property_tools/predict.py
deepmd/deepmd_property_tools/deepmd_property_tools/predictor.py
deepmd/deepmd_property_tools/deepmd_property_tools/tasks/__init__.py
deepmd/deepmd_property_tools/deepmd_property_tools/tasks/trainer.py
deepmd/deepmd_property_tools/deepmd_property_tools/train.py
deepmd/deepmd_property_tools/deepmd_property_tools/utils/__init__.py
deepmd/deepmd_property_tools/deepmd_property_tools/utils/base_logger.py
deepmd/deepmd_property_tools/deepmd_property_tools/utils/metrics.py
deepmd/deepmd_property_tools/deepmd_property_tools/utils/util.py
deepmd/deepmd_property_tools/deepmd_property_tools/weights/__init__.py
deepmd/deepmd_property_tools/deepmd_property_tools/weights/weighthub.py
deepmd/deepmd_property_tools/predict_property_20.py
deepmd/deepmd_property_tools/pyproject.toml
deepmd/deepmd_property_tools/tests/test_cli.py
deepmd/deepmd_property_tools/tests/test_config.py
deepmd/deepmd_property_tools/tests/test_mol.py
deepmd/deepmd_property_tools/tests/test_predict.py
deepmd/deepmd_property_tools/tests/test_train.py
deepmd/deepmd_property_tools/tests/test_trainer.py
deepmd/deepmd_property_tools/train_property_20.py

coderabbitai · 2026-05-28T05:26:10Z

+id_5
+     RDKit          2D
+
+ 57 56  0  0  0  0  0  0  0  0999 V2000
+   -2.2500   -1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+   -1.5000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
+    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
+    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
+    3.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    3.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
+    4.5000    2.5981    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    6.0000    2.5981    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
+    7.5000    2.5981    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
+    9.0000    2.5981    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
+    9.7500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    9.7500    3.8971    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    7.5000    4.0981    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
+    8.7990    4.8481    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    6.3021    5.0008    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    7.5000    1.0981    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
+    8.7990    0.3481    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    6.3021    0.1953    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    5.0490    0.5490    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    5.0490   -0.9510    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
+    5.0490   -2.4510    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
+    5.0490   -3.9510    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
+    3.7500   -4.7010    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    6.3481   -4.7010    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    3.5490   -2.4510    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
+    2.7990   -3.7500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    2.6463   -1.2530    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    6.5490   -2.4510    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
+    7.2990   -3.7500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    7.4518   -1.2530    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    2.4510    2.0490    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    2.4510    3.5490    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
+    2.4510    5.0490    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
+    2.4510    6.5490    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
+    1.1519    7.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    3.7500    7.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    3.9510    5.0490    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
+    4.7010    6.3481    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    4.8537    3.8511    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    0.9510    5.0490    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
+    0.2010    6.3481    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    0.0482    3.8511    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    0.0000    1.5000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
+   -1.2990    2.2500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    1.1979    2.4028    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+   -0.0000   -1.5000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
+   -1.2990   -2.2500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    1.1979   -2.4028    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+   -2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
+    1.6828   -1.4888    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
+    1.6828    1.4888    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
+    5.8172    1.1093    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
+    5.8172    4.0869    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
+    6.5379   -0.7681    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
+    3.5602   -0.7681    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
+    3.9398    3.3662    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
+    0.9621    3.3662    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
+  1  2  2  0
+  2  3  1  0
+  3  4  1  0
+  4  5  1  0
+  5  6  1  0
+  6  7  1  0
+  7  8  1  0
+  8  9  1  0
+  9 10  1  0
+ 10 11  1  0
+ 10 12  2  0
+  9 13  1  0
+ 13 14  1  0
+ 13 15  2  0
+  9 16  1  0
+ 16 17  1  0
+ 16 18  2  0
+  6 19  1  0
+ 19 20  1  0
+ 20 21  1  0
+ 21 22  1  0
+ 22 23  1  0
+ 22 24  2  0
+ 21 25  1  0
+ 25 26  1  0
+ 25 27  2  0
+ 21 28  1  0
+ 28 29  1  0
+ 28 30  2  0
+  6 31  1  0
+ 31 32  1  0
+ 32 33  1  0
+ 33 34  1  0
+ 34 35  1  0
+ 34 36  2  0
+ 33 37  1  0
+ 37 38  1  0
+ 37 39  2  0
+ 33 40  1  0
+ 40 41  1  0
+ 40 42  2  0
+  3 43  1  0
+ 43 44  1  0
+ 43 45  2  0
+  3 46  1  0
+ 46 47  1  0
+ 46 48  2  0
+  2 49  1  0
+  4 50  1  0
+  4 51  1  0
+  8 52  1  0
+  8 53  1  0
+ 20 54  1  0
+ 20 55  1  0
+ 32 56  1  0
+ 32 57  1  0
+M  CHG  8   2   1  10   1  11  -1  13   1  14  -1  16   1  17  -1  22   1
+M  CHG  8  23  -1  25   1  26  -1  28   1  29  -1  34   1  35  -1  37   1
+M  CHG  8  38  -1  40   1  41  -1  43   1  44  -1  46   1  47  -1  49  -1
+M  END


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

id5.mol duplicates id4.mol content under a different identifier.

The structure/bond/charge blocks here are effectively identical to deepmd/deepmd_property_tools/DATA/mol_convert/id4.mol (only title differs). If unintentional, this introduces duplicate examples that can bias splits and inflate evaluation confidence. Please replace this file with the intended molecule or deduplicate at dataset generation time.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/deepmd_property_tools/DATA/mol_convert/id5.mol` around lines 1 - 121, The file id5.mol contains the same molecule data as id4.mol (same atoms/bonds/charges with only the title "id_5" changed); remove or replace id5.mol with the intended unique molecule data (or update your dataset generation/deduplication logic so files like id5.mol are not created as duplicates of id4.mol). Search for the identifier/title "id_5" and the matching atom/bond/charge blocks (e.g., the M CHG block and the same bond list) to locate the duplicate, then either swap in the correct .mol content or deduplicate at generation time to avoid biased/duplicated examples.

coderabbitai · 2026-05-28T05:26:10Z

+        if is_train:
+            self.result: PropertyDataResult | None = prepare_property_data(
+                data,
+                output_dir=self.save_path / "prepared_data",
+                input_out=self.save_path / "input.json",
+                property_name=property_name,
+                property_col=property_col,
+                train_ratio=train_ratio,
+                mol_dir=mol_dir,
+                mol_template=mol_template,
+                overlap_tol=overlap_tol,
+                seed=seed,
+                overwrite=overwrite,
+                numb_steps=numb_steps,
+                input_updates=input_updates,
+            )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Type mismatch: property_col can be None but prepare_property_data expects str.

When is_train=True, property_col is passed to prepare_property_data which has signature property_col: str = "Property". If DataHub is called directly with is_train=True and property_col=None, this would pass None where a str is expected.

Consider either:

Validating that property_col is not None when is_train=True, or

Changing the type annotation to str with a default value, since the inference path (context snippet from predict.py:67) explicitly passes property_col=None

🛡️ Option 1: Add validation

if is_train: + if property_col is None: + raise ValueError("property_col is required for training") self.result: PropertyDataResult | None = prepare_property_data(

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if is_train:

self.result: PropertyDataResult | None = prepare_property_data(

data,

output_dir=self.save_path / "prepared_data",

input_out=self.save_path / "input.json",

property_name=property_name,

property_col=property_col,

train_ratio=train_ratio,

mol_dir=mol_dir,

mol_template=mol_template,

overlap_tol=overlap_tol,

seed=seed,

overwrite=overwrite,

numb_steps=numb_steps,

input_updates=input_updates,

)

if is_train:

if property_col is None:

raise ValueError("property_col is required for training")

self.result: PropertyDataResult | None = prepare_property_data(

data,

output_dir=self.save_path / "prepared_data",

input_out=self.save_path / "input.json",

property_name=property_name,

property_col=property_col,

train_ratio=train_ratio,

mol_dir=mol_dir,

mol_template=mol_template,

overlap_tol=overlap_tol,

seed=seed,

overwrite=overwrite,

numb_steps=numb_steps,

input_updates=input_updates,

)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/deepmd_property_tools/deepmd_property_tools/data/datahub.py` around lines 47 - 62, When is_train is True, ensure property_col is a non-None str before calling prepare_property_data: in the DataHub constructor (or the method where this code lives) validate that property_col is not None and raise a clear ValueError if it is, or set a sensible default string (e.g., "Property") before passing it to prepare_property_data; reference the symbols property_col, is_train, and prepare_property_data to locate and fix the call site so prepare_property_data always receives a str.

coderabbitai · 2026-05-28T05:26:10Z

+        for frame_idx, (symbols, coord) in enumerate(zip(atoms, coordinates)):
+            if coord.shape != (len(symbols), 3):
+                raise ValueError(
+                    f"coordinates shape mismatch at sample {frame_idx}: {coord.shape}"
+                )
+            for atom_idx, symbol in enumerate(symbols):
+                if symbol not in self.type_index:
+                    raise ValueError(
+                        f"Element {symbol!r} is not present in type_map {self.type_map}"
+                    )
+                atom_types[frame_idx, atom_idx] = self.type_index[symbol]
+            coords[frame_idx, : len(symbols), :] = coord
+        return coords, atom_types


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent silent truncation when input lengths diverge.

These zip(...) loops can drop samples/columns without error when lengths mismatch (atoms vs coordinates, rows vs y_pred, or predict_cols vs pred). This can silently corrupt prediction outputs.

💡 Suggested fix

def standardize( self, atoms: list[list[str]], coordinates: list[np.ndarray] ) -> tuple[np.ndarray, np.ndarray]: if not atoms: raise ValueError("No samples to predict") + if len(atoms) != len(coordinates): + raise ValueError( + f"atoms/coordinates length mismatch: {len(atoms)} != {len(coordinates)}" + ) max_natoms = max(len(symbols) for symbols in atoms) coords = np.zeros((len(atoms), max_natoms, 3), dtype=np.float32) atom_types = np.full((len(atoms), max_natoms), -1, dtype=np.int32) - for frame_idx, (symbols, coord) in enumerate(zip(atoms, coordinates)): + for frame_idx, (symbols, coord) in enumerate( + zip(atoms, coordinates, strict=True) + ): if coord.shape != (len(symbols), 3): raise ValueError( f"coordinates shape mismatch at sample {frame_idx}: {coord.shape}" ) @@ with out_path.open("w", encoding="utf-8", newline="") as fp: writer = csv.DictWriter(fp, fieldnames=fieldnames) writer.writeheader() - for row, pred in zip(rows, y_pred): + if len(rows) != len(y_pred): + raise ValueError( + f"rows/prediction length mismatch: {len(rows)} != {len(y_pred)}" + ) + for row, pred in zip(rows, y_pred, strict=True): out_row = dict(row) - for col, value in zip(predict_cols, pred): + for col, value in zip(predict_cols, pred, strict=True): out_row[col] = float(value) writer.writerow(out_row)

Also applies to: 95-99

🧰 Tools

🪛 Ruff (0.15.14)

[warning] 55-55: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/deepmd_property_tools/deepmd_property_tools/predictor.py` around lines 55 - 67, The zip(...) usage silently truncates mismatched sequences (e.g., atoms vs coordinates, rows vs y_pred, predict_cols vs pred) causing data loss; before each zip, validate lengths explicitly and raise a clear ValueError if they differ (e.g., check len(atoms) == len(coordinates), len(rows) == len(y_pred), len(predict_cols) == len(pred)), include identifying context in the message, then proceed with the existing loops that populate coords/atom_types and consume y_pred/pred so no samples/columns are silently dropped; apply these checks around the blocks that reference atoms/coordinates and rows/y_pred/predict_cols/pred.

coderabbitai · 2026-05-28T05:26:10Z

+        self.save_path = Path(save_path)
+        self.finetune = finetune
+        self.nproc_per_node = nproc_per_node


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve paths before changing working directory.

save_path/input_path can be relative, but run() changes cwd and still passes paths prefixed with self.save_path. This can mis-target files (e.g., nested .../exp_property/exp_property/out.json) and break default relative-path runs.

💡 Proposed fix

class Trainer: @@ - self.save_path = Path(save_path) + self.save_path = Path(save_path).resolve() @@ def run(self, input_path: str | Path) -> None: - input_path = Path(input_path) + input_path = Path(input_path).resolve() @@ train( input_file=str(input_path), @@ output=str(self.save_path / "out.json"), )

Also applies to: 38-39, 44-60, 66-77, 88-88

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/deepmd_property_tools/deepmd_property_tools/tasks/trainer.py` around lines 28 - 30, The code sets self.save_path (and other path attributes like self.input_path) as Path(save_path) but later run() changes the working directory, causing relative paths to resolve incorrectly; change the constructor to immediately resolve all incoming path attributes (e.g., call Path(save_path).resolve() and Path(input_path).resolve() for any similar fields set around lines where save_path/input_path are assigned) so that self.save_path and related attributes are absolute before run() mutates cwd, and ensure any other places assigning paths (the blocks around lines 38-39, 44-60, 66-77, 88) are updated to use resolved Paths as well.

coderabbitai · 2026-05-28T05:26:10Z

+MODEL_PATH = ROOT / "exp_property_20" / "model.ckpt-10.pt"
+
+if not MODEL_PATH.exists():
+    raise FileNotFoundError(f"Train first; checkpoint not found: {MODEL_PATH}")
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid hardcoding a specific checkpoint filename.

model.ckpt-10.pt tightly couples this script to one training run. Passing the experiment directory is more robust and lets PropertyPredict pick frozen_model.pth or the latest checkpoint automatically.

Suggested patch

-MODEL_PATH = ROOT / "exp_property_20" / "model.ckpt-10.pt" - -if not MODEL_PATH.exists(): - raise FileNotFoundError(f"Train first; checkpoint not found: {MODEL_PATH}") +MODEL_PATH = ROOT / "exp_property_20" + +if not MODEL_PATH.exists(): + raise FileNotFoundError(f"Train first; model directory not found: {MODEL_PATH}")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

MODEL_PATH = ROOT / "exp_property_20" / "model.ckpt-10.pt"

if not MODEL_PATH.exists():

raise FileNotFoundError(f"Train first; checkpoint not found: {MODEL_PATH}")

MODEL_PATH = ROOT / "exp_property_20"

if not MODEL_PATH.exists():

raise FileNotFoundError(f"Train first; model directory not found: {MODEL_PATH}")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/deepmd_property_tools/predict_property_20.py` around lines 13 - 17, The code currently hardcodes MODEL_PATH = ROOT / "exp_property_20" / "model.ckpt-10.pt", which couples the script to one checkpoint; change it to accept an experiment directory (or path) and let PropertyPredict or its loader resolve the correct file (e.g., frozen_model.pth or latest checkpoint). Modify the usage of MODEL_PATH and any instantiation of PropertyPredict to accept a directory/path variable (e.g., exp_dir or model_dir) instead of a specific filename, and update the existence check to verify the directory and delegate selecting the actual checkpoint file to PropertyPredict (or a helper like PropertyPredict.load_checkpoint) so the script works with frozen_model.pth or the latest checkpoint automatically.

coderabbitai · 2026-05-28T05:26:11Z

+    finetune=PRETRAINED_MODEL,
+    use_pretrain_script=False,
+    input_updates={


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enable pretrained-structure alignment when finetuning.

With finetune=PRETRAINED_MODEL, setting use_pretrain_script=False can cause descriptor/config mismatch against pretrained weights and make this demo fail on load.

Suggested patch

- use_pretrain_script=False, + use_pretrain_script=True,

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

finetune=PRETRAINED_MODEL,

use_pretrain_script=False,

input_updates={

finetune=PRETRAINED_MODEL,

use_pretrain_script=True,

input_updates={

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/deepmd_property_tools/train_property_20.py` around lines 38 - 40, The finetuning invocation sets finetune=PRETRAINED_MODEL while disabling the pretrained alignment via use_pretrain_script=False, which can produce descriptor/config mismatches; update the call that includes finetune, PRETRAINED_MODEL and input_updates to enable the pretrained-structure alignment by setting use_pretrain_script=True (or removing the override so the default alignment runs) so the model descriptors/config match the pretrained weights during load.

zhaiwenxi and others added 2 commits May 27, 2026 16:08

feat: add DeePMD property tools

30351e9

[pre-commit.ci] auto fixes from pre-commit.com hooks

e9fe00f

for more information, see https://pre-commit.ci

dosubot Bot added the new feature label May 28, 2026

github-actions Bot added the Python label May 28, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

db05969

for more information, see https://pre-commit.ci

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add DeePMD property tools#5471

add DeePMD property tools#5471
zhaiwenxi wants to merge 3 commits into
deepmodeling:masterfrom
zhaiwenxi:add-property-tools

zhaiwenxi commented May 28, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 28, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

zhaiwenxi commented May 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Tests

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

zhaiwenxi commented May 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 28, 2026 •

edited

Loading