add DeePMD property tools#5471
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
📝 WalkthroughWalkthroughThis PR adds ChangesProperty Tools Training and Inference
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (4)
deepmd/deepmd_property_tools/README.md (1)
15-20: ⚡ Quick winAdd a single-test
pytestexample for faster local iteration.This section currently only shows running the full test set. Please also document a single-test invocation pattern so contributors follow the repo’s preferred quick loop.
As per coding guidelines, "
**/tests/**/*.py: Usepytestfor testing single test cases ... instead of full test suite (60+ minutes)".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/deepmd_property_tools/README.md` around lines 15 - 20, Update the "For local development with tests" section in README.md to include a quick single-test invocation example for faster iteration: add a short sentence showing how to run an individual test file or a single test case using pytest by specifying the test file path and optional test nodeid (using pytest's :: notation) or by using -k to filter, and recommend using -q for concise output; reference the repository's test pattern "**/tests/**/*.py" and place this example immediately after the existing full-suite commands.deepmd/deepmd_property_tools/deepmd_property_tools/data/mol.py (1)
264-281: ⚡ Quick winAdd
strict=Truetozip()for consistency with the length check.The length validation on line 272 ensures matching lengths, but adding
strict=Trueprovides defense-in-depth and aligns with Ruff B905.♻️ Proposed fix
- for idx, (symbols, coords, target) in enumerate(zip(atoms, coordinates, targets)): + for idx, (symbols, coords, target) in enumerate(zip(atoms, coordinates, targets, strict=True)):🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/deepmd_property_tools/deepmd_property_tools/data/mol.py` around lines 264 - 281, In records_from_direct_data, the zip over atoms, coordinates, and targets should use strict=True to match the earlier length validation and satisfy Ruff B905; update the enumerate(zip(atoms, coordinates, targets)) call in the records_from_direct_data function to pass strict=True (i.e., enumerate(zip(atoms, coordinates, targets, strict=True))) so any length mismatch raises immediately..github/workflows/property_tools_tests.yml (1)
18-18: ⚡ Quick winPin GitHub Actions to commit SHAs for supply-chain security.
The workflow uses unpinned action references (
actions/checkout@v4,actions/setup-python@v5) which reference mutable tags. Pinning to specific commit SHAs prevents potential supply-chain attacks if tags are moved or compromised.🔒 Pin actions to commit hashes
steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1 - name: Set up Python - uses: actions/setup-python@v5 + uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5.0.0 with:Also consider adding
persist-credentials: falseto the checkout step to prevent credential leakage:- - uses: actions/checkout@v4 + - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1 + with: + persist-credentials: falseAlso applies to: 21-21
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/property_tools_tests.yml at line 18, Replace mutable action tags with pinned commit SHAs and disable credential persistence: update the checkout usage (currently referenced as actions/checkout@v4) to the corresponding actions/checkout@<commit-sha> and update setup-python (currently actions/setup-python@v5) to actions/setup-python@<commit-sha>; also add persist-credentials: false to the checkout step to avoid credential leakage. Locate the workflow steps that reference actions/checkout and actions/setup-python and substitute the tags with the official commit SHAs from the actions repositories, then add the persist-credentials: false key under the checkout step.deepmd/deepmd_property_tools/tests/test_predict.py (1)
60-65: 💤 Low valueConsider a more reliable approach for timestamp ordering.
Using
time.sleep(0.01)to ensure file modification time differences can be fragile on filesystems with coarse timestamp resolution or under heavy system load.Alternative approaches
Consider explicitly setting file modification times using
os.utime()orPath.touch()with specific timestamps:def test_predict_directory_uses_latest_checkpoint(tmp_path: Path) -> None: old_checkpoint = tmp_path / "model.ckpt-1.pt" old_checkpoint.write_text("old", encoding="utf-8") - time.sleep(0.01) latest_checkpoint = tmp_path / "model.ckpt-2.pt" latest_checkpoint.write_text("new", encoding="utf-8") + # Ensure old_checkpoint has earlier mtime + import os + os.utime(old_checkpoint, (0, 0)) (tmp_path / "property_tools_config.json").write_text(Or rely on checkpoint numbering logic rather than file timestamps if the implementation supports it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/deepmd_property_tools/tests/test_predict.py` around lines 60 - 65, The test test_predict_directory_uses_latest_checkpoint is relying on time.sleep to order old_checkpoint and latest_checkpoint which is fragile; replace the sleep-based approach by explicitly setting file modification times (e.g., use os.utime or Path.touch with a specified times argument) on old_checkpoint and latest_checkpoint (or alternatively rely on checkpoint numbering logic if the code under test uses filename semantics) so the test deterministically makes model.ckpt-2.pt the latest; update the setup that writes "old"/"new" to ensure the timestamps are set explicitly rather than waiting.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deepmd/deepmd_property_tools/DATA/mol_convert/id5.mol`:
- Around line 1-121: The file id5.mol contains the same molecule data as id4.mol
(same atoms/bonds/charges with only the title "id_5" changed); remove or replace
id5.mol with the intended unique molecule data (or update your dataset
generation/deduplication logic so files like id5.mol are not created as
duplicates of id4.mol). Search for the identifier/title "id_5" and the matching
atom/bond/charge blocks (e.g., the M CHG block and the same bond list) to
locate the duplicate, then either swap in the correct .mol content or
deduplicate at generation time to avoid biased/duplicated examples.
In `@deepmd/deepmd_property_tools/deepmd_property_tools/data/datahub.py`:
- Around line 47-62: When is_train is True, ensure property_col is a non-None
str before calling prepare_property_data: in the DataHub constructor (or the
method where this code lives) validate that property_col is not None and raise a
clear ValueError if it is, or set a sensible default string (e.g., "Property")
before passing it to prepare_property_data; reference the symbols property_col,
is_train, and prepare_property_data to locate and fix the call site so
prepare_property_data always receives a str.
In `@deepmd/deepmd_property_tools/deepmd_property_tools/predictor.py`:
- Around line 55-67: The zip(...) usage silently truncates mismatched sequences
(e.g., atoms vs coordinates, rows vs y_pred, predict_cols vs pred) causing data
loss; before each zip, validate lengths explicitly and raise a clear ValueError
if they differ (e.g., check len(atoms) == len(coordinates), len(rows) ==
len(y_pred), len(predict_cols) == len(pred)), include identifying context in the
message, then proceed with the existing loops that populate coords/atom_types
and consume y_pred/pred so no samples/columns are silently dropped; apply these
checks around the blocks that reference atoms/coordinates and
rows/y_pred/predict_cols/pred.
In `@deepmd/deepmd_property_tools/deepmd_property_tools/tasks/trainer.py`:
- Around line 28-30: The code sets self.save_path (and other path attributes
like self.input_path) as Path(save_path) but later run() changes the working
directory, causing relative paths to resolve incorrectly; change the constructor
to immediately resolve all incoming path attributes (e.g., call
Path(save_path).resolve() and Path(input_path).resolve() for any similar fields
set around lines where save_path/input_path are assigned) so that self.save_path
and related attributes are absolute before run() mutates cwd, and ensure any
other places assigning paths (the blocks around lines 38-39, 44-60, 66-77, 88)
are updated to use resolved Paths as well.
In `@deepmd/deepmd_property_tools/predict_property_20.py`:
- Around line 13-17: The code currently hardcodes MODEL_PATH = ROOT /
"exp_property_20" / "model.ckpt-10.pt", which couples the script to one
checkpoint; change it to accept an experiment directory (or path) and let
PropertyPredict or its loader resolve the correct file (e.g., frozen_model.pth
or latest checkpoint). Modify the usage of MODEL_PATH and any instantiation of
PropertyPredict to accept a directory/path variable (e.g., exp_dir or model_dir)
instead of a specific filename, and update the existence check to verify the
directory and delegate selecting the actual checkpoint file to PropertyPredict
(or a helper like PropertyPredict.load_checkpoint) so the script works with
frozen_model.pth or the latest checkpoint automatically.
In `@deepmd/deepmd_property_tools/train_property_20.py`:
- Around line 38-40: The finetuning invocation sets finetune=PRETRAINED_MODEL
while disabling the pretrained alignment via use_pretrain_script=False, which
can produce descriptor/config mismatches; update the call that includes
finetune, PRETRAINED_MODEL and input_updates to enable the pretrained-structure
alignment by setting use_pretrain_script=True (or removing the override so the
default alignment runs) so the model descriptors/config match the pretrained
weights during load.
---
Nitpick comments:
In @.github/workflows/property_tools_tests.yml:
- Line 18: Replace mutable action tags with pinned commit SHAs and disable
credential persistence: update the checkout usage (currently referenced as
actions/checkout@v4) to the corresponding actions/checkout@<commit-sha> and
update setup-python (currently actions/setup-python@v5) to
actions/setup-python@<commit-sha>; also add persist-credentials: false to the
checkout step to avoid credential leakage. Locate the workflow steps that
reference actions/checkout and actions/setup-python and substitute the tags with
the official commit SHAs from the actions repositories, then add the
persist-credentials: false key under the checkout step.
In `@deepmd/deepmd_property_tools/deepmd_property_tools/data/mol.py`:
- Around line 264-281: In records_from_direct_data, the zip over atoms,
coordinates, and targets should use strict=True to match the earlier length
validation and satisfy Ruff B905; update the enumerate(zip(atoms, coordinates,
targets)) call in the records_from_direct_data function to pass strict=True
(i.e., enumerate(zip(atoms, coordinates, targets, strict=True))) so any length
mismatch raises immediately.
In `@deepmd/deepmd_property_tools/README.md`:
- Around line 15-20: Update the "For local development with tests" section in
README.md to include a quick single-test invocation example for faster
iteration: add a short sentence showing how to run an individual test file or a
single test case using pytest by specifying the test file path and optional test
nodeid (using pytest's :: notation) or by using -k to filter, and recommend
using -q for concise output; reference the repository's test pattern
"**/tests/**/*.py" and place this example immediately after the existing
full-suite commands.
In `@deepmd/deepmd_property_tools/tests/test_predict.py`:
- Around line 60-65: The test test_predict_directory_uses_latest_checkpoint is
relying on time.sleep to order old_checkpoint and latest_checkpoint which is
fragile; replace the sleep-based approach by explicitly setting file
modification times (e.g., use os.utime or Path.touch with a specified times
argument) on old_checkpoint and latest_checkpoint (or alternatively rely on
checkpoint numbering logic if the code under test uses filename semantics) so
the test deterministically makes model.ckpt-2.pt the latest; update the setup
that writes "old"/"new" to ensure the timestamps are set explicitly rather than
waiting.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3b5f2302-3bde-4f08-abe0-e6307f9eed60
⛔ Files ignored due to path filters (1)
deepmd/deepmd_property_tools/DATA/dataset_demo.csvis excluded by!**/*.csv
📒 Files selected for processing (75)
.github/workflows/property_tools_tests.ymldeepmd/deepmd_property_tools/DATA/mol_convert/id0.moldeepmd/deepmd_property_tools/DATA/mol_convert/id1.moldeepmd/deepmd_property_tools/DATA/mol_convert/id10.moldeepmd/deepmd_property_tools/DATA/mol_convert/id11.moldeepmd/deepmd_property_tools/DATA/mol_convert/id12.moldeepmd/deepmd_property_tools/DATA/mol_convert/id13.moldeepmd/deepmd_property_tools/DATA/mol_convert/id14.moldeepmd/deepmd_property_tools/DATA/mol_convert/id15.moldeepmd/deepmd_property_tools/DATA/mol_convert/id16.moldeepmd/deepmd_property_tools/DATA/mol_convert/id17.moldeepmd/deepmd_property_tools/DATA/mol_convert/id18.moldeepmd/deepmd_property_tools/DATA/mol_convert/id19.moldeepmd/deepmd_property_tools/DATA/mol_convert/id2.moldeepmd/deepmd_property_tools/DATA/mol_convert/id20.moldeepmd/deepmd_property_tools/DATA/mol_convert/id21.moldeepmd/deepmd_property_tools/DATA/mol_convert/id22.moldeepmd/deepmd_property_tools/DATA/mol_convert/id23.moldeepmd/deepmd_property_tools/DATA/mol_convert/id24.moldeepmd/deepmd_property_tools/DATA/mol_convert/id25.moldeepmd/deepmd_property_tools/DATA/mol_convert/id26.moldeepmd/deepmd_property_tools/DATA/mol_convert/id27.moldeepmd/deepmd_property_tools/DATA/mol_convert/id28.moldeepmd/deepmd_property_tools/DATA/mol_convert/id29.moldeepmd/deepmd_property_tools/DATA/mol_convert/id3.moldeepmd/deepmd_property_tools/DATA/mol_convert/id30.moldeepmd/deepmd_property_tools/DATA/mol_convert/id31.moldeepmd/deepmd_property_tools/DATA/mol_convert/id32.moldeepmd/deepmd_property_tools/DATA/mol_convert/id33.moldeepmd/deepmd_property_tools/DATA/mol_convert/id34.moldeepmd/deepmd_property_tools/DATA/mol_convert/id35.moldeepmd/deepmd_property_tools/DATA/mol_convert/id36.moldeepmd/deepmd_property_tools/DATA/mol_convert/id37.moldeepmd/deepmd_property_tools/DATA/mol_convert/id38.moldeepmd/deepmd_property_tools/DATA/mol_convert/id39.moldeepmd/deepmd_property_tools/DATA/mol_convert/id4.moldeepmd/deepmd_property_tools/DATA/mol_convert/id5.moldeepmd/deepmd_property_tools/DATA/mol_convert/id6.moldeepmd/deepmd_property_tools/DATA/mol_convert/id7.moldeepmd/deepmd_property_tools/DATA/mol_convert/id8.moldeepmd/deepmd_property_tools/DATA/mol_convert/id9.moldeepmd/deepmd_property_tools/DPA3_finetune_hyperparameters.mddeepmd/deepmd_property_tools/MANIFEST.indeepmd/deepmd_property_tools/README.mddeepmd/deepmd_property_tools/deepmd_property_tools/__init__.pydeepmd/deepmd_property_tools/deepmd_property_tools/cli.pydeepmd/deepmd_property_tools/deepmd_property_tools/config/__init__.pydeepmd/deepmd_property_tools/deepmd_property_tools/config/config_handler.pydeepmd/deepmd_property_tools/deepmd_property_tools/config/default.jsondeepmd/deepmd_property_tools/deepmd_property_tools/data/__init__.pydeepmd/deepmd_property_tools/deepmd_property_tools/data/converter.pydeepmd/deepmd_property_tools/deepmd_property_tools/data/datahub.pydeepmd/deepmd_property_tools/deepmd_property_tools/data/mol.pydeepmd/deepmd_property_tools/deepmd_property_tools/models/__init__.pydeepmd/deepmd_property_tools/deepmd_property_tools/models/property_model.pydeepmd/deepmd_property_tools/deepmd_property_tools/predict.pydeepmd/deepmd_property_tools/deepmd_property_tools/predictor.pydeepmd/deepmd_property_tools/deepmd_property_tools/tasks/__init__.pydeepmd/deepmd_property_tools/deepmd_property_tools/tasks/trainer.pydeepmd/deepmd_property_tools/deepmd_property_tools/train.pydeepmd/deepmd_property_tools/deepmd_property_tools/utils/__init__.pydeepmd/deepmd_property_tools/deepmd_property_tools/utils/base_logger.pydeepmd/deepmd_property_tools/deepmd_property_tools/utils/metrics.pydeepmd/deepmd_property_tools/deepmd_property_tools/utils/util.pydeepmd/deepmd_property_tools/deepmd_property_tools/weights/__init__.pydeepmd/deepmd_property_tools/deepmd_property_tools/weights/weighthub.pydeepmd/deepmd_property_tools/predict_property_20.pydeepmd/deepmd_property_tools/pyproject.tomldeepmd/deepmd_property_tools/tests/test_cli.pydeepmd/deepmd_property_tools/tests/test_config.pydeepmd/deepmd_property_tools/tests/test_mol.pydeepmd/deepmd_property_tools/tests/test_predict.pydeepmd/deepmd_property_tools/tests/test_train.pydeepmd/deepmd_property_tools/tests/test_trainer.pydeepmd/deepmd_property_tools/train_property_20.py
| id_5 | ||
| RDKit 2D | ||
|
|
||
| 57 56 0 0 0 0 0 0 0 0999 V2000 | ||
| -2.2500 -1.2990 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| -1.5000 0.0000 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 1.5000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 3.0000 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 3.7500 1.2990 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 4.5000 2.5981 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 6.0000 2.5981 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 7.5000 2.5981 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 9.0000 2.5981 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 9.7500 1.2990 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 9.7500 3.8971 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 7.5000 4.0981 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 8.7990 4.8481 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 6.3021 5.0008 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 7.5000 1.0981 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 8.7990 0.3481 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 6.3021 0.1953 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 5.0490 0.5490 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 5.0490 -0.9510 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 5.0490 -2.4510 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 5.0490 -3.9510 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 3.7500 -4.7010 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 6.3481 -4.7010 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 3.5490 -2.4510 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 2.7990 -3.7500 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 2.6463 -1.2530 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 6.5490 -2.4510 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 7.2990 -3.7500 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 7.4518 -1.2530 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 2.4510 2.0490 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 2.4510 3.5490 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 2.4510 5.0490 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 2.4510 6.5490 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 1.1519 7.2990 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 3.7500 7.2990 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 3.9510 5.0490 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 4.7010 6.3481 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 4.8537 3.8511 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 0.9510 5.0490 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 0.2010 6.3481 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 0.0482 3.8511 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 0.0000 1.5000 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| -1.2990 2.2500 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 1.1979 2.4028 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| -0.0000 -1.5000 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| -1.2990 -2.2500 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 1.1979 -2.4028 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| -2.2500 1.2990 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 1.6828 -1.4888 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 1.6828 1.4888 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 5.8172 1.1093 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 5.8172 4.0869 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 6.5379 -0.7681 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 3.5602 -0.7681 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 3.9398 3.3662 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 0.9621 3.3662 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 | ||
| 1 2 2 0 | ||
| 2 3 1 0 | ||
| 3 4 1 0 | ||
| 4 5 1 0 | ||
| 5 6 1 0 | ||
| 6 7 1 0 | ||
| 7 8 1 0 | ||
| 8 9 1 0 | ||
| 9 10 1 0 | ||
| 10 11 1 0 | ||
| 10 12 2 0 | ||
| 9 13 1 0 | ||
| 13 14 1 0 | ||
| 13 15 2 0 | ||
| 9 16 1 0 | ||
| 16 17 1 0 | ||
| 16 18 2 0 | ||
| 6 19 1 0 | ||
| 19 20 1 0 | ||
| 20 21 1 0 | ||
| 21 22 1 0 | ||
| 22 23 1 0 | ||
| 22 24 2 0 | ||
| 21 25 1 0 | ||
| 25 26 1 0 | ||
| 25 27 2 0 | ||
| 21 28 1 0 | ||
| 28 29 1 0 | ||
| 28 30 2 0 | ||
| 6 31 1 0 | ||
| 31 32 1 0 | ||
| 32 33 1 0 | ||
| 33 34 1 0 | ||
| 34 35 1 0 | ||
| 34 36 2 0 | ||
| 33 37 1 0 | ||
| 37 38 1 0 | ||
| 37 39 2 0 | ||
| 33 40 1 0 | ||
| 40 41 1 0 | ||
| 40 42 2 0 | ||
| 3 43 1 0 | ||
| 43 44 1 0 | ||
| 43 45 2 0 | ||
| 3 46 1 0 | ||
| 46 47 1 0 | ||
| 46 48 2 0 | ||
| 2 49 1 0 | ||
| 4 50 1 0 | ||
| 4 51 1 0 | ||
| 8 52 1 0 | ||
| 8 53 1 0 | ||
| 20 54 1 0 | ||
| 20 55 1 0 | ||
| 32 56 1 0 | ||
| 32 57 1 0 | ||
| M CHG 8 2 1 10 1 11 -1 13 1 14 -1 16 1 17 -1 22 1 | ||
| M CHG 8 23 -1 25 1 26 -1 28 1 29 -1 34 1 35 -1 37 1 | ||
| M CHG 8 38 -1 40 1 41 -1 43 1 44 -1 46 1 47 -1 49 -1 | ||
| M END |
There was a problem hiding this comment.
id5.mol duplicates id4.mol content under a different identifier.
The structure/bond/charge blocks here are effectively identical to deepmd/deepmd_property_tools/DATA/mol_convert/id4.mol (only title differs). If unintentional, this introduces duplicate examples that can bias splits and inflate evaluation confidence. Please replace this file with the intended molecule or deduplicate at dataset generation time.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepmd/deepmd_property_tools/DATA/mol_convert/id5.mol` around lines 1 - 121,
The file id5.mol contains the same molecule data as id4.mol (same
atoms/bonds/charges with only the title "id_5" changed); remove or replace
id5.mol with the intended unique molecule data (or update your dataset
generation/deduplication logic so files like id5.mol are not created as
duplicates of id4.mol). Search for the identifier/title "id_5" and the matching
atom/bond/charge blocks (e.g., the M CHG block and the same bond list) to
locate the duplicate, then either swap in the correct .mol content or
deduplicate at generation time to avoid biased/duplicated examples.
| if is_train: | ||
| self.result: PropertyDataResult | None = prepare_property_data( | ||
| data, | ||
| output_dir=self.save_path / "prepared_data", | ||
| input_out=self.save_path / "input.json", | ||
| property_name=property_name, | ||
| property_col=property_col, | ||
| train_ratio=train_ratio, | ||
| mol_dir=mol_dir, | ||
| mol_template=mol_template, | ||
| overlap_tol=overlap_tol, | ||
| seed=seed, | ||
| overwrite=overwrite, | ||
| numb_steps=numb_steps, | ||
| input_updates=input_updates, | ||
| ) |
There was a problem hiding this comment.
Type mismatch: property_col can be None but prepare_property_data expects str.
When is_train=True, property_col is passed to prepare_property_data which has signature property_col: str = "Property". If DataHub is called directly with is_train=True and property_col=None, this would pass None where a str is expected.
Consider either:
- Validating that
property_colis notNonewhenis_train=True, or - Changing the type annotation to
strwith a default value, since the inference path (context snippet frompredict.py:67) explicitly passesproperty_col=None
🛡️ Option 1: Add validation
if is_train:
+ if property_col is None:
+ raise ValueError("property_col is required for training")
self.result: PropertyDataResult | None = prepare_property_data(📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if is_train: | |
| self.result: PropertyDataResult | None = prepare_property_data( | |
| data, | |
| output_dir=self.save_path / "prepared_data", | |
| input_out=self.save_path / "input.json", | |
| property_name=property_name, | |
| property_col=property_col, | |
| train_ratio=train_ratio, | |
| mol_dir=mol_dir, | |
| mol_template=mol_template, | |
| overlap_tol=overlap_tol, | |
| seed=seed, | |
| overwrite=overwrite, | |
| numb_steps=numb_steps, | |
| input_updates=input_updates, | |
| ) | |
| if is_train: | |
| if property_col is None: | |
| raise ValueError("property_col is required for training") | |
| self.result: PropertyDataResult | None = prepare_property_data( | |
| data, | |
| output_dir=self.save_path / "prepared_data", | |
| input_out=self.save_path / "input.json", | |
| property_name=property_name, | |
| property_col=property_col, | |
| train_ratio=train_ratio, | |
| mol_dir=mol_dir, | |
| mol_template=mol_template, | |
| overlap_tol=overlap_tol, | |
| seed=seed, | |
| overwrite=overwrite, | |
| numb_steps=numb_steps, | |
| input_updates=input_updates, | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepmd/deepmd_property_tools/deepmd_property_tools/data/datahub.py` around
lines 47 - 62, When is_train is True, ensure property_col is a non-None str
before calling prepare_property_data: in the DataHub constructor (or the method
where this code lives) validate that property_col is not None and raise a clear
ValueError if it is, or set a sensible default string (e.g., "Property") before
passing it to prepare_property_data; reference the symbols property_col,
is_train, and prepare_property_data to locate and fix the call site so
prepare_property_data always receives a str.
| for frame_idx, (symbols, coord) in enumerate(zip(atoms, coordinates)): | ||
| if coord.shape != (len(symbols), 3): | ||
| raise ValueError( | ||
| f"coordinates shape mismatch at sample {frame_idx}: {coord.shape}" | ||
| ) | ||
| for atom_idx, symbol in enumerate(symbols): | ||
| if symbol not in self.type_index: | ||
| raise ValueError( | ||
| f"Element {symbol!r} is not present in type_map {self.type_map}" | ||
| ) | ||
| atom_types[frame_idx, atom_idx] = self.type_index[symbol] | ||
| coords[frame_idx, : len(symbols), :] = coord | ||
| return coords, atom_types |
There was a problem hiding this comment.
Prevent silent truncation when input lengths diverge.
These zip(...) loops can drop samples/columns without error when lengths mismatch (atoms vs coordinates, rows vs y_pred, or predict_cols vs pred). This can silently corrupt prediction outputs.
💡 Suggested fix
def standardize(
self, atoms: list[list[str]], coordinates: list[np.ndarray]
) -> tuple[np.ndarray, np.ndarray]:
if not atoms:
raise ValueError("No samples to predict")
+ if len(atoms) != len(coordinates):
+ raise ValueError(
+ f"atoms/coordinates length mismatch: {len(atoms)} != {len(coordinates)}"
+ )
max_natoms = max(len(symbols) for symbols in atoms)
coords = np.zeros((len(atoms), max_natoms, 3), dtype=np.float32)
atom_types = np.full((len(atoms), max_natoms), -1, dtype=np.int32)
- for frame_idx, (symbols, coord) in enumerate(zip(atoms, coordinates)):
+ for frame_idx, (symbols, coord) in enumerate(
+ zip(atoms, coordinates, strict=True)
+ ):
if coord.shape != (len(symbols), 3):
raise ValueError(
f"coordinates shape mismatch at sample {frame_idx}: {coord.shape}"
)
@@
with out_path.open("w", encoding="utf-8", newline="") as fp:
writer = csv.DictWriter(fp, fieldnames=fieldnames)
writer.writeheader()
- for row, pred in zip(rows, y_pred):
+ if len(rows) != len(y_pred):
+ raise ValueError(
+ f"rows/prediction length mismatch: {len(rows)} != {len(y_pred)}"
+ )
+ for row, pred in zip(rows, y_pred, strict=True):
out_row = dict(row)
- for col, value in zip(predict_cols, pred):
+ for col, value in zip(predict_cols, pred, strict=True):
out_row[col] = float(value)
writer.writerow(out_row)Also applies to: 95-99
🧰 Tools
🪛 Ruff (0.15.14)
[warning] 55-55: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepmd/deepmd_property_tools/deepmd_property_tools/predictor.py` around lines
55 - 67, The zip(...) usage silently truncates mismatched sequences (e.g., atoms
vs coordinates, rows vs y_pred, predict_cols vs pred) causing data loss; before
each zip, validate lengths explicitly and raise a clear ValueError if they
differ (e.g., check len(atoms) == len(coordinates), len(rows) == len(y_pred),
len(predict_cols) == len(pred)), include identifying context in the message,
then proceed with the existing loops that populate coords/atom_types and consume
y_pred/pred so no samples/columns are silently dropped; apply these checks
around the blocks that reference atoms/coordinates and
rows/y_pred/predict_cols/pred.
| self.save_path = Path(save_path) | ||
| self.finetune = finetune | ||
| self.nproc_per_node = nproc_per_node |
There was a problem hiding this comment.
Resolve paths before changing working directory.
save_path/input_path can be relative, but run() changes cwd and still passes paths prefixed with self.save_path. This can mis-target files (e.g., nested .../exp_property/exp_property/out.json) and break default relative-path runs.
💡 Proposed fix
class Trainer:
@@
- self.save_path = Path(save_path)
+ self.save_path = Path(save_path).resolve()
@@
def run(self, input_path: str | Path) -> None:
- input_path = Path(input_path)
+ input_path = Path(input_path).resolve()
@@
train(
input_file=str(input_path),
@@
output=str(self.save_path / "out.json"),
)Also applies to: 38-39, 44-60, 66-77, 88-88
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepmd/deepmd_property_tools/deepmd_property_tools/tasks/trainer.py` around
lines 28 - 30, The code sets self.save_path (and other path attributes like
self.input_path) as Path(save_path) but later run() changes the working
directory, causing relative paths to resolve incorrectly; change the constructor
to immediately resolve all incoming path attributes (e.g., call
Path(save_path).resolve() and Path(input_path).resolve() for any similar fields
set around lines where save_path/input_path are assigned) so that self.save_path
and related attributes are absolute before run() mutates cwd, and ensure any
other places assigning paths (the blocks around lines 38-39, 44-60, 66-77, 88)
are updated to use resolved Paths as well.
| MODEL_PATH = ROOT / "exp_property_20" / "model.ckpt-10.pt" | ||
|
|
||
| if not MODEL_PATH.exists(): | ||
| raise FileNotFoundError(f"Train first; checkpoint not found: {MODEL_PATH}") | ||
|
|
There was a problem hiding this comment.
Avoid hardcoding a specific checkpoint filename.
model.ckpt-10.pt tightly couples this script to one training run. Passing the experiment directory is more robust and lets PropertyPredict pick frozen_model.pth or the latest checkpoint automatically.
Suggested patch
-MODEL_PATH = ROOT / "exp_property_20" / "model.ckpt-10.pt"
-
-if not MODEL_PATH.exists():
- raise FileNotFoundError(f"Train first; checkpoint not found: {MODEL_PATH}")
+MODEL_PATH = ROOT / "exp_property_20"
+
+if not MODEL_PATH.exists():
+ raise FileNotFoundError(f"Train first; model directory not found: {MODEL_PATH}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| MODEL_PATH = ROOT / "exp_property_20" / "model.ckpt-10.pt" | |
| if not MODEL_PATH.exists(): | |
| raise FileNotFoundError(f"Train first; checkpoint not found: {MODEL_PATH}") | |
| MODEL_PATH = ROOT / "exp_property_20" | |
| if not MODEL_PATH.exists(): | |
| raise FileNotFoundError(f"Train first; model directory not found: {MODEL_PATH}") | |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepmd/deepmd_property_tools/predict_property_20.py` around lines 13 - 17,
The code currently hardcodes MODEL_PATH = ROOT / "exp_property_20" /
"model.ckpt-10.pt", which couples the script to one checkpoint; change it to
accept an experiment directory (or path) and let PropertyPredict or its loader
resolve the correct file (e.g., frozen_model.pth or latest checkpoint). Modify
the usage of MODEL_PATH and any instantiation of PropertyPredict to accept a
directory/path variable (e.g., exp_dir or model_dir) instead of a specific
filename, and update the existence check to verify the directory and delegate
selecting the actual checkpoint file to PropertyPredict (or a helper like
PropertyPredict.load_checkpoint) so the script works with frozen_model.pth or
the latest checkpoint automatically.
| finetune=PRETRAINED_MODEL, | ||
| use_pretrain_script=False, | ||
| input_updates={ |
There was a problem hiding this comment.
Enable pretrained-structure alignment when finetuning.
With finetune=PRETRAINED_MODEL, setting use_pretrain_script=False can cause descriptor/config mismatch against pretrained weights and make this demo fail on load.
Suggested patch
- use_pretrain_script=False,
+ use_pretrain_script=True,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| finetune=PRETRAINED_MODEL, | |
| use_pretrain_script=False, | |
| input_updates={ | |
| finetune=PRETRAINED_MODEL, | |
| use_pretrain_script=True, | |
| input_updates={ |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepmd/deepmd_property_tools/train_property_20.py` around lines 38 - 40, The
finetuning invocation sets finetune=PRETRAINED_MODEL while disabling the
pretrained alignment via use_pretrain_script=False, which can produce
descriptor/config mismatches; update the call that includes finetune,
PRETRAINED_MODEL and input_updates to enable the pretrained-structure alignment
by setting use_pretrain_script=True (or removing the override so the default
alignment runs) so the model descriptors/config match the pretrained weights
during load.
Summary
deepmd_property_toolsfor molecular property training and prediction.pyproject.tomlpackaging with a CLI entry point.Tests
python -m pytest tests -vSummary by CodeRabbit
New Features
trainandpredictCLI commands for workflow automation.Tests