Commit e9fe00f

[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
1 parent 30351e9 commit e9fe00f

31 files changed

Lines changed: 569 additions & 186 deletions

deepmd/deepmd_property_tools/DPA3_finetune_hyperparameters.md

Lines changed: 33 additions & 22 deletions
Original file line number | Diff line number | Diff line change
@@ -30,7 +30,7 @@ PropertyTrain(
3030

3131
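Setting `use_pretrain_script=True` makes DeePMD-kit automatically adjust the model structure in the current `input.json` according to the `model_params` stored in the pretrained model, so that it aligns more easily with `DPA-3.2-5M.pt`.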
3232

33-
---
33+
______________________________________________________________________
3434

3535
## 2. Parameters that should stay consistent with the pretrained model
3636

@@ -53,13 +53,24 @@ unexpected key
5353
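The element types in the fine-tuning data must be supported by the pretrained model. The current 20-sample demo dataset automatically generates: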
5454

5555
```json
56-
["H", "C", "N", "O"]
56+
[
57+
"H",
58+
"C",
59+
"N",
60+
"O"
61+
]
5762
```
5863

5964
If the full dataset is used and it contains `I`, the following may be generated:
6065

6166
```json
62-
["H", "C", "N", "O", "I"]
67+
[
68+
"H",
69+
"C",
70+
"N",
71+
"O",
72+
"I"
73+
]
6374
```
6475

6576
Confirm that the pretrained model supports these elements.
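The element check described above can be sketched as a plain set comparison. This is a minimal illustration, not part of `deepmd_property_tools`: the helper name `unsupported_elements` and the `pretrained_type_map` list are hypothetical, and the list is made up for the example rather than read from `DPA-3.2-5M.pt`.

```python
def unsupported_elements(data_elements, pretrained_type_map):
    """Return the dataset elements missing from the pretrained type map, in input order."""
    supported = set(pretrained_type_map)
    missing = []
    for element in data_elements:
        if element not in supported and element not in missing:
            missing.append(element)
    return missing


# Hypothetical type map for illustration only; NOT the real type map of any pretrained model.
pretrained_type_map = ["H", "C", "N", "O", "F"]

demo_missing = unsupported_elements(["H", "C", "N", "O"], pretrained_type_map)
full_missing = unsupported_elements(["H", "C", "N", "O", "I"], pretrained_type_map)
```

An empty result means every element in the fine-tuning data is covered; a non-empty result lists the elements to resolve before fine-tuning starts.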
@@ -153,7 +164,7 @@ unexpected key
153164

154165
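Some of these parameters affect the model structure and others affect the model's computation logic. When fine-tuning from a pretrained model, changing them by hand is not recommended.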
155166

156-
---
167+
______________________________________________________________________
157168

158169
## 3. Parameters that can be set for the current task
159170

@@ -193,15 +204,15 @@ unexpected key
193204
This can be set freely:
194205

195206
```python
196-
numb_steps=10
207+
numb_steps = 10
197208
```
198209

199210
Or set a larger value for production training:
200211

201212
```python
202-
numb_steps=10000
203-
numb_steps=50000
204-
numb_steps=200000
213+
numb_steps = 10000
214+
numb_steps = 50000
215+
numb_steps = 200000
205216
```
206217

207218
The current 20-sample demo dataset is only for a smoke test; `10` steps merely verify that the pipeline runs.
@@ -211,19 +222,19 @@ numb_steps=200000
211222
Adjust according to dataset size and GPU memory:
212223

213224
```python
214-
batch_size=1
225+
batch_size = 1
215226
```
216227

217228
Or use the automatic batch size supported by DeePMD:
218229

219230
```python
220-
batch_size="auto:512"
231+
batch_size = "auto:512"
221232
```
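As a rough illustration of what `"auto:512"` means: to my understanding of the DeePMD-kit documentation, `"auto:N"` picks, per system, a batch size such that batch size times the number of atoms is at least `N`. The sketch below encodes that rule under this assumption; `auto_batch_size` is a hypothetical helper, not a DeePMD-kit function, and the real implementation may differ in detail.

```python
import math


def auto_batch_size(natoms, rule="auto:512"):
    """Smallest batch size b with b * natoms >= N for a rule of the form 'auto:N'.

    Assumption: this mirrors the documented meaning of "auto:N"; it is not DeePMD-kit code.
    """
    prefix, _, threshold = rule.partition(":")
    if prefix != "auto" or not threshold.isdigit():
        raise ValueError(f"expected a rule like 'auto:512', got {rule!r}")
    if natoms < 1:
        raise ValueError("natoms must be a positive integer")
    return max(1, math.ceil(int(threshold) / natoms))


small_molecule_batch = auto_batch_size(20)   # 20-atom system
large_system_batch = auto_batch_size(600)    # system already larger than the threshold
```

Under this rule small molecules get larger batches and large systems fall back to a batch size of 1, which is why an automatic setting suits datasets with mixed system sizes.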
222233

223234
In the current 20-sample demo dataset, many systems contain only 1-2 samples. If you set:
224235

225236
```python
226-
batch_size=1024
237+
batch_size = 1024
227238
```
228239

229240
a warning appears:
@@ -254,7 +265,7 @@ required batch size is larger than the size of the dataset
254265
In `train_property_20.py`, this can be set through `input_updates`:
255266

256267
```python
257-
input_updates={
268+
input_updates = {
258269
"learning_rate": {
259270
"type": "exp",
260271
"decay_steps": 1000,
@@ -284,8 +295,8 @@ input_updates={
284295
For example:
285296

286297
```python
287-
property_name="Property"
288-
property_col="Property"
298+
property_name = "Property"
299+
property_col = "Property"
289300
```
290301

291302
Meaning:
@@ -324,15 +335,15 @@ The fitting net will be re-init instead of using that in the pretrained model!
324335
This is a tool-layer parameter of `deepmd_property_tools`:
325336

326337
```python
327-
freeze=False
338+
freeze = False
328339
```
329340

330341
It controls whether `frozen_model.pth` is exported automatically after training finishes.
331342

332343
The `custom_silu` in the current DPA3 pretrained model may raise an error during the TorchScript freeze stage, so the current demo uses:
333344

334345
```python
335-
freeze=False
346+
freeze = False
336347
```
337348

338349
Save the checkpoint first:
@@ -348,15 +359,15 @@ model.ckpt-10.pt
348359
This is a training launch parameter of `deepmd_property_tools` that controls how many training processes are started on a single node:
349360

350361
```python
351-
nproc_per_node=1
362+
nproc_per_node = 1
352363
```
353364

354365
The default is `1`, meaning single-process training. In single-process mode, the tool calls DeePMD-kit's Python training entry point directly.
355366

356367
If it is set to a value greater than 1, for example:
357368

358369
```python
359-
nproc_per_node=2
370+
nproc_per_node = 2
360371
```
361372

362373
the tool switches to `torchrun` to launch multi-process training, equivalent to:
@@ -368,7 +379,7 @@ torchrun --nproc_per_node=2 --no-python dp --pt train input.json
368379
This usually means 2 GPUs / 2 training processes on a single node. For 8-GPU training, set:
369380

370381
```python
371-
nproc_per_node=8
382+
nproc_per_node = 8
372383
```
373384

374385
Note: `nproc_per_node` is not the number of CPU threads. To use more threads on CPU only, control them through environment variables, for example:
@@ -380,7 +391,7 @@ export DP_INTER_OP_PARALLELISM_THREADS=2
380391
python train_property_20.py
381392
```
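The launch rule for `nproc_per_node` described in this section can be sketched as follows. This is an illustration under assumptions, not the tool's actual code: `build_launch_command` is a hypothetical helper, and returning `None` is only this sketch's way of signalling "train in-process through the Python entry point". The `torchrun` argument list is copied from the equivalent command quoted in the document.

```python
def build_launch_command(nproc_per_node, input_json="input.json"):
    """Return the torchrun argument list for multi-process training, or None for in-process training."""
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be >= 1")
    if nproc_per_node == 1:
        # Single process: no external launcher is needed in this sketch.
        return None
    return [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        "--no-python",
        "dp",
        "--pt",
        "train",
        input_json,
    ]


single_process_cmd = build_launch_command(1)
two_gpu_cmd = build_launch_command(2)
```

Building the command as a list rather than a shell string keeps it safe to pass to `subprocess.run` without shell quoting.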
382393

383-
---
394+
______________________________________________________________________
384395

385396
## 4. Current recommended configuration example
386397

@@ -432,7 +443,7 @@ DPA3 structure switches
432443

433444
These should be inherited automatically from the pretrained model configuration via `use_pretrain_script=True`.
434445

435-
---
446+
______________________________________________________________________
436447

437448
## 5. Brief summary
438449

@@ -463,7 +474,7 @@ nproc_per_node
463474
The current tool recommends letting DeePMD-kit, via:
464475

465476
```python
466-
use_pretrain_script=True
477+
use_pretrain_script = True
467478
```
468479

469480
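inherit the pretrained model structure automatically, while users mainly tune the training hyperparameters relevant to the current task.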

deepmd/deepmd_property_tools/README.md

Lines changed: 14 additions & 12 deletions
@@ -53,11 +53,13 @@ DATA/
5353
Direct coordinate data is also supported:
5454

5555
```python
56-
clf.fit({
57-
"atoms": [["C", "H", "H", "H", "H"], ["O", "H", "H"]],
58-
"coordinates": [coords0, coords1],
59-
"target": [0.1, 0.2],
60-
})
56+
clf.fit(
57+
{
58+
"atoms": [["C", "H", "H", "H", "H"], ["O", "H", "H"]],
59+
"coordinates": [coords0, coords1],
60+
"target": [0.1, 0.2],
61+
}
62+
)
6163
```
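The README snippet leaves `coords0` and `coords1` undefined. A minimal sketch of how such inputs might be prepared is shown below, assuming each entry of `"coordinates"` is an array of shape `(n_atoms, 3)` in ångström; that layout is an assumption, not something the README states. The geometries are rough illustrative values and `check_direct_data` is a hypothetical helper.

```python
import numpy as np

# Approximate methane geometry (C at the origin), illustrative values only.
coords0 = np.array(
    [
        [0.000, 0.000, 0.000],
        [0.629, 0.629, 0.629],
        [-0.629, -0.629, 0.629],
        [-0.629, 0.629, -0.629],
        [0.629, -0.629, -0.629],
    ]
)
# Approximate water geometry (O at the origin), illustrative values only.
coords1 = np.array(
    [
        [0.000, 0.000, 0.000],
        [0.757, 0.586, 0.000],
        [-0.757, 0.586, 0.000],
    ]
)

data = {
    "atoms": [["C", "H", "H", "H", "H"], ["O", "H", "H"]],
    "coordinates": [coords0, coords1],
    "target": [0.1, 0.2],
}


def check_direct_data(data):
    """Check that every molecule has one (x, y, z) row per atom and one target; return the molecule count."""
    atoms, coords, target = data["atoms"], data["coordinates"], data["target"]
    if not (len(atoms) == len(coords) == len(target)):
        raise ValueError("atoms, coordinates and target must have the same length")
    for symbols, xyz in zip(atoms, coords):
        if np.asarray(xyz).shape != (len(symbols), 3):
            raise ValueError("each coordinate array must have shape (n_atoms, 3)")
    return len(atoms)


n_molecules = check_direct_data(data)
```

After a check like this, `data` has the same keys as the dictionary passed to `clf.fit(...)` in the README example.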
6264

6365
## Command Line
@@ -72,19 +74,19 @@ Train from CSV + MOL inputs:
7274

7375
```bash
7476
deepmd-property-tools train \
75-
--dataset DATA/dataset_demo.csv \
76-
--mol-dir DATA/mol_convert \
77-
--save-path exp_property
77+
--dataset DATA/dataset_demo.csv \
78+
--mol-dir DATA/mol_convert \
79+
--save-path exp_property
7880
```
7981

8082
Predict with a checkpoint file or an experiment directory:
8183

8284
```bash
8385
deepmd-property-tools predict \
84-
--model exp_property \
85-
--dataset DATA/dataset_demo.csv \
86-
--mol-dir DATA/mol_convert \
87-
--save-path pred_property
86+
--model exp_property \
87+
--dataset DATA/dataset_demo.csv \
88+
--mol-dir DATA/mol_convert \
89+
--save-path pred_property
8890
```
8991

9092
## Notes
Lines changed: 6 additions & 2 deletions
@@ -1,7 +1,11 @@
11
# SPDX-License-Identifier: LGPL-3.0-or-later
22
"""Uni-Mol-tools-like helpers for DeePMD property tasks."""
33

4-
from .predict import PropertyPredict
5-
from .train import PropertyTrain
4+
from .predict import (
5+
PropertyPredict,
6+
)
7+
from .train import (
8+
PropertyTrain,
9+
)
610

711
__all__ = ["PropertyPredict", "PropertyTrain"]

deepmd/deepmd_property_tools/deepmd_property_tools/cli.py

Lines changed: 47 additions & 16 deletions
@@ -1,13 +1,20 @@
11
# SPDX-License-Identifier: LGPL-3.0-or-later
22
"""Command line interface for DeePMD property tools."""
33

4-
from __future__ import annotations
4+
from __future__ import (
5+
annotations,
6+
)
57

68
import argparse
7-
from pathlib import Path
8-
from typing import Sequence
9+
from pathlib import (
10+
Path,
11+
)
12+
from collections.abc import Sequence
913

10-
from deepmd_property_tools import PropertyPredict, PropertyTrain
14+
from deepmd_property_tools import (
15+
PropertyPredict,
16+
PropertyTrain,
17+
)
1118

1219

1320
def build_parser() -> argparse.ArgumentParser:
@@ -25,21 +32,45 @@ def build_parser() -> argparse.ArgumentParser:
2532
subparsers = parser.add_subparsers(dest="command")
2633

2734
train_parser = subparsers.add_parser("train", help="Train a property model")
28-
train_parser.add_argument("--dataset", required=True, type=Path, help="CSV dataset path")
29-
train_parser.add_argument("--mol-dir", required=True, type=Path, help="MOL directory path")
30-
train_parser.add_argument("--save-path", required=True, type=Path, help="Experiment output directory")
31-
train_parser.add_argument("--property-col", default="Property", help="CSV property column")
32-
train_parser.add_argument("--property-name", default="Property", help="DeePMD property name")
33-
train_parser.add_argument("--finetune", default=None, help="Pretrained model name or path")
34-
train_parser.add_argument("--numb-steps", type=int, default=None, help="Number of training steps")
35-
train_parser.add_argument("--batch-size", type=int, default=None, help="Training batch size")
35+
train_parser.add_argument(
36+
"--dataset", required=True, type=Path, help="CSV dataset path"
37+
)
38+
train_parser.add_argument(
39+
"--mol-dir", required=True, type=Path, help="MOL directory path"
40+
)
41+
train_parser.add_argument(
42+
"--save-path", required=True, type=Path, help="Experiment output directory"
43+
)
44+
train_parser.add_argument(
45+
"--property-col", default="Property", help="CSV property column"
46+
)
47+
train_parser.add_argument(
48+
"--property-name", default="Property", help="DeePMD property name"
49+
)
50+
train_parser.add_argument(
51+
"--finetune", default=None, help="Pretrained model name or path"
52+
)
53+
train_parser.add_argument(
54+
"--numb-steps", type=int, default=None, help="Number of training steps"
55+
)
56+
train_parser.add_argument(
57+
"--batch-size", type=int, default=None, help="Training batch size"
58+
)
3659
train_parser.set_defaults(func=_run_train)
3760

3861
predict_parser = subparsers.add_parser("predict", help="Predict properties")
39-
predict_parser.add_argument("--model", required=True, type=Path, help="Model file or experiment directory")
40-
predict_parser.add_argument("--dataset", required=True, type=Path, help="CSV dataset path")
41-
predict_parser.add_argument("--mol-dir", required=True, type=Path, help="MOL directory path")
42-
predict_parser.add_argument("--save-path", default=None, type=Path, help="Prediction output directory")
62+
predict_parser.add_argument(
63+
"--model", required=True, type=Path, help="Model file or experiment directory"
64+
)
65+
predict_parser.add_argument(
66+
"--dataset", required=True, type=Path, help="CSV dataset path"
67+
)
68+
predict_parser.add_argument(
69+
"--mol-dir", required=True, type=Path, help="MOL directory path"
70+
)
71+
predict_parser.add_argument(
72+
"--save-path", default=None, type=Path, help="Prediction output directory"
73+
)
4374
predict_parser.set_defaults(func=_run_predict)
4475

4576
return parser
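For readers unfamiliar with the `set_defaults(func=...)` pattern used in `build_parser`, here is a self-contained sketch of how a subcommand is dispatched to its handler. The names `demo-tools`, `_run_train_demo`, `build_demo_parser` and `main` are hypothetical and only mirror the structure of the diff; they are not the project's actual code.

```python
import argparse


def _run_train_demo(args):
    """Hypothetical handler; a real handler would start training here."""
    return f"train on {args.dataset}"


def build_demo_parser():
    parser = argparse.ArgumentParser(prog="demo-tools")
    subparsers = parser.add_subparsers(dest="command")
    train_parser = subparsers.add_parser("train", help="Train a property model")
    train_parser.add_argument("--dataset", required=True, help="CSV dataset path")
    # Attach the handler to the subcommand so that main() can dispatch without if/elif chains.
    train_parser.set_defaults(func=_run_train_demo)
    return parser


def main(argv=None):
    parser = build_demo_parser()
    args = parser.parse_args(argv)
    if not hasattr(args, "func"):
        # No subcommand was given: show the help text instead of failing.
        parser.print_help()
        return None
    return args.func(args)


result = main(["train", "--dataset", "DATA/dataset_demo.csv"])
```

Each subparser carries its own handler, so adding a new subcommand only requires a new `add_parser` call and a new function.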
Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,8 @@
11
# SPDX-License-Identifier: LGPL-3.0-or-later
22
"""Configuration helpers for deepmd_property_tools."""
33

4-
from .config_handler import ConfigHandler
4+
from .config_handler import (
5+
ConfigHandler,
6+
)
57

68
__all__ = ["ConfigHandler"]

deepmd/deepmd_property_tools/deepmd_property_tools/config/config_handler.py

Lines changed: 17 additions & 5 deletions
@@ -1,23 +1,35 @@
11
# SPDX-License-Identifier: LGPL-3.0-or-later
22
"""JSON config handler."""
33

4-
from __future__ import annotations
4+
from __future__ import (
5+
annotations,
6+
)
57

68
import copy
79
import json
8-
from pathlib import Path
9-
from typing import Any
10+
from pathlib import (
11+
Path,
12+
)
13+
from typing import (
14+
Any,
15+
)
1016

1117

1218
class ConfigHandler:
1319
def __init__(self, config_path: str | Path | None = None) -> None:
14-
self.config_path = Path(config_path) if config_path else Path(__file__).with_name("default.json")
20+
self.config_path = (
21+
Path(config_path)
22+
if config_path
23+
else Path(__file__).with_name("default.json")
24+
)
1525

1626
def read(self) -> dict[str, Any]:
1727
return json.loads(self.config_path.read_text(encoding="utf-8"))
1828

1929
def write(self, data: dict[str, Any], out_file_path: str | Path) -> None:
20-
Path(out_file_path).write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
30+
Path(out_file_path).write_text(
31+
json.dumps(data, indent=2) + "\n", encoding="utf-8"
32+
)
2133

2234
@staticmethod
2335
def merge(base: dict[str, Any], updates: dict[str, Any] | None) -> dict[str, Any]:
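The body of `merge` is outside this hunk, so its behavior is not visible here. One plausible reading of the signature is a recursive overlay of `updates` onto a copy of `base`; the sketch below shows that idea under this assumption. `deep_merge` is a hypothetical stand-in, not the project's implementation.

```python
import copy
from typing import Any, Dict, Optional


def deep_merge(base: Dict[str, Any], updates: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a new dict where nested dicts in `updates` are merged into `base` key by key."""
    merged = copy.deepcopy(base)
    for key, value in (updates or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge recursively instead of replacing the whole sub-dict.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"loss": {"type": "property", "loss_func": "smooth_mae", "beta": 1.0}}
updated = deep_merge(defaults, {"loss": {"beta": 0.5}})
```

With a recursive merge, a partial override such as `{"loss": {"beta": 0.5}}` changes one key and leaves the remaining defaults in place, and the input dictionaries are not mutated.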

deepmd/deepmd_property_tools/deepmd_property_tools/config/default.json

Lines changed: 9 additions & 2 deletions
@@ -38,14 +38,21 @@
3838
"property_name": "Property",
3939
"intensive": true,
4040
"task_dim": 1,
41-
"neuron": [240, 240, 240],
41+
"neuron": [
42+
240,
43+
240,
44+
240
45+
],
4246
"resnet_dt": true,
4347
"seed": 1
4448
}
4549
},
4650
"loss": {
4751
"type": "property",
48-
"metric": ["mae", "rmse"],
52+
"metric": [
53+
"mae",
54+
"rmse"
55+
],
4956
"loss_func": "smooth_mae",
5057
"beta": 1.0
5158
},
