refactor(pt_expt): use model API for inference, consistent file naming (deepmodeling#5354)
## Summary
### Problem
Two inconsistencies in `.pt2`/`.pte` files:
1. **Python reads a flat metadata dict instead of using the model API.**
Other backends (`.dp`/`.yaml`, `.pth`) deserialize the model and query
it directly for `get_rcut()`, `get_sel()`, `model_output_type()`, etc.
The `.pt2`/`.pte` backend was reading these from a metadata dict stored
at export time, duplicating model logic.
2. **Inconsistent file naming.** `model_def_script.json` stored C++
runtime metadata in `.pt2`/`.pte`, but training config in `.pth`.
Training config lived separately in `model_params.json`.
`output_keys.json` was a standalone file whose contents logically belonged
with the rest of the metadata.
### Solution
**Python inference**: `DeepEval` now deserializes `model.json` into a
dpmodel instance (`self._dpmodel`) at load time and delegates all API
calls to it. `_reconstruct_model_output_def()` is removed.
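A rough sketch of this delegation pattern (the `DPModel` stand-in, its field names, and `DeepEvalSketch` are illustrative, not the actual deepmd-kit classes):

```python
import json


class DPModel:
    """Stand-in for the deserialized dpmodel (illustrative only)."""

    def __init__(self, data):
        self._data = data

    def get_rcut(self):
        return self._data["rcut"]

    def get_sel(self):
        return self._data["sel"]

    def get_type_map(self):
        return self._data["type_map"]


class DeepEvalSketch:
    """Delegates model-API queries to a deserialized model instance
    instead of reading a flat metadata dict stored at export time."""

    def __init__(self, model_json: str):
        # Deserialize model.json into a model instance at load time.
        self._dpmodel = DPModel(json.loads(model_json))

    # All API calls delegate to the model, mirroring the .dp/.pth backends.
    def get_rcut(self):
        return self._dpmodel.get_rcut()

    def get_sel(self):
        return self._dpmodel.get_sel()

    def get_type_map(self):
        return self._dpmodel.get_type_map()


model_json = json.dumps({"rcut": 6.0, "sel": [46, 92], "type_map": ["O", "H"]})
ev = DeepEvalSketch(model_json)
print(ev.get_rcut())  # → 6.0
```

Because the answers come from the model itself, there is no second copy of this logic to keep in sync with export-time code.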
**File layout renamed** so that each filename means the same thing
across `.pth` and `.pt2`/`.pte`:
| File | Before | After |
|------|--------|-------|
| `model_def_script.json` | C++ metadata | **Training config** (matches `.pth`) |
| `metadata.json` | *(did not exist)* | **C++ metadata** + `output_keys` |
| `model_params.json` | Training config | **Removed** |
| `output_keys.json` | Output key list | **Removed** (merged into `metadata.json`) |
| `model.json` | Full serialized model | No change |
**C++ inference** (`DeepPotPTExpt.cc`): Updated to read
`extra/metadata.json` instead of `extra/model_def_script.json`, and
reads `output_keys` from the metadata dict instead of a separate
`output_keys.json`.
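For illustration, a minimal sketch of pulling `extra/metadata.json` out of a `.pt2` archive, assuming only that the archive is zip-based and that some member path ends in `extra/metadata.json` (this is a toy reader, not the deepmd-kit or libtorch loader):

```python
import io
import json
import zipfile


def read_metadata(pt2_file):
    """Locate and parse extra/metadata.json inside a zip-based .pt2
    archive. Accepts a path or a file-like object."""
    with zipfile.ZipFile(pt2_file) as zf:
        for name in zf.namelist():
            # Torch archives may prefix members with the archive name,
            # so match on the suffix rather than the full path.
            if name.endswith("extra/metadata.json"):
                return json.loads(zf.read(name))
    raise FileNotFoundError("extra/metadata.json not found in archive")


# Build a toy in-memory archive to exercise the reader.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "model/extra/metadata.json",
        json.dumps({"output_keys": ["energy", "force"], "rcut": 6.0}),
    )
buf.seek(0)
meta = read_metadata(buf)
print(meta["output_keys"])  # → ['energy', 'force']
```

The C++ reader does the equivalent suffix lookup and JSON parse, then takes `output_keys` from the resulting dict rather than from a separate file.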
**Why `metadata.json` still exists**: C++ inference cannot deserialize
`model.json` to call model API methods. The alternative — compiling
methods like `get_rcut()`, `get_sel()` as additional AOTInductor entry
points — was benchmarked and rejected:
- **Compilation overhead**: ~12s per trivial constant-returning function
(C++ codegen + compile + link). With ~8 methods, that adds ~1.5 min to
freeze time.
- **String outputs**: `get_type_map()` returns strings. `torch.export`
only supports tensor I/O — encoding strings as int tensors adds
complexity for no benefit.
- **These are constants**: `rcut`, `sel`, `type_map` never change after
export. A flat JSON file is the simplest and fastest solution.
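A minimal sketch of writing such a flat metadata file; the key names and the `Toy` model are assumptions based on the constants this PR mentions, not the exact schema of `_collect_metadata()`:

```python
import json


def collect_metadata(model) -> str:
    """Gather export-time constants into one flat JSON document,
    folding in output_keys (formerly a separate output_keys.json)."""
    metadata = {
        "rcut": model.get_rcut(),
        "sel": model.get_sel(),
        # Strings are trivial in JSON but awkward as tensor-only I/O.
        "type_map": model.get_type_map(),
        "output_keys": list(model.output_keys),
    }
    return json.dumps(metadata, indent=2)


class Toy:
    """Hypothetical model exposing the constants named above."""

    output_keys = ("energy", "force")

    def get_rcut(self):
        return 6.0

    def get_sel(self):
        return [46, 92]

    def get_type_map(self):
        return ["O", "H"]


doc = collect_metadata(Toy())
print(json.loads(doc)["output_keys"])  # → ['energy', 'force']
```

Since these values are fixed at export, one `json.dumps` at freeze time replaces ~8 AOTInductor compilations, and the C++ side only needs a JSON parser to read them back.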
### Other changes
- `compress` and `change_bias` entrypoints now preserve training config
through `.pte`/`.pt2` round-trips
- `.gitignore` updated to exclude `.pte`/`.pt2` model files
- `_collect_metadata()` drops `model_output_type` and `sel_type` (not
used by C++; Python now gets them from the model)
## Test plan
- [x] `source/tests/pt_expt/infer/test_deep_eval.py` — 36/36 pass
(`.pte` + `.pt2`)
- New: `test_model_api_delegation`,
`test_get_model_def_script_with_params`
- Updated: `test_get_model_def_script`, `test_pt2_has_metadata`,
`test_dynamic_shapes`
- [x] `source/tests/pt_expt/model/` — 50/50 pass (frozen, compression,
serialization)
- [x] `source/tests/pt_expt/test_change_bias.py` — new tests for
`.pte`/`.pt2` model_def_script preservation
- [x] C++ tests — 3/3 suites pass (`.pt2` models regenerated with new
`metadata.json`)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Preserve and restore training configuration when freezing or modifying
frozen models; clearer messages when training config is absent.
* **Refactor**
* Consolidated metadata layout inside exported model archives for more
consistent loading across formats and runtimes.
* **Tests**
* Added and updated tests to validate config preservation, metadata
delegation, and archive contents.
* **Chores**
* Extended ignore patterns to skip additional model file extensions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>