| 1 | +# DeePMD-kit |
| 2 | + |
| 3 | +DeePMD-kit is a deep learning package for many-body potential energy representation and molecular dynamics. It supports multiple backends (TensorFlow, PyTorch, JAX, Paddle) and integrates with MD packages like LAMMPS, GROMACS, and i-PI. |
| 4 | + |
| 5 | +**Always reference these instructions first and fall back to search or bash commands only when you encounter unexpected information that does not match the info here.** |
| 6 | + |
| 7 | +## Working Effectively |
| 8 | + |
| 9 | +### Bootstrap and Build Repository |
| 10 | + |
| 11 | +- Create virtual environment: `uv venv venv && source venv/bin/activate` |
| 12 | +- Install base dependencies: `uv pip install tensorflow-cpu` (takes ~8 seconds) |
| 13 | +- Install PyTorch: `uv pip install torch --index-url https://download.pytorch.org/whl/cpu` (takes ~5 seconds) |
| 14 | +- Build Python package: `uv pip install -e .[cpu,test]` -- takes 67 seconds. **NEVER CANCEL. Set timeout to 120+ seconds.** |
| 15 | +- Build C++ components: `export TENSORFLOW_ROOT=$(python -c 'import importlib.util,pathlib;print(pathlib.Path(importlib.util.find_spec("tensorflow").origin).parent)')` then `export PYTORCH_ROOT=$(python -c 'import torch;print(torch.__path__[0])')` then `./source/install/build_cc.sh` -- takes 164 seconds. **NEVER CANCEL. Set timeout to 300+ seconds.** |
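The timeout guidance above can be encoded once so that long builds are never cut short by a default limit. The following is a minimal sketch, not part of the repository: the step names and the `run_step` helper are hypothetical, and only the timeout values come from the list above.

```python
import subprocess
import sys

# Minimum timeouts in seconds, taken from the build guidance above.
# The step names are hypothetical labels, not identifiers from the repository.
STEP_TIMEOUTS = {
    "python_build": 120,  # `uv pip install -e .[cpu,test]` takes about 67 s
    "cpp_build": 300,     # `./source/install/build_cc.sh` takes about 164 s
}


def run_step(name, command, timeouts=STEP_TIMEOUTS):
    """Run `command` with the timeout registered for step `name`."""
    result = subprocess.run(
        command,
        timeout=timeouts[name],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in command; a real call would pass the build command instead.
    code, out = run_step("python_build", [sys.executable, "-c", "print('ok')"])
    print(code, out.strip())
```

Keeping the limits in one table makes it harder to launch a build with a timeout shorter than the measured duration.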
| 16 | + |
| 17 | +### Test Repository |
| 18 | + |
| 19 | +- Run single test: `pytest source/tests/tf/test_dp_test.py::TestDPTestEner::test_1frame -v` -- takes 8-13 seconds |
| 20 | +- Run test subset: `pytest source/tests/tf/test_dp_test.py -v` -- takes 15 seconds. **NEVER CANCEL. Set timeout to 60+ seconds.** |
| 21 | +- **Recommended: Use single test cases for validation instead of the full test suite** -- the full suite has 314 test files and takes 60+ minutes |
| 22 | + |
| 23 | +### Lint and Format Code |
| 24 | + |
| 25 | +- Install linter: `uv pip install ruff` |
| 26 | +- Run linting: `ruff check .` -- takes <1 second |
| 27 | +- Format code: `ruff format .` -- takes <1 second |
| 28 | +- **Always run `ruff check .` and `ruff format .` before committing changes or the CI will fail.** |
| 29 | + |
| 30 | +### Training and Validation |
| 31 | + |
| 32 | +- Test TensorFlow training: `cd examples/water/se_e2_a && dp train input.json --skip-neighbor-stat` -- training proceeds but is slow on CPU |
| 33 | +- Test PyTorch training: `cd examples/water/se_e2_a && dp --pt train input_torch.json --skip-neighbor-stat` -- training proceeds but is slow on CPU |
| 34 | +**Training examples are for validation only; real training takes hours or days. Stop validation runs after 60 seconds (for example with `timeout 60`).** |
| 35 | + |
| 36 | +## Validation Scenarios |
| 37 | + |
| 38 | +**ALWAYS manually validate any new code through at least one complete scenario:** |
| 39 | + |
| 40 | +### Basic Functionality Validation |
| 41 | + |
| 42 | +1. **CLI Interface**: Run `dp --version` and `dp -h` to verify installation |
| 43 | +2. **Python Interface**: Run `python -c "import deepmd; import deepmd.tf; print('Both interfaces work')"` |
| 44 | +3. **Backend Selection**: Test `dp --tf -h`, `dp --pt -h`, `dp --jax -h`, `dp --pd -h` |
| 45 | + |
| 46 | +### Training Workflow Validation |
| 47 | + |
| 48 | +1. **TensorFlow Training**: `cd examples/water/se_e2_a && timeout 60 dp train input.json --skip-neighbor-stat` -- should start training and show decreasing loss |
| 49 | +2. **PyTorch Training**: `cd examples/water/se_e2_a && timeout 60 dp --pt train input_torch.json --skip-neighbor-stat` -- should start training and show decreasing loss |
| 50 | +3. **Verify training output**: Look for "batch X: trn: rmse" messages showing decreasing error values |
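For a training smoke test, hitting the 60-second limit is the expected outcome, not a failure. The sketch below illustrates that idea in Python under stated assumptions: `smoke_run` is a hypothetical helper, and it only classifies how the process ended; it does not inspect the loss values.

```python
import subprocess
import sys


def smoke_run(command, limit_seconds=60, cwd=None):
    """Classify a run as 'still-running', 'exited-ok', or 'failed'."""
    try:
        completed = subprocess.run(command, timeout=limit_seconds, cwd=cwd)
    except subprocess.TimeoutExpired:
        # The process was still working when the limit was reached,
        # which is what a healthy training run looks like.
        return "still-running"
    return "exited-ok" if completed.returncode == 0 else "failed"


if __name__ == "__main__":
    # Stand-in for a training command: sleeps longer than the limit.
    sleeper = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(smoke_run(sleeper, limit_seconds=1))
```

A real check would pass the training command from the steps above with `cwd` set to the example directory, then read the log output separately.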
| 51 | + |
| 52 | +### Test-Based Validation |
| 53 | + |
| 54 | +1. **Core Tests**: `pytest source/tests/tf/test_dp_test.py::TestDPTestEner::test_1frame -v` -- should pass in ~10 seconds |
| 55 | +2. **Multi-backend**: Verify that both the TensorFlow and PyTorch components work |
| 56 | + |
| 57 | +## Common Commands and Timing |
| 58 | + |
| 59 | +### Repository Structure |
| 60 | + |
| 61 | +``` |
| 62 | +ls -la [repo-root] |
| 63 | +.github/ # GitHub workflows and templates |
| 64 | +CONTRIBUTING.md # Contributing guide |
| 65 | +README.md # Project overview |
| 66 | +deepmd/ # Python package source |
| 67 | +doc/ # Documentation |
| 68 | +examples/ # Training examples and configurations |
| 69 | +pyproject.toml # Python build configuration |
| 70 | +source/ # C++ source code and tests |
| 71 | +``` |
| 72 | + |
| 73 | +### Key Directories and Files |
| 74 | + |
| 75 | +- `deepmd/` - Main Python package with backend implementations |
| 76 | +- `source/lib/` - Core C++ library |
| 77 | +- `source/op/` - Backend-specific operators (TF, PyTorch, etc.) |
| 78 | +- `source/api_cc/` - C++ API |
| 79 | +- `source/api_c/` - C API |
| 80 | +- `source/tests/` - Test suite (314 test files) |
| 81 | +- `examples/water/se_e2_a/` - Basic water training example |
| 82 | +- `examples/` - Various model examples for different scenarios |
| 83 | + |
| 84 | +### Common CLI Commands |
| 85 | + |
| 86 | +- `dp --version` - Show version information |
| 87 | +- `dp -h` - Show help and available commands |
| 88 | +- `dp train input.json` - Train a model (TensorFlow backend) |
| 89 | +- `dp --pt train input.json` - Train with PyTorch backend |
| 90 | +- `dp --jax train input.json` - Train with JAX backend |
| 91 | +- `dp --pd train input.json` - Train with Paddle backend |
| 92 | +- `dp test -m model.pb -s system/` - Test a trained model |
| 93 | +- `dp freeze -o model.pb` - Freeze/save a model |
| 94 | + |
| 95 | +### Build Dependencies and Setup |
| 96 | + |
| 97 | +- **Python 3.9+** required |
| 98 | +- **Virtual environment** strongly recommended: `uv venv venv && source venv/bin/activate` |
| 99 | +- **Backend dependencies**: TensorFlow, PyTorch, JAX, or Paddle (install before building) |
| 100 | +- **Build tools**: CMake, C++ compiler, scikit-build-core |
| 101 | +- **C++ build requires**: Both TensorFlow and PyTorch installed, set TENSORFLOW_ROOT and PYTORCH_ROOT environment variables |
| 102 | + |
| 103 | +### Key Configuration Files |
| 104 | + |
| 105 | +- `pyproject.toml` - Python build configuration and dependencies |
| 106 | +- `source/CMakeLists.txt` - C++ build configuration |
| 107 | +- `examples/water/se_e2_a/input.json` - Basic TensorFlow training config |
| 108 | +- `examples/water/se_e2_a/input_torch.json` - Basic PyTorch training config |
| 109 | + |
| 110 | +## Frequent Patterns and Time Expectations |
| 111 | + |
| 112 | +### Installation and Build Times |
| 113 | + |
| 114 | +- **Virtual environment setup**: ~5 seconds |
| 115 | +- **TensorFlow CPU install**: ~8 seconds |
| 116 | +- **PyTorch CPU install**: ~5 seconds |
| 117 | +- **Python package build**: ~67 seconds. **NEVER CANCEL.** |
| 118 | +- **C++ components build**: ~164 seconds. **NEVER CANCEL.** |
| 119 | +- **Full fresh setup**: ~4 minutes total (the steps above sum to about 250 seconds) |
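As a quick check of the full-setup estimate, the individual timings listed above can be summed. This is only arithmetic on the quoted figures; the dictionary keys are informal labels.

```python
# Approximate durations in seconds, copied from the list above.
setup_seconds = {
    "virtual environment": 5,
    "tensorflow-cpu install": 8,
    "torch install": 5,
    "python package build": 67,
    "c++ components build": 164,
}

total = sum(setup_seconds.values())
minutes, seconds = divmod(total, 60)
print(f"{total} s = {minutes} min {seconds} s")
```

The two build steps account for 231 of the 249 seconds, so they dominate a fresh setup.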
| 120 | + |
| 121 | +### Testing Times |
| 122 | + |
| 123 | +- **Single test**: 8-13 seconds |
| 124 | +- **Test file (~5 tests)**: ~15 seconds |
| 125 | +- **Backend-specific test subset**: 15-30 minutes. **Use sparingly.** |
| 126 | +- **Full test suite (314 files)**: 60+ minutes. **Avoid in development - use single tests instead.** |
| 127 | + |
| 128 | +### Linting and Formatting |
| 129 | + |
| 130 | +- **Ruff check**: <1 second |
| 131 | +- **Ruff format**: <1 second |
| 132 | +- **Pre-commit hooks**: May fail because of network issues; run the individual tools (`ruff check .`, `ruff format .`) instead |
| 133 | + |
| 134 | +### Commit Messages and PR Titles |
| 135 | + |
| 136 | +**All commit messages and PR titles must follow the [Conventional Commits specification](https://www.conventionalcommits.org/):** |
| 137 | + |
| 138 | +- **Format**: `type(scope): description` |
| 139 | +- **Common types**: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`, `ci` |
| 140 | +- **Examples**: |
| 141 | + - `feat(core): add new descriptor type` |
| 142 | + - `fix(tf): resolve memory leak in training` |
| 143 | + - `docs: update installation guide` |
| 144 | + - `ci: add workflow for testing` |
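The `type(scope): description` format can be checked mechanically before committing. The sketch below is an illustration, not a tool shipped with the repository: `is_conventional` is a hypothetical helper, and it accepts only the common types listed above, whereas the full specification allows more (for example a `!` marker for breaking changes).

```python
import re

# Types listed in this guide; the full specification permits others.
ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore", "ci")

COMMIT_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"  # commit type
    r"(\((?P<scope>[^()\s]+)\))?"                   # optional (scope)
    r": (?P<description>\S.*)$"                     # colon, space, description
)


def is_conventional(title):
    """Return True if `title` matches `type(scope): description`."""
    return COMMIT_PATTERN.match(title) is not None


if __name__ == "__main__":
    for title in ("feat(core): add new descriptor type", "Update README"):
        print(title, "->", is_conventional(title))
```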
| 145 | + |
| 146 | +### Training and Model Operations |
| 147 | + |
| 148 | +- **Training initialization**: 10-30 seconds |
| 149 | +- **Training per batch**: 0.1-1 second (CPU), much faster on GPU |
| 150 | +- **Model freezing**: 5-15 seconds |
| 151 | +- **Model testing**: 10-30 seconds |
| 152 | + |
| 153 | +## Backend-Specific Notes |
| 154 | + |
| 155 | +### TensorFlow Backend |
| 156 | + |
| 157 | +- **Default backend** when no flag specified |
| 158 | +- **Configuration**: Use `input.json` format |
| 159 | +- **Training**: `dp train input.json` |
| 160 | +- **Requirements**: `tensorflow` or `tensorflow-cpu` package |
| 161 | + |
| 162 | +### PyTorch Backend |
| 163 | + |
| 164 | +- **Activation**: Use `--pt` flag or `export DP_BACKEND=pytorch` |
| 165 | +- **Configuration**: Typically use the `input_torch.json` format |
| 166 | +- **Training**: `dp --pt train input_torch.json` |
| 167 | +- **Requirements**: `torch` package |
| 168 | + |
| 169 | +### JAX Backend |
| 170 | + |
| 171 | +- **Activation**: Use `--jax` flag |
| 172 | +- **Training**: `dp --jax train input.json` |
| 173 | +- **Requirements**: `jax` and related packages |
| 174 | +- **Note**: Experimental backend, may have limitations |
| 175 | + |
| 176 | +### Paddle Backend |
| 177 | + |
| 178 | +- **Activation**: Use `--pd` flag |
| 179 | +- **Training**: `dp --pd train input.json` |
| 180 | +- **Requirements**: `paddlepaddle` package |
| 181 | +- **Note**: Less commonly used |
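The backend flags described above map directly onto the `dp` command line. A minimal sketch of that mapping follows; `train_command` and the backend keys are hypothetical names chosen for illustration, while the flags themselves are the ones listed in this section.

```python
# Backend name -> CLI flag, as described in the notes above.
# TensorFlow is the default backend, so it needs no flag.
BACKEND_FLAGS = {
    "tensorflow": None,
    "pytorch": "--pt",
    "jax": "--jax",
    "paddle": "--pd",
}


def train_command(backend, config):
    """Build the argument list for `dp ... train <config>`."""
    flag = BACKEND_FLAGS[backend]
    command = ["dp"]
    if flag is not None:
        command.append(flag)
    command += ["train", config]
    return command


if __name__ == "__main__":
    print(" ".join(train_command("pytorch", "input_torch.json")))
```

Returning a list instead of a single string lets the result be passed to `subprocess.run` without shell quoting.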
| 182 | + |
| 183 | +## Critical Warnings |
| 184 | + |
| 185 | +- **NEVER CANCEL BUILD OPERATIONS**: Python build takes 67 seconds, C++ build takes 164 seconds |
| 186 | +- **USE SINGLE TESTS FOR VALIDATION**: Run individual tests instead of the full test suite for faster feedback |
| 187 | +- **ALWAYS activate virtual environment**: Build and runtime failures occur without proper environment |
| 188 | +- **ALWAYS install backend dependencies first**: TensorFlow/PyTorch required before building C++ components |
| 189 | +- **ALWAYS run linting before commits**: `ruff check . && ruff format .` or CI will fail |
| 190 | +- **ALWAYS test both Python and C++ components**: Some features require both to be built |
| 191 | +- **ALWAYS follow conventional commit format**: All commit messages and PR titles must follow the Conventional Commits specification (`type(scope): description`) |