4 changes: 4 additions & 0 deletions .gitignore
@@ -34,3 +34,7 @@ tests/data_*.h5
tests/data_*/
tests/tmp.*
tests/.coverage

# local dev artifact
uv.lock
.venv/
134 changes: 134 additions & 0 deletions skills/dpdata-driver/SKILL.md
@@ -0,0 +1,134 @@
---
name: dpdata-driver
description: Use dpdata Python Driver plugins to label systems (energies/forces/virials) via System.predict(), list available drivers, and build Driver objects (ase/deepmd/gaussian/sqm/hybrid). Use when working with dpdata Python API (not CLI) and you need driver-based energy/force prediction, plugin registration keys, or examples of using dpdata with ASE calculators or DeePMD models.
---

# dpdata-driver

Use dpdata “driver plugins” to **label** a `dpdata.System` (predict energies/forces/virials) and obtain a `dpdata.LabeledSystem`.

## Key idea

- A **Driver** converts an unlabeled `System` into a `LabeledSystem` by computing:
  - `energies` (required)
  - `forces` (optional but common)
  - `virials` (optional)

In dpdata, this is exposed as:

- `System.predict(*args, driver="dp", **kwargs) -> LabeledSystem`

`driver` can be:

- a **string key** (plugin name), e.g. `"ase"`, `"dp"`, `"gaussian"`
- a **Driver object**, e.g. `Driver.get_driver("ase")(...)`

## List supported driver keys (runtime)

When unsure what drivers exist in *this* dpdata version/env, query them at runtime:

```python
from dpdata.driver import Driver

print(sorted(Driver.get_drivers().keys()))
```

In the current repo state, keys include:

- `ase`
- `dp` / `deepmd` / `deepmd-kit`
- `gaussian`
- `sqm`
- `hybrid`

(Exact set depends on dpdata version and installed extras.)
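
The `dp` / `deepmd` / `deepmd-kit` entry suggests that several keys can point at the same driver class. A minimal sketch of grouping such aliases, assuming only that `Driver.get_drivers()` returns a mapping from key to class; `group_aliases` and the stand-in classes are hypothetical and not part of dpdata:

```python
from collections import defaultdict


def group_aliases(plugins):
    """Group registry keys that map to the same class.

    `plugins` is any mapping of key -> class, e.g. the result of
    Driver.get_drivers() (assumed here to be a dict-like mapping).
    """
    by_class = defaultdict(list)
    for key, cls in plugins.items():
        by_class[cls].append(key)
    # Sort keys within each group, then sort the groups for stable output.
    return sorted(sorted(keys) for keys in by_class.values())


# Stand-in classes so the sketch runs without dpdata installed.
class FakeDP: ...
class FakeASE: ...


demo = {"dp": FakeDP, "deepmd": FakeDP, "deepmd-kit": FakeDP, "ase": FakeASE}
alias_groups = group_aliases(demo)
print(alias_groups)
```

With dpdata installed, `group_aliases(Driver.get_drivers())` should give the same kind of grouping for the real registry.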

## Minimal workflow

```python
from dpdata.system import System

sys = System("input.xyz", fmt="xyz")
# `...` is a placeholder: pass a real ASE calculator instance here
ls = sys.predict(driver="ase", calculator=...)  # returns dpdata.LabeledSystem
```

### Verify you got a labeled system

```python
assert "energies" in ls.data
# optional:
# assert "forces" in ls.data
# assert "virials" in ls.data
```
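
The same check can be wrapped in a small helper that reports which labels are present. This is a sketch that only assumes `ls.data` behaves like a dict; `present_labels` is a hypothetical name, not a dpdata function:

```python
def present_labels(data):
    """Return the label keys that are present; raise if energies are missing."""
    if "energies" not in data:
        raise KeyError("no 'energies' key: the system is not labeled")
    optional = ("forces", "virials")
    return ["energies"] + [key for key in optional if key in data]


# Plain dict standing in for ls.data so the sketch runs without dpdata.
fake_data = {
    "energies": [-1.0],
    "forces": [[[0.0, 0.0, 0.1], [0.0, 0.0, -0.1]]],
}
found_labels = present_labels(fake_data)
print(found_labels)
```

In real use, call `present_labels(ls.data)` right after `predict()` and branch on the result before touching `forces` or `virials`.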

## Example: use the ASE driver with an ASE calculator (runnable)

This is the easiest *fully runnable* example because it doesn’t require external QM software.

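Dependencies (recommended): use `uv`. The command below adds only `numpy` and `ase`, so run it where `dpdata` is already available (for example from a dpdata checkout):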

```bash
uv run --with numpy --with ase python3 your_script.py
```

Script:

```python
import numpy as np
from ase.calculators.emt import EMT
from dpdata.system import System

# write a tiny H2 molecule (xyz format: atom count, comment line, coordinates)
with open("tmp.xyz", "w") as f:
    f.write("2\n\nH 0 0 0\nH 0 0 0.74\n")

sys = System("tmp.xyz", fmt="xyz")
ls = sys.predict(driver="ase", calculator=EMT())

print("energies", np.array(ls.data["energies"]))
print("forces shape", np.array(ls.data["forces"]).shape)
# virials are optional (see "Key idea"), so guard the lookup
if "virials" in ls.data:
    print("virials shape", np.array(ls.data["virials"]).shape)
```

## Example: pass a Driver object instead of a string

```python
from ase.calculators.emt import EMT
from dpdata.driver import Driver
from dpdata.system import System

sys = System("tmp.xyz", fmt="xyz")
ase_driver = Driver.get_driver("ase")(calculator=EMT())
ls = sys.predict(driver=ase_driver)
```

## Hybrid driver

Use `driver="hybrid"` to sum energies/forces/virials from multiple drivers.

The `HybridDriver` accepts `drivers=[ ... ]` where each item is either:

- a `Driver` instance
- a dict like `{"type": "sqm", ...}` (type is the driver key)

Example (structure only; may require external executables):

```python
from dpdata.driver import Driver

hyb = Driver.get_driver("hybrid")(
    drivers=[
        {"type": "sqm", "qm_theory": "DFTB3"},
        {"type": "dp", "dp": "frozen_model.pb"},
    ]
)
# ls = sys.predict(driver=hyb)
```
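
To make the summing concrete, here is a toy sketch in plain Python. It is not dpdata's `HybridDriver` implementation; `add_nested`, `sum_labels`, and the sample values are hypothetical, and it assumes each driver returns per-frame `energies` plus, optionally, `forces` and `virials` as nested lists:

```python
def add_nested(a, b):
    """Element-wise sum of two equally shaped nested lists of numbers."""
    if isinstance(a, list):
        return [add_nested(x, y) for x, y in zip(a, b)]
    return a + b


def sum_labels(results, keys=("energies", "forces", "virials")):
    """Sum each label that every result provides; skip labels missing anywhere."""
    total = {}
    for key in keys:
        if all(key in res for res in results):
            acc = results[0][key]
            for res in results[1:]:
                acc = add_nested(acc, res[key])
            total[key] = acc
    return total


# One frame, two atoms; the numbers are made up for illustration.
qm = {"energies": [-1.5], "forces": [[[0.0, 0.0, 0.25], [0.0, 0.0, -0.25]]]}
ml = {"energies": [0.5], "forces": [[[0.0, 0.0, 0.5], [0.0, 0.0, -0.5]]]}
combined = sum_labels([qm, ml])
print(combined)
```

Dropping a label that any contributor lacks is a choice made in this sketch; check the dpdata source for how the real driver treats missing forces or virials.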

## Notes / gotchas

- Many drivers require extra dependencies or external programs:
  - `dp` requires `deepmd-kit` + a model file
  - `gaussian` requires Gaussian and a valid executable (default `g16`)
  - `sqm` requires AmberTools `sqm`
- If you just need file format conversion, use the existing **dpdata CLI** skill instead.
113 changes: 113 additions & 0 deletions skills/dpdata-plugin/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
---
name: dpdata-plugin
description: Create and install dpdata plugins (especially custom Format readers/writers) using Format.register(...) and pyproject.toml entry_points under 'dpdata.plugins'. Use when extending dpdata with new formats or distributing plugins as separate Python packages.
---

# dpdata-plugin

dpdata loads plugins in two ways:

1. **Built-in plugins** in `dpdata.plugins.*` (imported automatically)
1. **External plugins** exposed via Python package entry points: `dpdata.plugins`

This skill focuses on **external plugin packages**, the recommended way to add new formats without modifying dpdata itself.

## What can be extended?

Most commonly: add a new **Format** (file reader/writer) via:

```python
from dpdata.format import Format


@Format.register("myfmt")
class MyFormat(Format): ...
```

## How dpdata discovers plugins

dpdata imports `dpdata.plugins` during normal use (e.g. `dpdata.system` imports it). That module:

- imports every built-in module in `dpdata/plugins/*.py`
- then loads all **entry points** in group `dpdata.plugins`

So an external plugin package only needs to ensure that importing the entry-point target triggers the `@Format.register(...)` side effects.
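
Importing is enough because a registering decorator runs when the `class` statement executes, which happens at import time. A toy sketch of that pattern; the `Registry` class here is hypothetical and much simpler than dpdata's plugin machinery:

```python
class Registry:
    """Toy key -> class registry that is filled in by a decorator."""

    plugins = {}

    @classmethod
    def register(cls, key):
        def decorator(target):
            # The side effect happens as soon as the decorated class is defined.
            cls.plugins[key] = target
            return target

        return decorator


# Defining the class is enough; nothing has to instantiate or call it.
@Registry.register("myfmt")
class MyFormat:
    pass


print("myfmt" in Registry.plugins)
```

The same reasoning explains why an entry point that targets a module or a class both work: loading either one executes the module body that contains the decorated class.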

## Minimal external plugin package (based on plugin_example/)

### 1) Create a new Python package

Example layout:

```
dpdata_random/
  pyproject.toml
  dpdata_random/
    __init__.py
```

### 2) Implement and register your Format

In `dpdata_random/__init__.py` (shortened example):

```python
from __future__ import annotations

import numpy as np
from dpdata.format import Format


@Format.register("random")
class RandomFormat(Format):
    def from_system(self, N, **kwargs):
        return {
            "atom_numbs": [20],
            "atom_names": ["X"],
            "atom_types": [0] * 20,
            "cells": np.repeat(np.eye(3)[None, ...], N, axis=0) * 100.0,
            "coords": np.random.rand(N, 20, 3) * 100.0,
            "orig": np.zeros(3),
            "nopbc": False,
        }
```

The returned dict must match dpdata’s expected schema (`cells`, `coords`, `atom_names`, `atom_types`, ...).
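
A quick consistency check can catch schema mistakes early. The sketch below only looks at `atom_numbs`, `atom_names`, and `atom_types` from the example above. `check_atoms` is a hypothetical helper, and the invariants it checks (one count per name, types as indices into `atom_names`) are an assumption about the schema, not an official dpdata validator:

```python
from collections import Counter


def check_atoms(data):
    """Return a list of problems found in the atom bookkeeping keys."""
    problems = [
        f"missing key: {key}"
        for key in ("atom_numbs", "atom_names", "atom_types")
        if key not in data
    ]
    if problems:
        return problems
    numbs, names = data["atom_numbs"], data["atom_names"]
    types = list(data["atom_types"])
    if len(numbs) != len(names):
        problems.append("atom_numbs and atom_names differ in length")
    counts = Counter(types)
    for idx, expected in enumerate(numbs):
        actual = counts.get(idx, 0)
        if actual != expected:
            problems.append(f"type {idx}: expected {expected} atoms, found {actual}")
    if any(t < 0 or t >= len(names) for t in types):
        problems.append("atom_types contains an index outside atom_names")
    return problems


good = {"atom_numbs": [20], "atom_names": ["X"], "atom_types": [0] * 20}
bad = {"atom_numbs": [20], "atom_names": ["X"], "atom_types": [0] * 19}
print(check_atoms(good))
print(check_atoms(bad))
```

Calling such a helper on the dict returned by `from_system` during development gives a readable message instead of a failure deeper inside dpdata.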

### 3) Expose an entry point

In `pyproject.toml`:

```toml
[project]
name = "dpdata_random"
version = "0.0.0"
dependencies = ["numpy", "dpdata"]

[project.entry-points.'dpdata.plugins']
random = "dpdata_random:RandomFormat"
```

Any importable target works; this pattern points directly at the class.

### 4) Install and test

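In a clean env (recommended via `uv`), make sure the plugin package itself is also installed; the command below only adds `dpdata` and `numpy`: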

```bash
uv run --with dpdata --with numpy python3 - <<'PY'
import dpdata
from dpdata.format import Format

# importing dpdata will load entry points (dpdata.plugins)
print('random' in Format.get_formats())
PY
```

If it prints `True`, your plugin was discovered.

## Debug checklist

- Did you install the plugin package into the same environment where you run dpdata?
- Does `pyproject.toml` contain `[project.entry-points.'dpdata.plugins']`?
- Does importing the entry point module/class execute the `@Format.register(...)` decorator?
- If using `uv run`, remember each command runs in its own environment unless you’re in a `uv` project (or you rely on `uv run --with ...`).