Skip to content

Commit 52eabc2

Browse files
committed
added method for templating residues from a Molecule object
1 parent bf0a851 commit 52eabc2

10 files changed

Lines changed: 518 additions & 196 deletions

File tree

AGENTS.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -116,6 +116,12 @@ mask = mol.resname == "NAG"
116116
mol.templateResidueFromSmiles(mask, smiles="CC(=O)NC1C(O)C(O)C(CO)OC1O", addHs=True)
117117
```
118118

119+
To template from a reference structure (e.g. an RCSB chemical-component CIF)
120+
instead of a SMILES string, use `templateResidueFromMolecule(mask, refmol,
121+
addHs=True)`. It matches the residue to the reference by atom name (same
122+
heavy-atom names required) and copies the reference's bond orders and formal
123+
charges verbatim, so those must already be correct in `refmol`.
124+
119125
---
120126

121127
### Compute a per-frame projection

doc/source/explanation/molecule-data-model.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@ of length `mol.numAtoms`. The most commonly used fields are:
2626
| `charge` | `float32` | Partial charge (populated by force-field tools) |
2727
| `masses` | `float32` | Atomic mass in Da |
2828
| `record` | `object` | PDB record type: `"ATOM"` or `"HETATM"` |
29-
| `formalcharge` | `int32` | Integer formal charge (populated by {py:func}`~moleculekit.tools.preparation.systemPrepare` and {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromSmiles`) |
29+
| `formalcharge` | `int32` | Integer formal charge (populated by {py:func}`~moleculekit.tools.preparation.systemPrepare`, {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromSmiles`, and {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromMolecule`) |
3030
| `atomtype` | `object` | Force-field atom type (populated downstream) |
3131

3232
Because these arrays are NumPy arrays you can inspect, compare, and manipulate

doc/source/explanation/system-preparation-pipeline.md

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -201,6 +201,19 @@ mol_out, specs = systemPrepare(mol)
201201
{py:func}`~moleculekit.tools.preparation.systemPrepare` calls {py:func}`~moleculekit.tools.nonstandard_residues.detectNonStandardResidues` internally, so it will pick
202202
up the ligand spec automatically.
203203

204+
If you already have the residue as a reference structure with correct
205+
connectivity (e.g. an RCSB chemical-component CIF), use
206+
{py:meth}`~moleculekit.molecule.Molecule.templateResidueFromMolecule` instead of
207+
a SMILES string. It matches the residue to the reference by atom name (the two
208+
must share the same heavy-atom names) and copies the reference's bond orders and
209+
formal charges verbatim, so those must already be correct in the reference:
210+
211+
```python
212+
ref = Molecule("LIG.cif") # correct bonds, bond orders, and formal charges
213+
mol.templateResidueFromMolecule("resname LIG", ref, addHs=True)
214+
mol_out, specs = systemPrepare(mol)
215+
```
216+
204217
### mutateResidue before systemPrepare
205218

206219
When you want to swap canonical residues (e.g. a point mutation), call

doc/source/how-to/convert-to-rdkit-and-openff.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -95,7 +95,7 @@ rdmol = mol.toRDKitMol(sanitize=False) # sanitize=False for non-canonical resi
9595

9696
## Gotchas
9797

98-
- Conversion needs explicit bonds. If `mol.bonds` is empty (plain PDB load), run {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromSmiles` for non-canonical residues so the conversion gets correct bond orders, or pass `guessBonds=True` for a distance-based fallback.
98+
- Conversion needs explicit bonds. If `mol.bonds` is empty (plain PDB load), run {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromSmiles` (or {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromMolecule`, when you have a reference structure such as a CIF with correct bond orders and formal charges) for non-canonical residues so the conversion gets correct bond orders, or pass `guessBonds=True` for a distance-based fallback.
9999
- `sanitize=True` will raise on residues whose bonding or formal charges are inconsistent with chemical rules. For a freshly read PDB this often happens around metal centers and covalent ligands; template those residues first (see [Custom residues from SMILES](../tutorials/system-prep/03-custom-residues-from-smiles.md)).
100100
- Hydrogens must be present for stereochemistry / valence checks. Run {py:func}`~moleculekit.tools.preparation.systemPrepare` (or set explicit Hs via `templateResidueFromSmiles(..., addHs=True)`) before conversion.
101101
- `toOpenFFMolecule` requires `openff-toolkit` and `openff-units` installed.

doc/source/how-to/guess-bonds.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -29,7 +29,7 @@ assert mol.bonds.shape[0] > 0, "No bonds guessed — check coordinates and eleme
2929

3030
## Gotchas
3131

32-
- `mol.guessBonds()` does not infer bond orders; every entry in `mol.bondtype` is `"un"` (unknown). When bond orders matter (e.g. for {py:meth}`~moleculekit.molecule.Molecule.toRDKitMol` conversion or SMILES output), template the relevant residues from SMILES with {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromSmiles` instead.
32+
- `mol.guessBonds()` does not infer bond orders; every entry in `mol.bondtype` is `"un"` (unknown). When bond orders matter (e.g. for {py:meth}`~moleculekit.molecule.Molecule.toRDKitMol` conversion or SMILES output), template the relevant residues from SMILES with {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromSmiles` instead, or from a reference {py:class}`~moleculekit.molecule.Molecule` (e.g. a CIF carrying correct bond orders and formal charges) with {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromMolecule`.
3333
- Very close non-bonded atoms (e.g. crystal contacts, stacking interactions) can be incorrectly identified as bonded.
3434
- **Do not assign `mol.bonds = guess_bonds(mol)` directly.** Always use `mol.guessBonds()`, which updates `mol.bonds` and `mol.bondtype` together. See [The Molecule data model: Bonds](../explanation/molecule-data-model.md#bonds-bonds-and-bondtype) for why the two arrays must stay in lockstep.
3535

doc/source/llms.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
- [Atom selection](tutorials/02-atom-selection.md): VMD syntax and mask-based fast path.
99
- [System prep: basic protonation](tutorials/system-prep/01-basic-protonation.md): `systemPrepare` at chosen pH; the default 2-tuple `(mol_out, detect_specs)` return (3-tuple `(mol_out, detect_specs, details)` with `return_details=True`).
1010
- [System prep: non-standard residues](tutorials/system-prep/02-non-standard-residues.md): `detectNonStandardResidues` + `systemPrepare(detect_specs=...)`.
11-
- [System prep: custom residues from SMILES](tutorials/system-prep/03-custom-residues-from-smiles.md): `templateResidueFromSmiles` with `addHs=True`.
11+
- [System prep: custom residues from SMILES](tutorials/system-prep/03-custom-residues-from-smiles.md): `templateResidueFromSmiles` with `addHs=True`; `templateResidueFromMolecule` to template from a reference Molecule (e.g. a CIF) carrying correct bond orders and formal charges.
1212
- [System prep: mutation, gap closing, segmentation](tutorials/system-prep/04-mutation-gap-closing-segmentation.md): `mutateResidue`, `model_gaps`, `autoSegment`.
1313

1414
## How-to guides

doc/source/tutorials/system-prep/03-custom-residues-from-smiles.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -96,6 +96,17 @@ show3d(mol, representations=[{"sel": "resname HRG ALC OIC NLE '200'", "type": "b
9696

9797
The per-resname atom counts after templating — each NCAA now carries heavy atoms + the hydrogens the SMILES specified.
9898

99+
### Alternative: template from a reference Molecule
100+
101+
If you already have the residue as a small reference structure that carries correct connectivity (for example an RCSB chemical-component CIF for the ligand), you can template from that {py:class}`~moleculekit.molecule.Molecule` instead of writing a SMILES string, using {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromMolecule`:
102+
103+
```python
104+
ref = Molecule("HRG.cif") # reference carrying correct bonds, bond orders, and formal charges
105+
mol.templateResidueFromMolecule("resname HRG", ref, addHs=True)
106+
```
107+
108+
The reference is matched to the residue by **atom name** (not by MCS, as with SMILES), so the two must share the same set of heavy-atom names and the reference's heavy-atom names must be unique. The bond orders and formal charges are copied straight from the reference: they must therefore already be correct in the template Molecule, because `templateResidueFromMolecule` transfers them verbatim and does not re-derive them. Everything else (`addHs`, `guessBonds`, automatic cross-residue bond handling, and per-copy templating) works exactly as in the SMILES variant.
109+
99110
## Step 4 — Run systemPrepare
100111

101112
```{code-cell} python
@@ -116,6 +127,7 @@ The five NCAAs all survive the preparation pipeline with their full heavy-atom t
116127
- It removes any hydrogens already on the matched residue and re-adds them from the SMILES (`addHs=True`), so the hydrogen pattern is deterministic — no manual pre-stripping needed.
117128
- One SMILES per residue type; the templater handles every copy automatically and trims terminal atoms (OXT, terminal NH) for mid-chain residues.
118129
- Cross-residue covalent bonds — peptide bonds, glycosidic bonds, isopeptide bonds — are detected automatically; the boundary atom's H count is corrected so it is not over-protonated.
130+
- {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromMolecule` is the same operation with a reference {py:class}`~moleculekit.molecule.Molecule` (e.g. a CIF) as the template instead of a SMILES string. It matches by atom name rather than MCS and copies the reference's bond orders and formal charges verbatim, so those must already be correct in the reference.
119131

120132
## Next
121133

moleculekit/molecule.py

Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3103,6 +3103,62 @@ def templateResidueFromSmiles(
31033103
_logger=_logger,
31043104
)
31053105

3106+
def templateResidueFromMolecule(
3107+
self,
3108+
sel: "str | np.ndarray",
3109+
refmol: "Molecule",
3110+
addHs: bool = False,
3111+
onlyOnAtoms: "str | np.ndarray | None" = None,
3112+
guessBonds: bool = False,
3113+
_logger=True,
3114+
):
3115+
"""Assign bonds, bond orders, formal charges and (optionally) hydrogens
3116+
to a residue from a reference Molecule template, matched by atom name.
3117+
3118+
Like :meth:`templateResidueFromSmiles`, but the template is a reference
3119+
Molecule (e.g. loaded from a CIF) that already carries bonds, bond
3120+
orders and formal charges. The selected residue's heavy atoms are
3121+
mapped onto the reference by NAME and the reference's bond orders and
3122+
formal charges are transferred verbatim, so they must already be
3123+
correct in the reference (this function does not re-derive them). The
3124+
reference is used only as a template and is never appended. The
3125+
molecule is mutated in place. The residue and reference must share the
3126+
same set of heavy-atom names.
3127+
3128+
Parameters
3129+
----------
3130+
sel : str or np.ndarray
3131+
An atom selection string, a boolean mask, or an integer index array
3132+
of the residue(s) to template. May span multiple copies.
3133+
refmol : Molecule
3134+
The reference template. Its bonds, bond orders and formal charges
3135+
are copied onto the residue verbatim, so they must already be
3136+
correct; heavy-atom names must be unique and must match the
3137+
residue's.
3138+
addHs : bool
3139+
If True, add hydrogens after bond orders are transferred.
3140+
onlyOnAtoms : str or np.ndarray
3141+
Restrict which heavy atoms get hydrogens (only used with addHs).
3142+
guessBonds : bool
3143+
If True, distance-guess the residue's bonds before templating.
3144+
3145+
Examples
3146+
--------
3147+
>>> mol = Molecule("complex.pdb") # doctest: +SKIP
3148+
>>> mol.templateResidueFromMolecule("resname LIG", Molecule("LIG.cif"), addHs=True, guessBonds=True)
3149+
"""
3150+
from moleculekit.rdkittools import template_residue_from_molecule
3151+
3152+
template_residue_from_molecule(
3153+
mol=self,
3154+
sel=sel,
3155+
refmol=refmol,
3156+
addHs=addHs,
3157+
onlyOnAtoms=onlyOnAtoms,
3158+
guessBonds=guessBonds,
3159+
_logger=_logger,
3160+
)
3161+
31063162
def extendResidueFromSmiles(
31073163
self,
31083164
sel: str | np.ndarray,

0 commit comments

Comments
 (0)