File: doc/source/explanation/molecule-data-model.md
Lines changed: 1 addition & 1 deletion
@@ -133,7 +133,7 @@ Moleculekit uses a single distance unit throughout: **Ångström (Å)**. This ap
 - `mol.coords` — atomic positions.
 - `mol.box` — periodic-box lengths.
 - All readers and writers (regardless of the source format's native units — GROMACS' `.gro` / `.xtc` use nanometres on disk; moleculekit converts to Å on load and converts back on write).
-- All distance parameters in the library (`coldist`, `spatialgap`, `find_clashes` thresholds, `within X of` selections, etc.).
+- All distance parameters in the library (`coldist`, `autoSegment`'s `protein_cutoff`, `find_clashes` thresholds, `within X of` selections, etc.).

 Angles — `mol.boxangles`, dihedrals returned by {py:meth}`~moleculekit.molecule.Molecule.getDihedral`, and rotation angles passed to {py:meth}`~moleculekit.molecule.Molecule.setDihedral` — are in **radians** for the function APIs, except `mol.boxangles` which is in degrees (matching the PDB convention).
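To make the unit convention in this hunk concrete, here is a minimal standalone sketch of the arithmetic involved. It does not call moleculekit; the helper names `nm_to_angstrom` and `angstrom_to_nm` are hypothetical and only illustrate what a reader or writer for a nanometre-based format has to do.

```python
import math

NM_TO_ANGSTROM = 10.0  # 1 nm = 10 Å


def nm_to_angstrom(values):
    """Convert lengths from nanometres (as stored on disk) to Å."""
    return [v * NM_TO_ANGSTROM for v in values]


def angstrom_to_nm(values):
    """Convert lengths from Å back to nanometres for writing."""
    return [v / NM_TO_ANGSTROM for v in values]


# Box lengths as they might appear in a nanometre-based file (made-up values).
box_nm = [5.0, 5.0, 7.5]
box_angstrom = nm_to_angstrom(box_nm)

# Box angles are documented in degrees, while the dihedral APIs use radians.
box_angle_deg = 90.0
box_angle_rad = math.radians(box_angle_deg)
```

Keeping the conversion at the I/O boundary means every distance parameter inside the library can assume Å without checking the source format.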
File: doc/source/howto/assign-segments-and-chains.md
Lines changed: 20 additions & 5 deletions
@@ -2,7 +2,7 @@

 ## Goal

-Derive `segid` and/or `chain` fields for a structure that lacks them, using gap detection to split continuous segments automatically.
+Derive `segid` and/or `chain` fields for a structure that lacks them, splitting the system into segments by following each polymer's physical backbone.

 ## Minimal example

@@ -15,29 +15,44 @@ mol = autoSegment(mol)
 print(set(mol.segid))
 ```

+## How segments are decided
+
+A new segment starts between two consecutive residues when any of these holds: the backbone link distance exceeds the cutoff (protein `C(i)–N(i+1)`, nucleic `O3'(i)–P(i+1)`), the `chain` or `segid` already in the file changes, or the polymer type changes. Water collapses into one segment, ions into another, and the remaining ("other") molecules are split one segment per bonded molecule. Because continuity is read from coordinates, a gap in residue *numbering* with an intact backbone stays one segment, while a real spatial break is split.
+
 ## Parameters that matter

 | Parameter | Type | Default | What it does |
 |---|---|---|---|
 | `mol` | {py:class}`~moleculekit.molecule.Molecule` | required | Input molecule (a copy is returned; original is unchanged) |
-| `sel` | `str` | `"all"` | Restrict gap detection to this atom selection |
+| `sel` | `str` | `"all"` | Restrict segmentation to this atom selection; atoms outside keep their existing `chain`/`segid` |
 | `basename` | `str` | `"P"` | Prefix for generated segment names, e.g. `"P"` → `"P0"`, `"P1"`, … |
-| `spatial` | `bool` | `True` | Treat a residue-numbering gap as a real gap only if Cα distance > `spatialgap` Å |
-| `spatialgap` | `float` | `4.0` | Distance threshold in Å for spatial gap detection |
+| `fields` | `tuple` | `("segid",)` | Which field(s) to write: any combination of `"segid"` and `"chain"` |
+| `protein_cutoff` | `float` | `2.0` | Max `C(i)–N(i+1)` distance (Å) for two protein residues to be continuous |
+| `nucleic_cutoff` | `float` | `2.2` | Max `O3'(i)–P(i+1)` distance (Å) for two nucleic residues to be continuous |
+| `ca_fallback_cutoff` | `float` | `5.0` | Max `CA–CA` distance (Å) used when a protein residue lacks `C`/`N` |
+| `nucleic_fallback_cutoff` | `float` | `3.2` | Max `C3'–P` distance (Å) used when a nucleic residue lacks `O3'` |
+| `single_other_segment` | `bool` | `False` | Put all non-polymer, non-water, non-ion molecules into one segment instead of one per molecule |

 ## Common variations

 ```python
 # Assign segments to protein chains only
 mol = autoSegment(mol, sel="protein")
+
+# Write both chain and segid in one call
+mol = autoSegment(mol, fields=("chain", "segid"))
+
+# Lump every ligand/cofactor into a single "other" segment
+mol = autoSegment(mol, single_other_segment=True)
 ```

 ## Gotchas

 - {py:func}`~moleculekit.tools.autosegment.autoSegment` returns a new {py:class}`~moleculekit.molecule.Molecule`; it does not mutate the input.
+- Only coordinates and atom names are needed — explicit bonds are not required (they are guessed only for the "other" bucket).
 - `segid` can be up to 4 characters (MD force-field convention); `chain` is a single character (PDB convention).
-- Auto-assignment is topology-driven and can fail on structures with non-contiguous or missing residue numbers — inspect the result before use.
 - When writing to PDB, only the `chain` field is stored in the standard CHAIN column; `segid` goes into the SEGID column, which many programs ignore.
+- `autoSegment2` is deprecated and forwards to {py:func}`~moleculekit.tools.autosegment.autoSegment` with a `DeprecationWarning`; use `autoSegment` directly.
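The 4-character `segid` limit noted in the Gotchas can be checked before a file is written. The following is a minimal sketch, assuming segment names are built as `basename` plus a running index, as the parameter table describes. `make_segids` is a hypothetical helper for illustration, not a moleculekit function.

```python
def make_segids(count, basename="P", max_len=4):
    """Build names like P0, P1, ... and reject any longer than max_len."""
    names = [f"{basename}{i}" for i in range(count)]
    too_long = [name for name in names if len(name) > max_len]
    if too_long:
        raise ValueError(
            f"segid longer than {max_len} characters: {too_long[0]}"
        )
    return names


# Three segments with the default prefix fit comfortably in four characters.
segids = make_segids(3)

# A four-letter prefix leaves no room for the index, so this is rejected.
try:
    make_segids(2, basename="PROT")
    overflow_detected = False
except ValueError:
    overflow_detected = True
```

A short `basename` leaves the most room for the index, which matters for systems with many segments.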
-{py:func}`~moleculekit.tools.autosegment.autoSegment` detects that the backbone distance between GLY 140 and MET 154 (the flanking residues of the gap) exceeds the default 4 Å spatial threshold, and so it creates two independent segments: `P0` on chain A (residues 55–140) and `P1` on chain B (residues 154–209). Both the `chain` and `segid` fields are now consistent, which avoids warnings during {py:func}`~moleculekit.tools.preparation.systemPrepare`.
+{py:func}`~moleculekit.tools.autosegment.autoSegment` detects that the backbone is broken between GLY 140 and MET 154 (the flanking residues of the gap) — their `C–N` distance far exceeds the peptide-bond cutoff (`protein_cutoff`, 2 Å by default) — and so it creates two independent segments: `P0` on chain A (residues 55–140) and `P1` on chain B (residues 154–209). Both the `chain` and `segid` fields are now consistent, which avoids warnings during {py:func}`~moleculekit.tools.preparation.systemPrepare`.

 ## Step 2 — Mutate a residue with the "best" rotamer

@@ -113,7 +113,7 @@ The full pipeline — segment, mutate, prepare — is now complete.

 ## Recap

-- {py:func}`~moleculekit.tools.autosegment.autoSegment` detects backbone discontinuities and assigns a unique segid (and optionally chain letter) per topologically connected fragment; use `fields=("chain", "segid")` to keep both fields consistent.
+- {py:func}`~moleculekit.tools.autosegment.autoSegment` detects backbone discontinuities from atomic coordinates and assigns a unique segid (and optionally chain letter) per backbone-continuous segment; use `fields=("chain", "segid")` to keep both fields consistent.
 - {py:meth}`~moleculekit.molecule.Molecule.mutateResidue` with `sel` and `newres` swaps a residue's sidechain using Dunbrack rotamer selection: `rotamer_mode="best"` minimises VdW clashes against neighbours, `rotamer_mode="random"` samples by probability for speed. Add `minimize=True` to relax residual strain with OpenMM.
 - {py:func}`~moleculekit.tools.modelling.model_gaps` fills missing residues by sequence using the ProMod3 loop-modelling engine — but it requires the ProMod3 Singularity image; there is no fallback.
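The coordinate-based continuity rule in the first recap bullet can be sketched without moleculekit. This is an illustrative reimplementation under simplifying assumptions (protein residues only, `C` and `N` always present, no `chain`, `segid` or polymer-type checks), not the library's code. `segment_ids` and the toy coordinates are hypothetical.

```python
import math


def segment_ids(residues, protein_cutoff=2.0):
    """Return one segment index per residue.

    A new segment starts when the distance (in Å) between C of the
    previous residue and N of the current residue exceeds protein_cutoff.
    """
    ids = []
    current = 0
    for i, res in enumerate(residues):
        if i > 0:
            link = math.dist(residues[i - 1]["C"], res["N"])
            if link > protein_cutoff:
                current += 1  # backbone is broken here
        ids.append(current)
    return ids


# Toy backbone laid out along the x axis; coordinates are made up.
residues = [
    {"N": (0.0, 0.0, 0.0), "C": (2.4, 0.0, 0.0)},
    {"N": (3.73, 0.0, 0.0), "C": (6.1, 0.0, 0.0)},    # about 1.33 Å from previous C
    {"N": (20.0, 0.0, 0.0), "C": (22.4, 0.0, 0.0)},   # about 13.9 Å from previous C
]

names = [f"P{i}" for i in segment_ids(residues)]
```

Because the rule looks only at distances, the residue numbers never enter the decision, which is why a numbering gap with an intact backbone would stay in one segment.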