fix(structure): reject atom-to-self bonds in BondList and chem_comp_bond writer#884
fix(structure): reject atom-to-self bonds in BondList and chem_comp_bond writer#884haoyu-haoyu wants to merge 2 commits intobiotite-dev:mainfrom
Conversation
…ond writer Fixes biotite-dev#812. A `BondList` should encode chemistry, so a bond whose two atom indices refer to the same atom is meaningless and should be rejected at the construction boundary rather than being stored. - `BondList.add_bond` raises `ValueError` when the resolved positive indices are equal, including the case where one index is given positive and the other negative (e.g. `4, -1` with `atom_count=5` both resolve to atom `4`). - `BondList.__init__` performs the same check on the bulk-add path after the per-row sort, so building a `BondList` from a numpy array that contains a self-bond row now raises rather than silently storing it. The companion concern from biotite-dev#812 — that `pdbx.set_structure()` could emit a self-bond into `chem_comp_bond` when two distinct atoms in an `AtomArray` share the same `(res_name, atom_name)` annotations — is also addressed: `_set_intra_residue_bonds` now filters bond rows where the two `atom_name` annotations are equal before deduplication, and returns `None` if no rows remain. Tests cover both `BondList` construction and `add_bond`, including the mixed positive/negative index case. Assisted-by: Claude Code
padix-key
left a comment
There was a problem hiding this comment.
Hi, thanks for fixing the bug! Regarding the fix for bonds.pyx, this is already fixed in the BondList rewrite in the rust branch (#871), which I plan to merge for the next Biotite release. If the Rust port is not included in the next release I probably would give #829 priority here, as it was created earlier.
| # Two distinct atom indices can share the same (res_name, atom_name) | ||
| # annotations — that bond would surface in chem_comp_bond as a | ||
| # self-bond on the chemical component, which is meaningless. | ||
| not_self_bond = atom_id_1 != atom_id_2 | ||
| atom_id_1 = atom_id_1[not_self_bond] | ||
| atom_id_2 = atom_id_2[not_self_bond] | ||
| comp_id = comp_id[not_self_bond] | ||
| bond_array = bond_array[not_self_bond] | ||
| value_order = value_order[not_self_bond] | ||
| aromatic_flag = aromatic_flag[not_self_bond] | ||
| any_mask = any_mask[not_self_bond] | ||
| if len(bond_array) == 0: | ||
| return None |
There was a problem hiding this comment.
If we have a self-bond I think it would be better to raise a BadStructureError.
|
Thanks @padix-key — agreed, Plan for the follow-up push:
Will push and ping when ready. |
Per review on biotite-dev#884: the silent `not_self_bond` filter in `_set_intra_residue_bonds()` was losing data when two distinct atom indices shared the same `(res_name, atom_name)` annotations. That input is structurally invalid (chem_comp_bond would emit a self-bond on the chemical component), so raise `BadStructureError` instead, matching the existing precondition checks in the same function. Also drop the redundant post-filter empty guard; the earlier `len(bond_array) == 0` check on the unfiltered array still handles the genuine "no intra bonds" path. Test: new `test_set_structure_self_bond_raises` builds a 2-atom AtomArray with two distinct atoms sharing `(ALA, CA)` plus a bond between them, and asserts `set_structure()` raises.
|
Pushed
Ready for another look @padix-key. |
Fixes #812.
A `BondList` should encode chemistry, so a bond whose two atom indices refer to the same atom is meaningless. Today both the constructor and `add_bond()` accept self-bonds silently.
Changes
Tests