Skip to content

fix(structure): reject atom-to-self bonds in BondList and chem_comp_bond writer#884

Open
haoyu-haoyu wants to merge 2 commits intobiotite-dev:mainfrom
haoyu-haoyu:fix/bondlist-no-self-bond
Open

fix(structure): reject atom-to-self bonds in BondList and chem_comp_bond writer#884
haoyu-haoyu wants to merge 2 commits intobiotite-dev:mainfrom
haoyu-haoyu:fix/bondlist-no-self-bond

Conversation

@haoyu-haoyu
Copy link
Copy Markdown

Fixes #812.

A `BondList` should encode chemistry, so a bond whose two atom indices refer to the same atom is meaningless. Today both the constructor and `add_bond()` accept self-bonds silently.

Changes

  • `BondList.add_bond` raises `ValueError` when the resolved positive indices are equal, including the case where one index is given positive and the other negative (e.g. `4, -1` with `atom_count=5` — both resolve to atom `4`).
  • `BondList.init` performs the same check on the bulk-add path after the per-row sort, so building a `BondList` from a numpy array that contains a self-bond row raises instead of silently storing it.
  • The companion concern from Bond can be created of an atom to itself #812 — that `pdbx.set_structure()` could emit a self-bond into `chem_comp_bond` when two distinct atoms in an `AtomArray` share the same `(res_name, atom_name)` annotations — is also addressed: `_set_intra_residue_bonds()` now filters bond rows where the two `atom_name` annotations are equal before deduplication, and returns `None` if no rows remain.

Tests

  • `test_invalid_creation` extended with two self-bond cases (positive index pair, mixed positive/negative).
  • New `test_add_self_bond_rejected` covers `add_bond()` for both index forms and verifies the list is unchanged after each rejected add.

…ond writer

Fixes biotite-dev#812.

A `BondList` should encode chemistry, so a bond whose two atom indices
refer to the same atom is meaningless and should be rejected at the
construction boundary rather than being stored.

- `BondList.add_bond` raises `ValueError` when the resolved positive
  indices are equal, including the case where one index is given
  positive and the other negative (e.g. `4, -1` with
  `atom_count=5` both resolve to atom `4`).
- `BondList.__init__` performs the same check on the bulk-add path
  after the per-row sort, so building a `BondList` from a numpy array
  that contains a self-bond row now raises rather than silently
  storing it.

The companion concern from biotite-dev#812 — that `pdbx.set_structure()` could
emit a self-bond into `chem_comp_bond` when two distinct atoms in an
`AtomArray` share the same `(res_name, atom_name)` annotations — is
also addressed: `_set_intra_residue_bonds` now filters bond rows where
the two `atom_name` annotations are equal before deduplication, and
returns `None` if no rows remain.

Tests cover both `BondList` construction and `add_bond`, including the
mixed positive/negative index case.

Assisted-by: Claude Code
Copy link
Copy Markdown
Member

@padix-key padix-key left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for fixing the bug! Regarding the fix for bonds.pyx, this is already fixed in the BondList rewrite in the rust branch (#871), which I plan to merge for the next Biotite release. If the Rust port is not included in the next release I probably would give #829 priority here, as it was created earlier.

Comment on lines +1182 to +1194
# Two distinct atom indices can share the same (res_name, atom_name)
# annotations — that bond would surface in chem_comp_bond as a
# self-bond on the chemical component, which is meaningless.
not_self_bond = atom_id_1 != atom_id_2
atom_id_1 = atom_id_1[not_self_bond]
atom_id_2 = atom_id_2[not_self_bond]
comp_id = comp_id[not_self_bond]
bond_array = bond_array[not_self_bond]
value_order = value_order[not_self_bond]
aromatic_flag = aromatic_flag[not_self_bond]
any_mask = any_mask[not_self_bond]
if len(bond_array) == 0:
return None
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a self-bond I think it would be better to raise a BadStructureError.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Apr 20, 2026

Merging this PR will not alter performance

✅ 98 untouched benchmarks


Comparing haoyu-haoyu:fix/bondlist-no-self-bond (95eb8e4) with main (2e5ef85)

Open in CodSpeed

@haoyu-haoyu
Copy link
Copy Markdown
Author

Thanks @padix-key — agreed, BadStructureError is the right signal here. The function already raises BadStructureError for the adjacent "empty res_name" / "empty atom_name" preconditions, so raising one more for ambiguous annotations keeps the behavior consistent.

Plan for the follow-up push:

  • Replace the silent not_self_bond filter in _set_intra_residue_bonds with an up-front check that raises BadStructureError("Structure contains bonded atoms sharing the same (res_name, atom_name) annotation, which cannot be written to chem_comp_bond without producing a self-bond").
  • Drop the now-redundant if len(bond_array) == 0: return None guard that existed only to catch the post-filter empty case — the earlier len(bond_array) == 0 check on the unfiltered bond_array still handles the genuine "no intra bonds" path.
  • Add a focused test in tests/structure/io/pdbx/ that builds an AtomArray with two distinct atoms sharing (res_name, atom_name) plus a bond between them, and asserts set_structure() raises BadStructureError.

Will push and ping when ready.

Per review on biotite-dev#884: the silent `not_self_bond` filter in
`_set_intra_residue_bonds()` was losing data when two distinct atom
indices shared the same `(res_name, atom_name)` annotations. That
input is structurally invalid (chem_comp_bond would emit a self-bond
on the chemical component), so raise `BadStructureError` instead,
matching the existing precondition checks in the same function.

Also drop the redundant post-filter empty guard; the earlier
`len(bond_array) == 0` check on the unfiltered array still handles
the genuine "no intra bonds" path.

Test: new `test_set_structure_self_bond_raises` builds a 2-atom
AtomArray with two distinct atoms sharing `(ALA, CA)` plus a bond
between them, and asserts `set_structure()` raises.
@haoyu-haoyu
Copy link
Copy Markdown
Author

Pushed aeea708:

  • _set_intra_residue_bonds() now raises BadStructureError up-front when any bond row has atom_id_1 == atom_id_2 after annotation lookup, instead of filtering the row out. Error message: "Structure contains bonded atoms sharing the same (res_name, atom_name) annotations, which cannot be written to chem_comp_bond without producing a self-bond." — mirrors the style of the existing empty-name preconditions.
  • Dropped the post-filter if len(bond_array) == 0: return None guard (the pre-filter guard at the top of the function still covers the genuine "no intra bonds" path).
  • Added tests/structure/io/test_pdbx.py::test_set_structure_self_bond_raises — 2-atom AtomArray with both atoms annotated (ALA, CA) plus a bond between them, asserting pdbx.set_structure() raises.

Ready for another look @padix-key.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bond can be created of an atom to itself

2 participants