Skip to content

Reduce conformer bench preprocessing times via single embed + jitter#181

Open
scal444 wants to merge 2 commits into
NVIDIA-Digital-Bio:mainfrom
scal444:split/embed-perturb
Open

Reduce conformer bench preprocessing times via single embed + jitter#181
scal444 wants to merge 2 commits into
NVIDIA-Digital-Bio:mainfrom
scal444:split/embed-perturb

Conversation

@scal444
Copy link
Copy Markdown
Collaborator

@scal444 scal444 commented May 27, 2026

Instead of running ETKDG on all conformers, embed one conformer, duplicate, jitter coordinates. Improves bottleneck for TFD, conformer RMSD, and ff minimization on the fly generation

Add bench_utils.perturb_conformer and bench_utils.embed_and_jitter as a
shared helper for the three benches that need 'embed once then perturb'
conformer prep:

* tfd_bench, ff_optimize_bench: parallel embed_and_jitter across many mols.
* conformer_rmsd_bench: per-mol EmbedMolecule + serial jitter loop in the
  existing single-mol benchmark structure (no parallel batch helper needed
  here since the outer loop processes one mol at a time).
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 27, 2026

Greptile Summary

This PR replaces per-conformer ETKDG runs with a single-embed-then-jitter strategy across the TFD, conformer RMSD, and FF minimization benchmarks, significantly reducing preprocessing time. The shared embed_and_jitter helper in bench_utils/molprep.py parallelises the embedding step via process_map and then duplicates/perturbs the single base conformer in-process.

  • embed_and_jitter and perturb_conformer are extracted into bench_utils/molprep.py and exported; ff_optimize_bench.py removes its local _embed_conformers in favour of the shared helper.
  • tfd_bench.py switches from a per-molecule serial loop to the batch API, adding --prep-workers for parallelism control; conformer_rmsd_bench.py inlines the same single-embed + jitter pattern directly in its benchmark loop.

Confidence Score: 5/5

The changes are confined to benchmark preprocessing utilities and do not touch library code; the jitter logic is straightforward and the parallelism is well-guarded.

The refactor is clean: seed handling is consistent, the conformer copy-then-perturb order is correct (all copies are taken from the unperturbed base before the base itself is jittered), and no production library paths are affected.

No files require special attention.

Important Files Changed

Filename Overview
benchmarks/bench_utils/molprep.py Adds perturb_conformer (per-atom uniform jitter in place) and embed_and_jitter (single ETKDGv3 embed via process_map, then jitter-duplicate to desired count); logic is correct and seed usage is consistent
benchmarks/ff_optimize_bench.py Removes local _embed_conformers and replaces it with shared embed_and_jitter; the call correctly passes num_workers=args.rdkit_threads and mols already carry explicit Hs from prep_mols, so add_hs=False default is appropriate
benchmarks/tfd_bench.py Replaces per-molecule serial ETKDG with batch embed_and_jitter; add_hs=True and min_atoms=4 are correctly threaded through; seed hardcoded to 42 in prepare_molecules is functionally equivalent to the old 42+i scheme since _embed_one offsets by index internally
benchmarks/conformer_rmsd_bench.py Switches from EmbedMultipleConfs to single-embed + per-conformer jitter, now guarding num_confs < 2 explicitly; minor: embedding still happens before the num_confs < 2 skip check, so a single-conformer run does one unnecessary embed
benchmarks/bench_utils/init.py Exports embed_and_jitter and perturb_conformer from molprep; straightforward addition

Reviews (2): Last reviewed commit: "Update benchmarks/conformer_rmsd_bench.p..." | Re-trigger Greptile

Comment thread benchmarks/bench_utils/molprep.py
Comment thread benchmarks/conformer_rmsd_bench.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@scal444 scal444 requested a review from evasnow1992 May 27, 2026 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant