Reduce conformer bench preprocessing times via single embed + jitter by scal444 · Pull Request #181 · NVIDIA-Digital-Bio/nvMolKit

scal444 · 2026-05-27T12:34:42Z

Instead of running ETKDG for every conformer, embed one conformer, duplicate it, and jitter the coordinates of each copy. This relieves the preprocessing bottleneck for on-the-fly conformer generation in the TFD, conformer RMSD, and FF minimization benchmarks.

Add bench_utils.perturb_conformer and bench_utils.embed_and_jitter as shared helpers for the three benches that need 'embed once, then perturb' conformer prep:

* tfd_bench, ff_optimize_bench: run embed_and_jitter in parallel across many mols.
* conformer_rmsd_bench: per-mol EmbedMolecule followed by a serial jitter loop, keeping the existing single-mol benchmark structure (no parallel batch helper is needed here because the outer loop processes one mol at a time).
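The core idea (one embed, many cheap perturbed copies) can be sketched without RDKit. This is a minimal illustration, not the PR's `perturb_conformer`/`embed_and_jitter`: it assumes coordinates are plain `(x, y, z)` tuples rather than RDKit conformers, and the names `perturb`, `expand_conformers`, and the 0.1 default amplitude are hypothetical.

```python
import random
from typing import List, Tuple

Coords = List[Tuple[float, float, float]]


def perturb(coords: Coords, amplitude: float, rng: random.Random) -> Coords:
    """Shift every atom by an independent uniform offset in [-amplitude, amplitude] per axis."""
    return [
        (x + rng.uniform(-amplitude, amplitude),
         y + rng.uniform(-amplitude, amplitude),
         z + rng.uniform(-amplitude, amplitude))
        for x, y, z in coords
    ]


def expand_conformers(base: Coords, num_confs: int, amplitude: float = 0.1,
                      seed: int = 42) -> List[Coords]:
    """Turn one embedded conformer into num_confs jittered variants (no extra embeds).

    The amplitude default is a hypothetical value chosen for illustration.
    """
    rng = random.Random(seed)  # seeded so repeated bench runs get identical inputs
    return [perturb(base, amplitude, rng) for _ in range(num_confs)]
```

The cost is one expensive embed plus `num_confs` linear-time coordinate passes, instead of `num_confs` ETKDG runs.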

greptile-apps · 2026-05-27T12:39:40Z

Greptile Summary

This PR replaces per-conformer ETKDG runs with a single-embed-then-jitter strategy across the TFD, conformer RMSD, and FF minimization benchmarks, significantly reducing preprocessing time. The shared embed_and_jitter helper in bench_utils/molprep.py parallelises the embedding step via process_map and then duplicates/perturbs the single base conformer in-process.

embed_and_jitter and perturb_conformer are extracted into bench_utils/molprep.py and exported; ff_optimize_bench.py removes its local _embed_conformers in favour of the shared helper.
tfd_bench.py switches from a per-molecule serial loop to the batch API, adding --prep-workers for parallelism control; conformer_rmsd_bench.py inlines the same single-embed + jitter pattern directly in its benchmark loop.
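The parallel embed step stays reproducible as long as each molecule's seed is derived from its index rather than from the worker that happens to process it, which is also why the review below treats a fixed base seed of 42 as equivalent to the old `42+i` scheme. A minimal sketch, assuming a hypothetical `placeholder_embed` in place of a real ETKDGv3 call and a `map_fn` hook standing in for `process_map`:

```python
import random
from typing import Callable, List, Optional, Tuple

Coords = List[Tuple[float, float, float]]


def placeholder_embed(task: Tuple[int, int]) -> Optional[Coords]:
    """Hypothetical stand-in for one ETKDGv3 embed; task is (num_atoms, seed)."""
    num_atoms, seed = task
    if num_atoms == 0:
        return None  # mimic an embedding failure
    rng = random.Random(seed)
    return [(rng.random(), rng.random(), rng.random()) for _ in range(num_atoms)]


def embed_all(atom_counts: List[int], base_seed: int = 42,
              map_fn: Callable = map) -> List[Optional[Coords]]:
    """Embed every molecule once; jittering would follow in the parent process."""
    # The seed depends only on the molecule's index, never on which worker runs
    # it, so any order-preserving map (serial or a process pool) gives identical output.
    tasks = [(n, base_seed + i) for i, n in enumerate(atom_counts)]
    return list(map_fn(placeholder_embed, tasks))
```

With tqdm installed, `functools.partial(process_map, max_workers=n)` (from `tqdm.contrib.concurrent`) could be passed as `map_fn`; wiring `n` to `--prep-workers` is an assumption about how the flag is used. The embed function must be defined at module level so it can be pickled to worker processes.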

Confidence Score: 5/5

The changes are confined to benchmark preprocessing utilities and do not touch library code; the jitter logic is straightforward and the parallelism is well-guarded.

The refactor is clean: seed handling is consistent, the conformer copy-then-perturb order is correct (all copies are taken from the unperturbed base before the base itself is jittered), and no production library paths are affected.
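The copy-then-perturb order matters when the jitter is applied in place: if the base were jittered first, every copy would inherit that same offset, so the copies would be correlated and, once jittered themselves, could drift up to twice the intended amplitude. A sketch of the safe order, assuming mutable `[x, y, z]` rows as a stand-in for an RDKit conformer and assuming the base counts as one of the `num_confs` outputs (both hypothetical):

```python
import copy
import random
from typing import List

Coords = List[List[float]]  # mutable [x, y, z] rows, mimicking an in-place conformer


def perturb_in_place(coords: Coords, amplitude: float, rng: random.Random) -> None:
    """Add an independent uniform offset in [-amplitude, amplitude] to every coordinate."""
    for row in coords:
        for k in range(3):
            row[k] += rng.uniform(-amplitude, amplitude)


def copy_then_perturb(base: Coords, num_confs: int, amplitude: float,
                      seed: int) -> List[Coords]:
    if num_confs < 1:
        raise ValueError("num_confs must be at least 1")
    rng = random.Random(seed)
    # 1) Snapshot every copy while the base is still unperturbed.
    copies = [copy.deepcopy(base) for _ in range(num_confs - 1)]
    # 2) Only now jitter the base and each copy independently.
    confs = [base] + copies
    for conf in confs:
        perturb_in_place(conf, amplitude, rng)
    return confs
```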

No files require special attention.

Important Files Changed

Filename	Overview
benchmarks/bench_utils/molprep.py	Adds `perturb_conformer` (per-atom uniform jitter in place) and `embed_and_jitter` (single ETKDGv3 embed via process_map, then jitter-duplicate to desired count); logic is correct and seed usage is consistent
benchmarks/ff_optimize_bench.py	Removes local `_embed_conformers` and replaces it with shared `embed_and_jitter`; the call correctly passes `num_workers=args.rdkit_threads` and mols already carry explicit Hs from `prep_mols`, so `add_hs=False` default is appropriate
benchmarks/tfd_bench.py	Replaces per-molecule serial ETKDG with batch `embed_and_jitter`; `add_hs=True` and `min_atoms=4` are correctly threaded through; seed hardcoded to 42 in `prepare_molecules` is functionally equivalent to the old `42+i` scheme since `_embed_one` offsets by index internally
benchmarks/conformer_rmsd_bench.py	Switches from `EmbedMultipleConfs` to single-embed + per-conformer jitter, now guarding `num_confs < 2` explicitly; minor: embedding still happens before the `num_confs < 2` skip check, so a single-conformer run does one unnecessary embed
benchmarks/bench_utils/__init__.py	Exports `embed_and_jitter` and `perturb_conformer` from `molprep`; straightforward addition
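The minor point on `conformer_rmsd_bench.py` (embedding before the `num_confs < 2` skip) amounts to moving the guard ahead of the embed. A sketch with a hypothetical `prep_for_rmsd` that takes the embed and expand steps as callables, so it stays independent of RDKit:

```python
from typing import Callable, List, Optional, Tuple

Coords = List[Tuple[float, float, float]]


def prep_for_rmsd(num_confs: int,
                  embed: Callable[[], Optional[Coords]],
                  expand: Callable[[Coords, int], List[Coords]]) -> Optional[List[Coords]]:
    """Return num_confs conformers, or None if the benchmark should skip this mol."""
    # Pairwise RMSD needs at least two conformers; bail out before paying for the embed.
    if num_confs < 2:
        return None
    base = embed()
    if base is None:  # embedding failed, nothing to jitter
        return None
    return expand(base, num_confs)
```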

Reviews (2): Last reviewed commit: "Update benchmarks/conformer_rmsd_bench.p..."


greptile-apps Bot reviewed May 27, 2026


Comment thread benchmarks/bench_utils/molprep.py

Comment thread benchmarks/conformer_rmsd_bench.py

Update benchmarks/conformer_rmsd_bench.py

6919aef

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

scal444 requested a review from evasnow1992 May 27, 2026 12:59

scal444 wants to merge 2 commits into NVIDIA-Digital-Bio:main from scal444:split/embed-perturb
