Reduce conformer bench preprocessing times via single embed + jitter#181
Reduce conformer bench preprocessing times via single embed + jitter#181scal444 wants to merge 2 commits into
Conversation
Add bench_utils.perturb_conformer and bench_utils.embed_and_jitter as a shared helper for the three benches that need 'embed once then perturb' conformer prep: * tfd_bench, ff_optimize_bench: parallel embed_and_jitter across many mols. * conformer_rmsd_bench: per-mol EmbedMolecule + serial jitter loop in the existing single-mol benchmark structure (no parallel batch helper needed here since the outer loop processes one mol at a time).
|
| Filename | Overview |
|---|---|
| benchmarks/bench_utils/molprep.py | Adds perturb_conformer (per-atom uniform jitter in place) and embed_and_jitter (single ETKDGv3 embed via process_map, then jitter-duplicate to desired count); logic is correct and seed usage is consistent |
| benchmarks/ff_optimize_bench.py | Removes local _embed_conformers and replaces it with shared embed_and_jitter; the call correctly passes num_workers=args.rdkit_threads and mols already carry explicit Hs from prep_mols, so add_hs=False default is appropriate |
| benchmarks/tfd_bench.py | Replaces per-molecule serial ETKDG with batch embed_and_jitter; add_hs=True and min_atoms=4 are correctly threaded through; seed hardcoded to 42 in prepare_molecules is functionally equivalent to the old 42+i scheme since _embed_one offsets by index internally |
| benchmarks/conformer_rmsd_bench.py | Switches from EmbedMultipleConfs to single-embed + per-conformer jitter, now guarding num_confs < 2 explicitly; minor: embedding still happens before the num_confs < 2 skip check, so a single-conformer run does one unnecessary embed |
| benchmarks/bench_utils/init.py | Exports embed_and_jitter and perturb_conformer from molprep; straightforward addition |
Reviews (2): Last reviewed commit: "Update benchmarks/conformer_rmsd_bench.p..." | Re-trigger Greptile
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Instead of running ETKDG on all conformers, embed one conformer, duplicate, jitter coordinates. Improves bottleneck for TFD, conformer RMSD, and ff minimization on the fly generation