Skip to content

Add early-exit option for RDKit benchmarks that operates on a specific time budget#180

Open
scal444 wants to merge 4 commits into
NVIDIA-Digital-Bio:mainfrom
scal444:split/throughput
Open

Add early-exit option for RDKit benchmarks that operates on a specific time budget#180
scal444 wants to merge 4 commits into
NVIDIA-Digital-Bio:mainfrom
scal444:split/throughput

Conversation

@scal444
Copy link
Copy Markdown
Collaborator

@scal444 scal444 commented May 26, 2026

Calculates throughput after running for up to X amount of time.

Refactor of substruct rdkit code to reduce churn, but it's still unfortunately verbose due to indentation issues.

scal444 added 3 commits May 26, 2026 15:20
- Add --rdkit_max_seconds to etkdg/ff_optimize/substruct benches. When set,
  the RDKit comparison stops once wall-clock elapsed exceeds the cap; the
  reported timing is over molecules/pairs actually processed.
- Replace 'speedup = baseline_ms / avg_ms' with throughput-based ratios
  (mols * confs / elapsed) so methods that processed different amounts of
  work are comparable.
- CSV gains mols_processed/pairs_processed, rdkit_max_seconds,
  confs_per_second/pairs_per_second, vs_rdkit_throughput_ratio columns.
- substruct: --rdkit_match_mode and --rdkit_threads become nargs='+'; main
  runs the cartesian product and picks the best-throughput variant as the
  validation/throughput baseline.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 26, 2026

Greptile Summary

This PR adds a --rdkit_max_seconds early-exit option to all three RDKit benchmarks (ETKDG, FF-optimize, substruct), backed by a new Deadline class and time_it_bounded helper in bench_utils/timing.py. It also refactors substruct_bench to support sweeping multiple --rdkit_match_mode and --rdkit_threads values in a single run, and adds unit tests for the new timing primitives.

  • time_it_bounded drives early exit in substruct_bench; etkdg_bench and ff_optimize_bench instead embed a per-run Deadline inside their run() closure and still call the existing time_it, so total benchmark time with max_seconds set is runs × max_seconds rather than approximately max_seconds.
  • CSV output now includes mols_processed, rdkit_max_seconds, throughput columns, and a vs_rdkit_throughput_ratio; rdkit_max_seconds is correctly gated to rdkit-only rows in all three scripts.

Confidence Score: 4/5

Safe to merge with one bug fix: time_it_bounded can return an inconsistent (avg_ms, last_progress) pair when complete runs are followed by a partial run.

time_it_bounded returns last_progress from a partial run paired with avg_ms computed from the prior complete runs. Any throughput calculation that divides last_progress by avg_ms will underestimate actual throughput, and the RDKit hit max_seconds budget diagnostic will fire spuriously even though complete runs occurred. The fix is a one-line change on the return statement.

benchmarks/bench_utils/timing.py — the time_it_bounded mixed complete+partial return path

Important Files Changed

Filename Overview
benchmarks/bench_utils/timing.py Adds Deadline, throughput_per_s, time_it_bounded, and add_rdkit_max_seconds_arg. time_it_bounded has a bug in the mixed complete+partial run path: it returns last_progress from the partial run paired with avg_ms from complete runs, causing underestimated throughput at the call site.
benchmarks/etkdg_bench.py Adds per-run Deadline inside bench_rdkit and exposes --rdkit_max_seconds; uses time_it (not time_it_bounded), so all runs execute even when max_seconds is set, making total time runs×max_seconds rather than ~max_seconds.
benchmarks/ff_optimize_bench.py Mirrors etkdg_bench changes: adds per-run Deadline, rdkit_max_seconds gating, and throughput columns. Same time_it-vs-time_it_bounded inconsistency as etkdg_bench.
benchmarks/substruct_bench.py Significant refactor: supports multi-value --rdkit_match_mode/--rdkit_threads sweeps, extracts _validate_matches, uses time_it_bounded for early exit, and correctly gates rdkit_max_seconds to rdkit-only CSV rows.
benchmarks/tests/test_timing.py New unit tests for Deadline, throughput_per_s, and time_it_bounded. Test comment for the budget-exhaustion test now correctly describes the run-2-completes / deadline-before-run-3 behaviour.
benchmarks/tests/conftest.py New conftest that inserts the benchmarks/ directory into sys.path so bench_utils is importable from the tests/ subdirectory.
benchmarks/bench_utils/init.py Re-exports the five new timing symbols (Deadline, add_rdkit_max_seconds_arg, throughput_per_s, time_it_bounded) and adds them to all.

Reviews (2): Last reviewed commit: "Address greptile comments" | Re-trigger Greptile

Comment on lines +958 to +960
pairs_done,
args.rdkit_max_seconds,
throughput,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 rdkit_max_seconds is written as args.rdkit_max_seconds for every CSV row, including nvmolkit. Every other rdkit-specific field (rdkit_threads, rdkit_match_mode) is correctly gated to "N/A" for non-rdkit rows, so a consumer parsing the CSV will misread the nvmolkit row as if it was also bounded by the time budget.

Suggested change
pairs_done,
args.rdkit_max_seconds,
throughput,
pairs_done,
args.rdkit_max_seconds if is_rdkit else "N/A",
throughput,

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptileai, this is fixed, please recheck

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed — the fix is in place. rdkit_max_seconds is now assigned inside the if is_rdkit: ... else: rdkit_max_seconds = "N/A" block, consistent with how rdkit_match_mode and rdkit_threads are gated. Non-rdkit rows (e.g., nvmolkit) will correctly show N/A in the CSV column. This is good.

Comment thread benchmarks/tests/test_timing.py Outdated
Comment on lines +88 to +89
# 5 runs * 50ms = 250ms total, but budget is only 60ms.
# First run completes; before run 2 the deadline check kicks in.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The comment says the deadline fires "before run 2", but with a 50 ms sleep and a 60 ms budget, run 2 will normally start (only ~50 ms elapsed at the pre-run-2 check), and the deadline fires between run 2 and run 3. The assertion 1 <= call_count[0] < 5 is intentionally loose and passes either way, but the incorrect comment may lead future readers to misunderstand when the outer deadline check fires relative to per-run inner deadlines.

Suggested change
# 5 runs * 50ms = 250ms total, but budget is only 60ms.
# First run completes; before run 2 the deadline check kicks in.
# 5 runs * 50ms = 250ms total, but budget is only 60ms.
# First run completes in ~50ms; the deadline check between runs may fire
# after run 1 or run 2 depending on system load (50ms vs 60ms is close),
# so call_count ends up between 1 and 4.

@scal444 scal444 requested a review from evasnow1992 May 27, 2026 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant