Add early-exit option for RDKit benchmarks that operates on a specific time budget#180
Add early-exit option for RDKit benchmarks that operates on a specific time budget#180scal444 wants to merge 4 commits into
Conversation
- Add --rdkit_max_seconds to etkdg/ff_optimize/substruct benches. When set, the RDKit comparison stops once wall-clock elapsed exceeds the cap; the reported timing is over molecules/pairs actually processed. - Replace 'speedup = baseline_ms / avg_ms' with throughput-based ratios (mols * confs / elapsed) so methods that processed different amounts of work are comparable. - CSV gains mols_processed/pairs_processed, rdkit_max_seconds, confs_per_second/pairs_per_second, vs_rdkit_throughput_ratio columns. - substruct: --rdkit_match_mode and --rdkit_threads become nargs='+'; main runs the cartesian product and picks the best-throughput variant as the validation/throughput baseline.
|
| Filename | Overview |
|---|---|
| benchmarks/bench_utils/timing.py | Adds Deadline, throughput_per_s, time_it_bounded, and add_rdkit_max_seconds_arg. time_it_bounded has a bug in the mixed complete+partial run path: it returns last_progress from the partial run paired with avg_ms from complete runs, causing underestimated throughput at the call site. |
| benchmarks/etkdg_bench.py | Adds per-run Deadline inside bench_rdkit and exposes --rdkit_max_seconds; uses time_it (not time_it_bounded), so all runs execute even when max_seconds is set, making total time runs×max_seconds rather than ~max_seconds. |
| benchmarks/ff_optimize_bench.py | Mirrors etkdg_bench changes: adds per-run Deadline, rdkit_max_seconds gating, and throughput columns. Same time_it-vs-time_it_bounded inconsistency as etkdg_bench. |
| benchmarks/substruct_bench.py | Significant refactor: supports multi-value --rdkit_match_mode/--rdkit_threads sweeps, extracts _validate_matches, uses time_it_bounded for early exit, and correctly gates rdkit_max_seconds to rdkit-only CSV rows. |
| benchmarks/tests/test_timing.py | New unit tests for Deadline, throughput_per_s, and time_it_bounded. Test comment for the budget-exhaustion test now correctly describes the run-2-completes / deadline-before-run-3 behaviour. |
| benchmarks/tests/conftest.py | New conftest that inserts the benchmarks/ directory into sys.path so bench_utils is importable from the tests/ subdirectory. |
| benchmarks/bench_utils/init.py | Re-exports the five new timing symbols (Deadline, add_rdkit_max_seconds_arg, throughput_per_s, time_it_bounded) and adds them to all. |
Reviews (2): Last reviewed commit: "Address greptile comments" | Re-trigger Greptile
| pairs_done, | ||
| args.rdkit_max_seconds, | ||
| throughput, |
There was a problem hiding this comment.
rdkit_max_seconds is written as args.rdkit_max_seconds for every CSV row, including nvmolkit. Every other rdkit-specific field (rdkit_threads, rdkit_match_mode) is correctly gated to "N/A" for non-rdkit rows, so a consumer parsing the CSV will misread the nvmolkit row as if it was also bounded by the time budget.
| pairs_done, | |
| args.rdkit_max_seconds, | |
| throughput, | |
| pairs_done, | |
| args.rdkit_max_seconds if is_rdkit else "N/A", | |
| throughput, |
There was a problem hiding this comment.
@greptileai, this is fixed, please recheck
There was a problem hiding this comment.
Confirmed — the fix is in place. rdkit_max_seconds is now assigned inside the if is_rdkit: ... else: rdkit_max_seconds = "N/A" block, consistent with how rdkit_match_mode and rdkit_threads are gated. Non-rdkit rows (e.g., nvmolkit) will correctly show N/A in the CSV column. This is good.
| # 5 runs * 50ms = 250ms total, but budget is only 60ms. | ||
| # First run completes; before run 2 the deadline check kicks in. |
There was a problem hiding this comment.
The comment says the deadline fires "before run 2", but with a 50 ms sleep and a 60 ms budget, run 2 will normally start (only ~50 ms elapsed at the pre-run-2 check), and the deadline fires between run 2 and run 3. The assertion
1 <= call_count[0] < 5 is intentionally loose and passes either way, but the incorrect comment may lead future readers to misunderstand when the outer deadline check fires relative to per-run inner deadlines.
| # 5 runs * 50ms = 250ms total, but budget is only 60ms. | |
| # First run completes; before run 2 the deadline check kicks in. | |
| # 5 runs * 50ms = 250ms total, but budget is only 60ms. | |
| # First run completes in ~50ms; the deadline check between runs may fire | |
| # after run 1 or run 2 depending on system load (50ms vs 60ms is close), | |
| # so call_count ends up between 1 and 4. |
Calculates throughput after running for up to X amount of time.
Refactor of substruct rdkit code to reduce churn, but it's still unfortunately verbose due to indentation issues.