Consolidate smiles/smarts loaders further in benchmarks, ensure shuffling #182
scal444 wants to merge 2 commits into
- load_smiles now shuffles its result deterministically with the seed, so benches that consume a head slice get a representative cross-section rather than file-order bias.
- substruct_bench.py drops its local load_pickle/load_smiles/load_smarts helpers in favor of the shared bench_utils versions.
- butina_clustering_bench.py and cross_similarity_bench.py replace inline open()/pd.read_csv loaders with the shared load_smiles helper.
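A minimal sketch of the deterministic shuffle described above, assuming a plain-text file with one SMILES per line (first whitespace-separated token). The names `shuffle_deterministic` and this `load_smiles` signature are hypothetical illustrations, not the actual bench_utils code, and this version reads the whole file before shuffling rather than sampling.

```python
import random
from pathlib import Path
from typing import List


def shuffle_deterministic(items: List[str], seed: int) -> List[str]:
    """Return a shuffled copy of items; the same seed always yields the same order."""
    rng = random.Random(seed)  # private RNG, so the global random state is untouched
    out = list(items)
    rng.shuffle(out)
    return out


def load_smiles(path: str, seed: int = 42) -> List[str]:
    """Hypothetical loader: keep the first token of each non-blank line, then shuffle."""
    lines = Path(path).read_text().splitlines()
    smiles = [ln.split()[0] for ln in lines if ln.strip()]
    return shuffle_deterministic(smiles, seed)
```

With this shape, `load_smiles("mols.smi", seed=0)[:1000]` (file name hypothetical) draws its molecules from across the whole file rather than from its first 1000 lines, and rerunning with the same seed reproduces the same slice. Using a dedicated `random.Random` instance keeps the benchmark reproducible even if other code reseeds the global RNG.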
| Filename | Overview |
|---|---|
| benchmarks/bench_utils/loaders.py | All three loaders (load_pickle, load_smiles, load_sdf) now shuffle in the else-branch when sampling is not triggered, ensuring file-order bias is removed regardless of input size vs max_count. |
| benchmarks/butina_clustering_bench.py | Replaced manual head-slice CSV parsing with load_smiles(), gaining reservoir sampling and shuffle behaviour. |
| benchmarks/cross_similarity_bench.py | Replaced pandas CSV loading and manual parsing with load_smiles(); removes the pandas dependency and gains shuffling; the duplicate-padding fallback loop is unchanged. |
| benchmarks/substruct_bench.py | Removed the local load_pickle, load_smiles, and load_smarts duplicates in favor of imports from bench_utils; no logic changes to the benchmark code itself. |
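The table describes each loader as sampling when the input exceeds max_count and shuffling in the else-branch otherwise. A minimal sketch of that pattern as one generic helper that all three loaders could share, assuming reservoir sampling is the classic Algorithm R; the helper name `sample_or_shuffle` is hypothetical, and the extra shuffle after sampling is an assumption of this sketch, not something the PR states.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_or_shuffle(items: Iterable[T], max_count: Optional[int], seed: int) -> List[T]:
    """Return at most max_count items in a seed-determined random order.

    Inputs longer than max_count are reduced to a uniform random subset with
    reservoir sampling (Algorithm R); shorter inputs are kept whole. Both cases
    finish with a shuffle so callers never see file order.
    """
    rng = random.Random(seed)
    result: List[T] = []
    for i, item in enumerate(items):
        if max_count is None or i < max_count:
            result.append(item)  # fill the reservoir (or keep everything)
        else:
            j = rng.randint(0, i)  # uniform over [0, i], both ends inclusive
            if j < max_count:
                result[j] = item  # replace with probability max_count / (i + 1)
    # Assumption: shuffle in both branches, since Algorithm R leaves surviving
    # early items in their original slots, so reservoir order is not random.
    rng.shuffle(result)
    return result
```

Because the helper accepts any iterable, a pickle, SMILES, or SDF loader could stream its parsed records straight into it without materializing the full file when sampling applies.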
Reviews (2): Last reviewed commit: "Shuffle sdf and pickle too"