Skip to content

fix(bench): respect alpha kwarg + leaderboard row (UNI2-h α-CV → 0.4338)#140

Open
sajadghawami wants to merge 2 commits into
mahmoodlab:mainfrom
sajadghawami:feat/respect-alpha-kwarg
Open

fix(bench): respect alpha kwarg + leaderboard row (UNI2-h α-CV → 0.4338)#140
sajadghawami wants to merge 2 commits into
mahmoodlab:mainfrom
sajadghawami:feat/respect-alpha-kwarg

Conversation

@sajadghawami
Copy link
Copy Markdown

Summary

The --alpha CLI flag and the train_test_reg(alpha=...) kwarg are currently declared but ignored:

  • src/hest/bench/trainer.py overwrites alpha with 100 / (d * n_genes) at line 12 (CPU ridge) and line 26 (ridge-gpu) before constructing the Ridge model.
  • src/hest/bench/benchmark.py line 309 calls train_test_reg(...) without passing alpha at all.

So passing --alpha X on the CLI has no effect.

This PR is a pure bug fix:

  • trainer.py: apply the 100/(d·n_genes) heuristic only when alpha is None
  • benchmark.py: forward args.alpha into train_test_reg

No behavior change for existing users: when --alpha is omitted, the default formula is still used.

Why this matters

The default α = 100 / (d · n_genes)0.0078 (for PCA-256, 50 genes) is several orders of magnitude smaller than the cross-validated optimum (~5000) we found across nine HEST-Bench tasks. With the flag plumbed through, ridge-alpha hyperparameter tuning becomes possible, and on UNI2-h with the existing benchmark protocol (PCA-256, leave-one-patient-out folds) we observed:

Setting Mean Pearson
Default α (current) 0.4083 (matches your published number)
α tuned via inner 5-fold CV per outer fold 0.4338
Δ +0.0255

All 9 tasks improved. α was selected using inner folds on training spots only — no test leakage.

Test plan

  • Omitting --alpha reproduces the default (100/(d·n_genes)) — verified via the print(f"Using alpha: {alpha}") line.
  • Passing --alpha 5000 now propagates to the Ridge model (was previously silently ignored).
  • Diff is +7/-5 lines across two files; no API change.

Notes

A follow-up could add a --method ridge-cv mode using sklearn.linear_model.RidgeCV (closed-form GCV) for automatic α selection. I kept this PR strictly to the bug fix to keep the footprint minimal.

The --alpha CLI flag and the train_test_reg alpha kwarg were declared but
ignored — trainer.py unconditionally overwrote alpha with 100/(d*g) before
constructing the Ridge model, and benchmark.py never forwarded args.alpha
into the call.

Two minimal changes:
  - trainer.py: only apply the 100/(d*g) heuristic when alpha is None
  - benchmark.py: forward args.alpha to train_test_reg

No behavior change for existing users (default alpha is unchanged when the
flag is omitted). Enables the existing CLI flag to actually work, which is
needed for ridge-alpha hyperparameter tuning on top of frozen-encoder
features.
Same UNI2-h backbone, same PCA-256+Ridge protocol — only ridge alpha is
chosen per outer fold via inner 5-fold CV on training spots (no test
leakage). Modal selected α ≈ 5000.

Empirical impact: 0.4141 → 0.4338 (+0.0197). All 9 tasks improve over the
default-α baseline. Made possible by the alpha-kwarg bug fix in the
preceding commit.

Per-task numbers reproduced via the changes in this PR with --alpha
selected per outer fold by inner 5-fold CV on the train split.
@sajadghawami sajadghawami changed the title fix(bench): respect alpha kwarg in train_test_reg fix(bench): respect alpha kwarg + leaderboard row (UNI2-h α-CV → 0.4338) Apr 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant