Graduate contacts-v1 inference into the marinfold CLI by timodonnell · Pull Request #92 · Open-Athena/MarinFold

timodonnell · 2026-06-24T17:27:11Z

What

Graduates contacts-v1 inference into the marinfold CLI so the README "Try it out" (and a Colab notebook) can run our current-best model — eric-czech's #61/#75 contacts-v1 1.5B (issue #61, eval loss 2.7566), exported to the open-athena bucket by #89. Previously the graduated contacts_v1 package only did generate/view/tokenizer; inference lived only in the exp82/exp89 eval harnesses.

Supports both contact-prediction readouts, selectable via --method:

pairwise (default, ~0.3 s/protein): symmetrized autoregressive P(contact) per pair, optionally averaged over --ensemble-k resampled realizations, i.e. the test-time augmentation (TTA) from #89 ("exp: evaluate best contacts-v1 model on current eval set").
rollout (exp82's settled best LM-only recipe, ~50 s/protein): vote over --n-rollouts sampled contact-section completions, each from a freshly resampled document, tie-broken by the pairwise log-prob (combined = votes + ½·minmax(pairwise sym)).
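A minimal pure-Python sketch of the scoring arithmetic behind the two readouts. It assumes the backend has already produced per-pair P(contact) matrices (for pairwise) and one set of predicted (i, j) contacts per rollout. The helper names (symmetrize, ensemble_mean, count_votes, minmax, rollout_combine) are hypothetical, not the PR's functions. The small eps in minmax is an assumption, chosen so the tie-break bonus stays strictly inside [0, 0.5).

```python
from typing import Iterable, List, Sequence, Tuple

Matrix = List[List[float]]


def symmetrize(p: Sequence[Sequence[float]]) -> Matrix:
    """Average P(i, j) and P(j, i) so the pairwise map is symmetric."""
    n = len(p)
    return [[(p[i][j] + p[j][i]) / 2.0 for j in range(n)] for i in range(n)]


def ensemble_mean(maps: Sequence[Sequence[Sequence[float]]]) -> Matrix:
    """Elementwise mean over resampled realizations (the --ensemble-k TTA)."""
    k, n = len(maps), len(maps[0])
    return [[sum(m[i][j] for m in maps) / k for j in range(n)] for i in range(n)]


def count_votes(rollouts: Iterable[Iterable[Tuple[int, int]]], n: int) -> Matrix:
    """One vote per rollout for each distinct unordered pair it predicts."""
    votes = [[0.0] * n for _ in range(n)]
    for contacts in rollouts:
        pairs = {(min(i, j), max(i, j)) for i, j in contacts if i != j}
        for i, j in pairs:
            votes[i][j] += 1
            votes[j][i] += 1
    return votes


def minmax(x: Sequence[Sequence[float]], eps: float = 1e-9) -> Matrix:
    """Rescale probability-scale values to [0, 1); eps (assumed) keeps the max below 1."""
    flat = [v for row in x for v in row]
    lo, hi = min(flat), max(flat)
    return [[(v - lo) / (hi - lo + eps) for v in row] for row in x]


def rollout_combine(votes: Matrix, pairwise_sym: Matrix) -> Matrix:
    """combined = votes + 0.5 * minmax(pairwise sym)."""
    norm = minmax(pairwise_sym)
    n = len(votes)
    return [[votes[i][j] + 0.5 * norm[i][j] for j in range(n)] for i in range(n)]


# Two pairwise realizations of a 3-residue toy protein.
p1 = [[0.0, 0.2, 0.6], [0.4, 0.0, 0.1], [0.2, 0.3, 0.0]]
p2 = [[0.0, 0.4, 0.2], [0.2, 0.0, 0.3], [0.4, 0.1, 0.0]]
sym = ensemble_mean([symmetrize(p1), symmetrize(p2)])

# Two rollouts; the first lists the same contact in both orders.
votes = count_votes([[(0, 1), (1, 0)], [(0, 1), (1, 2)]], n=3)
combined = rollout_combine(votes, sym)
```

Because the bonus is always below 0.5 and votes are integers, the pairwise term can only reorder pairs that received the same number of votes.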

How

inference/core.py — adds a sample_completions sampling primitive to the Backend protocol (the existing surface was forward-pass-only).
inference/_vllm.py + _transformers.py — implement it (vLLM SamplingParams + generate; transformers model.generate). MLX raises NotImplementedError for rollout (pairwise still works there).
contacts_v1/inference.py — pairwise readout over the existing Backend.next_token_probs; _rollout_score_matrix (resample → vote → pairwise tie-break); _score_matrix dispatches on method. Records are method-agnostic (score + method).
contacts_v1/{plots,cli}.py — method-aware heatmap labels; infer/evaluate gain --method / --n-rollouts / --temperature / --top-p / --top-k (contacts-v1-specific knobs stay on the per-impl driver; top-level marinfold infer stays pairwise/narrow).
MODELS.yaml — new 1.5B-contacts-v1 entry, now the default (was 1B).
README + Colab notebook — "Try it out" headlines the contacts-v1 contact map (pairwise default, rollout via vLLM as the best recipe); notebooks/inference_example1.ipynb repointed to the contacts-v1 model with a pairwise/rollout selector; distogram models kept as "previous generation".
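The changes above can be sketched under stated assumptions. This is not the actual marinfold code. The Backend protocol gains a sampling method next to the forward-pass one. A forward-only backend (as with MLX) keeps pairwise working but raises NotImplementedError for rollout. Dispatch on the method yields a method-agnostic {score, method} record. All signatures, the "contact"/"none" candidate tokens, and the per-pair toy readouts are hypothetical. The real readouts produce full L×L score matrices.

```python
from typing import Dict, List, Protocol, Sequence


class Backend(Protocol):
    """Hypothetical Backend shape; real signatures may differ."""

    def next_token_probs(self, prompt: str, candidates: Sequence[str]) -> Dict[str, float]:
        """Existing forward-pass primitive: P(next token) for each candidate."""
        ...

    def sample_completions(self, prompt: str, n: int, **sampling: float) -> List[str]:
        """New sampling primitive: n sampled continuations of `prompt`."""
        ...


class StubBackend:
    """Deterministic stand-in, in the spirit of the stub-backend unit tests."""

    def __init__(self, completions: List[str]) -> None:
        self.completions = completions

    def next_token_probs(self, prompt: str, candidates: Sequence[str]) -> Dict[str, float]:
        return {c: 1.0 / len(candidates) for c in candidates}

    def sample_completions(self, prompt: str, n: int, **sampling: float) -> List[str]:
        # Cycle through canned completions instead of sampling.
        return [self.completions[i % len(self.completions)] for i in range(n)]


class ForwardOnlyBackend(StubBackend):
    """Mirrors the MLX situation: pairwise works, rollout does not."""

    def sample_completions(self, prompt: str, n: int, **sampling: float) -> List[str]:
        raise NotImplementedError("rollout needs sample_completions, not implemented here")


def pairwise_score(backend: Backend, prompt: str, **_: object) -> float:
    # Toy per-pair readout: P(contact) from a single forward pass.
    return backend.next_token_probs(prompt, ["contact", "none"])["contact"]


def rollout_score(backend: Backend, prompt: str, n_rollouts: int = 4, **sampling: float) -> int:
    # Toy per-pair readout: integer vote count over sampled completions.
    samples = backend.sample_completions(prompt, n_rollouts, **sampling)
    return sum(s == "contact" for s in samples)


READOUTS = {"pairwise": pairwise_score, "rollout": rollout_score}


def predict_record(method: str, backend: Backend, prompt: str, **kwargs) -> dict:
    """Dispatch on `method`; the record shape is the same for every method."""
    if method not in READOUTS:
        raise ValueError(f"unknown method {method!r}; expected one of {sorted(READOUTS)}")
    return {"method": method, "score": READOUTS[method](backend, prompt, **kwargs)}
```

Keeping the record schema identical across methods means downstream writers (heatmaps, metrics) only need the method name for labeling, not a separate code path.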

Validation

215 unit tests pass (stub-backend, no model download), incl. rollout vote-counting across resampled realizations, the votes + [0,0.5) tie-break bound, and method dispatch.
End-to-end with the published 5.9 GB checkpoint (transformers; this box's NVIDIA driver predates torch's bundled CUDA, so it runs on CPU; correctness is unaffected):
- pairwise evaluate on tests/data/1QYS.cif → long-range AUC 0.957 (exp89 regime).
- rollout infer on Trp-cage → integer votes + [0,0.5) pairwise tie-break, exactly as designed.
- Notebook compute cells dry-run against the checkpoint → AUC 0.957 + metrics/PDF/PNG written.
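The long-range AUC quoted above needs no sklearn, because ROC AUC equals the normalized Mann-Whitney U statistic. This is a minimal sketch, assuming a square score matrix and a 0/1 ground-truth contact matrix. The function names are hypothetical. The min_sep=24 cutoff for "long range" is the common CASP convention and is assumed rather than taken from the PR.

```python
from typing import List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney U statistic, with average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both contact and non-contact pairs")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def long_range_pairs(score, contact, min_sep: int = 24) -> Tuple[List[float], List[int]]:
    """Flatten upper-triangle pairs with j - i >= min_sep into (scores, labels)."""
    n = len(score)
    s: List[float] = []
    y: List[int] = []
    for i in range(n):
        for j in range(i + min_sep, n):
            s.append(score[i][j])
            y.append(int(contact[i][j]))
    return s, y
```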

Notes for review

Default model changed 1B → 1.5B-contacts-v1.
MODELS.yaml deliberately omits wandb_url for the new entry (I don't have a verified link; it's eric's run under eric-czech/marin, so please drop the exact URL in).
vLLM sample_completions is unverified on this box (the GPU driver is too old for the bundled CUDA). It follows vLLM's standard generate API, and the rollout logic is verified via transformers. MLX rollout is a deliberate NotImplementedError (an exp82 follow-up).

🤖 Generated with Claude Code

Add predict/evaluate for the contacts-v1 document structure so the top-level `marinfold infer` / `marinfold evaluate` (and the per-impl `contacts-v1` driver) can run eric-czech's #61/#75 contacts-v1 1.5B model (eval loss 2.7566), exported to the open-athena bucket by exp89. Previously the graduated package only did generate/view/tokenizer.

- inference.py: InferenceConfig, structure_from_sequence, predict, evaluate. Pairwise P(contact) readout (exp82/exp89) over the existing Backend.next_token_probs primitive; --ensemble-k test-time augmentation; sklearn-free AUC + precision@{L,L/2,L/5,R} per range.
- plots.py: P(contact) heatmap writers for infer/evaluate.
- cli.py / __init__.py: infer/evaluate subcommands + dispatch exports.
- MODELS.yaml: 1.5B-contacts-v1 entry (now the default).
- README: "Try it out" headlines the contacts-v1 model; the distogram contacts-and-distances-v1 models stay as the previous generation.

Validated end-to-end with the published checkpoint: evaluate on 1QYS gives long-range contact AUC 0.957 (exp89 regime). 208 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
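A sketch of precision@{L, L/2, L/5, R} per sequence-separation range, under two assumptions not confirmed by the commit message: the range bins are the CASP-style short/medium/long cutoffs, and "R" is read as R-precision (k equals the number of true contacts in the range). The helper names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Assumed CASP-style separation bins; the commit does not spell out its bins.
RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "short": (6, 11),
    "medium": (12, 23),
    "long": (24, None),
}


def pairs_in_range(n: int, lo: int, hi: Optional[int]) -> List[Tuple[int, int]]:
    """Upper-triangle pairs with lo <= j - i <= hi (no upper bound if hi is None)."""
    return [(i, j) for i in range(n) for j in range(i + lo, n) if hi is None or j - i <= hi]


def precision_at(score, contact, pairs: List[Tuple[int, int]], k: int) -> float:
    """Fraction of true contacts among the k top-scoring pairs (nan if none are selected).

    If fewer than k pairs exist, all of them are used.
    """
    top = sorted(pairs, key=lambda p: score[p[0]][p[1]], reverse=True)[:k]
    return sum(contact[i][j] for i, j in top) / len(top) if top else math.nan


def precision_report(score, contact) -> Dict[str, float]:
    n = len(score)
    report: Dict[str, float] = {}
    for name, (lo, hi) in RANGES.items():
        pairs = pairs_in_range(n, lo, hi)
        n_true = sum(contact[i][j] for i, j in pairs)
        # "R" interpreted as R-precision: k = number of true contacts (assumption).
        for label, k in {"L": n, "L/2": n // 2, "L/5": n // 5, "R": n_true}.items():
            report[f"{name}/P@{label}"] = precision_at(score, contact, pairs, k)
    return report
```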

The W&B URL was a reconstruction, not a verified link; leave it out (it is an informational-only field) with a comment pointing at where to find the real one. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


contacts-v1 infer/evaluate gain a `method` knob: the existing fast `pairwise` P(contact) readout (default) plus exp82's settled best LM-only recipe, `rollout`, which votes over N sampled contact-section completions (each from a freshly resampled document), tie-broken by the pairwise log-prob (combined = votes + 0.5*minmax(pairwise sym)).

- inference/core.py: add a `sample_completions` sampling primitive to the Backend protocol.
- inference/_vllm.py + _transformers.py: implement it (vLLM SamplingParams + generate; transformers model.generate). MLX raises NotImplementedError for now (pairwise still works there).
- contacts_v1/inference.py: `_rollout_score_matrix` (resample + vote + pairwise tie-break); `_score_matrix` dispatch on cfg.method; the predict/evaluate record schema is now method-agnostic (`score` + `method`, was `p_contact`).
- contacts_v1/{plots,cli}.py: method-aware heatmap labels; --method / --n-rollouts / --temperature / --top-p / --top-k.
- README: rollout (vLLM) documented as the best recipe; the headline figure is now exp82's rollout result, so the stale ×10-ens copy is fixed.

Validated: 215 tests pass (incl. stub-backend rollout vote/tiebreak); rollout runs E2E with the published checkpoint via transformers, giving integer votes + [0,0.5) tiebreak as designed. vLLM sampling follows vLLM's standard generate API but is unverified on this box (its CUDA driver predates torch's bundled CUDA build).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
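One way the contacts-v1 knobs could be wired with argparse. The flag names come from the commit messages and the pairwise default from the PR description. Every other default, the validation rules, and the function names are assumptions for illustration.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical wiring; only --method's default is taken from the PR."""
    parser = argparse.ArgumentParser(prog="contacts-v1 infer")
    parser.add_argument("--method", choices=["pairwise", "rollout"], default="pairwise",
                        help="pairwise: fast P(contact) readout; rollout: vote over sampled completions")
    parser.add_argument("--ensemble-k", type=int, default=1,
                        help="resampled realizations averaged by the pairwise readout (assumed default)")
    parser.add_argument("--n-rollouts", type=int, default=16,
                        help="sampled completions to vote over, rollout only (assumed default)")
    parser.add_argument("--temperature", type=float, default=1.0)
    # None means "leave the backend's own default in place" (assumption).
    parser.add_argument("--top-p", type=float, default=None)
    parser.add_argument("--top-k", type=int, default=None)
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.ensemble_k < 1 or args.n_rollouts < 1:
        parser.error("--ensemble-k and --n-rollouts must be >= 1")
    if args.top_p is not None and not 0.0 < args.top_p <= 1.0:
        parser.error("--top-p must be in (0, 1]")
    return args
```

Leaving --top-p / --top-k as None lets each backend keep its own sampling defaults rather than encoding one library's "disabled" sentinel.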

Repoint notebooks/inference_example1.ipynb from the distogram model to our current best contacts-v1 1.5B: it now imports the contacts_v1 impl, exposes a pairwise/rollout METHOD selector (+ N_ROLLOUTS / ENSEMBLE_K), installs the contacts-v1 extra (pyconfind) for ground-truth contacts, and plots the GT vs predicted contact map inline instead of distance heatmaps. README's Colab bullet updated to match. Verified the compute cells end-to-end against the published checkpoint (transformers): evaluate on 1QYS -> long-range AUC 0.957, and the metrics/PDF/PNG outputs write cleanly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

timodonnell and others added 5 commits June 24, 2026 13:26

Drop unverified wandb_url from the contacts-v1 MODELS.yaml entry

22e7f9c


Merge remote-tracking branch 'origin/main' into claude/pensive-hawking-231c65

80498d5

timodonnell merged commit b9047f7 into main Jul 2, 2026

timodonnell deleted the claude/pensive-hawking-231c65 branch July 2, 2026 20:32

timodonnell merged 5 commits into main from claude/pensive-hawking-231c65
