
Graduate contacts-v1 inference into the marinfold CLI #92

Merged: timodonnell merged 5 commits into main from claude/pensive-hawking-231c65 on Jul 2, 2026


@timodonnell timodonnell commented Jun 24, 2026


What

Graduates contacts-v1 inference into the marinfold CLI so the README "Try it out" (and a Colab notebook) can run our current-best model — eric-czech's #61/#75 contacts-v1 1.5B (issue #61, eval loss 2.7566), exported to the open-athena bucket by #89. Previously the graduated contacts_v1 package only did generate/view/tokenizer; inference lived only in the exp82/exp89 eval harnesses.

Supports both contact-prediction readouts, selectable via --method:

  • pairwise (default, ~0.3 s/protein): symmetrized autoregressive P(contact) per pair, optionally averaged over --ensemble-k resampled realizations (the test-time augmentation, TTA, from #89, "exp: evaluate best contacts-v1 model on current eval set").
  • rollout (exp82's settled best LM-only recipe, ~50 s/protein): vote over --n-rollouts sampled contact-section completions, each from a freshly resampled document, with ties broken by the pairwise log-prob (combined = votes + ½·minmax(pairwise sym)).
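
To make the combined score concrete, here is a minimal sketch of the vote-plus-tie-break arithmetic, not the code in contacts_v1/inference.py. The function names, the representation of a rollout as a list of (i, j) residue pairs, and the clamp that keeps the tie-break strictly below 0.5 are all assumptions made for illustration; only the formula combined = votes + ½·minmax(pairwise sym) comes from the description above.

```python
import numpy as np


def symmetrize(pairwise):
    """Average the two autoregressive directions, score(i, j) and score(j, i)."""
    return 0.5 * (pairwise + pairwise.T)


def minmax(x):
    """Rescale to [0, 1). The clamp just below 1 is an assumption of this sketch."""
    lo, hi = float(x.min()), float(x.max())
    if hi == lo:
        return np.zeros_like(x, dtype=float)
    scaled = (x - lo) / (hi - lo)
    return np.minimum(scaled, np.nextafter(1.0, 0.0))


def combined_score(rollouts, pairwise, length):
    """votes + 0.5 * minmax(symmetrized pairwise): votes rank first, pairwise breaks ties."""
    votes = np.zeros((length, length))
    for contacts in rollouts:
        # One vote per rollout per unordered pair, even if the pair repeats.
        pairs = {(min(i, j), max(i, j)) for i, j in contacts if i != j}
        for i, j in pairs:
            votes[i, j] += 1
            votes[j, i] += 1
    return votes + 0.5 * minmax(symmetrize(pairwise))
```

Because the tie-break term never reaches 0.5, flooring the combined score recovers the integer vote count, so a pair with more votes always outranks a pair with fewer regardless of its pairwise score.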

How

  • inference/core.py — adds a sample_completions sampling primitive to the Backend protocol (the existing surface was forward-pass-only).
  • inference/_vllm.py + _transformers.py — implement it (vLLM SamplingParams + generate; transformers model.generate). MLX raises NotImplementedError for rollout (pairwise still works there).
  • contacts_v1/inference.py — pairwise readout over the existing Backend.next_token_probs; _rollout_score_matrix (resample → vote → pairwise tie-break); _score_matrix dispatches on method. Records are method-agnostic (score + method).
  • contacts_v1/{plots,cli}.py — method-aware heatmap labels; infer/evaluate gain --method / --n-rollouts / --temperature / --top-p / --top-k (contacts-v1-specific knobs stay on the per-impl driver; top-level marinfold infer stays pairwise/narrow).
  • MODELS.yaml — new 1.5B-contacts-v1 entry, now the default (was 1B).
  • README + Colab notebook — "Try it out" headlines the contacts-v1 contact map (pairwise default, rollout via vLLM as the best recipe); notebooks/inference_example1.ipynb repointed to the contacts-v1 model with a pairwise/rollout selector; distogram models kept as "previous generation".
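
As a rough illustration of the shape of this change, the sketch below shows a Backend protocol with a forward-pass primitive and a sampling primitive, a stub backend of the kind the unit tests rely on, a forward-only backend that refuses rollout, and a dispatcher keyed on method. The method signatures, the class names other than Backend, and the `run` helper are hypothetical; the real definitions live in inference/core.py and contacts_v1/inference.py and may differ.

```python
from typing import List, Protocol, Sequence


class Backend(Protocol):
    """Hypothetical signatures; only the two method names come from the PR."""

    def next_token_probs(self, prompt: Sequence[int]) -> List[float]: ...

    def sample_completions(
        self, prompt: Sequence[int], n: int, temperature: float
    ) -> List[List[int]]: ...


class StubBackend:
    """Deterministic stand-in so tests need no model download."""

    def next_token_probs(self, prompt):
        return [0.25, 0.75]

    def sample_completions(self, prompt, n, temperature):
        return [[len(prompt), k] for k in range(n)]


class ForwardOnlyBackend(StubBackend):
    """Mimics a backend where pairwise works but rollout is not implemented."""

    def sample_completions(self, prompt, n, temperature):
        raise NotImplementedError("rollout needs sampling; use method='pairwise'")


def run(backend: Backend, prompt, method="pairwise", n_rollouts=4, temperature=1.0):
    """Dispatch on method and return a method-agnostic record."""
    if method == "pairwise":
        raw = backend.next_token_probs(prompt)
    elif method == "rollout":
        raw = backend.sample_completions(prompt, n=n_rollouts, temperature=temperature)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return {"method": method, "raw": raw}
```

Keeping sampling behind the same protocol as the forward pass means the rollout logic can be exercised against a stub, and a backend without sampling support fails loudly only when rollout is actually requested.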

Validation

  • 215 unit tests pass (stub-backend, no model download), incl. rollout vote-counting across resampled realizations, the votes + [0,0.5) tie-break bound, and method dispatch.
  • End-to-end with the published 5.9 GB checkpoint (transformers; this box's NVIDIA driver predates torch's bundled CUDA, so it runs CPU — correctness unaffected):
    • pairwise evaluate on tests/data/1QYS.cif → long-range AUC 0.957 (exp89 regime).
    • rollout infer on Trp-cage → integer votes + [0,0.5) pairwise tie-break, exactly as designed.
    • Notebook compute cells dry-run against the checkpoint → AUC 0.957 + metrics/PDF/PNG written.
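
For readers unfamiliar with how a long-range contact AUC can be computed without scikit-learn, the following is a minimal rank-based sketch. It is not the evaluation code from this PR: the function names are hypothetical, and the minimum sequence separation of 24 for "long-range" is a common convention assumed here, since the PR does not state the threshold it uses.

```python
import numpy as np


def rank_auc(scores, labels):
    """Mann-Whitney AUC using average ranks for tied scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n = scores.size
    n_pos = int(labels.sum())
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUC is undefined with a single class
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(n)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        # Positions i..j are tied; assign their mean 1-based rank.
        ranks[order[i : j + 1]] = 0.5 * (i + j) + 1
        i = j + 1
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def long_range_auc(score, truth, min_sep=24):
    """AUC over upper-triangle pairs with j - i >= min_sep (min_sep is an assumption)."""
    i, j = np.triu_indices(score.shape[0], k=min_sep)
    return rank_auc(score[i, j], truth[i, j])
```

Restricting to the upper triangle avoids counting each symmetric pair twice, and the average-rank treatment of ties matters for rollout scores, where many pairs share the same integer vote count.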

Notes for review

  • Default model changed 1B → 1.5B-contacts-v1.
  • MODELS.yaml deliberately omits wandb_url for the new entry (I don't have the verified link — it's eric's eric-czech/marin run; drop the exact URL in).
  • vLLM sample_completions is unverified on this box (GPU driver too old for the bundled CUDA) — it follows vLLM's standard generate API; the rollout logic is verified via transformers. MLX rollout is a deliberate NotImplementedError (an exp82 follow-up).

🤖 Generated with Claude Code

timodonnell and others added 5 commits June 24, 2026 13:26

Add predict/evaluate for the contacts-v1 document structure so the
top-level `marinfold infer` / `marinfold evaluate` (and the per-impl
`contacts-v1` driver) can run eric-czech's #61/#75 contacts-v1 1.5B
model (eval loss 2.7566), exported to the open-athena bucket by exp89.
Previously the graduated package only did generate/view/tokenizer.

- inference.py: InferenceConfig, structure_from_sequence, predict,
  evaluate. Pairwise P(contact) readout (exp82/exp89) over the existing
  Backend.next_token_probs primitive; --ensemble-k test-time
  augmentation; sklearn-free AUC + precision@{L,L/2,L/5,R} per range.
- plots.py: P(contact) heatmap writers for infer/evaluate.
- cli.py / __init__.py: infer/evaluate subcommands + dispatch exports.
- MODELS.yaml: 1.5B-contacts-v1 entry (now the default).
- README: "Try it out" headlines the contacts-v1 model; the distogram
  contacts-and-distances-v1 models stay as the previous generation.

Validated end-to-end with the published checkpoint: evaluate on 1QYS
gives long-range contact AUC 0.957 (exp89 regime). 208 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The W&B URL was a reconstruction, not a verified link; leave it out
(it is an informational-only field) with a comment pointing at where
to find the real one.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

contacts-v1 infer/evaluate gain a `method` knob: the existing fast
`pairwise` P(contact) readout (default) plus exp82's settled best
LM-only recipe, `rollout` — vote over N resampled sampled contact-
section completions, tie-broken by the pairwise log-prob
(combined = votes + 0.5*minmax(pairwise sym)).

- inference/core.py: add a `sample_completions` sampling primitive to
  the Backend protocol.
- inference/_vllm.py + _transformers.py: implement it (vLLM
  SamplingParams + generate; transformers model.generate). MLX raises
  NotImplementedError for now (pairwise still works there).
- contacts_v1/inference.py: `_rollout_score_matrix` (resample + vote +
  pairwise tie-break); `_score_matrix` dispatch on cfg.method; the
  predict/evaluate record schema is now method-agnostic (`score` +
  `method`, was `p_contact`).
- contacts_v1/{plots,cli}.py: method-aware heatmap labels; --method /
  --n-rollouts / --temperature / --top-p / --top-k.
- README: rollout (vLLM) documented as the best recipe; the headline
  figure is now exp82's rollout result, so the stale ×10-ens copy is
  fixed.

Validated: 215 tests pass (incl. stub-backend rollout vote/tiebreak);
rollout runs E2E with the published checkpoint via transformers —
integer votes + [0,0.5) tiebreak as designed. vLLM sampling follows
vLLM's standard generate API but is unverified on this box (its CUDA
driver predates torch's bundled CUDA build).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Repoint notebooks/inference_example1.ipynb from the distogram model to
our current best contacts-v1 1.5B: it now imports the contacts_v1 impl,
exposes a pairwise/rollout METHOD selector (+ N_ROLLOUTS / ENSEMBLE_K),
installs the contacts-v1 extra (pyconfind) for ground-truth contacts,
and plots the GT vs predicted contact map inline instead of distance
heatmaps. README's Colab bullet updated to match.

Verified the compute cells end-to-end against the published checkpoint
(transformers): evaluate on 1QYS -> long-range AUC 0.957, and the
metrics/PDF/PNG outputs write cleanly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@timodonnell timodonnell merged commit b9047f7 into main Jul 2, 2026
@timodonnell timodonnell deleted the claude/pensive-hawking-231c65 branch July 2, 2026 20:32