Fine-tuning improves accuracy
Question
Can we fine-tune from the Protenix v1 checkpoint?
Hypothesis
We expect the various loss metrics to go down, and the fine-tuned checkpoint to perform better on FoldBench.
Background
Mostly, this is a sanity check that our training pipeline works.
I also suspect that some of the worse performance we expect to see in #4 can be addressed by fine-tuning, so this will hopefully give us a better checkpoint to base future experiments on.
Approach
- Fine-tune on our standard dataset
- Check that train and eval loss metrics generally trend down, or at least don't explode
- Upload the final checkpoint to Hugging Face
- Run FoldBench on the final checkpoint and compare it against existing folding models
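The "trend down, or at least don't explode" check can be made mechanical. Below is a minimal sketch that assumes the train or eval loss has already been pulled as a plain list of floats, for example from the W&B run history. `loss_looks_sane` and `rolling_mean` are hypothetical helpers, not part of our pipeline. The 50-step smoothing window and the 2x explosion threshold are arbitrary illustrative choices.

```python
import math
from typing import Sequence


def rolling_mean(values: Sequence[float], window: int) -> list[float]:
    """Trailing moving average; early points average whatever is available."""
    out: list[float] = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the point that left the window
        out.append(total / min(i + 1, window))
    return out


def loss_looks_sane(
    losses: Sequence[float], window: int = 50, explode_factor: float = 2.0
) -> tuple[bool, str]:
    """Hypothetical sanity check (thresholds are assumptions, not pipeline values).

    Passes if every loss is finite, the smoothed loss never exceeds
    explode_factor times its starting value, and it ends no higher than it began.
    """
    if not losses:
        return False, "no loss values"
    if any(not math.isfinite(x) for x in losses):
        return False, "non-finite loss (NaN/inf)"
    smooth = rolling_mean(losses, window)
    first_full = min(window, len(smooth)) - 1  # first fully averaged point
    start, end = smooth[first_full], smooth[-1]
    peak = max(smooth[first_full:])
    if peak > explode_factor * start:
        return False, f"smoothed loss peaked at {peak:.3g}, > {explode_factor}x start {start:.3g}"
    if end > start:
        return False, f"smoothed loss rose from {start:.3g} to {end:.3g}"
    return True, f"smoothed loss went from {start:.3g} to {end:.3g}"
```

Usage: `ok, msg = loss_looks_sane(train_losses)`, run once per loss series. Smoothing before comparing keeps a few noisy steps from failing the check. The check compares endpoints rather than fitting a trend, matching the loose "down-ish" bar.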
Compute estimate
TBD
Success criteria
The fine-tuned checkpoint scores better on FoldBench than the Protenix v1 weights used directly.
Baselines
See #4
Notes
Retrospective scope: this issue wraps an already-running fine-tune (v1-finetune-01, H100:8, 10K steps, launched 2026-04-22, W&B run usvcruzi). The notebook pulls that run's metrics rather than launching new work.