- **monomer_protein** LDDT **0.790** is the headline: a solid reimplementation result. FoldBench doesn't publish upstream Protenix monomer_protein LDDT as a numeric table, so no direct delta is available.
- - **interface_protein_dna** trails upstream Protenix (33.7% success vs upstream's 67.6% in the 2024-01+ regime). The ~2× gap is notable and probably comes mostly from our 5-sample regime vs upstream's 25-sample regime, possibly compounded by featurization gaps.
- - **interface_antibody_antigen** is our weakest interface (5.4% vs upstream's 38.4%), a 7× gap that is much larger than the sampling difference can explain. This is a strong signal that MSA handling / featurization in this category has a bug or a missing piece.
- - **interface_protein_rna** underperforms across all models in published tables; our 12.8% vs upstream 56.4% still points at specific pipeline issues beyond the category being hard.
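The gap factors quoted in these bullets can be rechecked with a few lines of Python. This is only an arithmetic sketch: the rates are copied from the bullets above, and `gap_factor` is a hypothetical helper name, not something from the repository.

```python
# (ours, upstream) success rates in percent, copied from the bullets above.
rates = {
    "interface_protein_dna": (33.7, 67.6),
    "interface_antibody_antigen": (5.4, 38.4),
    "interface_protein_rna": (12.8, 56.4),
}


def gap_factor(ours: float, upstream: float) -> float:
    """How many times higher the upstream success rate is than ours."""
    return upstream / ours


# Rounded to one decimal so the factors are easy to compare with the prose.
gaps = {name: round(gap_factor(o, u), 1) for name, (o, u) in rates.items()}

for name, factor in gaps.items():
    print(f"{name}: upstream is {factor}x ours")
```

Under these numbers, protein-DNA sits at roughly 2×, antibody-antigen at roughly 7×, and protein-RNA falls in between, which is why sampling alone is an unlikely explanation for the latter two.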
A followup experiment (see issue referenced in `helico_experiment.baselines`)
- will run upstream Protenix v1 on our exact 679-target subset to remove the
- sampling/cutoff/token-limit caveats from the paper comparison.
+ Headlines:
+
+ - **Monomer categories match or beat Protenix's published numbers** (monomer_dna 0.52 ≥ 0.44, monomer_rna 0.60 ≥ 0.59, monomer_protein 0.83). With a very small N for DNA/RNA these aren't statistically significant, but they confirm nothing is horribly wrong.
+ - **Interface categories are ~50–80% of Protenix's published success rates.** All of these rose substantially from the pre-fix baseline (ab-ag 6.8% → 30.4%, p-protein 14.5% → 33.6%, etc.), so the template + MSA-subsample fixes are doing the expected work.
+ - **interface_protein_protein remains the widest relative gap** (52% of Protenix). Worth prioritizing in followup.
+
+ The remaining gap isn't template-shaped: FoldBench doesn't ship templates
+ and Protenix's published numbers also use the dummy-template path. Most
+ likely candidates for the remaining delta:
+
+ 1. **MSA differences.** Published Protenix runs with `--use_msa_server` (its own MSA server); we use the FoldBench-bundled MSAs. Different MSA depth or pairing yields different predictions.
+ 2. **bf16 numerical accumulation.** The weights are the same, but slightly different op ordering/precision across implementations compounds over 10 recycles × 200 diffusion steps × 25 samples.
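To make the bf16 point concrete, here is a minimal sketch of how summation order alone changes a result at bfloat16 precision. It does not use either codebase: `to_bf16` is a hypothetical helper that emulates bfloat16 rounding (round-to-nearest-even on the top 16 bits of a float32) with the standard library only, and the inputs are chosen purely for illustration.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a float.

    Emulation only: bfloat16 is the top 16 bits of an IEEE float32.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a bias so that truncating the low 16 bits rounds to nearest even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_add(a: float, b: float) -> float:
    """Add two values and round the result to bfloat16."""
    return to_bf16(a + b)


# Same three operands, two different association orders.
left_to_right = bf16_add(bf16_add(256.0, 1.0), 1.0)   # (256 + 1) + 1
small_first = bf16_add(256.0, bf16_add(1.0, 1.0))     # 256 + (1 + 1)

print(left_to_right, small_first)
```

In bfloat16 the spacing between representable values near 256 is 2, so adding 1 twice is lost to rounding while adding 2 once is not. A real model has vastly more accumulations than this toy case, which is the sense in which op ordering can compound across recycles and diffusion steps.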
+
+ The MSA hypothesis is the most testable; see issue #TBD for the