|
| 1 | +# Human-GEM cell-type model validation: raven-python vs RAVEN |
| 2 | + |
| 3 | +Validation of raven-python's tINIT/ftINIT against MATLAB RAVEN on a real genome-scale |
| 4 | +reconstruction (Human-GEM) using the Hart2015 RNA-seq dataset (5 cell lines: DLD1, |
| 5 | +GBM, HCT116, HELA, RPE1). The goal is functional equivalence — do raven-python and RAVEN |
| 6 | +extract the *same* context-specific reaction sets from the same inputs? |
| 7 | + |
| 8 | +## Method |
| 9 | + |
| 10 | +* **Template & inputs.** RAVEN built the ftINIT reference model from Human-GEM |
| 11 | + (`prepHumanModelForftINIT`: remove drug/exchange/artificial reactions, set |
| 12 | + spontaneous/custom lists) and exported it as `raven_refModel.xml` (10198 reactions). |
| 13 | + raven-python builds on that *same* exported model, so the candidate reaction universe is |
| 14 | + identical and set comparison is exact. |
| 15 | +* **Scoring.** Gene scores from `log2(TPM+1)`-style expression via |
| 16 | + `gene_scores_from_expression`, mapped to reactions through the GPR |
| 17 | + (`score_reactions_from_genes`), matching RAVEN's `getExprForRxnScore`. |
| 18 | +* **ftINIT.** Series `1+1` (2 staged MILP steps). RAVEN run via `ftINIT.m` with Gurobi; |
| 19 | + raven-python via `raven_python.init.ftinit` with Gurobi (`mip_gap=0.001`, `time_limit=600`). |
| 20 | +* **tINIT.** raven-python `get_init_model` (classic single-MILP INIT) on HCT116, compared to |
| 21 | + the ftINIT result for the same cell line. |
| 22 | +* **Tasks.** Two raven-python ftINIT variants: *no-task* (expression only) and |
| 23 | + *task-constrained* (essential metabolic tasks, `metabolicTasks_Essential.txt`, force |
| 24 | + task-essential reactions to be kept). RAVEN's reference is task-constrained. |
| 25 | +* **Solver.** Gurobi 13.0.1 for both tools. |
| 26 | + |
| 27 | +## Engineering findings (raven-python tractability) |
| 28 | + |
| 29 | +Getting ftINIT to run at genome scale surfaced three issues, all now fixed and matching |
| 30 | +RAVEN's design: |
| 31 | + |
| 32 | +1. **O(n²) constraint construction.** Building the steady-state balances with Python |
| 33 | + `sum()` re-canonicalises a growing sympy expression at each term; hub metabolites |
| 34 | + (ATP/H⁺/H₂O in ~10³ reactions) made one constraint take ~minutes (≈154 s total build, |
| 35 | + benchmark: 1500-term `sum` = 59 s vs `optlang.symbolics.add` = 0.01 s). Fixed by |
| 36 | + building flat term lists once per reaction and summing with `optlang.symbolics.add` |
| 37 | + (in both ftINIT and tINIT). |
| 38 | +2. **Big-M too loose.** The on/off indicator constraints used each reaction's own bound |
| 39 | + (±1000) as big-M; with `force_on=0.1` that is a ~10⁴ ratio → very weak LP relaxation |
| 40 | + → Gurobi never closes the gap. RAVEN uses a fixed big-M = 100. Adopted. |
| 41 | +3. **Stoichiometric rescaling.** A fixed big-M=100 is only valid if no reaction needs |
| 42 | + flux ≫100; ported RAVEN's `rescaleModelForINIT` (cap each reaction's coefficient |
| 43 | + dynamic range at 25×, normalise mean |coeff| to 1) into `prep_init_model`. Without it |
| 44 | + the staged MILP is infeasible (step-1 caps transports that step-0 used freely). |
| 45 | + |
| 46 | +Net effect: a full ftINIT cell-line solve went from *not finishing* to ~200 s, |
| 47 | +comparable to RAVEN. |
| 48 | + |
| 49 | +## Results |
| 50 | + |
| 51 | +### Reaction counts |
| 52 | + |
| 53 | +| cell line | RAVEN ftINIT | raven-python ftINIT (no-task) | raven-python ftINIT (task) | |
| 54 | +|-----------|-------------:|--------------------------:|-----------------------:| |
| 55 | +| DLD1 | 7782 | 7744 | 7774 | |
| 56 | +| GBM | 7668 | 7667 | 7680 | |
| 57 | +| HCT116 | 7780 | 7752 | 7776 | |
| 58 | +| HELA | 7832 | 7789 | 7816 | |
| 59 | +| RPE1 | 7569 | 7564 | 7570 | |
| 60 | + |
| 61 | +Counts agree within ~0.5 % everywhere; the task-constrained run is closest (e.g. RPE1 |
| 62 | +7570 vs 7569, HCT116 7776 vs 7780). raven-python tINIT (HCT116) gives 6024 reactions — a |
| 63 | +smaller model, as expected from the different (classic INIT) objective. |
| 64 | + |
| 65 | +### Agreement — raven-python (no-task) ftINIT vs RAVEN ftINIT |
| 66 | + |
| 67 | +| cell line | shared | only raven-python | only RAVEN | Jaccard | |
| 68 | +|-----------|-------:|--------------:|-----------:|--------:| |
| 69 | +| DLD1 | 7667 | 77 | 115 | 0.976 | |
| 70 | +| GBM | 7562 | 105 | 106 | 0.973 | |
| 71 | +| HCT116 | 7675 | 77 | 105 | 0.977 | |
| 72 | +| HELA | 7707 | 82 | 125 | 0.974 | |
| 73 | +| RPE1 | 7470 | 94 | 99 | 0.975 | |
| 74 | + |
| 75 | +**~97.5 % of reactions are identical** between the two independent implementations, even |
| 76 | +though this run is *expression-only* while RAVEN's reference is task-constrained. The |
| 77 | +"only RAVEN" surplus (≈99–125) is expected to include task-essential reactions that the |
| 78 | +task-constrained run (below) recovers. |
| 79 | + |
| 80 | +### Agreement — raven-python (task-constrained) ftINIT vs RAVEN ftINIT |
| 81 | + |
| 82 | +| cell line | shared | only raven-python | only RAVEN | Jaccard | |
| 83 | +|-----------|-------:|--------------:|-----------:|--------:| |
| 84 | +| DLD1 | 7699 | 75 | 83 | 0.980 | |
| 85 | +| GBM | 7588 | 92 | 80 | 0.978 | |
| 86 | +| HCT116 | 7696 | 80 | 84 | 0.979 | |
| 87 | +| HELA | 7735 | 81 | 97 | 0.978 | |
| 88 | +| RPE1 | 7493 | 77 | 76 | 0.980 | |
| 89 | + |
| 90 | +Adding the essential metabolic tasks (same task list RAVEN uses) raises agreement to |
| 91 | +**Jaccard 0.978–0.980** and makes the disagreement symmetric (only-raven-python ≈ only-RAVEN |
| 92 | +≈ 80), confirming the prediction: the task constraints recover RAVEN's task-essential |
| 93 | +reactions. The residual ≈80 reactions each way out of ~7700 is at the level expected from |
| 94 | +MIP-gap tolerance (both accept near-optimal incumbents) and alternate optima. |
| 95 | + |
| 96 | +### raven-python tINIT vs ftINIT (HCT116) |
| 97 | + |
| 98 | +tINIT 6024 rxns vs ftINIT 7752; shared 5957, Jaccard 0.762. tINIT is nearly a subset |
| 99 | +(only 67 reactions unique to it) — the two methods agree on a common core, with ftINIT |
| 100 | +keeping more (its staged formulation and task handling are less aggressive at removal). |
| 101 | +This matches the expected tINIT/ftINIT relationship rather than indicating a defect. |
| 102 | + |
| 103 | +## Conclusions |
| 104 | + |
| 105 | +From identical inputs on a genome-scale human reconstruction, raven-python reproduces RAVEN's |
| 106 | +ftINIT reaction selection to **97.5 % (no-task) and 98 % (task-constrained) set identity** |
| 107 | +across five cell lines — strong evidence of functional equivalence between the two |
| 108 | +independent implementations. Agreement is symmetric and the residual (~80 reactions each |
| 109 | +way) is consistent with MIP near-optimality and alternate optima rather than any |
| 110 | +systematic divergence. |
| 111 | + |
| 112 | +Reaching genome-scale tractability required matching RAVEN's numerical-conditioning |
| 113 | +choices and fixing optlang-specific construction costs (see *Engineering findings*): |
| 114 | +fixed big-M = 100, `rescaleModelForINIT`, `optlang.symbolics.add` instead of Python |
| 115 | +`sum()` in every MILP build (ftINIT, tINIT, and the gap-fill). With these, a |
| 116 | +task-constrained cell-line model builds in ~15–25 min (dominated by the |
| 117 | +essential-forced staged MILP) and a no-task one in ~3 min, comparable to RAVEN. |
0 commit comments