| 1 | +# Changelog |
| 2 | + |
| 3 | +Milestones in the raven-python port. For function-level status see |
| 4 | +[docs/raven_migration.md](docs/raven_migration.md); for open work see |
| 5 | +[docs/todo.md](docs/todo.md). |
| 6 | + |
| 7 | +## Infrastructure |
| 8 | + |
| 9 | +* **GitHub Actions CI** ([.github/workflows/ci.yml](.github/workflows/ci.yml)) — |
| 10 | + ruff + pytest matrix over Python 3.11/3.12/3.13. Tests that require Gurobi |
| 11 | + auto-skip (no Gurobi on free runners); the known HiGHS upstream blocker |
| 12 | + (`hybrid_interface.Configuration` rejects `lp_method='primal'`) is marked |
| 13 | + `xfail(strict=True)` so CI flips red when optlang fixes it. |
| 14 | + |
| 15 | +## Quality sweep — known-issues section F (design-choice divergences) |
| 16 | + |
| 17 | +Closed the five items in section F (the "design choices that differ from RAVEN" |
| 18 | +backlog from the original review). Three docstring/comment fixes; two code |
| 19 | +fixes with matching MATLAB back-port proposals in IMPROVEMENTS.md (FS4, B2). |
| 20 | + |
| 21 | +* `run_init` docstring spells out the score-0 semantics divergence between |
| 22 | + classic INIT and ftINIT. |
| 23 | +* `get_init_model` inaccurate "same regime" comment replaced with an accurate |
| 24 | + description of the conservative pre-filter. |
| 25 | +* `fseof` classifier now uses the slope of `|flux|` (`linregress(enforced, |flux|)`) |
| 26 | + instead of first-vs-last endpoints. A track whose endpoints straddle a |
| 27 | + peak/trough no longer ends up mislabelled. |
| 28 | +* `reporter_metabolites` docstring documents the one-sided p-value + z-score |
| 29 | + ordering vs RAVEN's two-tailed sort, and points at the up/down split via |
| 30 | + `gene_fold_changes`. |
| 31 | +* `get_elemental_balance` now reports `unknown` for empty-stoichiometry |
| 32 | + reactions (previously vacuously `balanced`). The original review attributed |
| 33 | + the bug to `check_model`; the actual code is in `balance.py`. |
| 34 | + |
| 35 | +Two new regression tests (F3 in `test_analysis_fseof.py`, F5 in |
| 36 | +`test_utils_balance.py`). [docs/known_issues.md](docs/known_issues.md) is now |
| 37 | +fully closed (all sections A–F). |
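The difference between the endpoint rule and the slope rule for `fseof` can be illustrated with a small sketch. This is not the `fseof` implementation: the function names, the labels (`"up"`, `"down"`, `"flat"`) and the tolerance are hypothetical, and a hand-written least-squares slope stands in for `linregress` so the block needs only the standard library.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def label(delta, tol=1e-9):
    """Map a signed change to a (hypothetical) trend label."""
    if delta > tol:
        return "up"
    if delta < -tol:
        return "down"
    return "flat"


def classify_by_endpoints(fluxes):
    """Endpoint rule: compare |flux| at the first and last enforcement level."""
    return label(abs(fluxes[-1]) - abs(fluxes[0]))


def classify_by_slope(enforced, fluxes):
    """Slope rule: fit |flux| against the enforced level and use the sign."""
    return label(slope(enforced, [abs(v) for v in fluxes]))


enforced = [0.0, 1.0, 2.0, 3.0, 4.0]
# |flux| drops into a trough right after the first point, then rises steadily.
fluxes = [5.0, -1.0, -2.0, -3.0, -4.5]

print(classify_by_endpoints(fluxes))        # endpoint rule
print(classify_by_slope(enforced, fluxes))  # slope rule
```

For this track the endpoint rule reports `down` because the last point sits just below the first, while the fitted slope is positive and the slope rule reports `up`.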
| 38 | + |
| 39 | +## Quality sweep — known-issues sections C / D / E |
| 40 | + |
| 41 | +Closed all the robustness, efficiency, and dead-code items in one pass. |
| 42 | + |
| 43 | +**Robustness (C):** |
| 44 | +* `constrain_reversible_reactions` wraps FVA in try/except + NaN check; both |
| 45 | + backend-raised `OptimizationError` and silent-NaN returns now surface as one |
| 46 | + clear `RuntimeError` (the original `abs(NaN) < eps` silently no-op'd). |
| 47 | +* `ensure_binary` downloads through `.part` + `os.replace`, matching `data.py` — |
| 48 | + an interrupted download leaves a `.part`, never a half-complete `.zip`. |
| 49 | +* `parse_task_list` (.xlsx) checks `wb.sheetnames` before lookup; missing |
| 50 | + `TASKS` sheet now raises a clear `ValueError` instead of a bare `KeyError`. |
| 51 | +* `parse_taxonomy` pads with explicit `""` when a depth level is skipped and |
| 52 | + warns once. |
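The NaN failure mode mentioned for `constrain_reversible_reactions` is easy to reproduce in isolation. The sketch below is a minimal illustration, not the project's code: `naive_is_zero`, `checked_is_zero` and the `eps` default are hypothetical names chosen for the example.

```python
import math


def naive_is_zero(value, eps=1e-9):
    # Every ordering comparison with NaN is False, so a NaN bound is never
    # treated as zero and the caller silently does nothing.
    return abs(value) < eps


def checked_is_zero(value, eps=1e-9):
    # Surface NaN explicitly instead of letting the comparison swallow it.
    if math.isnan(value):
        raise RuntimeError("FVA returned NaN; cannot decide reversibility")
    return abs(value) < eps
```

With this guard, a NaN result raises instead of falling through the `abs(value) < eps` test as if the bound were simply non-zero.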
| 53 | + |
| 54 | +**Efficiency (D):** |
| 55 | +* `group_linear_reactions` rewritten with a metabolite worklist (re-enqueue |
| 56 | + the mets touched by each merge); same observable result, O(n+m) work per |
| 57 | + pass instead of restarting the full scan after every merge. |
| 58 | +* `parse_kegg_reactions` now caches the parsed stoichiometry in |
| 59 | + `KeggReaction.stoichiometry`; `build_reference_model` reuses it instead of |
| 60 | + re-parsing. |
| 61 | + |
| 62 | +**Dead code (E):** |
| 63 | +* Dropped `KeggReaction.modules` and `.rhea` (parsed but never consumed). |
| 64 | +* Dropped the vestigial `only_genes_in_models` parameter from `_ortholog_map`. |
| 65 | + |
| 66 | +Six new regression tests; the only item without a test is the `.part` atomic |
| 67 | +download (defensive, needs urlopen mocking). |
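The `.part` + `os.replace` pattern used by `ensure_binary` can be sketched as follows. This is an illustration under assumptions, not the project's code: `write_atomically` is a hypothetical helper, and it takes an iterable of byte chunks rather than a URL so that it runs without network access.

```python
import os
from pathlib import Path


def write_atomically(dest, chunks):
    """Stream chunks to '<dest>.part', then rename onto dest in one step."""
    dest = Path(dest)
    part = dest.with_name(dest.name + ".part")
    with open(part, "wb") as handle:
        for chunk in chunks:  # an exception here leaves only the .part file
            handle.write(chunk)
    # The final name appears only after every chunk has been written.
    os.replace(part, dest)
    return dest
```

If the chunk source fails midway, the exception propagates and the partial data stays in the `.part` file; the destination path is never created with incomplete content.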
| 68 | + |
| 69 | +## Quality sweep — known-issues section B |
| 70 | + |
| 71 | +Closed all four "silent misbehaviour" items from [docs/known_issues.md](docs/known_issues.md): |
| 72 | +* `merge_models` warns on `formula` / `charge` conflicts when two source models |
| 73 | + share a `name[comp]` but disagree (used to silently keep the first-seen value). |
| 74 | +* `add_reactions_from_equations` warns when creating a metabolite in an |
| 75 | + unregistered compartment — both the `mets_by="id"` and `mets_by="name"` paths |
| 76 | + (id-mode used to skip the check entirely, an asymmetry). |
| 77 | +* `parse_task_list` warns when continuation data appears before any task ID |
| 78 | + has been seen (used to silently drop the orphan row). |
| 79 | +* `export_model_to_sif` warns up front when a custom label map sends two |
| 80 | + distinct ids to the same label (used to silently collapse nodes). |
| 81 | +Four new regression tests cover them. |
| 82 | + |
| 83 | +## Quality sweep — known-issues section A |
| 84 | + |
| 85 | +Closed all six "latent edge-case bug" items from [docs/known_issues.md](docs/known_issues.md): |
| 86 | +* `add_reactions_from_equations` no longer misparses `"2 oxoglutarate"` (or any |
| 87 | + leading-number metabolite name) — the resolver tries the full token before |
| 88 | + splitting off a coefficient. |
| 89 | +* `add_reactions_from_equations` warns when an equation's terms cancel to a |
| 90 | + zero-metabolite reaction. |
| 91 | +* `add_reactions_from_model` tracks ids minted within the batch so two source |
| 92 | + metabolites whose ids both collide with the draft don't collapse onto the |
| 93 | + same generated id. |
| 94 | +* `add_transport_reactions` warns on duplicate metabolite names in the source |
| 95 | + or target compartment instead of silently dropping all but one. |
| 96 | +* `connect_blocked_reactions` membership-guards the FVA result before |
| 97 | + `.at[]` lookup. |
| 98 | +* `assign_kos` rejects `cutoff >= 1` up front — would have crashed inside the |
| 99 | + ratio filter at `log(best_evalue) == 0`. |
| 100 | +Six new regression tests cover the user-reachable cases. |
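The "full token before coefficient" rule for leading-number metabolite names can be sketched as below. This is a minimal illustration, not the resolver in `add_reactions_from_equations`: `split_term` and `known_names` are hypothetical, and the real code may handle more cases.

```python
def split_term(term, known_names):
    """Return (coefficient, metabolite_name) for one equation term."""
    term = term.strip()
    # Try the whole token first, so names that start with a number survive.
    if term in known_names:
        return 1.0, term
    head, _, rest = term.partition(" ")
    rest = rest.strip()
    try:
        coefficient = float(head)
    except ValueError:
        # No numeric prefix: the whole token is the (unrecognised) name.
        return 1.0, term
    if not rest:
        # A bare number with nothing after it is kept as a name.
        return 1.0, term
    return coefficient, rest


names = {"2 oxoglutarate", "ATP"}
print(split_term("2 oxoglutarate", names))    # whole token is a known name
print(split_term("2 ATP", names))             # coefficient 2, name ATP
print(split_term("3 2 oxoglutarate", names))  # coefficient 3, numbered name
```

Checking the full token first means `"2 oxoglutarate"` resolves to one metabolite with coefficient 1, while `"2 ATP"` still splits into a coefficient and a name.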
| 101 | + |
| 102 | +## Phase 7 — Localization |
| 103 | + |
| 104 | +* **Sub-cellular localisation by MILP.** [`localization.predict_localization`](src/raven_python/localization/predict.py) |
| 105 | + + [`apply_localization`](src/raven_python/localization/predict.py). Deterministic (not simulated |
| 106 | + annealing); caller-passed `reactions_to_relocate` set with everything else pinned; |
| 107 | + incomplete-model tolerant (no silent reaction removal); `apply=False` returns a diff |
| 108 | + preview; multi-compartment by default with primary-free, extras-penalised scoring. |
| 109 | +* **Predictor loaders.** [`load_wolfpsort`, `load_deeploc`](src/raven_python/localization/scores.py), |
| 110 | + with the `gene × compartment` DataFrame contract open for any predictor. |
| 111 | +* **Compartment helpers** ([`manipulation/compartments.py`](src/raven_python/manipulation/compartments.py)): |
| 112 | + `merge_compartments`, `copy_to_compartment` — useful standalone for model curation. |
| 113 | +* **Real-data validation on yeast-GEM** ([docs/yeast_localization_benchmark.md](docs/yeast_localization_benchmark.md)) |
| 114 | + — accuracy 0.72 → 0.39 on 298 GPR'd reactions as confident predictor mis-scoring rises |
| 115 | + from 0 % to 50 %; perfect on compartments with disjoint gene sets (c/g/lp/p/v/vm), and |
| 116 | + surfaces a `transport_cost` calibration insight for soft-probability score tables. |
| 117 | + |
| 118 | +## Phase 5 — Data integration & analysis |
| 119 | + |
| 120 | +* **Reporter metabolites, FSEOF, random sampling** ([`analysis/`](src/raven_python/analysis/)). |
| 121 | +* **HPA omics ingestion** ([`omics.parse_hpa`, `parse_hpa_rna`, `hpa_gene_scores`, `rna_gene_scores`](src/raven_python/omics/hpa.py)) |
| 122 | + — pandas-tidy DataFrames replace RAVEN's sparse-matrix layout; scoring adapters reuse the |
| 123 | + existing GPR walk. |
| 124 | +* **N-model comparison** ([`comparison.compare_models`](src/raven_python/comparison/compare.py)). |
| 125 | +* **Dynamic FBA** is **not ported** — established Python packages cover it (`dfba`, |
| 126 | + `reframed`, `mewpy`). |
| 127 | + |
| 128 | +## Phase 4d — ftINIT |
| 129 | + |
| 130 | +* **ftINIT pipeline** ([`init.ftinit`](src/raven_python/init/ftinit.py)) — staged MILP, linear merge, |
| 131 | + task-aware gap-filling, gene pruning. |
| 132 | +* **Validated against MATLAB RAVEN on Human-GEM.** 5 Hart2015 cell-line models; |
| 133 | + Jaccard 0.973–0.977 (no-task) and 0.978–0.980 (task-constrained). See |
| 134 | + [docs/humangem_validation.md](docs/humangem_validation.md). |
| 135 | +* **Parameter calibration & input-robustness study** ([docs/init_param_calibration.md](docs/init_param_calibration.md)) |
| 136 | + — `mip_gap=0.01` is the genome-scale full-pipeline sweet spot (~37% faster than 0.001 at |
| 137 | + Jaccard 0.995); pipeline is robust to expression noise (Jaccard 0.92–0.95) but sensitive |
| 138 | + to sparsity (50–70% dropout → Jaccard 0.59–0.71); the task + gap-fill layer keeps the |
| 139 | + essential-task pass-rate at 67–69/69 across the gradient, whereas tINIT-without-it passes |
| 140 | + only 35/69 even on clean data. |
| 141 | +* **Cross-solver portability** ([docs/init_solver_benchmark.md](docs/init_solver_benchmark.md)) |
| 142 | + + [`tests/test_init_solvers.py`](tests/test_init_solvers.py): Gurobi and GLPK pass at toy |
| 143 | + scale; only Gurobi is viable at genome scale today (HiGHS hits an upstream optlang |
| 144 | + `clone()` bug; GLPK ignores `configuration.timeout` on MIP). |
| 145 | +* **Engineering wins surfaced by the genome-scale work:** `check_tasks` and |
| 146 | + `fill_tasks._feasible` rewritten in-place (~12× faster each); MILP construction now |
| 147 | + builds sums with `optlang.symbolics.add` (the O(n²) sympy `sum()` blow-up was the |
| 148 | + original genome-scale blocker); bounded gap-fill MILP; `rescaleModelForINIT` ported. |
| 149 | + |
| 150 | +## Phase 4c — tINIT |
| 151 | + |
| 152 | +* **INIT MILP and the tINIT pipeline** ([`init.run_init`](src/raven_python/init/init.py), |
| 153 | + [`init.get_init_model`](src/raven_python/init/build.py)). Clean optlang reformulation; |
| 154 | + RNA-seq scoring via clamped `5·ln(level/ref)`. |
| 155 | + |
| 156 | +## Phase 4b — Gap-filling |
| 157 | + |
| 158 | +* **Connectivity gap-filling** ([`gapfilling.connect_blocked_reactions`](src/raven_python/gapfilling/fill.py)) |
| 159 | + — MILP. Targeted (toward objective) mode delegates to `cobra.gapfill`. |
| 160 | + |
| 161 | +## Phase 4a — Metabolic tasks |
| 162 | + |
| 163 | +* **Task list parsing + `check_tasks`** ([`tasks/`](src/raven_python/tasks/)). |
| 164 | + |
| 165 | +## Phase 3 — Reconstruction |
| 166 | + |
| 167 | +* **Homology-based draft** from a template GEM + BLAST/DIAMOND wrappers |
| 168 | + ([`reconstruction/homology/`](src/raven_python/reconstruction/homology/)) — with structured |
| 169 | + improvements over RAVEN's `getModelFromHomology` (see IMPROVEMENTS H1–H6). |
| 170 | +* **KEGG five-step pipeline** ([`reconstruction/kegg/`](src/raven_python/reconstruction/kegg/)): |
| 171 | + dump → parser → HMM library builder → species model → HMM-query draft. |
| 172 | +* **MetaCyc reconstruction** is **not ported** (and is flagged for removal from |
| 173 | + MATLAB RAVEN; see IMPROVEMENTS R-MetaCyc). |
| 174 | + |
| 175 | +## Phase 2 — I/O |
| 176 | + |
| 177 | +* **YAML** aligned to cobra's `!!omap` writer + RAVEN-only fields preserved into `.notes`, |
| 178 | + plus geckopy `ec-*` for enzyme-constrained models |
| 179 | + ([`io/yaml.py`](src/raven_python/io/yaml.py)). |
| 180 | +* **SIF**, **Excel export**, and **Standard-GEM `model/<fmt>/…` git layout** |
| 181 | + ([`io/`](src/raven_python/io/)). Excel import intentionally excluded. |
| 182 | + |
| 183 | +## Phase 1 — Foundation |
| 184 | + |
| 185 | +* **GPR / balance / validation / parsing helpers** ([`utils/`](src/raven_python/utils/)) — |
| 186 | + cobra-absent bits only; the rest are cheatsheeted. |
| 187 | +* **Manipulation ergonomic layer** ([`manipulation/`](src/raven_python/manipulation/)) — |
| 188 | + add/change/remove/transport/transfer/merge/simplify/variance + adopted transforms. |
| 189 | +* **External-binary resolver** ([`binaries.py`](src/raven_python/binaries.py)) — version-pinned |
| 190 | + release-ZIP registry, SHA256-verified cache. |
| 191 | + |
| 192 | +## Phase 0 — Scaffold |
| 193 | + |
| 194 | +* Project structure, packaging, pytest skeleton, license alignment with MATLAB RAVEN |
| 195 | + (GPL-3.0-or-later). |