diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..3d42fa9
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,195 @@
+# Changelog
+
+Milestones in the raven-python port. For function-level status see
+[docs/raven_migration.md](docs/raven_migration.md); for open work see
+[docs/todo.md](docs/todo.md).
+
+## Infrastructure
+
+* **GitHub Actions CI** ([.github/workflows/ci.yml](.github/workflows/ci.yml)) —
+  ruff + pytest matrix over Python 3.11/3.12/3.13. Tests that require Gurobi
+  auto-skip (no Gurobi on free runners); the known HiGHS upstream blocker
+  (`hybrid_interface.Configuration` rejects `lp_method='primal'`) is marked
+  `xfail(strict=True)` so CI flips red when optlang fixes it.
+
+## Quality sweep — known-issues section F (design-choice divergences)
+
+Closed the five items in section F (the "design choices that differ from RAVEN"
+backlog from the original review). Three docstring/comment fixes; two code
+fixes with matching MATLAB back-port proposals in IMPROVEMENTS.md (FS4, B2).
+
+* `run_init` docstring spells out the score-0 semantics divergence between
+  classic INIT and ftINIT.
+* `get_init_model` inaccurate "same regime" comment replaced with an accurate
+  description of the conservative pre-filter.
+* `fseof` classifier now uses the slope of `|flux|` (`linregress(enforced, |flux|)`)
+  instead of first-vs-last endpoints. A track whose endpoints straddle a
+  peak/trough no longer ends up mislabelled.
+* `reporter_metabolites` docstring documents the one-sided p-value + z-score
+  ordering vs RAVEN's two-tailed sort, and points at the up/down split via
+  `gene_fold_changes`.
+* `get_elemental_balance` now reports `unknown` for empty-stoichiometry
+  reactions (previously vacuously `balanced`). Original review attributed the
+  bug to `check_model`; the actual code is in `balance.py`.
+
+Two new regression tests (F3 in `test_analysis_fseof.py`, F5 in
+`test_utils_balance.py`). [docs/known_issues.md](docs/known_issues.md) now
+fully closed (all sections A–F).
+
+## Quality sweep — known-issues sections C / D / E
+
+Closed all the robustness, efficiency, and dead-code items in one pass.
+
+**Robustness (C):**
+* `constrain_reversible_reactions` wraps FVA in try/except + NaN check; both
+  backend-raised `OptimizationError` and silent-NaN returns now surface as one
+  clear `RuntimeError` (the original `abs(NaN) < eps` silently no-op'd).
+* `ensure_binary` downloads through `.part` + `os.replace`, matching `data.py` —
+  an interrupted download leaves a `.part`, never a half-complete `.zip`.
+* `parse_task_list` (.xlsx) checks `wb.sheetnames` before lookup; missing
+  `TASKS` sheet now raises a clear `ValueError` instead of a bare `KeyError`.
+* `parse_taxonomy` pads with explicit `""` when a depth level is skipped and
+  warns once.
+
+**Efficiency (D):**
+* `group_linear_reactions` rewritten with a metabolite worklist (re-enqueue
+  the mets touched by each merge); same observable result, O(n+m) work per
+  pass instead of restarting the full scan after every merge.
+* `parse_kegg_reactions` now caches the parsed stoichiometry on each
+  `KeggReaction.stoichiometry`; `build_reference_model` reuses it instead of
+  re-parsing.
+
+**Dead code (E):**
+* Dropped `KeggReaction.modules` and `.rhea` (parsed but never consumed).
+* Dropped the vestigial `only_genes_in_models` parameter from `_ortholog_map`.
+
+Six new regression tests; the only one without a test is the `.part` atomic
+download (defensive, needs urlopen mocking).
+
+## Quality sweep — known-issues section B
+
+Closed all four "silent misbehaviour" items from [docs/known_issues.md](docs/known_issues.md):
+* `merge_models` warns on `formula` / `charge` conflicts when two source models
+  share a name[comp] but disagree (used to silently keep the first-seen).
+* `add_reactions_from_equations` warns when creating a metabolite in an
+  unregistered compartment — both the `mets_by="id"` and `mets_by="name"` paths
+  (id-mode used to skip the check entirely, an asymmetry).
+* `parse_task_list` warns when continuation data appears before any task ID
+  has been seen (used to silently drop the orphan row).
+* `export_model_to_sif` warns up front when a custom label map sends two
+  distinct ids to the same label (used to silently collapse nodes).
+Four new regression tests cover them.
+
+## Quality sweep — known-issues section A
+
+Closed all six "latent edge-case bug" items from [docs/known_issues.md](docs/known_issues.md):
+* `add_reactions_from_equations` no longer misparses `"2 oxoglutarate"` (or any
+  leading-number metabolite name) — the resolver tries the full token before
+  splitting off a coefficient.
+* `add_reactions_from_equations` warns when an equation's terms cancel to a
+  zero-metabolite reaction.
+* `add_reactions_from_model` tracks ids minted within the batch so two source
+  metabolites whose ids both collide with the draft don't collapse onto the
+  same generated id.
+* `add_transport_reactions` warns on duplicate metabolite names in the source
+  or target compartment instead of silently dropping all but one.
+* `connect_blocked_reactions` membership-guards the FVA result before
+  `.at[]` lookup.
+* `assign_kos` rejects `cutoff >= 1` up front — would have crashed inside the
+  ratio filter at `log(best_evalue) == 0`.
+Six new regression tests cover the user-reachable cases.
+
+## Phase 7 — Localization
+
+* **Sub-cellular localisation by MILP.** [`localization.predict_localization`](src/raven_python/localization/predict.py)
+  + [`apply_localization`](src/raven_python/localization/predict.py). Deterministic (not simulated
+  annealing); caller-passed `reactions_to_relocate` set with everything else pinned;
+  incomplete-model tolerant (no silent reaction removal); `apply=False` returns a diff
+  preview; multi-compartment by default with primary-free, extras-penalised scoring.
+* **Predictor loaders.** [`load_wolfpsort`, `load_deeploc`](src/raven_python/localization/scores.py),
+  with the `gene × compartment` DataFrame contract open for any predictor.
+* **Compartment helpers** ([`manipulation/compartments.py`](src/raven_python/manipulation/compartments.py)):
+  `merge_compartments`, `copy_to_compartment` — useful standalone for model curation.
+* **Real-data validation on yeast-GEM** ([docs/yeast_localization_benchmark.md](docs/yeast_localization_benchmark.md))
+  — accuracy 0.72 → 0.39 on 298 GPR'd reactions as confident predictor mis-scoring rises
+  from 0 % to 50 %; perfect on compartments with disjoint gene sets (c/g/lp/p/v/vm), and
+  surfaces a `transport_cost` calibration insight for soft-probability score tables.
+
+## Phase 5 — Data integration & analysis
+
+* **Reporter metabolites, FSEOF, random sampling** ([`analysis/`](src/raven_python/analysis/)).
+* **HPA omics ingestion** ([`omics.parse_hpa`, `parse_hpa_rna`, `hpa_gene_scores`, `rna_gene_scores`](src/raven_python/omics/hpa.py))
+  — pandas-tidy DataFrames replace RAVEN's sparse-matrix layout; scoring adapters reuse the
+  existing GPR walk.
+* **N-model comparison** ([`comparison.compare_models`](src/raven_python/comparison/compare.py)).
+* **Dynamic FBA** is **not ported** — established Python packages cover it (`dfba`,
+  `reframed`, `mewpy`).
+
+## Phase 4d — ftINIT
+
+* **ftINIT pipeline** ([`init.ftinit`](src/raven_python/init/ftinit.py)) — staged MILP, linear merge,
+  task-aware gap-filling, gene pruning.
+* **Validated against MATLAB RAVEN on Human-GEM.** 5 Hart2015 cell-line models;
+  Jaccard 0.973–0.977 (no-task) and 0.978–0.980 (task-constrained). See
+  [docs/humangem_validation.md](docs/humangem_validation.md).
+* **Parameter calibration & input-robustness study** ([docs/init_param_calibration.md](docs/init_param_calibration.md))
+  — `mip_gap=0.01` is the genome-scale full-pipeline sweet spot (~37% faster than 0.001 at
+  Jaccard 0.995); pipeline is robust to expression noise (Jaccard 0.92–0.95) but sensitive
+  to sparsity (50–70% dropout → Jaccard 0.59–0.71); the task + gap-fill layer keeps the
+  essential-task pass-rate at 67–69/69 across the gradient, whereas tINIT-without-it passes
+  only 35/69 even on clean data.
+* **Cross-solver portability** ([docs/init_solver_benchmark.md](docs/init_solver_benchmark.md))
+  + [`tests/test_init_solvers.py`](tests/test_init_solvers.py): Gurobi and GLPK pass at toy
+  scale; only Gurobi is viable at genome scale today (HiGHS hits an upstream optlang
+  `clone()` bug; GLPK ignores `configuration.timeout` on MIP).
+* **Engineering wins surfaced by the genome-scale work:** `check_tasks` and
+  `fill_tasks._feasible` rewritten in-place (~12× each); `optlang.symbolics.add` builds
+  in the MILP construction (the O(n²) sympy `sum()` blow-up was the original genome-scale
+  blocker); bounded gap-fill MILP; `rescaleModelForINIT` ported.
+
+## Phase 4c — tINIT
+
+* **INIT MILP and the tINIT pipeline** ([`init.run_init`](src/raven_python/init/init.py),
+  [`init.get_init_model`](src/raven_python/init/build.py)). Clean optlang reformulation;
+  RNA-seq scoring via `5·ln(level/ref)`-clamped.
+
+## Phase 4b — Gap-filling
+
+* **Connectivity gap-filling** ([`gapfilling.connect_blocked_reactions`](src/raven_python/gapfilling/fill.py))
+  — MILP. Targeted (toward objective) mode delegates to `cobra.gapfill`.
+
+## Phase 4a — Metabolic tasks
+
+* **Task list parsing + `check_tasks`** ([`tasks/`](src/raven_python/tasks/)).
+
+## Phase 3 — Reconstruction
+
+* **Homology-based draft** from a template GEM + BLAST/DIAMOND wrappers
+  ([`reconstruction/homology/`](src/raven_python/reconstruction/homology/)) — with structured
+  improvements over RAVEN's `getModelFromHomology` (see IMPROVEMENTS H1–H6).
+* **KEGG five-step pipeline** ([`reconstruction/kegg/`](src/raven_python/reconstruction/kegg/)):
+  dump → parser → HMM library builder → species model → HMM-query draft.
+* **MetaCyc reconstruction** **not ported** (and flagged for removal from MATLAB RAVEN —
+  see IMPROVEMENTS R-MetaCyc).
+
+## Phase 2 — I/O
+
+* **YAML** aligned to cobra's `!!omap` writer + RAVEN-only fields preserved into `.notes`,
+  plus geckopy `ec-*` for enzyme-constrained models
+  ([`io/yaml.py`](src/raven_python/io/yaml.py)).
+* **SIF**, **Excel export**, and **Standard-GEM `model/<fmt>/…` git layout**
+  ([`io/`](src/raven_python/io/)). Excel import intentionally excluded.
+
+## Phase 1 — Foundation
+
+* **GPR / balance / validation / parsing helpers** ([`utils/`](src/raven_python/utils/)) —
+  cobra-absent bits only; the rest are cheatsheeted.
+* **Manipulation ergonomic layer** ([`manipulation/`](src/raven_python/manipulation/)) —
+  add/change/remove/transport/transfer/merge/simplify/variance + adopted transforms.
+* **External-binary resolver** ([`binaries.py`](src/raven_python/binaries.py)) — version-pinned
+  release-ZIP registry, SHA256-verified cache.
+
+## Phase 0 — Scaffold
+
+* Project structure, packaging, pytest skeleton, license alignment with MATLAB RAVEN
+  (GPL-3.0-or-later).
diff --git a/IMPROVEMENTS.md b/IMPROVEMENTS.md
new file mode 100644
index 0000000..a6a9877
--- /dev/null
+++ b/IMPROVEMENTS.md
@@ -0,0 +1,339 @@
+# Proposed improvements over RAVEN
+
+This project is **not** a one-to-one MATLAB→Python transcription. Where a RAVEN function can be
+made smarter/faster, or where a logical gap in RAVEN's feature set is worth filling, we record the
+change here — with enough detail that it can **also be back-ported to MATLAB RAVEN** later.
+
+Each entry states: what RAVEN does today, the proposed improvement, the rationale, and whether it
+is a candidate to upstream into MATLAB RAVEN.
+
+Categories:
+- **EFFICIENCY** — same behavior, faster/smarter implementation.
+- **ERGONOMICS** — same job, less friction / fewer foot-guns / clearer contract.
+- **NEW** — functionality RAVEN lacks but that fits naturally alongside what it already has.
+- **REMOVAL** — functionality that should be dropped (here *and* in MATLAB RAVEN) because it does
+  more harm than good.
+
+Status legend: 💡 proposed · 🔨 implemented in raven-python · ⬆️ upstreamed to MATLAB RAVEN ·
+🗑️ dropped (and to remove from MATLAB RAVEN)
+
+---
+
+## R-MetaCyc: drop MetaCyc reconstruction (REMOVAL — also remove from MATLAB RAVEN)
+
+**Decision 2026-05-24:** MetaCyc-based reconstruction is **not ported to raven-python** and should be
+**removed from MATLAB RAVEN**. Status: 🗑️.
+
+**What RAVEN does:** `getMetaCycModelForOrganism` builds a draft by BLAST/DIAMOND of the query
+proteome against `protseq.fsa` — MetaCyc's **single representative protein sequence per enzyme**
+(~11.6k sequences) — keeping each gene's best hit above a bitscore/positives cutoff and assigning
+the linked reaction. With one representative per enzyme there is no profile to tell true family
+members from look-alikes.
+
+**Evidence (this repo, real MetaCyc + KEGG 118 data):** a leave-organism-out precision/recall test
+(query each representative against the others, excluding its own organism; ground truth =
+MetaCyc's own MONOMER→reaction):
+
+| bitscore (ppos≥45) | reaction precision | EC-family precision | EC recall |
+|---|---|---|---|
+| 50 | 0.34 | 0.55 | 0.33 |
+| 100 *(RAVEN default)* | 0.36 | 0.59 | 0.32 |
+| 200 | 0.40 | 0.62 | 0.26 |
+| 300 | 0.44 | 0.65 | 0.22 |
+
+At the default cutoff **~64 % of reaction assignments are wrong** (~41 % wrong even at EC-family
+level); **no cutoff rescues precision** — tightening to bitscore 300 reaches only ~44 %/65 % while
+recall halves. Real proteomes (with non-enzyme decoys, not in this test) would be worse. Test
+scripts/artifacts: `/home/eduardk/metacyc_test/` (not committed).
+
+**Why drop rather than fix:** the low precision is intrinsic to MetaCyc's one-representative-per-
+enzyme data (can't build KEGG-quality HMMs from it). Accurate gene-calling already exists via KEGG
+HMMs (3b) and homology-to-template-models (3a). MetaCyc's genuine value (extra reactions/pathways/
+compound structures) does not justify a separate, data-heavy, low-precision track.
+
+**MATLAB RAVEN removal list** (`external/metacyc/`): `getMetaCycModelForOrganism.m`,
+`getModelFromMetaCyc.m`, `getRxnsFromMetaCyc.m`, `getMetsFromMetaCyc.m`, `getEnzymesFromMetaCyc.m`,
+`linkMetaCycKEGGRxns.m`, `addSpontaneousRxns.m`, and data `metaCycEnzymes.mat` / `metaCycMets.mat`
+/ `metaCycRxns.mat` / `protseq.fsa`; plus any `combineMetaCycKEGGModels` and MetaCyc references in
+tutorials/tests/docs. (`addSpontaneousRxns` could be kept as a small standalone helper if wanted —
+it is only incidentally in the MetaCyc folder.)
+
+---
+
+## getModelFromHomology (Phase 3a — implemented)
+
+Design + rationale in [docs/plan_get_model_from_homology.md](docs/plan_get_model_from_homology.md);
+implemented in `reconstruction/homology/homology.py`. *Logic* improvements over RAVEN's algorithm
+(RAVEN's own comments flag several of these spots as uncertain).
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| H1 | ERGONOMICS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | Split the overloaded `strictness` 1/2/3 into two orthogonal params: `bidirectional` (reciprocal hits) and `best_hits_only`. RBH = both true. `strictness=` kept as a compat alias. |
+| H2 | EFFICIENCY (robustness) | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | Rewrite GPRs on the **cobra GPR AST**, not `regexprep` string substitution — eliminates partial-match hazards and the `OLD_… or` regex cleanup pass RAVEN needs. |
+| H3 | ERGONOMICS (correctness) | raven-python 🔨 | 🔨 | Explicit `complex_policy` (default **`flag`** = RAVEN-compatible `OLD_`; plus `keep`/`drop`) for AND-subunits lacking an ortholog, via correct OR/AND **AST** semantics. |
+| H4 | (correctness) | both 🔨/💡 | 🔨 | Best-hit selection by **bitscore** (db-size-independent, the RBH standard); `score="evalue"` optional. |
+| H5 | EFFICIENCY | raven-python 🔨 | 🔨 | DataFrame ortholog map (pandas merge + dict) replaces `allGenes`/`allTo`/`allFrom` sparse-matrix `sub2ind` index juggling. |
+| H6 | NEW | raven-python 🔨 | 🔨 | Structured provenance: `HomologyResult.gene_map` + per-reaction `notes['homology_source']`. |
+
+## KEGG download / dump parsing / HMM build (Phase 3b.1 / 3b.2 / 3b.3 — implemented)
+
+`fetch_keggdb.sh` → `reconstruction/kegg/download.py` (3b.1); parsing core of
+`getRxnsFromKEGG` / `getMetsFromKEGG` / `getGenesFromKEGG` / `getModelFromKEGG`
+→ `reconstruction/kegg/parse.py` (3b.2); `constructMultiFasta` + the
+cluster/align/train stages of `getKEGGModelForOrganism` → `reconstruction/kegg/hmm.py`
+and `taxonomy.py` (3b.3). Maintainer-side, build-time tooling (PLAN.md §2.3b).
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| K1 | EFFICIENCY (robustness) | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | Read each reaction's equation from its **own `EQUATION` field**, not from `reaction.lst` matched by line order. RAVEN reads `reaction.lst` line *i* into reaction *i*, assuming the two files stay perfectly aligned — brittle. **MATLAB back-port:** parse the `EQUATION` field already present in `reaction`. |
+| K2 | ERGONOMICS (correctness) | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | Undefined-stoichiometry terms (`n C00001`, `(n+1) C00002`) keep their **real compound id** with coefficient 1 and the reaction is *flagged*, instead of minting `"n C00001"` pseudo-metabolites later renamed `undefined_N`. Cleaner metabolite graph; flag still drives the `keep*` filters. |
+| K3 | ERGONOMICS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | Reaction quality labels become a tidy boolean **`rxn_flags` table** (spontaneous/undefined-stoich/incomplete/general) instead of free-text appended to `rxnNotes`, so downstream filters join on a column rather than substring-matching notes. |
+| K4 | EFFICIENCY | raven-python 🔨 | 🔨 | **Gene-free reference model** + separate `organism_gene_ko` table (the big one), instead of RAVEN's giant `rxnGeneMat` baked into the global model. Per-organism GPRs are built only at runtime (3b.4/3b.5), keeping the published artefact small. |
+| K5 | EFFICIENCY (portability) | raven-python 🔨 | 🔨 | **KEGG download in pure Python stdlib** (`urllib`/`tarfile`/`gzip`/`netrc`), porting `fetch_keggdb.sh`. Drops the script's `wget`/`tar`/`gunzip` (and Cygwin-on-Windows) requirement, so it runs unchanged on Linux/macOS/Windows; tar extraction uses the `data` filter (no path traversal); same `~/.netrc` credential hygiene. The arrange step is split out (`extract_kegg_dump`) so it's network-free and unit-tested. |
+| K6 | EFFICIENCY | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Per-KO multi-FASTA via a stdlib offset index** (`_index_fasta` → seek), replacing `constructMultiFasta`'s Java-`Hashtable` byte scan with 5M-element preallocation. One streaming pass, only wanted ids retained; no MATLAB/Java heap tuning. |
+| K7 | EFFICIENCY | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Concatenate per-KO HMMs and `hmmpress` into one pressed library**, so the query path (3b.5) runs a single `hmmscan` against the database instead of RAVEN's thousands of per-KO `hmmsearch` invocations. |
+| K8 | EFFICIENCY (scope) | raven-python 🔨 | 🔨 | **Drop the `getPhylDist` distance matrix.** Its only uses in RAVEN were per-organism HMM-sequence subsampling (`maxPhylDist`/`nSequences`) and the kingdom filter. Our fixed prok90/euk90 libraries (3b.3) remove the subsampling rationale, and domain mode (3b.4) uses the taxonomy domain classification directly — so the O(n²) matrix is never built. Simpler, faster, less code. |
+| K9 | EFFICIENCY (memory) | raven-python 🔨 | 🔨 | **Stream `organism_gene_ko` to disk** in `parse_kegg_dump` instead of building it in memory. Real KEGG has **9.05M** gene↔KO associations; the in-memory DataFrame build OOMs in a few GB. Streaming (now via the external merge sort of K14) runs the full parse with flat, bounded peak memory. (Found by validating against a real KEGG FTP dump.) |
+| K10 | EFFICIENCY (size) | raven-python 🔨 | 🔨 | **Reference model as gzipped RAVEN/cobra YAML** (`reference_model.yml.gz`) rather than SBML: RAVEN-native, MATLAB-readable, and ~1.1 MB vs ~30 MB SBML for the real 12k-reaction model. Made `io/yaml.py` gzip-aware on a `.gz` suffix (general-purpose). |
+| K11 | ERGONOMICS | raven-python 🔨 | 🔨 | **`ensure_data`** (`data.py`): version-pinned registry that fetches/verifies/caches the published KEGG artefacts under `~/.cache/raven-python/data/`, mirroring `ensure_binary`. End users get a draft model with no KEGG access and no manual data handling — the `…_from_artefacts` entry points auto-fetch when no local dir is supplied. |
+| K12 | EFFICIENCY | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Fast MAFFT (FFT-NS-2) for HMM training** instead of RAVEN's `--auto`, which selects slow iterative refinement (`dvtditr`) on medium/large KOs — observed ~2.5 min/KO (days for a domain) on real KEGG 118. FFT-NS-2 (`--retree 2 --maxiterate 0`) is seconds/KO and ample for profile-HMM building. **PartTree cutover is residue-based and memory-auto-tuned**: MAFFT memory tracks residues (count × length), not sequence count, so a count threshold let long-protein KOs (K00901: 2,788 seqs, 2.55 M residues) OOM under FFT-NS-2 — measured ~5 GB MAFFT RSS with FFT-NS-2 vs **0.69 GB with PartTree** for the same alignment. The cutover is **length-aware and memory-auto-tuned**: FFT-NS-2 memory is driven by the progressive-alignment **DP cost ≈ n_seqs × mean_len²** (= residues²/n_seqs), *not* residue count — a few hundred long proteins cost far more than the same residues in many short ones. (First tried a residue-only model `RSS≈1.32R²+1.84R`; it then OOM'd on K12047 — 452 seqs but mean length 2082, 0.94 M residues — because long proteins blow the per-residue cost.) Calibrated `RSS_GB ≈ 4.2e-9 × (n_seqs × mean_len²)` across real KEGG KOs (250k/266→0.67 GB … 1.5M/1624→5.73 GB; K12047 cost 1.96e9 = the largest, hence its OOM). `_auto_cost_budget` switches to PartTree when the DP cost exceeds `0.65 × (total − 2.5 GB overhead) / 4.2e-9` (≈7.9e8 on a 7.6 GB box), **warns on low-memory hosts**, and `parttree_residues` overrides with a manual residue cutoff. Back-portable to RAVEN. |
+| K13 | EFFICIENCY | raven-python 🗑️ | 🗑️ | ~~Per-KO sequence cap (`max_sequences`)~~ — **removed.** Briefly added as a count-based cap, but the residue-based PartTree cutover (K12) bounds MAFFT memory without dropping any sequences, so the cap was redundant complexity. All deduplicated sequences are kept. |
+| K14 | EFFICIENCY (size) | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Sort `organism_gene_ko` by `(organism, gene)` and store it xz-compressed** (`organism_gene_ko.tsv.xz`), cutting the dominant artefact **≈78 → 27 MB (2.9×)**. Gene IDs within an organism share long prefixes (locus tags, numeric runs), so sorting makes them adjacent and far more compressible (sort alone: 78→48 MB; xz vs gzip captures the cross-row redundancy gzip's 32 KB window misses: →27 MB). The sort is an **external merge sort** bounded to `chunk_rows` rows in memory (sorted runs spooled to gzipped temp files, merged with `heapq.merge`), so it keeps K9's flat memory profile. Both `lzma` and `gzip` are Python stdlib (native on Windows/macOS/Linux, no extra binary); small tables stay gzipped TSV (MATLAB-native), only the big one is xz (MATLAB needs an external `unxz`). Sorted order also matches the by-organism query in `get_kegg_model_for_organism`, enabling a future `searchsorted` slice instead of loading all 9M rows. Back-portable to RAVEN. |
+| K15 | ERGONOMICS (correctness) | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Recalibrate the HMM-query KO-assignment defaults** (`assign_kos`): cut-off `1e-50 → 1e-30`, `min_score_ratio_g 0.8 → 0.9`; `min_score_ratio_ko` left at 0.3 but **documented as empirically inert**. Cross-validated the full 3b.5 pipeline against the true KEGG gene→KO annotation of four organisms across both libraries and the well-/lesser-studied axis — *S. cerevisiae*, *Cyanidioschyzon merolae* (red alga), *E. coli* K-12, *Mycoplasma genitalium* (minimal genome). Real annotations score overwhelmingly (median E ≈ 1e-100…1e-155; even the weakest 1% ≈ 1e-15…1e-36) while spurious hits cluster at ≈1e-8 — a ~20-order-of-magnitude gap. RAVEN's `1e-50` therefore sits **inside the true-positive tail** and silently drops real-but-divergent hits for no noise-rejection gain: gene→KO recall on *M. genitalium* was only 0.84 (reaction recall 0.87). At `1e-30` + `ratio_g=0.9`: *M. genitalium* recall **0.84→0.94** (rxn 0.87→0.97), *E. coli* 0.95→0.97 with **fewer** unannotated reactions (198→173, the tighter gene-ratio prunes spurious multi-KO genes), *S. cerevisiae*/*C. merolae* held or improved. The three sweep tables showed `min_score_ratio_ko` produced identical output at 0.0/0.3/0.5 across all four organisms — a magic-number knob that does nothing; `min_score_ratio_g` is the real precision lever. Full numbers in [docs/kegg_hmm_cutoff_calibration.md](docs/kegg_hmm_cutoff_calibration.md) (reproduce with `scripts/analyze_hmm_cutoffs.py`). Back-portable to RAVEN. |
+
+## FSEOF (Phase 5 — implemented, redesigned)
+
+RAVEN `core/FSEOF.m` → `analysis/fseof.py` (`fseof`). User was unhappy with RAVEN's
+output; redesigned substantially.
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| FS1 | CORRECTNESS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Robust trend via linear regression** (slope + correlation) over the whole scan, instead of RAVEN's strict step-by-step monotonicity that discards a target on a single noisy step (LP alternative optima). |r| is a quality score for filtering. pFBA per step keeps the scan stable. |
+| FS2 | NEW | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Reports knockdown/knockout targets**, not just amplification. RAVEN only flags reactions whose \|flux\| *rises* with the enforced product; reactions driven *toward zero* — the down-regulation/deletion candidates, arguably the most actionable — are classified here (`knockdown`/`knockout`). |
+| FS3 | ERGONOMICS | raven-python 🔨 | 🔨 | **Gene-level aggregation** (`gene_targets`) mapping reaction targets to genes, plus the **full flux scan** retained — all as DataFrames, vs RAVEN's printed TSV + endpoint-only slope. |
+| FS4 | CORRECTNESS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | Slope is the **regression slope** consistent with the selection criterion, not RAVEN's endpoint difference that disagreed with its own monotonicity filter. **Proposed back-port:** record the enforced target-flux level at each iteration into a vector and replace both the reported slope `abs(fseof.results(num,iterations) - fseof.results(num,1)) / abs(targetMax - targetMax/iterations)` and the `targets.slope` field with `polyfit(enforcedFlux, fseof.results(num,:), 1)` slope across all iterations. The label classification (amplify/knockdown/knockout, see FS2) should also use the slope of `|flux|` instead of comparing endpoints. |
+
+## randomSampling (Phase 5 — implemented)
+
+RAVEN `core/randomSampling.m` → `analysis/sampling.py` (`random_sampling`). The
+random-objective / extreme-point method of Bordel et al. (2010) — **not** what
+`cobra.sampling` (OptGP/ACHR) does (those draw a near-uniform MCMC sample of the
+polytope interior), so it is a genuine addition, and it was wrongly listed as
+cobra-covered in the PLAN cheatsheet. Each sample maximises a small random linear
+combination of reactions.
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| SAMP1 | EFFICIENCY | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **`good_reactions` (loop-free, flux-carrying objective candidates) via one `cobra` FVA pass** rather than RAVEN's hand-rolled per-reaction `parfor` that solves a separate LP maximising and minimising every reaction. FVA computes the same min/max ranges, optimised and optionally `loopless` (`cycleFreeFlux`), in far less code. |
+| SAMP2 | ERGONOMICS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Reproducible** (`seed`) and **`n_objectives` is a parameter** — RAVEN has no seed and hard-codes 2 objective reactions (its docstring even claims 3, a transcription bug worth fixing upstream). |
+| SAMP3 | ERGONOMICS | raven-python 🔨 | 🔨 | **Output is a samples × reactions DataFrame** (the `cobra.sampling` layout, directly usable with pandas/`analyzeSampling`-style stats) plus the reusable `good_reactions` list — instead of a reactions × samples matrix and a parallel `goodRxns` index vector that the caller must re-thread. |
+| SAMP4 | CORRECTNESS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **`replace_max_bound` (RAVEN's `replaceBoundsWithInf`) defaults off and is scoped to sampling only.** It applies *after* `good_reactions` is found (FVA cannot evaluate `inf` bounds — it errors as 'unbounded'), and it can open unbounded loop directions; documented to pair with `min_flux`. RAVEN ran goodRxns detection on the inf-replaced model, conflating "loop reaction" with "unbounded objective". |
+
+## reporterMetabolites (Phase 5 — implemented)
+
+RAVEN `core/reporterMetabolites.m` → `analysis/reporter.py` (`reporter_metabolites`).
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| RM1 | EFFICIENCY + CORRECTNESS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Exact closed-form background correction** instead of RAVEN's 100 000-random-set Monte Carlo *per neighbour-count*. RAVEN samples with replacement from the scored-gene Z pool, so a random aggregate `Σz/√n` provably has mean `√n·μ` and std `σ` — the corrected score is exactly `(metZ − √n·μ)/σ`. Removes the slow sampling **and** its run-to-run randomness (deterministic results); back-portable to RAVEN. |
+| RM2 | ERGONOMICS | raven-python 🔨 | 🔨 | Returns a sorted **DataFrame** per test (`all`/`up`/`down`) and takes gene→p-value / gene→fold-change **dicts**, vs RAVEN's parallel arrays + struct array + print/file side-effects. Neighbour genes come from cobra's metabolite→reaction→gene graph (no `rxnGeneMat`). |
+
+## runINIT (Phase 4c — MILP core implemented)
+
+RAVEN `INIT/runINIT.m` → `init/init.py` (`run_init`).
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| I1 | ERGONOMICS | raven-python 🔨 | 🔨 | **Clean optlang reformulation** of the INIT MILP instead of RAVEN's hand-built sparse `prob.A`/`blc`/`buc`/`vartype` arrays + fake "FAKEFORPM" metabolites. Standard include-indicator form `eps·x ≤ v ≤ ub·x` with objective `max Σ score·x + prod_weight·Σ sink`. Far more readable/reviewable; functional equivalence is the bar (PLAN §0). |
+| I2 | ERGONOMICS | raven-python 🔨 | 🔨 | **`no_rev_loops` as a single `x_fwd + x_rev ≤ 1`** per reversible reaction, replacing RAVEN's auxiliary A/B/C metabolites with int1/int2 reactions and `C ub=-1` construction. Same effect (no spurious forward/back connectivity loop), a fraction of the machinery. |
+| I3 | ERGONOMICS | raven-python 🔨 | 🔨 | **`present_mets` producibility via a small LP feasibility test** (sum of compartment-form drains ≥ 1), instead of mutating the live MILP's RHS one metabolite at a time. |
+| I4 | CORRECTNESS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **MILP big-M is each reaction's own `ub`** (`v ≤ ub·x`), not RAVEN's fixed 1000; and `eps`/`prod_weight` are exposed parameters. RAVEN's hard-coded 1/1000/0.1/0.5 only suit ±1000-bounded models with O(1) scores — flagged as scale-dependent and tunable (don't blindly trust them). |
+| I5 | ERGONOMICS | raven-python 🔨 | 🔨 | **Predictor-agnostic scoring**: `get_init_model` takes gene *or* reaction scores; gene scoring is generic (`gene_scores_from_expression` for the common RNA-seq path), so single-cell/HPA are just alternative upstream sources feeding the same gene→score table — rather than RAVEN's HPA/array-specific structs baked into `getINITModel`. |
+
+## ftINIT (Phase 4d — in progress)
+
+RAVEN `INIT/ftINITInternalAlg.m` (+ orchestration) → `init/ftinit.py` (`run_ftinit`),
+`tasks/check.py` (`find_task_essential_reactions`). See docs/ftinit_review_and_plan.md.
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| FT1 | EFFICIENCY | raven-python 🔨 | 🔨 | **Essential-reaction discovery via one FVA pass** (`find_task_essential_reactions`) — a reaction is essential for a task iff its FVA range excludes 0 (= RAVEN's constrain-to-0→infeasible), restricted to the flux-carrying candidates of a `pfba` solution. Replaces `getEssentialRxns`' per-reaction knockout loop. |
+| FT2 | ERGONOMICS | raven-python 🔨 | 🔨 | **Clean optlang reformulation** of the 6-category MILP instead of RAVEN's hand-built block `prob.a` with the `pi/ni/ei/vprb/vnrvm…` figure. Positive-score reactions keep the continuous-indicator trick (no binary — the ftINIT speedup), encoded as `net_flux ≥ force_on·y`; only negative scores and reversible-direction get binaries. |
+| FT3 | CORRECTNESS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Big-M is each reaction's own bound**, not RAVEN's fixed 100/1000; `force_on`/`force_on_ess` exposed. Scale-dependent, calibrated in 4d.7 (cf. I4). |
+| FT4 | (caveat) | RAVEN parity | — | **No loopless constraint** (matches RAVEN): the bare MILP can "include" an internal thermodynamically-infeasible cycle if it carries positive net score. Loop-free models rely on the staged pipeline + exchange handling and, at genome scale, real exchanges making cycles non-optimal. A loopless option could be added later. |
+| FT9 | CORRECTNESS (robustness) | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Per-reaction essential forcing, clamped to capacity** (4d.7 calibration). Staging fixes previous-step reactions as essential; RAVEN forces them at `min(0.99·\|prev flux\|, 0.1)` so none is forced above the flux it carried (`essential_force`). On top, the forced magnitude is **clamped to the reaction's bound** so a low-capacity essential never produces an `lb>ub` error — it is forced to its capacity instead (RAVEN would error). |
+| FT-met | (deferred) | — | ⬜ | **Metabolomics production bonus (4d.6) deferred.** The linear merge eliminates degree-2 detected metabolites, so it needs RAVEN's producer-group-mapping + `mon`/`vnrbm`/`vnrvm`/`vnim` negative-producer force-flux block — the most intricate MILP in ftINIT, for its least-used input. `ftinit(metabolomics=…)` raises `NotImplementedError`. |
+| FT8 | ERGONOMICS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **`remove_low_score_genes`** (port of `removeLowScoreGenes`): prune negative-scoring genes from GPRs via a recursive walk of **cobra's GPR AST** — isozymes (OR) dropped keeping ≥1 (least-negative if all negative), complex subunits (AND) kept intact, nested groups handled — instead of RAVEN's regex subset-extraction with `#n#` placeholders. The all-negative tie-break is **deterministic** (reproducible) vs RAVEN's random. Wired into `ftinit(gene_scores=…)` (with `fill_tasks` for gap-filling) to complete the pipeline. Matches RAVEN's three docstring example cases. |
+| FT7 | EFFICIENCY | raven-python 🔨 | 🔨 | **Task gap-filling** (`fill_tasks`, port of `ftINITFillGapsForAllTasks` + the fill MILP): per task, only if it is **infeasible** in the current model (a cheap LP gates the MILP), add the minimum-cost set of reference reactions (cost = `−min(score, −0.1)`) to satisfy the task's ranged metabolite bounds; accumulate across tasks; exchange reactions excluded as candidates. Built on the shared `apply_task_constraints` (factored out of `check_tasks`) + an optlang on/off MILP, instead of RAVEN's hand-built `[S pos neg int b var]` block and the custom `rxnScores`-field-on-the-model hack. Matches RAVEN tinitTests T0003 (gap at R7 → R7 added back). |
+| FT6 | ERGONOMICS | raven-python 🔨 | 🔨 | **Staged pipeline** (`prep_init_model` → `PrepData`, `get_init_steps`/`InitStep`, `ftinit`): one-time reaction classification (`classify_reactions` = the `toIgnore` masks) + essential discovery + linear merge bundled into `PrepData`, reused per sample; the staged `'1+1'`/`'2+1'`/`full` schedule runs `run_ftinit` per step with previous-step reactions fixed as essential **in their flux direction** (carried as `essential_directions`, no model-flipping). RAVEN's Gurobi-specific per-step MIPGap retry schedule is dropped (solver-agnostic; tuning → 4d.7). Matches RAVEN tinitTests T0001 (no tasks → {R1,R4,R6,R8,R9,R10}; with R7/R10 spontaneous → {R1,R2,R4,R6,R7,R8}) and T0002 (task → essentials {R1,R7}, model {R1,R2,R4,R6,R7,R8,R9,R10}). |
+| FT5 | EFFICIENCY | raven-python 🔨 | 🔨 | **Linear merge** (`merge_linear` + `group_rxn_scores`, port of `mergeLinear`/`groupRxnScores`): contracts degree-2-metabolite reaction chains (one producer, one consumer; reversibles included) into single reactions, losslessly shrinking the MILP (~⅓ fewer reactions on Human-GEM). Clean Python on a working representation with reversibility ≡ `lb<0` (RAVEN's `rev1·rev2` falls out of the most-constraining bound recompute); drops genes and objective on the reduced model (it exists only to feed the MILP, which scores via `group_rxn_scores`). Matches RAVEN's tinitTests T0004 exactly (group ids, merged bounds/reversibility, flipped reactions, grouped scores incl. the 0→0.01 handling). |
+
+## parseTaskList / checkTasks (Phase 4a — implemented)
+
+RAVEN `core/parseTaskList.m` + `core/checkTasks.m` → `tasks/tasklist.py` + `tasks/check.py`.
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| T1 | ERGONOMICS | raven-python 🔨 | 🔨 | **Structured `Task` dataclass + `TaskResult`** instead of RAVEN's parallel-array struct and a printed report; programmatic access to per-task pass/fail/feasibility/error. |
+| T2 | ERGONOMICS | raven-python 🔨 | 🔨 | **TSV-first task files** (stdlib `csv`); `.xlsx` still supported but via the lazy `[excel]` extra — no hard Excel dependency just to read a task list. |
+| T3 | EFFICIENCY | raven-python 🔨 | 🔨 | Inputs/outputs imposed directly on cobra's **metabolite mass-balance constraint bounds** (the analogue of RAVEN's two-column `model.b`), and existing boundary reactions are auto-closed — so a model with open exchanges is handled correctly (RAVEN assumes a closed model and silently misbehaves otherwise). |
+
+## fillGaps (Phase 4b — implemented)
+
+RAVEN `core/fillGaps.m`. Only the **connectivity** mode is ported, as
+`connect_blocked_reactions` ([gapfilling/fill.py](src/raven_python/gapfilling/fill.py)) —
+MILP via cobra/optlang (GLPK). RAVEN's other mode (fill to make the objective feasible)
+is `cobra.flux_analysis.gapfill` and is **cheatsheeted, not re-wrapped** (PLAN §1).
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| GF1 | NEW (vs cobra) | raven-python 🔨 | 🔨 | **Connectivity gap-fill has no cobra equivalent**: add the minimum-penalty set of template reactions so *blocked* draft reactions can carry flux (cobra's `gapfill` only fills toward the objective). Ported as `connect_blocked_reactions` — a name that avoids confusion with `cobra.gapfill` and says what it does, vs RAVEN's overloaded `fillGaps(useModelConstraints=...)` boolean. |
+| GF2 | ERGONOMICS | raven-python 🔨 | 🔨 | **Templates matched by `name[comp]`** (via `add_reactions_from_model`), so a template in a different identifier namespace than the draft still contributes — as RAVEN's name-based merge does. (For the targeted `cobra.gapfill` path, ids must be aligned first, since cobra matches by id — noted in the cheatsheet.) |
+
+## addRxns
+
+RAVEN `core/addRxns.m` — add reactions from equation strings (or mets+coeffs), auto-creating
+metabolites/genes. Ported as `add_reactions_from_equations`
+([manipulation/add.py](src/raven_python/manipulation/add.py)).
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| A1 | ERGONOMICS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Readable matching mode instead of the `eqnType` integer.** RAVEN takes `eqnType=1/2/3` (match by id / by name in a given compartment / `name[comp]`), which is opaque at call sites. raven-python uses `mets_by="id"\|"name"` and auto-detects `name[comp]` per token. **MATLAB back-port:** accept a string keyword. |
+| A2 | ERGONOMICS (bug-class) | raven-python 🔨 | 🔨 | **Error on duplicate reaction IDs explicitly.** RAVEN errors; cobra's `add_reactions` *silently ignores* a duplicate. raven-python keeps RAVEN's stricter behaviour (raise) rather than cobra's silent drop. |
+| A3 | EFFICIENCY (reuse) | raven-python 🔨 | 🔨 | **Delegate equation/arrow/coefficient parsing and gene/met creation to cobra** (`build_reaction_from_string` semantics, GPR auto-creation) instead of re-implementing RAVEN's `constructS`/`addGenesRaven`. Only the genuinely cobra-absent pieces (name matching, compartment for new mets, strict policies) are hand-written. |
+| A4 | NEW | both 💡 | 💡 | **Infer compartment from a structured metabolite ID** (e.g. `atp_c` → `c`) as an alternative to requiring `compartment`. Not yet implemented; would reduce boilerplate for SBML-style IDs. Revisit alongside `addMets`. |
+
+## changeGrRules
+
+Ported as `change_gene_reaction_rules` ([manipulation/change.py](src/raven_python/manipulation/change.py)).
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| G8 | EFFICIENCY (reuse) | raven-python 🔨 | 🔨 | **changeGrRules: delegate gene creation + normalization to cobra.** RAVEN calls `getGenesFromGrRules` + `addGenesRaven` + `standardizeGrRules` + rebuilds `rxnGeneMat`; cobra does all of that automatically on `gene_reaction_rule =`. The port keeps only the batch loop and the append (`(old) or (new)`) option. |
+
+## simplifyModel
+
+Gap modes ported in [manipulation/simplify.py](src/raven_python/manipulation/simplify.py).
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| S3 | EFFICIENCY (scope) | raven-python 🔨 | 🔨 | **Only the cobra-absent modes ported as focused functions**, not a monolithic 8-flag `simplifyModel`. `deleteMinMax`→`find_blocked_reactions`, `deleteZeroInterval`→filter+prune, `deleteUnconstrained`→moot are cheatsheeted. dead-end / duplicate / constrain-reversible / group-linear are standalone, composable functions. |
+
+## mergeModels
+
+Ported as `merge_models` ([manipulation/merge.py](src/raven_python/manipulation/merge.py)).
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| M1 | EFFICIENCY (scope) | raven-python 🔨 | 🔨 | **~560 lines of struct field-padding + manual S-matrix assembly dropped.** On `cobra.Model` the merge is just: unify metabolites by `name[comp]`, re-add reactions remapped to the merged metabolites, let cobra rebuild S and create genes. |
+| M2 | ERGONOMICS | raven-python 🔨 | 🔨 | **Provenance via `notes['origin']`** (one place) instead of three parallel `rxnFrom`/`metFrom`/`geneFrom` fields. `match_by="name"|"id"` keyword replaces RAVEN's `metParam` string. |
+
+## checkModelStruct
+
+Ported (curation subset) as `check_model` ([utils/validate.py](src/raven_python/utils/validate.py)).
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| V1 | EFFICIENCY (scope) | raven-python 🔨 | 🔨 | **Drop the struct/type/duplicate-ID/`lb>ub`/`rev` checks** — cobra's object model enforces or precludes them (DictList forbids duplicate IDs, Reaction rejects `lb>ub`, no `rev` field). Only the curation checks cobra lacks survive. |
+| V2 | ERGONOMICS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Return structured `ModelIssue`s, not printed warnings** (RAVEN prints / throws). Programmatically filterable by `category`. **MATLAB back-port:** return an issues struct array. |
+
+## setParam / getElementalBalance
+
+Ported as `set_parameters` ([manipulation/parameters.py](src/raven_python/manipulation/parameters.py))
+and `get_elemental_balance` ([utils/balance.py](src/raven_python/utils/balance.py)).
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| ~~P1~~ | ERGONOMICS | — | ↩ revised | A 6-mode keyword `set_parameters` was built then **trimmed** (review: not Pythonic — it re-wrapped cobra one-liners for `lb`/`ub`/`eq`/`obj`/`unc`). Only the `var` ±% band, which cobra has no idiom for, is kept as `set_variance_bounds`; the rest are documented as cobra idioms in the §1 cheatsheet. |
+| B1 | ERGONOMICS (correctness) | raven-python 🔨 | 🔨 | **getElementalBalance: report `unknown` for missing formulas.** cobra's `check_mass_balance` silently treats a metabolite with no formula as contributing nothing, so the reaction can read as (un)balanced on incomplete data. raven-python flags those as `unknown` rather than guessing — preserving RAVEN's distinction (its `-1` status). |
+| B2 | CORRECTNESS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Empty-stoichiometry reactions report `unknown`, not vacuous `balanced`.** A reaction with no metabolites used to fall through to `balanced` (any-over-empty is False; the zero-imbalance dict is empty). Same bug in MATLAB `core/getElementalBalance.m`: with no entries in `model.S(:,j)`, `balanceStatus` stays NaN through both loops and the final `isnan→1` step labels it `balanced`. **Proposed back-port:** add an `emptyRxns = full(sum(model.S~=0,1))==0` mask and `balanceStatus(emptyRxns) = min(-1, balanceStatus(emptyRxns))` before the `isnan→1` line, so empty reactions are marked "missing information". |
+
+## getRxnsInComp / getMetsInComp — not ported
+
+Briefly ported, then **removed** (user review): too thin over cobra (`metabolite.compartment` /
+`reaction.compartments` one-liners). Mapped in the §1 migration cheatsheet instead. Reconsider only
+if a downstream consumer needs the `include_partial` (fully-contained vs touching) distinction in
+several places — and ask before re-adding (see process note: argue pros/cons for marginal WRAPs).
+
+Ported as `remove_metabolites` / `remove_genes`
+([manipulation/remove.py](src/raven_python/manipulation/remove.py)). `removeReactions` was **not**
+ported: with orphan cleanup kept coupled (decision: don't separate metabolites from genes), it is
+identical to `cobra.Model.remove_reactions(remove_orphans=...)`.
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| ~~R1~~ | ERGONOMICS | — | ❌ rejected | Separable orphan-met vs orphan-gene cleanup was considered, then **dropped** by decision — keep them coupled like cobra. (Removed the `remove_reactions` wrapper entirely as a result.) |
+| R2 | EFFICIENCY (reuse) | raven-python 🔨 | 🔨 | **GPR rewriting delegated to cobra's AST**, not RAVEN's `eval` of a `&&`/`\|\|`-substituted rule string. cobra's `remove_genes` already gives correct AND/OR semantics (removing one gene of `A and B` empties the rule; of `A or B` keeps the other). **MATLAB back-port:** replace `canRxnCarryFlux`'s `eval` with a parsed boolean tree (safer, no eval). |
+| R3 | ERGONOMICS | raven-python 🔨 | 🔨 | **`blocked_reactions` policy as a clear keyword** (`remove`/`constrain`/`keep`) instead of RAVEN's `removeBlockedRxns` boolean — and `keep` (rewrite GPR, leave bounds) is a third option RAVEN lacks. |
+| R4 | (review) | raven-python ⚠️ | 💡 | **`remove_metabolites` is a deletion candidate.** Its only value over cobra is `by_name` cross-compartment deletion, likely rarely used; revisit and possibly drop the wrapper. |
+
+## readYAMLmodel / writeYAMLmodel
+
+RAVEN `io/readYAMLmodel.m` + `writeYAMLmodel.m` (+ private legacy parser). Ported as
+`read_yaml_model`/`write_yaml_model` ([io/yaml.py](src/raven_python/io/yaml.py)).
+
+**Lens correction (no separate legacy parser).** RAVEN ships a 462-line `parseYAMLLegacy.m` for the
+`!!omap` dialect, and geckopy refuses it ("re-save from MATLAB"). But `!!omap` is **cobra's own YAML
+format**: `cobra.io.load_yaml_model` reads a real yeast-GEM.yml (4102 rxns) directly. So the
+raven-python-unique capability the PLAN imagined (a legacy reader) is unnecessary; the real cobra-absent
+value is preserving `metaData` identity and RAVEN-only per-entry fields, which is what was built.
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| Y1 | EFFICIENCY (scope) | raven-python 🔨 | 🔨 | **Drop the bespoke legacy YAML parser; delegate to cobra's `!!omap` loader.** ~460 lines of RAVEN parsing not reimplemented. raven-python reads old RAVEN/Human-GEM YAML in pure Python with no MATLAB needed (geckopy can't — it tells users to re-save from MATLAB). |
+| Y2 | ERGONOMICS (data loss) | raven-python 🔨 | 🔨 | **Don't silently drop model identity/provenance or RAVEN-only fields.** A plain `cobra.io.load_yaml_model` of a RAVEN file yields `model.id is None` and discards `smiles`/`deltaG`/`confidence_score`/etc. raven-python preserves them. **Routed by meaning** (not blindly to notes): chemical-structure identifiers `smiles`/`inchis` → cobra `annotation` (the MIRIAM-style store other tools read); genuinely non-standard data (`deltaG`, `confidence_score`, `metFrom`/`rxnFrom`, `protein`) → `notes`. Not invented as attributes (`met.deltaG`), since cobra only persists `annotation`/`notes` through copy/SBML/JSON/YAML. |
+| Y4 | NEW | both 💡 | 💡 | **Upstream candidate: a first-class thermodynamics/confidence field.** `deltaG` and `confidence_score` live in `notes` because neither cobra nor SBML core has a home; if a standard slot (e.g. SBML fbc/groups or a cobra attribute) emerges, migrate them there. Also applies to MATLAB RAVEN's `metDeltaG`/`rxnConfidenceScores` consistency. |
+| Y3 | NEW | raven-python 🔨 | 🔨 | **Emit cobra-native `!!omap` output** (via cobra's own dumper) — done, matching RAVEN `fa281a1`. Verified `cobra.io.load_yaml_model` reads the output. |
+| Y5 | ERGONOMICS (correctness) | raven-python 🔨 | 🔨 | **Field placement realigned to `fa281a1`:** `smiles`/`ec-code` are in the cobra-owned `annotation` block (not top-level), `inchis` is top-level, and the top-level `notes` *string* (metNotes/rxnNotes) is handled rather than crashing a notes-as-dict assumption. |
+
+## changeRxns
+
+RAVEN `core/changeRxns.m` — change reaction equations. Ported as
+`change_reaction_equations` ([manipulation/change.py](src/raven_python/manipulation/change.py)).
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| C1 | EFFICIENCY | raven-python 🔨 | 🔨 | **Edit the reaction in place** instead of RAVEN's copy-all-fields → `removeReactions` → `addRxns` → `permuteModel` round-trip. cobra mutates the same `Reaction` object, so every other field and the model order are preserved for free, with no O(n) re-sort. (Not a MATLAB back-port — the round-trip is inherent to the struct layout there.) |
+
+## getIndexes
+
+RAVEN `core/getIndexes.m` — resolve a list of IDs / logical mask / index vector into positional
+indexes (or a logical array) for `rxns` / `mets` / `genes` / `metNames` / `metcomps` (and GECKO
+`ec.*` fields).
+
+**Decision (raven-python): do NOT port the function.** cobra is object-oriented, so the central
+index-resolver that RAVEN's struct-of-parallel-arrays design requires is largely unnecessary.
+cobra's `DictList` already covers the use cases more idiomatically — `get_by_any` (mixed
+id/object/index → objects), `get_by_id` (O(1)), `query` (name/substring/regex), `index` (position),
+list comprehensions for filtering. Porting a 1-based-index resolver would be redundant and
+un-Pythonic. **Only** the `name[comp]` composite resolver is kept (G7), as a small internal helper.
+
+The improvement insights below still hold for **MATLAB RAVEN**, where the function remains — flagged
+as upstream-only back-port candidates.
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| G1 | EFFICIENCY | MATLAB RAVEN only | 💡 | **Hash-based lookup instead of per-query linear scan.** RAVEN loops `find(strcmp(obj(i), searchIn))` per query → O(n·m). Build a `containers.Map` `{id: position}` once (O(n)), look up in O(1); `metcomps` likewise. (Moot for raven-python — cobra's `DictList` is already hashed.) |
+| G5 | ERGONOMICS (bug) | MATLAB RAVEN only | 💡 | **Disambiguate the `[1 1 1]` mask-vs-index bug.** RAVEN's `if all(objects)` conflates a logical all-true mask with the index vector `[1 1 1]` (its own comment: "This gets weird if it's all 1"). Test `islogical(objects)` explicitly instead of `all(objects)`. (Moot for raven-python — input kind is decided by dtype.) |
+| G7 | NEW | raven-python helper | 💡 | **Extract a reusable `name[comp]` parser/resolver.** The composite-id parsing buried in `getIndexes`'s `metcomps` branch is the one capability cobra lacks. Expose as a standalone `parse_name_comp` / `resolve_metabolite_by_name_comp`, reused by `addRxns`/`addTransport`/`mergeModels`. This is the only piece carried into raven-python. |
+
+**Obsoleted by cobra (no action — these were earlier raven-python proposals now covered by `DictList`):**
+predictable return type, return objects-not-positions, configurable missing-object policy across a
+batch, and substring/regex matching — all already provided by `get_by_any` / `get_by_id` / `query`.
+
+---
+
+## standardizeGrRules
+
+RAVEN `core/standardizeGrRules.m` — normalize grRule syntax + flag rules not in simple
+OR-of-AND-complex (DNF) form (`findPotentialErrors`).
+
+**Decision (raven-python): port the lint half only.** cobra auto-normalizes a GPR on assignment
+(`"(G1 AND G2)  OR  G3"` is stored as `"(G1 and G2) or G3"`), so the normalization half is
+redundant. The non-DNF lint has no cobra equivalent and was ported as `find_non_dnf_grrules`/`is_dnf`
+([utils/gpr.py](src/raven_python/utils/gpr.py)).
+
+| # | Cat | Target | Status | Improvement |
+|---|---|---|---|---|
+| S1 | ERGONOMICS | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Return structured lint results, don't just print.** RAVEN's `findPotentialErrors` only emits a `warning()` string; you can't act on it programmatically. raven-python returns a list of `GPRIssue(reaction_id, gpr, reason)`. **MATLAB back-port:** return the `indexes2check`/messages as a struct array (it already computes `indexes2check` — just surface it cleanly instead of only warning). |
+| S2 | EFFICIENCY (robustness) | raven-python 🔨 + MATLAB RAVEN 💡 | 🔨 | **Detect non-DNF via the boolean AST, not substring search.** RAVEN scans for the `) and (`, `) and`, `and (` substrings, which is brittle (sensitive to spacing/bracketing and to gene IDs containing those characters). raven-python walks cobra's GPR AST (`is_dnf`: no OR beneath any AND), which is exact. **MATLAB back-port:** parse the rule to a tree rather than string-matching. |
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..d39ae69
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,52 @@
+# Documentation
+
+Start with the [top-level README](../README.md). The docs are organised as:
+
+## RAVEN ↔ raven-python reference
+
+* **[raven_migration.md](raven_migration.md)** — function-by-function map from MATLAB RAVEN
+  to raven-python (and cobrapy where appropriate). Read this if you're coming from RAVEN.
+* **[../IMPROVEMENTS.md](../IMPROVEMENTS.md)** — improvements raven-python makes over RAVEN that
+  are also candidates to back-port into the MATLAB toolbox.
+
+## Open work
+
+* **[todo.md](todo.md)** — what's still on the books.
+* **[known_issues.md](known_issues.md)** — low-priority backlog from the full-codebase
+  review. None affects correctness on well-formed inputs.
+
+## Empirical studies & calibrations
+
+* **[humangem_validation.md](humangem_validation.md)** — raven-python ftINIT vs MATLAB RAVEN on
+  5 Hart2015 cell lines (Jaccard 0.975–0.980).
+* **[init_param_calibration.md](init_param_calibration.md)** — clean-data calibration +
+  input-robustness study for (f)tINIT (sweeps of `mip_gap` / `big_m` / `force_on` / `eps` /
+  `prod_weight` / scaling; dropout / noise / downsample robustness).
+* **[init_solver_benchmark.md](init_solver_benchmark.md)** — Gurobi vs HiGHS vs GLPK on
+  genome-scale ftINIT.
+* **[yeast_localization_benchmark.md](yeast_localization_benchmark.md)** —
+  `predict_localization` against curated yeast-GEM, with a predictor-noise sweep
+  (accuracy 0.72 → 0.39 as confident mis-scoring rises from 0 % to 50 %).
+* **[kegg_hmm_cutoff_calibration.md](kegg_hmm_cutoff_calibration.md)** — HMM E-value /
+  score-ratio sensitivity for the KEGG HMM-query reconstruction path.
+
+## Data formats & maintenance
+
+* **[kegg_data_format.md](kegg_data_format.md)** — layout of the KEGG artefact bundle.
+* **[maintaining_kegg_data.md](maintaining_kegg_data.md)** — building and publishing the
+  KEGG artefact releases.
+* **[maintaining_binaries.md](maintaining_binaries.md)** — building and publishing the
+  external-binary (BLAST/DIAMOND/HMMER) ZIP releases.
+
+## Archive
+
+Historical design notes (preserved for reference but no longer part of the user-facing
+documentation):
+
+* **[archive/ftinit_review_and_plan.md](archive/ftinit_review_and_plan.md)** — the
+  pre-implementation critical review of `ftINIT` that drove the Phase 4d port plan.
+* **[archive/localization_design.md](archive/localization_design.md)** — the
+  pre-implementation critical review of `predictLocalization` that drove the Phase 7
+  redesign (caller-passed relocate set, MILP instead of SA, partial-update mode).
+* **[archive/plan_get_model_from_homology.md](archive/plan_get_model_from_homology.md)** —
+  early planning notes for the homology-based reconstruction.
diff --git a/docs/archive/ftinit_review_and_plan.md b/docs/archive/ftinit_review_and_plan.md
new file mode 100644
index 0000000..63e0d66
--- /dev/null
+++ b/docs/archive/ftinit_review_and_plan.md
@@ -0,0 +1,184 @@
+# ftINIT (Phase 4d) — critical review and implementation plan
+
+Critical read of RAVEN's ftINIT before porting (memory: *review very critically, don't
+transcribe*). Covers `ftINIT.m`, `ftINITInternalAlg.m`, `getINITSteps.m`,
+`INITStepDesc.m`, `prepINITModel.m`, `mergeLinear.m`, `groupRxnScores.m`,
+`rescaleModelForINIT.m`, `ftINITFillGaps*.m` (~3000 lines).
+
+## 1. What ftINIT is, vs the tINIT we already have
+
+tINIT (Phase 4c, `init/`) solves **one** MILP: pick the reaction subset that best
+matches expression scores while staying flux-consistent. We ported its core as
+`run_init` (+ `get_init_model`). ftINIT ("fast tINIT") keeps the *objective* but
+attacks the **runtime** of the MILP on genome-scale models (Human-GEM: ~12k rxns)
+with four ideas:
+
+1. **One-time preprocessing (`prepINITModel` → `prepData`).** All omics-independent
+   work — simplification, essential-reaction discovery via tasks, reaction
+   classification, linear merging, scaling — is done once per *template* model and
+   reused across every sample. (RAVEN comment: prep is ~1 h; each model is then fast.)
+2. **Linear merging (`mergeLinear`).** Reactions connected through a degree-2
+   metabolite (one producer, one consumer) are contracted into one, summing scores.
+   Human-GEM 11888 → 7922 rxns — a ~33 % smaller MILP, losslessly.
+3. **Staged MILP (`getINITSteps`, default `'1+1'`).** Instead of one big MILP, run
+   two smaller ones: step 1 decides only GPR-associated reactions (transport/
+   exchange/spontaneous/no-GPR *ignored* = left in, not in the problem); step 2 fixes
+   step-1 reactions as **essential** and decides the remaining no-GPR reactions. The
+   `'full'` series is a single MILP ≈ classic tINIT.
+4. **A cheaper MILP formulation (`ftINITInternalAlg`).** Positive-score reactions use
+   a **continuous** on-indicator instead of a binary (see §3) — roughly halving the
+   integer count, the dominant cost.
+
+After the MILP: map merged groups back to original reactions, delete the rest,
+**gap-fill so all metabolic tasks are feasible** (`ftINITFillGapsForAllTasks`),
+re-add exchanges, optionally `removeLowScoreGenes`.
+
+## 2. Architecture (data flow)
+
+```
+template model ──prepINITModel──> prepData {refModel, refModelWithBM, minModel(merged+scaled),
+  (once)                                     groupIds, origRxnIds, essentialRxns,
+                                             essentialMetsForTasks, taskStruct, toIgnore* masks}
+                                                │
+omics (tissue) ──scoreComplexModel──> rxnScores │  (per sample)
+                                                ▼
+                              ftINIT step loop (getINITSteps)
+                              ├ step i: groupRxnScores(mask_i) ─> ftINITInternalAlg (MILP) ─> on/off
+                              │         prev results folded in as 'essential'
+                              └ ...
+                                                ▼
+                         delete off-reactions (mapped through groupIds)
+                                                ▼
+                         ftINITFillGapsForAllTasks (per-task gap-fill MILP)
+                                                ▼
+                         re-add exchanges, removeLowScoreGenes ─> model
+```
+
+## 3. The core MILP (`ftINITInternalAlg`) — formulation
+
+Reactions are partitioned into **six categories**, each with its own variables.
+`forceOnLim = 0.1` (min flux to count a reaction "on"); big-M `= 100` (rev split) and
+`1000`/own-ub elsewhere; `intTol = 1e-7`.
+
+| category | indicator | key constraint | binary? |
+|---|---|---|---|
+| **PosIrrev** (score>0, irrev) | `Y∈[0,1]` continuous | `v ≥ forceOnLim·Y` | **no** |
+| **PosRev** (score>0, rev) | `Y∈[0,1]` + dir-binary | split `v=v⁺−v⁻`; `v⁺+v⁻ ≥ 0.1·Y`; binary picks direction | 1 (dir only) |
+| **NegIrrev** (score<0, irrev) | `Y∈{0,1}` | `v ≤ 100·Y` | **yes** |
+| **NegRev** (score<0, rev) | `Y∈{0,1}` | split; `v⁺+v⁻ ≤ 0.1·Y` | yes |
+| **EssIrrev** (forced on) | — | `lb = min(0.99·|prev flux|, 0.1)` | no |
+| **EssRev** (forced on, rare) | dir-binary | split + force ≥0.1 | 1 |
+
+**The key insight:** a *positive*-score reaction needs no binary — a continuous `Y`
+suffices, because the objective *maximises* `Σ score·Y` so it pushes `Y→1` whenever
+flux can flow; the `v ≥ 0.1·Y` gate makes `Y>0` impossible without flux. Only
+*negative*-score reactions need a true binary (the objective would otherwise leave
+their `Y` at 0 for free). Reactions with **score 0 are left out of the problem
+entirely** (always present, can carry flux) — this is how the `toIgnore*` categories
+are excluded (their score is zeroed by `groupRxnScores`).
+
+Objective: minimise `Σ(−score)·Y − prodWeight·Σ mon` (metabolite-production bonus).
+`prodWeight = 5` (passed by `ftINIT.m`, overriding the docstring default 0.5).
+
+We already have a *simpler* version of this in `run_init`: binaries for all,
+`eps·x ≤ v ≤ ub·x`, reversible split, `prod_weight` sinks. ftINIT's formulation is
+the optimised superset.
+
+## 4. Critical findings (what NOT to transcribe)
+
+**Magic numbers** — collected, to be named constants and (where they bite) calibrated
+like we did for the HMM cut-offs (K15):
+
+| value | where | role | concern |
+|---|---|---|---|
+| `forceOnLim = 0.1` | InternalAlg | min "on" flux | scale-dependent; RAVEN itself muses "test 0.01" |
+| big-M `100` / `1000` | InternalAlg / FillGapsMILP | on/off gates | must exceed real fluxes; ties to ±1000 clamp |
+| `intTol 1e-7` / `1e-9` | InternalAlg / FillGapsMILP | integer tol | solver-specific (Gurobi v10 numerics) |
+| `prodWeight = 5` | ftINIT.m | met-production bonus | "has not been evaluated" (their words) |
+| score clamps `±0.1`, `0.01` | ftINIT.m / groupRxnScores | avoid 0 scores | MILP can't have exactly-0 score (binary flips freely) |
+| `maxStoichDiff = 25` | rescale | coeff-ratio cap | docstring says 250 — **doc/code disagree** |
+| MIPGap schedule `0.0004→0.003`, abs-gap `10/20`, TimeLimit `120/5000` | getINITSteps | per-step solver tuning | Gurobi-tuned; HiGHS will differ |
+| int-extraction `>1e-3`, `>0.5` | FillGapsMILP / InternalAlg | read binaries | tolerance-driven |
+
+**RAVEN/Gurobi coupling & quirks**
+- `ftINITFillGapsMILP` **hard-errors on glpk-via-COBRA**; "only tested with Gurobi".
+  raven-python uses **optlang**, so we are solver-agnostic — but the magic numbers and
+  `Seed=26` reproducibility were Gurobi-tuned. Validation must expect *equally optimal
+  but not identical* reaction sets (alternative optima); compare on objective value
+  and task feasibility, not exact rxn identity.
+- Several quirks worth fixing upstream too: `rescaleModelForINIT` doc/code mismatch;
+  `ftINITFillGaps` has a dead default-score branch referencing an undefined `models`;
+  `prepData` does **not** store `origRxnIds` (downstream relies on implicit ordering —
+  we will store it explicitly); the 2-column `b` (ranged mass-balance RHS) is not a
+  cobra concept and must be built as optlang ranged constraints.
+
+**Genuine simplification/improvement opportunities** (log in IMPROVEMENTS, back-port):
+- `simplifyModel`'s ~1 h reversibility-removal pass is an FVA in disguise; cobra's
+  optimised (optionally parallel) FVA does it far faster — the single biggest prep
+  speedup available to us.
+- Carry `rxnScores` as an aligned array, not a smuggled model field threaded through
+  `simplifyModel`/`removeReactions`/`mergeModels`.
+- Expose the magic numbers as parameters with documented, calibrated defaults rather
+  than burying them; `forceOnLim`/`prodWeight`/big-M are scale-dependent and deserve
+  the same empirical treatment as K15.
+- `removeLowScoreGenes` (gene pruning by isozyme/complex role) is a clean standalone
+  utility, reusable beyond ftINIT.
+
+**Prerequisite gap in our codebase:** `check_tasks` currently returns only
+pass/feasible per task. `prepINITModel` needs **essential-reaction discovery** (which
+reactions, and in which direction, are essential for each task) and the task
+metabolite set. That work was explicitly deferred from 4a/4c to here.
+
+## 5. What cobra/raven-python already gives us
+
+- MILP via **optlang** (HiGHS bundled, Gurobi/CPLEX if licensed) — `run_init`
+  already builds INIT-style MILPs this way; ftINIT extends that builder.
+- FVA / pFBA / `find_blocked_reactions` (`simplifyModel`, gap-fill feasibility).
+- `check_tasks` (relaxed mass-balance via constraint bounds = RAVEN's `b`), to be
+  extended with essential-reaction output.
+- `connect_blocked_reactions` (connectivity MILP) — related to but not the same as
+  the task gap-fill MILP (which adds min-cost template reactions to make a task
+  feasible); the task gap-filler is new.
+- `score_reactions_from_genes` / `gene_scores_from_expression` (scoring).
+
+## 6. Implementation plan — phased sub-steps
+
+Correctness first on small models (the RAVEN `tinitTests.m` cases define reaction
+scores directly via single-gene expression — ideal oracles), then the speed layers.
+
+| sub-phase | deliverable | builds on | risk |
+|---|---|---|---|
+| **4d.0** | **Test oracles**: port the `tinitTests.m` toy models + `getINITSteps`-style score→expression helper, so every later sub-phase is checked against RAVEN's expected on/off sets. | tasks, score | low |
+| **4d.1** | **Essential-reaction discovery**: extend `check_tasks` to return the per-task essential-reaction matrix + direction + task-metabolite set (the missing `prepINITModel` step-3 input). | tasks/check.py | med |
+| **4d.2** | **prepData preprocessing** (`prep_init_model`): simplify (cobra FVA for the reversibility pass), essential rxns (4d.1), reaction-classification masks (exchange/import/transport/spont/no-GPR), **linear merge** (`merge_linear` + `group_ids`, storing `orig_rxn_ids`), rescale. | manipulation/simplify, 4d.1 | **high** (merge is fiddly) |
+| **4d.3** | **Core staged MILP** (`ftinit_internal` + step loop): the 6-category formulation extending `run_init`; `INITStep` dataclass + `get_init_steps('1+1'/'2+1'/'full'/…)`; prev-results-as-essential folding; per-step MIPGap retry schedule. Start **without** metabolomics. | init/init.py | **high** |
+| **4d.4** | **Task gap-filling** (`ftinit_fill_gaps_for_tasks`): per-task, LP-feasibility-gated, min-cost template-reaction MILP (ranged `b` constraints in optlang). | gapfilling, 4d.1 | med |
+| **4d.5** | **Assembly + `removeLowScoreGenes`**: map merged groups back, delete, re-add exchanges, prune low-score genes; top-level `ftinit(...)` entry. | score, 4d.2–4 | low |
+| **4d.6** | **Metabolomics block** — **deferred** (decided 2026-05-26). The linear merge eliminates degree-2 detected metabolites, so a clean flux-based bonus is impossible; it needs RAVEN's producer-group-mapping + `mon`/`vnrbm`/`vnrvm`/`vnim` negative-producer force-flux blocks — the most intricate MILP in ftINIT, for its least-used input (not in the RNA-seq/single-cell/HPA ranking; "not designed for metabolomics only"), with a randomness-laden oracle (T0008). `ftinit(metabolomics=…)` raises `NotImplementedError`. | 4d.3 | med |
+| **4d.7** | **Validation + calibration**: run `'full'` vs `'1+1'` on a real template (Human-GEM or a smaller curated GEM), compare to RAVEN by objective/task-feasibility/size (not exact rxns); calibrate `forceOnLim`/`prodWeight`/big-M; document like K15. | all | med |
+
+Suggested module layout under `init/`: `prep.py` (4d.1–2), `ftinit.py` (4d.3 + top
+level), `taskfill.py` (4d.4), `genes.py` (`remove_low_score_genes`), reusing
+`init.py`/`score.py`/`build.py`.
+
+## 7. Decisions (locked 2026-05-25)
+
+1. **Solver: agnostic.** Write the MILP to optlang's generic interface; let cobra's
+   configured solver decide. Calibrate the default constants on **HiGHS** (open,
+   bundled) but do not hardcode anything Gurobi-specific. Validation compares on
+   objective/task-feasibility, never exact reaction identity.
+2. **Correctness-first.** Implement the single-step `'full'` MILP (extend `run_init`,
+   **no** linear merge) and match RAVEN on the toy oracles *first*; then add linear
+   merge and `'1+1'`/`'2+1'` staging as separately-verified speed layers. → reorders
+   the sub-phases below: **4d.3 (full MILP) precedes 4d.2 (merge)**.
+3. **Metabolomics deferred to 4d.6.**
+4. **Validation includes Human-GEM** (in 4d.7): toy oracles + small GEM for fast
+   iteration, then Human-GEM + a tissue dataset for the real ftINIT regime.
+
+### Resulting execution order
+
+`4d.0 oracles → 4d.1 essential-rxn discovery → 4d.3 full MILP (correctness gate) →
+4d.2 linear merge → 4d.3b '1+1'/'2+1' staging → 4d.4 task gap-fill → 4d.5 assembly +
+removeLowScoreGenes → 4d.6 metabolomics → 4d.7 calibration + Human-GEM validation.`
+
+The §6 table lists deliverables by id; this is the order they'll actually be built.
diff --git a/docs/archive/localization_design.md b/docs/archive/localization_design.md
new file mode 100644
index 0000000..12a1366
--- /dev/null
+++ b/docs/archive/localization_design.md
@@ -0,0 +1,251 @@
+# Localization (Phase 7) — critical review + redesign
+
+Design note for porting RAVEN's `predictLocalization.m` to raven-python. **Status: proposal,
+not yet implemented**; the user-facing API and the algorithmic choices are settled here
+before code lands, because RAVEN's all-or-nothing redistribution doesn't match how the
+function actually gets used.
+
+## 1. What RAVEN's `predictLocalization` does, and where it falls short
+
+The MATLAB function takes a single-compartment (or to-be-merged) model + a per-gene per-
+compartment score table (`GSS`) and assigns every reaction to a compartment by simulated
+annealing, maximising `Σ gene_score(g,c) − transport_cost`. Decisions baked into that
+shape that we should not transcribe blindly:
+
+1. **Destroys existing compartmentalization.** "If the model contains several
+   compartments they will be merged." Existing curated assignments — typically the most
+   reliable signal in a draft GEM — are thrown out and recreated from the predictor
+   alone. This is fine for a single-compartment Phase-3 draft but obviously wrong as a
+   default for a refinement step on a multi-compartment model.
+2. **Closed-model requirement.** "The input model should also not include any exchange,
+   demand or sink reactions, otherwise this function would not provide any results."
+   The caller must pre-process; the function has no graceful path for open models.
+3. **Silent connectivity surgery.** "Reactions that are unconnected are removed and
+   saved in removedRxns" — and the iterative "metabolite must be produced somewhere"
+   loop quietly deletes any reaction whose substrate has no producer. **This is the
+   "model must be complete" assumption**: a draft with dangling substrates loses
+   reactions without the caller necessarily noticing.
+4. **Reactions without GPRs get fake genes** (`&&FAKE&&N`) scored 0.5 everywhere — they
+   carry no signal but still have to be placed.
+5. **Mono-localization for genes** ("a gene can only be assigned to one compartment …
+   This is a simplification to keep the problem size down"). Biologically wrong for the
+   many dual-localized enzymes (mitochondrial/cytosolic isoforms etc.); a constraint of
+   the solver, not the biology.
+6. **Simulated annealing** with a `maxTime` cap — stochastic, no optimality certificate,
+   slow on large models.
+7. **`expandModel` first**, splitting isozyme reactions before optimisation. Changes the
+   reaction set rather than handling isozymes in scoring.
+8. **No partial-update mode.** Cannot say *"I added these 50 reactions, only place
+   those, keep the existing assignments"*.
+9. **No suggest-only mode.** Mutates the model in place; no preview.
+10. **Plotting and the predictor are baked into the function** (`plotResults`,
+    GSS-shape input).
+
+Items 1–3 and 8–9 are exactly what the user flagged. Items 5–7 are also worth
+reconsidering on a modern stack.
+
+> **Final API (post-implementation):** the proposal below was refined during
+> implementation. The shipped API:
+>
+> * `reactions_to_relocate` is a **required** caller-passed set of IDs (no
+>   `notes['localization']='uncertain'` auto-detect — one mechanism is clearer).
+> * **Multi-compartment is the default scoring model.** No `multi_compartment_genes`
+>   boolean. The highest-scoring compartment a gene lands in is "free"; every
+>   additional compartment costs `multi_compartment_penalty` *plus* its (typically
+>   lower) predictor score is its own implicit penalty. Pick a large penalty for
+>   effectively mono-localised genes.
+> * `mergeCompartments` and `copyToComps` are ported separately as
+>   `raven_python.manipulation.merge_compartments` / `copy_to_compartment` (they're useful
+>   independently of `predict_localization` — for flattening for analysis or building
+>   dual-localised pathways).
+> * `mapCompartments` is **not** ported — its main use case overlaps with
+>   `compare_models`.
+
+## 2. Proposed `predict_localization` for raven-python
+
+Decompose the function into independent concerns:
+
+* **A pluggable predictor input** — a `pandas.DataFrame` of *gene × compartment* scores
+  (higher = stronger evidence). Loaders for WoLF PSORT, DeepLoc, … convert their output
+  to this table; the algorithm is predictor-agnostic. Ships with `load_wolfpsort()` and
+  documents the `gene_id × compartment → score` contract so users can plug in DeepLoc /
+  TargetP / their own.
+* **A deterministic MILP** instead of simulated annealing.
+* **An explicit scope of reactions to (re-)place** — by default *all reactions whose
+  compartment is unset or marked uncertain*; user can pass `reactions_to_relocate=[…]`
+  to lock everything else.
+* **Existing compartments respected by default** — they are the high-confidence prior,
+  not noise to be merged away.
+* **Open models tolerated.** Boundary reactions (exchange/sink/demand) are pinned to
+  their current compartment automatically and excluded from relocation.
+* **Incomplete models tolerated.** Dangling metabolites are allowed; no silent reaction
+  removal. The MILP places what it can and reports the rest in `unplaced_reactions`,
+  not by deleting them.
+* **`apply=False`** returns a `LocalizationProposal` (a diff: which reactions move,
+  which transports are added) without mutating the input.
+
+### 2.1 API sketch
+
+```python
+@dataclass
+class LocalizationScores:
+    """Per-gene compartment scores. Index = gene_id; columns = compartment ids.
+    Missing genes / NaN scores fall back to a uniform prior."""
+    df: pd.DataFrame
+
+@dataclass
+class LocalizationProposal:
+    """What the predictor proposes, before applying it."""
+    moved: pd.DataFrame              # rxn_id, from_compartment, to_compartment, score_delta
+    added_transports: pd.DataFrame   # met_id, from, to, cost
+    unplaced_reactions: list[str]    # couldn't be placed (e.g. no scored genes)
+    objective: float
+
+@dataclass
+class LocalizationResult:
+    """Result of applying a proposal (or running with apply=True directly)."""
+    model: cobra.Model
+    gene_compartments: dict[str, str | set[str]]    # set when multi_compartment_genes=True
+    added_transports: list[cobra.Reaction]
+    proposal: LocalizationProposal
+
+def predict_localization(
+    model: cobra.Model,
+    scores: LocalizationScores,
+    *,
+    # === The two requested features ===
+    reactions_to_relocate: Iterable[str] | None = None,   # None ⇒ pick automatically (see below)
+    keep_existing: bool = True,                            # don't merge compartments first
+    apply: bool = True,                                    # False ⇒ return proposal only
+    # === Other knobs ===
+    default_compartment: str = "c",
+    transport_cost: float | Mapping[str, float] = 0.5,
+    multi_compartment_genes: bool = False,                 # relax mono-localization
+    require_producibility: bool = False,                   # off ⇒ don't drop unproducible mets
+    mip_gap: float = 0.001,
+    time_limit: float | None = None,
+) -> LocalizationResult | LocalizationProposal:
+    """Assign reactions to subcellular compartments via a deterministic MILP.
+
+    By default (``reactions_to_relocate=None``, ``keep_existing=True``), only reactions
+    that lack a compartment assignment (or are marked uncertain via a notes flag) are
+    placed; reactions already in a compartment stay put. Pass an explicit
+    ``reactions_to_relocate`` list to override.
+
+    With ``apply=False`` the function returns a :class:`LocalizationProposal` describing
+    what would change — useful for review or for diffing against a curator's choices.
+    """
+```
+
+### 2.2 What gets placed by default
+
+| reaction state | default behaviour |
+|---|---|
+| has a (single) compartment assignment in the existing model | **pinned** (kept as-is) |
+| boundary reaction (exchange / sink / demand) | **pinned** to its current compartment |
+| no compartment set OR marked uncertain (e.g. `notes['localization'] == 'uncertain'`) | **relocated** |
+| caller passed `reactions_to_relocate=[…]` | **only those** relocated; rest pinned |
+
+This handles both user requests in one design:
+
+* *Incomplete model*: dangling mets / missing producers are tolerated; reactions are
+  placed if any scored gene supports them, and unplaceable ones land in
+  `unplaced_reactions`.
+* *Selective re-localization*: pass the subset; defaults respect what's there.
+
+### 2.3 MILP formulation (mono-localization variant, `multi_compartment_genes=False`)
+
+Let `R*` = reactions to relocate, `G*` = unique genes appearing in those reactions, `C`
+= compartments, `M` = metabolites occurring in `R*` reactions.
+
+Variables (all binary):
+* `x[r, c]` for `r ∈ R*`, `c ∈ C` — reaction `r` placed in compartment `c`.
+* `y[g, c]` for `g ∈ G*`, `c ∈ C` — gene `g` assigned to compartment `c`.
+* `t[m, c]` for `m ∈ M`, `c ∈ C ∖ {default}` — transport of `m` between `default` and
+  `c`.
+
+Constraints:
+* Each relocated reaction goes to exactly one compartment: `Σ_c x[r,c] = 1`.
+* Each gene goes to exactly one compartment: `Σ_c y[g,c] = 1`.
+* Gene-reaction coupling: if reaction `r` requires gene `g` (per its GPR), then placing
+  `r` in `c` requires `g` in `c`: `x[r,c] ≤ y[g,c]` for the AND-clause; OR-clauses use
+  the standard linearisation (any isozyme satisfying the placement is enough).
+* Metabolite presence in each compartment: if any reaction touching `m` is placed in
+  `c`, then `m` must exist in `c` — modelled by adding a per-compartment "demand for
+  transport" when needed. Concretely: if a metabolite participates in reactions placed
+  in `c ≠ default`, a transport `t[m,c]` is required unless another reaction in `c`
+  balances it.
+
+Objective:
+```
+maximise  Σ_{g,c} gene_score(g,c) · y[g,c]   −   Σ_{m,c} transport_cost(m) · t[m,c]
+```
+
+With `multi_compartment_genes=True`, the `Σ_c y[g,c] = 1` constraint is relaxed (gene
+can be in several compartments), and gene-reaction coupling becomes
+`x[r,c] ≤ y[g,c]` per compartment as before — but the gene now contributes its
+compartment-specific score once per assigned compartment. (The cost of allowing this is
+that the same gene's score is double-counted across compartments; an alternative is to
+include a small penalty per extra compartment, controllable via a `multi_compartment_penalty`
+argument — leave this to a later iteration.)
+
+### 2.4 Why MILP rather than simulated annealing
+
+RAVEN's SA was chosen in an era when MATLAB MILP solvers were less accessible. Today
+we have Gurobi already wired in. The MILP gives:
+* deterministic, reproducible answers (important for science),
+* an explicit optimality certificate / gap,
+* faster solves on the small/medium problems this is (≪ 30k binaries even for a
+  whole-genome model with 10 compartments),
+* the same well-understood `mip_gap` / `time_limit` controls the rest of raven-python uses,
+* graceful degradation: at the time limit, return the best incumbent rather than the
+  "current SA state" with no quality guarantee.
+
+The SA's main practical benefit was *good-enough answers without a license-encumbered
+MILP solver*. We've already documented that genome-scale (f)tINIT needs Gurobi
+([docs/init_solver_benchmark.md](init_solver_benchmark.md)); reusing the same backend
+is consistent. (For users on open-source solvers, GLPK on this MILP should be tractable
+because it's smaller than ftINIT — the per-reaction big-M is at most `|C|`, not 1000.)
+
+## 3. Pluggable predictors
+
+The algorithm consumes a `gene × compartment` score table. raven-python ships:
+* `load_wolfpsort(path)` — RAVEN-compatible WoLF PSORT output → score table.
+* (later) a thin adapter for **DeepLoc** (TSV with per-class probabilities).
+* The format is open: a user can build the DataFrame from any source.
+
+A small `omics.scoring` helper could compute a uniform "no-evidence" row for genes
+absent from the predictor's output (RAVEN's 0.5-everywhere fallback).
+
+## 4. What this delivers for the user's two requests
+
+* **Incomplete model OK** — `require_producibility=False` (default) means unbalanced
+  metabolites are tolerated; reactions are placed when their genes have signal, and
+  unplaceable ones are *reported*, not silently deleted. No "model must be complete"
+  precondition.
+* **Selective re-localization** — `reactions_to_relocate=[…]` pins everything else.
+  With `apply=False` the user gets a diff to review; with `apply=True` only the
+  selected reactions move. The default (no `reactions_to_relocate` argument) places
+  only the uncompartmentalised / uncertain reactions, which is the natural "update the
+  draft" workflow.
+
+## 5. Open questions to resolve before implementing
+
+1. **How does raven-python mark "uncertain" compartmentalization?** Proposal: a
+   `notes['localization'] = 'uncertain'` flag on reactions; or a passed-in set of ids.
+   The "auto-pick-what-to-place" mode reads this.
+2. **Multi-compartment gene scoring**: simple multi-counting (the same score in every
+   compartment the gene lands in) vs a penalty per extra compartment. Start with simple
+   multi-counting; add the penalty later if users ask.
+3. **DeepLoc adapter** — ship now or later? DeepLoc has a stable TSV format; a 30-line
+   loader is trivial when there's a real use case. Defer until requested.
+
+## 6. Implementation plan
+
+1. `localization/scores.py` — `LocalizationScores`, `load_wolfpsort`.
+2. `localization/milp.py` — `predict_localization` (MILP).
+3. `localization/apply.py` — convert a proposal into a re-compartmentalised cobra model
+   (added transports, updated metabolite compartments).
+4. Tests on a small handcrafted model + a regression test against the RAVEN paper's
+   yeast example.
+5. PROGRESS / PLAN updates.
diff --git a/docs/archive/plan_get_model_from_homology.md b/docs/archive/plan_get_model_from_homology.md
new file mode 100644
index 0000000..f7185ae
--- /dev/null
+++ b/docs/archive/plan_get_model_from_homology.md
@@ -0,0 +1,120 @@
+# Design plan: `get_model_from_homology` (port of RAVEN `getModelFromHomology`)
+
+Detailed design for the core of homology reconstruction (Phase 3a). Companion to
+PLAN.md §2.3a. Goal: faithful *intent*, but a clearer, more robust implementation —
+RAVEN's own comments flag several soft spots, and the user asked for logic
+improvements.
+
+---
+
+## 1. What it is trying to do (the intent)
+
+Build a draft GEM for a **new organism** by transferring reactions from one or more
+**template GEMs**, using **orthology** (bidirectional protein BLAST/DIAMOND hits) to
+map template genes → new-organism genes and rewrite each reaction's GPR accordingly.
+
+A reaction is transferred if at least one of its genes has an ortholog in the new
+organism; its GPR is rewritten so template genes become their new-organism orthologs.
+When several templates are given, an optional **preferred order** makes each gene's
+reactions come from a single (highest-priority) template; otherwise reactions from all
+templates are merged. Metabolites are unified across templates by **name[compartment]**.
+
+## 2. RAVEN's algorithm, in steps
+
+1. Strip cross-species gene metadata from templates (`geneShortNames`, `geneMiriams`,
+   `proteins`, `geneFrom`); standardize template grRules.
+2. Validate: gene ids without stray `()`; blast `fromId`s match template ids; ≥5 % of a
+   template's genes appear in the hits (else the FASTA/model id styles disagree).
+3. **Filter hits** by `evalue ≤ maxE`, `aligLen ≥ minLen`, `identity ≥ minIde`.
+4. Drop template reactions with no genes (and genes with no reactions).
+5. `onlyGenesInModels`: drop hits whose template gene isn't in a model.
+6. `strictness==3`: reduce to best-by-**E-value** hits per `fromGene`.
+7. Build sparse matrices `allTo` (new→template hits) and `allFrom` (template→new) per
+   template, indexed by gene position.
+8. **Ortholog map** = per strictness: `1`/`3` → `allTo & allFrom` (reciprocal);
+   `2` → just `allFrom` or `allTo` (one-directional, per `mapNewGenesToOld`).
+9. Drop mappings to genes not in the model; simplify.
+10. Per template, keep reactions associated with a mapped gene (keeping AND-complexes
+    where *any* subunit mapped).
+11. `preferredOrder`: remove genes already claimed by an earlier template so each gene's
+    reactions come from one template.
+12. **Rewrite GPRs** by `regexprep` string substitution: replace each template gene with
+    its new ortholog(s) — `(new1 or new2)` if several; template genes with no ortholog
+    are renamed `OLD_<modelid>_<gene>` (kept so AND-complexes survive).
+13. `mergeModels(..., 'metNames')`; then `regexprep` away `OLD_…` genes that ended up in
+    `or` relations; set id/name/notes/confidence; standardize; delete unused genes.
+
+## 3. Proposed Python design
+
+```python
+def get_model_from_homology(
+    models,                     # list[cobra.Model], or one model
+    hits,                       # bidirectional hits DataFrame (run_blast / make_ortholog_hits)
+    model_for,                  # target organism id
+    *,
+    preferred_order=None,       # list[model_id]; if set, each gene's reactions from one model
+    bidirectional=True,         # require reciprocal hits (RBH-style)        [improvement A]
+    best_hits_only=False,       # keep only best-scoring hit per gene first  [improvement A]
+    map_direction="new_to_old", # used only when bidirectional=False
+    score="bitscore",           # best-hit criterion: "bitscore" | "evalue" [improvement D]
+    complex_policy="flag",      # AND-subunits lacking orthologs: flag|keep|drop [improvement C]
+                                # default "flag" = RAVEN-compatible (OLD_<model>_<gene>)
+    only_genes_in_models=False,
+    max_evalue=1e-30, min_align_len=200, min_identity=40,
+) -> HomologyResult            # .model + .gene_map + .reaction_sources  [improvement F]
+```
+
+**Data flow (all on the hits DataFrame + cobra GPR AST, no sparse-matrix juggling):**
+
+1. Validate + filter hits (`max_evalue`/`min_align_len`/`min_identity`) — a DataFrame mask.
+2. Optionally reduce to best hit per gene by `score` (groupby-idxmax).
+3. Build the **ortholog map** `new_gene -> {model_id: {template_gene, ...}}`:
+   - `bidirectional`: inner-join the two directions on (new_gene, template_gene) — a pandas
+     merge, i.e. reciprocal hits. (`bidirectional` + `best_hits_only` = reciprocal best hits.)
+   - else: take the one direction (`map_direction`).
+4. For each template, for each reaction: rewrite the GPR on the **cobra GPR AST**
+   (improvement B) — substitute each leaf gene by the OR of its new orthologs; handle
+   unmapped leaves per `complex_policy`; keep the reaction iff ≥1 leaf mapped.
+5. Resolve `preferred_order`: a new gene's reactions come from the first template (in order)
+   that maps it (improvement E — a dict lookup, not matrix index math).
+6. Transfer the surviving reactions into the draft, unifying metabolites by `name[comp]`
+   (reuse `merge_models` / `add_reactions_from_model`); record provenance (improvement F).
+7. Finalize: id=`model_for`, name, per-reaction note/confidence, prune unused genes,
+   lint GPRs (`find_non_dnf_grrules`).
+
+`make_ortholog_hits` (PLAN §2.3a) feeds the same DataFrame, so the whole thing is
+testable without BLAST.
+
+## 4. Logic improvements (the point of this exercise)
+
+| # | Improvement | Why it's better than RAVEN |
+|---|---|---|
+| **H1** | **Two orthogonal params** (`bidirectional`, `best_hits_only`) replace the overloaded `strictness` 1/2/3. | RAVEN's `strictness` conflates two independent axes — *directionality* (one-way vs reciprocal) and *best-hit filtering*. Splitting them is clearer and exposes all 4 combinations (incl. one-way best-hit). RAVEN mapping documented: `1`→bidir; `2`→one-way; `3`→bidir+best-hits = **reciprocal best hits (RBH)**. |
+| **H2** | **AST-based GPR rewriting** (cobra `GPR`) instead of `regexprep` on rule strings. | RAVEN's author flags the string substitution as uncertain ("I hope that it's ok…") and needs a follow-up regex to clean `OLD_… or` leftovers. Substituting leaves on the parsed boolean tree is robust (no partial-match hazards, no cleanup pass) and reuses the AST machinery already in `expand_model`/`find_non_dnf_grrules`. |
+| **H3** | **Explicit complex policy** for AND-subunits lacking an ortholog: `keep` (drop the unmapped subunit, transfer the reaction), `drop` (require a fully-mapped AND-clause, else drop the reaction), `flag` (RAVEN's `OLD_<model>_<gene>` placeholder). | RAVEN's behaviour is the implicit, fragile `OLD_`+regex path its comments distrust. Making it an explicit, AST-correct policy (OR = keep isozyme branches that mapped; AND = configurable) is principled and lets the user choose optimism vs strictness. `flag` preserves RAVEN compatibility. |
+| **H4** | **Best-hit selection by `bitscore`** (default), E-value optional. | E-value depends on database size and ties at ~0 for strong hits; **bitscore is database-size-independent** and the standard criterion for reciprocal-best-hit orthology. RAVEN uses min E-value only. |
+| **H5** | **DataFrame ortholog map** replaces the `allGenes`/`allTo`/`allFrom` sparse-matrix + `sub2ind` index juggling. | The reciprocal mapping and preferred-order resolution become a pandas merge + dict lookup — far less error-prone than the position-index gymnastics RAVEN itself annotates with uncertainty. |
+| **H6** | **Provenance** in the result: per reaction, which template + which ortholog pairs supported it (`reaction.notes`), and a returned `gene_map`. | RAVEN returns only `hitGenes` (flat old/new lists) and a fixed note. Structured provenance aids curation and debugging. |
+
+**RAVEN-compatibility:** accept a `strictness=` alias that sets `bidirectional`/`best_hits_only`
+(1/2/3) and default `complex_policy="flag"` *iff* `strictness` is passed, so legacy calls reproduce
+RAVEN behaviour; otherwise use the clearer defaults above.
+
+## 5. Resolved: `complex_policy` default
+
+**Decided: default `complex_policy="flag"`** (RAVEN-compatible) — for an AND-complex reaction with a
+subunit lacking an ortholog, keep the reaction and mark the missing subunit (`OLD_<model>_<gene>`),
+matching current RAVEN output. `keep` (drop the unmapped subunit) and `drop` (require a fully-mapped
+AND-clause) remain available for users wanting cleaner or higher-confidence drafts. Implementation
+note: even under `flag`, the GPR is rebuilt on the **AST** (improvement H2) — the `OLD_` marking and
+the removal of `OLD_` genes left in `or` branches are done as AST operations, not the regex passes
+RAVEN uses.
+
+## 6. Test strategy
+
+- **Core, no BLAST:** drive with `make_ortholog_hits` + small template models; assert reactions
+  transfer, GPRs rewrite correctly (one-to-one, one-to-many isozymes, AND-complex under each
+  `complex_policy`), `bidirectional`/`best_hits_only` select the right hits, `preferred_order`
+  routes a gene's reactions to one template, metabolites unify by name[comp].
+- **Edge cases:** unmapped subunit policies; gene mapping to multiple orthologs (`(a or b)`);
+  reaction supported by only one isozyme; the ≥5 % overlap validation; reciprocal vs one-way.
diff --git a/docs/known_issues.md b/docs/known_issues.md
new file mode 100644
index 0000000..1ed394f
--- /dev/null
+++ b/docs/known_issues.md
@@ -0,0 +1,139 @@
+# Known issues — deferred findings from the full-codebase review
+
+Low-priority issues found during the 2026-05-26 full critical review (see PROGRESS
+commits `db5a1fa`, `2b85830` for the bugs that *were* fixed). These are deliberately
+not fixed yet: each is an edge case, robustness gap, efficiency concern, dead code, or
+a documented design choice — none affects correctness on normal, well-formed inputs.
+Line numbers are indicative; refer to the named function.
+
+## A. Latent edge-case bugs
+
+All six items in this section were closed in a quality-sweep pass (see CHANGELOG
+"Quality sweep" entry); regression tests live alongside each fixed function. Kept
+here for traceability of the original review.
+
+- ✅ **`manipulation/add.py` — `add_reactions_from_equations` (name mode):** a
+  metabolite whose *name* starts with a number (e.g. `"2 oxoglutarate"`) was
+  misparsed — the leading number was taken as a coefficient. Fixed by trying the
+  full token as a name first and only splitting off a coefficient when the
+  remainder names something resolvable. Test:
+  `tests/test_manipulation_add.py::test_name_mode_preserves_leading_number_name`.
+- ✅ **`manipulation/add.py` — `add_reactions_from_equations`:** an equation whose
+  terms net to zero produced a reaction with no metabolites and no warning. Now
+  warns. Test: `test_empty_stoichiometry_warns`.
+- ✅ **`manipulation/transfer.py` — `add_reactions_from_model` (`_new_met_id`):**
+  two source metabolites sharing an `id` but different `name[comp]`, both
+  needing minted ids, used to collide. Now tracks ids minted in the batch.
+  Test: `tests/test_manipulation_transfer.py::test_intra_batch_id_minting_unique`.
+- ✅ **`manipulation/transport.py` — `add_transport_reactions`:** the
+  source/target metabolite lookup was keyed by name, silently dropping
+  same-name duplicates. Now warns on collision. Test:
+  `tests/test_manipulation_transport.py::test_duplicate_name_in_source_compartment_warns`.
+- ✅ **`gapfilling/fill.py` — `connect_blocked_reactions`:** the
+  `fva.at[r, "maximum"]` access used to crash with `KeyError` if FVA dropped a
+  candidate. Now membership-guarded (defensive — the original is unreachable
+  with cobra's default FVA, no regression test).
+- ✅ **`reconstruction/kegg/query.py` — `assign_kos`:** `cutoff >= 1` would let
+  `log(best_evalue) == 0` through and crash inside the ratio filter. Now
+  rejected up front with a clear error. Test:
+  `tests/test_reconstruction_kegg_query.py::test_cutoff_ge_one_rejected`.
+
+## B. Silent misbehaviour on unusual inputs
+
+All four items in this section were closed in a quality-sweep pass (see CHANGELOG
+"Quality sweep — known-issues section B" entry); regression tests live alongside
+each fixed function. Kept here for traceability of the original review.
+
+- ✅ **`manipulation/merge.py` — `merge_models`:** name[comp]-matched metabolites
+  with differing `formula` or `charge` across models now warn instead of silently
+  unifying to the first-seen. Tests
+  `tests/test_manipulation_merge.py::test_formula_conflict_warns`,
+  `test_charge_conflict_warns`.
+- ✅ **`manipulation/add.py`:** both the `mets_by="id"` and `mets_by="name"` paths
+  now warn when a new metabolite is created in a compartment that hasn't been
+  registered yet (the id-mode path used to skip the check entirely). Tests
+  `tests/test_manipulation_add.py::test_id_mode_unknown_compartment_warns`,
+  `test_name_comp_unknown_compartment_warns`.
+- ✅ **`tasks/tasklist.py` — `parse_task_list`:** continuation rows appearing
+  *before* the first ID-bearing row used to be silently dropped. Now warns
+  with the file:row pointer. Test
+  `tests/test_tasks.py::test_parse_warns_on_data_row_before_first_id`.
+- ✅ **`io/sif.py` — `export_model_to_sif`:** when the caller's custom label
+  map sends two distinct ids to the same label, the target-side dedup used
+  to silently merge the nodes. Now warns up front. Test
+  `tests/test_io_sif.py::test_collapsing_label_map_warns`.
+
+## C. Robustness gaps
+
+All four items closed in the same quality sweep (see CHANGELOG); regression
+tests live alongside each fixed function.
+
+- ✅ **`manipulation/simplify.py` — `constrain_reversible_reactions`:** the
+  FVA call is now wrapped in a try/except + NaN check; both backend-raised
+  `OptimizationError` and silent-NaN returns surface as a single clear
+  `RuntimeError`. Test
+  `tests/test_manipulation_simplify.py::test_constrain_reversible_raises_on_infeasible`.
+- ✅ **`binaries.py` — `ensure_binary`:** downloads through a `.part` sibling
+  and `os.replace`s into the final name on success, mirroring `data.py`.
+  An interrupted download leaves a `.part` (never a half-written `.zip`).
+  Defensive — no regression test (needs urlopen mocking).
+- ✅ **`tasks/tasklist.py`:** the xlsx reader checks `wb.sheetnames` before
+  the `wb["TASKS"]` lookup; a missing sheet now raises a clear `ValueError`
+  listing the actual sheets. Test
+  `tests/test_tasks.py::test_parse_task_list_xlsx_missing_tasks_sheet`.
+- ✅ **`reconstruction/kegg/taxonomy.py`:** depth handling pads with explicit
+  `""` placeholders and warns once when a level is skipped (e.g. `####`
+  directly under `##`). Test
+  `tests/test_reconstruction_kegg_hmm.py::test_parse_taxonomy_handles_skipped_depth`.
+
+## D. Efficiency (correct but slow at scale)
+
+- ✅ **`manipulation/simplify.py` — `group_linear_reactions`:** rewritten with
+  a metabolite worklist (re-enqueue the mets touched by each merge) instead of
+  the restart-after-every-merge loop. Same observable behaviour, O(n+m) work
+  per pass instead of O(n²·m). Test
+  `tests/test_manipulation_simplify.py::test_group_linear_merges_long_chain_in_one_pass`.
+- ✅ **`reconstruction/kegg/parse.py`:** `parse_kegg_reactions` now caches the
+  parsed stoichiometry on each `KeggReaction.stoichiometry`; `build_reference_model`
+  reuses it instead of re-parsing. Test
+  `tests/test_reconstruction_kegg_parse.py::test_stoichiometry_cached`.
+
+## E. Dead / vestigial code
+
+- ✅ **`reconstruction/kegg/parse.py`:** removed `KeggReaction.modules` and
+  `.rhea` (parsed but never consumed by the artefact builders).
+- ✅ **`reconstruction/homology/homology.py`:** removed the vestigial
+  `only_genes_in_models` parameter from `_ortholog_map` (the actual filtering
+  happens earlier in `get_model_from_homology`).
+
+## F. Documented design choices that differ from RAVEN (not bugs)
+
+All five items addressed in the quality-sweep pass (see CHANGELOG). The three
+docstring/comment items got documentation fixes; the two correctness items
+(F3, F5) got code fixes plus matching RAVEN-back-port proposals in
+[IMPROVEMENTS.md](../IMPROVEMENTS.md) (FS4, B2).
+
+- ✅ **`init/init.py` — `run_init`:** docstring now spells out the score-0
+  semantics divergence between classic INIT (score-0 = removable) and ftINIT
+  (score-0 = kept by default).
+- ✅ **`init/build.py` — `get_init_model`:** the inaccurate "same regime
+  run_init will use" comment is replaced — the pre-filter is now correctly
+  described as conservative (only reactions blocked under the *most* permissive
+  regime are removed, so the strict run_init path never loses a candidate).
+- ✅ **`analysis/fseof.py` — `fseof`:** classifier now uses the slope of
+  `|flux|` (a real `linregress(enforced, |flux|)` fit) instead of the first vs
+  last endpoints. A track whose endpoints straddle a peak/trough no longer
+  ends up with the wrong amplify/knockdown label. Matching MATLAB back-port
+  proposed as IMPROVEMENTS FS4. Test
+  `tests/test_analysis_fseof.py::test_amplify_label_uses_abs_slope_not_endpoint_difference`.
+- ✅ **`analysis/reporter.py` — `reporter_metabolites`:** docstring now
+  documents the one-sided ``1 - Φ(z)`` `p_value` and the `z_score`-descending
+  sort, explicitly contrasted with RAVEN's two-tailed `allPValues` ordering.
+  Internally consistent; same information available via the up/down `gene_fold_changes`
+  partition.
+- ✅ **`utils/balance.py` — `get_elemental_balance`:** empty-stoichiometry
+  reactions now report `unknown` instead of `balanced`. (Original review
+  attributed the bug to `utils/validate.py::check_model`; the actual code
+  lives in `balance.py`.) Matching MATLAB back-port proposed as
+  IMPROVEMENTS B2. Test
+  `tests/test_utils_balance.py::test_empty_reaction_is_unknown`.
diff --git a/docs/matlab_raven_backports.md b/docs/matlab_raven_backports.md
new file mode 100644
index 0000000..b281896
--- /dev/null
+++ b/docs/matlab_raven_backports.md
@@ -0,0 +1,255 @@
+# MATLAB RAVEN — proposed back-ports
+
+This is a consolidated list of fixes and improvements that the Python port surfaced
+and that are worth carrying upstream into MATLAB RAVEN. Each item names a RAVEN
+file/function, briefly diagnoses the current behaviour, and proposes the minimal
+MATLAB-side patch. Items are sourced from [IMPROVEMENTS.md](../IMPROVEMENTS.md)
+(the `MATLAB RAVEN 💡` rows) plus two new items from the section-F quality sweep
+in [docs/known_issues.md](known_issues.md).
+
+The matching Python-side implementation is linked for context — in most cases the
+fix is identical in spirit; the patch shape just differs.
+
+> Status legend used below: 💡 *proposed back-port* · 🐛 *bug* · ⚡ *efficiency* ·
+> 🧹 *ergonomics / readability*.
+
+---
+
+## `core/getModelFromHomology.m`
+
+Implemented in raventoolbox as [`reconstruction.homology.get_model_from_homology`](../src/raventoolbox/reconstruction/homology/homology.py).
+
+* **H1** 🧹 — Split the overloaded `strictness` 1/2/3 into two orthogonal
+  options: `bidirectional` (reciprocal hits) and `bestHitsOnly`. Reciprocal
+  best hits = both true. Keep `strictness=` as a compat alias. Easier to read
+  at call sites and easier to combine.
+* **H2** 🐛 — Rewrite GPRs by walking a parsed boolean tree (the same AST
+  RAVEN already builds for other checks), not via `regexprep` substring
+  substitution. The current regex approach has partial-match hazards on gene
+  IDs that contain `OLD_` or share prefixes, and needs a separate
+  cleanup pass to fix degenerate `(OLD_… or …)` rules. Doing it on the tree
+  is exact and removes the cleanup step.
+* **H4** 🐛 — Default best-hit selection to **bitscore**, not E-value. Bitscore
+  is database-size-independent (the RBH standard); the current E-value
+  preference can flip the chosen template when the database grows. Keep
+  E-value as an option.
+
+## `core/getKEGGModelForOrganism.m` and the KEGG pipeline
+
+Implemented across [`reconstruction.kegg`](../src/raventoolbox/reconstruction/kegg/).
+
+* **K1** 🐛 — In the KEGG flat-file parser, read each reaction's equation from
+  its **own `EQUATION` field**, not by matching line *i* of `reaction.lst`
+  against reaction *i*. The current line-matched approach silently corrupts
+  the model if `reaction.lst` and `reaction` ever drift out of perfect order.
+  The `EQUATION` field is already in the entry — just read it.
+* **K2** 🧹 — Undefined-stoichiometry terms (`n C00001`, `(n+1) C00002`) keep
+  the **real compound id** with coefficient 1 and *flag the reaction* as
+  undefined-stoichiometry, instead of minting `"n C00001"` pseudo-metabolites
+  that get renamed `undefined_N` later. Cleaner metabolite graph; the flag
+  still drives the `keep*` filters.
+* **K3** 🧹 — Replace the free-text reaction-quality strings appended to
+  `rxnNotes` (`spontaneous`, `undefined stoichiometry`, `incomplete reaction`,
+  `general reaction`) with a structured boolean **`rxn_flags`** table.
+  Downstream filters join on a column instead of substring-matching notes.
+* **K6** ⚡ — Per-KO multi-FASTA construction via a single streaming pass
+  with an offset index (`seek`-based), replacing the Java `Hashtable`
+  byte-scan in `constructMultiFasta`. Drops the 5M-element preallocation
+  and the Java heap tuning.
+* **K7** ⚡ — Concatenate all per-KO HMMs and `hmmpress` them into one
+  pressed library, so the query path runs a single `hmmscan` against the
+  database instead of thousands of per-KO `hmmsearch` invocations. Same
+  result, orders-of-magnitude less I/O.
+* **K12** ⚡ — Use fast MAFFT (`FFT-NS-2`, i.e. `--retree 2 --maxiterate 0`)
+  for HMM training instead of `--auto`. `--auto` selects slow iterative
+  refinement on medium/large KOs (observed ~2.5 min/KO, days for a domain
+  on real KEGG); `FFT-NS-2` is seconds/KO and ample for profile-HMM
+  building. Switch to `PartTree` when the alignment DP cost
+  ≈ `n_seqs × mean_len²` exceeds the host's available memory budget (the
+  Python implementation calibrated this as ≈4.2e-9 × DP-cost in GB and
+  switches when DP-cost > 0.65 × (RAM − 2.5 GB) / 4.2e-9). Same protein
+  set, no sequences dropped.
+* **K14** ⚡ — Sort `organism_gene_ko` by `(organism, gene)` and store it
+  xz-compressed (`organism_gene_ko.tsv.xz`). Cuts the dominant artefact from
+  ~78 MB to ~27 MB (2.9×): gene IDs within an organism share long prefixes
+  so adjacency makes them far more compressible, and xz's larger window
+  catches cross-row redundancy gzip's 32 KB window misses. MATLAB needs an
+  external `unxz` to read the file; the small tables remain plain gzipped TSV.
+* **K15** 🐛 — Recalibrate the HMM-query KO-assignment defaults: change
+  `cutoff` from `1e-50` to `1e-30` and `min_score_ratio_g` from `0.8` to
+  `0.9`. `min_score_ratio_ko` can stay at `0.3` but is empirically inert
+  (identical output at 0.0/0.3/0.5 across the validation organisms).
+  Cross-validated against the true KEGG gene→KO annotation of *S. cerevisiae*,
+  *Cyanidioschyzon merolae*, *E. coli* K-12, and *Mycoplasma genitalium*:
+  the current `1e-50` cutoff sits inside the *true-positive* tail
+  (median true E ≈ 1e-100…1e-155; spurious ≈ 1e-8), silently dropping
+  divergent real hits. At the proposed values, *M. genitalium* gene→KO recall
+  rose from 0.84 → 0.94 (reaction recall 0.87 → 0.97) with no precision loss.
+  Full numbers in [docs/kegg_hmm_cutoff_calibration.md](kegg_hmm_cutoff_calibration.md).
+
+## `core/FSEOF.m`
+
+Implemented in raventoolbox as [`analysis.fseof`](../src/raventoolbox/analysis/fseof.py).
+
+* **FS1** 🐛 — Replace the strict step-by-step monotonicity gate (a target is
+  discarded if any single step's flux fails to exceed the previous) with a
+  linear-regression filter: a reaction is a target if its flux track is
+  reasonably correlated with the enforced product flux (|r| above a
+  threshold). One noisy step from LP alternative optima no longer kills a
+  valid target. Run `pFBA` per step to keep the scan stable.
+* **FS2** 🧹 — Report **knockdown** and **knockout** targets as first-class
+  outputs, not just amplification. The currently-flagged "amplify" set are
+  reactions whose `|flux|` rises with the enforced product; the reactions
+  driven *toward zero* (knock-down / knock-out candidates — arguably the most
+  actionable for strain design) are silently dropped.
+* **FS4** 🐛 — Replace the reported "Slope" — currently
+  `abs(fseof.results(num, iterations) - fseof.results(num, 1)) / abs(targetMax - targetMax/iterations)`,
+  i.e. an endpoint difference — with the **regression slope across all
+  iterations**:
+
+  ```matlab
+  enforcedFlux = (1:iterations) * targetMax / iterations;   % record per-iter
+  ...
+  for num = 1:length(model.rxns)
+      p = polyfit(enforcedFlux, fseof.results(num,:), 1);
+      slope(num) = p(1);
+  end
+  ```
+
+  An endpoint difference disagrees with the monotonicity filter on traces
+  whose first and last values happen to be similar; the regression slope
+  matches the selection criterion. The amplify/knockdown/knockout label
+  (after FS2) should use the slope of `|flux|`, not endpoint comparison.
+
+## `core/randomSampling.m`
+
+Implemented in raventoolbox as [`analysis.random_sampling`](../src/raventoolbox/analysis/sampling.py).
+
+* **SAMP1** ⚡ — Compute `goodRxns` (loop-free, flux-carrying objective
+  candidates) via a **single FVA pass**, not the current per-reaction `parfor`
+  that solves a separate LP minimising and maximising every reaction. FVA
+  computes the same min/max ranges, optionally `loopless` (`cycleFreeFlux`),
+  in far less code.
+* **SAMP2** 🐛 — Add a `seed` argument for reproducibility, and make
+  `nObjectives` a parameter. The current code hard-codes 2 objective
+  reactions per sample (the docstring even claims 3 — a transcription bug
+  worth fixing alongside).
+* **SAMP4** 🐛 — Run `goodRxns` detection **before** `replaceBoundsWithInf`
+  applies, not after. FVA errors as "unbounded" on infinite bounds, and
+  running goodRxns on the inf-replaced model conflates "loop reaction" with
+  "unbounded objective". Scope `replaceBoundsWithInf` to sampling only, after
+  the goodRxns set is found.
+
+## `core/reporterMetabolites.m`
+
+Implemented in raventoolbox as [`analysis.reporter_metabolites`](../src/raventoolbox/analysis/reporter.py).
+
+* **RM1** ⚡🐛 — Replace the per-neighbour-count Monte Carlo background
+  correction (currently 100 000 random sets drawn *for each distinct
+  neighbour count*) with the exact closed-form expression. RAVEN samples with
+  replacement from the scored-gene Z pool, so a random aggregate Z = Σz/√n
+  has mean `√n · μ` and standard deviation `σ` provably — the corrected
+  metabolite Z is just `(metZ − √n·μ) / σ`. This removes both the slow
+  sampling step and its run-to-run randomness (results become deterministic).
+
+## `core/runINIT.m`
+
+Implemented in raventoolbox as [`init.run_init`](../src/raventoolbox/init/init.py).
+
+* **I4** 🐛 — Drop the hard-coded big-M (1000) in the MILP and use **each
+  reaction's own upper bound** instead: `v ≤ ub · x`. Expose `eps` and
+  `prod_weight` as parameters. The current 1/1000/0.1/0.5 quadruple only
+  fits ±1000-bounded models with O(1) scores — flagged in RAVEN as
+  scale-dependent and tunable, but no API knobs are exposed.
+
+## `core/ftINIT.m` and the ftINIT pipeline
+
+Implemented across [`init.ftinit`](../src/raventoolbox/init/ftinit.py), [`init.taskfill`](../src/raventoolbox/init/taskfill.py), [`init.merge`](../src/raventoolbox/init/merge.py), [`init.prep`](../src/raventoolbox/init/prep.py).
+
+* **FT3** 🐛 — Same big-M issue as `runINIT` — use each reaction's own bound
+  as big-M instead of the fixed 100/1000. Expose `force_on` and `force_on_ess`
+  as parameters (the calibrated values for genome-scale Human-GEM are
+  documented in [docs/init_param_calibration.md](init_param_calibration.md)).
+* **FT9** 🐛 — Per-reaction essential forcing must be **clamped to the
+  reaction's bound**. The staged pipeline fixes previous-step reactions as
+  essential at `min(0.99 · |prev flux|, 0.1)`; if a low-capacity reaction
+  cannot carry that magnitude, the current code produces `lb > ub` and
+  errors. Clamp the forced magnitude to `rxn.ub` (in the positive direction)
+  / `rxn.lb` (in the negative direction) so a low-capacity essential is
+  forced *to* its capacity instead.
+* **FT8** 🧹 — Rewrite `removeLowScoreGenes` on the parsed boolean tree
+  instead of regex subset-extraction with `#n#` placeholders. Same logic
+  (isozyme OR-branches dropped while keeping at least one; complex
+  AND-subunits kept intact; nested groups handled), but exact and tractable
+  to test. Make the all-negative tie-break **deterministic** (sort + take
+  the least-negative) instead of random.
+
+## `core/addRxns.m`
+
+Implemented in raventoolbox as [`manipulation.add_reactions_from_equations`](../src/raventoolbox/manipulation/add.py).
+
+* **A1** 🧹 — Accept a string keyword (`metsBy = 'id'` / `'name'`) instead of
+  the opaque `eqnType = 1 / 2 / 3` integer. Call-sites become self-documenting.
+
+## `core/checkModelStruct.m` (or its curation subset)
+
+Implemented in raventoolbox as [`utils.check_model`](../src/raventoolbox/utils/validate.py).
+
+* **V2** 🧹 — Return a struct array of issues (one per finding, with
+  `category` / `target` / `message` fields) instead of printing warnings or
+  throwing on the first problem. Programmatically filterable.
+
+## `core/getElementalBalance.m`
+
+Implemented in raventoolbox as [`utils.get_elemental_balance`](../src/raventoolbox/utils/balance.py).
+
+* **B2** 🐛 — A reaction with no metabolites (an empty `S(:,j)`) currently
+  falls through to `balanceStatus = 1` ("balanced") because both pre-loops
+  leave `balanceStatus` at NaN and the final `isnan→1` step maps NaN to
+  balanced. Fix:
+
+  ```matlab
+  % Right before the final `balanceStatus(isnan(...)) = 1;` line:
+  emptyRxns = full(sum(model.S ~= 0, 1)) == 0;
+  balanceStatus(emptyRxns) = min(-1, balanceStatus(emptyRxns));
+  ```
+
+  Empty reactions then report `-1` ("missing information") rather than
+  vacuously balanced.
+
+## `core/findPotentialErrors.m` (GPR linting)
+
+Implemented in raventoolbox as [`utils.find_non_dnf_grrules`](../src/raventoolbox/utils/gpr.py).
+
+* **S1** 🧹 — Return the `indexes2check` array (which the function already
+  computes) plus per-reaction reason strings as a struct array, instead of
+  only emitting a `warning()`. Callers cannot act on the warning text.
+* **S2** 🐛 — Detect non-DNF GPRs via the parsed boolean tree, not substring
+  search for `) and (`, `) and`, `and (`. The substring approach is sensitive
+  to spacing and bracketing and to gene IDs containing those characters; the
+  tree walk (no OR beneath any AND) is exact.
+
+## `core/getIndexes.m`
+
+* **G1** ⚡ — Build a `containers.Map` `{id: position}` once per call and use
+  hashed lookup (O(1)) instead of looping `find(strcmp(obj(i), searchIn))`
+  per query (O(n · m)). Same for the `metcomps` branch.
+* **G5** 🐛 — `if all(objects)` conflates a logical all-true mask with the
+  index vector `[1 1 1]` (the function's own comment notes "this gets weird
+  if it's all 1"). Test `islogical(objects)` explicitly instead.
+
+## Cross-cutting
+
+* **Y4** 🧹 — A first-class home for `metDeltaG` / `rxnConfidenceScores` (in
+  RAVEN) and `deltaG` / `confidence_score` (in cobra/raventoolbox `.notes`).
+  Neither cobra nor SBML core has a standard slot yet; if one emerges (SBML
+  fbc/groups, or a cobra attribute) both projects should migrate. For now
+  this is a watching brief.
+
+## R-MetaCyc — removal candidate
+
+Already flagged in `IMPROVEMENTS.md` as a removal target on both sides — the
+MetaCyc reconstruction path is dropped in raventoolbox and proposed for removal
+from MATLAB RAVEN (`external/metacyc/*`). Pasting here for visibility because
+the actual removal still needs to land upstream. See
+[IMPROVEMENTS.md § R-MetaCyc](../IMPROVEMENTS.md) for the full rationale.
diff --git a/docs/raven_migration.md b/docs/raven_migration.md
new file mode 100644
index 0000000..6de2067
--- /dev/null
+++ b/docs/raven_migration.md
@@ -0,0 +1,120 @@
+# RAVEN → raven-python migration reference
+
+A function-by-function map from the MATLAB RAVEN Toolbox to raven-python (and cobrapy where
+appropriate). For each RAVEN function we record one of four outcomes:
+
+* ✅ **ported** — there's a direct raven-python replacement; see the link.
+* 🗒️ **cheatsheet** — cobrapy already covers it; a one-liner or short idiom does the job
+  (recorded under each row).
+* ⛔ **not ported** — explicit decision not to bring it across, with rationale.
+* 🆕 **new in raven-python** — functionality raven-python adds that has no RAVEN counterpart.
+
+For raven-python's deliberate improvements over RAVEN (and which of them are candidates to
+upstream into MATLAB RAVEN), see [IMPROVEMENTS.md](../IMPROVEMENTS.md).
+
+## Design principle
+
+The in-memory object is always a [`cobra.Model`](https://cobrapy.readthedocs.io). There is
+no parallel RAVEN struct, no `ravenCobraWrapper`-style adapter. RAVEN fields that cobra
+doesn't model natively (`rxnMiriams`, `metDeltaG`, `rxnConfidenceScores`, …) live in
+cobra's `annotation` / `notes` dictionaries. raven-python's YAML I/O follows the cobra YAML
+standard plus the geckopy enzyme-constrained extension, so ecModels round-trip.
+
+---
+
+## Foundation: utilities, manipulation, I/O
+
+| RAVEN | raven-python | Notes |
+|---|---|---|
+| `addRxns` (equations) | ✅ [`manipulation.add_reactions_from_equations`](../src/raven_python/manipulation/add.py) | The keystone: equation-string → reactions, matching mets by id, name, or `name[comp]`; strict / auto policies for new mets and genes. |
+| `addTransport` | ✅ [`manipulation.add_transport_reactions`](../src/raven_python/manipulation/transport.py) | Transport across compartments, matching mets by name, sequential `tr_NNNN` ids. cobra has no transport primitive. |
+| `addRxnsGenesMets` | ✅ [`manipulation.add_reactions_from_model`](../src/raven_python/manipulation/transfer.py) | Copy reactions from a source model, matching mets by `name[comp]` (vs cobra's strict-by-id merge). |
+| `mergeModels` | ✅ [`manipulation.merge_models`](../src/raven_python/manipulation/merge.py) | N-model merge, unify mets by `name[comp]` (or id), keep all reactions (id collisions renamed). cobra's `merge` is pairwise / strict-by-id. |
+| `changeRxns`, `changeGrRules` | ✅ [`manipulation.change_reaction_equations`](../src/raven_python/manipulation/change.py), `change_gene_reaction_rules` | Stoichiometry change in place; batch GPR set/append (`(old) or (new)`). |
+| `setParam('var', …)` | ✅ [`manipulation.set_variance_bounds`](../src/raven_python/manipulation/parameters.py) | ±% band around measured values. **Other modes (lb/ub/eq/obj/unc)**: 🗒️ cobra one-liners (`reaction.bounds`, `model.objective`, `Configuration().bounds`). |
+| `setExchangeBounds` | ⛔ not ported | cobra's `model.medium = {ex_id: uptake}` covers it. |
+| `simplifyModel` (gap modes) | ✅ [`manipulation.remove_dead_end_reactions`](../src/raven_python/manipulation/simplify.py), `remove_duplicate_reactions`, `constrain_reversible_reactions`, `group_linear_reactions` | Cobra-covered modes (no-flux→`find_blocked_reactions`, zero-interval, unconstrained) are 🗒️ cheatsheeted. `group_linear` is lossy (drops genes), per RAVEN. |
+| `removeMets`, `removeGenes` | ✅ [`manipulation.remove_metabolites`](../src/raven_python/manipulation/remove.py), `remove_genes` | Delegate to cobra; add `by_name` cross-compartment deletion (mets) and a `blocked_reactions` policy (genes). `removeReactions` itself ⛔ — cobra's `remove_reactions` covers it. |
+| `convertToIrrev` | ✅ [`manipulation.convert_to_irreversible`](../src/raven_python/manipulation/irreversible.py) | Splits reversible non-exchange reactions into forward + `_REV`. Adopted from geckopy. |
+| `expandModel` | ✅ [`manipulation.expand_model`](../src/raven_python/manipulation/expand.py) | Splits OR-GPR (isozyme) reactions into one per AND-clause. Adopted from geckopy. |
+| `mergeCompartments` | ✅ [`manipulation.merge_compartments`](../src/raven_python/manipulation/compartments.py) | Collapse a multi-compartment model into one; deduplicate identical reactions; optionally drop one-met collapses (`drop_single_metabolite_reactions`). |
+| `copyToComps` | ✅ [`manipulation.copy_to_compartment`](../src/raven_python/manipulation/compartments.py) | Duplicate reactions into a target compartment (idempotent; `delete_original=True` makes it a move). |
+| `mapCompartments` | ⛔ not ported | Overlaps with `comparison.compare_models` on the reaction-id intersection. |
+| `getElementalBalance` | ✅ [`utils.get_elemental_balance`](../src/raven_python/utils/balance.py) | Graded `balanced` / `unbalanced` / `unknown` — `unknown` catches a missing formula that cobra's `check_mass_balance` silently miscounts. |
+| `checkModelStruct` (curation subset) | ✅ [`utils.check_model`](../src/raven_python/utils/validate.py) | Structured curation report. RAVEN's struct/type checks are moot in cobra. |
+| `is_dnf` / GPR check (from `standardizeGrRules`) | ✅ [`utils.is_dnf`](../src/raven_python/utils/gpr.py), `find_non_dnf_grrules` | Lint-only half; cobra auto-normalises GPRs on assignment, so the rewriting half isn't ported. |
+| `getIndexes` (`metcomps` sliver) | ✅ [`utils.parse_name_comp`](../src/raven_python/utils/parse.py) | The only `getIndexes` bit cobra doesn't already cover. |
+| `editMiriam`, `extractMiriam` | ⛔ not ported | cobra's `.annotation` is already a `{namespace: id(s)}` dict — read/write it directly. |
+| `getRxnsInComp`, `getMetsInComp` | ⛔ not ported | One-liners over cobra's `reaction.compartments` / `metabolite.compartment`. |
+| `constructEquations` | ⛔ not ported | `reaction.build_reaction_string(use_metabolite_names=...)` already does both id and name equations. |
+| `sortIdentifiers` | ✅ [`utils.sort_identifiers`](../src/raven_python/utils/sort.py) | Model-wide alphabetical sort; also via `sort_ids=` on `write_yaml_model`. |
+
+### I/O
+
+| RAVEN | raven-python | Notes |
+|---|---|---|
+| `readYAMLmodel`, `writeYAMLmodel` | ✅ [`io.read_yaml_model`, `write_yaml_model`](../src/raven_python/io/yaml.py) | Aligned to cobra's `!!omap` writer (RAVEN `fa281a1`). Adds the RAVEN-only top-level per-entry keys (inchis/deltaG/metFrom/notes, confidence_score/references/rxnFrom/deltaG, protein) into `.notes`, plus `version`/`metaData`/GECKO `ec-*`. cobra-readable output verified. |
+| `exportModelToSIF` | ✅ [`io.export_model_to_sif`](../src/raven_python/io/sif.py) | Cytoscape SIF (`rc`/`rr`/`cc` graphs). |
+| `exportToExcelFormat` (export only) | ✅ [`io.export_to_excel`](../src/raven_python/io/excel.py) | RAVEN 5-sheet xlsx (RXNS / METS / COMPS / GENES / MODEL). Excel **import** is intentionally excluded. |
+| `exportForGit` | ✅ [`io.export_for_git`](../src/raven_python/io/git.py) | Standard-GEM repo layout (`model/<fmt>/…`). |
+| `importYAML/SBML/Mat/Excel` | 🗒️ cobra's standard readers | `cobra.io.read_sbml_model` / `load_json_model` / etc.; Excel import not ported. |
+
+## Reconstruction
+
+| RAVEN | raven-python | Notes |
+|---|---|---|
+| `getModelFromHomology` + `getBlast`/`getDiamond` | ✅ [`reconstruction.homology.get_model_from_homology`](../src/raven_python/reconstruction/homology/homology.py), `run_blast`, `run_diamond`, `blast_from_table` | Core homology reconstruction with structured improvements (bidirectional / best-hits-only, AST GPR rewrite, complex policy, bitscore best-hits, DataFrame ortholog map). |
+| KEGG download → species model (5 steps) | ✅ [`reconstruction.kegg.*`](../src/raven_python/reconstruction/kegg/) | All five steps: `download.fetch_keggdb`, `parse.read_kegg_table` + reference model, `hmm.build_libraries`, `organism.build_kegg_model_for_organism` (no-FASTA), `query.assign_kos` + `run_hmmscan`. |
+| `getPhylDist` | ⛔ not ported | Fixed prok90/euk90 libraries make the distance matrix moot. |
+| `getMetaCycModelForOrganism` | ⛔ not ported (and **flagged for removal from MATLAB RAVEN**) | BLAST-to-single-representatives is low-precision at every cutoff. See [IMPROVEMENTS.md](../IMPROVEMENTS.md) under `R-MetaCyc`. |
+
+## Tasks, gap-filling, INIT, ftINIT
+
+| RAVEN | raven-python | Notes |
+|---|---|---|
+| `parseTaskList`, `checkTasks` | ✅ [`tasks.parse_task_list`](../src/raven_python/tasks/tasklist.py), `check_tasks` ([tasks/check.py](../src/raven_python/tasks/check.py)) | `check_tasks` reuses one model across the task list (no per-task model copy) — at genome scale ~12× faster than the copy-per-task implementation it replaced. |
+| `fillGaps` (connectivity mode) | ✅ [`gapfilling.connect_blocked_reactions`](../src/raven_python/gapfilling/fill.py) | MILP (min penalty-weighted template reactions s.t. blocked reactions carry flux). Targeted mode → 🗒️ `cobra.gapfill`. |
+| `runINIT`, `scoreComplexModel`, `getINITModel` | ✅ [`init.run_init`](../src/raven_python/init/init.py), [`init.score_reactions_from_genes`, `gene_scores_from_expression`](../src/raven_python/init/score.py), [`init.get_init_model`](../src/raven_python/init/build.py) | INIT MILP rewritten in optlang (no sparse-matrix construction); RNA-seq scoring is `5·ln(level/ref)`-clamped. |
+| `ftINIT`, `prepINITModel`, `ftINITInternalAlg`, `getINITSteps` | ✅ [`init.ftinit`](../src/raven_python/init/ftinit.py), [`init.prep_init_model`](../src/raven_python/init/prep.py), [`init.run_ftinit`](../src/raven_python/init/ftinit.py), [`init.get_init_steps`](../src/raven_python/init/steps.py) | Staged MILP + linear merge + scaling (`rescaleModelForINIT`). Validated against RAVEN on Human-GEM (Jaccard 0.975–0.980; see [humangem_validation.md](humangem_validation.md)). Metabolomics-based scoring is the one piece not yet implemented (raises `NotImplementedError`). |
+| `ftINITFillGaps`, `ftINITFillGapsMILP`, `ftINITFillGapsForAllTasks` | ✅ [`init.fill_tasks`](../src/raven_python/init/taskfill.py) | Task-aware gap-filling within ftINIT; in-place `_feasible` check + bounded fill MILP (`mip_gap`, `time_limit`). |
+| `mergeLinear` | ✅ [`init.merge_linear`](../src/raven_python/init/merge.py) | Linear merge of unit-stoichiometry chains; bookkeeping (`group_ids`, `reversed_rxns`) to map back to the reference model. |
+| `removeLowScoreGenes` | ✅ [`init.remove_low_score_genes`](../src/raven_python/init/genes.py) | Final gene-prune step of ftINIT. |
+| `fitTasks` | ⛔ not ported | Niche; tasks are normally consumed via `checkTasks` / ftINIT's task layer. |
+
+## Omics, analysis, comparison
+
+| RAVEN | raven-python | Notes |
+|---|---|---|
+| `parseHPA`, `parseHPArna`, `scoreModel` | ✅ [`omics.parse_hpa`, `parse_hpa_rna`, `hpa_gene_scores`, `rna_gene_scores`](../src/raven_python/omics/hpa.py) | Pandas-tidy DataFrames; scoring adapters reuse `score_reactions_from_genes` (single source of truth for the GPR walk). |
+| `reporterMetabolites` | ✅ [`analysis.reporter_metabolites`](../src/raven_python/analysis/reporter.py) | Exact closed-form background replaces RAVEN's Monte-Carlo (RM1 in IMPROVEMENTS). |
+| `FSEOF` | ✅ [`analysis.fseof`](../src/raven_python/analysis/fseof.py) | Regression slope + correlation, amplify/knockdown/knockout classes, gene aggregation (FS1–FS4 in IMPROVEMENTS). |
+| `randomSampling` | ✅ [`analysis.sample`](../src/raven_python/analysis/sampling.py) | Wraps cobra's flux sampling. |
+| `compareMultipleModels` | ✅ [`comparison.compare_models`](../src/raven_python/comparison/compare.py) | Tidy DataFrames (reactions / mets / genes / subsystems presence + pairwise Jaccard + optional `check_tasks` pass/fail). Plotting and tSNE/MDS are 🗒️ one-liners in seaborn/scikit-learn; intentionally not in the function. |
+| `runDynamicFBA` | ⛔ not ported | Established Python implementations exist: [`dfba`](https://pypi.org/project/dfba/) (Pinheiro et al.; CVODES-backed), [`reframed`](https://pypi.org/project/reframed/) (Machado lab), [`mewpy`](https://pypi.org/project/mewpy/) (Cunha lab). Cobrapy itself has none, but re-porting would duplicate maintained prior art. |
+
+## Localisation (Phase 7)
+
+| RAVEN | raven-python | Notes |
+|---|---|---|
+| `predictLocalization` | ✅ [`localization.predict_localization`](../src/raven_python/localization/predict.py) | Deterministic MILP (not simulated annealing). Caller-passed `reactions_to_relocate` set (everything else pinned). Multi-compartment by default: primary "free", extras pay `multi_compartment_penalty`. Tolerates incomplete models (no silent reaction removal). `apply=False` returns a `LocalizationProposal` diff. Real-data validation against curated yeast-GEM in [yeast_localization_benchmark.md](yeast_localization_benchmark.md). |
+| `getWoLFScores`, `parseScores('wolf')` | ✅ [`localization.load_wolfpsort`](../src/raven_python/localization/scores.py) | Parses WoLF PSORT summary output (RAVEN-compatible); row-normalised. Does not shell out to the WoLF PSORT binary — run that separately and feed in the output. |
+| `parseScores('deeploc')` | ✅ [`localization.load_deeploc`](../src/raven_python/localization/scores.py) | DeepLoc 2 per-protein CSV (Protein_ID / Localizations / Signals + one column per compartment). |
+
+## Things deliberately not ported
+
+* **`ravenCobraWrapper` / RAVEN struct adapter** — cobra is the canonical object; no parallel struct.
+* **`checkModelStruct` struct/type checks** — moot in cobra.
+* **`runDynamicFBA`** — see Omics/analysis row.
+* **`getMetaCycModelForOrganism`** — see Reconstruction row; flagged for upstream removal.
+* **`getPhylDist`** — fixed prok90/euk90 libraries make it moot.
+* **`mapCompartments`** — overlaps with `compare_models`.
+* **`editMiriam`, `extractMiriam`, `getRxnsInComp`, `getMetsInComp`, `constructEquations`, `getIndexes`** (most), **`setExchangeBounds`**, **most `setParam` modes**, **`getBlastFromExcel` Excel branch** — cobra one-liners; recorded above.
+
+## "New in raven-python" entry points
+
+These are not direct RAVEN ports:
+
+* [`comparison.compare_models`](../src/raven_python/comparison/compare.py) — already in RAVEN, but the raven-python version returns tidy DataFrames suitable for downstream analysis (RAVEN's version embeds heatmap/tSNE plotting).
+* [`reconstruction.homology.run_blast` / `run_diamond` / `blast_from_table`](../src/raven_python/reconstruction/homology/blast.py) — generic subprocess wrappers; a DataFrame is the canonical "hits" object (no `blastStructure` struct).
+* [`binaries.resolve_binary`, `ensure_binary`](../src/raven_python/binaries.py) — version-pinned release-ZIP registry for external tools (BLAST/DIAMOND/HMMER), SHA256-verified.
+* [`localization.LocalizationProposal`](../src/raven_python/localization/predict.py) — the diff-preview mode (`apply=False`).
diff --git a/docs/todo.md b/docs/todo.md
new file mode 100644
index 0000000..29f8571
--- /dev/null
+++ b/docs/todo.md
@@ -0,0 +1,48 @@
+# Open work
+
+What's still on the books. See [raven_migration.md](raven_migration.md) for the function-by-
+function port status (most of it is done); see [IMPROVEMENTS.md](../IMPROVEMENTS.md) for the
+catalogue of raven-python improvements that should also be back-ported into MATLAB RAVEN.
+
+## Major
+
+### Visualization (`plotting/`)
+
+Not started. RAVEN has limited plotting (`drawMap` etc., MATLAB-bound). For raven-python the most
+useful targets are:
+
+* Pathway maps / Escher integration for context-specific models.
+* Omics overlay (gene-score / expression heatmaps on the reaction set).
+* Flux distribution overlays.
+
+cobrapy + Escher already covers a lot here — likely a thin integration layer rather than a port.
+
+### Metabolomics-based scoring in ftINIT
+
+The metabolomics-detected metabolite production-reward block in [`init.ftinit`](../src/raven_python/init/ftinit.py)
+currently raises `NotImplementedError` if a non-empty `metabolomics` argument is passed. The
+linear merge eliminates degree-2 detected metabolites, so it needs producer-group-mapping +
+negative-producer force-flux constraints — the most intricate MILP piece, for the least-used
+input. Worth doing only when a real user request lands.
+
+## Infrastructure
+
+* **Binary ZIP releases** for BLAST/DIAMOND (Phase 3a). The runtime resolver in
+  [`binaries.py`](../src/raven_python/binaries.py) is ready; the registry is empty until ZIPs are
+  published as GitHub release assets.
+* **KEGG data artefact releases.** See [maintaining_kegg_data.md](maintaining_kegg_data.md).
+
+## Smaller items
+
+* [known_issues.md](known_issues.md) — backlog of low-priority edge cases / robustness gaps /
+  dead code from the full-codebase review. None affects correctness on well-formed inputs.
+* [IMPROVEMENTS.md](../IMPROVEMENTS.md) — items marked 💡 *proposed* are candidates to
+  implement (and back-port).
+
+## Upstream blockers (not raven-python work, but worth tracking)
+
+* `optlang.hybrid_interface.Configuration.clone()` bug — blocks HiGHS at any scale (CI catches
+  it in `tests/test_init_solvers.py`).
+* GLPK's MIP solve ignores `configuration.timeout` at genome scale — blocks GLPK on large MILPs.
+* Both documented in [init_solver_benchmark.md](init_solver_benchmark.md) with concrete fix
+  suggestions.
diff --git a/src/raven_python/io/yaml.py b/src/raven_python/io/yaml.py
index 151954b..e3da8ac 100644
--- a/src/raven_python/io/yaml.py
+++ b/src/raven_python/io/yaml.py
@@ -86,13 +86,33 @@ def _capture_entry_fields(entries, fields):
 
 
 def read_yaml_model(path: str | Path) -> cobra.Model:
-    """Read a RAVEN/cobrapy YAML model into a ``cobra.Model``."""
+    """Read a RAVEN/cobrapy YAML model into a ``cobra.Model``.
+
+    Convenience wrapper around :func:`model_from_yaml_data` that opens the
+    file (transparently un-gzipping ``.gz``) and parses the YAML. Callers
+    that need to pre-process the document (e.g. lift legacy fields that
+    cobra doesn't recognise) can read+normalise themselves and call
+    :func:`model_from_yaml_data` with the resulting dict.
+    """
     with _open_text(path, "r") as handle:
         raw = _to_plain(_cobra_yaml.load(handle))
 
     if not isinstance(raw, dict):
         raise ValueError(f"{path}: top-level YAML is a {type(raw).__name__}, not a mapping.")
+    return model_from_yaml_data(raw)
+
 
+def model_from_yaml_data(raw: dict) -> cobra.Model:
+    """Build a ``cobra.Model`` from an already-parsed RAVEN/cobrapy YAML dict.
+
+    Strips and restores the RAVEN per-entry side-fields onto each entry's
+    ``.notes``, lifts ``id``/``name`` out of legacy ``metaData``, and
+    stashes everything else (``version``, the ``metaData`` block itself,
+    and any unknown top-level keys such as ``ec-rxns``/``ec-enzymes``)
+    onto ``model.notes`` so a round-trip via :func:`write_yaml_model`
+    preserves them. ``raw`` is mutated in place — copy it first if the
+    caller needs the original.
+    """
     metadata = raw.pop("metaData", None) or {}
     version = raw.pop("version", None)
     foreign = {k: raw.pop(k) for k in list(raw) if k not in _COBRA_TOP_KEYS}
@@ -159,6 +179,17 @@ def write_yaml_model(
 
     doc = OrderedDict(_to_plain(model_to_dict(model)))
 
+    # cobra's model_to_dict serialises model.notes verbatim into doc["notes"],
+    # so the three management keys we just lifted out would otherwise also
+    # appear nested inside the notes section. Strip them; preserve any other
+    # genuine notes the caller stored on the model.
+    doc_notes = doc.get("notes")
+    if isinstance(doc_notes, dict):
+        for key in ("metaData", "version", "_yaml_sections"):
+            doc_notes.pop(key, None)
+        if not doc_notes:
+            doc.pop("notes", None)
+
     if sort_ids:
         for section in ("metabolites", "reactions", "genes"):
             if section in doc:
diff --git a/src/raven_python/plotting/__init__.py b/src/raven_python/plotting/__init__.py
new file mode 100644
index 0000000..4fb3839
--- /dev/null
+++ b/src/raven_python/plotting/__init__.py
@@ -0,0 +1 @@
+"""Pathway-map and omics-overlay visualisation (stub — not yet implemented)."""