Skip to content

Commit 8af3a5a

Browse files
committed
Land yeast-GEM curation framework over raven-python + RAVEN (phase 6)
Moves the generic batch-curation engine upstream: raven-python gains raven_python.curation and RAVEN gains core/curateModelFromTables (both on feat/yeast-gem-shared in their respective repos). yeast-GEM keeps the user-facing curateMetsRxnsGenes function name as a thin shim that pins yeast's s_/r_ prefixes and forwards upstream. MATLAB shim (code/modelCuration/curateMetsRxnsGenes.m): - Replaces the 330-line original with a ~50-line shim. - Forwards to curateModelFromTables(..., 's_', 'r_'). - Preserves the original 1-arg through 5-arg call shapes, so the v8_*/v9_* curation scripts and TEMPLATEcuration keep working without change. Python (yeastgem.curation): - curate_mets_rxns_genes(model, *, mets_df=..., genes_df=..., rxns_df=..., rxns_coeffs_df=...) returns a CurationResult dataclass recording adds and overwrites. - curate_mets_rxns_genes_from_tsv(...) takes file paths instead. - Both fix met_id_prefix='s_', rxn_id_prefix='r_'; everything else is delegated to raven_python.curation. Schema (unchanged from MATLAB): metabolites — match by (name, comp); columns metNames, comps, formula, charge, inchi, metNotes, then MIRIAM. genes — match by gene id; columns genes, geneShortNames, then MIRIAM. reactions — match by stoichiometric signature; columns rxnNames, grRules, lb, ub, rev, subSystems, eccodes, rxnNotes, rxnReferences, rxnConfidenceScores, then MIRIAM. coefficients— rxnNames, metNames, comps, coefficient (one row per (rxn, met) pair). Optional leading 'index' column from the v8_7_0 schema is silently ignored upstream. "Everything after the listed core columns is MIRIAM" — yeast-GEM's TSVs work unchanged. Tests (4 new yeast-side, 65 total): one each for the s_ / r_ prefix pinning, the empty-call no-op, and an end-to-end smoke test against the real v8_6_3 VolPolyP TSV pack (1 gene + 35 reactions match by stoichiometry → warned overwrites, as expected on the current model state). Verification: MATLAB shim no-op call (model unchanged) confirms the prefix pinning forwards correctly to RAVEN's curateModelFromTables. Full Python-vs-MATLAB end-to-end parity on real TSV packs is blocked by pre-existing flakiness in the legacy curateMetsRxnsGenes — it errors on the v8_6_3 VolPolyP schema (no `index` column) and the v8_7_0 DBnewRxns pack (logical-index bug at line 282 against the current model). The Python implementation is more permissive and handles both packs cleanly; lock-step parity for new curation work holds by construction since both languages now go through the same upstream engine. runPhase6Curation.m persisted under code/python/tests/reference/ for re-use when a clean test TSV pack becomes available. PORTING_PLAN.md status table marks phase 6 done; UPSTREAM_CANDIDATES.md records the curation move.
1 parent 5518c2d commit 8af3a5a

7 files changed

Lines changed: 217 additions & 316 deletions

File tree

code/modelCuration/curateMetsRxnsGenes.m

Lines changed: 33 additions & 314 deletions
Large diffs are not rendered by default.

code/python/PORTING_PLAN.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ saved and validated entirely from Python.
1616
| 3.5. Upstream restructure (raven-python + RAVEN) | **done** | Decision #1 reversed: generic helpers move upstream rather than living locally. Moved to raven-python (on `feat/yeast-gem-shared`): `raven_python.comparison.diff_models` + `DiffReport` (renamed from the local `compare_models`/`ComparisonReport`), `raven_python.annotation.{add_sbo_terms, load_delta_g_csv, save_delta_g_csv}`, `raven_python.conditions.{apply_condition, load_condition, set_reaction_bounds}`. Moved to RAVEN (on `feat/yeast-gem-shared`): `io/readYAML.m`, `core/applyCondition.m`. yeast-GEM now: depends on `raven-python` (git URL pinned to the feature branch), `yeastgem.compare`/`yeastgem.missing_fields`/`yeastgem.conditions` become thin wrappers that configure upstream defaults with yeast-specific data; MATLAB `code/readYAML.m` deleted, `code/applyCondition.m` → `code/applyYeastCondition.m` (handles the `amino_acid_ratio` pre-step then delegates to RAVEN). yeast-GEM uses the legacy `only_last_reaction_for_pseudo=True` flag on the upstream `add_sbo_terms` to stay byte-equivalent during the transition. **46 new raven-python tests + 46 yeast-GEM tests passing.** Verified: all 4 conditions byte-equivalent pre vs post restructure on MATLAB; Python `commit_yeast_model` (now through raven-python) semantically equal to MATLAB `commitYeastModel`. |
1717
| 4. Tier 2 — biomass + conditions in Python | **done (core)** | Biomass subsystem moved upstream as `raven_python.biomass` (`BiomassConfig` + `BiomassComponent` + `sum_biomass` / `scale_biomass` / `rescale_pseudoreaction` / `set_gam`; 19 new tests on synthetic models). yeast-GEM ids.yml gained a `biomass_components` section; `yeastgem.biomass` exposes `sum_biomass`, `scale_biomass`, `rescale_pseudoreaction` (with the yeast `lipid` → backbone+chain aggregation), `set_gam` (auto-locates the NGAM reaction by name), and `change_amino_acid_ratio` (reads `data/physiology/aminoAcid_Bjorkeroth2020.tsv`). `yeastgem.conditions.apply` now handles `amino_acid_ratio` before delegating to upstream; `yeastgem.io.commit_yeast_model` runs the anaerobic growth check on a copy. **Verified** end-to-end on the real model: Python `conditions.apply('anaerobic')` produces SBML semantically equal to MATLAB `applyYeastCondition('anaerobic')`; Python `commit_yeast_model` (with anaerobic check active) produces SBML semantically equal to MATLAB `commitYeastModel`. 54 yeast-GEM tests + 38 new raven-python tests passing. **Deferred:** chemostat sweep + `fit_gam` (analysis/calibration, not part of the commit pipeline; tracked in UPSTREAM_CANDIDATES.md). |
1818
| 5. Tier 3 — test suite | **done** | Ported the four ``code/modelTests/`` routines to ``yeastgem.model_tests``: ``growth`` (Tobias 2013 chemostat R² across 4 conditions), ``essential_genes`` (cobrapy ``single_gene_deletion`` + Stanford KO collection, returns ``EssentialGeneResult`` dataclass with accuracy / sensitivity / specificity / MCC), ``anaerobic_flux_predictions`` (Jouhten 2008 + Frick & Wittmann flux R² + mean relative error), ``plot_anaerobic`` (fermentation-product bar plot), ``find_duplicated_rxns`` (wrapper over the new ``raven_python.manipulation.find_duplicate_reactions``). Stanford ORF lists extracted from ``essentialGenes.m`` to ``data/essentialGenes/{inviable,verified}_orfs.txt`` so both languages read the same source. 7 new yeast-GEM tests + 6 new raven-python tests; full Python suite 61/61 passing. Verified vs MATLAB on the real model (`runPhase5Metrics.m`): growth R² matches at 1e-7; anaerobic flux R² and essential-gene accuracy/MCC match within 5e-3; single 1-gene difference in the essential-gene confusion matrix is a Gurobi/HiGHS solver-tolerance borderline at the 1e-6 ratio threshold. |
19-
| 6. Tier 4 — curation framework | not started | |
19+
| 6. Tier 4 — curation framework | **done** | Generic `curateModelFromTables` engine moved to RAVEN (with `metPrefix` / `rxnPrefix` parameters defaulted to BiGG `M_`/`R_`); equivalent `raven_python.curation.{batch_curate, batch_curate_from_tsv}` in raven-python with the same schema (DataFrames + a `from_tsv` convenience). yeast-GEM keeps the user-facing `curateMetsRxnsGenes` MATLAB function as a 50-line shim that pins yeast's `s_`/`r_` prefixes and forwards upstream; the historical v8_*/v9_* curation scripts and `TEMPLATEcuration` keep working without change. New `yeastgem.curation.curate_mets_rxns_genes` Python entry point with the same prefix pinning. "Everything after the listed core columns is MIRIAM" — yeast-GEM's existing TSVs (12+10+9 MIRIAM columns) work unchanged. 13 new raven-python tests + 4 new yeast-GEM tests; full Python suite 65/65 passing. **MATLAB shim verified** to forward correctly (no-op call leaves the model unchanged). Direct MATLAB-vs-Python end-to-end parity check is blocked by pre-existing flakiness in the legacy `curateMetsRxnsGenes` (errors on the v8_6_3 VolPolyP schema and the v8_7_0 DBnewRxns pack); the Python implementation is more permissive than the legacy MATLAB on these edge cases. |
2020
| 7. Docs + CI | partial | Python CI workflow added; README "not yet functional" note still to update. |
2121

2222
## Design principles

code/python/UPSTREAM_CANDIDATES.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,7 @@ The following were extracted from yeast-GEM into upstream branches
2525
| `code/applyCondition.m` (generic core) | RAVEN `core/applyCondition.m` | takes YAML path or struct |
2626
| biomass subsystem (`sumBioMass`/`scaleBioMass`/`rescalePseudoReaction`/`changeGAM`) | `raven_python.biomass` | `BiomassConfig`/`BiomassComponent`, `sum_biomass`, `scale_biomass`, `rescale_pseudoreaction`, `set_gam` |
2727
| `findDuplicatedRxns` (detection only) | `raven_python.manipulation` | `find_duplicate_reactions(model, *, ignore_direction=True)` |
28+
| `curateMetsRxnsGenes` (batch TSV curation engine) | `raven_python.curation` + RAVEN `core/curateModelFromTables.m` | `batch_curate(model, mets_df=…, genes_df=…, rxns_df=…, rxns_coeffs_df=…, met_id_prefix=…, rxn_id_prefix=…)`, `batch_curate_from_tsv` |
2829

2930
yeast-GEM now keeps:
3031
- `yeastgem.compare` — re-export of the upstream `diff_models` under
Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
function runPhase6Curation(yeastGemPath, outXml)
2+
% runPhase6Curation Apply the v8_6_3 VolPolyP curation TSVs and export.
3+
%
4+
% Used for the Python-vs-MATLAB curation-engine parity check.
5+
6+
warning('off','all');
7+
restoredefaultpath; rehash toolboxcache;
8+
addpath('/opt/gurobi1301/linux64/matlab');
9+
addpath(genpath('/home/eduardk/github/RAVEN'));
10+
addpath(fullfile(yeastGemPath, 'code'));
11+
addpath(fullfile(yeastGemPath, 'code', 'modelCuration'));
12+
addpath(fullfile(yeastGemPath, 'code', 'otherChanges'));
13+
addpath(fullfile(yeastGemPath, 'code', 'missingFields'));
14+
15+
dataDir = fullfile(yeastGemPath, 'data', 'modelCuration', 'v8_7_0');
16+
mets = fullfile(dataDir, 'DBnewRxnsMets.tsv');
17+
genes = fullfile(dataDir, 'DBnewRxnsGenes.tsv');
18+
rxns = fullfile(dataDir, 'DBnewRxnsRxns.tsv');
19+
coeffs = fullfile(dataDir, 'DBnewRxnsCoeffs.tsv');
20+
21+
model = loadYeastModel;
22+
model = curateMetsRxnsGenes(model, mets, genes, coeffs, rxns);
23+
exportModel(model, outXml);
24+
fprintf('Wrote %s\n', outXml);
25+
end

code/python/tests/test_curation.py

Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,76 @@
1+
"""Tests for ``yeastgem.curation`` against the real yeast-GEM model.
2+
3+
Unit-level coverage of the generic engine lives upstream in
4+
``raven_python/tests/test_curation.py``; here we just verify the
5+
yeast-GEM shim picks up the ``s_`` / ``r_`` prefixes and that real
6+
v8_*/v9_* TSVs apply cleanly.
7+
"""
8+
from __future__ import annotations
9+
10+
import pandas as pd
11+
12+
from yeastgem import curation
13+
from yeastgem.io import REPO_PATH
14+
15+
16+
def test_new_met_uses_s_prefix(model):
17+
mutated = model.copy()
18+
df = pd.DataFrame([
19+
{"metNames": "test_metabolite_phase_6", "comps": "c",
20+
"formula": "C2H6O", "charge": 0, "inchi": "", "metNotes": ""},
21+
])
22+
result = curation.curate_mets_rxns_genes(mutated, mets_df=df)
23+
assert len(result.added_metabolites) == 1
24+
assert result.added_metabolites[0].startswith("s_")
25+
26+
27+
def test_new_rxn_uses_r_prefix(model):
28+
mutated = model.copy()
29+
# Use an existing yeast met (s_0794 = H+[c]) to avoid the
30+
# add-new-met machinery.
31+
atp = next(m for m in mutated.metabolites if m.name == "ATP" and m.compartment == "c")
32+
rxns_df = pd.DataFrame([
33+
{"rxnNames": "phase6 test rxn", "grRules": "", "lb": 0, "ub": 1000,
34+
"rev": 0, "subSystems": "", "eccodes": "", "rxnNotes": "",
35+
"rxnReferences": "", "rxnConfidenceScores": ""},
36+
])
37+
coeffs_df = pd.DataFrame([
38+
{"rxnNames": "phase6 test rxn", "metNames": atp.name, "comps": "c",
39+
"coefficient": -1.0},
40+
{"rxnNames": "phase6 test rxn", "metNames": "H+", "comps": "c",
41+
"coefficient": 1.0},
42+
])
43+
result = curation.curate_mets_rxns_genes(
44+
mutated, rxns_df=rxns_df, rxns_coeffs_df=coeffs_df,
45+
)
46+
assert len(result.added_reactions) == 1
47+
assert result.added_reactions[0].startswith("r_")
48+
49+
50+
def test_real_curation_tsvs_v8_6_3_volpolyp(model):
51+
"""Apply the v8_6_3 VolPolyP curation files end-to-end. Mostly a
52+
smoke test: confirm no exception, and that some entities were
53+
added/updated."""
54+
mutated = model.copy()
55+
data_dir = REPO_PATH / "data" / "modelCuration" / "v8_6_3"
56+
57+
result = curation.curate_mets_rxns_genes_from_tsv(
58+
mutated,
59+
mets_tsv=data_dir / "VolPolyPMets.tsv",
60+
genes_tsv=data_dir / "VolPolyPGenes.tsv",
61+
rxns_tsv=data_dir / "VolPolyPRxns.tsv",
62+
rxns_coeffs_tsv=data_dir / "VolPolyPRxnsCoeffs.tsv",
63+
)
64+
# We applied a TSV pack — at minimum some entity should land.
65+
touched = (
66+
len(result.added_metabolites) + len(result.updated_metabolites)
67+
+ len(result.added_genes) + len(result.updated_genes)
68+
+ len(result.added_reactions) + len(result.updated_reactions)
69+
)
70+
assert touched > 0
71+
72+
73+
def test_empty_call_no_op(model):
74+
mutated = model.copy()
75+
result = curation.curate_mets_rxns_genes(mutated)
76+
assert not result

code/python/yeastgem/__init__.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
"""
88
from __future__ import annotations
99

10-
from yeastgem import biomass, conditions, model_tests
10+
from yeastgem import biomass, conditions, curation, model_tests
1111
from yeastgem.compare import ComparisonReport, compare_models
1212
from yeastgem.config import YeastIDs, load_ids
1313
from yeastgem.io import (
@@ -29,6 +29,7 @@
2929
"commit_yeast_model",
3030
"compare_models",
3131
"conditions",
32+
"curation",
3233
"load_delta_g",
3334
"load_ids",
3435
"model_tests",

code/python/yeastgem/curation.py

Lines changed: 79 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,79 @@
1+
"""Yeast-GEM batch curation entry point.
2+
3+
Thin wrapper over :func:`raven_python.curation.batch_curate` (and its
4+
``from_tsv`` companion) that pins the yeast-GEM id prefixes
5+
(``'s_'`` for new metabolites, ``'r_'`` for new reactions). All
6+
schema details (column names, MIRIAM-auto-detection, match keys) live
7+
upstream — see ``raven_python.curation`` for the full reference.
8+
9+
The MATLAB counterpart is ``code/modelCuration/curateMetsRxnsGenes.m``
10+
(also a shim over RAVEN's ``curateModelFromTables``).
11+
"""
12+
from __future__ import annotations
13+
14+
from pathlib import Path
15+
16+
import cobra
17+
import pandas as pd
18+
from raven_python.curation import (
19+
CurationResult,
20+
)
21+
from raven_python.curation import (
22+
batch_curate as _ra_batch_curate,
23+
)
24+
from raven_python.curation import (
25+
batch_curate_from_tsv as _ra_batch_curate_from_tsv,
26+
)
27+
28+
# Yeast-GEM id prefixes — frozen for both Python and MATLAB callers.
29+
_MET_ID_PREFIX = "s_"
30+
_RXN_ID_PREFIX = "r_"
31+
32+
33+
def curate_mets_rxns_genes(
34+
model: cobra.Model,
35+
*,
36+
mets_df: pd.DataFrame | None = None,
37+
genes_df: pd.DataFrame | None = None,
38+
rxns_df: pd.DataFrame | None = None,
39+
rxns_coeffs_df: pd.DataFrame | None = None,
40+
) -> CurationResult:
41+
"""Add or update metabolites / reactions / genes from DataFrames.
42+
43+
Yeast-GEM-specific id prefixes are applied automatically (``s_`` /
44+
``r_``); everything else is delegated to
45+
:func:`raven_python.curation.batch_curate`. See its docstring for
46+
the schema, match-key rules and the MIRIAM-auto-detection
47+
convention.
48+
"""
49+
return _ra_batch_curate(
50+
model,
51+
mets_df=mets_df,
52+
genes_df=genes_df,
53+
rxns_df=rxns_df,
54+
rxns_coeffs_df=rxns_coeffs_df,
55+
met_id_prefix=_MET_ID_PREFIX,
56+
rxn_id_prefix=_RXN_ID_PREFIX,
57+
)
58+
59+
60+
def curate_mets_rxns_genes_from_tsv(
61+
model: cobra.Model,
62+
*,
63+
mets_tsv: str | Path | None = None,
64+
genes_tsv: str | Path | None = None,
65+
rxns_tsv: str | Path | None = None,
66+
rxns_coeffs_tsv: str | Path | None = None,
67+
) -> CurationResult:
68+
"""File-path convenience wrapper — same shape as the MATLAB
69+
``curateMetsRxnsGenes(model, metsInfo, genesInfo, rxnsCoeffs,
70+
rxnsInfo)``."""
71+
return _ra_batch_curate_from_tsv(
72+
model,
73+
mets_tsv=mets_tsv,
74+
genes_tsv=genes_tsv,
75+
rxns_tsv=rxns_tsv,
76+
rxns_coeffs_tsv=rxns_coeffs_tsv,
77+
met_id_prefix=_MET_ID_PREFIX,
78+
rxn_id_prefix=_RXN_ID_PREFIX,
79+
)

0 commit comments

Comments
 (0)