Commit 3436276

Docs + cross-language CI parity gates (phase 7)
Closes the porting plan: top-level README reflects the now-functional
Python contribution path, code/python/README.md gets a getting-started
block + API map, and the Python CI workflow grows two new required
parity gates.

README updates
--------------
- Top-level README: replaced the "Contribution via python is not yet
  functional" paragraph with a description of the yeastgem +
  raven-python split. Updated the load/save example to use
  read_yeast_model / commit_yeast_model (with the saveYeastModel ->
  commitYeastModel rename note). Removed the obsolete .env setup-step
  language (yeastgem auto-detects the repo root).
- code/python/README.md: rewritten as an API map covering the seven
  modules (io, compare, conditions, biomass, missing_fields,
  model_tests, curation) with one-liner descriptions and links into
  the source. Added the dev / pytest / ruff workflows.

CI workflow (.github/workflows/python.yml)
------------------------------------------
- test (matrix Python 3.10/3.11/3.12, ruff + pytest): unchanged.
- parity-level-1-round-trip (new): runs
  code/python/tests/ci/check_round_trip.py, which loads the committed
  model/yeast-GEM.xml via cobrapy, writes it to a temp file, reloads
  it, and diffs the two loads via raven_python.comparison.diff_models.
  Catches SBML library regressions, annotation losses, and accidental
  id rewrites.
- parity-level-2-metrics (new): runs
  code/python/tests/ci/check_metrics.py, which computes growth R²,
  essential-gene accuracy / sensitivity / specificity / MCC +
  confusion matrix, and anaerobic flux R² on the committed model, then
  diffs them against the MATLAB-produced reference at
  code/python/tests/reference/metrics.json within tolerance.
- The matlab-reference-compare placeholder job is gone; its work is
  now done by the two real parity gates.
- Workflow path filters extended to trigger on changes to
  data/yeastgem/, data/conditions/, data/essentialGenes/, and
  data/physiology/ (everything the CI reads).
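
The load, write, reload, diff sequence behind the level-1 gate can be sketched as below. This is an illustrative sketch, not the contents of check_round_trip.py: the reader, writer and diff are passed in as callables, and JSON stand-ins replace cobrapy and raven_python.comparison.diff_models (whose signature is not shown in this commit) so the example runs with the standard library alone. All function names here are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, List


def round_trip_differences(
    source: Path,
    read: Callable[[Path], Any],
    write: Callable[[Any, Path], None],
    diff: Callable[[Any, Any], List[str]],
) -> List[str]:
    """Load `source`, write it to a temp file, reload it, and diff both loads."""
    original = read(source)
    with tempfile.TemporaryDirectory() as tmp:
        copy_path = Path(tmp) / source.name
        write(original, copy_path)
        reloaded = read(copy_path)
    return diff(original, reloaded)


# Stand-ins (assumptions) so the sketch needs no third-party packages.
def read_json(path: Path) -> dict:
    return json.loads(path.read_text())


def write_json(model: dict, path: Path) -> None:
    path.write_text(json.dumps(model))


def write_json_dropping_annotations(model: dict, path: Path) -> None:
    # Simulates a writer regression that silently loses a field.
    kept = {key: value for key, value in model.items() if key != "annotation"}
    path.write_text(json.dumps(kept))


def diff_dicts(left: dict, right: dict) -> List[str]:
    # Report every top-level key whose value differs between the two loads.
    return sorted(
        key for key in left.keys() | right.keys() if left.get(key) != right.get(key)
    )


with tempfile.TemporaryDirectory() as workdir:
    toy = Path(workdir) / "toy-model.json"
    write_json({"id": "r_0001", "annotation": {"sbo": "SBO:0000176"}}, toy)
    clean = round_trip_differences(toy, read_json, write_json, diff_dicts)
    lossy = round_trip_differences(
        toy, read_json, write_json_dropping_annotations, diff_dicts
    )

print(clean)  # []
print(lossy)  # ['annotation']
```

A CI script built on this shape would exit non-zero whenever the returned list is non-empty, which is how an annotation loss or an id rewrite would fail the job.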

Reference metrics
-----------------
code/python/tests/reference/metrics.json seeds the level-2 gate with
the values measured during phase 5 verification (MATLAB R2024b +
Gurobi 13.0 + RAVEN feat/yeast-gem-shared on commit b4d3769):

    growth_r2                 0.906164
    essential_genes accuracy  0.902439 (tp/tn/fp/fn 934/65/94/14)
    anaerobic_flux_r2         0.904765

Tolerances absorb the known Gurobi-vs-HiGHS drift around the 1e-6
growth-ratio threshold: gene counts ±2, R² ±5e-3, MCC ±5e-2.

Verified locally:
- parity-level-1-round-trip → Models are semantically equal.
- parity-level-2-metrics → All metric-parity checks passed.

Prerequisite for the CI to pass on GitHub: raven-python's
feat/yeast-gem-shared branch must be pushed so the
``raven-python @ git+https://...@feat/yeast-gem-shared`` URL in
pyproject.toml resolves. Once that branch lands on a tagged release,
the pin can switch to a version constraint.

PORTING_PLAN.md status table marks phase 7 done; the porting plan is
complete.
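
As a worked check of the numbers above, the sketch below derives the essential-gene metrics from the quoted confusion matrix and applies the quoted tolerances. It is a minimal sketch, not check_metrics.py: the flat dictionary layout and key names are assumptions (the real metrics.json schema is not shown in this commit), and which class counts as "positive" is taken as given by the tp/tn/fp/fn labels.

```python
import math
from typing import Dict, List


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard accuracy / sensitivity / specificity / MCC from a 2x2 table."""
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Tolerances quoted in the commit message; the key names are hypothetical.
TOLERANCES = {
    "tp": 2, "tn": 2, "fp": 2, "fn": 2,              # gene counts ±2
    "growth_r2": 5e-3, "anaerobic_flux_r2": 5e-3,    # R² ±5e-3
    "mcc": 5e-2,                                     # MCC ±5e-2
}


def out_of_tolerance(reference: Dict[str, float], measured: Dict[str, float]) -> List[str]:
    """Return the metric names whose absolute difference exceeds its tolerance."""
    return [
        key
        for key, tolerance in TOLERANCES.items()
        if abs(measured[key] - reference[key]) > tolerance
    ]


reference = {"tp": 934, "tn": 65, "fp": 94, "fn": 14,
             "growth_r2": 0.906164, "anaerobic_flux_r2": 0.904765}
reference["mcc"] = confusion_metrics(934, 65, 94, 14)["mcc"]

# The accuracy in the commit message follows from the confusion matrix.
print(round(confusion_metrics(934, 65, 94, 14)["accuracy"], 6))  # 0.902439

# A one-gene solver drift stays inside the gate ...
drifted = dict(reference, tp=933, fn=15)
drifted["mcc"] = confusion_metrics(933, 65, 94, 15)["mcc"]
print(out_of_tolerance(reference, drifted))  # []

# ... while a real shift in growth R² is flagged.
shifted = dict(reference, growth_r2=0.92)
print(out_of_tolerance(reference, shifted))  # ['growth_r2']
```

Using absolute tolerances per metric keeps the gate insensitive to the single borderline gene reported in phase 5 while still failing on changes larger than solver noise.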
1 parent 8af3a5a commit 3436276

8 files changed

Lines changed: 302 additions & 48 deletions

.github/workflows/python.yml

Lines changed: 46 additions & 11 deletions
@@ -6,20 +6,31 @@ on:
     paths:
       - 'code/python/**'
       - 'code/io.py'
+      - 'data/yeastgem/**'
+      - 'data/conditions/**'
+      - 'data/essentialGenes/**'
+      - 'data/physiology/**'
       - 'model/**'
       - '.github/workflows/python.yml'
   pull_request:
     branches: [main, develop]
     paths:
       - 'code/python/**'
       - 'code/io.py'
+      - 'data/yeastgem/**'
+      - 'data/conditions/**'
+      - 'data/essentialGenes/**'
+      - 'data/physiology/**'
       - 'model/**'
       - '.github/workflows/python.yml'
 
 jobs:
+  # Unit tests + lint across the supported Python matrix. Fast (~5 min
+  # wall clock per Python version after caches warm).
   test:
     runs-on: ubuntu-latest
     strategy:
+      fail-fast: false
       matrix:
         python-version: ['3.10', '3.11', '3.12']
 
@@ -44,12 +55,11 @@ jobs:
         working-directory: code/python
         run: pytest -v
 
-  # Level-1 semantic-equality gate vs. the committed MATLAB reference
-  # artifact. Skipped until the reference bundle is seeded (see
-  # code/python/tests/reference/README.md). When enabled, this becomes
-  # a required check per the lock-step parity policy.
-  matlab-reference-compare:
-    if: false # enable once reference bundle is committed
+  # Level-1 parity — Python SBML read+write of the committed
+  # model/yeast-GEM.xml must round-trip to a semantically-equal model.
+  # Catches SBML library regressions, annotation losses, and
+  # accidental id rewrites.
+  parity-level-1-round-trip:
     runs-on: ubuntu-latest
     needs: test
     steps:
@@ -60,12 +70,37 @@ jobs:
         uses: actions/setup-python@v5
         with:
           python-version: '3.12'
+          cache: pip
+          cache-dependency-path: code/python/pyproject.toml
+
+      - name: Install yeastgem
+        run: pip install -e code/python/
+
+      - name: SBML round-trip preserves model
+        run: python code/python/tests/ci/check_round_trip.py
+
+  # Level-2 parity — Python validation metrics must match the
+  # committed MATLAB-produced reference within tolerance. Tolerances
+  # account for Gurobi-vs-HiGHS solver drift around the essential-gene
+  # 1e-6 growth-ratio threshold. Regenerate the reference via
+  # code/python/tests/reference/runPhase5Metrics.m when the metrics
+  # shift legitimately.
+  parity-level-2-metrics:
+    runs-on: ubuntu-latest
+    needs: test
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+          cache: pip
+          cache-dependency-path: code/python/pyproject.toml
 
       - name: Install yeastgem
         run: pip install -e code/python/
 
-      - name: Compare Python-loaded model against MATLAB reference
-        run: |
-          python -m yeastgem.compare \
-            model/yeast-GEM.xml \
-            code/python/tests/reference/yeast-GEM.xml
+      - name: Validation metrics match the committed reference
+        run: python code/python/tests/ci/check_metrics.py

README.md

Lines changed: 33 additions & 16 deletions
@@ -71,14 +71,26 @@ Please see the installation instructions for each software package.
 * [RAVEN Toolbox](https://github.com/SysBioChalmers/RAVEN) version 2.8.3 or later
 
 * Python-based
-Contribution via python (cobrapy) is not yet functional. In essence, if you can retain the same format of the model files, you can still contribute to the development of yeast-GEM. However, you cannot use the MATLAB functions.
-
-If you want to use any of the [provided](https://github.com/SysBioChalmers/yeast-GEM/tree/main/code) Python functions, you may create an environment with all requirements:
+Contribution via Python is supported through the `yeastgem` package
+under [code/python/](code/python/) and its
+[PORTING_PLAN.md](code/python/PORTING_PLAN.md). The package builds
+on [cobrapy](https://github.com/opencobra/cobrapy) and
+[raven-python](https://github.com/SysBioChalmers/raven-python) (the
+Python port of RAVEN) — the latter provides the generic GEM
+utilities (`diff_models`, `add_sbo_terms`, condition / biomass /
+curation helpers) that `yeastgem` configures with the yeast-specific
+data files under [data/](data/).
+
+Install from a checkout:
 ```bash
-pip install -r code/requirements/requirements.txt # install all dependencies
-touch .env # create a .env file for locating the root
+pip install -e code/python/[dev]
 ```
 
+The release pipeline equivalent to the MATLAB `commitYeastModel`
+is `yeastgem.commit_yeast_model`. The historical
+[code/io.py](code/io.py) is kept as a deprecated forwarding shim
+that re-exports from the new package.
+
 If you want to locally run `memote run` or `memote report history`, you should also install [git lfs](https://git-lfs.github.com/), as `results.db` (the database that stores all memote results) is tracked with git lfs.
 
 ## Model usage
@@ -87,21 +99,26 @@ Make sure to load/save the model with the corresponding wrapper functions:
 * In Matlab:
 ```matlab
 cd ./code
-model = loadYeastModel(); % loading
-saveYeastModel(model); % saving
+model = loadYeastModel(); % loading
+commitYeastModel(model); % saving — release pipeline (was saveYeastModel)
 ```
 * If RAVEN is not installed, you can also use COBRA-native functions (`readCbModel`, `writeCbModel`), but these model-files cannot be committed back to the GitHub repository.
-* In Python:
-Before opening Python, the following command should (once) be run in the yeast-GEM root folder:
-```bash
-touch .env # create a .env file for locating the root
-```
-Afterwards, the model can be loaded in Python with:
+* `saveYeastModel` is kept as a deprecated shim that forwards to `commitYeastModel`; it emits a deprecation warning.
+* In Python (after `pip install -e code/python/`):
 ```python
-import code.io as io
-model = io.read_yeast_model() # loading
-io.write_yeast_model(model) # saving
+from yeastgem import read_yeast_model, commit_yeast_model
+model = read_yeast_model() # loading
+commit_yeast_model(model) # saving — release pipeline (validates,
+# applies canonical state, writes SBML +
+# ΔG CSVs, updates README)
 ```
+The Python release pipeline currently writes the `.xml` artifact and
+the ΔG side-car CSVs; the `.yml` / `.txt` companion exports still
+require running the MATLAB `commitYeastModel`. Anaerobic growth and
+the model_tests benchmarks are wired in
+[`yeastgem.model_tests`](code/python/yeastgem/model_tests/); batch
+curation from TSV inputs is available via
+[`yeastgem.curation`](code/python/yeastgem/curation.py).
 
 ### Online visualization

code/python/PORTING_PLAN.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ saved and validated entirely from Python.
 | 4. Tier 2 — biomass + conditions in Python | **done (core)** | Biomass subsystem moved upstream as `raven_python.biomass` (`BiomassConfig` + `BiomassComponent` + `sum_biomass` / `scale_biomass` / `rescale_pseudoreaction` / `set_gam`; 19 new tests on synthetic models). yeast-GEM ids.yml gained a `biomass_components` section; `yeastgem.biomass` exposes `sum_biomass`, `scale_biomass`, `rescale_pseudoreaction` (with the yeast `lipid` → backbone+chain aggregation), `set_gam` (auto-locates the NGAM reaction by name), and `change_amino_acid_ratio` (reads `data/physiology/aminoAcid_Bjorkeroth2020.tsv`). `yeastgem.conditions.apply` now handles `amino_acid_ratio` before delegating to upstream; `yeastgem.io.commit_yeast_model` runs the anaerobic growth check on a copy. **Verified** end-to-end on the real model: Python `conditions.apply('anaerobic')` produces SBML semantically equal to MATLAB `applyYeastCondition('anaerobic')`; Python `commit_yeast_model` (with anaerobic check active) produces SBML semantically equal to MATLAB `commitYeastModel`. 54 yeast-GEM tests + 38 new raven-python tests passing. **Deferred:** chemostat sweep + `fit_gam` (analysis/calibration, not part of the commit pipeline; tracked in UPSTREAM_CANDIDATES.md). |
 | 5. Tier 3 — test suite | **done** | Ported the four ``code/modelTests/`` routines to ``yeastgem.model_tests``: ``growth`` (Tobias 2013 chemostat R² across 4 conditions), ``essential_genes`` (cobrapy ``single_gene_deletion`` + Stanford KO collection, returns ``EssentialGeneResult`` dataclass with accuracy / sensitivity / specificity / MCC), ``anaerobic_flux_predictions`` (Jouhten 2008 + Frick & Wittmann flux R² + mean relative error), ``plot_anaerobic`` (fermentation-product bar plot), ``find_duplicated_rxns`` (wrapper over the new ``raven_python.manipulation.find_duplicate_reactions``). Stanford ORF lists extracted from ``essentialGenes.m`` to ``data/essentialGenes/{inviable,verified}_orfs.txt`` so both languages read the same source. 7 new yeast-GEM tests + 6 new raven-python tests; full Python suite 61/61 passing. Verified vs MATLAB on the real model (`runPhase5Metrics.m`): growth R² matches at 1e-7; anaerobic flux R² and essential-gene accuracy/MCC match within 5e-3; single 1-gene difference in the essential-gene confusion matrix is a Gurobi/HiGHS solver-tolerance borderline at the 1e-6 ratio threshold. |
 | 6. Tier 4 — curation framework | **done** | Generic `curateModelFromTables` engine moved to RAVEN (with `metPrefix` / `rxnPrefix` parameters defaulted to BiGG `M_`/`R_`); equivalent `raven_python.curation.{batch_curate, batch_curate_from_tsv}` in raven-python with the same schema (DataFrames + a `from_tsv` convenience). yeast-GEM keeps the user-facing `curateMetsRxnsGenes` MATLAB function as a 50-line shim that pins yeast's `s_`/`r_` prefixes and forwards upstream; the historical v8_*/v9_* curation scripts and `TEMPLATEcuration` keep working without change. New `yeastgem.curation.curate_mets_rxns_genes` Python entry point with the same prefix pinning. "Everything after the listed core columns is MIRIAM" — yeast-GEM's existing TSVs (12+10+9 MIRIAM columns) work unchanged. 13 new raven-python tests + 4 new yeast-GEM tests; full Python suite 65/65 passing. **MATLAB shim verified** to forward correctly (no-op call leaves the model unchanged). Direct MATLAB-vs-Python end-to-end parity check is blocked by pre-existing flakiness in the legacy `curateMetsRxnsGenes` (errors on the v8_6_3 VolPolyP schema and the v8_7_0 DBnewRxns pack); the Python implementation is more permissive than the legacy MATLAB on these edge cases. |
-| 7. Docs + CI | partial | Python CI workflow added; README "not yet functional" note still to update. |
+| 7. Docs + CI | **done** | Top-level README updated: "Contribution via Python is supported" section explaining the `yeastgem` + `raven-python` split + `saveYeastModel` → `commitYeastModel` rename. `code/python/README.md` rewritten with a getting-started block and an API map across the seven modules. CI workflow has three required jobs: `test` (matrix Python 3.10/3.11/3.12 + ruff + pytest), `parity-level-1-round-trip` (Python SBML read+write must round-trip the committed model semantically equal — `tests/ci/check_round_trip.py`), and `parity-level-2-metrics` (Python validation metrics must match the committed MATLAB reference within tolerance — `tests/ci/check_metrics.py` against `tests/reference/metrics.json`). Reference tolerances absorb the known Gurobi-vs-HiGHS solver drift (1 gene on the essential-gene confusion matrix, ≤ 5e-3 on R² metrics). Both parity scripts pass locally. **Prerequisite for CI to pass**: `raven-python`'s `feat/yeast-gem-shared` branch must be pushed to GitHub so the `pip install` URL dep resolves. |
 
 ## Design principles

code/python/README.md

Lines changed: 76 additions & 20 deletions
@@ -1,46 +1,102 @@
-# yeastgem — Python port of the yeast-GEM functions
+# yeastgem — Python interface to yeast-GEM
 
-This directory hosts the in-development Python interface to yeast-GEM. It
-is the Python counterpart to the MATLAB code under [../](..).
+Python counterpart to the MATLAB code under [../](..). Builds on
+[cobrapy](https://github.com/opencobra/cobrapy) and
+[raven-python](https://github.com/SysBioChalmers/raven-python) — the
+latter hosts the generic GEM utilities (model diffing, SBO term
+assignment, condition / biomass / curation engines) that `yeastgem`
+configures with the yeast-specific data files under
+[../../data/](../../data/).
 
-## Status
-
-Early scaffolding. See [PORTING_PLAN.md](PORTING_PLAN.md) for the full
-plan and [UPSTREAM_CANDIDATES.md](UPSTREAM_CANDIDATES.md) for the
+See [PORTING_PLAN.md](PORTING_PLAN.md) for the porting history and
+[UPSTREAM_CANDIDATES.md](UPSTREAM_CANDIDATES.md) for the
 function-level upstream tracking.
 
 ## Install (development)
 
-From a yeast-GEM checkout:
-
 ```bash
 pip install -e code/python/[dev]
 ```
 
-This installs the `yeastgem` package (cobrapy-based; no ravengem
-dependency by design).
+`raven-python` is pinned via a `git+` URL in
+[pyproject.toml](pyproject.toml) to the
+`feat/yeast-gem-shared` branch; once that branch is on a release tag
+the pin will switch to a version constraint.
 
 ## Quick start
 
 ```python
-from yeastgem import read_yeast_model
+from yeastgem import read_yeast_model, commit_yeast_model
 
-model = read_yeast_model() # cobra.Model
-print(model.optimize().objective_value)
+model = read_yeast_model()
+print(model.optimize().objective_value) # → ~0.088 / h on the default media
+
+# Make some changes …
+commit_yeast_model(model) # full release pipeline
 ```
 
-`yeastgem` auto-detects the repo root via the package location, the
-`YEAST_GEM_PATH` environment variable, or a `.env` file at the repo
-root (historical convention) — in that order.
+`yeastgem` auto-locates the repo root via the package install path,
+the `YEAST_GEM_PATH` environment variable, or a legacy `.env` file —
+in that order. No additional setup needed for the common case.
+
+## API map
+
+| Area | Module | Highlights |
+|---|---|---|
+| **I/O** | [`yeastgem.io`](yeastgem/io.py) | `read_yeast_model`, `commit_yeast_model` (release pipeline: canonical state → SBML validity → aerobic + anaerobic growth → write `.xml` + ΔG CSVs → update README). `write_yeast_model` is a deprecated forwarding shim. |
+| **Comparison** | [`yeastgem.compare`](yeastgem/compare.py) | `compare_models` / `ComparisonReport` re-exported from `raven_python.comparison.diff_models`. Use for cross-toolchain semantic-equality checks. |
+| **Conditions** | [`yeastgem.conditions`](yeastgem/conditions.py) | `apply(model, name)` — minimal_Y6, anaerobic, glycine_nitrogen, nitrogen_limitation. Files under [`data/conditions/`](../../data/conditions/). |
+| **Biomass** | [`yeastgem.biomass`](yeastgem/biomass.py) | `sum_biomass`, `scale_biomass`, `rescale_pseudoreaction`, `set_gam`, `change_amino_acid_ratio`. Configured from [`data/yeastgem/ids.yml`](../../data/yeastgem/ids.yml). |
+| **Annotations** | [`yeastgem.missing_fields`](yeastgem/missing_fields.py) | `add_sbo_terms`, `load_delta_g`, `save_delta_g`. |
+| **Model tests** | [`yeastgem.model_tests`](yeastgem/model_tests/) | `growth` (Tobias 2013 chemostat R²), `essential_genes` (Stanford KO collection), `anaerobic_flux_predictions`, `plot_anaerobic`, `find_duplicated_rxns`. |
+| **Curation** | [`yeastgem.curation`](yeastgem/curation.py) | `curate_mets_rxns_genes` / `..._from_tsv` — batch curation from data tables with the yeast `s_`/`r_` id prefixes. |
 
 ## Layout
 
 ```
 code/python/
-  yeastgem/ # the package
-    io.py # read/write the model (commit_yeast_model lands later)
-  tests/ # pytest unit tests
+  yeastgem/ # the package
+    io.py # read_yeast_model + commit_yeast_model
+    compare.py # backwards-compat shim → raven_python.comparison
+    config.py # YeastIDs loader (data/yeastgem/ids.yml)
+    conditions.py # apply(model, name)
+    biomass.py # sum_biomass / scale_biomass / set_gam / AA-ratio
+    missing_fields.py # add_sbo_terms, ΔG CSV persistence
+    curation.py # batch curation wrapper
+    model_tests/ # Tier-3 benchmarks (growth, essential genes, …)
+  tests/ # pytest suite (65 tests across the package)
+    reference/ # MATLAB-produced verification artefacts +
+               # the runPhase*.m drivers
   pyproject.toml
   PORTING_PLAN.md
   UPSTREAM_CANDIDATES.md
 ```
+
+## Running the tests
+
+```bash
+cd code/python
+pytest -q
+```
+
+Tests load the real model once per session (~2 min) and exercise every
+public function on it. ruff is the linter:
+
+```bash
+ruff check code/python
+```
+
+The CI workflow under
+[`.github/workflows/python.yml`](../../.github/workflows/python.yml)
+runs the same checks across Python 3.10 / 3.11 / 3.12, plus two
+cross-language parity gates (level-1 SBML round-trip vs the committed
+model, level-2 metric parity vs the committed reference values).
+
+## Where work happens
+
+Code under [`yeastgem/`](yeastgem/) is *only* yeast-specific
+configuration and orchestration. Anything generic — model diff,
+condition application, biomass scaling, curation, annotation — lives
+in [raven-python](https://github.com/SysBioChalmers/raven-python).
+Functions tracked for future upstreaming are in
+[UPSTREAM_CANDIDATES.md](UPSTREAM_CANDIDATES.md).
code/python/tests/ci/__init__.py

Whitespace-only changes.
