Reconstruction: BLAST/DIAMOND wrappers and homology-based drafting by edkerk · Pull Request #4 · SysBioChalmers/raven-toolbox

edkerk · 2026-05-29T22:08:24Z

Adds generic BLAST and DIAMOND subprocess wrappers (hits are returned as a DataFrame, with no parallel struct) and the homology-based draft model builder ported from getModelFromHomology. The structured improvements over RAVEN are documented in IMPROVEMENTS.md (H1–H6).
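For readers unfamiliar with the wrapper pattern described above, here is a minimal sketch of how a command line can be assembled and how tabular hits can be parsed into typed records. It is not the code from this PR: the function names `build_search_command` and `parse_tabular_hits` are hypothetical, the sketch assumes the tools are run with tabular output format 6 and its default 12 columns, and it uses only the standard library. The resulting records could be handed to `pandas.DataFrame.from_records` to obtain a DataFrame.

```python
import csv
import io

# Default column order of tabular output format 6 (BLAST+ and DIAMOND).
TABULAR_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def build_search_command(tool, query, db, evalue=1e-5, threads=1):
    """Assemble an argument list for blastp or diamond blastp (hypothetical helper)."""
    if tool == "blastp":
        return ["blastp", "-query", query, "-db", db, "-outfmt", "6",
                "-evalue", str(evalue), "-num_threads", str(threads)]
    if tool == "diamond":
        return ["diamond", "blastp", "--query", query, "--db", db, "--outfmt", "6",
                "--evalue", str(evalue), "--threads", str(threads)]
    raise ValueError(f"unsupported tool: {tool}")


def parse_tabular_hits(text):
    """Parse format-6 output into a list of dicts with numeric columns converted."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(TABULAR_COLUMNS, row))
        for name in INT_COLUMNS:
            hit[name] = int(hit[name])
        for name in FLOAT_COLUMNS:
            hit[name] = float(hit[name])
        hits.append(hit)
    return hits


# Illustrative input only; these hits are made up for the example.
sample_output = (
    "geneA\tYAL001C\t85.5\t200\t29\t0\t1\t200\t5\t204\t1e-50\t310\n"
    "geneB\tYBR002W\t40.0\t150\t90\t2\t3\t152\t10\t159\t2e-08\t55.1\n"
)
cmd = build_search_command("diamond", "query.faa", "ref_db", evalue=1e-5, threads=1)
hits = parse_tabular_hits(sample_output)
print(cmd)
print(hits[0]["sseqid"], hits[0]["evalue"])
```

In a real wrapper the argument list would be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)` and the captured standard output fed to the parser. Passing a list rather than a shell string avoids quoting problems with file paths.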

Base: feature/io.


* Add parameter defaults inventory and evaluation plan to maintenance docs

* Evaluate parameter defaults: fill MATLAB RAVEN parity column, flag issues

  Compared every optional parameter against MATLAB RAVEN source and literature references. Eight issues identified:

  - High: replace_max_bound default mismatch (MATLAB True vs Python False)
  - High: BLAST/Diamond evalue mismatch (MATLAB 1e-4 vs Python 1e-5)
  - Medium: allow_excretion inconsistency (get_init_model vs run_init/run_ftinit)
  - Medium: flux_eps in FSEOF looser than MATLAB implicit tolerance
  - Medium: threads=1 throughout (MATLAB auto-detects cores)
  - Medium: remove_genes blocked_reactions behaviour inverted vs MATLAB
  - Low: predict_localization missing default time cap (MATLAB: 15 min)
  - Low: mip_gap/time_limit not defaulting to MATLAB's tuned values

  Status column updated from ? to ✓/⚠ for all evaluated parameters.

* Reframe evaluation methodology: empirical testing over MATLAB parity

  MATLAB RAVEN defaults were never systematically validated. Change the methodology to treat MATLAB values as a prior, not ground truth:

  - Criterion 1 is now empirical testing on real models (iJO1366, Yeast9, Human-GEM), measuring result quality vs a benchmark
  - Criterion 2 is sensitivity envelope analysis
  - MATLAB cross-check is criterion 4 (corroborating evidence, not authority)
  - Evaluation workflow now requires running candidate values and reporting precision/recall or sensitivity profiles, not just citing MATLAB

  All "Proposed fix:" notes in the inventory replaced with "Open question:" framing that points to the specific test needed to settle the question.

* Add empirical test results for all 8 parameter default questions

  Ran six tests on yeast-GEM (4102 rxns), iJO1366 (2583 rxns), e_coli_core (95 rxns), and the tINIT synthetic testModel. Two tests (#2 evalue, #5 threads) require BLAST/HMMER binaries not available in this environment.

  Results:

  - #1 replace_max_bound: Python False CORRECT; True causes unbounded solver on yeast-GEM (4083/4102 rxns at big-M → +inf opens unbounded objectives)
  - #2 evalue: untestable; leave at 1e-5 (matches blastp default)
  - #3 allow_excretion: effect is zero with default prod_weight=0.5; fix inconsistency between get_init_model (True) and run_init (False)
  - #4 flux_eps: Python 1e-6 CORRECT; 1e-8 picks up 21 false-positive targets with flux std ~5e-7 (solver noise, not biology) on iJO1366
  - #5 threads: untestable; mark for performance fix without benchmark
  - #6 remove_genes blocked_reactions: Python 'remove' CORRECT; 'keep' gives wrong essentiality prediction for single-gene reactions on e_coli_core
  - #7 time_limit (localization): yeast-GEM solves in ~2.6 min; leave None, document 900s cap for Human-GEM scale
  - #8 mip_gap/time_limit (init): toy model too small to distinguish; leave None, document MATLAB's 0.0004/5s as guidance for genome-scale

  Three items confirmed correct and closed (#1, #4, #6). Three items need code changes (#3, #5, #7/#8 docstring only). Two items remain open (#2, #5).
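The evaluation workflow above (run candidate values, report precision/recall against a benchmark) can be illustrated with a small sketch built around the flux_eps case from result #4. This is not the test code used for the results: the helper names `select_targets` and `precision_recall`, the reaction IDs, and all flux values are invented for illustration. The only figures taken from the text are the two candidate thresholds (1e-6 and 1e-8) and the noise magnitude of roughly 5e-7.

```python
def select_targets(flux_changes, flux_eps):
    """Return the reaction IDs whose absolute flux change exceeds flux_eps."""
    return {rxn for rxn, delta in flux_changes.items() if abs(delta) > flux_eps}


def precision_recall(predicted, benchmark):
    """Compute precision and recall of a predicted set against a benchmark set."""
    true_positives = len(predicted & benchmark)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(benchmark) if benchmark else 1.0
    return precision, recall


# Hypothetical flux changes: two genuine responses and two values at the
# solver-noise level (around 5e-7, as reported for iJO1366 in result #4).
flux_changes = {
    "R_real_1": 2.0e-3,
    "R_real_2": -4.5e-4,
    "R_noise_1": 5.0e-7,
    "R_noise_2": -3.0e-7,
}
benchmark = {"R_real_1", "R_real_2"}  # assumed ground-truth targets

results = {}
for flux_eps in (1e-6, 1e-8):
    predicted = select_targets(flux_changes, flux_eps)
    results[flux_eps] = precision_recall(predicted, benchmark)
    precision, recall = results[flux_eps]
    print(f"flux_eps={flux_eps:g}: precision={precision:.2f}, recall={recall:.2f}")
```

With these made-up numbers, the looser 1e-8 threshold admits the noise-level entries and halves precision while recall is unchanged, which is the same qualitative pattern the commit message reports for the real model.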

edkerk added 9 commits May 29, 2026 22:45

Project scaffold: pyproject + package skeleton + README + LICENSE

0a9f4a1

Add GitHub Actions CI and the maintainer-scripts README

b7b69ac

Add the foundation utilities: GPR, balance, parse, sort, validate

50bea40

Add the model-manipulation layer (add, remove, transport, merge, etc.)

4bc0d6e

Add binary + data resolvers for external tools and published artefacts

a1dc557

Add YAML and SIF model I/O

6ef3357

Add Excel export and the Standard-GEM git-layout export

7a9b69a

Add BLAST and DIAMOND wrappers for protein-homology searches

cf199dc

Add the homology-based draft model builder (getModelFromHomology port)

eccce57

edkerk changed the base branch from feature/io to develop May 30, 2026 00:02

edkerk marked this pull request as ready for review May 30, 2026 00:03

edkerk merged commit 2c356c0 into develop May 30, 2026
4 checks passed

edkerk deleted the feature/reconstruction-homology branch May 30, 2026 00:05

edkerk merged 9 commits into develop from feature/reconstruction-homology
