Skip to content

Reconstruction: BLAST/DIAMOND wrappers and homology-based drafting#4

Merged
edkerk merged 9 commits into
developfrom
feature/reconstruction-homology
May 30, 2026
Merged

Reconstruction: BLAST/DIAMOND wrappers and homology-based drafting#4
edkerk merged 9 commits into
developfrom
feature/reconstruction-homology

Conversation

@edkerk

@edkerk edkerk commented May 29, 2026

Copy link
Copy Markdown
Member

Generic BLAST and DIAMOND subprocess wrappers (DataFrame hits, no parallel struct) and the homology-based draft model builder ported from getModelFromHomology — with the structured improvements over RAVEN documented in IMPROVEMENTS.md (H1–H6).

Base: feature/io.

@edkerk edkerk changed the base branch from feature/io to develop May 30, 2026 00:02
@edkerk edkerk marked this pull request as ready for review May 30, 2026 00:03
@edkerk edkerk merged commit 2c356c0 into develop May 30, 2026
4 checks passed
@edkerk edkerk deleted the feature/reconstruction-homology branch May 30, 2026 00:05
edkerk added a commit that referenced this pull request Jun 20, 2026
Ran six tests on yeast-GEM (4102 rxns), iJO1366 (2583 rxns), e_coli_core
(95 rxns), and the tINIT synthetic testModel. Two tests (#2 evalue, #5
threads) require BLAST/HMMER binaries not available in this environment.

Results:
- #1 replace_max_bound: Python False CORRECT; True causes unbounded solver
  on yeast-GEM (4083/4102 rxns at big-M → +inf opens unbounded objectives)
- #2 evalue: untestable; leave at 1e-5 (matches blastp default)
- #3 allow_excretion: effect is zero with default prod_weight=0.5;
  fix inconsistency between get_init_model (True) and run_init (False)
- #4 flux_eps: Python 1e-6 CORRECT; 1e-8 picks up 21 false-positive targets
  with flux std ~5e-7 (solver noise, not biology) on iJO1366
- #5 threads: untestable; mark for performance fix without benchmark
- #6 remove_genes blocked_reactions: Python 'remove' CORRECT; 'keep' gives
  wrong essentiality prediction for single-gene reactions on e_coli_core
- #7 time_limit (localization): yeast-GEM solves in ~2.6 min; leave None,
  document 900s cap for Human-GEM scale
- #8 mip_gap/time_limit (init): toy model too small to distinguish; leave
  None, document MATLAB's 0.0004/5s as guidance for genome-scale

Three items confirmed correct and closed (#1, #4, #6). Three items need
code changes (#3, #5, #7/#8 docstring only). Two items remain open (#2, #5).
edkerk added a commit that referenced this pull request Jul 1, 2026
* Add parameter defaults inventory and evaluation plan to maintenance docs

* Evaluate parameter defaults: fill MATLAB RAVEN parity column, flag issues

Compared every optional parameter against MATLAB RAVEN source and literature
references. Eight issues identified:
- High: replace_max_bound default mismatch (MATLAB True vs Python False)
- High: BLAST/Diamond evalue mismatch (MATLAB 1e-4 vs Python 1e-5)
- Medium: allow_excretion inconsistency (get_init_model vs run_init/run_ftinit)
- Medium: flux_eps in FSEOF looser than MATLAB implicit tolerance
- Medium: threads=1 throughout (MATLAB auto-detects cores)
- Medium: remove_genes blocked_reactions behaviour inverted vs MATLAB
- Low: predict_localization missing default time cap (MATLAB: 15 min)
- Low: mip_gap/time_limit not defaulting to MATLAB's tuned values

Status column updated from ? to ✓/⚠ for all evaluated parameters.

* Reframe evaluation methodology: empirical testing over MATLAB parity

MATLAB RAVEN defaults were never systematically validated. Change the
methodology to treat MATLAB values as a prior, not ground truth:

- Criterion 1 is now empirical testing on real models (iJO1366, Yeast9,
  Human-GEM), measuring result quality vs a benchmark
- Criterion 2 is sensitivity envelope analysis
- MATLAB cross-check is criterion 4 (corroborating evidence, not authority)
- Evaluation workflow now requires running candidate values and reporting
  precision/recall or sensitivity profiles, not just citing MATLAB

All "Proposed fix:" notes in the inventory replaced with "Open question:"
framing that points to the specific test needed to settle the question.

* Add empirical test results for all 8 parameter default questions

Ran six tests on yeast-GEM (4102 rxns), iJO1366 (2583 rxns), e_coli_core
(95 rxns), and the tINIT synthetic testModel. Two tests (#2 evalue, #5
threads) require BLAST/HMMER binaries not available in this environment.

Results:
- #1 replace_max_bound: Python False CORRECT; True causes unbounded solver
  on yeast-GEM (4083/4102 rxns at big-M → +inf opens unbounded objectives)
- #2 evalue: untestable; leave at 1e-5 (matches blastp default)
- #3 allow_excretion: effect is zero with default prod_weight=0.5;
  fix inconsistency between get_init_model (True) and run_init (False)
- #4 flux_eps: Python 1e-6 CORRECT; 1e-8 picks up 21 false-positive targets
  with flux std ~5e-7 (solver noise, not biology) on iJO1366
- #5 threads: untestable; mark for performance fix without benchmark
- #6 remove_genes blocked_reactions: Python 'remove' CORRECT; 'keep' gives
  wrong essentiality prediction for single-gene reactions on e_coli_core
- #7 time_limit (localization): yeast-GEM solves in ~2.6 min; leave None,
  document 900s cap for Human-GEM scale
- #8 mip_gap/time_limit (init): toy model too small to distinguish; leave
  None, document MATLAB's 0.0004/5s as guidance for genome-scale

Three items confirmed correct and closed (#1, #4, #6). Three items need
code changes (#3, #5, #7/#8 docstring only). Two items remain open (#2, #5).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant