Last Updated: 2026-05-15
Two parallel workstreams are active:
- DBTL campaign for M. extorquens AM1 ΔmxaF media optimisation — rounds 1 and 2 of the v10 MaxPro+OptBlock design executed; round 3 (v16) planned. See §DBTL Campaign Status below.
- MP Medium Ingredient Properties dataset with citation tracking — see §Ingredient Properties Dataset below.
| Round | Date | Design | Growth assay | Nd assay | Status |
|---|---|---|---|---|---|
| 1 | Feb–Mar 2026 | v10 MaxPro+OptBlock (69 conditions, 4 plates) | OD600 @ 600 nm, 3 timepoints | Arsenazo III @ 660 nm | analysed |
| 2 | May 2026 | v10 (repeat with minor adjustments) | Biolog PM08, 740/590 nm, 144 timepoints | Arsenazo III @ 660 nm (15 µM Nd dose) | analysed |
| 3 | planned | v16 (proposal pending) | TBD | LanM-fluorescence (proposed) | planning |
- Adapter + pipeline: `scripts/build_round2_replicate_statistics.py` ingests Biolog + arsenazo data into the round-1 analysis schema. Recipes: `just analyze-experimental-round2[-nd|-redox]`.
- Joint OD600 × Nd Pareto: 8 winners (MPOB_008, _019, _020, _022, _024, _035, _058, _066). See `outputs/round2_3way_pareto/`.
- Cross-cluster join: only MPOB_008 is a majority double-winner (top-growth ∩ top-Nd-uptake clusters). See `outputs/round2_double_winners/`.
- Round-1 vs round-2 reproducibility: Spearman ρ ≈ 0, attributed to measurement modality drift (600 nm → 740 nm, plus raw abs660 → calibrated µM), not biology drift. See `outputs/round1_vs_round2/REPRODUCIBILITY_REPORT.md`.
- Bayesian optimisation seeds for v16: 10 BO suggestions; top predicted OD600 = 0.268 vs round-2 best 0.265 (round 2 is close to the practical optimum of the 6-factor design). Phosphate is dominant (Sobol ST = 0.65). See `outputs/round2_recommendations/v16_bo_seeds.md`.
- Precipitation-risk analysis (`outputs/round2_precipitation_risk/`): Q/Ksp model predicted 7 of 8 Pareto winners as HIGH NdPO₄-precipitation risk. ⚠️ Model REFUTED by empirical abiotic data (see next bullet).
- Abiotic-correction diagnostic (`outputs/round2_abiotic_correction/`): An updated round-2 file with paired `abs660_abiotic_t{1,2}` was added on 2026-05-15. The observed abiotic drift does not match the Q/Ksp ranking (r = +0.23, wrong sign). 54 of 62 model-HIGH-risk conditions show stable abiotic. Conversely, MPOB_058 (model-predicted MEDIUM) shows the strongest abiotic chemistry signal. Either precipitation completed before t1 (so the t1→t2 drift can't see it) or the model is mis-calibrated.
- Uncertainty-aware MC Pareto (`outputs/round2_mc_pareto/`): Of the 8 deterministic Pareto winners, only 2 (MPOB_008, MPOB_058) are stable (freq ≥ 0.8) under MC perturbation of replicate σ. (This finding is independent of the chemistry interpretation and still holds.)
Current best-defended interpretation: the abiotic data does not let us
declare any condition unambiguously biology or chemistry without cell-pellet
ICP-MS. The original "8 Pareto winners × 2 reps" anchor allocation in v16
remains the most defensible plan. The earlier (now-superseded) reallocation
that nominated MPOB_058 as a 4-rep biology anchor was based on the unrefuted
model and has been reverted in outputs/round2_recommendations/v16_design_recommendation.md.
- Paired biology signal at t2 (`outputs/round2_t2_paired_biology/`): with the t1+t2 abiotic data we can compute (biotic - abiotic) at t2 directly, which is the cleanest chemistry-vs-biology analysis the round-2 data supports. MPOB_008 is the strongest convergent biology candidate (MC-stable + chem-quiet borderline + clear t2 biology signal). MPOB_058 has real biology mixed with real chemistry. MPOB_022 and MPOB_019 have clean biology + clean chemistry but are MC-fragile. The other 4 winners show no biology signal at t2; their t3 depletion happened after our paired-control window and needs t3 ICP-MS to interpret.
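The paired subtraction described above can be sketched as follows. This is a minimal illustration, not the project's script: the function name, the use of calibrated µM values, and the sign convention (depletion of soluble Nd relative to the dose) are all assumptions, and the numbers are placeholders rather than round-2 data.

```python
def paired_biology_signal(dose_um, biotic_nd_um, abiotic_nd_um):
    """Nd depletion attributable to cells, in µM.

    Assumed convention: depletion = dose - remaining soluble Nd, and the
    biology signal is (biotic depletion) - (abiotic depletion) at the
    same timepoint.
    """
    biotic_depletion = dose_um - biotic_nd_um
    abiotic_depletion = dose_um - abiotic_nd_um
    return biotic_depletion - abiotic_depletion


# Placeholder values only (hypothetical well, not measured data).
example_signal = paired_biology_signal(15.0, biotic_nd_um=9.0, abiotic_nd_um=13.0)
print(f"biology signal at t2: {example_signal:.1f} µM")
```

A signal near zero would mean the abiotic control explains the depletion; a clearly positive value is what the text above calls a biology signal.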
Final v16 anchor allocation (this is the converged plan; see v16 doc for the per-condition reps): MPOB_008 = 4 reps (biology anchor), MPOB_058/_022/_019 = 2 reps each (biology candidates needing replication or paired ICP-MS), MPOB_024/_035/_020/_066 = 1 rep each (late-uptake outliers, flag for ICP-MS). Total 14 anchor wells across plates.
The round-2 analysis stack now defaults to t2 (6 h) instead of t3 (9 h)
as the endpoint timepoint. Reasoning: only t2 has a paired abiotic control
in the round-2 data, making it the only timepoint where chemistry-vs-biology
attribution is empirically possible. All analysis scripts accept
--endpoint-timepoint {t1,t2,t3}; default is t2; existing t3-format
fallback retained for backward compatibility.
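For readers adding a new analysis script, a flag with this behaviour could be declared with `argparse` roughly as below. This is a sketch under assumptions: only the flag name, its three choices, and the `t2` default come from the text above; the parser description and help string are illustrative.

```python
import argparse


def build_parser():
    """Build a parser exposing the endpoint-timepoint option (illustrative)."""
    parser = argparse.ArgumentParser(description="Round-2 analysis (sketch)")
    parser.add_argument(
        "--endpoint-timepoint",
        choices=["t1", "t2", "t3"],
        default="t2",  # t2 is the only timepoint with a paired abiotic control
        help="Timepoint used as the analysis endpoint (default: t2).",
    )
    return parser


# Explicit argument lists are used here so the example does not read sys.argv.
default_args = build_parser().parse_args([])
override_args = build_parser().parse_args(["--endpoint-timepoint", "t3"])
print(default_args.endpoint_timepoint, override_args.endpoint_timepoint)
```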
At t2 the 3-way Pareto frontier collapses from 8 conditions to 3: MPOB_058, MPOB_008, MPOB_019. MC-stable: MPOB_058 only (freq 0.99). Five conditions previously on the t3 frontier (MPOB_022/_066/_020/_035/_024) fall off at t2 — their t3 winning status was driven by depletion that happened between t2 and t3, in a window without paired abiotic data.
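To make "on the frontier" and "MC-stable" concrete, here is a small standard-library sketch of a non-dominated filter plus a Monte Carlo frontier frequency. It is not the code in `outputs/round2_mc_pareto/`: the function names, the Gaussian perturbation of each objective by its replicate σ, the assumption that all objectives are maximised, and the condition names and values are illustrative only.

```python
import random


def pareto_front(points):
    """Return sorted names of non-dominated points (all objectives maximised)."""
    front = []
    for name, p in points.items():
        dominated = any(
            all(q[i] >= p[i] for i in range(len(p)))
            and any(q[i] > p[i] for i in range(len(p)))
            for other, q in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


def mc_front_frequency(means, sigmas, n_draws=1000, seed=0):
    """Fraction of perturbed draws in which each condition is on the frontier."""
    rng = random.Random(seed)
    counts = {name: 0 for name in means}
    for _ in range(n_draws):
        draw = {
            name: tuple(rng.gauss(m, s) for m, s in zip(means[name], sigmas[name]))
            for name in means
        }
        for name in pareto_front(draw):
            counts[name] += 1
    return {name: count / n_draws for name, count in counts.items()}


# Hypothetical conditions with three objectives each (not project data).
means = {
    "cond_A": (1.0, 0.2, 0.5),
    "cond_B": (0.5, 0.9, 0.5),
    "cond_C": (0.4, 0.1, 0.4),
}
sigmas = {name: (0.05, 0.05, 0.05) for name in means}

print(pareto_front(means))
print(mc_front_frequency(means, sigmas))
```

Under this sketch, a condition would count as MC-stable when its frequency meets the chosen threshold (0.8 in the analysis above).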
Authoritative v16 anchor allocation (replaces the pre-t2 allocations above): MPOB_058 = 4 reps + ICP-MS, MPOB_008 = 3 reps, MPOB_019 = 2 reps, MPOB_022 = 1 rep, MPOB_024/_035/_020/_066 = 1 rep each (precipitation + late-uptake controls, flag for ICP-MS). Total 13 anchor wells.
- Factor-range proposal: `outputs/round2_recommendations/v16_design_recommendation.md` (inherits v15 ranges, adds Nd³⁺ as a 7th factor, raises the methanol upper bound).
- BO point candidates: `outputs/round2_recommendations/v16_bo_seeds.md` (5 new condition recipes recommended as 2-rep anchor wells).
- Assay alternatives report: `outputs/round3_recommendations/nd_assay_alternatives_report.md` and `nd_assay_alternatives_1pager.md`; recommends lanmodulin (LanM) fluorescence as primary HT readout + cell-pellet ICP-MS on the anchor subset.
- `data/experimental/plate_designs_v10_maxprooptblock_long__round2_results/`: Biolog 740 + 590 nm raw + collaborator rollup. SHA256s logged in `data/checksums.txt`.
- `data/experimental/plate_designs_v10_maxprooptblock_long__round2_results_asezuran/`: arsenazo III calibrated Nd predictions. SHA256s logged.
This project also manages the MP Medium Ingredient Properties dataset with comprehensive citation tracking and validation.
- Total Ingredients: 158 rows
- Total Columns: 68 (47 data + 21 organism context columns)
- DOI Citations: 158 unique DOIs
- Citation Coverage: 90.5% (143/158 DOIs with evidence)
- PDFs: 92 (58.2%)
- Abstracts: 44 (27.8%)
- Missing: 15 (9.5%)
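The percentages above can be re-derived from the raw counts. The snippet below is only a quick arithmetic check using the numbers listed in this section; the variable names are illustrative.

```python
TOTAL_DOIS = 158

# Counts as listed in the statistics above.
counts = {"With evidence": 143, "PDFs": 92, "Abstracts": 44, "Missing": 15}


def share(count, total=TOTAL_DOIS):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: share(n) for label, n in counts.items()}
for label, pct in shares.items():
    print(f"{label}: {counts[label]} ({pct}%)")

# DOIs with evidence and DOIs missing evidence should add up to the total.
print(counts["With evidence"] + counts["Missing"] == TOTAL_DOIS)
```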
✅ 7 invalid DOIs successfully corrected (14 instances in CSV)
- Improved coverage from 86.1% → 90.5% (+4.4 percentage points)
- See: `notes/DOI_CORRECTIONS_FINAL_UPDATED.md` for complete details
- CSV: `data/raw/mp_medium_ingredient_properties.csv` (68 columns)
- Schema: `src/microgrowagents/schema/mp_medium_schema.yaml` (LinkML)
- Final Report: `notes/DOI_CORRECTIONS_FINAL_UPDATED.md` ⭐ MOST IMPORTANT
- Corrections Applied:
  - Batch 1: `data/results/doi_corrections_applied.json` (4 DOIs → 10 cells)
  - Batch 2: `data/results/additional_corrections_applied.json` (3 DOIs → 4 cells)
- Correction Definitions:
  - `data/corrections/doi_corrections_17_invalid.yaml`
  - `data/corrections/additional_corrections_found.yaml`
- Validation Results:
  - `data/results/doi_validation_22.json` (validation of 22 invalid DOIs)
  - `data/results/csv_all_dois_results.json` (all CSV DOIs)
- All DOIs: `data/results/all_doi_links.txt` (158 unique DOIs)
- Missing Citations: `data/results/missing_citations_report.txt` (77 missing)
- Coverage Summary: `notes/CITATION_COVERAGE_SUMMARY.md`
Located in `scripts/`, organized by function:

DOI Validation: `scripts/doi_validation/`
- `validate_failed_dois.py` - Validate DOI HTTP resolution
- `validate_new_corrections.py` - Validate correction candidates
- `find_correct_dois.py` - Research correct DOI alternatives

DOI Corrections: `scripts/doi_corrections/`
- `apply_doi_corrections.py` - Apply validated corrections
- `apply_additional_corrections.py` - Batch corrections
- `clean_invalid_dois.py` - Remove invalid DOIs

PDF Downloads: `scripts/pdf_downloads/`
- `download_all_pdfs_automated.py` - Automated PDF retrieval
- `retry_failed_dois_with_fallbackpdf.py` - Fallback PDF service

Schema: `scripts/schema/`
- `add_role_columns.py` - Add organism/role columns
- `migrate_schema.py` - Schema migration utility

Enrichment: `scripts/enrichment/`
- `enrich_ingredient_effects.py` - Enrich ingredient data
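As an illustration of what "DOI HTTP resolution" means here, the sketch below checks whether a DOI resolves through the public `https://doi.org/` resolver using only the standard library. It is not the contents of `validate_failed_dois.py`; the function names, the HEAD request, and the timeout are assumptions. Some publishers reject automated requests, so a `False` result does not prove a DOI is invalid.

```python
import urllib.error
import urllib.parse
import urllib.request


def doi_url(doi):
    """Build the resolver URL for a bare DOI such as '10.1000/xyz123'."""
    return "https://doi.org/" + urllib.parse.quote(doi.strip(), safe="/")


def doi_resolves(doi, timeout=10.0):
    """Return True if the resolver answers with a non-error status.

    Performs a network request; redirects are followed by urllib.
    """
    request = urllib.request.Request(doi_url(doi), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, TimeoutError):
        # HTTPError (4xx/5xx) is a subclass of URLError.
        return False


if __name__ == "__main__":
    # Example DOI taken from the unable-to-locate list in this document.
    print(doi_resolves("10.1002/cbdv.201700122"))
```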
1 Pre-DOI Era Publication (should be removed/marked):
- Thiamin + Cu/Fe (PMID 9481873) - published 1997, no DOI exists
- File: Mark in CSV as "Not available"
5 Unable to Locate (may need institutional access):
- Thiamin autoclave stability (`10.1002/cbdv.201700122`)
- Cobalt upper bound toxicity (`10.1007/s00424-010-0920-y`)
- Iron hydrolysis (`10.1016/S0016-7037(14)00566-3`)
- Dysprosium EDTA chelation (`10.1016/S0304386X23001494`)
- Cobalamin light sensitivity (`10.1073/pnas.0804699108`)
See notes/DOI_CORRECTIONS_FINAL_UPDATED.md for details.
21 organism context columns were added but are not yet populated:
- Pattern: `{Property} Citation Organism`
- Allowed values: scientific names, strain names, taxonomy, or "general"
- File: `data/raw/mp_medium_ingredient_properties.csv` (columns 48-68)
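A quick way to track population of these columns is to select headers by the naming pattern and report which are still empty. The sketch below uses an in-memory sample with hypothetical column names (`Ingredient`, `Solubility`); only the ` Citation Organism` suffix comes from the pattern above, and the helper names are illustrative.

```python
import csv
import io

SUFFIX = " Citation Organism"


def organism_context_columns(header):
    """Select columns following the '{Property} Citation Organism' pattern."""
    return [name for name in header if name.endswith(SUFFIX)]


def unpopulated_columns(rows, columns):
    """Return the columns whose cells are empty in every row."""
    return [
        col for col in columns
        if all(not (row.get(col) or "").strip() for row in rows)
    ]


# Hypothetical two-line sample standing in for the real CSV.
sample = io.StringIO(
    "Ingredient,Solubility,Solubility Citation Organism\n"
    "Example salt,high,\n"
)
reader = csv.DictReader(sample)
rows = list(reader)
org_cols = organism_context_columns(reader.fieldnames)
empty_cols = unpopulated_columns(rows, org_cols)
print(org_cols, empty_cols)
```

Pointing the reader at the real CSV with `open(path, newline="")` instead of the sample would give a per-column progress report.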
77 missing citations identified across 18 ingredients:
- See: `data/results/missing_citations_report.txt`
MicroGrowAgents/
├── docs/
│ └── STATUS.md # ← You are here
├── notes/ # Research & documentation
│ ├── DOI_CORRECTIONS_FINAL_UPDATED.md # ⭐ Most important
│ ├── CITATION_COVERAGE_SUMMARY.md
│ └── ... (25+ other notes)
├── data/
│ ├── raw/
│ │ └── mp_medium_ingredient_properties.csv
│ ├── corrections/ # DOI correction definitions (YAML/JSON)
│ └── results/ # Validation & processing logs
├── scripts/ # Organized by function
│ ├── doi_validation/
│ ├── doi_corrections/
│ ├── pdf_downloads/
│ ├── enrichment/
│ └── schema/
└── src/microgrowagents/schema/
└── mp_medium_schema.yaml # LinkML schema
- Remove/mark pre-DOI publication (1 entry, PMID 9481873)
- Populate organism context columns (21 columns currently empty)
- Fill missing citations (77 missing DOI cells)
- Consider institutional access for 5 unable-to-locate DOIs
1. `uv run python scripts/doi_validation/validate_failed_dois.py`
2. `uv run python scripts/doi_corrections/apply_doi_corrections.py` (edit `data/corrections/doi_corrections_17_invalid.yaml` first)
3. `uv run python scripts/pdf_downloads/download_all_pdfs_automated.py`

- DOI corrections: `data/results/doi_corrections_applied.json`
- Validation: `data/results/doi_validation_22.json`
- Full report: `notes/DOI_CORRECTIONS_FINAL_UPDATED.md`
- Main CSV: 68 columns (47 data + 21 organism)
- LinkML Schema: Defined in `src/microgrowagents/schema/`
- Citation Coverage: 90.5% (143/158 DOIs)
- Corrections Applied: 7 DOIs (14 CSV cells updated)
For detailed history, see the files in the `notes/` directory.