This directory contains multiple versions of the MicroGrowAgents architecture diagrams, optimized for different audiences and purposes.
| Use Case | Recommended Diagram | Why |
|---|---|---|
| Conference presentation | architecture_abstract.png | Large text (18-28pt), minimal boxes, excellent readability |
| Grant application | workflow_abstract.png | Clear DBTL cycle, professional appearance |
| Paper methods section | architecture_simplified.svg | Balances detail with accessibility, no jargon |
| Developer documentation | architecture_v2.svg | Shows all 29 agents, 70+ skills, complete reference |
| Teaching materials | workflows_simplified.svg | Accessible language, clear examples |
| Quick README overview | workflow_abstract.png | Minimal text, maximum clarity |
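For scripted documentation builds, the selection table above can be expressed as a simple lookup. This is a minimal sketch: the `RECOMMENDED_DIAGRAM` dictionary and the `pick_diagram` helper are hypothetical names that do not exist in the repository, and the file names are copied in the abbreviated form used in the table.

```python
# Hypothetical lookup mirroring the selection table; not part of the repository.
RECOMMENDED_DIAGRAM = {
    "conference presentation": "architecture_abstract.png",
    "grant application": "workflow_abstract.png",
    "paper methods section": "architecture_simplified.svg",
    "developer documentation": "architecture_v2.svg",
    "teaching materials": "workflows_simplified.svg",
    "quick readme overview": "workflow_abstract.png",
}


def pick_diagram(use_case: str) -> str:
    """Return the recommended diagram for a use case (case-insensitive)."""
    key = use_case.strip().lower()
    if key not in RECOMMENDED_DIAGRAM:
        raise KeyError(f"No recommendation for use case: {use_case!r}")
    return RECOMMENDED_DIAGRAM[key]
```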
Best for: High-level presentations, executive summaries, posters, slides
Files:
- microgrow_agents_architecture_abstract.png - System architecture
- microgrow_agents_workflow_abstract.png - Workflow diagram
Characteristics:
- ✅ Minimal boxes: 5-6 major components (vs 30+ in other versions)
- ✅ Large fonts: 18-28pt for categories, 13-16pt for details
- ✅ Category-level grouping: DATA → AI AGENTS → DESIGN/TEST → LEARN → OUTPUT
- ✅ Clear DBTL cycle: Design → Build → Test → Learn with feedback loop
- ✅ Single-label boxes: "AI AGENTS (29 Specialized Agents)" instead of listing each one
- ✅ High readability: Works well in presentations, posters, conference talks
Generate with:
uv run python scripts/generate_architecture_diagrams_abstract.py

Best for: Broader scientific audience, documentation, teaching
Files:
- microgrow_agents_architecture_simplified.svg - System layers
- microgrow_agents_workflows_simplified.svg - Example workflows
Characteristics:
- Accessible language (avoids jargon like "TF-IDF", "FAISS", "FBA")
- Layered architecture (Data Sources → Core Analysis → Specialists → Integration)
- ~20-30 boxes organized in layers
- Font size: 9-11pt for labels, 11-13pt for headers
- Shows data flow between layers
Use cases:
- Documentation
- Teaching materials
- Collaboration with biologists
- Internal presentations
Generate with:
uv run python scripts/generate_architecture_diagrams_simplified.py

Best for: Technical documentation, developer onboarding, system audit
Files:
- microgrow_agents_architecture_v2.svg - Complete system architecture
- microgrow_agents_workflows_v2.svg - DBTL workflow paths
Characteristics:
- Comprehensive: All 29 agents, 70+ skills, 6-step analysis pipeline
- Color-coded: External data (dashed borders), Agents (bold borders), Skills, Tools
- Technical details: File names (kg_reasoning_agent.py), script names, specific methods
- Small fonts: 8-11pt (designed for large displays or printed posters)
- Multiple sections: Data Sources, Agents, Skills, Analysis Pipeline, Optimization Stack
- Legend: Visual guide to box types and categories
Use cases:
- Developer documentation
- Technical deep-dives
- System audit and compliance (bbop-skills)
- Complete reference guide
Generate with:
uv run python scripts/generate_architecture_diagrams_v2.py

Files:
- microgrow_agents_architecture.svg - Original architecture
- microgrow_agents_architecture_edit.svg - Manually edited version
- microgrow_agents_workflows_edit.svg - Manually edited workflows
Status: Superseded by newer versions, kept for reference
| Feature | Abstract ⭐ | Simplified | Detailed (v2) |
|---|---|---|---|
| Number of boxes | 5-6 | 20-30 | 50+ |
| Font size (main) | 18-28pt | 11-13pt | 8-11pt |
| Technical jargon | None | Minimal | High |
| Best for | Presentations | Documentation | Developer reference |
| Target audience | General/Executive | Scientists | Developers/Engineers |
| File size (PNG) | ~470 KB | ~235 KB | ~582 KB |
| Readability | Excellent | Good | Requires zoom |
| Detail level | High-level categories | System layers | All components |
- Minimize cognitive load: Focus on 5-6 major categories
- Maximize font size: 18-28pt ensures readability in presentations
- Group by function: Clear DBTL workflow (Design → Build → Test → Learn)
- Visual hierarchy: Box size and color reflect importance
- Simple flow: Arrows show data flow and feedback loops
Key insight: Viewers should understand the system in 30 seconds
- Avoid jargon: "Database Query" instead of "DuckDB SQL Agent"
- Layered architecture: Show information flow through system layers
- Accessible terms: "Growth Simulation" instead of "FBA with GEMSembler"
- Clear examples: Four workflow paths with step-by-step descriptions
Key insight: Scientists should understand how the system works without learning technical details
- Complete coverage: Every agent, skill, tool, and script documented
- Technical precision: Exact file names and method names
- Color coding: Visual distinction between external data, agents, skills, tools
- Sectioned layout: Logical grouping (Knowledge Agents, Modeling Agents, etc.)
Key insight: Developers should be able to navigate the codebase using the diagram
| Color | Usage | Hex Code | Examples |
|---|---|---|---|
| Blue | Data Sources, Knowledge | #64B5F6 / #1565C0 | KG-Microbe, Literature, Databases |
| Purple | AI Agents, Design | #BA68C8 / #6A1B9A | Media Design, Specialist Agents |
| Green | Testing, Analysis | #81C784 / #2E7D32 | Experimental Analysis, Clustering |
| Orange | Learning, Optimization | #FFB74D / #E65100 | Bayesian Optimization, Sensitivity |
| Red | Outputs, Results | #E57373 / #C62828 | Media Recipes, Reports, Designs |
| Gray | Supporting Services | #90A4AE / #37474F | Validation, Evidence Extraction |
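When adding a new diagram script, it can help to keep the palette in one place. The sketch below copies the hex codes from the table above. Reading the first code in each pair as the fill color and the second as the border or text color is an assumption, and the names `PALETTE` and `hex_to_rgb` are hypothetical.

```python
# Hex codes copied from the color table. The (fill, border) interpretation
# of each pair is an assumption, not something the table states.
PALETTE = {
    "blue": ("#64B5F6", "#1565C0"),    # Data Sources, Knowledge
    "purple": ("#BA68C8", "#6A1B9A"),  # AI Agents, Design
    "green": ("#81C784", "#2E7D32"),   # Testing, Analysis
    "orange": ("#FFB74D", "#E65100"),  # Learning, Optimization
    "red": ("#E57373", "#C62828"),     # Outputs, Results
    "gray": ("#90A4AE", "#37474F"),    # Supporting Services
}


def hex_to_rgb(hex_code: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' into an (R, G, B) tuple of 0-255 integers."""
    value = hex_code.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"Expected a 6-digit hex color, got {hex_code!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))
```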
Recommended: Abstract workflow diagram
Why:
- Audience sees clear DBTL cycle
- Large fonts readable from back of room
- Focuses on "what" not "how"
- Professional, polished appearance
Slide structure:
- Problem slide → Need for AI-driven media design
- Abstract workflow diagram → Our solution (DBTL cycle)
- Results slide → Performance metrics
Recommended: Simplified architecture diagram
Why:
- Shows system structure without overwhelming detail
- Uses scientific language (not developer jargon)
- Reviewers can understand agent interactions
- Fits in 1-column or 2-column layout
Caption example:
"Figure 2: MicroGrowAgents architecture. The system integrates multiple data sources (blue) to train 29 specialized AI agents (purple) that design growth media (pink), analyze experimental results (green), and optimize designs through Bayesian optimization (orange)."
Recommended: Detailed v2 architecture diagram
Why:
- Shows exact file locations (agents/kg_reasoning_agent.py)
- New developers can navigate codebase
- See relationships between agents, skills, tools
- Understand data flow through pipeline
Documentation structure:
- README overview → Abstract workflow
- Architecture deep-dive → Detailed v2 diagram
- API reference → Individual agent docs
Recommended: Abstract architecture + Abstract workflow
Why:
- Reviewers (often non-experts) need quick understanding
- Emphasize DBTL cycle (familiar to reviewers)
- Show AI integration clearly
- Professional appearance increases credibility
Figures:
- Figure 1: Abstract architecture (system overview)
- Figure 2: Abstract workflow (DBTL cycle)
- Figure 3: Results (performance comparison)
docs/architecture/
├── README.md # This file
├── microgrow_agents_architecture_abstract.png # ⭐ NEW - Presentations
├── microgrow_agents_architecture_abstract.svg
├── microgrow_agents_workflow_abstract.png # ⭐ NEW - Workflow overview
├── microgrow_agents_workflow_abstract.svg
├── microgrow_agents_architecture_simplified.png
├── microgrow_agents_architecture_simplified.svg
├── microgrow_agents_workflows_simplified.png
├── microgrow_agents_workflows_simplified.svg
├── microgrow_agents_architecture_v2.png # Technical reference
├── microgrow_agents_architecture_v2.svg
├── microgrow_agents_workflows_v2.png
└── microgrow_agents_workflows_v2.svg
File sizes:
- Abstract PNG: ~470 KB (high resolution, 300 DPI)
- Simplified PNG: ~235 KB
- Detailed v2 PNG: ~582 KB
Formats:
- PNG: Presentation-ready, high resolution (300 DPI), for slides/posters
- SVG: Vector format, infinitely scalable, editable in Inkscape/Illustrator
All diagrams can be regenerated from Python scripts:
# Abstract version (NEW - large fonts, minimal boxes)
uv run python scripts/generate_architecture_diagrams_abstract.py
# Simplified version (accessible language)
uv run python scripts/generate_architecture_diagrams_simplified.py
# Detailed v2 version (complete technical reference)
uv run python scripts/generate_architecture_diagrams_v2.py

Output location: docs/architecture/
Dependencies: matplotlib, numpy (installed via uv)
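To regenerate every diagram in one step, the three commands can be wrapped in a small driver. This is a sketch under assumptions: the `regenerate_all` helper is hypothetical and not shipped with the repository, and it expects to be run from the repository root with `uv` on the PATH. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Script paths taken from the regeneration commands listed above.
SCRIPTS = [
    "scripts/generate_architecture_diagrams_abstract.py",
    "scripts/generate_architecture_diagrams_simplified.py",
    "scripts/generate_architecture_diagrams_v2.py",
]


def build_commands(scripts=SCRIPTS) -> list[list[str]]:
    """Build one 'uv run python <script>' command per generator script."""
    return [["uv", "run", "python", script] for script in scripts]


def regenerate_all(dry_run: bool = True) -> list[list[str]]:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    commands = build_commands()
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands
```

Calling `regenerate_all(dry_run=False)` runs the scripts in order; `check=True` raises `CalledProcessError` if any script exits with a non-zero status.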
- DATA SOURCES
  - Knowledge Graphs (KG-Microbe: 40K+ organisms, 1800+ media)
  - Literature (PubMed, DOIs, PDFs)
  - Genomes (NCBI, Bakta annotations)
  - Experimental Data (Plate reader CSV files)
- AI AGENTS (29 Specialized)
  - Knowledge & Database Agents: Query KG-Microbe, literature, SQL databases
  - Genomics & Modeling Agents: GEMSembler (FBA), GAPMind (pathway gaps), genome annotation
  - Chemistry & Analysis Agents: Osmolarity, ionic strength, sensitivity analysis
  - Specialist Agents: Lanthanide genes, codon bias, analogy reasoning
  - Media Design Agents: Formulation, concentration prediction, ingredient substitution
  - Orchestration Agents: Evidence extraction, design recommendations
- DESIGN
  - Media formulation (ingredient selection)
  - Concentration prediction (with confidence intervals)
  - Experimental design (DoE: MaxPro OptBlock)
- TEST
  - Experimental analysis (plate reader data)
  - Statistical analysis (replicates, outliers)
  - Visualization (growth curves, heatmaps, PCA)
  - Clustering (hierarchical, Ward linkage)
- LEARN
  - Response surface modeling (Gaussian Processes)
  - Bayesian optimization (Expected Improvement)
  - Sensitivity analysis (Sobol indices)
  - Pareto frontier analysis (multi-objective)
  - Next design recommendations (v14 YAML)
- OUTPUTS
  - Media recipes (JSON/TSV/Markdown with citations)
  - Plate designs (96/384-well layouts, Hamilton protocols)
  - Analysis reports (interpretation, evidence, recommendations)
  - Optimization reports (Sobol, Pareto, boundary effects)
DESIGN → BUILD → TEST → LEARN
↑ ↓
└────── ITERATE ←────────┘
Feedback loop: LEARN phase generates recommendations for next DESIGN iteration
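The feedback loop can be illustrated with a generic driver in which each phase is a plain function and the output of LEARN seeds the next DESIGN. This is a conceptual sketch only: `run_dbtl` and the toy phase functions are hypothetical and do not correspond to actual MicroGrowAgents code.

```python
from typing import Any, Callable


def run_dbtl(
    design: Callable[[Any], Any],
    build: Callable[[Any], Any],
    test: Callable[[Any], Any],
    learn: Callable[[Any, Any], Any],
    initial_hint: Any,
    iterations: int = 3,
) -> list[tuple[Any, Any]]:
    """Run Design -> Build -> Test -> Learn, feeding LEARN's output into the next DESIGN."""
    hint = initial_hint
    history = []
    for _ in range(iterations):
        candidate = design(hint)        # DESIGN: propose a candidate from the current hint
        built = build(candidate)        # BUILD: prepare the candidate for testing
        result = test(built)            # TEST: measure the outcome
        hint = learn(candidate, result) # LEARN: produce the hint for the next iteration
        history.append((candidate, result))
    return history


# Toy stand-ins: the "medium" is a single number and growth peaks at 5.
def toy_design(hint):
    return hint


def toy_build(candidate):
    return candidate


def toy_test(built):
    return -(built - 5) ** 2


def toy_learn(candidate, result):
    return candidate + 1
```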
- Each agent has a specific, well-defined responsibility
- Agents can be composed in different workflows
- Easy to add new agents without changing existing ones
- All recommendations include citations (DOI links)
- Multi-tier confidence scoring (HIGH/MEDIUM/LOW)
- Transparent reasoning paths
- 90.5% citation coverage for MP medium ingredients
- KG-Microbe integrates multiple data sources
- External tools (Bakta, GAPMind, GEMSembler) seamlessly integrated
- Hierarchical search across ingredient ontologies
- Graph embeddings for similarity search
- LinkML schema validation for all outputs
- Constraint checking (osmolarity, pH, element balance)
- FBA simulation for growth predictions
- Multi-objective optimization with Pareto frontiers
- All outputs include provenance tracking
- Deterministic DoE generation with seeds
- Version-controlled schemas and workflows
- SHA256 checksums for input data
- Language: Python 3.10+
- Dependency Management: uv
- Schemas: LinkML (with validation)
- Database: DuckDB (KG-Microbe queries)
- CLI: Typer
- Bakta: Genome annotation
- GAPMind: Pathway gap prediction
- GEMSembler: Metabolic modeling (wraps CarveMe + COBRApy)
- BLAST: Sequence alignment
- DoE: MaxPro + OptBlock algorithms (R via rpy2)
- Optimization: Gaussian Processes (scikit-learn), Bayesian Optimization
- Sensitivity: Sobol analysis (SALib)
- Clustering: Hierarchical clustering (scipy, Ward linkage)
- Visualization: matplotlib, seaborn
- Graph Embeddings: DeepWalk SkipGram (512 dimensions)
- Similarity Search: FAISS (optional, falls back to NumPy)
- Embedding Index: 208,811 chemicals with graph-based representations
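The list above notes that similarity search falls back to NumPy when FAISS is unavailable. The sketch below shows what a brute-force NumPy search over an embedding matrix could look like. The function name `top_k_cosine`, the choice of cosine similarity, and the toy data are assumptions, since the actual fallback implementation is not shown in this document.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, embeddings: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    """Return (row_index, cosine_similarity) for the k rows most similar to the query."""
    # Normalize rows and query to unit length; clip norms to avoid division by zero.
    row_norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit_rows = embeddings / np.clip(row_norms, 1e-12, None)
    unit_query = query / max(float(np.linalg.norm(query)), 1e-12)

    scores = unit_rows @ unit_query          # one dot product per row
    k = min(k, scores.shape[0])
    order = np.argsort(-scores)[:k]          # indices sorted by descending similarity
    return [(int(i), float(scores[i])) for i in order]
```

A full sort is used for simplicity; for an index with hundreds of thousands of rows, `np.argpartition` would avoid sorting every score.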
Used when output of one agent is input to next:
SQL Agent → GenomeFunction → GAPMind → MediaFormulation
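A minimal sketch of the sequential pattern, assuming each agent can be treated as a callable that takes the previous agent's output. The `run_sequential` helper is hypothetical and does not reflect the actual agent interfaces.

```python
from functools import reduce
from typing import Any, Callable

Agent = Callable[[Any], Any]


def run_sequential(agents: list[Agent], initial_input: Any) -> Any:
    """Feed each agent's output into the next agent and return the final output."""
    return reduce(lambda data, agent: agent(data), agents, initial_input)
```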
Multiple agents contribute data to orchestrator:
Cooccurrence ─────┐
MetabolicSource ──├→ GenMediaConc → Output
AnalogyReasoning ─┘
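The aggregation pattern can be sketched as several contributor agents answering the same query, with an orchestrator combining their results. Running the contributors in threads is an assumption made for illustration; the source does not say whether they run concurrently. `gather_contributions` and `orchestrate` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def gather_contributions(contributors: dict[str, Callable[[Any], Any]], query: Any) -> dict[str, Any]:
    """Run every contributor on the same query and collect results by name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(agent, query) for name, agent in contributors.items()}
        # result() blocks until each contributor finishes and re-raises its exceptions.
        return {name: future.result() for name, future in futures.items()}


def orchestrate(
    contributors: dict[str, Callable[[Any], Any]],
    combine: Callable[[dict[str, Any]], Any],
    query: Any,
) -> Any:
    """Aggregate contributor outputs with a caller-supplied combine function."""
    return combine(gather_contributions(contributors, query))
```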
Agent revisits earlier agents based on results:
GenMediaConc → Validation → [fail] → Cooccurrence → GenMediaConc
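The refinement loop can be sketched as propose, validate, and revise steps repeated until validation passes or a round limit is reached. All names below are hypothetical, and the round limit is an assumption added to guarantee termination.

```python
from typing import Any, Callable


def refine_until_valid(
    propose: Callable[[Any], Any],
    validate: Callable[[Any], tuple[bool, str]],
    revise: Callable[[Any, str], Any],
    context: Any,
    max_rounds: int = 5,
) -> tuple[Any, int]:
    """Return (candidate, round_number) for the first candidate that passes validation."""
    for round_number in range(1, max_rounds + 1):
        candidate = propose(context)
        ok, feedback = validate(candidate)
        if ok:
            return candidate, round_number
        # On failure, revisit the earlier step with the validator's feedback.
        context = revise(context, feedback)
    raise RuntimeError(f"No valid candidate after {max_rounds} rounds")
```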
High-level agent coordinates multiple sub-agents:
MediaFormulationAgent:
├─ SQLAgent (find organism)
├─ GenomeFunctionAgent (analyze genome)
├─ GAPMindAgent (find gaps)
├─ MediaRoleAgent (classify ingredients)
├─ GenMediaConcAgent (predict concentrations)
├─ ChemistryAgent (validate)
└─ LiteratureAgent (gather evidence)
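One way to picture the hierarchical pattern is an orchestrator that calls named sub-agents in order and lets each one read the shared state built so far. This is an illustrative sketch; the `Orchestrator` class is hypothetical and is not the actual `MediaFormulationAgent` implementation.

```python
from typing import Any, Callable

SubAgent = Callable[[dict[str, Any]], Any]


class Orchestrator:
    """Coordinates sub-agents; each result is stored in shared state under the step name."""

    def __init__(self, steps: list[tuple[str, SubAgent]]):
        self.steps = list(steps)

    def run(self, query: Any) -> dict[str, Any]:
        state: dict[str, Any] = {"query": query}
        for name, sub_agent in self.steps:
            # Each sub-agent sees the query plus all earlier sub-agent results.
            state[name] = sub_agent(state)
        return state
```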
- Modeling: predict-growth, analyze-gaps, reconstruct-model, compare-gap-fba
- Genome: analyze-genome, analyze-lanthanide-genes, analyze-transporters
- Chemistry: calculate-chemistry, analyze-electron-balance, predict-concentration
- Knowledge: query-knowledge-graph, query-database, search-literature
- Design: recommend-media, find-alternates, optimize-growth-conditions
- DoE: design-maxpro-optblock
- Validation: validate-media, classify-role, validate-formulation-comprehensive
- Analysis: analyze-sensitivity, analyze-cooccurrence, recommend-next-design
- Experimental: interpret-experimental-results, reconcile-growth-predictions
- recommend-media-comprehensive: Full organism → medium pipeline
- optimize-medium: Iterative improvement of existing media
- ingredient-report: Evidence extraction for specific ingredients
# Using comprehensive workflow
uv run MicroGrowAgents recommend-media-comprehensive \
--organism "Methylorubrum extorquens AM1" \
--output medium_recipe.json

# MaxPro OptBlock DoE
uv run MicroGrowAgents design-maxpro-optblock \
--factors factors.yaml \
--plates 4 \
--output plate_design.csv

# Dual analysis (absolute + relative)
just analyze-experimental data/experimental/plate_designs_v10_maxprooptblock_long__results

# Sobol sensitivity + Bayesian optimization + Pareto
python scripts/generate_optimization_report.py \
outputs/experimental_analysis_absolute/ \
--mode absolute

# v14 recommendations from v13 results
python scripts/recommend_v14_design.py \
outputs/optimization_report_absolute/ \
--output data/designs/v14_recommendations.yaml

User Query: "Design medium for Methylorubrum extorquens AM1"
↓
KGReasoningAgent → Query KG-Microbe for organism data
↓
GEMSemblerAgent + GAPMindAgent → FBA growth prediction + pathway gaps
↓
MediaRoleAgent → Classify essential roles (C/N/P/S/cofactors)
↓
GenMediaConcAgent → Predict ingredient concentrations
↓
ChemistryAgent → Validate osmolarity, pH, precipitation
↓
LiteratureAgent → Gather citation evidence
↓
Output: Complete medium recipe (JSON) with evidence and confidence scores
Input: Plate reader CSV (raw OD600 time series)
↓
analyze_plate_replicates.py → Replicate statistics (mean/std/CV/SEM)
↓
cluster_heatmap_replicates.py → Hierarchical clustering (Ward linkage)
↓
visualize_plate_data.py → Growth curves, PCA, heatmaps
↓
analyze_response_surfaces.py → Gaussian Process fitting
↓
interpret_experimental_results.py → Biological interpretation
↓
Output: Analysis reports (TSV, PDF plots, interpretation report)
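The replicate statistics named in the first step (mean, std, CV, SEM) can be computed as shown below. This is a sketch using only the standard library; using the sample standard deviation (n - 1 denominator) is an assumption, and `replicate_stats` is a hypothetical helper rather than the code in `analyze_plate_replicates.py`.

```python
import math
import statistics


def replicate_stats(values: list[float]) -> dict[str, float]:
    """Summarize replicate measurements: mean, sample std, CV, and SEM."""
    n = len(values)
    if n < 2:
        raise ValueError("At least two replicates are needed for std and SEM")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)               # sample standard deviation (n - 1)
    cv = std / mean if mean != 0 else math.nan   # coefficient of variation
    sem = std / math.sqrt(n)                     # standard error of the mean
    return {"n": n, "mean": mean, "std": std, "cv": cv, "sem": sem}
```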
Input: GP models + experimental data from TEST phase
↓
Pareto frontier on GP surface → 10K-point grid, non-dominated sort
↓
Sobol sensitivity analysis → SALib S1 + ST indices, ingredient ranking
↓
Bayesian optimization → Expected Improvement, top 20 next experiments
↓
Boundary effect detection → +/-5% threshold, EXPAND triggers
↓
recommend_v14_design.py → Factor adjustments (YAML)
↓
Output: v14 design recommendations (LinkML validated)
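The non-dominated sort in the first step can be illustrated with a naive filter. This sketch assumes every objective is to be maximized and uses a quadratic-time comparison, which is simple but slow on a 10K-point grid. `pareto_front` is a hypothetical helper, not the repository's implementation.

```python
from typing import Sequence


def pareto_front(points: Sequence[Sequence[float]]) -> list[Sequence[float]]:
    """Return the points not dominated by any other point (all objectives maximized)."""

    def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
        # a dominates b if it is at least as good everywhere and strictly better somewhere.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For objectives that should be minimized, such as cost, negate that coordinate before calling the function.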
Input: Ingredient CSV with DOI citations (158 ingredients)
↓
DOIMappingService → DOI to PDF path, abstract fallback
↓
OrganismExtractor → 21 organism context columns, NLP extraction
↓
EvidenceSnippetExtractor → 25 property columns, text snippets
↓
CSV Update → Incremental saves, dry-run mode, resume support
↓
Output: Enriched CSV (90.5% DOI coverage, evidence snippets)
Tools:
- Inkscape (free, open-source)
- Adobe Illustrator (commercial)
- Any vector graphics editor
Advantages:
- Infinitely scalable (no quality loss)
- Edit text, colors, shapes
- Export to PNG at any resolution
Advantages:
- Version controlled
- Consistent styling
- Easy to update all diagrams at once
Workflow:
- Modify Python script in scripts/generate_architecture_diagrams_*.py
- Run script: uv run python scripts/generate_architecture_diagrams_abstract.py
- Review output in docs/architecture/
- Commit both script changes and generated diagrams
- 2026-02-24: Added abstract version with large fonts (18-28pt) and minimal boxes (5-6)
- 2026-02-20: Created detailed v2 version with all 29 agents and 70+ skills
- 2026-02-09: Created simplified version with accessible language
- 2026-02-07: Original architecture diagrams
- Interactive web version with clickable agents (D3.js, Cytoscape.js)
- Animated workflow showing data flow
- Agent dependency graph (directed acyclic graph)
- Performance profiling overlay (execution time per agent)
- Growth Prediction: Integrate GrowthCodon for codon usage bias
- Transporter Matching: Match media ingredients to organism uptake systems
- Multi-Objective Optimization: Balance cost, complexity, performance
- Active Learning: Update predictions from experimental results
- Federated Agents: Distribute computation across multiple nodes
- Project Overview: README.md
- Development Guidelines: CLAUDE.md
- Skills Documentation: .claude/skills/
- Agents Documentation: docs/AGENTS_SKILLS_TOOLS.md
- Optimization Guide: docs/OPTIMIZATION_GUIDE.md
- Audit Report: docs/AUDIT_REPORT_BBOP_SKILLS.md
Last updated: 2026-02-24

Diagram versions: Abstract v1, Simplified v1, Detailed v2
