
MicroGrowAgents Architecture Diagrams

This directory contains multiple versions of the MicroGrowAgents architecture diagrams, optimized for different audiences and purposes.

Quick Start: Which Diagram Should I Use?

| Use Case | Recommended Diagram | Why |
|---|---|---|
| Conference presentation | architecture_abstract.png | Large text (18-28pt), minimal boxes, excellent readability |
| Grant application | workflow_abstract.png | Clear DBTL cycle, professional appearance |
| Paper methods section | architecture_simplified.svg | Balances detail with accessibility, minimal jargon |
| Developer documentation | architecture_v2.svg | Shows all 29 agents and 70+ skills; complete reference |
| Teaching materials | workflows_simplified.svg | Accessible language, clear examples |
| Quick README overview | workflow_abstract.png | Minimal text, maximum clarity |

Diagram Versions

1. Abstract Version ⭐ NEW - RECOMMENDED FOR PRESENTATIONS

Best for: High-level presentations, executive summaries, posters, slides

Files:

  • microgrow_agents_architecture_abstract.png - System architecture
  • microgrow_agents_workflow_abstract.png - Workflow diagram

Characteristics:

  • Minimal boxes: 5-6 major components (vs 20-30 in the simplified version and 50+ in the detailed version)
  • Large fonts: 18-28pt for categories, 13-16pt for details
  • Category-level grouping: DATA → AI AGENTS → DESIGN/TEST → LEARN → OUTPUT
  • Clear DBTL cycle: Design → Build → Test → Learn with feedback loop
  • Single-label boxes: "AI AGENTS (29 Specialized Agents)" instead of listing each one
  • High readability: Works well in presentations, posters, conference talks

Example: Abstract Architecture Preview

Generate with:

uv run python scripts/generate_architecture_diagrams_abstract.py

2. Simplified Version (ACCESSIBLE LANGUAGE)

Best for: Broader scientific audience, documentation, teaching

Files:

  • microgrow_agents_architecture_simplified.svg - System layers
  • microgrow_agents_workflows_simplified.svg - Example workflows

Characteristics:

  • Accessible language (avoids jargon like "TF-IDF", "FAISS", "FBA")
  • Layered architecture (Data Sources → Core Analysis → Specialists → Integration)
  • ~20-30 boxes organized in layers
  • Font size: 9-11pt for labels, 11-13pt for headers
  • Shows data flow between layers

Use cases:

  • Documentation
  • Teaching materials
  • Collaboration with biologists
  • Internal presentations

Generate with:

uv run python scripts/generate_architecture_diagrams_simplified.py

3. Detailed Version (v2) (COMPREHENSIVE TECHNICAL REFERENCE)

Best for: Technical documentation, developer onboarding, system audit

Files:

  • microgrow_agents_architecture_v2.svg - Complete system architecture
  • microgrow_agents_workflows_v2.svg - DBTL workflow paths

Characteristics:

  • Comprehensive: All 29 agents, 70+ skills, 6-step analysis pipeline
  • Color-coded: External data (dashed borders), Agents (bold borders), Skills, Tools
  • Technical details: File names (kg_reasoning_agent.py), script names, specific methods
  • Small fonts: 8-11pt (designed for large displays or printed posters)
  • Multiple sections: Data Sources, Agents, Skills, Analysis Pipeline, Optimization Stack
  • Legend: Visual guide to box types and categories

Use cases:

  • Developer documentation
  • Technical deep-dives
  • System audit and compliance (bbop-skills)
  • Complete reference guide

Generate with:

uv run python scripts/generate_architecture_diagrams_v2.py

4. Legacy Versions (HISTORICAL)

Files:

  • microgrow_agents_architecture.svg - Original architecture
  • microgrow_agents_architecture_edit.svg - Manually edited version
  • microgrow_agents_workflows_edit.svg - Manually edited workflows

Status: Superseded by newer versions, kept for reference


Comparison Table

| Feature | Abstract ⭐ | Simplified | Detailed (v2) |
|---|---|---|---|
| Number of boxes | 5-6 | 20-30 | 50+ |
| Font size (main) | 18-28pt | 11-13pt | 8-11pt |
| Technical jargon | None | Minimal | High |
| Best for | Presentations | Documentation | Developer reference |
| Target audience | General/Executive | Scientists | Developers/Engineers |
| File size (PNG) | ~470 KB | ~235 KB | ~582 KB |
| Readability | Excellent | Good | Requires zoom |
| Detail level | High-level categories | System layers | All components |

Design Principles

Abstract Version (NEW)

  1. Minimize cognitive load: Focus on 5-6 major categories
  2. Maximize font size: 18-28pt ensures readability in presentations
  3. Group by function: Clear DBTL workflow (Design → Build → Test → Learn)
  4. Visual hierarchy: Box size and color reflect importance
  5. Simple flow: Arrows show data flow and feedback loops

Key insight: Viewers should understand the system in 30 seconds

Simplified Version

  1. Avoid jargon: "Database Query" instead of "DuckDB SQL Agent"
  2. Layered architecture: Show information flow through system layers
  3. Accessible terms: "Growth Simulation" instead of "FBA with GEMSembler"
  4. Clear examples: Four workflow paths with step-by-step descriptions

Key insight: Scientists should understand how the system works without learning technical details

Detailed Version

  1. Complete coverage: Every agent, skill, tool, and script documented
  2. Technical precision: Exact file names and method names
  3. Color coding: Visual distinction between external data, agents, skills, tools
  4. Sectioned layout: Logical grouping (Knowledge Agents, Modeling Agents, etc.)

Key insight: Developers should be able to navigate the codebase using the diagram


Color Scheme (Consistent Across All Versions)

| Color | Usage | Hex Codes | Examples |
|---|---|---|---|
| Blue | Data Sources, Knowledge | #64B5F6 / #1565C0 | KG-Microbe, Literature, Databases |
| Purple | AI Agents, Design | #BA68C8 / #6A1B9A | Media Design, Specialist Agents |
| Green | Testing, Analysis | #81C784 / #2E7D32 | Experimental Analysis, Clustering |
| Orange | Learning, Optimization | #FFB74D / #E65100 | Bayesian Optimization, Sensitivity |
| Red | Outputs, Results | #E57373 / #C62828 | Media Recipes, Reports, Designs |
| Gray | Supporting Services | #90A4AE / #37474F | Validation, Evidence Extraction |
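The palette is easiest to keep consistent if every generator script reads it from one place. The following is a minimal sketch that assumes a hypothetical `PALETTE` constant and `box_style` helper (neither is confirmed to exist in the scripts). It also assumes the first hex code in each pair is the light fill and the second is the dark edge:

```python
# Hypothetical shared palette: category -> (light fill, dark edge), hex codes from the color table.
PALETTE = {
    "data": ("#64B5F6", "#1565C0"),      # Blue: data sources, knowledge
    "agents": ("#BA68C8", "#6A1B9A"),    # Purple: AI agents, design
    "testing": ("#81C784", "#2E7D32"),   # Green: testing, analysis
    "learning": ("#FFB74D", "#E65100"),  # Orange: learning, optimization
    "outputs": ("#E57373", "#C62828"),   # Red: outputs, results
    "support": ("#90A4AE", "#37474F"),   # Gray: supporting services
}


def box_style(category: str) -> dict:
    """Return facecolor/edgecolor keyword arguments for a matplotlib patch."""
    fill, edge = PALETTE[category]
    return {"facecolor": fill, "edgecolor": edge}
```

With matplotlib, you can unpack the returned dict into a patch, for example `FancyBboxPatch((x, y), w, h, **box_style("agents"))`.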

Example Use Cases by Diagram Version

Scenario 1: Conference Presentation (20-minute talk)

Recommended: Abstract workflow diagram

Why:

  • Audience sees clear DBTL cycle
  • Large fonts readable from the back of the room
  • Focuses on "what" not "how"
  • Professional, polished appearance

Slide structure:

  1. Problem slide → Need for AI-driven media design
  2. Abstract workflow diagram → Our solution (DBTL cycle)
  3. Results slide → Performance metrics

Scenario 2: Methods Section in Scientific Paper

Recommended: Simplified architecture diagram

Why:

  • Shows system structure without overwhelming detail
  • Uses scientific language (not developer jargon)
  • Reviewers can understand agent interactions
  • Fits in 1-column or 2-column layout

Caption example:

"Figure 2: MicroGrowAgents architecture. The system integrates multiple data sources (blue) with 29 specialized AI agents (purple) that design growth media, analyze experimental results (green), and optimize designs through Bayesian optimization (orange)."

Scenario 3: Developer Onboarding Documentation

Recommended: Detailed v2 architecture diagram

Why:

  • Shows exact file locations (agents/kg_reasoning_agent.py)
  • New developers can navigate codebase
  • See relationships between agents, skills, tools
  • Understand data flow through pipeline

Documentation structure:

  1. README overview → Abstract workflow
  2. Architecture deep-dive → Detailed v2 diagram
  3. API reference → Individual agent docs

Scenario 4: Grant Application (NSF, NIH, DOE)

Recommended: Abstract architecture + Abstract workflow

Why:

  • Reviewers (often non-experts) need quick understanding
  • Emphasize DBTL cycle (familiar to reviewers)
  • Show AI integration clearly
  • Professional appearance increases credibility

Figures:

  • Figure 1: Abstract architecture (system overview)
  • Figure 2: Abstract workflow (DBTL cycle)
  • Figure 3: Results (performance comparison)

File Organization

docs/architecture/
├── README.md                                    # This file
├── microgrow_agents_architecture_abstract.png  # ⭐ NEW - Presentations
├── microgrow_agents_architecture_abstract.svg
├── microgrow_agents_workflow_abstract.png      # ⭐ NEW - Workflow overview
├── microgrow_agents_workflow_abstract.svg
├── microgrow_agents_architecture_simplified.png
├── microgrow_agents_architecture_simplified.svg
├── microgrow_agents_workflows_simplified.png
├── microgrow_agents_workflows_simplified.svg
├── microgrow_agents_architecture_v2.png        # Technical reference
├── microgrow_agents_architecture_v2.svg
├── microgrow_agents_workflows_v2.png
└── microgrow_agents_workflows_v2.svg

File sizes:

  • Abstract PNG: ~470 KB (high resolution, 300 DPI)
  • Simplified PNG: ~235 KB
  • Detailed v2 PNG: ~582 KB

Formats:

  • PNG: Presentation-ready, high resolution (300 DPI), for slides/posters
  • SVG: Vector format, infinitely scalable, editable in Inkscape/Illustrator

Regenerating Diagrams

All diagrams can be regenerated from Python scripts:

# Abstract version (NEW - large fonts, minimal boxes)
uv run python scripts/generate_architecture_diagrams_abstract.py

# Simplified version (accessible language)
uv run python scripts/generate_architecture_diagrams_simplified.py

# Detailed v2 version (complete technical reference)
uv run python scripts/generate_architecture_diagrams_v2.py

Output location: docs/architecture/

Dependencies: matplotlib, numpy (installed via uv)


System Architecture Overview

Main Components (Abstract View)

  1. DATA SOURCES

    • Knowledge Graphs (KG-Microbe: 40K+ organisms, 1800+ media)
    • Literature (PubMed, DOIs, PDFs)
    • Genomes (NCBI, Bakta annotations)
    • Experimental Data (Plate reader CSV files)
  2. AI AGENTS (29 Specialized)

    • Knowledge & Database Agents: Query KG-Microbe, literature, SQL databases
    • Genomics & Modeling Agents: GEMSembler (FBA), GAPMind (pathway gaps), genome annotation
    • Chemistry & Analysis Agents: Osmolarity, ionic strength, sensitivity analysis
    • Specialist Agents: Lanthanide genes, codon bias, analogy reasoning
    • Media Design Agents: Formulation, concentration prediction, ingredient substitution
    • Orchestration Agents: Evidence extraction, design recommendations
  3. DESIGN

    • Media formulation (ingredient selection)
    • Concentration prediction (with confidence intervals)
    • Experimental design (DoE: MaxPro OptBlock)
  4. TEST

    • Experimental analysis (plate reader data)
    • Statistical analysis (replicates, outliers)
    • Visualization (growth curves, heatmaps, PCA)
    • Clustering (hierarchical, Ward linkage)
  5. LEARN

    • Response surface modeling (Gaussian Processes)
    • Bayesian optimization (Expected Improvement)
    • Sensitivity analysis (Sobol indices)
    • Pareto frontier analysis (multi-objective)
    • Next design recommendations (v14 YAML)
  6. OUTPUTS

    • Media recipes (JSON/TSV/Markdown with citations)
    • Plate designs (96/384-well layouts, Hamilton protocols)
    • Analysis reports (interpretation, evidence, recommendations)
    • Optimization reports (Sobol, Pareto, boundary effects)
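The Bayesian optimization step in LEARN ranks candidates by Expected Improvement (EI). This is a minimal sketch of the standard closed-form EI for a response you want to maximize. It assumes each candidate already has a GP posterior mean and standard deviation, for example from scikit-learn's `GaussianProcessRegressor.predict(X, return_std=True)`. The function names and the `xi` exploration margin are illustrative, not the project's API:

```python
import math


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.01) -> float:
    """Closed-form EI (maximization) for a candidate with GP posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return 0.0  # common convention: no predictive uncertainty -> no expected gain
    improvement = mu - best - xi
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return improvement * cdf + sigma * pdf


def top_k_by_ei(candidates, mus, sigmas, best, k=20, xi=0.01):
    """Score every candidate by EI and return the k highest as (candidate, ei) pairs."""
    scored = [
        (cand, expected_improvement(m, s, best, xi))
        for cand, m, s in zip(candidates, mus, sigmas)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Calling `top_k_by_ei(..., k=20)` mirrors the "top 20 next experiments" step in the optimization path. EI rises with both a higher predicted mean and a larger uncertainty, which is how it balances exploitation against exploration.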

DBTL Cycle

DESIGN → BUILD → TEST → LEARN
   ↑                        ↓
   └────── ITERATE ←────────┘

Feedback loop: the LEARN phase generates recommendations for the next DESIGN iteration


Key Design Principles

1. Modularity

  • Each agent has a specific, well-defined responsibility
  • Agents can be composed in different workflows
  • Easy to add new agents without changing existing ones

2. Evidence-Based

  • All recommendations include citations (DOI links)
  • Multi-tier confidence scoring (HIGH/MEDIUM/LOW)
  • Transparent reasoning paths
  • 90.5% citation coverage for MP medium ingredients

3. Data Integration

  • KG-Microbe integrates multiple data sources
  • External tools (Bakta, GAPMind, GEMSembler) integrated through dedicated agents
  • Hierarchical search across ingredient ontologies
  • Graph embeddings for similarity search

4. Validation

  • LinkML schema validation for all outputs
  • Constraint checking (osmolarity, pH, element balance)
  • FBA simulation for growth predictions
  • Multi-objective optimization with Pareto frontiers
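Constraint checks of this kind reduce to simple sums over dissolved species. Ionic strength is I = 1/2 * sum(c_i * z_i^2), and ideal osmolarity counts every particle once.

This sketch assumes ideal solutions with complete dissociation. It applies no activity corrections and does not model pH or precipitation. `Ion`, the check functions, and the caller-supplied osmolarity limit are all hypothetical. The charge (electroneutrality) check is a simple stand-in, not the project's element-balance logic:

```python
from dataclasses import dataclass


@dataclass
class Ion:
    name: str
    molar: float  # concentration in mol/L, assuming complete dissociation
    charge: int   # signed charge number z


def ionic_strength(ions):
    """I = 1/2 * sum(c_i * z_i^2), in mol/L."""
    return 0.5 * sum(ion.molar * ion.charge ** 2 for ion in ions)


def ideal_osmolarity(ions, neutral_molar=0.0):
    """Ideal osmolarity (Osm/L): each dissolved particle, charged or neutral, counts once."""
    return sum(ion.molar for ion in ions) + neutral_molar


def charge_imbalance(ions):
    """Net charge in eq/L; a balanced recipe should be close to zero."""
    return sum(ion.molar * ion.charge for ion in ions)


def check_constraints(ions, max_osmolarity, neutral_molar=0.0, tol=1e-9):
    """Return human-readable violations; an empty list means the recipe passes."""
    problems = []
    osm = ideal_osmolarity(ions, neutral_molar)
    if osm > max_osmolarity:
        problems.append(f"osmolarity {osm:.3f} Osm/L exceeds {max_osmolarity}")
    imbalance = charge_imbalance(ions)
    if abs(imbalance) > tol:
        problems.append(f"net charge {imbalance:+.3g} eq/L")
    return problems
```

For 0.1 M NaCl plus 0.01 M MgSO4, this gives an ionic strength of 0.14 M and an ideal osmolarity of 0.22 Osm/L.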

5. Reproducibility

  • All outputs include provenance tracking
  • Deterministic DoE generation with seeds
  • Version-controlled schemas and workflows
  • SHA256 checksums for input data
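Checksums and fixed seeds are what make a run repeatable. This sketch uses only the standard library. `provenance_record` and its layout are hypothetical, not the project's provenance schema:

```python
import hashlib
import random


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large inputs are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def seeded_run_order(n_runs, seed):
    """Same seed gives the same randomized run order, so a DoE can be regenerated exactly."""
    rng = random.Random(seed)  # private generator: leaves the global random state untouched
    order = list(range(n_runs))
    rng.shuffle(order)
    return order


def provenance_record(input_paths, seed):
    """Hypothetical provenance entry pairing each input's checksum with the seed used."""
    return {
        "seed": seed,
        "inputs": {str(p): sha256_of_file(p) for p in input_paths},
    }
```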

Technology Stack

Core Technologies

  • Language: Python 3.10+
  • Dependency Management: uv
  • Schemas: LinkML (with validation)
  • Database: DuckDB (KG-Microbe queries)
  • CLI: Typer

External Tools

  • Bakta: Genome annotation
  • GAPMind: Pathway gap prediction
  • GEMSembler: Metabolic modeling (wraps CarveMe + COBRApy)
  • BLAST: Sequence alignment

Analysis Stack

  • DoE: MaxPro + OptBlock algorithms (R via rpy2)
  • Optimization: Gaussian Processes (scikit-learn), Bayesian Optimization
  • Sensitivity: Sobol analysis (SALib)
  • Clustering: Hierarchical clustering (scipy, Ward linkage)
  • Visualization: matplotlib, seaborn
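SALib computes the S1 and ST indices used in the optimization report. To show what those numbers mean, here is a dependency-free Monte Carlo sketch using the Saltelli (2010) first-order and Jansen total-order estimators, a standard pair of choices. It assumes inputs scaled to [0, 1] and does not replace SALib's sampling schemes or confidence intervals:

```python
import math
import random


def sobol_indices(model, n_factors, n_samples=10_000, seed=0):
    """Monte Carlo first-order (S1) and total-order (ST) Sobol indices.

    Inputs are sampled uniformly on [0, 1]^d (an assumption; rescale inside `model`).
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_factors)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_factors)] for _ in range(n_samples)]
    fA = [model(x) for x in A]
    fB = [model(x) for x in B]

    outputs = fA + fB  # total output variance estimated from both sample matrices
    mean = math.fsum(outputs) / len(outputs)
    variance = math.fsum((y - mean) ** 2 for y in outputs) / len(outputs)

    s1, st = [], []
    for i in range(n_factors):
        # A_B^(i): each row of A with column i replaced by the matching row of B
        fAB = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Saltelli (2010) first-order estimator
        s1.append(math.fsum(fb * (fab - fa) for fa, fb, fab in zip(fA, fB, fAB)) / n_samples / variance)
        # Jansen total-order estimator
        st.append(math.fsum((fa - fab) ** 2 for fa, fab in zip(fA, fAB)) / (2 * n_samples) / variance)
    return s1, st
```

For the additive toy model `lambda x: x[0] + 2 * x[1]`, the exact indices are S1 = ST = [0.2, 0.8], since the factors contribute 1/12 and 4/12 of a total variance of 5/12, and the estimates land close to those values. When ST is much larger than S1 for a factor, that factor acts mostly through interactions.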

Search & Embeddings

  • Graph Embeddings: DeepWalk SkipGram (512 dimensions)
  • Similarity Search: FAISS (optional, falls back to NumPy)
  • Embedding Index: 208,811 chemicals with graph-based representations
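The optional FAISS path and the NumPy fallback can share one interface. This sketch assumes L2-normalized embeddings, so that inner product equals cosine similarity. `CosineIndex` is a hypothetical name; `faiss.IndexFlatIP` is FAISS's exact (brute-force) inner-product index:

```python
import numpy as np

try:
    import faiss  # optional dependency
except ImportError:  # FAISS not installed: use the brute-force NumPy path
    faiss = None


def normalize(rows) -> np.ndarray:
    """L2-normalize rows (as float32) so that inner product equals cosine similarity."""
    rows = np.asarray(rows, dtype=np.float32)
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    return rows / np.maximum(norms, 1e-12)


class CosineIndex:
    """Exact top-k cosine search over embedding vectors: FAISS if available, else NumPy."""

    def __init__(self, vectors):
        self.vectors = np.ascontiguousarray(normalize(vectors))
        self.index = None
        if faiss is not None:
            self.index = faiss.IndexFlatIP(self.vectors.shape[1])
            self.index.add(self.vectors)

    def search(self, queries, k=5):
        """Return (scores, ids), each of shape (n_queries, k), best match first."""
        q = np.ascontiguousarray(normalize(np.atleast_2d(queries)))
        k = min(k, len(self.vectors))
        if self.index is not None:
            return self.index.search(q, k)
        scores = q @ self.vectors.T                 # (n_queries, n_vectors)
        ids = np.argsort(-scores, axis=1)[:, :k]    # highest similarity first
        return np.take_along_axis(scores, ids, axis=1), ids
```

Both paths return exact results. FAISS mainly helps with speed on a matrix the size of the 208,811 x 512 embedding index.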

Agent Interaction Patterns

Sequential Pipeline

Used when the output of one agent is the input to the next:

SQL Agent → GenomeFunction → GAPMind → MediaFormulation

Parallel Aggregation

Multiple agents contribute data to an orchestrator:

Cooccurrence ─────┐
MetabolicSource ──├→ GenMediaConc → Output
AnalogyReasoning ─┘
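In parallel aggregation, the contributing agents do not depend on each other, so they can run concurrently before the orchestrator merges their output. This sketch uses hypothetical stand-in agents that return toy ingredient scores, and it assumes simple averaging as the merge rule. The real GenMediaConc logic is not shown:

```python
from concurrent.futures import ThreadPoolExecutor


def aggregate(organism, contributors, combine):
    """Run independent contributor agents concurrently, then pass all results to `combine`."""
    with ThreadPoolExecutor(max_workers=max(1, len(contributors))) as pool:
        futures = {name: pool.submit(fn, organism) for name, fn in contributors.items()}
        results = {name: fut.result() for name, fut in futures.items()}  # re-raises agent errors
    return combine(results)


def mean_score(results):
    """Assumed merge rule: average each ingredient's score over the agents that scored it."""
    totals, counts = {}, {}
    for scores in results.values():
        for ingredient, score in scores.items():
            totals[ingredient] = totals.get(ingredient, 0.0) + score
            counts[ingredient] = counts.get(ingredient, 0) + 1
    return {ing: totals[ing] / counts[ing] for ing in totals}


# Toy stand-ins for Cooccurrence, MetabolicSource and AnalogyReasoning (values are made up).
contributors = {
    "cooccurrence": lambda org: {"methanol": 0.25},
    "metabolic_source": lambda org: {"methanol": 0.75, "NH4Cl": 1.0},
    "analogy": lambda org: {"NH4Cl": 0.5},
}
```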

Iterative Refinement

An agent re-invokes earlier agents when validation fails:

GenMediaConc → Validation → [fail] → Cooccurrence → GenMediaConc
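The refinement loop needs an upper bound so that a design that never validates cannot cycle forever. In this sketch, `predict`, `validate`, and `revise` are hypothetical callables standing in for GenMediaConc, Validation, and Cooccurrence:

```python
def refine(predict, validate, revise, max_rounds=3):
    """Predict, validate, and on failure revise the hints and predict again.

    predict(hints) returns a candidate; validate(candidate) returns a list of problems
    (empty when it passes); revise(hints, problems) returns updated hints.
    """
    hints = {}
    problems = []
    for round_no in range(1, max_rounds + 1):
        candidate = predict(hints)
        problems = validate(candidate)
        if not problems:
            return candidate, round_no  # the passing design and the round it passed in
        hints = revise(hints, problems)
    raise RuntimeError(f"no valid design after {max_rounds} rounds: {problems}")
```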

Orchestrated Workflow

A high-level agent coordinates multiple sub-agents:

MediaFormulationAgent:
  ├─ SQLAgent (find organism)
  ├─ GenomeFunctionAgent (analyze genome)
  ├─ GAPMindAgent (find gaps)
  ├─ MediaRoleAgent (classify ingredients)
  ├─ GenMediaConcAgent (predict concentrations)
  ├─ ChemistryAgent (validate)
  └─ LiteratureAgent (gather evidence)
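An orchestrator of this shape can be expressed as an ordered list of sub-agents that read from and write to a shared context. This sketch uses trivial stand-in agents; the names, context keys, and outputs are illustrative only:

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict[str, Any]], Any]]


def orchestrate(query: str, steps: List[Step]) -> Dict[str, Any]:
    """Run sub-agents in order; each reads the shared context and its result is stored under its name."""
    context: Dict[str, Any] = {"query": query}
    for name, agent in steps:
        context[name] = agent(context)
    return context


# Hypothetical stand-ins for SQLAgent, GAPMindAgent and GenMediaConcAgent.
steps: List[Step] = [
    ("organism", lambda ctx: {"name": ctx["query"]}),
    ("gaps", lambda ctx: ["missing_cofactor"]),
    ("recipe", lambda ctx: {"base": "minimal", "supplements": ctx["gaps"]}),
]
result = orchestrate("Methylorubrum extorquens AM1", steps)
```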

Skills & Workflows

Core Skills (70+)

  • Modeling: predict-growth, analyze-gaps, reconstruct-model, compare-gap-fba
  • Genome: analyze-genome, analyze-lanthanide-genes, analyze-transporters
  • Chemistry: calculate-chemistry, analyze-electron-balance, predict-concentration
  • Knowledge: query-knowledge-graph, query-database, search-literature
  • Design: recommend-media, find-alternates, optimize-growth-conditions
  • DoE: design-maxpro-optblock
  • Validation: validate-media, classify-role, validate-formulation-comprehensive
  • Analysis: analyze-sensitivity, analyze-cooccurrence, recommend-next-design
  • Experimental: interpret-experimental-results, reconcile-growth-predictions

Workflows

  • recommend-media-comprehensive: Full organism → medium pipeline
  • optimize-medium: Iterative improvement of existing media
  • ingredient-report: Evidence extraction for specific ingredients

Example Usage

1. Design Medium for New Organism

# Using comprehensive workflow
uv run MicroGrowAgents recommend-media-comprehensive \
  --organism "Methylorubrum extorquens AM1" \
  --output medium_recipe.json

2. Generate Experimental Design

# MaxPro OptBlock DoE
uv run MicroGrowAgents design-maxpro-optblock \
  --factors factors.yaml \
  --plates 4 \
  --output plate_design.csv
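MaxPro designs minimize the maximum projection criterion of Joseph, Gul and Ba (2015): psi(D) = [ (1 / C(n, 2)) * sum_{i<j} 1 / prod_l (x_il - x_jl)^2 ]^(1/p), for n runs and p factors scaled to [0, 1]. The project builds the design in R via rpy2. This sketch only evaluates the criterion for an existing design, which can be handy for comparing candidate layouts; it does not perform the OptBlock blocking step:

```python
import math
from itertools import combinations


def maxpro_criterion(design):
    """MaxPro criterion psi(D) for an n x p design scaled to [0, 1]; lower is better.

    If two runs share a value in any factor, their product term is zero and the
    criterion is infinite, so MaxPro designs never repeat a level in a projection.
    """
    n, p = len(design), len(design[0])
    total = 0.0
    for row_i, row_j in combinations(design, 2):
        prod = math.prod((a - b) ** 2 for a, b in zip(row_i, row_j))
        if prod == 0.0:
            return math.inf
        total += 1.0 / prod
    return (total / math.comb(n, 2)) ** (1.0 / p)
```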

3. Analyze Experimental Results

# Dual analysis (absolute + relative)
just analyze-experimental data/experimental/plate_designs_v10_maxprooptblock_long__results

4. Generate Optimization Report

# Sobol sensitivity + Bayesian optimization + Pareto
python scripts/generate_optimization_report.py \
  outputs/experimental_analysis_absolute/ \
  --mode absolute

5. Recommend Next Design Version

# v14 recommendations from v13 results
python scripts/recommend_v14_design.py \
  outputs/optimization_report_absolute/ \
  --output data/designs/v14_recommendations.yaml

Workflow Paths (Detailed Examples)

Path 1: Media Recommendation (DESIGN)

User Query: "Design medium for Methylorubrum extorquens AM1"
  ↓
KGReasoningAgent → Query KG-Microbe for organism data
  ↓
GEMSemblerAgent + GAPMindAgent → FBA growth prediction + pathway gaps
  ↓
MediaRoleAgent → Classify essential roles (C/N/P/S/cofactors)
  ↓
GenMediaConcAgent → Predict ingredient concentrations
  ↓
ChemistryAgent → Validate osmolarity, pH, precipitation
  ↓
LiteratureAgent → Gather citation evidence
  ↓
Output: Complete medium recipe (JSON) with evidence and confidence scores

Path 2: Experimental Analysis (TEST)

Input: Plate reader CSV (raw OD600 time series)
  ↓
analyze_plate_replicates.py → Replicate statistics (mean/std/CV/SEM)
  ↓
cluster_heatmap_replicates.py → Hierarchical clustering (Ward linkage)
  ↓
visualize_plate_data.py → Growth curves, PCA, heatmaps
  ↓
analyze_response_surfaces.py → Gaussian Process fitting
  ↓
interpret_experimental_results.py → Biological interpretation
  ↓
Output: Analysis reports (TSV, PDF plots, interpretation report)
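The replicate summary in the first step of this path uses standard formulas. This sketch covers one condition's replicate wells and assumes the sample standard deviation (n - 1 denominator). The function name is hypothetical, and `analyze_plate_replicates.py` may compute these differently:

```python
import math
import statistics


def replicate_stats(values):
    """Summary statistics for replicate wells (e.g. OD600 at one time point)."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values) if n > 1 else 0.0  # sample SD, n - 1 denominator
    return {
        "n": n,
        "mean": mean,
        "std": std,
        "cv": std / mean if mean else math.nan,  # coefficient of variation
        "sem": std / math.sqrt(n),               # standard error of the mean
    }
```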

Path 3: Optimization (LEARN)

Input: GP models + experimental data from TEST phase
  ↓
Pareto frontier on GP surface → 10K-point grid, non-dominated sort
  ↓
Sobol sensitivity analysis → SALib S1 + ST indices, ingredient ranking
  ↓
Bayesian optimization → Expected Improvement, top 20 next experiments
  ↓
Boundary effect detection → +/-5% threshold, EXPAND triggers
  ↓
recommend_v14_design.py → Factor adjustments (YAML)
  ↓
Output: v14 design recommendations (LinkML validated)
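Two of the steps above, the non-dominated sort and boundary detection, can each be sketched in a few lines. This sketch assumes all objectives are maximized. It reads "+/-5% threshold" as "the optimum lies within 5% of the tested range from a bound"; both that reading and the flag labels are assumptions:

```python
def pareto_front(points):
    """Indices of non-dominated points when every objective is maximized (O(n^2) check)."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


def boundary_flags(best, bounds, threshold=0.05):
    """Flag factors whose optimum lies within `threshold` of the tested range edge.

    bounds maps factor -> (low, high). A flag suggests expanding the range in that
    direction in the next design; the EXPAND_LOW/EXPAND_HIGH labels are illustrative.
    """
    flags = {}
    for factor, value in best.items():
        low, high = bounds[factor]
        margin = threshold * (high - low)
        if value <= low + margin:
            flags[factor] = "EXPAND_LOW"
        elif value >= high - margin:
            flags[factor] = "EXPAND_HIGH"
    return flags
```

The pairwise check is O(n^2), so on a 10K-point grid a sort-based method is much faster. With two objectives, for example, you can sort by one objective and sweep the other.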

Path 4: Evidence Extraction (SUPPORT)

Input: Ingredient CSV with DOI citations (158 ingredients)
  ↓
DOIMappingService → DOI to PDF path, abstract fallback
  ↓
OrganismExtractor → 21 organism context columns, NLP extraction
  ↓
EvidenceSnippetExtractor → 25 property columns, text snippets
  ↓
CSV Update → Incremental saves, dry-run mode, resume support
  ↓
Output: Enriched CSV (90.5% DOI coverage, evidence snippets)
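This sketch shows one way incremental saves, dry-run mode, and resume support can fit together. It assumes each row has a unique `ingredient` column and that `extract` returns a dict of new columns. The function names are hypothetical, not the extractor's actual code:

```python
import csv
from pathlib import Path


def _write(path, rows):
    """Write rows with the union of their columns, in first-seen order."""
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def enrich_csv(src, dst, extract, key="ingredient", dry_run=False, save_every=10):
    """Add extracted columns to each row; save every `save_every` rows; skip rows already in dst."""
    with open(src, newline="") as handle:
        rows = list(csv.DictReader(handle))

    done = {}
    if Path(dst).exists():  # resume: reuse rows written by an earlier, possibly interrupted, run
        with open(dst, newline="") as handle:
            done = {row[key]: row for row in csv.DictReader(handle)}

    out = []
    for count, row in enumerate(rows, start=1):
        out.append(done.get(row[key]) or {**row, **extract(row)})
        if not dry_run and count % save_every == 0:
            _write(dst, out)  # incremental save
    if not dry_run:
        _write(dst, out)
    return out
```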

Editing Diagrams

SVG Files (Recommended for Modifications)

Tools:

  • Inkscape (free, open-source)
  • Adobe Illustrator (commercial)
  • Any vector graphics editor

Advantages:

  • Infinitely scalable (no quality loss)
  • Edit text, colors, shapes
  • Export to PNG at any resolution

Regenerating from Code (Recommended for Consistency)

Advantages:

  • Version controlled
  • Consistent styling
  • Easy to update all diagrams at once

Workflow:

  1. Modify Python script in scripts/generate_architecture_diagrams_*.py
  2. Run script: uv run python scripts/generate_architecture_diagrams_abstract.py
  3. Review output in docs/architecture/
  4. Commit both script changes and generated diagrams

Version History

  • 2026-02-24: Added abstract version with large fonts (18-28pt) and minimal boxes (5-6)
  • 2026-02-20: Created detailed v2 version with all 29 agents and 70+ skills
  • 2026-02-09: Created simplified version with accessible language
  • 2026-02-07: Original architecture diagrams

Future Extensions

Potential Diagram Enhancements

  • Interactive web version with clickable agents (D3.js, Cytoscape.js)
  • Animated workflow showing data flow
  • Agent dependency graph (directed acyclic graph)
  • Performance profiling overlay (execution time per agent)

Potential System Enhancements

  • Growth Prediction: Integrate GrowthCodon for codon usage bias
  • Transporter Matching: Match media ingredients to organism uptake systems
  • Multi-Objective Optimization: Balance cost, complexity, performance
  • Active Learning: Update predictions from experimental results
  • Federated Agents: Distribute computation across multiple nodes


Last updated: 2026-02-24
Diagram versions: Abstract v1, Simplified v1, Detailed v2