Complete guide to using the MicroGrowAgents optimization system for data-driven v14 design generation.
- Overview
- Quick Start
- Pipeline Architecture
- Interpreting Optimization Reports
- Understanding v14 Recommendations
- Command Reference
- Troubleshooting
- Advanced Usage
The optimization system converts experimental data into actionable recommendations for the next design iteration. It combines:
- Gaussian Process (GP) models - Surrogate models for response surface prediction
- Pareto frontier analysis - Multi-objective optimization (OD600 + Nd_uM)
- Sobol sensitivity analysis - Variance decomposition to identify important factors
- Bayesian optimization - Expected Improvement acquisition for next experiments
- Boundary effect detection - Identify factors saturated at design space limits
- LinkML validation - Structured, validated recommendations for v14 design
Scientific Goal: Identify which ingredient ranges to expand, contract, or shift for the next experimental design iteration based on quantitative evidence from v10/v12 data.
# Start with experimental data directory
just analyze-experimental-full data/experimental/plate_designs_v10_results/
# This runs ALL steps:
# Step 1: Analysis (replicate statistics)
# Step 2: Clustering (hierarchical clustering)
# Step 3: Visualization (growth curves, heatmaps, PCA)
# Step 4: Response surfaces (GP model fitting)
# Step 5: Optimization report (Pareto, Sobol, BO, boundaries)
# Step 6: v14 recommendations (LinkML-validated YAML)
# Optimization report
cat outputs/plate_designs_v10_results_experimental_analysis_absolute/optimization/OPTIMIZATION_REPORT_absolute.md
# v14 recommendations
cat outputs/recommendations/recommendations_v14.yaml
# Open key plots
open outputs/.../optimization/sensitivity_tornado_absolute.pdf
open outputs/.../optimization/acquisition_surface_absolute.pdf
open outputs/.../optimization/uncertainty_heatmap_absolute.pdf
Data Flow:
Raw Experimental Data (plate1.tsv, plate2.tsv, plate3.tsv)
↓
[Step 1-3] Analysis + Clustering + Visualization
↓
replicate_statistics_{mode}.tsv
↓
[Step 4] Response Surface Analysis (GP Fitting)
↓
GP models (pickled), surface predictions
↓
[Step 5] Optimization Report Generation
↓
Pareto frontier, Sobol indices, BO suggestions, boundary effects
↓
[Step 6] v14 Recommendations (absolute mode only)
↓
recommendations_v14.yaml (LinkML-validated)
Inputs:
- replicate_statistics_{mode}.tsv - Experimental data with ingredient concentrations and measurements
- GP models (fitted or refitted from experimental data)
Outputs:
optimization/
├── OPTIMIZATION_REPORT_{mode}.md # 6-section comprehensive report
├── pareto_experimental_{mode}.csv # Experimental Pareto points (if 2+ measurements)
├── pareto_predicted_{mode}.csv # GP surface Pareto points (novel optima)
├── next_experiments_bayesian_{mode}.csv # Top 20 by Expected Improvement
├── sensitivity_sobol_{mode}.csv # Sobol indices (ST, S1)
├── boundary_effects_{mode}.csv # Boundary flags per ingredient
├── uncertainty_grid_{mode}.csv # GP σ predictions (5000 points)
├── sensitivity_tornado_{mode}.pdf/png # Ingredient ranking visualization
├── acquisition_surface_{mode}.pdf/png # EI surface with top-5 markers
└── uncertainty_heatmap_{mode}.pdf/png # GP uncertainty hexbin
Analyses Performed:
- Pareto Frontier (2+ measurements only):
  - Computes non-dominated points on GP surface (10,000-point Sobol grid)
  - Identifies novel optima not in experimental data
  - Compares experimental vs predicted Pareto frontiers
- Bayesian Optimization:
  - Computes Expected Improvement (EI) acquisition function
  - Suggests top 20 next experiments by EI score
  - Balances exploitation (high predicted response) vs exploration (high uncertainty)
- Sobol Sensitivity:
  - Variance-based global sensitivity analysis
  - Computes total-order (ST) and first-order (S1) indices
  - Ranks ingredients by importance (ST captures main + interaction effects)
- Boundary Effects:
  - Detects whether optima cluster at design space boundaries (±5% margin)
  - Flags ingredients needing range expansion (UPPER, LOWER, BOTH)
- Uncertainty Quantification:
  - Evaluates GP prediction std (σ) across design space
  - Identifies high-uncertainty regions for targeted exploration
Inputs:
- All optimization report outputs (CSV files)
Outputs:
recommendations/
├── recommendations_v14.yaml # LinkML-validated recommendations
├── v14_factor_recommendations.csv # Tabular summary
└── v14_generation_provenance.json # Decision log
Recommendation Logic:
for each ingredient:
    if boundary_effect == 'UPPER':
        adjustment_type = 'EXPAND_UPPER'
        new_max = current_max * 1.5  # Raise the upper bound by 50%
    elif boundary_effect == 'LOWER':
        adjustment_type = 'EXPAND_LOWER'
        new_min = max(0, current_min * 0.5)  # Halve the lower bound
    elif sobol_index < 0.05:
        adjustment_type = 'CONTRACT'
        # Narrow to ±20% of median
    elif pareto_optimal far from center:
        adjustment_type = 'SHIFT'
        # Center on Pareto median, keep width
    else:
        adjustment_type = 'MAINTAIN'
        # No change needed
    priority_score = 0.5*Sobol_ST + 0.3*(1-EI_rank/20) + 0.2*boundary_flag
    if priority_score > 0.6:
        priority = 'VERY HIGH'
    elif priority_score > 0.4:
        priority = 'HIGH'
    elif priority_score > 0.2:
        priority = 'MEDIUM'
    else:
        priority = 'LOW'
What to look for:
- Best experimental condition (highest observed response)
- Best predicted condition (GP model optimum)
- Number of Pareto frontier points (novel optima discovered)
- Number of BO suggestions (high-EI candidates)
Example:
**Best Experimental Condition:** OD600 = 1.113
**Best Predicted Condition:** OD600 = 0.912
**Pareto Frontier Points:** 15 novel optima identified
**Next Experiments Suggested:** 20 high-EI conditions
Interpretation:
- Experimental best (1.113) is higher than predicted best (0.912) → GP model may be underestimating optimal region
- 15 novel Pareto points → GP surface suggests unexplored high-performing conditions
- 20 BO suggestions → Next experiments to improve model accuracy
Key Insights:
- Experimental Pareto: Actual non-dominated conditions from v10 data
- Predicted Pareto: GP-predicted non-dominated points (10,000-point grid)
- Novel optima: Predicted points NOT in experimental data
What to look for:
- Are predicted Pareto points far from experimental data? → Exploration opportunity
- Do predictions cluster in specific ingredient regions? → Important factor
- Are there multiple competing optima? → Trade-offs exist
Visualization: pareto_comparison_{mode}.pdf
- Blue circles = Experimental Pareto
- Red triangles = Predicted Pareto
- Look for red triangles away from blue → novel optima
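The non-dominated filtering behind both frontiers can be sketched as follows. This is a minimal illustration rather than the pipeline's own code: it assumes both objectives (OD600 and Nd_uM) are maximized, and the names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Sequence, Tuple

# One condition as (OD600, Nd_uM). Assumption: both objectives are maximized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Return the non-dominated subset, preserving input order (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Illustrative values only, not taken from any experiment.
    conditions = [(1.0, 0.2), (0.8, 0.5), (0.7, 0.4), (0.5, 0.9), (1.0, 0.1)]
    print(pareto_front(conditions))
```

The quadratic scan is fine for a few hundred experimental conditions; a 10,000-point prediction grid would likely call for a sort-based or vectorized variant.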
Top 20 Next Experiments:
- Ranked by Expected Improvement (EI) score
- Higher EI = Better balance of predicted performance + uncertainty
What to look for:
- High EI (>0.01): Strong candidates, likely to improve model
- Clustered suggestions: Specific ingredient region worth exploring
- Diverse suggestions: Model uncertain across design space
Example:
1. EI=0.0108, OD600_pred=0.912, σ=0.145
2. EI=0.0040, OD600_pred=0.874, σ=0.151
Interpretation:
- Top suggestion has high EI (0.0108) → Run this condition first in v14
- High σ (0.145) → Model uncertain in this region, needs data
Visualization: acquisition_surface_{mode}.pdf
- Stars mark top-5 EI maxima
- Color intensity = EI value
- Look for star clusters → promising regions
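For readers who want to see how an EI score trades off predicted response against uncertainty, here is a minimal sketch of the standard closed-form Expected Improvement for maximization. It is not taken from the pipeline: the incumbent `best`, the exploration offset `xi`, and the function names are assumptions, so scores computed this way need not match the report's EI column.

```python
import math
from typing import List, Sequence, Tuple


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """Closed-form EI for maximization, given GP mean mu and std sigma at one point."""
    improvement = mu - best - xi
    if sigma <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return improvement * cdf + sigma * pdf


def rank_by_ei(
    candidates: Sequence[Tuple[float, float]], best: float, top_k: int = 20
) -> List[Tuple[float, float, float]]:
    """Score (mu, sigma) pairs and return the top_k as (EI, mu, sigma), highest EI first."""
    scored = [(expected_improvement(mu, s, best), mu, s) for mu, s in candidates]
    return sorted(scored, reverse=True)[:top_k]
```

The first term rewards candidates whose predicted mean already beats the incumbent (exploitation); the second grows with σ (exploration), which is why a point predicted below the best observed value can still rank highly.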
Sobol Indices Explained:
- S1 (First-order): Main effect of ingredient alone
- ST (Total-order): Main effect + all interactions
- ST - S1: Interaction effects
Interpretation Guidelines:
- ST > 0.3: High impact factor (prioritize exploration)
- ST 0.1-0.3: Moderate impact (standard exploration)
- ST < 0.1: Low impact (consider contracting range)
Example:
| Rank | Ingredient | ST |
|------|-----------------|-------|
| 1 | (NH4)2SO4 | 0.520 |
| 2 | Succinate | 0.380 |
| 3 | Phosphate | 0.322 |
| 4 | Methanol | 0.268 |
| 5 | PQQ | 0.249 |
| 6 | CoCl2 | 0.224 |
Interpretation:
- (NH4)2SO4 is most important (ST=0.52) → Focus v14 optimization here
- Sum of top 3 (1.22) > 1.0 → Strong interaction effects
- CoCl2 lowest (ST=0.22) → Least critical of the six, but still moderate impact (ST 0.1-0.3), so narrowing its range is optional rather than indicated
Visualization: sensitivity_tornado_{mode}.pdf
- Horizontal bars ranked by ST
- Color-coded by magnitude
- Focus on top 3-5 ingredients
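The interpretation guidelines above can be applied mechanically. The sketch below uses the ST values from the example table; the helper names `classify_st` and `interaction_effect` are hypothetical, and treating exactly 0.1 and 0.3 as "moderate" is an assumption, since the guidelines do not say which band the boundaries belong to.

```python
from typing import Dict


def classify_st(st: float) -> str:
    """Map a total-order Sobol index to the impact bands in the guidelines."""
    if st > 0.3:
        return "high"
    if st >= 0.1:
        return "moderate"
    return "low"


def interaction_effect(s1: float, st: float) -> float:
    """Interaction contribution ST - S1, floored at 0 to absorb estimator noise."""
    return max(st - s1, 0.0)


# ST values copied from the example table above.
example_st: Dict[str, float] = {
    "(NH4)2SO4": 0.520,
    "Succinate": 0.380,
    "Phosphate": 0.322,
    "Methanol": 0.268,
    "PQQ": 0.249,
    "CoCl2": 0.224,
}

bands = {name: classify_st(st) for name, st in example_st.items()}
top3_sum = sum(sorted(example_st.values(), reverse=True)[:3])
```

Because total-order indices count each interaction once per participating factor, their sum can exceed 1; the excess is a rough indicator of how much variance is shared between ingredients.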
Flags:
- UPPER: Optima at upper boundary → Expand upper bound in v14
- LOWER: Optima at lower boundary → Expand lower bound in v14
- BOTH: Multiple optima at boundaries → Expand both bounds
- NONE: Interior optimum → Range well-calibrated
Example:
| Ingredient | Boundary | Recommendation |
|-------------|----------|---------------------|
| (NH4)2SO4 | UPPER | Expand upper bound |
| Succinate | NONE | Maintain range |
Interpretation:
- (NH4)2SO4 at UPPER → Increase max from 100 mM to 150 mM in v14
- Succinate NONE → Current range (10-100 mM) is good
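A boundary check along these lines could look like the sketch below. It is an illustration under stated assumptions, not the pipeline's detector: the 5% margin is taken relative to the range width, a single-sided flag requires more than 50% of the optima near that bound, and the `BOTH` rule (optima near each bound that together exceed the threshold) is a guess at what "multiple optima at boundaries" means. The function name is hypothetical.

```python
from typing import Sequence


def boundary_flag(
    values: Sequence[float],
    lower: float,
    upper: float,
    margin: float = 0.05,
    min_fraction: float = 0.5,
) -> str:
    """Classify where an ingredient's optimal concentrations sit within [lower, upper]."""
    width = upper - lower
    if not values or width <= 0:
        return "NONE"
    n = len(values)
    # Fractions of optima lying within the margin of each bound.
    frac_low = sum(v <= lower + margin * width for v in values) / n
    frac_high = sum(v >= upper - margin * width for v in values) / n
    if frac_high > min_fraction:
        return "UPPER"
    if frac_low > min_fraction:
        return "LOWER"
    # Assumed rule: optima near each bound that jointly exceed the threshold.
    if frac_low > 0 and frac_high > 0 and frac_low + frac_high > min_fraction:
        return "BOTH"
    return "NONE"
```

An `UPPER` or `LOWER` result means the true optimum may lie outside the tested range, which is why these flags feed directly into the range expansion recommendations.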
Metrics:
- Mean σ: Average prediction uncertainty
- Max σ: Highest uncertainty region
- High-uncertainty fraction: % of design space with σ > 50% of max
Example:
OD600:
- Mean σ: 0.179
- Max σ: 0.193
- High-uncertainty regions: 99.8%
Interpretation:
- Mean σ=0.179 relative to OD600 range (0-1.2) → ~15% uncertainty
- 99.8% high-uncertainty → Sparse data coverage, model needs more data
- Prioritize BO suggestions to reduce uncertainty
Visualization: uncertainty_heatmap_{mode}.pdf
- Hexbin color intensity = GP σ
- White crosses = Experimental conditions
- Dark regions (high σ) = Unexplored, run BO suggestions there
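The three metrics are simple summaries of the GP σ grid. A minimal sketch, assuming the σ values have already been loaded into a list (the function name and dictionary keys are hypothetical):

```python
from statistics import fmean
from typing import Dict, Sequence


def summarize_sigma(sigmas: Sequence[float], rel_threshold: float = 0.5) -> Dict[str, float]:
    """Mean σ, max σ, and the fraction of grid points with σ above rel_threshold * max σ."""
    if not sigmas:
        raise ValueError("sigmas must not be empty")
    max_sigma = max(sigmas)
    n_high = sum(s > rel_threshold * max_sigma for s in sigmas)
    return {
        "mean_sigma": fmean(sigmas),
        "max_sigma": max_sigma,
        "high_uncertainty_fraction": n_high / len(sigmas),
    }


# Relative uncertainty from the example: mean σ divided by the OD600 range (0-1.2).
relative_uncertainty = 0.179 / 1.2  # about 0.15
```

Note that the high-uncertainty fraction is relative to the maximum σ, so a nearly flat σ surface (mean 0.179, max 0.193 in the example) pushes almost every grid point above the threshold. A value near 100% therefore signals uniformly sparse coverage rather than a few isolated gaps.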
metadata:
  version: v14_experimental
  generation_date: '2026-02-15'
  organism: Methylorubrum extorquens AM1
optimization_recommendations:
  - ingredient: (NH4)2SO4_mM_first
    current_range_min: 1.73
    current_range_max: 28.25
    recommended_range_min: 1.73
    recommended_range_max: 42.38  # EXPAND_UPPER: 28.25 * 1.5
    adjustment_type: EXPAND_UPPER
    sobol_index: 0.525
    boundary_effect: UPPER
    priority: HIGH
    rationale: "Optima clustered at upper boundary (UPPER). High impact factor (ST=0.525)."
- EXPAND_UPPER
  - Trigger: >50% of Pareto points at upper boundary
  - Action: new_max = current_max * 1.5
  - Example: (NH4)2SO4 range 1.73-28.25 → 1.73-42.38 mM
- EXPAND_LOWER
  - Trigger: >50% of Pareto points at lower boundary
  - Action: new_min = max(0, current_min * 0.5)
  - Example: PQQ range 0.002-0.005 → 0.001-0.005 µM
- CONTRACT
  - Trigger: Sobol ST < 0.05 (low impact)
  - Action: Narrow to ±20% of median
  - Example: CoCl2 range 0.001-0.067 → 0.015-0.045 µM
- SHIFT
  - Trigger: Pareto optimal >30% away from current center
  - Action: Center on Pareto median, keep width
  - Example: Succinate 3-63 mM, optimal at 50 → shift to 20-80 mM
- MAINTAIN
  - Trigger: None of the above
  - Action: Keep current range unchanged
  - Example: Well-calibrated range with interior optimum
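The five actions reduce to a few lines of arithmetic. The sketch below is a hypothetical helper, not the code in scripts/recommend_v14_design.py. It assumes the caller supplies the median used by CONTRACT and SHIFT, because the guide does not say how that median is computed.

```python
from typing import Optional, Tuple


def adjust_range(
    current_min: float,
    current_max: float,
    adjustment_type: str,
    median: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (new_min, new_max) for one ingredient under the stated rules."""
    if adjustment_type == "EXPAND_UPPER":
        return current_min, current_max * 1.5
    if adjustment_type == "EXPAND_LOWER":
        return max(0.0, current_min * 0.5), current_max
    if adjustment_type == "MAINTAIN":
        return current_min, current_max
    if adjustment_type in ("CONTRACT", "SHIFT"):
        if median is None:
            raise ValueError(f"{adjustment_type} requires a median")
        if adjustment_type == "CONTRACT":
            # Narrow to ±20% of the supplied median.
            return median * 0.8, median * 1.2
        # SHIFT: re-center on the median and keep the current width.
        # The lower bound is not clamped here; callers should check it stays >= 0.
        half_width = (current_max - current_min) / 2.0
        return median - half_width, median + half_width
    raise ValueError(f"unknown adjustment type: {adjustment_type}")
```

For instance, `adjust_range(1.73, 28.25, "EXPAND_UPPER")` gives an upper bound of 42.375, which rounds to the 42.38 mM in the (NH4)2SO4 example, and `adjust_range(3.0, 63.0, "SHIFT", median=50.0)` reproduces the 20-80 mM succinate shift.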
Scoring Formula:
score = 0.5 * Sobol_ST + 0.3 * (1 - EI_rank/20) + 0.2 * boundary_flag
Thresholds:
- VERY HIGH (>0.6): High Sobol + top EI + boundary effect
- HIGH (>0.4): High Sobol or top EI
- MEDIUM (>0.2): Moderate Sobol or mid-range EI
- LOW (≤0.2): Low Sobol, low EI, no boundary
Usage:
- Focus v14 optimization on VERY HIGH and HIGH priority factors
- Maintain or contract LOW priority factors
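As a worked example of the scoring formula and thresholds, the following sketch computes a score and its label. The function names are hypothetical, and the EI rank of 20 used below is an assumed value for illustration: the guide does not state the rank for any ingredient, nor how ingredients outside the top 20 are ranked.

```python
def priority_score(sobol_st: float, ei_rank: int, boundary_effect: str) -> float:
    """score = 0.5 * Sobol_ST + 0.3 * (1 - EI_rank/20) + 0.2 * boundary_flag."""
    boundary_flag = 1.0 if boundary_effect in {"UPPER", "LOWER", "BOTH"} else 0.0
    return 0.5 * sobol_st + 0.3 * (1 - ei_rank / 20) + 0.2 * boundary_flag


def priority_label(score: float) -> str:
    """Map a score to the priority bands listed under Thresholds."""
    if score > 0.6:
        return "VERY HIGH"
    if score > 0.4:
        return "HIGH"
    if score > 0.2:
        return "MEDIUM"
    return "LOW"


# ST and boundary flag from the (NH4)2SO4 recommendation; EI rank 20 is assumed.
score = priority_score(sobol_st=0.525, ei_rank=20, boundary_effect="UPPER")
label = priority_label(score)  # 0.2625 + 0.0 + 0.2 = 0.4625, i.e. HIGH
```

With these weights, a boundary flag alone contributes 0.2, so a flagged ingredient needs ST above 0.4 to reach HIGH without any help from its EI rank.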
Every recommendation includes complete provenance:
provenance:
  decisions:
    - timestamp: '2026-02-15T00:03:21'
      phase: v14_recommendation
      decision_type: factor_range_adjustment
      decision: Generated 6 optimization-driven recommendations
      rationale: Applied optimization-driven range adjustments...
  skills_used:
    - skill_name: generate_optimization_report
      invocation_count: 1
      purpose: Compute Pareto frontier, Sobol sensitivity...
  data_sources:
    - source_type: experimental
      path: outputs/.../optimization
      description: Optimization analysis outputs (absolute mode)
# Recommended: Run complete analysis
just analyze-experimental-full data/experimental/v10_results/
# Dual mode (absolute + relative)
uv run python scripts/run_dual_analysis.py data/experimental/v10_results/
# Absolute mode only (faster for testing)
uv run python scripts/run_dual_analysis.py data/experimental/v10_results/ --skip-relative
# Optimization report only (requires existing analysis)
just generate-optimization-report outputs/v10_analysis_absolute/ absolute
# v14 recommendations only (requires optimization report)
just recommend-v14-design outputs/v10_analysis_absolute/optimization/ outputs/recommendations/ absolute
# Validate v14 recommendations
just validate-linkml outputs/recommendations/recommendations_v14.yaml
# Disable optimization (Steps 5-6)
uv run python scripts/run_dual_analysis.py DATA_DIR --disable-optimization-report
# Disable v14 recommendations (Step 6 only)
uv run python scripts/run_dual_analysis.py DATA_DIR --disable-v14-recommendations
# Disable response surfaces (skips Steps 4-6)
uv run python scripts/run_dual_analysis.py DATA_DIR --disable-response-surfaces
Cause: Missing measurements (e.g., Nd_uM not in data)
Solution:
# Run with single measurement
uv run python scripts/generate_optimization_report.py ANALYSIS_DIR --mode absolute --measurements OD600
Cause: Only 1 measurement available
Effect: Pareto analysis disabled, BO and Sobol still run
Solution: This is normal for single-objective data. Bayesian optimization and Sobol analysis still provide actionable insights.
Cause: GP model fitting failed (insufficient data, <5 points)
Check:
# Verify replicate statistics file exists and has at least 5 data rows (wc -l also counts any header line)
wc -l outputs/v10_analysis_absolute/*replicate_statistics*.tsv
Solution:
- Need at least 5 experimental conditions for GP fitting
- Check data quality (no all-NaN measurements)
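Both checks can be combined in one pass over the file. The sketch below counts conditions with a usable value for one measurement. It assumes a tab-separated file with a header row and a column literally named `OD600`; the real column names in replicate_statistics_{mode}.tsv may differ, and the function name is hypothetical.

```python
import csv
import math
from typing import Iterable


def count_valid_conditions(lines: Iterable[str], measurement: str = "OD600") -> int:
    """Count data rows whose measurement column parses to a non-NaN number."""
    reader = csv.DictReader(lines, delimiter="\t")
    count = 0
    for row in reader:
        try:
            # A missing or empty cell is treated as NaN.
            value = float(row.get(measurement) or "nan")
        except ValueError:
            continue  # non-numeric cell
        if not math.isnan(value):
            count += 1
    return count


# Usage with a file on disk (the path is a placeholder):
# with open("replicate_statistics_absolute.tsv", newline="") as handle:
#     if count_valid_conditions(handle) < 5:
#         print("Fewer than 5 usable conditions; GP fitting is likely to fail")
```

Unlike `wc -l`, this excludes the header and any rows whose measurement is empty or NaN, which are the rows that cannot contribute to GP fitting.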
Cause: Optimization report CSVs missing
Check:
ls outputs/v10_analysis_absolute/optimization/*.csv
Solution:
- Run optimization report first
- Check for errors in Step 5 output
Common Issues:
- Missing required fields
  ERROR: 'purpose' is a required property
  Solution: Update script to include all required schema fields
- Pattern mismatch
  ERROR: 'v14_optimization' does not match '^v[0-9]+(_agentic|_experimental)?$'
  Solution: Use valid version patterns (e.g., v14_experimental)
- Type mismatch
  ERROR: {...} is not of type 'string'
  Solution: Convert dicts to JSON strings for parameters field
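Two of these errors can be caught before running the validator. The sketch below checks a version string against the pattern quoted in the error message and serializes a dict-valued parameters field to a JSON string. The helper names are hypothetical, and this is a pre-check only, not a substitute for `just validate-linkml`.

```python
import json
import re
from typing import Any

# Pattern copied from the validation error message above.
VERSION_PATTERN = re.compile(r"^v[0-9]+(_agentic|_experimental)?$")


def is_valid_version(version: str) -> bool:
    """True if the version string satisfies the schema pattern."""
    return VERSION_PATTERN.fullmatch(version) is not None


def stringify_parameters(parameters: Any) -> str:
    """Return parameters as a string, JSON-encoding anything that is not one already."""
    if isinstance(parameters, str):
        return parameters
    return json.dumps(parameters, sort_keys=True)
```

Here `is_valid_version("v14_optimization")` is false, matching the error above, while `"v14_experimental"` and plain `"v14"` pass.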
# Multi-measurement optimization (OD600 + Nd_uM + custom)
uv run python scripts/generate_optimization_report.py \
    outputs/analysis_absolute/ \
    --mode absolute \
    --measurements OD600 Nd_uM GFP_fluorescence
Edit scripts/recommend_v14_design.py:
# Relaxed constraints (current default)
osmolarity_limit = 900 # mOsm
cn_ratio_range = (1.5, 120)
# Strict constraints
osmolarity_limit = 800
cn_ratio_range = (2, 100)
Edit scripts/recommend_v14_design.py:
def _compute_priority(self, sobol_index, ei_rank, boundary_effect):
    # Current weights
    sobol_score = 0.5 * sobol_index
    ei_score = 0.3 * (1 - ei_rank / 20)
    boundary_score = 0.2 if boundary_effect in ['UPPER', 'LOWER', 'BOTH'] else 0.0
    # Alternative: Emphasize Sobol more
    sobol_score = 0.7 * sobol_index  # Increased weight
    ei_score = 0.2 * (1 - ei_rank / 20)
    boundary_score = 0.1 if ...
# Process multiple datasets
for dataset in data/experimental/v*/; do
    just analyze-experimental-full "$dataset"
done
# Compare v14 recommendations
diff outputs/recommendations_v10/recommendations_v14.yaml \
    outputs/recommendations_v12/recommendations_v14.yaml
Input: Experimental plate data (v10/v12 designs)
Process:
- Run full pipeline: just analyze-experimental-full DATA_DIR
- Review optimization report: cat outputs/.../optimization/OPTIMIZATION_REPORT_absolute.md
- Inspect visualizations: open outputs/.../optimization/*.pdf
- Check v14 recommendations: cat outputs/recommendations/recommendations_v14.yaml
- Generate v14 design using recommended ranges
- Run v14 experiments
- Iterate (run pipeline on v14 data for v15 recommendations)
Output: LinkML-validated v14 recommendations with full provenance
Timeline:
- Pipeline runtime: ~2 minutes (69 conditions, 6 ingredients)
- Manual review: ~15 minutes
- v14 design generation: Variable (depends on DOE method)
- Total: <1 hour from data to v14 design ready
Generated: 2026-02-15 Version: 1.0 Optimization System: v14 Contact: MicroGrowAgents Development Team