This feature links CommunityMech growth media records to external databases:
- CultureMech: Standard microbiology media database
- MediaIngredientMech: Media ingredient/component database
MediaFetcher
- Fetches and caches YAML data from CultureMech and MediaIngredientMech GitHub repos
- Implements TTL-based caching (default: 24 hours)
- Cache stored in media_cache/ (gitignored)
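The TTL check described above can be sketched roughly as follows. This is a minimal illustration, not the actual MediaFetcher code: the function names, the `fetched_at` and `data` keys, and the one-file-per-entry layout are all assumptions made for the example.

```python
import json
import time
from pathlib import Path
from typing import Any, Optional

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # 24 hours, the documented default


def write_cache(path: Path, data: Any, now: Optional[float] = None) -> None:
    """Store a payload together with the time it was fetched (hypothetical layout)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = {"fetched_at": time.time() if now is None else now, "data": data}
    path.write_text(json.dumps(entry), encoding="utf-8")


def read_cache(
    path: Path,
    ttl: float = DEFAULT_TTL_SECONDS,
    now: Optional[float] = None,
) -> Optional[Any]:
    """Return the cached payload, or None if it is missing or older than ttl."""
    if not path.exists():
        return None
    entry = json.loads(path.read_text(encoding="utf-8"))
    current = time.time() if now is None else now
    if current - entry["fetched_at"] > ttl:
        return None  # stale entry: the caller should fetch again
    return entry["data"]
```

A caller would try `read_cache` first and fall back to a network fetch followed by `write_cache` when it returns None.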
MediaMatcher
- Fuzzy matching for media and ingredient names
- Exact match (case-insensitive) → fuzzy match (≥0.85 threshold)
- Supports manual overrides from conf/media_mappings.yaml
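The matching order above could look something like the sketch below. It uses `difflib.SequenceMatcher` with the 0.85 default threshold; the `match_name` function, its arguments, and the choice to check overrides before anything else are assumptions for illustration, not the MediaMatcher API.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def match_name(
    query: str,
    candidates: Dict[str, str],                  # known name -> identifier
    overrides: Optional[Dict[str, str]] = None,  # manual name -> identifier
    threshold: float = 0.85,
) -> Tuple[Optional[str], float]:
    """Return (identifier, score); identifier is None when nothing matches."""
    key = query.strip().lower()

    # 1. Manual overrides win (assumed precedence).
    for name, identifier in (overrides or {}).items():
        if name.strip().lower() == key:
            return identifier, 1.0

    # 2. Exact, case-insensitive match.
    for name, identifier in candidates.items():
        if name.strip().lower() == key:
            return identifier, 1.0

    # 3. Fuzzy match: keep the best-scoring candidate.
    best_id, best_score = None, 0.0
    for name, identifier in candidates.items():
        score = SequenceMatcher(None, key, name.strip().lower()).ratio()
        if score > best_score:
            best_id, best_score = identifier, score

    if best_score >= threshold:
        return best_id, best_score
    return None, best_score  # score is still reported for the CSV
```

Returning the best score even when it falls below the threshold lets a report show how close an unmapped name came to matching.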
CompositionMerger
- Merges CultureMech ingredients with existing community-curated data
- Marks each ingredient's source in the from field: "CultureMech" vs "community_curated"
- Preserves existing ingredients, adds new ones
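The merge rule above (keep what is already curated, append what is new) can be sketched as below. The `merge_composition` function is hypothetical, and two of its behaviours are assumptions: ingredients are compared by case-insensitive name, and existing entries without a `from` value are tagged `community_curated`.

```python
from typing import Dict, List


def merge_composition(existing: List[Dict], imported: List[Dict]) -> List[Dict]:
    """Merge imported CultureMech ingredients into a curated composition list."""
    merged: List[Dict] = []
    seen = set()

    # Existing entries are kept untouched apart from a default source tag.
    for item in existing:
        entry = dict(item)  # copy so the input list is not mutated
        entry.setdefault("from", "community_curated")
        merged.append(entry)
        seen.add(entry["name"].strip().lower())

    # Imported entries are added only when the name is not already present.
    for item in imported:
        key = item["name"].strip().lower()
        if key in seen:
            continue
        entry = dict(item)
        entry["from"] = "CultureMech"
        merged.append(entry)
        seen.add(key)

    return merged
```

Under this rule a curated concentration is never overwritten by an imported one; conflicts would have to be resolved by hand.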
Command-line tool for batch processing all community YAML files.
Features:
- Dry-run mode for previewing changes
- Single community or batch processing
- Configurable fuzzy matching threshold
- Cache control (TTL, no-cache flag)
- Color-coded output with statistics
- Backup originals before modification
just link-media-dry
just link-media
just link-media-report
This generates three reports in reports/:
- ingredient_mapping.csv - All ingredients with mapping status
- media_mapping.csv - All media with mapping status
- media_linking_summary.txt - Human-readable summary
uv run python scripts/link_growth_media.py --community-id EcoFAB_Ring_Trial_SynCom17 --dry-run
uv run python scripts/link_growth_media.py \
  --dry-run \
  --fuzzy-threshold 0.9 \
  --cache-ttl 3600 \
  --no-cache \
  --limit 10 \
  --ingredient-report reports/ingredients.csv \
  --media-report reports/media.csv \
  --summary-report reports/summary.txt
Ingredient Mapping CSV (ingredient_mapping.csv)
ingredient_name,community_id,media_name,mapped_id,match_score,status
yeast extract,EcoFAB_Ring_Trial_SynCom17,R2A medium,MediaIngredientMech:000015,1.000,mapped
peptone,EcoFAB_Ring_Trial_SynCom17,R2A medium,,0.850,unmapped
Media Mapping CSV (media_mapping.csv)
media_name,community_id,mapped_id,match_score,status
R2A medium,EcoFAB_Ring_Trial_SynCom17,CultureMech:000042,1.000,mapped
LB medium,Synechococcus_Ecoli_SPC,CultureMech:000001,1.000,mapped
Summary Report (media_linking_summary.txt)
- Human-readable text report
- Statistics on mapping success
- Lists of mapped and unmapped items
- Communities where each ingredient appears
- Identify unmapped ingredients: Review ingredient_mapping.csv for status=unmapped
- Add manual overrides: Update conf/media_mappings.yaml with corrections
- Re-run with reports: Generate updated reports to verify fixes
- Track progress: Use summary report to monitor curation completeness
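The first step of this workflow can be scripted. The sketch below reads the ingredient report using the column names shown in the CSV sample (`ingredient_name`, `community_id`, `status`) and groups unmapped ingredients by the communities that use them; the helper names are hypothetical and not part of the project's scripts.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def unmapped_ingredients(rows: Iterable[Dict[str, str]]) -> Dict[str, Set[str]]:
    """Map each unmapped ingredient name to the community IDs that use it."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        if row["status"] == "unmapped":
            result[row["ingredient_name"]].add(row["community_id"])
    return dict(result)


def load_unmapped(path: str) -> Dict[str, Set[str]]:
    """Read an ingredient mapping CSV from disk and collect unmapped entries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return unmapped_ingredients(csv.DictReader(handle))


if __name__ == "__main__":
    # Path assumes the default report location mentioned in this document.
    for name, communities in sorted(load_unmapped("reports/ingredient_mapping.csv").items()):
        print(f"{name}: {', '.join(sorted(communities))}")
```

Each name printed this way is a candidate for an entry under ingredient_overrides in conf/media_mappings.yaml.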
Provide explicit mappings when fuzzy matching fails:
media_overrides:
  "R2A medium":
    culturemech_id: "CultureMech:000042"
    note: "Standard R2A for heterotrophs"
ingredient_overrides:
  "yeast extract":
    media_ingredient_mech_id: "MediaIngredientMech:000015"
    note: "Standard yeast extract component"
Each MicrobialCommunity can have growth_media[]:
growth_media:
  - name: R2A medium
    culturemech_id: CultureMech:000042
    culturemech_url: https://github.com/CultureBotAI/CultureMech/...
    composition:
      - name: yeast extract
        media_ingredient_mech_id: MediaIngredientMech:000015
        media_ingredient_mech_url: https://github.com/CultureBotAI/MediaIngredientMech/...
        concentration: "0.5"
        unit: g/L
        from: CultureMech
      - name: glucose
        concentration: "1.0"
        unit: g/L
        chebi_term:
          id: CHEBI:17234
          label: glucose
        from: community_curated
    ph: "7.0"
    temperature: "30"
    temperature_unit: "°C"
    atmosphere: aerobic
- Repository: https://github.com/CultureBotAI/CultureMech
- Raw YAML: https://raw.githubusercontent.com/CultureBotAI/CultureMech/main/kb/media/CultureMech_XXXXXX.yaml
- Repository: https://github.com/CultureBotAI/MediaIngredientMech
- Raw YAML: https://raw.githubusercontent.com/CultureBotAI/MediaIngredientMech/main/kb/ingredients/MediaIngredientMech_XXXXXX.yaml
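As an illustration of how an identifier might be turned into one of the raw YAML URLs above, the sketch below assumes that the XXXXXX placeholder is the part of the ID after the colon (so `CultureMech:000042` maps to `CultureMech_000042.yaml`). That mapping and the `raw_yaml_url` helper are assumptions, not confirmed behaviour of either repository.

```python
# Base paths copied from the raw YAML URL patterns listed in this document.
RAW_BASES = {
    "CultureMech": (
        "https://raw.githubusercontent.com/CultureBotAI/CultureMech/main/kb/media"
    ),
    "MediaIngredientMech": (
        "https://raw.githubusercontent.com/CultureBotAI/MediaIngredientMech"
        "/main/kb/ingredients"
    ),
}


def raw_yaml_url(identifier: str) -> str:
    """Build a raw YAML URL from an ID such as 'CultureMech:000042'.

    Assumes the file name is '<prefix>_<local id>.yaml'.
    """
    prefix, sep, local_id = identifier.partition(":")
    if not sep or not local_id:
        raise ValueError(f"expected '<prefix>:<id>', got {identifier!r}")
    if prefix not in RAW_BASES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return f"{RAW_BASES[prefix]}/{prefix}_{local_id}.yaml"
```

Rejecting unknown prefixes early keeps a typo in a manual override from turning into a silent 404 during fetching.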
- Cache Management: The media_cache/ directory is gitignored. Cached data is stored as JSON with timestamp validation.
- Fuzzy Matching: Uses difflib.SequenceMatcher with a default threshold of 0.85. Adjust via the --fuzzy-threshold flag.
- Source Tracking: Ingredients are marked with a from field to distinguish CultureMech imports from manual curation.
- Validation: All changes preserve LinkML schema compliance. Run just validate-all after applying.
- Future Work:
  - CultureMech/MediaIngredientMech need to provide index files for full matching
  - Current implementation has placeholder logic for media/ingredient discovery
  - Consider adding evidence tracking for media assignments
After running the script, validate the results:
just link-media
just validate-all
- Cross-Repo Environmental Linking -- Environment-based linking via related_media and related_ingredients fields (complements the cultivation-based linking described here)
- LinkML schema: src/communitymech/schema/communitymech.yaml
  - GrowthMedia class definition
  - GrowthMediaComponent class definition