Commit 8824afc
Address kg-model-review findings: Abscess + BacDive dual-edge + Rhea→EC

Three fixes for the modeling violations surfaced by the merged-kg review.
Once the bacdive and rhea_mappings transforms are re-run and the graph is
re-merged, roughly 133K of the 149,388 domain/range warnings (89%) are
expected to clear, without minting any new METPO term.
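The ~133K / 89% estimate appears to be the sum of the two violation counts quoted under Findings #2 and #3. A quick arithmetic check, with variable names chosen here purely for illustration:

```python
# Violation counts as quoted in this commit message.
organism_to_procedure = 125_119  # Finding #2: organism→Procedure violations
rhea_to_ec = 7_902               # Finding #3: Rhea→EC enabled_by violations
total_warnings = 149_388         # all domain/range warnings from the review

cleared = organism_to_procedure + rhea_to_ec
share = cleared / total_warnings

print(cleared, f"{share:.1%}")  # 133021 89.0%
```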
Finding #1 — Abscess mapping pointed at a UBERON ID that does not exist
(mappings/isolation_source_to_ontology.tsv:4)
The earlier codex_review_v2 fix swapped MONDO:0005227 for UBERON:0006548,
but UBERON has no generic "abscess" term; the word appears only as
descriptive text on unrelated anatomy entries, so UBERON:0006548 was a
made-up ID. Replaced with HP:0025615 (HP "Abscess", a phenotypic feature
defined as 'a localized collection of purulent material surrounded by
inflammation and granulation'). HP is present in
data/transformed/ontologies/hp_nodes.tsv, and a phenotype term avoids the
disease-class concern that motivated the original fix.
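A nonexistent target ID like this one can be caught mechanically by checking every mapped CURIE against the transformed ontology node files. The following is a minimal sketch, not the repository's validator: the function names and the `ontology_id` column name are hypothetical, and the only assumption about the nodes file is that it is a KGX-style TSV with an `id` column.

```python
import csv
import io

def load_node_ids(nodes_tsv):
    """Collect the `id` column of a KGX-style nodes TSV (file-like object)."""
    return {row["id"] for row in csv.DictReader(nodes_tsv, delimiter="\t")}

def find_missing_ids(mapping_tsv, known_ids, id_column="ontology_id"):
    """Return (line_number, curie) for mapping rows whose CURIE is unknown."""
    reader = csv.DictReader(mapping_tsv, delimiter="\t")
    # The header is line 1, so data rows start at line 2.
    return [
        (line_no, row[id_column])
        for line_no, row in enumerate(reader, start=2)
        if row[id_column] not in known_ids
    ]

# Tiny in-memory stand-ins for hp_nodes.tsv and the mapping file.
hp_nodes = io.StringIO("id\tname\nHP:0025615\tAbscess\n")
mappings = io.StringIO(
    "source\tontology_id\n"
    "abscess\tHP:0025615\n"
    "abscess (old)\tUBERON:0006548\n"
)

missing = find_missing_ids(mappings, load_node_ids(hp_nodes))
print(missing)  # [(3, 'UBERON:0006548')]
```

In practice `known_ids` would be the union of the IDs from every ontology nodes file the mappings are allowed to target, not HP alone.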
Finding #2 — BacDive dual-edge assay pattern violates predicate ranges
(kg_microbe/transform_utils/bacdive/bacdive.py:2794-2812 + constants.py:295)
bacdive.py emits two edges per assay result: organism→ChEBI substrate
(the trait claim) and organism→assay (the test that produced it). Both
were emitted with the same trait predicate (METPO:2000002 assimilates,
METPO:2000011 ferments, etc.), generating 125,119 organism→Procedure
violations because those predicates' biolink-mapped range is
biolink:ChemicalEntity, not biolink:Procedure. Two changes:
* The organism→assay edge now uses METPO:2000511 (has observation), the
  existing parent of the has-X-observation predicates. The edge now reads
  as "organism has the observation that was recorded on this assay
  procedure." The substrate edge predicate is unchanged.
* ASSAY_CATEGORY changed from "biolink:Procedure" to the multi-category
  value "biolink:Procedure|METPO:1001000" (observation). biolink:Procedure
  carries the biolink semantics for downstream tooling; METPO:1001000
  satisfies METPO:2000511's existing range without any upstream METPO
  range modification. This avoids minting a new METPO term and keeps the
  proposal TSV (metpo_proposal_2026_05) untouched.
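The two changes above can be illustrated with a toy range check. This is a sketch under stated assumptions rather than the code in bacdive.py: `PREDICATE_RANGE`, `range_ok`, and `assay_edges` are hypothetical names, the node IDs are placeholders, and a real validator would also walk the class hierarchy instead of requiring an exact category match.

```python
# Predicate ranges as described in this commit message (exact-match only).
PREDICATE_RANGE = {
    "METPO:2000002": "biolink:ChemicalEntity",  # assimilates
    "METPO:2000511": "METPO:1001000",           # has observation → observation
}

OLD_ASSAY_CATEGORY = "biolink:Procedure"
ASSAY_CATEGORY = "biolink:Procedure|METPO:1001000"

def range_ok(predicate, object_category):
    """True if any pipe-separated object category equals the predicate's range."""
    return PREDICATE_RANGE[predicate] in object_category.split("|")

def assay_edges(organism, substrate, assay, trait_predicate):
    """Build both edges for one assay result as (subject, predicate, object)."""
    return [
        (organism, trait_predicate, substrate),  # trait claim, predicate unchanged
        (organism, "METPO:2000511", assay),      # observation edge, new predicate
    ]

# Placeholder identifiers, for illustration only.
edges = assay_edges("EXAMPLE:organism", "EXAMPLE:substrate",
                    "EXAMPLE:assay", "METPO:2000002")

# Before: trait predicate pointing at a Procedure node violates its range.
print(range_ok("METPO:2000002", OLD_ASSAY_CATEGORY))  # False
# After: has-observation pointing at the multi-category assay node passes.
print(range_ok(edges[1][1], ASSAY_CATEGORY))          # True
```

Note that both changes are needed together: METPO:2000511 against the old single-category `biolink:Procedure` node would still fail this check.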
Finding #3 — Rhea→EC enabled_by violates biolink range
(kg_microbe/transform_utils/constants.py:237)
RHEA reactions (biolink:MolecularActivity) → EC enzymes (also typed
biolink:MolecularActivity in this graph) edges were emitted with
biolink:enabled_by, whose range is biolink:PhysicalEntity, generating 7,902
violations. EC_CATEGORY could not be changed without breaking the upstream
"Protein → enables → EC" semantics, which require EC to be an activity. So
RHEA_TO_EC_EDGE is now biolink:close_match, biolink's purpose-built
predicate for cross-vocabulary alignments, which is what Rhea↔EC actually
is (both classify the same enzymatic reaction at different granularities).
Its domain and range are unconstrained, and the semantic match is good.
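To make the effect of the predicate swap concrete, here is a small sketch that counts range violations over a list of edges before and after the change. The `RANGE` table, `count_violations`, and the node IDs are hypothetical stand-ins; only the predicate names, the EC category, and the range facts come from the description above.

```python
# Range constraints as described above; None means "no range constraint".
RANGE = {
    "biolink:enabled_by": "biolink:PhysicalEntity",
    "biolink:close_match": None,
}

EC_CATEGORY = "biolink:MolecularActivity"

def count_violations(edges, categories):
    """Count edges whose object category does not match the predicate's range."""
    violations = 0
    for _subject, predicate, obj in edges:
        required = RANGE[predicate]
        if required is not None and categories[obj] != required:
            violations += 1
    return violations

# Placeholder node IDs, for illustration only.
categories = {"EXAMPLE:ec_1": EC_CATEGORY, "EXAMPLE:ec_2": EC_CATEGORY}
pairs = [("EXAMPLE:rhea_1", "EXAMPLE:ec_1"), ("EXAMPLE:rhea_2", "EXAMPLE:ec_2")]

before = [(s, "biolink:enabled_by", o) for s, o in pairs]
after = [(s, "biolink:close_match", o) for s, o in pairs]

print(count_violations(before, categories))  # 2
print(count_violations(after, categories))   # 0
```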
Re-run scope before next merge:
* bacdive: picks up the bacdive.py edge change, the multi-category
  ASSAY_CATEGORY, and the load_isolation_source_mappings() reload of the
  Abscess fix.
* rhea_mappings: picks up the RHEA_TO_EC_EDGE constant.
* merge: produces a fresh merged-kg.tar.gz with all three fixes applied.
Validation: ROBOT template/merge/ELK reason clean; isolation_source
validator OK; pytest tests/test_extract_metpo_proposals.py +
tests/test_isolation_source_mapping_utils.py → 10 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent 8e1ac1e commit 8824afc
3 files changed
Lines changed: 20 additions & 7 deletions
File tree
- kg_microbe/transform_utils
- bacdive
- mappings