Commit 8f044ac
Extend stub-import transform to cover BTO
The 2026-05-18 MIM SSSOM republish (PR #564) added a `MIM:Cell_Lysate →
BTO:0004304 cell lysate` row. Combined with the pre-existing
`Wound-fluid → BTO:0003114 wound fluid` row in
`mappings/isolation_source_to_ontology.tsv`, the BTO footprint in
kg-microbe is now 2 IDs, which exceeds the footprint at which the original
ontologies-stubs design (PR #565) chose to leave BTO on the label-only
inline path. Promote BTO to the same SemSQL-backed enriched-stub
treatment as NCIT and mesh, so the merged KG carries the full label,
synonyms, and xrefs on both BTO nodes.
Changes:
- kg_microbe/transform_utils/ontologies_stubs/ontologies_stubs_transform.py:
add `BTO` entry to `STUB_ONTOLOGY_SOURCES` (db_filename=bto.db,
knowledge_source=infores:bto). Class docstring updated.
- kg_microbe/transform_utils/bacdive/bacdive.py:2991-3007: extend the
inline-emit skip-list from `{"NCIT", "mesh"}` to
`{"NCIT", "mesh", "BTO"}` so BacDive defers BTO stub-node emission to
the new transform (avoids duplicate node rows). Code comment updated
to reflect the new partitioning.
- kg_microbe/utils/isolation_source_mapping_utils.py: STUB_ONTOLOGY_PREFIXES
docstring updated to document the new partitioning (NCIT/mesh/BTO
enriched via SemSQL; PRIDE/PCO/GENEPIO/FAO/SNOMED stay on the label-
only inline path).
- download.yaml: add `bto.db.gz` from s3.amazonaws.com/bbop-sqlite
(~30 MB, same distribution as the NCIT and mesh SemSQL DBs).
- merge.yaml / merge.no_metatraits.yaml / merge_bakta.yaml: add
`data/transformed/ontologies_stubs/bto_nodes.tsv` to the
ontologies_stubs source filename list in each variant.
- tests/test_ontologies_stubs.py: rename + update
`test_stub_ontology_sources_covers_ncit_and_mesh` →
`test_stub_ontology_sources_covers_ncit_mesh_bto`; assert the set
is now exactly `{"NCIT", "mesh", "BTO"}`.
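The partitioning described in the list above can be sketched as follows. This is a minimal illustration, not the repository's code: the `StubSource` record, the `ENRICHED_PREFIXES` name, and the `emits_inline_stub` helper are hypothetical, and only the BTO values (`bto.db`, `infores:bto`) come from this commit message. The NCIT and mesh values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StubSource:
    """Hypothetical record shape; the real registry entries may differ."""

    db_filename: str
    knowledge_source: str


# Only the BTO values are taken from the commit message. The NCIT and
# mesh values are placeholders for illustration.
STUB_ONTOLOGY_SOURCES = {
    "NCIT": StubSource("ncit.db", "infores:ncit"),
    "mesh": StubSource("mesh.db", "infores:mesh"),
    "BTO": StubSource("bto.db", "infores:bto"),
}

# Prefixes whose stub nodes are produced by the SemSQL-backed transform.
ENRICHED_PREFIXES = frozenset(STUB_ONTOLOGY_SOURCES)


def emits_inline_stub(curie: str) -> bool:
    """Return True if a label-only stub node should be written inline.

    Prefixes in ENRICHED_PREFIXES are deferred to the stub transform,
    which avoids writing the same node row twice.
    """
    prefix, sep, _ = curie.partition(":")
    return bool(sep) and prefix not in ENRICHED_PREFIXES
```

Deriving the skip-list from the registry keys, as this sketch does, would keep the two from drifting apart. The commit itself updates them in two separate files.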
Verified:
- `collect_stub_curies(['NCIT', 'mesh', 'BTO'])` finds 73 NCIT + 95
mesh + 2 BTO CURIEs from the committed mappings.
- 13 unit tests pass; integration test still skipped pending real
SemSQL DB download.
- ruff clean.
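For readers unfamiliar with the collector, the prefix-based collection step might look roughly like the sketch below. It is not the signature or body of `collect_stub_curies`. The function name `collect_curies_by_prefix`, the column name `ontology_id`, and the sample header are assumptions, and the `EX:0000001` row is a made-up placeholder. The two BTO rows are the ones named in this commit message.

```python
import csv
import io
from collections import defaultdict


def collect_curies_by_prefix(tsv_handle, prefixes, column="ontology_id"):
    """Group distinct CURIEs from one TSV column by their prefix.

    `column` is an assumed header name; the real mapping file may use
    a different one.
    """
    wanted = set(prefixes)
    found = defaultdict(set)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        curie = (row.get(column) or "").strip()
        prefix, sep, _ = curie.partition(":")
        if sep and prefix in wanted:
            found[prefix].add(curie)
    # Sort for stable, comparable output.
    return {prefix: sorted(ids) for prefix, ids in found.items()}


# Illustrative rows only; the header and the EX row are placeholders.
sample = io.StringIO(
    "source_term\tontology_id\tontology_label\n"
    "MIM:Cell_Lysate\tBTO:0004304\tcell lysate\n"
    "Wound-fluid\tBTO:0003114\twound fluid\n"
    "Example-term\tEX:0000001\tplaceholder\n"
)
result = collect_curies_by_prefix(sample, ["NCIT", "mesh", "BTO"])
```

Collecting into sets means a CURIE mapped from several source terms is counted once, which is the behaviour the per-prefix counts above would need.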
End-to-end (requires `poetry run kg download` to fetch the three DBs,
~400 MB total):

    poetry run kg transform -s ontologies_stubs
    # → data/transformed/ontologies_stubs/{ncit,mesh,bto}_nodes.tsv
    poetry run pytest tests/test_ontologies_stubs.py -v
    # integration test no longer skipped; asserts every collector-
    # discovered CURIE has a corresponding stub-node row.
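
The coverage check the integration test is described as making can be sketched like this. The helper name `missing_stub_nodes` is hypothetical, the `id` column follows the usual KGX nodes-TSV convention, and the `category` value in the sample is a placeholder. The actual test in `tests/test_ontologies_stubs.py` may be structured differently.

```python
import csv
import io


def missing_stub_nodes(expected_curies, nodes_handle, id_column="id"):
    """Return the expected CURIEs that have no row in a nodes TSV."""
    reader = csv.DictReader(nodes_handle, delimiter="\t")
    present = {row[id_column] for row in reader}
    return sorted(set(expected_curies) - present)


# Illustrative nodes file with only one of the two BTO stubs present.
nodes = io.StringIO(
    "id\tcategory\tname\n"
    "BTO:0004304\tbiolink:NamedThing\tcell lysate\n"
)
missing = missing_stub_nodes(["BTO:0004304", "BTO:0003114"], nodes)
```

A test built on such a helper would assert that the returned list is empty for each of the three `*_nodes.tsv` outputs.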
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent d2b03ea commit 8f044ac
8 files changed
Lines changed: 41 additions & 17 deletions