90 | 90 | # |
91 | 91 | # Two stub-import paths exist for these prefixes: |
92 | 92 | # |
93 | | -# 1. NCIT and mesh: a SemSQL-backed enriched stub source. The |
| 93 | +# 1. NCIT, mesh, and BTO: a SemSQL-backed enriched stub source. The |
94 | 94 | # OntologiesStubsTransform (kg_microbe/transform_utils/ontologies_stubs/) |
95 | | -# queries data/raw/ncit.db and data/raw/mesh.db via OAK to fetch |
96 | | -# rdfs:label, exact synonyms, and dbxrefs for every NCIT/mesh CURIE that |
97 | | -# appears anywhere under mappings/. Output: |
98 | | -# data/transformed/ontologies_stubs/{ncit,mesh}_nodes.tsv. This is the |
99 | | -# preferred path — stubs carry full metadata, not just a label. The |
| 95 | +# queries data/raw/{ncit,mesh,bto}.db via OAK to fetch rdfs:label, exact |
| 96 | +# synonyms, and dbxrefs for every NCIT/mesh/BTO CURIE that appears |
| 97 | +# anywhere under mappings/. Output: |
| 98 | +# data/transformed/ontologies_stubs/{ncit,mesh,bto}_nodes.tsv. This is |
| 99 | +# the preferred path — stubs carry full metadata, not just a label. The |
100 | 100 | # BacDive inline emit at bacdive.py defers to this transform for these |
101 | | -# two prefixes (see the `not in {"NCIT", "mesh"}` branch there). |
| 101 | +# three prefixes (see the `not in {"NCIT", "mesh", "BTO"}` branch there). |
102 | 102 | # |
103 | | -# 2. The long-tail prefixes (PRIDE, PCO, GENEPIO, FAO, BTO, SNOMED): each |
104 | | -# has 1-3 IDs in the whole repo, so the BacDive transform writes a thin |
| 103 | +# 2. The long-tail prefixes (PRIDE, PCO, GENEPIO, FAO, SNOMED): each has |
| 104 | +# 1-3 IDs in the whole repo, so the BacDive transform writes a thin |
105 | 105 | # label-only node row inline at edge-emit time using the object_label |
106 | 106 | # from the mapping TSV. Setting up SemSQL DBs for these would be |
107 | 107 | # overkill. |
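The comment above describes how stub nodes are routed: CURIEs with an NCIT, mesh, or BTO prefix get their node rows from OntologiesStubsTransform, and everything else gets a thin label-only row written inline. The sketch below models that split. It is hypothetical, not the actual bacdive.py code, and it assumes the `not in {"NCIT", "mesh", "BTO"}` branch tests the CURIE prefix. `SEMSQL_STUB_PREFIXES`, `curie_prefix`, `stub_source`, `inline_stub_row`, and the `id`/`name` column names are all illustrative.

```python
from typing import Optional

# Hypothetical: prefixes whose stubs come from OntologiesStubsTransform
# (SemSQL DBs queried via OAK). Membership is case-sensitive, mirroring
# the set literal in the comment, so "mesh" must stay lowercase.
SEMSQL_STUB_PREFIXES = frozenset({"NCIT", "mesh", "BTO"})


def curie_prefix(curie: str) -> str:
    """Return the prefix of a CURIE, e.g. 'BTO' for 'BTO:0000000'."""
    prefix, sep, _ = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix


def stub_source(curie: str) -> str:
    """Report which path supplies the node stub for this CURIE."""
    if curie_prefix(curie) in SEMSQL_STUB_PREFIXES:
        return "ontologies_stubs"  # label, exact synonyms, dbxrefs
    return "inline"  # long-tail prefix: label only, from the mapping TSV


def inline_stub_row(curie: str, object_label: str) -> Optional[dict]:
    """Build a label-only node row, or None when the SemSQL path owns it.

    The 'id'/'name' keys are an assumed schema for illustration.
    """
    if stub_source(curie) != "inline":
        return None  # defer to OntologiesStubsTransform
    return {"id": curie, "name": object_label}


if __name__ == "__main__":
    # Placeholder CURIEs, not real ontology terms.
    print(stub_source("BTO:0000000"))  # ontologies_stubs
    print(inline_stub_row("PRIDE:0000000", "example label"))
```

Keeping the routing in one frozenset means adding a prefix to the SemSQL path (as this change does for BTO) is a one-line edit, and the inline emitter automatically stops producing duplicate thin rows for it.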
|