cthoyt · cthoyt · Jan 6, 2026 · Mar 3, 2026 · Mar 3, 2026 · Mar 3, 2026
diff --git a/_posts/2026-literature-and-kgs.md b/_posts/2026-literature-and-kgs.md
@@ -0,0 +1,71 @@
+---
+layout: post
+title: The Role of Literature in Constructing a Knowledge Graph
+date: 2026-02-10 11:13:00 +0100
+author: Charles Tapley Hoyt
+tags:
+  - SSSOM
+  - semantic mappings
+  - knowledge graphs
+---
+
+[PubMed](https://pubmed.ncbi.nlm.nih.gov) is an index of nearly 40 million
+
+[`pubmed-downloader`](https://github.com/cthoyt/pubmed-downloader)
+
+## Identifying Relevant Literature
+
+1. Search over PubMed
+2. Enrichment of citations
+
+Every knowledge graph needs an aspect of literature enrichment. Here's what
+happened.
+
+```mermaid
+flowchart LR
+    articles[peer-reviewed articles] --> source
+    preprints[pre-prints] --> source
+    patents --> source
+    experts[expert text] --> source
+```
+
+1. Define some queries for relevant literature. In Catalaix, this is based on
+   finding papers authored by people in the consortium
+2. Enrich the retrieved literature based on both upstream and downstream
+   citations
+3. Curate papers as being relevant or not. In the RAPTER and Bioregistry
+   projects, we did this very successfully
+4. Run NER and other information extraction workflows on these papers in a
+   semi-automated curation loop. This part is much more agile in the beginning
+   as the data model doesn't need to be set. Though I don't have a taste for it,
+   LLMs show potential for quickly constructing novel information extraction
+   pipelines, e.g., DRAGON-AI reference?, but in practice, I haven't yet seen
+   this be used successfully.
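
Step 2 of the list above (citation-based enrichment) can be sketched in a few lines. This is a minimal illustration, not the workflow used in Catalaix: the function name `enrich_by_citations`, the paper identifiers, and the in-memory list of citation pairs are all assumptions made for the example. It expands a seed set by one hop along citation edges in both directions and returns only the newly found candidates, which would then go to the curation step.

```python
from collections import defaultdict


def enrich_by_citations(seeds, citations):
    """Expand a seed set by one hop along citation edges in both directions.

    `citations` is an iterable of (citing, cited) pairs.
    """
    cites = defaultdict(set)  # paper -> papers it cites
    cited_by = defaultdict(set)  # paper -> papers that cite it
    for citing, cited in citations:
        cites[citing].add(cited)
        cited_by[cited].add(citing)

    candidates = set()
    for paper in seeds:
        candidates |= cites[paper]
        candidates |= cited_by[paper]
    # Only report papers that were not already in the seed set
    return candidates - set(seeds)


# Hypothetical identifiers, for illustration only
citations = [("p1", "p2"), ("p3", "p1"), ("p4", "p5")]
new_candidates = enrich_by_citations({"p1"}, citations)
print(sorted(new_candidates))
```

Repeating the call on the curated output would give a second hop, though the candidate set tends to grow quickly without the relevance curation in between.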
+
+The accelerating rate of publication of peer-reviewed papers, patents,
+(electronic) laboratory notebooks (e.g., Chemotion), repositories (e.g.,
+RADAR4Chem), and other expert-driven text creates challenges for catalaix
+consortium members in finding and understanding relevant publications. The CKG
+will aggregate and index all relevant publications and provide catalaix
+consortium members with access, e.g., through a Reaxys-like search interface for
+chemical names and structures. Such interfaces will uniquely leverage a
+combination of public and project-specific ontologies to contextualize searches,
+e.g., to find all zinc-containing catalysts of alcoholysis reactions on PET. The
+aggregation step will enrich publications with bibliometric metadata such as
+publication year, venue, authorships, and citations. The indexing step will
+implement information extraction workflows such as named entity recognition
+(NER), which can identify substrates, products, catalysts, reagents, chemical
+reactions, and other named entities appearing within the text, link them to
+appropriate ontology terms, and enable them to be queried through the CKG. On
+top of NER, relation extraction workflows can capture relationships between
+named entities appearing within the text, such as the classification of a
+chemical as a plasticizer or dye. Such workflows are semi-automated, i.e., have
+a fully automated initial step followed by a human-in-the-loop curation step to
+ensure high quality results. Importantly, such workflows will be connected to
+the already existing catalaix Wiki, democratizing the ability for domain experts
+within the consortium to contribute to the CKG simply by adding text to the
+Wiki.
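
As a rough illustration of the fully automated first step described above, the following sketch does dictionary-based entity recognition and linking. It is an assumption-laden toy, not the planned CKG implementation: the lexicon, the `EX:` identifiers, and the `annotate` function are hypothetical placeholders, and a real workflow would draw its surface forms from the public and project-specific ontologies. Every hit is meant as a candidate for the human-in-the-loop curation step, not as a final annotation.

```python
import re

# Hypothetical lexicon: surface form -> (identifier, entity type).
# The EX: identifiers are placeholders, not real ontology terms.
LEXICON = {
    "zinc acetate": ("EX:0001", "catalyst"),
    "pet": ("EX:0002", "substrate"),
    "alcoholysis": ("EX:0003", "reaction"),
}


def annotate(text):
    """Return (start, end, matched text, identifier, type) for each lexicon hit."""
    hits = []
    for surface, (identifier, entity_type) in LEXICON.items():
        # Word boundaries avoid matching inside longer words
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append(
                (match.start(), match.end(), match.group(), identifier, entity_type)
            )
    # Sort by position in the text
    return sorted(hits)


sentence = "Zinc acetate catalyzes the alcoholysis of PET."
for hit in annotate(sentence):
    print(hit)
```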
+
+## Catalaix Use Case
+
+https://github.com/catalaix/catalaix-kg/pull/6
diff --git a/_posts/2026-oer-mappings.md b/_posts/2026-oer-mappings.md
@@ -0,0 +1,105 @@
+---
+layout: post
+title:
+  Mapping between Open Educational Resource Data Models and Related Ontologies
+date: 2025-11-07 10:14:00 +0200
+author: Charles Tapley Hoyt
+tags:
+  - open educational resources
+  - learning materials
+  - OERs
+  - SSSOM
+  - SSSOM Curator
+  - Biomappings
+  - semantic mappings
+---
+
+Interest in (open) educational resources (OERs) in the last twenty years has
+led to a highly fragmented landscape of modeling efforts. This post is about
+establishing mappings and crosswalks between these disparate efforts using the
+[Simple Standard for Sharing Ontological Mappings (SSSOM)](https://mapping-commons.github.io/sssom)
+and [SSSOM Curator](https://github.com/cthoyt/sssom-curator).
+
+More concretely, most modeling efforts for (open) educational resources and
+learning materials involve developing a metadata model that captures key
+information such as the title, description, authors, language, discipline, and
+keywords as well as pedagogical metadata like the target audience, required
+proficiency level, and learning objectives. Notably, the Dublin Core Metadata
+Initiative's
+[Learning Resource Metadata Innovation (LRMI)](https://www.dublincore.org/specifications/lrmi)
+and
+[Educational Resource Discovery Index (ERuDIte)](https://www.pagestudy.org/erudite-training-resource-standard/)
+each produced their own OER metadata models, then later consolidated efforts
+with a third OER metadata model in Schema.org. The World Wide Web Consortium
+(W3C) established the
+[Open Educational Resources Schema Community Group](https://www.w3.org/community/oerschema/)
+which developed [OERSchema](https://github.com/open-curriculum/oerschema), but
+this metadata model did not see critical adoption, the community group shut down
+in 2023, and the repository is effectively inactive. There are also numerous
+partially overlapping isolated efforts (surprisingly, many from German groups) with
+heterogeneous reusability (e.g., many are published but not downloadable, many
+are poorly constructed).
+
+Here's a non-exhaustive list of metadata models that follow semantic web
+standards (see Semantic Farm collection [0000018](https://semantic.farm/collection/0000018)):
+
+| Prefix                                         | Name                                                    | Homepage                                                             |
+| ---------------------------------------------- | ------------------------------------------------------- | -------------------------------------------------------------------- |
+| [`educor`](https://semantic.farm/educor)       | Educational and Career-Oriented Recommendation Ontology | https://github.com/tibonto/educor                                    |
+| [`lrmi`](https://semantic.farm/lrmi)           | DCMI Learning Resource Metadata Innovation Terms        | https://www.dublincore.org/specifications/lrmi/lrmi_terms/2022-06-14 |
+| [`modalia`](https://semantic.farm/modalia)     | MoDALIA Ontology                                        | https://git.rwth-aachen.de/dalia/dalia-ontology                      |
+| [`oerschema`](https://semantic.farm/oerschema) | OER Schema                                              | https://github.com/open-curriculum/oerschema                         |
+| [`schema`](https://semantic.farm/schema)       | Schema.org                                              | https://schema.org                                                   |
+| [`vivo`](https://semantic.farm/vivo)           | VIVO Ontology                                           | https://github.com/vivo-ontologies/vivo-ontology                     |
+
+## TL;DR
+
+This post is about predicting mappings between ontologies, data models, and other
+semantic spaces relevant for open educational resources (OERs) and curating them
+with [SSSOM Curator](https://github.com/cthoyt/sssom-curator),
+a generalization and re-implementation of [Biomappings](https://github.com/biopragmatics/biomappings), a
+semi-automated, human-in-the-loop mapping curation workflow that was originally domain-specific for life sciences.
+
+
+
+```console
+$ uv tool install "sssom-curator[predict-lexical,exports,web]"
+$ sssom-curator init
+$ sssom-curator predict lexical --all-by-all --force kim.hcrt schema vivo
+$ sssom-curator web
+```
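
To give a feel for what a lexical prediction step does conceptually, here is a minimal sketch that proposes a mapping whenever two terms have the same normalized label. It is not SSSOM Curator's implementation: `normalize`, `predict_lexical`, the two toy vocabularies, and the choice of `skos:exactMatch` as the predicted predicate are all assumptions for illustration.

```python
import re
from collections import defaultdict


def normalize(label):
    """Lowercase a label and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def predict_lexical(left, right):
    """Yield (subject, predicate, object) for terms whose normalized labels match."""
    # Index the right-hand vocabulary by normalized label
    index = defaultdict(list)
    for identifier, label in right.items():
        index[normalize(label)].append(identifier)
    for identifier, label in left.items():
        for other in index.get(normalize(label), []):
            yield identifier, "skos:exactMatch", other


# Hypothetical terms, for illustration only
vocab_a = {"a:1": "Lecture", "a:2": "Text-Book"}
vocab_b = {"b:7": "lecture", "b:8": "Text book", "b:9": "Quiz"}
for row in predict_lexical(vocab_a, vocab_b):
    print(row)
```

Predictions like these are only candidates: exact label matches can still be wrong (e.g., homonyms across domains), which is why they feed into a curation step rather than being accepted automatically.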
+
+1. Surveying the semantic landscape
+2. Ingesting resources
+3. Using the lexical prediction workflow
+4. Curation
+5. Future: assess the amount of uncurated stuff (i.e., islands in the mapping
+   graph)
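
One possible reading of "islands" in the last item is terms that end up in a connected component of size one once all curated mappings are treated as undirected edges. The sketch below follows that reading; the function name, the toy terms, and the interpretation itself are assumptions rather than something SSSOM Curator provides.

```python
from collections import defaultdict


def connected_components(terms, mappings):
    """Group terms into components, treating each mapping as an undirected edge."""
    neighbors = defaultdict(set)
    for subject, obj in mappings:
        neighbors[subject].add(obj)
        neighbors[obj].add(subject)

    seen = set()
    components = []
    for term in terms:
        if term in seen:
            continue
        # Depth-first traversal starting from an unvisited term
        component = set()
        stack = [term]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical terms and a single curated mapping
terms = ["a:1", "a:2", "b:7", "b:9"]
mappings = [("a:1", "b:7")]
components = connected_components(terms, mappings)
islands = sorted(next(iter(c)) for c in components if len(c) == 1)
print(islands)
```

The share of island terms per vocabulary would be one simple way to quantify how much curation remains.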
+
+## Survey Semantic Landscape
+
+## Education Levels
+
+| Prefix                                                           | Name                                                             | Homepage                                                                                                                      |
+| ---------------------------------------------------------------- | ---------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
+| [`ans.educationlevel`](https://semantic.farm/ans.educationlevel) | U.S. Education Level Vocabulary                                  | http://purl.org/ASN/scheme/ASNEducationLevel/                                                                                 |
+| [`isced1997`](https://semantic.farm/isced1997)                   | International Standard Classification of Education, 1997 Edition | https://ec.europa.eu/eurostat/statistics-explained/index.php?title=International_Standard_Classification_of_Education_(ISCED) |
+| [`isced2011`](https://semantic.farm/isced2011)                   | International Standard Classification of Education, 2011 Edition | https://ec.europa.eu/eurostat/statistics-explained/index.php?title=International_Standard_Classification_of_Education_(ISCED) |
+| [`isced2013`](https://semantic.farm/isced2013)                   | International Standard Classification of Education, 2013 Edition | https://ec.europa.eu/eurostat/statistics-explained/index.php?title=International_Standard_Classification_of_Education_(ISCED) |
+| [`kim.educationlevel`](https://semantic.farm/kim.educationlevel) | KIM Education Level                                              | https://github.com/dini-ag-kim/educationalLevel                                                                               |
+| [`kim.esv`](https://semantic.farm/kim.esv)                       | Educational Sectors Vocabulary                                   | https://github.com/dini-ag-kim/vocabs-edu                                                                                     |
+| [`kim.hcrt`](https://semantic.farm/kim.hcrt)                     | Higher Education Resource Types                                  | https://github.com/dini-ag-kim/hcrt                                                                                           |
+| [`oeh.educationlevel`](https://semantic.farm/oeh.educationlevel) | OpenEduHub Education Level                                       | https://github.com/openeduhub/oeh-metadata-vocabs                                                                             |
+
+## Subjects and Disciplines
+
+| Prefix                                                                                   | Name                                             | Homepage                                                  |
+| ---------------------------------------------------------------------------------------- | ------------------------------------------------ | --------------------------------------------------------- |
+| [`ccso`](https://semantic.farm/ccso)                                                     | Curriculum Course Syllabus Ontology              | https://github.com/Vkreations/CCSO                        |
+| [`kim.schulfaecher`](https://semantic.farm/kim.schulfaecher)                             | KIM School Subjects                              | https://github.com/dini-ag-kim/schulfaecher               |
+| [`kim.hochschulfaechersystematik`](https://semantic.farm/kim.hochschulfaechersystematik) | German University Subject Classification System  | https://github.com/dini-ag-kim/hochschulfaechersystematik |
+| [`adcad`](https://semantic.farm/adcad)                                                   | Arctic Data Center Academic Disciplines Ontology | https://github.com/NCEAS/adc-disciplines                  |
+| [`edam`](https://semantic.farm/edam)                                                     | EDAM Ontology                                    | https://github.com/edamontology/edamontology              |
diff --git a/_posts/2026-semantic-farm-for-nfdi.md b/_posts/2026-semantic-farm-for-nfdi.md
@@ -0,0 +1,80 @@
+---
+layout: post
+title: Semantic Farm for NFDI
+date: 2025-11-05 8:32:00 +0100
+author: Charles Tapley Hoyt
+tags:
+  - open educational resources
+  - learning materials
+  - OERs
+  - SSSOM
+  - SSSOM Curator
+  - Biomappings
+---
+
+The [Semantic Farm (https://semantic.farm)](https://semantic.farm) is a data interoperability platform
+that indexes ontologies, databases, and other resources that assign (persistent)
+identifiers.
+
+By collecting metadata about such resources, the Semantic Farm helps
+researchers find the appropriate (persistent) identifier schema to annotate
+their (meta)data and make it more FAIR (findable, accessible, interoperable,
+reusable).
+
+The NFDI
+[Section (Meta)data, Terminologies, Provenance](https://www.nfdi.de/section-meta/?lang=en)
+proposes the Semantic Farm as a
+[Basic Service for NFDI (Base4NFDI)](https://base4nfdi.de).
+
+Why should it be a Base4NFDI service?
+
+Who are the stakeholders?
+
+1. NFDI Sections
+   - [Section Common Infrastructure](https://www.nfdi.de/section-infra/?lang=en)
+     - Data Integration
+     - Data Management Planning
+     - Data Science and Artificial Intelligence
+     - Electronic Lab Notebooks
+     - Persistent Identifiers (PID)
+   - Section Metadata's charter said to do a survey of consortia ontology
+     usage - this is a place to concretize it and make it actionable
+   - Section EduTrain uses Semantic Farm in the DALIA project to make OERs
+     citable
+2. NFDI Consortia
+   - Chemistry and Cat did a pilot where they consolidated all the ontologies they
+     use. This helps them communicate to all scientists in the consortia
+   - Culture demonstrated with my blog post
+   - Need to reach out to other sections...
+3. Base4NFDI
+   - TS4NFDI technologies use Semantic Farm in the core already (e.g., ontology
+     lookup service) in their implementation for supporting cross-references.
+     There are also several ideas for incubators to more tightly integrate
+     Semantic Farm into TS4NFDI to better support TS4NFDI users.
+   - DMP4NFDI can use Semantic Farm to support writing better data management
+     plans by 1. helping find appropriate ontologies, controlled vocabularies,
+     and other resources that mint semantic spaces to annotate data in a FAIR
+     way and 2. educating writers to understand some practical aspects of
+     semantics
+   - PID4NFDI - need to show how it's complementary and how it's different
+   - KGI4NFDI
+4. NFDI Central
+   - Reporting on semantic spaces produced by consortia, which includes both
+     ontologies and databases. By construction, anything in Semantic Farm has
+     taken a significant step towards FAIR by documenting its accessibility,
+     improving its findability, and implicitly by making the information
+     necessary for interoperability available
+
+Differences from previous Base4NFDI proposals:
+
+1. Semantic Farm already exists, is already running, and is already being
+   demonstrated.
+2. We started with Bioregistry, and the idea is to support the whole NFDI
+3. Doesn't need a ton of funding to continue, already has a detailed governance
+   structure to support community maintenance which leads to sustainability and
+   longevity
+4. Has international partners outside NFDI / Europe that are invested in it.
+
+Complementary tools in NFDI
+
+- comparison to BARTOC
diff --git a/_posts/2026/2026-03-16-semantic-mapping-sources.md b/_posts/2026/2026-03-16-semantic-mapping-sources.md
@@ -0,0 +1,72 @@
+---
+layout: post
+title: Where do Semantic Mappings Come From?
+date: 2026-01-20 11:42:00 +0100
+author: Charles Tapley Hoyt
+tags:
+  - SSSOM
+  - semantic mappings
+  - knowledge graphs
+---
+
+The first challenge with semantic mappings is the variety of forms they can
+take. This includes both different data models and serializations of those
+models. This problem is effectively solved, but I think it is worth reviewing for
+historical purposes (please let me know if I missed something):
+
+<img src="https://forge.extranet.logilab.fr/uploads/-/system/project/avatar/107/external-content.duckduckgo.com.jpeg" align="left" style="max-height: 3em;" alt="SKOS logo"/>
+[Simple Knowledge Organization System (SKOS)](https://www.w3.org/TR/skos-reference)
+is a data model for RDF to represent controlled vocabularies, taxonomies,
+dictionaries, thesauri, and other semantic artifacts. It defines several
+semantic mapping predicates including for broad matches, narrow matches, close
+matches, related matches, and exact matches.
+
+[JSKOS (JSON for Knowledge Organization Systems)](https://gbv.github.io/jskos/#mapping)
+is a JSON-based extension of the SKOS data model. I recently wrote a post about
+converting between [SSSOM and JSKOS]({% post_url 2026-01-15-sssom-to-jskos %}).
+
+<img src="https://www.jean-delahousse.net/wp-content/uploads/2020/09/Owl_logo-258x300.png"  align="left" style="max-height: 3em; margin-right: 0.5em;" alt="OWL logo">
+[Web Ontology Language (OWL)](https://www.w3.org/TR/owl2-syntax/) is primarily
+used for ontologies. It has first-class language support for encoding
+equivalences between classes, properties, or individuals. Other semantic
+mappings can be encoded as annotation properties on classes, properties, or
+individuals, e.g., using SKOS predicates.
+
+<img src="https://obofoundry.org/images/foundrylogo.png"  align="left" style="max-height: 3em; margin-right: 0.5em;" alt="OBO logo">
+The
+[OBO Flat File Format](https://owlcollab.github.io/oboformat/doc/GO.format.obo-1_4.html)
+is a simplified version of OWL with macros most useful for curating biomedical
+ontologies. It has the same abilities as OWL, but also the `xref` macro which
+corresponds to `oboInOwl:hasDbXref` relations, which are by nature imprecise and
+therefore used in a variety of ways.
+
+<img src="https://avatars.githubusercontent.com/u/77892844?v=4" align="left" style="max-height: 3em; margin-right: 0.5em;" alt="SSSOM logo">
+The
+[Simple Standard for Sharing Ontological Mappings (SSSOM)](https://mapping-commons.github.io/sssom/)
+is a fit-for-purpose format for semantic mappings between classes, properties,
+or individuals. SSSOM guides curators towards inputting key metadata that are
+typically missing from other formalisms and is gaining wider community adoption.
+Importantly, SSSOM integrates into ontology curation workflows, especially for
+[Ontology Development Kit (ODK)](https://incatools.github.io/ontology-development-kit)
+users.
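
To make the "key metadata" point concrete, here is a minimal sketch that writes one mapping as an SSSOM-style TSV row using only the standard library. The column names (`subject_id`, `predicate_id`, `object_id`, `mapping_justification`, `author_id`) are SSSOM slots and `semapv:ManualMappingCuration` is a term from the SEMAPV vocabulary; the `a:`/`b:` identifiers and the all-zero ORCID are placeholders. The sketch deliberately omits the metadata header (e.g., the prefix map) that a complete SSSOM file would carry.

```python
import csv
import io

COLUMNS = [
    "subject_id",
    "predicate_id",
    "object_id",
    "mapping_justification",
    "author_id",
]

rows = [
    {
        "subject_id": "a:1",  # placeholder CURIE
        "predicate_id": "skos:exactMatch",
        "object_id": "b:7",  # placeholder CURIE
        "mapping_justification": "semapv:ManualMappingCuration",
        "author_id": "orcid:0000-0000-0000-0000",  # placeholder ORCID
    }
]

# Write to an in-memory buffer; swap in an open file handle to write to disk
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The justification and author columns are exactly the kind of provenance that a bare cross-reference in another formalism usually leaves out.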
+
+The
+[Expressive and Declarative Ontology Alignment Language (EDOAL)](https://moex.gitlabpages.inria.fr/alignapi/edoal.html)
+lives in a similar space to SSSOM, but IMO was much less approachable (cf.
+XML + Java), and has not seen a lot of traction in the biomedical space.
+
+<img src="https://ontoportal.org/images/logo.png" align="left" style="max-height: 3em; margin-right: 0.5em;" alt="OntoPortal logo"/>
+[OntoPortal](https://ontoportal.org/) has its own data model for semantic
+mappings that has low metadata precision. I recently wrote a post on converting
+[OntoPortal to SSSOM]({% post_url 2025-11-23-sssom-from-bioportal %}). OntoPortal would also like
+to invest more in SSSOM infrastructure if it can organize funding and human resources.
+
+<img src="https://upload.wikimedia.org/wikipedia/commons/6/66/Wikidata-logo-en.svg" align="left" style="max-height: 3em" alt="Wikidata logo">
+[Wikidata](https://www.wikidata.org) has its own data model for semantic
+mappings that includes higher precision metadata. I recently wrote a post on
+mapping between the data models from [SSSOM and
+Wikidata]({% post_url 2026-01-07-sssom-to-wikidata %}).
+
+Finally, there's a long tail of mappings that live in poorly annotated CSV, TSV,
+Excel, and other formats. Similarly, mappings can live in plain RDF files, e.g.,
+encoded with SKOS predicates, but without high precision metadata.