From dd00cfa9d91b575e02897a47aa0655137d0f0853 Mon Sep 17 00:00:00 2001
From: Ben Dichter
Date: Tue, 23 Jun 2026 16:16:44 -0400
Subject: [PATCH 1/3] Add guide on choosing entity_id and entity_uri for HERD references

Add a documentation page explaining how to populate the entity_id and
entity_uri fields when adding HERD external resource references:

- entity_id should be a CURIE (prefix:identifier) whose prefix is registered
  with bioregistry.io, which maps it to a canonical resolvable URL and avoids
  ambiguity between overlapping identifier schemes.
- entity_uri should be the URL the CURIE resolves to (lookupable via
  https://bioregistry.io/).
- Includes a table of commonly used registries (NCBITaxon, ROR, ORCID,
  UBERON, MBA, HBA, DANDI) with example entity_id/entity_uri pairs.
- Documents the fallback for resources without per-term URLs (e.g. the D99
  macaque atlas): put the resource URL in entity_uri and the term's
  atlas-specific ID in entity_id.

Adds the page to the Resources toctree.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 CHANGELOG.md                                  |   1 +
 .../external_resources_entity_guide.rst       | 127 ++++++++++++++++++
 docs/source/index.rst                         |   1 +
 3 files changed, 129 insertions(+)
 create mode 100644 docs/source/external_resources_entity_guide.rst

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 10088f94d..0c84c4852 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,7 @@
 ## PyNWB 4.0.0 (Upcoming)

 ### Documentation and tutorial enhancements
+- Added a guide on choosing `entity_id` and `entity_uri` when adding HERD external resource references, recommending CURIEs registered with [bioregistry.io](https://bioregistry.io) (e.g. `NCBITaxon`, `ROR`, `ORCID`, `UBERON`, `MBA`, `HBA`, `DANDI`) and documenting how to handle resources whose terms have no individually resolvable URL.
@bendichter
 - Added `app.readthedocs.org/projects/pynwb/*` to `linkcheck_ignore` to stop the Sphinx linkcheck CI job from intermittently failing when GitHub Actions runners get throttled by readthedocs. @h-mayorquin [#2191](https://github.com/NeurodataWithoutBorders/pynwb/pull/2191)
 - Added documentation for `ExternalImage` to the images tutorial. @h-mayorquin [#2159](https://github.com/NeurodataWithoutBorders/pynwb/pull/2159)
 - Fixed broken and redirecting links in documentation. @bendichter [#2165](https://github.com/NeurodataWithoutBorders/pynwb/pull/2165)
diff --git a/docs/source/external_resources_entity_guide.rst b/docs/source/external_resources_entity_guide.rst
new file mode 100644
index 000000000..1584498df
--- /dev/null
+++ b/docs/source/external_resources_entity_guide.rst
@@ -0,0 +1,127 @@
+.. _external_resources_entity_guide:
+
+Choosing ``entity_id`` and ``entity_uri`` for external references
+=================================================================
+
+When you annotate data with an external resource using
+:py:meth:`HERD.add_ref `, each reference records two
+fields that identify the external term:
+
+``entity_id``
+    A compact identifier (a `CURIE `_) of the form
+    ``prefix:identifier`` (e.g. ``NCBITaxon:10090``). The ``prefix`` names the registry or
+    ontology and the ``identifier`` is the term's accession within it.
+
+``entity_uri``
+    The full URL that the ``entity_id`` resolves to — a persistent, dereferenceable web
+    address for that exact term.
+
+Recommended practice
+---------------------
+
+#. **Use a CURIE for** ``entity_id``. Prefer an identifier whose ``prefix`` is registered with
+   `bioregistry.io `_. The Bioregistry is a comprehensive registry of
+   prefixes that maps each CURIE to a canonical, resolvable URL, which avoids the ambiguity of
+   the many overlapping identifier schemes (e.g. ``NCBITaxon`` vs. ``taxonomy`` vs.
+   ``NCBI_TAXON``).
+
+#. **Use the resolved URL for** ``entity_uri``.
The ``entity_uri`` should be the URL that the
+   CURIE resolves to. You can look this up by resolving the CURIE through the Bioregistry:
+   visiting ``https://bioregistry.io/`` (for example
+   ``https://bioregistry.io/NCBITaxon:10090``) redirects to the canonical provider URL, which is
+   the value to store in ``entity_uri``.
+
+Keeping ``entity_id`` and ``entity_uri`` consistent in this way means a reader can both
+recognize the registry from the compact ``entity_id`` and dereference the ``entity_uri`` to land
+on an authoritative description of the term.
+
+Commonly used registries
+-------------------------
+
+All of the registries below are registered with the Bioregistry. The ``entity_uri`` column shows
+the canonical URL the example ``entity_id`` resolves to.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 12 26 26 36
+
+   * - Prefix
+     - Use for
+     - Example ``entity_id``
+     - Example ``entity_uri``
+   * - ``NCBITaxon``
+     - Species
+     - ``NCBITaxon:10090``
+     - ``http://purl.obolibrary.org/obo/NCBITaxon_10090``
+   * - ``ROR``
+     - Organizations / institutions
+     - ``ROR:013meh722``
+     - ``https://ror.org/013meh722``
+   * - ``ORCID``
+     - People (researchers)
+     - ``ORCID:0000-0002-1825-0097``
+     - ``https://orcid.org/0000-0002-1825-0097``
+   * - ``UBERON``
+     - Brain regions (cross-species)
+     - ``UBERON:0001950``
+     - ``http://purl.obolibrary.org/obo/UBERON_0001950``
+   * - ``MBA``
+     - Brain regions (Allen Mouse Brain Atlas)
+     - ``MBA:385``
+     - ``https://purl.brain-bican.org/ontology/mbao/MBA_385``
+   * - ``HBA``
+     - Brain regions (Allen Human Brain Atlas)
+     - ``HBA:4005``
+     - ``https://purl.brain-bican.org/ontology/hbao/HBA_4005``
+   * - ``DANDI``
+     - Dandisets
+     - ``DANDI:000015``
+     - ``https://dandiarchive.org/dandiset/000015``
+
+Example
+-------
+
+..
code-block:: python
+
+    # the species of the subject, mapped to NCBI Taxonomy
+    herd.add_ref(
+        container=nwbfile.subject,
+        attribute="species",
+        key="Mus musculus",
+        entity_id="NCBITaxon:10090",
+        entity_uri="http://purl.obolibrary.org/obo/NCBITaxon_10090",
+    )
+
+Resources without individually resolvable URLs
+----------------------------------------------
+
+Some resources do not provide a dereferenceable URL for each individual term. For example, many
+brain atlases (such as the macaque **D99** atlas) publish a single document or download for the
+whole atlas rather than one persistent URL per region.
+
+In that case:
+
+* Put the **URL of the resource as a whole** in ``entity_uri`` (e.g. the atlas's landing or
+  download page).
+* Put the resource's **identifier for the specific term** — for example, the brain area ID used
+  by the atlas — in ``entity_id``.
+
+This keeps every reference dereferenceable to *something* authoritative (the resource) while
+still recording the precise term identifier, even when a per-term URL does not exist.
+
+.. code-block:: python
+
+    # a region from an atlas that has no per-region URL: identify the region by its
+    # atlas-specific ID and point entity_uri at the atlas itself
+    herd.add_ref(
+        container=electrodes_table,
+        attribute="location",
+        key="area_42",
+        entity_id="42",
+        entity_uri="https://afni.nimh.nih.gov/pub/dist/atlases/macaque/D99_macaque/",
+    )
+
+.. seealso::
+
+    :py:class:`HERD ` for the full API, and
+    :py:meth:`HERD.add_ref ` for adding references.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index bf00f8ed8..723905822 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -59,6 +59,7 @@ breaking down the barriers to data sharing in neuroscience.

    validation
    export
+   external_resources_entity_guide
    api_docs

 ..
toctree::

From 5f2020360c75f62e6cb4c2af647724539d2e11e5 Mon Sep 17 00:00:00 2001
From: Ben Dichter
Date: Tue, 23 Jun 2026 17:06:11 -0400
Subject: [PATCH 2/3] Add "Common NWB field(s)" column to the registry table

Per review feedback, map each general concept to the NWB fields it commonly
annotates (e.g. species -> Subject.species, people -> NWBFile.experimenter,
brain regions -> ElectrodeGroup.location / ImagingPlane.location), so users
can connect a concept to where it appears in an NWB file.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/source/external_resources_entity_guide.rst | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/source/external_resources_entity_guide.rst b/docs/source/external_resources_entity_guide.rst
index 1584498df..57f71561a 100644
--- a/docs/source/external_resources_entity_guide.rst
+++ b/docs/source/external_resources_entity_guide.rst
@@ -43,38 +43,46 @@ the canonical URL the example ``entity_id`` resolves to.

 .. list-table::
    :header-rows: 1
-   :widths: 12 26 26 36
+   :widths: 10 16 22 20 32

    * - Prefix
      - Use for
+     - Common NWB field(s)
      - Example ``entity_id``
      - Example ``entity_uri``
    * - ``NCBITaxon``
      - Species
+     - ``Subject.species``
      - ``NCBITaxon:10090``
      - ``http://purl.obolibrary.org/obo/NCBITaxon_10090``
    * - ``ROR``
      - Organizations / institutions
+     - ``NWBFile.institution``
      - ``ROR:013meh722``
      - ``https://ror.org/013meh722``
    * - ``ORCID``
      - People (researchers)
+     - ``NWBFile.experimenter``
      - ``ORCID:0000-0002-1825-0097``
      - ``https://orcid.org/0000-0002-1825-0097``
    * - ``UBERON``
      - Brain regions (cross-species)
+     - ``ElectrodeGroup.location``, ``ImagingPlane.location``, ``electrodes`` ``location`` column
      - ``UBERON:0001950``
      - ``http://purl.obolibrary.org/obo/UBERON_0001950``
    * - ``MBA``
      - Brain regions (Allen Mouse Brain Atlas)
+     - ``ElectrodeGroup.location``, ``ImagingPlane.location``, ``electrodes`` ``location`` column
      - ``MBA:385``
      - ``https://purl.brain-bican.org/ontology/mbao/MBA_385``
    * -
``HBA``
      - Brain regions (Allen Human Brain Atlas)
+     - ``ElectrodeGroup.location``, ``ImagingPlane.location``, ``electrodes`` ``location`` column
      - ``HBA:4005``
      - ``https://purl.brain-bican.org/ontology/hbao/HBA_4005``
    * - ``DANDI``
      - Dandisets
+     - (identifies the dataset as a whole)
      - ``DANDI:000015``
      - ``https://dandiarchive.org/dandiset/000015``


From 318690661bbb4aa6bc52b44663c4dcb0d9767898 Mon Sep 17 00:00:00 2001
From: Ben Dichter
Date: Tue, 23 Jun 2026 17:25:02 -0400
Subject: [PATCH 3/3] Single-source brain-region fields via a footnote

The UBERON, MBA, and HBA rows all mapped to the same location fields. Replace
the repeated list with a footnote reference so the field list
(ElectrodeGroup.location, ImagingPlane.location, electrodes location column)
is written once. list-table cannot span cells, so a shared footnote is the
cleanest way to avoid the repetition.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/source/external_resources_entity_guide.rst | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/docs/source/external_resources_entity_guide.rst b/docs/source/external_resources_entity_guide.rst
index 57f71561a..8bb589633 100644
--- a/docs/source/external_resources_entity_guide.rst
+++ b/docs/source/external_resources_entity_guide.rst
@@ -67,17 +67,17 @@ the canonical URL the example ``entity_id`` resolves to.
      - ``https://orcid.org/0000-0002-1825-0097``
    * - ``UBERON``
      - Brain regions (cross-species)
-     - ``ElectrodeGroup.location``, ``ImagingPlane.location``, ``electrodes`` ``location`` column
+     - Brain-region location fields [#loc]_
      - ``UBERON:0001950``
      - ``http://purl.obolibrary.org/obo/UBERON_0001950``
    * - ``MBA``
      - Brain regions (Allen Mouse Brain Atlas)
-     - ``ElectrodeGroup.location``, ``ImagingPlane.location``, ``electrodes`` ``location`` column
+     - Brain-region location fields [#loc]_
      - ``MBA:385``
      - ``https://purl.brain-bican.org/ontology/mbao/MBA_385``
    * - ``HBA``
      - Brain regions (Allen Human Brain Atlas)
-     - ``ElectrodeGroup.location``, ``ImagingPlane.location``, ``electrodes`` ``location`` column
+     - Brain-region location fields [#loc]_
      - ``HBA:4005``
      - ``https://purl.brain-bican.org/ontology/hbao/HBA_4005``
    * - ``DANDI``
@@ -86,6 +86,9 @@ the canonical URL the example ``entity_id`` resolves to.
      - ``DANDI:000015``
      - ``https://dandiarchive.org/dandiset/000015``

+.. [#loc] Brain-region annotations commonly apply to ``ElectrodeGroup.location``,
+   ``ImagingPlane.location``, and the ``location`` column of the ``electrodes`` table.
+
 Example
 -------
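
Reviewer note: the ``prefix:identifier`` convention and the Bioregistry lookup URL described in the guide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: ``split_curie`` and ``bioregistry_lookup_url`` are hypothetical helper names that are not part of PyNWB, HDMF, or the Bioregistry client, and the sketch does not contact the network or check that a prefix is actually registered. It only builds the ``https://bioregistry.io/NCBITaxon:10090``-style address that the guide says redirects to the canonical provider URL.

```python
from typing import Tuple

# Base address of the Bioregistry resolver, as quoted in the guide.
BIOREGISTRY_BASE = "https://bioregistry.io/"


def split_curie(curie: str) -> Tuple[str, str]:
    """Split a CURIE such as 'NCBITaxon:10090' into (prefix, identifier).

    Hypothetical helper for illustration only. It splits on the first colon
    and rejects strings that lack a prefix or an identifier.
    """
    prefix, sep, identifier = curie.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not a CURIE of the form prefix:identifier: {curie!r}")
    return prefix, identifier


def bioregistry_lookup_url(curie: str) -> str:
    """Return the Bioregistry address to visit when looking up an entity_uri.

    This only concatenates strings. Following the redirect to find the
    canonical provider URL is left to a browser or an HTTP client.
    """
    split_curie(curie)  # validate the shape before building the URL
    return BIOREGISTRY_BASE + curie


if __name__ == "__main__":
    print(split_curie("NCBITaxon:10090"))
    print(bioregistry_lookup_url("NCBITaxon:10090"))
```

A bare atlas-specific ID such as ``"42"`` from the D99 fallback case is intentionally rejected by ``split_curie``, since the guide treats it as an identifier that is not a CURIE.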