diff --git a/CHANGELOG.md b/CHANGELOG.md
index 10088f94d..0c84c4852 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,7 @@
 ## PyNWB 4.0.0 (Upcoming)
 
 ### Documentation and tutorial enhancements
+- Added a guide on choosing `entity_id` and `entity_uri` when adding HERD external resource references, recommending CURIEs registered with [bioregistry.io](https://bioregistry.io) (e.g. `NCBITaxon`, `ROR`, `ORCID`, `UBERON`, `MBA`, `HBA`, `DANDI`) and documenting how to handle resources whose terms have no individually resolvable URL. @bendichter
 - Added `app.readthedocs.org/projects/pynwb/*` to `linkcheck_ignore` to stop the Sphinx linkcheck CI job from intermittently failing when GitHub Actions runners get throttled by readthedocs. @h-mayorquin [#2191](https://github.com/NeurodataWithoutBorders/pynwb/pull/2191)
 - Added documentation for `ExternalImage` to the images tutorial. @h-mayorquin [#2159](https://github.com/NeurodataWithoutBorders/pynwb/pull/2159)
 - Fixed broken and redirecting links in documentation. @bendichter [#2165](https://github.com/NeurodataWithoutBorders/pynwb/pull/2165)
diff --git a/docs/source/external_resources_entity_guide.rst b/docs/source/external_resources_entity_guide.rst
new file mode 100644
index 000000000..8bb589633
--- /dev/null
+++ b/docs/source/external_resources_entity_guide.rst
@@ -0,0 +1,138 @@
+.. _external_resources_entity_guide:
+
+Choosing ``entity_id`` and ``entity_uri`` for external references
+=================================================================
+
+When you annotate data with an external resource using
+:py:meth:`HERD.add_ref `, each reference records two
+fields that identify the external term:
+
+``entity_id``
+    A compact identifier (a `CURIE `_) of the form
+    ``prefix:identifier`` (e.g. ``NCBITaxon:10090``). The ``prefix`` names the registry or
+    ontology and the ``identifier`` is the term's accession within it.
+
+``entity_uri``
+    The full URL that the ``entity_id`` resolves to — a persistent, dereferenceable web
+    address for that exact term.
+
+Recommended practice
+---------------------
+
+#. **Use a CURIE for** ``entity_id``. Prefer an identifier whose ``prefix`` is registered with
+   `bioregistry.io `_. The Bioregistry is a comprehensive registry of
+   prefixes that maps each CURIE to a canonical, resolvable URL, which avoids the ambiguity of
+   the many overlapping identifier schemes (e.g. ``NCBITaxon`` vs. ``taxonomy`` vs.
+   ``NCBI_TAXON``).
+
+#. **Use the resolved URL for** ``entity_uri``. The ``entity_uri`` should be the URL that the
+   CURIE resolves to. You can look this up by resolving the CURIE through the Bioregistry:
+   visiting ``https://bioregistry.io/`` (for example
+   ``https://bioregistry.io/NCBITaxon:10090``) redirects to the canonical provider URL, which is
+   the value to store in ``entity_uri``.
+
+Keeping ``entity_id`` and ``entity_uri`` consistent in this way means a reader can both
+recognize the registry from the compact ``entity_id`` and dereference the ``entity_uri`` to land
+on an authoritative description of the term.
+
+Commonly used registries
+-------------------------
+
+All of the registries below are registered with the Bioregistry. The ``entity_uri`` column shows
+the canonical URL the example ``entity_id`` resolves to.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 10 16 22 20 32
+
+   * - Prefix
+     - Use for
+     - Common NWB field(s)
+     - Example ``entity_id``
+     - Example ``entity_uri``
+   * - ``NCBITaxon``
+     - Species
+     - ``Subject.species``
+     - ``NCBITaxon:10090``
+     - ``http://purl.obolibrary.org/obo/NCBITaxon_10090``
+   * - ``ROR``
+     - Organizations / institutions
+     - ``NWBFile.institution``
+     - ``ROR:013meh722``
+     - ``https://ror.org/013meh722``
+   * - ``ORCID``
+     - People (researchers)
+     - ``NWBFile.experimenter``
+     - ``ORCID:0000-0002-1825-0097``
+     - ``https://orcid.org/0000-0002-1825-0097``
+   * - ``UBERON``
+     - Brain regions (cross-species)
+     - Brain-region location fields [#loc]_
+     - ``UBERON:0001950``
+     - ``http://purl.obolibrary.org/obo/UBERON_0001950``
+   * - ``MBA``
+     - Brain regions (Allen Mouse Brain Atlas)
+     - Brain-region location fields [#loc]_
+     - ``MBA:385``
+     - ``https://purl.brain-bican.org/ontology/mbao/MBA_385``
+   * - ``HBA``
+     - Brain regions (Allen Human Brain Atlas)
+     - Brain-region location fields [#loc]_
+     - ``HBA:4005``
+     - ``https://purl.brain-bican.org/ontology/hbao/HBA_4005``
+   * - ``DANDI``
+     - Dandisets
+     - (identifies the dataset as a whole)
+     - ``DANDI:000015``
+     - ``https://dandiarchive.org/dandiset/000015``
+
+.. [#loc] Brain-region annotations commonly apply to ``ElectrodeGroup.location``,
+   ``ImagingPlane.location``, and the ``location`` column of the ``electrodes`` table.
+
+Example
+-------
+
+.. code-block:: python
+
+    # the species of the subject, mapped to NCBI Taxonomy
+    herd.add_ref(
+        container=nwbfile.subject,
+        attribute="species",
+        key="Mus musculus",
+        entity_id="NCBITaxon:10090",
+        entity_uri="http://purl.obolibrary.org/obo/NCBITaxon_10090",
+    )
+
+Resources without individually resolvable URLs
+----------------------------------------------
+
+Some resources do not provide a dereferenceable URL for each individual term. For example, many
+brain atlases (such as the macaque **D99** atlas) publish a single document or download for the
+whole atlas rather than one persistent URL per region.
+
+In that case:
+
+* Put the **URL of the resource as a whole** in ``entity_uri`` (e.g. the atlas's landing or
+  download page).
+* Put the resource's **identifier for the specific term** — for example, the brain area ID used
+  by the atlas — in ``entity_id``.
+
+This keeps every reference dereferenceable to *something* authoritative (the resource) while
+still recording the precise term identifier, even when a per-term URL does not exist.
+
+.. code-block:: python
+
+    # a region from an atlas that has no per-region URL: identify the region by its
+    # atlas-specific ID and point entity_uri at the atlas itself
+    herd.add_ref(
+        container=electrodes_table,
+        attribute="location",
+        key="area_42",
+        entity_id="42",
+        entity_uri="https://afni.nimh.nih.gov/pub/dist/atlases/macaque/D99_macaque/",
+    )
+
+.. seealso::
+
+    :py:class:`HERD ` for the full API, and
+    :py:meth:`HERD.add_ref ` for adding references.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index bf00f8ed8..723905822 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -59,6 +59,7 @@ breaking down the barriers to data sharing in neuroscience.
    validation
    export
+   external_resources_entity_guide
    api_docs
 
 .. toctree::