Closed
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@
## PyNWB 4.0.0 (Upcoming)

### Documentation and tutorial enhancements
- Added a guide on choosing `entity_id` and `entity_uri` when adding HERD external resource references, recommending CURIEs registered with [bioregistry.io](https://bioregistry.io) (e.g. `NCBITaxon`, `ROR`, `ORCID`, `UBERON`, `MBA`, `HBA`, `DANDI`) and documenting how to handle resources whose terms have no individually resolvable URL. @bendichter
- Added `app.readthedocs.org/projects/pynwb/*` to `linkcheck_ignore` to stop the Sphinx linkcheck CI job from intermittently failing when GitHub Actions runners get throttled by readthedocs. @h-mayorquin [#2191](https://github.com/NeurodataWithoutBorders/pynwb/pull/2191)
- Added documentation for `ExternalImage` to the images tutorial. @h-mayorquin [#2159](https://github.com/NeurodataWithoutBorders/pynwb/pull/2159)
- Fixed broken and redirecting links in documentation. @bendichter [#2165](https://github.com/NeurodataWithoutBorders/pynwb/pull/2165)
138 changes: 138 additions & 0 deletions docs/source/external_resources_entity_guide.rst
@@ -0,0 +1,138 @@
.. _external_resources_entity_guide:

Choosing ``entity_id`` and ``entity_uri`` for external references
=================================================================

When you annotate data with an external resource using
:py:meth:`HERD.add_ref <hdmf.common.resources.HERD.add_ref>`, each reference records two
fields that identify the external term:

``entity_id``
    A compact identifier (a `CURIE <https://www.w3.org/TR/curie/>`_) of the form
    ``prefix:identifier`` (e.g. ``NCBITaxon:10090``). The ``prefix`` names the registry or
    ontology and the ``identifier`` is the term's accession within it.

``entity_uri``
    The full URL that the ``entity_id`` resolves to: a persistent, dereferenceable web
    address for that exact term.

Recommended practice
---------------------

#. **Use a CURIE for** ``entity_id``. Prefer an identifier whose ``prefix`` is registered with
   `bioregistry.io <https://bioregistry.io>`_. The Bioregistry is a comprehensive registry of
   prefixes that maps each CURIE to a canonical, resolvable URL, which avoids the ambiguity of
   the many overlapping identifier schemes (e.g. ``NCBITaxon`` vs. ``taxonomy`` vs.
   ``NCBI_TAXON``).

#. **Use the resolved URL for** ``entity_uri``. The ``entity_uri`` should be the URL that the
   CURIE resolves to. You can look this up by resolving the CURIE through the Bioregistry:
   visiting ``https://bioregistry.io/<entity_id>`` (for example
   ``https://bioregistry.io/NCBITaxon:10090``) redirects to the canonical provider URL, which is
   the value to store in ``entity_uri``.

Keeping ``entity_id`` and ``entity_uri`` consistent in this way means a reader can both
recognize the registry from the compact ``entity_id`` and dereference the ``entity_uri`` to land
on an authoritative description of the term.
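
The lookup step described above can be sketched as a small helper. This is a minimal
illustration, not part of PyNWB or HDMF: the function name ``bioregistry_lookup_url`` is
hypothetical, and it only builds the ``https://bioregistry.io/<entity_id>`` address. Following
the redirect to obtain the final ``entity_uri`` (in a browser or with an HTTP client) is left
to the reader.

.. code-block:: python

    def bioregistry_lookup_url(entity_id: str) -> str:
        """Build the Bioregistry address that redirects to the provider page.

        Hypothetical helper for illustration; it checks only that ``entity_id``
        has the ``prefix:identifier`` shape, not that the prefix is registered.
        """
        prefix, sep, identifier = entity_id.partition(":")
        if not sep or not prefix or not identifier:
            raise ValueError(f"expected 'prefix:identifier', got {entity_id!r}")
        return f"https://bioregistry.io/{prefix}:{identifier}"


    # address to visit in order to find the canonical entity_uri
    lookup_url = bioregistry_lookup_url("NCBITaxon:10090")

Rejecting identifiers without a prefix catches a bare accession such as ``10090``, which would
be ambiguous across registries.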

Commonly used registries
-------------------------

All of the registries below are registered with the Bioregistry. The ``entity_uri`` column shows
the canonical URL the example ``entity_id`` resolves to.

.. list-table::
   :header-rows: 1
   :widths: 10 16 22 20 32

   * - Prefix
     - Use for
     - Common NWB field(s)
     - Example ``entity_id``
     - Example ``entity_uri``
   * - ``NCBITaxon``
     - Species
     - ``Subject.species``
     - ``NCBITaxon:10090``
     - ``http://purl.obolibrary.org/obo/NCBITaxon_10090``
   * - ``ROR``
     - Organizations / institutions
     - ``NWBFile.institution``
     - ``ROR:013meh722``
     - ``https://ror.org/013meh722``
   * - ``ORCID``
     - People (researchers)
     - ``NWBFile.experimenter``
     - ``ORCID:0000-0002-1825-0097``
     - ``https://orcid.org/0000-0002-1825-0097``
   * - ``UBERON``
     - Brain regions (cross-species)
     - Brain-region location fields [#loc]_
     - ``UBERON:0001950``
     - ``http://purl.obolibrary.org/obo/UBERON_0001950``
   * - ``MBA``
     - Brain regions (Allen Mouse Brain Atlas)
     - Brain-region location fields [#loc]_
     - ``MBA:385``
     - ``https://purl.brain-bican.org/ontology/mbao/MBA_385``
   * - ``HBA``
     - Brain regions (Allen Human Brain Atlas)
     - Brain-region location fields [#loc]_
     - ``HBA:4005``
     - ``https://purl.brain-bican.org/ontology/hbao/HBA_4005``
   * - ``DANDI``
     - Dandisets
     - (identifies the dataset as a whole)
     - ``DANDI:000015``
     - ``https://dandiarchive.org/dandiset/000015``

.. [#loc] Brain-region annotations commonly apply to ``ElectrodeGroup.location``,
   ``ImagingPlane.location``, and the ``location`` column of the ``electrodes`` table.
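
The prefix-to-URL pattern visible in the table can be written down as a lookup. The sketch
below is an illustration only: ``URI_TEMPLATES`` and ``entity_uri_for`` are hypothetical names,
and each template is generalized from the single example row above, so it is an assumption
that other identifiers under the same prefix follow the same pattern. Resolving through the
Bioregistry remains the authoritative check.

.. code-block:: python

    # URL templates inferred from the example rows in the table (assumption:
    # every identifier under a prefix follows the same pattern as the example)
    URI_TEMPLATES = {
        "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_{id}",
        "ROR": "https://ror.org/{id}",
        "ORCID": "https://orcid.org/{id}",
        "UBERON": "http://purl.obolibrary.org/obo/UBERON_{id}",
        "MBA": "https://purl.brain-bican.org/ontology/mbao/MBA_{id}",
        "HBA": "https://purl.brain-bican.org/ontology/hbao/HBA_{id}",
        "DANDI": "https://dandiarchive.org/dandiset/{id}",
    }


    def entity_uri_for(entity_id: str) -> str:
        """Expand a ``prefix:identifier`` CURIE using the templates above."""
        prefix, _, identifier = entity_id.partition(":")
        if prefix not in URI_TEMPLATES or not identifier:
            raise KeyError(f"no URL template for {entity_id!r}")
        return URI_TEMPLATES[prefix].format(id=identifier)


    species_uri = entity_uri_for("NCBITaxon:10090")
    institution_uri = entity_uri_for("ROR:013meh722")

Unknown prefixes raise an error instead of guessing a URL, so a reference is never stored
with a fabricated ``entity_uri``.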

Example
-------

.. code-block:: python

    # assumes `herd` is an existing HERD instance and `nwbfile` is an NWBFile
    # whose subject has species="Mus musculus"

    # the species of the subject, mapped to NCBI Taxonomy
    herd.add_ref(
        container=nwbfile.subject,
        attribute="species",
        key="Mus musculus",
        entity_id="NCBITaxon:10090",
        entity_uri="http://purl.obolibrary.org/obo/NCBITaxon_10090",
    )

Resources without individually resolvable URLs
----------------------------------------------

Some resources do not provide a dereferenceable URL for each individual term. For example, many
brain atlases (such as the macaque **D99** atlas) publish a single document or download for the
whole atlas rather than one persistent URL per region.

In that case:

* Put the **URL of the resource as a whole** in ``entity_uri`` (e.g. the atlas's landing or
  download page).
* Put the resource's **identifier for the specific term** (for example, the brain area ID used
  by the atlas) in ``entity_id``.

This keeps every reference dereferenceable to *something* authoritative (the resource) while
still recording the precise term identifier, even when a per-term URL does not exist.

.. code-block:: python

    # assumes `herd` is an existing HERD instance, `electrodes_table` is the
    # file's electrodes table, and "area_42" is a value in its location column

    # a region from an atlas that has no per-region URL: identify the region by its
    # atlas-specific ID and point entity_uri at the atlas itself
    herd.add_ref(
        container=electrodes_table,
        attribute="location",
        key="area_42",
        entity_id="42",
        entity_uri="https://afni.nimh.nih.gov/pub/dist/atlases/macaque/D99_macaque/",
    )

.. seealso::

   :py:class:`HERD <hdmf.common.resources.HERD>` for the full API, and
   :py:meth:`HERD.add_ref <hdmf.common.resources.HERD.add_ref>` for adding references.
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -59,6 +59,7 @@ breaking down the barriers to data sharing in neuroscience.

validation
export
external_resources_entity_guide
api_docs

.. toctree::