176 changes: 176 additions & 0 deletions docs/source/external_resources_entity_guide.rst
@@ -0,0 +1,176 @@
.. _external_resources_entity_guide:

Using Ontologies and Identifiers with NWB
=========================================

Neurophysiology data is full of terms that mean something specific outside of your file: the
species of a subject, the institution that collected the data, the researchers who ran the
experiment, or the brain region a probe was implanted in. Writing these as free text (``"mouse"``,
``"the Allen Institute"``, ``"V1"``) is easy to do but hard to compute on — different files spell
the same thing different ways, and a reader has no authoritative reference for what exactly was
meant.

**External resources** solve this by linking a term in your file to a standardized entry in an
external **ontology**, **registry**, or **atlas** — for example linking the species
``"Mus musculus"`` to its entry in the NCBI Taxonomy. This makes your annotations unambiguous,
machine-readable, and interoperable: tools can group, search, and compare data across files and
labs because everyone points at the same canonical identifier.

In NWB, these links are stored using HDMF's **HERD** (HDMF External Resources Data) structure,
which records, for each annotation, the term as it appears in your file together with a compact
identifier (``entity_id``) and a resolvable URL (``entity_uri``) for the external entry.

How to add external resources to an NWB file
--------------------------------------------

There are two complementary ways to connect NWB data to external terms, both provided by HDMF:

* **HERD** lets you attach references to existing values in a file, recording that a given
  attribute or column value corresponds to a specific external term. See the
  :hdmf-docs:`HERD tutorial <tutorials/plot_external_resources.html>` for a walkthrough of
  :py:meth:`HERD.add_ref <hdmf.common.resources.HERD.add_ref>`.
* **TermSet** lets you validate values *as you write them*, constraining a field to terms drawn
  from a chosen ontology. See the :hdmf-docs:`TermSet tutorial <tutorials/plot_term_set.html>`,
  and the PyNWB
  :pynwb-docs:`How to Configure Term Validations <tutorials/general/plot_configurator.html>`
  tutorial for configuring term validation across a file.

The rest of this page covers a question that comes up with both approaches: once you have picked
an external term, what exactly should go in the ``entity_id`` and ``entity_uri`` fields?

Choosing ``entity_id`` and ``entity_uri``
-----------------------------------------

When you annotate data with an external resource using
:py:meth:`HERD.add_ref <hdmf.common.resources.HERD.add_ref>`, each reference records two
fields that identify the external term:

``entity_id``
    A compact identifier (a `CURIE <https://www.w3.org/TR/curie/>`_) of the form
    ``prefix:identifier`` (e.g. ``NCBITaxon:10090``). The ``prefix`` names the registry or
    ontology and the ``identifier`` is the term's accession within it.

``entity_uri``
    The full URL that the ``entity_id`` resolves to: a persistent, dereferenceable web
    address for that exact term.

Recommended practice
^^^^^^^^^^^^^^^^^^^^^

#. **Use a CURIE for** ``entity_id``. Prefer an identifier whose ``prefix`` is registered with
   `bioregistry.io <https://bioregistry.io>`_. The Bioregistry is a comprehensive registry of
   prefixes that maps each CURIE to a canonical, resolvable URL, which avoids the ambiguity of
   the many overlapping identifier schemes (e.g. ``NCBITaxon`` vs. ``taxonomy`` vs.
   ``NCBI_TAXON``).

#. **Use the resolved URL for** ``entity_uri``. The ``entity_uri`` should be the URL that the
   CURIE resolves to. You can look this up by resolving the CURIE through the Bioregistry:
   visiting ``https://bioregistry.io/<entity_id>`` (for example
   ``https://bioregistry.io/NCBITaxon:10090``) redirects to the canonical provider URL, which is
   the value to store in ``entity_uri``.

Keeping ``entity_id`` and ``entity_uri`` consistent in this way means a reader can both
recognize the registry from the compact ``entity_id`` and dereference the ``entity_uri`` to land
on an authoritative description of the term.
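
The lookup step described above can be sketched in a few lines of Python. This is only an
illustration and not part of HDMF or PyNWB: ``split_curie`` and ``bioregistry_lookup_url`` are
hypothetical helper names, and the code merely builds the ``https://bioregistry.io/<entity_id>``
address without contacting the network or following the redirect.

.. code-block:: python

    def split_curie(entity_id: str) -> tuple[str, str]:
        """Split a CURIE such as "NCBITaxon:10090" into (prefix, identifier)."""
        prefix, sep, identifier = entity_id.partition(":")
        if not sep or not prefix or not identifier:
            raise ValueError(f"not of the form prefix:identifier: {entity_id!r}")
        return prefix, identifier


    def bioregistry_lookup_url(entity_id: str) -> str:
        """Return the Bioregistry address to visit when looking up a CURIE."""
        split_curie(entity_id)  # reject malformed input before building the URL
        return f"https://bioregistry.io/{entity_id}"

Opening the returned address in a browser and copying the URL it lands on gives a candidate value
for ``entity_uri``.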

Commonly used registries
^^^^^^^^^^^^^^^^^^^^^^^^^

All of the registries below are registered with the Bioregistry. The ``entity_uri`` column shows
the canonical URL the example ``entity_id`` resolves to.

.. list-table::
   :header-rows: 1
   :widths: 10 16 22 20 32

   * - Prefix
     - Use for
     - Common NWB field(s)
     - Example ``entity_id``
     - Example ``entity_uri``
   * - ``NCBITaxon``
     - Species
     - ``Subject.species``
     - ``NCBITaxon:10090``
     - ``http://purl.obolibrary.org/obo/NCBITaxon_10090``
   * - ``ROR``
     - Organizations / institutions
     - ``NWBFile.institution``
     - ``ROR:013meh722``
     - ``https://ror.org/013meh722``
   * - ``ORCID``
     - People (researchers)
     - ``NWBFile.experimenter``
     - ``ORCID:0000-0002-1825-0097``
     - ``https://orcid.org/0000-0002-1825-0097``
   * - ``UBERON``
     - Brain regions (cross-species)
     - Brain-region location fields [#loc]_
     - ``UBERON:0001950``
     - ``http://purl.obolibrary.org/obo/UBERON_0001950``
   * - ``MBA``
     - Brain regions (Allen Mouse Brain Atlas)
     - Brain-region location fields [#loc]_
     - ``MBA:385``
     - ``https://purl.brain-bican.org/ontology/mbao/MBA_385``
   * - ``HBA``
     - Brain regions (Allen Human Brain Atlas)
     - Brain-region location fields [#loc]_
     - ``HBA:4005``
     - ``https://purl.brain-bican.org/ontology/hbao/HBA_4005``
   * - ``DANDI``
     - Dandisets
     - (identifies the dataset as a whole)
     - ``DANDI:000015``
     - ``https://dandiarchive.org/dandiset/000015``

.. [#loc] Brain-region annotations commonly apply to ``ElectrodeGroup.location``,
   ``ImagingPlane.location``, and the ``location`` column of the ``electrodes`` table.
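
For the prefixes in the table, the relationship between ``entity_id`` and ``entity_uri`` can be
written down as one URL template per prefix. The sketch below is an assumption-laden
illustration: the templates are inferred from the single example row for each prefix (they are
not taken from the Bioregistry itself), and ``URI_TEMPLATES`` and ``entity_uri_for`` are
hypothetical names, not HDMF or PyNWB API.

.. code-block:: python

    # URL templates inferred from the example rows of the table above (assumption)
    URI_TEMPLATES = {
        "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_{id}",
        "ROR": "https://ror.org/{id}",
        "ORCID": "https://orcid.org/{id}",
        "UBERON": "http://purl.obolibrary.org/obo/UBERON_{id}",
        "MBA": "https://purl.brain-bican.org/ontology/mbao/MBA_{id}",
        "HBA": "https://purl.brain-bican.org/ontology/hbao/HBA_{id}",
        "DANDI": "https://dandiarchive.org/dandiset/{id}",
    }


    def entity_uri_for(entity_id: str) -> str:
        """Expand a CURIE into the URL pattern shown for its prefix in the table."""
        prefix, _, identifier = entity_id.partition(":")
        if prefix not in URI_TEMPLATES or not identifier:
            raise ValueError(f"no URL template known for {entity_id!r}")
        return URI_TEMPLATES[prefix].format(id=identifier)

A helper like this keeps the two fields consistent when many references share a prefix. For a
prefix that is not listed, resolve the CURIE through the Bioregistry rather than guessing a
pattern.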

Example
^^^^^^^

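.. code-block:: python

    from hdmf.common import HERD

    # `nwbfile` is assumed to be an existing NWBFile whose Subject has
    # species="Mus musculus"
    herd = HERD()

    # the species of the subject, mapped to NCBI Taxonomy
    herd.add_ref(
        container=nwbfile.subject,
        attribute="species",
        key="Mus musculus",
        entity_id="NCBITaxon:10090",
        entity_uri="http://purl.obolibrary.org/obo/NCBITaxon_10090",
    )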

Resources without individually resolvable URLs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some resources do not provide a dereferenceable URL for each individual term. For example, many
brain atlases (such as the macaque **D99** atlas) publish a single document or download for the
whole atlas rather than one persistent URL per region.

In that case:

* Put the **URL of the resource as a whole** in ``entity_uri`` (e.g. the atlas's landing or
  download page).
* Put the resource's **identifier for the specific term** (for example, the brain area ID used
  by the atlas) in ``entity_id``.

This keeps every reference dereferenceable to *something* authoritative (the resource) while
still recording the precise term identifier, even when a per-term URL does not exist.

.. code-block:: python

    # a region from an atlas that has no per-region URL: identify the region by its
    # atlas-specific ID and point entity_uri at the atlas itself
    herd.add_ref(
        container=electrodes_table,
        attribute="location",
        key="area_42",
        entity_id="42",
        entity_uri="https://afni.nimh.nih.gov/pub/dist/atlases/macaque/D99_macaque/",
        # Review comment (Contributor): this URI doesn't resolve. Suggested change:
        # entity_uri="https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/nonhuman/macaque_tempatl/atlas_d99v2.html",
    )

.. seealso::

   :py:class:`HERD <hdmf.common.resources.HERD>` for the full API, and
   :py:meth:`HERD.add_ref <hdmf.common.resources.HERD.add_ref>` for adding references.
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -21,6 +21,7 @@ for each of those tasks and point you to the best tools to use for your preferre
conversion_tutorial/user_guide
file_read/file_read
extensions_tutorial/extensions_tutorial_home
external_resources_entity_guide
core_tools/core_tools_home

.. toctree::