Add guide on choosing entity_id and entity_uri for HERD references #2206
Closed
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,138 @@ | ||
| .. _external_resources_entity_guide: | ||
|
|
||
| Choosing ``entity_id`` and ``entity_uri`` for external references | ||
| ================================================================= | ||
|
|
||
| When you annotate data with an external resource using | ||
| :py:meth:`HERD.add_ref <hdmf.common.resources.HERD.add_ref>`, each reference records two | ||
| fields that identify the external term: | ||
|
|
||
| ``entity_id`` | ||
| A compact identifier (a `CURIE <https://www.w3.org/TR/curie/>`_) of the form | ||
| ``prefix:identifier`` (e.g. ``NCBITaxon:10090``). The ``prefix`` names the registry or | ||
| ontology and the ``identifier`` is the term's accession within it. | ||
|
|
||
| ``entity_uri`` | ||
| The full URL that the ``entity_id`` resolves to: a persistent, dereferenceable web | ||
| address for that exact term. | ||
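To make the ``prefix:identifier`` structure concrete, here is a minimal sketch that splits an ``entity_id`` into its two parts. The helper name ``split_curie`` is hypothetical (it is not part of HDMF), and splitting on the first colon is an assumption about how the CURIE is laid out.

```python
def split_curie(entity_id: str) -> tuple[str, str]:
    """Split a CURIE of the form ``prefix:identifier`` into (prefix, identifier).

    Hypothetical helper for illustration; not part of the HERD API.
    """
    # Split on the first colon only, so identifiers may themselves contain colons.
    prefix, sep, identifier = entity_id.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not of the form prefix:identifier: {entity_id!r}")
    return prefix, identifier


prefix, identifier = split_curie("NCBITaxon:10090")
print(prefix, identifier)
```

A value with no colon, such as a bare atlas region ID, raises ``ValueError`` here, which can be a useful signal that the value is not a CURIE.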
|
|
||
| Recommended practice | ||
| --------------------- | ||
|
|
||
| #. **Use a CURIE for** ``entity_id``. Prefer an identifier whose ``prefix`` is registered with | ||
| `bioregistry.io <https://bioregistry.io>`_. The Bioregistry is a comprehensive registry of | ||
| prefixes that maps each CURIE to a canonical, resolvable URL, which avoids the ambiguity of | ||
| the many overlapping identifier schemes (e.g. ``NCBITaxon`` vs. ``taxonomy`` vs. | ||
| ``NCBI_TAXON``). | ||
|
|
||
| #. **Use the resolved URL for** ``entity_uri``. The ``entity_uri`` should be the URL that the | ||
| CURIE resolves to. You can look this up by resolving the CURIE through the Bioregistry: | ||
| visiting ``https://bioregistry.io/<entity_id>`` (for example | ||
| ``https://bioregistry.io/NCBITaxon:10090``) redirects to the canonical provider URL, which is | ||
| the value to store in ``entity_uri``. | ||
|
|
||
| Keeping ``entity_id`` and ``entity_uri`` consistent in this way means a reader can both | ||
| recognize the registry from the compact ``entity_id`` and dereference the ``entity_uri`` to land | ||
| on an authoritative description of the term. | ||
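The lookup step described above can be sketched as follows. The function name ``bioregistry_lookup_url`` is hypothetical; it only builds the ``https://bioregistry.io/<entity_id>`` address mentioned in the text and makes no network request. The value to store in ``entity_uri`` is the URL that this address redirects to, not the lookup address itself.

```python
BIOREGISTRY_BASE = "https://bioregistry.io/"


def bioregistry_lookup_url(entity_id: str) -> str:
    """Return the Bioregistry address to visit in order to resolve a CURIE.

    Hypothetical helper for illustration; it does not follow the redirect.
    """
    prefix, sep, identifier = entity_id.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not of the form prefix:identifier: {entity_id!r}")
    return BIOREGISTRY_BASE + entity_id


print(bioregistry_lookup_url("NCBITaxon:10090"))
```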
|
|
||
| Commonly used registries | ||
| ------------------------- | ||
|
|
||
| All of the registries below are registered with the Bioregistry. The ``entity_uri`` column shows | ||
| the canonical URL the example ``entity_id`` resolves to. | ||
|
|
||
| .. list-table:: | ||
|
|
||
|    :header-rows: 1 | ||
|    :widths: 10 16 22 20 32 | ||
|
|
||
|    * - Prefix | ||
|      - Use for | ||
|      - Common NWB field(s) | ||
|      - Example ``entity_id`` | ||
|      - Example ``entity_uri`` | ||
|    * - ``NCBITaxon`` | ||
|      - Species | ||
|      - ``Subject.species`` | ||
|      - ``NCBITaxon:10090`` | ||
|      - ``http://purl.obolibrary.org/obo/NCBITaxon_10090`` | ||
|    * - ``ROR`` | ||
|      - Organizations / institutions | ||
|      - ``NWBFile.institution`` | ||
|      - ``ROR:013meh722`` | ||
|      - ``https://ror.org/013meh722`` | ||
|    * - ``ORCID`` | ||
|      - People (researchers) | ||
|      - ``NWBFile.experimenter`` | ||
|      - ``ORCID:0000-0002-1825-0097`` | ||
|      - ``https://orcid.org/0000-0002-1825-0097`` | ||
|    * - ``UBERON`` | ||
|      - Brain regions (cross-species) | ||
|      - Brain-region location fields [#loc]_ | ||
|      - ``UBERON:0001950`` | ||
|      - ``http://purl.obolibrary.org/obo/UBERON_0001950`` | ||
|    * - ``MBA`` | ||
|      - Brain regions (Allen Mouse Brain Atlas) | ||
|      - Brain-region location fields [#loc]_ | ||
|      - ``MBA:385`` | ||
|      - ``https://purl.brain-bican.org/ontology/mbao/MBA_385`` | ||
|    * - ``HBA`` | ||
|      - Brain regions (Allen Human Brain Atlas) | ||
|      - Brain-region location fields [#loc]_ | ||
|      - ``HBA:4005`` | ||
|      - ``https://purl.brain-bican.org/ontology/hbao/HBA_4005`` | ||
|    * - ``DANDI`` | ||
|      - Dandisets | ||
|      - (identifies the dataset as a whole) | ||
|      - ``DANDI:000015`` | ||
|      - ``https://dandiarchive.org/dandiset/000015`` | ||
|
|
||
| .. [#loc] Brain-region annotations commonly apply to ``ElectrodeGroup.location``, | ||
| ``ImagingPlane.location``, and the ``location`` column of the ``electrodes`` table. | ||
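The pairing of ``entity_id`` and ``entity_uri`` in the table can be sketched as a small lookup. The URL templates below are inferred from the example rows only, on the assumption that each registry uses the same pattern for all of its identifiers. Resolving through the Bioregistry remains the authoritative way to obtain the URL. The names ``URI_TEMPLATES`` and ``expand_entity_id`` are hypothetical.

```python
# URL templates inferred from the example rows in the table (an assumption:
# each registry is taken to follow the same pattern for every identifier).
URI_TEMPLATES = {
    "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_{id}",
    "ROR": "https://ror.org/{id}",
    "ORCID": "https://orcid.org/{id}",
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_{id}",
    "MBA": "https://purl.brain-bican.org/ontology/mbao/MBA_{id}",
    "HBA": "https://purl.brain-bican.org/ontology/hbao/HBA_{id}",
    "DANDI": "https://dandiarchive.org/dandiset/{id}",
}


def expand_entity_id(entity_id: str) -> str:
    """Build a candidate entity_uri for an entity_id using the templates above."""
    prefix, sep, identifier = entity_id.partition(":")
    if not sep or not identifier:
        raise ValueError(f"not of the form prefix:identifier: {entity_id!r}")
    if prefix not in URI_TEMPLATES:
        raise KeyError(f"no URL template known for prefix {prefix!r}")
    return URI_TEMPLATES[prefix].format(id=identifier)


print(expand_entity_id("UBERON:0001950"))
```

Prefixes are matched case-sensitively here, so ``ncbitaxon:10090`` would be rejected. Whether to normalize case is a design choice left open in this sketch.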
|
|
||
| Example | ||
| ------- | ||
|
|
||
| .. code-block:: python | ||
|
|
||
|     # Assumes herd is an existing HERD instance and nwbfile is an NWBFile with a Subject. | ||
|     # The species of the subject, mapped to NCBI Taxonomy. | ||
|     herd.add_ref( | ||
|         container=nwbfile.subject, | ||
|         attribute="species", | ||
|         key="Mus musculus", | ||
|         entity_id="NCBITaxon:10090", | ||
|         entity_uri="http://purl.obolibrary.org/obo/NCBITaxon_10090", | ||
|     ) | ||
|
|
||
| Resources without individually resolvable URLs | ||
| ---------------------------------------------- | ||
|
|
||
| Some resources do not provide a dereferenceable URL for each individual term. For example, many | ||
| brain atlases (such as the macaque **D99** atlas) publish a single document or download for the | ||
| whole atlas rather than one persistent URL per region. | ||
|
|
||
| In that case: | ||
|
|
||
| * Put the **URL of the resource as a whole** in ``entity_uri`` (e.g. the atlas's landing or | ||
| download page). | ||
| * Put the resource's **identifier for the specific term** (for example, the brain area ID used | ||
| by the atlas) in ``entity_id``. | ||
|
|
||
| This keeps every reference dereferenceable to *something* authoritative (the resource) while | ||
| still recording the precise term identifier, even when a per-term URL does not exist. | ||
|
|
||
| .. code-block:: python | ||
|
|
||
|     # Assumes herd is an existing HERD instance and electrodes_table is the electrodes table. | ||
|     # A region from an atlas that has no per-region URL: identify the region by its | ||
|     # atlas-specific ID and point entity_uri at the atlas itself. | ||
|     herd.add_ref( | ||
|         container=electrodes_table, | ||
|         attribute="location", | ||
|         key="area_42", | ||
|         entity_id="42", | ||
|         entity_uri="https://afni.nimh.nih.gov/pub/dist/atlases/macaque/D99_macaque/", | ||
|     ) | ||
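The fallback rule for such resources can be sketched as a small helper that prefers a per-term URL when one exists and otherwise uses the URL of the resource as a whole. The function name ``choose_reference_fields`` and its parameters are hypothetical and not part of the HERD API. The returned dictionary simply uses the two field names that ``add_ref`` takes in the examples of this guide.

```python
from typing import Optional


def choose_reference_fields(
    term_id: str, resource_url: str, term_url: Optional[str] = None
) -> dict:
    """Pick entity_id and entity_uri for a term (hypothetical helper).

    term_url is the per-term URL if the resource provides one; when it is
    None, entity_uri falls back to the URL of the resource as a whole.
    """
    entity_uri = term_url if term_url is not None else resource_url
    return {"entity_id": term_id, "entity_uri": entity_uri}


# Atlas with no per-region URL: entity_uri points at the atlas itself.
fields = choose_reference_fields(
    "42", "https://afni.nimh.nih.gov/pub/dist/atlases/macaque/D99_macaque/"
)
print(fields)
```

The resulting dictionary could then be unpacked into a call such as ``herd.add_ref(container=..., attribute=..., key=..., **fields)``.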
|
|
||
| .. seealso:: | ||
|
|
||
| :py:class:`HERD <hdmf.common.resources.HERD>` for the full API, and | ||
| :py:meth:`HERD.add_ref <hdmf.common.resources.HERD.add_ref>` for adding references. | ||