Add HERD tutorial and DANDI streaming example#2200

Draft
rly wants to merge 19 commits into dev from herd-tutorial

@rly rly commented Jun 23, 2026

Contributor

Motivation

HERD (HDMF External Resources Data Structure) is now a stable hdmf-common type (no longer experimental), and NWB schema 2.10.0 added an optional slot for it at /general/external_resources. PyNWB had no HERD tutorial, and the only existing guidance (in hdmf) stopped at HERD.from_zip(...) without showing how to access the loaded data (hdmf-dev/hdmf#1325).

This adds:

  • docs/gallery/general/plot_external_resources.py (executed): a concise, NWB-focused tutorial that creates a HERD, annotates NWB objects (Subject.species, a DynamicTable column) with add_ref, stores the HERD inside the NWB file, round-trips it, and inspects the loaded data via to_dataframe() and the individual interlinked tables. It links out to the comprehensive hdmf HERD tutorial rather than duplicating it.
  • docs/gallery/general/resources_streaming.py (rendered but not executed in CI, following the streaming.py convention): a companion example that annotates multiple NWB files streamed from a DANDI dandiset with a single shared HERD.

It also removes the now-dead "HERD is experimental" warning filter from tests/unit/test_resources.py, since HERD no longer emits that warning (HERD._experimental == False).

This supersedes #1781, whose plot_resources.py was an unfinished port of the hdmf tutorial that predates HERD becoming non-experimental and storable in an NWB file. The streaming example here salvages and updates that PR's DANDI idea, so its author @mavaylon1 is credited as a co-author.

TODO:

How to test the behavior?

# Run the executed tutorial directly (no warnings, cleans up after itself):
python docs/gallery/general/plot_external_resources.py

# Run the HERD unit tests:
pytest tests/unit/test_resources.py

The streaming example reads over the network and is intentionally not executed during the docs build.

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked by running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

🤖 Generated with Claude Code

Add a concise, NWB-focused gallery tutorial for HERD (HDMF External
Resources Data Structure), which is now a stable hdmf-common type that can
be stored inside an NWB file at /general/external_resources. The tutorial
covers creating a HERD, annotating NWB objects with add_ref, storing the
HERD in the file, round-tripping it, and inspecting the loaded data via
to_dataframe() and the individual interlinked tables.

Add a companion non-executed example showing how to annotate multiple NWB
files streamed from a DANDI dandiset with a single HERD, salvaging the idea
from the stale #1781.

Remove the now-dead "HERD is experimental" warning filter from
test_resources.py, since HERD no longer emits that warning.

Co-Authored-By: Matthew Avaylon <22578631+mavaylon1@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov Bot commented Jun 23, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.29%. Comparing base (f7274a6) to head (3c46bd6).

Additional details and impacted files
@@           Coverage Diff           @@
##              dev    #2200   +/-   ##
=======================================
  Coverage   95.29%   95.29%           
=======================================
  Files          30       30           
  Lines        3039     3039           
  Branches      450      450           
=======================================
  Hits         2896     2896           
  Misses         87       87           
  Partials       56       56           
Flag Coverage Δ
integration 73.14% <ø> (ø)
unit 85.98% <ø> (ø)

@rly rly marked this pull request as draft June 23, 2026 02:47
rly and others added 9 commits June 22, 2026 19:47
Drop the copied gallery_thumbnails_external_resources.png (it showed the
add/remove-containers graphic) and the sphinx_gallery_thumbnail_path
directives. Sphinx-gallery falls back to its default placeholder until
dedicated thumbnails are authored in gallery_thumbnails.pptx.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add dedicated thumbnails for the two HERD tutorials (authored in
gallery_thumbnails.pptx) and wire them in via sphinx_gallery_thumbnail_path.
Rename the single-file tutorial heading to "Linking to External Resources
(HERD)" to match its thumbnail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a GALLERY_ORDER_END mechanism to the gallery sort key so the two HERD
tutorials appear together at the end of the General tutorials section,
after the alphabetically sorted galleries.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove the experimental/stable framing and the storage-location details
from the HERD tutorial intro and write/read section. Assign the HERD
directly via nwbfile.external_resources = HERD() instead of a one-off
variable. Replace the species-table example with annotating the electrodes
table location column against the Allen Mouse Brain CCFv3 (VISp, structure
385), and switch the subject to Mus musculus for consistency. Restructure
the read so the "Access the loaded data" narrative renders at top level
instead of being indented inside the IO context manager.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Unpack the entity_uri reuse logic from a ternary into explicit if/else with
comments, and annotate the species/experimenter values inline for clarity.
Add a section showing how to reload a saved HERD with from_zip and use it to
annotate the institution of a streamed file against its ROR identifier.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the GALLERY_ORDER_END mechanism with an explicit full ordering of the
general tutorials in GALLERY_ORDER, with the two HERD tutorials at the end.

Add the get_object_entities accessor to the HERD tutorial's read section,
commented out with a pointer to hdmf#1496, since it currently fails on a HERD
read back from a file. Keeping it commented keeps the executed tutorial (and
the RTD preview build) green until that fix is released.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
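
The explicit-ordering approach described in the commit above could look roughly like the following sketch. It sorts gallery files by their position in an ordered list and places anything unlisted afterwards in alphabetical order. The `gallery_sort_key` helper and the first two filenames are hypothetical; only the two HERD tutorial filenames come from this PR, and the real `GALLERY_ORDER` structure in the docs configuration may differ.

```python
# Hypothetical explicit ordering; only the last two names come from this PR.
GALLERY_ORDER = [
    "plot_intro.py",
    "plot_basics.py",
    "plot_external_resources.py",
    "resources_streaming.py",
]


def gallery_sort_key(filename):
    """Sort listed files by list position; unlisted files follow alphabetically."""
    if filename in GALLERY_ORDER:
        return (0, GALLERY_ORDER.index(filename), filename)
    return (1, 0, filename)


files = [
    "resources_streaming.py",
    "zzz_extra.py",
    "plot_external_resources.py",
    "plot_intro.py",
]
ordered = sorted(files, key=gallery_sort_key)
print(ordered)
```

With a full explicit ordering, every tutorial is listed, so the fallback branch only matters for files added later without updating the list.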
PowerPoint's "Save as Picture" pads exported PNGs with a transparent band
(here ~28px at the top) regardless of object placement, a long-standing
quirk. Trim the fully transparent margins so the thumbnails are tightly
cropped to the card, matching the other gallery thumbnails.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
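
The trimming step in the commit above amounts to cropping an image to the bounding box of its non-transparent pixels. The sketch below illustrates that logic on a plain nested list of alpha values, so it runs without an imaging library. The helper names are hypothetical, and this is not the script used for the thumbnails; with a library such as Pillow the same result is usually obtained by cropping to the alpha channel's bounding box.

```python
def opaque_bbox(alpha):
    """Return (left, top, right, bottom) of pixels with alpha > 0.

    `alpha` is a list of rows of alpha values. `right` and `bottom` are
    exclusive. Returns None if every pixel is fully transparent.
    """
    rows = [y for y, row in enumerate(alpha) if any(a > 0 for a in row)]
    if not rows:
        return None
    width = len(alpha[0])
    cols = [x for x in range(width) if any(row[x] > 0 for row in alpha)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def crop(alpha, bbox):
    """Crop the nested list to the given bounding box."""
    left, top, right, bottom = bbox
    return [row[left:right] for row in alpha[top:bottom]]


# A 5x4 alpha channel whose top two rows are fully transparent padding.
alpha = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [255, 255, 255, 255, 255],
    [255, 128, 255, 255, 255],
]
bbox = opaque_bbox(alpha)
trimmed = crop(alpha, bbox)
print(bbox, len(trimmed))
```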
Drop the explicit load_namespaces=True from the NWBHDF5IO calls since it is
the default. In the load-external-HERD example, show how to view the subject's
species annotation with get_object_entities.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After loading the HERD and adding the institution annotation, write it to a
new zip archive so the annotation is saved rather than left unused.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rly commented Jun 23, 2026

Contributor Author

@oruebel @bendichter two small parts of these tutorials require changes in HDMF (as mentioned above). I would appreciate your feedback on the tutorials while those issues get resolved.

bendichter commented Jun 23, 2026

Collaborator

Comment on lines +93 to +99
nwbfile.external_resources.add_ref(
    container=nwbfile.electrodes,
    attribute="location",
    key="VISp",
    entity_id="385",
    entity_uri="https://api.brain-map.org/api/v2/data/Structure/385.json",
)

Comment on lines +87 to +88
file=read_nwbfile,
container=read_nwbfile.subject,

Collaborator

I don't understand why it is necessary to provide the file arg here. Can't you resolve the file from the container directly?

Collaborator

We want to populate this automatically and error out when the file is not present.
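
A minimal sketch of the behavior requested in this thread: walk up a container's parent chain to find the enclosing file, and raise an error if the container has not been added to one. The `Node` class and `find_file` helper are hypothetical stand-ins for illustration, not the HDMF container classes or the eventual HERD implementation.

```python
class Node:
    """Toy stand-in for a container with a parent link (not the HDMF class)."""

    def __init__(self, name, parent=None, is_file=False):
        self.name = name
        self.parent = parent
        self.is_file = is_file


def find_file(container):
    """Walk up the parent chain and return the enclosing file node."""
    node = container
    while node is not None:
        if node.is_file:
            return node
        node = node.parent
    raise ValueError(f"'{container.name}' has not been added to a file")


nwbfile = Node("root", is_file=True)
subject = Node("subject", parent=nwbfile)
orphan = Node("orphan")

print(find_file(subject).name)
try:
    find_file(orphan)
except ValueError as err:
    print(err)
```

Resolving the file this way removes the need for a separate `file` argument, at the cost of requiring that a container be attached to a file before it is annotated.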

Comment on lines +80 to +85
if entity is not None:
    # the entity is already in the HERD, so reuse it and keep its existing URI
    entity_uri = None
else:
    # the entity is not yet in the HERD, so provide its URI to create it
    entity_uri = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10090"

@bendichter bendichter Jun 23, 2026

Collaborator

What happens if you just provide the entity uri again as the same string? Does it create a duplicate row?

@bendichter bendichter Jun 23, 2026

Collaborator

let's check to see if these cases are handled. We want to ensure that the tables are always normalized, so adding an identical entry should not duplicate a row
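
To make the normalization requirement in this thread concrete, here is a toy sketch of an entity table in which adding an identical entry returns the existing row instead of appending a duplicate. The `EntityTable` class is hypothetical and says nothing about how HERD currently behaves; that is the open question being raised. The entity ID and URI are the ones quoted in this PR.

```python
class EntityTable:
    """Toy normalized table: one row per unique entity_id (not HERD itself)."""

    def __init__(self):
        self.rows = []      # list of (entity_id, entity_uri) tuples
        self._index = {}    # entity_id -> row number

    def add(self, entity_id, entity_uri=None):
        """Return the row number for entity_id, inserting only if it is new."""
        if entity_id in self._index:
            idx = self._index[entity_id]
            existing_uri = self.rows[idx][1]
            if entity_uri is not None and entity_uri != existing_uri:
                raise ValueError(f"{entity_id} already has a different URI")
            return idx
        if entity_uri is None:
            raise ValueError("a new entity needs a URI")
        self.rows.append((entity_id, entity_uri))
        self._index[entity_id] = len(self.rows) - 1
        return self._index[entity_id]


uri = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10090"
table = EntityTable()
first = table.add("NCBI_TAXON:10090", uri)
second = table.add("NCBI_TAXON:10090", uri)   # same string again
third = table.add("NCBI_TAXON:10090")         # URI omitted on reuse
print(first, second, third, len(table.rows))
```

Under this design the caller never needs the explicit if/else around `entity_uri`: repeating the same URI and omitting it are both treated as reuse of the existing row.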

@bendichter

Collaborator

Can we talk about how we are using entity_id?

In some cases, you have a prefix:suffix compact ID. In some cases, this is resolvable in identifiers.org, e.g. ROR:013sk6x84: https://identifiers.org/resolve?query=ROR:013sk6x84 resolves to https://ror.org/013sk6x84. My understanding is this is how the prefix:suffix compact identifiers are meant to be used. DANDI works the same way: http://identifiers.org/DANDI:000015 resolves to https://dandiarchive.org/dandiset/000015.

However, there are some cases where this isn't working. For species, we are using NCBI_TAXON:10090, which does not resolve (https://identifiers.org/resolve?query=NCBI_TAXON:10090). However, taxonomy:10090 does resolve: https://identifiers.org/resolve?query=taxonomy:10090 goes to several links that all point to the mouse.

And then there's the neural data, where the entity_id is simply 385, with no prefix and no clear way to map it to any external resource without the URI.

It looks like there are 3 separate uses for entity_id here, and only one of them really makes sense to me: the one where the ID is resolvable at identifiers.org.

@bendichter

Collaborator

It looks like https://bioregistry.io/ resolves everything we want so far:

  • NCBITaxon: (species)
  • ROR: (organizations)
  • ORCID: (people)
  • MBA: (mouse brain atlas)
  • UBERON: (cross-species brain atlas)
  • HBA: (human brain atlas)

it also resolves DANDI:
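
The distinction drawn in this discussion, between prefix:suffix compact identifiers and bare IDs such as 385, can be checked mechanically. The sketch below splits a compact identifier and builds a resolver URL without making any network request. The helper names are hypothetical, the `https://bioregistry.io/<prefix>:<suffix>` URL pattern is an assumption rather than something stated in the thread, and pairing the listed prefixes with the IDs 10090 and 385 is illustrative only.

```python
def split_curie(entity_id):
    """Split 'prefix:suffix' into (prefix, suffix); reject bare IDs like '385'."""
    prefix, sep, suffix = entity_id.partition(":")
    if not sep or not prefix or not suffix:
        raise ValueError(f"{entity_id!r} is not a prefix:suffix compact identifier")
    return prefix, suffix


def bioregistry_url(entity_id):
    """Build a resolver URL (assumed URL pattern; no request is made)."""
    prefix, suffix = split_curie(entity_id)
    return f"https://bioregistry.io/{prefix}:{suffix}"


for candidate in ["ROR:013sk6x84", "NCBITaxon:10090", "MBA:385", "385"]:
    try:
        print(candidate, "->", bioregistry_url(candidate))
    except ValueError as err:
        print("skipped:", err)
```

A check like this only validates the shape of the identifier; whether a given prefix is actually registered still has to be confirmed against the registry.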

@bendichter

Collaborator

We need a page that is an explanation of best practices for using HERD in the context of NWB files. I'm going to draft it here, as I think this is where it should go, but I would also be OK with putting this in NWB Inspector or nwb.org.

Comment thread docs/gallery/general/plot_external_resources.py Outdated
Comment thread docs/gallery/general/resources_streaming.py Outdated
Comment thread docs/gallery/general/resources_streaming.py Outdated
bendichter added a commit to bendichter/hdmf that referenced this pull request Jun 23, 2026
The file is now always resolved automatically from the container's parent
hierarchy via _get_file_from_container, so an external reference can only be
added to a container that has already been added to a file. Passing a file
explicitly is no longer possible (or needed).

- add_ref: drop the `file` docval arg; always resolve from the container.
- add_ref_termset: drop the now-vestigial `file` arg (it only forwarded to
  add_ref, which no longer accepts it).
- Update the plot_external_resources gallery tutorial to the new API and
  adjust the surrounding prose about how the file is resolved.
- Update unit tests to parent each container to its file before add_ref.

Addresses NeurodataWithoutBorders/pynwb#2200 review:
NeurodataWithoutBorders/pynwb#2200 (comment)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# :py:class:`~pynwb.file.NWBFile`.

nwbfile.external_resources = HERD()

@oruebel oruebel Jun 24, 2026

Contributor

What happens here if the nwbfile already has an existing HERD (e.g., if the file was read from disk)?

We already use the pattern, where e.g., nwbfile.intracellular_recordings returns the object that exists (and returns None if it is missing) and nwbfile.get_intracellular_recordings() either returns the existing object or constructs a new one if it is missing.

Since there is only one HERD per file, I think we can simply use the same pattern here. I.e., use nwbfile.external_resources to access HERD and have nwbfile.get_external_resources construct a new HERD if it is missing.
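
The accessor pattern suggested here can be sketched as follows: a property that returns the existing object or None, plus a `get_` method that constructs the object on first use. `ToyFile` and `ExternalResources` are hypothetical placeholders, not PyNWB classes, and `get_external_resources` does not exist in PyNWB as of this discussion.

```python
class ExternalResources:
    """Placeholder for a HERD-like object (hypothetical, illustration only)."""


class ToyFile:
    """Toy file object mirroring the property / get_ accessor pattern."""

    def __init__(self):
        self._external_resources = None

    @property
    def external_resources(self):
        # Plain access never creates anything; returns None when missing.
        return self._external_resources

    def get_external_resources(self):
        # Return the existing object, or construct and attach one if missing.
        if self._external_resources is None:
            self._external_resources = ExternalResources()
        return self._external_resources


f = ToyFile()
print(f.external_resources)           # nothing attached yet
herd = f.get_external_resources()     # created on first call
print(f.get_external_resources() is herd, f.external_resources is herd)
```

The benefit for a file read from disk is that calling the `get_` method cannot silently replace a HERD that is already present.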

# files with a single HERD. For the full HERD API, see the
# `HDMF HERD tutorial <https://hdmf.readthedocs.io/en/stable/tutorials/plot_external_resources.html>`_.

os.remove(filename)

Contributor

Is it necessary to remove the file as part of the tutorial or does the build/pytest take care of the clean-up of files?

Comment on lines +149 to +156
###############################################################################
# View the individual tables:

read_herd.keys.to_dataframe()

###############################################################################

read_herd.entities.to_dataframe()

Contributor

I think this could be removed since it is repetitive with the "Inspect the HERD" section. Maybe the "Inspect HERD" section could be moved here to occur after read, which is the more common place where a user would likely need this too.

Comment on lines +93 to +94
nwbfile.external_resources.add_ref(
    container=nwbfile.electrodes,

Contributor

How does this look for ragged columns with a VectorIndex? In this case, do we annotate the VectorIndex or the VectorData it points to? I think currently a user would probably try to annotate the VectorIndex, which I think is fine, but I'm wondering whether that works with HERD (e.g., if it checks for the presence of a value and the VectorIndex holds ints)?
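
For context on this question, the sketch below shows the general layout of a ragged column: a flat list of values (the role VectorData plays) and a list of cumulative end offsets, one per row (the role VectorIndex plays). It uses plain Python lists with made-up values and does not exercise HERD; it only illustrates why a value lookup against the index would find integers rather than keys.

```python
# Flat values for all rows (the role VectorData plays); values are made up.
data = ["VISp", "CA1", "VISp", "DG", "CA3"]
# Cumulative end offsets, one per row (the role VectorIndex plays).
index = [2, 2, 5]


def get_row(data, index, i):
    """Return the values of row i from flat data and cumulative end offsets."""
    start = index[i - 1] if i > 0 else 0
    return data[start:index[i]]


rows = [get_row(data, index, i) for i in range(len(index))]
print(rows)

# The index holds only integers, so a key such as "VISp" can be found in
# the flat data but never in the index itself.
print("VISp" in data, "VISp" in index)
```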

rly and others added 5 commits June 26, 2026 14:49
- add_ref resolves the file from the container; drop the removed file argument
- get_object_entities now works on a HERD read back from a file
- use bioregistry CURIEs and resolvable entity URIs per nwb-overview guidance
- check each streamed metadata value before annotating it
- store the HERD in the file without removing it inline (cleaned up by test.py)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- exclude resources_streaming.py from the offline example tests
- add tests/read_dandi/read_dandi.py to run the dandi reads and the streaming
  tutorial, removing the files the tutorial generates
- remove external_resources_tutorial.nwb in clean_up_tests
- enable the daily schedule for the DANDI read workflow

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>