Add HERD tutorial and DANDI streaming example by rly · Pull Request #2200 · NeurodataWithoutBorders/pynwb

rly · 2026-06-23T02:37:44Z

Motivation

HERD (HDMF External Resources Data Structure) is now a stable hdmf-common type (no longer experimental), and NWB schema 2.10.0 added an optional slot for it at /general/external_resources. PyNWB had no HERD tutorial, and the only existing guidance (in hdmf) stopped at HERD.from_zip(...) without showing how to access the loaded data (hdmf-dev/hdmf#1325).

This adds:

docs/gallery/general/plot_external_resources.py (executed): a concise, NWB-focused tutorial that creates a HERD, annotates NWB objects (Subject.species, a DynamicTable column) with add_ref, stores the HERD inside the NWB file, round-trips it, and inspects the loaded data via to_dataframe() and the individual interlinked tables. It links out to the comprehensive hdmf HERD tutorial rather than duplicating it.
docs/gallery/general/resources_streaming.py (rendered but not executed in CI, following the streaming.py convention): a companion example that annotates multiple NWB files streamed from a DANDI dandiset with a single shared HERD.

It also removes the now-dead "HERD is experimental" warning filter from tests/unit/test_resources.py, since HERD no longer emits that warning (HERD._experimental == False).

This supersedes #1781, whose plot_resources.py was an unfinished port of the hdmf tutorial that predates HERD becoming non-experimental and storable in an NWB file. The streaming example here salvages and updates that PR's DANDI idea, so its author @mavaylon1 is credited as a co-author.

TODO:

Add thumbnails with source pptx
Update if "Fix HERD.get_object_entities failing on a HERD read from file" (hdmf-dev/hdmf#1497) is merged first
Requires "Fix HERD.from_zip to honor the type map" (hdmf-dev/hdmf#1506) to be merged

How to test the behavior?

# Run the executed tutorial directly (no warnings, cleans up after itself):
python docs/gallery/general/plot_external_resources.py

# Run the HERD unit tests:
pytest tests/unit/test_resources.py

The streaming example reads over the network and is intentionally not executed during the docs build.

Checklist

Did you update CHANGELOG.md with your changes?
Have you checked our Contributing document?
Have you ensured the PR clearly describes the problem and the solution?
Is your contribution compliant with our coding style? This can be checked running flake8 from the source directory.
Have you checked to ensure that there aren't other open Pull Requests for the same change?
Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

🤖 Generated with Claude Code

Add a concise, NWB-focused gallery tutorial for HERD (HDMF External Resources Data Structure), which is now a stable hdmf-common type that can be stored inside an NWB file at /general/external_resources. The tutorial covers creating a HERD, annotating NWB objects with add_ref, storing the HERD in the file, round-tripping it, and inspecting the loaded data via to_dataframe() and the individual interlinked tables.

Add a companion non-executed example showing how to annotate multiple NWB files streamed from a DANDI dandiset with a single HERD, salvaging the idea from the stale #1781.

Remove the now-dead "HERD is experimental" warning filter from test_resources.py, since HERD no longer emits that warning.

Co-Authored-By: Matthew Avaylon <22578631+mavaylon1@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov · 2026-06-23T02:38:40Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.29%. Comparing base (f7274a6) to head (3c46bd6).

Additional details and impacted files

@@           Coverage Diff           @@
##              dev    #2200   +/-   ##
=======================================
  Coverage   95.29%   95.29%           
=======================================
  Files          30       30           
  Lines        3039     3039           
  Branches      450      450           
=======================================
  Hits         2896     2896           
  Misses         87       87           
  Partials       56       56

| Flag | Coverage Δ |
|---|---|
| integration | `73.14% <ø> (ø)` |
| unit | `85.98% <ø> (ø)` |


Drop the copied gallery_thumbnails_external_resources.png (it showed the add/remove-containers graphic) and the sphinx_gallery_thumbnail_path directives. Sphinx-gallery falls back to its default placeholder until dedicated thumbnails are authored in gallery_thumbnails.pptx. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add dedicated thumbnails for the two HERD tutorials (authored in gallery_thumbnails.pptx) and wire them in via sphinx_gallery_thumbnail_path. Rename the single-file tutorial heading to "Linking to External Resources (HERD)" to match its thumbnail. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a GALLERY_ORDER_END mechanism to the gallery sort key so the two HERD tutorials appear together at the end of the General tutorials section, after the alphabetically sorted galleries. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Remove the experimental/stable framing and the storage-location details from the HERD tutorial intro and write/read section. Assign the HERD directly via nwbfile.external_resources = HERD() instead of a one-off variable. Replace the species-table example with annotating the electrodes table location column against the Allen Mouse Brain CCFv3 (VISp, structure 385), and switch the subject to Mus musculus for consistency. Restructure the read so the "Access the loaded data" narrative renders at top level instead of being indented inside the IO context manager. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Unpack the entity_uri reuse logic from a ternary into explicit if/else with comments, and annotate the species/experimenter values inline for clarity. Add a section showing how to reload a saved HERD with from_zip and use it to annotate the institution of a streamed file against its ROR identifier. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the GALLERY_ORDER_END mechanism with an explicit full ordering of the general tutorials in GALLERY_ORDER, with the two HERD tutorials at the end. Add the get_object_entities accessor to the HERD tutorial's read section, commented out with a pointer to hdmf#1496, since it currently fails on a HERD read back from a file. Keeping it commented keeps the executed tutorial (and the RTD preview build) green until that fix is released. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
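For readers unfamiliar with explicit gallery ordering, here is a minimal sketch of a sort key driven by a full ordering list. It is not the code in this PR: `GALLERY_ORDER_SKETCH`, `sort_key`, and the two `plot_*_example.py` names are hypothetical, and the fallback for unlisted files (sorted alphabetically after the listed ones) is an assumption.

```python
# Hypothetical explicit ordering; only the last two file names come from this PR.
GALLERY_ORDER_SKETCH = [
    "plot_first_example.py",       # placeholder name
    "plot_second_example.py",      # placeholder name
    "plot_external_resources.py",  # HERD tutorial
    "resources_streaming.py",      # HERD streaming example
]


def sort_key(filename):
    """Listed files sort by list position; unlisted files follow alphabetically."""
    try:
        return (0, GALLERY_ORDER_SKETCH.index(filename), "")
    except ValueError:
        # Assumption: a file missing from the list is placed after all listed files.
        return (1, 0, filename)


files = [
    "resources_streaming.py",
    "plot_second_example.py",
    "plot_external_resources.py",
    "plot_first_example.py",
]
ordered = sorted(files, key=sort_key)
```

With a full list, the position of every tutorial is visible in one place, which is the trade-off against a special "end" marker.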

PowerPoint's "Save as Picture" pads exported PNGs with a transparent band (here ~28px at the top) regardless of object placement, a long-standing quirk. Trim the fully transparent margins so the thumbnails are tightly cropped to the card, matching the other gallery thumbnails. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
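The trimming step can be sketched as a bounding-box crop over the alpha channel. This is only an illustration under stated assumptions: the image is represented as a list of rows of alpha values (0 means fully transparent), the helper names `opaque_bbox` and `crop_to_opaque` are hypothetical, and the commit does not say which tool was actually used. A real PNG would need an image library to decode and re-encode.

```python
def opaque_bbox(alpha_rows):
    """Return (left, top, right, bottom) around non-transparent pixels.

    right and bottom are exclusive. Returns None if every pixel is transparent.
    """
    rows = [y for y, row in enumerate(alpha_rows) if any(row)]
    if not rows:
        return None
    width = len(alpha_rows[0])
    cols = [x for x in range(width) if any(row[x] for row in alpha_rows)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def crop_to_opaque(alpha_rows):
    """Drop fully transparent margins on all four sides."""
    box = opaque_bbox(alpha_rows)
    if box is None:
        return []
    left, top, right, bottom = box
    return [row[left:right] for row in alpha_rows[top:bottom]]


# Two fully transparent rows on top, one transparent column on each side.
image_alpha = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 128, 0],
]
cropped = crop_to_opaque(image_alpha)
```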

Drop the explicit load_namespaces=True from the NWBHDF5IO calls since it is the default. In the load-external-HERD example, show how to view the subject's species annotation with get_object_entities. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

After loading the HERD and adding the institution annotation, write it to a new zip archive so the annotation is saved rather than left unused. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rly · 2026-06-23T08:44:34Z

@oruebel @bendichter two small parts of these tutorials require changes in HDMF (as mentioned above). I would appreciate your feedback on the tutorials while those issues get resolved.

bendichter · 2026-06-23T12:58:25Z

https://pynwb--2200.org.readthedocs.build/en/2200/tutorials/general/plot_external_resources.html

https://pynwb--2200.org.readthedocs.build/en/2200/tutorials/general/resources_streaming.html

bendichter · 2026-06-23T14:39:24Z

+nwbfile.external_resources.add_ref(
+    container=nwbfile.electrodes,
+    attribute="location",
+    key="VISp",
+    entity_id="385",
+    entity_uri="https://api.brain-map.org/api/v2/data/Structure/385.json",
+)


Would we want to use https://purl.brain-bican.org/ontology/mbao/MBA_385 instead?

bendichter · 2026-06-23T14:44:34Z

+                file=read_nwbfile,
+                container=read_nwbfile.subject,


I don't understand why it is necessary to provide the file arg here. Can't you resolve the file from the container directly?

We want to populate this automatically and raise an error when no file is present.

bendichter · 2026-06-23T14:47:34Z

+            if entity is not None:
+                # the entity is already in the HERD, so reuse it and keep its existing URI
+                entity_uri = None
+            else:
+                # the entity is not yet in the HERD, so provide its URI to create it
+                entity_uri = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10090"


What happens if you just provide the entity uri again as the same string? Does it create a duplicate row?

Let's check whether these cases are handled. We want the tables to always stay normalized, so adding an identical entry should not duplicate a row.
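To make the expected behavior concrete, here is a minimal sketch of an idempotent entity table. It is not the hdmf implementation: `EntityTableSketch` and its `add` method are hypothetical, and the choice to warn (rather than raise) on a URI mismatch is an assumption. The entity ID and URI are the ones used in this thread.

```python
import warnings


class EntityTableSketch:
    """Hypothetical stand-in for a normalized entities table (not the hdmf class)."""

    def __init__(self):
        self._rows = {}  # maps entity_id -> entity_uri, one row per entity

    def add(self, entity_id, entity_uri=None):
        """Add an entity once; re-adding the same ID never creates a second row."""
        if entity_id not in self._rows:
            if entity_uri is None:
                raise ValueError(f"entity_uri is required to create {entity_id!r}")
            self._rows[entity_id] = entity_uri
        elif entity_uri is not None and entity_uri != self._rows[entity_id]:
            # Same ID with a different URI: keep the stored URI and tell the caller.
            warnings.warn(f"{entity_id!r} already exists with a different URI")
        return entity_id, self._rows[entity_id]

    def __len__(self):
        return len(self._rows)


MOUSE_URI = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10090"

table = EntityTableSketch()
table.add("NCBI_TAXON:10090", MOUSE_URI)
table.add("NCBI_TAXON:10090", MOUSE_URI)  # identical entry: no new row, no warning
table.add("NCBI_TAXON:10090")             # URI omitted: the existing row is reused
```

Under this design the caller does not need the if/else shown in the diff, because passing the same URI again is harmless.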

bendichter · 2026-06-23T15:59:26Z

Can we talk about how we are using entity_id? In some cases, you have a prefix:suffix compact ID. In some cases, this is resolvable at identifiers.org, e.g. ROR:013sk6x84: https://identifiers.org/resolve?query=ROR:013sk6x84 resolves to https://ror.org/013sk6x84. My understanding is that this is how prefix:suffix compact identifiers are meant to be used. DANDI works the same way: http://identifiers.org/DANDI:000015 resolves to https://dandiarchive.org/dandiset/000015.

However, there are some cases where this isn't working. For species, we are using NCBI_TAXON:10090, which does not resolve (https://identifiers.org/resolve?query=NCBI_TAXON:10090). However, taxonomy:10090 does resolve: https://identifiers.org/resolve?query=taxonomy:10090 goes to several links that all point to the mouse.

And then there's the neural data, where the entity_id is simply 385, with no prefix and no clear way to map it to any external resource without the URI.

It looks like there are 3 separate uses for entity_id here, and only one of them really makes sense to me: the one where the ID is resolvable at identifiers.org.
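The three cases above can be told apart mechanically. The following is a minimal sketch, with hypothetical helper names `split_compact_id` and `identifiers_org_query`, that splits a compact ID and builds the identifiers.org query URL in the form quoted in this comment. It only constructs the URL; whether a given prefix actually resolves depends on the registry.

```python
from urllib.parse import quote


def split_compact_id(entity_id):
    """Return (prefix, suffix), or (None, entity_id) when there is no usable prefix."""
    prefix, sep, suffix = entity_id.partition(":")
    if not sep or not prefix or not suffix:
        return None, entity_id
    return prefix, suffix


def identifiers_org_query(entity_id):
    """Build the resolver query URL for a prefix:suffix compact ID."""
    prefix, _ = split_compact_id(entity_id)
    if prefix is None:
        raise ValueError(f"{entity_id!r} has no prefix, so no resolver URL can be derived")
    return "https://identifiers.org/resolve?query=" + quote(entity_id, safe=":")


ror_url = identifiers_org_query("ROR:013sk6x84")
bare = split_compact_id("385")  # the unprefixed atlas structure ID
```

A bare ID such as 385 fails the prefix check, which is the situation described for the brain-structure annotation.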

bendichter · 2026-06-23T17:54:16Z

It looks like https://bioregistry.io/ resolves everything we want so far:

NCBITaxon: (species)
ROR: (organizations)
ORCID: (people)
MBA: (mouse brain atlas)
UBERON: (cross-species brain atlas)
HBA: (human brain atlas)

it also resolves DANDI:

bendichter · 2026-06-23T17:59:26Z

We need a page that is an explanation of best practices for using HERD in the context of NWB files. I'm going to draft it here, as I think this is where it should go, but I would also be OK with putting this in NWB Inspector or nwb.org.

The file is now always resolved automatically from the container's parent hierarchy via _get_file_from_container, so an external reference can only be added to a container that has already been added to a file. Passing a file explicitly is no longer possible (or needed).

- add_ref: drop the `file` docval arg; always resolve from the container.
- add_ref_termset: drop the now-vestigial `file` arg (it only forwarded to add_ref, which no longer accepts it).
- Update the plot_external_resources gallery tutorial to the new API and adjust the surrounding prose about how the file is resolved.
- Update unit tests to parent each container to its file before add_ref.

Addresses NeurodataWithoutBorders/pynwb#2200 review: NeurodataWithoutBorders/pynwb#2200 (comment)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
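As an illustration of resolving the file from the parent hierarchy, here is a minimal sketch. It is not the body of _get_file_from_container: `NodeSketch`, its `is_file` flag, and `find_file` are hypothetical names, and raising ValueError for an unparented container is an assumption.

```python
class NodeSketch:
    """Hypothetical container with a link to its parent."""

    def __init__(self, name, parent=None, is_file=False):
        self.name = name
        self.parent = parent
        self.is_file = is_file


def find_file(container):
    """Walk up the .parent links until a file root is found."""
    node = container
    while node is not None:
        if node.is_file:
            return node
        node = node.parent
    # No file anywhere above the container: it was never added to a file.
    raise ValueError(f"{container.name!r} has not been added to a file yet")


root = NodeSketch("root", is_file=True)
electrodes = NodeSketch("electrodes", parent=root)
location = NodeSketch("location", parent=electrodes)
orphan = NodeSketch("orphan")

resolved = find_file(location)
```

This is why the updated unit tests parent each container to its file first: an orphaned container has nothing to resolve against.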

oruebel · 2026-06-24T20:23:56Z

+# :py:class:`~pynwb.file.NWBFile`.
+
+nwbfile.external_resources = HERD()
+


What happens here if the nwbfile already has an existing HERD (e.g., if the file was read from disk)?

We already use this pattern elsewhere: nwbfile.intracellular_recordings returns the existing object (or None if it is missing), while nwbfile.get_intracellular_recordings() either returns the existing object or constructs a new one if it is missing.

Since there is only one HERD per file, I think we can use the same pattern here: nwbfile.external_resources accesses the HERD, and nwbfile.get_external_resources() constructs a new HERD if it is missing.
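The proposed accessor pair can be sketched as follows. `FileSketch` and `HerdSketch` are hypothetical stand-ins, not PyNWB classes, and `get_external_resources` is the method suggested in this comment rather than a confirmed API.

```python
class HerdSketch:
    """Hypothetical placeholder for a HERD object."""


class FileSketch:
    """Hypothetical file object that holds at most one HERD."""

    def __init__(self, external_resources=None):
        self._external_resources = external_resources

    @property
    def external_resources(self):
        # Plain accessor: returns the existing object, or None if it is missing.
        return self._external_resources

    def get_external_resources(self):
        # Get-or-create accessor: never replaces an object that already exists.
        if self._external_resources is None:
            self._external_resources = HerdSketch()
        return self._external_resources


existing = HerdSketch()
read_from_disk = FileSketch(external_resources=existing)
new_file = FileSketch()
```

With this pattern, calling the get-or-create accessor on a file read from disk returns the stored HERD instead of overwriting it.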

oruebel · 2026-06-24T20:30:01Z

+# files with a single HERD. For the full HERD API, see the
+# `HDMF HERD tutorial <https://hdmf.readthedocs.io/en/stable/tutorials/plot_external_resources.html>`_.
+
+os.remove(filename)


Is it necessary to remove the file as part of the tutorial or does the build/pytest take care of the clean-up of files?

oruebel · 2026-06-24T20:32:21Z

+###############################################################################
+# View the individual tables:
+
+read_herd.keys.to_dataframe()
+
+###############################################################################
+
+read_herd.entities.to_dataframe()


I think this could be removed since it is repetitive with the "Inspect the HERD" section. Maybe the "Inspect HERD" section could be moved here to occur after read, which is the more common place where a user would likely need this too.

oruebel · 2026-06-24T20:46:05Z

+nwbfile.external_resources.add_ref(
+    container=nwbfile.electrodes,


How does this look for ragged columns with a VectorIndex? In that case, do we annotate the VectorIndex or the VectorData it points to? I think a user would currently try to annotate the `VectorIndex`, which I think is fine, but I'm wondering whether that works with HERD (e.g., if it checks for the presence of a value and the VectorIndex holds ints).
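For context on why this matters, here is a minimal sketch of the ragged-column layout: the values live in a flat data array and the index stores the end offset of each row, so the index contains only integers. The helper names `build_ragged` and `get_row` are hypothetical, and the sketch does not answer which of the two objects HERD expects.

```python
def build_ragged(rows):
    """Flatten rows into (data, index), where index[i] is the end offset of row i."""
    data, index = [], []
    for row in rows:
        data.extend(row)
        index.append(len(data))
    return data, index


def get_row(data, index, i):
    """Recover row i from the flat data using the end offsets."""
    start = index[i - 1] if i > 0 else 0
    return data[start:index[i]]


# Three rows with one, two, and zero values.
data, index = build_ragged([["VISp"], ["VISp", "CA1"], []])
```

A check that looks for a key such as "VISp" among the index values would find only offsets, whereas the same check against the flat data would succeed.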

- add_ref resolves the file from the container; drop the removed file argument
- get_object_entities now works on a HERD read back from a file
- use bioregistry CURIEs and resolvable entity URIs per nwb-overview guidance
- check each streamed metadata value before annotating it
- store the HERD in the file without removing it inline (cleaned up by test.py)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- exclude resources_streaming.py from the offline example tests
- add tests/read_dandi/read_dandi.py to run the dandi reads and the streaming tutorial, removing the files the tutorial generates
- remove external_resources_tutorial.nwb in clean_up_tests
- enable the daily schedule for the DANDI read workflow

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rly mentioned this pull request Jun 23, 2026

Add HERD gallery to pynwb with streaming #1781

Closed

9 tasks

rly marked this pull request as draft June 23, 2026 02:47

rly and others added 9 commits June 22, 2026 19:47

Persist the updated HERD in the streaming tutorial

e750039


rly requested review from bendichter and oruebel June 23, 2026 08:26

This was referenced Jun 23, 2026

HERD: add a custom repr/_repr_html_ so a read-back HERD does not appear empty hdmf-dev/hdmf#1508

Closed

HERD.add_ref: default key to a scalar attribute's value to remove redundant argument hdmf-dev/hdmf#1509

Closed

rly mentioned this pull request Jun 23, 2026

Add HERD repr/_repr_html_ surfacing references as a flattened table hdmf-dev/hdmf#1510

Merged

6 tasks

bendichter reviewed Jun 23, 2026

View reviewed changes

Comment thread docs/gallery/general/plot_external_resources.py Outdated

Comment thread docs/gallery/general/resources_streaming.py Outdated

Comment thread docs/gallery/general/resources_streaming.py Outdated

Merge branch 'dev' into herd-tutorial

b3d1320

bendichter mentioned this pull request Jun 23, 2026

Remove the file argument from HERD.add_ref; resolve the file automatically hdmf-dev/hdmf#1512

Merged

This was referenced Jun 23, 2026

HERD.add_ref: only warn on entity_uri mismatch, not on re-passing the same URI hdmf-dev/hdmf#1513

Merged

Add guide on choosing entity_id and entity_uri for HERD references #2206

Closed

bendichter added 3 commits June 24, 2026 11:05

Update docs/gallery/general/plot_external_resources.py

34ed3e2

Update docs/gallery/general/resources_streaming.py

c30f82b

Update docs/gallery/general/resources_streaming.py

0233613

oruebel reviewed Jun 24, 2026

View reviewed changes

rly mentioned this pull request Jun 24, 2026

Add guide on choosing entity_id and entity_uri for HERD references NeurodataWithoutBorders/nwb-overview#192

Open

oruebel reviewed Jun 24, 2026

View reviewed changes

rly and others added 5 commits June 26, 2026 14:49

Merge remote-tracking branch 'origin/dev' into herd-tutorial

05573ae

Merge branch 'dev' into herd-tutorial

3ffa634

Merge remote-tracking branch 'origin/herd-tutorial' into herd-tutorial

3c46bd6
