Add HERD tutorial and DANDI streaming example #2200
Add a concise, NWB-focused gallery tutorial for HERD (HDMF External Resources Data Structure), which is now a stable hdmf-common type that can be stored inside an NWB file at /general/external_resources. The tutorial covers creating a HERD, annotating NWB objects with add_ref, storing the HERD in the file, round-tripping it, and inspecting the loaded data via to_dataframe() and the individual interlinked tables.

Add a companion non-executed example showing how to annotate multiple NWB files streamed from a DANDI dandiset with a single HERD, salvaging the idea from the stale #1781.

Remove the now-dead "HERD is experimental" warning filter from test_resources.py, since HERD no longer emits that warning.

Co-Authored-By: Matthew Avaylon <22578631+mavaylon1@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## dev #2200 +/- ##
=======================================
Coverage 95.29% 95.29%
=======================================
Files 30 30
Lines 3039 3039
Branches 450 450
=======================================
Hits 2896 2896
Misses 87 87
Partials 56 56
Drop the copied gallery_thumbnails_external_resources.png (it showed the add/remove-containers graphic) and the sphinx_gallery_thumbnail_path directives. Sphinx-gallery falls back to its default placeholder until dedicated thumbnails are authored in gallery_thumbnails.pptx. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add dedicated thumbnails for the two HERD tutorials (authored in gallery_thumbnails.pptx) and wire them in via sphinx_gallery_thumbnail_path. Rename the single-file tutorial heading to "Linking to External Resources (HERD)" to match its thumbnail. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a GALLERY_ORDER_END mechanism to the gallery sort key so the two HERD tutorials appear together at the end of the General tutorials section, after the alphabetically sorted galleries. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove the experimental/stable framing and the storage-location details from the HERD tutorial intro and write/read section. Assign the HERD directly via nwbfile.external_resources = HERD() instead of a one-off variable. Replace the species-table example with annotating the electrodes table location column against the Allen Mouse Brain CCFv3 (VISp, structure 385), and switch the subject to Mus musculus for consistency. Restructure the read so the "Access the loaded data" narrative renders at top level instead of being indented inside the IO context manager. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Unpack the entity_uri reuse logic from a ternary into explicit if/else with comments, and annotate the species/experimenter values inline for clarity. Add a section showing how to reload a saved HERD with from_zip and use it to annotate the institution of a streamed file against its ROR identifier. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the GALLERY_ORDER_END mechanism with an explicit full ordering of the general tutorials in GALLERY_ORDER, with the two HERD tutorials at the end. Add the get_object_entities accessor to the HERD tutorial's read section, commented out with a pointer to hdmf#1496, since it currently fails on a HERD read back from a file. Keeping it commented keeps the executed tutorial (and the RTD preview build) green until that fix is released. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
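For readers unfamiliar with explicit gallery ordering, the idea can be sketched as a sort key that ranks files by their position in a list. This is only an illustration, not the code in this PR: the function name `gallery_sort_key`, the structure of `GALLERY_ORDER`, and the two placeholder file names are assumptions; only the two HERD file names come from the PR description.

```python
from pathlib import Path

# Hypothetical ordering table keyed by gallery section.
# Only the last two file names are taken from this PR.
GALLERY_ORDER = {
    "general": [
        "plot_file.py",                # placeholder name (assumption)
        "plot_read_basics.py",         # placeholder name (assumption)
        "plot_external_resources.py",
        "resources_streaming.py",
    ],
}


def gallery_sort_key(path: str) -> tuple:
    """Rank listed files by list position; unlisted files sort last, alphabetically."""
    p = Path(path)
    order = GALLERY_ORDER.get(p.parent.name, [])
    if p.name in order:
        return (0, order.index(p.name), p.name)
    return (1, 0, p.name)


files = [
    "general/resources_streaming.py",
    "general/plot_read_basics.py",
    "general/plot_external_resources.py",
    "general/plot_file.py",
]
ordered = sorted(files, key=gallery_sort_key)
```

With a full explicit list, the position of every tutorial is visible in one place, which is the stated reason for dropping the separate end-of-section mechanism.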
PowerPoint's "Save as Picture" pads exported PNGs with a transparent band (here ~28px at the top) regardless of object placement, a long-standing quirk. Trim the fully transparent margins so the thumbnails are tightly cropped to the card, matching the other gallery thumbnails. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
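The trimming step amounts to finding the bounding box of all pixels whose alpha is non-zero and cropping to it. The sketch below shows that logic on a plain nested list standing in for an alpha channel; it is not the script used for the thumbnails, and the helper names `opaque_bbox` and `crop` are hypothetical.

```python
def opaque_bbox(alpha):
    """Return (left, top, right, bottom) of the non-transparent region.

    `right` and `bottom` are exclusive. Returns None if every pixel is transparent.
    """
    rows = [y for y, row in enumerate(alpha) if any(a > 0 for a in row)]
    if not rows:
        return None
    width = len(alpha[0])
    cols = [x for x in range(width) if any(row[x] > 0 for row in alpha)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def crop(alpha, bbox):
    """Keep only the pixels inside the bounding box."""
    left, top, right, bottom = bbox
    return [row[left:right] for row in alpha[top:bottom]]


# Toy 4x5 alpha channel with a fully transparent band of two rows at the top.
alpha = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [255, 255, 255, 255],
    [255, 128, 128, 255],
    [255, 255, 255, 255],
]
bbox = opaque_bbox(alpha)
trimmed = crop(alpha, bbox)
```

Only fully transparent margins are removed; partially transparent pixels (alpha 128 above) count as content and are kept.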
Drop the explicit load_namespaces=True from the NWBHDF5IO calls since it is the default. In the load-external-HERD example, show how to view the subject's species annotation with get_object_entities. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After loading the HERD and adding the institution annotation, write it to a new zip archive so the annotation is saved rather than left unused. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@oruebel @bendichter two small parts of these tutorials require changes in HDMF (as mentioned above). I would appreciate your feedback on the tutorials while those issues get resolved.
    nwbfile.external_resources.add_ref(
        container=nwbfile.electrodes,
        attribute="location",
        key="VISp",
        entity_id="385",
        entity_uri="https://api.brain-map.org/api/v2/data/Structure/385.json",
    )
Would we want to use https://purl.brain-bican.org/ontology/mbao/MBA_385 instead?
        file=read_nwbfile,
        container=read_nwbfile.subject,
I don't understand why it is necessary to provide the file arg here. Can't you resolve the file from the container directly?
We want to populate this automatically and raise an error when the file is not present.
    if entity is not None:
        # the entity is already in the HERD, so reuse it and keep its existing URI
        entity_uri = None
    else:
        # the entity is not yet in the HERD, so provide its URI to create it
        entity_uri = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10090"
What happens if you just provide the entity uri again as the same string? Does it create a duplicate row?
Let's check whether these cases are handled. We want to ensure that the tables are always normalized, so adding an identical entry should not duplicate a row.
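To make the expected behavior concrete, here is a toy sketch of an entity table where re-adding an identical entry is a no-op. It is not HDMF's implementation; the class `ToyEntityTable` and its `add` method are hypothetical, and raising on a conflicting URI is one possible policy among several.

```python
class ToyEntityTable:
    """Minimal stand-in for a normalized entity table (hypothetical, not HDMF)."""

    def __init__(self):
        self.rows = []      # list of (entity_id, entity_uri)
        self._index = {}    # entity_id -> row index

    def add(self, entity_id, entity_uri=None):
        """Return the row index for entity_id, creating the row only if needed."""
        if entity_id in self._index:
            idx = self._index[entity_id]
            existing_uri = self.rows[idx][1]
            # Same URI (or no URI) reuses the row; a different URI is a conflict.
            if entity_uri is not None and entity_uri != existing_uri:
                raise ValueError(f"{entity_id} already has URI {existing_uri}")
            return idx
        if entity_uri is None:
            raise ValueError(f"{entity_id} is new, so a URI is required")
        self.rows.append((entity_id, entity_uri))
        self._index[entity_id] = len(self.rows) - 1
        return self._index[entity_id]


uri = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10090"
table = ToyEntityTable()
first = table.add("NCBI_TAXON:10090", uri)
second = table.add("NCBI_TAXON:10090", uri)   # identical entry, no new row
third = table.add("NCBI_TAXON:10090")         # URI omitted, row reused
```

If the real table behaved this way, the tutorial would not need the explicit `if`/`else` around `entity_uri`; passing the same URI again would be harmless.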
Can we talk about how we are using entity_id? In some cases, you have a prefix:suffix compact ID. In some cases, this is resolvable in identifiers.org, e.g. ROR:013sk6x84: https://identifiers.org/resolve?query=ROR:013sk6x84 resolves to https://ror.org/013sk6x84. My understanding is this is how the prefix:suffix compact identifiers are meant to be used. DANDI works the same way: http://identifiers.org/DANDI:000015 resolves to https://dandiarchive.org/dandiset/000015.

However, there are some cases where this isn't working. For species, we are using NCBI_TAXON:10090, which does not resolve (https://identifiers.org/resolve?query=NCBI_TAXON:10090). However, taxonomy:10090 does resolve: https://identifiers.org/resolve?query=taxonomy:10090 goes to several links that all point to the mouse.

And then there's the neural data, where the entity_id is simply 385, with no prefix and no clear way to map this to any external resource without the URI.

It looks like there are 3 separate uses for entity_id here, and only one of them really makes sense to me: the one where the ID is resolvable at identifiers.org.
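The prefix:suffix convention can be illustrated with a small offline expander. This is a sketch only: the function `expand_curie` and the `PREFIX_MAP` table are hypothetical, and the two base URLs are simply the targets quoted above rather than anything fetched from a registry.

```python
# Hypothetical prefix table; the base URLs are the resolution targets quoted above.
PREFIX_MAP = {
    "ROR": "https://ror.org/",
    "DANDI": "https://dandiarchive.org/dandiset/",
}


def expand_curie(curie, prefix_map=PREFIX_MAP):
    """Expand a prefix:suffix compact identifier to a URI, or raise ValueError."""
    prefix, sep, suffix = curie.partition(":")
    if not sep or not prefix or not suffix:
        raise ValueError(f"{curie!r} is not a prefix:suffix compact identifier")
    if prefix not in prefix_map:
        raise ValueError(f"unknown prefix {prefix!r}")
    return prefix_map[prefix] + suffix


ror_uri = expand_curie("ROR:013sk6x84")
dandi_uri = expand_curie("DANDI:000015")
```

A bare `385` fails the first check because it has no prefix, which is the third case described above: without the URI there is nothing to say which resource the number belongs to.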
It looks like https://bioregistry.io/ resolves everything we want so far:
It also resolves DANDI:
We need a page that is an explanation of best practices for using HERD in the context of NWB files. I'm going to draft it here, as I think this is where it should go, but I would also be OK with putting this in NWB Inspector or nwb.org.
The file is now always resolved automatically from the container's parent hierarchy via _get_file_from_container, so an external reference can only be added to a container that has already been added to a file. Passing a file explicitly is no longer possible (or needed).

- add_ref: drop the `file` docval arg; always resolve from the container.
- add_ref_termset: drop the now-vestigial `file` arg (it only forwarded to add_ref, which no longer accepts it).
- Update the plot_external_resources gallery tutorial to the new API and adjust the surrounding prose about how the file is resolved.
- Update unit tests to parent each container to its file before add_ref.

Addresses NeurodataWithoutBorders/pynwb#2200 review: NeurodataWithoutBorders/pynwb#2200 (comment)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
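Resolving the file from the parent hierarchy can be pictured as walking up the `parent` links until a file-like root is found. The sketch below uses a toy `Node` class and a hypothetical `find_file` helper; it does not reproduce `_get_file_from_container` and makes no claim about how HDMF identifies the root.

```python
class Node:
    """Toy container with a parent link (hypothetical, not an HDMF class)."""

    def __init__(self, name, parent=None, is_file=False):
        self.name = name
        self.parent = parent
        self.is_file = is_file


def find_file(container):
    """Walk up the parent chain and return the first file-like ancestor."""
    node = container
    while node is not None:
        if node.is_file:
            return node
        node = node.parent
    # No ancestor is a file: the container was never added to one.
    raise ValueError(f"{container.name!r} has not been added to a file")


nwbfile = Node("root", is_file=True)
subject = Node("subject", parent=nwbfile)
orphan = Node("orphan_table")

resolved = find_file(subject)
```

Calling `find_file(orphan)` raises `ValueError`, which mirrors the constraint stated above: a reference can only be added to a container that already belongs to a file.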
    # :py:class:`~pynwb.file.NWBFile`.

    nwbfile.external_resources = HERD()
What happens here if the nwbfile already has an existing HERD (e.g., if the file was read from disk)?
We already use this pattern elsewhere: nwbfile.intracellular_recordings returns the object if it exists (and None if it is missing), while nwbfile.get_intracellular_recordings() either returns the existing object or constructs a new one if it is missing.
Since there is only one HERD per file, I think we can simply use the same pattern here. I.e., use nwbfile.external_resources to access HERD and have nwbfile.get_external_resources construct a new HERD if it is missing.
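The suggested accessor pair could look roughly like the following. This is a toy sketch with stand-in classes `ToyHERD` and `ToyFile`; `get_external_resources` is the method proposed in this comment, not an existing PyNWB API.

```python
class ToyHERD:
    """Stand-in for HERD (hypothetical)."""


class ToyFile:
    """Stand-in for NWBFile showing the attribute / get_* accessor pattern."""

    def __init__(self):
        # Plain attribute access returns None until a HERD exists.
        self.external_resources = None

    def get_external_resources(self):
        """Return the existing HERD, constructing one only if it is missing."""
        if self.external_resources is None:
            self.external_resources = ToyHERD()
        return self.external_resources


f = ToyFile()
before = f.external_resources            # None: nothing created yet
herd = f.get_external_resources()        # constructs the HERD
same = f.get_external_resources()        # returns the same object
```

The benefit over assigning `HERD()` directly is that a file read from disk keeps its existing annotations instead of having them replaced by an empty HERD.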
    # files with a single HERD. For the full HERD API, see the
    # `HDMF HERD tutorial <https://hdmf.readthedocs.io/en/stable/tutorials/plot_external_resources.html>`_.

    os.remove(filename)
Is it necessary to remove the file as part of the tutorial or does the build/pytest take care of the clean-up of files?
    ###############################################################################
    # View the individual tables:

    read_herd.keys.to_dataframe()

    ###############################################################################

    read_herd.entities.to_dataframe()
I think this could be removed since it is repetitive with the "Inspect the HERD" section. Maybe the "Inspect HERD" section could be moved here to occur after read, which is the more common place where a user would likely need this too.
    nwbfile.external_resources.add_ref(
        container=nwbfile.electrodes,
How does this look for ragged columns with a `VectorIndex`? Do we in this case annotate the `VectorIndex` or the `VectorData` it points to? I think currently a user would probably try to annotate the `VectorIndex`, which I think is fine, but I'm wondering whether that works with HERD (e.g., if it checks for the presence of a value and the `VectorIndex` holds ints)?
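For context on why this matters: in a ragged column the index holds cumulative end offsets (integers), while the annotated strings live in the data it points to. The sketch below shows that layout with plain lists; the helper `ragged_row` and the example values are hypothetical, and it says nothing about what HERD actually checks.

```python
def ragged_row(data, index, i):
    """Return row i of a ragged column stored as flat data plus end offsets."""
    start = index[i - 1] if i > 0 else 0
    return data[start:index[i]]


# Flat values (what a key such as "VISp" would have to match against) ...
data = ["VISp", "VISl", "CA1", "VISp"]
# ... and cumulative end offsets, one per table row (integers only).
index = [2, 3, 4]

rows = [ragged_row(data, index, i) for i in range(len(index))]
```

A check like "is the key present in the column's values" would succeed against `data` but fail against `index`, which is why the choice of which object to annotate could affect whether `add_ref` accepts the key.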
- add_ref resolves the file from the container; drop the removed file argument
- get_object_entities now works on a HERD read back from a file
- use bioregistry CURIEs and resolvable entity URIs per nwb-overview guidance
- check each streamed metadata value before annotating it
- store the HERD in the file without removing it inline (cleaned up by test.py)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- exclude resources_streaming.py from the offline example tests
- add tests/read_dandi/read_dandi.py to run the dandi reads and the streaming tutorial, removing the files the tutorial generates
- remove external_resources_tutorial.nwb in clean_up_tests
- enable the daily schedule for the DANDI read workflow

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Motivation
HERD (HDMF External Resources Data Structure) is now a stable `hdmf-common` type (no longer experimental), and NWB schema 2.10.0 added an optional slot for it at `/general/external_resources`. PyNWB had no HERD tutorial, and the only existing guidance (in hdmf) stopped at `HERD.from_zip(...)` without showing how to access the loaded data (hdmf-dev/hdmf#1325).

This adds:

- `docs/gallery/general/plot_external_resources.py` (executed): a concise, NWB-focused tutorial that creates a HERD, annotates NWB objects (`Subject.species`, a `DynamicTable` column) with `add_ref`, stores the HERD inside the NWB file, round-trips it, and inspects the loaded data via `to_dataframe()` and the individual interlinked tables. It links out to the comprehensive hdmf HERD tutorial rather than duplicating it.
- `docs/gallery/general/resources_streaming.py` (rendered but not executed in CI, following the `streaming.py` convention): a companion example that annotates multiple NWB files streamed from a DANDI dandiset with a single shared HERD.

It also removes the now-dead `"HERD is experimental"` warning filter from `tests/unit/test_resources.py`, since HERD no longer emits that warning (`HERD._experimental == False`).

This supersedes #1781, whose `plot_resources.py` was an unfinished port of the hdmf tutorial that predates HERD becoming non-experimental and storable in an NWB file. The streaming example here salvages and updates that PR's DANDI idea, so its author @mavaylon1 is credited as a co-author.

TODO:
How to test the behavior?
The streaming example reads over the network and is intentionally not executed during the docs build.
Checklist
`flake8` from the source directory.

🤖 Generated with Claude Code