Commit 076f02b

Cass Dalton and claude committed
Add HDF5 metadata sidecar support (hdf5-meta extension)
Implements the optional hdf5-meta extension: a columnar HDF5 sidecar alongside the .sigmf-meta JSON for faster, smaller column-oriented metadata access on Recordings with large captures/annotations arrays.

- sigmf/hdf5.py: writer (write_hdf5_sidecar), full-dict reader (read_hdf5_sidecar), and SigMFFileHDF5, a lazy, columnar reader that serves captures/annotations as numpy columns/arrays without building per-row dicts (the actual speedup). Entry points hdf5.open (zero JSON) and hdf5.fromfile (discover via one JSON read, prefer fresh sidecar).
- SigMFFile.tofile(write_hdf5=True): writes the sidecar and declares the extension. sigmf.fromfile is unchanged and always reads pure JSON, so existing behavior and the authoritative-JSON contract are preserved.
- Stale-sidecar guard via a source-metadata digest; h5py is an optional dependency (pip install sigmf[hdf5]), lazily imported.
- tests/test_hdf5.py: round-trip, columnar reads, discovery, stale/corrupt/missing fallback, edge cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent a3db459 commit 076f02b
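
The stale-sidecar guard described in the commit message can be pictured roughly as follows. This is a minimal sketch, not the code in sigmf/hdf5.py: the helper names `metadata_digest` and `sidecar_is_fresh`, the choice of SHA-256, and hashing the raw file bytes are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def metadata_digest(meta_path):
    """Hypothetical helper: SHA-256 hex digest of the raw .sigmf-meta bytes."""
    return hashlib.sha256(Path(meta_path).read_bytes()).hexdigest()


def sidecar_is_fresh(meta_path, stored_digest):
    """Hypothetical check: the sidecar is usable only if the digest recorded
    when it was written still matches the current JSON metadata."""
    return metadata_digest(meta_path) == stored_digest


def demo():
    """Write a metadata file, record its digest, then edit the file."""
    with tempfile.TemporaryDirectory() as tmp:
        meta = Path(tmp) / "recording.sigmf-meta"
        meta.write_text('{"global": {}, "captures": [], "annotations": []}')
        stored = metadata_digest(meta)  # a writer would save this in the sidecar
        fresh_before = sidecar_is_fresh(meta, stored)
        # Editing the JSON afterwards makes the recorded digest stale.
        meta.write_text('{"global": {"core:author": "x"}, "captures": [], "annotations": []}')
        fresh_after = sidecar_is_fresh(meta, stored)
        return fresh_before, fresh_after
```

Under these assumptions, a reader that finds a mismatched digest would fall back to the JSON, which stays authoritative.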

7 files changed

Lines changed: 1071 additions & 1 deletion

README.md

Lines changed: 27 additions & 0 deletions
@@ -61,6 +61,33 @@ meta.tofile("recording")
 meta.tofile("recording.sigmf.gz")
 ```

+### HDF5 metadata sidecar (optional)
+
+For recordings with very large `captures`/`annotations`, the optional
+`hdf5-meta` extension can write a columnar HDF5 sidecar next to the
+`.sigmf-meta` file. The JSON metadata stays complete and authoritative; the
+sidecar is a smaller, faster cache for column-oriented access. Requires the
+optional `h5py` dependency: `pip install sigmf[hdf5]`.
+
+```python
+import sigmf
+from sigmf import hdf5
+
+# write the sidecar alongside the JSON (declares the hdf5-meta extension)
+meta.tofile("recording", write_hdf5=True) # also writes recording.sigmf-meta.h5
+
+# fast columnar read: open ONLY the sidecar, no JSON parsing, no per-row dicts
+with hdf5.open("recording.sigmf-meta.h5") as fast:
+    starts = fast.annotations_column("core:sample_start") # numpy column
+    table = fast.annotations_array() # structured array
+
+# or discover via the JSON once, then prefer the sidecar when present & fresh
+fast = hdf5.fromfile("recording.sigmf-meta") # SigMFFileHDF5 if usable, else SigMFFile
+
+# the standard reader is unchanged and always reads pure JSON
+meta = sigmf.fromfile("recording.sigmf-meta")
+```
+
 ### Docs

**[Please visit our documentation for full API reference and more info.](https://sigmf.readthedocs.io/en/latest/)**
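
To illustrate why column-oriented access avoids per-row dictionaries, the sketch below pivots SigMF-style annotation dicts into one list per key using only the standard library. It does not use h5py or numpy and is not how `SigMFFileHDF5` is implemented; `to_columns` is a hypothetical helper, and filling missing keys with `None` is an assumption.

```python
def to_columns(rows):
    """Pivot a list of dicts into {key: [value per row]}, None where a key is missing."""
    keys = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)  # keep first-seen key order
    return {key: [row.get(key) for row in rows] for key in keys}


# Two example annotations; the second one has no label.
annotations = [
    {"core:sample_start": 0, "core:sample_count": 100, "core:label": "a"},
    {"core:sample_start": 100, "core:sample_count": 50},
]
columns = to_columns(annotations)
starts = columns["core:sample_start"]  # one contiguous column, no per-row dicts
```

Once the data is stored by column, reading `core:sample_start` for every annotation touches a single sequence instead of constructing one dict per row.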

pyproject.toml

Lines changed: 4 additions & 0 deletions
@@ -38,11 +38,15 @@ dependencies = [
 sigmf_validate = "sigmf.validate:main"
 sigmf_convert = "sigmf.convert.__main__:main"
 [project.optional-dependencies]
+hdf5 = [
+    "h5py", # for the optional hdf5-meta metadata sidecar
+]
 test = [
     "ruff",
     "pytest",
     "pytest-cov",
     "hypothesis", # next-gen testing framework
+    "h5py", # exercise the optional hdf5-meta sidecar in tests
 ]

[tool.setuptools]
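
Because `h5py` sits behind the `hdf5` extra, the commit message notes that it is imported lazily. One common way to do that is sketched below; `require_optional` is a hypothetical helper and the error wording is an assumption, so the actual module may handle this differently.

```python
import importlib


def require_optional(module_name, extra):
    """Import module_name on first use, or raise an ImportError naming the pip extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install sigmf[{extra}]"
        ) from exc
```

A function that needs the dependency would call `require_optional("h5py", "hdf5")` inside its body, so importing the package itself never fails when `h5py` is absent.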

sigmf/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@
     archive,
     archivereader,
     error,
+    hdf5,
     keys,
     schema,
     siggen,
