
Commit 90f1ab4

[#282]:svarga:docs, decorate H5D public surface + clean group titles
Parallel of the H5A decoration sweep for the dataset (H5D*) family. Most H5D files already had alias-first docstrings from the prior sweep -- this turn focuses on the gaps (H5Dread, H5Dappend, H5Dsparse, H5Dopen) and on shortening the @defgroup titles across the API so the rendered Topics / Modules tree reads cleanly.

Unlike H5A, the H5D public surface does not use SFINAE on the parent type -- every signature takes a concrete h5::fd_t / h5::ds_t -- so the H5CPP_*_RET return-type macro pattern from H5A does not apply here. The lowercase hid_t template-param rename does not apply either.

DOCSTRINGS

* H5Dread.hpp -- 9 overloads modernised from the older chained-aliases format to the full template (one-liner @brief, detailed description, \par_* parameter aliases, @throws, @code example, \sa_h5cpp / \sa_hdf5, @sa cross-refs):
  - read(ds, T* ptr, args...) -- low-level raw pointer
  - read(fd, path, T* ptr, args...)
  - read(file_path, path, T* ptr, args...)
  - read(ds, T& ref, args...) -- primary by-reference path
  - read(fd, path, T& ref, args...)
  - read(file_path, path, T& ref, args...)
  - T read(ds, args...) -- return-by-value primary
  - T read(fd, path, args...)
  - T read(file_path, path, args...)
  Each docstring distinguishes its overload's role (primary vs convenience) and points at sibling overloads through @sa.

* H5Dappend.hpp -- 4 public entry points decorated:
  - append(pt, const T&) -- buffered element append
  - append(pt, const T*) -- raw-chunk path
  - flush(pt) -- explicit chunk flush
  - reset(pt) -- dimension-tracker reset
  Includes streaming-loop @code example showing the create / append / flush flow against an extendable chunked dataset.

* H5Dsparse.hpp -- both public entry points decorated:
  - write<T, LOC>(parent, path, sparse_src)
  - read<T, LOC>(parent, path)
  Documents the CSC group layout, scipy / 10x / Loompy interop, the uint32-on-disk index width limit, the sync() / makeCompressed() preconditions, and the ColMajor static_assert. Cross-references the Supported Linear Algebra Types page § Sparse storage layout.

* H5Dopen.hpp -- single public entry point modernised:
  - open(fd, path, dapl) -- with note on the high-throughput pipeline tag auto-initialisation path.

GROUP TITLES (h5cpp/H5config.hpp)

* Renamed the @defgroup titles so the rendered group page titles read as short noun phrases instead of the inline-signature form the project used historically:
  - io-create "template <T> ds_t create( ... );" -> "HDF5 datasets -- create"
  - io-read "h5::read<T>( ds | path [,offset]...)" -> "HDF5 datasets -- read"
  - io-write "herr_t h5::write<T>( ds | path, object<T>...)" -> "HDF5 datasets -- write"
  - io-append "h5::append<T>( pt , T object);" -> "HDF5 packet table -- append"
  - io-wrap "`handle` | `type_id` with RAII" -> "RAII handles"
  - file-io "`h5::open` | `h5::create` | `h5::mute` | `h5::unmute`" -> "HDF5 files"

* Added @defgroup sparse-io ("HDF5 sparse datasets") so the func_sparse_hdr alias actually produces a group page (was previously a dangling reference). Mirrors the attribute-io fix from the H5A sweep.

* The attribute-io title rename ("HDF5 attributes") landed in the H5A commit and is unchanged here.

ALIAS VOCABULARY

* docs/links/h5cpp.txt + docs/aliases.md catalog -- added \func_append_hdr (-> @ingroup io-append) alongside the existing func_*_hdr family. Now used by the H5Dappend.hpp public surface.

VERIFICATION

* End-to-end compile + run on the H5D public surface: create<float> with current_dims{10,10}, write a vector, read it back, append 32 samples through h5::pt_t with chunk{16}, flush. All operations return 0, no diagnostic output.

* Doxygen build clean -- no warnings.log produced.
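
The buffered append / explicit flush flow listed for H5Dappend.hpp can be illustrated without HDF5. The following is a minimal sketch, not h5cpp's implementation: `chunk_buffer`, `stream_samples` and their members are hypothetical names, and the "dataset" is a plain `std::vector` of chunks. It mirrors the verification scenario of 32 samples appended with a chunk size of 16.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a packet-table handle: elements are buffered
// until a full chunk is available, then committed as one unit.
template <class T>
class chunk_buffer {
public:
    explicit chunk_buffer(std::size_t chunk_size) : chunk_size_(chunk_size) {
        pending_.reserve(chunk_size_);
    }

    // Buffered element append: commits only when the buffer holds a full chunk.
    void append(const T& value) {
        pending_.push_back(value);
        if (pending_.size() == chunk_size_) commit();
    }

    // Explicit flush: commits a partially filled chunk, if there is one.
    void flush() {
        if (!pending_.empty()) commit();
    }

    std::size_t chunks_written() const { return chunks_.size(); }
    std::size_t elements_written() const { return written_; }

private:
    void commit() {
        written_ += pending_.size();
        chunks_.push_back(std::move(pending_));
        pending_.clear();              // reset the moved-from buffer to a known state
        pending_.reserve(chunk_size_);
    }

    std::size_t chunk_size_;
    std::size_t written_ = 0;
    std::vector<T> pending_;
    std::vector<std::vector<T>> chunks_;  // stands in for the chunked dataset
};

// Streams `count` samples through the buffer, then flushes the remainder.
inline chunk_buffer<float> stream_samples(std::size_t count, std::size_t chunk_size) {
    chunk_buffer<float> pt(chunk_size);
    for (std::size_t i = 0; i < count; ++i) pt.append(static_cast<float>(i));
    pt.flush();
    return pt;
}
```

With 32 samples and a chunk size of 16 the final flush has nothing left to commit; with a sample count that is not a multiple of the chunk size, the flush is what writes the tail.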
1 parent 0504008 commit 90f1ab4

17 files changed

Lines changed: 770 additions & 506 deletions

docs/aliases.md

Lines changed: 5 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -139,11 +139,12 @@ Conventions:
139139

140140
| Alias | Expands to | Use on |
141141
|---|---|---|
142-
| `\func_read_hdr` | `@ingroup io-read` | every `h5::read` overload |
143-
| `\func_write_hdr` | `@ingroup io-write` | every `h5::write` overload |
144-
| `\func_create_hdr` | `@ingroup io-create` | every `h5::create` overload |
142+
| `\func_read_hdr` | `@ingroup datasets` | every `h5::read` overload |
143+
| `\func_write_hdr` | `@ingroup datasets` | every `h5::write` overload |
144+
| `\func_create_hdr` | `@ingroup datasets` | every `h5::create` overload |
145+
| `\func_append_hdr` | `@ingroup datasets` | `h5::append` / `h5::flush` / `h5::reset` (packet table) |
145146
| `\func_attr_hdr` | `@ingroup attribute-io` | every `h5::aread` / `h5::awrite` |
146-
| `\func_sparse_hdr` | `@ingroup sparse-io` | sparse read/write overloads |
147+
| `\func_sparse_hdr` | `@ingroup datasets` | sparse read/write overloads |
147148
| `\func_async_hdr` | `@ingroup async-io` | `h5::async::*` factories |
148149
| `\func_traversal_hdr` | `@ingroup traversal` | `h5::ls` / `h5::dfs` / `h5::bfs` |
149150
| `\func_read_desc` | one-line read description | combine with `func_read_hdr` |

docs/links/h5cpp.txt

Lines changed: 11 additions & 4 deletions
@@ -82,16 +82,23 @@ ALIASES += returns_ref="@return `h5::reference_t` handle (rule-of-five RAII; thr
8282
ALIASES += returns_paths="@return `std::vector<std::string>` of object paths relative to the start node"
8383

8484
############ FUNCTION GROUP HEADERS
85-
ALIASES += func_create_hdr="\ingroup io-create"
85+
# All dataset-side overloads land in the single `datasets` group (HDF5
86+
# datasets) — mirrors how `attribute-io` carries the full attribute
87+
# surface. The per-operation aliases (\func_read_hdr, \func_write_hdr,
88+
# etc.) are kept as semantic markers in source so a reader can tell at
89+
# a glance which surface a function belongs to; the @ingroup target is
90+
# uniform on the rendered side.
91+
ALIASES += func_create_hdr="\ingroup datasets"
8692
ALIASES += func_create_links="\sa_h5cpp \sa_hdf5 \sa_stl \sa_linalg"
8793

88-
ALIASES += func_write_hdr="\ingroup io-write"
94+
ALIASES += func_write_hdr="\ingroup datasets"
8995
ALIASES += func_write_desc="Write an object into an HDF5 dataset. The dataset must exist or be created first with `h5::create`."
9096

91-
ALIASES += func_read_hdr="\ingroup io-read"
97+
ALIASES += func_read_hdr="\ingroup datasets"
9298
ALIASES += func_read_desc="Read data from an HDF5 dataset into memory. Optional offset/stride/count/block arguments select a hyperslab; omitting them reads the entire dataset."
9399

100+
ALIASES += func_append_hdr="\ingroup datasets"
94101
ALIASES += func_attr_hdr="\ingroup attribute-io"
95-
ALIASES += func_sparse_hdr="\ingroup sparse-io"
102+
ALIASES += func_sparse_hdr="\ingroup datasets"
96103
ALIASES += func_async_hdr="\ingroup async-io"
97104
ALIASES += func_traversal_hdr="\ingroup traversal"
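
How the uniform group target looks from the source side can be sketched as follows. The function is hypothetical and only stands in for a dataset-side overload; the point is the docstring, where the per-operation alias marks the surface while Doxygen, given the `ALIASES` lines in this hunk, files the function under the `datasets` group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

/**
 * \func_append_hdr
 * @brief Append one element to an in-memory stream (illustrative only).
 *
 * The alias on the first line expands to an ingroup command targeting
 * `datasets`, so this function would be listed on the same group page as
 * functions tagged with the read or write header aliases.
 *
 * @param stream destination container (hypothetical stand-in for a packet table)
 * @param value  element to append
 * @return number of elements held after the append
 */
template <class T>
std::size_t append(std::vector<T>& stream, const T& value) {
    stream.push_back(value);
    return stream.size();
}

} // namespace sketch
```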

docs/reports/architecture/h5cpp-async-mode-compile-time-thread-safety-design.md

Lines changed: 0 additions & 64 deletions
@@ -1,16 +1,5 @@
11
@page reports_async_mode_thread_safety h5cpp Async Mode — Compile-Time Thread-Safety via Type-Level Mode Discrimination
22

3-
**Author:** Winston (System Architect)
4-
**Date:** 2026-05-17
5-
**Scope:** Add a thread-safe operating mode to h5cpp, opt-in at file-open time, enforced at compile
6-
time via the existing `hid_t<..., false, false, ...>` type specialization. Mode is binary: classic
7-
(seamless C API mixing, single-threaded) versus async (executor-routed, compile-time block on C
8-
API). No HDF5 rebuild, no VOL connector, no source-compat impact for classic users.
9-
**Related issues:** h5cpp #239 (v1.12 back-compat — must land first); follows
10-
[[h5cpp-threaded-pipeline-sigma-queue-design]] which addressed compression parallelism.
11-
12-
---
13-
143
## 1. Problem Statement
154

165
h5cpp guarantees seamless mixing of typed RAII descriptors with raw HDF5 C API calls. This is the
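
The mixing of typed descriptors with raw C calls rests on a descriptor converting implicitly to the raw handle. A minimal sketch of that idea, and of how a second mode could withhold the conversion at compile time, is given below. All names (`raw_id`, `mode`, `descriptor`, `c_api_call`) are hypothetical and do not reproduce h5cpp's `hid_t` template.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

namespace sketch {

using raw_id = std::int64_t;  // stand-in for the C library's handle type

enum class mode { classic, async };

template <mode M> class descriptor;

// Classic mode: implicit conversion lets the descriptor be passed to C calls.
template <> class descriptor<mode::classic> {
public:
    explicit descriptor(raw_id id) : id_(id) {}
    operator raw_id() const { return id_; }
private:
    raw_id id_;
};

// Async mode: no conversion is declared, so passing it to a C call
// is rejected by the compiler rather than failing at run time.
template <> class descriptor<mode::async> {
public:
    explicit descriptor(raw_id id) : id_(id) {}
    bool valid() const { return id_ >= 0; }
private:
    raw_id id_;
};

// Stand-in for a raw C API function taking a handle.
inline raw_id c_api_call(raw_id id) { return id; }

static_assert(std::is_convertible<descriptor<mode::classic>, raw_id>::value,
              "classic descriptors mix with the C API");
static_assert(!std::is_convertible<descriptor<mode::async>, raw_id>::value,
              "async descriptors are blocked from the C API at compile time");

} // namespace sketch
```

In this sketch `c_api_call(descriptor<mode::async>(7))` does not compile, which is the property the two `static_assert` lines check.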
@@ -420,56 +409,3 @@ CMake target asserts these fail with the expected diagnostic via
420409
|| Users who need *both* thread-safety and raw C mixing simultaneously have no answer here. (Neither does HDF5.) |
421410
|| Executor lifetime requires care — `std::shared_ptr` ownership through all derived descriptors. |
422411

423-
## 10. Positioning
424-
425-
This feature is **conference-talk material** the day it ships:
426-
427-
- "Compile-time mode separation for HDF5 thread safety" — CppCon, Meeting C++, ACCU
428-
- Concrete demo: same code, two modes, compiler enforces the contract
429-
- Comparison to HDF5's `--enable-threadsafe` build (global mutex, custom build) and to alternatives
430-
(kdb+ pricing, ArcticDB language constraints)
431-
- Pitch: "h5cpp is the only C++ HDF5 library that makes thread safety a first-class, compile-time
432-
property"
433-
434-
Aligns with the broader Vargalabs positioning discussed in
435-
[[h5cpp-product-positioning]] — H5CPP as the credibility-building loss leader, with each major
436-
feature drop generating a fresh talk-and-blog cycle.
437-
438-
## 11. Open Questions for Steven
439-
440-
1. **Default tag name.** `h5::async` is concise but overloaded with other meanings (coroutines,
441-
futures). Alternatives: `h5::concurrent`, `h5::threaded`, `h5::mt`, `h5::thread_safe`. Each has
442-
trade-offs. Prefer `h5::async` for brevity; flag if there's a clash with an existing symbol.
443-
444-
2. **Mode escape hatch.** Should there be *any* way to obtain a raw `::hid_t` from an `async_fd_t`
445-
for special cases? An explicit `fd.unsafe_handle()` that returns `::hid_t` and documents the
446-
thread-safety contract is loud enough to be safe. Default position: provide it, name it clearly,
447-
document it as "you are now responsible for executor coordination."
448-
449-
3. **`H5ES` integration.** HDF5 ≥ 1.13 has native event-set async (`H5Dread_async`, etc.). Should
450-
async mode optionally route through these instead of our executor? Pro: less code we write. Con:
451-
HDF5 floor moves to 1.13, more complexity, dual code paths. Default position: skip for now,
452-
revisit if user demand surfaces.
453-
454-
4. **`pt_t<async_ds_t>` API symmetry.** Should `h5::append(async_pt, item)` block per-item, or
455-
batch locally and only block on chunk commit (like classic `pt_t`)? Strongly prefer the latter
456-
for performance. Confirm.
457-
458-
5. **Phase 1 PR scoping.** Pieces 1-3 (type plumbing + `open`/`create` only) ship as one PR for
459-
review of the core design, before investing in operation overloads. Confirms the type-system
460-
approach works in the real codebase before committing to the full surface.
461-
462-
6. **Executor reuse across files.** Multiple `async_fd_t` instances opened from the same path
463-
today get separate executors. Should they share? Probably not — file isolation is cleaner — but
464-
document the behavior.
465-
466-
---
467-
468-
## Decision
469-
470-
Recommended: **proceed with this approach.** It is the smallest, fastest, lowest-risk path to
471-
shippable thread-safe h5cpp. It defers (without precluding) the VOL connector and the greenfield
472-
storage-system options. It preserves classic h5cpp identity. It ships in 5-6 weeks.
473-
474-
Next step on Steven's approval: file the umbrella issue against `staging` and start phase 1 in a
475-
fresh worktree once #239 lands.

docs/reports/architecture/h5cpp-compiler-multi-backend-architecture.md

Lines changed: 0 additions & 45 deletions
@@ -1,12 +1,5 @@
11
@page reports_compiler_multi_backend_architecture h5cpp-compiler Multi-Backend Architecture
22

3-
**Date:** 2026-05-21
4-
**Authors:** Steven Varga, Winston (Architecture)
5-
**Status:** Design; tier-1 (HDF5) implemented today; tier-2 backends (Protobuf, JSON Schema, SQL DDL, Avro) on the AI roadmap
6-
**Repos:** vargaconsulting/h5cpp, vargaconsulting/h5cpp-compiler
7-
8-
## TL;DR
9-
103
One C++ struct → many on-disk and over-the-wire artifacts. The h5cpp-compiler (or the C++26 reflection-based equivalent inside h5cpp) walks each user type exactly once and dispatches to a set of independent **producers**, each emitting its own artifact: HDF5 type registrations, Protobuf `.proto`, JSON Schema, SQL DDL, Avro schemas. Each producer reads its own attribute namespace; universal attributes apply across all.
114

125
The same struct can be persisted to disk as HDF5, exposed as an LLM tool-call schema via JSON Schema, advertised as a Protobuf message to an RPC server, and migrated into a SQL warehouse — from one source of truth.
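
The walk-once, dispatch-to-producers idea can be sketched in a few lines. This is an illustration under simplifying assumptions, not the h5cpp-compiler design: `kind`, `field_desc`, `type_desc`, `producer`, `emit_proto`, `emit_sql` and `generate` are hypothetical names, only two field kinds are modelled, and the unsigned 64-bit to `BIGINT` mapping is a deliberate simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

namespace sketch {

enum class kind { u64, f64 };

struct field_desc { std::string name; kind type; };

// Result of walking a user type once.
struct type_desc { std::string name; std::vector<field_desc> fields; };

// A producer turns the walked description into one artifact.
using producer = std::function<std::string(const type_desc&)>;

inline std::string emit_proto(const type_desc& t) {
    std::string out = "message " + t.name + " {";
    int tag = 1;
    for (const auto& f : t.fields) {
        out += std::string(" ") + (f.type == kind::u64 ? "uint64" : "double")
             + " " + f.name + " = " + std::to_string(tag++) + ";";
    }
    return out + " }";
}

inline std::string emit_sql(const type_desc& t) {
    std::string out = "CREATE TABLE " + t.name + " (";
    for (std::size_t i = 0; i < t.fields.size(); ++i) {
        const auto& f = t.fields[i];
        if (i) out += ", ";
        out += f.name + (f.type == kind::u64 ? " BIGINT" : " DOUBLE PRECISION");
    }
    return out + ");";
}

// Dispatch table: one walk result in, one artifact per registered producer out.
inline std::map<std::string, std::string>
generate(const type_desc& t, const std::map<std::string, producer>& producers) {
    std::map<std::string, std::string> artifacts;
    for (const auto& p : producers) artifacts[p.first] = p.second(t);
    return artifacts;
}

} // namespace sketch
```

Adding a backend in this sketch means registering one more entry in the producer map; the description of the type is not touched.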
@@ -238,41 +231,3 @@ Under Clang Tooling (the "today" vehicle), the same producers are C++ classes in
238231
239232
The user-facing surface — annotations on user structs, call to `h5::write(...)` for HDF5, build-system steps for other artifacts — is identical across both vehicles. See the reflection roadmap doc for the full transition plan.
240233
241-
## Relation to other workspace documents
242-
243-
- **`tasks/h5cpp-compiler-scatter-gather-design.md`** — defines tier classification, attribute system at the field level (the `h5::` namespace), and the cascade/strict policy macro for HDF5. This doc extends that to multiple backends.
244-
- **`tasks/h5cpp-reflection-cpp26-roadmap.md`** — defines the dual-vehicle strategy (reflection vs Clang Tooling) and the C++26 transition. This doc shows that strategy holds across all backends.
245-
- **`memory: project_h5cpp_compiler_ai_roadmap`** (Claude auto-memory pointer) — captures the four-backend ship order: Protobuf → JSON → SQL → Avro. This doc is the technical spec the roadmap pointed at.
246-
- **`tasks/h5cpp-compiler-prior-art-survey.md`** — competitive landscape; relevant because the multi-backend story is what differentiates h5cpp-compiler from `rootcling` (ROOT-only) and from serde (Rust, format-agnostic but no native HDF5).
247-
248-
## Open questions and follow-ups
249-
250-
1. **Per-backend default attribute resolution order.** When `h5::name`, `h5::sql::name`, and `h5::sql::column_name` are all present (varying specificity), which wins? Need to spec a precedence rule per backend.
251-
2. **Cross-backend type-system mismatches.** Some C++ types map cleanly to some backends and awkwardly to others (e.g., `std::variant<A,B,C>` is natural in Avro union, contortion in SQL, opaque in HDF5). Document the per-backend coping strategy for each tier-2/tier-3/tier-4 type.
252-
3. **Producer registration mechanism.** How is a new backend added? A static registry in `h5cpp-compiler`? A header-only producer template under `h5cpp/codegen/<backend>/`? Decision affects the extensibility story for third parties.
253-
4. **Class-level naming convention propagation.** `h5::name_all("snake_case")` at class level — does it propagate into per-backend producers automatically, or does each backend have its own `*_name_all`? Suggest: propagate by default; backend-specific override available.
254-
5. **`h5::tool_format("openai" | "anthropic" | "mcp")` envelope wrappers.** The JSON producer wrapping output in tool-calling envelopes is what makes h5cpp-compiler an AI-friendly tool. Worth its own design pass: what does an "MCP server tool descriptor" envelope look like, exactly? Reference: Anthropic's MCP spec.
255-
6. **CMake API stability.** The multi-format `h5cpp_compiler_generate` invocation is a breaking change to the helper. Worth a major-version bump for the CMake helper module; document migration.
256-
7. **Compilation cost.** Five producers running on every walk multiplies cost by ~5x in the worst case. Mitigation: emit only the requested FORMATS; cache producer outputs; in the reflection vehicle, each backend producer is independent and parallelizable.
257-
258-
## Implementation phasing (cross-reference)
259-
260-
The order from `project_h5cpp_compiler_ai_roadmap`:
261-
262-
1. **Protobuf** — finish the stub already wired in `h5cpp-compiler` (`--protocol-buffers` flag added 2026-05-21)
263-
2. **JSON Schema + C++ codec** — highest reuse: schema for contracts, codec for actual JSON I/O
264-
3. **SQL DDL** — smallest type-mapping surface; most users have a DB
265-
4. **Avro** — nearly free once JSON Schema is done; same type catalogue, different envelope
266-
267-
Each step is one new producer + an entry in the dispatch table. The walker and attribute infrastructure stay the same.
268-
269-
## Sources
270-
271-
- `tasks/h5cpp-compiler-scatter-gather-design.md` (in-workspace; design source for tier classification and h5cpp:: attribute set)
272-
- `tasks/h5cpp-reflection-cpp26-roadmap.md` (in-workspace; dual-vehicle strategy)
273-
- `tasks/h5cpp-compiler-prior-art-survey.md` (in-workspace; competitive context)
274-
- [Protocol Buffers Style Guide](https://protobuf.dev/programming-guides/style/)
275-
- [JSON Schema 2020-12](https://json-schema.org/draft/2020-12)
276-
- [Apache Avro Specification](https://avro.apache.org/docs/current/specification/)
277-
- [Anthropic MCP Specification](https://modelcontextprotocol.io/specification)
278-
- [OpenAI Function Calling JSON Schema format](https://platform.openai.com/docs/guides/function-calling)

docs/reports/architecture/h5cpp-compiler-scatter-gather-design.md

Lines changed: 0 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,5 @@
11
@page reports_compiler_scatter_gather_design h5cpp-compiler Scatter/Gather Design
22

3-
**Date:** 2026-05-21
4-
**Authors:** Steven Varga, Winston (Architecture)
5-
**Status:** Design approved; implementation pending
6-
**Repo:** vargaconsulting/h5cpp-compiler (branch: staging)
7-
8-
## Decision
93

104
h5cpp-compiler will extend its AST matcher to "color" types into tiers and emit different template specializations per tier. The h5cpp library dispatches between them via a `has_scatter<T>` trait at compile-time. User-facing API (`h5::write`, `h5::read`) remains unchanged.
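
Compile-time dispatch on a trait such as `has_scatter<T>` can be sketched with plain tag dispatch. The example below is an assumption-laden illustration: the trait name comes from the text, but its definition here, the two sample structs, and the `write` function returning a label are all hypothetical and unrelated to the real `h5::write`.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

namespace sketch {

// Defaults to false; generated code would specialize it per user type.
template <class T> struct has_scatter : std::false_type {};

struct reading_t { double x; };                  // fixed-size fields only
struct session_t { std::vector<double> data; };  // owns heap data

template <> struct has_scatter<session_t> : std::true_type {};

namespace detail {
// Selected when the trait is false: the object can be written as-is.
template <class T>
std::string write_impl(const T&, std::false_type) { return "direct"; }

// Selected when the trait is true: the object is scattered first.
template <class T>
std::string write_impl(const T&, std::true_type) { return "scatter"; }
} // namespace detail

// Single user-facing entry point; the overload is chosen at compile time.
template <class T>
std::string write(const T& value) {
    return detail::write_impl(value, has_scatter<T>{});
}

} // namespace sketch
```

The call site looks the same for both types, which is the property the decision relies on: only the trait specialization changes which path is taken.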
115

@@ -449,37 +443,3 @@ frame_t f = simulate();
449443
h5::write(fd, "frames", f); // dispatch chosen at compile-time via has_scatter trait
450444
```
451445

452-
## Relationships
453-
454-
- **AI roadmap** (separate document): scatter/gather is orthogonal to the protobuf/JSON/SQL/Avro backend roadmap. It is a property of the *walker* and the *library dispatch*, applicable to every backend. The same matcher emits scatter glue alongside whatever schema artifact is requested.
455-
- **h5cpp reflection sandwich**: this design slots into the existing core + compiler-generated shim + io layering. The shim already holds `register_struct<T>` specializations; adding `scatter<T>` / `gather<T>` is a same-layer extension.
456-
- **Non-HDF5 backends**: the write-side approach generalizes — `hvl_t` becomes `iovec` (for `writev(2)`), `ibv_sge` (for RDMA), or `rte_mbuf` chains (for DPDK). The matcher tier classification is reusable; only the per-tier producer changes.
457-
458-
## Implementation Notes
459-
460-
- Cycles and polymorphism in tier 4 require runtime visited-ID tables and type registries. The generator can scaffold the call sites, but the user must accept the buffer copy.
461-
- For tier 2+ types, do not add new entries to `cpp2hid` — that table is for `H5T_NATIVE_*` only. STL types like `std::complex`, `std::array`, `std::pair` belong in h5cpp's library-side type traits, not in the compiler.
462-
- The compiler is a **multi-trait generator**, not a single-format emitter. One AST walk emits whatever specializations the type needs.
463-
464-
## Reference Examples
465-
466-
Four canonical input examples — one per tier — live in the h5cpp-compiler repo at `examples/tier-{one,two,three,four}/`. Each directory mirrors tier-one's structure: `CMakeLists.txt`, `README.md`, the user's class header, `vector.cpp` driver, and `generated.h` (real for tier 1, stub bootstrap placeholder for tiers 2–4 until scatter/gather codegen lands).
467-
468-
Each row below shows the **user class** the example defines (this is what the tier classification applies to), the HDF5 layout the compiler will emit for that class, and current implementation status.
469-
470-
| Directory | User class (the tier-classified C++ type) | HDF5 the compiler will emit | Status |
471-
|---|---|---|---|
472-
| `examples/tier-one/` | `sn::sensor::reading_t` — scalars + `double[3]` axes + scalar temperature; tier 1 because all fields are POD | `H5T_COMPOUND { uint64, uint32, ARRAY[3] double, float }`, chunked + gzip variant | ✔ Implemented; build + run verified |
473-
| `examples/tier-two/` | `sn::sensor::session_t`: `std::string` label + 2× `std::vector<double>`; tier 2 because of the string and vector fields | `H5T_COMPOUND { uint64, uint64, VLEN_STRING, VLEN<double>, VLEN<double> }`, chunked, global-heap payloads | 🚧 Target state; build fails until scatter/gather lands |
474-
| `examples/tier-three/` | `sn::sensor::network_t`: `std::vector<std::string>` + `std::map<uint32_t, std::vector<sample_t>>`; tier 3 because of the map and ragged vector-of-string | Either nested VLEN compound OR decomposed `/scans/.../keys`, `/offsets`, `/values` dataset group | 🚧 Target state |
475-
| `examples/tier-four/` | `sn::sensor::log_t`: `std::vector<event_t>` where `event_t` has `std::variant<reading_t, calibration_t, fault_t>`; tier 4 because of the variant payload | `H5T_OPAQUE` payload + sibling tag dataset, or union-style compound with discriminant; field carries `[[h5::serialize_full]]` | 🚧 Target state; opt-in required |
476-
477-
Each file shows the *user-facing input*: the user's class definitions plus the `h5::write` / `h5::read` call site that "colors" the class for the AST matcher. The shim code that the compiler emits in response is described above (Per-Type Generated Artifacts). For tiers 2–4 the `generated.h` placeholder carries an explicit bootstrap comment naming the missing feature, so the failing build serves as the visible checkpoint.
478-
479-
**Note on top-level container wrapping.** When the user invokes `h5::write(fd, "ds", std::vector<T>{…})`, the *vector itself* is not a tier-classified user class — the library's existing top-level template handles it by iterating per element. The tier of `T` (the user class) is what determines the HDF5 layout per row and the MPI compatibility:
480-
481-
- `std::vector<reading_t>` (tier-1 element) → 1-D dataset of fixed compound, MPI ✔
482-
- `std::vector<session_t>` (tier-2 element) → 1-D dataset of VLEN compound per row, MPI ✘
483-
- `std::vector<network_t>` (tier-3 element) → ditto, worse
484-
485-
Wrapping a tier-N class in `std::vector` does not change N. It only repeats the per-element layout across rows.
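
The rule that a top-level `std::vector` wrapper leaves the element's tier unchanged can be expressed as a small trait. This is a sketch only: `tier_of`, `row_tier` and `mpi_compatible` are hypothetical names, the tiers are registered by hand here whereas the text assigns that job to the compiler's AST matcher, and the MPI flag simply encodes the tier-1-only rule from the list above.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

namespace sketch {

// Tier registration; the primary template is left undefined on purpose.
template <class T> struct tier_of;

struct reading_t { double axes[3]; };               // POD fields only
struct session_t { std::vector<double> samples; };  // vector field

template <> struct tier_of<reading_t> : std::integral_constant<int, 1> {};
template <> struct tier_of<session_t> : std::integral_constant<int, 2> {};

// Tier of one dataset row: a top-level std::vector is peeled off and the
// element's tier is reported unchanged.
template <class T> struct row_tier : tier_of<T> {};
template <class T, class A>
struct row_tier<std::vector<T, A>> : tier_of<T> {};

// Only rows of tier 1 are treated as MPI compatible in this sketch.
template <class T>
struct mpi_compatible : std::integral_constant<bool, row_tier<T>::value == 1> {};

static_assert(row_tier<std::vector<reading_t>>::value == tier_of<reading_t>::value,
              "wrapping in std::vector does not change the tier");

} // namespace sketch
```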

docs/reports/architecture/h5cpp-compiler-scatter-gather-visitor-refactor.md

Lines changed: 0 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,5 @@
11
@page reports_compiler_scatter_gather_visitor_refactor h5cpp-compiler Scatter/Gather Visitor Refactor
22

3-
**Author:** Steven Varga
4-
**Date:** 2026-05-25
5-
**Status:** Design approved; implementation pending
6-
**Scope:** h5cpp library + h5cpp-compiler
7-
**Replaces:** `h5cpp-compiler-scatter-gather-design.md` §Per-Type Generated Artifacts (Scatter Path)
8-
9-
---
103

114
## 1. Problem Statement
125
