Skip to content

feat(moq-net): per-track durable cache on TrackState (RAM + disk/remote tiers)#1841

Open
kixelated wants to merge 25 commits into
devfrom
claude/moq-archive-planning-r7xgqp
Open

feat(moq-net): per-track durable cache on TrackState (RAM + disk/remote tiers)#1841
kixelated wants to merge 25 commits into
devfrom
claude/moq-archive-planning-r7xgqp

Conversation

@kixelated

@kixelated kixelated commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator

Summary

A per-track durable cache in moq-net: keep recent groups past the live window and serve them
back on a FETCH, spilling to local disk and optionally remote object storage (object_store). It
lives in moq-net so any consumer of a track (relay, edge, archiver) gets durable caching for free.

This started as a design doc (rs/moq-archive/DESIGN.md, since deleted) and is now implemented.
The final shape, after review, puts the cache on the shared TrackState rather than as a
separate handle wired onto each endpoint:

  • RAM is the live window. No second in-memory copy of recent groups: the cache reuses
    TrackState.groups, the buffer the track already keeps for live subscribers. A group is
    serialized to cache::Group and handed to the disk tier only when it ages out of that window.
    Inserting a group is caching it; the old spawned populate-subscriber is gone.
  • Two retention gates in evict_expired, both sized by TrackInfo::cache. A group is evicted
    when it trips either:
    • wall-clock — received (an Instant stamped on arrival, never on the wire) more than the
      window ago. The hard memory backstop against timestamp abuse.
    • media-time — its last frame timestamp is more than the window behind the live media edge.
      Bounds a startup stampede where a burst of buffered media arrives "all at once".
  • Serving. fetch_group resolves a live miss from the disk/remote tiers via an async lookup
    spawned on the first poll. A store hit resolves the fetch; a store miss chains upstream
    the lookup task queues the request for a TrackDynamic (a wire FETCH for a relay) when one
    exists, so the fetch resolves once upstream serves the group into the live window. Queuing only
    after the store misses keeps the store the fast path and avoids a redundant upstream fetch. With
    no handler, a miss is NotFound. get_group stays synchronous and live-only. Because the cache
    is on the shared state, every TrackConsumer of the track serves from it automatically.
  • Tiers. Disk and remote are object_store, target-gated to cfg(not(target_arch = "wasm32"))
    so native builds get them with no flag and wasm drops the server-side stack automatically. The
    disk byte format (segment.rs: self-describing segments + rollup), the multi-tier index
    (index.rs), and the object_store glue (store.rs) are unchanged; the flush task writes one
    disk segment per eviction pass and rolls the oldest disk segments up into large remote objects.

See rs/moq-net/CACHE.md for the full write-up.

Public API (moq-net, library crate — targets dev)

New:

  • module moq_net::cache: Disk (+ new/with_remote), Bounds, Limit, Frame, Group
    (+ read/produce/size/ts_first/ts_last), Batch, and the segment / store submodules.
    Disk / store are native-only (target-gated).
  • TrackProducer::with_cache(cache::Disk) -> Self (native-only).

Relative to dev there are no removals (the cache is new on this branch). TrackFetch gains a
private field. TrackInfo::cache is not removed yet — it still sources the retention window;
making retention purely local (and dropping the wire field) is deferred.

Cross-package sync

The cache is local policy, never on the wire, so js/net and doc/concept need no matching change.
moq-cli/moq-relay flags for with_cache are noted as follow-up in CACHE.md.

Test plan

  • cargo test -p moq-net --lib (427 pass), including the new get_group_does_not_read_disk,
    fetch_group_serves_evicted_group_from_disk, fetch_chains_upstream_on_cache_miss,
    fetch_cache_miss_without_dynamic_is_not_found, and tiers_evict_then_fetch_back, plus the
    retained segment/index/store unit tests.
  • cargo clippy -p moq-net --all-targets --all-features, cargo fmt, rustdoc (-D warnings),
    cargo check --no-default-features, and the native dependents (moq-relay, moq-cli).
  • wasm target build (gating verified by inspection; the local env's getrandom wasm backend
    isn't configured, so CI is the real check).

🤖 Generated with Claude Code

(Written by Claude)

Design-only first pass for a new moq-archive binary/library that records a
single track to tiered storage (RAM -> disk -> S3 via object_store) and serves
old groups back through the moq-lite-05 FETCH path (TrackDynamic). Covers the
segment/index on-disk format, out-of-order group handling, per-tier optional
retention measured by media timestamp with wall-clock fallback, the public API
sketch, and open questions. No implementation yet.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
Comment thread rs/moq-archive/DESIGN.md Outdated
the final frame count. This is our "safe to flush" signal.

**Serving side (produce a track + answer fetches):**
- `TrackProducer::dynamic() -> TrackDynamic`.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just take a TrackDynamic.

Comment thread rs/moq-archive/DESIGN.md Outdated
- `GroupRequest::sequence() -> u64`, `GroupRequest::accept(info) -> GroupProducer`. We fill the
returned `GroupProducer` with `create_frame` / `write` / `finish` from storage, then
`GroupProducer::finish()`.
- To expose the track over a session, wrap the producer in a `BroadcastProducer` and publish

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the caller will do this. For example, moq-relay might first try to use moq-archive to handle any dynamic requests.

I think moq-archive also needs it's own dynamic() to make that work? Because if something isn't on disk or on S3, we'll have to ask the original publisher for it.

That might be an argument for stuffing this logic directly into moq_net::TrackProducer and moq_net::TrackConsumer? It feels a bit gross but IDK the API is far more friendly. We would also definitely know when data is evicted from RAM cache too.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Captured this as the lead open question and kept v1 wiring the chain in moq-archive (caller owns the TrackProducer, archive takes the TrackDynamic plus an upstream TrackConsumer to forward misses + record). Agreed the moq-net-native version is the friendlier API and the only one that reliably knows when a group is evicted from the RAM cache, which is the natural flush trigger.

The way I'd frame the decision: do we want moq_net::TrackProducer to accept a pluggable storage backend (a trait Cache it consults on a miss and notifies on eviction), so the archive becomes "attach storage to a track" rather than "wire a fallback chain"? That keeps storage code out of moq-net but gives moq-net the one hook (eviction) we can't get from the outside. If you're good with that shape I'll redesign the public API around it before any implementation; if you'd rather keep moq-net pure for now, the chain version stands. Which way do you want to go?

(Written by Claude)


Generated by Claude Code

Comment thread rs/moq-archive/DESIGN.md Outdated
prefix and namespaced by broadcast/track:

```
<root>/<broadcast>/<track>/segments/<segment-id> # concatenated groups

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The broadcast and track names can have slashes. I guess we need to escape them.

Comment thread rs/moq-archive/DESIGN.md Outdated

## Open questions

1. **`frame_start` granularity.** moq-lite-05 FETCH can request "group N starting at frame K".

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might remove this from moq-lite-05. The current API doesn't even support it.

Comment thread rs/moq-archive/DESIGN.md Outdated
Need a crash-consistency story: write the segment object first, then its index entries, so a
half-written segment is simply never indexed (and is GC'd by a startup sweep of unindexed
segments).
3. **Aggregation/compaction shape.** When promoting disk -> S3, do we copy segments 1:1 or

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should merge multiple groups at each rollup step. So RAM can be fragmented, but each rollup becomes less fragmented.

ex. store for up to 30s in RAM, flush in 10s segments to disk. Store on disk for up to 5m, flush to S3 in 1m batches?

IDK if we want to get an LRU involved or we just eyeball RAM usage though. I feel like an LRU is correct in general for both RAM and disk though. We could use the used() and unused() state too I guess. No sense keeping unused stuff in RAM if we could combine it with the next flush to disk.

Comment thread rs/moq-archive/DESIGN.md Outdated
concatenate many small disk segments into one big S3 object (rewriting offsets in the
index)? Concatenation is better for S3 request economics but adds a rewrite step. Lean
toward 1:1 in v1, compaction later.
4. **Serving the *latest* group / live edge.** v1 answers FETCH for past groups. Should the

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always keep the latest group in RAM.

Comment thread rs/moq-archive/DESIGN.md Outdated
4. **Serving the *latest* group / live edge.** v1 answers FETCH for past groups. Should the
archive also serve a live `subscribe` (replay newest groups as they land) so it can stand in
for a departed origin? That is closer to DVR and probably a follow-up.
5. **Index for a hot, long archive.** A multi-day archive has a large index. JSONL + in-RAM

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to nail down the index file format for sure.

Address the review on the planning doc:
- serve() takes a TrackDynamic, not a TrackProducer; archive is a composable
  link in a cache chain (relay -> archive -> origin) and forwards storage
  misses to an upstream TrackConsumer. Note the open question of folding this
  into moq_net core types.
- Tiering reworked around progressive rollup: each step concatenates multiple
  units from the tier above (RAM 30s -> 10s disk segments -> 1m S3 objects),
  so fragmentation drops downward. Resolves the 1:1-vs-concatenate question.
- Eviction is LRU + size budget, not just age; always keep the latest group in
  RAM; use moq-net used()/unused() to flush unused groups early.
- Nail down the index format: per-segment postcard footer + per-track manifest
  (Parquet/Iceberg shaped), replacing the JSONL sketch.
- Percent-encode broadcast/track names since they contain slashes.
- Drop sub-group frame_start (likely removed from moq-lite-05; unsupported by
  the current API); serve whole groups.
- Add a prior-art survey (BookKeeper, Kafka KIP-405, Haystack/Bitcask,
  Parquet/Iceberg) and why no single embeddable crate covers batch+index+S3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo

Copy link
Copy Markdown
Collaborator Author

Folded all the review comments into the latest commit:

  • Take a TrackDynamic (not a TrackProducer): the archive is now a composable link in a cache chain (relay → archive → origin); caller owns the producer.
  • Miss fallback / chaining: on a storage miss the archive forwards to an upstream TrackConsumer instead of returning NotFound; whether this should instead live inside moq_net core types is now the lead open question (replied on that thread for your call).
  • Rollup: each tier concatenates multiple units from the tier above (RAM 30s → 10s disk segments → 1m S3 objects), so fragmentation drops downward. Resolves the old 1:1-vs-concatenate question toward concatenate-at-every-rollup.
  • Eviction: LRU + byte budget per tier, not just age; always keep the latest group in RAM; use used()/unused() to flush unused groups early.
  • Index format nailed down: per-segment postcard footer (group offset table) + per-track manifest, Parquet/Iceberg-shaped, replacing the JSONL sketch.
  • Slashes: percent-encode broadcast/track names into single opaque path segments.
  • Sub-group frame_start: dropped (likely removed from moq-lite-05, unsupported by the current API); the archive serves whole groups.

Also added a prior-art survey (BookKeeper ledgers, Kafka KIP-405 segments, Haystack/Bitcask, Parquet/Iceberg) and why no single embeddable crate covers batch + index + S3 tiering, in answer to the earlier "is there an existing library?" question.

(Written by Claude)


Generated by Claude Code

Add a Usage section with concrete call sites:
- Scenario A: a standalone VOD node serving a recorded broadcast back; the
  per-track serve(TrackDynamic) API fits because the archive is the publisher.
- Scenario B: why moq-relay cannot use the per-track API as-is. The relay only
  holds an OriginProducer and forwards whole broadcasts; the per-track
  TrackProducer + dynamic() is created inside moq-net's subscriber fan-out, so
  there is no seam to hand the archive a TrackDynamic.
- Scenario C: the recommended moq-net seam, a pluggable Cache trait consulted
  on a miss and notified on eviction, attached to the origin, reducing the
  relay integration to one line.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo

Copy link
Copy Markdown
Collaborator Author

Added a Usage mockups section with concrete call sites, and you're right that the relay doesn't fit the per-track API. Three scenarios:

A. Standalone / VOD node — works with serve(TrackDynamic) today, because the archive is the publisher: it publish_broadcasts, and for each requested_track it accepts the producer, takes producer.dynamic(), and serves from storage. No path collision since the live broadcast is gone.

B. Why moq-relay can't use it as-is — the relay only holds a single OriginProducer and forwards whole broadcasts; it never constructs a TrackProducer or calls .dynamic(). That happens inside moq-net's lite/ietf subscriber fan-out, so there's no point in relay code to hand the archive a TrackDynamic. Forcing it would mean republishing every broadcast through an archive-owned BroadcastProducer and re-accepting every track, i.e. reimplementing fan-out around the archive.

C. The moq-net seam (recommended) — a pluggable trait Cache that moq-net consults on a miss and notifies on eviction, attached at the origin where it already owns the per-track RAM cache. Then the whole relay integration is:

let origin = Origin::random().produce().with_cache(archive);

moq-net catches evictions (the natural flush trigger, which the chain version can't see) and serves misses before they cost an upstream FETCH. The public Archive becomes a Cache impl plus the standalone record/serve helpers.

This is the same fork as the open question I flagged, now made concrete: do you want me to redesign the API around a moq_net::Cache trait (relay-friendly, needs a small moq-net addition), or keep the per-track chain version (standalone/VOD only for now)? I'd recommend the Cache trait. Pushed in 93046420.

(Written by Claude)


Generated by Claude Code

Add rs/moq-net/CACHE.md: a spike for a per-track group cache owned by
TrackProducer. Concrete CacheConfig value (no trait, no callback), per-track
[min, max] bounds on size and duration, watermark flush that batches the
max-min band into one segment (which an LRU cannot do), RAM -> disk -> remote
tiers via object_store behind a feature flag, served by ranged read with no
fault-in. Removes the wire-visible TrackInfo.cache in favor of local,
producer-owned policy. Adds an interval flush backstop for low-rate tracks.

Reconcile rs/moq-archive/DESIGN.md scenario C and open question 1 to the
concrete cache (drop the rejected Cache trait sketch) and cross-link the spike.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
@kixelated kixelated changed the title docs(moq-archive): design proposal for track archival crate docs: design for track archival + per-track moq-net cache Jun 21, 2026
claude added 14 commits June 21, 2026 16:24
Sketch the CacheArgs clap group for `moq serve` / `moq accept`: --cache-ram
(+ -min), --cache-disk(-age), --cache-remote(-age), --cache-interval, mapping
onto the per-track [min, max] bounds and the RAM -> disk -> remote cascade.
Absent --cache-ram leaves caching off. Design only; wiring waits on the
moq_net::CacheConfig API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
The cache becomes a cloneable `Cache` handle (built from CacheConfig) whose
clone shares the same tiers, so one cache can back both a track's TrackProducer
and its TrackConsumer. Add TrackConsumer::with_cache and spell out the consumer
fetch semantics: fetch_group / get_group resolve from the cache first (RAM sync,
disk/remote after a ranged read), miss falls through to the wire and populates
the cache, and live subscribe groups populate it too. A cache-backed consumer
with no upstream answers FETCH straight from storage (the archive serve path).
Inserts dedup by sequence so sharing one cache across both sides is safe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
Add the moq_net::cache module: a per-track bounded group cache with the
produce/consume split. cache::Config::produce() yields a cache::Producer (write
half, not Clone) and Producer::consume() a cache::Consumer (read half, Clone),
sharing one store so a cache backs both a track's producer and consumer.

Eviction is a high/low watermark, not an LRU: an insert over the high watermark
drains the oldest groups down to the low watermark and returns them as one Batch
(the caller persists it to the next tier), which is what lets a group-per-frame
audio track avoid one tiny object per group. Bounds are per-track on bytes and
duration (media-timestamp span), the latest group is never evicted, and inserts
dedup by sequence so sharing one cache across both endpoints is safe.

14 unit tests cover get/miss, dedup, byte and duration watermarks, batch
contents, always-keep-latest, hysteresis within the band, unbounded and
min-unset edges, and out-of-order inserts.

Disk/remote tiers and the TrackProducer/TrackConsumer with_cache wiring remain
design; CACHE.md is reconciled to the implemented names and marks what is built.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
The module-level //! doc used intra-doc links to same-module items (Batch,
Producer, Consumer), which rustdoc cannot resolve from an inner module doc
(even with self:: paths), failing `cargo doc -D warnings` in CI. Use plain
code spans there; item-level links are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
Add cache::segment: the on-disk byte format for the cache's disk/remote tiers
and the rollup that compacts small segments into one larger object.

A segment is one band of groups (a Batch) serialized as group blobs back to
back, a footer offset table, then a fixed 8-byte trailer (footer length +
magic). The trailer being last and fixed-size lets a reader fetch it with one
tail-ranged GET, parse the footer, then fetch just the byte range of the wanted
group. Each blob is self-delimiting (frame count, then length-prefixed frames
carrying their optional media timestamp). Timestamps store raw value+scale, so
a non-micro timescale (e.g. 90kHz video) round-trips exactly. Reuses the QUIC
VarInt codec. rollup copies group blobs verbatim and rewrites offsets, so it is
lossless and does not re-encode frames.

To serialize losslessly, cache::Group now carries per-frame timestamps
(cache::Frame { timestamp, payload }) with ts_first()/ts_last() derived for the
duration bound; the RAM tier and its tests are updated accordingly. The module
becomes a directory (cache/mod.rs + cache/segment.rs).

12 new segment tests (batch/single round-trip, footer summary, lossless
non-micro scale, mixed/absent timestamps, empty group and empty batch, missing
sequence, bad magic, truncation, rollup concat + offsets + single-segment +
corrupt-input). All 415 moq-net lib tests pass; clippy and rustdoc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
Add cache::index: the storage-agnostic layer that ties the segment format and
rollup into serving across tiers. It maps each group sequence to a Location
(tier + segment + byte range), so a fetch is "locate, then ranged-read that
segment." It tracks per-tier byte and duration totals, and drives promotion:
Index::promotion picks the oldest disk segments once the disk tier is over its
high watermark (draining to the low watermark, oldest first), and
Index::apply_promotion registers the rolled-up remote segment, repoints those
sequences at the remote tier, and drops the promoted disk segments.

The index holds only metadata, never group bytes, so it is the part that stays
in memory while bytes live on disk/remote. Add segment::group_from_blob, the
ranged-read decode entry point (decode one group from just its blob bytes), and
Segment::byte_len for tier accounting. The remaining object_store put/get_range
glue is a thin layer over these decisions.

7 index tests including an end-to-end check: encode segments, locate, ranged-
read via group_from_blob, then promote (rollup) and confirm every group still
decodes identically through the remote segment. Full moq-net suite 422 pass;
clippy and rustdoc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
The cloned_ref_to_slice_refs lint (rust 1.96) flags &[x.clone()]; use
std::slice::from_ref(&x) instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
Wrap the long rollup line flagged by cargo fmt --check in CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
Add cache::Group::read (async: drain a live GroupConsumer into a cache::Group,
reading each frame's payload and timestamp, resolving on group finish) and
cache::Group::produce (sync: rebuild a live GroupConsumer from a cache::Group,
validating frame timestamps against the track timescale). These are the bridge
the TrackProducer populate path and TrackConsumer serve path both use.

Two async round-trip tests (timed and untimed groups): live -> cache -> live ->
cache preserves sequence, payloads, and per-frame timestamps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
TrackProducer::with_cache(cache::Producer) spawns an internal subscriber that
drains each finished group into the cache (producer fills). The subscription
keeps the track active while caching, independent of downstream demand.

TrackConsumer::with_cache(cache::Consumer) attaches a read-through cache:
get_group and fetch_group resolve from it on a live-state miss, rebuilding the
group at the track's timescale. fetch_group serves from the cache before
failing with NotFound or waiting on a TrackDynamic, via a pre-resolved branch
added to TrackFetch. So a consumer sharing the producer's cache supports fetch
without a wire round-trip.

Three tests: producer fills the cache and a shared reader sees the group;
get_group and fetch_group fall through to the cache and read back byte-for-byte.
Full moq-net suite 427 pass; clippy and rustdoc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
Add the cache-tiered feature (off by default so RAM-only and wasm builds stay
dependency-free) and cache::store::Store, the object_store glue over the index:

- flush(batch): segment::encode the band, put it as one disk segment, record it
  in the index, then compact.
- get(seq): index.locate -> get_range the blob -> segment::group_from_blob.
- compact(): when the disk tier is over its bounds, read the oldest segments,
  segment::rollup them into one remote object, apply_promotion to repoint the
  index, and delete the disk objects; with no remote tier, evict them instead.

object_store is added default-features = false (core + memory + local fs, no
cloud SDKs). Add Index::evict for the no-remote eviction path.

5 tests against object_store::memory::InMemory: flush/get round-trip, promotion
to remote preserving all groups, eviction of the oldest without a remote, plus
a non-gated Index::evict unit test. Default suite 428 pass; with cache-tiered,
39 cache tests pass; clippy/rustdoc/fmt and --no-default-features clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
Correctness:
- index::promotion never selects the single newest disk segment (mirrors the
  RAM "keep latest" rule). Without it, an unset low watermark drained the whole
  disk tier on one over-max trip -> data loss of recent groups on the no-remote
  eviction path.
- store crash-consistency: write the object before mutating the index. flush
  puts the disk segment under index.next_id() then add()s; compact uploads the
  rolled remote object before apply_promotion. A failed put/upload now leaves
  the index (and the disk segments it points at) intact instead of stranding
  sequences on a nonexistent object. Added Index::next_id() for the peek.
- store::get guards offset+length with checked_add (a corrupt footer could
  overflow u64 and produce a bad range), matching segment::blob.
- TrackConsumer::fetch_group rebuilds the cached group synchronously and treats
  a rebuild error as a miss (falling through to the live path), consistent with
  get_group. Previously a produce() error (e.g. timescale mismatch on a
  not-yet-accepted wire consumer) surfaced as a hard fetch failure. TrackFetch
  now holds the rebuilt GroupConsumer.

Cleanup:
- store_of expects a configured remote tier for a remote location instead of
  silently falling back to disk.
- index is internal orchestration: gate it behind cache-tiered (its only user
  is the gated store) and drop it from the public surface; segment::GroupEntry
  gets #[non_exhaustive].
- Document the two with_cache caveats the review surfaced (the internal
  subscriber disables demand teardown; a stalled group head-of-line-blocks
  caching of later finished groups).

Default suite 427 pass; with cache-tiered, 39 cache tests pass. clippy (both
feature sets), rustdoc, and fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo

Copy link
Copy Markdown
Collaborator Author

Extensive review (self-review with three finder passes + verification)

Ran a high-recall review over origin/dev...HEAD (correctness, removed-behavior, cross-file, reuse/simplification/efficiency, altitude, conventions). Fixes pushed in 71deaf25. Below: the public-API surface, what was fixed, and two open design calls for you.

Public API surface (all additive → not breaking; dev is correct)

New under moq_net::cache (pub mod cache):

  • Config (#[non_exhaustive], Default), Bounds, Limit, Frame, Group, Batch
  • Producer (intentionally not Clone), Consumer (Clone); Group::read / Group::produce bridges
  • cache::segment: encode, rollup, group_from_blob, Segment, GroupEntry (#[non_exhaustive]), Error
  • cache::store (behind cache-tiered): Store, Error
  • TrackProducer::with_cache, TrackConsumer::with_cache

No existing signature changed. The one planned breaking change, removing the wire field TrackInfo.cache, is still pending and is dev-appropriate.

Correctness findings — fixed

  1. Disk tier could wipe itself. index::promotion drained the entire disk tier when the low watermark was unset (floor of zero), so one over-max trip evicted even the newest flushed groups on the no-remote path. Now keeps the newest segment, mirroring the RAM tier's "keep latest" rule.
  2. Crash-consistency. store::flush/compact mutated the index before the put/upload. A failed write stranded sequences pointing at a nonexistent object. Reordered to write-then-index (added Index::next_id() to reserve the key); a failure now leaves the index intact.
  3. Overflow guard. store::get computed offset + length unchecked; a corrupt footer could overflow u64. Now checked_add, matching segment::blob.
  4. fetch vs get inconsistency. fetch_group's cache path surfaced a produce() rebuild error (e.g. timescale mismatch on a not-yet-accepted wire consumer) as a hard fetch failure, while get_group treated it as a miss. Now both rebuild synchronously and miss-through on error.

Cleanup — applied

  • store_of now expects a configured remote tier instead of silently reading disk for a remote location.
  • index is internal orchestration (only the gated store uses it) → gated behind cache-tiered and dropped from the public surface; segment::GroupEntry got #[non_exhaustive].
  • Varint coding correctly reuses VarInt::{encode,decode}_quic (not duplicated); both error enums use thiserror + #[from] + #[non_exhaustive]; no em dashes.

Two open decisions for you (not unilaterally changed)

A. TrackProducer::with_cache populate disables demand teardown. The populate path subscribes internally, so producer.unused() never resolves while a cache is attached, and a relay that drops idle tracks by demand (lite/ietf subscriber, session.rs) won't drop a cached one. That's intended for "keep recording when idle" (archive/DVR), but wrong for a transient relay cache. Options: keep as-is (documented now); populate on group eviction instead (no subscription, but needs a sync frame-snapshot primitive on groups); or populate from the relay's existing fan-out consumer. This is the same lifecycle question I flagged earlier — worth settling before wiring the relay.

B. Same populate loop head-of-line-blocks on each group's finish (Group::read resolves only when a group finishes), so a stalled group delays caching of later finished ones. Fixable with concurrent reads (FuturesUnordered) if we keep the subscribe approach — I held off pending the decision in A, since the populate design may change.

Both caveats are now documented on with_cache. Everything else is fixed; default suite 427 pass, cache-tiered 39 pass, clippy/rustdoc/fmt clean.

(Written by Claude)


Generated by Claude Code

claude added 2 commits June 22, 2026 02:32
Delete rs/moq-archive (a design doc only, never a crate); the cache lives in
moq-net now. Scrub the archive references from CACHE.md.

Replace the `cache-tiered` feature with target-gating: object_store is a
server-side library that doesn't build for wasm, but it needn't be opt-in.
Move it to [target.'cfg(not(target_arch = "wasm32"))'.dependencies] and gate
the index/store modules with cfg(not(target_arch = "wasm32")). Native builds
now get the disk/remote tiers automatically with no feature flag, wasm drops
them automatically, and the tier tests run in the default `cargo test` on the
host. Default suite 438 pass; clippy/rustdoc/fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
cache::Config gains an optional `disk: Disk` tier (native-only): an object_store,
a key prefix, retention bounds, and an optional remote rollup store. Build with
Disk::new(...).with_remote(...).

Config::produce() now spawns a background task (when a disk tier is set) that
drains RAM-evicted bands through an mpsc channel into a shared store::Store
(behind a tokio RwLock so reads run concurrently with the flusher).
Producer::insert hands each evicted band to that task instead of returning it.
Consumer gains an async fetch() that reads RAM, then disk, then remote (get()
stays a sync RAM-only lookup). State now holds the RAM Bounds directly rather
than the whole Config.

So a cache with a disk tier now actually spills to and serves from disk/remote
with no extra wiring, on native; wasm builds drop the tier fields entirely.

Test: a Producer with an InMemory disk tier, insert past the RAM watermark, and
the consumer fetches the evicted group back from disk. Default suite 439 pass;
clippy/rustdoc/fmt and --no-default-features clean. CACHE.md updated to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
}

/// The first frame's media timestamp, if any. Used as the group's lower time bound.
pub fn ts_first(&self) -> Option<Timestamp> {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that timestamp is optional, and only supported in moq-lite-05. If we actually need this timing information, we also need to store an Instant when the frame was received (wall clock timestamp).

I don't know if it should be a separate field, or if timestamp should be an enum (producer supplied, or received time).

/// Drain a live [`GroupConsumer`](crate::GroupConsumer) into a cached group, reading every
/// frame's payload and timestamp. Resolves once the group is finished, so this is how the
/// producer side snapshots a finished group for caching.
pub async fn read(mut group: crate::GroupConsumer) -> Result<Self, crate::Error> {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make this more efficient with a poll variant? So we don't create the Vec until we know the exact number of frames.


/// Rebuild a live [`GroupConsumer`](crate::GroupConsumer) from this cached group, for serving a
/// fetch. `timescale` must match the track's: each frame timestamp is validated against it.
pub fn produce(&self, timescale: impl Into<Option<Timescale>>) -> Result<crate::GroupConsumer, crate::Error> {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe timescale should be part of Group, and rename it to GroupInfo? Kinda weird that it's a separate argument.

And why do we even need the timescale passed in? Shouldn't we know it from the track cache object that created this group?

Comment thread rs/moq-net/src/model/cache/mod.rs Outdated
Comment on lines +199 to +200
/// Zero unless both ends carry a timestamp, so a track without media timestamps applies no
/// duration pressure (byte bounds still apply).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use wall clock timestamp when no timescale is used.

In fact, that might even be a separate bounds. The producer can lie about media timestamps, but they can't lie about wall clock arrival timestamps.

/// group sequences to their location. Bands flushed from the RAM tier land here; old disk segments
/// roll up into the remote tier, or are evicted when there is none.
pub struct Store {
disk: Arc<dyn ObjectStore>,

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disk should also be optional.

Maybe the stores should be split and take a <T: ObjectStore>? IDK

Comment thread rs/moq-net/src/model/track.rs Outdated
/// finishes, so a stalled group head-of-line-blocks the caching of later finished ones.
pub fn with_cache(self, mut cache: cache::Producer) -> Self {
let mut subscriber = self.subscribe(None);
web_async::spawn(async move {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Why not just insert into the cache when inserting a group???

Also we should use the cache for dynamic requests too. It should be an Option on State or something.

Comment thread rs/moq-net/src/model/track.rs Outdated
/// Attach a read-through cache: `get_group` / `fetch_group` resolve locally on a cache hit.
/// Share the [`cache::Producer::consume`] handle of the cache a [`TrackProducer`] fills to
/// serve a track's recent groups without a wire fetch.
pub fn with_cache(mut self, cache: cache::Consumer) -> Self {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need this? Could it be automatic if the producer is configured with one?

Comment thread rs/moq-net/src/model/track.rs Outdated
sequence: u64,
/// A group already rebuilt from the read-through cache. When set, the fetch resolves from it
/// instead of polling the live state.
cached: Option<GroupConsumer>,

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we put the Cache in TrackState instead. And we check it on the first poll instead.

Also, isn't the remote cache async? How do we know if this is cached already?

Comment thread rs/moq-net/Cargo.toml Outdated
[features]
# Disk and remote cache tiers (object_store). Off by default so RAM-only and
# wasm builds stay dependency-free.
cache-tiered = ["dep:object_store"]

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK maybe call it fs instead. It would be nice to have default for non-WASM.

Copy link
Copy Markdown
Collaborator Author

These cohere into a real redesign of how the cache attaches to a track, and I think it's the right one. Synthesis + answers, grouped (covering all nine comments):

The big one: cache on TrackState, insert-on-produce, serve-on-poll (track.rs 709, 1008, 1245)

Agreed, and it's strictly better than what's there — it deletes the spawned-subscriber populate, which is exactly what caused the two issues I flagged in review (the populate subscription pinned unused() so a cached track never tore down, and it head-of-line-blocked on each group's finish). Both vanish. Shape:

  • TrackState gets cache: Option<cache::Producer> — the cache the track fills.
  • create_group / append_group insert the finished group into the cache directly, no spawn, no second subscription. [709 — "why not just insert into the cache when inserting a group"]
  • poll_fetch consults the cache on a miss, so fetch_group and TrackDynamic requests serve from it. [1245, and "use the cache for dynamic requests too"]
  • The consumer-side TrackConsumer::with_cache / cached get_group bolt-on goes away; it's all driven from TrackState. [1008 — "could it be automatic" → yes]

The async wrinkle you flagged [1245]: we can't synchronously know whether a group is in the disk/remote tier — that read is async. So it can't be a sync check in poll_fetch. Model: on the first poll of a fetch (RAM miss), kick off an async cache lookup; the fetch stays Pending until it resolves; on hit the group is produced into the track (same path as GroupRequest::accept, so the cache is effectively a built-in dynamic handler); on miss it falls through to the upstream wire FETCH / NotFound. Resolution chain: RAM (sync) → disk/remote (async) → dynamic/upstream. "Check on first poll" = "start the async lookup on first poll", which is clean once the cache lives on TrackState.

Timestamps: wall-clock vs media (mod.rs 195, 256, 220)

You're right that media timestamp is optional (lite-05 only) and a producer can lie about it, so eviction shouldn't depend on it. Proposed model:

  • Record a wall-clock arrival time when a group is cached and key retention/eviction on that — trustworthy, always present. [256] This also simplifies today's span() (drop the media-ts duration logic).
  • Keep per-frame media Option<Timestamp> for lossless replay only (the consumer needs it), not for eviction.
  • timescale on the group, not a produce() arg [220]: yes — once the cache is on TrackState, the track knows its timescale (TrackInfo.timescale), so produce() reads it from there and the arg disappears. Renaming the cached GroupGroupInfo (sequence + timescale + frames) is reasonable.
  • Two genuine calls for you: (a) wall-clock as a separate received: Instant field vs folding timestamp into an enum { Media(Timestamp), Received(Instant) } [195]; (b) whether wall-clock and media are separate bounds (trust wall-clock for eviction, optionally bound media for seeking) [256]. I lean: a separate received field + a single wall-clock retention bound, media kept only for replay.

Smaller ones

  • store.rs 33 — disk optional + generic: I'll make disk optional (flush straight to remote). On <T: ObjectStore> vs Arc<dyn>: disk and remote are usually different backends (local FS + S3), which a single T can't express — so dyn is the right call for mixed tiers. Keeping dyn.
  • Cargo.toml 18 — fs feature: the target-gate already gives "on by default for non-wasm, off for wasm, no flag" — i.e. the "default for non-WASM" you want, without a feature that could be wrongly enabled on wasm. A named fs feature only works if it's not in default (else wasm breaks) or if moq-wasm sets default-features = false. Prefer the target-gate (no footgun), or want an explicit fs feature for opt-out on native?
  • mod.rs 207 — poll variant for Group::read: noted, but with insert-on-produce we capture frames as they're written rather than re-reading a finished group, so this read loop likely disappears. Folding in.

Two decisions I need before the rewrite: the timestamp shape (separate received field vs enum; separate bounds or not), and the feature question (target-gate vs fs). With those, I'll land the TrackState-integrated version as one change (deleting the spawn + the lifecycle/HOL issues). Proceed on my leanings, or hold for your calls?

(Written by Claude)


Generated by Claude Code

…up window

The cache is no longer a separate Producer/Consumer pair bolted onto each track
endpoint. It lives on the shared TrackState, so the RAM tier is the track's own
live `groups` buffer and the disk/remote tiers hang off the same state. One store
backs the producer and every consumer automatically.

- RAM = `TrackState.groups`. A group is serialized to `cache::Group` and handed to
  the disk tier only when it ages out of the live window, not via a spawned
  populate subscriber (removed). Inserting a group is caching it.
- Two retention gates in `evict_expired`, both sized by `TrackInfo::cache`: a
  wall-clock gate on the receive `Instant` (the hard memory backstop) and a
  media-time gate on the last frame timestamp vs the live media edge (bounds a
  startup stampede). A group is evicted when it trips either.
- `fetch_group` serves a live miss from the disk/remote tiers via an async lookup
  spawned on the first poll; a hit resolves the fetch, a miss falls through to the
  live decision. `get_group` stays synchronous and live-only.
- `TrackProducer::with_cache` now takes a `cache::Disk` (native-only) and stores the
  spilled tiers on the state; `TrackConsumer::with_cache` and the read-through
  `cache::Producer`/`Consumer`/`Config`/RAM-watermark types are gone.

The disk byte format (segment.rs), multi-tier index (index.rs), and object_store
glue (store.rs) are unchanged. Updated CACHE.md to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
@kixelated kixelated changed the title docs: design for track archival + per-track moq-net cache feat(moq-net): per-track durable cache on TrackState (RAM + disk/remote tiers) Jun 22, 2026
A track with a durable cache attached now falls through to a TrackDynamic (the
wire FETCH) when the disk/remote tiers miss, instead of dead-ending in NotFound.

The store-lookup task, on a miss, queues the request for a TrackDynamic when one
exists; the TrackFetch then resolves once upstream serves the group into the live
window. Queuing only after the store misses keeps the store the fast path and
avoids a redundant upstream fetch when the group is already cached. With no
handler, a miss is still NotFound (now via the async lookup rather than synchronously).
A fetch past the final sequence or on an aborted track skips the cache and reports
synchronously as before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
claude added 3 commits June 22, 2026 18:14
…roups

Three robustness fixes from review of the durable cache:

- Bound the flush channel (was unbounded). A queued eviction pass pins its
  groups' frame buffers, so an unbounded channel let a slow disk migrate the RAM
  the live tier just freed into the channel backlog, defeating the memory bound
  eviction exists to enforce. `evict` now `try_send`s (it runs under the track
  state lock and must not block) and drops on a full backlog, since the cache is
  best-effort: a hole beats unbounded RAM growth.
- Only spill finished groups. Draining an open group via `Group::read` parks the
  flush task until the group completes (or forever if the writer stalled). An
  unfinished evicted group is dropped from the live tier as before, just not cached.
- Coalesce every queued eviction pass into one segment in the flush task (drain
  the channel with try_recv after each recv), so a backlog or a stampede-trim
  becomes one disk object instead of one per pass, and sort the batch by sequence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
cargo-deny flagged RUSTSEC-2026-0185, a remote memory-exhaustion vulnerability
in quinn-proto 0.11.14 (unbounded out-of-order stream reassembly). Pulled in
transitively via quinn -> web-transport-quinn -> moq-native. 0.11.15 is the fix
release and a drop-in patch (same dependency tree), so this is a surgical lockfile
bump. Unrelated to the cache feature; it just lands here to get the PR's CI green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
Two review follow-ups:

- Media-time gate now drops stale arrivals outright instead of archiving them.
  A group evicted by the media gate (media already past the window the instant it
  lands: a startup burst or a lagging publisher) is not spilled to the cache, and
  if it is still open it is aborted (Error::Old) so a producer still downloading a
  group too stale to keep stops wasting bandwidth and releases its buffers. Only a
  finished group aged out by the wall-clock gate is archived. A finished
  media-stale group (e.g. a deliberately fetched old group) is dropped without
  aborting, so a consumer still reading it is unaffected.

- compact no longer holds the store lock across the remote upload. Split into
  plan_compaction (locked: snapshot the rollup, reading disk bytes and building the
  rolled object) -> Rollup::upload (unlocked: the slow remote put) -> apply_compaction
  (locked: repoint the index, delete disk). The rolled object is refcounted Bytes,
  so the snapshot is cheap, and the disk segments stay indexed until apply, so a
  concurrent fetch is unaffected. Store::compact keeps the all-in-one path for tests.
  The flush task binds each phase to its own statement so the lock guard drops at the
  `;`; holding it across the match also self-deadlocked the re-entrant write() in apply.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R8gBynFAeeVnxuffr4ZKUo
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants