M3 foundation: v3 registry + convergent publish + composable FS5 layers #2
Merged
Per-vault namespace tag on registry entries:
TYPE(0x5c) | KEYTYPE(0xed) | PUBKEY | VAULT_ID(16) | REVISION(8 BE) | LEN | PAYLOAD | SIG
Lookup is `(pubkey, vault_id)` so one device-signing-key signs across every vault. The 16-byte vault_id derives from the vault root's recovery_secret (set up on the FS5 side).
`StreamKey::PublicKeyEd25519` stays for non-vault callers (DirActor, bindings); their data round-trips unchanged.
`should_store` becomes strict-CAS: same-revision-different-hash is rejected so a concurrent writer reliably detects "I lost the race" via verify-after-set. The Ord tie-break is unchanged for read-side reconciliation across replicas.
`MULTIHASH_BLAKE3` is re-exported from `s5_core` so registry payloads and BlobIds spell the same byte the same way.
Assisted-by: claude-opus-4-7
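
The layout and the strict-CAS rule described above can be sketched roughly as follows. This is an illustration only, not the code in `s5_core`: the constant names, the `Entry` struct, and the choice to accept a same-revision, same-hash write as an idempotent re-set are all assumptions. LEN, PAYLOAD, and SIG are left out because their widths are not stated here.

```rust
// Hypothetical names; only the byte values and field order come from the layout above.
const TYPE_REGISTRY_V3: u8 = 0x5c;
const KEYTYPE_ED25519: u8 = 0xed;

/// Builds TYPE | KEYTYPE | PUBKEY | VAULT_ID(16) | REVISION(8 BE).
/// The trailing LEN | PAYLOAD | SIG fields are intentionally omitted.
fn encode_prefix(pubkey: &[u8; 32], vault_id: &[u8; 16], revision: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(2 + 32 + 16 + 8);
    out.push(TYPE_REGISTRY_V3);
    out.push(KEYTYPE_ED25519);
    out.extend_from_slice(pubkey);
    out.extend_from_slice(vault_id);
    out.extend_from_slice(&revision.to_be_bytes()); // big-endian revision
    out
}

/// Minimal stand-in for a stored registry entry (assumed shape).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry {
    revision: u64,
    hash: [u8; 32],
}

/// Strict-CAS acceptance rule: a higher revision wins, a lower one loses,
/// and an equal revision is accepted only when the hash is identical
/// (treating that case as an idempotent re-set is an assumption).
fn should_store(existing: Option<&Entry>, incoming: &Entry) -> bool {
    match existing {
        None => true,
        Some(cur) if incoming.revision > cur.revision => true,
        Some(cur) if incoming.revision == cur.revision => incoming.hash == cur.hash,
        Some(_) => false,
    }
}
```

Under this rule a writer that loses a same-revision race sees someone else's hash when it reads the entry back, which is the signal the verify-after-set step relies on.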
The previous `tokio::fs::write` opens with O_TRUNC then streams, leaving an empty/partial file in the window between truncate and write completion. Concurrent readers under registry workloads that do get-after-set saw "insufficient bytes for deserialization" the moment 5+ writers raced.
Switch to write-tmp + rename. Tmp suffix combines pid (cross-process uniqueness) with a process-wide atomic counter (intra-process uniqueness: two concurrent put_bytes calls in the same process otherwise race on a shared tmp path).
Assisted-by: claude-opus-4-7
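
A minimal sketch of the write-tmp + rename pattern, using blocking `std::fs` rather than tokio so it stays self-contained. The function names, the tmp suffix format, and the `sync_all` call are assumptions for illustration and are not taken from `s5_store_local`.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter: keeps tmp names unique across concurrent calls in one process.
static TMP_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Sibling path of `dest` with a `.tmp.<pid>.<counter>` suffix (hypothetical format).
fn tmp_path_for(dest: &Path) -> PathBuf {
    let n = TMP_COUNTER.fetch_add(1, Ordering::Relaxed);
    let mut name = dest.file_name().map(|s| s.to_os_string()).unwrap_or_default();
    name.push(format!(".tmp.{}.{}", std::process::id(), n));
    dest.with_file_name(name)
}

fn write_then_rename(tmp: &Path, dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut file = File::create(tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // flush to disk before publishing the name (assumed, optional)
    fs::rename(tmp, dest) // replaces `dest` in one step on the same filesystem
}

/// Readers of `dest` see either the old contents or the new ones, never a truncated file.
fn put_bytes_atomic(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = tmp_path_for(dest);
    let result = write_then_rename(&tmp, dest, bytes);
    if result.is_err() {
        let _ = fs::remove_file(&tmp); // best-effort cleanup of the orphaned tmp file
    }
    result
}
```

The tmp file is created next to the destination so the rename never crosses a filesystem boundary.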
Composable per-layer/per-blob primitives so writable mounts, multi-peer merges, and concurrent publishes compose cleanly.
ReadableLayer trait: Snapshot, MergedView, and WritableOverlay all implement the same `(get, get_raw, scan, chunk_mask)` contract. chunk_mask has a default; Snapshot reads from BuildContext, MergedView delegates to layer 0, WritableOverlay to its base.
Pipeline (new): per-blob ops machinery (encrypt/decompress/upload, walk_byte_stream, child_for). Lifted out of Snapshot so WritableFs and ingest can hold the per-blob ops without a full Snapshot. NodeCache wraps the shared decoded-node cache.
MergedView: k-way priority merge over Vec<Arc<dyn ReadableLayer>>. Per-stream-exhaustion flag fixes the unfold re-poll panic that fired when MergedView was used downstream of WritableOverlay.
WritableOverlay: owns base + Pipeline so callers reach for both via one Arc. Exposes `flush(store)`, the canonical "fold entries into a fresh prolly tree on top of the base" op, used by both rw mounts and ingest. Two-way merge in scan() got the same exhaustion fix.
Snapshot becomes thinner: per-blob methods delegate to as_pipeline().
KEY_SLOT_RECOVERY (0x12) added: per-vault recovery_secret slot from which vault_id and recovery_signing_key derive.
BlobPipeline.skip_when_unhelpful is per-pipeline (in TraversalContext) so the policy travels with the encoding definition and propagates correctly through merge_contexts.
Assisted-by: claude-opus-4-7
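
To make the layering idea concrete, here is a much-reduced, synchronous sketch of a priority merge over layers. It is not the `s5_fs_v2` API: the trait below keeps only `get` and `scan`, `MemLayer` is a hypothetical in-memory layer, and treating layer 0 as the highest priority is an assumption. The `fuse()` call is only loosely analogous to the per-stream exhaustion flag mentioned above; it guarantees an exhausted iterator is never polled again.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

type Entry = (String, Vec<u8>);

/// Reduced stand-in for the layer contract: point lookup plus ordered scan.
trait ReadableLayer {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn scan(&self) -> Box<dyn Iterator<Item = Entry> + '_>;
}

/// Hypothetical in-memory layer; BTreeMap iteration is already key-ordered.
struct MemLayer(BTreeMap<String, Vec<u8>>);

impl ReadableLayer for MemLayer {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
    fn scan(&self) -> Box<dyn Iterator<Item = Entry> + '_> {
        Box::new(self.0.iter().map(|(k, v)| (k.clone(), v.clone())))
    }
}

/// Layers earlier in the Vec shadow later ones (assumed priority order).
struct MergedView {
    layers: Vec<Arc<dyn ReadableLayer>>,
}

impl ReadableLayer for MergedView {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.layers.iter().find_map(|layer| layer.get(key))
    }

    fn scan(&self) -> Box<dyn Iterator<Item = Entry> + '_> {
        let mut streams: Vec<_> = self
            .layers
            .iter()
            .map(|layer| layer.scan().fuse().peekable())
            .collect();
        let mut merged = Vec::new();
        loop {
            // Smallest key currently at the head of any non-exhausted stream.
            let min_key = streams
                .iter_mut()
                .filter_map(|s| s.peek().map(|(k, _)| k.clone()))
                .min();
            let Some(key) = min_key else { break };
            let mut winner = None;
            // Advance every stream holding this key; the first (highest-priority) value wins.
            for s in streams.iter_mut() {
                if s.peek().map_or(false, |(k, _)| *k == key) {
                    let (_, value) = s.next().expect("peeked entry exists");
                    if winner.is_none() {
                        winner = Some(value);
                    }
                }
            }
            merged.push((key, winner.expect("at least one stream held the key")));
        }
        Box::new(merged.into_iter())
    }
}
```

Because `MergedView` implements the same trait as its inputs, a merged view can itself be used as a layer in another view, which is the composability the description is after.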
`run_publish` is now a verify-and-retry loop. Each attempt:
1. Fetch the latest registry-published TN.
2. If our local TN's tree differs, merge_and_persist both into the
union (changes win on collision; tombstones resolve normally).
3. Encrypt + upload the merged TN.
4. Sign + registry.set at prev_revision + 1.
5. Read back: if registry holds our hash, we won. Otherwise loop
with jittered exponential backoff (max 16 attempts).
Combined with strict-CAS in s5_core and atomic put_bytes in
s5_store_local (this PR), concurrent rw-mount-flush + snap + parallel
snaps all converge to the union without per-vault locking. See
tests/concurrent_publish.rs.
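
The loop above can be sketched against a toy in-memory registry. Everything here is a simplification under stated assumptions: a `BTreeSet<String>` stands in for the TN's tree, set union stands in for merge_and_persist, set equality stands in for the hash comparison, encryption, upload, and signing are skipped, and the clock-derived jitter is a placeholder for a real RNG. None of these names are from `s5_node`.

```rust
use std::collections::BTreeSet;
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

type Tree = BTreeSet<String>;

/// Toy registry holding one (revision, tree) slot.
#[derive(Default)]
struct Registry {
    slot: Mutex<Option<(u64, Tree)>>,
}

impl Registry {
    fn get(&self) -> Option<(u64, Tree)> {
        self.slot.lock().unwrap().clone()
    }

    /// Strict CAS: only a strictly higher revision replaces the slot.
    fn set(&self, revision: u64, tree: Tree) {
        let mut slot = self.slot.lock().unwrap();
        let newer = match slot.as_ref() {
            None => true,
            Some((current, _)) => revision > *current,
        };
        if newer {
            *slot = Some((revision, tree));
        }
    }
}

const MAX_ATTEMPTS: u32 = 16;

/// Jittered exponential backoff: a random-ish delay in [0, cap], cap doubling per attempt.
fn backoff(attempt: u32) -> Duration {
    let cap_us: u64 = 500 << attempt.min(6);
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos() as u64)
        .unwrap_or(0);
    Duration::from_micros(nanos % (cap_us + 1))
}

fn run_publish(registry: &Registry, local: &Tree) -> Result<u64, String> {
    for attempt in 0..MAX_ATTEMPTS {
        // 1. Fetch the latest published state.
        let (prev_revision, published) = registry.get().unwrap_or((0, Tree::new()));
        // 2. Merge local changes into it (union stands in for the tree merge).
        let merged: Tree = published.union(local).cloned().collect();
        // 3-4. Publish at prev_revision + 1 (encrypt, upload, sign omitted).
        let revision = prev_revision + 1;
        registry.set(revision, merged.clone());
        // 5. Read back: if the registry holds exactly what we wrote, we won.
        match registry.get() {
            Some((r, tree)) if r == revision && tree == merged => return Ok(revision),
            _ => thread::sleep(backoff(attempt)),
        }
    }
    Err(format!("publish did not converge after {MAX_ATTEMPTS} attempts"))
}
```

Each successful set builds on the revision it fetched, so the published history is a chain in which every tree contains its predecessor; a writer that returns `Ok` therefore knows its changes are in that chain.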
`peer_load::load_peer_snapshot` (new): given a peer's
device_signing_pubkey + the shared vault_id, fetch their published
TN, age-decrypt with our recipient identity, return a Snapshot
ready to drop into a MergedView. Primitive behind live multi-peer
mounts.
Recovery derivation chain: KEY_SLOT_RECOVERY → vault_id (16-byte
truncated blake3) and recovery_signing_key (Ed25519). The recovery
entry maps recovery_pubkey → device_signing_pubkey under the same
vault_id, so a holder of just the paper key can discover the
device's vault registry entries.
verify_recovery_secret_invariant: fetched-previous-published-TN's
recovery_secret must match local; mismatch fails the publish to
prevent foreign history splicing.
Assisted-by: claude-opus-4-7
NodeConfigVault gets the M3 schema:
- recipients: Vec<String> — [key.*] names the published TN is
age-encrypted to
- sources: Vec<String> — [source.*] names this vault ingests
- blob_stores: Vec<String> — read-fallback chain for file content
- meta_targets: Vec<String> — relay destinations for the encrypted
TN. Distinct from blob_stores by design — implicit fallback would
leak vault structure to backends meant only for opaque content.
- plaintext_tree: bool — store FS5 tree nodes in plaintext
(for content-store interop, e.g. Hugging Face Xet). The
published TN is still age-encrypted to recipients.
- watch: bool — flag for the future inotify path.
Drops the placeholder `peers: Vec<String>` field.
NodeConfigRegistry loses the `Remote` and `Tee` variants — they
wrapped the old peer-tied registry transport. Local, Redb, Memory,
Store(name), and Multi(backends, write_policy) remain.
config.validate() does cross-reference validation at config-load
time so missing recipients/sources/stores/keys surface as a startup
error rather than as an opaque task failure.
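
As a rough illustration of that load-time cross-referencing, assuming hypothetical, trimmed-down config structs (the real `NodeConfigVault` has more fields, and `meta_targets` is left out because the table it refers to is not stated here):

```rust
use std::collections::BTreeSet;

/// Hypothetical, reduced mirror of one vault section.
struct VaultConfig {
    name: String,
    recipients: Vec<String>,
    sources: Vec<String>,
    blob_stores: Vec<String>,
}

/// Hypothetical top-level config: the sets hold the names that are actually defined.
struct NodeConfig {
    keys: BTreeSet<String>,
    sources: BTreeSet<String>,
    stores: BTreeSet<String>,
    vaults: Vec<VaultConfig>,
}

impl NodeConfig {
    /// Collects every dangling reference so all problems surface in one startup error.
    fn validate(&self) -> Result<(), Vec<String>> {
        let mut errors = Vec::new();
        for vault in &self.vaults {
            let checks = [
                ("recipient", &vault.recipients, &self.keys),
                ("source", &vault.sources, &self.sources),
                ("blob_store", &vault.blob_stores, &self.stores),
            ];
            for (kind, wanted, defined) in checks {
                for name in wanted {
                    if !defined.contains(name) {
                        errors.push(format!(
                            "vault '{}': unknown {} '{}'",
                            vault.name, kind, name
                        ));
                    }
                }
            }
        }
        if errors.is_empty() {
            Ok(())
        } else {
            Err(errors)
        }
    }
}
```

Returning all errors at once, rather than stopping at the first, is a design choice of this sketch and not something the description above specifies.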
RPC responses become tagged enums where they previously used
sentinels. `RunTaskResponse` is `Spawned { task_id, spec_json }` or
`Refused { error }`; the `S5NodeClient::run_task` wrapper flattens
to `Result<SpawnedTask>`. Replaces the prior `task_id == 0`
sentinel — daemon-side dispatch failures now come back as a real
Err with the actual error string.
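
The shape of that change, sketched with assumed field types (`u64` task ids, `String` errors) and a hypothetical `into_result` helper standing in for what the client wrapper does:

```rust
/// Tagged response: the variant itself says whether the task started.
#[derive(Debug, PartialEq)]
enum RunTaskResponse {
    Spawned { task_id: u64, spec_json: String },
    Refused { error: String },
}

#[derive(Debug, PartialEq)]
struct SpawnedTask {
    task_id: u64,
    spec_json: String,
}

impl RunTaskResponse {
    /// Flattens the wire enum into a Result so callers can use `?`
    /// instead of checking for a `task_id == 0` sentinel.
    fn into_result(self) -> Result<SpawnedTask, String> {
        match self {
            RunTaskResponse::Spawned { task_id, spec_json } => {
                Ok(SpawnedTask { task_id, spec_json })
            }
            RunTaskResponse::Refused { error } => Err(error),
        }
    }
}
```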
Assisted-by: claude-opus-4-7
Tracks the StreamKey API change (storage_key/from_storage_key
replaces to_bytes/from_bytes), the WritableOverlay::new signature
(Box<dyn ReadableLayer>), and the removal of
NodeConfigRegistry::Remote/Tee:
- bindings/{s5_flutter, s5_wasm}: switch from StreamKey::Vault
placeholder to legacy PublicKeyEd25519 for non-vault use.
- s5_fs/src/actor{, /persistence}: same — DirActor is non-vault.
- registries/{redb, store}: storage_key() for the backend key
encoding (uniform across all StreamKey variants).
- ingest/local/src/backup: WritableOverlay::new takes Box.
- s5_cli/{main, cmd/{blobs, mount, snapshots}}: drop peer-tied
operations that referenced the removed registry variants.
- vup_cli/src/recovery: tests updated to the new
recovery_signing_key signature (recovery_secret bytes, not
age_secret + vault_name).
Assisted-by: claude-opus-4-7
Two new integration tests covering the foundation work:
- concurrent_publish: back-to-back, parallel, and 10-way parallel divergent publishers all converge to the union of every writer's changes. Demonstrates the convergence loop + strict-CAS + atomic put_bytes interplay end-to-end against MultiRegistry over a real LocalStore relay.
- peer_load: device A publishes via TaskExecutor + TaskSpec::Backup, then a fresh code path (no daemon, no per-device state) calls load_peer_snapshot against the same relay store + a fresh MultiRegistry, verifies the produced Snapshot's tree surfaces through MergedView.
local_links_serve was carried forward from the working tree (absent on upstream/main); it exercises the pinned-link store path.
Drops three legacy tests (fs_sync_complete, fs_sync_large_blob, workflows) that depended on the pre-v2 s5_fs sync surface and the removed s5_node::sync module. Their multi-device-sync coverage is replaced by the async_relay test landing in PR B.
Assisted-by: claude-opus-4-7
Foundation work for Milestone 3 (Multi-Device Sync & Sharing). Lands the wire format, per-layer/per-blob abstractions, and convergence semantics that the upcoming feature PRs (relay E2E, FUSE mount, share-link export, snap-watch) all build on. No user-facing CLI surface added in this PR.
The marquee shift: concurrent producers (rw-mount flush + snap + parallel snaps from N writers) converge to the union of every writer's changes without any per-vault lock. The merge happens at the data layer (prolly tree merge_and_persist + CAS dedup); this PR adds the operational layer that actually triggers it.
What's in
- feat(s5_core)!: v3 registry wire format. Per-vault namespace tag on registry entries: `TYPE(0x5c) | KEYTYPE(0xed) | PUBKEY | VAULT_ID(16) | REVISION | LEN | PAYLOAD | SIG`. Lookup is `(pubkey, vault_id)`, so one device-signing-key signs across every vault, with VAULT_ID disambiguating. `should_store` becomes strict-CAS so a concurrent writer reliably detects "I lost the race" via verify-after-set; the `Ord` tie-break for read-side reconciliation is unchanged.
- fix(s5_store_local): atomic put_bytes via tmp+rename. Closes the `O_TRUNC` window where concurrent registry writers tripped "insufficient bytes for deserialization" the moment 5+ publishers raced.
- feat(s5_fs_v2): ReadableLayer + Pipeline + MergedView + WritableOverlay. Composable read/write primitives. `Snapshot`, `MergedView`, and `WritableOverlay` all implement `ReadableLayer`. `Pipeline` (with `NodeCache`) is the per-blob ops machinery, lifted out of `Snapshot` so writable mounts can hold it without dragging a full snapshot. `WritableOverlay` owns base + Pipeline + the entry buffer, and exposes `flush(store)` as the canonical "fold the entries into a fresh prolly tree on top of the base" operation, used by both rw mounts and ingest.
- feat(s5_node): publish convergence loop + peer-snapshot loading. `run_publish` is now a verify-and-retry-with-merge loop (max 16 attempts, jittered backoff). Combined with strict-CAS in s5_core and atomic put_bytes in s5_store_local, this is what lets concurrent producers converge without locking. `peer_load::load_peer_snapshot` is the read-side primitive behind live multi-peer mounts: given a peer's device signing pubkey + the shared vault_id, fetch their published Transparent Node, age-decrypt with our recipient identity, return a `Snapshot` ready to drop into a `MergedView`. Recovery derivation chain: `KEY_SLOT_RECOVERY` (in the vault root's TraversalContext) yields `vault_id` and `recovery_signing_key`.
- feat(s5_node, s5_node_api)!: M3 vault config schema + Result-shaped RPC. `NodeConfigVault` gets the M3 schema (`recipients`, `sources`, `blob_stores`, `meta_targets`, `plaintext_tree`, `watch`). `meta_targets` is intentionally separate from `blob_stores` because implicit fallback would leak vault structure to backends meant only for opaque content. `RunTaskResponse` becomes a tagged enum (`Spawned` / `Refused`); the CLI client wrapper flattens to `Result<SpawnedTask>`, so daemon-side dispatch failures come back as real `Err`s with the actual error string instead of the prior `task_id == 0` sentinel.
- chore: adapt consumers to v3 wire format + new APIs. Bindings, registries, ingest, s5_fs, and s5_cli adapt to the StreamKey API change, the WritableOverlay constructor, and the removal of peer-tied registry variants.
- test(s5_node): concurrent_publish + peer_load round-trip. `concurrent_publish` covers back-to-back, parallel, and 10-way parallel divergent publishers, asserting every writer's changes are reachable in the latest published TN. `peer_load` proves the read side: device A publishes via TaskExecutor, then a fresh code path (no daemon, no per-device state) loads A's snapshot via `load_peer_snapshot` and surfaces it through `MergedView`. Drops three legacy tests that depended on the pre-v2 `s5_fs::sync` surface; their multi-device coverage is replaced by the `async_relay` test landing in the next PR.
Validation
Breaking changes
- 0x01→0x5c. Old bytes rejected on parse.
- `NodeConfigRegistry::{Remote, Tee}` removed (peer-tied transport).