Skip to content

feat(ocap-kernel): bound relay hints in OCAP URLs and relay pool size#929

Merged
sirtimid merged 6 commits into
mainfrom
sirtimid/bounded-relay-management
Apr 13, 2026
Merged

feat(ocap-kernel): bound relay hints in OCAP URLs and relay pool size#929
sirtimid merged 6 commits into
mainfrom
sirtimid/bounded-relay-management

Conversation

@sirtimid
Copy link
Copy Markdown
Contributor

@sirtimid sirtimid commented Apr 10, 2026

Summary

OCAP URLs embed relay multiaddrs as discovery hints. After #887 propagated these hints on redemption, the relay list in URLs (and in the KV store) grows unboundedly as kernels encounter new relays through URL exchange. This PR bounds both the URL hint count and the relay pool size.

After investigating whether a gossip protocol was needed (#889), the conclusion was that the problem is URL size, not relay discovery — passive propagation via URL exchange is sufficient at current scale. A gossip protocol can be added later if needed.

Changes

  • Add RelayEntry type with addr, lastSeen, isBootstrap metadata for prioritized relay selection
  • Cap relay hints embedded in OCAP URLs to MAX_URL_RELAY_HINTS (3), selecting bootstrap relays first, then most recently seen
  • Bound the relay pool to MAX_KNOWN_RELAYS (20) with eviction of oldest non-bootstrap entries
  • Mark bootstrap relays at kernel init; merge with (not overwrite) existing learned relays
  • Auto-migrate legacy string[] relay storage format to RelayEntry[]
  • Update store API: getRelayEntries() / setRelayEntries() / getKnownRelayAddresses() replace getKnownRelays() / setKnownRelays()

Testing

All changes are covered by unit tests. New tests verify URL hint capping, bootstrap relay priority in selection, pool cap enforcement (in both addKnownRelays and initRemoteIdentity), eviction of oldest non-bootstrap entries, lastSeen timestamp updates on re-observation, bootstrap flag clearing on re-init with a different set, and legacy storage format auto-migration. Existing tests were updated for the new store API. All 82 tests pass, lint and build are clean.

Closes #889


Note

Medium Risk
Changes relay persistence and selection logic used for OCAP URL issuance, including an on-read KV migration and eviction behavior that could affect connectivity if misconfigured or buggy.

Overview
OCAP URL issuance now limits embedded relay hints (default 3) by selecting relays from a prioritized pool (bootstrap first, then most recently seen).

Relay storage is refactored from a raw string[] to RelayEntry[] (addr, lastSeen, isBootstrap), with a bounded relay pool (default 20) that evicts oldest non-bootstrap entries and merges/updates bootstrap relays on init; legacy knownRelays data is auto-migrated on read.

initRemoteComms/RPC adds maxUrlRelayHints and maxKnownRelays, strips undefined params via ifDefined, and avoids passing mnemonic to platform services; tests are updated/expanded for capping, ordering, eviction, and migration.

Reviewed by Cursor Bugbot for commit 9bbc7e9. Bugbot is set up for automated code reviews on this repo. Configure here.

@sirtimid sirtimid requested a review from a team as a code owner April 10, 2026 13:15
sirtimid and others added 3 commits April 13, 2026 12:22
Cap relay hints embedded in OCAP URLs to 3 (MAX_URL_RELAY_HINTS) with
priority selection (bootstrap first, then most recently seen). Bound the
relay pool to 20 entries (MAX_KNOWN_RELAYS) with eviction of oldest
non-bootstrap entries. This prevents unbounded URL growth as kernels
exchange relay hints across the network.

Closes #889

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…g, add tests

- Avoid direct mutation of potentially-hardened entries in initRemoteIdentity
  bootstrap clearing loop (use spread instead)
- Extract getRelayEntries as local function in makeKernelStore to avoid
  fragile this binding in getKnownRelayAddresses
- Add test: init clears bootstrap flag on relays removed from bootstrap set
- Add test: init enforces MAX_KNOWN_RELAYS pool cap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sirtimid sirtimid force-pushed the sirtimid/bounded-relay-management branch from 6c18881 to 1690fd8 Compare April 13, 2026 10:24
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 1690fd8. Configure here.

Comment thread packages/ocap-kernel/src/remotes/kernel/remote-comms.ts
sirtimid and others added 2 commits April 13, 2026 13:07
…rable, fix review issues

Extract relay and remote identity store methods from store/index.ts into
store/methods/relay.ts following the existing methods pattern. Make
MAX_URL_RELAY_HINTS and MAX_KNOWN_RELAYS configurable via
RemoteCommsOptions (defaults: 3 and 20), threaded through the full RPC
handler -> Kernel -> initRemoteIdentity options flow.

Address all review findings: add write-path validation with superstruct
RelayEntryStruct, wrap JSON.parse with contextual error, log migration
and eviction events, warn when bootstrap count exceeds pool cap, document
Date.now() SES assumption and default value rationale, add missing tests
for sorting stability, bootstrap-exceeds-cap, mixed legacy format, and
configurable caps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address review findings across the bounded relay management feature:
tighten RelayEntryStruct to reject NaN/Infinity/negative/empty values,
unify read/write validation paths, harden initRemoteIdentity return
value, add defensive cap validation, prevent mnemonic leaking to
platform services, simplify RPC handler and options passing, fix
inaccurate SES/hardened-object comments and stale JSDoc references,
and add missing test coverage for edge cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 13, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 78.71%
⬆️ +0.12%
8688 / 11037
🔵 Statements 78.53%
⬆️ +0.13%
8830 / 11243
🔵 Functions 76.27%
⬆️ +0.14%
2035 / 2668
🔵 Branches 76.42%
⬆️ +0.13%
3740 / 4894
File Coverage
File Stmts Branches Functions Lines Uncovered Lines
Changed Files
packages/ocap-kernel/src/remotes/types.ts 100%
🟰 ±0%
100%
🟰 ±0%
100%
🟰 ±0%
100%
🟰 ±0%
packages/ocap-kernel/src/remotes/kernel/remote-comms.ts 100%
🟰 ±0%
98.48%
⬆️ +0.98%
100%
🟰 ±0%
100%
🟰 ±0%
packages/ocap-kernel/src/rpc/kernel-control/init-remote-comms.ts 100%
⬆️ +6.25%
100%
⬆️ +10.00%
100%
🟰 ±0%
100%
⬆️ +6.25%
packages/ocap-kernel/src/store/index.ts 100%
🟰 ±0%
100%
🟰 ±0%
100%
🟰 ±0%
100%
🟰 ±0%
packages/ocap-kernel/src/store/types.ts 100%
🟰 ±0%
100%
🟰 ±0%
100%
🟰 ±0%
100%
🟰 ±0%
packages/ocap-kernel/src/store/methods/relay.ts 100% 91.66% 100% 100%
Generated in workflow #4271 for commit 9bbc7e9 by the Vitest Coverage Report Action

grypez
grypez previously requested changes Apr 13, 2026
Copy link
Copy Markdown
Contributor

@grypez grypez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code complexity seems high for what amounts to a FIFO queue on relay addresses. I think we are due a review of our connection logic. In context of the codebase, no issues. One nit.

Comment thread packages/ocap-kernel/src/rpc/kernel-control/init-remote-comms.ts Outdated
Replace inline Object.fromEntries filter with the existing ifDefined
utility from @metamask/kernel-utils, as suggested in PR review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sirtimid sirtimid requested a review from grypez April 13, 2026 16:14
@sirtimid
Copy link
Copy Markdown
Contributor Author

sirtimid commented Apr 13, 2026

The code complexity seems high for what amounts to a FIFO queue on relay addresses. I think we are due a review of our connection logic. In context of the codebase, no issues. One nit.

@grypez It's not a simple FIFO queue, the complexity exists because there are two tiers of relays with different eviction semantics: Bootstrap relays and Learned relays. But I agree, a holistic look at the connection layer could probably simplify things.

@sirtimid sirtimid enabled auto-merge April 13, 2026 16:16
@grypez grypez dismissed their stale review April 13, 2026 16:32

addressed

Copy link
Copy Markdown
Contributor

@grypez grypez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sirtimid sirtimid added this pull request to the merge queue Apr 13, 2026
Merged via the queue into main with commit 04e9315 Apr 13, 2026
35 checks passed
@sirtimid sirtimid deleted the sirtimid/bounded-relay-management branch April 13, 2026 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider adding relay gossip protocol

2 participants