fix(broker): recover existing observer token on name-conflict mint#1227
fix(broker): recover existing observer token on name-conflict mint#1227willwashburn wants to merge 1 commit into
Conversation
POST /api/observer-token mints a fixed-name observer token per workspace
(Pear always uses "pear-dashboard-observer"), but relaycast enforces a
(workspace_id, name) unique index, so a second mint under the same name
fails with observer_token_name_conflict (409, relaycast#232 — not yet
released). Since callers have no way to know in advance whether a token
under that name already exists, repeat minting needs to just work.
When create_observer_token fails with that specific error code, fall back
to listing existing observer tokens for the workspace, finding the one
matching the attempted name, and rotating it to obtain fresh raw token
material — the only way to recover a usable value, since the original raw
token was never persisted anywhere. Any other create error (timeout,
network failure, a different API error) still propagates as a failure and
does not trigger this fallback. If no token matches the name despite the
conflict (e.g. a race with a concurrent revoke), the original conflict
error is propagated rather than papering over it.
Behavioral note: this means a caller that's minted before will get their
token rotated (invalidating the previous raw value) on every subsequent
mint through this path. Acceptable here since Pear's mintObserverToken
always treats a freshly-returned token as authoritative and re-caches it,
but flagging explicitly since it's a real, intentional side effect.
Adds RelaycastHttpClient::list_observer_tokens/rotate_observer_token
wrappers mirroring create_observer_token's existing pattern, and changes
all three to preserve the underlying RelayError (via anyhow::Error::from
instead of anyhow::anyhow!("{error}")) so the conflict code can be matched
structurally instead of via string matching. The list+rotate fallback
reuses the same http_api_observer_token_timeout() as the initial create
call, bounding it independently so a hung fallback can't block the runtime
task either.
Depends on relaycast#232 for the underlying 500->409 fix; until that ships
and relaycast-cloud picks it up, production mints will still 500 on
conflict rather than reaching this new fallback path.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR changes Relaycast observer-token SDK error handling to preserve structured errors via ChangesObserver token mint and recovery flow
Estimated code review effort: 3 (Moderate) | ~25 minutes Possibly related PRs
Suggested reviewers: Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements a fallback mechanism for observer token minting when a name conflict occurs (HTTP 409 observer_token_name_conflict). Instead of failing, the broker now lists existing observer tokens, finds the conflicting token by name, and rotates it to recover a fresh, usable token. This change includes new methods on RelaycastHttpClient, helper functions and enums to manage the minting outcome, and comprehensive unit tests covering various success, error, and timeout scenarios. I have no further feedback to provide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/broker/src/runtime/api.rs (1)
137-160: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winMisleading error message on recovery-path failure.
The failure arm reuses
"Failed to create observer token: {error}"even though the error here originates fromlist_observer_tokensorrotate_observer_token, notcreate_observer_token. This makes it harder to tell from logs/API responses whether the original create or the fallback list+rotate is what actually failed.💡 Suggested fix
match fallback { Ok(Ok(observer_token)) => Ok(ObserverTokenMintOutcome::RecoveredViaRotate(observer_token)), Ok(Err(error)) => Err(ObserverTokenMintError::Failed(format!( - "Failed to create observer token: {error}" + "Failed to recover existing observer token via list+rotate: {error}" ))), Err(_) => Err(ObserverTokenMintError::TimedOut), }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/broker/src/runtime/api.rs` around lines 137 - 160, The recovery failure in recover_observer_token_after_name_conflict is reporting the wrong operation, since the error comes from list_observer_tokens or rotate_observer_token rather than create_observer_token. Update the Err branch in recover_observer_token_after_name_conflict to emit a recovery-specific message that clearly names the fallback path (list+rotate) and preserves the underlying error, so logs and API responses distinguish original token creation failures from recovery-path failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/broker/src/runtime/api.rs`:
- Around line 137-160: The recovery failure in
recover_observer_token_after_name_conflict is reporting the wrong operation,
since the error comes from list_observer_tokens or rotate_observer_token rather
than create_observer_token. Update the Err branch in
recover_observer_token_after_name_conflict to emit a recovery-specific message
that clearly names the fallback path (list+rotate) and preserves the underlying
error, so logs and API responses distinguish original token creation failures
from recovery-path failures.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 9401c965-a86a-4f2e-9d7a-d57c5172ee6d
📒 Files selected for processing (4)
crates/broker/src/relaycast/ws.rscrates/broker/src/runtime/api.rscrates/broker/src/runtime/mod.rscrates/broker/src/runtime/tests.rs
There was a problem hiding this comment.
2 issues found across 4 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="crates/broker/src/runtime/tests.rs">
<violation number="1" location="crates/broker/src/runtime/tests.rs:3056">
P3: This timeout test can become flaky on slower CI because the 50ms budget also applies to the initial create request, not just the delayed list fallback. A slightly larger timeout that remains below the mocked 300ms list delay would still exercise fallback cancellation while avoiding pre-fallback timeouts.</violation>
</file>
<file name="crates/broker/src/runtime/api.rs">
<violation number="1" location="crates/broker/src/runtime/api.rs:147">
P1: A repeat mint can return a token with different permissions than `/api/observer-token` requested: the recovery path rotates the first listed token whose name matches, without checking `candidate.scopes` against `default_observer_token_scopes()`. Consider requiring the exact expected scope set before rotating, and propagating the original conflict when the existing token doesn't match.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| let existing = http_client.list_observer_tokens().await?; | ||
| let matched = existing | ||
| .into_iter() | ||
| .find(|candidate| candidate.name == token_name) |
There was a problem hiding this comment.
P1: A repeat mint can return a token with different permissions than /api/observer-token requested: the recovery path rotates the first listed token whose name matches, without checking candidate.scopes against default_observer_token_scopes(). Consider requiring the exact expected scope set before rotating, and propagating the original conflict when the existing token doesn't match.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/broker/src/runtime/api.rs, line 147:
<comment>A repeat mint can return a token with different permissions than `/api/observer-token` requested: the recovery path rotates the first listed token whose name matches, without checking `candidate.scopes` against `default_observer_token_scopes()`. Consider requiring the exact expected scope set before rotating, and propagating the original conflict when the existing token doesn't match.</comment>
<file context>
@@ -23,6 +23,142 @@ pub(crate) fn default_observer_token_scopes() -> Vec<ObserverScope> {
+ let existing = http_client.list_observer_tokens().await?;
+ let matched = existing
+ .into_iter()
+ .find(|candidate| candidate.name == token_name)
+ .ok_or(conflict_error)?;
+ http_client.rotate_observer_token(&matched.id).await
</file context>
| .find(|candidate| candidate.name == token_name) | |
| .find(|candidate| { | |
| let expected_scopes = default_observer_token_scopes(); | |
| candidate.name == token_name | |
| && candidate.scopes.len() == expected_scopes.len() | |
| && expected_scopes | |
| .iter() | |
| .all(|scope| candidate.scopes.contains(scope)) | |
| }) |
| let result = mint_or_recover_observer_token( | ||
| &client, | ||
| "pear-dashboard-observer", | ||
| Duration::from_millis(50), |
There was a problem hiding this comment.
P3: This timeout test can become flaky on slower CI because the 50ms budget also applies to the initial create request, not just the delayed list fallback. A slightly larger timeout that remains below the mocked 300ms list delay would still exercise fallback cancellation while avoiding pre-fallback timeouts.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/broker/src/runtime/tests.rs, line 3056:
<comment>This timeout test can become flaky on slower CI because the 50ms budget also applies to the initial create request, not just the delayed list fallback. A slightly larger timeout that remains below the mocked 300ms list delay would still exercise fallback cancellation while avoiding pre-fallback timeouts.</comment>
<file context>
@@ -2807,3 +2807,265 @@ fn default_observer_token_scopes_are_read_only_and_exclude_unneeded_scopes() {
+ let result = mint_or_recover_observer_token(
+ &client,
+ "pear-dashboard-observer",
+ Duration::from_millis(50),
+ )
+ .await;
</file context>
| Duration::from_millis(50), | |
| Duration::from_millis(200), |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 29ffaff267
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| recover_observer_token_after_name_conflict( | ||
| http_client, | ||
| token_name, | ||
| timeout_duration, |
There was a problem hiding this comment.
Keep observer-token recovery within the HTTP deadline
When the create call returns a name conflict near the default 20s observer-token timeout, passing the full timeout into the recovery path gives list+rotate another 20s; /api/observer-token only waits 30s for reply_rx in listen_api_create_observer_token, so a recovery that succeeds after about 31–40s is reported to the caller as broker request timed out and the broker reply is dropped. Please keep the total create+recover budget under the HTTP handler deadline, or coordinate/raise the outer deadline instead of granting a fresh full window here.
Useful? React with 👍 / 👎.
| .into_iter() | ||
| .find(|candidate| candidate.name == token_name) | ||
| .ok_or(conflict_error)?; | ||
| http_client.rotate_observer_token(&matched.id).await |
There was a problem hiding this comment.
Reconcile scopes before rotating an existing observer token
When a token with the requested name already exists but was created with different scopes or filters, this fallback rotates and returns that token as-is. The normal create path always requests default_observer_token_scopes() with no filters, so a manual or older pear-dashboard-observer token that includes extra scopes such as files:read (or restrictive filters) would make /api/observer-token hand out credentials with unexpected access or visibility; update/verify the matched token metadata before rotating, or fail if it does not match the endpoint contract.
Useful? React with 👍 / 👎.
Summary
POST /api/observer-token(#1223) mints a scoped, read-only Relaycast observer token per workspace. Pear mints one per project using a fixed name (pear-dashboard-observer) every time it needs one — it has no way to know in advance whether a token under that name was already minted for that workspace.relaycast enforces a
(workspace_id, name)unique index on observer tokens. relaycast PR #232 (not yet released) fixes a bug where minting a second token under the same name crashed with an uncaught 500 instead of returning a clean409 observer_token_name_conflict. Once #232 ships, repeat mints will fail cleanly — but they'll still fail, which breaks Pear's re-mint-on-demand flow.This PR adds a fallback: when
create_observer_tokenfails specifically withobserver_token_name_conflict, list existing observer tokens for the workspace, find the one matching the attempted name, and rotate it to obtain fresh, usable raw token material (the original raw token was never persisted anywhere, so rotating is the only way to recover a usable value). The response shape is identical to a normal create.Any other error from
create_observer_token(timeout, network failure, a different API error) still propagates as a failure and does not trigger this fallback.A caller that's minted an observer token under a given name before will get that token rotated (invalidating whatever raw value was previously issued under that name) on every subsequent mint that hits this fallback path. This is intentional and unavoidable — the original raw value can't be recovered any other way. It's fine for this endpoint's known caller: Pear's
mintObserverToken(inbroker.ts) always treats a freshly-returned token as authoritative and re-caches it. Any other, currently-hypothetical holder of a previous raw value under the same name would silently lose access when this path triggers.Implementation
RelaycastHttpClient(crates/broker/src/relaycast/ws.rs) gainslist_observer_tokens()androtate_observer_token(id)wrappers, mirroring the existingcreate_observer_tokenwrapper's structure (SDK error handling viarelay_client()).relaycast::RelayErrorvia.map_err(anyhow::Error::from)(previouslyanyhow::anyhow!("{error}"), which discarded the source and made the error un-downcastable) so callers can match on the structured API error code instead of string-matchingDisplayoutput.crates/broker/src/runtime/api.rs(theListenApiRequest::CreateObserverTokenhandler) gainsmint_or_recover_observer_token, which wraps the create call, detectsobserver_token_name_conflictviaRelayError::code(), and on match callsrecover_observer_token_after_name_conflict(list + find-by-name + rotate). If no token matches the attempted name despite the conflict (e.g. a race with a concurrent revoke), the original conflict error is propagated rather than panicking or synthesizing a misleading response.http_api_observer_token_timeout()duration already used for the initial create call (a fresh timeout window for the fallback, not shared budget), so a hung fallback can't block the runtime task indefinitely either.Depends on
relaycast PR #232 (500→409 fix for the underlying name-conflict error) — not yet released. Until it ships and
relaycast-cloudpicks it up, production observer-token mints will still 500 on conflict rather than reaching this new fallback path. This PR's new behavior is not exercisable against production until then, but is fully covered by unit tests against a mocked relaycast API.Test plan
cargo check -p agent-relay-brokercargo clippy -p agent-relay-broker -- -D warningscargo fmt -p agent-relay-broker -- --checkcargo test -p agent-relay-broker --lib— 836 passed (832 baseline + 4 new), 0 failed, 4 ignoredcrates/broker/src/runtime/tests.rs, all against a mocked relaycast HTTP API (httpmock):observer_token_name_conflict_falls_back_to_list_and_rotate— a 409observer_token_name_conflicton create triggers list+rotate and returns the rotated tokenobserver_token_non_conflict_error_does_not_trigger_fallback— a 403 on create propagates as a failure without ever calling list/rotateobserver_token_conflict_without_matching_name_propagates_original_error— a 409 whose name isn't found in the list propagates the original conflict error instead of panicking, and never calls rotate against an unrelated tokenobserver_token_fallback_respects_the_supplied_timeout— a slow list response during the fallback still respects the timeout budget🤖 Generated with Claude Code