Skip to content

fix(broker): recover existing observer token on name-conflict mint#1227

Open
willwashburn wants to merge 1 commit into
mainfrom
fix/observer-token-name-conflict-fallback
Open

fix(broker): recover existing observer token on name-conflict mint#1227
willwashburn wants to merge 1 commit into
mainfrom
fix/observer-token-name-conflict-fallback

Conversation

@willwashburn

@willwashburn willwashburn commented Jul 2, 2026

Copy link
Copy Markdown
Member

Summary

POST /api/observer-token (#1223) mints a scoped, read-only Relaycast observer token per workspace. Pear mints one per project using a fixed name (pear-dashboard-observer) every time it needs one — it has no way to know in advance whether a token under that name was already minted for that workspace.

relaycast enforces a (workspace_id, name) unique index on observer tokens. relaycast PR #232 (not yet released) fixes a bug where minting a second token under the same name crashed with an uncaught 500 instead of returning a clean 409 observer_token_name_conflict. Once #232 ships, repeat mints will fail cleanly — but they'll still fail, which breaks Pear's re-mint-on-demand flow.

This PR adds a fallback: when create_observer_token fails specifically with observer_token_name_conflict, list existing observer tokens for the workspace, find the one matching the attempted name, and rotate it to obtain fresh, usable raw token material (the original raw token was never persisted anywhere, so rotating is the only way to recover a usable value). The response shape is identical to a normal create.

Any other error from create_observer_token (timeout, network failure, a different API error) still propagates as a failure and does not trigger this fallback.

⚠️ Behavioral note

A caller that's minted an observer token under a given name before will get that token rotated (invalidating whatever raw value was previously issued under that name) on every subsequent mint that hits this fallback path. This is intentional and unavoidable — the original raw value can't be recovered any other way. It's fine for this endpoint's known caller: Pear's mintObserverToken (in broker.ts) always treats a freshly-returned token as authoritative and re-caches it. Any other, currently-hypothetical holder of a previous raw value under the same name would silently lose access when this path triggers.

Implementation

  • RelaycastHttpClient (crates/broker/src/relaycast/ws.rs) gains list_observer_tokens() and rotate_observer_token(id) wrappers, mirroring the existing create_observer_token wrapper's structure (SDK error handling via relay_client()).
  • All three wrappers now preserve the underlying relaycast::RelayError via .map_err(anyhow::Error::from) (previously anyhow::anyhow!("{error}"), which discarded the source and made the error un-downcastable) so callers can match on the structured API error code instead of string-matching Display output.
  • crates/broker/src/runtime/api.rs (the ListenApiRequest::CreateObserverToken handler) gains mint_or_recover_observer_token, which wraps the create call, detects observer_token_name_conflict via RelayError::code(), and on match calls recover_observer_token_after_name_conflict (list + find-by-name + rotate). If no token matches the attempted name despite the conflict (e.g. a race with a concurrent revoke), the original conflict error is propagated rather than panicking or synthesizing a misleading response.
  • The list+rotate fallback is bounded by the same http_api_observer_token_timeout() duration already used for the initial create call (a fresh timeout window for the fallback, not shared budget), so a hung fallback can't block the runtime task indefinitely either.

Depends on

relaycast PR #232 (500→409 fix for the underlying name-conflict error) — not yet released. Until it ships and relaycast-cloud picks it up, production observer-token mints will still 500 on conflict rather than reaching this new fallback path. This PR's new behavior is not exercisable against production until then, but is fully covered by unit tests against a mocked relaycast API.

Test plan

  • cargo check -p agent-relay-broker
  • cargo clippy -p agent-relay-broker -- -D warnings
  • cargo fmt -p agent-relay-broker -- --check
  • cargo test -p agent-relay-broker --lib — 836 passed (832 baseline + 4 new), 0 failed, 4 ignored
  • New tests in crates/broker/src/runtime/tests.rs, all against a mocked relaycast HTTP API (httpmock):
    • observer_token_name_conflict_falls_back_to_list_and_rotate — a 409 observer_token_name_conflict on create triggers list+rotate and returns the rotated token
    • observer_token_non_conflict_error_does_not_trigger_fallback — a 403 on create propagates as a failure without ever calling list/rotate
    • observer_token_conflict_without_matching_name_propagates_original_error — a 409 whose name isn't found in the list propagates the original conflict error instead of panicking, and never calls rotate against an unrelated token
    • observer_token_fallback_respects_the_supplied_timeout — a slow list response during the fallback still respects the timeout budget

🤖 Generated with Claude Code

Review in cubic

POST /api/observer-token mints a fixed-name observer token per workspace
(Pear always uses "pear-dashboard-observer"), but relaycast enforces a
(workspace_id, name) unique index, so a second mint under the same name
fails with observer_token_name_conflict (409, relaycast#232 — not yet
released). Since callers have no way to know in advance whether a token
under that name already exists, repeat minting needs to just work.

When create_observer_token fails with that specific error code, fall back
to listing existing observer tokens for the workspace, finding the one
matching the attempted name, and rotating it to obtain fresh raw token
material — the only way to recover a usable value, since the original raw
token was never persisted anywhere. Any other create error (timeout,
network failure, a different API error) still propagates as a failure and
does not trigger this fallback. If no token matches the name despite the
conflict (e.g. a race with a concurrent revoke), the original conflict
error is propagated rather than papering over it.

Behavioral note: this means a caller that's minted before will get their
token rotated (invalidating the previous raw value) on every subsequent
mint through this path. Acceptable here since Pear's mintObserverToken
always treats a freshly-returned token as authoritative and re-caches it,
but flagging explicitly since it's a real, intentional side effect.

Adds RelaycastHttpClient::list_observer_tokens/rotate_observer_token
wrappers mirroring create_observer_token's existing pattern, and changes
all three to preserve the underlying RelayError (via anyhow::Error::from
instead of anyhow::anyhow!("{error}")) so the conflict code can be matched
structurally instead of via string matching. The list+rotate fallback
reuses the same http_api_observer_token_timeout() as the initial create
call, bounding it independently so a hung fallback can't block the runtime
task either.

Depends on relaycast#232 for the underlying 500->409 fix; until that ships
and relaycast-cloud picks it up, production mints will still 500 on
conflict rather than reaching this new fallback path.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
@willwashburn willwashburn requested a review from khaliqgant as a code owner July 2, 2026 03:19
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR changes Relaycast observer-token SDK error handling to preserve structured errors via anyhow::Error::from, and adds a mint_or_recover_observer_token flow that recovers via listing and rotating tokens on a 409 name-conflict, wired into the observer-token API route with new tests.

Changes

Observer token mint and recovery flow

Layer / File(s) Summary
Structured SDK error preservation
crates/broker/src/relaycast/ws.rs
create_observer_token, list_observer_tokens, and rotate_observer_token preserve structured SDK errors via anyhow::Error::from instead of string formatting; doc comments updated.
Mint outcome/error types and conflict detection
crates/broker/src/runtime/api.rs
Adds ObserverTokenMintOutcome and ObserverTokenMintError enums with helper methods, imports RelayError, and adds a predicate for observer_token_name_conflict.
mint_or_recover_observer_token and rotation fallback
crates/broker/src/runtime/api.rs
New mint_or_recover_observer_token creates a token under timeout, then falls back to recover_observer_token_after_name_conflict, which lists tokens and rotates the matching one on 409 conflict.
Route wiring and re-exports
crates/broker/src/runtime/api.rs, crates/broker/src/runtime/mod.rs
The /api/observer-token handler calls the new mint function, branching logs for created vs recovered outcomes and typed errors; new items re-exported from the runtime module.
Tests for mint/recover behavior
crates/broker/src/runtime/tests.rs
New async tests cover conflict-triggered rotation, non-conflict propagation, no-match propagation, and timeout handling.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Possibly related PRs

  • AgentWorkforce/relay#1223: Both PRs touch the broker's observer-token flow, with this PR building on the minting logic and 409 conflict handling introduced there.

Suggested reviewers: khaliqgant

Poem

A token conflicts, oh what a fright,
But rabbit rotates it, sets it right!
No more strings, just typed delight,
Errors downcast in broad daylight,
Hop, hop, hooray for the mint-or-recover flight! 🐇🔑

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and clearly summarizes the main change: recovering observer tokens on name-conflict mint.
Description check ✅ Passed The description matches the template with Summary and Test Plan content and includes a Screenshots section placeholder.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/observer-token-name-conflict-fallback

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a fallback mechanism for observer token minting when a name conflict occurs (HTTP 409 observer_token_name_conflict). Instead of failing, the broker now lists existing observer tokens, finds the conflicting token by name, and rotates it to recover a fresh, usable token. This change includes new methods on RelaycastHttpClient, helper functions and enums to manage the minting outcome, and comprehensive unit tests covering various success, error, and timeout scenarios. I have no further feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/broker/src/runtime/api.rs (1)

137-160: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Misleading error message on recovery-path failure.

The failure arm reuses "Failed to create observer token: {error}" even though the error here originates from list_observer_tokens or rotate_observer_token, not create_observer_token. This makes it harder to tell from logs/API responses whether the original create or the fallback list+rotate is what actually failed.

💡 Suggested fix
     match fallback {
         Ok(Ok(observer_token)) => Ok(ObserverTokenMintOutcome::RecoveredViaRotate(observer_token)),
         Ok(Err(error)) => Err(ObserverTokenMintError::Failed(format!(
-            "Failed to create observer token: {error}"
+            "Failed to recover existing observer token via list+rotate: {error}"
         ))),
         Err(_) => Err(ObserverTokenMintError::TimedOut),
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/broker/src/runtime/api.rs` around lines 137 - 160, The recovery
failure in recover_observer_token_after_name_conflict is reporting the wrong
operation, since the error comes from list_observer_tokens or
rotate_observer_token rather than create_observer_token. Update the Err branch
in recover_observer_token_after_name_conflict to emit a recovery-specific
message that clearly names the fallback path (list+rotate) and preserves the
underlying error, so logs and API responses distinguish original token creation
failures from recovery-path failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/broker/src/runtime/api.rs`:
- Around line 137-160: The recovery failure in
recover_observer_token_after_name_conflict is reporting the wrong operation,
since the error comes from list_observer_tokens or rotate_observer_token rather
than create_observer_token. Update the Err branch in
recover_observer_token_after_name_conflict to emit a recovery-specific message
that clearly names the fallback path (list+rotate) and preserves the underlying
error, so logs and API responses distinguish original token creation failures
from recovery-path failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9401c965-a86a-4f2e-9d7a-d57c5172ee6d

📥 Commits

Reviewing files that changed from the base of the PR and between 7a98c6e and 29ffaff.

📒 Files selected for processing (4)
  • crates/broker/src/relaycast/ws.rs
  • crates/broker/src/runtime/api.rs
  • crates/broker/src/runtime/mod.rs
  • crates/broker/src/runtime/tests.rs

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 4 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/broker/src/runtime/tests.rs">

<violation number="1" location="crates/broker/src/runtime/tests.rs:3056">
P3: This timeout test can become flaky on slower CI because the 50ms budget also applies to the initial create request, not just the delayed list fallback. A slightly larger timeout that remains below the mocked 300ms list delay would still exercise fallback cancellation while avoiding pre-fallback timeouts.</violation>
</file>

<file name="crates/broker/src/runtime/api.rs">

<violation number="1" location="crates/broker/src/runtime/api.rs:147">
P1: A repeat mint can return a token with different permissions than `/api/observer-token` requested: the recovery path rotates the first listed token whose name matches, without checking `candidate.scopes` against `default_observer_token_scopes()`. Consider requiring the exact expected scope set before rotating, and propagating the original conflict when the existing token doesn't match.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

let existing = http_client.list_observer_tokens().await?;
let matched = existing
.into_iter()
.find(|candidate| candidate.name == token_name)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: A repeat mint can return a token with different permissions than /api/observer-token requested: the recovery path rotates the first listed token whose name matches, without checking candidate.scopes against default_observer_token_scopes(). Consider requiring the exact expected scope set before rotating, and propagating the original conflict when the existing token doesn't match.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/broker/src/runtime/api.rs, line 147:

<comment>A repeat mint can return a token with different permissions than `/api/observer-token` requested: the recovery path rotates the first listed token whose name matches, without checking `candidate.scopes` against `default_observer_token_scopes()`. Consider requiring the exact expected scope set before rotating, and propagating the original conflict when the existing token doesn't match.</comment>

<file context>
@@ -23,6 +23,142 @@ pub(crate) fn default_observer_token_scopes() -> Vec<ObserverScope> {
+        let existing = http_client.list_observer_tokens().await?;
+        let matched = existing
+            .into_iter()
+            .find(|candidate| candidate.name == token_name)
+            .ok_or(conflict_error)?;
+        http_client.rotate_observer_token(&matched.id).await
</file context>
Suggested change
.find(|candidate| candidate.name == token_name)
.find(|candidate| {
let expected_scopes = default_observer_token_scopes();
candidate.name == token_name
&& candidate.scopes.len() == expected_scopes.len()
&& expected_scopes
.iter()
.all(|scope| candidate.scopes.contains(scope))
})

let result = mint_or_recover_observer_token(
&client,
"pear-dashboard-observer",
Duration::from_millis(50),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: This timeout test can become flaky on slower CI because the 50ms budget also applies to the initial create request, not just the delayed list fallback. A slightly larger timeout that remains below the mocked 300ms list delay would still exercise fallback cancellation while avoiding pre-fallback timeouts.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/broker/src/runtime/tests.rs, line 3056:

<comment>This timeout test can become flaky on slower CI because the 50ms budget also applies to the initial create request, not just the delayed list fallback. A slightly larger timeout that remains below the mocked 300ms list delay would still exercise fallback cancellation while avoiding pre-fallback timeouts.</comment>

<file context>
@@ -2807,3 +2807,265 @@ fn default_observer_token_scopes_are_read_only_and_exclude_unneeded_scopes() {
+    let result = mint_or_recover_observer_token(
+        &client,
+        "pear-dashboard-observer",
+        Duration::from_millis(50),
+    )
+    .await;
</file context>
Suggested change
Duration::from_millis(50),
Duration::from_millis(200),

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29ffaff267

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

recover_observer_token_after_name_conflict(
http_client,
token_name,
timeout_duration,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep observer-token recovery within the HTTP deadline

When the create call returns a name conflict near the default 20s observer-token timeout, passing the full timeout into the recovery path gives list+rotate another 20s; /api/observer-token only waits 30s for reply_rx in listen_api_create_observer_token, so a recovery that succeeds after about 31–40s is reported to the caller as broker request timed out and the broker reply is dropped. Please keep the total create+recover budget under the HTTP handler deadline, or coordinate/raise the outer deadline instead of granting a fresh full window here.

Useful? React with 👍 / 👎.

.into_iter()
.find(|candidate| candidate.name == token_name)
.ok_or(conflict_error)?;
http_client.rotate_observer_token(&matched.id).await

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reconcile scopes before rotating an existing observer token

When a token with the requested name already exists but was created with different scopes or filters, this fallback rotates and returns that token as-is. The normal create path always requests default_observer_token_scopes() with no filters, so a manual or older pear-dashboard-observer token that includes extra scopes such as files:read (or restrictive filters) would make /api/observer-token hand out credentials with unexpected access or visibility; update/verify the matched token metadata before rotating, or fail if it does not match the endpoint contract.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant