fix(broker): recover existing observer token on name-conflict mint by willwashburn · Pull Request #1227 · AgentWorkforce/relay

willwashburn · 2026-07-02T03:19:42Z

Summary

POST /api/observer-token (#1223) mints a scoped, read-only Relaycast observer token per workspace. Pear mints one per project using a fixed name (pear-dashboard-observer) every time it needs one — it has no way to know in advance whether a token under that name was already minted for that workspace.

relaycast enforces a (workspace_id, name) unique index on observer tokens. relaycast PR #232 (not yet released) fixes a bug where minting a second token under the same name crashed with an uncaught 500 instead of returning a clean 409 observer_token_name_conflict. Once #232 ships, repeat mints will fail cleanly — but they'll still fail, which breaks Pear's re-mint-on-demand flow.

This PR adds a fallback: when create_observer_token fails specifically with observer_token_name_conflict, list existing observer tokens for the workspace, find the one matching the attempted name, and rotate it to obtain fresh, usable raw token material (the original raw token was never persisted anywhere, so rotating is the only way to recover a usable value). The response shape is identical to a normal create.

Any other error from create_observer_token (timeout, network failure, a different API error) still propagates as a failure and does not trigger this fallback.

⚠️ Behavioral note

A caller that's minted an observer token under a given name before will get that token rotated (invalidating whatever raw value was previously issued under that name) on every subsequent mint that hits this fallback path. This is intentional and unavoidable — the original raw value can't be recovered any other way. It's fine for this endpoint's known caller: Pear's mintObserverToken (in broker.ts) always treats a freshly-returned token as authoritative and re-caches it. Any other, currently-hypothetical holder of a previous raw value under the same name would silently lose access when this path triggers.

Implementation

RelaycastHttpClient (crates/broker/src/relaycast/ws.rs) gains list_observer_tokens() and rotate_observer_token(id) wrappers, mirroring the existing create_observer_token wrapper's structure (SDK error handling via relay_client()).
All three wrappers now preserve the underlying relaycast::RelayError via .map_err(anyhow::Error::from) (previously anyhow::anyhow!("{error}"), which discarded the source and made the error un-downcastable) so callers can match on the structured API error code instead of string-matching Display output.
crates/broker/src/runtime/api.rs (the ListenApiRequest::CreateObserverToken handler) gains mint_or_recover_observer_token, which wraps the create call, detects observer_token_name_conflict via RelayError::code(), and on match calls recover_observer_token_after_name_conflict (list + find-by-name + rotate). If no token matches the attempted name despite the conflict (e.g. a race with a concurrent revoke), the original conflict error is propagated rather than panicking or synthesizing a misleading response.
The list+rotate fallback is bounded by the same http_api_observer_token_timeout() duration already used for the initial create call (a fresh timeout window for the fallback, not shared budget), so a hung fallback can't block the runtime task indefinitely either.

Depends on

relaycast PR #232 (500→409 fix for the underlying name-conflict error) — not yet released. Until it ships and relaycast-cloud picks it up, production observer-token mints will still 500 on conflict rather than reaching this new fallback path. This PR's new behavior is not exercisable against production until then, but is fully covered by unit tests against a mocked relaycast API.

Test plan

cargo check -p agent-relay-broker
cargo clippy -p agent-relay-broker -- -D warnings
cargo fmt -p agent-relay-broker -- --check
cargo test -p agent-relay-broker --lib — 836 passed (832 baseline + 4 new), 0 failed, 4 ignored
New tests in crates/broker/src/runtime/tests.rs, all against a mocked relaycast HTTP API (httpmock):
- observer_token_name_conflict_falls_back_to_list_and_rotate — a 409 observer_token_name_conflict on create triggers list+rotate and returns the rotated token
- observer_token_non_conflict_error_does_not_trigger_fallback — a 403 on create propagates as a failure without ever calling list/rotate
- observer_token_conflict_without_matching_name_propagates_original_error — a 409 whose name isn't found in the list propagates the original conflict error instead of panicking, and never calls rotate against an unrelated token
- observer_token_fallback_respects_the_supplied_timeout — a slow list response during the fallback still respects the timeout budget

🤖 Generated with Claude Code

POST /api/observer-token mints a fixed-name observer token per workspace (Pear always uses "pear-dashboard-observer"), but relaycast enforces a (workspace_id, name) unique index, so a second mint under the same name fails with observer_token_name_conflict (409, relaycast#232 — not yet released). Since callers have no way to know in advance whether a token under that name already exists, repeat minting needs to just work. When create_observer_token fails with that specific error code, fall back to listing existing observer tokens for the workspace, finding the one matching the attempted name, and rotating it to obtain fresh raw token material — the only way to recover a usable value, since the original raw token was never persisted anywhere. Any other create error (timeout, network failure, a different API error) still propagates as a failure and does not trigger this fallback. If no token matches the name despite the conflict (e.g. a race with a concurrent revoke), the original conflict error is propagated rather than papering over it. Behavioral note: this means a caller that's minted before will get their token rotated (invalidating the previous raw value) on every subsequent mint through this path. Acceptable here since Pear's mintObserverToken always treats a freshly-returned token as authoritative and re-caches it, but flagging explicitly since it's a real, intentional side effect. Adds RelaycastHttpClient::list_observer_tokens/rotate_observer_token wrappers mirroring create_observer_token's existing pattern, and changes all three to preserve the underlying RelayError (via anyhow::Error::from instead of anyhow::anyhow!("{error}")) so the conflict code can be matched structurally instead of via string matching. The list+rotate fallback reuses the same http_api_observer_token_timeout() as the initial create call, bounding it independently so a hung fallback can't block the runtime task either. Depends on relaycast#232 for the underlying 500->409 fix; until that ships and relaycast-cloud picks it up, production mints will still 500 on conflict rather than reaching this new fallback path. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

coderabbitai · 2026-07-02T03:20:21Z

📝 Walkthrough

Walkthrough

This PR changes Relaycast observer-token SDK error handling to preserve structured errors via anyhow::Error::from, and adds a mint_or_recover_observer_token flow that recovers via listing and rotating tokens on a 409 name-conflict, wired into the observer-token API route with new tests.

Changes

Observer token mint and recovery flow

Layer / File(s)	Summary
Structured SDK error preservation `crates/broker/src/relaycast/ws.rs`	`create_observer_token`, `list_observer_tokens`, and `rotate_observer_token` preserve structured SDK errors via `anyhow::Error::from` instead of string formatting; doc comments updated.
Mint outcome/error types and conflict detection `crates/broker/src/runtime/api.rs`	Adds `ObserverTokenMintOutcome` and `ObserverTokenMintError` enums with helper methods, imports `RelayError`, and adds a predicate for `observer_token_name_conflict`.
mint_or_recover_observer_token and rotation fallback `crates/broker/src/runtime/api.rs`	New `mint_or_recover_observer_token` creates a token under timeout, then falls back to `recover_observer_token_after_name_conflict`, which lists tokens and rotates the matching one on 409 conflict.
Route wiring and re-exports `crates/broker/src/runtime/api.rs`, `crates/broker/src/runtime/mod.rs`	The `/api/observer-token` handler calls the new mint function, branching logs for created vs recovered outcomes and typed errors; new items re-exported from the runtime module.
Tests for mint/recover behavior `crates/broker/src/runtime/tests.rs`	New async tests cover conflict-triggered rotation, non-conflict propagation, no-match propagation, and timeout handling.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Possibly related PRs

AgentWorkforce/relay#1223: Both PRs touch the broker's observer-token flow, with this PR building on the minting logic and 409 conflict handling introduced there.

Suggested reviewers: khaliqgant

Poem

A token conflicts, oh what a fright,
But rabbit rotates it, sets it right!
No more strings, just typed delight,
Errors downcast in broad daylight,
Hop, hop, hooray for the mint-or-recover flight! 🐇🔑

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title is concise and clearly summarizes the main change: recovering observer tokens on name-conflict mint.
Description check	✅ Passed	The description matches the template with Summary and Test Plan content and includes a Screenshots section placeholder.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/observer-token-name-conflict-fallback

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

gemini-code-assist

Code Review

This pull request implements a fallback mechanism for observer token minting when a name conflict occurs (HTTP 409 observer_token_name_conflict). Instead of failing, the broker now lists existing observer tokens, finds the conflicting token by name, and rotates it to recover a fresh, usable token. This change includes new methods on RelaycastHttpClient, helper functions and enums to manage the minting outcome, and comprehensive unit tests covering various success, error, and timeout scenarios. I have no further feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

coderabbitai

🧹 Nitpick comments (1)

crates/broker/src/runtime/api.rs (1)

137-160: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Misleading error message on recovery-path failure.

The failure arm reuses "Failed to create observer token: {error}" even though the error here originates from list_observer_tokens or rotate_observer_token, not create_observer_token. This makes it harder to tell from logs/API responses whether the original create or the fallback list+rotate is what actually failed.

💡 Suggested fix

     match fallback {
         Ok(Ok(observer_token)) => Ok(ObserverTokenMintOutcome::RecoveredViaRotate(observer_token)),
         Ok(Err(error)) => Err(ObserverTokenMintError::Failed(format!(
-            "Failed to create observer token: {error}"
+            "Failed to recover existing observer token via list+rotate: {error}"
         ))),
         Err(_) => Err(ObserverTokenMintError::TimedOut),
     }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/broker/src/runtime/api.rs` around lines 137 - 160, The recovery
failure in recover_observer_token_after_name_conflict is reporting the wrong
operation, since the error comes from list_observer_tokens or
rotate_observer_token rather than create_observer_token. Update the Err branch
in recover_observer_token_after_name_conflict to emit a recovery-specific
message that clearly names the fallback path (list+rotate) and preserves the
underlying error, so logs and API responses distinguish original token creation
failures from recovery-path failures.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/broker/src/runtime/api.rs`:
- Around line 137-160: The recovery failure in
recover_observer_token_after_name_conflict is reporting the wrong operation,
since the error comes from list_observer_tokens or rotate_observer_token rather
than create_observer_token. Update the Err branch in
recover_observer_token_after_name_conflict to emit a recovery-specific message
that clearly names the fallback path (list+rotate) and preserves the underlying
error, so logs and API responses distinguish original token creation failures
from recovery-path failures.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9401c965-a86a-4f2e-9d7a-d57c5172ee6d

📥 Commits

Reviewing files that changed from the base of the PR and between 7a98c6e and 29ffaff.

📒 Files selected for processing (4)

crates/broker/src/relaycast/ws.rs
crates/broker/src/runtime/api.rs
crates/broker/src/runtime/mod.rs
crates/broker/src/runtime/tests.rs

cubic-dev-ai

2 issues found across 4 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/broker/src/runtime/tests.rs">

<violation number="1" location="crates/broker/src/runtime/tests.rs:3056">
P3: This timeout test can become flaky on slower CI because the 50ms budget also applies to the initial create request, not just the delayed list fallback. A slightly larger timeout that remains below the mocked 300ms list delay would still exercise fallback cancellation while avoiding pre-fallback timeouts.</violation>
</file>

<file name="crates/broker/src/runtime/api.rs">

<violation number="1" location="crates/broker/src/runtime/api.rs:147">
P1: A repeat mint can return a token with different permissions than `/api/observer-token` requested: the recovery path rotates the first listed token whose name matches, without checking `candidate.scopes` against `default_observer_token_scopes()`. Consider requiring the exact expected scope set before rotating, and propagating the original conflict when the existing token doesn't match.</violation>
</file>

_{Reply with feedback, questions, or to request a fix.

Re-trigger cubic}

cubic-dev-ai · 2026-07-02T03:29:55Z

+        let existing = http_client.list_observer_tokens().await?;
+        let matched = existing
+            .into_iter()
+            .find(|candidate| candidate.name == token_name)


P1: A repeat mint can return a token with different permissions than /api/observer-token requested: the recovery path rotates the first listed token whose name matches, without checking candidate.scopes against default_observer_token_scopes(). Consider requiring the exact expected scope set before rotating, and propagating the original conflict when the existing token doesn't match.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At crates/broker/src/runtime/api.rs, line 147: <comment>A repeat mint can return a token with different permissions than `/api/observer-token` requested: the recovery path rotates the first listed token whose name matches, without checking `candidate.scopes` against `default_observer_token_scopes()`. Consider requiring the exact expected scope set before rotating, and propagating the original conflict when the existing token doesn't match.</comment> <file context> @@ -23,6 +23,142 @@ pub(crate) fn default_observer_token_scopes() -> Vec<ObserverScope> { + let existing = http_client.list_observer_tokens().await?; + let matched = existing + .into_iter() + .find(|candidate| candidate.name == token_name) + .ok_or(conflict_error)?; + http_client.rotate_observer_token(&matched.id).await </file context>

Suggested change

.find(|candidate| candidate.name == token_name)

.find(|candidate| {

let expected_scopes = default_observer_token_scopes();

candidate.name == token_name

&& candidate.scopes.len() == expected_scopes.len()

&& expected_scopes

.iter()

.all(|scope| candidate.scopes.contains(scope))

})

cubic-dev-ai · 2026-07-02T03:29:56Z

+    let result = mint_or_recover_observer_token(
+        &client,
+        "pear-dashboard-observer",
+        Duration::from_millis(50),


P3: This timeout test can become flaky on slower CI because the 50ms budget also applies to the initial create request, not just the delayed list fallback. A slightly larger timeout that remains below the mocked 300ms list delay would still exercise fallback cancellation while avoiding pre-fallback timeouts.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At crates/broker/src/runtime/tests.rs, line 3056: <comment>This timeout test can become flaky on slower CI because the 50ms budget also applies to the initial create request, not just the delayed list fallback. A slightly larger timeout that remains below the mocked 300ms list delay would still exercise fallback cancellation while avoiding pre-fallback timeouts.</comment> <file context> @@ -2807,3 +2807,265 @@ fn default_observer_token_scopes_are_read_only_and_exclude_unneeded_scopes() { + let result = mint_or_recover_observer_token( + &client, + "pear-dashboard-observer", + Duration::from_millis(50), + ) + .await; </file context>

Suggested change

Duration::from_millis(50),

Duration::from_millis(200),

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29ffaff267

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-02T03:30:28Z

+            recover_observer_token_after_name_conflict(
+                http_client,
+                token_name,
+                timeout_duration,


Keep observer-token recovery within the HTTP deadline

When the create call returns a name conflict near the default 20s observer-token timeout, passing the full timeout into the recovery path gives list+rotate another 20s; /api/observer-token only waits 30s for reply_rx in listen_api_create_observer_token, so a recovery that succeeds after about 31–40s is reported to the caller as broker request timed out and the broker reply is dropped. Please keep the total create+recover budget under the HTTP handler deadline, or coordinate/raise the outer deadline instead of granting a fresh full window here.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-02T03:30:28Z

+            .into_iter()
+            .find(|candidate| candidate.name == token_name)
+            .ok_or(conflict_error)?;
+        http_client.rotate_observer_token(&matched.id).await


Reconcile scopes before rotating an existing observer token

When a token with the requested name already exists but was created with different scopes or filters, this fallback rotates and returns that token as-is. The normal create path always requests default_observer_token_scopes() with no filters, so a manual or older pear-dashboard-observer token that includes extra scopes such as files:read (or restrictive filters) would make /api/observer-token hand out credentials with unexpected access or visibility; update/verify the matched token metadata before rotating, or fail if it does not match the endpoint contract.

Useful? React with 👍 / 👎.

willwashburn requested a review from khaliqgant as a code owner July 2, 2026 03:19

gemini-code-assist Bot reviewed Jul 2, 2026

View reviewed changes

coderabbitai Bot reviewed Jul 2, 2026

View reviewed changes

cubic-dev-ai Bot reviewed Jul 2, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jul 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(broker): recover existing observer token on name-conflict mint#1227

fix(broker): recover existing observer token on name-conflict mint#1227
willwashburn wants to merge 1 commit into
mainfrom
fix/observer-token-name-conflict-fallback

willwashburn commented Jul 2, 2026 •

edited by cubic-dev-ai Bot

Loading

Uh oh!

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

Walkthrough

Changes

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

cubic-dev-ai Bot Jul 2, 2026

Uh oh!

cubic-dev-ai Bot Jul 2, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jul 2, 2026

Uh oh!

chatgpt-codex-connector Bot Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-            .find(|candidate| candidate.name == token_name)
+            .find(|candidate| {
+                let expected_scopes = default_observer_token_scopes();
+                candidate.name == token_name
+                    && candidate.scopes.len() == expected_scopes.len()
+                    && expected_scopes
+                        .iter()
+                        .all(|scope| candidate.scopes.contains(scope))
+            })

Uh oh!

Conversation

willwashburn commented Jul 2, 2026 • edited by cubic-dev-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

⚠️ Behavioral note

Implementation

Depends on

Test plan

Uh oh!

coderabbitai Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

willwashburn commented Jul 2, 2026 •

edited by cubic-dev-ai Bot

Loading

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading