Skip to content

Slack integration event fixes + #125 workspace-key wiring#132

Open
kjgbot wants to merge 7 commits into
mainfrom
fix/slack-integration-event-context-retry
Open

Slack integration event fixes + #125 workspace-key wiring#132
kjgbot wants to merge 7 commits into
mainfrom
fix/slack-integration-event-context-retry

Conversation

@kjgbot
Copy link
Copy Markdown
Contributor

@kjgbot kjgbot commented Jun 6, 2026

Summary

6 commits, each cross-reviewed before landing. Verification: node --test 89/89, vitest 119/119, tsc clean.

🛑 DRAFT — merge hold per Khaliq pending discussion with Will.

🤖 Generated with Claude Code

kjgbot and others added 6 commits June 6, 2026 14:40
A blind Slack logical claim (context read returned nothing) suppressed the
late content-bearing alias copy without recording its hash, so every genuine
edit within the 1h replay TTL was also suppressed. The claim now learns the
content hash from the suppressed copy (expiry unchanged), letting differing
content inject again. Also gives waitForDropped a timeout parameter the
sparse-read suppression test already relied on, and covers blind-first →
suppressed copy → edit-injects ordering.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cloud sandbox brokers previously always created their own relay workspace,
isolating cloud agents from the project's local agents (#125). The local
broker remains the workspace creator; its workspace_key is now exposed via
brokerManager.workspaceKeyForProject() and sent best-effort on the
provisioning POST /box together with a stable, pear-assigned broker
instance name (cloud-<cloudAgentId prefix>), which cloud injects verbatim
as AGENT_RELAY_WORKSPACE_KEY / AGENT_RELAY_BROKER_NAME. Identity is
provision-time only: GET/PATCH bodies are unchanged. Local spawns honor an
explicit AGENT_RELAY_WORKSPACE_KEY pin through a single typed seam until
@agent-relay/harness-driver publishes workspaceKey in RuntimeSpawnOptions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The broker's explicit workspace join (#125, relay 6419d59c) fails loud
instead of creating a fresh workspace; classifyWorkspaceJoinFailure() keys
on the contract strings so the start error status distinguishes fatal key
rejection (401/403) from retryable rate limiting (429). Cast site comment
updated: the drop is verified against the built 8.3.0+T3 harness-driver
types and rides the published version bump. Also adds the #127 Stage 1
read-only observer design spec with the relay/cloud rulings folded in.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A sandbox broker predating the strict-join contract (#125) silently ignores
the injected AGENT_RELAY_WORKSPACE_KEY and creates an isolated workspace —
the exact failure the contract fixed, reborn through a stale snapshot bake.
The warm path now records the workspace key it actually sent per project and
attachCloudSandbox compares it against the sandbox session's workspace_key:
a mismatch logs loudly and publishes a cloud_workspace_key_mismatch broker
event carrying 8-char prefixes of both keys, so a stale-binary deployment is
attributable from logs alone. Keyless and legacy warms stay silent, and
detach clears the per-project key so re-attaches cannot compare against a
stale value.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 6, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR implements multi-instance observer support and workspace-key coordination, alongside Slack integration improvements. It adds a design spec for Stage 1 read-only observer capability, implements per-project workspace-key propagation across local broker, cloud provisioning, and attachment paths to prevent stale-binary mismatches, and refines Slack event deduplication to handle blind-content scenarios where initial context is unavailable but content becomes available later.

Changes

Multi-Instance Stage 1 Design & Observer Support

Layer / File(s) Summary
Stage 1 Read-Only Observer Specification
docs/specs/2026-06-06-multi-instance-stage1-observer.md
Complete design spec defining instance naming, shared /pear/project.json project definition, invite/join token flow with schema validation, two-layer read-only enforcement (Pear-side role guards + broker-side readonly capability), PTY fan-out with sequence-based deduplication, relaycast presence with TTL/heartbeat, required IPC additions, dependency sequencing behind PEAR_MULTI_INSTANCE flag, open questions, and unit/integration/manual test plan.

Workspace-Key Propagation for Correct Broker Joining

Layer / File(s) Summary
Workspace-join failure classification and local broker pinning
src/main/broker.ts, src/main/broker.test.ts
Introduces classifyWorkspaceJoinFailure() to distinguish fatal rejected from retryable rate-limited failures. Extends CloudAgentSandboxHandle with sentWorkspaceKey field. Local broker spawn now conditionally includes pinned workspaceKey from AGENT_RELAY_WORKSPACE_KEY env var. Adds workspaceKeyForProject() helper to retrieve local broker's workspace key from session metadata. Updates startup error handling to emit tailored status messages. Test harness extended with nextCloudSessionMetadata injection and comprehensive workspace-key coverage.
Cloud agent workspace-key state management and provisioning
src/main/cloud-agent.ts, src/main/cloud-agent.test.ts
Broker adapter interface extended with optional workspaceKeyForProject(projectId) method. CloudAgentManager introduces sentWorkspaceKeys map to track per-project keys, integrates retrieval into warmBox to pass workspaceKey/brokerName into POST request, clears keys on detach, and passes tracked key into sandbox during attachment. fetchBox extended to conditionally include workspace-key/broker in payload. Test suite updated with boxRequestBody/expectBoxPostBody helpers validating key inclusion/omission across warm/attach/detach flows and PATCH request scoping.
Cloud attachment workspace-key mismatch detection
src/main/broker.ts
Post-attach comparison of provisioning-provided sentWorkspaceKey against sandbox broker's actual workspace_key. Logs stale-binary diagnostic with prefixed key values on mismatch. Publishes cloud_workspace_key_mismatch broker event containing sandbox id and both keys for observability.

Slack Integration Deduplication – Blind Content Handling

Layer / File(s) Summary
Blind claim deduplication for sparse-context Slack events
src/main/integration-event-bridge.ts, src/main/__tests__/integration-event-bridge.test.ts
Refines claimSlackLogicalInjection to initialize content-hash tracking when an existing entry has no recorded hashes: initializes contentHashes with current hash (if present) and returns false to suppress the blind claim, allowing later edits with different content to be injected. Test validates that an initial raw-id event with unavailable content is suppressed, subsequent slug-alias copy is deduplicated without error, and after content changes, a later slug-alias edit injects the updated message.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • AgentWorkforce/pear#125: Implementation directly addresses workspace-key plumbing for correct broker workspace joining during multi-instance observer setup.

Possibly related PRs

  • AgentWorkforce/pear#116: Both PRs modify Slack deduplication in integration-event-bridge.ts for raw-id vs slug-alias blind-claim and content-hash scenarios with similar test coverage.
  • AgentWorkforce/pear#130: Both PRs refine claimSlackLogicalInjection blind/sparse-context behavior and update integration-event-bridge.test.ts coverage for the same deduplication scenario.

Suggested labels

size:XL

Poem

🐰 A second observer joins the fold,
With workspace keys to strictly hold,
No writes allowed, just watching near,
And Slack dedupes both blind and clear—
MultiInstance, Stage One's here!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately captures the two main changes: Slack integration event fixes and workspace-key wiring for issue #125, with a Stage 1 multi-instance observer spec also included.
Description check ✅ Passed The description clearly relates to the changeset, detailing Slack event fixes, workspace-key implementation for #125, failure classification, and the #127 spec included in the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/slack-integration-event-context-retry

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements Stage 1 of the Multi-Instance feature, enabling a read-only observer mode. Key changes include adding the design specification, updating BrokerManager to support explicit workspace keys and detect cloud workspace mismatches, updating CloudAgentManager to pass local workspace keys when provisioning sandboxes, and enhancing IntegrationEventBridge with Slack logical injection state, retries, and sparse-data deduplication. Feedback on these changes suggests using optional chaining on session metadata to prevent potential runtime errors, and utilizing index signatures instead of dot notation on Record types to avoid TypeScript compilation issues.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/main/broker.ts
if (!session) return undefined
try {
const metadata = await session.client.getSession()
return metadata.workspace_key || undefined
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use optional chaining (metadata?.workspace_key) to safely access the property, preventing potential runtime errors if getSession() returns null or undefined.

      return metadata?.workspace_key || undefined

Comment thread src/main/broker.ts
// sent; prefixes keep the event diagnosable from logs without leaking
// whole keys.
const sentWorkspaceKey = handle.sentWorkspaceKey?.trim() || undefined
const sandboxWorkspaceKey = sessionMetadata.workspace_key || undefined
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use optional chaining (sessionMetadata?.workspace_key) to prevent a potential runtime TypeError if getSession() returns null or undefined under unexpected conditions.

      const sandboxWorkspaceKey = sessionMetadata?.workspace_key || undefined

Comment on lines +1559 to +1561
if (keys.length === 1 && keys[0] === 'path') return data.path === path
if (keys.length === 2 && keys[0] === 'deleted' && keys[1] === 'path') {
return data.path === path && typeof data.deleted === 'boolean'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since data is typed as Record<string, unknown> via isRecord, accessing properties using dot notation (data.path, data.deleted) can cause TypeScript compilation errors under strict configurations. Use index signatures (data['path'], data['deleted']) instead.

Suggested change
if (keys.length === 1 && keys[0] === 'path') return data.path === path
if (keys.length === 2 && keys[0] === 'deleted' && keys[1] === 'path') {
return data.path === path && typeof data.deleted === 'boolean'
if (keys.length === 1 && keys[0] === 'path') return data['path'] === path
if (keys.length === 2 && keys[0] === 'deleted' && keys[1] === 'path') {
return data['path'] === path && typeof data['deleted'] === 'boolean'
}

@kjgbot kjgbot marked this pull request as ready for review June 6, 2026 13:25
@agent-relay-code
Copy link
Copy Markdown
Contributor

Reviewed PR #132 against the current checkout and found no reproducible breakage requiring edits. No bot/reviewer comments were present in .workforce.

Validated locally:

  • npm install
  • npx vitest --run src/main/broker.test.ts src/main/cloud-agent.test.ts
  • npm test -- src/main/__tests__/integration-event-bridge.test.ts
  • npx vitest --run
  • npm run build

@agent-relay-code
Copy link
Copy Markdown
Contributor

ℹ️ pr-reviewer: review only — no file changes were applied to the PR (nothing to commit after review). The notes below are advisory and were not pushed.

Reviewed PR #132 against the current checkout and found no reproducible breakage requiring edits. No bot/reviewer comments were present in .workforce.

Validated locally:

  • npm install
  • npx vitest --run src/main/broker.test.ts src/main/cloud-agent.test.ts
  • npm test -- src/main/__tests__/integration-event-bridge.test.ts
  • npx vitest --run
  • npm run build

@kjgbot
Copy link
Copy Markdown
Contributor Author

kjgbot commented Jun 6, 2026

Read-only cross-review from cloud-worker: no blocking findings. Focused on 95d31e4 and 8350eb0 plus the integration-event bridge changes; strict-join classifier matches relay error strings, sentWorkspaceKeys lifecycle is covered, and focused tests/tsc pass. Non-blocking note: the mismatch event is diagnosable via payload/log prefixes, but the Broker Details summary does not surface those prefixes. GitHub would not let this account approve its own PR.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 7 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/main/broker.ts">

<violation number="1" location="src/main/broker.ts:1339">
P2: Workspace key is leaked to logs during broker start. The new `workspaceKey` field is included in a serialized debug log.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread src/main/broker.ts
channels: nextChannels,
binaryArgs: { persist: true },
binaryPath: resolveBundledBrokerBinary(),
...(explicitWorkspaceKey ? { workspaceKey: explicitWorkspaceKey } : {}),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Workspace key is leaked to logs during broker start. The new workspaceKey field is included in a serialized debug log.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/main/broker.ts, line 1339:

<comment>Workspace key is leaked to logs during broker start. The new `workspaceKey` field is included in a serialized debug log.</comment>

<file context>
@@ -1304,12 +1323,20 @@ export class BrokerManager {
         channels: nextChannels,
         binaryArgs: { persist: true },
         binaryPath: resolveBundledBrokerBinary(),
+        ...(explicitWorkspaceKey ? { workspaceKey: explicitWorkspaceKey } : {}),
         env: {
           PATH: augmentedPath(),
</file context>

Comment thread src/main/integration-event-bridge.ts
Main independently landed the Slack context-preview/dedupe line as #130 with
a stronger claim shape: contentHashes as a Set, so late alias copies of ANY
previously seen edit stay suppressed (the single-hash variant would re-inject
a stale copy of edit N-1 after edit N). Resolution keeps main's Set design
and re-applies this branch's blind-claim fix on top: a blind claim learns the
hash from the suppressed late content copy, so genuine edits within the TTL
still inject. The edit-after-blind regression test is retained.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codeant-ai
Copy link
Copy Markdown

codeant-ai Bot commented Jun 6, 2026

Your free trial PR review limit of 300 PRs has been reached. Please upgrade your plan to continue using CodeAnt AI.

@agent-relay-code
Copy link
Copy Markdown
Contributor

Reviewed PR #132 against the provided diff and current checkout. I did not find a reproducible PR breakage that required code edits, so I left the implementation unchanged.

Local verification run:

  • npm ci
  • npx --no-install vitest run src/main/broker.test.ts src/main/cloud-agent.test.ts passed
  • npm test -- src/main/__tests__/integration-event-bridge.test.ts passed
  • npx --no-install vitest run passed
  • npm run build passed

I also tried npx --no-install tsc --noEmit -p tsconfig.node.json; it fails on broad existing repo configuration/type issues outside this PR’s focused surface, while the repo’s actual build script succeeds.

@agent-relay-code
Copy link
Copy Markdown
Contributor

ℹ️ pr-reviewer: review only — no file changes were applied to the PR (nothing to commit after review). The notes below are advisory and were not pushed.

Reviewed PR #132 against the provided diff and current checkout. I did not find a reproducible PR breakage that required code edits, so I left the implementation unchanged.

Local verification run:

  • npm ci
  • npx --no-install vitest run src/main/broker.test.ts src/main/cloud-agent.test.ts passed
  • npm test -- src/main/__tests__/integration-event-bridge.test.ts passed
  • npx --no-install vitest run passed
  • npm run build passed

I also tried npx --no-install tsc --noEmit -p tsconfig.node.json; it fails on broad existing repo configuration/type issues outside this PR’s focused surface, while the repo’s actual build script succeeds.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/main/broker.ts (1)

1326-1392: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Redact workspace keys before logging or sending start errors to the renderer.

workspaceKey is now part of opts, so Line 1349 serializes the full key into the main-process logs. The new rejected/rate-limited branches at Lines 1387-1392 also forward String(err) verbatim, and those relay error strings include the explicit key. The attach-time mismatch path already limits itself to 8-character prefixes, so these two sinks reintroduce the exact secret leakage this PR is otherwise careful to avoid.

🔒 Suggested redaction
       const explicitWorkspaceKey = process.env.AGENT_RELAY_WORKSPACE_KEY?.trim() || undefined
       const opts: AgentRelaySpawnOptions & { workspaceKey?: string } = {
         cwd,
         brokerName: name,
         channels: nextChannels,
         binaryArgs: { persist: true },
         binaryPath: resolveBundledBrokerBinary(),
         ...(explicitWorkspaceKey ? { workspaceKey: explicitWorkspaceKey } : {}),
         env: {
           PATH: augmentedPath(),
           ...(agentRelayMcpCommand ? { AGENT_RELAY_MCP_COMMAND: agentRelayMcpCommand } : {})
         },
         onStderr: (line: string) => {
           console.error(`[broker stderr:${normalizedProjectId}]`, line)
         }
       }

-      console.log('[broker] Starting with opts:', JSON.stringify({ ...opts, projectId: normalizedProjectId }))
+      console.log('[broker] Starting with opts:', JSON.stringify({
+        ...opts,
+        projectId: normalizedProjectId,
+        ...(explicitWorkspaceKey ? { workspaceKey: `${explicitWorkspaceKey.slice(0, 8)}…` } : {})
+      }))
       const client = await AgentRelayClient.spawn(opts)
@@
-      const statusMessage = joinFailure === 'rate-limited'
-        ? `Workspace join rate-limited (retryable): ${String(err)}`
-        : joinFailure === 'rejected'
-          ? `Workspace key rejected — broker refused to create a fresh workspace: ${String(err)}`
-          : String(err)
+      const redactedError = String(err).replace(
+        /(explicit workspace key )(\S+)/iu,
+        (_match, prefix, key) => `${prefix}${String(key).slice(0, 8)}…`
+      )
+      const statusMessage = joinFailure === 'rate-limited'
+        ? `Workspace join rate-limited (retryable): ${redactedError}`
+        : joinFailure === 'rejected'
+          ? `Workspace key rejected — broker refused to create a fresh workspace: ${redactedError}`
+          : redactedError
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/broker.ts` around lines 1326 - 1392, The log and error paths are
leaking explicit workspaceKey from opts and from error strings; update
startBroker/startPromise error handling and the JSON.stringify of opts so
workspaceKey is redacted: when building the opts object or right before logging
in the console.log('[broker] Starting with opts:', ...) replace
opts.workspaceKey (or explicitWorkspaceKey) with a masked value (e.g., first 8
chars + '…' or '<redacted>'), and sanitize any error string used in
console.error and in sendStatusToWindow by detecting and replacing the full
workspace key substring with the same masked form (apply this sanitization to
the err variable returned from classifyWorkspaceJoinFailure and before calling
sendStatusToWindow and console.error). Ensure you reference and modify the
variables/functions AgentRelaySpawnOptions opts/explicitWorkspaceKey,
startBroker/startPromise error catch, classifyWorkspaceJoinFailure,
console.error call, and sendStatusToWindow so no full workspace key is logged or
forwarded to the renderer.
🧹 Nitpick comments (4)
docs/specs/2026-06-06-multi-instance-stage1-observer.md (4)

188-189: ⚡ Quick win

Clarify PTY backfill question against stated contract.

Section 4 (lines 131-134) already defines the Stage 1 catch-up contract as "current visible-screen PTY snapshot + live stream from join" with "NO durable per-observer scrollback contract". However, Q4 here asks "does the broker keep enough scrollback per PTY to replay on attach, or is 'live from join' the Stage 1 contract?"

This appears inconsistent. Either:

  1. Section 4 should be softened to "proposed contract pending relay-worker confirmation", or
  2. Q4 should be reframed as "does current broker implementation already provide the visible-screen snapshot, or does that need new machinery?"

Clarifying whether this is a contract question (already answered) or an implementation verification question would help readers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/specs/2026-06-06-multi-instance-stage1-observer.md` around lines 188 -
189, The document is inconsistent about the Stage 1 PTY backfill: Section 4
currently states the contract as "current visible-screen PTY snapshot + live
stream from join" with "NO durable per-observer scrollback contract", while Q4
asks whether the broker keeps scrollback for replay on attach; reconcile by
either softening Section 4 to mark that as a proposed contract pending
relay-worker confirmation, or reword Q4 to ask an implementation verification
(e.g., "does the current broker implementation already provide the
visible-screen snapshot, or is additional machinery required?") and explicitly
reference the Stage 1 catch-up contract phrase so readers can see this is a
contract vs implementation question.

69-70: 💤 Low value

Consider early schema version detection in invite token.

The spec mentions schemaVersion in /pear/project.json and fail-loud behavior for unknown versions. The invite token (lines 78-86) only carries v: 1 for token format. If the project.json schema evolves, an observer with an old Pear build would only discover incompatibility after joining. Consider whether the invite token should also advertise the project.json schemaVersion so version mismatch can be detected before workspace join, avoiding unnecessary broker setup for incompatible observers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/specs/2026-06-06-multi-instance-stage1-observer.md` around lines 69 -
70, The invite token currently only carries the token format version (`v: 1`);
update the invite token payload (the place that encodes `v: 1`) to also include
the project's `/pear/project.json` schemaVersion so observers can detect schema
mismatches before joining; specifically, add a field like `projectSchemaVersion`
to the token generation/validation logic and ensure the join/observe flow (where
`v: 1` is parsed) checks that token field against the local supported
`/pear/project.json` schemaVersion and rejects or warns early if incompatible.

44-46: ⚡ Quick win

Clarify machineSlug collision handling.

The spec states machineSlug is "hostname-derived, 8 chars, persisted in local app config on first run" but doesn't specify what happens if two machines share a hostname or if the 8-char truncation creates a collision. Consider adding collision detection/resolution strategy (e.g., append random suffix on collision, or add this to Open Questions if relay-worker needs to define uniqueness guarantees).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/specs/2026-06-06-multi-instance-stage1-observer.md` around lines 44 -
46, The spec's machine identifier "machineSlug" (used in the name pattern
pear-<relayWorkspaceId>-<machineSlug>) needs explicit collision handling: update
the document to specify how to detect and resolve collisions when the
hostname-derived 8-char machineSlug already exists (e.g., check
registry/ownership store on startup or during registration, and on collision
append a deterministic or random suffix such as "-01" or a short hash, persist
the resolved value in local app config so it remains stable), or alternatively
add this as an explicit Open Question assigning relay-worker or the registration
protocol to guarantee uniqueness; reference the machineSlug symbol and the name
pattern pear-<relayWorkspaceId>-<machineSlug> so implementers know where to
apply the check.

144-152: 💤 Low value

Consider presence heartbeat failure recovery.

The spec defines TTL-expiry on heartbeat silence (every 60s) but doesn't specify what happens if an instance's heartbeat fails due to transient network issues while the instance and broker remain healthy. Should the instance:

  1. Detect its own tombstone and re-publish presence?
  2. Show "connection degraded" UI?
  3. Rely on relaycast reconnect to republish automatically?

This may be a relaycast-level behavior, but clarifying the expected recovery path would help implementation and testing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/specs/2026-06-06-multi-instance-stage1-observer.md` around lines 144 -
152, The spec is missing behavior for transient heartbeat failures for the
presence channel `#pear-presence-<projectId>` (heartbeat every 60s, message
fields `{instanceName, role, joinedAt}`, plus `kind`); update the document to
require that an instance: (a) locally detects its presence record TTL expiration
(tombstone) and immediately attempts to re-publish its presence entry with a
refreshed `joinedAt`/timestamp, (b) emits a local "connection degraded" UI state
while in the tombstoned-but-still-running window, and (c) still relies on
relaycast reconnect semantics as a fallback so duplicate publishes are
idempotent; mention where `joinedAt`/`kind` should be updated on re-publish and
add a short test case asserting re-publish after transient network loss.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/specs/2026-06-06-multi-instance-stage1-observer.md`:
- Around line 190-200: Update the Test plan to add explicit unit and integration
tests covering workspace join failure modes and broker event deduplication: add
unit cases for workspace.join(token) to assert behavior for invalid/expired
tokens and proper HTTP classification (401/403 as fatal vs 429 as retryable),
and add integration tests that simulate duplicate/replayed broker events
(specifically agent_spawned and agent_exited) delivered to multiple
BrokerManager instances to verify idempotent handling and no double-fanout;
reference the existing PTY seq dedupe test as a template for replay/duplicate
delivery harness setup.

In `@src/main/broker.test.ts`:
- Around line 384-491: The tests in broker.test.ts cover workspace-key happy
paths but miss validating the new classifyWorkspaceJoinFailure() branches in
BrokerManager.start(): add two unit tests that simulate relay errors classified
as "was rejected" and "was rate-limited" and assert the window receives a
'broker:status' payload with the expected status/message; specifically, create
tests that instantiate BrokerManager, create a mock window (createMockWindow),
arrange the mocked relay/driver to produce the error text that
classifyWorkspaceJoinFailure() matches, call manager.start(...) (or the code
path that triggers workspace join failure), and assert win.webContents.send was
called with 'broker:status' and a payload indicating the rejection or rate-limit
condition; ensure tests clean up by calling manager.shutdown() and mirror the
style of the surrounding tests (use PROJECT_ID, BrokerManager.start, and
expect(win.webContents.send) assertions).

---

Outside diff comments:
In `@src/main/broker.ts`:
- Around line 1326-1392: The log and error paths are leaking explicit
workspaceKey from opts and from error strings; update startBroker/startPromise
error handling and the JSON.stringify of opts so workspaceKey is redacted: when
building the opts object or right before logging in the console.log('[broker]
Starting with opts:', ...) replace opts.workspaceKey (or explicitWorkspaceKey)
with a masked value (e.g., first 8 chars + '…' or '<redacted>'), and sanitize
any error string used in console.error and in sendStatusToWindow by detecting
and replacing the full workspace key substring with the same masked form (apply
this sanitization to the err variable returned from classifyWorkspaceJoinFailure
and before calling sendStatusToWindow and console.error). Ensure you reference
and modify the variables/functions AgentRelaySpawnOptions
opts/explicitWorkspaceKey, startBroker/startPromise error catch,
classifyWorkspaceJoinFailure, console.error call, and sendStatusToWindow so no
full workspace key is logged or forwarded to the renderer.

---

Nitpick comments:
In `@docs/specs/2026-06-06-multi-instance-stage1-observer.md`:
- Around line 188-189: The document is inconsistent about the Stage 1 PTY
backfill: Section 4 currently states the contract as "current visible-screen PTY
snapshot + live stream from join" with "NO durable per-observer scrollback
contract", while Q4 asks whether the broker keeps scrollback for replay on
attach; reconcile by either softening Section 4 to mark that as a proposed
contract pending relay-worker confirmation, or reword Q4 to ask an
implementation verification (e.g., "does the current broker implementation
already provide the visible-screen snapshot, or is additional machinery
required?") and explicitly reference the Stage 1 catch-up contract phrase so
readers can see this is a contract vs implementation question.
- Around line 69-70: The invite token currently only carries the token format
version (`v: 1`); update the invite token payload (the place that encodes `v:
1`) to also include the project's `/pear/project.json` schemaVersion so
observers can detect schema mismatches before joining; specifically, add a field
like `projectSchemaVersion` to the token generation/validation logic and ensure
the join/observe flow (where `v: 1` is parsed) checks that token field against
the local supported `/pear/project.json` schemaVersion and rejects or warns
early if incompatible.
- Around line 44-46: The spec's machine identifier "machineSlug" (used in the
name pattern pear-<relayWorkspaceId>-<machineSlug>) needs explicit collision
handling: update the document to specify how to detect and resolve collisions
when the hostname-derived 8-char machineSlug already exists (e.g., check
registry/ownership store on startup or during registration, and on collision
append a deterministic or random suffix such as "-01" or a short hash, persist
the resolved value in local app config so it remains stable), or alternatively
add this as an explicit Open Question assigning relay-worker or the registration
protocol to guarantee uniqueness; reference the machineSlug symbol and the name
pattern pear-<relayWorkspaceId>-<machineSlug> so implementers know where to
apply the check.
- Around line 144-152: The spec is missing behavior for transient heartbeat
failures for the presence channel `#pear-presence-<projectId>` (heartbeat every
60s, message fields `{instanceName, role, joinedAt}`, plus `kind`); update the
document to require that an instance: (a) locally detects its presence record
TTL expiration (tombstone) and immediately attempts to re-publish its presence
entry with a refreshed `joinedAt`/timestamp, (b) emits a local "connection
degraded" UI state while in the tombstoned-but-still-running window, and (c)
still relies on relaycast reconnect semantics as a fallback so duplicate
publishes are idempotent; mention where `joinedAt`/`kind` should be updated on
re-publish and add a short test case asserting re-publish after transient
network loss.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 96885536-6d00-47dc-8adb-6b244263823c

📥 Commits

Reviewing files that changed from the base of the PR and between 0660358 and 087e4b9.

📒 Files selected for processing (7)
  • docs/specs/2026-06-06-multi-instance-stage1-observer.md
  • src/main/__tests__/integration-event-bridge.test.ts
  • src/main/broker.test.ts
  • src/main/broker.ts
  • src/main/cloud-agent.test.ts
  • src/main/cloud-agent.ts
  • src/main/integration-event-bridge.ts

Comment on lines +190 to +200
## Test plan sketch

- Unit: invite token round-trip (incl. version/role rejection), role guard on
every mutating IPC handler (table-driven), project.json snapshot apply +
revision conflict.
- Integration: two BrokerManager instances against one broker (the
broker.test.ts harness already mocks spawn; extend with a second client),
PTY seq dedupe under interleaved chunks, observer join while agent mid-run.
- Manual/e2e (needs T2-style debug logging): second machine joins via token,
watches a live agent, kill -9 the host instance → observer survives in
read-only state with stale-host indicator.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Expand test plan to cover workspace join failures and broker event deduplication.

The test plan covers PTY deduplication well but has gaps relative to critical behaviors described earlier:

  1. Workspace join failure scenarios (section 2, lines 94-98): No tests listed for invalid/expired workspace keys, 401/403 fatal rejection vs 429 retryable rate-limiting. These are critical fail-loud paths that distinguish this from #125's "silent fresh workspace" failure mode.

  2. Broker event deduplication (section 4, lines 139-142): PTY chunk dedupe is covered (line 197), but agent_spawned/agent_exited events that "fan out to all instances" aren't explicitly tested for duplicate delivery scenarios.

As per coding guidelines, broker start and event streaming changes require regression tests with duplicate/replay cases. Consider adding:

  • Unit: workspace.join(token) rejection paths (bad key, expired key, 401 vs 429 classification)
  • Integration: broker event replay/duplicate delivery to multiple instances
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/specs/2026-06-06-multi-instance-stage1-observer.md` around lines 190 -
200, Update the Test plan to add explicit unit and integration tests covering
workspace join failure modes and broker event deduplication: add unit cases for
workspace.join(token) to assert behavior for invalid/expired tokens and proper
HTTP classification (401/403 as fatal vs 429 as retryable), and add integration
tests that simulate duplicate/replayed broker events (specifically agent_spawned
and agent_exited) delivered to multiple BrokerManager instances to verify
idempotent handling and no double-fanout; reference the existing PTY seq dedupe
test as a template for replay/duplicate delivery harness setup.

Source: Coding guidelines

Comment thread src/main/broker.test.ts
Comment on lines +384 to +491
it('passes an explicit workspace key env pin to local broker spawn options', async () => {
const previousWorkspaceKey = process.env.AGENT_RELAY_WORKSPACE_KEY
process.env.AGENT_RELAY_WORKSPACE_KEY = 'rk_live_pinned'
const manager = new BrokerManager()

try {
await manager.start(PROJECT_ID, '/tmp/project-1', 'pear-project-1', undefined as never, [])

expect(mock.HarnessDriverClient.spawn).toHaveBeenCalledWith(expect.objectContaining({
brokerName: 'pear-project-1',
workspaceKey: 'rk_live_pinned'
}))
} finally {
if (previousWorkspaceKey === undefined) {
delete process.env.AGENT_RELAY_WORKSPACE_KEY
} else {
process.env.AGENT_RELAY_WORKSPACE_KEY = previousWorkspaceKey
}
await manager.shutdown()
}
})

it('reads the local broker workspace key for cloud provisioning', async () => {
const manager = new BrokerManager()
const local = await startLocal(manager)
local.getSession.mockResolvedValueOnce({ workspace_key: 'rk_live_project' })

await expect(manager.workspaceKeyForProject(PROJECT_ID)).resolves.toBe('rk_live_project')

await manager.shutdown()
})

it('omits the project workspace key when no local broker exposes one', async () => {
const manager = new BrokerManager()
await startLocal(manager)

await expect(manager.workspaceKeyForProject(PROJECT_ID)).resolves.toBeUndefined()
await expect(manager.workspaceKeyForProject('missing-project')).resolves.toBeUndefined()

await manager.shutdown()
})

it('emits a cloud workspace mismatch event when the sandbox ignores the sent key', async () => {
const manager = new BrokerManager()
const win = createMockWindow()
mock.state.nextCloudSessionMetadata.push({ workspace_key: 'rk_sand_456' })

await manager.attachCloudSandbox(PROJECT_ID, {
sandboxId: 'sandbox-1',
execUrl: 'https://sandbox.example',
sentWorkspaceKey: 'rk_sent_123'
}, win)

expect(win.webContents.send).toHaveBeenCalledWith(
'broker:event',
expect.objectContaining({
kind: 'cloud_workspace_key_mismatch',
projectId: PROJECT_ID,
cloudSandboxId: 'sandbox-1',
sentWorkspaceKeyPrefix: 'rk_sent_',
sandboxWorkspaceKeyPrefix: 'rk_sand_',
detail: expect.stringContaining('stale broker binary')
})
)

await manager.shutdown()
})

it('does not emit a cloud workspace mismatch event when the sandbox joins the sent key', async () => {
const manager = new BrokerManager()
const win = createMockWindow()
mock.state.nextCloudSessionMetadata.push({ workspace_key: 'rk_live_same' })

await manager.attachCloudSandbox(PROJECT_ID, {
sandboxId: 'sandbox-1',
execUrl: 'https://sandbox.example',
sentWorkspaceKey: 'rk_live_same'
}, win)

const mismatchEvents = (win.webContents.send as ReturnType<typeof vi.fn>).mock.calls
.filter(([channel, payload]) =>
channel === 'broker:event' &&
(payload as { kind?: string }).kind === 'cloud_workspace_key_mismatch'
)
expect(mismatchEvents).toHaveLength(0)

await manager.shutdown()
})

it('does not emit a cloud workspace mismatch event on keyless legacy attaches', async () => {
const manager = new BrokerManager()
const win = createMockWindow()
mock.state.nextCloudSessionMetadata.push({ workspace_key: 'rk_live_sandbox' })

await manager.attachCloudSandbox(PROJECT_ID, {
sandboxId: 'sandbox-1',
execUrl: 'https://sandbox.example'
}, win)

const mismatchEvents = (win.webContents.send as ReturnType<typeof vi.fn>).mock.calls
.filter(([channel, payload]) =>
channel === 'broker:event' &&
(payload as { kind?: string }).kind === 'cloud_workspace_key_mismatch'
)
expect(mismatchEvents).toHaveLength(0)

await manager.shutdown()
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add rejected/rate-limited start-failure regressions.

These additions cover the happy path for workspace-key wiring, but the new classifyWorkspaceJoinFailure() branches in BrokerManager.start() still have no coverage. A small change in the relay error text or the renderer status mapping would silently collapse back to a generic start error. Please add one "was rejected" case and one "was rate-limited" case that assert the broker:status payload sent to the window.

As per coding guidelines, "Add regression tests when touching broker start, event streaming, PTY buffering, spawned personas, or integration notifications. Include duplicate/replay cases, not just the happy path."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/broker.test.ts` around lines 384 - 491, The tests in broker.test.ts
cover workspace-key happy paths but miss validating the new
classifyWorkspaceJoinFailure() branches in BrokerManager.start(): add two unit
tests that simulate relay errors classified as "was rejected" and "was
rate-limited" and assert the window receives a 'broker:status' payload with the
expected status/message; specifically, create tests that instantiate
BrokerManager, create a mock window (createMockWindow), arrange the mocked
relay/driver to produce the error text that classifyWorkspaceJoinFailure()
matches, call manager.start(...) (or the code path that triggers workspace join
failure), and assert win.webContents.send was called with 'broker:status' and a
payload indicating the rejection or rate-limit condition; ensure tests clean up
by calling manager.shutdown() and mirror the style of the surrounding tests (use
PROJECT_ID, BrokerManager.start, and expect(win.webContents.send) assertions).

Source: Coding guidelines

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant