Integration-event steer loss: optimistic dedup claim suppresses re-drive of lost steers (relay#1056 §5) #147

@kjgbot

Summary

Integration-event injection can permanently drop a message when the steer→agent (PTY) hop loses it after the bridge has already claimed the logical-dedup key. Every subsequent re-delivery of the same logical message is then dedup-suppressed for the full SLACK_RECORD_REPLAY_TTL_MS (~1h), so the agent never receives it and no delivery failure is surfaced. This has dropped real Slack messages from humans (Khaliq, Will).

This is the "consistency bolt" described in relay#1056 §5 (delivery/read ack + re-drive).

Confirmed trace (slack-comms, 2026-06-07)

Dropped message ts 1780847052 ("read #1056 and tell me your thoughts"):

15:44:29  received
15:44:40  injecting   recipients=['slack-comms']   <-- claims the logical dedup key
15:44:47+ skipped duplicate path   (×9)            <-- every re-delivery suppressed

The bridge logged injecting (and so claimed the key), but the steer never reached the agent's conversation. No delivery failure was logged. By contrast, the immediately following answers message injected fine, so the loss is intermittent and tied to the agent PTY being busy or unreachable at inject time.

Why this is NOT the content-collapse path (and why #145 won't fix it)

  • Failure mode (A): on main (before pear#132, "Slack integration event fixes + #125 workspace-key wiring") a targeted read returns no content → contentHash is undefined → claimSlackLogicalInjection() returns false at the if (!contentHash ...) return false guard → injection collapses to once-per-key. This is what pear#145 fixes.
  • Failure mode (B), THIS issue: content is available and injecting is logged (key claimed), but the steer is lost on the agent hop. A later identical re-delivery hits existing.contentHashes.has(contentHash) → return false and is suppressed. pear#145 ("DO-NOT-MERGE: Split #132 Slack integration event fix") does not help: with content defined, an unchanged re-delivery of a lost steer is still indistinguishable from a true duplicate without a delivery signal.
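
Failure mode (B) can be illustrated with a minimal sketch of an optimistic claim. This is not the code in integration-event-bridge.ts: the function name, the map, and the key strings are hypothetical, and only the contentHashes.has(contentHash) check is taken from the description above. The (A) guard is deliberately omitted because its full condition is not shown in this issue.

```typescript
// Hypothetical, simplified model of an optimistic logical-dedup claim.
// Assumption: one record per logical key holding the content hashes seen so far.
interface LogicalClaimSketch {
  contentHashes: Set<string>;
}

const claimsSketch = new Map<string, LogicalClaimSketch>();

// Returns true if the caller should inject, false if the message is treated
// as a duplicate. The claim is recorded at inject time, before any
// delivery confirmation exists.
function claimLogicalInjectionSketch(logicalKey: string, contentHash: string): boolean {
  const existing = claimsSketch.get(logicalKey);
  if (existing) {
    if (existing.contentHashes.has(contentHash)) {
      return false; // same key, same content: suppressed as a duplicate
    }
    existing.contentHashes.add(contentHash);
    return true; // same key, changed content: inject again
  }
  claimsSketch.set(logicalKey, { contentHashes: new Set([contentHash]) });
  return true; // first sighting: claim the key and inject
}

// First delivery claims the key; assume the steer is then lost on the PTY hop.
const firstAttempt = claimLogicalInjectionSketch("example-key", "example-hash"); // true
// The identical re-delivery cannot be told apart from a true duplicate.
const redelivery = claimLogicalInjectionSketch("example-key", "example-hash"); // false
console.log({ firstAttempt, redelivery });
```

Nothing in this model records whether the first inject reached the agent, which is why the re-delivery is suppressed even though the steer was lost.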

Fix direction

Make the dedup claim contingent on delivery, not on the inject attempt:

  • Commit the logical/contentHash claim only once the steer is confirmed delivered to the agent (pear already has a broker delivery-confirmation signal — it was deliberately not awaited for system/guidance messages; reuse it here for integration-event injects).
  • If delivery is not confirmed within a bounded window, release the provisional claim so a re-delivery re-drives, instead of suppressing for the full replay TTL.
  • Align ack semantics with relay#1056 §5 / Will's answers: support both delivered and read acks, harness-driver-owned.

Spans pear (integration-event-bridge.ts claim/inject logic) + the broker/harness-driver delivery-confirmation signal (relay side).
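
The fix direction above can be sketched as a two-phase claim: provisional at inject time, committed only on a delivery confirmation, and released if no confirmation arrives within a bounded window. Everything here is an assumption for illustration: the class, its method names, and the window values are hypothetical, and how the broker's delivery-confirmation signal is actually exposed is not specified in this issue.

```typescript
// Hypothetical sketch of a delivery-gated dedup claim. Time is passed in
// explicitly so the behaviour is deterministic and easy to test.
type ClaimState = "provisional" | "committed";

interface GatedClaim {
  state: ClaimState;
  expiresAt: number; // epoch ms after which the claim no longer suppresses
}

class DeliveryGatedDedup {
  private readonly claims = new Map<string, GatedClaim>();
  private readonly confirmWindowMs: number;
  private readonly replayTtlMs: number;

  constructor(confirmWindowMs: number, replayTtlMs: number) {
    this.confirmWindowMs = confirmWindowMs; // bounded wait for confirmation
    this.replayTtlMs = replayTtlMs; // full suppression TTL once delivered
  }

  private key(logicalKey: string, contentHash: string): string {
    return JSON.stringify([logicalKey, contentHash]);
  }

  // Returns true if the caller should inject. A live claim (provisional or
  // committed) suppresses; an expired one is treated as released.
  tryClaim(logicalKey: string, contentHash: string, now: number): boolean {
    const k = this.key(logicalKey, contentHash);
    const existing = this.claims.get(k);
    if (existing && now < existing.expiresAt) {
      return false;
    }
    this.claims.set(k, { state: "provisional", expiresAt: now + this.confirmWindowMs });
    return true;
  }

  // Called when the delivery confirmation arrives: only now does the claim
  // suppress for the full replay TTL.
  confirmDelivered(logicalKey: string, contentHash: string, now: number): void {
    const k = this.key(logicalKey, contentHash);
    if (!this.claims.has(k)) return; // nothing was claimed for this message
    this.claims.set(k, { state: "committed", expiresAt: now + this.replayTtlMs });
  }
}

// Usage with assumed values: a 5 s confirmation window and a 1 h replay TTL.
const dedup = new DeliveryGatedDedup(5_000, 3_600_000);
const injectNow = dedup.tryClaim("example-key", "example-hash", 0); // true
const withinWindow = dedup.tryClaim("example-key", "example-hash", 1_000); // false
const afterWindow = dedup.tryClaim("example-key", "example-hash", 6_000); // true: re-drive
dedup.confirmDelivered("example-key", "example-hash", 6_500);
const afterConfirm = dedup.tryClaim("example-key", "example-hash", 100_000); // false
console.log({ injectNow, withinWindow, afterWindow, afterConfirm });
```

In this sketch the release is lazy: an unconfirmed claim simply stops suppressing once its window has passed, so a lost steer costs at most one confirmation window rather than the full replay TTL. Distinguishing delivered from read acks, as relay#1056 §5 asks, would need a further state and is left out here.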

Interim mitigation (active)

slack-comms treats ~/.agentworkforce/pear/integration-events.log as the source of truth and reads out of band (OOB) any received event that lacks a matching, delivered injecting entry. This is load-bearing until the re-drive fix lands; it is manual and must not become the permanent answer.

Notes

  • Sibling/parent: relay#1056 (Fleet Delivery RFC), §5 consistency bolt.
  • File touched by the in-flight pear#132 split (pear#145) — stack the fix on the canonical bridge state to avoid churn.
