Skip to content

feat: Emit IntentDisruptedEvent on batch failure to misbehaving clients#1049

Open
Dunsin-cyber wants to merge 1 commit into
arkade-os:masterfrom
Dunsin-cyber:feat/intent-disrupted-event
Open

feat: Emit IntentDisruptedEvent on batch failure to misbehaving clients#1049
Dunsin-cyber wants to merge 1 commit into
arkade-os:masterfrom
Dunsin-cyber:feat/intent-disrupted-event

Conversation

@Dunsin-cyber
Copy link
Copy Markdown
Contributor

@Dunsin-cyber Dunsin-cyber commented Apr 25, 2026

fixes #899

  • Adds a new IntentDisruptedEvent to the batch event stream, emitted only to the participant whose inputs caused a batch failure (missed nonces, invalid signatures, missing/invalid forfeit txs, invalid boarding
    input signatures)
  • Routes the event via the misbehaving participant's VTXO outpoint topics so innocent bystanders never receive it
  • Adds OnIntentDisrupted to the BatchEventsHandler interface in client-lib; JoinBatchSession always exits after this event
  • Adds GetEventStreamTopics helper to client-lib; used by tests to build per-client topic subscriptions.

Summary by CodeRabbit

  • New Features

    • Introduced intent disruption notifications that inform users when their intents are disrupted, including the round ID and disruption reason (e.g., missing cosigner submission or forfeit signature).
  • Tests

    • Added end-to-end tests to verify intent disruption events are correctly emitted and targeted to affected users.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 25, 2026

Walkthrough

This PR introduces a new IntentDisruptedEvent that is emitted when an intent causes a batch to fail, allowing clients to decide whether to queue a new intent or wait for re-selection. The event includes the intent ID, round ID, and disruption reason, and is published through the batch event stream using topic-based filtering.

Changes

Cohort / File(s) Summary
API Specifications
api-spec/openapi/swagger/ark/v1/service.openapi.json, api-spec/openapi/swagger/ark/v1/types.openapi.json, api-spec/protobuf/ark/v1/service.proto, api-spec/protobuf/ark/v1/types.proto
Define new IntentDisruptedEvent schema with id, intentId, and reason fields across OpenAPI and protobuf specs; add intentDisrupted property to GetEventStreamResponse.
Core Event & Domain
internal/core/application/service_event.go, internal/core/domain/events_repo.go
Introduce application-layer IntentDisrupted event type and corresponding EventTypeIntentDisrupted constant for event type discrimination.
Ban & Round Finalization Logic
internal/core/application/ban.go, internal/core/application/service.go
Emit IntentDisrupted events in ban workflows when cosigner inputs are banned or forfeit signature is missing; thread registeredIntents to ban goroutine for event construction.
gRPC Handler & Client Library
internal/interface/grpc/handlers/arkservice.go, pkg/client-lib/client/client.go, pkg/client-lib/client/grpc/types.go, pkg/client-lib/batch_session_handler.go
Map application event to gRPC response; add IntentDisruptedEvent client type; introduce OnIntentDisrupted callback in BatchEventsHandler and terminate batch session on disruption.
E2E Testing
internal/test/e2e/delegate_utils_test.go, internal/test/e2e/e2e_test.go
Add event handlers for IntentDisruptedEvent in test utilities and comprehensive E2E test scenarios verifying event emission during ban conditions and topic-based event targeting.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • pubsub BatchFailed #765: Both PRs introduce new application-level event types and update the gRPC event-listening handler to map those events to streamed responses.

Suggested reviewers

  • altafan
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: emitting IntentDisruptedEvent to misbehaving clients on batch failure.
Linked Issues check ✅ Passed The PR fully implements issue #899 requirements: introduces IntentDisruptedEvent, routes it to intent's topic on batch failure, enabling clients to know whether to queue new intent or wait for re-selection.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing IntentDisruptedEvent functionality across API specs, core logic, gRPC handlers, and client library as scoped by issue #899.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/core/application/service.go (1)

3160-3169: ⚠️ Potential issue | 🟡 Minor

Cache failure silently disables IntentDisrupted routing.

If GetSelectedIntents fails, registeredIntents will be nil and banForfeitCollectionTimeout will execute without intent context, which means the disrupting participant cannot be identified by their VTXO outpoint topics and won't receive IntentDisruptedEvent (the very signal this PR introduces). The round still fails correctly, but the user-facing benefit of the feature is silently lost on this code path.

Consider either:

  • Failing the round more loudly here (alert / error log instead of warn) so the operator notices, or
  • Refactoring so the ban goroutine retrieves intents itself and surfaces the lookup failure consistently.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/application/service.go` around lines 3160 - 3169, Currently we
pass possibly-nil registeredIntents into banForfeitCollectionTimeout which
silently loses IntentDisrupted routing on GetSelectedIntents failure; change the
flow so banForfeitCollectionTimeout fetches intents itself (call
s.cache.Intents().GetSelectedIntents inside banForfeitCollectionTimeout using
the same ctx and roundId), handle and log any error there at error level
(log.WithError(err).Error(...)) and exit the goroutine if the lookup fails, and
remove the external dependency on registeredIntents from the call site (i.e.,
stop passing registeredIntents from this block); ensure any logic that emits
IntentDisruptedEvent only runs when the fetched intents are non-nil/valid.
🧹 Nitpick comments (3)
pkg/client-lib/client/client.go (1)

112-116: Consider documenting Id semantics (round id).

Id here is the round id (per the proto definition string id = 1; // round id), but the struct also has IntentId, which can confuse callers. A short doc comment (or renaming to e.g. RoundId) would make the intent unambiguous. Renaming would be a breaking change to the public client API, so a doc comment is safer.

📝 Suggested doc comment
-type IntentDisruptedEvent struct {
-	Id       string
-	IntentId string
-	Reason   string
-}
+type IntentDisruptedEvent struct {
+	// Id is the round id that the disrupted intent was associated with.
+	Id       string
+	IntentId string
+	Reason   string
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/client-lib/client/client.go` around lines 112 - 116, The field naming in
IntentDisruptedEvent is ambiguous: add a short doc comment above the
IntentDisruptedEvent type explaining that Id is the round id (per proto: "string
id = 1; // round id") and that IntentId is the intent identifier so callers
aren’t confused; reference the IntentDisruptedEvent type and its fields (Id,
IntentId) in the comment rather than renaming the field to avoid breaking the
public API.
pkg/client-lib/batch_session_handler.go (1)

183-189: Handler's nil return is silently replaced by a synthetic error.

If OnIntentDisrupted returns nil, this code overwrites it with fmt.Errorf("intent disrupted: %s", e.Reason) so JoinBatchSession always exits with a non-nil error. This is inconsistent with OnBatchFailed just above (Lines 177-182), which honors a nil return and lets the loop continue.

If the design is "always exit on IntentDisrupted, regardless of handler outcome," that's fine — but the handler contract becomes confusing because the return value is effectively ignored when nil. Two cleaner options:

Option A — exit with handler's error verbatim (let handler decide)
 case client.IntentDisruptedEvent:
   e := event.(client.IntentDisruptedEvent)
-  err := eventsHandler.OnIntentDisrupted(ctx, e)
-  if err == nil {
-    err = fmt.Errorf("intent disrupted: %s", e.Reason)
-  }
-  return "", "", -1, nil, nil, err
+  return "", "", -1, nil, nil, eventsHandler.OnIntentDisrupted(ctx, e)
Option B — keep current semantics but document on the interface

Add a comment on BatchEventsHandler.OnIntentDisrupted (Line 58) noting that the session will always terminate after this callback and that a nil return will be wrapped into a generic error. This sets implementer expectations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/client-lib/batch_session_handler.go` around lines 183 - 189, The code
currently overwrites a nil return from eventsHandler.OnIntentDisrupted by
creating a synthetic error, causing JoinBatchSession to always exit with an
error; change the logic in the IntentDisruptedEvent case (handling
client.IntentDisruptedEvent and calling eventsHandler.OnIntentDisrupted) to
return the handler's error verbatim (i.e., remove the fmt.Errorf wrapping and
the conditional that replaces nil with a new error) so a nil from
OnIntentDisrupted is honored, or alternatively if you intend to always exit
document this behavior on the BatchEventsHandler.OnIntentDisrupted interface
(but do not silently replace the handler's return value in JoinBatchSession).
internal/core/application/ban.go (1)

96-110: Per-intent dedup looks correct; note ordering inconsistency vs forfeit path.

Deduplicating notifications per intent.Id via notifiedIntents is correct — multiple cosigner mappings cannot retrigger the event for the same intent within a single call.

Minor inconsistency: here the event is emitted before Convictions().Add(...) (line 115), while in banForfeitCollectionTimeout the conviction is persisted first (line 200) and the event is emitted afterward. Not necessarily a defect — the event is informational for the offending participant — but if downstream consumers ever correlate IntentDisrupted with stored convictions, the ordering will produce different observable states between the two paths. Consider aligning the order (persist conviction first, then emit) for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/application/ban.go` around lines 96 - 110, The IntentDisrupted
event is emitted before the conviction is persisted here, causing an ordering
mismatch with banForfeitCollectionTimeout which calls Convictions().Add(...)
before emitting; change the sequence in this block so you call
Convictions().Add(...) (using the same conviction creation logic) and persist
the conviction first, then compute topics via intentTopics and send the
IntentDisrupted event on s.eventsCh, keeping the per-intent dedup via
notifiedIntents unchanged (refer to notifiedIntents, IntentDisrupted,
Convictions().Add, intentTopics, eventsCh and the banForfeitCollectionTimeout
flow to mirror the ordering).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/test/e2e/e2e_test.go`:
- Around line 5002-5070: Bob's handlers for JoinBatchSession should mirror
Alice's so Bob doesn't exit early: add an onBatchFailed handler (no-op that
returns nil to keep waiting) into the customBatchEventsHandler passed to
JoinBatchSession for bobStream (the same place where onIntentDisrupted sets
bobDisruptedEvent), stop swallowing bobErr (remove `_ = bobErr`) and add
assertions similar to Alice's (e.g., require.Error/require.NotNil on
bobErr/bobDisruptedEvent as appropriate) so any leaked IntentDisrupted to Bob is
detected.
- Around line 4991-5060: Bob's customBatchEventsHandler must explicitly set
onBatchFailed to return nil so the default behavior (which returns an error and
ends the session) doesn't terminate Bob's JoinBatchSession before
IntentDisrupted is handled; update the bob-side handler (the
customBatchEventsHandler passed into arksdk.JoinBatchSession with bobStream and
referenced as bobErr) to include onBatchFailed: func(ctx context.Context, event
client.BatchFailedEvent) error { return nil } just like Alice's handler.

---

Outside diff comments:
In `@internal/core/application/service.go`:
- Around line 3160-3169: Currently we pass possibly-nil registeredIntents into
banForfeitCollectionTimeout which silently loses IntentDisrupted routing on
GetSelectedIntents failure; change the flow so banForfeitCollectionTimeout
fetches intents itself (call s.cache.Intents().GetSelectedIntents inside
banForfeitCollectionTimeout using the same ctx and roundId), handle and log any
error there at error level (log.WithError(err).Error(...)) and exit the
goroutine if the lookup fails, and remove the external dependency on
registeredIntents from the call site (i.e., stop passing registeredIntents from
this block); ensure any logic that emits IntentDisruptedEvent only runs when the
fetched intents are non-nil/valid.

---

Nitpick comments:
In `@internal/core/application/ban.go`:
- Around line 96-110: The IntentDisrupted event is emitted before the conviction
is persisted here, causing an ordering mismatch with banForfeitCollectionTimeout
which calls Convictions().Add(...) before emitting; change the sequence in this
block so you call Convictions().Add(...) (using the same conviction creation
logic) and persist the conviction first, then compute topics via intentTopics
and send the IntentDisrupted event on s.eventsCh, keeping the per-intent dedup
via notifiedIntents unchanged (refer to notifiedIntents, IntentDisrupted,
Convictions().Add, intentTopics, eventsCh and the banForfeitCollectionTimeout
flow to mirror the ordering).

In `@pkg/client-lib/batch_session_handler.go`:
- Around line 183-189: The code currently overwrites a nil return from
eventsHandler.OnIntentDisrupted by creating a synthetic error, causing
JoinBatchSession to always exit with an error; change the logic in the
IntentDisruptedEvent case (handling client.IntentDisruptedEvent and calling
eventsHandler.OnIntentDisrupted) to return the handler's error verbatim (i.e.,
remove the fmt.Errorf wrapping and the conditional that replaces nil with a new
error) so a nil from OnIntentDisrupted is honored, or alternatively if you
intend to always exit document this behavior on the
BatchEventsHandler.OnIntentDisrupted interface (but do not silently replace the
handler's return value in JoinBatchSession).

In `@pkg/client-lib/client/client.go`:
- Around line 112-116: The field naming in IntentDisruptedEvent is ambiguous:
add a short doc comment above the IntentDisruptedEvent type explaining that Id
is the round id (per proto: "string id = 1; // round id") and that IntentId is
the intent identifier so callers aren’t confused; reference the
IntentDisruptedEvent type and its fields (Id, IntentId) in the comment rather
than renaming the field to avoid breaking the public API.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 556a0bd4-47a6-44ff-a65e-bc43d062ef51

📥 Commits

Reviewing files that changed from the base of the PR and between 626abf9 and 6292dce.

⛔ Files ignored due to path filters (3)
  • api-spec/protobuf/gen/ark/v1/indexer.pb.rgw.go is excluded by !**/gen/**
  • api-spec/protobuf/gen/ark/v1/service.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • api-spec/protobuf/gen/ark/v1/types.pb.go is excluded by !**/*.pb.go, !**/gen/**
📒 Files selected for processing (14)
  • api-spec/openapi/swagger/ark/v1/service.openapi.json
  • api-spec/openapi/swagger/ark/v1/types.openapi.json
  • api-spec/protobuf/ark/v1/service.proto
  • api-spec/protobuf/ark/v1/types.proto
  • internal/core/application/ban.go
  • internal/core/application/service.go
  • internal/core/application/service_event.go
  • internal/core/domain/events_repo.go
  • internal/interface/grpc/handlers/arkservice.go
  • internal/test/e2e/delegate_utils_test.go
  • internal/test/e2e/e2e_test.go
  • pkg/client-lib/batch_session_handler.go
  • pkg/client-lib/client/client.go
  • pkg/client-lib/client/grpc/types.go

Comment thread internal/test/e2e/e2e_test.go
Comment thread internal/test/e2e/e2e_test.go
@tiero tiero requested a review from altafan April 30, 2026 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IntentDisruptedBatchEvent

1 participant