Skip to content

Replay Outgoing Messages When Inbox Receives Duplicate#4067

Open
iancooper wants to merge 72 commits into
masterfrom
replay_on_seen
Open

Replay Outgoing Messages When Inbox Receives Duplicate#4067
iancooper wants to merge 72 commits into
masterfrom
replay_on_seen

Conversation

@iancooper

Copy link
Copy Markdown
Member

When an Inbox receives a duplicate message, we may choose to replay any outgoing messages that it triggered when it was originally called. This makes us idempotent, in that our behavior matches that of the original call. It also allows us to recreate messages for downstream consumers who might have failed, by triggering from upstream, thus allowing us to replay a whole workflow until we reach the point where it failed.

Closes #2541

iancooper and others added 2 commits April 17, 2026 18:40
Introduces the design for CausationId tracking and OnceOnlyAction.Replay,
enabling automatic replay of downstream outbox messages when the inbox
detects a duplicate command. Accepted after 7 rounds of adversarial review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cate

Create and approve implementation task list (23 tasks) for the replay
outbox messages feature. Tasks cover structural prerequisites, core
behavior (CausationId propagation, UseInboxHandler replay logic,
pipeline validation), telemetry, DI registration, and all 18 persistent
store implementations. Minor ADR clarification for ReplayCausation
semantics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@iancooper iancooper self-assigned this Apr 17, 2026
@iancooper iancooper added 2 - In Progress feature request .NET Pull requests that update .net code V10.X labels Apr 17, 2026
codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Apr 17, 2026

Copy link
Copy Markdown

Code Review — PR #4067: Replay Outgoing Messages When Inbox Receives Duplicate

This PR contains design and specification documents only — no implementation code yet. The changes are: PROMPT.md (root), docs/adr/0057-replay-outbox-on-inbox-duplicate.md, and 12 spec files under specs/0027-replay-matching-outbox-events-when-inbox-has-already-seen/.


🔴 Critical: PROMPT.md should not be committed

PROMPT.md at the repo root is an AI session state artifact — it tracks which round of ADR review was completed, which spec phases are approved, and what "next step" the AI assistant should take. This is an internal working document for the development workflow tool, not project documentation.

Recommendation: Remove PROMPT.md from this PR and add it to .gitignore. Its content is already captured in the spec files and ADR that are correctly committed.


✅ ADR Quality

docs/adr/0057-replay-outbox-on-inbox-duplicate.md is thorough and follows the project's ADR conventions well:

  • Problem statement is clear: no mechanism to replay a failed workflow from an upstream trigger
  • Decision rationale is solid — using CausationId as a scoped, single-invocation concept (vs. CorrelationId for request-reply, vs. future WorkflowId) avoids conceptual collision
  • The SupportsCausationTracking() runtime schema check is a correct defensive choice — fail fast at startup rather than at runtime
  • The tidy-first prerequisite list (tasks 1–4) correctly separates structural refactors from behavioral changes per project conventions

Design Observations

Interface duplication on SupportsCausationTracking()

Both IAmACausationTrackingInbox and IAmACausationTrackingOutbox expose both sync and async variants:

bool SupportsCausationTracking();
Task<bool> SupportsCausationTrackingAsync(CancellationToken cancellationToken = default);

This is a schema check that typically only needs to run once at startup (or can be cached). Consider whether the async variant adds real value, or if a single sync check (cached after first call) is sufficient. Most schema-existence checks against relational databases complete in microseconds on a live connection. The async overload adds interface surface that every implementing store (all 18 of them) must provide.

ReplayCausation args parameter

void ReplayCausation(string causationId, RequestContext? requestContext,
    Dictionary<string, object>? args = null);

The args dictionary follows the existing Brighter convention for extensibility, which is fine. However, the ADR does not document what keys/values are valid. Adding a brief note about expected args (or confirming none are currently used) would help implementors.

UseInboxHandler prerequisite: switching from InitRequestContext() to this.Context

The ADR correctly flags this as a tidy-first change (Task 1), but notes it "affects the existing Throw and Warn paths as well." This is a subtle behavioral change: existing tests may have been written against the old private context. The task spec calls this out with "needs focused regression tests" — make sure the existing When_Handling_A_Command_Once_Only_With_Throw_Enabled and Warn tests still pass after the refactor, not just after the full implementation.

Describe() global inbox injection

Task 3 adds global inbox attributes to Describe() output to match Build(). The ADR notes this is a shared infrastructure change that "should be treated as a tidy-first structural change with its own focused test coverage." This is the right call — the existing pipeline validation test suite should be run as a regression gate before any behavioral tasks proceed.

Relay vs. Replay enum name

The ADR (round 4 finding #3) notes that the requirements use "e.g." so Replay is not mandated. The name Replay is clear and consistent with the feature concept — this is fine as-is.


Spec Completeness

The tasks.md (23 tasks) is very well structured with clear TDD commands, file locations, and dependency ordering. A few observations:

  • Task 4 (expose Outbox from IAmAnOutboxProducerMediator) is a public API surface change. The ADR should explicitly document this interface change as a breaking/additive decision, not just as an implementation prerequisite.
  • Tasks 19–22 (all 18 persistent stores) are large in scope. The task list correctly gates these behind base test classes (Task 18), but there should be an explicit note about whether the persistent store migration SQL is expected in this PR or arrives separately. The ADR says "a separate PR provides migration tooling" — the tasks.md should reference that dependency.
  • Task 16 (add telemetry to existing Throw/Warn/Add paths) is labeled a "tidy improvement" but is included in the same tasks.md. Consider whether this should be a separate spec/PR to keep scope clear.

Summary

Area Status
PROMPT.md in repo root ❌ Should be removed
ADR quality and completeness ✅ Well-written
Interface design ⚠️ SupportsCausationTracking async variant worth revisiting
Tidy-first discipline ✅ Correctly scoped
Task list TDD discipline ✅ Follows project /test-first conventions
Scope creep (Task 16) ⚠️ Consider separating telemetry tidy from feature work
Persistent store migration dependency ⚠️ Needs explicit cross-PR reference in tasks.md

The core design is sound. The main ask is removing PROMPT.md before merge; the interface and scope observations are lower priority suggestions.

iancooper and others added 2 commits April 20, 2026 17:26
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Apr 20, 2026

Copy link
Copy Markdown

Code Review — PR #4067: Replay Outgoing Messages When Inbox Receives Duplicate

Overview

This PR is a specification and design phase PR only — it contains requirements, ADR 0057, and task definitions for the Replay-on-Inbox-Duplicate feature. No production code has been written yet (the PROMPT.md explicitly states "Implementation — not started"). The 1606 additions are almost entirely documentation and spec artefacts.


Issues

🔴 PROMPT.md Should Not Be Merged

PROMPT.md is a Claude Code session-resume artefact containing internal workflow state (task review history, implementation notes). It is simultaneously committed and gitignored in this PR, which means it will persist in the repository's git history even after being ignored for future commits.

Recommendation: Remove PROMPT.md from the branch before merging. The .gitignore entry is the right call, but the file itself should be deleted so it does not land in master's history.

🟡 README.md for Spec 0027 Shows Tasks as Incomplete

specs/0027-.../README.md shows:

- [ ] Tasks (`tasks.md`)

...but .tasks-approved is committed in the same PR, meaning tasks are approved.

Recommendation: Update the README checkbox to [x] Tasks.

🟡 Inconsistent specs/.current-specspecs/.current_spec Rename

The PR deletes specs/.current-spec (hyphen) and updates specs/.current_spec (underscore). This is a housekeeping fix but worth a brief mention in the PR description so reviewers understand the intent.


ADR 0057 Quality

The ADR is thorough and well-reasoned. Specific strengths:

  • Non-breaking opt-in design via role interfaces (IAmACausationTrackingInbox, IAmACausationTrackingOutbox) is the right approach. Third-party implementations continue to work unchanged.
  • Pipeline validation at startup using Specification<HandlerPipelineDescription> is consistent with ADR 0053 and catches misconfiguration before any message is processed.
  • Tidy-first prerequisites are clearly separated from behavioral changes — switching UseInboxHandler to use the pipeline's this.Context before adding the Replay path is correct.
  • Race condition acknowledged (sweeper dispatches while ReplayCausation clears DispatchedAt): the decision to accept this in favour of downstream inbox deduplication is sound and correctly documented.

One design consideration worth flagging:

SupportsCausationTracking() I/O cost at startup: The ADR describes this as a "runtime schema check" — for relational stores, this almost certainly requires a round-trip to the database (e.g. querying INFORMATION_SCHEMA). Pipeline validation runs at startup, and if the check is synchronous and hits the DB, it adds latency to application startup. For async stores this is worse as there will be a sync-over-async call path. The ADR should document whether this check is cached after the first call, and whether the async variant (SupportsCausationTrackingAsync) is used during async pipeline validation.


Spec / Requirements Quality

  • Requirements are clear and well-scoped.
  • Out-of-scope items (saga orchestration, immediate-send replay, migration tooling) are explicitly listed — good.
  • AC-8 covers all persistent stores, which aligns with the ADR's 18-store list.
  • Minor typo in requirements.md line 507: "Brigter consumer" and "te re-execution" / "re never actioned" — these don't affect correctness but should be fixed before merge.

Summary

Area Status
PROMPT.md in commit history ❌ Remove before merge
specs/0027/README.md tasks checkbox ⚠️ Update to [x]
ADR 0057 design quality ✅ Solid
SupportsCausationTracking() startup I/O cost ⚠️ Document caching/async behaviour
Requirements completeness ✅ Good — minor typos only
Non-breaking / opt-in design ✅ Correct approach

The design is sound and the specification artefacts are high quality. The main blocker before merging is removing PROMPT.md from the branch history.


Review generated by Claude Code

codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Apr 20, 2026

Copy link
Copy Markdown

Code Review — PR #4067: Replay Outgoing Messages When Inbox Receives Duplicate

Note: This PR is a specification/planning PR — it contains no implementation code. All changes are documentation, ADR, and spec artifacts. The review below is therefore focused on the design quality and planning artifacts.


What's in this PR

  • .gitignore updated to exclude PROMPT.md
  • PROMPT.md — a Claude Code session resume file (accidentally committed)
  • docs/adr/0057-replay-outbox-on-inbox-duplicate.md — comprehensive ADR (624 lines)
  • specs/0027-replay-matching-outbox-events-when-inbox-has-already-seen/ — full spec: requirements, tasks (23 tasks), design review, approval markers
  • Review/approval artifacts for specs 0022 and 0026
  • .current_spec pointer updated; duplicate .current-spec (hyphen) file deleted

Issues

PROMPT.md should not be committed

The .gitignore update adds PROMPT.md, which is correct going forward, but the file itself is now tracked and is part of this PR's diff. It contains Claude Code internal session state (task history, round-by-round review notes, implementation roadmap). This should be removed from git tracking:

git rm --cached PROMPT.md

Without this, the session file becomes permanent history and will appear in everyone's working trees on checkout.


ADR Quality — docs/adr/0057-replay-outbox-on-inbox-duplicate.md

The ADR is well-structured and thorough. The following points are worth addressing before or during implementation:

1. DescribePipelines() also needs InboxConfiguration

The ADR correctly identifies that ValidatePipelines() creates PipelineBuilder without InboxConfiguration. However, DescribePipelines() (line 89 in BrighterPipelineValidationExtensions.cs) has the same gap. The review-design.md notes this at score 35. Since DescribePipelines() is a diagnostic tool used by operators to inspect pipeline state, giving it an incomplete view of the pipeline (missing global inbox attributes) makes its output misleading. Worth fixing as part of Task 3.

2. SupportsCausationTracking() synchronous schema check at startup

SupportsCausationTracking() is described as a "permanent runtime schema check." For persistent stores this likely means a DB query (e.g. checking if the CausationId column exists). When called during ValidatePipelines() at startup, this blocks application startup for every registered pipeline. For stores with slow connections (cloud-hosted DBs, cold containers), this could meaningfully extend startup time. The ADR should document the expected performance characteristics of this check, or note that implementations should cache the result.

3. Race condition framing could be clearer

The ADR describes the race condition — sweeper dispatches while ReplayCausation is clearing — and correctly notes it's acceptable because downstream inboxes handle duplicates. However, the framing implies only a single delivery. In practice, a message could be delivered twice: once by the in-flight sweep, and once after the dispatch state is cleared. The mitigation should explicitly say "downstream consumers with OnceOnly inboxes handle the duplicate." Without inboxes downstream, this becomes a real correctness concern. Worth adding a note about the requirement for downstream idempotency.

4. CausationId defaults to request.Id

Using the command's Id as the default CausationId is convenient, but the implications should be documented: if request.Id is not truly unique across handler invocations (e.g. deterministic IDs in tests or retried commands with the same ID), two different handler invocations could share the same CausationId, causing one invocation's replay to clear another's outbox messages. The ADR should note this assumption.


Requirements (requirements.md)

Minor: two typos in the Problem Statement — "Brigter consumer" → "Brighter consumer" and "those that we re never actioned" → "were never actioned". Small but this document is the canonical source of truth for the feature.


Tasks (tasks.md)

The task structure is clean with clear TDD enforcement. A few observations:

Task 3 — DescribePipelines() not in scope

As noted above, DescribePipelines() has the same InboxConfiguration gap as ValidatePipelines(). Consider adding it to Task 3 to fix both paths together.

Task 14 — 7 validation scenarios in one test file

The review-tasks.md (score 50) flags that Task 14 packs 7 scenarios into one test file and recommends [Theory]/[InlineData]. This is the right call — enforce it explicitly in the task description so the implementing agent follows it, rather than leaving it as a "recommendation."

Task 23 — "run all core tests" is underspecified

Task 23 says "Build + run all core tests" without listing which projects constitute "core." Given the breadth of this change (UseInboxHandler, PipelineBuilder, validation, InMemoryInbox/Outbox), consider enumerating the minimum test suite: Paramore.Brighter.Core.Tests, the OnceOnly suite, and the Validation suite at minimum.

Dependency on schema migration PR

The ADR notes a separate schema migration PR must land on master before this feature. This external dependency is not tracked in the task list. Suggest adding a prerequisite note or a blocking PR link so the implementation isn't merged ahead of its schema dependency.


Summary

Area Status
PROMPT.md in repo Must fix — remove from git tracking
ADR structure & completeness Good — minor gaps noted
DescribePipelines() gap Low priority — fix in Task 3
SupportsCausationTracking() startup cost Needs documentation
Race condition clarity Needs a note on downstream idempotency requirement
Requirements typos Trivial — worth fixing
Task 14 scenario structure Clarify [Theory] expectation in task
Task 23 test scope Enumerate specific suites
Schema migration PR dependency Track explicitly

The design itself is sound — the role interface approach (IAmACausationTrackingInbox/Outbox) is non-breaking and consistent with Brighter's existing patterns. The decision to use the outbox sweeper for re-dispatch rather than building a new dispatch path is the right call. The CausationId concept is well-scoped and distinct from CorrelationId/JobId.

# Conflicts:
#	.gitignore
#	specs/.current-spec
#	specs/.current_spec
codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Jun 17, 2026

Copy link
Copy Markdown

Review — PR #4067: Replay Outgoing Messages When Inbox Receives Duplicate

Thanks for the very thorough design work — ADR 0057 is well structured (clear Forces, Alternatives Considered, Consequences, Risks table), the role-interface approach (IAmACausationTrackingInbox/IAmACausationTrackingOutbox) is consistent with Brighter's IAmA* conventions, and the opt-in/non-breaking framing is sound. The task breakdown follows the mandated TDD-first workflow and maps cleanly back to the FRs.

Most of my feedback is about scope/hygiene of this PR, since it ships only design+spec docs — no production code or tests yet.

🔴 Blocking / should-resolve-before-merge

  1. Closes #2541 with no implementation. This PR contains the ADR, requirements, and task list but none of the 23 implementation tasks. Merging as-is will auto-close [Feature] Force Replay of Outbox on seeing duplicate Processed Inbox Message #2541 even though the feature isn't built. Suggest either dropping Closes (use "Part of [Feature] Force Replay of Outbox on seeing duplicate Processed Inbox Message #2541" / "Design for [Feature] Force Replay of Outbox on seeing duplicate Processed Inbox Message #2541") or keeping the issue open until the implementation PR lands.

  2. PROMPT.md is committed and added to .gitignore. Patch 1 adds PROMPT.md (tracked), patch 3 adds it to .gitignore. Adding to .gitignore does not untrack an already-tracked file, so it stays in the repo. Per CLAUDE.md, PROMPT.md is intended as temporary cross-conversation state — it carries session-resume notes, review-round history, and a stale status block. Recommend git rm --cached PROMPT.md (and commit) if the intent is to ignore it, or drop it from the PR entirely. As-is the two changes contradict each other.

  3. Unrelated spec state bleeding into this PR. The diff touches files belonging to other specs:

    • specs/0022-Defer-Message-Action-Backstop-Handler/review-tasks.md (new)
    • specs/0026-timed-outbox-archiver-sync-fallback/review-tasks.md and .tasks-approved (new)
    • flips to both specs/.current-spec and specs/.current_spec

    CLAUDE.md's "Change Scope" rule says not to make changes beyond what was requested. These look like workflow scratch state for unrelated specs and should be split out so this PR is just spec 0027.

  4. Two divergent "current spec" files: .current-spec (hyphen) and .current_spec (underscore). The PR edits the hyphen one, then deletes it, and edits the underscore one. This looks like a typo'd duplicate. Worth settling on a single canonical filename so the spec tooling doesn't read stale state.

🟡 Design observations (for the implementation PR)

These don't block a docs PR but are worth tracking against the tasks:

  • ADR InboxItem example uses primary-constructor syntax (ADR ~line 339) while the real src/Paramore.Brighter/InMemoryInbox.cs uses a traditional constructor. review-design.md already flagged this (score 45) but the ADR text wasn't updated — an implementer following the snippet could needlessly refactor the class. Add an "illustrative only" note or align it.
  • Replay no-op when tracking unsupported at runtime. The Consequences section acknowledges Replay silently does nothing if inbox/outbox don't support causation tracking and pipeline validation is the safety net — but several validation branches emit Warnings, not Errors (no outbox, schema not yet migrated). Confirm a Warning is loud enough that a misconfigured Replay is actually noticed, given the runtime behavior is a silent skip.
  • Double-dispatch race (sweeper dispatching while ReplayCausation clears state) is accepted as "downstream inbox dedup handles it." Reasonable, but worth an explicit test asserting idempotency expectations so it's a documented contract, not an assumption.
  • review-tasks.md finding Support management and Monitoring of ServiceActivator instances #6 notes UseInboxHandlerAsync calls base.InitializeFromAttributeParams() twice. Good to fold that fix into Task 1's tidy-first pass as noted.

🟢 Nits

  • requirements.md Problem Statement has several typos: "Brigter consumer", "te re-execution", "those that we re never actioned", "enabled instead.." (double period), "This a terminal step". Minor, but these are the authoritative requirement statements.
  • Terminology mixes TimeFlushed (in-memory) vs DispatchedAt/Dispatched (persistent) — partially reconciled in patch 2; consider a one-line glossary so implementers don't conflate them.

Test coverage

N/A for this PR (docs only). The task list correctly front-loads in-memory /test-first tasks before persistent stores and routes outbox store tests through the Liquid generator — good adherence to the mandatory TDD workflow.


Summary: The design is strong and review-ready. Before merge I'd (1) fix Closes #2541 so the issue isn't prematurely closed, (2) resolve the PROMPT.md tracked-vs-ignored contradiction, and (3) split out the unrelated 0022/0026 spec files and the duplicate .current_spec/.current-spec so this PR is scoped to spec 0027 only.

🤖 Generated with Claude Code

codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown

PR Review — #4067 Replay Outgoing Messages When Inbox Receives Duplicate

Reviewed against the repo's CLAUDE.md and spec workflow conventions. Overall: the design work here is excellent — but this PR is design/spec only and ships no implementation, and it carries a few hygiene/scope problems that should be fixed before merge.

Scope of the change

This PR touches 0 files under src/ or tests/. It adds:

  • docs/adr/0057-replay-outbox-on-inbox-duplicate.md (the design)
  • specs/0027-.../ requirements, tasks, review docs, and approval markers
  • PROMPT.md, a .gitignore line, and updates to spec marker files
  • review-task files for unrelated specs 0022 and 0026

That's consistent with the spec workflow (Requirements → ADR → Tasks, all approved), but the PR is framed as the feature itself. See the first blocking point.


🔴 Blocking / should-fix

  1. Closes #2541 will auto-close the issue, but the feature isn't built. This PR delivers the spec/design only (Implementation is [ ] not started in PROMPT.md and tasks.md). Merging it will close [Feature] Force Replay of Outbox on seeing duplicate Processed Inbox Message #2541 even though no behavior exists yet. Either drop Closes (use Refs #2541), or hold the merge until the 23 implementation tasks land.

  2. PROMPT.md is committed and listed in .gitignore (line 346). These contradict each other — the commit "chore: add PROMPT.md to .gitignore" signals intent not to track it, yet it's in the PR. Per CLAUDE.md, PROMPT.md is local, conversation-spanning working state, not a repo artifact. Run git rm --cached PROMPT.md and drop it from the PR (the .gitignore entry can stay).

  3. Unrelated spec files included — scope creep. specs/0022-Defer-Message-Action-Backstop-Handler/review-tasks.md, specs/0026-timed-outbox-archiver-sync-fallback/review-tasks.md, and specs/0026-.../.tasks-approved belong to other features. CLAUDE.mdChange Scope: "Do NOT change … beyond what was explicitly requested." Please split these out.

  4. Two competing marker files: specs/.current-spec (hyphen) and specs/.current_spec (underscore). This PR empties one and updates the other (the merge history shows a conflict between them). Having both is confusing and will keep colliding. Consolidate on a single convention and delete the other.


🟡 Minor

  1. Typos in requirements.md (Problem Statement, line 9; Proposed Solution, line 23): "Brigter", "te re-execution", "we re never actioned", "instead..". Worth a cleanup since this is the authoritative requirements doc.

  2. Two low-scored items from your own review-design.md remain open and are worth carrying into implementation: the InboxItem example uses primary-constructor syntax while the real class doesn't (don't let an implementer refactor it), and DescribePipelines() should also receive InboxConfiguration for consistency with ValidatePipelines().


Design assessment (ADR 0057)

The ADR is genuinely strong — 7 review rounds show. I spot-checked its claims against the current code and they hold up:

  • OnceOnlyAction is a 2-value enum (Throw, Warn) — adding Replay is additive ✅
  • UseInboxHandler does use a private InitRequestContext() today (lines 81/111) — the tidy-first switch to this.Context is a real and correct prerequisite ✅
  • IAmAnOutboxProducerMediator exists and the Outbox property addition is sound ✅

The opt-in / non-breaking strategy, role interfaces (IAmACausationTracking*), reuse of the existing sweeper, and the startup pipeline validation are all well-judged. Alternatives and risks (double-dispatch under sweeper race, inbox-Add-after-outbox-write failure) are addressed honestly.

Two things to watch during implementation:

  • Instance identity between validation and runtime. Validation reads the outbox via IAmAnOutboxProducerMediator.Outbox, while UseInboxHandler receives IAmACausationTrackingOutbox? via constructor DI. These are two resolution paths — make sure they resolve the same instance, or SupportsCausationTracking() could pass at startup against one object while replay runs against another. A test asserting instance identity would be cheap insurance.

  • Cross-PR sequencing. The ADR depends on a separate schema-migration-tooling PR being "merged into master before this feature." That ordering constraint is invisible from this PR; calling it out in the PR description (with the dependent PR number) would prevent a premature merge.


Summary: The design is approved-quality and the task breakdown is thorough. Before merge, please address the four blocking items — most importantly, decide whether this is the spec-only PR (then don't Closes #2541) or wait for implementation, and remove PROMPT.md + the unrelated 0022/0026 files.

🤖 Generated with Claude Code

iancooper and others added 5 commits June 23, 2026 12:50
…relational store tasks

Fold the inbox/outbox CausationId schema change into this spec via
BoxProvisioning instead of a deferred separate PR:

- requirements.md: schema evolution is in-scope via BoxProvisioning
  (data backfill stays out of scope); AC9 + Performance NFR now assert
  the column, the outbox CausationId index, and SupportsCausationTracking().
- ADR 0057: new "Schema Evolution via BoxProvisioning" section covering the
  three store classes (catalog-based relational, Spanner provisioner, NoSQL
  schemaless); per-backend catalog versions (outbox V8; inbox V3 for
  MsSql/MySql/Sqlite, V2 for the V1-only Postgres inbox); Spanner
  VLatestOutbox/VLatestInbox constant bumps + cross-backend drift test;
  Spanner SupportsCausationTracking() as a live column probe.
- tasks.md: split Task 19 -> 19a (atomic structural schema) + 19b-19f
  (per-backend IAmACausationTrackingInbox, test-first); Task 21 -> 21a
  (atomic structural schema + index) + 21b (Liquid generator) + 21c-21g
  (per-backend IAmACausationTrackingOutbox, test-first). The schema layer is
  atomic because SpannerVLatestDriftAgainstRelationalCatalogTests pins
  VLatest* to every relational catalog count.

Includes the 2026-06-17 focused re-review findings
(review-{requirements,design,tasks}.md); all threshold findings addressed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…strumentationOptions

Task 1 (structural/tidy-first):
- Part A: UseInboxHandler/UseInboxHandlerAsync use the pipeline's shared
  IRequestContext (this.Context) instead of a private InitRequestContext(),
  so RequestContext.Bag is shared across the pipeline (prereq for CausationId).
- Part B: expose instrumentationOptions as a protected InstrumentationOptions
  property on RequestHandler<T>/RequestHandlerAsync<T> so derived handlers can
  gate telemetry on InstrumentationOptions.Brighter.
- Tidy: remove duplicate base.InitializeFromAttributeParams() call in async handler.

Behaviour-preserving: OnceOnly tests 10/10 green (net9.0 + net10.0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Task 2 (structural/tidy-first): non-positional RequestHandlerAttribute? Attribute
init-property on PipelineStepDescription, populated at both Describe() construction
sites, so validation rules can inspect attribute properties (e.g. OnceOnlyAction).

Behaviour-preserving: Validation tests 90/90 green (net9.0 + net10.0).
Also ticks Task 1 in tasks.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…arity with Build()

Task 3 (structural/tidy-first):
- PipelineBuilder.Describe() now injects the global inbox UseInbox(Async)Attribute
  for handlers, using the same guards as Build()'s AddGlobalInboxAttributes
  (HasNoInboxAttributesInPipeline / HasExistingUseInboxAttributesInPipeline) via a
  reflection-only TryCreateGlobalInboxAttribute helper, so descriptions don't drift.
- BrighterPipelineValidationExtensions resolves InboxConfiguration from
  IAmConsumerOptions and passes it to the validate/describe PipelineBuilder
  (null when no consumer options, matching the runtime).

Behaviour-preserving: Validation + Pipeline tests 131/131 green (net9.0 + net10.0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Task 4 (structural/tidy-first): add read-only IAmAnOutbox? Outbox to the mediator
interface and implement it in OutboxProducerMediator as the sync or async outbox,
so pipeline validation can inspect the outbox (e.g. for causation-tracking support).

Behaviour-preserving: core + core-tests build clean (net9.0 + net10.0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

# Conflicts:
#	src/Paramore.Brighter/OutboxProducerMediator.cs
codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown

Code Review — PR #4067: Replay outbox messages on inbox duplicate detection

Reviewed the core library changes (the bulk of the 183 files are generated tests and per-backend query/migration additions that follow a single consistent pattern). This is a well-structured, spec-driven change (ADR 0057, role interfaces, opt-in per backend) with genuinely excellent test coverage. Feedback below, ordered by importance.

🟠 Potential behavioral change — Context as RequestContext

In both UseInboxHandler.Handle and UseInboxHandlerAsync.HandleAsync, the old InitRequestContext() (which always returned a fresh, non-null RequestContext with Span = Activity.Current) was replaced with:

var requestContext = Context as RequestContext;
if (requestContext is not null && !requestContext.Bag.ContainsKey(RequestContextBagNames.CausationId))
    requestContext.Bag[RequestContextBagNames.CausationId] = request.Id.Value;

This is the right move for the feature — the causation id must live on the shared pipeline context so it flows to the outbox Add. But Context is typed IRequestContext? and is pluggable via a custom IAmARequestContextFactory. If a user supplies a context that is not a concrete RequestContext, the as cast yields null, and then:

  • the causation id is silently never set (Replay becomes a no-op), and
  • the existing Throw / Warn / Add paths now pass null to _inbox.Add(...) instead of a context carrying Activity.Current — a change from prior behavior for all OnceOnlyAction values, not just Replay.

Worth either guarding explicitly (fall back to a real context / Activity.Current when the cast fails) or documenting that causation tracking requires the default RequestContext type. At minimum, confirm a test covers a non-RequestContext context.

🟡 Inconsistent DI registration of IAmACausationTrackingOutbox

The two AddProducers overloads register the role interface differently (ServiceCollectionExtensions.cs):

// overload A — conditional, only when supported
if (outbox is IAmACausationTrackingOutbox)
    services.Add(new ServiceDescriptor(typeof(IAmACausationTrackingOutbox), _ => outbox, Singleton));

// overload B — always registered, factory may return null
services.Add(new ServiceDescriptor(typeof(IAmACausationTrackingOutbox),
    sp => (sp.GetRequiredService<IAmAnOutbox>() as IAmACausationTrackingOutbox)!, Singleton));

In overload B the ! null-forgiving operator is misleading: at runtime the factory returns null for an outbox that doesn't implement the role, so a GetRequiredService<IAmACausationTrackingOutbox>() would hand back null rather than throw. It works for the handler (optional ctor arg via GetService), but the two paths should behave the same — prefer the conditional form in both for consistency and to avoid a surprising null from GetRequiredService.

🟡 Memoized schema probe is never invalidated (already documented)

_causationColumnExists on RelationDatabaseOutbox / RelationalDatabaseInbox is probed once and cached for the lifetime of the singleton. The comment is clear that a mid-process migration requires a restart — that's a reasonable trade-off, just make sure it lands in the release notes / upgrade docs so operators who migrate a live store know a restart is needed before Replay works.

🟢 Things that are correct and well done

  • SQL is fully parameterized across all backends; the CausationColumnExists probe interpolates only the {0} table name (consistent with existing GenerateSqlText usage), no injection surface.
  • Null causation → DBNull.Value is handled by the per-backend CreateSqlParameter overloads, so an un-set causation on a migrated table inserts cleanly.
  • Replay semantics are correct: relational ReplayCausationCommand sets Dispatched = NULL and InMemoryOutbox sets TimeFlushed = DateTimeOffset.MinValue — both are exactly what the outstanding/sweeper queries key on, so messages are correctly re-queued.
  • Bulk insert column ordering appends CausationId last in both the generated row tuples and BulkAddCausationCommand — they match.
  • Opt-in role interfaces (IAmACausationTracking{Inbox,Outbox}, IRelationalDatabase*CausationQueries) keep backends that haven't migrated fully backward compatible, and the runtime column probe degrades gracefully.
  • Startup validation (ReplayRequiresCausationTracking) gives clear Error/Warning findings, and SupportsCausationTrackingSafely correctly prevents an unreachable-store probe from crashing host startup.
  • Nice incidental cleanups: removing the duplicated base.InitializeFromAttributeParams call, and capturing the thread-affine span before await in the async Add path.

🟢 Test coverage

Excellent — base tests reused across all 8+ stores, plus legacy/un-migrated paths, telemetry events, the no-outbox graceful-terminal case, causation-id-in-context-bag, and the validation rule. This is the standard the repo's TDD workflow asks for.

Minor / nits

  • WriteReplayEvent adds a CausationId tag whose value can be null via the ActivityTagsCollection initializer; confirm a null tag value is intended (the "Replay Skipped" event name already conveys the no-causation case, so the null tag is arguably redundant).
  • The diff comments are unusually detailed (good for reviewers), but a few are long enough to risk drifting from the code over time (e.g. the AC10 rationale blocks) — fine to keep, just flagging.

Overall: solid, well-tested, backward-compatible feature. The Context as RequestContext edge case is the one item I'd want addressed or explicitly confirmed before merge.

iancooper and others added 2 commits June 24, 2026 08:23
…guard non-RequestContext (PR #4067 review)

MongoDb and Firestore outbox bulk Add overloads dropped the CausationId,
so messages deposited via DepositPost(IEnumerable<>) landed with a null
causation and OnceOnlyAction.Replay silently resent nothing for them. Read
the causation id once and set it on every message, mirroring the single-
message paths. Covered by the existing CausationTrackingOutboxBaseTests bulk
cases the generated Mongo/Firestore tests inherit.

UseInboxHandler(Async) replaced InitRequestContext() with `Context as
RequestContext`, which is null when a custom IAmARequestContextFactory yields
a non-RequestContext — silently changing the Throw/Warn/Add inbox paths to
pass null instead of a context carrying Activity.Current. Fall back to a fresh
RequestContext { Span = Activity.Current } so those paths keep their pre-
feature behaviour, while the normal path still uses the shared context so the
causation id flows to the outbox.

Also align the sync Warn duplicate log level with the async handler
(Warning, was Debug).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…red DI registration (PR #4067 review)

Document the deliberate non-synchronization of the memoized causation-probe
fields (_causationColumnExists ×2, _causationIndexExists ×2): volatile is not
applicable to a nullable value type, and a stale-null read only triggers one
extra idempotent probe that resolves to the same value.

Document why the deferred AddProducers overload cannot register
IAmACausationTrackingOutbox conditionally (the outbox type is unknown at
registration time), so its factory returns null for a non-tracking outbox and
GetRequiredService returns null rather than throwing — consumers use optional
GetService. Comment-only; no behavioural change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Jun 24, 2026

Copy link
Copy Markdown

Code Review: Replay Outgoing Messages When Inbox Receives Duplicate (#4067)

Reviewed the core implementation against ADR 0057. This is a large but well-structured PR — the design is documented thoroughly, follows Brighter's existing conventions (role interfaces IAmA*, the collapsed Specification validation pattern, BoxProvisioning migrations), and is genuinely opt-in and backward compatible. Nice work overall. A few findings below, grouped by severity.

✅ What's solid

  • Write-path gate is correct. The relational stores gate the AddCausationCommand vs AddCommand choice on a memoized bool? (_causationColumnExists), populated by a one-shot lazy probe shared across the sync/async paths — not a probe per deposit. ReplayCausation correctly issues UPDATE … SET Dispatched = NULL WHERE CausationId = @CausationId, and all runtime values are bound as parameters (no injection vector introduced). The documented probe race is benign.
  • DynamoDB GSI handling is correct. The outbox SupportsCausationTracking() does a memoized DescribeTable check for the Causation GSI rather than returning true unconditionally, ReplayCausation paginates over LastEvaluatedKey, and the update is a clean inverse of MarkDispatched. The inbox correctly returns true unconditionally (queries its own primary keys).
  • Validation rule is defensive. ReplayRequiresCausationTracking wraps the SupportsCausationTracking() probe in SupportsCausationTrackingSafely so an unreachable store produces a Warning rather than crashing host startup — a good touch.
  • Telemetry distinguishes `"…Duplicate Replay"` from `"…Duplicate Replay Skipped"` (when no causation id was found), which is better than the ADR sketch.
  • DI registration shares the same singleton outbox instance under IAmACausationTrackingOutbox and only registers it (eager overload) when the interface is implemented.

🟡 Test coverage gap (worth addressing before merge)

The whole feature hinges on the causation id flowing inbox Add → handler → outbox Add within a single Send, so that the downstream outbox messages carry the inbound command's id as their causation id. But every replay test pre-seeds both stores with matching causation ids (When_handling_duplicate_command_with_replay_should_clear_outbox_dispatch etc.), and When_handling_new_command_should_set_causation_id_in_context_bag exercises a Send with no outbox and a handler that doesn't deposit.

So there is no test proving the end-to-end link: a real Send through a handler that calls DepositPost, asserting the resulting outbox message and the inbox entry share the same causation id. This matters because propagation depends on the handler threading the same RequestContext into DepositPost — a handler that calls DepositPost(msg) without the pipeline context gets a fresh context with no causation id, and a later Replay finds nothing to resend (silent no-op).

Suggestions:

  1. Add an integration-style test through CommandProcessor.Send with an inbox + outbox + a handler that deposits, asserting the shared causation id and then a successful replay.
  2. Document (in the ADR/handler guidance) that handlers must pass the request context to DepositPost/Post for Replay to work, since this is an easy footgun.

🟡 Scope creep in .gitignore

The .gitignore diff removes two unrelated entries:

  • PROMPT.md (described in CLAUDE.md as Claude Code session state)
  • *.db-journal

Neither relates to causation tracking. Per CLAUDE.md's "Change Scope" rule ("Do NOT … make changes beyond what was explicitly requested"), these look accidental. Un-ignoring PROMPT.md in particular risks committing local session state. Recommend reverting unless intentional.

🟢 Minor / nits

  • specs/.current-spec and specs/.current_spec (hyphen vs underscore) both exist and are both modified — looks like a stray duplicate tracking file; worth removing one.
  • Deferred AddProducers overload always registers IAmACausationTrackingOutbox with a factory that as-casts and may yield null. Fine for the current optional-ctor consumer, but a future GetRequiredService<IAmACausationTrackingOutbox>() would surprisingly get null instead of throwing. A comment flags it; just noting the latent footgun.
  • ResolveInboxConfiguration(sp) is called twice in ValidatePipelines() — harmless redundancy, could be hoisted to a local.
  • Worth re-confirming the full solution builds and the new Base.Test/generated causation test suites are green across backends — couldn't run the full matrix here.

Nothing blocking on correctness in the code paths I traced; the main ask is closing the end-to-end propagation test gap and trimming the unrelated .gitignore/spec-file changes.

🤖 Generated with Claude Code

…4067 review)

Closes the propagation test gap the PR #4067 review flagged: no test proved
the causation id flows inbox Add -> handler -> outbox Add within a real Send,
nor that a duplicate replays the original outgoing message end-to-end.

Routes everything over the real InternalBus with nothing injected through a
testing seam. A command is posted through the command processor, a Reactor
pump (hosted on a background thread via Performer) consumes it, the handler
records it in the inbox and forwards a downstream event back onto the bus, and
the inbox/outbox record receipt and the outgoing message naturally. Re-posting
the same command exercises the duplicate path: the handler does not re-run and
the outbox replays the original outgoing message.

- ProcessAndForwardHandler: [UseInbox(Replay)] handler that forwards a
  downstream event via Post threading the pipeline RequestContext (the
  propagation link), and signals the test thread through a .NET Channel.
- Replay re-dispatch uses ClearOutbox (the primitive Post is built on), not a
  background sweeper, so the test stays deterministic.
- The shared causation id is proven by the replay itself: ReplayCausation is
  keyed on the inbox's causation id, so the forwarded message only returns to
  outstanding if the outbox stored it under that same id.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Jun 25, 2026

Copy link
Copy Markdown

Code Review: Replay Outgoing Messages When Inbox Receives Duplicate (#4067)

Thanks for this — it's a large, carefully-structured change. The ADR (docs/adr/0057) is excellent: the role-interface approach (IAmACausationTrackingInbox/IAmACausationTrackingOutbox), the opt-in/non-breaking design, the write-path gate, and the runtime schema probes are all well reasoned and consistent with existing Brighter patterns. Test coverage is genuinely strong: base tests, in-memory, DI registration, telemetry, sync/async, with-and-without-outbox, and an end-to-end replay test. Below are findings from reviewing the core logic, relational stores, DI wiring, and validation.

Potential issues

1. (Medium) Adding a member to the public IRequestContext interface is a breaking change.
IRequestContext.InstrumentationOptions (src/Paramore.Brighter/IRequestContext.cs:103) is a new interface member. Any third-party or test type implementing IRequestContext will fail to compile/bind. Given the ADR frames the mandatory upgrade for V11, this is probably acceptable — but it should be called out explicitly in the release notes/breaking-changes list, since it affects more than just users adopting Replay.

2. (Low–Medium) Order-dependent double-read of the memoized causation-column flag in the relational outbox.
The Add path decides whether to emit the causation-aware INSERT via CausationColumnExists() (which lazily sets _causationColumnExists), but InitAddDbCommand then independently re-reads _causationColumnExists == true to pick the SQL. These are two separate reads of a field another thread could flip from nulltrue in between. In practice the gate read runs first and sets the field, so it's consistent today — but the correctness relies on call ordering rather than design. The bulk path already does the right thing by threading an explicit includeCausation bool through InitBulkAddDbCommand; doing the same for the single-row path would make this robust instead of order-dependent.

3. (Low) ReplayCausation reuses the mark-dispatched write/observability path, causing semantic mismatches.
In RelationDatabaseOutbox, ReplayCausation[Async] routes through WriteToStore/WriteToStoreAsync, whose catch swallows exceptions matching the unique/duplicate predicate and can log DuplicateDetectedInBatch. Replay is an UPDATE ... SET dispatched = NULL, which won't raise duplicate errors, so it's benign — but a "duplicate in batch" log line on a replay is misleading. Relatedly, the replay span is created with BoxDbOperation.MarkDispatched, even though replay un-dispatches; observability will mislabel replays. Consider a dedicated BoxDbOperation value (and/or a dedicated write helper) for replay.

4. (Low) Silent no-op when a custom IAmARequestContextFactory supplies a non-RequestContext.
In UseInboxHandler.Handle / UseInboxHandlerAsync.HandleAsync the Context as RequestContext ?? new RequestContext { … } fallback stamps the causation id into a fresh context, so it never reaches the outbox Add. The code comment documents this and it degrades safely, but the effect is that Replay becomes a runtime no-op even though startup pipeline validation passed. A one-time debug/warn log on that fallback branch would help operators diagnose "Replay configured but nothing resends."

5. (Low) PipelineBuilder.Describe() now calls handlerMethod.GetOtherHandlersInPipeline() twice (once for before, once for after steps). It's reflection on a startup-only path, so negligible, but the previous single-call-into-local was cheaper and the global-inbox attribute only needs adding to the before list. Reusing one materialized list would be tidier.

Scope concern

6. .gitignore removes PROMPT.md and *.db-journal entries — unrelated to causation/replay. Per the repo's own CLAUDE.md "Change Scope" guidance ("Do NOT … make changes beyond what was explicitly requested"), these look out of scope. Removing the PROMPT.md ignore is also risky given the Context-Management section recommends PROMPT.md for temporary session state that should stay un-committed. Please confirm this is intentional or split it out.

Notes (no action required)

  • The in-memory replay relies on the original dispatched entry still existing; compaction/expiry can remove it before replay, making replay a silent no-op. This is documented as an accepted risk in the ADR — fine.
  • The factory-overload DI registration (ServiceCollectionExtensions.cs) returns null from a singleton factory for non-tracking outboxes; the comment correctly warns to use GetService (not GetRequiredService). Worth a quick check that ServiceProviderHandlerFactory resolves the optional ctor param via GetService and never GetRequiredService for this type.
  • No SQL injection found in the new queries — the causation id is always a bound parameter, and only the table name / generated parameter names are interpolated.

Overall this is high-quality work with sound architecture and thorough tests. The items above are mostly low-severity hardening; #1 (interface break) and #6 (gitignore scope) are the two worth resolving before merge.

🤖 Generated with Claude Code

…ingle-read causation gate, replay span label)

- .gitignore: restore *.db-journal entry (kept the PROMPT.md de-dup)
- PipelineBuilder.Describe: materialize GetOtherHandlersInPipeline() once,
  filter for before/after instead of two reflection passes
- RelationDatabaseOutbox: thread an explicit includeCausation bool from the
  Add[Async] probe into InitAddDbCommand so the SQL choice and parameter list
  derive from a single read of the memoized causation-column flag (mirrors the
  bulk path) instead of an order-dependent double-read
- Observability: add BoxDbOperation.Replay ("replay.causation") and label the
  ReplayCausation spans with it across all stores (relational, DynamoDB v3/v4,
  Mongo, Firestore, InMemory) instead of mislabelling them MarkDispatched

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@iancooper

Copy link
Copy Markdown
Member Author

Thanks for the thorough review — addressed in 5f966d801.

Fixed

  • Support management and Monitoring of ServiceActivator instances #6 (.gitignore scope): restored the *.db-journal entry. The PROMPT.md change was a deliberate de-dup (it was listed twice) — it's still ignored once, so nothing regresses there.
  • Automate the build #5 (double GetOtherHandlersInPipeline()): PipelineBuilder.Describe now materializes the handler list once and filters it for before/after, instead of two reflection passes.
  • No API Documentation #2 (order-dependent memoized read): Add/AddAsync now capture includeCausation from the probe once and thread it into InitAddDbCommand(..., bool includeCausation), mirroring the bulk path. The SQL choice and the parameter list now come from a single read rather than relying on call ordering.
  • Support async versions of Send and Publish #3 (replay span mislabelled MarkDispatched): added a dedicated BoxDbOperation.Replay ("replay.causation") and applied it to every ReplayCausation span — relational (MSSQL/MySQL/Postgres/SQLite), DynamoDB v3/v4, Mongo, Firestore, and InMemory. Left the WriteToStore reuse as-is: replay is an UPDATE ... SET dispatched = NULL, so the unique/duplicate catch branch (and its DuplicateDetectedInBatch log) is unreachable in practice.

Accepted, not changed

Verification: core + affected outbox projects build clean; OnceOnly + pipeline Core tests pass 194/194 on net9.0 (no test asserts the outbox replay span op, so the relabel is safe).

codescene-delta-analysis[bot]

This comment was marked as outdated.

…rce-breaking change (PR #4067 review #1)

Adds a "Replay Outbox Messages on Inbox Duplicate" section to release_notes.md
under Master, with a source-breaking-change note for the new required
IRequestContext.InstrumentationOptions interface member — it affects any
external implementor of IRequestContext, not just Replay adopters.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@iancooper

Copy link
Copy Markdown
Member Author

Follow-up on #1: added the breaking-change call-out in 604bf45a0. release_notes.md now has a "Replay Outbox Messages on Inbox Duplicate" section under ## Master with a #### Source-breaking change: IRequestContext.InstrumentationOptions note — flagging that the new required interface member affects any external IRequestContext implementor (not just Replay adopters), and that the shipped RequestContext already implements it (defaults to InstrumentationOptions.All) so normal call sites are unaffected.

codescene-delta-analysis[bot]

This comment was marked as outdated.

@claude

claude Bot commented Jun 27, 2026

Copy link
Copy Markdown

Code Review: Replay Outgoing Messages When Inbox Receives Duplicate (#4067)

Reviewed the core of this large change (188 files): the causation-tracking design, UseInboxHandler/UseInboxHandlerAsync, the in-memory and relational stores, the DynamoDB outbox, DI wiring, pipeline validation, and the supporting ADR/spec/release-notes. Overall this is a very high-quality, well-documented feature — the ADR 0057 is excellent, backward compatibility is taken seriously (opt-in OnceOnlyAction.Replay, memoized write-path gate, runtime schema probes), and test coverage is broad (base tests + generated per-backend tests + validation tests + an end-to-end replay test). Nicely done.

A few observations, mostly minor, ordered by importance.

1. bool? _causationColumnExists memoization — non-atomic reads (minor concurrency)

RelationDatabaseOutbox, RelationalDatabaseInbox, and the DynamoDB outbox memoize the schema probe in an unsynchronized bool? field, with a detailed comment arguing the race is harmless. The comment's reasoning ("both probes resolve to the same value, so the cached answer never changes") addresses re-computation, but the real hazard is a torn read of the struct: Nullable<bool> is two fields (hasValue, value), and a write of _x = true is not atomic. A concurrent reader can observe {hasValue=true, value=false} mid-write. Worst case: a single deposit during a startup race skips writing CausationId even though the column exists.

Impact is low (the window is the very first probe, and after that the field is stably {true,true}), but the comment slightly overstates the safety guarantee. If you want to close it, an int state (0=unknown/1=false/2=true) with Volatile.Read/Write, or memoizing a reference type (Task<bool> / object?), gives atomic reads.

2. Causation only flows if the handler threads the pipeline Context (adoption sharp edge)

The causation id reaches the outbox only when the producing handler passes the pipeline's RequestContext into Post/DepositPost, exactly as the new ProcessAndForwardHandler test double does (_commandProcessor.Post(outgoing, Context as RequestContext)). A handler that calls _commandProcessor.Post(outgoing) without the context — the common pattern in much existing user code — will silently store a null CausationId, and Replay later becomes a silent no-op (GetCausationId returns the inbox value, but no outbox rows match).

This is inherent to how the RequestContext.Bag propagates, so it's not a bug, but it's a real footgun. Worth (a) calling it out prominently in the user-facing docs, and (b) the UseInboxHandler Duplicate Replay Skipped event already added is a good start — consider an equivalent event/log when the causation is found but the outbox replays zero messages, so operators can diagnose the missing-context case.

3. "Schema does not support causation" is a Warning, not an Error (fail-fast gap)

In ReplayRequiresCausationTracking, a missing role interface is correctly an Error, but an implemented role whose live schema probe returns false (un-migrated column/GSI) is only a Warning. So a Replay-configured pipeline against an un-migrated schema starts successfully and then throws at the first duplicate (see #4). Given that this state guarantees a runtime failure once a duplicate arrives, consider promoting it to Error so it fails fast at startup, consistent with the ADR's stated intent of "a clear startup validation finding rather than a runtime failure."

4. Relational ReplayCausation/GetCausationId don't short-circuit on the column probe

These guard only on CausationQueries is null, not on the memoized CausationColumnExists(). On an un-migrated schema (column absent) they still emit SQL referencing CausationId and throw, rather than degrading to the no-op the Add path achieves via its gate. Short-circuiting on the same memoized probe would make the un-migrated runtime path a graceful no-op and keep behavior uniform across read/write paths. (Tied to #3 — if #3 becomes a hard Error this is unreachable in practice, but defense-in-depth is cheap here.)

5. DynamoDB ReplayCausation is N round-trips

Replay queries the Causation GSI then issues one UpdateItemAsync per matching message. Fine for typical fan-outs and DynamoDB has no conditional batch-update, but for a causation with many messages it's O(n) calls. The ADR already flags indexing; just noting the per-item cost for large causations.

6. Housekeeping: duplicate spec-tracking files

specs/.current-spec and specs/.current_spec (hyphen vs underscore) both exist and are both updated with identical content. This predates the PR, but since it's touched here it'd be a good moment to consolidate to one and delete the stray, to avoid the two drifting apart.

Things I specifically liked

  • The write-path gate design (static driver capability vs. live column existence) is the right call and well explained — it preserves byte-for-byte deposit behavior against un-migrated schemas.
  • The Duplicate Replay vs Duplicate Replay Skipped telemetry distinction is a thoughtful improvement over the ADR sketch.
  • The structural prerequisites (sharing the pipeline RequestContext in UseInboxHandler, enriching PipelineStepDescription, threading InboxConfiguration into validation) are genuine, broadly useful improvements, and the IRequestContext.InstrumentationOptions source-breaking change is clearly documented in the release notes.
  • SupportsCausationTrackingSafely wrapping the probe so an unreachable store can't crash host startup is a nice robustness touch.

Nothing here is blocking; #1#4 are worth a look before merge. Great work overall.

Automated review by Claude — please verify findings against the full diff.

@claude

claude Bot commented Jun 27, 2026

Copy link
Copy Markdown

Code Review — PR 4067: Replay Outbox Messages on Inbox Duplicate Detection

Thanks for this — a large, carefully-structured change (ADR + 7 review rounds show). The CausationId role-interface design (IAmACausationTrackingInbox / IAmACausationTrackingOutbox) is a clean way to keep the feature opt-in and non-breaking, and gating the write-path on a live-schema probe so un-migrated stores keep working is exactly right. Test coverage is excellent: base tests + generated per-backend tests, end-to-end replay, the no-outbox graceful path, telemetry, and the validation-error case are all present.

Findings from a review of the core handlers/validation/interfaces plus a per-backend sweep of the persistent stores:

🔴 Bugs worth fixing

1. Spanner bulk outbox insert throws on a null CausationIdsrc/Paramore.Brighter.Outbox.Spanner/SpannerOutbox.cs:99. The CreateSqlParameter(string, object?) override only attaches the explicit SpannerDbType.String when parameterName == "@CausationId" (exact match). The bulk-insert path names its params @p{i}_CausationId (RelationDatabaseOutbox.cs:1605/1615), which never matches that condition. Spanner requires an explicit type to bind a null, so a bulk AddAsync whose schema has the column but whose RequestContext.Bag carries no causation id (e.g. a producer not fronted by [UseInbox]) creates a typeless null param and throws. The single-add path (@CausationId) is fine. Suggest matching on a suffix, e.g. parameterName.EndsWith("CausationId", StringComparison.Ordinal).

2. DynamoDB replay aborts the whole causation if one message was concurrently deletedsrc/Paramore.Brighter.Outbox.DynamoDB/DynamoDbOutbox.cs (~line 643, and .V4). Each per-item UpdateItemRequest carries ConditionExpression = "attribute_exists(MessageId)". Because GSI reads are eventually consistent, a message can appear in the Causation GSI query yet already be deleted by the time the conditional update runs, raising ConditionalCheckFailedException. That exception is not caught, so it aborts the loop and leaves the rest of the causation un-replayed. Consider catching/ignoring the conditional-check failure per item so replay is best-effort across the whole causation.

🟡 Minor

3. MongoDB sync ReplayCausation mislabels its telemetry spansrc/Paramore.Brighter.Outbox.MongoDb/MongoDbOutbox.cs:866 uses BoxDbOperation.MarkDispatched, whereas async ReplayCausationAsync (:890) correctly uses BoxDbOperation.Replay. Telemetry-only.

4. Public interface addition is a binary/source breakIRequestContext.InstrumentationOptions (src/Paramore.Brighter/IRequestContext.cs:113). Adding a member to a published interface breaks any external custom IRequestContext implementation. A default interface member would keep it non-breaking, or call it out in release notes.

5. DynamoDB replay does N sequential round-trips — one await UpdateItemAsync per message inside the page loop. Correct, but BatchWriteItem/parallelization would cut latency for large causations.

6. Replay silently swallows a duplicate when causation tracking is unavailable — in UseInboxHandler.Handle (:122) / UseInboxHandlerAsync.HandleAsync (:126), if the inbox is not an IAmACausationTrackingInbox or _outbox is null, the duplicate is dropped with no replay. Mitigated by ReplayRequiresCausationTracking reporting an Error/Warning at startup and by the "Replay Skipped" telemetry event (nice touch) — but that only protects users who actually run pipeline validation. Worth confirming validation is on by default for this path.

7. Non-atomic check-then-set on the BagUseInboxHandler.cs:92-93 / UseInboxHandlerAsync.cs:94-95 do if (!Bag.ContainsKey(...)) Bag[...] = .... Bag is a ConcurrentDictionary but the pattern is not atomic; RequestContext is per-request so practical risk is nil. GetOrAdd reads cleaner.

8. Cosmetic — double spaces in some generated causation SQL (MySqlQueries, SqliteQueries). No functional impact.

Checked and liked

  • Read/write key symmetry: outbox/inbox read RequestContextBagNames.CausationId ("Brighter-CausationId"), the exact key the handler stamps, and replay reads the same value back.
  • ReplayCausation SQL is parameterized and correctly scoped to the matching causation across every dialect; the schema probes (sys.columns, information_schema, pg_attribute, pragma_table_info, Spanner INFORMATION_SCHEMA) and the DynamoDB GSI-existence check all correctly drive SupportsCausationTracking.
  • DynamoDB replay queries the GSI (not a scan) and projects only the key attribute; Mongo/Firestore dispatched/outstanding predicates are symmetric with the replay reset.
  • DI: both AddProducers overloads register the role interface against the same outbox singleton, and MS DI honors the optional ctor parameter, so a non-tracking outbox cleanly resolves to a null injection — genuinely non-breaking.
  • Schemaless stores (Mongo/Firestore) returning a static true for SupportsCausationTracking is defensible, but those deployments never get the migrate-your-schema warning that relational/DynamoDB stores produce — intentional inconsistency, maybe worth a one-line comment.

Overall a solid, well-tested change. Items 1 and 2 are the two I would want addressed before merge; the rest are minor/telemetry/docs.

🤖 Generated with Claude Code

# Conflicts:
#	.agent_instructions/generated_tests.md
#	tools/Paramore.Brighter.Test.Generator/Generators/OutboxGenerator.cs
#	tools/Paramore.Brighter.Test.Generator/Paramore.Brighter.Test.Generator.csproj

@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Health Improved (5 files improve in Code Health)

Gates Failed
Prevent hotspot decline (1 hotspot with Complex Method)
Enforce critical code health rules (2 files with Bumpy Road Ahead, Deep, Nested Complexity)
Enforce advisory code health rules (12 files with Complex Method, Complex Conditional, Code Duplication, Constructor Over-Injection, Large Method)

Our agent can fix these. Install it.

Gates Passed
1 Quality Gates Passed

Reason for failure
Prevent hotspot decline Violations Code Health Impact
ServiceCollectionExtensions.cs 1 rule in this hotspot 7.62 → 7.60 Suppress
Enforce critical code health rules Violations Code Health Impact
UseInboxHandler.cs 2 critical rules 10.00 → 8.35 Suppress
UseInboxHandlerAsync.cs 2 critical rules 10.00 → 8.35 Suppress
Enforce advisory code health rules Violations Code Health Impact
UseInboxHandler.cs 2 advisory rules 10.00 → 8.35 Suppress
UseInboxHandlerAsync.cs 2 advisory rules 10.00 → 8.35 Suppress
MySqlOutboxMigrationCatalog.cs 2 advisory rules 10.00 → 9.28 Suppress
MsSqlOutboxMigrationCatalog.cs 2 advisory rules 10.00 → 9.29 Suppress
PostgreSqlOutboxMigrationCatalog.cs 2 advisory rules 10.00 → 9.29 Suppress
InboxMessage.cs 1 advisory rule 10.00 → 9.69 Suppress
SqliteOutboxMigrationCatalog.cs 1 advisory rule 10.00 → 9.69 Suppress
PipelineBuilder.cs 1 advisory rule 7.79 → 7.55 Suppress
RelationalDatabaseInbox.cs 1 advisory rule 7.79 → 7.55 Suppress
BrighterSpanExtensions.cs 1 advisory rule 9.42 → 9.39 Suppress
ServiceCollectionExtensions.cs 1 advisory rule 7.62 → 7.60 Suppress
FirestoreOutbox.cs 1 advisory rule 6.11 → 6.47 Suppress

See analysis details in CodeScene

View Improvements
File Code Health Impact Categories Improved
InMemoryOutbox.cs 7.79 → 8.03 Missing Arguments Abstractions
DynamoDbOutbox.cs 6.90 → 7.11 Missing Arguments Abstractions, Primitive Obsession
DynamoDbOutbox.cs 6.90 → 7.11 Missing Arguments Abstractions, Primitive Obsession
MongoDbOutbox.cs 7.55 → 7.79 Missing Arguments Abstractions
FirestoreOutbox.cs 6.11 → 6.47 Code Duplication, Missing Arguments Abstractions

Quality Gate Profile: Clean Code Collective
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.

@iancooper

Copy link
Copy Markdown
Member Author

@lillo42 Might be worth you casting an eye over the generated tests for causation in this one

…ategory trait)

Merging master brought in the generated-test template that emits an xUnit
[Trait("Category", "<provider>")] attribute. The committed Causation outbox
tests predated that template change, so regenerating adds the missing trait
to all 12 provider variants (Spanner already had it). Generated output only —
no logic change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment on lines +628 to +648
foreach (var item in queryResponse.Items)
{
if (!item.TryGetValue("MessageId", out var messageId))
continue;

// Restore the outstanding marker (from the still-present CreatedTime) and clear the
// dispatched state so the sweeper resends the message.
var updateItemRequest = new UpdateItemRequest
{
TableName = _configuration.TableName,
Key = new Dictionary<string, AttributeValue>
{
{ "MessageId", new AttributeValue { S = messageId.S } }
},
UpdateExpression = "SET OutstandingCreatedTime = CreatedTime REMOVE DeliveryTime, DeliveredAt",
ConditionExpression = "attribute_exists(MessageId)"
};

await _client.UpdateItemAsync(updateItemRequest, cancellationToken)
.ConfigureAwait(ContinueOnCapturedContext);
}
Comment on lines +641 to +661
foreach (var item in queryResponse.Items)
{
if (!item.TryGetValue("MessageId", out var messageId))
continue;

// Restore the outstanding marker (from the still-present CreatedTime) and clear the
// dispatched state so the sweeper resends the message.
var updateItemRequest = new UpdateItemRequest
{
TableName = _configuration.TableName,
Key = new Dictionary<string, AttributeValue>
{
{ "MessageId", new AttributeValue { S = messageId.S } }
},
UpdateExpression = "SET OutstandingCreatedTime = CreatedTime REMOVE DeliveryTime, DeliveredAt",
ConditionExpression = "attribute_exists(MessageId)"
};

await _client.UpdateItemAsync(updateItemRequest, cancellationToken)
.ConfigureAwait(ContinueOnCapturedContext);
}

await GenerateAsync(
configuration,
Path.Combine("Outbox", prefix, "Generated", "Causation"),
await GenerateAsync(
configuration,
Path.Combine("Outbox", prefix, "Generated", "Causation"),
Path.Combine("Outbox", "Causation"),

await GenerateAsync(
configuration,
Path.Combine("Outbox", prefix, "Generated", "Causation"),
await GenerateAsync(
configuration,
Path.Combine("Outbox", prefix, "Generated", "Causation"),
Path.Combine("Outbox", "Causation"),
Comment on lines +226 to +235
catch (Exception ex)
{
findings.Add(ValidationResult.Fail(new ValidationError(
ValidationSeverity.Warning,
source,
$"OnceOnlyAction.Replay could not verify causation-tracking support on the {store} store — " +
$"the schema capability probe failed ({ex.GetType().Name}: {ex.Message}). Ensure the store is " +
"reachable and its schema is migrated before relying on Replay")));
return null;
}
Comment on lines +217 to +220
catch (Exception)
{
// best-effort cleanup
}
Comment on lines +217 to +220
catch (Exception)
{
// best-effort cleanup
}
Comment on lines +43 to +46
catch (Exception)
{
// Ignoring any error during delete, it's not important at this point
}
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown

Code Review — Replay Outbox Messages on Inbox Duplicate (#4067)

Reviewed against ADR 0057 / spec 0027. This is a large, well-engineered change: the role-interface design (IAmACausationTrackingInbox / IAmACausationTrackingOutbox) is clean and opt-in, the memoized column-existence probe avoids per-deposit cost while keeping deposits working against un-migrated schemas, the replay SQL is correctly parameterized on every backend, and test coverage (base tests, per-backend, end-to-end, validation) is thorough. Migration versioning is internally consistent (outbox V7->V8, inbox V2->V3, Postgres inbox V1->V2, Spanner VLatestOutbox=8 / VLatestInbox=3) with sound idempotency guards. Issues worth addressing, most important first.

🔴 Spanner bulk outbox insert throws for the common null-causation case

SpannerOutbox.CreateSqlParameter special-cases the causation parameter by exact name to force an explicit SpannerDbType (required so a null -> DBNull.Value binds), but it only matches "@CausationId"src/Paramore.Brighter.Outbox.Spanner/SpannerOutbox.cs:99. The bulk insert path names its per-row parameters @p{i}_CausationIdsrc/Paramore.Brighter/RelationDatabaseOutbox.cs:1615. On Spanner the CausationId column always exists (ships in the DDL) so includeCausation is always true; in the normal (non-replay) case causationId == null. The bulk path then creates a SpannerParameter with DBNull.Value and no SpannerDbType — exactly the untyped-null case the single-Add special case exists to prevent — so a bulk Add/AddAsync of multiple messages throws. Not caught by the base suite, which only exercises bulk deposit with a non-null causation id. Other relational backends tolerate untyped DBNull and are unaffected. Fix: bind the causation parameter via the typed two-arg CreateSqlParameter(name, DbType.String, causationId) on both single and bulk paths (routes through ToSpannerDbType) and drop the name-based special case; add a bulk-deposit-with-null-causation test for Spanner.

🟠 Inbox read path (GetCausationId) is not gated on column existence — breaks the AC10 graceful-degradation promise

The write path degrades on an un-migrated table via the memoized CausationColumnExists() probe, but the read path guards only on CausationQueries is nullsrc/Paramore.Brighter/RelationalDatabaseInbox.cs:320-351. And UseInboxHandler calls GetCausationId on a Replay duplicate without first checking SupportsCausationTracking() — UseInboxHandler.cs:122-126 (and UseInboxHandlerAsync.cs:126-132). So for a backend that implements the causation query interface but whose table has not been migrated (precisely the mixed state the write-path gate tolerates), enabling OnceOnlyAction.Replay makes a duplicate command run SELECT ... [CausationId] ... and throw invalid-column instead of no-opping. Pipeline validation only emits a Warning here (not an Error), so startup succeeds and the failure surfaces at runtime on the first duplicate. Gating GetCausationId on CausationColumnExists() (return null when absent) fixes this and also protects ReplayCausation.

🟡 The feature silently no-ops unless every handler threads its RequestContext into Post/DepositPost

The causation id is stamped into the pipeline RequestContext.Bag by UseInboxHandler, and the outbox Add reads it from the bag — but only if the forwarding handler passes its own Context down. CommandProcessor.InitRequestContext creates a fresh context when none is supplied (CommandProcessor.cs:1533, requestContext ?? _requestContextFactory.Create()). A handler using the idiomatic _commandProcessor.Post(evt) (no context) produces outbox messages with a null CausationId, and a later Replay finds nothing — a silent no-op. The ProcessAndForwardHandler test double does thread Context and its XML comment explains why, but this is an easy footgun and nothing validates it. Please surface it prominently in the user-facing docs/release notes and consider whether a clearer signal is feasible.

🟢 Minor

  • DynamoDB: MessageItem hardcodes the GSI annotation indexName: "Causation" while the probe/query read the configurable DynamoDbConfiguration.CausationIndexName — overriding the config name silently breaks replay. Consistent with the pre-existing Outstanding/Delivered index pattern, so a latent trap rather than a regression.
  • DynamoDB inbox: GetCausationId comment says "we call GetCausationIdAsync" (implying a span is added there); neither method creates a span. Misleading comment only.
  • Torn read on bool? _causationColumnExists (RelationalDatabaseInbox / RelationDatabaseOutbox / DynamoDbOutbox): the value-never-changes argument is value-correct, but a concurrent reader on a weak memory model (ARM) can observe a torn hasValue=true/value=false during the single first write, transiently skipping causation for one message. An int flag with Volatile.Read/Write would close the hole the comment acknowledges.
  • Memoized absent is cached for the instance lifetime — a mid-process migration requires a restart. Documented and acceptable.
  • Duplicate spec pointer files: both specs/.current-spec and specs/.current_spec (hyphen and underscore) are added with identical content — looks like an accidental duplicate.
  • Pre-existing (not this PR): src/Paramore.Brighter.Outbox.MsSql/SqlOutboxBuilder.cs is missing a comma between [SpecVersion] NVARCHAR(255) NULL and PRIMARY KEY ( [Id] ) in the CREATE TABLE. The new [CausationId] line sits above it and the index is a separate statement, so it is unaffected — flagging for awareness since FreshInstallDdl sources this builder.

🟢 Verified correct

Replay SQL (Dispatched = NULL WHERE CausationId = @CausationId on relational; the symmetric SET OutstandingCreatedTime = CreatedTime REMOVE DeliveryTime, DeliveredAt on DynamoDB, with pagination and attribute_exists guard); full parameterization of all queries; column-existence probes (sys.columns, information_schema, pg_attribute, pragma_table_info) mapping via HasRows; single-Add parameter counts; migration idempotency guards; RequestContext.CreateCopy() copies the new InstrumentationOptions; the source-breaking IRequestContext.InstrumentationOptions addition is clearly documented in the release notes; DI registration correctly registers the outbox under the role interface.

Reviewed with assistance from parallel analysis of the persistent-store implementations.

iancooper and others added 3 commits July 3, 2026 10:56
…ths (PR #4067 review)

Spanner's single-arg CreateSqlParameter only forced an explicit SpannerDbType
for the literal "@CausationId", but the bulk insert path names its per-row
params "@p{i}_CausationId". Since Spanner ships the CausationId column in its
DDL, includeCausation is always true; in the normal (non-replay) case the
causation id is null, so the bulk path produced an untyped DBNull that Spanner
rejects — a bulk Add of >1 message threw.

Bind the causation parameter via the typed two-arg CreateSqlParameter(name,
DbType.String, value) on the single-Add, bulk-Add and replay paths (matching
how the bulk insert already types every other column), and drop Spanner's now
-redundant name-based special case. Other relational backends tolerate untyped
DBNull and are unaffected; verified via SQLite outbox 86/86 + causation 24/24.

Adds a Spanner bulk-deposit-with-null-causation regression test (Category
=Spanner, CI-excluded; runs against the emulator).

Also removes the accidental duplicate spec pointer specs/.current_spec
(underscore); specs/.current-spec (hyphen) is the canonical name the /spec
skills read.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
 review)

The inbox write path degrades on an un-migrated table via the memoized
CausationColumnExists() probe, but GetCausationId/GetCausationIdAsync guarded
only on "CausationQueries is null". Every backend implements the causation
query interface, so on a backend whose table has not been migrated (the mixed
state the write-path gate tolerates) a Replay duplicate — UseInboxHandler calls
GetCausationId without first checking SupportsCausationTracking() — ran
SELECT ... [CausationId] ... and threw invalid-column instead of no-opping,
breaking the AC10 graceful-degradation promise. Pipeline validation only warns
here, so startup succeeds and the failure surfaces at runtime on first duplicate.

Gate both read methods on CausationColumnExists()/CausationColumnExistsAsync()
(which already return false when the query interface is absent), returning null
when the column is missing. This also protects ReplayCausation.

Regression test: un-migrated (V2) SQLite inbox returns null from GetCausationId
instead of throwing. SQLite legacy+causation 27/27, Core OnceOnly 40/40, full
SQLite suite 151/151.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…index trap (PR #4067 review)

- release_notes: add a "Usage requirement" call-out that Replay only works when
  handlers thread their RequestContext through Post/DepositPost; a context-less
  Post creates a fresh context, drops the causation id, and makes Replay a silent
  no-op. Includes a ❌/✅ handler example.
- DynamoDbConfiguration.CausationIndexName: <remarks> warning that the value must
  stay "Causation" because the matching GSI hash key on MessageItem is a compile
  -time-constant annotation that cannot read the config; overriding it silently
  breaks replay (same latent trap as the Outstanding/Delivered index names).
- DynamoDbInbox.GetCausationId: correct the misleading comment that implied
  GetCausationIdAsync adds a span (neither method does today).

Docs/comments only — no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@claude

claude Bot commented Jul 3, 2026

Copy link
Copy Markdown

Code Review — PR #4067: Replay Outbox Messages on Inbox Duplicate Detection

Reviewed against ADR 0057 and the merge base. This is a large, well-engineered feature (189 files) with a thorough ADR, an opt-in / non-breaking design, role interfaces that follow the existing IAmA* conventions, and strong core test coverage. The write-path gate (memoized live-schema probe) and the pipeline-validation rule are nicely done — the validation rule even improves on the ADR by wrapping the probe in try/catch and distinguishing a "Replay Skipped" telemetry event from a successful replay.

Below are the issues found, ranked. Nothing here is a blocker for the design; the notable ones are runtime-robustness gaps in mixed / partial states.

🔴 High / Medium

1. Relational ReplayCausation is not gated on the live-schema probe → runtime SQL error in mixed-migration state.
src/Paramore.Brighter/RelationDatabaseOutbox.cs (ReplayCausation ~1049, ReplayCausationAsync ~1073) guard only on if (CausationQueries is null) return;. Every other write path (Add, bulk Add, and the inbox GetCausationId) deliberately gates on the live CausationColumnExists() probe precisely because a backend implements IRelationalDatabaseOutboxCausationQueries statically even when its table is un-migrated.

Scenario: inbox migrated (has CausationId), outbox not migrated (interface present, column absent) — e.g. a user who provisions/migrates only the inbox. UseInboxHandler gets a non-null causation id from the inbox and calls _outbox.ReplayCausation(...), which issues UPDATE {0} SET Dispatched = NULL WHERE CausationId = @CausationId against a table with no such column, throwing "invalid column name CausationId" at runtime. This is exactly the graceful-degradation case the write-path gate was built to prevent, and pipeline validation only emits a non-blocking Warning for the un-migrated outbox, so startup proceeds. Suggest adding || !CausationColumnExists() (and the async equivalent) to the early return.

2. Handler replay callsite mirrors the same gap.
UseInboxHandler.cs:122-126 / UseInboxHandlerAsync.cs:126-133 call _outbox.ReplayCausation(...) whenever the inbox returns a causation id, checking only _outbox is not null — never _outbox.SupportsCausationTracking(). Guarding either here or inside ReplayCausation (see #1) closes the mixed-migration hole.

3. DynamoDB ReplayCausationAsync aborts the entire replay on ConditionalCheckFailedException (no catch-and-continue).
src/Paramore.Brighter.Outbox.DynamoDB/DynamoDbOutbox.cs:642-643 and .V4/DynamoDbOutbox.cs:655-656. The per-item UpdateItem uses ConditionExpression = "attribute_exists(MessageId)" but the surrounding try has only a finally. MarkDispatchedAsync catches this exact exception; ReplayCausationAsync does not. Because the Causation GSI is always eventually consistent, the query can return a MessageId for an item that was since swept/purged/TTL-deleted (or deleted concurrently). The condition fails, the exception propagates, and the pagination loop unwinds — remaining messages in the causation are silently never re-dispatched. Suggest catch-and-continue like MarkDispatched.

🟡 Low

4. Torn read of the memoized bool? probe under concurrent first-probe.
_causationColumnExists (relational inbox/outbox) and _causationIndexExists (DynamoDB outbox) are unsynchronized Nullable<bool> — a two-field struct. The code comment argues the race is harmless "because the value never changes once written," but a concurrent reader can observe HasValue=true while value is still the default false (struct assignment is not atomic). At startup, pipeline validation may probe on several threads; the window can yield a spurious false, i.e. a transient "does not support causation tracking" finding or skipped replay. It self-heals, but consider an int + Interlocked or a lock if you want to close it.

5. DynamoDB replay does not reset/extend ExpiresAt (TTL).
ReplayCausation restores outstanding state but leaves the original TTL. On a TTL-configured outbox, a message whose TTL has already elapsed may be deleted before/shortly after replay and never re-dispatched. Only affects outboxes with TimeToLive set.

6. CausationColumnExistsCommand probes are not schema-qualified.
MsSql OBJECT_ID, Postgres to_regclass, MySql table_name all resolve against the connection default schema/DB. A box table in a non-default schema can be probed as "column absent" even when present, silently disabling tracking. This mirrors the existing unqualified FROM {0} runtime queries, so it is consistent rather than a regression — but the feature inherits the limitation.

7. MsSql fresh-install index is not idempotent.
src/Paramore.Brighter.Outbox.MsSql/SqlOutboxBuilder.cs:135 emits a bare CREATE INDEX [idx_..._CausationId] with no sys.indexes / IF NOT EXISTS guard, whereas the V8 migration uses a guarded EXEC(...) and the Sqlite/Postgres/Spanner builders use IF NOT EXISTS. Safe today only because the MsSql CREATE TABLE is itself unguarded (so GetDDL runs on a true fresh install), but it is the odd one out.

8. timeoutInMilliseconds is accepted but ignored by DynamoDB GetCausationId / GetCausationIdAsync (consistent with other DynamoDB methods, but silently non-functional).

9. IRequestContext gained a new InstrumentationOptions member. This is technically a source/binary-breaking change for any external implementer of the interface (no default interface implementation). Likely acceptable given the interface role, but worth calling out for a minor-version bump.

🧪 Test coverage

Coverage is strong overall — sync+async replay, no-outbox terminal step (both), causation-id defaulting, the validation rule five checks plus probe-throws-degrades-to-Warning, the legacy/un-migrated-schema gate per relational backend, the DynamoDB "no GSI" gate, and base tests inherited by every store. Notable gaps:

  • Causation-id preservation/chaining is untested — the if (!requestContext.Bag.ContainsKey(CausationId)) guard (the actual "linking" semantic) is never exercised by seeding a parent causation id into the bag before Send to verify it is not overwritten. Given the feature name, this is the most worthwhile gap to close.
  • Async "Replay Skipped" (null-causation) telemetry, no-instrumentation, and no-span cases are only tested on the sync handler.
  • The handler-level "inbox present but not IAmACausationTrackingInbox" false branch is only covered at config time (validation rule), never at runtime.
  • Store-level replay of an unknown causation id (matches no rows — expected safe no-op).

Overall this is a solid, carefully-scoped implementation. The highest-value follow-ups are gating relational ReplayCausation on the live probe (#1/#2) and making DynamoDB replay resilient to vanished items (#3), since both concern real partial / eventually-consistent states that validation only warns about.

🤖 Review assisted by Claude Code (Opus 4.8, 1M context).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3 - Done feature request .NET Pull requests that update .net code V10.X

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Force Replay of Outbox on seeing duplicate Processed Inbox Message

1 participant