TST-38: Concurrency and race condition stress tests #705

@Chris0Jeky

Context

The system has several concurrent-access patterns: worker queue claiming, optimistic concurrency on card updates, SignalR presence tracking, and batch processing. These deserve targeted concurrency tests beyond the existing Playwright multi-session coverage.

Test Scenarios

Queue Claim Races

  1. Double-claim prevention: 10 parallel workers all try to claim the same Pending LLM queue item → exactly one succeeds; the other 9 get a `DomainException` or a `false` return
  2. Capture triage claim: `TryClaimProcessingCaptureAsync` with stale `expectedUpdatedAt` → returns false
  3. Batch processing with concurrent workers: Two workers call `ProcessBatchAsync` simultaneously → no item processed twice
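
The double-claim scenario can be sketched without any infrastructure. The block below is a minimal illustration, not the project's code: `FakeQueueItem`, `TryClaim`, and `DoubleClaimSketch` are hypothetical stand-ins, and the atomic compare-and-swap only approximates whatever the real claim path does against the database.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for a queue item (not the real entity).
public sealed class FakeQueueItem
{
    private const int Pending = 0;
    private const int Processing = 1;
    private int _status = Pending;

    // Atomically flips Pending -> Processing; only the first caller gets true.
    public bool TryClaim() =>
        Interlocked.CompareExchange(ref _status, Processing, Pending) == Pending;
}

public static class DoubleClaimSketch
{
    // Fires `workers` parallel claim attempts and returns how many succeeded.
    public static async Task<int> CountSuccessfulClaimsAsync(int workers)
    {
        var item = new FakeQueueItem();
        var start = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);

        Task<bool>[] attempts = Enumerable.Range(0, workers)
            .Select(_ => Task.Run(async () =>
            {
                await start.Task;        // hold every worker at the gate
                return item.TryClaim();
            }))
            .ToArray();

        start.SetResult();               // release all workers together
        bool[] results = await Task.WhenAll(attempts);
        return results.Count(claimed => claimed);
    }
}
```

A real test would replace `FakeQueueItem.TryClaim` with the actual claim call and assert that `CountSuccessfulClaimsAsync(10)` equals 1.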

Card Update Conflicts

  1. Concurrent card moves: User A and User B both move the same card to different columns simultaneously → one succeeds, one gets 409 Conflict
  2. Concurrent card edits: Both users update the same card's description → stale-write detection fires
  3. Column reorder race: Two users reorder cards in the same column simultaneously → final state is consistent (no duplicated or lost positions)
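
As a rough illustration of the stale-write check behind the first two scenarios, the sketch below models a card with a version counter. `FakeCardStore`, `TryMove`, and the column names are assumptions made for the example; in the real system the check would be performed by the EF Core concurrency token and surfaced as a 409 Conflict.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical in-memory card with a version counter acting as the concurrency token.
public sealed class FakeCardStore
{
    private readonly object _gate = new();
    private string _column = "Backlog";
    private int _version = 1;

    public (string Column, int Version) Read()
    {
        lock (_gate) return (_column, _version);
    }

    // Returns false (the analogue of 409 Conflict) when expectedVersion is stale.
    public bool TryMove(string targetColumn, int expectedVersion)
    {
        lock (_gate)
        {
            if (_version != expectedVersion) return false;
            _column = targetColumn;
            _version++;
            return true;
        }
    }
}

public static class CardMoveRaceSketch
{
    public static async Task<(int Succeeded, int Conflicted, string FinalColumn)> RunAsync()
    {
        var store = new FakeCardStore();
        int seenVersion = store.Read().Version; // both users read the same version

        Task<bool> userA = Task.Run(() => store.TryMove("In Progress", seenVersion));
        Task<bool> userB = Task.Run(() => store.TryMove("Done", seenVersion));
        bool[] results = await Task.WhenAll(userA, userB);

        int succeeded = results.Count(ok => ok);
        return (succeeded, results.Length - succeeded, store.Read().Column);
    }
}
```

Whichever user runs first wins, so the test should assert on the counts (one success, one conflict) and on the final column being one of the two targets, not on which user won.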

Proposal Approval Races

  1. Double-approve: Two requests to approve the same proposal simultaneously → only one succeeds
  2. Approve + Expire race: Proposal is being approved while the housekeeping worker expires it → exactly one transition wins and the proposal ends in a single, consistent state

Board Presence

  1. Rapid join/leave: 20 connections join and leave a board rapidly → once activity settles, the presence snapshot matches the set of connections still joined
  2. Disconnect during edit: User sets editing card, then connection drops → presence snapshot clears the editing state
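
The shape of the rapid join/leave check might look like the following. This is only a sketch against a hypothetical `FakePresenceTracker`; the real tests would open actual SignalR connections to the same board, and the tracker's method names here are invented for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical presence tracker: connectionId -> card being edited ("" = none).
public sealed class FakePresenceTracker
{
    private const string NotEditing = "";
    private readonly ConcurrentDictionary<string, string> _editingByConnection = new();

    public void Join(string connectionId) => _editingByConnection[connectionId] = NotEditing;

    // Only updates connections that are still present, so a late call cannot resurrect one.
    public void SetEditing(string connectionId, string cardId)
    {
        if (_editingByConnection.TryGetValue(connectionId, out var current))
            _editingByConnection.TryUpdate(connectionId, cardId, current);
    }

    // Removing the connection also drops its editing state.
    public void Leave(string connectionId) => _editingByConnection.TryRemove(connectionId, out _);

    public KeyValuePair<string, string>[] Snapshot() => _editingByConnection.ToArray();
}

public static class PresenceSketch
{
    // Each connection joins, marks a card as being edited, then drops.
    // Returns the number of entries left in the snapshot afterwards.
    public static async Task<int> RapidJoinLeaveAsync(int connections)
    {
        var tracker = new FakePresenceTracker();
        await Task.WhenAll(Enumerable.Range(0, connections).Select(i => Task.Run(() =>
        {
            string id = $"conn-{i}";
            tracker.Join(id);
            tracker.SetEditing(id, "card-1");
            tracker.Leave(id);
        })));
        return tracker.Snapshot().Length;
    }
}
```

After every connection has left, the snapshot should be empty, which covers both the join/leave churn and the cleared editing state in this simplified model.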

Webhook Delivery

  1. Concurrent webhook deliveries: Multiple events fire for the same subscription → each gets its own delivery record, no duplicate delivery

Rate Limiting Under Load

  1. Burst beyond limit: 100 requests in quick succession → exactly the requests over the limit are throttled, and the retry headers are accurate
  2. Cross-user isolation under load: User A hitting rate limit doesn't affect User B's requests
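
The arithmetic of the burst test (requests minus limit equals throttled) and the per-user isolation can be illustrated with a toy limiter. `FakeFixedWindowLimiter` is an assumption for this sketch and says nothing about which rate-limiting algorithm or middleware the project actually uses; retry headers are out of scope here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical per-user counter with a fixed limit and no window reset.
public sealed class FakeFixedWindowLimiter
{
    private readonly int _limit;
    private readonly ConcurrentDictionary<string, int> _counts = new();

    public FakeFixedWindowLimiter(int limit) => _limit = limit;

    // true = allowed, false = throttled. AddOrUpdate hands each caller a distinct count.
    public bool TryAcquire(string userId) =>
        _counts.AddOrUpdate(userId, 1, (_, count) => count + 1) <= _limit;
}

public static class BurstSketch
{
    public static async Task<(int Throttled, bool OtherUserAllowed)> RunAsync(int requests, int limit)
    {
        var limiter = new FakeFixedWindowLimiter(limit);

        // User A bursts past the limit from many tasks at once.
        bool[] allowed = await Task.WhenAll(
            Enumerable.Range(0, requests)
                .Select(_ => Task.Run(() => limiter.TryAcquire("user-a"))));

        int throttled = allowed.Count(ok => !ok);

        // User B has its own counter and should be unaffected.
        return (throttled, limiter.TryAcquire("user-b"));
    }
}
```

With 100 requests and a limit of 10, this model throttles 90 requests for user A while user B's first request is still allowed.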

Implementation Notes

  • Use `Task.WhenAll` with multiple `HttpClient` instances for HTTP-level concurrency
  • For queue claim tests: seed items, then fire N parallel claim attempts
  • For card conflicts: rely on the EF Core concurrency tokens already in place
  • For presence tests: create multiple SignalR connections to the same board
  • Consider using `SemaphoreSlim` barriers to ensure truly simultaneous execution
  • Measure: no deadlocks, no data loss, no duplicate processing, consistent final state
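
One possible shape for the `SemaphoreSlim` barrier mentioned above is sketched here. `StartGate` and `RunSimultaneouslyAsync` are hypothetical names; the idea is that every participant parks on the semaphore, and the gate opens only once all of them have reported ready.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class StartGate
{
    // Runs `action` for each participant index, releasing all of them at the same moment.
    public static async Task<T[]> RunSimultaneouslyAsync<T>(int participants, Func<int, Task<T>> action)
    {
        if (participants < 1) throw new ArgumentOutOfRangeException(nameof(participants));

        using var gate = new SemaphoreSlim(0, participants);
        var allReady = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
        int notReady = participants;

        Task<T>[] tasks = Enumerable.Range(0, participants)
            .Select(i => Task.Run(async () =>
            {
                // The last participant to arrive signals that everyone is in position.
                if (Interlocked.Decrement(ref notReady) == 0) allReady.SetResult();
                await gate.WaitAsync();          // park until the gate opens
                return await action(i);
            }))
            .ToArray();

        await allReady.Task;                      // wait until every participant has arrived
        gate.Release(participants);               // open the gate for all of them at once
        return await Task.WhenAll(tasks);
    }
}
```

A call such as `await StartGate.RunSimultaneouslyAsync(10, i => ClaimAsync(i))` (with `ClaimAsync` standing in for the operation under test) would then start all ten attempts as close together as the scheduler allows. True simultaneity is not guaranteed, only a much narrower start window than launching tasks one by one.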

Risk Areas

  • SQLite doesn't handle write concurrency as well as PostgreSQL — some tests may surface SQLite-specific serialization issues that would not occur in production with a different database. Document these clearly.
  • The `Task.Delay(backoff)` in the retry path holds its scope open for the duration of the delay; under high concurrency this could exhaust scope factories
