Update documentation

Вадим Козыревский · Вадим Козыревский · commit 6716a4242a47 · 2026-01-28T18:40:05.000+03:00
diff --git a/docs/saga/recovery.md b/docs/saga/recovery.md
@@ -104,23 +104,34 @@ Each saga in storage has a **recovery_attempts** counter. It is used to:
 
 **Automatic increment:** When `recover_saga()` fails (exception during resume), the storage's `increment_recovery_attempts(saga_id, new_status=SagaStatus.FAILED)` is called automatically. Callers do **not** need to call `increment_recovery_attempts` themselves.
 
+**Explicit set:** Use `storage.set_recovery_attempts(saga_id, attempts)` to set the counter to a specific value. For example, set it to `0` to reset it after one of the steps has been recovered successfully, or to `max_recovery_attempts` so the saga is excluded from further recovery without changing its status.
+
 **Getting sagas for recovery:** Use `storage.get_sagas_for_recovery()` instead of a custom query:
 
 ```python
+# All saga types (default)
 ids = await storage.get_sagas_for_recovery(
     limit=50,
     max_recovery_attempts=5,   # Only sagas with recovery_attempts < 5
     stale_after_seconds=120,   # Only sagas not updated in last 2 minutes (avoids picking active sagas)
 )
+
+# Only sagas of a specific type (e.g. one recovery job per saga name)
+ids = await storage.get_sagas_for_recovery(
+    limit=50,
+    max_recovery_attempts=5,
+    saga_name="OrderSaga",
+)
 ```
 
 | Parameter | Description |
 |-----------|-------------|
 | `limit` | Maximum number of saga IDs to return |
 | `max_recovery_attempts` | Only include sagas with `recovery_attempts` strictly less than this value (default: 5) |
 | `stale_after_seconds` | If set, only include sagas whose `updated_at` is older than (now − this value). Use to avoid picking sagas currently being executed. `None` = no filter |
+| `saga_name` | If set, only include sagas with this name (e.g. handler/type name). `None` (default) = return all saga types |
 
-Returns saga IDs in status RUNNING, COMPENSATING, or FAILED, ordered by `updated_at` ascending (oldest first).
+Returns saga IDs in status RUNNING or COMPENSATING, ordered by `updated_at` ascending (oldest first).
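+
+The selection rules above can be modelled in plain Python to make them concrete. This is a sketch, not the storage implementation: `SagaRecord`, `select_sagas_for_recovery`, and its `now` parameter are hypothetical, and plain strings stand in for `SagaStatus` values.
+
+```python
+from dataclasses import dataclass
+from datetime import datetime, timedelta, timezone
+from uuid import UUID, uuid4
+
+
+# Hypothetical in-memory record holding the columns the query filters on.
+@dataclass
+class SagaRecord:
+    saga_id: UUID
+    name: str
+    status: str  # stand-in for SagaStatus, e.g. "RUNNING"
+    recovery_attempts: int
+    updated_at: datetime
+
+
+RECOVERABLE_STATUSES = {"RUNNING", "COMPENSATING"}
+
+
+def select_sagas_for_recovery(
+    records: list[SagaRecord],
+    limit: int,
+    max_recovery_attempts: int = 5,
+    stale_after_seconds: int | None = None,
+    saga_name: str | None = None,
+    now: datetime | None = None,
+) -> list[UUID]:
+    if now is None:
+        now = datetime.now(timezone.utc)
+    candidates = [
+        r
+        for r in records
+        if r.status in RECOVERABLE_STATUSES
+        # Strictly less than: a counter equal to the maximum excludes the saga.
+        and r.recovery_attempts < max_recovery_attempts
+        # Staleness: skip sagas updated recently (likely still being executed).
+        and (
+            stale_after_seconds is None
+            or r.updated_at < now - timedelta(seconds=stale_after_seconds)
+        )
+        # None means "all saga types".
+        and (saga_name is None or r.name == saga_name)
+    ]
+    # Oldest first, then cut to the requested page size.
+    candidates.sort(key=lambda r: r.updated_at)
+    return [r.saga_id for r in candidates[:limit]]
+```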
 
 ## Strict Backward Recovery
 
@@ -132,18 +143,19 @@ This prevents "zombie states" where compensation actions conflict with new execu
 
 ### Background Recovery Job
 
-Use `storage.get_sagas_for_recovery()` to get saga IDs that need recovery. On recovery failure, `recover_saga()` calls `increment_recovery_attempts` internally — no extra code needed.
+Use `storage.get_sagas_for_recovery()` to get saga IDs that need recovery. On recovery failure, `recover_saga()` calls `increment_recovery_attempts` internally — no extra code needed. You can pass `saga_name` to run separate recovery jobs per saga type.
 
 ```python
 import asyncio
 from cqrs.saga.recovery import recover_saga
 
-async def recovery_job(storage, saga, context_builder, container):
+async def recovery_job(storage, saga, context_builder, container, saga_name=None):
     while True:
         ids = await storage.get_sagas_for_recovery(
             limit=50,
             max_recovery_attempts=5,
             stale_after_seconds=120,  # Avoid sagas currently being executed
+            saga_name=saga_name,     # None = all types; or e.g. "OrderSaga" for one type
         )
         for saga_id in ids:
             try:
@@ -182,6 +194,7 @@ scheduler.start()
 1. **Run recovery periodically** — Background job using `get_sagas_for_recovery()` to scan for incomplete sagas
 2. **Use `max_recovery_attempts`** — Exclude sagas that fail recovery too many times (e.g. 5) to avoid infinite retries
 3. **Use `stale_after_seconds`** — Avoid picking sagas that are currently being executed by another worker
-4. **Handle failures** — Log errors and send alerts; `increment_recovery_attempts` is called automatically by `recover_saga`
-5. **Monitor metrics** — Track recovery rate, duration, failures, and sagas exceeding max attempts
-6. **Use persistent storage** — Memory storage loses data on restart
+4. **Use `saga_name` for per-type recovery** — When running separate recovery jobs per saga type, pass `saga_name` so each job only processes its own sagas
+5. **Handle failures** — Log errors and send alerts; `increment_recovery_attempts` is called automatically by `recover_saga`
+6. **Monitor metrics** — Track recovery rate, duration, failures, and sagas exceeding max attempts
+7. **Use persistent storage** — Memory storage loses data on restart
diff --git a/docs/saga/storage.md b/docs/saga/storage.md
@@ -31,12 +31,14 @@ class ISagaStorage(abc.ABC):
     async def log_step(saga_id, step_name, action, status, details=None) -> None
     async def load_saga_state(saga_id, *, read_for_update: bool = False) -> tuple[SagaStatus, dict, int]
     async def get_step_history(saga_id) -> list[SagaLogEntry]
-    async def get_sagas_for_recovery(limit, max_recovery_attempts=5, stale_after_seconds=None) -> list[uuid.UUID]
+    async def get_sagas_for_recovery(limit, max_recovery_attempts=5, stale_after_seconds=None, saga_name=None) -> list[uuid.UUID]
     async def increment_recovery_attempts(saga_id, new_status: SagaStatus | None = None) -> None
+    async def set_recovery_attempts(saga_id, attempts: int) -> None
 ```
 
-- **get_sagas_for_recovery** — Returns saga IDs that need recovery (RUNNING, COMPENSATING, FAILED) with `recovery_attempts` &lt; `max_recovery_attempts`, optionally filtered by staleness. Used by recovery jobs.
+- **get_sagas_for_recovery** — Returns IDs of sagas that need recovery (status RUNNING or COMPENSATING) whose `recovery_attempts` is strictly less than `max_recovery_attempts`, optionally filtered by staleness and by saga name. When `saga_name` is `None` (default), sagas of all types are returned; when it is set, only sagas with that name are returned. Used by recovery jobs.
 - **increment_recovery_attempts** — Called automatically by `recover_saga()` on recovery failure; increments `recovery_attempts` and optionally updates status (e.g. to FAILED).
+- **set_recovery_attempts** — Sets the recovery attempt counter to an explicit value. Use it to reset the counter (e.g. to `0`) after a step has been recovered successfully, or to set it to `max_recovery_attempts` so the saga is excluded from further recovery, effectively marking it as permanently failed without changing its status.
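+
+To illustrate how the two counter methods interact with the `recovery_attempts < max_recovery_attempts` filter, here is a toy in-memory model. `ToyRecoveryCounters`, its `add`/`is_recoverable` helpers, and the string statuses are hypothetical; they only mimic the behaviour described above and are not part of `ISagaStorage`.
+
+```python
+import asyncio
+import uuid
+
+
+# Hypothetical toy model of the status and recovery_attempts columns.
+class ToyRecoveryCounters:
+    def __init__(self) -> None:
+        self.attempts: dict[uuid.UUID, int] = {}
+        self.status: dict[uuid.UUID, str] = {}
+
+    def add(self, saga_id: uuid.UUID, status: str = "RUNNING") -> None:
+        # New sagas start with recovery_attempts = 0 (the column default).
+        self.attempts[saga_id] = 0
+        self.status[saga_id] = status
+
+    async def increment_recovery_attempts(
+        self, saga_id: uuid.UUID, new_status: str | None = None
+    ) -> None:
+        # Mirrors the call recover_saga() makes on failure: +1, optionally a new status.
+        self.attempts[saga_id] += 1
+        if new_status is not None:
+            self.status[saga_id] = new_status
+
+    async def set_recovery_attempts(self, saga_id: uuid.UUID, attempts: int) -> None:
+        # Overwrites the counter; the status is left untouched.
+        self.attempts[saga_id] = attempts
+
+    def is_recoverable(self, saga_id: uuid.UUID, max_recovery_attempts: int = 5) -> bool:
+        # Same status and strict "<" checks as get_sagas_for_recovery (staleness omitted).
+        return (
+            self.status[saga_id] in {"RUNNING", "COMPENSATING"}
+            and self.attempts[saga_id] < max_recovery_attempts
+        )
+
+
+async def demo() -> list[bool]:
+    store = ToyRecoveryCounters()
+    saga_id = uuid.uuid4()
+    store.add(saga_id)
+    seen = [store.is_recoverable(saga_id)]         # fresh saga: eligible
+    await store.set_recovery_attempts(saga_id, 5)  # park the counter at the maximum
+    seen.append(store.is_recoverable(saga_id))     # excluded, status still RUNNING
+    await store.set_recovery_attempts(saga_id, 0)  # reset after a recovered step
+    seen.append(store.is_recoverable(saga_id))     # eligible again
+    return seen
+
+
+print(asyncio.run(demo()))  # [True, False, True]
+```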
 
 ## Memory Storage
 
@@ -83,7 +85,7 @@ Database-backed implementation for production. It uses a session factory to mana
 - `status` (VARCHAR) - PENDING, RUNNING, COMPENSATING, COMPLETED, FAILED
 - `context` (JSON)
 - `version` (INTEGER) - Optimistic locking version (default: 1)
-- `recovery_attempts` (INTEGER) - Number of failed recovery attempts (default: 0); used by `get_sagas_for_recovery` and `increment_recovery_attempts`
+- `recovery_attempts` (INTEGER) - Number of failed recovery attempts (default: 0); used by `get_sagas_for_recovery`, `increment_recovery_attempts`, and `set_recovery_attempts`
 - `created_at`, `updated_at` (TIMESTAMP)
 
 **saga_logs:**