Refactor ARC status model: clean state space, GitSyncStatus enum, state machine with atomic transitions #236

@Zalfsten

Context

During a code review of the ARC status model, we identified a mismatch between the intended state space (as expressed by the enum definitions) and the actually implemented state space (as reflected by the code that reads and writes these values). This issue captures the desired target state and the work needed to reach it.


Current situation

Three distinct concepts — currently conflated

| Concept | Today | Problem |
| --- | --- | --- |
| Operation result | ArcStatus (CREATED, UPDATED, DELETED, REQUESTED) | DELETED and REQUESTED are never set; ArcStatus is not a state but a verb in the HTTP response body |
| Harvest lifecycle state | ArcLifecycleStatus (ACTIVE, MISSING, DELETED, PROCESSING, INVALID) | Only ACTIVE is ever written; PROCESSING and INVALID belong to the Git dimension, not the harvest dimension |
| Git sync state | GitMetadata.status (free string "PENDING"/"SYNCED"/"FAILED") | Never updated by the worker, so it always stays "PENDING"; GIT_PUSH_SUCCESS / GIT_PUSH_FAILED is only written as an ArcEvent |

What ArcEvents are used for today

  1. Harvest statistics: get_harvest_statistics() reconstructs arcs_new / arcs_updated / arcs_unchanged by scanning the event log for ARC_CREATED / ARC_UPDATED events. This is fragile because the statistics depend on event-log archaeology rather than a structured field.
  2. Git-sync outcome: the Celery worker appends GIT_PUSH_SUCCESS or GIT_PUSH_FAILED to the event log. This is the only way the worker reports back; GitMetadata.status is never touched.

Desired target state

Concept 1 — Operation result (ArcStatus)

ArcStatus is not a state. It is an operation result returned in the HTTP response body, analogous to HTTP 201 vs 200. It should only contain values that are actually assigned:

ArcStatus: CREATED | UPDATED

DELETED and REQUESTED are removed (never assigned today; future deletion semantics belong to ArcLifecycleStatus).
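
A minimal sketch of the reduced enum, assuming it is a str-backed Enum. The helper operation_result is hypothetical and only illustrates the "201 vs 200" analogy; it is not taken from the codebase.

```python
from enum import Enum


class ArcStatus(str, Enum):
    """Operation result reported in the HTTP response body (not a state)."""

    CREATED = "CREATED"
    UPDATED = "UPDATED"


def operation_result(existed_before: bool) -> ArcStatus:
    # Hypothetical helper: UPDATED if the ARC document already existed,
    # CREATED otherwise (analogous to HTTP 200 vs 201).
    return ArcStatus.UPDATED if existed_before else ArcStatus.CREATED
```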

Concept 2 — Harvest lifecycle state (ArcLifecycleStatus)

Describes where an ARC stands in the harvest cycle. Transitions are driven by harvest runs, not by Git sync.

ArcLifecycleStatus: ACTIVE | MISSING | DELETED

State diagram:

(new ARC submitted)
        │
        ▼
     ACTIVE ◄────────────────────────────────┐
        │                                    │
        │  harvest completed,                │  ARC reappears
        │  ARC not seen                      │  in a later harvest
        ▼                                    │
     MISSING ────────────────────────────────┘
        │
        │  missing for N consecutive harvests
        ▼
     DELETED ──── (ARC re-submitted) ──────► ACTIVE

PROCESSING and INVALID are removed from this enum — they describe the Git sync dimension, not the harvest dimension.
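
The diagram above could be expressed as a pure function that is evaluated once per ARC after a harvest run completes. This is a sketch under assumptions: the function name, the missed_harvests counter, and the max_missed parameter (the "N" in the diagram, whose value the issue does not specify) are all hypothetical.

```python
from enum import Enum


class ArcLifecycleStatus(str, Enum):
    ACTIVE = "ACTIVE"
    MISSING = "MISSING"
    DELETED = "DELETED"


def next_lifecycle_status(
    current: ArcLifecycleStatus,
    seen_in_harvest: bool,
    missed_harvests: int,
    max_missed: int,
) -> ArcLifecycleStatus:
    """Return the status after one completed harvest run.

    missed_harvests counts consecutive harvests without the ARC,
    including the one that just completed (hypothetical counter).
    """
    if seen_in_harvest:
        # Seen again or re-submitted: always back to ACTIVE.
        return ArcLifecycleStatus.ACTIVE
    if current is ArcLifecycleStatus.DELETED:
        # DELETED only leaves via re-submission.
        return ArcLifecycleStatus.DELETED
    if current is ArcLifecycleStatus.MISSING and missed_harvests >= max_missed:
        return ArcLifecycleStatus.DELETED
    # ACTIVE never jumps straight to DELETED; it passes through MISSING first.
    return ArcLifecycleStatus.MISSING
```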

Concept 3 — Git sync state (GitSyncStatus)

Describes the outcome of the asynchronous GitLab synchronisation driven by the Celery worker.

GitSyncStatus: PENDING | SYNCING | SYNCED | FAILED

State diagram:

PENDING ──► SYNCING ──► SYNCED
               │
               └──► FAILED ──► PENDING  (on Celery retry)

GitMetadata.status is retyped from str to GitSyncStatus. The worker writes the new status to this field in addition to appending an ArcEvent.
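
A sketch of the retyped field, assuming a str-backed enum so that the strings already stored in CouchDB ("PENDING", "SYNCED", "FAILED") still parse. The dataclass is only a stand-in for the real GitMetadata model, whose other fields and base class are not shown in this issue.

```python
from dataclasses import dataclass
from enum import Enum


class GitSyncStatus(str, Enum):
    PENDING = "PENDING"
    SYNCING = "SYNCING"
    SYNCED = "SYNCED"
    FAILED = "FAILED"


@dataclass
class GitMetadata:
    # Stand-in for the real model; only the retyped field is shown.
    status: GitSyncStatus = GitSyncStatus.PENDING

    def __post_init__(self) -> None:
        # Coerce stored strings such as "PENDING" to the enum;
        # unknown values raise ValueError instead of being kept silently.
        self.status = GitSyncStatus(self.status)
```

If the real model is validated by a schema library, that library may already perform this coercion, in which case the __post_init__ step would be unnecessary.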


State machine & atomicity

Desired: explicit transition table

Rather than ad-hoc if statements scattered across couchdb.py, transitions should be validated centrally — ideally enforced inside the pre_save_validator hook that already exists on save_document:

_LIFECYCLE_TRANSITIONS: dict[ArcLifecycleStatus, set[ArcLifecycleStatus]] = {
    ArcLifecycleStatus.ACTIVE:   {ArcLifecycleStatus.MISSING},
    ArcLifecycleStatus.MISSING:  {ArcLifecycleStatus.ACTIVE, ArcLifecycleStatus.DELETED},
    ArcLifecycleStatus.DELETED:  {ArcLifecycleStatus.ACTIVE},
}

_GIT_TRANSITIONS: dict[GitSyncStatus, set[GitSyncStatus]] = {
    GitSyncStatus.PENDING:  {GitSyncStatus.SYNCING},
    GitSyncStatus.SYNCING:  {GitSyncStatus.SYNCED, GitSyncStatus.FAILED},
    GitSyncStatus.FAILED:   {GitSyncStatus.PENDING},
    GitSyncStatus.SYNCED:   {GitSyncStatus.PENDING},  # only when ARC content changes; otherwise terminal
}

Any attempt to write an invalid transition raises an error before touching CouchDB.

Atomicity analysis

| Transition | Atomic? | Reason |
| --- | --- | --- |
| ArcLifecycleStatus changes (harvest path) | ✅ Yes | Single CouchDB document, protected by OCC retry in save_document |
| GitSyncStatus: PENDING → SYNCING + Celery enqueue | ❌ No | CouchDB write and RabbitMQ enqueue are two separate systems; there is no distributed transaction |
| GitSyncStatus: SYNCING → SYNCED/FAILED | ✅ Yes | Worker writes only to CouchDB |

The non-atomic PENDING → SYNCING + enqueue transition follows a pragmatic variant of the outbox pattern: write to CouchDB first, then enqueue the Celery task. If the process crashes between the two steps, the document stays stuck in SYNCING with no task queued. A periodic watchdog job can detect such documents and reset them.
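
The detection half of such a watchdog might look like the sketch below. It assumes each document records when its Git status last changed; the field names git and status_changed_at and the function name are hypothetical, and the documents are plain dicts rather than real CouchDB query results.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def find_stuck_syncing(
    docs: Iterable[dict], now: datetime, timeout: timedelta
) -> List[str]:
    """Return ids of documents that have been in SYNCING longer than timeout."""
    stuck = []
    for doc in docs:
        git = doc.get("git", {})
        if git.get("status") != "SYNCING":
            continue
        # Hypothetical ISO 8601 timestamp written together with the status.
        since = datetime.fromisoformat(git["status_changed_at"])
        if now - since > timeout:
            stuck.append(doc["_id"])
    return stuck
```

How the reset itself happens is left open here. Under the transition table above, a stuck document would have to go SYNCING → FAILED → PENDING rather than directly back to PENDING.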


Tasks

  • Remove DELETED and REQUESTED from ArcStatus; update all tests
  • Remove PROCESSING and INVALID from ArcLifecycleStatus; update all tests
  • Introduce GitSyncStatus enum (PENDING | SYNCING | SYNCED | FAILED) in middleware/shared/api_models/common/models.py
  • Retype GitMetadata.status from str to GitSyncStatus in arc_document.py
  • Write GitSyncStatus in the Celery worker (sync_to_gitlab): PENDING → SYNCING before Git push, SYNCING → SYNCED/FAILED after
  • Implement ArcLifecycleStatus transition table; enforce it via pre_save_validator in save_document
  • Implement GitSyncStatus transition table; enforce it in the worker
  • Replace event-log archaeology in get_harvest_statistics() with structured is_new / has_changes fields on ArcMetadata
  • Expose git_sync_status in the v3 ArcMetadata response model and update the API client
  • Update api_client/models.py (ArcStatus, ArcLifecycleStatus, add GitSyncStatus)
  • Document the state machines in middleware/api/spec/document-store/design.md
  • Add watchdog task (or note as future work) for stuck SYNCING documents
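
For the statistics task in the list above, counting from structured fields could be as simple as the following sketch. It assumes is_new and has_changes describe the most recent harvest run; the ArcMetadata class here is a stand-in with only those two fields, and harvest_statistics is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ArcMetadata:
    # Stand-in with only the two proposed fields.
    is_new: bool
    has_changes: bool


def harvest_statistics(arcs: Iterable[ArcMetadata]) -> Dict[str, int]:
    """Count ARCs per category without scanning the event log."""
    stats = {"arcs_new": 0, "arcs_updated": 0, "arcs_unchanged": 0}
    for arc in arcs:
        if arc.is_new:
            stats["arcs_new"] += 1
        elif arc.has_changes:
            stats["arcs_updated"] += 1
        else:
            stats["arcs_unchanged"] += 1
    return stats
```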

Labels: architecture (Architectural decisions and design), models (API model definitions and structure), refactoring (Code refactoring and cleanup), state-machine (State management and transitions)