Architecture: make WordPress DB the source of truth for DMC workspace state #245

@chubes4

Goal

Make the WordPress/Data Machine database the source of truth for DMC workspace state, cleanup review state, locks, and evidence. The filesystem should be the managed target, not the coordination layer.

Current state

DMC has useful pieces but the architecture is split:

  • Worktree lifecycle state is primarily derived from filesystem metadata and runtime scans.
  • Data Machine jobs store execution state and task evidence.
  • Cleanup can schedule chunks through Data Machine jobs.
  • Cleanup review/apply state is still not a first-class DMC database concept.
  • Filesystem JSON cleanup plans were used in operator workflows and should not be the primary path.
  • Lock storage/retention is not clearly surfaced.

Target model

Add DMC-owned database storage for:

datamachine_code_worktrees

Tracks current known workspace inventory.

Fields should include:

  • id
  • handle (unique)
  • repo
  • branch
  • path
  • primary_path
  • is_primary
  • lifecycle_state
  • origin_site
  • origin_agent
  • origin_session
  • task_url
  • task_ref
  • pr_url
  • created_at
  • last_seen_at
  • last_probe_at
  • last_probe_status
  • dirty_count
  • unpushed_count
  • artifact_count
  • artifact_size_bytes
  • size_bytes
  • cleanup_signal
  • metadata (JSON)
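
As a rough illustration of the inventory table above, here is a minimal sketch in Python using SQLite as a stand-in for the WordPress database. Only the column names come from this issue. The column types, the defaults (including the 'active' lifecycle state and the 'ok' probe status), and the record_probe helper are assumptions made for the example; the real table would live in MySQL and be created by the plugin.

```python
import sqlite3

# Illustrative SQLite DDL. Column names follow the issue; types and
# defaults are assumptions for this sketch.
WORKTREES_DDL = """
CREATE TABLE datamachine_code_worktrees (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    handle TEXT NOT NULL UNIQUE,
    repo TEXT NOT NULL,
    branch TEXT NOT NULL,
    path TEXT NOT NULL,
    primary_path TEXT,
    is_primary INTEGER NOT NULL DEFAULT 0,
    lifecycle_state TEXT NOT NULL DEFAULT 'active',
    origin_site TEXT, origin_agent TEXT, origin_session TEXT,
    task_url TEXT, task_ref TEXT, pr_url TEXT,
    created_at TEXT NOT NULL,
    last_seen_at TEXT, last_probe_at TEXT, last_probe_status TEXT,
    dirty_count INTEGER NOT NULL DEFAULT 0,
    unpushed_count INTEGER NOT NULL DEFAULT 0,
    artifact_count INTEGER NOT NULL DEFAULT 0,
    artifact_size_bytes INTEGER NOT NULL DEFAULT 0,
    size_bytes INTEGER NOT NULL DEFAULT 0,
    cleanup_signal TEXT,
    metadata TEXT NOT NULL DEFAULT '{}'
)
"""


def record_probe(db, handle, repo, branch, path, now, dirty_count):
    """Hypothetical helper: upsert one worktree row keyed by its unique handle."""
    with db:
        db.execute(
            """
            INSERT INTO datamachine_code_worktrees
                (handle, repo, branch, path, created_at,
                 last_seen_at, last_probe_at, last_probe_status, dirty_count)
            VALUES (?, ?, ?, ?, ?, ?, ?, 'ok', ?)
            ON CONFLICT(handle) DO UPDATE SET
                last_seen_at = excluded.last_seen_at,
                last_probe_at = excluded.last_probe_at,
                last_probe_status = excluded.last_probe_status,
                dirty_count = excluded.dirty_count
            """,
            (handle, repo, branch, path, now, now, now, dirty_count),
        )
```

Keying the upsert on handle means a repeated probe refreshes last_seen_at and the counters without touching created_at, which is what lets the row act as a cache of the last known filesystem state.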

datamachine_code_cleanup_runs

Tracks review/apply runs.

Fields should include:

  • id
  • run_id (unique)
  • mode
  • status
  • policy (JSON)
  • requested_by_user_id
  • requested_by_agent_id
  • parent_job_id
  • batch_job_id
  • created_at
  • started_at
  • completed_at
  • summary (JSON)

datamachine_code_cleanup_items

Tracks planned/applied/skipped rows.

Fields should include:

  • id
  • run_id
  • handle
  • worktree_id
  • item_type
  • planned_action
  • status
  • reason_code
  • reason
  • bytes_reclaimed
  • job_id
  • chunk_index
  • planned_at
  • applied_at
  • evidence (JSON)
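
To make the run/item relationship concrete, the following sketch shows how a review might create one run row plus its planned item rows and hand back a run_id. It uses Python with SQLite purely for illustration and includes only a subset of the columns listed above. The create_run helper, the 'planned' status value, the UUID-based run_id, and the shape of the candidates list are all assumptions, not part of the existing codebase.

```python
import json
import sqlite3
import uuid

# Subset of the proposed columns; types are assumptions for this sketch.
SCHEMA = """
CREATE TABLE datamachine_code_cleanup_runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL UNIQUE,
    mode TEXT NOT NULL,
    status TEXT NOT NULL,
    policy TEXT NOT NULL,
    created_at TEXT NOT NULL,
    summary TEXT
);
CREATE TABLE datamachine_code_cleanup_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL REFERENCES datamachine_code_cleanup_runs(run_id),
    handle TEXT NOT NULL,
    item_type TEXT NOT NULL,
    planned_action TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'planned',
    reason_code TEXT,
    planned_at TEXT NOT NULL,
    applied_at TEXT,
    evidence TEXT
);
"""


def create_run(db, mode, policy, candidates, now):
    """Hypothetical helper: persist a review as one run plus its planned items."""
    run_id = uuid.uuid4().hex
    with db:  # one transaction: the run and its items land together or not at all
        db.execute(
            "INSERT INTO datamachine_code_cleanup_runs "
            "(run_id, mode, status, policy, created_at) "
            "VALUES (?, ?, 'planned', ?, ?)",
            (run_id, mode, json.dumps(policy), now),
        )
        db.executemany(
            "INSERT INTO datamachine_code_cleanup_items "
            "(run_id, handle, item_type, planned_action, reason_code, planned_at) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            [
                (run_id, c["handle"], c["item_type"], c["planned_action"],
                 c.get("reason_code"), now)
                for c in candidates
            ],
        )
    return run_id
```

Because the plan lives in rows rather than a JSON file, later apply, status, and evidence operations only need the returned run_id to find everything that belongs to the run.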

datamachine_code_locks (or a DB-backed transient/option lock abstraction)

Tracks active/stale locks without unmanaged filesystem bloat.

Fields should include:

  • lock_key
  • owner_type
  • owner_id
  • purpose
  • created_at
  • heartbeat_at
  • expires_at
  • metadata (JSON)
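
One way the lock fields above could work together is sketched below, again in Python with SQLite as a stand-in. The idea is that expires_at makes locks self-pruning: an acquire attempt first deletes an expired row for the same key, then relies on the primary key to reject a second live owner. The acquire_lock helper, the integer timestamps, and the ttl parameter are assumptions for this example only.

```python
import sqlite3

# Timestamps are plain integers (seconds) in this sketch; that is an assumption.
LOCKS_DDL = """
CREATE TABLE datamachine_code_locks (
    lock_key TEXT PRIMARY KEY,
    owner_type TEXT NOT NULL,
    owner_id TEXT NOT NULL,
    purpose TEXT,
    created_at INTEGER NOT NULL,
    heartbeat_at INTEGER NOT NULL,
    expires_at INTEGER NOT NULL,
    metadata TEXT
)
"""


def acquire_lock(db, lock_key, owner_type, owner_id, now, ttl, purpose=None):
    """Hypothetical helper: return True if the caller now holds lock_key."""
    try:
        with db:
            # Drop a stale lock for this key so it cannot block forever.
            db.execute(
                "DELETE FROM datamachine_code_locks "
                "WHERE lock_key = ? AND expires_at <= ?",
                (lock_key, now),
            )
            # The primary key rejects this insert if a live lock still exists.
            db.execute(
                "INSERT INTO datamachine_code_locks "
                "(lock_key, owner_type, owner_id, purpose, "
                " created_at, heartbeat_at, expires_at) "
                "VALUES (?, ?, ?, ?, ?, ?, ?)",
                (lock_key, owner_type, owner_id, purpose, now, now, now + ttl),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A holder would extend expires_at on each heartbeat, and a periodic prune of rows with expires_at in the past keeps the table small without any filesystem lock files.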

Design constraints

  • No loose JSON plan files as the primary workflow.
  • CLI review creates a DB-backed run/plan and returns a run_id.
  • Apply/resume/cancel/evidence all operate by run_id.
  • Chunk jobs claim DB rows and update item state.
  • workspace hygiene reads from DB cache by default and can refresh from filesystem.
  • Filesystem scans become refresh/probe operations, not every command's source of truth.
  • Retention prunes old runs/items/locks.
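
The constraint that chunk jobs claim DB rows and update item state could look roughly like the sketch below. It is written in Python against SQLite for illustration; the claim_chunk helper, the reduced table definition, and the 'planned' and 'claimed' status values are assumptions rather than existing DMC code.

```python
import sqlite3

# Reduced item table for this sketch; only the columns the claim needs.
ITEMS_DDL = """
CREATE TABLE datamachine_code_cleanup_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    handle TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'planned',
    job_id INTEGER,
    chunk_index INTEGER
)
"""


def claim_chunk(db, run_id, job_id, chunk_index, limit):
    """Hypothetical helper: mark up to `limit` planned items as claimed by job_id."""
    with db:
        # A single UPDATE selects and claims the rows, so two jobs
        # cannot both pick up the same planned item.
        db.execute(
            """
            UPDATE datamachine_code_cleanup_items
               SET status = 'claimed', job_id = ?, chunk_index = ?
             WHERE id IN (
                   SELECT id FROM datamachine_code_cleanup_items
                    WHERE run_id = ? AND status = 'planned'
                    ORDER BY id
                    LIMIT ?)
            """,
            (job_id, chunk_index, run_id, limit),
        )
        rows = db.execute(
            "SELECT handle FROM datamachine_code_cleanup_items "
            "WHERE run_id = ? AND job_id = ? AND status = 'claimed' ORDER BY id",
            (run_id, job_id),
        ).fetchall()
    return [row[0] for row in rows]
```

After processing, each job would move its claimed rows to an applied or skipped status and attach evidence, so a resumed run only sees items that are still planned.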

Migration direction

This project does not need long-term backward compatibility. Prefer the current contract and explicit adoption:

  1. Create tables and repository classes.
  2. Populate DB on worktree add/finalize/remove/refresh/hygiene refresh.
  3. Add workspace inventory refresh to reconcile DB from filesystem.
  4. Move cleanup run planning into DB rows.
  5. Switch CLI help/docs away from file-based --apply-plan examples.
  6. Demote file-plan apply to an escape hatch only.
  7. Add retention for cleanup runs/items/locks.
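
The reconcile step in item 3 is essentially a set comparison between the handles stored in the DB and the handles found by a filesystem scan. A minimal Python sketch, assuming a hypothetical reconcile function and hypothetical action names ("insert", "touch", "mark_missing"):

```python
def reconcile(db_handles, scanned_handles):
    """Hypothetical helper: decide what a refresh should do to each handle."""
    db_set = set(db_handles)
    fs_set = set(scanned_handles)
    return {
        # On disk but unknown to the DB: create inventory rows.
        "insert": sorted(fs_set - db_set),
        # Known and still present: update last_seen_at and probe fields.
        "touch": sorted(fs_set & db_set),
        # In the DB but gone from disk: flag rather than silently delete.
        "mark_missing": sorted(db_set - fs_set),
    }
```

For example, reconcile(["a", "b"], ["b", "c"]) yields "c" to insert, "b" to touch, and "a" to mark missing. Whether missing rows are deleted or kept with a lifecycle_state change is a policy choice this sketch leaves open.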

Acceptance criteria

  • Daily cleanup never requires plan files.
  • Operator can run: workspace cleanup plan, workspace cleanup apply <run-id>, workspace cleanup status <run-id>, workspace cleanup evidence <run-id>.
  • Worktree counts come from DB-backed inventory unless explicitly refreshing.
  • Cleanup run evidence is queryable and retained/pruned by policy.
  • Lock state is visible and self-pruning.
