Goal
Make the WordPress/Data Machine database the source of truth for DMC workspace state, cleanup review state, locks, and evidence. The filesystem should be the managed target, not the coordination layer.
Current state
DMC has useful pieces but the architecture is split:
- Worktree lifecycle metadata is primarily projected through filesystem metadata and runtime scans.
- Data Machine jobs store execution state and task evidence.
- Cleanup can schedule chunks through Data Machine jobs.
- Cleanup review/apply state is still not a first-class DMC database concept.
- Filesystem JSON cleanup plans were used in operator workflows and should not be the primary path.
- Lock storage/retention is not clearly surfaced.
Target model
Add DMC-owned database storage for:
datamachine_code_worktrees
Tracks current known workspace inventory.
Fields should include:
id
handle (unique)
repo
branch
path
primary_path
is_primary
lifecycle_state
origin_site
origin_agent
origin_session
task_url
task_ref
pr_url
created_at
last_seen_at
last_probe_at
last_probe_status
dirty_count
unpushed_count
artifact_count
artifact_size_bytes
size_bytes
cleanup_signal
metadata (JSON)
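As a rough illustration of the worktrees table, the Python sketch below uses SQLite as a stand-in for the WordPress MySQL database and keeps only a subset of the listed fields. The upsert keyed on the unique handle is one assumed way for add, finalize, and refresh to share a single write path. upsert_worktree and the example handle, repo, and path values are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical sketch: SQLite stands in for the WordPress MySQL database, and only
# a subset of the proposed datamachine_code_worktrees columns is shown.
WORKTREES_DDL = """
CREATE TABLE IF NOT EXISTS datamachine_code_worktrees (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    handle TEXT NOT NULL UNIQUE,
    repo TEXT NOT NULL,
    branch TEXT,
    path TEXT NOT NULL,
    is_primary INTEGER NOT NULL DEFAULT 0,
    lifecycle_state TEXT NOT NULL DEFAULT 'active',
    created_at TEXT NOT NULL,
    last_seen_at TEXT,
    dirty_count INTEGER NOT NULL DEFAULT 0,
    unpushed_count INTEGER NOT NULL DEFAULT 0,
    metadata TEXT NOT NULL DEFAULT '{}'
)
"""


def upsert_worktree(conn, handle, repo, branch, path, metadata=None):
    """Insert a worktree row, or refresh it in place when the handle already exists."""
    ts = datetime.now(timezone.utc).isoformat()
    with conn:  # commit on success, roll back on error
        conn.execute(
            """
            INSERT INTO datamachine_code_worktrees
                (handle, repo, branch, path, created_at, last_seen_at, metadata)
            VALUES (?, ?, ?, ?, ?, ?, ?)
            ON CONFLICT(handle) DO UPDATE SET
                repo = excluded.repo,
                branch = excluded.branch,
                path = excluded.path,
                last_seen_at = excluded.last_seen_at,
                metadata = excluded.metadata
            """,
            (handle, repo, branch, path, ts, ts, json.dumps(metadata or {})),
        )


conn = sqlite3.connect(":memory:")
conn.execute(WORKTREES_DDL)
upsert_worktree(conn, "example@feature", "example-repo", "feature", "/tmp/wt/feature")
upsert_worktree(conn, "example@feature", "example-repo", "feature-v2", "/tmp/wt/feature")
rows = conn.execute("SELECT handle, branch FROM datamachine_code_worktrees").fetchall()
```

Because created_at is left out of the ON CONFLICT update list, it keeps the first-seen timestamp while last_seen_at moves forward on every write.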
datamachine_code_cleanup_runs
Tracks review/apply runs.
Fields should include:
id
run_id (unique)
mode
status
policy (JSON)
requested_by_user_id
requested_by_agent_id
parent_job_id
batch_job_id
created_at
started_at
completed_at
summary (JSON)
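A minimal sketch of how a cleanup run could be created and then advanced by run_id follows, again with SQLite standing in for WordPress. The status names (planned, running, completed, cancelled), the transition table, the example policy key, and the helpers create_cleanup_run and transition_run are assumptions for illustration, not part of the proposal.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical sketch: SQLite stands in for the WordPress database; the status
# names and transition table below are assumptions, not part of the proposal.
RUNS_DDL = """
CREATE TABLE datamachine_code_cleanup_runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL UNIQUE,
    mode TEXT NOT NULL,
    status TEXT NOT NULL,
    policy TEXT NOT NULL,
    requested_by_user_id INTEGER,
    created_at TEXT NOT NULL,
    started_at TEXT,
    completed_at TEXT,
    summary TEXT
)
"""

# Which statuses a run may move to from each current status.
ALLOWED_TRANSITIONS = {
    "planned": {"running", "cancelled"},
    "running": {"completed", "cancelled"},
}


def _now():
    return datetime.now(timezone.utc).isoformat()


def create_cleanup_run(conn, mode, policy, user_id=None):
    """Persist a new run and return its run_id for later apply/status calls."""
    run_id = uuid.uuid4().hex
    with conn:
        conn.execute(
            "INSERT INTO datamachine_code_cleanup_runs"
            " (run_id, mode, status, policy, requested_by_user_id, created_at)"
            " VALUES (?, ?, 'planned', ?, ?, ?)",
            (run_id, mode, json.dumps(policy, sort_keys=True), user_id, _now()),
        )
    return run_id


def transition_run(conn, run_id, to_status):
    """Move a run to to_status, refusing transitions the table does not allow."""
    row = conn.execute(
        "SELECT status FROM datamachine_code_cleanup_runs WHERE run_id = ?", (run_id,)
    ).fetchone()
    if row is None:
        raise KeyError(run_id)
    current = row[0]
    if to_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"run {run_id}: cannot go from {current} to {to_status}")
    # ts_column comes from a fixed pair of names, so the f-string is not injectable.
    ts_column = "started_at" if to_status == "running" else "completed_at"
    with conn:
        # Re-check the old status so a concurrent transition makes this one a no-op.
        cur = conn.execute(
            f"UPDATE datamachine_code_cleanup_runs SET status = ?, {ts_column} = ?"
            " WHERE run_id = ? AND status = ?",
            (to_status, _now(), run_id, current),
        )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(RUNS_DDL)
run_id = create_cleanup_run(conn, "review", {"max_age_days": 14})
```

Re-checking the previous status in the UPDATE's WHERE clause means two callers racing to cancel and complete the same run cannot both succeed; the loser gets False back.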
datamachine_code_cleanup_items
Tracks planned/applied/skipped rows.
Fields should include:
id
run_id
handle
worktree_id
item_type
planned_action
status
reason_code
reason
bytes_reclaimed
job_id
chunk_index
planned_at
applied_at
evidence (JSON)
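The sketch below shows one assumed way to write planned item rows with pre-assigned chunk_index values and to roll them up into the counts a status or evidence command might report. SQLite stands in for the WordPress database; plan_items, run_summary, the candidate dict shape, and the example item_type, planned_action, and reason_code values are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical sketch: SQLite stands in for the WordPress database; the
# candidate dict shape and the chunk_size default are assumptions.
ITEMS_DDL = """
CREATE TABLE datamachine_code_cleanup_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    handle TEXT NOT NULL,
    item_type TEXT NOT NULL,
    planned_action TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'planned',
    reason_code TEXT,
    reason TEXT,
    bytes_reclaimed INTEGER NOT NULL DEFAULT 0,
    job_id INTEGER,
    chunk_index INTEGER,
    planned_at TEXT NOT NULL,
    applied_at TEXT,
    evidence TEXT NOT NULL DEFAULT '{}'
)
"""


def plan_items(conn, run_id, candidates, chunk_size=50):
    """Write one 'planned' row per candidate and pre-assign chunk indexes."""
    ts = datetime.now(timezone.utc).isoformat()
    rows = [
        (run_id, c["handle"], c["item_type"], c["planned_action"],
         c.get("reason_code"), c.get("reason"), i // chunk_size, ts)
        for i, c in enumerate(candidates)
    ]
    with conn:
        conn.executemany(
            "INSERT INTO datamachine_code_cleanup_items"
            " (run_id, handle, item_type, planned_action, reason_code, reason,"
            " chunk_index, planned_at) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)


def run_summary(conn, run_id):
    """Aggregate item rows into the counts a status or evidence command could print."""
    counts = dict(conn.execute(
        "SELECT status, COUNT(*) FROM datamachine_code_cleanup_items"
        " WHERE run_id = ? GROUP BY status",
        (run_id,),
    ).fetchall())
    reclaimed = conn.execute(
        "SELECT COALESCE(SUM(bytes_reclaimed), 0) FROM datamachine_code_cleanup_items"
        " WHERE run_id = ?",
        (run_id,),
    ).fetchone()[0]
    return {"counts": counts, "bytes_reclaimed": reclaimed}


conn = sqlite3.connect(":memory:")
conn.execute(ITEMS_DDL)
candidates = [
    {"handle": f"example@wt{i}", "item_type": "worktree",
     "planned_action": "remove", "reason_code": "merged"}
    for i in range(3)
]
plan_items(conn, "run-1", candidates, chunk_size=2)
# Simulate one item having been applied by a chunk job.
with conn:
    conn.execute(
        "UPDATE datamachine_code_cleanup_items SET status = 'applied', bytes_reclaimed = 1024"
        " WHERE run_id = 'run-1' AND handle = 'example@wt0'"
    )
summary = run_summary(conn, "run-1")
```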
datamachine_code_locks (or a DB-backed transient/option lock abstraction)
Tracks active and stale locks without accumulating unmanaged lock files on the filesystem.
Fields should include:
lock_key
owner_type
owner_id
purpose
created_at
heartbeat_at
expires_at
metadata (JSON)
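If the dedicated datamachine_code_locks table is chosen, a lock could be acquired, heartbeated, and pruned entirely in SQL, as in this sketch. It assumes SQLite as a stand-in for WordPress, epoch-second timestamps, and hypothetical helper names (acquire_lock, heartbeat_lock, prune_expired_locks); the point is that expiry replaces manual cleanup of stale lock files.

```python
import sqlite3

# Hypothetical sketch of a DB-backed lock; SQLite stands in for the WordPress
# database, and timestamps are stored as epoch seconds (an assumption) so that
# expiry checks are plain numeric comparisons.
LOCKS_DDL = """
CREATE TABLE datamachine_code_locks (
    lock_key TEXT PRIMARY KEY,
    owner_type TEXT NOT NULL,
    owner_id TEXT NOT NULL,
    purpose TEXT,
    created_at REAL NOT NULL,
    heartbeat_at REAL NOT NULL,
    expires_at REAL NOT NULL,
    metadata TEXT NOT NULL DEFAULT '{}'
)
"""


def acquire_lock(conn, key, owner_type, owner_id, purpose, ttl, now):
    """Take the lock if it is free or its holder has expired; return True on success."""
    with conn:
        # Clear an expired holder first so a crashed owner cannot block forever.
        conn.execute(
            "DELETE FROM datamachine_code_locks WHERE lock_key = ? AND expires_at <= ?",
            (key, now),
        )
        # The primary key on lock_key makes the insert a no-op if the lock is held.
        cur = conn.execute(
            "INSERT OR IGNORE INTO datamachine_code_locks"
            " (lock_key, owner_type, owner_id, purpose, created_at, heartbeat_at, expires_at)"
            " VALUES (?, ?, ?, ?, ?, ?, ?)",
            (key, owner_type, owner_id, purpose, now, now, now + ttl),
        )
    return cur.rowcount == 1


def heartbeat_lock(conn, key, owner_id, ttl, now):
    """Extend a lock the caller still holds; return False if it was lost."""
    with conn:
        cur = conn.execute(
            "UPDATE datamachine_code_locks SET heartbeat_at = ?, expires_at = ?"
            " WHERE lock_key = ? AND owner_id = ? AND expires_at > ?",
            (now, now + ttl, key, owner_id, now),
        )
    return cur.rowcount == 1


def prune_expired_locks(conn, now):
    """Delete every expired lock row and return how many were removed."""
    with conn:
        cur = conn.execute("DELETE FROM datamachine_code_locks WHERE expires_at <= ?", (now,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(LOCKS_DDL)
```

Passing now explicitly keeps the sketch deterministic; real code would read the clock. Because an expired row is deleted before the insert, a crashed owner's lock becomes available as soon as expires_at passes.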
Design constraints
- No loose JSON plan files as the primary workflow.
- CLI review creates a DB-backed run/plan and returns a run_id.
- Apply, resume, cancel, and evidence all operate by run_id.
- Chunk jobs claim DB rows and update item state.
- workspace hygiene reads from the DB cache by default and can refresh from the filesystem.
- Filesystem scans become refresh/probe operations, not every command's source of truth.
- Retention prunes old runs/items/locks.
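For the constraint that chunk jobs claim DB rows and update item state, a conditional UPDATE is one common pattern, sketched here against SQLite. The claimed/applied/failed status names and the claim_items and finish_item helpers are assumptions, and only the item columns needed for claiming are shown.

```python
import sqlite3

# Hypothetical sketch: SQLite stands in for the WordPress database and only the
# cleanup_items columns needed for claiming are shown. Status names are assumptions.
ITEMS_DDL = """
CREATE TABLE datamachine_code_cleanup_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    handle TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'planned',
    job_id INTEGER,
    bytes_reclaimed INTEGER NOT NULL DEFAULT 0,
    applied_at REAL
)
"""


def claim_items(conn, run_id, job_id, limit):
    """Mark up to `limit` planned items as claimed by job_id and return their ids."""
    with conn:
        # The outer status check keeps an item from being claimed twice.
        conn.execute(
            "UPDATE datamachine_code_cleanup_items SET status = 'claimed', job_id = ?"
            " WHERE status = 'planned' AND id IN ("
            " SELECT id FROM datamachine_code_cleanup_items"
            " WHERE run_id = ? AND status = 'planned' ORDER BY id LIMIT ?)",
            (job_id, run_id, limit),
        )
    rows = conn.execute(
        "SELECT id FROM datamachine_code_cleanup_items"
        " WHERE run_id = ? AND job_id = ? AND status = 'claimed' ORDER BY id",
        (run_id, job_id),
    ).fetchall()
    return [r[0] for r in rows]


def finish_item(conn, item_id, job_id, ok, bytes_reclaimed, now):
    """Record the outcome of one claimed item; only the claiming job may do so."""
    with conn:
        cur = conn.execute(
            "UPDATE datamachine_code_cleanup_items"
            " SET status = ?, bytes_reclaimed = ?, applied_at = ?"
            " WHERE id = ? AND job_id = ? AND status = 'claimed'",
            ("applied" if ok else "failed", bytes_reclaimed if ok else 0, now, item_id, job_id),
        )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(ITEMS_DDL)
with conn:
    conn.executemany(
        "INSERT INTO datamachine_code_cleanup_items (run_id, handle) VALUES (?, ?)",
        [("run-1", f"example@wt{i}") for i in range(5)],
    )
```

On MySQL, which rejects LIMIT inside an IN subquery, a single-table UPDATE ... ORDER BY id LIMIT n is the usual way to express the same claim.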
Migration direction
This project does not need long-term backward compatibility. Prefer the current contract and explicit adoption:
- Create tables and repository classes.
- Populate the DB on worktree add, finalize, remove, refresh, and hygiene refresh.
- Add workspace inventory refresh to reconcile the DB from the filesystem.
- Move cleanup run planning into DB rows.
- Switch CLI help/docs away from file-based --apply-plan examples.
- Deprecate file-plan apply, keeping it only as an escape hatch.
- Add retention for cleanup runs/items/locks.
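A workspace inventory refresh could reconcile the table from one filesystem scan roughly as follows. This is a sketch under stated assumptions: SQLite stands in for WordPress, the scan result is a hypothetical handle-to-path mapping, refresh_inventory is a hypothetical helper, and rows not found on disk are marked with an assumed 'missing' lifecycle_state instead of being deleted.

```python
import sqlite3

# Hypothetical sketch: SQLite stands in for the WordPress database; the scan result
# shape (handle -> path) and the 'missing' lifecycle state are assumptions.
WORKTREES_DDL = """
CREATE TABLE datamachine_code_worktrees (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    handle TEXT NOT NULL UNIQUE,
    path TEXT NOT NULL,
    lifecycle_state TEXT NOT NULL DEFAULT 'active',
    last_seen_at REAL
)
"""


def refresh_inventory(conn, scanned, now):
    """Reconcile DB rows with one filesystem scan; return (seen, newly_missing)."""
    with conn:
        for handle, path in scanned.items():
            conn.execute(
                "INSERT INTO datamachine_code_worktrees"
                " (handle, path, lifecycle_state, last_seen_at)"
                " VALUES (?, ?, 'active', ?)"
                " ON CONFLICT(handle) DO UPDATE SET path = excluded.path,"
                " lifecycle_state = 'active', last_seen_at = excluded.last_seen_at",
                (handle, path, now),
            )
        # Rows this scan did not touch are flagged, not deleted, so history survives.
        cur = conn.execute(
            "UPDATE datamachine_code_worktrees SET lifecycle_state = 'missing'"
            " WHERE lifecycle_state <> 'missing'"
            " AND (last_seen_at IS NULL OR last_seen_at < ?)",
            (now,),
        )
    return len(scanned), cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(WORKTREES_DDL)
with conn:
    conn.executemany(
        "INSERT INTO datamachine_code_worktrees (handle, path, last_seen_at) VALUES (?, ?, ?)",
        [("example@a", "/tmp/wt/a", 10.0), ("example@b", "/tmp/wt/b", 10.0)],
    )
seen, newly_missing = refresh_inventory(
    conn, {"example@a": "/tmp/wt/a", "example@c": "/tmp/wt/c"}, now=100.0
)
```

Stamping every seen row with the same now value lets one comparison on last_seen_at find the rows this scan did not touch; it relies on now increasing between refreshes.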
Acceptance criteria
- Daily cleanup never requires plan files.
- An operator can run: workspace cleanup plan, workspace cleanup apply <run-id>, workspace cleanup status <run-id>, and workspace cleanup evidence <run-id>.
- Worktree counts come from DB-backed inventory unless the operator explicitly refreshes.
- Cleanup run evidence is queryable and retained/pruned by policy.
- Lock state is visible and self-pruning.
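Retention for cleanup history could be a periodic prune like the sketch below, which deletes finished runs older than a cutoff together with their items. SQLite stands in for the WordPress database; treating completed and cancelled as the finished statuses, the age-based policy, epoch-second timestamps, and prune_cleanup_history are all assumptions.

```python
import sqlite3

# Hypothetical sketch: SQLite stands in for the WordPress database. Which statuses
# count as finished, and the max-age policy, are assumptions.
DDL = """
CREATE TABLE datamachine_code_cleanup_runs (
    run_id TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    completed_at REAL
);
CREATE TABLE datamachine_code_cleanup_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL
);
"""

FINISHED = ("completed", "cancelled")


def prune_cleanup_history(conn, max_age_seconds, now):
    """Delete finished runs older than the cutoff, plus their items; return counts."""
    cutoff = now - max_age_seconds
    stale_runs = (
        "SELECT run_id FROM datamachine_code_cleanup_runs"
        " WHERE status IN (?, ?) AND completed_at < ?"
    )
    params = (*FINISHED, cutoff)
    with conn:
        # Items go first so no orphan rows are left pointing at a deleted run.
        items = conn.execute(
            f"DELETE FROM datamachine_code_cleanup_items WHERE run_id IN ({stale_runs})",
            params,
        ).rowcount
        runs = conn.execute(
            f"DELETE FROM datamachine_code_cleanup_runs WHERE run_id IN ({stale_runs})",
            params,
        ).rowcount
    return {"runs": runs, "items": items}


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
with conn:
    conn.executemany(
        "INSERT INTO datamachine_code_cleanup_runs VALUES (?, ?, ?)",
        [("r-old", "completed", 0.0), ("r-new", "completed", 900.0), ("r-live", "running", None)],
    )
    conn.executemany(
        "INSERT INTO datamachine_code_cleanup_items (run_id) VALUES (?)",
        [("r-old",), ("r-old",), ("r-new",), ("r-live",)],
    )
result = prune_cleanup_history(conn, max_age_seconds=500, now=1000.0)
```

Runs that are still in progress have no completed_at and a non-finished status, so they are never pruned regardless of age.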