Skip to content
Closed
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ Explore the engineering depth behind this plugin:
- **[Configuration System](development/CONFIG_FLOW.md)** - How config loading and merging works
- **[Config Fields Guide](development/CONFIG_FIELDS.md)** - Understanding config keys, `id`, and `name`
- **[Testing Guide](development/TESTING.md)** - Test scenarios, verification procedures, integration testing
- **[OMX Team + Ralph Playbook](development/OMX_TEAM_RALPH_PLAYBOOK.md)** - WSL2-first atomic workflow, fallback routing, and completion evidence gates
- **[TUI Parity Checklist](development/TUI_PARITY_CHECKLIST.md)** - Auth dashboard/UI parity requirements for future changes

## Key Architectural Decisions
Expand Down
287 changes: 287 additions & 0 deletions docs/development/OMX_TEAM_RALPH_PLAYBOOK.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,287 @@
# OMX Team + Ralph Reliability Playbook (WSL2-First)

This runbook defines the repository-standard execution flow for high-rigor work using `omx team` plus `omx ralph`.

## Scope

- Repository-specific workflow for `oc-chatgpt-multi-auth`.
- Primary mode: team execution on WSL2 + tmux.
- Controlled fallback: single-agent Ralph execution.
- Completion requires parity quality gates in both modes.

## Defaults and Guardrails

- Default team topology: `6:executor`.
- Retry policy: fail-fast with at most `2` controlled retries per run.
- No normal shutdown when tasks are non-terminal.
- Mandatory completion gates:
- terminal state (`pending=0`, `in_progress=0`, `failed=0`) for team mode
- `npm run typecheck`
- `npm test`
- `npm run build`
- `npx tsc --noEmit --pretty false` diagnostics
- architect verification (`--architect-tier` and `--architect-ref`)
- Ralph completion requires explicit state cleanup (`omx cancel`).

## Atomic Phases

### Phase 0 - Intake Contract

Lock execution contract for this run:

- target task statement
- default worker topology (`6:executor`)
- gate policy and architect verification format

### Phase 1 - Baseline Integrity Gate

From repo root:

```bash
git fetch origin --prune
git rev-parse origin/main
```

If working on an isolated branch/worktree, confirm:

```bash
git status --short
git branch --show-current
```

### Phase 2 - Mainline Deep Audit

Audit surfaces before mutation:

- workflow docs (`docs/development`)
- scripts contract (`scripts`)
- package scripts (`package.json`)
- `.omx/tmux-hook.json` integrity

### Phase 3 - Isolation Provisioning

Create isolated worktree from synced `origin/main`:

```bash
git worktree add <path> -b <branch-name> origin/main
```

Never implement directly on `main`.

### Phase 4 - Deterministic Routing

Run preflight:

```bash
npm run omx:preflight
```

JSON mode:

```bash
npm run omx:preflight -- --json
```

Optional distro selection:

```bash
npm run omx:preflight -- --distro Ubuntu
```

#### Preflight Exit Codes

| Exit Code | Mode | Meaning | Required Action |
| --- | --- | --- | --- |
| `0` | `team_ready` | Team prerequisites are satisfied | Continue with team mode |
| `2` | `team_blocked` | Fixable blockers (for example hook config) | Fix blockers, rerun preflight |
| `3` | `fallback_ralph` | Team-only prerequisites failed | Execute controlled Ralph fallback |
| `4` | `blocked` | Fatal blocker for both team and fallback (for example `omx` missing in both host and WSL runtimes) | Stop and fix fatal prerequisite |
| `1` | script error | Invocation/runtime failure | Fix command/environment |

### Phase 5 - Ralph Execution Loop

#### Team Path (preferred)

Inside WSL tmux session:

```bash
omx team ralph 6:executor "execute task: <clear task statement>"
```

Capture startup evidence:

```bash
omx team status <team-name>
tmux list-panes -F '#{pane_id}\t#{pane_current_command}\t#{pane_start_command}'
test -f ".omx/state/team/<team-name>/mailbox/leader-fixed.json" && echo "leader mailbox present"
```

Monitor until terminal:

```bash
omx team status <team-name>
```

Terminal gate for normal completion:

- `pending=0`
- `in_progress=0`
- `failed=0`

#### Controlled Fallback Path

Use fallback only when preflight mode is `fallback_ralph`:

```bash
omx ralph "execute task: <clear task statement>"
```

### Phase 6 - Hardening and Evidence

Capture evidence before shutdown/handoff:

```bash
npm run omx:evidence -- --mode team --team <team-name> --architect-tier standard --architect-ref "<architect verdict reference>" --architect-note "<optional note>"
```

Ralph cleanup before fallback evidence:

```bash
omx cancel
```

Fallback evidence:

```bash
npm run omx:evidence -- --mode ralph --architect-tier standard --architect-ref "<architect verdict reference>" --architect-note "<optional note>"
```

Ralph state cleanup (required for completion):

```bash
omx cancel
```

### Phase 7 - Shutdown and Handoff

For team mode, only after evidence passes:

```bash
omx team shutdown <team-name>
test ! -d ".omx/state/team/<team-name>" && echo "team state cleaned"
```

Handoff package must include:

- branch name and commit SHA
- gate evidence file path
- architect verification reference
- unresolved blockers (if any)

## Fail-Fast Controlled Retry Contract

Retry budget is `2` retries maximum for a single run.

Retry triggers:

- team task failures
- no-ACK startup condition
- non-reporting worker condition after triage

Retry steps:

1. Capture current status and error output.
2. Attempt resume:
- `omx team resume <team-name>`
- `omx team status <team-name>`
3. If unresolved, controlled restart:
- `omx team shutdown <team-name>`
- stale pane/state cleanup
- relaunch with same task
4. After second retry failure, stop and escalate as blocked.

## Reliability Remediation

### tmux-hook placeholder target

If `.omx/tmux-hook.json` contains:

```json
"value": "replace-with-tmux-pane-id"
```

Set a real pane id:

```bash
tmux display-message -p '#{pane_id}'
```

Then validate:

```bash
omx tmux-hook validate
omx tmux-hook status
```

### Stale pane and team state cleanup

Inspect panes:

```bash
tmux list-panes -F '#{pane_id}\t#{pane_current_command}\t#{pane_start_command}'
```

Kill stale worker panes only:

```bash
tmux kill-pane -t %<pane-id>
```

Remove stale team state:

```bash
rm -rf ".omx/state/team/<team-name>"
```

## Failure Matrix

| Symptom | Detection | Action |
| --- | --- | --- |
| `tmux_hook invalid_config` | `.omx/logs/tmux-hook-*.jsonl` | fix `.omx/tmux-hook.json`, revalidate |
| `omx team` fails on tmux/WSL prerequisites | command output | use preflight routing, fallback if mode `fallback_ralph` |
| startup without ACK | missing mailbox updates | resume/triage then controlled retry |
| non-terminal task counts at completion | `omx team status` | block shutdown until terminal gate |
| architect verification missing | no `--architect-*` evidence | block completion |
| fatal preflight blocker (`blocked`) | preflight exit code `4` | stop and fix prerequisite |

## Done Checklist

- [ ] Preflight routing executed and mode recorded.
- [ ] Team startup evidence captured (if team mode).
- [ ] Terminal task-state gate satisfied before shutdown.
- [ ] Fresh quality gates passed (`typecheck`, `test`, `build`, diagnostics).
- [ ] Architect verification recorded with tier + reference.
- [ ] Evidence file created under `.omx/evidence/`.
- [ ] Ralph cleanup state is inactive in evidence output (`omx cancel` done before final ralph evidence).
- [ ] Team shutdown + cleanup verified (team mode only).

## Command Reference

```bash
# Preflight
npm run omx:preflight

# Team execution
omx team ralph 6:executor "execute task: <task>"
omx team status <team-name>
omx team resume <team-name>
omx team shutdown <team-name>

# Ralph fallback
omx ralph "execute task: <task>"
omx cancel

# Evidence
npm run omx:evidence -- --mode team --team <team-name> --architect-tier standard --architect-ref "<verdict-ref>"
npm run omx:evidence -- --mode ralph --architect-tier standard --architect-ref "<verdict-ref>"
```
2 changes: 2 additions & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,8 @@
"test:ui": "vitest --ui",
"test:coverage": "vitest run --coverage",
"coverage": "vitest run --coverage",
"omx:preflight": "node scripts/omx-preflight-wsl2.js",
"omx:evidence": "node scripts/omx-capture-evidence.js",
"audit:prod": "npm audit --omit=dev --audit-level=high",
"audit:all": "npm audit --audit-level=high",
"audit:dev:allowlist": "node scripts/audit-dev-allowlist.js",
Expand Down
Loading