This guide validates the cluster-facing CLI behavior by running real orch
commands (not go test).
This checklist verifies the command-plane path:
orch client CLI -> orch-master daemon -> orch-worker (host manager)
It covers:
- `master` lifecycle commands
- `worker` lifecycle commands (single long-lived host worker)
- local client run/ps/show/stop flow
- remote master reachability via `--remote`
- backend coverage handoff for `tmux`/`zellij`/`opencode`/`claude`/`codex`
- one worker managing multiple runs on the same host
This file validates the cluster-facing command plane first.
Backend-specific run/attach/send/capture behavior is covered by the companion checklist:
- `docs/e2e-backend-matrix.md`
Treat both files together as the complete manual E2E suite:
- `docs/e2e-master-worker-client.md`
- `docs/e2e-backend-matrix.md`
Automation lane planning lives in:
Automation-first entrypoint for the local single-host flow:
scripts/e2e-master-worker-client-local.sh
Automation-first entrypoint for remote reachability smoke:
scripts/e2e-master-worker-client-remote-smoke.sh
Parameterized automation entrypoint for target-host runs:
scripts/e2e-master-worker-client-target.sh
Automation-first entrypoint for same-machine target-host simulation:
scripts/e2e-master-worker-client-target-local.sh
Parameterized automation entrypoint for the real Zeus full flow:
scripts/e2e-master-worker-client-zeus.sh
For targets that need custom SSH flags or a nonstandard port, prefer passing a
full command via TARGET_SSH_CMD instead of relying on a simple host alias.
- `git` installed
- `tmux` installed (for non-dry run session checks)
- run from repo root (where `./cmd/orch` exists)
- Unless otherwise noted, these examples assume the project uses the local `file` backend via `issues.path`. If you use the GitHub backend instead, replace manual issue-file creation with equivalent GitHub issue setup.
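The prerequisites above can be checked up front before any state is created. The following is a minimal sketch, not part of the orch repo: `require_cmds` is a hypothetical helper name, and the list of tools passed to it (`git`, `tmux`, `go`) is taken from the commands this checklist runs.

```shell
# Hypothetical preflight helper (not shipped with orch).
# Reports every missing command on stderr and returns non-zero if any is absent.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing prerequisite: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage from the repo root:
#   require_cmds git tmux go || echo "install the missing tools first"
#   [ -d ./cmd/orch ] || echo "run this checklist from the repo root"
```

Checking all tools in one pass, rather than stopping at the first miss, gives a complete list to fix before rerunning.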
export ROOT="$(mktemp -d /tmp/orch-e2e-XXXXXX)"
mkdir -p "$ROOT"/{home,runtime,state,data,bin,repo/.orch,issues-store/issues,issues-store/runs,outside,origin/example}
export HOME="$ROOT/home"
export XDG_RUNTIME_DIR="$ROOT/runtime"
export XDG_STATE_HOME="$ROOT/state"
export XDG_DATA_HOME="$ROOT/data"
unset ORCH_PROJECT ORCH_REMOTE
go build -o "$ROOT/bin/orch" ./cmd/orch
ORCH_BIN="$ROOT/bin/orch"
PROJECT="$(python - <<'PY'
import os, pathlib
print(pathlib.Path(os.path.realpath(os.path.join(os.environ['ROOT'], 'repo'))))
PY
)"
ISSUES="$(python - <<'PY'
import os, pathlib
print(pathlib.Path(os.path.realpath(os.path.join(os.environ['ROOT'], 'issues-store'))))
PY
)"
cat > "$PROJECT/.orch/config.yaml" <<EOF
issues:
  path: $ISSUES
EOF
cat > "$PROJECT/README.md" <<'EOF'
# Manual E2E Repo
EOF
cat > "$ISSUES/issues/mwc-local-live.md" <<'EOF'
---
type: issue
id: mwc-local-live
title: Local live run
status: open
---
# Local live run
EOF
git -C "$PROJECT" init
git -C "$PROJECT" config user.email e2e@example.com
git -C "$PROJECT" config user.name E2E
git init --bare "$ROOT/origin/example/manual-e2e-repo.git"
REPO_URL="file://$ROOT/origin/example/manual-e2e-repo.git"
PROJECT_ID="example-manual-e2e-repo"
git -C "$PROJECT" remote add origin "$REPO_URL"
git -C "$PROJECT" add .
git -C "$PROJECT" commit -m "init"
git -C "$PROJECT" push -u origin HEAD
# run runtime commands from the project root
cd "$PROJECT"
"$ORCH_BIN" master status
"$ORCH_BIN" worker status
"$ORCH_BIN" master start
"$ORCH_BIN" master status
"$ORCH_BIN" worker start
sleep 2
"$ORCH_BIN" worker status
# starting again should not create a duplicate host worker
"$ORCH_BIN" worker start
"$ORCH_BIN" worker status
Expected:
- initial `master status` reports `Status: not running`
- initial `worker status` reports the local worker profile and shows `Local Process: missing`
- after `master start`, status reports `Status: running`
- `worker start` brings up one host worker for the local host
- after `worker start`, `worker status` shows both the local process state and the master registration state for that same host/profile
- repeating `worker start` should not create an extra duplicate worker for the same host/profile
"$ORCH_BIN" daemon repo register "$REPO_URL"
"$ORCH_BIN" daemon repo list
Expected:
- `daemon repo register` prints `Registered repo mapping: <repo_id> -> <repo_url>`
- `daemon repo list` includes that `repo_id`
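The lifecycle commands above rely on a fixed `sleep 2` before checking status. On a slow host it may be more reliable to poll until the expected text appears. This is a sketch only: `wait_for_output` is a hypothetical helper, and the pattern you pass should be copied from the status output you actually see (the `Status: running` string is the one listed in the expectations above).

```shell
# Hypothetical polling helper (not shipped with orch).
# Usage: wait_for_output <pattern> <max_tries> <command...>
# Runs the command up to max_tries times, one second apart, and succeeds
# as soon as its combined stdout/stderr contains the pattern.
wait_for_output() {
  local pattern="$1" tries="$2" i=0
  shift 2
  while [ "$i" -lt "$tries" ]; do
    if "$@" 2>&1 | grep -q -- "$pattern"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example usage:
#   wait_for_output 'Status: running' 10 "$ORCH_BIN" master status
```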
RUN_ID="$(date +%Y%m%d-%H%M%S)-local"
"$ORCH_BIN" --project "$PROJECT_ID" run mwc-local-live \
--run-id "$RUN_ID" \
--agent custom \
--agent-cmd "echo cli-e2e; sleep 1" \
--json
"$ORCH_BIN" --project "$PROJECT_ID" ps --issue mwc-local-live --json
"$ORCH_BIN" --project "$PROJECT_ID" show "mwc-local-live#$RUN_ID" --json
"$ORCH_BIN" --project "$PROJECT_ID" stop "mwc-local-live#$RUN_ID" --force
Expected:
- run command returns `"ok": true`
- `ps` returns at least one item for `mwc-local-live`
- `show` returns `"ok": true` and run metadata/events
- `ps` JSON includes `target` and `target_host` fields (populated when the run uses `--on <target>`)
cat > "$ISSUES/issues/mwc-local-live-2.md" <<'EOF'
---
type: issue
id: mwc-local-live-2
title: Local live run 2
status: open
---
# Local live run 2
EOF
RUN_ID_1="$(date +%Y%m%d-%H%M%S)-a"
RUN_ID_2="$(date +%Y%m%d-%H%M%S)-b"
"$ORCH_BIN" --project "$PROJECT_ID" run mwc-local-live \
--run-id "$RUN_ID_1" \
--agent custom \
--agent-cmd "echo cli-e2e-a; sleep 20" \
--json
"$ORCH_BIN" --project "$PROJECT_ID" run mwc-local-live-2 \
--run-id "$RUN_ID_2" \
--agent custom \
--agent-cmd "echo cli-e2e-b; sleep 20" \
--json
"$ORCH_BIN" worker status
"$ORCH_BIN" --project "$PROJECT_ID" ps --json
Expected:
- both runs become active at the same time
- `worker status` still reports one host worker profile, not one worker per run
- run multiplicity comes from one worker managing multiple sessions on the host
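The `run` and `show` commands above are expected to return `"ok": true` when called with `--json`. A small textual check can turn that expectation into an exit code. This is a sketch under assumptions: `json_ok` is a hypothetical helper, and it matches the `"ok": true` pair as text rather than parsing JSON, so a real JSON tool is the safer choice if the output shape is more complex than the checklist suggests.

```shell
# Hypothetical helper (not shipped with orch).
# Reads JSON text on stdin and succeeds if it contains an "ok": true pair.
# This is a plain text match, not a JSON parser.
json_ok() {
  grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Example usage:
#   "$ORCH_BIN" --project "$PROJECT_ID" show "mwc-local-live#$RUN_ID" --json | json_ok \
#     || echo "show did not report ok"
```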
Pick a free port first (example 60318 below).
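One way to pick the port is to probe a small range on the loopback address and take the first port with no listener. The sketch below is an illustration only: `port_in_use` and `pick_free_port` are hypothetical helpers, the probe relies on bash's `/dev/tcp` pseudo-device (so it needs bash, not plain `sh`), and another process could still grab the port between the probe and `master start`.

```shell
# Hypothetical helpers (not shipped with orch). Requires bash for /dev/tcp.

# Succeeds when something accepts TCP connections on 127.0.0.1:<port>.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Prints the first port in [start, end] that has no listener; fails if none.
pick_free_port() {
  local port="$1" end="$2"
  while [ "$port" -le "$end" ]; do
    if ! port_in_use "$port"; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
  done
  return 1
}

# Example usage:
#   PORT="$(pick_free_port 60318 60400)"
#   "$ORCH_BIN" master start --listen "tcp://127.0.0.1:$PORT"
```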
Automation entrypoint:
scripts/e2e-master-worker-client-remote-smoke.sh
"$ORCH_BIN" master kill || true
export ORCH_REMOTE=skip
"$ORCH_BIN" master start --listen tcp://127.0.0.1:60318
unset ORCH_REMOTE
"$ORCH_BIN" --remote 127.0.0.1:60318 master status
"$ORCH_BIN" --remote 127.0.0.1:60318 master kill
Expected:
- remote status prints `Status: running (remote=127.0.0.1:60318)`
"$ORCH_BIN" worker stop --all || true
"$ORCH_BIN" master kill || true
chmod -R u+w "$ROOT" || true
rm -rf "$ROOT"
Use this when you want a true end-to-end check against a real remote host.
Automation entrypoint:
scripts/e2e-master-worker-client-zeus.sh
Target used in examples:
- host: `zeus`
- repo: `/home/kento/repos/doeff`
- issues path: use the actual `issues.path` from `/home/kento/repos/doeff/.orch/config.yaml` (for example `/home/kento/repos/doeff-VAULT`)
TS="$(date +%Y%m%d-%H%M%S)"
ISSUE_ID="zeus-e2e-$TS"
RUN_ID="$TS-sample"
E2E_ROOT="/tmp/orch-zeus-e2e-$TS"
ssh zeus "mkdir -p $E2E_ROOT/runtime $E2E_ROOT/state $E2E_ROOT/data $E2E_ROOT/config"
ENV_PREFIX="XDG_RUNTIME_DIR=$E2E_ROOT/runtime XDG_STATE_HOME=$E2E_ROOT/state XDG_DATA_HOME=$E2E_ROOT/data XDG_CONFIG_HOME=$E2E_ROOT/config"
# launch master and worker on Zeus
ssh zeus "$ENV_PREFIX orch master start"
ssh zeus "$ENV_PREFIX orch worker start"
ssh zeus "$ENV_PREFIX orch master status"
ssh zeus "$ENV_PREFIX orch worker status"
# create sample issue
ssh zeus "cat > /home/kento/repos/doeff-VAULT/issues/$ISSUE_ID.md <<'EOF'
---
type: issue
id: $ISSUE_ID
title: Zeus E2E sample
status: open
---
# Zeus E2E sample
EOF"
# register repo mapping for strict project_id routing
#
# Preferred: register by repo URL.
# If the managed clone created from the repo URL does not contain the required
# project config (for example `.orch/config.yaml` is not committed), register
# the operational project root instead.
ssh zeus "$ENV_PREFIX orch daemon repo register /home/kento/repos/doeff"
# runtime commands use repo identity scope
PROJECT_ID="proboscis-doeff"
# run with custom agent that makes a commit and creates a PR
ssh zeus "cat > /tmp/orch-zeus-agent-$ISSUE_ID.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
mkdir -p e2e
cat > e2e/$ISSUE_ID.md <<'EOT'
# Zeus E2E sample change
EOT
git add e2e/$ISSUE_ID.md
git commit -m 'chore(e2e): sample zeus run $ISSUE_ID'
git push -u origin HEAD
# escaped so the substitution runs inside the agent script on Zeus, not in the local shell
branch=\$(git rev-parse --abbrev-ref HEAD)
gh pr create --repo proboscis/doeff --title 'chore(e2e): sample zeus run $ISSUE_ID' --body 'Automated sample PR from Zeus manual E2E.' --base main --head \"\$branch\"
EOF
chmod +x /tmp/orch-zeus-agent-$ISSUE_ID.sh"
ssh zeus "$ENV_PREFIX bash -lc 'cd /home/kento/repos/doeff && orch --project $PROJECT_ID run $ISSUE_ID --run-id $RUN_ID --agent custom --agent-cmd '\\''bash /tmp/orch-zeus-agent-$ISSUE_ID.sh'\\'' --json'"
# find and close the sample PR
BRANCH="issue/$ISSUE_ID/run-$RUN_ID"
ssh zeus "gh pr list --repo proboscis/doeff --head $BRANCH --state open --json number,url"
ssh zeus "gh pr close <PR_NUMBER> --repo proboscis/doeff --comment 'Closing sample Zeus E2E PR.' --delete-branch"
# stop the run at the end
ssh zeus "$ENV_PREFIX orch --project $PROJECT_ID stop $ISSUE_ID#$RUN_ID --force"
# cleanup
ssh zeus "rm -f /home/kento/repos/doeff-VAULT/issues/$ISSUE_ID.md /tmp/orch-zeus-agent-$ISSUE_ID.sh"
ssh zeus "$ENV_PREFIX orch worker stop --all"
Expected outcomes:
- master and worker report `Status: running`
- `orch run` returns `"ok": true`
- a PR is created for the run branch
- PR is closed successfully
- `orch stop <issue#run>` succeeds
- If you run Zeus in an isolated XDG sandbox, include `XDG_CONFIG_HOME` in the environment prefix; otherwise daemon project mappings from `~/.config/orch/projects` may leak into the test
- For file-backend projects whose config is not committed into the repo clone, repo-URL registration may be insufficient; use the operational project root instead
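Because a forgotten `XDG_CONFIG_HOME` lets existing project mappings leak into the test, it can help to build the environment prefix in one place. The sketch below assumes a sandbox root without spaces in its path; `xdg_env_prefix` is a hypothetical helper, and the subdirectory names simply mirror the ones used in the Zeus commands above.

```shell
# Hypothetical helper (not shipped with orch).
# Prints an env-var prefix that points all four XDG base directories,
# including XDG_CONFIG_HOME, at subdirectories of one sandbox root.
# Assumes the root path contains no whitespace.
xdg_env_prefix() {
  local root="$1"
  printf 'XDG_RUNTIME_DIR=%s/runtime XDG_STATE_HOME=%s/state XDG_DATA_HOME=%s/data XDG_CONFIG_HOME=%s/config\n' \
    "$root" "$root" "$root" "$root"
}

# Example usage:
#   ENV_PREFIX="$(xdg_env_prefix "$E2E_ROOT")"
#   ssh zeus "$ENV_PREFIX orch master status"
```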
The core flow above does not fully exercise backend-specific behavior. After sections 1-8 pass, run the companion backend matrix checklist and record results for all of the following lanes:
- `tmux`
- `zellij`
- `opencode`
- `claude`
- `codex`
Use:
# companion checklist
sed -n '1,260p' docs/e2e-backend-matrix.md
Minimum acceptance criteria:
- `tmux`: `run`, `capture`, `send`, `stop`, `restart-from`
- `zellij`: `run`, `capture`, `send`, `stop`, `restart-from`
- `opencode`: `run`, `capture`, `send`, `stop`, `restart-from`
- `claude`: `run`, `capture`, `send`, `stop`
- `codex`: `run`, `capture`, `send`, `stop`
If you only run docs/e2e-master-worker-client.md, backend coverage is incomplete.
Use this when you want to verify the case where the control plane stays on Zeus, but the run itself executes on a Mac target instead of on Zeus.
Additional prerequisites:
- Zeus-side project config includes a `targets` entry for the target Mac
- the SSH host alias resolves from Zeus before running orch
- the target Mac has the same project cloned locally
- the target Mac has the required runtime dependencies installed (`git`, chosen multiplexer, agent binary)
- the target Mac runs one long-lived `orch-worker` for that host/profile
- the target Mac has a local repo registration for the same `project_id`
Example target config on Zeus:
targets:
  - name: mac
    host: mac
Semantics:
- `name` is the value passed to `--on`
- `host` is the host/profile identity used to locate the target worker
- the default worker identity for that host is `host-<host>`
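The naming rule above is simple enough to capture in a one-line function, which is handy when a worker has to be started with an explicit identity. This is a sketch: `default_worker_id` is a hypothetical helper that only applies the `host-<host>` pattern stated in this section, and it does not query orch.

```shell
# Hypothetical helper (not shipped with orch).
# Applies the documented default pattern: worker identity = "host-" + targets[].host.
default_worker_id() {
  printf 'host-%s\n' "$1"
}

# Example usage, when the auto-detected hostname differs from targets[].host:
#   orch worker start --worker-id "$(default_worker_id mac)"
```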
Checklist:
TS="$(date +%Y%m%d-%H%M%S)"
ISSUE_ID="mac-target-e2e-$TS"
RUN_ID="$TS-mac"
PROJECT_ID="proboscis-doeff"
BRANCH="issue/$ISSUE_ID/run-$RUN_ID"
# create sample issue in the Zeus-backed issue store
ssh zeus "cat > /home/kento/repos/doeff-VAULT/issues/$ISSUE_ID.md <<'EOF'
---
type: issue
id: $ISSUE_ID
title: Mac target E2E sample
status: open
---
# Mac target E2E sample
EOF"
# ensure Zeus resolves project identity to the operational root it should use
ssh zeus 'orch daemon repo register /home/kento/repos/doeff'
# ensure the target Mac worker is connected to Zeus
# normal case: start the default host worker on the target host
ssh mac 'orch --remote= daemon repo register /Users/<user>/repos/doeff'
ssh mac 'ORCH_REMOTE=zeus:7777 orch worker start'
ssh mac 'ORCH_REMOTE=zeus:7777 orch worker status'
# run on the Mac target
#
# Run from the operational project root on Zeus so the daemon can discover the
# correct project config for local-mode CLI execution before dispatching to the
# remote target.
ssh zeus "cd /home/kento/repos/doeff && orch --project $PROJECT_ID run $ISSUE_ID \
--run-id $RUN_ID \
--on mac \
--agent custom \
--agent-cmd 'printf mac-target-ready; hostname; sleep 20' \
--json"
# verify the run is tracked as a Mac-targeted run
ssh zeus "orch --project $PROJECT_ID ps --issue $ISSUE_ID --json"
ssh zeus "orch --project $PROJECT_ID show $ISSUE_ID#$RUN_ID --json"
# optional but recommended: capture the remote session output
ssh zeus "orch --project $PROJECT_ID capture $ISSUE_ID#$RUN_ID"
# stop and clean up
ssh zeus "orch --project $PROJECT_ID stop $ISSUE_ID#$RUN_ID --force"
ssh mac 'ORCH_REMOTE=zeus:7777 orch worker stop --all'
ssh zeus "rm -f /home/kento/repos/doeff-VAULT/issues/$ISSUE_ID.md"
Expected outcomes:
- `run` returns `"ok": true`
- `ps --json` shows `target: "mac"`
- `orch ps` shows the execution host in the `HOST` column even when the run has no logical `target`
- `ps --json` or attach metadata exposes `target_host: "mac"`
- `capture` output includes the custom marker (`mac-target-ready`) or target hostname
- `stop` succeeds for the Mac-targeted run
- repeated `orch worker start` on the Mac target should not create duplicate workers for the same host/profile
- `orch worker status` on the Mac target should show the Mac local process state and the Zeus registration state for the same worker profile
- If the target host cannot resolve the requested multiplexer in its remote SSH PATH, expect session creation to fail with `failed to create tmux session` or the equivalent multiplexer error
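The `target` and `target_host` expectations above can be checked mechanically against the `ps --json` output. The sketch below is a text match, not a JSON parser: `json_has_string` is a hypothetical helper, it assumes both fields are emitted as plain string values, and the value is used inside a regular expression, so it should not contain regex metacharacters.

```shell
# Hypothetical helper (not shipped with orch).
# Usage: json_has_string <key> <value>   (JSON text on stdin)
# Succeeds if the input contains "<key>": "<value>" as a string pair.
json_has_string() {
  grep -Eq "\"$1\"[[:space:]]*:[[:space:]]*\"$2\""
}

# Example usage:
#   ssh zeus "orch --project $PROJECT_ID ps --issue $ISSUE_ID --json" \
#     | json_has_string target_host mac || echo "run is not tracked on mac"
```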
This is the minimum manual check for the user story:
master = zeus
run target = mac
- If `daemon repo register` fails right after `master start`, retry once after a short delay.
- If TCP remote status is unreachable, restart with `ORCH_REMOTE=skip` set for the `master start --listen ...` command.
- Ensure `--project` value matches the registered repository identity.
- For file-backend projects, ensure the issue file is created under the actual `issues.path` configured for the project being tested.
- For `--on mac` validation, ensure the target host has a local repo mapping for the same `project_id`. A repo identity mapping on Zeus does not provision the worker host automatically.
- For `--on mac`, verify plain SSH first from Zeus (`ssh <target> 'command -v tmux; hostname'`) before attributing failures to orch itself.
- If the target host identity in config (`targets[].host`) does not match the hostname the worker auto-detects, start the worker with an explicit `--worker-id` that matches `host-<targets[].host>`.
- For automated same-machine coverage of the target-host model, run `make e2e-target-host-worker-local`.
- In this topology, verify run state on both sides when debugging: master (`orch --remote ... ps`) and worker-local issues store (`issues.path/runs/...`) to detect projection/store divergence.
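The "retry once after a short delay" advice for `daemon repo register` can be wrapped in a small function so the retry is applied consistently. This is a sketch: `retry_once` is a hypothetical helper, and the two-second delay in the usage line is an arbitrary choice, not a value taken from orch.

```shell
# Hypothetical helper (not shipped with orch).
# Usage: retry_once <delay_seconds> <command...>
# Runs the command; if it fails, waits and runs it exactly one more time.
# The exit status is that of the last attempt.
retry_once() {
  local delay="$1"
  shift
  "$@" && return 0
  sleep "$delay"
  "$@"
}

# Example usage:
#   retry_once 2 "$ORCH_BIN" daemon repo register "$REPO_URL"
```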