| 1 | +# Daemon environment: the GUI-launch PATH/credentials problem |
| 2 | + |
| 3 | +Status: proposed |
| 4 | +Scope: desktop (Electron) launch of the AO daemon on macOS (and any GUI-launched |
| 5 | +desktop platform) |
| 6 | + |
| 7 | +## Summary |
| 8 | + |
| 9 | +When the desktop app is launched from Finder/Dock/Spotlight, the daemon it spawns |
| 10 | +inherits a stunted environment (minimal `PATH`, no shell-exported credentials). |
| 11 | +The daemon then cannot find `zellij`/`git`/the agent CLIs, and the agents it |
| 12 | +launches cannot see API keys. The same app launched from a terminal works, |
| 13 | +because a terminal-started process inherits the shell's fully-populated |
| 14 | +environment. The fix is to resolve the user's login-shell environment once at |
| 15 | +startup and use it as the base for the daemon's environment. |
| 16 | + |
| 17 | +## Problem statement |
| 18 | + |
| 19 | +The Electron supervisor spawns the Go daemon with the environment it forwards in |
| 20 | +`daemonEnv()` (`frontend/src/main.ts`), which is essentially `...process.env` |
| 21 | +plus AO's telemetry defaults. The daemon, in turn, is the parent of every agent |
| 22 | +session (it execs `zellij`, which runs `claude`/`codex`, etc.), and the agent's |
| 23 | +`PATH` is derived from the daemon's own `PATH` |
| 24 | +(`runtimeEnv` -> `HookPATH(m.executable, os.Getenv, ...)` in |
| 25 | +`backend/internal/session_manager/manager.go`). |
| 26 | + |
| 27 | +So whatever environment the daemon receives propagates to the entire stack: |
| 28 | + |
| 29 | +``` |
| 30 | +launchd (or terminal) -> Electron main -> daemon -> zellij -> agent (claude/codex) |
| 31 | +``` |
| 32 | + |
| 33 | +When that environment is impoverished, everything downstream breaks. |
| 34 | + |
| 35 | +### Observed symptoms |
| 36 | + |
| 37 | +All of these were traced to the same root cause: |
| 38 | + |
| 39 | +- Terminal pane stuck on "Terminal disconnected - reattaching...". |
| 40 | +- Terminal pane showing "Terminal ended ... but the session is not marked |
| 41 | + terminated yet." |
| 42 | +- Sessions stuck `idle` + `is_terminated = 0` in the store, never reaped, and |
| 43 | + therefore not restorable (`Restore` requires `IsTerminated`, otherwise |
| 44 | + `ErrNotRestorable`). |
| 45 | +- `zellij list-sessions` showing sessions as alive-but-unreachable or dead, |
| 46 | + depending on which socket universe was inspected. |
| 47 | + |
| 48 | +The unifying cause: the running, GUI-launched daemon cannot find `zellij` (and |
| 49 | +friends) on its `PATH`, because `/opt/homebrew/bin` is missing. Its liveness |
| 50 | +probes therefore return `ProbeFailed`, never `ProbeDead`, so the reaper never |
| 51 | +terminates the row, and terminal attach cannot spawn `zellij attach`. |
| 52 | + |
| 53 | +## Root cause: GUI apps do not inherit the shell environment |
| 54 | + |
| 55 | +On macOS, a process's environment is inherited solely from its parent. The |
| 56 | +parent differs by launch method: |
| 57 | + |
| 58 | +- **Terminal launch.** The terminal starts a login/interactive shell |
| 59 | + (`zsh -l`). That shell sources `/etc/zprofile`, `~/.zprofile`, `~/.zshrc`, |
| 60 | + etc. Those files are the only thing that sets the rich environment: |
| 61 | + `eval "$(/opt/homebrew/bin/brew shellenv)"` adds `/opt/homebrew/bin` to |
| 62 | + `PATH`; `export ANTHROPIC_API_KEY=...` exports credentials. Every process |
| 63 | + started from that terminal inherits the result. The app works. |
| 64 | + |
| 65 | +- **Finder/Dock/Spotlight launch.** The app is started by **launchd**, not by a |
| 66 | + shell. launchd hands the process a fixed, minimal environment |
| 67 | + (`PATH=/usr/bin:/bin:/usr/sbin:/sbin`, `HOME`, `USER`, `TMPDIR`, little else). |
| 68 | + No shell runs anywhere in the chain, so no rc/profile file is ever sourced. |
| 69 | + The homebrew `PATH` and the exported credentials simply do not exist for the |
| 70 | + app, and `daemonEnv()` faithfully forwards that minimal env down to the daemon. |
| 71 | + |
| 72 | +This is deliberate on Apple's part: GUI apps are decoupled from interactive shell |
| 73 | +configuration on purpose (it can be slow, interactive, or machine-specific). The |
| 74 | +old `~/.MacOSX/environment.plist` escape hatch was removed years ago. This is a |
| 75 | +well-known macOS-Electron footgun; it is why packages like `fix-path` and |
| 76 | +`shell-env` exist. |
| 77 | + |
| 78 | +### Why "just forward env" is correct in principle |
| 79 | + |
| 80 | +Forwarding the environment is not the bug. The daemon and agents genuinely need: |
| 81 | + |
| 82 | +- `PATH` to resolve `zellij`, `git`, `node`, and the agent CLIs; |
| 83 | +- `HOME` for config/credentials (`~/.gitconfig`, `~/.claude`, `~/.codex`, ssh |
| 84 | + keys); |
| 85 | +- shell-exported credentials (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GH_TOKEN`, |
| 86 | + ...); |
| 87 | +- locale/proxy (`LANG`, `LC_*`, `HTTPS_PROXY`); |
| 88 | +- AO's own vars (telemetry, `AO_DATA_DIR`, `AO_RUN_FILE`, session ids). |
| 89 | + |
| 90 | +The bug is the _source_ of what we forward: under a GUI launch, `process.env` is |
| 91 | +launchd's minimal env, not the shell's. The fix is to forward a _good_ base env, |
| 92 | +not to stop forwarding. |
| 93 | + |
| 94 | +## Proposed solution: resolve the login-shell environment |
| 95 | + |
| 96 | +Do not reconstruct the shell environment by hand. Run the user's login shell |
| 97 | +once, ask it to print its environment, and adopt that as the base for |
| 98 | +`daemonEnv()`. |
| 99 | + |
| 100 | +### The mechanism |
| 101 | + |
| 102 | +``` |
| 103 | +zsh -ilc 'env -0' |
| 104 | +``` |
| 105 | + |
| 106 | +- `-l` (login): source `/etc/zprofile` and `~/.zprofile` (where the homebrew |
| 107 | + `PATH` line typically lives). |
| 108 | +- `-i` (interactive): source `~/.zshrc` (where most `export` lines live). |
| 109 | +- `-c 'env -0'`: run one command and exit. `env` dumps the environment the shell |
| 110 | + built after sourcing all config; `-0` separates entries with NUL bytes instead |
| 111 | + of newlines, so values containing newlines parse unambiguously. |
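A minimal sketch of the parsing step, to show why NUL separation matters. The helper name `parseEnv0` and the sample values are hypothetical, not taken from the AO codebase.

```typescript
// Hypothetical helper: turn `env -0` output into a key/value map.
function parseEnv0(output: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const entry of output.split("\0")) {
    const eq = entry.indexOf("=");
    if (eq <= 0) continue; // trailing empty chunk, or no NAME= prefix
    env[entry.slice(0, eq)] = entry.slice(eq + 1); // value may contain "=" or "\n"
  }
  return env;
}

// A value with an embedded newline survives NUL splitting...
const raw = "PATH=/opt/homebrew/bin:/usr/bin\0NOTE=first line\nsecond line\0";
const byNul = parseEnv0(raw);

// ...whereas splitting newline-separated `env` output on "\n" would yield a
// bogus "second line" chunk with no NAME= prefix.
const byNewline = raw.replace(/\0/g, "\n").split("\n").filter((s) => s !== "");

console.log(Object.keys(byNul)); // two keys: PATH and NOTE
console.log(byNewline.length); // three chunks, one of them not a valid entry
```

Splitting on the first `=` only keeps values that themselves contain `=` intact.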
| 112 | + |
| 113 | +The output is a faithful snapshot of "what a terminal would see." Parse it back |
| 114 | +into key/value pairs and layer it over `process.env`, whose `PATH` under a GUI |
| 115 | +launch is launchd's minimal one and must not win; AO's own vars are applied last: |
| 116 | + |
| 117 | +``` |
| 118 | +finalEnv = { ...process.env, ...shellEnv, AO_*: defaults } |
| 119 | +``` |
| 120 | + |
| 121 | +### Worked example |
| 122 | + |
| 123 | +GUI-launched daemon env (before): |
| 124 | + |
| 125 | +``` |
| 126 | +PATH=/usr/bin:/bin:/usr/sbin:/sbin |
| 127 | +HOME=/Users/<user> |
| 128 | +``` |
| 129 | + |
| 130 | +After `zsh -ilc 'env -0'` resolution: |
| 131 | + |
| 132 | +``` |
| 133 | +PATH=/opt/homebrew/bin:/opt/homebrew/sbin:/usr/bin:/bin:/usr/sbin:/sbin |
| 134 | +HOME=/Users/<user> |
| 135 | +ANTHROPIC_API_KEY=sk-ant-... |
| 136 | +GH_TOKEN=ghp_... |
| 137 | +LANG=en_US.UTF-8 |
| 138 | +``` |
| 139 | + |
| 140 | +The daemon can now resolve `/opt/homebrew/bin/zellij`, and agents inherit the |
| 141 | +credentials. |
| 142 | + |
| 143 | +### Implementation details |
| 144 | + |
| 145 | +Place the resolution in Electron's `daemonEnv()` (`frontend/src/main.ts`), the |
| 146 | +parent that hands env to the daemon. |
| 147 | + |
| 148 | +- **Resolve once, cache.** Sourcing rc files can take 100ms to >1s |
| 149 | + (nvm/pyenv/...). Do it a single time at startup; never per-session. |
| 150 | +- **Pick the shell robustly.** Prefer `process.env.SHELL`; under launchd it may |
| 151 | + be absent, so fall back to the user record |
| 152 | + (`dscl . -read /Users/$USER UserShell`), then `/bin/zsh`. Do not hardcode zsh; |
| 153 | + honor bash/fish. |
| 154 | +- **Isolate the payload.** Interactive shells can print banners/motd/prompts to |
| 155 | + stdout. Print a sentinel before the real output and parse only what follows: |
| 156 | + `zsh -ilc 'echo __AO_ENV_START__; env -0'`. |
| 157 | +- **No stdin, with a timeout.** Run with `</dev/null` and a ~2-3s timeout so a |
| 158 | + misconfigured rc that waits for input cannot hang startup. |
| 159 | +- **Fallback on any failure.** If the probe fails, times out, or exits nonzero, |
| 160 | + fall back to a static base: prepend |
| 161 | + `/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin` and pull |
| 162 | + through known credential vars. A weird shell config then degrades to "zellij |
| 163 | + and git resolve" rather than "broken." |
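The bullets above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `daemonEnv()` change: the names `pickShell`, `parseProbeOutput`, `fallbackEnv`, and `resolveShellEnv` are hypothetical; the `dscl` lookup is left to the caller (passed in as `userRecordShell`); a missing sentinel is treated as a probe failure; and the synchronous `execFileSync` is used only for brevity, since it blocks the main process for up to the timeout.

```typescript
import { execFileSync } from "node:child_process";

type Env = Record<string, string | undefined>;

const SENTINEL = "__AO_ENV_START__";
const STATIC_PATH = "/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin";

// Prefer $SHELL, then a shell looked up elsewhere (e.g. via dscl), then /bin/zsh.
function pickShell(env: Env, userRecordShell?: string): string {
  return env.SHELL || userRecordShell || "/bin/zsh";
}

// Discard everything up to the sentinel, then split the rest on NUL bytes.
function parseProbeOutput(stdout: string): Record<string, string> | null {
  const start = stdout.indexOf(SENTINEL);
  if (start < 0) return null; // no sentinel: treat the probe as failed
  // `echo` terminates the sentinel with a newline; drop it before parsing.
  const payload = stdout.slice(start + SENTINEL.length).replace(/^\r?\n/, "");
  const env: Record<string, string> = {};
  for (const entry of payload.split("\0")) {
    const eq = entry.indexOf("=");
    if (eq > 0) env[entry.slice(0, eq)] = entry.slice(eq + 1);
  }
  return env;
}

// Static floor: known tool directories first, then whatever PATH the base had.
// Other variables in `base` (including any credentials) pass through untouched.
function fallbackEnv(base: Env): Env {
  const dirs = [...STATIC_PATH.split(":"), ...(base.PATH ?? "").split(":")];
  const unique = dirs.filter((d, i) => d !== "" && dirs.indexOf(d) === i);
  return { ...base, PATH: unique.join(":") };
}

let cached: Env | undefined;

// Run the login shell once, without stdin and with a timeout; cache the result.
function resolveShellEnv(base: Env, timeoutMs = 3000): Env {
  if (cached) return cached;
  try {
    const stdout = execFileSync(pickShell(base), ["-ilc", `echo ${SENTINEL}; env -0`], {
      encoding: "utf8",
      timeout: timeoutMs,
      stdio: ["ignore", "pipe", "ignore"], // stdin closed, stderr discarded
      env: base,
    });
    cached = parseProbeOutput(stdout) ?? fallbackEnv(base);
  } catch {
    cached = fallbackEnv(base); // spawn error, nonzero exit, or timeout
  }
  return cached;
}
```

A caller would invoke `resolveShellEnv(process.env)` once at startup and reuse the cached result for every session. An asynchronous `execFile` variant would avoid blocking the Electron main process while the rc files load.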
| 164 | + |
| 165 | +### Platform scope |
| 166 | + |
| 167 | +- macOS: required (this is where the GUI/launchd split bites). |
| 168 | +- Linux: the same class of problem exists for `.desktop`-launched apps; the same |
| 169 | + resolution applies. |
| 170 | +- Windows: not applicable in the same form; a static `PATH` floor is sufficient. |
| 171 | + |
| 172 | +This matches what `shell-env`/`fix-path` do; the logic above is the entirety of |
| 173 | +it. We shell out once to the user's own shell and adopt its result. |
| 174 | + |
| 175 | +## Testing |
| 176 | + |
| 177 | +- Parser unit test: feed NUL-separated output, including a value containing a |
| 178 | + newline and leading banner noise before the sentinel; assert the resulting map |
| 179 | + is correct and the noise is dropped. |
| 180 | +- Fallback test: simulate probe failure/timeout; assert the static PATH floor and |
| 181 | + credential pass-through are applied. |
| 182 | +- Manual: launch the packaged app from Finder (not a terminal) and confirm a new |
| 183 | + session spawns, the terminal attaches, and `zellij`/`git`/agent binaries |
| 184 | + resolve. |
| 185 | + |
| 186 | +## Relevant code |
| 187 | + |
| 188 | +- `frontend/src/main.ts` - `daemonEnv()` (env forwarded to the daemon), daemon |
| 189 | + spawn. |
| 190 | +- `backend/internal/session_manager/manager.go` - `runtimeEnv` / `HookPATH` |
| 191 | + (agent `PATH` derived from the daemon's `PATH`); `spawnEnv`. |
| 192 | +- `backend/internal/adapters/runtime/zellij/zellij.go` - `defaultBinary()` |
| 193 | + (`exec.LookPath("zellij")` against the daemon's `PATH`). |
| 194 | +- `backend/internal/observe/reaper/reaper.go`, |
| 195 | + `backend/internal/lifecycle/runtime.go` - liveness -> termination |
| 196 | + (`ProbeFailed` never terminates, so a daemon that cannot run `zellij` strands |
| 197 | + sessions). |