Commit 642fa00

feat(cloud): detect dead local daemon in cloud status and document launchd unit
`engram cloud status` now probes the local `engram serve` daemon at 127.0.0.1:7437 (respects ENGRAM_PORT) with a 1s timeout and prints a `Local daemon:` line, so users can detect a silently dead autosync after `brew upgrade engram`, a logout, or any other binary replacement. The exit code is unchanged (the line is informational) and the probe only runs when cloud is configured.

DOCS.md "Running as a Service" gains a launchd (macOS) subsection with a KeepAlive plist template that survives `brew upgrade` by relaunching `engram serve` automatically. The Homebrew section in docs/INSTALLATION.md links to the new template so macOS users see the supervisor guidance right after install.

Closes #279
1 parent 7af10b9 commit 642fa00

6 files changed

Lines changed: 483 additions & 2 deletions

DOCS.md

Lines changed: 57 additions & 2 deletions
@@ -514,7 +514,7 @@ Inspect or replay the `sync_apply_deferred` queue.
 
 ### Cloud CLI (opt-in)
 
-- `engram cloud status` — show current cloud config state plus auth/sync readiness without mutating local state
+- `engram cloud status` — show current cloud config state plus auth/sync readiness without mutating local state. When cloud is configured, also probes the local `engram serve` daemon at `127.0.0.1:7437` (respects `ENGRAM_PORT`) and prints a `Local daemon:` line (`running` / `not running` / `unreachable`) so you can detect a silently dead autosync. Exit code is unaffected; the line is informational
 - `engram cloud enroll <project>` — enroll one project for cloud replication
 - `engram cloud config --server <url>` — persist cloud server URL to `~/.engram/cloud.json`
 - `engram cloud serve` — run cloud backend API + dashboard (`/dashboard`) using Postgres config from env
@@ -1150,7 +1150,9 @@ Interactive Bubbletea-based terminal UI. Launch with `engram tui`.
 
 ## Running as a Service
 
-### Using systemd
+Without a service supervisor, `engram serve` dies whenever the binary is replaced (e.g. on `brew upgrade engram`) or the host reboots, and autosync stops silently. The templates below restart it automatically. Use `engram cloud status` afterwards to confirm — the `Local daemon:` line should report `running on port 7437`.
+
+### Using systemd (Linux)
 
 1. Move binary to `~/.local/bin` (ensure it's in your `$PATH`)
 2. Create directories: `mkdir -p ~/.engram ~/.config/systemd/user`
@@ -1176,6 +1178,59 @@ Environment=ENGRAM_DATA_DIR=%h/.engram
 WantedBy=default.target
 ```
 
+### Using launchd (macOS)
+
+This is the recommended setup for Homebrew users on macOS. With `KeepAlive=true`, launchd relaunches `engram serve` automatically after `brew upgrade engram` replaces the binary, so autosync survives upgrades.
+
+1. Find your binary path: `which engram` (typically `/opt/homebrew/bin/engram` on Apple Silicon or `/usr/local/bin/engram` on Intel)
+2. Create the data dir if missing: `mkdir -p ~/.engram`
+3. Create `~/Library/LaunchAgents/com.gentleman-programming.engram.plist` with the contents below — replace `<HOME>` with the absolute path of your home directory (`echo $HOME`) and adjust the binary path if `which engram` returned something different
+4. Load it: `launchctl load ~/Library/LaunchAgents/com.gentleman-programming.engram.plist`
+5. Verify: `launchctl list | grep engram` and `engram cloud status` (the `Local daemon:` line should report `running on port 7437`)
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>com.gentleman-programming.engram</string>
+    <key>ProgramArguments</key>
+    <array>
+        <string>/opt/homebrew/bin/engram</string>
+        <string>serve</string>
+    </array>
+    <key>WorkingDirectory</key>
+    <string><HOME></string>
+    <key>EnvironmentVariables</key>
+    <dict>
+        <key>ENGRAM_DATA_DIR</key>
+        <string><HOME>/.engram</string>
+        <!-- Uncomment and fill these to enable cloud autosync:
+        <key>ENGRAM_CLOUD_AUTOSYNC</key>
+        <string>1</string>
+        <key>ENGRAM_CLOUD_SERVER</key>
+        <string>https://your-cloud-host</string>
+        <key>ENGRAM_CLOUD_TOKEN</key>
+        <string>your-cloud-token</string>
+        -->
+    </dict>
+    <key>RunAtLoad</key>
+    <true/>
+    <key>KeepAlive</key>
+    <true/>
+    <key>StandardOutPath</key>
+    <string><HOME>/.engram/serve.out.log</string>
+    <key>StandardErrorPath</key>
+    <string><HOME>/.engram/serve.err.log</string>
+</dict>
+</plist>
+```
+
+To unload (stop and disable): `launchctl unload ~/Library/LaunchAgents/com.gentleman-programming.engram.plist`. To reload after editing the plist: unload, then load again.
+
+> **Note on paths:** launchd does not expand `$HOME` or `~` inside plist values, which is why the template uses literal absolute paths.
+
 ---
 
 ## Design Decisions

cmd/engram/cloud.go

Lines changed: 3 additions & 0 deletions
@@ -623,17 +623,20 @@ func cmdCloudStatus(cfg store.Config) {
 			fmt.Println("Auth status: ready (insecure local-dev mode: ENGRAM_CLOUD_INSECURE_NO_AUTH=1)")
 			fmt.Println("Sync readiness: ready for explicit --project sync (project must be enrolled)")
 			fmt.Println("Warning: bearer auth is disabled in insecure mode; do not use in production")
+			printCloudStatusDaemonProbe()
 			printCloudStatusSyncDiagnostic(cfg)
 			return
 		}
 		fmt.Println("Auth status: token not configured (client token is optional at preflight)")
 		fmt.Println("Sync readiness: ready to attempt explicit --project sync (project must be enrolled)")
 		fmt.Println("Hint: if the remote server enforces bearer auth, set ENGRAM_CLOUD_TOKEN")
+		printCloudStatusDaemonProbe()
 		printCloudStatusSyncDiagnostic(cfg)
 		return
 	}
 	fmt.Println("Auth status: ready (token provided via runtime cloud config)")
 	fmt.Println("Sync readiness: ready for explicit --project sync (project must be enrolled)")
+	printCloudStatusDaemonProbe()
 	printCloudStatusSyncDiagnostic(cfg)
 }
 

cmd/engram/cloud_daemon_probe.go

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"io"
+	"net"
+	"net/http"
+	"os"
+	"strconv"
+	"strings"
+	"time"
+)
+
+// daemonProbeStatus describes the outcome of probing the local engram daemon.
+type daemonProbeStatus string
+
+const (
+	daemonProbeRunning     daemonProbeStatus = "running"
+	daemonProbeNotRunning  daemonProbeStatus = "not_running"
+	daemonProbeUnreachable daemonProbeStatus = "unreachable"
+)
+
+// daemonProbeResult captures the outcome of a single probe.
+type daemonProbeResult struct {
+	Status daemonProbeStatus
+	Port   int
+	Err    error
+}
+
+const defaultDaemonProbePort = 7437
+
+// daemonProbeTimeout is a var (not const) so tests can shorten it when
+// exercising the "server accepts but never replies" path.
+var daemonProbeTimeout = time.Second
+
+// cloudDaemonProbe issues a short timeout GET to /health on the local engram
+// HTTP server. Exposed as a variable so tests can stub it.
+var cloudDaemonProbe = defaultCloudDaemonProbe
+
+// defaultCloudDaemonProbe performs a real HTTP GET against the local daemon.
+// A dial error to 127.0.0.1 is interpreted as "not running"; any other error
+// (timeout, non-2xx response, malformed reply) maps to "unreachable" so the
+// user can distinguish "the daemon is gone" from "the daemon is misbehaving".
+func defaultCloudDaemonProbe(ctx context.Context, port int) daemonProbeResult {
+	url := fmt.Sprintf("http://127.0.0.1:%d/health", port)
+	client := &http.Client{Timeout: daemonProbeTimeout}
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
+	if err != nil {
+		return daemonProbeResult{Status: daemonProbeUnreachable, Port: port, Err: err}
+	}
+	resp, err := client.Do(req)
+	if err != nil {
+		var opErr *net.OpError
+		if errors.As(err, &opErr) && opErr.Op == "dial" {
+			return daemonProbeResult{Status: daemonProbeNotRunning, Port: port, Err: err}
+		}
+		return daemonProbeResult{Status: daemonProbeUnreachable, Port: port, Err: err}
+	}
+	defer resp.Body.Close()
+	_, _ = io.Copy(io.Discard, resp.Body)
+	if resp.StatusCode >= 200 && resp.StatusCode < 300 {
+		return daemonProbeResult{Status: daemonProbeRunning, Port: port}
+	}
+	return daemonProbeResult{Status: daemonProbeUnreachable, Port: port}
+}
+
+// resolveDaemonProbePort mirrors the port resolution used by cmdServe so the
+// probe targets the same address the user's serve process is bound to.
+func resolveDaemonProbePort() int {
+	if p := strings.TrimSpace(os.Getenv("ENGRAM_PORT")); p != "" {
+		if n, err := strconv.Atoi(p); err == nil && n > 0 && n < 65536 {
+			return n
+		}
+	}
+	return defaultDaemonProbePort
+}
+
+// printCloudStatusDaemonProbe prints a single line describing whether the
+// local engram daemon answers /health, plus a short hint when it is down.
+// Exit code is unchanged: this is informational so cloud status remains a
+// non-failing diagnostic surface.
+func printCloudStatusDaemonProbe() {
+	port := resolveDaemonProbePort()
+	ctx, cancel := context.WithTimeout(context.Background(), daemonProbeTimeout)
+	defer cancel()
+	res := cloudDaemonProbe(ctx, port)
+	switch res.Status {
+	case daemonProbeRunning:
+		fmt.Printf("Local daemon: running on port %d\n", res.Port)
+	case daemonProbeNotRunning:
+		fmt.Printf("Local daemon: not running on port %d\n", res.Port)
+		fmt.Println("Hint: run `engram serve` to resume autosync; on macOS see DOCS.md launchd template to keep it alive across upgrades")
+	default:
+		if res.Err != nil {
+			fmt.Printf("Local daemon: unreachable on port %d (probe error: %v)\n", res.Port, res.Err)
+		} else {
+			fmt.Printf("Local daemon: unreachable on port %d\n", res.Port)
+		}
+	}
+}
