Commit b89d25a

feat(cloud): detect dead local daemon in cloud status and document launchd unit
`engram cloud status` now probes the local `engram serve` daemon at 127.0.0.1:7437 (respects `ENGRAM_PORT`) with a 1s timeout and prints a `Local daemon:` line, so users can detect a silently dead autosync after `brew upgrade engram`, a logout, or any other binary replacement. The exit code is unchanged (the line is informational) and the probe runs only when cloud is configured.

DOCS.md "Running as a Service" gains a launchd (macOS) subsection with a KeepAlive plist template that survives `brew upgrade` by relaunching `engram serve` automatically.

The Homebrew section in docs/INSTALLATION.md links to the new template so macOS users see the supervisor guidance right after install.

Closes #279
1 parent 42160c7 commit b89d25a
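
The `ENGRAM_PORT` handling described in the commit message can be exercised in isolation. The sketch below is a hypothetical pure variant of the `resolveDaemonProbePort` function added in this commit: the name `resolvePort` and the choice to pass the raw value as a parameter (instead of reading the environment) are assumptions made so the rule is easy to table-test.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaultPort is the default daemon port named in the commit message.
const defaultPort = 7437

// resolvePort is a hypothetical, environment-free variant of
// resolveDaemonProbePort. It accepts the raw ENGRAM_PORT value and falls
// back to the default when the value is empty, non-numeric, or outside
// the valid TCP port range 1..65535.
func resolvePort(raw string) int {
	if p := strings.TrimSpace(raw); p != "" {
		if n, err := strconv.Atoi(p); err == nil && n > 0 && n < 65536 {
			return n
		}
	}
	return defaultPort
}

func main() {
	// Valid values are used as-is; everything else resolves to 7437.
	for _, raw := range []string{"", " 8080 ", "0", "70000", "abc"} {
		fmt.Printf("%q -> %d\n", raw, resolvePort(raw))
	}
}
```

Invalid values fall back silently instead of failing, which matches the informational nature of the probe: a malformed `ENGRAM_PORT` should not turn `engram cloud status` into an error.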

6 files changed

Lines changed: 483 additions & 2 deletions

DOCS.md

Lines changed: 57 additions & 2 deletions
@@ -433,7 +433,7 @@ Inspect or replay the `sync_apply_deferred` queue.
 
 ### Cloud CLI (opt-in)
 
-- `engram cloud status` — show current cloud config state plus auth/sync readiness without mutating local state
+- `engram cloud status` — show current cloud config state plus auth/sync readiness without mutating local state. When cloud is configured, also probes the local `engram serve` daemon at `127.0.0.1:7437` (respects `ENGRAM_PORT`) and prints a `Local daemon:` line (`running` / `not running` / `unreachable`) so you can detect a silently dead autosync. Exit code is unaffected; the line is informational
 - `engram cloud enroll <project>` — enroll one project for cloud replication
 - `engram cloud config --server <url>` — persist cloud server URL to `~/.engram/cloud.json`
 - `engram cloud serve` — run cloud backend API + dashboard (`/dashboard`) using Postgres config from env
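
Because the exit code of `engram cloud status` stays unchanged, a script that wants to react to a dead daemon has to read the `Local daemon:` line itself. The following is a minimal sketch of such a parser; `localDaemonState` is a hypothetical helper that is not part of this commit, and it assumes only the three line shapes printed by `printCloudStatusDaemonProbe`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// daemonPrefix is the fixed prefix of the line printed by the probe.
const daemonPrefix = "Local daemon: "

// localDaemonState (hypothetical helper) scans `engram cloud status` output
// and returns the state reported on the "Local daemon:" line. ok is false
// when the line is absent, for example when cloud is not configured and
// the probe is skipped, or when the state is not one of the known three.
func localDaemonState(output string) (state string, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, daemonPrefix) {
			continue
		}
		rest := strings.TrimPrefix(line, daemonPrefix)
		for _, s := range []string{"not running", "running", "unreachable"} {
			if rest == s || strings.HasPrefix(rest, s+" ") {
				return s, true
			}
		}
		return "", false
	}
	return "", false
}

func main() {
	// Sample output assembled from the messages printed in this commit.
	sample := "Auth status: ready (token provided via runtime cloud config)\n" +
		"Sync readiness: ready for explicit --project sync (project must be enrolled)\n" +
		"Local daemon: not running on port 7437\n"
	state, ok := localDaemonState(sample)
	fmt.Printf("state=%q ok=%v\n", state, ok)
}
```

A monitoring script could treat any state other than `running` as a signal to restart the service or alert the user.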
@@ -1053,7 +1053,9 @@ Interactive Bubbletea-based terminal UI. Launch with `engram tui`.
 
 ## Running as a Service
 
-### Using systemd
+Without a service supervisor, `engram serve` dies whenever the binary is replaced (e.g. on `brew upgrade engram`) or the host reboots, and autosync stops silently. The templates below restart it automatically. Use `engram cloud status` afterwards to confirm — the `Local daemon:` line should report `running on port 7437`.
+
+### Using systemd (Linux)
 
 1. Move binary to `~/.local/bin` (ensure it's in your `$PATH`)
 2. Create directories: `mkdir -p ~/.engram ~/.config/systemd/user`
@@ -1079,6 +1081,59 @@ Environment=ENGRAM_DATA_DIR=%h/.engram
 WantedBy=default.target
 ```
 
+### Using launchd (macOS)
+
+This is the recommended setup for Homebrew users on macOS. With `KeepAlive=true`, launchd relaunches `engram serve` automatically after `brew upgrade engram` replaces the binary, so autosync survives upgrades.
+
+1. Find your binary path: `which engram` (typically `/opt/homebrew/bin/engram` on Apple Silicon or `/usr/local/bin/engram` on Intel)
+2. Create the data dir if missing: `mkdir -p ~/.engram`
+3. Create `~/Library/LaunchAgents/com.gentleman-programming.engram.plist` with the contents below — replace `<HOME>` with the absolute path of your home directory (`echo $HOME`) and adjust the binary path if `which engram` returned something different
+4. Load it: `launchctl load ~/Library/LaunchAgents/com.gentleman-programming.engram.plist`
+5. Verify: `launchctl list | grep engram` and `engram cloud status` (the `Local daemon:` line should report `running on port 7437`)
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+  <key>Label</key>
+  <string>com.gentleman-programming.engram</string>
+  <key>ProgramArguments</key>
+  <array>
+    <string>/opt/homebrew/bin/engram</string>
+    <string>serve</string>
+  </array>
+  <key>WorkingDirectory</key>
+  <string><HOME></string>
+  <key>EnvironmentVariables</key>
+  <dict>
+    <key>ENGRAM_DATA_DIR</key>
+    <string><HOME>/.engram</string>
+    <!-- Uncomment and fill these to enable cloud autosync:
+    <key>ENGRAM_CLOUD_AUTOSYNC</key>
+    <string>1</string>
+    <key>ENGRAM_CLOUD_SERVER</key>
+    <string>https://your-cloud-host</string>
+    <key>ENGRAM_CLOUD_TOKEN</key>
+    <string>your-cloud-token</string>
+    -->
+  </dict>
+  <key>RunAtLoad</key>
+  <true/>
+  <key>KeepAlive</key>
+  <true/>
+  <key>StandardOutPath</key>
+  <string><HOME>/.engram/serve.out.log</string>
+  <key>StandardErrorPath</key>
+  <string><HOME>/.engram/serve.err.log</string>
+</dict>
+</plist>
+```
+
+To unload (stop and disable): `launchctl unload ~/Library/LaunchAgents/com.gentleman-programming.engram.plist`. To reload after editing the plist: unload, then load again.
+
+> **Note on paths:** launchd does not expand `$HOME` or `~` inside plist values, which is why the template uses literal absolute paths.
+
 ---
 
 ## Design Decisions

cmd/engram/cloud.go

Lines changed: 3 additions & 0 deletions
@@ -622,17 +622,20 @@ func cmdCloudStatus(cfg store.Config) {
 			fmt.Println("Auth status: ready (insecure local-dev mode: ENGRAM_CLOUD_INSECURE_NO_AUTH=1)")
 			fmt.Println("Sync readiness: ready for explicit --project sync (project must be enrolled)")
 			fmt.Println("Warning: bearer auth is disabled in insecure mode; do not use in production")
+			printCloudStatusDaemonProbe()
 			printCloudStatusSyncDiagnostic(cfg)
 			return
 		}
 		fmt.Println("Auth status: token not configured (client token is optional at preflight)")
 		fmt.Println("Sync readiness: ready to attempt explicit --project sync (project must be enrolled)")
 		fmt.Println("Hint: if the remote server enforces bearer auth, set ENGRAM_CLOUD_TOKEN")
+		printCloudStatusDaemonProbe()
 		printCloudStatusSyncDiagnostic(cfg)
 		return
 	}
 	fmt.Println("Auth status: ready (token provided via runtime cloud config)")
 	fmt.Println("Sync readiness: ready for explicit --project sync (project must be enrolled)")
+	printCloudStatusDaemonProbe()
 	printCloudStatusSyncDiagnostic(cfg)
 }

cmd/engram/cloud_daemon_probe.go

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"io"
+	"net"
+	"net/http"
+	"os"
+	"strconv"
+	"strings"
+	"time"
+)
+
+// daemonProbeStatus describes the outcome of probing the local engram daemon.
+type daemonProbeStatus string
+
+const (
+	daemonProbeRunning     daemonProbeStatus = "running"
+	daemonProbeNotRunning  daemonProbeStatus = "not_running"
+	daemonProbeUnreachable daemonProbeStatus = "unreachable"
+)
+
+// daemonProbeResult captures the outcome of a single probe.
+type daemonProbeResult struct {
+	Status daemonProbeStatus
+	Port   int
+	Err    error
+}
+
+const defaultDaemonProbePort = 7437
+
+// daemonProbeTimeout is a var (not const) so tests can shorten it when
+// exercising the "server accepts but never replies" path.
+var daemonProbeTimeout = time.Second
+
+// cloudDaemonProbe issues a short timeout GET to /health on the local engram
+// HTTP server. Exposed as a variable so tests can stub it.
+var cloudDaemonProbe = defaultCloudDaemonProbe
+
+// defaultCloudDaemonProbe performs a real HTTP GET against the local daemon.
+// A dial error to 127.0.0.1 is interpreted as "not running"; any other error
+// (timeout, non-2xx response, malformed reply) maps to "unreachable" so the
+// user can distinguish "the daemon is gone" from "the daemon is misbehaving".
+func defaultCloudDaemonProbe(ctx context.Context, port int) daemonProbeResult {
+	url := fmt.Sprintf("http://127.0.0.1:%d/health", port)
+	client := &http.Client{Timeout: daemonProbeTimeout}
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
+	if err != nil {
+		return daemonProbeResult{Status: daemonProbeUnreachable, Port: port, Err: err}
+	}
+	resp, err := client.Do(req)
+	if err != nil {
+		var opErr *net.OpError
+		if errors.As(err, &opErr) && opErr.Op == "dial" {
+			return daemonProbeResult{Status: daemonProbeNotRunning, Port: port, Err: err}
+		}
+		return daemonProbeResult{Status: daemonProbeUnreachable, Port: port, Err: err}
+	}
+	defer resp.Body.Close()
+	_, _ = io.Copy(io.Discard, resp.Body)
+	if resp.StatusCode >= 200 && resp.StatusCode < 300 {
+		return daemonProbeResult{Status: daemonProbeRunning, Port: port}
+	}
+	return daemonProbeResult{Status: daemonProbeUnreachable, Port: port}
+}
+
+// resolveDaemonProbePort mirrors the port resolution used by cmdServe so the
+// probe targets the same address the user's serve process is bound to.
+func resolveDaemonProbePort() int {
+	if p := strings.TrimSpace(os.Getenv("ENGRAM_PORT")); p != "" {
+		if n, err := strconv.Atoi(p); err == nil && n > 0 && n < 65536 {
+			return n
+		}
+	}
+	return defaultDaemonProbePort
+}
+
+// printCloudStatusDaemonProbe prints a single line describing whether the
+// local engram daemon answers /health, plus a short hint when it is down.
+// Exit code is unchanged: this is informational so cloud status remains a
+// non-failing diagnostic surface.
+func printCloudStatusDaemonProbe() {
+	port := resolveDaemonProbePort()
+	ctx, cancel := context.WithTimeout(context.Background(), daemonProbeTimeout)
+	defer cancel()
+	res := cloudDaemonProbe(ctx, port)
+	switch res.Status {
+	case daemonProbeRunning:
+		fmt.Printf("Local daemon: running on port %d\n", res.Port)
+	case daemonProbeNotRunning:
+		fmt.Printf("Local daemon: not running on port %d\n", res.Port)
+		fmt.Println("Hint: run `engram serve` to resume autosync; on macOS see DOCS.md launchd template to keep it alive across upgrades")
+	default:
+		if res.Err != nil {
+			fmt.Printf("Local daemon: unreachable on port %d (probe error: %v)\n", res.Port, res.Err)
+		} else {
+			fmt.Printf("Local daemon: unreachable on port %d\n", res.Port)
+		}
+	}
+}
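
The three outcomes of the probe can be reproduced without a real daemon by pointing the same classification logic at `net/http/httptest` servers. The sketch below is an illustration, not the commit's test suite: `classify` and `demo` are hypothetical names, and `classify` takes a full URL rather than a port so it can target the random ports that `httptest` allocates. The error mapping is copied from `defaultCloudDaemonProbe`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// classify (hypothetical) applies the same mapping as defaultCloudDaemonProbe:
// a dial error means "not_running", a 2xx reply means "running", and any
// other error or status means "unreachable".
func classify(ctx context.Context, url string, timeout time.Duration) string {
	client := &http.Client{Timeout: timeout}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "unreachable"
	}
	resp, err := client.Do(req)
	if err != nil {
		var opErr *net.OpError
		if errors.As(err, &opErr) && opErr.Op == "dial" {
			return "not_running"
		}
		return "unreachable"
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body)
	if resp.StatusCode >= 200 && resp.StatusCode < 300 {
		return "running"
	}
	return "unreachable"
}

// demo probes a healthy server, a server that answers 500, and an address
// whose listener has already been closed.
func demo() (healthy, failing, gone string) {
	ctx := context.Background()

	ok := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer ok.Close()

	bad := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusInternalServerError)
	}))
	defer bad.Close()

	// Start and immediately close a server to obtain a local address
	// that nothing is listening on.
	closed := httptest.NewServer(http.NotFoundHandler())
	closedURL := closed.URL
	closed.Close()

	healthy = classify(ctx, ok.URL+"/health", time.Second)
	failing = classify(ctx, bad.URL+"/health", time.Second)
	gone = classify(ctx, closedURL+"/health", time.Second)
	return healthy, failing, gone
}

func main() {
	healthy, failing, gone := demo()
	fmt.Println("healthy server:", healthy)
	fmt.Println("500 server:", failing)
	fmt.Println("closed listener:", gone)
}
```

Separating "not running" from "unreachable" this way is what lets the CLI print the `engram serve` hint only when nothing is listening on the port.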
