| 1 | +--- |
| 2 | +sidebar_position: 3 |
| 3 | +--- |
| 4 | + |
| 5 | +# Profiling with pprof |
| 6 | + |
| 7 | +Arc exposes Go's built-in `net/http/pprof` profiler — heap, goroutine, CPU, allocations, blocking, mutex, and execution-trace endpoints — for diagnosing memory pressure, hot CPU paths, goroutine leaks, and deadlocks in production. The endpoints are **opt-in** and bound to `localhost` by default; exposing them anywhere else requires a deliberate two-step configuration. |
| 8 | + |
| 9 | +:::info Available since v26.06.1 |
| 10 | +The opt-in pprof listener ships in Arc v26.06.1 ([PR #443](https://github.com/Basekick-Labs/arc/pull/443), [GHSA-j93g-rp6m-j32m](https://github.com/Basekick-Labs/arc/security/advisories/GHSA-j93g-rp6m-j32m)). Prior versions registered pprof on the public API port without authentication — upgrade and adopt the env-var gate below. |
| 11 | +::: |
| 12 | + |
| 13 | +:::danger Production exposure is hostile by default |
| 14 | +A reachable `/debug/pprof/*` endpoint leaks process internals: in-flight SQL strings and msgpack records (via heap dumps), goroutine stacks, and environment variables on some Go versions. It also lets any caller pin a CPU core for `N` seconds via `/debug/pprof/profile?seconds=N`. Treat the pprof listener like a root shell: bind to loopback, restrict by firewall, and turn it off when you're done debugging. |
| 15 | +::: |
| 16 | + |
| 17 | +## Why pprof Is Off by Default |
| 18 | + |
| 19 | +Pre-v26.06.1, `/debug/pprof/*` was mounted on Arc's public Fiber app — no token, no allowlist. An unauthenticated network caller could fetch heap dumps containing recent query text and ingested records. The hardening PR removed pprof from the public app entirely and moved it to a separate listener that only starts when the `ARC_DEBUG_PPROF` env var is set. |
| 20 | + |
| 21 | +The new design has three properties: |
| 22 | + |
| 23 | +1. **Off by default**: with `ARC_DEBUG_PPROF` unset, no socket is opened, no goroutine is spawned, and the endpoints don't exist in Arc's process. |
| 24 | +2. **Loopback-bound by default**: even with `ARC_DEBUG_PPROF=1`, the listener binds to `127.0.0.1:6060` unless you explicitly override it. |
| 25 | +3. **Two-step opt-in for non-loopback**: binding to any non-loopback address (`0.0.0.0:6060`, a public IP, etc.) requires both `ARC_DEBUG_PPROF_ADDR` and `ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1`, so a typo in the bind address can't accidentally expose the endpoint to other hosts. |
| 26 | + |
| 27 | +## Configuration |
| 28 | + |
| 29 | +All configuration is via environment variables — pprof is a debugging surface, not a runtime feature, so there's no `[debug]` block in `arc.toml`. |
| 30 | + |
| 31 | +| Variable | Default | Description | |
| 32 | +|---|---|---| |
| 33 | +| `ARC_DEBUG_PPROF` | unset (off) | Set to `1`, `true`, `yes`, or `on` to enable the pprof listener. Any other value (including unset) leaves it off. | |
| 34 | +| `ARC_DEBUG_PPROF_ADDR` | `127.0.0.1:6060` | Bind address for the pprof listener. Accepts any form `net.Listen("tcp", …)` accepts — `127.0.0.1:6060`, `localhost:6060`, `[::1]:6060`, `0.0.0.0:6060`, etc. | |
| 35 | +| `ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK` | unset (off) | Required when `ARC_DEBUG_PPROF_ADDR` is non-loopback. Set to `1`/`true`/`yes`/`on`. Without it, Arc logs an error and refuses to start the pprof listener. | |
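The gate described by the three variables above can be sketched as a small preflight check to run before starting Arc. This is an illustration of the documented behaviour, not Arc's actual implementation: the function names (`is_truthy`, `is_loopback_addr`, `pprof_gate`) are hypothetical, the truthy match is assumed to be exact lowercase, and the loopback test is a simplified string match rather than real address resolution.

```bash
# Hypothetical preflight helpers; they mirror the documented gate, not Arc's source.

# Truthy values per the table above. Assumption: exact lowercase match only.
is_truthy() {
  case "${1:-}" in
    1|true|yes|on) return 0 ;;
    *) return 1 ;;
  esac
}

# Simplified loopback test: strip the trailing ":port" and compare the host part.
is_loopback_addr() {
  gate_host="${1%:*}"
  case "$gate_host" in
    localhost|127.*|"[::1]") return 0 ;;
    *) return 1 ;;
  esac
}

# Prints one of: off, loopback, refused, exposed.
pprof_gate() {
  gate_enabled="${1:-}"
  gate_addr="${2:-127.0.0.1:6060}"   # documented default bind address
  gate_allow="${3:-}"
  if ! is_truthy "$gate_enabled"; then echo "off"; return 0; fi
  if is_loopback_addr "$gate_addr"; then echo "loopback"; return 0; fi
  if is_truthy "$gate_allow"; then echo "exposed"; else echo "refused"; fi
}

# Usage against the current environment:
#   pprof_gate "${ARC_DEBUG_PPROF:-}" "${ARC_DEBUG_PPROF_ADDR:-}" "${ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK:-}"
```

A result of `exposed` is the case that calls for a firewall rule before Arc starts; `refused` means the listener will not start until the second opt-in is set.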
| 36 | + |
| 37 | +## Enabling pprof on a Single Node |
| 38 | + |
| 39 | +The common case is profiling a single production node from your laptop over an SSH port-forward, with the listener left on loopback: |
| 40 | + |
| 41 | +```bash |
| 42 | +# On the node you want to profile: |
| 43 | +ARC_DEBUG_PPROF=1 ./arc |
| 44 | +``` |
| 45 | + |
| 46 | +Arc emits a startup warning: |
| 47 | + |
| 48 | +``` |
| 49 | +WARN ARC_DEBUG_PPROF is set — pprof endpoints are exposed on this address. |
| 50 | + Restrict access via firewall or unset ARC_DEBUG_PPROF in production. |
| 51 | + addr=127.0.0.1:6060 |
| 52 | +``` |
| 53 | + |
| 54 | +From your laptop, SSH-tunnel the port: |
| 55 | + |
| 56 | +```bash |
| 57 | +ssh -L 6060:127.0.0.1:6060 user@node |
| 58 | +``` |
| 59 | + |
| 60 | +Then point `go tool pprof` at `localhost:6060` on your laptop. See [Profiling Workflows](#profiling-workflows) below. |
| 61 | + |
| 62 | +### With docker-compose |
| 63 | + |
| 64 | +```yaml |
| 65 | +services: |
| 66 | + arc-writer: |
| 67 | + image: basekick/arc:latest |
| 68 | + environment: |
| 69 | + ARC_DEBUG_PPROF: "1" |
| 70 | + # No host port mapping for 6060 — the listener stays inside the container. |
| 71 | + # Use `docker exec` or a sidecar to reach it. |
| 72 | +``` |
| 73 | + |
| 74 | +To reach the in-container listener: |
| 75 | + |
| 76 | +```bash |
| 77 | +docker compose exec -T arc-writer wget -qO /tmp/heap.pprof http://127.0.0.1:6060/debug/pprof/heap |
| 78 | +docker compose cp arc-writer:/tmp/heap.pprof ./heap.pprof |
| 79 | +go tool pprof -http=:8080 heap.pprof |
| 80 | +``` |
| 81 | + |
| 82 | +### With Kubernetes |
| 83 | + |
| 84 | +```yaml |
| 85 | +env: |
| 86 | + - name: ARC_DEBUG_PPROF |
| 87 | + value: "1" |
| 88 | +``` |
| 89 | + |
| 90 | +Then port-forward: |
| 91 | + |
| 92 | +```bash |
| 93 | +kubectl port-forward arc-writer-0 6060:6060 |
| 94 | +``` |
| 95 | + |
| 96 | +`kubectl port-forward` listens only on your local machine, so the pprof endpoint stays loopback-bound both in the Arc pod and on your laptop, with no cluster-network exposure. |
| 97 | + |
| 98 | +## Exposing pprof Cross-Host (Discouraged) |
| 99 | + |
| 100 | +There are cases where loopback isn't enough — for example, a remote profiler that can't open an SSH tunnel, or a multi-tenant box where the operator workstation isn't on the Arc host. Arc supports this with a deliberate two-step opt-in: |
| 101 | + |
| 102 | +```bash |
| 103 | +ARC_DEBUG_PPROF=1 \ |
| 104 | +ARC_DEBUG_PPROF_ADDR=0.0.0.0:6060 \ |
| 105 | +ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1 \ |
| 106 | +./arc |
| 107 | +``` |
| 108 | + |
| 109 | +Without `ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1`, Arc logs an **error** and refuses to start the pprof listener — the rest of Arc continues to run normally, but pprof stays off: |
| 110 | + |
| 111 | +``` |
| 112 | +ERROR ARC_DEBUG_PPROF=1 with a non-loopback ARC_DEBUG_PPROF_ADDR requires |
| 113 | + ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1; refusing to start pprof listener |
| 114 | + addr=0.0.0.0:6060 |
| 115 | +``` |
| 116 | + |
| 117 | +When the second opt-in IS set and Arc binds to a non-loopback address, the startup log line is escalated to **error** level (instead of warn) so default alerting policies notice the cross-host exposure on this node: |
| 118 | + |
| 119 | +``` |
| 120 | +ERROR ARC_DEBUG_PPROF is set — pprof endpoints are exposed on this address. |
| 121 | + Restrict access via firewall or unset ARC_DEBUG_PPROF in production. |
| 122 | + addr=0.0.0.0:6060 |
| 123 | +``` |
| 124 | + |
| 125 | +:::danger Firewall is mandatory in this mode |
| 126 | +The pprof listener has no authentication. Anyone who can reach `0.0.0.0:6060` (or whatever address you bound) can fetch heap dumps containing recent query text and ingested records, dump goroutine stacks, and pin CPU cores. Restrict by network ACL, security group, or iptables before turning this on. Unset all three env vars the moment you're done. |
| 127 | +::: |
| 128 | + |
| 129 | +## Profiling Workflows |
| 130 | + |
| 131 | +Once the listener is reachable at `http://localhost:6060` (whether direct or via SSH/kubectl port-forward), `go tool pprof` does the rest. The recipes below assume Go 1.20+. |
| 132 | + |
| 133 | +### Heap (Memory) |
| 134 | + |
| 135 | +The most common case — Arc's RSS is high and you want to know what's holding it. |
| 136 | + |
| 137 | +```bash |
| 138 | +# Live snapshot: |
| 139 | +go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap |
| 140 | + |
| 141 | +# Save for later analysis: |
| 142 | +curl -o heap.pprof http://localhost:6060/debug/pprof/heap |
| 143 | +go tool pprof -http=:8080 heap.pprof |
| 144 | +``` |
| 145 | + |
| 146 | +The `-http=:8080` flag launches the interactive web UI at `http://localhost:8080` — flame graph, top callers, source view. Without it you get the CLI prompt. |
| 147 | + |
| 148 | +Common starting commands at the pprof CLI prompt: |
| 149 | + |
| 150 | +``` |
| 151 | +(pprof) top20 # top 20 functions by in-use bytes they allocated directly (flat) |
| 152 | +(pprof) top20 -cum # top 20 functions by cumulative bytes (function + callees) |
| 153 | +(pprof) list <func> # source-level breakdown of one function |
| 154 | +``` |
| 155 | + |
| 156 | +### CPU Profile |
| 157 | + |
| 158 | +Capture 30 seconds of CPU activity: |
| 159 | + |
| 160 | +```bash |
| 161 | +go tool pprof -http=:8080 'http://localhost:6060/debug/pprof/profile?seconds=30' |
| 162 | +``` |
| 163 | + |
| 164 | +The `seconds` parameter is configurable; 30s is a reasonable default. **Don't go above ~300s** unless you know what you're doing: each in-flight capture holds a connection open and adds sampling overhead for its full duration. Arc's pprof listener enforces a 10-minute write timeout as the hard ceiling. |
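One way to keep captures within the ~300s guidance is to build the URL through a small wrapper. The sketch below is illustrative: `pprof_cpu_url` is a hypothetical helper name, and clamping to 300 (rather than rejecting larger values) is a choice made for this example, not something Arc enforces.

```bash
# Hypothetical helper: build a CPU-profile URL, clamping the duration to 300s.
# Arguments: [seconds, default 30] [base URL, default http://localhost:6060]
pprof_cpu_url() {
  cpu_secs="${1:-30}"
  cpu_base="${2:-http://localhost:6060}"
  # Reject anything that is not a plain non-negative integer.
  case "$cpu_secs" in
    ''|*[!0-9]*) echo "seconds must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$cpu_secs" -lt 1 ]; then
    echo "seconds must be a positive integer" >&2
    return 1
  fi
  # Clamp to the ~300s guidance from this page.
  if [ "$cpu_secs" -gt 300 ]; then cpu_secs=300; fi
  printf '%s/debug/pprof/profile?seconds=%s\n' "$cpu_base" "$cpu_secs"
}

# Usage (requires a reachable listener):
#   go tool pprof -http=:8080 "$(pprof_cpu_url 60)"
```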
| 165 | + |
| 166 | +### Goroutines |
| 167 | + |
| 168 | +Diagnose a goroutine leak or deadlock: |
| 169 | + |
| 170 | +```bash |
| 171 | +# Summary (top goroutine call sites + counts): |
| 172 | +curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=1' | head -50 |
| 173 | + |
| 174 | +# Full stacks for every goroutine (text): |
| 175 | +curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=2' > goroutines.txt |
| 176 | + |
| 177 | +# Or via pprof for the UI: |
| 178 | +go tool pprof -http=:8080 http://localhost:6060/debug/pprof/goroutine |
| 179 | +``` |
| 180 | + |
| 181 | +A healthy idle Arc writer typically has ~50–200 goroutines (Fiber workers, WAL writer, ingest shards, compaction scheduler, Raft loops). Thousands of goroutines stuck on the same `chan receive` or `sync.Mutex.Lock` is the diagnostic signature of a stall. |
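To track the goroutine count over time, the total can be read from the first line of the `?debug=1` output, which Go's text format writes as `goroutine profile: total N`. The helpers below are a minimal sketch; the names `goroutine_total` and `goroutine_over` are hypothetical, and the threshold is whatever you consider abnormal for your node.

```bash
# Hypothetical helpers that read `goroutine?debug=1` output on stdin.

# Print the total goroutine count from the header line, or nothing if absent.
goroutine_total() {
  sed -n '1s/^goroutine profile: total \([0-9][0-9]*\).*$/\1/p'
}

# Succeed (exit 0) when the total exceeds the given limit.
goroutine_over() {
  gr_limit="$1"
  gr_total="$(goroutine_total)"
  [ -n "$gr_total" ] && [ "$gr_total" -gt "$gr_limit" ]
}

# Usage (requires a reachable listener):
#   curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=1' | goroutine_total
#   curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=1' | goroutine_over 1000 && echo "possible leak"
```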
| 182 | + |
| 183 | +### Execution Trace |
| 184 | + |
| 185 | +Captures every scheduler event for `N` seconds — useful for diagnosing latency spikes: |
| 186 | + |
| 187 | +```bash |
| 188 | +curl -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=5' |
| 189 | +go tool trace -http=:8080 trace.out |
| 190 | +``` |
| 191 | + |
| 192 | +The trace UI shows per-goroutine timelines, GC pauses, and network/syscall waits. Use sparingly — even 5 seconds of trace produces ~10–50 MB of data on a busy writer. |
| 193 | + |
| 194 | +### Block & Mutex Profiles |
| 195 | + |
| 196 | +By default these profiles are zero-rate: the Go runtime samples nothing, so the block and mutex endpoints return effectively empty profiles. Enabling them requires calling `runtime.SetBlockProfileRate` / `runtime.SetMutexProfileFraction` from inside Arc, which is not currently exposed via an env var. If you need block/mutex profiles, open an issue describing the problem you're chasing and we'll add the knobs. |
| 197 | + |
| 198 | +## Operational Notes |
| 199 | + |
| 200 | +### Startup Logging |
| 201 | + |
| 202 | +When `ARC_DEBUG_PPROF` is unset, Arc emits nothing at startup about pprof. The listener is genuinely absent — no port, no handlers, no log noise. |
| 203 | + |
| 204 | +When set, a single warn-level (loopback) or error-level (non-loopback) line names the bind address and reminds you to restrict access. Grep for `ARC_DEBUG_PPROF is set` in your logs to find nodes that left it on accidentally. |
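The grep can be turned into a quick audit that also reports which mode a node is in. This sketch assumes the level token (`WARN` or `ERROR`) appears on the same line as the message, as in the sample log lines on this page; your log format may differ. The function name `pprof_exposure` is hypothetical.

```bash
# Hypothetical audit helper: reads log text on stdin and prints
# "none", "loopback", "non-loopback", or "unknown".
pprof_exposure() {
  # Keep only the most recent matching startup line.
  exp_line="$(grep 'ARC_DEBUG_PPROF is set' | tail -n 1 || true)"
  case "$exp_line" in
    '')      echo "none" ;;
    *ERROR*) echo "non-loopback" ;;   # escalated level means a non-loopback bind
    *WARN*)  echo "loopback" ;;
    *)       echo "unknown" ;;
  esac
}

# Usage, assuming Arc's output was captured to a file named arc.log:
#   pprof_exposure < arc.log
```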
| 205 | + |
| 206 | +### Shutdown Behavior |
| 207 | + |
| 208 | +Arc registers pprof with the same shutdown priority as the main HTTP server. On `SIGTERM` / `SIGINT`, the pprof listener closes **immediately** — in-flight captures (especially long `/debug/pprof/profile?seconds=N` requests) are aborted. This is deliberate: a long pprof capture would otherwise hold the cluster's shared shutdown budget and risk skipping downstream hooks (WAL flush, storage close, auth close), which is a data-loss path on what the operator expected to be a graceful exit. |
| 209 | + |
| 210 | +If your capture was killed by shutdown, just re-run it after Arc restarts. |
| 211 | + |
| 212 | +### Port Conflicts |
| 213 | + |
| 214 | +If the configured bind address is already in use, Arc logs an **error** and continues without the pprof listener — Arc itself doesn't fail to start. Look for: |
| 215 | + |
| 216 | +``` |
| 217 | +ERROR ARC_DEBUG_PPROF=1 but failed to bind pprof listener; continuing without pprof |
| 218 | + addr=127.0.0.1:6060 error="listen tcp 127.0.0.1:6060: bind: address already in use" |
| 219 | +``` |
| 220 | + |
| 221 | +Common causes: |
| 222 | +- A previous Arc process didn't release the port (`lsof -nP -iTCP:6060`). |
| 223 | +- Another Go service on the host already runs pprof on `:6060` (the Go-runtime convention). |
| 224 | +- A non-Arc service grabbed the port. |
| 225 | + |
| 226 | +Resolve the conflict and restart Arc, or set `ARC_DEBUG_PPROF_ADDR` to a different port. |
| 227 | + |
| 228 | +## Security Checklist |
| 229 | + |
| 230 | +Before enabling pprof on a production node: |
| 231 | + |
| 232 | +- [ ] `ARC_DEBUG_PPROF_ADDR` is loopback (default) **or** the host is firewalled to allow only your jumphost / operator workstation. |
| 233 | +- [ ] If non-loopback, `ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1` is set deliberately (not by env-var inheritance from a parent process). |
| 234 | +- [ ] You have a plan to unset `ARC_DEBUG_PPROF` when the investigation is done — pprof should not be left on indefinitely. |
| 235 | +- [ ] On Kubernetes / docker-compose, the pprof port is **not** in the service's port list or compose `ports:` block — only reachable via `kubectl port-forward` or `docker exec`. |
| 236 | +- [ ] Captures you save (`heap.pprof`, `goroutines.txt`, `trace.out`) are treated as sensitive: they can contain in-flight query text and ingested records. Don't paste them into public issues; share via your team's secure channel. |
| 237 | + |
| 238 | +## Reference |
| 239 | + |
| 240 | +- Source: [`cmd/arc/debug_pprof.go`](https://github.com/Basekick-Labs/arc/blob/main/cmd/arc/debug_pprof.go) — the listener and the two-step gate. |
| 241 | +- PR that introduced the gate: [#443](https://github.com/Basekick-Labs/arc/pull/443). |
| 242 | +- Advisory: [GHSA-j93g-rp6m-j32m](https://github.com/Basekick-Labs/arc/security/advisories/GHSA-j93g-rp6m-j32m). |
| 243 | +- Upstream Go docs: [`net/http/pprof`](https://pkg.go.dev/net/http/pprof) and [`runtime/pprof`](https://pkg.go.dev/runtime/pprof). |