docs/GRAFANA-LOGGING.md: 9 additions & 9 deletions
@@ -331,22 +331,22 @@ With `.env` containing `SIMSTEWARD_LOKI_URL`, `SIMSTEWARD_LOKI_USER`, and `SIMST
 | Command | Purpose |
 |---------|---------|
-| `npm run loki:query` | One-off `GET .../loki/api/v1/query_range` via `scripts/query-loki-once.mjs`. Flags: `--query` (LogQL), `--limit`, `--lookback` (seconds). |
-| `npm run env:run -- <command>` | Load `.env` into the child process (e.g. `npm run env:run -- pwsh -NoProfile -File scripts/poll-loki.ps1`). |
-| `npm run obs:poll` | Tail-style poll, direct Loki (default). |
-| `npm run obs:poll:grafana` | Same, but `-ViaGrafana` (Bearer → Grafana proxy → Loki). |
-| `npm run obs:poll:grafana:env` | Same as `obs:poll:grafana` but injects `.env` with `dotenv-cli` first (secrets only in the child process). |
+| `pnpm run loki:query` | One-off `GET .../loki/api/v1/query_range` via `scripts/query-loki-once.mjs`. Flags: `--query` (LogQL), `--limit`, `--lookback` (seconds). |
+| `pnpm run env:run -- <command>` | Load `.env` into the child process (e.g. `pnpm run env:run -- pwsh -NoProfile -File scripts/poll-loki.ps1`). |
+| `pnpm run obs:poll` | Tail-style poll, direct Loki (default). |
+| `pnpm run obs:poll:grafana` | Same, but `-ViaGrafana` (Bearer → Grafana proxy → Loki). |
+| `pnpm run obs:poll:grafana:env` | Same as `obs:poll:grafana` but injects `.env` with `dotenv-cli` first (secrets only in the child process). |

-**Path A (direct Loki):** `SIMSTEWARD_LOKI_*` + `npm run loki:query` or `npm run obs:poll`.
+**Path A (direct Loki):** `SIMSTEWARD_LOKI_*` + `pnpm run loki:query` or `pnpm run obs:poll`.

 **Path B (Grafana Cloud, elevated `glsa_*` Bearer):** Set **`GRAFANA_URL`** to your stack (`https://<slug>.grafana.net` — **not** `logs-prod-*.grafana.net`). Set **`GRAFANA_LOKI_DATASOURCE_UID`** to the Loki datasource UID in that stack (Connections → Data sources). Set **`GRAFANA_API_TOKEN`** *or* **`CURSOR_ELEVATED_GRAFANA_TOKEN`** (service account token with permission to query the Loki datasource via the proxy). Then:
-`poll-loki.ps1` reads `.env` from disk; `*:env` npm scripts add `dotenv -e .env` so variables are also loaded for the child process without exporting them in the shell.
+`poll-loki.ps1` reads `.env` from disk; `*:env` pnpm scripts add `dotenv -e .env` so variables are also loaded for the child process without exporting them in the shell.

 **401/403:** On Path A, the `glc_*` policy may lack Loki **read**. On Path B, ensure the Bearer token can query datasources; check datasource UID and stack URL.
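Paths A and B differ mainly in the base URL and the auth header. The following is a minimal sketch of that split, not the contents of `scripts/query-loki-once.mjs`. The helper `buildLokiRequest` and its option names are hypothetical. It assumes that Path B goes through Grafana's `/api/datasources/proxy/uid/<uid>/` route, that Path A uses Basic auth, and that `SIMSTEWARD_LOKI_URL` holds a base URL without the API path.

```javascript
// Hypothetical helper (not from the repo): builds a query_range request
// for Path A (direct Loki) or Path B (Grafana datasource proxy).
function buildLokiRequest(env, opts) {
  const { query, limit = 100, lookbackSeconds = 3600, viaGrafana = false, nowMs = Date.now() } = opts;

  // Loki accepts start/end as Unix epoch nanoseconds; BigInt avoids precision loss.
  const endNs = BigInt(nowMs) * 1000000n;
  const startNs = endNs - BigInt(lookbackSeconds) * 1000000000n;
  const params = new URLSearchParams({
    query,
    limit: String(limit),
    start: startNs.toString(),
    end: endNs.toString(),
  });

  const stripSlash = (u) => u.replace(/\/+$/, "");

  if (viaGrafana) {
    // Path B: Bearer token -> Grafana proxy -> Loki (proxy route is an assumption).
    const token = env.GRAFANA_API_TOKEN || env.CURSOR_ELEVATED_GRAFANA_TOKEN;
    const base = stripSlash(env.GRAFANA_URL);
    const uid = encodeURIComponent(env.GRAFANA_LOKI_DATASOURCE_UID);
    return {
      url: `${base}/api/datasources/proxy/uid/${uid}/loki/api/v1/query_range?${params}`,
      headers: { Authorization: `Bearer ${token}` },
    };
  }

  // Path A: direct Loki with Basic auth (user:token), assumed here.
  const basic = Buffer.from(`${env.SIMSTEWARD_LOKI_USER}:${env.SIMSTEWARD_LOKI_TOKEN}`).toString("base64");
  return {
    url: `${stripSlash(env.SIMSTEWARD_LOKI_URL)}/loki/api/v1/query_range?${params}`,
    headers: { Authorization: `Basic ${basic}` },
  };
}
```

Passing the result to `fetch(url, { headers })` would run the query. A 401 or 403 response then points at the token, datasource UID, or stack URL, as the troubleshooting note above describes.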
docs/README.md: 2 additions & 2 deletions
@@ -27,7 +27,7 @@ Editing files outside the **SimHub rule doc allowlist** does not attach the full
 - **Workspace:** Open this repo as a **single-folder** Cursor workspace rooted at `simhub-plugin` so search and tooling are not mixed with unrelated paths (other clones, AppData, etc.).
 - **ContextStream project:** Keep the ContextStream **project path** aligned with that same folder so `ingest_local` / MCP index the intended tree.
-- **Corpus hygiene:** [`.cursorignore`](../.cursorignore) trims noise for Cursor; after changing ignore rules or large doc/code moves, run a **forced** ContextStream ingest (`npm run contextstream:ingest:force` — see [.cursor/skills/contextstream/SKILL.md](../.cursor/skills/contextstream/SKILL.md)).
+- **Corpus hygiene:** [`.cursorignore`](../.cursorignore) trims noise for Cursor; after changing ignore rules or large doc/code moves, run a **forced** ContextStream ingest (`pnpm run contextstream:ingest:force` — see [.cursor/skills/contextstream/SKILL.md](../.cursor/skills/contextstream/SKILL.md)).
 - **Structural graph:** ContextStream **code graph** may not expose C# module edges; use keyword/semantic `search` plus the **Code map** in [ARCHITECTURE.md](ARCHITECTURE.md) for navigation.

 ---
@@ -38,7 +38,7 @@ Editing files outside the **SimHub rule doc allowlist** does not attach the full
 |-----|----------|
 | [USER-FEATURES-PM.md](USER-FEATURES-PM.md) | PM-style user features (12 flows), connections, vision vs shipped vs [PRODUCT-FLOW.md](PRODUCT-FLOW.md) |
 | [USER-FLOWS.md](USER-FLOWS.md) | Step-by-step user journeys through today's UI (mermaid diagrams); PM issues and flow gaps |
-| [observability-local.md](observability-local.md) | Local Grafana/Loki stack, npm scripts, loki-gateway |
+| [observability-local.md](observability-local.md) | Local Grafana/Loki stack, pnpm scripts, loki-gateway |
 | [observability-scaling.md](observability-scaling.md) | Many users, large grids, Loki cardinality |
 | [DATA-ROUTING-OBSERVABILITY.md](DATA-ROUTING-OBSERVABILITY.md) | OTel vs Loki vs Prometheus, ~1k-user sizing, car telemetry taxonomy |
docs/TROUBLESHOOTING.md: 2 additions & 2 deletions
@@ -142,7 +142,7 @@ If you expect SimSteward logs in Grafana (Cloud or local) but see none:
 1. **Plugin output** — The plugin writes **plugin-structured.jsonl** only (plus WebSocket to the dashboard). It does **not** batch-POST those lines to Loki in-process yet. **`deploy.ps1`** can POST a **`deploy_marker`** when **`SIMSTEWARD_LOKI_URL`** is set (see **`send-deploy-loki-marker.ps1`**). For full logs in Loki, use an external shipper to tail **plugin-structured.jsonl**.
 2. **Env metadata** — Set `SIMSTEWARD_LOKI_URL` and `SIMSTEWARD_LOG_ENV` before SimHub starts (e.g. `.env` loaded by **`deploy.ps1`** / **`run-simhub-local-observability.ps1`**) so JSON includes `loki_push_target` / `log_env`.
-3. **Local stack** — Start observability from `observability/local/` (`npm run obs:up`) so Loki (3100) and Grafana (3000) run; compose does **not** ingest **plugin-structured.jsonl** automatically.
+3. **Local stack** — Start observability from `observability/local/` (`pnpm run obs:up`) so Loki (3100) and Grafana (3000) run; compose does **not** ingest **plugin-structured.jsonl** automatically.
 4. **Auth (Grafana Cloud / gateway)** — For **deploy markers**: Grafana Cloud uses **Basic** (`SIMSTEWARD_LOKI_USER` + **`SIMSTEWARD_LOKI_TOKEN`**); local **loki-gateway** uses **Bearer `LOKI_PUSH_TOKEN`**. Push failures print in the deploy script output.
 5. **Data source in Grafana** — Point the Loki data source at your Loki URL (e.g. `http://localhost:3100` for local). Explore: `{app="sim-steward"}`.
 6. **Debug vs production** — With `SIMSTEWARD_LOG_DEBUG=1`, many more lines (e.g. `tick_stats`, `yaml_update`) are sent. For AI or production dashboards, filter with `| level != "DEBUG"` to avoid noise.
@@ -155,7 +155,7 @@ See **docs/GRAFANA-LOGGING.md** for label schema, event taxonomy, and LogQL exam
 For the full pipeline (collector, ports, Grafana datasource URL), see **docs/observability-local.md** § Canonical path and § Metrics / OTLP troubleshooting.

-1. **Nothing in Explore (Prometheus Local)** — Confirm **`npm run obs:up`** is running and **`http://localhost:9090/-/healthy`** returns OK. Smoke: **`npm run obs:poll:prometheus`**.
+1. **Nothing in Explore (Prometheus Local)** — Confirm **`pnpm run obs:up`** is running and **`http://localhost:9090/-/healthy`** returns OK. Smoke: **`pnpm run obs:poll:prometheus`**.
 2. **No `simsteward_*` metrics** — OTLP is disabled unless **`OTEL_EXPORTER_OTLP_ENDPOINT`** or **`SIMSTEWARD_OTLP_ENDPOINT`** is set **before** SimHub starts (SimHub does not load `.env` automatically). Use **`scripts/run-simhub-local-observability.ps1`** or set env in the user/session environment.
 3. **`connection refused` to port 4317** — OpenTelemetry Collector is not up or ports are not mapped; restart compose from the repo root.
 4. **Wrong protocol** — gRPC defaults for **`http://127.0.0.1:4317`**. For HTTP/protobuf on **4318**, set **`OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf`** and point the endpoint at **4318**.
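Items 2 and 4 above come down to a small environment check that can be run before starting SimHub. The sketch below is an illustration, not the plugin's actual logic: `checkOtlpEnv` is a hypothetical helper, and the precedence of `OTEL_EXPORTER_OTLP_ENDPOINT` over `SIMSTEWARD_OTLP_ENDPOINT` is an assumption.

```javascript
// Hypothetical pre-flight check (not from the repo): reports whether OTLP
// export would be enabled and flags port/protocol mismatches.
function checkOtlpEnv(env) {
  // Precedence between the two variables is assumed, not confirmed by the docs.
  const endpoint = env.OTEL_EXPORTER_OTLP_ENDPOINT || env.SIMSTEWARD_OTLP_ENDPOINT;
  if (!endpoint) {
    // Neither variable set: OTLP stays off, so no simsteward_* metrics appear.
    return { enabled: false, warnings: [] };
  }

  // gRPC is the default protocol when none is set explicitly.
  const protocol = env.OTEL_EXPORTER_OTLP_PROTOCOL || "grpc";
  const port = new URL(endpoint).port;
  const warnings = [];

  if (port === "4318" && protocol !== "http/protobuf") {
    warnings.push("Port 4318 expects OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf");
  }
  if (port === "4317" && protocol !== "grpc") {
    warnings.push("Port 4317 expects gRPC; use 4318 for http/protobuf");
  }
  return { enabled: true, endpoint, protocol, warnings };
}
```

Calling `checkOtlpEnv(process.env)` in the same shell that launches SimHub shows what the plugin process would inherit, which matters because SimHub does not load `.env` automatically.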
docs/observability-local.md: 7 additions & 7 deletions
@@ -21,30 +21,30 @@ Quick start for plugin logs in local Grafana/Loki, **optional OTLP metrics** (Op
 2. **Start the stack** (repo root):

 ```powershell
-npm run obs:up
+pnpm run obs:up
 ```

-Or copy `observability/local/.env.observability.example` → `.env.observability.local`, set passwords/tokens, then `npm run obs:up:env`. Check: `npm run obs:ps`.
+Or copy `observability/local/.env.observability.example` → `.env.observability.local`, set passwords/tokens, then `pnpm run obs:up:env`. Check: `pnpm run obs:ps`.

 3. **Configure the plugin** — SimHub does not load `.env` by default. Recommended: `.\scripts\run-simhub-local-observability.ps1` (sets `SIMSTEWARD_LOKI_URL=http://localhost:3100`, `SIMSTEWARD_LOG_ENV=local`, and OTLP for metrics — see script). Or set those in Windows user env and restart SimHub. See `.env.example` “Local Loki” and “OTLP / Prometheus (local metrics)” blocks.

 4. **Grafana** — http://localhost:3000 → Explore → Loki → `{app="sim-steward", env="local"}`. Provisioned dashboard **Sim Steward — Deploy health** (`simsteward-deploy-health`) correlates `deploy.ps1` markers (`event=deploy_marker`) with plugin bring-up and errors. Put `SIMSTEWARD_LOKI_URL` (and `LOKI_PUSH_TOKEN` if using loki-gateway) in repo **`.env`** — `deploy.ps1` loads it automatically via `scripts/load-dotenv.ps1` (optional merge: `observability/local/.env.observability.local`).

-5. **Metrics (optional)** — With the stack up, set **`OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4317`** (or use `SIMSTEWARD_OTLP_ENDPOINT`) before starting SimHub. After the plugin loads, Explore → **Prometheus Local** → e.g. `simsteward_process_cpu_percent` or `up{job="otel-collector"}`. Smoke: `npm run obs:poll:prometheus` or `.\scripts\poll-prometheus.ps1`.
+5. **Metrics (optional)** — With the stack up, set **`OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4317`** (or use `SIMSTEWARD_OTLP_ENDPOINT`) before starting SimHub. After the plugin loads, Explore → **Prometheus Local** → e.g. `simsteward_process_cpu_percent` or `up{job="otel-collector"}`. Smoke: `pnpm run obs:poll:prometheus` or `.\scripts\poll-prometheus.ps1`.

 6. **Generate traffic** — Use SimHub + web dashboard; confirm logs in **Explore** with `{app="sim-steward", env="local"}` (no repo-provisioned Grafana dashboards until you add JSON under `observability/local/grafana/provisioning/dashboards/`).

 **Storage override:** Set `GRAFANA_STORAGE_PATH` in `.env.observability.local`; compose uses `${GRAFANA_STORAGE_PATH:-S:/sim-steward-grafana-storage}`.

-**Terminal tail:** `npm run obs:poll` (direct Loki :3100) or `npm run obs:poll:grafana` / `.\scripts\poll-loki.ps1 -ViaGrafana` using **GRAFANA_API_TOKEN** (or admin user/password) in repo `.env` — same path Grafana Explore uses (`loki_local` datasource). **Prometheus:** `npm run obs:poll:prometheus` / `.\scripts\poll-prometheus.ps1`.
+**Terminal tail:** `pnpm run obs:poll` (direct Loki :3100) or `pnpm run obs:poll:grafana` / `.\scripts\poll-loki.ps1 -ViaGrafana` using **GRAFANA_API_TOKEN** (or admin user/password) in repo `.env` — same path Grafana Explore uses (`loki_local` datasource). **Prometheus:** `pnpm run obs:poll:prometheus` / `.\scripts\poll-prometheus.ps1`.

 ---

 ## Housekeeping: wipe dashboards’ data (local)

 To **clear Loki chunks/WAL**, optional **Prometheus TSDB**, and optional Grafana bind-mount state **without** changing compose, `loki-config.yml`, datasource provisioning, `LOKI_PUSH_TOKEN`, or `SIMSTEWARD_LOKI_*`:

-1. From repo root, run **`npm run obs:wipe -- -Force`** (clears the `loki` and **`prometheus`** subdirectories under `GRAFANA_STORAGE_PATH`).
+1. From repo root, run **`pnpm run obs:wipe -- -Force`** (clears the `loki` and **`prometheus`** subdirectories under `GRAFANA_STORAGE_PATH`).
 2. Optional flags: **`-Grafana`** (wipes `grafana.db`; re-run `scripts/grafana-bootstrap.ps1` if you use `GRAFANA_API_TOKEN`), **`-SampleLogs`** (clears `observability/local/sample-logs/*` files), or **`-All`** for both.
@@ -101,9 +101,9 @@ The stack publishes these **host** ports together; any other process (or second
 ### Metrics / OTLP troubleshooting

-- **`up{job="otel-collector"} == 0`** — Prometheus cannot reach the collector on `otel-collector:8889` (compose network). Confirm `otel-collector` is running: `npm run obs:ps`.
+- **`up{job="otel-collector"} == 0`** — Prometheus cannot reach the collector on `otel-collector:8889` (compose network). Confirm `otel-collector` is running: `pnpm run obs:ps`.
 - **No `simsteward_*` series** — OTLP is off until **`OTEL_EXPORTER_OTLP_ENDPOINT`** or **`SIMSTEWARD_OTLP_ENDPOINT`** is set **before** SimHub starts. Use **`http://127.0.0.1:4317`** for gRPC; for port **4318** set **`OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf`**.
-- **Connection refused on 4317** — Collector not started or ports not published; run `npm run obs:up` from repo root.
+- **Connection refused on 4317** — Collector not started or ports not published; run `pnpm run obs:up` from repo root.
 - **Grafana Prometheus query errors** — Datasource must be **`http://prometheus:9090`** (container DNS), not `localhost:9090`.
 - **Loki remains authoritative** for `host_resource_sample` until you rely on Prom-only SLOs; metrics duplicate CPU/working set at OTLP export cadence.