# dx_evidence_graph

A Pixie UI dashboard that renders one dx-agent investigation as a
**severity-weighted, all-protocol pod-to-pod attack graph**. For
security work it replaces the latency-weighted HTTP service map in
`cluster_overview`.

* Nodes are pods. When the pod is unknown, a node falls back to the
  service and then to the IP, mirroring `net_flow_graph`.
* Edges are the attack path emitted by dx (delivery → egress →
  execution → collection → exfil → pivot).
* Display spec: `vispb.Graph` with **`edgeWeightColumn = weight`**
  (the open-ended UInt16 sum of CRS severity, drawn as edge thickness)
  and **`edgeColorColumn = max_severity`** (a discrete 2-5 heat value,
  drawn as edge colour).
* Read source: `forensic_db.dx_attack_graph`, read through the
  `clickhouse_dsn` kwarg of `px.DataFrame`
  (`src/carnot/planner/objects/dataframe.cc:43`).
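
The node fallback and the two edge columns can be sketched in plain Python. PxL is a Python dialect, but this sketch deliberately avoids the `px` module so it runs standalone. `node_label` and `edge_style` are hypothetical names. The input to `edge_style` is assumed to be the list of per-criterion CRS severities for one hop. Saturating `weight` at the UInt16 maximum is an assumption rather than documented behaviour.

```python
def node_label(pod: str, service: str, ip: str) -> str:
    """Return the display label for one end of a hop.

    Hypothetical helper: prefers the pod (`ns/pod`), then the service,
    then the raw peer IP. Empty strings mean "unknown", matching the
    `""` convention used in dx_attack_graph.
    """
    for candidate in (pod, service, ip):
        if candidate:
            return candidate
    return "<unknown>"  # assumption: placeholder when nothing is known


def edge_style(severities: list[int]) -> tuple[int, int]:
    """Collapse per-criterion severities on one hop into (weight, max_severity).

    weight is the sum (edge thickness); max_severity is the largest
    single value (edge colour). Clamping weight to the UInt16 maximum
    is an assumption, not something the schema specifies.
    """
    if not severities:
        raise ValueError("a hop needs at least one criterion severity")
    weight = min(sum(severities), 0xFFFF)
    return weight, max(severities)
```

For example, `node_label("", "payments", "10.0.0.7")` returns `"payments"`, and `edge_style([3, 5, 2])` returns `(10, 5)`: a thickness-10 edge coloured at severity 5.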

## Schema: `forensic_db.dx_attack_graph`

Locked with dx-agent in PR #62 / `entlein/dx#68`. The
`attackgraph.Edge` Go struct is the single source of truth for the
JSON wire format, the ClickHouse row, and the test fixture.

| Column | Type | Role |
|---|---|---|
| `investigation_id` | String | one graph per dx verdict or pivot incident (UI filter key) |
| `ts` | UInt64 | Unix timestamp in nanoseconds |
| `requestor_pod` / `responder_pod` | String | the hop endpoints (`ns/pod`); `""` if only an IP is known |
| `requestor_service` / `responder_service` | String | service of each endpoint; first fallback node label |
| `requestor_ip` / `responder_ip` | String | peer IP, used when the pod is unresolved |
| `weight` | UInt16 | Σ CRS severity on the hop; `edgeWeightColumn` |
| `max_severity` | UInt8 | highest single-criterion severity (2-5); `edgeColorColumn` |
| `confidence` | Float32 | verdict confidence |
| `edge_kind` | String | `delivery` / `egress` / `execution` / `collection` / `exfil` / `pivot` |
| `condition` / `criteria` | String | ruled-in condition and its criterion label(s) |
| `num_findings` | UInt32 | |
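
`attackgraph.Edge` fixes one shape for both the JSON wire format and the ClickHouse row. For that reason, a consumer-side mirror of a row can be handy in tests. The sketch below is a hypothetical Python dataclass, not the Go struct. It assumes the JSON keys equal the column names, which this README does not state. The validation only mirrors the ranges in the table.

```python
import json
from dataclasses import asdict, dataclass

# Allowed edge_kind values, in attack-path order (from the table above).
EDGE_KINDS = ("delivery", "egress", "execution", "collection", "exfil", "pivot")


@dataclass
class AttackGraphRow:
    """Hypothetical Python mirror of one forensic_db.dx_attack_graph row.

    The authoritative definition is the Go attackgraph.Edge struct; this
    sketch reuses the column names and assumes JSON keys match them.
    """
    investigation_id: str
    ts: int  # Unix nanoseconds (UInt64)
    requestor_pod: str
    responder_pod: str
    requestor_service: str
    responder_service: str
    requestor_ip: str
    responder_ip: str
    weight: int  # UInt16, sum of CRS severity on the hop
    max_severity: int  # UInt8, documented range 2-5
    confidence: float
    edge_kind: str
    condition: str
    criteria: str
    num_findings: int

    def __post_init__(self) -> None:
        # Light validation mirroring the documented column ranges.
        if self.edge_kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge_kind {self.edge_kind!r}")
        if not 2 <= self.max_severity <= 5:
            raise ValueError(f"max_severity {self.max_severity} outside 2-5")
        if not 0 <= self.weight <= 0xFFFF:
            raise ValueError(f"weight {self.weight} does not fit UInt16")

    def to_json(self) -> str:
        # Key order is sorted so fixtures diff cleanly.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "AttackGraphRow":
        return cls(**json.loads(raw))
```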

Table DDL (mirrors the `kubescape_logs` partition/TTL convention):

```sql
CREATE TABLE forensic_db.dx_attack_graph (
    investigation_id String, ts UInt64,  -- ts: Unix nanoseconds
    requestor_pod String, responder_pod String, requestor_service String,
    responder_service String, requestor_ip String, responder_ip String,
    weight UInt16, max_severity UInt8, confidence Float32, edge_kind String,
    condition String, criteria String, num_findings UInt32)
ENGINE = MergeTree
PARTITION BY toYYYYMM(fromUnixTimestamp64Nano(ts))
ORDER BY (investigation_id, requestor_pod, responder_pod)
TTL toDateTime(fromUnixTimestamp64Nano(ts)) + INTERVAL 30 DAY DELETE;
```
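
The following Python approximation shows how one `ts` maps onto the partition and TTL expressions. It assumes a UTC server timezone, because `toYYYYMM` follows the server default. The expiry instant itself does not depend on the timezone. `partition_and_expiry` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

NANOS_PER_SECOND = 1_000_000_000
TTL = timedelta(days=30)  # matches INTERVAL 30 DAY in the DDL


def partition_and_expiry(ts: int) -> tuple[int, datetime]:
    """Approximate the DDL's PARTITION BY and TTL expressions for one ts.

    The partition month assumes a UTC server timezone. toDateTime drops
    sub-second precision, so the nanoseconds are truncated first.
    """
    seconds = ts // NANOS_PER_SECOND
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    partition = moment.year * 100 + moment.month  # toYYYYMM, e.g. 202401
    return partition, moment + TTL
```

Consider a row stamped 2024-01-31 12:00 UTC. It lands in partition `202401` and becomes eligible for deletion at 2024-03-01 12:00 UTC. ClickHouse applies TTL deletes during merges, so physical removal can lag behind that instant.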

## Per-rig ClickHouse DSN

The bundled `vis.json` ships with `clickhouse_dsn` **empty**. The
default is deliberately credential-free so that the bundle stays
portable across clusters. Operators fill in the DSN at run time via
the script-args panel in the Pixie UI.
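
The empty default is what keeps the bundle portable, so it is worth guarding. The sketch below is a minimal CI-style check. It assumes the usual Pixie `vis.json` layout: a top-level `variables` list whose entries carry `name` and an optional `defaultValue`. The function name is hypothetical.

```python
import json


def assert_dsn_default_empty(vis_json: str) -> None:
    """Fail if the bundled vis.json carries a non-empty clickhouse_dsn.

    Assumes a top-level "variables" list of {"name", "defaultValue", ...}
    entries; a missing defaultValue is treated as empty.
    """
    spec = json.loads(vis_json)
    for var in spec.get("variables", []):
        if var.get("name") != "clickhouse_dsn":
            continue
        if var.get("defaultValue", ""):
            raise ValueError("clickhouse_dsn default must stay empty in the bundle")
        return
    raise ValueError("vis.json declares no clickhouse_dsn variable")
```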

For the in-cluster soc deployment, the DSN is:

```
forensic_analyst:changeme-analyst@clickhouse-forensic-soc-db.clickhouse.svc.cluster.local:9000/forensic_db
```

`forensic_analyst` has read-only `SELECT` on `forensic_db`. It is the
same credential that the existing
`soc/analysis/px_clickhouse/kubescape/observe.pxl` script uses for
`kubescape_logs`. Override it in the UI on other rigs.
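
The DSN above follows a `user:password@host:port/database` layout. A hypothetical pre-flight parser can catch typos before a DSN is pasted into the script-args panel. It relies on `urllib.parse.urlsplit`, prefixing `//` so that the string is read as a network location. It only mirrors the example above. It makes no claim about how the Carnot planner itself parses `clickhouse_dsn`.

```python
from urllib.parse import urlsplit


def parse_dsn(dsn: str) -> dict:
    """Split a `user:password@host:port/database` DSN into its parts.

    Hypothetical sanity check; the layout is taken from the soc example.
    """
    if not dsn:
        raise ValueError("clickhouse_dsn is empty; set it in the script-args panel")
    parts = urlsplit("//" + dsn)  # leading // makes urlsplit parse a netloc
    database = parts.path.lstrip("/")
    if not (parts.username and parts.hostname and parts.port and database):
        raise ValueError("expected user:password@host:port/database")
    return {
        "user": parts.username,
        "password": parts.password or "",
        "host": parts.hostname,
        "port": parts.port,
        "database": database,
    }
```

Port 9000 in the soc DSN is ClickHouse's native TCP port. The HTTP interface defaults to 8123.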

## Deploy

Bundle build path:

1. `//src/pxl_scripts:script_bundle` walks every `*.pxl` + `vis.json`
   under `src/pxl_scripts/` and emits `bundle-oss.json`
   (`src/pxl_scripts/BUILD.bazel:34`).
2. `//src/cloud/proxy:proxy_server_image` bakes the bundle in as a
   container layer at `/bundle`
   (`src/cloud/proxy/BUILD.bazel:36`).
3. `skaffold run -f skaffold/skaffold_cloud.yaml` rebuilds the
   cloud-proxy image and applies the Deployment.
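
Step 1 picks scripts up by file pattern. It can therefore help to confirm that a new script directory contains both pieces before starting the Bazel build. The sketch below is a rough Python approximation of that walk, using a hypothetical helper name. The authoritative glob rules are in `src/pxl_scripts/BUILD.bazel`.

```python
from pathlib import Path


def bundle_candidates(root: Path) -> dict[str, list[str]]:
    """Group every *.pxl and vis.json under root by script directory.

    Hypothetical approximation of the bundle walk, for local checks only.
    """
    found: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and (path.suffix == ".pxl" or path.name == "vis.json"):
            key = path.parent.relative_to(root).as_posix()
            found.setdefault(key, []).append(path.name)
    return found
```

Running `bundle_candidates(Path("src/pxl_scripts"))` should list the `dx_evidence_graph` directory with both a `.pxl` file and `vis.json`.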

Vizier, PEM, and standalone-pem images are unaffected: this is a
UI-bundle-only change.

## Out of scope for v1

* `conn_stats` overlay (the "render the benign neighbourhood + light
  up the attack path" view). Ship the attack-path-only graph first,
  then add the join in v2 once the visual has been used on a real
  incident.
* Time anchoring relative to `ts` rather than a free-form `start_time`.
  Operators currently rely on the `-15m` default; a future widget could
  centre the window on the investigation's first `ts`.
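
The proposed time anchoring amounts to replacing a now-relative `-15m` with a window around the investigation's earliest `ts`. Below is a hypothetical sketch. It assumes a symmetric window, and its 15-minute half-width is an illustrative default rather than a decided value.

```python
from datetime import timedelta

NANOS_PER_MICRO = 1_000


def centred_window(ts_values: list[int],
                   half_width: timedelta = timedelta(minutes=15)) -> tuple[int, int]:
    """Return (start, end) in Unix nanoseconds, centred on the earliest ts.

    Hypothetical v2 behaviour; rows are not assumed to be sorted.
    """
    if not ts_values:
        raise ValueError("investigation has no rows")
    anchor = min(ts_values)
    # timedelta // timedelta gives an exact integer count of microseconds.
    half = (half_width // timedelta(microseconds=1)) * NANOS_PER_MICRO
    return anchor - half, anchor + half
```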