Merge pull request #5 from redpanda-data/adp-governance-dashboard-rewrite

micheleRP · web-flow · commit f6b37f01dcbe · 2026-04-29T08:43:33.000-06:00
docs(governance): add Governance Dashboard pages for ADP GA
diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc
@@ -54,7 +54,10 @@
 *** For Builders
 **** xref:ai-gateway:builders/discover-gateways.adoc[Discover gateways]
 * Trust & Governance
-** xref:governance:dashboard.adoc[Governance dashboard]
+** Governance dashboard
+*** xref:governance:dashboard/index.adoc[Read the overview]
+*** xref:governance:dashboard/agent-network.adoc[Agent Network]
+*** xref:governance:dashboard/violations.adoc[Authorization denials and violations]
 ** xref:governance:guardrails.adoc[Configure guardrails]
 ** xref:governance:budgets.adoc[Token budgets and limits]
 ** xref:governance:kill-switch.adoc[Kill switch]
diff --git a/modules/governance/pages/dashboard.adoc b/modules/governance/pages/dashboard.adoc
diff --git a/modules/governance/pages/dashboard/agent-network.adoc b/modules/governance/pages/dashboard/agent-network.adoc
@@ -0,0 +1,25 @@
+= Agent Network
+:description: Investigate your AI deployment with an interactive topology graph showing every agent and the resources it touches.
+:page-topic-type: how-to
+:personas: platform_admin
+// TODO: confirm persona vocabulary. PRD names HoT (Head of Trust) and CISO as primary readers for the topology graph; using canonical `platform_admin` until docs-team-standards confirms whether to add `security_admin` or equivalent.
+
+include::ROOT:partial$adp-la.adoc[]
+
+[NOTE]
+====
+*Coming at GA (2026-06-15).* The Agent Network sub-page ships in V1 of the governance dashboard. The V0 release (2026-05-15) covers the xref:dashboard/index.adoc[overview] only.
+====
+
+The *Agent Network* sub-page provides an interactive topology graph that traces every agent in your deployment through the LLM providers and MCP servers it depends on. Use it when an anomaly in the xref:dashboard/index.adoc[fleet view] needs to be traced through dependencies, or when you need to audit which agents access a sensitive resource.
+
+// TODO: shipping at GA. Fill against the V1 prototype once the Agent Network design lands per the Governance V0 PRD §"Future Versions". Sections to draft: When to use it, Node types, Edge types, Reading the graph (hover/click/right-click interactions), Runtime overlays (status + volume), Table-view alternative, Click-through to agent config, Common investigation flows, Limits.
+
+== Coming at GA
+
+This page is staged ahead of GA. The full how-to is being drafted against the V1 design and the live Agent Network surface; it ships before 2026-06-15.
+
+In the meantime, see:
+
+* xref:dashboard/index.adoc[Read the governance overview] — the V0 surface, available now.
+* xref:dashboard/violations.adoc[Authorization denials and violations] — also shipping at GA.
diff --git a/modules/governance/pages/dashboard/index.adoc b/modules/governance/pages/dashboard/index.adoc
@@ -0,0 +1,217 @@
+= Read the Governance Overview
+:description: See your AI deployment's spending, fleet, and activity in one place; drill into the transcript behind any number to investigate further.
+:page-topic-type: how-to
+:personas: evaluator, platform_admin, app_developer
+// TODO: confirm persona vocabulary. The Governance V0 PRD names HoT (Head of Trust), CIO/CFO, CISO, and FDE; this page uses canonical docs-team-standards personas (`evaluator`, `platform_admin`, `app_developer`) and surfaces the PRD names only in the "Reading the dashboard for your role" section headings. Confirm with docs-team-standards owner whether to add `executive` and `security_admin` (or equivalents) so the persona metadata matches the PRD audience exactly.
+:learning-objective-1: Identify the widgets on the governance overview and what each one shows
+:learning-objective-2: Choose where to focus first based on your role
+:learning-objective-3: Investigate a metric, agent, or event by opening its underlying transcript
+
+include::ROOT:partial$adp-la.adoc[]
+
+The governance overview shows your AI deployment's spending, fleet, and activity on a single page. Every number, agent, and event links to the transcript that produced it, so you can investigate without leaving the dashboard.
+
+After reading this page, you will be able to:
+
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+* [ ] {learning-objective-3}
+
+== Prerequisites
+
+* Access to the Agentic Data Plane.
++
+// TODO: confirm sign-in URL and IAM/role requirement once the standalone ADP UI ships.
+* At least one agent or MCP server with recent activity, or an empty deployment if you want to see the empty-state guidance below.
+* Read access to the Spending, Agent registry, and MCP server APIs (`dataplane_adp_spending_get`, `dataplane_adp_agent_get`, `dataplane_adp_mcpserver_get`).
++
+// TODO: confirm role-to-permission mapping once the standalone ADP UI ships.
+
+== Open the governance overview
+
+Sign in to the ADP UI and select *Governance* from the sidebar. The overview is the default view.
+
+// TODO: annotated screenshot of the V0 governance overview, with callouts for the five widgets (summary cards, token spend by provider, events timeline, agents table, MCP servers table). V0 prototype lands on `adp-production` 2026-05-08 per the PRD milestone.
+
+The overview shows five widgets, top to bottom:
+
+. *Summary cards* — total spend, agent count, and request count, each with a trend versus the previous period.
+. *Token spend by LLM provider* — provider-by-provider cost and request breakdown.
+. *Events over time* — time series of activity, filterable by event type.
+. *Agents* — every agent in your deployment, with status, error count, and tool invocation count.
+. *MCP servers* — every MCP server, with type and connection status.
+
+== Reading each widget
+
+=== Summary cards
+
+The summary cards answer "how much are we using right now?" at a glance.
+
+[cols="1,3"]
+|===
+|Card |What it shows
+
+|*Total spend*
+|Sum of all LLM provider costs in the current period. Reported in microcents (1 cent = 100 microcents, $1 = 10,000 microcents). Divide by 10,000 to get dollars.
+
+|*Agent count*
+|Number of agents (managed plus BYOA) registered in your deployment.
+
+|*Request count*
+|Total LLM requests routed through AI Gateway in the current period.
+
+|*Trend*
+|Each card carries a "vs last period" delta. The delta uses the same window length as the current period: a 30-day current view compares against the prior 30 days. Calculation: `(current - previous) / previous`.
+|===
+
+The current period defaults to the last 30 days. Change the time-range selector at the top of the page to inspect a shorter or longer window.
+
+// TODO: confirm time-range selector defaults and allowable windows once the V0 prototype lands.
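
The arithmetic behind the summary cards can be sketched in Python. This is a minimal sketch, not the dashboard's implementation. It assumes the conversion stated above (1 cent = 100 microcents) and that the previous period is the equal-length window ending where the current one starts. The helper names `microcents_to_dollars`, `previous_window`, and `trend` are hypothetical. Returning `None` when the previous period is zero is also an assumption, because this page does not document how the dashboard handles that case.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Per the conversion above: 100 microcents per cent, 100 cents per dollar.
MICROCENTS_PER_DOLLAR = 10_000


def microcents_to_dollars(microcents: int) -> float:
    """Convert a spend figure reported in microcents to dollars."""
    return microcents / MICROCENTS_PER_DOLLAR


def previous_window(start: datetime, end: datetime) -> tuple[datetime, datetime]:
    """Return the equal-length window that ends where the current window starts."""
    length = end - start
    return start - length, start


def trend(current: float, previous: float) -> float | None:
    """Fractional change (current - previous) / previous, or None if previous is 0."""
    if previous == 0:
        return None  # ratio is undefined; the dashboard's display for this case is not documented
    return (current - previous) / previous


# Default view: the last 30 days.
end = datetime(2026, 5, 31, tzinfo=timezone.utc)
start = end - timedelta(days=30)
prev_start, prev_end = previous_window(start, end)
print(prev_start.date(), prev_end.date())  # 2026-04-01 2026-05-01
print(microcents_to_dollars(1_250_000))    # 125.0
print(trend(125.0, 100.0))                 # 0.25
```

A `trend` of `0.25` corresponds to a 25% increase over the previous period.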
+
+=== Token spend by LLM provider
+
+Each row shows a single LLM provider (`openai`, `anthropic`, `bedrock`, …) with cost, request count, and token totals for the current period. Click a provider row to filter the events timeline and agents table to that provider.
+
+The breakdown is also available by model, by user, and by provider type. Use the dimension toggle above the chart to switch.
+
+// TODO: confirm dimension-toggle UX once the V0 prototype lands.
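
Conceptually, the provider breakdown and the dimension toggle group request records by one field and sum the totals for each group. The sketch below assumes a hypothetical `LlmRequest` record with `provider`, `model`, `user`, `cost_microcents`, and `tokens` fields. It does not describe the actual data model and omits the provider-type dimension. The model and user values are placeholders.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LlmRequest:
    # Hypothetical record shape for one request routed through AI Gateway.
    provider: str
    model: str
    user: str
    cost_microcents: int
    tokens: int


def breakdown(requests: list[LlmRequest], dimension: str) -> dict[str, dict[str, int]]:
    """Sum cost, request count, and tokens for each value of `dimension`."""
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: {"cost_microcents": 0, "requests": 0, "tokens": 0}
    )
    for r in requests:
        row = totals[getattr(r, dimension)]  # e.g. r.provider when dimension == "provider"
        row["cost_microcents"] += r.cost_microcents
        row["requests"] += 1
        row["tokens"] += r.tokens
    return dict(totals)


requests = [
    LlmRequest("openai", "model-a", "alice", 30_000, 1_200),
    LlmRequest("anthropic", "model-b", "alice", 20_000, 900),
    LlmRequest("openai", "model-c", "bob", 5_000, 2_000),
]
print(breakdown(requests, "provider")["openai"])
# {'cost_microcents': 35000, 'requests': 2, 'tokens': 3200}
```

Switching the dimension from `"provider"` to `"model"` or `"user"` regroups the same records without changing the totals.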
+
+=== Events over time
+
+A time-series chart of activity across your deployment.
+
+* For ranges shorter than 7 days, the chart uses *hourly* buckets.
+* For ranges of 7 days or more, the chart uses *daily* buckets.
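
The bucketing rule can be written as a small function. This sketch assumes that buckets start at the beginning of the selected range. This page does not say whether the dashboard aligns buckets to hour or day boundaries instead. `bucket_size` and `bucket_starts` are hypothetical names.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def bucket_size(range_length: timedelta) -> timedelta:
    """Hourly buckets for ranges shorter than 7 days, daily buckets for 7 days or more."""
    if range_length < timedelta(days=7):
        return timedelta(hours=1)
    return timedelta(days=1)


def bucket_starts(start: datetime, end: datetime) -> list[datetime]:
    """List the start time of every bucket covering [start, end)."""
    step = bucket_size(end - start)
    starts = []
    t = start
    while t < end:
        starts.append(t)
        t += step
    return starts


end = datetime(2026, 5, 8, tzinfo=timezone.utc)
print(len(bucket_starts(end - timedelta(days=1), end)))  # 24 hourly buckets
print(len(bucket_starts(end - timedelta(days=7), end)))  # 7 daily buckets
```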
+
+Use the event-type filter to focus on a specific kind of activity (request, error, violation, agent state change). Each bar links to a filtered transcript view for the time bucket.
+
+// TODO: confirm event-type filter values once the V0 prototype lands.
+
+=== Agents
+
+Every agent in your deployment, managed or BYOA. The table lists:
+
+* *Name* — the agent's resource name.
+* *Type* — managed or BYOA.
+* *Status* — the agent's current runtime state, shown as a badge:
++
+[cols="1,2"]
+|===
+|State |Display
+
+|`AGENT_STATE_RUNNING`
+|*Running*
+
+|`AGENT_STATE_STOPPED`
+|*Stopped*
+
+|`AGENT_STATE_STARTING` / `AGENT_STATE_CREATING`
+|*Starting* (transient; refresh for the latest state)
+
+|`AGENT_STATE_STOPPING` / `AGENT_STATE_DELETING`
+|*Stopping* (transient; refresh for the latest state)
+
+|`AGENT_STATE_DEGRADED`
+|*Degraded* (the row shows `state_reason`)
+
+|`AGENT_STATE_FAILED`
+|*Failed* (the row shows `state_reason`)
+|===
+
+* *Error count* — number of errored executions in the current period.
++
+// TODO: confirm error-count source (transcripts vs separate metric) once the V0 prototype lands.
+* *Tool invocations* — number of tool calls the agent has made in the current period.
++
+// TODO: confirm tool-invocation source (transcripts vs separate metric).
+* *Last active* — most recent execution timestamp.
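
The badge mapping in the status table above can be expressed as a lookup. Only the state-to-label pairs come from that table. The rest of this sketch is assumed for illustration: the helper names `status_badge` and `is_transient`, the `Unknown` fallback for unlisted states, and the `Label: reason` format.

```python
from __future__ import annotations

# State-to-label pairs from the status table.
STATE_LABELS = {
    "AGENT_STATE_RUNNING": "Running",
    "AGENT_STATE_STOPPED": "Stopped",
    "AGENT_STATE_STARTING": "Starting",
    "AGENT_STATE_CREATING": "Starting",
    "AGENT_STATE_STOPPING": "Stopping",
    "AGENT_STATE_DELETING": "Stopping",
    "AGENT_STATE_DEGRADED": "Degraded",
    "AGENT_STATE_FAILED": "Failed",
}

# Transient states: the badge may be stale, so refresh for the latest state.
TRANSIENT_STATES = {
    "AGENT_STATE_STARTING",
    "AGENT_STATE_CREATING",
    "AGENT_STATE_STOPPING",
    "AGENT_STATE_DELETING",
}

# States whose row also shows state_reason.
STATES_WITH_REASON = {"AGENT_STATE_DEGRADED", "AGENT_STATE_FAILED"}


def is_transient(state: str) -> bool:
    return state in TRANSIENT_STATES


def status_badge(state: str, state_reason: str | None = None) -> str:
    """Return the display text for a runtime state."""
    label = STATE_LABELS.get(state, "Unknown")  # "Unknown" fallback is an assumption
    if state in STATES_WITH_REASON and state_reason:
        return f"{label}: {state_reason}"  # "Label: reason" format is an assumption
    return label


print(status_badge("AGENT_STATE_CREATING"))                          # Starting
print(status_badge("AGENT_STATE_FAILED", "example failure reason"))  # Failed: example failure reason
```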
+
+Click any agent row to drill into its transcript history.
+
+=== MCP servers
+
+Every MCP server registered in your deployment. The table lists:
+
+* *Name* — the server's resource name.
+* *Type* — managed or self-managed.
+* *Status* — connection state.
+
+For more on MCP servers, see xref:mcp:index.adoc[MCP Servers].
+
+== Reading the dashboard for your role
+
+The dashboard is designed primarily for non-technical leadership, but each role has a different starting point.
+
+=== If you're a CIO or CFO
+
+Start with the *summary cards*. The total spend and trend tell you at a glance whether AI cost is moving in the expected direction.
+
+Next, scan *Token spend by LLM provider* to see which provider is the largest line item. If one provider dominates your spend, click its row to filter the agents table to the agents that use that provider. Per-agent cost within a provider arrives at GA.
+
+You should be able to answer "how much are we spending?" within 30 seconds of opening the page.
+
+=== If you're a CISO
+
+Start with the *Agents* and *MCP servers* tables. Together they list everything running in the deployment, including BYOA agents that operate outside your direct control.
+
+Scan for unfamiliar agents, agents in the *Failed* or *Degraded* state, and disconnected MCP servers. Click any anomaly to drill into its transcripts.
+
+=== If you're a Head of Trust
+
+Start with the *Agents* table sorted by error count or tool-invocation count. Outlier rows are usually where investigation begins.
+
+Use the *Events over time* chart to spot bursts of activity that don't match your expectations. Drill into a bar to see exactly what happened in that time bucket.
+
+=== If you're demoing to a prospect
+
+Walk top to bottom: summary cards, then provider breakdown, then events timeline, then agent fleet, then MCP servers. The page is designed to tell the full story of an AI deployment in under two minutes, with no setup or pre-staged data.
+
+== Investigate any number, agent, or activity
+
+Every value on this page is a link.
+
+* Click a *summary card* to filter the page to its underlying period.
+* Click a *provider row* to scope every other widget to that provider.
+* Click an *agent row* to open the agent's transcript history.
+* Click an *event bucket* to open transcripts for that time range.
+* Click an *MCP server row* to see invocations against that server.
+
+For the data model behind transcripts, see xref:observability:transcripts.adoc[Read a transcript].
+
+== Empty states
+
+The dashboard handles three common empty states:
+
+[cols="1,3"]
+|===
+|State |What you see
+
+|*No telemetry yet*
+|For a BYOA agent that has not yet streamed any transcripts. The agent row shows a *Connect telemetry* call to action that links to xref:observability:byoa-telemetry.adoc[BYOA telemetry].
+
+|*No spend recorded*
+|For fresh deployments before the first agent run. The summary cards show zeroes and a hint that data will populate after the first request.
+
+|*No agents*
+|For deployments before the first agent is created. The page links to xref:agents:create-agent.adoc[Create a declarative agent].
+|===
+
+// TODO: confirm exact CTA text and behavior once the V0 prototype lands.
+
+== What's coming at GA
+
+The V0 release (2026-05-15) ships the overview described above. At GA (2026-06-15), the dashboard adds:
+
+* *Cost over time* and *cost drill-down by agent within a provider* on the same page.
+* *Token-to-dollar conversion* on every cost figure.
+* *Agent Network* — an interactive topology graph that traces every agent through its LLM providers and MCP servers. See xref:dashboard/agent-network.adoc[Agent Network] (shipping at GA).
+* *Authorization denials and violations* — an aggregated feed of policy events. See xref:dashboard/violations.adoc[Authorization denials and violations] (shipping at GA).
+* *Kill switch* — pause an agent without removing it.
+
+// TODO: refresh this list against the V1 design when those design docs land. Per the Governance V0 PRD, V1 also adds a top-spenders table and agent-owner metadata; confirm the final V1 surface before GA.
+
+== Next steps
+
+* Investigate an anomaly: xref:observability:transcripts.adoc[Read a transcript].
+* Wire up missing telemetry: xref:observability:byoa-telemetry.adoc[BYOA telemetry].
+* Add agents to your deployment: xref:agents:create-agent.adoc[Create a declarative agent].
diff --git a/modules/governance/pages/dashboard/violations.adoc b/modules/governance/pages/dashboard/violations.adoc
@@ -0,0 +1,26 @@
+= Authorization Denials and Violations
+:description: Investigate every authorization denial, guardrail violation, and policy event across your AI deployment from a single feed.
+:page-topic-type: how-to
+:personas: platform_admin, app_developer
+// TODO: confirm persona vocabulary. PRD names CISO and HoT (Head of Trust) as primary readers for the violations feed; using canonical `platform_admin, app_developer` until docs-team-standards confirms whether to add `security_admin` or equivalent.
+
+include::ROOT:partial$adp-la.adoc[]
+
+[NOTE]
+====
+*Coming at GA (2026-06-15).* The aggregated violations feed ships in V1 of the governance dashboard. The V0 release (2026-05-15) covers the xref:dashboard/index.adoc[overview] only.
+====
+
+The violations feed aggregates every policy event in your deployment: authorization denials, guardrail violations, and other policy enforcement actions. Use it to investigate "what was blocked, when, and why," and to follow each event back to the underlying transcript.
+
+// TODO: shipping at GA. Fill against the V1 prototype once the authorization denial feed design lands per the Governance V0 PRD §"Future Versions". Sections to draft: What this view is, What appears in the feed (auth denials / guardrail violations / other policy events), Reading a violation entry (timestamp / agent / resource / reason / severity / transcript link), Filtering, Common investigation flows, Cost of violations, Empty state, Compliance export.
+
+== Coming at GA
+
+This page is staged ahead of GA. The full how-to is being drafted against the V1 design and the live violations feed; it ships before 2026-06-15.
+
+In the meantime, see:
+
+* xref:dashboard/index.adoc[Read the governance overview] — the V0 surface, available now.
+* xref:dashboard/agent-network.adoc[Agent Network] — also shipping at GA.
+* xref:observability:transcripts.adoc[Read a transcript] — for per-execution investigation today.