Skip to content

[cueweb/docs] Add per-user usage metrics (Prometheus) + Grafana dashboard #2459

Merged: ramonfigueiredo merged 2 commits into AcademySoftwareFoundation:master from ramonfigueiredo:cueweb-user-usage-metrics on Jun 23, 2026


@ramonfigueiredo ramonfigueiredo commented Jun 23, 2026


Related Issues

Main issue:

Issues related to this PR:

Summarize your change.

Instrument CueWeb end-to-end so operators can see who uses what, how often, and how fast - per user, per page/module, per action - with bounded Prometheus cardinality. Mirrors the asset-search metrics approach.

Metrics (GET /api/metrics, never gated by the authz gate):

  • cueweb_page_views_total{user,page}, cueweb_actions_total{user,action}
  • cueweb_api_requests_total{endpoint,status}, cueweb_api_request_duration_seconds{endpoint}
  • cueweb_logins_total{user}, cueweb_facility_selected_total{user,facility}

Implementation:

  • lib/metrics-service.ts: metric set + helpers + page/action allow-lists (unknown values map to "other" so cardinality stays bounded).
  • lib/track-user.ts: extractUser() resolves the user server-side (session -> X-User/X-Forwarded-User -> anonymous); the client never sets it.
  • app/api/track/route.ts + app/utils/usage_tracking.ts + components/ui/usage-tracker.tsx: client beacons for page views (on route change) and actions (via the shared accessActionApi dispatcher).
  • app/utils/gateway_server.ts handleRoute: records API request count + latency for all proxy routes; best-effort, never affects responses.
  • NEXT_PUBLIC_USAGE_TRACKING=off opts out the client beacon (the /api/metrics endpoint and server-side metrics stay enabled).
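
The allow-list idea in the implementation notes above can be sketched as follows. This is a minimal illustration, not the code in lib/metrics-service.ts: the page names in `ALLOWED_PAGES`, the `normalize` signature, and the in-memory `pageViews` map (a stand-in for a labelled counter) are all assumptions.

```typescript
// Hypothetical allow-list; the real ALLOWED_PAGES in lib/metrics-service.ts may differ.
const ALLOWED_PAGES: ReadonlySet<string> = new Set(["jobs", "hosts", "job-graph", "cuesubmit"]);

const OTHER = "other";

// Collapse any value outside the allow-list to "other", so the label can take
// at most allowed.size + 1 distinct values no matter what a client sends.
function normalize(value: unknown, allowed: ReadonlySet<string>): string {
  if (typeof value !== "string") return OTHER;
  const key = value.trim().toLowerCase();
  return allowed.has(key) ? key : OTHER;
}

// In-memory stand-in for a counter labelled by {user, page}.
const pageViews = new Map<string, number>();

function recordPageView(user: string, page: unknown): void {
  const key = `${user}|${normalize(page, ALLOWED_PAGES)}`;
  pageViews.set(key, (pageViews.get(key) ?? 0) + 1);
}
```

With this shape, two different unknown paths reported for the same user land in a single `other` series rather than creating two new ones.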

Wiring + dashboard:

  • sandbox/config/prometheus-monitoring.yml: cueweb scrape job (cueweb:3000/api/metrics).
  • sandbox/config/grafana/dashboards/cueweb-usage.json: "CueWeb User Usage" (overview, pages, actions, API p50/p90/p99 over a fixed 5m window, top-N users, $user variable).

Docs: reference (Usage metrics section + NEXT_PUBLIC_USAGE_TRACKING), developer-guide (instrumentation flow + files), and deploying-cueweb (scrape job + dashboard), with /api/metrics, Prometheus-query, and Grafana screenshots.

LLM usage disclosure

Parts of this solution's implementation were developed with assistance from Claude Opus.


coderabbitai Bot commented Jun 23, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 34e4fba3-5998-48b8-828c-5d0bc65730b3

📥 Commits

Reviewing files that changed from the base of the PR and between cb0c050 and 6435251.

📒 Files selected for processing (4)
  • cueweb/lib/metrics-service.ts
  • cueweb/lib/track-user.ts
  • docs/_docs/getting-started/deploying-cueweb.md
  • docs/_docs/reference/cueweb.md
✅ Files skipped from review due to trivial changes (2)
  • docs/_docs/reference/cueweb.md
  • docs/_docs/getting-started/deploying-cueweb.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • cueweb/lib/track-user.ts
  • cueweb/lib/metrics-service.ts

📝 Walkthrough


This PR adds a full per-user usage tracking system to CueWeb. It introduces a MetricsService singleton backed by prom-client that records page views, actions, API request counts/latency, logins, and facility selections. Client-side beacon utilities and a UsageTracker component emit fire-and-forget POSTs to a new /api/track endpoint, while the gateway proxy instruments server-side API timing. Prometheus scrape configs and a Grafana dashboard are added to the sandbox.

Changes

CueWeb Usage Tracking System

| Layer | File(s) | Summary |
|---|---|---|
| MetricsService: allow-lists, normalization, and metric registration | cueweb/lib/metrics-service.ts | Introduces ALLOWED_PAGES/ALLOWED_ACTIONS allow-lists, normalize() and statusClass() helpers, and the MetricsService singleton pre-registering all prom-client counters and a latency histogram. Adds recordPageView, recordAction, recordApiRequest, recordLogin, recordFacility public methods and preserves the back-compat registerCounter/incrementCounter API. Exports Page and Action types. |
| Server-side user extraction | cueweb/lib/track-user.ts | New module implementing extractUser(request) with NextAuth session-first resolution, X-User/X-Forwarded-User header fallback, local-part normalization, and ANONYMOUS_USER sentinel. |
| Client-side beacon utilities and UsageTracker component | cueweb/app/utils/usage_tracking.ts, cueweb/components/ui/usage-tracker.tsx | New usage_tracking.ts implements the ENABLED gate, beacon() helper (sendBeacon/fetch keepalive), pageNameForPath(), and tracking helpers for pages, actions, endpoints, facilities, and logins. New UsageTracker client component deduplicates route changes via a ref and calls trackPage() for non-login routes, rendering nothing. |
| POST /api/track endpoint | cueweb/app/api/track/route.ts | New API route that validates JSON body, normalizes kind/name, resolves user via extractUser, dispatches to the appropriate MetricsService recording method, and returns 204 on success or 400 on invalid input. |
| Gateway proxy instrumentation and action API beacon wiring | cueweb/app/utils/gateway_server.ts, cueweb/app/utils/api_utils.ts, cueweb/app/layout.tsx | handleRoute extended with shortEndpoint() helper, per-call timing, and recordApiRequest observation on both success and error paths. accessActionApi calls trackActionEndpoint fire-and-forget before POST requests. Root layout imports and renders UsageTracker. |
| Prometheus scrape configs and Grafana dashboard | sandbox/config/prometheus-monitoring.yml, sandbox/config/prometheus/prometheus.yml, sandbox/config/grafana/dashboards/cueweb-usage.json | Two Prometheus configs add a cueweb scrape job at cueweb:3000/api/metrics. New Grafana dashboard defines overview stats, pages/actions timeseries, API request/latency panels, per-user breakdowns, and a facilities panel, all filtered by a templated user variable. |
| Documentation | docs/_docs/developer-guide/cueweb-development.md, docs/_docs/getting-started/deploying-cueweb.md, docs/_docs/reference/cueweb.md | Developer guide adds a Prometheus/Grafana metrics section. Deploying guide replaces prior Next.js instrumentation instructions with endpoint-focused scraping/dashboard docs. Reference guide adds NEXT_PUBLIC_USAGE_TRACKING env var and a Usage metrics reference section. |
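
To make the POST /api/track row more concrete, here is a rough sketch of the validate-then-respond step. It is not the code in cueweb/app/api/track/route.ts: the set of accepted kinds, the 128-character name limit, and the helper names `parseBeacon` and `statusFor` are assumptions chosen for illustration.

```typescript
// Assumed subset of beacon kinds; the real route may accept others (e.g. logins).
type Kind = "page" | "action" | "facility";

interface Beacon {
  kind: Kind;
  name: string;
}

const KINDS: readonly string[] = ["page", "action", "facility"];
const MAX_NAME_LENGTH = 128; // hypothetical limit, not taken from the PR

// Return a typed beacon, or null when the body is not a usable payload.
function parseBeacon(body: unknown): Beacon | null {
  if (typeof body !== "object" || body === null) return null;
  const { kind, name } = body as Record<string, unknown>;
  if (typeof kind !== "string" || KINDS.indexOf(kind) === -1) return null;
  if (typeof name !== "string" || name.length === 0 || name.length > MAX_NAME_LENGTH) return null;
  return { kind: kind as Kind, name };
}

// 204 on success, 400 on invalid input, as described in the summary above.
function statusFor(body: unknown): 204 | 400 {
  return parseBeacon(body) === null ? 400 : 204;
}
```

Note that the user is deliberately absent from the payload type: per the PR description, identity is resolved server-side and the client never sets it.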

Sequence Diagram(s)

sequenceDiagram
  participant Browser
  participant UsageTracker
  participant usage_tracking
  participant POST_api_track as POST /api/track
  participant extractUser
  participant MetricsService

  Browser->>UsageTracker: pathname changes
  UsageTracker->>usage_tracking: trackPage(pathname)
  usage_tracking->>usage_tracking: pageNameForPath(pathname)
  usage_tracking->>POST_api_track: beacon({ kind: "page", name })
  POST_api_track->>extractUser: extractUser(request)
  extractUser-->>POST_api_track: username
  POST_api_track->>MetricsService: recordPageView(user, page)
  POST_api_track-->>Browser: 204 No Content

sequenceDiagram
  participant Client
  participant handleRoute
  participant GatewayProxy
  participant MetricsService

  Client->>handleRoute: HTTP request
  handleRoute->>handleRoute: shortEndpoint() + start timer
  handleRoute->>GatewayProxy: forward request
  GatewayProxy-->>handleRoute: response / error
  handleRoute->>MetricsService: recordApiRequest(endpoint, status, duration)
  handleRoute-->>Client: response
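
The timing flow in the second diagram could look roughly like the sketch below. It is kept synchronous with an injectable clock so it stays easy to follow and test, whereas the real handleRoute presumably awaits the proxied call. The `timed` wrapper, the `samples` array, and the `"2xx"`-style output of `statusClass` are assumptions, not the PR's actual code.

```typescript
interface ApiSample {
  endpoint: string;
  status: string;
  seconds: number;
}

// In-memory stand-in for the request counter and latency histogram.
const samples: ApiSample[] = [];

// Collapse a status code to its class (404 -> "4xx") to keep the label bounded.
function statusClass(code: number): string {
  if (!Number.isInteger(code) || code < 100 || code > 599) return "other";
  return `${Math.floor(code / 100)}xx`;
}

// Best-effort: a failure while recording must never affect the response.
function recordApiRequest(endpoint: string, code: number, seconds: number): void {
  try {
    samples.push({ endpoint, status: statusClass(code), seconds });
  } catch {
    // Swallow metric errors on purpose.
  }
}

// Time a call and record it on both the success and the error path.
function timed<T extends { status: number }>(
  endpoint: string,
  call: () => T,
  now: () => number = Date.now,
): T {
  const start = now();
  let code = 500; // assumed status for calls that throw
  try {
    const res = call();
    code = res.status;
    return res;
  } finally {
    recordApiRequest(endpoint, code, (now() - start) / 1000);
  }
}
```

Recording in `finally` is what gives the "both success and error paths" behavior: the sample is written whether the call returns or throws, and the original error still propagates to the caller.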

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • #2460: Directly implements the per-user usage metrics (Prometheus) and Grafana dashboard feature described in this issue with the same core components (MetricsService, client-side beacons, /api/track endpoint, dashboard, Prometheus integration).

Suggested reviewers

  • lithorus
  • DiegoTavares

Poem

🐰 Hoppity-hop through the metrics maze,
Beacons fired in a fire-and-forget haze,
Prometheus scrapes what the rabbit has tracked,
Page views and actions — all neatly stacked!
Grafana glows with dashboards so bright,
This bunny counts clicks from morning to night. 📊

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 47.37% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main changes: adding per-user usage metrics with Prometheus integration and a Grafana dashboard. It is specific, concise, and directly reflects the primary purpose of the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cueweb/lib/metrics-service.ts`:
- Around line 180-182: The recordFacility method is recording raw facility
values as metric labels, which creates unbounded Prometheus series cardinality
and degrades performance. Before incrementing the facilitySelected metric in the
recordFacility method, implement cardinality bounding for the facility parameter
by either limiting it to a known set of allowed values, normalizing unknown
values to a default category, or applying a safe transformation. Pass only the
bounded facility value to the this.facilitySelected.inc() call.

In `@cueweb/lib/track-user.ts`:
- Around line 41-43: The code in the header extraction logic that checks for
X-User and X-Forwarded-User headers does not validate whether the request is
coming from a trusted proxy before accepting these identity headers. This allows
callers to forge user identities in metrics. Add a trusted-proxy validation
check before accepting these headers (check the request origin or use an
existing trusted-proxy validation utility in your codebase), and only process
the localPart extraction and return when the request comes from a trusted
source; otherwise, fall back to ANONYMOUS_USER.

In `@docs/_docs/getting-started/deploying-cueweb.md`:
- Around line 895-897: The current documentation in the cueweb deploying guide
only describes the direct session-to-anonymous fallback for the user label
resolution, but does not mention the intermediate header fallback mechanism.
Update the documentation to clarify the complete user-resolution flow: first
attempt to resolve from the signed-in session, then fall back to trusted X-User
or X-Forwarded-User headers if authentication is disabled, and only resolve to
anonymous if neither session nor headers provide a user value. This ensures
readers understand the full behavior implemented in track-user.ts.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 17d6aa73-9c56-4302-9098-56dda1057a27

📥 Commits

Reviewing files that changed from the base of the PR and between 14bc5ae and cb0c050.

⛔ Files ignored due to path filters (7)
  • docs/assets/images/cueweb/cueweb_user_usage_metrics_api_metrics_endpoint1.png is excluded by !**/*.png
  • docs/assets/images/cueweb/cueweb_user_usage_metrics_api_metrics_endpoint2.png is excluded by !**/*.png
  • docs/assets/images/cueweb/cueweb_user_usage_metrics_api_metrics_endpoint3.png is excluded by !**/*.png
  • docs/assets/images/cueweb/cueweb_user_usage_metrics_grafana_charts1.png is excluded by !**/*.png
  • docs/assets/images/cueweb/cueweb_user_usage_metrics_grafana_charts2.png is excluded by !**/*.png
  • docs/assets/images/cueweb/cueweb_user_usage_metrics_grafana_charts3.png is excluded by !**/*.png
  • docs/assets/images/cueweb/cueweb_user_usage_metrics_prometheus_query.png is excluded by !**/*.png
📒 Files selected for processing (14)
  • cueweb/app/api/track/route.ts
  • cueweb/app/layout.tsx
  • cueweb/app/utils/api_utils.ts
  • cueweb/app/utils/gateway_server.ts
  • cueweb/app/utils/usage_tracking.ts
  • cueweb/components/ui/usage-tracker.tsx
  • cueweb/lib/metrics-service.ts
  • cueweb/lib/track-user.ts
  • docs/_docs/developer-guide/cueweb-development.md
  • docs/_docs/getting-started/deploying-cueweb.md
  • docs/_docs/reference/cueweb.md
  • sandbox/config/grafana/dashboards/cueweb-usage.json
  • sandbox/config/prometheus-monitoring.yml
  • sandbox/config/prometheus/prometheus.yml

Comment thread cueweb/lib/metrics-service.ts
Comment thread cueweb/lib/track-user.ts Outdated
Comment thread docs/_docs/getting-started/deploying-cueweb.md Outdated
@ramonfigueiredo


Deploy & verify

  • Deploy CueWeb and bring up the full stack (docker compose --profile all up -d).
  • End-to-end pipeline: CueWeb /api/metrics exposes all new series -> Prometheus cueweb target is up and cueweb_page_views_total is queryable -> Check the Grafana "CueWeb User Usage" dashboard.

How to test it yourself?

  1. Use CueWeb at http://localhost:3000: click around (Monitor Jobs, Monitor Hosts, View Job Graph, CueSubmit…), right-click a job -> Kill/Eat/Retry, switch facility, etc.
  2. Raw metrics: run curl -s http://localhost:3000/api/metrics | grep cueweb_ and you should see cueweb_page_views_total{user,page}, cueweb_actions_total{user,action}, cueweb_api_requests_total{endpoint,status}, cueweb_api_request_duration_seconds_*, cueweb_logins_total, and cueweb_facility_selected_total.
  3. Prometheus: http://localhost:9090 -> Status -> Targets (the cueweb target should be UP); query topk(10, sum by (page)(cueweb_page_views_total)).
  4. Grafana: http://localhost:3001 (admin/admin) or Skip login -> Dashboards -> CueWeb User Usage. Use the $user dropdown; check the Pages, Actions, API latency (p50/p90/p99), and Users rows.
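
For a quick check without Prometheus, the raw output from step 2 can also be aggregated by hand. The sketch below sums one counter by a single label, roughly what `sum by (page)(cueweb_page_views_total)` does in step 3. The `sumByLabel` helper is hypothetical, and it assumes simple label values without escaped quotes.

```typescript
// Sum a counter by one label from Prometheus text-exposition output.
// Assumes label values contain no escaped quotes or closing braces.
function sumByLabel(text: string, metric: string, label: string): Map<string, number> {
  const totals = new Map<string, number>();
  const pattern = new RegExp(`(?:^|,)${label}="([^"]*)"`);
  for (const line of text.split("\n")) {
    // Skip comments (# HELP / # TYPE) and other metrics.
    if (!line.startsWith(metric + "{")) continue;
    const close = line.lastIndexOf("}");
    if (close < 0) continue;
    const labels = line.slice(metric.length + 1, close);
    const value = Number(line.slice(close + 1).trim().split(" ")[0]);
    const match = pattern.exec(labels);
    if (!match || Number.isNaN(value)) continue;
    totals.set(match[1], (totals.get(match[1]) ?? 0) + value);
  }
  return totals;
}

// Invented sample input, shaped like /api/metrics output (not real data).
const sample = [
  "# TYPE cueweb_page_views_total counter",
  'cueweb_page_views_total{user="alice",page="jobs"} 3',
  'cueweb_page_views_total{user="bob",page="jobs"} 2',
  'cueweb_page_views_total{user="alice",page="hosts"} 1',
].join("\n");

const byPage = sumByLabel(sample, "cueweb_page_views_total", "page");
```

For the invented sample, `byPage` maps `jobs` to 5 and `hosts` to 1.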

Address review feedback on the usage-metrics instrumentation:

- metrics-service: bound the cueweb_facility_selected_total `facility` label to the deployment's configured facilities (NEXT_PUBLIC_CUEBOT_FACILITIES); anything else maps to "other", so a crafted /api/track beacon can't create unbounded Prometheus series.
- track-user: the signed-in NextAuth session stays the authoritative, non-spoofable source. The forgeable X-User / X-Forwarded-User identity headers are now honored only when CUEWEB_TRUST_IDENTITY_HEADER=true (off by default; for deployments behind a trusted reverse proxy that strips inbound copies and injects the identity), otherwise the user falls back to anonymous.
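
The facility bounding described in the first item might be sketched as below. The comma-separated format assumed for NEXT_PUBLIC_CUEBOT_FACILITIES and the helper names `parseFacilities` and `boundFacility` are illustrative guesses, not taken from the commit.

```typescript
// Parse a configured facility list. The comma-separated format is an assumption
// about NEXT_PUBLIC_CUEBOT_FACILITIES, not something confirmed by the PR text.
function parseFacilities(raw: string | undefined): Set<string> {
  const out = new Set<string>();
  for (const part of (raw ?? "").split(",")) {
    const name = part.trim().toLowerCase();
    if (name) out.add(name);
  }
  return out;
}

// Map anything outside the configured set to "other" before it becomes a label.
function boundFacility(value: unknown, allowed: Set<string>): string {
  if (typeof value !== "string") return "other";
  const key = value.trim().toLowerCase();
  return allowed.has(key) ? key : "other";
}
```

Under these assumptions a beacon carrying an arbitrary facility string can add at most one extra series, `other`, per user.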

Docs (reference + deploying-cueweb): document CUEWEB_TRUST_IDENTITY_HEADER and the full server-side user resolution order (session -> opt-in trusted header -> anonymous).
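
The resolution order (session -> opt-in trusted header -> anonymous) can be expressed as a small pure function. This is a sketch under assumptions: the `ResolveInput` shape, the pre-lower-cased header map, and the `"anonymous"` sentinel value are illustrative, and the real extractUser(request) in lib/track-user.ts works on a request object instead.

```typescript
const ANONYMOUS_USER = "anonymous"; // assumed value of the sentinel

interface ResolveInput {
  sessionUser?: string | null; // identity from the signed-in session, if any
  headers: Record<string, string | undefined>; // header names assumed lower-cased
  trustIdentityHeader: boolean; // mirrors CUEWEB_TRUST_IDENTITY_HEADER=true
}

// "alice@example.com" -> "alice"
function localPart(value: string): string {
  const at = value.indexOf("@");
  return (at >= 0 ? value.slice(0, at) : value).trim();
}

function resolveUser(input: ResolveInput): string {
  // 1. The session is authoritative and cannot be set by the caller.
  const fromSession = input.sessionUser ? localPart(input.sessionUser) : "";
  if (fromSession) return fromSession;

  // 2. Identity headers are forgeable, so honor them only when explicitly trusted.
  if (input.trustIdentityHeader) {
    const raw = input.headers["x-user"] ?? input.headers["x-forwarded-user"];
    const fromHeader = raw ? localPart(raw) : "";
    if (fromHeader) return fromHeader;
  }

  // 3. Otherwise fall back to the anonymous sentinel.
  return ANONYMOUS_USER;
}
```

With the flag off, a request that carries `X-User: mallory` and no session resolves to `anonymous`, which is the default behavior the commit describes.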
@ramonfigueiredo


@DiegoTavares / @lithorus
Ready for review!

@ramonfigueiredo ramonfigueiredo merged commit 7f22324 into AcademySoftwareFoundation:master Jun 23, 2026
27 checks passed