Commit b6180bb

Boanerges1996, claude, and agonza1 authored
feat: Redis cache + pre-warm for dashboard summary endpoints (Phase C of #20) (#28)
* feat: add duration/participants/issues summary endpoints (Phase 2 + 3)

  Five new endpoints for the remaining dashboard charts that fetch raw data:

  - GET /v1/conferences/duration-summary
    Returns conference counts bucketed by duration range (< 1m, 1-3m, etc.)
  - GET /v1/conferences/participant-count-summary
    Returns distribution of conferences by participant count
  - GET /v1/issues/summary
    Returns issue counts grouped by code with titles
  - GET /v1/issues/gum-summary
    Returns getusermedia_error issue counts grouped by error name

  Also adds three new filter params to /v1/conferences for click-to-detail
  modals on these charts:

  - duration_gte, duration_lt (for duration chart)
  - issue_code (for most-common-issues chart)

  All endpoints accept appId, created_at_gte, created_at_lte and handle both
  Python native ISO format and JavaScript's toISOString Z suffix.

  Phases 2 and 3 of #20: eliminates the need for the dashboard to download
  all conferences (~38MB) and all issues (~73MB).

* feat: add connections + sessions summary endpoints (Phase 4 + 5 of #20)

  Adds three new aggregation endpoints that let the dashboard stop
  downloading full /connections and /sessions payloads to build charts
  client-side:

  - GET /v1/connections/summary: relay vs direct connection counts
    (replaces the Relayed-connections pie chart's client-side reduce)
  - GET /v1/connections/setup-time-summary: connection setup-time buckets
    with per-bucket conference_ids for click-to-detail
  - GET /v1/sessions/summary: browsers, OS, country, and city/geo aggregates
    (powers Browsers, OS, and Map charts in one roundtrip)

  Also accepts `conference_ids=a,b,c` on /conferences so the setup-time
  chart can page through matched conferences on click.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: 60s Redis cache + pre-warm for dashboard summary endpoints (Phase C of #20)

  With Phases 0-5 merged, every dashboard chart reads from a server-side
  aggregation endpoint. The SQL is fast with indexes, but the same ~8 queries
  run on every page load, and the heavy ones (sessions.summary,
  connections.setup_time_summary) still cost 400-800ms on a live tenant.

  Adds a thin caching layer in front of each summary view:

  - `app/summary_cache.py`: `cached_json(endpoint, request, compute)` hashes
    (endpoint + filter params) into a short key, reads Redis, falls through
    to `compute()` on miss, and writes back with a 60s TTL. Redis failures
    are tolerated (settings already has IGNORE_EXCEPTIONS).
  - Each of the eight summary views moves its existing compute body into a
    local `compute()` closure and returns through the helper. No change to
    the JSON shape, query logic, or error handling.
  - `manage.py prewarm_summaries`: scheduled command that iterates apps with
    recent traffic (default: any conference in the last 2 days) and runs
    every summary view with the 30d-window filters the dashboard sends by
    default. Intended to run every ~30s as an ECS scheduled task so first
    visitors never see a cold miss.

  Measured locally against a 7-day Production clone (~18k conferences /
  38k sessions / 38k connections):

  endpoint                                  cold      warm
  conferences/summary                       391ms  →  12ms  (33x)
  sessions/summary                          748ms  →  11ms  (68x)
  connections/setup_time_summary            373ms  →  11ms  (34x)
  conferences/participant_count_summary     216ms  →   7ms  (31x)
  issues/gum_summary                        107ms  →   6ms  (18x)
  connections/summary                        57ms  →   6ms  (9.5x)
  issues/summary                             45ms  →  86ms  (noise; both <100ms)
  conferences/duration_summary               19ms  →   8ms  (2.3x)

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: dedupe conferences for issue_code; harden gum-summary

* test: summary_cache LocMem tests and prewarm smoke

  - Unit-test cache key rules, hit/miss, TTL override, and soft-fail on
    get/set errors.
  - Smoke-test prewarm_summaries for zero apps and one recent app (8 views).

  Made-with: Cursor

* fix: bucket created_at_gte/created_at_lte to the minute in cache key

  The dashboard sends `new Date().toISOString()` minus 30 days as
  created_at_gte (web app.vue:189), which is millisecond-precise. With the
  unrounded value going straight into _make_key, every page load, even
  back-to-back reloads, produced a unique SHA1 digest and a fresh cache miss,
  so the warm path never served real users:

  flush redis    -> 0 keys
  load dashboard -> 8 keys
  reload         -> 16 keys (8 stale + 8 fresh)

  prewarm_summaries had the same problem on the write side: its own
  since_window = utcnow() - 30d advanced every run, so the entries it
  populated never matched what the dashboard requested.

  Truncate ISO timestamps to the minute (YYYY-MM-DDTHH:MM) before hashing, so
  two requests in the same wall-clock minute share an entry. Correctness
  still holds because the 60s TTL bounds staleness regardless of bucket size.

  After the fix, two same-minute dashboard loads both produce 8 keys (no
  growth), and the 2nd load's slow endpoints serve from cache:

                                        before     after
  /v1/sessions/summary                  2314ms ->  411ms
  /v1/connections/setup-time-summary    1117ms ->  229ms
  /v1/conferences/summary                994ms ->   48ms

  Two regression tests added covering the bucket and the minute boundary.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: agonza1 <albertogontras@gmail.com>
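
The key-growth problem described in the last fix can be reproduced with the standard library alone. The sketch below mirrors the truncation rule from the commit message; the helper names `bucket_minute` and `make_key`, the `app-1` id, and the sample timestamps are illustrative assumptions, not the project's code.

```python
import hashlib


def bucket_minute(value: str) -> str:
    """Truncate an ISO 8601 timestamp to YYYY-MM-DDTHH:MM and re-append 'Z'."""
    if not value or len(value) < 16 or value[10] != 'T' or value[13] != ':':
        return value  # unexpected layout: pass through unchanged
    return value[:16] + 'Z'


def make_key(endpoint: str, app_id: str, created_at_gte: str, bucket: bool) -> str:
    """Hash endpoint + filters into a short key, optionally bucketing the timestamp."""
    ts = bucket_minute(created_at_gte) if bucket else created_at_gte
    raw = '|'.join([endpoint, f'appId={app_id}', f'created_at_gte={ts}'])
    return f'summary:{endpoint}:' + hashlib.sha1(raw.encode()).hexdigest()[:16]


# Two back-to-back page loads about a second apart, formatted like toISOString().
first_load = '2026-03-28T17:13:18.382Z'
second_load = '2026-03-28T17:13:19.601Z'
next_minute = '2026-03-28T17:14:02.010Z'

unbucketed = {make_key('sessions.summary', 'app-1', t, bucket=False)
              for t in (first_load, second_load)}
bucketed = {make_key('sessions.summary', 'app-1', t, bucket=True)
            for t in (first_load, second_load)}

print(len(unbucketed))  # 2: every reload gets its own key, so nothing is shared
print(len(bucketed))    # 1: both reloads in the same minute share one entry
```

A request in the following minute (`next_minute`) still hashes to a new key, which is why the pre-warm job has to keep running on a short interval.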
1 parent d9f3857 commit b6180bb

15 files changed

Lines changed: 1230 additions & 42 deletions

README.md

Lines changed: 2 additions & 0 deletions
@@ -987,6 +987,8 @@ Private endpoints are used by the web interface to query data. They require user
   - `GET`, query parameters:
     - `appId`: Filter by app
     - `participantId`: Filter by participant
+    - `issue_code`: Filter conferences that contain at least one active issue with this code
+      - Returns each conference once even if multiple matching issues exist

 - `/conferences/<uuid:pk>`: Get a specific conference
   - `GET`
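
The view's query is not part of this diff, so the following is only a sketch of the semantics the new `issue_code` parameter documents: a conference matches if at least one of its active issues carries the code, and it appears once no matter how many issues match. The dataclasses and the `ice_failure` code are hypothetical stand-ins for the real models.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    code: str
    is_active: bool = True


@dataclass
class Conference:
    id: str
    issues: list = field(default_factory=list)


def filter_by_issue_code(conferences, issue_code):
    """Keep conferences with at least one active issue carrying `issue_code`.

    Testing for existence (rather than joining conferences to issues) is what
    keeps each conference in the result exactly once.
    """
    return [
        conf for conf in conferences
        if any(i.is_active and i.code == issue_code for i in conf.issues)
    ]


conferences = [
    Conference('conf-1', [Issue('getusermedia_error'), Issue('getusermedia_error')]),
    Conference('conf-2', [Issue('getusermedia_error', is_active=False)]),
    Conference('conf-3', [Issue('ice_failure')]),
]
matched = filter_by_issue_code(conferences, 'getusermedia_error')
print([c.id for c in matched])  # ['conf-1']
```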
Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+"""
+Pre-warm the dashboard-summary Redis cache so first visitors don't pay
+the cold-query tax. Intended to run every ~30s via ECS scheduled task.
+
+For each active app that has seen recent data, walks every summary view
+with the same (appId, created_at_gte) filter the dashboard sends by
+default (last 30 days). The views themselves populate the cache on miss.
+"""
+import datetime
+import logging
+import time
+
+from django.core.management.base import BaseCommand
+from django.test.client import RequestFactory
+
+from app.models.app import App
+from app.models.conference import Conference
+
+from app.views.conference_summary_view import ConferenceSummaryView
+from app.views.conference_duration_summary_view import ConferenceDurationSummaryView
+from app.views.conference_participant_count_summary_view import ConferenceParticipantCountSummaryView
+from app.views.issue_summary_view import IssueSummaryView, GetUserMediaSummaryView
+from app.views.connection_summary_view import ConnectionSummaryView, ConnectionSetupTimeSummaryView
+from app.views.session_summary_view import SessionSummaryView
+
+logger = logging.getLogger(__name__)
+
+# (label, view)
+VIEWS = [
+    ('conferences.summary', ConferenceSummaryView),
+    ('conferences.duration_summary', ConferenceDurationSummaryView),
+    ('conferences.participant_count_summary', ConferenceParticipantCountSummaryView),
+    ('issues.summary', IssueSummaryView),
+    ('issues.gum_summary', GetUserMediaSummaryView),
+    ('connections.summary', ConnectionSummaryView),
+    ('connections.setup_time_summary', ConnectionSetupTimeSummaryView),
+    ('sessions.summary', SessionSummaryView),
+]
+
+
+class Command(BaseCommand):
+    help = 'Pre-compute dashboard summary responses and cache them'
+
+    def add_arguments(self, parser):
+        parser.add_argument(
+            '--days',
+            type=int,
+            default=30,
+            help='Window to warm (default: 30 days, matches dashboard default)',
+        )
+        parser.add_argument(
+            '--active-within-days',
+            type=int,
+            default=2,
+            help='Only warm apps that saw a conference in the last N days (default: 2)',
+        )
+
+    def handle(self, *args, **options):
+        window_days = options['days']
+        active_within = options['active_within_days']
+
+        since_window = datetime.datetime.utcnow() - datetime.timedelta(days=window_days)
+        active_since = datetime.datetime.utcnow() - datetime.timedelta(days=active_within)
+
+        # Apps with any conference in the recent window; skip tenants with
+        # no traffic so warming doesn't scan their cold tables.
+        recent_app_ids = (Conference.objects
+                          .filter(created_at__gte=active_since)
+                          .values_list('app_id', flat=True)
+                          .distinct())
+        apps = App.objects.filter(id__in=list(recent_app_ids), is_active=True)
+        count = apps.count()
+        self.stdout.write(f'Warming {count} active apps ({window_days}d window)')
+
+        rf = RequestFactory()
+        created_at_gte = since_window.isoformat() + 'Z'
+        warmed = 0
+        failed = 0
+
+        for app in apps:
+            for label, view_cls in VIEWS:
+                req = rf.get('/v1/' + label, {
+                    'appId': str(app.id),
+                    'created_at_gte': created_at_gte,
+                })
+                started = time.monotonic()
+                try:
+                    view_cls.get(req)
+                    warmed += 1
+                except Exception as e:
+                    failed += 1
+                    logger.warning('prewarm %s for app %s failed: %s', label, app.id, e)
+                elapsed_ms = (time.monotonic() - started) * 1000
+                if elapsed_ms > 500:
+                    self.stdout.write(f'  slow: {label} app={app.id} {elapsed_ms:.0f}ms')
+
+        self.stdout.write(f'Warmed {warmed} summary entries across {count} apps' +
+                          (f' ({failed} failed)' if failed else ''))
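
The warming loop's failure isolation can be shown without Django. In this sketch the views are replaced by plain callables that take an app id, which is an assumption made purely for illustration; `warm_all`, `ok`, and `broken` are hypothetical names.

```python
import time


def warm_all(app_ids, views, slow_ms=500):
    """Call every (label, fn) pair for every app; count successes and failures.

    `views` holds stand-in callables taking an app id. In the command above
    the equivalent step invokes each summary view with a synthetic request.
    """
    warmed = failed = 0
    slow = []
    for app_id in app_ids:
        for label, fn in views:
            started = time.monotonic()
            try:
                fn(app_id)
                warmed += 1
            except Exception:
                # One broken view must not stop the remaining views or apps.
                failed += 1
            elapsed_ms = (time.monotonic() - started) * 1000
            if elapsed_ms > slow_ms:
                slow.append((label, app_id))
    return warmed, failed, slow


def ok(app_id):
    return {'app': app_id}


def broken(app_id):
    raise RuntimeError('simulated query failure')


views = [('conferences.summary', ok), ('sessions.summary', broken)]
warmed, failed, slow = warm_all(['app-1', 'app-2'], views)
print(warmed, failed)  # 2 2
```

Timing is measured around the whole try/except, so a call that fails slowly is still reported as slow.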

app/summary_cache.py

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+"""
+Short-TTL Redis cache for dashboard summary endpoints.
+
+The summary endpoints run SQL aggregations that are fast enough on their own
+but get called by every dashboard page load. Cache the computed JSON for
+~60 seconds so concurrent viewers share the same roll-up.
+
+Keys are derived from the endpoint name + all request filters, so different
+date ranges / apps get independent entries. TTL-only (no explicit
+invalidation) because the data is strictly additive (new conferences,
+sessions, issues arrive over time) and a sub-minute staleness window is
+acceptable for a dashboard.
+"""
+import hashlib
+import json
+import logging
+
+from django.conf import settings
+from django.core.cache import cache
+
+logger = logging.getLogger(__name__)
+
+DEFAULT_TTL_SECONDS = 60
+KEY_PREFIX = 'summary'
+
+# The query-string params that factor into the cache key for each endpoint.
+# Anything not listed here is ignored (e.g. trailing slashes, user agent, etc).
+CACHE_KEY_PARAMS = (
+    'appId',
+    'created_at_gte',
+    'created_at_lte',
+    'conferenceId',
+    'participantId',
+)
+
+# Params whose ISO timestamps should be truncated to the minute before hashing,
+# so that the dashboard's millisecond-precise `now - 30d` doesn't produce a
+# unique cache key per page load. Bucketing means two requests in the same
+# wall-clock minute share an entry; correctness still holds because the cache
+# entry's own TTL bounds staleness regardless of bucket size.
+BUCKETED_PARAMS = ('created_at_gte', 'created_at_lte')
+
+
+def _bucket_minute(value):
+    # ISO 8601: 2026-03-28T17:13:18.382Z -> 2026-03-28T17:13Z (first 16 chars
+    # are YYYY-MM-DDTHH:MM). Anything that doesn't match the layout is
+    # passed through unchanged so the key still differentiates malformed input.
+    if not value or len(value) < 16 or value[10] != 'T' or value[13] != ':':
+        return value
+    return value[:16] + 'Z'
+
+
+def _make_key(endpoint, request):
+    parts = [endpoint]
+    for name in CACHE_KEY_PARAMS:
+        val = request.GET.get(name)
+        if val:
+            if name in BUCKETED_PARAMS:
+                val = _bucket_minute(val)
+            parts.append(f'{name}={val}')
+    raw = '|'.join(parts)
+    # Keep key short but unique; include a readable prefix for ops visibility.
+    digest = hashlib.sha1(raw.encode()).hexdigest()[:16]
+    return f'{KEY_PREFIX}:{endpoint}:{digest}'
+
+
+def get_ttl():
+    return getattr(settings, 'SUMMARY_CACHE_TTL', DEFAULT_TTL_SECONDS)
+
+
+def cached_json(endpoint, request, compute):
+    """
+    Returns (payload_dict, was_cached_bool).
+
+    `compute` is called only on cache miss and must return the JSON-serializable
+    dict the endpoint would have returned.
+    """
+    key = _make_key(endpoint, request)
+    try:
+        cached = cache.get(key)
+    except Exception as e:
+        logger.warning('summary cache get failed for %s: %s', key, e)
+        cached = None
+
+    if cached is not None:
+        return cached, True
+
+    payload = compute()
+    try:
+        cache.set(key, payload, timeout=get_ttl())
+    except Exception as e:
+        logger.warning('summary cache set failed for %s: %s', key, e)
+    return payload, False
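
The hit/miss and soft-fail behaviour of `cached_json` can be exercised without Redis or Django. The sketch below re-implements the same control flow against two stand-in cache objects; `DictCache`, `BrokenCache`, and `cached_json_sketch` are hypothetical names, and the stand-in ignores TTL expiry entirely.

```python
class DictCache:
    """Stand-in for django.core.cache.cache (no TTL expiry, illustration only)."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value, timeout=None):
        self.store[key] = value


class BrokenCache:
    """Simulates an unreachable cache backend: every call raises."""

    def get(self, key):
        raise ConnectionError('cache backend unreachable')

    def set(self, key, value, timeout=None):
        raise ConnectionError('cache backend unreachable')


def cached_json_sketch(cache, key, compute, ttl=60):
    """Read-through cache: return (payload, was_cached), tolerating cache errors."""
    try:
        cached = cache.get(key)
    except Exception:
        cached = None  # treat a failed read as a miss
    if cached is not None:
        return cached, True
    payload = compute()
    try:
        cache.set(key, payload, timeout=ttl)
    except Exception:
        pass  # a failed write only costs the next caller a recompute
    return payload, False


calls = []


def compute():
    calls.append(1)  # record each real computation
    return {'total': 3}


healthy = DictCache()
first = cached_json_sketch(healthy, 'summary:demo', compute)
second = cached_json_sketch(healthy, 'summary:demo', compute)
degraded = cached_json_sketch(BrokenCache(), 'summary:demo', compute)
print(first, second, degraded, len(calls))
# ({'total': 3}, False) ({'total': 3}, True) ({'total': 3}, False) 2
```

Because the hit test is `is not None`, an empty dict is cached and served as a hit, while a `compute()` that returns `None` would be recomputed on every request.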

app/tests/__init__.py

Whitespace-only changes.

app/tests/test_pr26_regressions.py

Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
+import json
+
+from django.test import Client, TestCase
+
+from app.models.app import App
+from app.models.conference import Conference
+from app.models.issue import Issue
+from app.models.organization import Organization
+from app.models.participant import Participant
+from app.models.session import Session
+
+
+class PR26RegressionTests(TestCase):
+    def setUp(self):
+        self.client = Client()
+        self.organization = Organization.objects.create(name="Test Org")
+        self.app = App.objects.create(
+            name="Test App",
+            api_key="a" * 32,
+            organization=self.organization,
+        )
+
+    def _make_conference_graph(self, conference_id):
+        conference = Conference.objects.create(
+            conference_id=conference_id,
+            app=self.app,
+        )
+        participant = Participant.objects.create(
+            participant_id=f"{conference_id}-participant",
+            app=self.app,
+        )
+        participant.conferences.add(conference)
+        session = Session.objects.create(
+            conference=conference,
+            participant=participant,
+        )
+        return conference, participant, session
+
+    def test_conferences_issue_code_filter_returns_distinct_conferences(self):
+        conference, participant, session = self._make_conference_graph("conf-1")
+
+        Issue.objects.create(
+            session=session,
+            conference=conference,
+            participant=participant,
+            type=Issue.TYPES_OF_ISSUES["warning"],
+            code="getusermedia_error",
+            data={"name": "NotFoundError"},
+        )
+        Issue.objects.create(
+            session=session,
+            conference=conference,
+            participant=participant,
+            type=Issue.TYPES_OF_ISSUES["warning"],
+            code="getusermedia_error",
+            data={"name": "NotFoundError"},
+        )
+
+        response = self.client.get(
+            "/v1/conferences",
+            {
+                "appId": str(self.app.id),
+                "issue_code": "getusermedia_error",
+                "limit": "50",
+            },
+        )
+
+        self.assertEqual(response.status_code, 200)
+        payload = json.loads(response.content)
+        self.assertEqual(payload["count"], 1)
+        self.assertEqual(len(payload["results"]), 1)
+        self.assertEqual(payload["results"][0]["id"], str(conference.id))
+
+    def test_conferences_issue_code_filter_ignores_inactive_issues(self):
+        conference_active, participant_active, session_active = self._make_conference_graph("conf-active")
+        conference_inactive, participant_inactive, session_inactive = self._make_conference_graph("conf-inactive")
+
+        Issue.objects.create(
+            session=session_active,
+            conference=conference_active,
+            participant=participant_active,
+            type=Issue.TYPES_OF_ISSUES["warning"],
+            code="getusermedia_error",
+            data={"name": "NotFoundError"},
+            is_active=True,
+        )
+        Issue.objects.create(
+            session=session_inactive,
+            conference=conference_inactive,
+            participant=participant_inactive,
+            type=Issue.TYPES_OF_ISSUES["warning"],
+            code="getusermedia_error",
+            data={"name": "NotReadableError"},
+            is_active=False,
+        )
+
+        response = self.client.get(
+            "/v1/conferences",
+            {
+                "appId": str(self.app.id),
+                "issue_code": "getusermedia_error",
+                "limit": "50",
+            },
+        )
+
+        self.assertEqual(response.status_code, 200)
+        payload = json.loads(response.content)
+        returned_ids = {row["id"] for row in payload["results"]}
+        self.assertEqual(returned_ids, {str(conference_active.id)})
+        self.assertEqual(payload["count"], 1)
+
+    def test_gum_summary_skips_non_dict_issue_data(self):
+        conference, participant, session = self._make_conference_graph("conf-gum")
+
+        Issue.objects.create(
+            session=session,
+            conference=conference,
+            participant=participant,
+            type=Issue.TYPES_OF_ISSUES["warning"],
+            code="getusermedia_error",
+            data={"name": "NotAllowedError", "message": "Permission denied"},
+        )
+        Issue.objects.create(
+            session=session,
+            conference=conference,
+            participant=participant,
+            type=Issue.TYPES_OF_ISSUES["warning"],
+            code="getusermedia_error",
+            data="malformed",
+        )
+
+        response = self.client.get(
+            "/v1/issues/gum-summary",
+            {
+                "appId": str(self.app.id),
+            },
+        )
+
+        self.assertEqual(response.status_code, 200)
+        payload = json.loads(response.content)
+        self.assertEqual(payload["total"], 1)
+        self.assertEqual(payload["data"][0]["name"], "NotAllowedError")
+        self.assertEqual(payload["data"][0]["count"], 1)
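
The last test pins down that a non-dict `data` value is skipped rather than crashing the endpoint. The view itself is not shown in this diff, so the following is a hypothetical sketch of an aggregation with that property; only the response shape (`total`, plus `data` rows holding `name` and `count`) is taken from the test's assertions.

```python
from collections import Counter


def gum_summary(issue_data_values):
    """Count getusermedia error names, skipping rows whose data is not a dict."""
    counts = Counter()
    for data in issue_data_values:
        if not isinstance(data, dict):
            continue  # e.g. a bare string stored in the JSON column
        name = data.get('name')
        if name:
            counts[name] += 1
    rows = [{'name': name, 'count': count} for name, count in counts.most_common()]
    return {'total': sum(counts.values()), 'data': rows}


payload = gum_summary([
    {'name': 'NotAllowedError', 'message': 'Permission denied'},
    'malformed',
])
print(payload)  # {'total': 1, 'data': [{'name': 'NotAllowedError', 'count': 1}]}
```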
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+"""
+Smoke tests for prewarm_summaries management command (PR #28).
+"""
+from io import StringIO
+
+from django.core.management import call_command
+from django.test import TestCase
+
+from app.models.app import App
+from app.models.conference import Conference
+from app.models.organization import Organization
+
+
+class PrewarmSummariesSmokeTests(TestCase):
+    def test_runs_with_no_qualifying_apps(self):
+        out = StringIO()
+        err = StringIO()
+        call_command('prewarm_summaries', stdout=out, stderr=err)
+
+        combined = out.getvalue() + err.getvalue()
+        self.assertIn('Warming 0 active apps', combined)
+        self.assertIn('Warmed 0 summary entries across 0 apps', combined)
+
+    def test_runs_for_one_app_with_recent_conference(self):
+        org = Organization.objects.create(name='Prewarm Org')
+        app = App.objects.create(
+            name='Prewarm App',
+            api_key='b' * 32,
+            organization=org,
+        )
+        Conference.objects.create(
+            conference_id='prewarm-conf-1',
+            app=app,
+        )
+
+        out = StringIO()
+        call_command('prewarm_summaries', stdout=out, stderr=StringIO())
+        text = out.getvalue()
+
+        self.assertIn('Warming 1 active apps', text)
+        self.assertIn('Warmed 8 summary entries across 1 apps', text)
