
feat: add hidden debug dashboard#3574

Merged
MarkusNeusinger merged 1 commit into main from feature/debug-dashboard
Jan 9, 2026

Conversation

@MarkusNeusinger
Owner

Summary

  • Add /debug route with comprehensive status overview for internal monitoring
  • Hidden access via triple-click on "pyplots.ai" logo (Easter Egg)
  • Read-only, no authentication required

Features

  • Stats: Spec/implementation counts, coverage percentage
  • Library Coverage: Cards showing impl count, avg/min/max scores per library
  • Problem Areas: Accordions for low scores (<85), missing previews, missing tags, oldest specs
  • System Health: Database status, API response time
  • Filters: Search (ID/title), incomplete (<9), low scores (<90), missing library dropdown
  • Table: Sortable by ID, title, avg score, updated date. Color-coded quality scores

Test plan

  • Visit /debug directly
  • Triple-click on logo from homepage → navigates to debug
  • Verify library stats cards show data
  • Test filters (search, checkboxes, dropdown)
  • Click spec ID → navigates to spec page
  • Click score → navigates to implementation

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings January 9, 2026 23:26
MarkusNeusinger force-pushed the feature/debug-dashboard branch from a7dec56 to bad71ce (January 9, 2026 23:30)
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a hidden debug dashboard feature for internal monitoring of the pyplots.ai platform. The dashboard provides comprehensive system status including spec/implementation counts, library coverage statistics, quality scores, and problem areas (low scores, missing previews, old specs).

Key Changes:

  • Backend API endpoint /debug/status that aggregates database statistics
  • Frontend DebugPage component with filters, sorting, and detailed status views
  • Easter egg access via triple-click on the "pyplots.ai" logo in the header

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Summary per file:
  • api/routers/debug.py: New debug router with comprehensive status endpoint aggregating specs, implementations, scores, and problem areas
  • api/routers/__init__.py: Export new debug_router for registration
  • api/main.py: Include debug_router in application routers
  • app/src/pages/DebugPage.tsx: Full-featured debug dashboard with stats, filters, library coverage cards, and sortable spec table
  • app/src/router.tsx: Add /debug route outside main layout
  • app/src/components/Header.tsx: Implement triple-click Easter egg on logo to navigate to debug page

Comment on lines +62 to +81
<Box
  component="span"
  onClick={(e) => {
    e.preventDefault();
    clickCountRef.current += 1;
    if (clickTimerRef.current) clearTimeout(clickTimerRef.current);
    clickTimerRef.current = setTimeout(() => {
      clickCountRef.current = 0;
    }, 400);
    if (clickCountRef.current >= 3) {
      clickCountRef.current = 0;
      navigate('/debug');
    }
  }}
  sx={{ cursor: 'pointer', userSelect: 'none' }}
>
  <Box component="span" sx={{ color: '#3776AB' }}>py</Box>
  <Box component="span" sx={{ color: '#FFD43B' }}>plots</Box>
  <Box component="span" sx={{ color: '#1f2937' }}>.ai</Box>
</Link>
</Box>

Copilot AI Jan 9, 2026


The change removes the external link to "https://pyplots.ai" that was previously available by clicking the logo. Users could previously click the logo to visit the pyplots.ai website, but this functionality has been replaced with the triple-click debug Easter egg. Consider preserving the single-click external link behavior while adding the triple-click functionality, or document this breaking change in user-facing behavior.

Comment thread api/routers/debug.py Outdated
Comment on lines +225 to +232
# Low score specs (avg < 85)
low_score_specs: list[ProblemSpec] = []
for spec in specs_status:
    if spec.avg_score is not None and spec.avg_score < 85:
        low_score_specs.append(
            ProblemSpec(id=spec.id, title=spec.title, issue="Low average score", value=f"{spec.avg_score:.1f}")
        )
low_score_specs.sort(key=lambda x: float(x.value or 0))  # Lowest first

Copilot AI Jan 9, 2026


The magic number 85 is used as the threshold for "low score specs" without explanation. This value should be documented (e.g., as a constant like LOW_SCORE_THRESHOLD = 85) to make the threshold clear and maintainable. The same threshold appears in the frontend with a comment "Low Scores (avg < 85)", but inconsistently the filter uses 90.
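A minimal sketch of the extraction the reviewer suggests, as a named constant plus a small helper (the constant name and the dict-based spec shape are illustrative, not from the PR; the 85-vs-90 discrepancy would still need to be reconciled with the frontend):

```python
# Illustrative refactor of the low-score filter; the threshold value is
# assumed (the PR mixes 85 and 90, which this constant would unify).
LOW_SCORE_THRESHOLD = 85

def find_low_score_specs(specs: list[dict]) -> list[dict]:
    """Return specs whose average score is below the shared threshold, lowest first."""
    low = [
        s for s in specs
        if s.get("avg_score") is not None and s["avg_score"] < LOW_SCORE_THRESHOLD
    ]
    return sorted(low, key=lambda s: s["avg_score"])
```

Both the backend filter and the frontend label text could then reference the same value, so the two can no longer drift apart.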

Comment thread api/routers/debug.py
Comment on lines +103 to +281
@router.get("/status", response_model=DebugStatusResponse)
async def get_debug_status(request: Request, db: AsyncSession = Depends(require_db)) -> DebugStatusResponse:
    """
    Get comprehensive debug dashboard data.

    Includes:
    - All specs with quality scores per library
    - Library statistics (avg/min/max scores, coverage)
    - Problem specs (low scores, old, missing data)
    - System health info
    """
    start_time = time.time()

    repo = SpecRepository(db)
    all_specs = await repo.get_all()

    # ========================================================================
    # Build specs list and collect statistics
    # ========================================================================

    specs_status: list[SpecStatusItem] = []
    total_implementations = 0

    # Library aggregates
    library_scores: dict[str, list[float]] = {lib: [] for lib in SUPPORTED_LIBRARIES}
    library_counts: dict[str, int] = dict.fromkeys(SUPPORTED_LIBRARIES, 0)  # type: ignore[arg-type]

    # Problem tracking
    missing_preview: list[ProblemSpec] = []
    missing_tags: list[ProblemSpec] = []

    for spec in all_specs:
        # Build library score map for this spec
        spec_scores: dict[str, float | None] = dict.fromkeys(SUPPORTED_LIBRARIES, None)
        spec_score_values: list[float] = []

        for impl in spec.impls:
            lib_id = impl.library_id
            score = impl.quality_score

            spec_scores[lib_id] = score
            total_implementations += 1
            library_counts[lib_id] += 1

            if score is not None:
                library_scores[lib_id].append(score)
                spec_score_values.append(score)

            # Check for missing preview
            if not impl.preview_url:
                missing_preview.append(ProblemSpec(id=spec.id, title=spec.title, issue=f"Missing preview for {lib_id}"))

        # Calculate average score for this spec
        avg_score = sum(spec_score_values) / len(spec_score_values) if spec_score_values else None

        # Find most recent update
        timestamps = [spec.updated] if spec.updated else []
        timestamps.extend(impl.updated for impl in spec.impls if impl.updated)
        most_recent = max(timestamps) if timestamps else None

        # Check for missing tags
        if not spec.tags or not any(spec.tags.values()):
            missing_tags.append(ProblemSpec(id=spec.id, title=spec.title, issue="No tags defined"))

        specs_status.append(
            SpecStatusItem(
                id=spec.id,
                title=spec.title,
                updated=most_recent.isoformat() if most_recent else None,
                avg_score=round(avg_score, 1) if avg_score else None,
                altair=spec_scores.get("altair"),
                bokeh=spec_scores.get("bokeh"),
                highcharts=spec_scores.get("highcharts"),
                letsplot=spec_scores.get("letsplot"),
                matplotlib=spec_scores.get("matplotlib"),
                plotly=spec_scores.get("plotly"),
                plotnine=spec_scores.get("plotnine"),
                pygal=spec_scores.get("pygal"),
                seaborn=spec_scores.get("seaborn"),
            )
        )

    # Sort by updated (most recent first)
    specs_status.sort(key=lambda s: (s.updated or "", s.id), reverse=True)

    # ========================================================================
    # Library Statistics
    # ========================================================================

    library_names = {
        "altair": "Altair",
        "bokeh": "Bokeh",
        "highcharts": "Highcharts",
        "letsplot": "lets-plot",
        "matplotlib": "Matplotlib",
        "plotly": "Plotly",
        "plotnine": "plotnine",
        "pygal": "Pygal",
        "seaborn": "Seaborn",
    }

    lib_stats: list[LibraryStats] = []
    for lib_id in sorted(SUPPORTED_LIBRARIES):
        scores = library_scores[lib_id]
        lib_stats.append(
            LibraryStats(
                id=lib_id,
                name=library_names.get(lib_id, lib_id),
                impl_count=library_counts[lib_id],
                avg_score=round(sum(scores) / len(scores), 1) if scores else None,
                min_score=round(min(scores), 1) if scores else None,
                max_score=round(max(scores), 1) if scores else None,
            )
        )

    # Sort by impl_count descending
    lib_stats.sort(key=lambda x: x.impl_count, reverse=True)

    # ========================================================================
    # Problem Specs
    # ========================================================================

    # Low score specs (avg < 85)
    low_score_specs: list[ProblemSpec] = []
    for spec in specs_status:
        if spec.avg_score is not None and spec.avg_score < 85:
            low_score_specs.append(
                ProblemSpec(id=spec.id, title=spec.title, issue="Low average score", value=f"{spec.avg_score:.1f}")
            )
    low_score_specs.sort(key=lambda x: float(x.value or 0))  # Lowest first

    # Oldest specs (by updated timestamp)
    specs_by_age = sorted(specs_status, key=lambda s: s.updated or "")
    oldest_specs: list[ProblemSpec] = []
    for spec in specs_by_age[:10]:  # 10 oldest
        if spec.updated:
            try:
                dt = datetime.fromisoformat(spec.updated.replace("Z", "+00:00"))
                # Ensure dt is timezone-aware
                if dt.tzinfo is None:
                    dt = dt.replace(tzinfo=timezone.utc)
                age_days = (datetime.now(timezone.utc) - dt).days
                oldest_specs.append(
                    ProblemSpec(id=spec.id, title=spec.title, issue="Old spec", value=f"{age_days} days ago")
                )
            except ValueError:
                pass

    # ========================================================================
    # System Health
    # ========================================================================

    response_time_ms = (time.time() - start_time) * 1000
    coverage = (total_implementations / (len(all_specs) * 9) * 100) if all_specs else 0

    system_health = SystemHealth(
        database_connected=True,
        api_response_time_ms=round(response_time_ms, 2),
        timestamp=datetime.now(timezone.utc).isoformat(),
        total_specs_in_db=len(all_specs),
        total_impls_in_db=total_implementations,
    )

    # ========================================================================
    # Return Response
    # ========================================================================

    return DebugStatusResponse(
        total_specs=len(all_specs),
        total_implementations=total_implementations,
        coverage_percent=round(coverage, 1),
        library_stats=lib_stats,
        low_score_specs=low_score_specs[:20],  # Limit to 20
        oldest_specs=oldest_specs,
        missing_preview_specs=missing_preview[:20],  # Limit to 20
        missing_tags_specs=missing_tags[:20],  # Limit to 20
        system=system_health,
        specs=specs_status,
    )

Copilot AI Jan 9, 2026


The new debug endpoint lacks test coverage. The repository follows comprehensive automated testing patterns (as seen in tests/integration/api/test_api_endpoints.py), but no tests were added for the new /debug/status endpoint. Integration tests should be added to verify the endpoint returns the correct data structure and handles edge cases.
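A first step toward the coverage the reviewer asks for could be a shape check on the JSON payload. The helper below is a sketch: the top-level field names are taken from the PR's response model, but the test harness (e.g. obtaining `payload` via `client.get("/debug/status").json()` in the existing integration-test setup) is assumed.

```python
# Hypothetical helper for an integration test of GET /debug/status.
# It only validates the top-level shape of the JSON payload.
EXPECTED_KEYS = {
    "total_specs", "total_implementations", "coverage_percent",
    "library_stats", "low_score_specs", "oldest_specs",
    "missing_preview_specs", "missing_tags_specs", "system", "specs",
}

def validate_debug_status(payload: dict) -> list[str]:
    """Return a list of problems found in the payload (empty means it looks valid)."""
    problems = [f"missing key: {k}" for k in sorted(EXPECTED_KEYS - payload.keys())]
    if not isinstance(payload.get("total_specs"), int):
        problems.append("total_specs must be an int")
    coverage = payload.get("coverage_percent", 0)
    if coverage < 0 or coverage > 100:
        problems.append("coverage_percent out of range")
    return problems
```

An integration test would then assert `validate_debug_status(response.json()) == []` alongside checks of the status code and the documented 20-item limits.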

Comment thread api/routers/debug.py
Comment on lines +103 to +104
@router.get("/status", response_model=DebugStatusResponse)
async def get_debug_status(request: Request, db: AsyncSession = Depends(require_db)) -> DebugStatusResponse:

Copilot AI Jan 9, 2026


The debug dashboard is exposed without any authentication or authorization. This endpoint reveals internal system information including database connection status, API response times, spec counts, and problem areas. While described as "internal monitoring," there's no mechanism to restrict access. Consider adding at least basic authentication (e.g., API key, IP allowlist) or documentation clearly stating this should only be accessible in trusted environments.

Comment thread api/routers/debug.py
Comment on lines +116 to +117
repo = SpecRepository(db)
all_specs = await repo.get_all()

Copilot AI Jan 9, 2026


The debug endpoint loads all specs with implementations in a single query, which could cause performance issues as the database grows. Consider adding pagination or a limit parameter to prevent loading excessive data. The endpoint already limits problem areas to 20 items, but the main specs list is unlimited.
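The limit parameter the reviewer suggests reduces, at its core, to a clamped slice over the already-sorted list. A sketch (parameter names and defaults are illustrative; in FastAPI these would become `Query` parameters on the endpoint):

```python
def paginate(items: list, limit: int = 100, offset: int = 0) -> list:
    """Return at most `limit` items starting at `offset`, clamping negative inputs."""
    limit = max(0, limit)
    offset = max(0, offset)
    return items[offset:offset + limit]
```

Slicing the response list is only a stopgap, though: the bigger win would be pushing the limit into the repository query so the database never materializes every spec with its implementations.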

Comment thread app/src/components/Header.tsx Outdated
Comment on lines +64 to +75
onClick={(e) => {
  e.preventDefault();
  clickCountRef.current += 1;
  if (clickTimerRef.current) clearTimeout(clickTimerRef.current);
  clickTimerRef.current = setTimeout(() => {
    clickCountRef.current = 0;
  }, 400);
  if (clickCountRef.current >= 3) {
    clickCountRef.current = 0;
    navigate('/debug');
  }
}}

Copilot AI Jan 9, 2026


The triple-click detection uses a 400ms timeout, but there's no cleanup for the timer when the component unmounts. This could lead to memory leaks if the Header component is unmounted while the timer is active. Add cleanup in a useEffect return function to clear the timeout on unmount.

Comment on lines +62 to +81
<Box
  component="span"
  onClick={(e) => {
    e.preventDefault();
    clickCountRef.current += 1;
    if (clickTimerRef.current) clearTimeout(clickTimerRef.current);
    clickTimerRef.current = setTimeout(() => {
      clickCountRef.current = 0;
    }, 400);
    if (clickCountRef.current >= 3) {
      clickCountRef.current = 0;
      navigate('/debug');
    }
  }}
  sx={{ cursor: 'pointer', userSelect: 'none' }}
>
  <Box component="span" sx={{ color: '#3776AB' }}>py</Box>
  <Box component="span" sx={{ color: '#FFD43B' }}>plots</Box>
  <Box component="span" sx={{ color: '#1f2937' }}>.ai</Box>
</Link>
</Box>

Copilot AI Jan 9, 2026


The triple-click Easter egg removes keyboard accessibility. The original Link component was keyboard-navigable, but the new Box with onClick handler doesn't respond to keyboard events (Enter/Space). This impacts users who rely on keyboard navigation. Consider adding onKeyDown handler or maintaining keyboard accessibility while adding the click counter functionality.

Comment thread app/src/pages/DebugPage.tsx Outdated
Comment on lines +406 to +410
label={<Typography sx={{ fontSize: '0.875rem' }}>Incomplete (&lt;9)</Typography>}
/>
<FormControlLabel
control={<Checkbox checked={showLowScores} onChange={(e) => setShowLowScores(e.target.checked)} />}
label={<Typography sx={{ fontSize: '0.875rem' }}>Low scores (&lt;90)</Typography>}

Copilot AI Jan 9, 2026


The HTML entity encoding for less-than symbols is incorrect. In React/JSX, you should use {'<'} instead of HTML entities like &lt; for dynamic content. While &lt; works, it's not the React-recommended approach and may cause confusion. The correct syntax would be: label={<Typography sx={{ fontSize: '0.875rem' }}>Incomplete {'<'}9</Typography>}

Suggested change
label={<Typography sx={{ fontSize: '0.875rem' }}>Incomplete (&lt;9)</Typography>}
/>
<FormControlLabel
control={<Checkbox checked={showLowScores} onChange={(e) => setShowLowScores(e.target.checked)} />}
label={<Typography sx={{ fontSize: '0.875rem' }}>Low scores (&lt;90)</Typography>}
label={<Typography sx={{ fontSize: '0.875rem' }}>Incomplete {'<'}9</Typography>}
/>
<FormControlLabel
control={<Checkbox checked={showLowScores} onChange={(e) => setShowLowScores(e.target.checked)} />}
label={<Typography sx={{ fontSize: '0.875rem' }}>Low scores {'<'}90</Typography>}

Comment thread api/routers/debug.py
        oldest_specs.append(
            ProblemSpec(id=spec.id, title=spec.title, issue="Old spec", value=f"{age_days} days ago")
        )
    except ValueError:

Copilot AI Jan 9, 2026


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except ValueError:
except ValueError:
    # Ignore specs with invalid or unparsable updated timestamps when computing oldest specs

@codecov

codecov Bot commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 58 lines in your changes missing coverage. Please review.

Files with missing lines:
  • api/routers/debug.py: patch coverage 49.12%, 58 lines missing ⚠️


MarkusNeusinger force-pushed the feature/debug-dashboard branch from bad71ce to dce9e80 (January 9, 2026 23:37)
Add /debug route with comprehensive status overview:
- Spec/implementation counts and coverage stats
- Library coverage cards (impl count, avg/min/max scores)
- Problem areas (low scores, missing previews/tags, oldest specs)
- System health (database status, API response time)
- Filterable table (search, incomplete, low scores, missing library)
- Sortable by ID, title, avg score, updated date

Backend: GET /debug/status endpoint with full statistics
Frontend: DebugPage with MUI components
Access: Triple-click on logo or direct /debug URL

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings January 9, 2026 23:39
MarkusNeusinger force-pushed the feature/debug-dashboard branch from dce9e80 to 171d6fa (January 9, 2026 23:39)
MarkusNeusinger merged commit 5f3089b into main (Jan 9, 2026)
13 checks passed
MarkusNeusinger deleted the feature/debug-dashboard branch (January 9, 2026 23:42)
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

@@ -28,7 +28,6 @@
"ci-tests.yml",
# Specification workflows
"spec-create.yml",

Copilot AI Jan 9, 2026


The removal of "spec-update.yml" from the expected workflows list is correct since this workflow file doesn't exist in the .github/workflows directory. However, this change appears unrelated to the debug dashboard feature. This should either be in a separate PR focused on test cleanup, or the PR description should mention this fix.

<Typography sx={{ fontWeight: 600, mb: 1.5 }}>Problem Areas</Typography>
<ProblemList
  items={data.low_score_specs}
  title="Low Scores (avg < 85)"

Copilot AI Jan 9, 2026


The title text states "Low Scores (avg < 85)" but the backend uses a threshold of 90 (LOW_SCORE_THRESHOLD in api/routers/debug.py line 20). This should be updated to "Low Scores (avg < 90)" to match the actual filtering criteria.

Suggested change
title="Low Scores (avg < 85)"
title="Low Scores (avg < 90)"

Comment thread api/routers/debug.py
Comment on lines +195 to +205
library_names = {
    "altair": "Altair",
    "bokeh": "Bokeh",
    "highcharts": "Highcharts",
    "letsplot": "lets-plot",
    "matplotlib": "Matplotlib",
    "plotly": "Plotly",
    "plotnine": "plotnine",
    "pygal": "Pygal",
    "seaborn": "Seaborn",
}

Copilot AI Jan 9, 2026


The library_names dictionary duplicates data already available in core.constants.LIBRARIES_METADATA. Instead of hardcoding library names here, this code should derive the mapping from LIBRARIES_METADATA to maintain a single source of truth. For example: library_names = {lib["id"]: lib["name"] for lib in LIBRARIES_METADATA}
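The reviewer's one-liner can be illustrated with a stand-in for the metadata (the real `LIBRARIES_METADATA` lives in `core.constants`; the two-entry list below is only a sketch of its assumed shape):

```python
# Stand-in for core.constants.LIBRARIES_METADATA; the real list has all
# nine libraries, but the id/name shape is what matters here.
LIBRARIES_METADATA = [
    {"id": "matplotlib", "name": "Matplotlib"},
    {"id": "letsplot", "name": "lets-plot"},
]

# Derive the display-name map instead of hardcoding it in the router.
library_names = {lib["id"]: lib["name"] for lib in LIBRARIES_METADATA}
```

With this, adding a tenth library to the metadata would update the debug dashboard automatically.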

Comment on lines +53 to +58
const handleLogoKeyDown = useCallback(
  (e: React.KeyboardEvent) => {
    if (e.key === 'Enter' || e.key === ' ') {
      handleLogoClick(e);
    }
  },

Copilot AI Jan 9, 2026


When the Space key is pressed, e.preventDefault() should be called to prevent the default page scrolling behavior. Currently, handleLogoKeyDown calls handleLogoClick which calls preventDefault, but for the Space key specifically, the browser will scroll the page before the handler is called. Add e.preventDefault() at the beginning of handleLogoKeyDown before the conditional check.

tabIndex={0}
onClick={handleLogoClick}
onKeyDown={handleLogoKeyDown}
sx={{ cursor: 'pointer', userSelect: 'none', '&:focus': { outline: 'none' } }}

Copilot AI Jan 9, 2026


Removing the focus outline with '&:focus': { outline: 'none' } creates an accessibility issue for keyboard users who won't be able to see which element has focus. Instead, provide a custom focus indicator that meets accessibility standards, such as a visible border or background change. For example: '&:focus': { outline: '2px solid #3776AB', outlineOffset: '2px' }

Comment thread api/routers/debug.py
Comment on lines +106 to +117
@router.get("/status", response_model=DebugStatusResponse)
async def get_debug_status(request: Request, db: AsyncSession = Depends(require_db)) -> DebugStatusResponse:
    """
    Get comprehensive debug dashboard data.

    Includes:
    - All specs with quality scores per library
    - Library statistics (avg/min/max scores, coverage)
    - Problem specs (low scores, old, missing data)
    - System health info
    """
    start_time = time.time()

Copilot AI Jan 9, 2026


The debug endpoint exposes internal system metrics (database connection status, response times, quality scores) without authentication. While the data is read-only and mostly aggregates publicly available information, consider whether this level of system insight should be publicly accessible. If this is intentional for transparency, it's acceptable, but if this is meant to be internal-only, authentication should be added. The PR description mentions "hidden access via triple-click" but the endpoint itself at /debug/status is publicly accessible without any authentication.

Comment thread api/routers/debug.py
library_stats: list[LibraryStats]

# Problem areas
low_score_specs: list[ProblemSpec] # Specs with avg score < 85

Copilot AI Jan 9, 2026


The comment states "Specs with avg score < 85" but the code uses LOW_SCORE_THRESHOLD = 90 (line 20). This is inconsistent with the frontend display on line 376 of DebugPage.tsx which also shows "Low Scores (avg < 85)". The comment and frontend text should be updated to match the actual threshold of 90 or the threshold should be changed to 85 if that's the intended value.

Suggested change
low_score_specs: list[ProblemSpec] # Specs with avg score < 85
low_score_specs: list[ProblemSpec] # Specs with avg score below LOW_SCORE_THRESHOLD

Comment on lines +99 to +105
<Box
  component="span"
  role="link"
  tabIndex={0}
  onClick={handleLogoClick}
  onKeyDown={handleLogoKeyDown}
  sx={{ cursor: 'pointer', userSelect: 'none', '&:focus': { outline: 'none' } }}

Copilot AI Jan 9, 2026


The logo is now interactive but the accessibility implementation is incomplete. The element uses role="link" but links should navigate to a URL, not trigger JavaScript actions. Since this is a button-like interactive element (triggers navigation via JavaScript), it should use role="button" instead. Additionally, the aria-label attribute should be added to describe the triple-click easter egg behavior, such as aria-label="Navigate to homepage (triple-click for debug dashboard)".
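The keyboard-activation rule the accessibility comments converge on (activate on Enter and Space, suppress the default scroll on Space) can be isolated into a small pure helper. This is a sketch of the decision logic only; wiring it into the `role="button"`/`aria-label` element is assumed:

```typescript
// Decide whether a keydown should activate a button-like element and
// whether the browser default (page scroll on Space) must be suppressed.
interface KeyDecision {
  activate: boolean;
  preventDefault: boolean;
}

function decideKeyActivation(key: string): KeyDecision {
  if (key === 'Enter') return { activate: true, preventDefault: true };
  // Space scrolls the page by default, so it must be suppressed before
  // the activation handler runs, not after.
  if (key === ' ') return { activate: true, preventDefault: true };
  return { activate: false, preventDefault: false };
}
```

In the component, the keydown handler would call `e.preventDefault()` whenever `preventDefault` is true and forward to the click handler when `activate` is true, on an element marked `role="button"` with a descriptive `aria-label`.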
