Conversation
Force-pushed a7dec56 to bad71ce
Pull request overview
This PR adds a hidden debug dashboard feature for internal monitoring of the pyplots.ai platform. The dashboard provides comprehensive system status including spec/implementation counts, library coverage statistics, quality scores, and problem areas (low scores, missing previews, old specs).
Key Changes:
- Backend API endpoint `/debug/status` that aggregates database statistics
- Frontend DebugPage component with filters, sorting, and detailed status views
- Easter egg access via triple-click on the "pyplots.ai" logo in the header
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| api/routers/debug.py | New debug router with comprehensive status endpoint aggregating specs, implementations, scores, and problem areas |
| api/routers/__init__.py | Export new debug_router for registration |
| api/main.py | Include debug_router in application routers |
| app/src/pages/DebugPage.tsx | Full-featured debug dashboard with stats, filters, library coverage cards, and sortable spec table |
| app/src/router.tsx | Add /debug route outside main layout |
| app/src/components/Header.tsx | Implement triple-click Easter egg on logo to navigate to debug page |
```tsx
<Box
  component="span"
  onClick={(e) => {
    e.preventDefault();
    clickCountRef.current += 1;
    if (clickTimerRef.current) clearTimeout(clickTimerRef.current);
    clickTimerRef.current = setTimeout(() => {
      clickCountRef.current = 0;
    }, 400);
    if (clickCountRef.current >= 3) {
      clickCountRef.current = 0;
      navigate('/debug');
    }
  }}
  sx={{ cursor: 'pointer', userSelect: 'none' }}
>
  <Box component="span" sx={{ color: '#3776AB' }}>py</Box>
  <Box component="span" sx={{ color: '#FFD43B' }}>plots</Box>
  <Box component="span" sx={{ color: '#1f2937' }}>.ai</Box>
</Link>
</Box>
```
The change removes the external link to "https://pyplots.ai" that was previously available by clicking the logo. Users could previously click the logo to visit the pyplots.ai website, but this functionality has been replaced with the triple-click debug Easter egg. Consider preserving the single-click external link behavior while adding the triple-click functionality, or document this breaking change in user-facing behavior.
```python
# Low score specs (avg < 85)
low_score_specs: list[ProblemSpec] = []
for spec in specs_status:
    if spec.avg_score is not None and spec.avg_score < 85:
        low_score_specs.append(
            ProblemSpec(id=spec.id, title=spec.title, issue="Low average score", value=f"{spec.avg_score:.1f}")
        )
low_score_specs.sort(key=lambda x: float(x.value or 0))  # Lowest first
```
The magic number 85 is used as the threshold for "low score specs" without explanation. This value should be documented (e.g., as a constant like LOW_SCORE_THRESHOLD = 85) to make the threshold clear and maintainable. The same threshold appears in the frontend with a comment "Low Scores (avg < 85)", but inconsistently the filter uses 90.
```python
@router.get("/status", response_model=DebugStatusResponse)
async def get_debug_status(request: Request, db: AsyncSession = Depends(require_db)) -> DebugStatusResponse:
    """
    Get comprehensive debug dashboard data.

    Includes:
    - All specs with quality scores per library
    - Library statistics (avg/min/max scores, coverage)
    - Problem specs (low scores, old, missing data)
    - System health info
    """
    start_time = time.time()

    repo = SpecRepository(db)
    all_specs = await repo.get_all()

    # ========================================================================
    # Build specs list and collect statistics
    # ========================================================================

    specs_status: list[SpecStatusItem] = []
    total_implementations = 0

    # Library aggregates
    library_scores: dict[str, list[float]] = {lib: [] for lib in SUPPORTED_LIBRARIES}
    library_counts: dict[str, int] = dict.fromkeys(SUPPORTED_LIBRARIES, 0)  # type: ignore[arg-type]

    # Problem tracking
    missing_preview: list[ProblemSpec] = []
    missing_tags: list[ProblemSpec] = []

    for spec in all_specs:
        # Build library score map for this spec
        spec_scores: dict[str, float | None] = dict.fromkeys(SUPPORTED_LIBRARIES, None)
        spec_score_values: list[float] = []

        for impl in spec.impls:
            lib_id = impl.library_id
            score = impl.quality_score

            spec_scores[lib_id] = score
            total_implementations += 1
            library_counts[lib_id] += 1

            if score is not None:
                library_scores[lib_id].append(score)
                spec_score_values.append(score)

            # Check for missing preview
            if not impl.preview_url:
                missing_preview.append(ProblemSpec(id=spec.id, title=spec.title, issue=f"Missing preview for {lib_id}"))

        # Calculate average score for this spec
        avg_score = sum(spec_score_values) / len(spec_score_values) if spec_score_values else None

        # Find most recent update
        timestamps = [spec.updated] if spec.updated else []
        timestamps.extend(impl.updated for impl in spec.impls if impl.updated)
        most_recent = max(timestamps) if timestamps else None

        # Check for missing tags
        if not spec.tags or not any(spec.tags.values()):
            missing_tags.append(ProblemSpec(id=spec.id, title=spec.title, issue="No tags defined"))

        specs_status.append(
            SpecStatusItem(
                id=spec.id,
                title=spec.title,
                updated=most_recent.isoformat() if most_recent else None,
                avg_score=round(avg_score, 1) if avg_score else None,
                altair=spec_scores.get("altair"),
                bokeh=spec_scores.get("bokeh"),
                highcharts=spec_scores.get("highcharts"),
                letsplot=spec_scores.get("letsplot"),
                matplotlib=spec_scores.get("matplotlib"),
                plotly=spec_scores.get("plotly"),
                plotnine=spec_scores.get("plotnine"),
                pygal=spec_scores.get("pygal"),
                seaborn=spec_scores.get("seaborn"),
            )
        )

    # Sort by updated (most recent first)
    specs_status.sort(key=lambda s: (s.updated or "", s.id), reverse=True)

    # ========================================================================
    # Library Statistics
    # ========================================================================

    library_names = {
        "altair": "Altair",
        "bokeh": "Bokeh",
        "highcharts": "Highcharts",
        "letsplot": "lets-plot",
        "matplotlib": "Matplotlib",
        "plotly": "Plotly",
        "plotnine": "plotnine",
        "pygal": "Pygal",
        "seaborn": "Seaborn",
    }

    lib_stats: list[LibraryStats] = []
    for lib_id in sorted(SUPPORTED_LIBRARIES):
        scores = library_scores[lib_id]
        lib_stats.append(
            LibraryStats(
                id=lib_id,
                name=library_names.get(lib_id, lib_id),
                impl_count=library_counts[lib_id],
                avg_score=round(sum(scores) / len(scores), 1) if scores else None,
                min_score=round(min(scores), 1) if scores else None,
                max_score=round(max(scores), 1) if scores else None,
            )
        )

    # Sort by impl_count descending
    lib_stats.sort(key=lambda x: x.impl_count, reverse=True)

    # ========================================================================
    # Problem Specs
    # ========================================================================

    # Low score specs (avg < 85)
    low_score_specs: list[ProblemSpec] = []
    for spec in specs_status:
        if spec.avg_score is not None and spec.avg_score < 85:
            low_score_specs.append(
                ProblemSpec(id=spec.id, title=spec.title, issue="Low average score", value=f"{spec.avg_score:.1f}")
            )
    low_score_specs.sort(key=lambda x: float(x.value or 0))  # Lowest first

    # Oldest specs (by updated timestamp)
    specs_by_age = sorted(specs_status, key=lambda s: s.updated or "")
    oldest_specs: list[ProblemSpec] = []
    for spec in specs_by_age[:10]:  # 10 oldest
        if spec.updated:
            try:
                dt = datetime.fromisoformat(spec.updated.replace("Z", "+00:00"))
                # Ensure dt is timezone-aware
                if dt.tzinfo is None:
                    dt = dt.replace(tzinfo=timezone.utc)
                age_days = (datetime.now(timezone.utc) - dt).days
                oldest_specs.append(
                    ProblemSpec(id=spec.id, title=spec.title, issue="Old spec", value=f"{age_days} days ago")
                )
            except ValueError:
                pass

    # ========================================================================
    # System Health
    # ========================================================================

    response_time_ms = (time.time() - start_time) * 1000
    coverage = (total_implementations / (len(all_specs) * 9) * 100) if all_specs else 0

    system_health = SystemHealth(
        database_connected=True,
        api_response_time_ms=round(response_time_ms, 2),
        timestamp=datetime.now(timezone.utc).isoformat(),
        total_specs_in_db=len(all_specs),
        total_impls_in_db=total_implementations,
    )

    # ========================================================================
    # Return Response
    # ========================================================================

    return DebugStatusResponse(
        total_specs=len(all_specs),
        total_implementations=total_implementations,
        coverage_percent=round(coverage, 1),
        library_stats=lib_stats,
        low_score_specs=low_score_specs[:20],  # Limit to 20
        oldest_specs=oldest_specs,
        missing_preview_specs=missing_preview[:20],  # Limit to 20
        missing_tags_specs=missing_tags[:20],  # Limit to 20
        system=system_health,
        specs=specs_status,
    )
```
The new debug endpoint lacks test coverage. The repository follows comprehensive automated testing patterns (as seen in tests/integration/api/test_api_endpoints.py), but no tests were added for the new /debug/status endpoint. Integration tests should be added to verify the endpoint returns the correct data structure and handles edge cases.
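As a starting point for such a test, a pure-Python shape check could verify the response contract; the field names below are taken from the DebugStatusResponse construction shown in the diff, and the helper itself is a sketch, not part of the PR:

```python
# Top-level fields of DebugStatusResponse as they appear in the PR diff.
EXPECTED_KEYS = {
    "total_specs", "total_implementations", "coverage_percent",
    "library_stats", "low_score_specs", "oldest_specs",
    "missing_preview_specs", "missing_tags_specs", "system", "specs",
}

def missing_status_keys(payload: dict) -> list[str]:
    """Return the sorted list of missing top-level keys (empty means the shape is valid)."""
    return sorted(EXPECTED_KEYS - payload.keys())
```

An integration test could then call `GET /debug/status` through the app's test client and assert `missing_status_keys(response.json()) == []` alongside edge cases such as an empty database.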
```python
@router.get("/status", response_model=DebugStatusResponse)
async def get_debug_status(request: Request, db: AsyncSession = Depends(require_db)) -> DebugStatusResponse:
```
The debug dashboard is exposed without any authentication or authorization. This endpoint reveals internal system information including database connection status, API response times, spec counts, and problem areas. While described as "internal monitoring," there's no mechanism to restrict access. Consider adding at least basic authentication (e.g., API key, IP allowlist) or documentation clearly stating this should only be accessible in trusted environments.
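A minimal shared-key check along these lines could back such a guard. This is a sketch only: the header name `x-debug-key` and the `DEBUG_API_KEY` environment variable are assumptions, not names from the PR, and in FastAPI this logic would live in a dependency applied to the router:

```python
import hmac
import os

# Assumed configuration names for this sketch; not part of the PR.
DEBUG_API_KEY_ENV = "DEBUG_API_KEY"

def is_authorized(headers: dict[str, str]) -> bool:
    """Compare the supplied debug key against the configured one in constant time."""
    expected = os.environ.get(DEBUG_API_KEY_ENV)
    supplied = headers.get("x-debug-key", "")
    if not expected:
        return False  # fail closed when no key is configured
    return hmac.compare_digest(supplied, expected)
```

Failing closed when the key is unset means a misconfigured deployment hides the dashboard rather than exposing it.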
```python
repo = SpecRepository(db)
all_specs = await repo.get_all()
```
The debug endpoint loads all specs with implementations in a single query, which could cause performance issues as the database grows. Consider adding pagination or a limit parameter to prevent loading excessive data. The endpoint already limits problem areas to 20 items, but the main specs list is unlimited.
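A clamped limit/offset helper is one way to bound the response; the defaults and the 500-item cap below are illustrative assumptions, not values from the PR:

```python
def paginate(items: list, limit: int = 100, offset: int = 0) -> list:
    """Slice a result list, clamping limit to 1..500 and offset to >= 0."""
    limit = max(1, min(limit, 500))
    offset = max(0, offset)
    return items[offset:offset + limit]
```

Ideally the limit would be pushed into the database query itself rather than applied after `get_all()`, so the unbounded load never happens.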
```tsx
onClick={(e) => {
  e.preventDefault();
  clickCountRef.current += 1;
  if (clickTimerRef.current) clearTimeout(clickTimerRef.current);
  clickTimerRef.current = setTimeout(() => {
    clickCountRef.current = 0;
  }, 400);
  if (clickCountRef.current >= 3) {
    clickCountRef.current = 0;
    navigate('/debug');
  }
}}
```
The triple-click detection uses a 400ms timeout, but there's no cleanup for the timer when the component unmounts. This could lead to memory leaks if the Header component is unmounted while the timer is active. Add cleanup in a useEffect return function to clear the timeout on unmount.
```tsx
<Box
  component="span"
  onClick={(e) => {
    e.preventDefault();
    clickCountRef.current += 1;
    if (clickTimerRef.current) clearTimeout(clickTimerRef.current);
    clickTimerRef.current = setTimeout(() => {
      clickCountRef.current = 0;
    }, 400);
    if (clickCountRef.current >= 3) {
      clickCountRef.current = 0;
      navigate('/debug');
    }
  }}
  sx={{ cursor: 'pointer', userSelect: 'none' }}
>
  <Box component="span" sx={{ color: '#3776AB' }}>py</Box>
  <Box component="span" sx={{ color: '#FFD43B' }}>plots</Box>
  <Box component="span" sx={{ color: '#1f2937' }}>.ai</Box>
</Link>
</Box>
```
The triple-click Easter egg removes keyboard accessibility. The original Link component was keyboard-navigable, but the new Box with onClick handler doesn't respond to keyboard events (Enter/Space). This impacts users who rely on keyboard navigation. Consider adding onKeyDown handler or maintaining keyboard accessibility while adding the click counter functionality.
```tsx
label={<Typography sx={{ fontSize: '0.875rem' }}>Incomplete (<9)</Typography>}
/>
<FormControlLabel
  control={<Checkbox checked={showLowScores} onChange={(e) => setShowLowScores(e.target.checked)} />}
  label={<Typography sx={{ fontSize: '0.875rem' }}>Low scores (<90)</Typography>}
```
The HTML entity encoding for less-than symbols is incorrect. In React/JSX, you should use {'<'} instead of HTML entities like `&lt;` for dynamic content. While `&lt;` works, it's not the React-recommended approach and may cause confusion. The correct syntax would be: label={<Typography sx={{ fontSize: '0.875rem' }}>Incomplete {'<'}9</Typography>}
Suggested change:
```diff
-label={<Typography sx={{ fontSize: '0.875rem' }}>Incomplete (<9)</Typography>}
+label={<Typography sx={{ fontSize: '0.875rem' }}>Incomplete {'<'}9</Typography>}
 />
 <FormControlLabel
   control={<Checkbox checked={showLowScores} onChange={(e) => setShowLowScores(e.target.checked)} />}
-label={<Typography sx={{ fontSize: '0.875rem' }}>Low scores (<90)</Typography>}
+label={<Typography sx={{ fontSize: '0.875rem' }}>Low scores {'<'}90</Typography>}
```
```python
oldest_specs.append(
    ProblemSpec(id=spec.id, title=spec.title, issue="Old spec", value=f"{age_days} days ago")
)
except ValueError:
```
The 'except' clause does nothing but pass, and there is no explanatory comment.
Suggested change:
```diff
 except ValueError:
+    # Ignore specs with invalid or unparsable updated timestamps when computing oldest specs
```
Codecov Report: ❌ Patch coverage is
Force-pushed bad71ce to dce9e80
Add /debug route with comprehensive status overview:
- Spec/implementation counts and coverage stats
- Library coverage cards (impl count, avg/min/max scores)
- Problem areas (low scores, missing previews/tags, oldest specs)
- System health (database status, API response time)
- Filterable table (search, incomplete, low scores, missing library)
- Sortable by ID, title, avg score, updated date

Backend: GET /debug/status endpoint with full statistics
Frontend: DebugPage with MUI components
Access: Triple-click on logo or direct /debug URL

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Force-pushed dce9e80 to 171d6fa
```diff
@@ -28,7 +28,6 @@
 "ci-tests.yml",
 # Specification workflows
 "spec-create.yml",
```
The removal of "spec-update.yml" from the expected workflows list is correct since this workflow file doesn't exist in the .github/workflows directory. However, this change appears unrelated to the debug dashboard feature. This should either be in a separate PR focused on test cleanup, or the PR description should mention this fix.
```tsx
<Typography sx={{ fontWeight: 600, mb: 1.5 }}>Problem Areas</Typography>
<ProblemList
  items={data.low_score_specs}
  title="Low Scores (avg < 85)"
```
The title text states "Low Scores (avg < 85)" but the backend uses a threshold of 90 (LOW_SCORE_THRESHOLD in api/routers/debug.py line 20). This should be updated to "Low Scores (avg < 90)" to match the actual filtering criteria.
Suggested change:
```diff
-title="Low Scores (avg < 85)"
+title="Low Scores (avg < 90)"
```
```python
library_names = {
    "altair": "Altair",
    "bokeh": "Bokeh",
    "highcharts": "Highcharts",
    "letsplot": "lets-plot",
    "matplotlib": "Matplotlib",
    "plotly": "Plotly",
    "plotnine": "plotnine",
    "pygal": "Pygal",
    "seaborn": "Seaborn",
}
```
The library_names dictionary duplicates data already available in core.constants.LIBRARIES_METADATA. Instead of hardcoding library names here, this code should derive the mapping from LIBRARIES_METADATA to maintain a single source of truth. For example: library_names = {lib["id"]: lib["name"] for lib in LIBRARIES_METADATA}
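The derivation the reviewer suggests would look roughly like this; the exact shape of LIBRARIES_METADATA below is an assumption modeled on the reviewer's example, not the real constant from core.constants:

```python
# Assumed shape of core.constants.LIBRARIES_METADATA (abbreviated for the sketch).
LIBRARIES_METADATA = [
    {"id": "altair", "name": "Altair"},
    {"id": "matplotlib", "name": "Matplotlib"},
    {"id": "letsplot", "name": "lets-plot"},
]

# Derive the id -> display-name mapping from the single source of truth.
library_names = {lib["id"]: lib["name"] for lib in LIBRARIES_METADATA}
```

Adding a new library to the metadata then updates the debug dashboard automatically, with no second dictionary to keep in sync.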
```tsx
const handleLogoKeyDown = useCallback(
  (e: React.KeyboardEvent) => {
    if (e.key === 'Enter' || e.key === ' ') {
      handleLogoClick(e);
    }
  },
```
When the Space key is pressed, e.preventDefault() should be called to prevent the default page scrolling behavior. Currently, handleLogoKeyDown calls handleLogoClick which calls preventDefault, but for the Space key specifically, the browser will scroll the page before the handler is called. Add e.preventDefault() at the beginning of handleLogoKeyDown before the conditional check.
```tsx
tabIndex={0}
onClick={handleLogoClick}
onKeyDown={handleLogoKeyDown}
sx={{ cursor: 'pointer', userSelect: 'none', '&:focus': { outline: 'none' } }}
```
Removing the focus outline with '&:focus': { outline: 'none' } creates an accessibility issue for keyboard users who won't be able to see which element has focus. Instead, provide a custom focus indicator that meets accessibility standards, such as a visible border or background change. For example: '&:focus': { outline: '2px solid #3776AB', outlineOffset: '2px' }
```python
@router.get("/status", response_model=DebugStatusResponse)
async def get_debug_status(request: Request, db: AsyncSession = Depends(require_db)) -> DebugStatusResponse:
    """
    Get comprehensive debug dashboard data.

    Includes:
    - All specs with quality scores per library
    - Library statistics (avg/min/max scores, coverage)
    - Problem specs (low scores, old, missing data)
    - System health info
    """
    start_time = time.time()
```
The debug endpoint exposes internal system metrics (database connection status, response times, quality scores) without authentication. While the data is read-only and mostly aggregates publicly available information, consider whether this level of system insight should be publicly accessible. If this is intentional for transparency, it's acceptable, but if this is meant to be internal-only, authentication should be added. The PR description mentions "hidden access via triple-click" but the endpoint itself at /debug/status is publicly accessible without any authentication.
```python
library_stats: list[LibraryStats]

# Problem areas
low_score_specs: list[ProblemSpec]  # Specs with avg score < 85
```
The comment states "Specs with avg score < 85" but the code uses LOW_SCORE_THRESHOLD = 90 (line 20). This is inconsistent with the frontend display on line 376 of DebugPage.tsx which also shows "Low Scores (avg < 85)". The comment and frontend text should be updated to match the actual threshold of 90 or the threshold should be changed to 85 if that's the intended value.
Suggested change:
```diff
-low_score_specs: list[ProblemSpec]  # Specs with avg score < 85
+low_score_specs: list[ProblemSpec]  # Specs with avg score below LOW_SCORE_THRESHOLD
```
```tsx
<Box
  component="span"
  role="link"
  tabIndex={0}
  onClick={handleLogoClick}
  onKeyDown={handleLogoKeyDown}
  sx={{ cursor: 'pointer', userSelect: 'none', '&:focus': { outline: 'none' } }}
```
The logo is now interactive but the accessibility implementation is incomplete. The element uses role="link" but links should navigate to a URL, not trigger JavaScript actions. Since this is a button-like interactive element (triggers navigation via JavaScript), it should use role="button" instead. Additionally, the aria-label attribute should be added to describe the triple-click easter egg behavior, such as aria-label="Navigate to homepage (triple-click for debug dashboard)".
Summary
- /debug route with comprehensive status overview for internal monitoring

Features

Test plan
- /debug directly

🤖 Generated with Claude Code