CodeSentinel is an intelligent GitHub agent that autonomously analyzes issues, generates code fixes, validates changes, and creates pull requests. It also provides AI-powered ranking of existing pull requests to help maintainers prioritize reviews. The system uses Gemini 1.5 Flash for natural language understanding and code generation, combined with automated validation tools to ensure code quality and safety.
- Frontend: Modern React + TypeScript application with Tailwind CSS
- Backend: FastAPI REST API serving both the React app and API endpoints
- Workflow: Fork-based, safe, and non-intrusive contributions
Preferred communication style: Simple, everyday language.
Technology: React 18 + TypeScript + Vite
Design Pattern: SPA (Single Page Application) with client-side routing
The frontend provides five main pages:
- Landing (Home) - Hero section, stats overview, feature showcase, and how-it-works walkthrough
- Issue to PR - Input form for repository URL and issue text, real-time progress tracking
- PR Ranking - Displays scored and ranked pull requests with AI-generated recommendations
- History - Shows past agent executions and their results with timeline view
- Settings - Configuration status, API key status, and safety limits
Tech Stack:
- React Router for navigation
- TanStack Query for server state management
- Tailwind CSS for styling with custom glassmorphism utilities
- Framer Motion for animations
- Axios for HTTP requests
- TypeScript for type safety
Visual Design:
- Glassmorphism aesthetic with backdrop-filter blur effects
- Elegant maroon/wine red to white gradient color scheme
- Animated backgrounds with moving orbs
- Fixed full-width glass navbar at top
- Smooth stagger animations on page transitions
Rationale: React provides a professional, responsive UI with excellent developer experience. TypeScript ensures type safety across the frontend-backend boundary. The modular component architecture makes it easy to extend and maintain.
Core Components:
- Agent Orchestrator (agent_core.py) - Central workflow coordinator
  - Implements step-by-step issue-to-PR pipeline
  - Manages state across multiple stages (clone → analyze → plan → patch → validate → PR)
  - Handles error recovery and logging
  - Uses run IDs for tracking and storage
- GitHub Integration (github_client.py) - Repository operations
  - Wraps PyGithub library for API calls
  - Handles repository forking to authenticated user's account
  - Handles repository cloning via GitPython
  - Manages branch creation and PR submission
  - Fetches PR metadata and diffs
  - Uses fork-based workflow for safe, non-intrusive contributions
- AI Engine (gemini_ai.py) - LLM-powered reasoning
  - Uses Gemini 1.5 Flash for structured outputs
  - Implements Pydantic models for type-safe AI responses
  - Three main functions: issue analysis, plan generation, patch creation
  - Structured JSON outputs ensure parseable results
- Validation Pipeline (validators.py) - Code quality checks
  - Pylint for linting (minimum score threshold)
  - Black for formatting verification
  - Radon for complexity and maintainability metrics
  - File-based validation with temporary file handling
- PR Scorer (pr_scorer.py) - Multi-factor PR evaluation
  - Combines quantitative metrics (size, recency, description quality)
  - Uses AI analysis for semantic code quality assessment
  - Generates actionable recommendations (merge/review/needs-work)
Design Principles:
- Separation of concerns - Each module handles a distinct responsibility
- Safety-first - Multiple validation layers and configurable guardrails
- Auditability - All runs are logged with full artifact preservation
Technology: File-based JSON storage
Structure:
- `runs/` directory contains per-run subdirectories
- Each run has: run.json (metadata), diagnosis.json, plan.json, patches.json, validation.json
- `runs/index.json` maintains a searchable index of all runs
Rationale: JSON file storage was chosen over a database for simplicity and portability. This approach works well for the agent's workflow where each run is independent and doesn't require complex queries. The indexed structure allows efficient retrieval while keeping deployment dependencies minimal.
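The layout described above can be sketched in a few lines of standard-library Python. This is only an illustration under assumptions: the function name `save_run`, the 12-character run ID, and the shape of the index entries are hypothetical and are not taken from the project's code.

```python
import json
import uuid
from pathlib import Path

# Artifact file names listed in the storage structure above.
ARTIFACT_NAMES = ("diagnosis", "plan", "patches", "validation")


def save_run(runs_dir: Path, metadata: dict, artifacts: dict) -> str:
    """Write one run's JSON files and append it to runs/index.json.

    Hypothetical helper: returns the generated run ID.
    """
    run_id = uuid.uuid4().hex[:12]  # assumed ID format
    run_dir = runs_dir / run_id
    run_dir.mkdir(parents=True, exist_ok=False)

    # run.json holds the run metadata.
    record = {"run_id": run_id, **metadata}
    (run_dir / "run.json").write_text(json.dumps(record, indent=2))

    # Write only the artifacts that were actually produced.
    for name in ARTIFACT_NAMES:
        if name in artifacts:
            (run_dir / f"{name}.json").write_text(json.dumps(artifacts[name], indent=2))

    # The index is a flat list, re-read and rewritten on every save.
    index_path = runs_dir / "index.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else []
    index.append(record)
    index_path.write_text(json.dumps(index, indent=2))
    return run_id
```

Rewriting the whole index on each save keeps the code simple, and it is also the reason this approach does not hold up under concurrent writers.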
Trade-offs:
- Pros: No database setup, easy debugging, portable across environments
- Cons: Not suitable for high-concurrency or large-scale deployments
Guardrails (config.py):
- Repository whitelisting (optional, defaults to all repos)
- File change limits (max 10 files)
- Line change limits (max 500 lines)
- Minimum quality thresholds (lint score ≥7.0, complexity ≥6.0, maintainability ≥60)
Rationale: Safety limits prevent the agent from making overly broad changes that could be risky. The whitelist system allows controlled rollout and testing on trusted repositories first.
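As a rough sketch of how such limits might be enforced, the check below returns a list of violations instead of raising, so a caller can log every problem at once. The `Guardrails` class and `check_guardrails` function are hypothetical names, not the contents of config.py, and the complexity threshold is left out because its scale is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrails:
    """Limits mirroring the values listed above (hypothetical container)."""
    allowed_repos: frozenset = frozenset()  # empty set means all repos are allowed
    max_files: int = 10
    max_lines: int = 500
    min_lint_score: float = 7.0
    min_maintainability: float = 60.0


def check_guardrails(limits, repo, files_changed, lines_changed,
                     lint_score, maintainability):
    """Return human-readable violations; an empty list means the change passes."""
    violations = []
    if limits.allowed_repos and repo not in limits.allowed_repos:
        violations.append(f"repository {repo} is not whitelisted")
    if files_changed > limits.max_files:
        violations.append(f"{files_changed} files changed (max {limits.max_files})")
    if lines_changed > limits.max_lines:
        violations.append(f"{lines_changed} lines changed (max {limits.max_lines})")
    if lint_score < limits.min_lint_score:
        violations.append(f"lint score {lint_score} below {limits.min_lint_score}")
    if maintainability < limits.min_maintainability:
        violations.append(
            f"maintainability {maintainability} below {limits.min_maintainability}"
        )
    return violations
```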
Issue-to-PR Flow:
- Fork repository to authenticated user's account (or use existing fork)
- Clone forked repository to temporary directory
- Analyze repository structure for context
- Use AI to diagnose issue (affected files, root cause, complexity)
- Generate fix plan (approach, files to modify, risks)
- Create code patches using AI
- Validate patches (lint, format, complexity)
- Apply changes and create new branch in fork
- Push changes to fork
- Open pull request from fork to original repository
- Store all artifacts and results
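The flow above is essentially a list of stages run in order over shared state, stopping at the first failure. The sketch below shows that pattern only; `run_pipeline` and the state keys (`completed`, `failed_stage`, `error`) are assumptions for illustration, not the orchestrator in agent_core.py.

```python
from typing import Callable, Dict, List, Tuple

# A stage receives the current state and returns the keys it wants to add.
Stage = Callable[[Dict], Dict]


def run_pipeline(stages: List[Tuple[str, Stage]], initial_state: Dict) -> Dict:
    """Run stages in order, recording progress and stopping on the first error."""
    state = dict(initial_state)  # copy so the caller's dict is not mutated
    state["completed"] = []
    for name, stage in stages:
        try:
            state.update(stage(state))
        except Exception as exc:  # record the failure instead of crashing the run
            state["status"] = "failed"
            state["failed_stage"] = name
            state["error"] = str(exc)
            return state
        state["completed"].append(name)
    state["status"] = "success"
    return state
```

Because every stage only reads and extends the state dictionary, the final state can be stored as the run's artifact record whether the run succeeded or failed.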
PR Ranking Flow:
- Fetch open PRs from repository
- For each PR, retrieve diff content
- Score based on: size appropriateness, recency, description quality, AI code analysis
- Aggregate scores and generate recommendations
- Return ranked list with actionable insights
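One way to combine these factors is a weighted sum over normalized sub-scores. The weights, cut-offs, and function names below are invented for illustration and are not the values used in pr_scorer.py; only the four factors and the merge/review/needs-work labels come from this document.

```python
from typing import List, Optional

# Assumed weights; they sum to 1.0 so the total lands on a 0-100 scale.
WEIGHTS = {"size": 0.2, "recency": 0.2, "description": 0.2, "ai": 0.4}


def size_score(lines_changed: int) -> float:
    """Small PRs score 1.0; the score falls linearly to 0.0 at 1000 lines."""
    if lines_changed <= 200:
        return 1.0
    if lines_changed >= 1000:
        return 0.0
    return (1000 - lines_changed) / 800


def recency_score(days_since_update: float) -> float:
    """Linear decay over 30 days."""
    return max(0.0, 1.0 - days_since_update / 30)


def description_score(body: Optional[str]) -> float:
    """Crude proxy for description quality: length, capped at 300 characters."""
    return min(len((body or "").strip()) / 300, 1.0)


def score_pr(lines_changed: int, days_since_update: float,
             body: Optional[str], ai_score: float) -> dict:
    """Combine the sub-scores; ai_score is expected in the range 0.0 to 1.0."""
    parts = {
        "size": size_score(lines_changed),
        "recency": recency_score(days_since_update),
        "description": description_score(body),
        "ai": min(max(ai_score, 0.0), 1.0),
    }
    total = round(100 * sum(WEIGHTS[k] * v for k, v in parts.items()), 1)
    if total >= 75:
        recommendation = "merge"
    elif total >= 50:
        recommendation = "review"
    else:
        recommendation = "needs-work"
    return {"score": total, "recommendation": recommendation, "parts": parts}


def rank_prs(scored: List[dict]) -> List[dict]:
    """Highest score first."""
    return sorted(scored, key=lambda item: item["score"], reverse=True)
```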
- Gemini 1.5 Flash (Google AI) - Primary LLM for reasoning and code generation
  - Requires: `GEMINI_API_KEY` environment variable
  - Used for: Issue analysis, plan generation, code patch creation, PR quality assessment
  - SDK: `google-genai` (note: the newer SDK that replaces `google-generativeai`)
- GitHub API - Repository operations and PR management
  - Requires: `GITHUB_TOKEN` environment variable (personal access token)
  - Libraries: `PyGithub` for API calls, `GitPython` for local git operations
  - Permissions needed: repo read/write, PR creation
- Pylint - Python linting and static analysis
- Black - Code formatting verification
- Radon - Code complexity and maintainability metrics
- Semgrep - Security scanning (referenced in architecture docs but not yet implemented)
- Streamlit - Web dashboard framework (original UI, since replaced by the React + TypeScript frontend)
- Pydantic - Type-safe data models for AI responses
- Standard library - tempfile, uuid, json, pathlib for file operations
- Designed for Replit or Streamlit Cloud deployment
- No database required (file-based storage)
- All dependencies installable via pip
- Requires persistent storage for the `runs/` directory to preserve history
Visual Redesign:
- Implemented glassmorphism aesthetic with blur effects and transparency
- Elegant maroon/wine red (HSL 345° 65% 45%) to white gradient color scheme
- Created animated background with moving gradient orbs using Canvas API
- Added Framer Motion for smooth stagger animations throughout
- Replaced generic dashboard with dedicated landing page featuring Hero, Stats, Features, and How It Works sections
- Updated all pages (Issue to PR, PR Ranking, History, Settings) with glass card effects
- Replaced sidebar with modern fixed full-width glass navbar at top
Design System Components:
- `.glass` - Basic glassmorphism effect with backdrop-filter blur
- `.glass-strong` - Stronger glass effect for inputs and containers
- `.glass-card` - Glass cards with hover effects and glow
- `.gradient-text` - Gradient text from cyan to purple
- `.text-glow` - Text with cyan/magenta shadow glow
- `.border-glow` - Border glow effects for emphasis
- `AnimatedBackground` - Canvas with moving colored orbs
- `GridBackground` - Subtle grid overlay pattern
- `TopNav` - Fixed glass navbar with gradient accents
Performance Optimizations:
- Added requestAnimationFrame cleanup to prevent memory leaks
- Removed duplicate background instances (now centralized in AppShell)
- Optimized animation loops for GPU efficiency
Design Inspiration:
- Modern tech aesthetics with clean, simplistic yet beautiful layout
- Inspired by reference repository with professional glassmorphism patterns
- Maintains accessibility with strong contrast in headings and interactive elements
Major Rewrite:
- Replaced Streamlit UI with React + TypeScript frontend
- Added FastAPI backend with REST API endpoints
- Built production-ready SPA with Vite
- Implemented modern UI components with Tailwind CSS
- Added client-side routing and state management
New API Endpoints:
- `/api/health` - Health check
- `/api/config` - Configuration status
- `/api/stats` - Usage statistics
- `/api/issue-to-pr` - Create PR from issue
- `/api/pr-ranking` - Rank pull requests
- `/api/runs` - Run history
- `/api/runs/{run_id}` - Run details
Frontend Features:
- Dashboard with KPIs and metrics
- Real-time progress tracking for Issue-to-PR
- Sortable PR ranking table
- Run history with detailed views
- Settings page with API key status
- Responsive design for all screen sizes
Architecture Notes:
- Vite dev server runs on port 5173 (development only) with API proxy to backend on port 5000
- Production build served directly from FastAPI at port 5000
- Type-safe API client with Zod validation
- React Query for efficient data fetching and caching
Known Limitations:
- PR Details page is planned but not yet implemented (future enhancement)
- Users can see run history but cannot drill down into individual run diffs yet
- This feature existed in Streamlit and will be added to React in next iteration
Critical Bug Fix - Gemini Structured Responses:
- Fixed issue where Gemini API's structured responses (using `response_schema`) returned empty `response.text` values
- Updated all AI functions to use `response.parsed` attribute when available, with fallback to JSON parsing
- Affects: `analyze_issue()`, `generate_fix_plan()`, `generate_code_patch()` in gemini_ai.py
- Impact: Core Issue-to-PR workflow now fully operational
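The parsed-first, text-second fallback described in this fix might look roughly like the helper below. It is a sketch that works on any object exposing `parsed` and `text` attributes and makes no SDK calls; the name `extract_structured` is hypothetical, and the `model_dump()` branch assumes the parsed value is a Pydantic v2 model.

```python
import json
from typing import Any


def extract_structured(response: Any) -> dict:
    """Prefer response.parsed; fall back to decoding response.text as JSON."""
    parsed = getattr(response, "parsed", None)
    if parsed is not None:
        # Assumption: Pydantic v2 models expose model_dump().
        if hasattr(parsed, "model_dump"):
            return parsed.model_dump()
        if isinstance(parsed, dict):
            return parsed
        # Any other parsed type falls through to the text fallback below.

    text = getattr(response, "text", None) or ""
    if not text.strip():
        raise ValueError("response contains neither parsed data nor text")
    return json.loads(text)
```

Raising on an empty response, instead of returning an empty dict, makes the original failure mode visible at the call site.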
Type Safety Improvements:
- Added null-safety checks for PR metadata in `get_open_pull_requests()`
- Handles edge cases where `created_at`, `updated_at`, or `user` might be None
- Ensures robust PR ranking even with incomplete GitHub data
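A minimal sketch of these checks, assuming a PR object with the attribute names PyGithub exposes (`number`, `title`, `user.login`, `created_at`, `updated_at`). The `summarize_pr` helper and the `"unknown"` placeholder are illustrative choices, not the project's code.

```python
from datetime import datetime
from typing import Any, Optional


def _iso(value: Optional[datetime]) -> Optional[str]:
    """Format a timestamp, passing None through unchanged."""
    return value.isoformat() if value is not None else None


def summarize_pr(pr: Any) -> dict:
    """Build a plain dict from a PR object, tolerating missing fields."""
    user = getattr(pr, "user", None)
    return {
        "number": pr.number,
        "title": pr.title or "",
        # A deleted account can leave the PR without a user.
        "author": user.login if user is not None else "unknown",
        "created_at": _iso(getattr(pr, "created_at", None)),
        "updated_at": _iso(getattr(pr, "updated_at", None)),
    }
```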
Git Push Authentication Fix:
- Resolved "Bad hostname" error caused by token-embedded URLs in Replit environment
- Implemented environment-based authentication using Git credential helper
- Clone and push operations now use clean HTTPS URLs without embedded tokens
- Git credentials stored securely in `~/.git-credentials` with proper permissions (0600)
- Credential helper configured to use stored tokens from GITHUB_TOKEN environment variable
- Prevents Replit from rejecting malformed URLs containing authentication tokens
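The credential-file part of this fix can be sketched as follows. The helper name `write_git_credentials` and the `x-access-token` username are assumptions for illustration; the file uses the one-URL-per-line format read by Git's `store` credential helper, which would be enabled separately with `git config --global credential.helper store`.

```python
import os
from pathlib import Path
from urllib.parse import quote


def write_git_credentials(token: str, path: Path, host: str = "github.com") -> Path:
    """Write a credential-store file readable only by the current user."""
    # Assumed username; the token is percent-encoded so it is safe inside a URL.
    line = f"https://x-access-token:{quote(token, safe='')}@{host}\n"

    # Create the file with 0600 from the start so the token is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(line)

    # The mode passed to os.open is ignored for pre-existing files, so enforce it.
    os.chmod(path, 0o600)
    return path
```

A caller would typically pass `os.environ["GITHUB_TOKEN"]` and `Path.home() / ".git-credentials"`, then clone and push with plain `https://github.com/...` URLs.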
- Single Repository Operations: Currently processes one repository at a time
- No Persistent Database: Run history stored in JSON files (not suitable for high concurrency)
- Python Files Only: Full validation only works for Python code
- Manual Trigger: No automated scheduling or webhook integration yet
- Time-based scheduling for automated repo sweeps
- GitHub webhook receiver for real-time event processing
- Advanced validation (pytest integration, semgrep security scanning)
- Daily digest reports with merge recommendations
- Multi-repository monitoring dashboard
- PR draft mode with auto-labeling for failed validations