Predictive Exam Analytics & AI-Powered Study Plan Builder
Analyzes 10+ years of past exam papers to discover topic patterns, predict likely topics for upcoming exams, and generate AI-weighted study plans, all for free.
- Overview
- Features
- Architecture
- Tech Stack
- Project Structure
- Getting Started
- Usage Guide
- API Reference
- Data Pipeline
- Development Workflow
- Roadmap
- Contributing
- License
ExamArchitect is a full-stack web application that ingests past exam papers (starting with GATE CS), extracts and classifies questions by topic using AI, then provides:
- Interactive Heatmaps: year-over-year topic frequency at subject and subtopic levels
- AI Predictions: statistically driven probability scores for upcoming exam topics
- Dynamic Study Plans: personalized roadmaps based on your weaknesses and available days
- Question Browser: searchable question bank with filters, answer spoilers, and difficulty tags
- Admin Panel: human-in-the-loop review dashboard for ingested question data
Core Principle: AI is a utility for parsing, tagging, and explaining. The core prediction engine is statistical and mathematical, not purely LLM-based, ensuring reliability and transparency.
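To illustrate what a statistical (non-LLM) predictor can look like, here is a minimal sketch that scores topics by recency-weighted marks share. The function name, the input shape, and the decay factor are assumptions made for this example; the actual model in `analytics.py` may work differently.

```python
from collections import defaultdict


def recency_weighted_scores(marks_by_year, decay=0.8):
    """Score topics by exponentially decayed marks, normalized to sum to 1.

    marks_by_year: {year: {topic: marks}} (hypothetical input shape).
    decay: weight multiplier per year of age (assumed value, not from the project).
    """
    if not marks_by_year:
        return {}
    latest = max(marks_by_year)
    weighted = defaultdict(float)
    for year, topics in marks_by_year.items():
        weight = decay ** (latest - year)  # newer papers count more
        for topic, marks in topics.items():
            weighted[topic] += weight * marks
    total = sum(weighted.values())
    if total == 0:
        return {topic: 0.0 for topic in weighted}
    return {topic: score / total for topic, score in weighted.items()}


# Made-up sample data, for illustration only.
history = {
    2023: {"Graphs": 4, "Paging": 2},
    2024: {"Graphs": 2, "Paging": 6},
}
scores = recency_weighted_scores(history, decay=0.5)
```

Because the score is a plain weighted sum, every prediction can be traced back to the marks that produced it, which is the transparency the core principle asks for.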
- Subject-level heatmap showing marks distribution across 10 years
- Accordion drilldown: click any subject row to expand subtopic-level breakdowns
- Trend line charts (Chart.js): click any cell to see the year-over-year trend
- Color-coded cells: Low (amber), Medium (orange), Critical (red glow) based on marks weight
- Full-text search with 300ms debounce โ searches across all papers
- Subject filter dropdown to narrow results
- Rich question cards with gradient badges, difficulty tags (Easy/Medium/Hard), marks, and question type (MCQ/NAT)
- Answer spoilers: collapsible reveal buttons to prevent accidental spoiling
- Subject > Topic breadcrumbs on each card
- Custom duration: enter any number of days (15, 30, 45, 90, etc.)
- Weakness input: type topics manually or use curated chip selectors
- AI-generated plans with phased breakdowns (Foundation → Core → Advanced → Revision)
- Paper ingestion: upload PDFs, trigger AI parsing pipeline
- Staged question review: approve, reject, or retag parsed questions
- Re-seed button: bulk reset and re-ingest 10 years of historical data
- AI prediction regeneration: trigger statistical model re-computation
- PWA-ready: installable on mobile with offline shell support
- Unicode-safe search: handles curly quotes, apostrophes, and OCR artifacts
- Fallback LLM chain: Gemini → Groq → Cerebras → OpenRouter → Ollama
- Zero cost: built entirely on free-tier APIs and local tools
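The fallback LLM chain listed above can be sketched as an ordered list of callables that is tried until one succeeds. This is an illustrative sketch only: `call_with_fallback` and the stand-in provider functions are hypothetical names, and the project's real provider wrappers are not shown in this README.

```python
from typing import Callable, Iterable, Tuple

Provider = Tuple[str, Callable[[str], str]]


def call_with_fallback(prompt: str, providers: Iterable[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))


# Stand-in callables (hypothetical); real ones would wrap each vendor's client.
def flaky_gemini(prompt: str) -> str:
    raise TimeoutError("rate limited")


def fake_groq(prompt: str) -> str:
    return f"tagged:{prompt}"


used, answer = call_with_fallback(
    "What is a B-tree?",
    [("gemini", flaky_gemini), ("groq", fake_groq)],
)
```

Collecting the per-provider errors before raising keeps the final failure message useful when every free-tier quota is exhausted at once.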
┌──────────────────────────────────────────────────────────────────┐
│                    Frontend (Vite + React 19)                    │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Dashboard │ │Question  │ │Study Plan│ │  Admin   │ │   PWA    │ │
│ │& Heatmap │ │Browser   │ │Generator │ │  Panel   │ │  Shell   │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────────┘ │
│      └────────────┴────────────┴────────────┘                    │
│                          ▼  REST API calls                       │
├──────────────────────────────────────────────────────────────────┤
│                    Backend (FastAPI + Python)                    │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────────┐  │
│ │REST API  │ │Ingestion │ │Prediction│ │      Study Plan      │  │
│ │Endpoints │ │Pipeline  │ │Engine    │ │      Generator       │  │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────────────────────┘  │
│      ▼            ▼            ▼                                 │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │                       SQLite Database                        │ │
│ │    exam_categories │ exams │ topics │ papers │ questions     │ │
│ │      topic_year_stats │ predictions │ syllabus_versions      │ │
│ └──────────────────────────────────────────────────────────────┘ │
├──────────────────────────────────────────────────────────────────┤
│                     AI Layer (Utility Only)                      │
│    Gemini (Primary) → Groq (Fast) → Cerebras (Bulk) → Ollama     │
│   Used for: PDF parsing, topic tagging, prediction narratives    │
└──────────────────────────────────────────────────────────────────┘
| Layer | Technology |
|---|---|
| Frontend | React 19, Vite 8, Chart.js 4, Lucide Icons, PWA |
| Backend | Python 3.11+, FastAPI, SQLAlchemy 2, Uvicorn |
| Database | SQLite (file-based, zero-config) |
| PDF Parsing | pdfplumber, PyMuPDF (fitz) |
| AI/LLM | Google Gemini, Groq, Cerebras, OpenRouter, Ollama |
| Styling | Vanilla CSS (dark mode, glassmorphism, animations) |
ExamArchitect/
├── backend/                       # FastAPI Python backend
│   ├── app/
│   │   ├── main.py                # FastAPI app, all REST endpoints
│   │   ├── models.py              # SQLAlchemy ORM models
│   │   ├── database.py            # DB engine & session factory
│   │   ├── init_db.py             # Database seeding (categories, exams, topics)
│   │   ├── ingestion.py           # Question ingestion & text normalization
│   │   ├── pdf_parser.py          # PDF text extraction utilities
│   │   ├── ai_tagger.py           # LLM-based topic classification
│   │   ├── analytics.py           # Statistical prediction engine
│   │   └── jules_utils.py         # Utility helpers
│   ├── data/                      # Cached/intermediate data files
│   ├── parse_and_ingest_all.py    # Bulk PDF → DB ingestion script
│   ├── run.py                     # Uvicorn server entry point
│   ├── requirements.txt           # Python dependencies
│   ├── .env.example               # Environment variable template
│   └── diagnose_pdf.py            # PDF debugging utility
│
├── frontend/                      # React + Vite frontend
│   ├── src/
│   │   ├── App.jsx                # Main application (all views & state)
│   │   ├── App.css                # Component styles (dark mode)
│   │   ├── index.css              # Design system tokens & global styles
│   │   ├── main.jsx               # React entry point
│   │   ├── components/
│   │   │   └── AdminPanel.jsx     # Admin review dashboard component
│   │   └── assets/                # Static assets (icons, images)
│   ├── public/                    # PWA manifest, favicon
│   ├── index.html                 # HTML shell
│   ├── vite.config.js             # Vite + PWA configuration
│   ├── package.json               # Node.js dependencies
│   └── eslint.config.js           # Linting configuration
│
├── pdfs/                          # Source GATE CS exam PDFs (2005-2025)
│   ├── 2019_CS_Paper1.pdf
│   ├── GATE2010.pdf
│   ├── GATE-2022-part-1.pdf
│   └── ... (21 PDF files)
│
├── .gitignore                     # Root gitignore
└── README.md                      # This file
| Tool | Version | Install |
|---|---|---|
| Python | 3.11+ | python.org |
| Node.js | 18+ (LTS) | nodejs.org |
| Git | Latest | git-scm.com |
# Clone the repository
git clone https://github.com/SparshGarg999/ExamArchitect.git
cd ExamArchitect
# Create Python virtual environment
cd backend
python -m venv venv
# Activate virtual environment
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Create environment file from template
cp .env.example .env
# Edit .env and add your API keys (Gemini is the primary one needed)
By default, ExamArchitect runs on a local, zero-config SQLite database (`backend/exam_architect.db`). To scale the application, you can connect it directly to a remote Supabase PostgreSQL instance:
- Get Supabase DB Connection URI:
  - Create a project on Supabase.
  - Go to Project Settings -> Database and copy the URI connection string.
- Configure Environment Variables:
  - In `backend/.env`, set the `DATABASE_URL` to your copied connection string.
- Apply Row-Level Security (RLS) Policies:
  - ExamArchitect enforces secure access via 27 database RLS policies. Apply them using the migration utility: `python apply_rls.py`
  - This automatically parses and applies `rls_migration.sql` to configure SELECT restrictions on public content and secure read/write policies on user data tables.
- Isolated Test Suites:
  - Running tests locally (via `pytest`) automatically bypasses your production database and uses an in-memory SQLite sandbox. This prevents tests from truncating or corrupting your remote Supabase tables.
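One way such test isolation can be wired up is to resolve the database URL from the environment in a single place. The sketch below is an assumption about the mechanism, not the project's actual `database.py`: `resolve_database_url` and the two constants are hypothetical names, while `DATABASE_URL` and `TESTING` are the variables this README mentions.

```python
import os

# Assumed defaults for illustration; the real file path may be configured differently.
SQLITE_FILE_URL = "sqlite:///./exam_architect.db"
SQLITE_MEMORY_URL = "sqlite:///:memory:"  # SQLAlchemy URL for an in-memory SQLite database


def resolve_database_url(env=None):
    """Pick the database URL: test sandbox first, then DATABASE_URL, then local SQLite."""
    env = os.environ if env is None else env
    if env.get("TESTING"):  # any non-empty value switches to the sandbox
        return SQLITE_MEMORY_URL
    return env.get("DATABASE_URL") or SQLITE_FILE_URL


# Hypothetical connection string, shown only to exercise the function.
remote = "postgresql://user:pass@db.example.com:5432/postgres"
print(resolve_database_url({"DATABASE_URL": remote}))
print(resolve_database_url({"DATABASE_URL": remote, "TESTING": "1"}))
```

Checking `TESTING` before `DATABASE_URL` means a stray production URI in a developer's `.env` cannot leak into a test run.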
# From the project root
cd frontend
# Install Node.js dependencies
npm install
# Start the backend server (this auto-creates the DB schema and seeds categories/topics)
cd backend
python run.py
# Server starts at http://localhost:8000
# In a separate terminal, start the frontend dev server
cd frontend
npm run dev
# Frontend starts at http://localhost:5173
Default Admin Credentials:
- Email: `admin@examarchitect.com`
- Password: `AdminPassword123!` (You will be prompted to change this upon your first login to the Admin Panel)
First-time data seeding:
- Open the app at `http://localhost:5173`
- Navigate to the Dashboard tab
- Click the "Reset & Re-seed 10-Yr Data" button in the heatmap section
- Wait for the toast notification confirming successful ingestion
Note: The re-seed process runs `parse_and_ingest_all.py`, which extracts questions from all PDFs in the `pdfs/` folder, classifies them, and inserts them into the database. This may take a few minutes.
Terminal 1 (Backend):
cd backend
venv\Scripts\activate # Windows
python run.py
Terminal 2 (Frontend):
cd frontend
npm run dev
Then open http://localhost:5173 in your browser.
- Select an exam category and exam (e.g., GATE CS)
- The heatmap shows subjects as rows, years as columns, marks as cell values
- Click a subject row to expand and see subtopic breakdowns
- Click any cell to see the trend chart for that topic
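The heatmap layout described above (subjects as rows, years as columns, marks as cell values) amounts to a simple pivot. The following is a minimal sketch with a hypothetical `build_heatmap` helper and made-up records; the real `/heatmap` endpoints may return a different shape.

```python
from collections import defaultdict


def build_heatmap(questions):
    """Return (subjects, years, matrix) where matrix[i][j] is total marks
    for subjects[i] in years[j]. Input records are assumed to be dicts with
    'subject', 'year', and 'marks' keys (hypothetical shape)."""
    totals = defaultdict(int)
    for q in questions:
        totals[(q["subject"], q["year"])] += q["marks"]
    subjects = sorted({subject for subject, _ in totals})
    years = sorted({year for _, year in totals})
    matrix = [[totals.get((s, y), 0) for y in years] for s in subjects]
    return subjects, years, matrix


# Made-up sample records, for illustration only.
sample = [
    {"subject": "Algorithms", "year": 2023, "marks": 2},
    {"subject": "Algorithms", "year": 2023, "marks": 1},
    {"subject": "Algorithms", "year": 2024, "marks": 2},
    {"subject": "Operating Systems", "year": 2024, "marks": 1},
]
subjects, years, matrix = build_heatmap(sample)
```

Missing subject/year combinations become explicit zeros, so every row has the same number of cells and can be rendered directly as a grid.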
- Switch to the "Question Browser" tab
- Select a paper year from the dropdown (or "All Papers" for global search)
- Use the search bar to find questions by text content
- Click "Show Answer" to reveal the correct answer
- Switch to the "Study Plan" tab
- Enter the number of days until your exam
- Add your weak topics (type manually or click curated chips)
- Click "Generate Plan" to get a phased study schedule
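As a rough illustration of how a custom day count could be divided across the four phases (Foundation, Core, Advanced, Revision), the sketch below splits days by fixed shares and hands leftover days to the phases with the largest fractional remainder. The share values and the `split_days` name are assumptions; the actual generator also weighs weak topics and uses AI, which is not modeled here.

```python
# Phase shares are assumed values for illustration, not taken from the project.
PHASE_SHARES = [
    ("Foundation", 0.25),
    ("Core", 0.35),
    ("Advanced", 0.25),
    ("Revision", 0.15),
]


def split_days(total_days, shares=PHASE_SHARES):
    """Split total_days across phases so that the parts always sum to total_days."""
    if total_days < 0:
        raise ValueError("total_days must be non-negative")
    raw = [(name, total_days * share) for name, share in shares]
    days = {name: int(value) for name, value in raw}  # floor of each share
    leftover = total_days - sum(days.values())
    # Give the remaining days to the phases with the largest fractional part.
    by_fraction = sorted(raw, key=lambda item: item[1] - int(item[1]), reverse=True)
    for name, _ in by_fraction[:leftover]:
        days[name] += 1
    return days


plan_45 = split_days(45)
```

Rounding each phase independently could lose or invent a day; the largest-remainder step avoids that for any duration the user types in.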
- Switch to the "Admin" tab
- Select an exam and paper to review staged questions
- Approve to insert into the database, Reject to skip, Retag to re-run AI classification
Base URL: http://localhost:8000
| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Server health status |

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/categories` | List all exam categories |
| GET | `/api/exams/{exam_id}` | Get exam details with topics |

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/exams/{exam_id}/heatmap` | Subject-level heatmap matrix |
| GET | `/api/exams/{exam_id}/topics/{topic_id}/heatmap` | Subtopic-level heatmap (drilldown) |

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/papers/{paper_id}/questions` | Questions for a paper (`?search=...&subject_id=...`) |
| GET | `/api/questions` | Global question search across all papers |

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/papers` | List all ingested papers |

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/exams/{exam_id}/predictions` | Get AI predictions for an exam |
| POST | `/api/exams/{exam_id}/predictions/generate` | Regenerate predictions |

| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/exams/{exam_id}/study-plan` | Generate a personalized plan |

| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/ingest/bulk` | Bulk re-seed from all PDFs |
| POST | `/api/papers/{paper_id}/parse` | Re-parse/retag a specific paper |
| POST | `/api/papers/{paper_id}/staged/approve` | Approve staged questions |
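For scripting against the API, a small helper can assemble the per-paper question URL with its optional `search` and `subject_id` query parameters. This is a sketch using only the standard library; `paper_questions_url` is a hypothetical helper name, and the response format is not documented here, so the example stops at building the URL.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # default local backend address from this README


def paper_questions_url(paper_id, search=None, subject_id=None, base_url=BASE_URL):
    """Build the URL for GET /api/papers/{paper_id}/questions with optional filters."""
    params = {}
    if search:
        params["search"] = search
    if subject_id is not None:
        params["subject_id"] = subject_id
    url = f"{base_url}/api/papers/{paper_id}/questions"
    return f"{url}?{urlencode(params)}" if params else url


url = paper_questions_url(3, search="binary tree", subject_id=7)
print(url)
```

The resulting string can be passed to `urllib.request.urlopen` or any HTTP client once the backend is running.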
The ingestion pipeline transforms raw exam PDFs into structured, searchable question data:
┌─────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────┐
│  PDFs   │────▶│ PDF Extractor│────▶│ Text Cleaner │────▶│ Regex    │
│ (pdfs/) │     │ (pdfplumber/ │     │ (normalize   │     │ Question │
│         │     │  PyMuPDF)    │     │  unicode)    │     │ Splitter │
└─────────┘     └──────────────┘     └──────────────┘     └────┬─────┘
                                                               │
┌──────────┐    ┌──────────────┐     ┌──────────────┐          │
│ SQLite   │◀───│ Ingestion    │◀────│ AI Topic     │◀─────────┘
│ Database │    │ Pipeline     │     │ Tagger       │
│          │    │(ingestion.py)│     │ (Gemini API) │
└──────────┘    └──────────────┘     └──────────────┘
- PDF Extraction: `pdfplumber` or `PyMuPDF` extracts raw text from each page
- Text Normalization: Curly quotes (U+2018, U+2019), em-dashes (U+2014), and `\ufffd` replacement characters are cleaned to ASCII equivalents
- Question Splitting: Regex patterns identify question boundaries (`Q.1`, `Q.2`, etc.) and extract question number, text, options, and marks
- Question Classification: Each question is classified as MCQ, MSQ, or NAT based on option patterns
- AI Topic Tagging: Gemini Vision (or fallback LLMs) assigns each question to a subject → topic from the fixed taxonomy
- Database Insertion: Clean, tagged questions are inserted into the `questions` table with foreign key links to `papers` and `topics`
- Statistics Aggregation: `topic_year_stats` are computed (question count, total marks, avg difficulty per topic per year)
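The normalization, splitting, and classification steps above can be sketched in a few lines of standard-library Python. All names and regex patterns here are illustrative assumptions rather than the contents of `ingestion.py`, and MSQ detection is left out because this README does not describe how MSQ is distinguished from MCQ.

```python
import re

# Character replacements for the normalization step (assumed mapping).
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-", "\u2014": "-",  # en and em dashes
    "\ufffd": "",                  # replacement character left by bad decoding
}
_TRANSLATION = str.maketrans(REPLACEMENTS)

# Question boundary such as "Q.1" or "Q. 12" at the start of a line (assumed pattern).
QUESTION_START = re.compile(r"^Q\.\s*(\d+)\s*", re.MULTILINE)
# Lettered option such as "(A)" or "B)" at the start of a line (assumed pattern).
OPTION_LINE = re.compile(r"^\(?[A-D]\)", re.MULTILINE)


def normalize_text(text):
    """Map typographic characters to ASCII and drop U+FFFD."""
    return text.translate(_TRANSLATION)


def split_questions(text):
    """Return a list of (question_number, body) tuples."""
    matches = list(QUESTION_START.finditer(text))
    questions = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        questions.append((int(match.group(1)), text[match.end():end].strip()))
    return questions


def classify(body):
    """MCQ if lettered options are present, otherwise NAT (MSQ not handled)."""
    return "MCQ" if OPTION_LINE.search(body) else "NAT"


raw = (
    "Q.1 What\u2019s the height of a B\u2014tree?\n(A) 1\n(B) 2\n"
    "Q.2 Compute 7 \ufffdmod 3.\n"
)
parsed = split_questions(normalize_text(raw))
kinds = [classify(body) for _, body in parsed]
```

Normalizing before splitting matters: the same cleaned text is what later makes search "Unicode-safe", since queries typed with straight quotes match questions that originally contained curly ones.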
# Terminal 1: Backend (auto-reload on save)
cd backend
venv\Scripts\activate
python run.py
# → http://localhost:8000 (API docs at /docs)
# Terminal 2: Frontend (HMR via Vite)
cd frontend
npm run dev
# → http://localhost:5173
cd frontend
npm run build
# Output → frontend/dist/
cd frontend
npm run lint
Delete `backend/exam_architect.db` and restart the backend server. The schema will be re-created automatically. Then click "Reset & Re-seed" in the UI.
- Place PDF files in the `pdfs/` directory
- Update the filename-to-year mapping in `backend/parse_and_ingest_all.py`
- Click "Reset & Re-seed" in the dashboard, or run:
cd backend
python parse_and_ingest_all.py
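The script relies on an explicit filename-to-year mapping. As a possible complement (not the script's actual logic), a regex can guess the year from filenames like the ones in `pdfs/`, which helps spot files still missing from the mapping. `year_from_filename` and the pattern below are hypothetical.

```python
import re

# A standalone four-digit year in the 2000s, not embedded in a longer number (assumed pattern).
YEAR_PATTERN = re.compile(r"(?<!\d)(20\d{2})(?!\d)")


def year_from_filename(filename):
    """Return the first 20xx year found in the filename, or None if there is none."""
    match = YEAR_PATTERN.search(filename)
    return int(match.group(1)) if match else None


# Filenames taken from the project structure listing.
names = ["2019_CS_Paper1.pdf", "GATE2010.pdf", "GATE-2022-part-1.pdf"]
guessed = {name: year_from_filename(name) for name in names}
```

A guess like this is best treated as a suggestion to confirm by hand, since a filename may contain other four-digit numbers.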
This project consists of a FastAPI backend and a Vite + React frontend. They are hosted independently on modern cloud platforms.
- `render.yaml` (Infrastructure as Code): Having `render.yaml` in the root is standard practice for Render deployments. It is a Blueprint specification file that Render reads to automatically spin up your backend web service, Postgres database, and static sites.
- `vercel.json`: Defines custom routing rules and Single Page Application (SPA) rewrite rules to route all traffic to `index.html` on Vercel.
You can host the Python backend on services like Render, Railway, or Fly.io:
- Build Command: `pip install -r requirements.txt`
- Start Command: `python run.py` (binds Uvicorn to port `8000` or the `$PORT` environment variable)
- Environment Variables:
  - `DATABASE_URL`: Your production PostgreSQL/Supabase connection URI.
  - `GEMINI_API_KEY`: Your Google Gemini API key.
  - `TESTING`: Set to `""` or unset in production to ensure the DB seeds properly.
- Health Check Endpoint: `/health` (used for Render deployment checks and uptime monitoring).
The frontend build generates static HTML/JS/CSS assets that can be hosted on Vercel or Netlify:
- Build Command: `npm run build`
- Output/Publish Directory: `dist`
- API Base URL Config: Set the `VITE_API_BASE` environment variable to your deployed backend URL.
- Full-stack Vite + React 19 and FastAPI + SQLite skeleton.
- Pre-seeded exam categories and full GATE CS topics taxonomy.
- Core database schema (8 tables) with fully mapped ORM relationships.
- Ingestion pipeline with visual OCR question splitting, text normalization, and diagram slicing.
- Interactive subject → subtopic accordion heatmap with Chart.js trend charts.
- Dynamic Theme Accent Propagation: Selecting a theme color (indigo, emerald, amber, rose) dynamically customizes year-by-year cells, sparklines, loading states, planner routes, and modals.
- Premium Bento Grid Alignment: Redesigned bento layouts to fit in a balanced grid with unified card heights.
- Three.js Globe Visualizer: Floating latitude/longitude lines to radius `100.8` to fix bottom hemisphere clipping.
- Private Network Access (PNA) Isolation: Removed hardcoded localhost URLs in Question Card images to prevent Chrome local network warnings.
- Supabase PostgreSQL Migrations: Fully ported database layer to remote PostgreSQL.
- Row-Level Security (RLS) Policies: Enabled and enforced 27 fine-grained security policies on Supabase.
- Isolated Unit Testing: Created a sandbox environment where tests run on local in-memory SQLite, keeping production database tables safe.
- Interactive XP Balance System: Maintain a local storage XP balance where using the AI Mentor costs XP (-50 XP) and verifying correct answers gains XP (+10 XP).
- Collaborative Learning Hub: Sharing custom-curated question sets and flashcards with other aspirants.
- Holdout Validation Backtesting Visualizer: Visual scorecard evaluating prediction algorithms against actual historical exams.
- `feature/`: new features (e.g. `feature/xp-balance-engine`)
- `fix/`: bug fixes (e.g. `fix/globe-line-clipping`)
- `refactor/`: code cleanup (e.g. `refactor/api-endpoints`)
- `ci/`: pipeline changes (e.g. `ci/github-actions`)
- Fork the repository and create your feature branch.
- Run `npm run build` locally in `frontend/` to confirm Vite compiles cleanly.
- Verify backend tests pass with `pytest`.
- Submit a PR. The GitHub Actions CI pipeline will automatically lint the code, run backend tests, and test the frontend build.
- If the PR checks pass and review looks good, it can be merged after final approval.
This project is licensed under the MIT License; see the LICENSE file for details.