|
| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +CCExtractor Sample Platform: a Flask web application for managing regression tests, sample uploads, and CI/CD for the CCExtractor project. It validates PRs by running CCExtractor against sample media files on GCP VMs (Linux and Windows). |
| 8 | + |
| 9 | +## Tech Stack |
| 10 | + |
| 11 | +- **Backend**: Flask 3.1, SQLAlchemy 1.4, MySQL (SQLite for tests) |
| 12 | +- **Cloud**: GCP Compute Engine (test VMs), Google Cloud Storage (samples) |
| 13 | +- **CI/CD**: GitHub Actions, GitHub API (PyGithub) |
| 14 | +- **Testing**: nose2, Flask-Testing, coverage |
| 15 | + |
| 16 | +## Commands |
| 17 | + |
| 18 | +```bash |
| 19 | +# Setup |
| 20 | +virtualenv venv && source venv/bin/activate |
| 21 | +pip install -r requirements.txt |
| 22 | +pip install -r test-requirements.txt |
| 23 | + |
| 24 | +# Run tests |
| 25 | +TESTING=True nose2 |
| 26 | + |
| 27 | +# Linting & type checking |
| 28 | +pycodestyle ./ --config=./.pycodestylerc |
| 29 | +pydocstyle ./ |
| 30 | +mypy . |
| 31 | +isort . --check-only |
| 32 | + |
| 33 | +# Database migrations |
| 34 | +export FLASK_APP=/path/to/run.py |
| 35 | +flask db upgrade # Apply migrations |
| 36 | +flask db migrate # Generate new migration |
| 37 | + |
| 38 | +# Update regression test results |
| 39 | +python manage.py update /path/to/ccextractor |
| 40 | +``` |
| 41 | + |
| 42 | +## Architecture |
| 43 | + |
| 44 | +### Module Structure |
| 45 | +Each module in `mod_*/` follows the same layout: `__init__.py`, `controllers.py` (routes), `models.py` (ORM), `forms.py` (WTForms) |
| 46 | + |
| 47 | +| Module | Purpose | |
| 48 | +|--------|---------| |
| 49 | +| `mod_ci` | GitHub webhooks, GCP VM orchestration, test execution | |
| 50 | +| `mod_regression` | Regression test definitions, categories, expected outputs | |
| 51 | +| `mod_test` | Test runs, results, progress tracking | |
| 52 | +| `mod_sample` | Sample file management, tags, extra files | |
| 53 | +| `mod_upload` | HTTP/FTP upload handling | |
| 54 | +| `mod_auth` | User auth, roles (admin/user/contributor/tester) | |
| 55 | +| `mod_customized` | Custom test runs for forks | |
| 56 | + |
| 57 | +### Key Models & Relationships |
| 58 | +``` |
| 59 | +Sample (sha hash) -> RegressionTest (command, expected_rc) -> RegressionTestOutput |
| 60 | + | |
| 61 | +Fork (GitHub repo) -> Test (platform, commit) -> TestResult -> TestResultFile |
| 62 | + -> TestProgress (status tracking) |
| 63 | +``` |
| 64 | + |
| 65 | +### CI Flow |
| 66 | +1. GitHub webhook (`/start-ci`) receives PR/push events |
| 67 | +2. Waits for GitHub Actions build artifacts |
| 68 | +3. `gcp_instance()` provisions Linux/Windows VMs |
| 69 | +4. VMs run CCExtractor, report to `progress_reporter()` |
| 70 | +5. Results compared against expected outputs |
| 71 | +6. `comment_pr()` posts results to GitHub |
| 72 | + |
| 73 | +## Critical Files |
| 74 | + |
| 75 | +- `run.py` - Flask app entry, blueprint registration |
| 76 | +- `mod_ci/controllers.py` - CI orchestration (2500+ lines) |
| 77 | +- `mod_regression/models.py` - Test definitions |
| 78 | +- `mod_test/models.py` - Test execution models |
| 79 | +- `database.py` - SQLAlchemy setup, custom types |
| 80 | +- `tests/base.py` - Test fixtures, mock helpers |
| 81 | + |
| 82 | +## GSoC 2026 Focus Areas (from Carlos) |
| 83 | + |
| 84 | +### Priority 1: Regression Test Suite |
| 85 | +The main blocker for the CCExtractor Rust migration is test coverage. Current needs: |
| 86 | +- Add regression tests for uncovered caption types/containers |
| 87 | +- Import FFmpeg and VLC official video libraries as test samples |
| 88 | +- Systematic sample analysis using ffprobe, MKVToolNix, and CCExtractor output |
| 89 | +- Goal: trust the Sample Platform (SP) enough that passing tests mean a PR is safe to merge |
| 90 | + |
| 91 | +### Priority 2: Sample Platform Improvements |
| 92 | +Low-coverage modules needing work: |
| 93 | +- `mod_upload` (44% coverage) - FTP upload, progress tracking |
| 94 | +- `mod_test` (58% coverage) - diff generation, error scenarios |
| 95 | +- `mod_sample` (61% coverage) - Issue linking, tag management |
| 96 | + |
| 97 | +### Contribution Strategy |
| 98 | +1. Start with unit tests for low-coverage modules |
| 99 | +2. Add integration tests for CI flow |
| 100 | +3. Help document sample metadata systematically |
| 101 | +4. Enable confident C code removal by proving test coverage |
| 102 | + |
| 103 | +## Code Style |
| 104 | + |
| 105 | +- Type hints required (mypy enforced) |
| 106 | +- Docstrings required (pydocstyle enforced) |
| 107 | +- PEP8 (pycodestyle enforced) |
| 108 | +- Imports sorted with isort |
| 109 | + |
| 110 | +## MCP Setup (GSoC 2026) |
| 111 | + |
| 112 | +**Configured servers** (`~/.claude/settings.json`): |
| 113 | +- `github` – repo/PR/issue management (needs `GITHUB_PERSONAL_ACCESS_TOKEN` env var) |
| 114 | +- `context7` – up-to-date library docs |
| 115 | +- `filesystem` – scoped to `/home/rahul/projects/gsoc` |
| 116 | + |
| 117 | +**Security**: |
| 118 | +- Token stored in `~/.profile`, never committed |
| 119 | +- MCP paths added to `.gitignore` |
| 120 | +- pm2 config at `~/ecosystem.config.js` for auto-restart |
| 121 | + |
| 122 | +**Commands**: |
| 123 | +```bash |
| 124 | +# Start MCP servers |
| 125 | +pm2 start ~/ecosystem.config.js |
| 126 | +pm2 logs |
| 127 | + |
| 128 | +# Resume Claude session |
| 129 | +claude --resume |
| 130 | +``` |
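Steps 4 to 6 of the "CI Flow" section in the added file (VMs report results, results are compared against expected outputs, a summary is posted to the PR) could be sketched roughly as below. This is only an illustration under stated assumptions: `RegressionCase`, `RunOutcome`, `evaluate`, and `summarize` are hypothetical names, not functions from `mod_ci/controllers.py`, and comparing SHA-256 digests is an assumed stand-in for whatever comparison the platform actually performs.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class RegressionCase:
    """Hypothetical stand-in for a regression test definition."""
    command: str          # arguments passed to CCExtractor (placeholder)
    expected_rc: int      # expected process return code
    expected_sha256: str  # digest of the expected output (assumption)


@dataclass
class RunOutcome:
    """Hypothetical stand-in for what a test VM reports back."""
    exit_code: int
    output: bytes


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def evaluate(case: RegressionCase, outcome: RunOutcome) -> bool:
    """Pass only if both the return code and the output digest match."""
    return (
        outcome.exit_code == case.expected_rc
        and sha256_hex(outcome.output) == case.expected_sha256
    )


def summarize(results: Dict[str, bool]) -> str:
    """Build a one-line summary suitable for a PR comment."""
    passed = sum(1 for ok in results.values() if ok)
    return f"{passed}/{len(results)} regression tests passed"
```

A caller would evaluate each case as results arrive and pass the collected mapping to `summarize` once the run finishes. Keeping the comparison in a pure function like `evaluate` makes it easy to unit test without a VM or a database.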