| 1 | +# 🏗️ System Overview |
| 2 | + |
| 3 | +## Architecture at a Glance |
| 4 | + |
| 5 | +pyplots is a **specification-driven, AI-powered platform** for Python data visualization that automatically discovers, generates, tests, and maintains plotting examples. It is built as a monorepo with a clear separation between specifications and implementations. |
| 6 | + |
| 7 | +## Core Principles |
| 8 | + |
| 9 | +1. **Specification-First**: Every plot starts with a library-agnostic spec, not code |
| 10 | +2. **Your Data First**: Examples work with real user data, not fake data |
| 11 | +3. **Library Agnostic**: Support ALL Python plotting libraries |
| 12 | +4. **Fully Tested**: Every implementation is tested (90%+ coverage target) |
| 13 | +5. **AI-Generated & Maintained**: Code is generated and continuously updated by AI |
| 14 | +6. **Issue-Based State Management**: GitHub Issues as single source of truth for workflow state |
| 15 | + |
| 16 | +## System Components |
| 17 | + |
| 18 | +```mermaid |
| 19 | +graph TB |
| 20 | + subgraph "User Interface" |
| 21 | + Browser[Browser] |
| 22 | + end |
| 23 | + |
| 24 | + subgraph "Cloud Run - Frontend" |
| 25 | + Frontend[Next.js App<br/>Static Site] |
| 26 | + end |
| 27 | + |
| 28 | + subgraph "Cloud Run - Backend" |
| 29 | + API[FastAPI Backend<br/>REST API] |
| 30 | + end |
| 31 | + |
| 32 | + subgraph "Cloud SQL" |
| 33 | + DB[(PostgreSQL<br/>Metadata)] |
| 34 | + end |
| 35 | + |
| 36 | + subgraph "Google Cloud Storage" |
| 37 | + GCS[Preview Images<br/>PNG Files] |
| 38 | + end |
| 39 | + |
| 40 | + subgraph "Automation Layer" |
| 41 | + N8N[n8n Cloud Pro<br/>External Services] |
| 42 | + GHA[GitHub Actions<br/>CI/CD Pipeline] |
| 43 | + end |
| 44 | +
|
| 45 | + subgraph "GitHub" |
| 46 | + Issues[Issues<br/>State Machine] |
| 47 | + Repo[Repository<br/>Code + Specs] |
| 48 | + end |
| 49 | + |
| 50 | + Browser -->|HTTPS| Frontend |
| 51 | + Frontend -->|REST API| API |
| 52 | + N8N -->|REST API| API |
| 53 | + GHA -->|REST API| API |
| 54 | + API -->|SQL Private IP| DB |
| 55 | + GHA -->|Upload| GCS |
| 56 | + API -->|Read| GCS |
| 57 | + GHA -->|Create/Update| Issues |
| 58 | + GHA -->|Read/Write| Repo |
| 59 | + N8N -->|Create| Issues |
| 60 | + |
| 61 | + style API fill:#4285f4,color:#fff |
| 62 | + style Frontend fill:#34a853,color:#fff |
| 63 | + style DB fill:#ea4335,color:#fff |
| 64 | + style GCS fill:#fbbc04,color:#000 |
| 65 | + style GHA fill:#6e5494,color:#fff |
| 66 | + style N8N fill:#ff6d5a,color:#fff |
| 67 | +``` |
| 68 | + |
| 69 | +## Component Details |
| 70 | + |
| 71 | +### Frontend (Next.js on Cloud Run) |
| 72 | + |
| 73 | +**Purpose**: User interface for browsing plots and generating visualizations |
| 74 | + |
| 75 | +**Technology**: |
| 76 | +- Framework: Next.js 14 (App Router) |
| 77 | +- Language: TypeScript |
| 78 | +- Styling: Tailwind CSS |
| 79 | +- Deployment: Cloud Run (containerized) |
| 80 | + |
| 81 | +**Key Features**: |
| 82 | +- Browse plot catalog with previews |
| 83 | +- Search and filter plots |
| 84 | +- Upload user data (CSV/Excel/JSON) |
| 85 | +- Generate plots with user data |
| 86 | +- Compare libraries side-by-side |
| 87 | +- Copy production-ready code |
| 88 | + |
| 89 | +**API Communication**: |
| 90 | +- All data access via REST API |
| 91 | +- No direct database connection |
| 92 | +- Stateless (no server-side sessions) |
| 93 | + |
| 94 | +--- |
| 95 | + |
| 96 | +### Backend (FastAPI on Cloud Run) |
| 97 | + |
| 98 | +**Purpose**: Central API for all data operations and plot generation |
| 99 | + |
| 100 | +**Technology**: |
| 101 | +- Framework: FastAPI (async) |
| 102 | +- Language: Python 3.10+ |
| 103 | +- ORM: SQLAlchemy (async) |
| 104 | +- Package Manager: uv |
| 105 | + |
| 106 | +**Key Responsibilities**: |
| 107 | +- Serve plot metadata and code |
| 108 | +- Handle user data uploads |
| 109 | +- Execute plot generation |
| 110 | +- Manage database operations |
| 111 | +- Provide endpoints for automation |
| 112 | + |
| 113 | +**Security**: |
| 114 | +- Input validation (Pydantic) |
| 115 | +- Rate limiting |
| 116 | +- Sandboxed code execution |
| 117 | +- No permanent storage of user data |
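The security list names rate limiting but not the algorithm. As one possible illustration only, the sketch below shows a sliding-window limiter keyed by client. The class name, the limits, and the idea of keying by a client ID are all assumptions; the real backend may use a library or a gateway-level limit instead.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical per-client limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = {}  # client_id -> deque of request timestamps (seconds)

    def allow(self, client_id: str, now: float) -> bool:
        hits = self._hits.setdefault(client_id, deque())
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60.0)
first = limiter.allow("client-a", now=0.0)
second = limiter.allow("client-a", now=1.0)
third = limiter.allow("client-a", now=2.0)    # third request inside the window
later = limiter.allow("client-a", now=61.0)   # earlier requests have expired
```

Passing `now` explicitly keeps the sketch deterministic; a real service would read a monotonic clock.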
| 118 | + |
| 119 | +--- |
| 120 | + |
| 121 | +### Database (Cloud SQL - PostgreSQL) |
| 122 | + |
| 123 | +**Purpose**: Store metadata, not code or images |
| 124 | + |
| 125 | +**What's Stored**: |
| 126 | +- ✅ Spec metadata (title, description, tags) |
| 127 | +- ✅ Implementation metadata (library, variant, quality score) |
| 128 | +- ✅ GCS URLs (preview images) |
| 129 | +- ✅ Promotion queue (social media posts) |
| 130 | +- ❌ NO plot code (stored in repository) |
| 131 | +- ❌ NO images (stored in GCS) |
| 132 | +- ❌ NO quality reports (stored in GitHub Issues as comments) |
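The stored-versus-not-stored split above can be pictured as two small record types. This is a minimal sketch, not the actual schema (see database.md for that): the class and field names, the example values, and the score scale are assumptions. The point it illustrates is that records carry metadata and a GCS URL, never plot code or image bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecMetadata:
    """Library-agnostic spec record (hypothetical field names)."""
    spec_id: str
    title: str
    description: str
    tags: tuple = ()


@dataclass(frozen=True)
class ImplementationMetadata:
    """One library/variant implementation of a spec (hypothetical field names)."""
    spec_id: str
    library: str
    variant: str
    quality_score: float  # scale is an assumption
    preview_url: str      # URL of the PNG in GCS; the image itself is not in the DB


spec = SpecMetadata(
    spec_id="scatter-basic",
    title="Basic scatter plot",
    description="Two numeric columns plotted against each other",
    tags=("scatter", "basic"),
)
impl = ImplementationMetadata(
    spec_id=spec.spec_id,
    library="matplotlib",
    variant="default",
    quality_score=0.92,
    preview_url=(
        "https://storage.googleapis.com/pyplots-images/"
        "previews/matplotlib/scatter-basic/default/v1700000000.png"
    ),
)
```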
| 133 | + |
| 134 | +**Access**: |
| 135 | +- Only API has direct access (Private IP) |
| 136 | +- Frontend, n8n, and GitHub Actions access data via the REST API |
| 137 | + |
| 138 | +--- |
| 139 | + |
| 140 | +### Storage (Google Cloud Storage) |
| 141 | + |
| 142 | +**Purpose**: Host preview images and user-generated plots |
| 143 | + |
| 144 | +**Buckets**: |
| 145 | +``` |
| 146 | +gs://pyplots-images/ |
| 147 | +├── previews/{library}/{spec-id}/{variant}/v{timestamp}.png |
| 148 | +└── generated/{session_id}/{plot_id}.png (auto-deleted after 24h) |
| 149 | +``` |
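The bucket layout above maps directly onto two path templates. A minimal sketch, assuming the timestamp is an integer (the source only shows `v{timestamp}`) and using hypothetical function names:

```python
def preview_object_path(library: str, spec_id: str, variant: str, timestamp: int) -> str:
    """Object path for a versioned preview image, following the layout above."""
    return f"previews/{library}/{spec_id}/{variant}/v{timestamp}.png"


def generated_object_path(session_id: str, plot_id: str) -> str:
    """Object path for a temporary user-generated plot."""
    return f"generated/{session_id}/{plot_id}.png"


preview = preview_object_path("matplotlib", "scatter-basic", "default", 1700000000)
temporary = generated_object_path("session-abc", "plot-1")
```

Because the version is part of the object name, each upload gets a new path and previously served URLs never change content.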
| 150 | + |
| 151 | +**Lifecycle Policy**: |
| 152 | +- Current preview version: permanent |
| 153 | +- Old preview versions: auto-deleted 30 days after a new version is uploaded |
| 154 | +- User-generated plots: auto-deleted after 24 hours |
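The three lifecycle rules can be modelled as a single decision function. This sketch only illustrates the policy; the source does not say whether it is enforced by bucket lifecycle configuration or by a cleanup job, and the function name and the `superseded_at` parameter are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

PREVIEW_GRACE = timedelta(days=30)   # old previews kept 30 days after replacement
GENERATED_TTL = timedelta(hours=24)  # user-generated plots kept 24 hours


def should_delete(
    object_path: str,
    created_at: datetime,
    superseded_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True if the object is past its retention period."""
    if object_path.startswith("generated/"):
        return now - created_at >= GENERATED_TTL
    if object_path.startswith("previews/"):
        # The current preview (never superseded) is permanent.
        if superseded_at is None:
            return False
        return now - superseded_at >= PREVIEW_GRACE
    return False


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
old_upload = should_delete("generated/s1/p1.png", now - timedelta(hours=25), None, now)
current_preview = should_delete(
    "previews/matplotlib/scatter-basic/default/v1.png", now - timedelta(days=400), None, now
)
```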
| 155 | + |
| 156 | +**Access**: |
| 157 | +- Public read for previews |
| 158 | +- Cache-Control: `public, max-age=31536000, immutable` |
| 159 | + |
| 160 | +--- |
| 161 | + |
| 162 | +### GitHub Actions (CI/CD) |
| 163 | + |
| 164 | +**Purpose**: Code-related automation workflows |
| 165 | + |
| 166 | +**Key Workflows**: |
| 167 | +- **spec-to-code.yml**: Issue labeled `approved` → Generate code → Create PR |
| 168 | +- **test-and-preview.yml**: PR opened → Multi-version tests → Generate preview |
| 169 | +- **quality-check.yml**: Preview created → Multi-LLM evaluation → Post results to Issue |
| 170 | +- **deploy.yml**: PR merged → Deploy to Cloud Run → Update metadata |
| 171 | + |
| 172 | +**Why GitHub Actions?** |
| 173 | +- ✅ Already included in GitHub Pro subscription |
| 174 | +- ✅ Transparent for contributors (workflows visible in repo) |
| 175 | +- ✅ Version-controlled workflows |
| 176 | +- ✅ Native integration with Issues/PRs |
| 177 | +- ✅ Easier for solo-dev maintenance |
| 178 | + |
| 179 | +--- |
| 180 | + |
| 181 | +### n8n Workflows (External Automation) |
| 182 | + |
| 183 | +**Purpose**: External service integration and complex orchestration |
| 184 | + |
| 185 | +**Key Workflows**: |
| 186 | +- **Social Media Monitoring**: Daily scraping (Twitter, Reddit) → Create Issues |
| 187 | +- **Twitter Promotion**: Twice daily → Post queued items to X (Twitter) with images |
| 188 | +- **Issue Triage**: New issues → Labeling and assignment |
| 189 | +- **Maintenance Scheduler**: Detect LLM/library updates → Trigger GitHub workflows |
| 190 | + |
| 191 | +**Why n8n for This?** |
| 192 | +- ✅ Better for external API integrations (Twitter, Reddit) |
| 193 | +- ✅ Visual workflow editor for complex orchestration |
| 194 | +- ✅ Scheduled jobs (cron-based) |
| 195 | +- ✅ Easier to connect multiple external services |
| 196 | + |
| 197 | +**Deployment**: n8n Cloud Pro subscription (already paid) |
| 198 | + |
| 199 | +--- |
| 200 | + |
| 201 | +### GitHub Issues (State Machine) |
| 202 | + |
| 203 | +**Purpose**: Single source of truth for plot lifecycle and quality feedback |
| 204 | + |
| 205 | +**What's Stored in Issues**: |
| 206 | +- ✅ Initial spec proposal (Markdown in issue body) |
| 207 | +- ✅ Multi-LLM quality feedback (as bot comments) |
| 208 | +- ✅ Feedback loops (attempt 1, 2, 3 results) |
| 209 | +- ✅ Deployment confirmation |
| 210 | +- ✅ Links to PRs and deployed plots |
| 211 | + |
| 212 | +**Benefits**: |
| 213 | +- No `quality_report.json` files cluttering the repository |
| 214 | +- Full transparency for community |
| 215 | +- Easy discussion and iteration |
| 216 | +- Automatic linking with PRs |
| 217 | +- Clean repository (only production code) |
| 218 | + |
| 219 | +**Update Strategy**: |
| 220 | +- Initial spec → Issue #123 |
| 221 | +- Update for matplotlib 4.0 → Issue #456 (references #123) |
| 222 | +- Add new style variant → Issue #502 (references #123) |
| 223 | +- Fix seaborn bug → Issue #534 (references #123) |
| 224 | + |
| 225 | +--- |
| 226 | + |
| 227 | +## Data Flow Examples |
| 228 | + |
| 229 | +### User Generates Plot with Own Data |
| 230 | + |
| 231 | +```mermaid |
| 232 | +sequenceDiagram |
| 233 | + participant User |
| 234 | + participant Frontend |
| 235 | + participant API |
| 236 | + participant DB |
| 237 | + participant GCS |
| 238 | + |
| 239 | + User->>Frontend: Upload CSV + Select Plot |
| 240 | + Frontend->>API: POST /plots/generate |
| 241 | + API->>DB: Get implementation metadata |
| 242 | + DB-->>API: Return metadata (code is read from the repository) |
| 243 | + API->>API: Execute plot generation |
| 244 | + API->>GCS: Upload PNG (temp) |
| 245 | + GCS-->>API: Return URL |
| 246 | + API-->>Frontend: Return {image_url, code} |
| 247 | + Frontend-->>User: Display plot + code |
| 248 | + |
| 249 | + Note over GCS: Auto-delete after 24h |
| 250 | +``` |
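From the client's side, the flow above reduces to one POST and a JSON response. The sketch below builds that request with the standard library and parses the `{image_url, code}` response named in the diagram. The payload field names are assumptions (api.md holds the real schema), and the request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.pyplots.ai"  # production backend URL listed in this document


def build_generate_request(spec_id: str, library: str, csv_text: str) -> urllib.request.Request:
    """Build the POST /plots/generate request (payload fields are hypothetical)."""
    payload = {
        "spec_id": spec_id,
        "library": library,
        "data_format": "csv",
        "data": csv_text,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/plots/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_generate_response(body: bytes) -> tuple:
    """Extract (image_url, code) from the JSON response body."""
    result = json.loads(body)
    return result["image_url"], result["code"]


request = build_generate_request("scatter-basic", "matplotlib", "x,y\n1,2\n3,4\n")
sample_body = json.dumps({"image_url": "https://example.invalid/plot.png", "code": "print('plot')"}).encode()
image_url, code = parse_generate_response(sample_body)
```

Sending the request would be `urllib.request.urlopen(request)`; it is left out so the sketch runs without network access.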
| 251 | + |
| 252 | +### Automated Plot Creation from Issue |
| 253 | + |
| 254 | +```mermaid |
| 255 | +sequenceDiagram |
| 256 | + participant Person |
| 257 | + participant Issues |
| 258 | + participant GHA |
| 259 | + participant API |
| 260 | + participant DB |
| 261 | + participant GCS |
| 262 | + |
| 263 | + Person->>Issues: Create issue with spec |
| 264 | + Person->>Issues: Add label "approved" |
| 265 | + Issues->>GHA: Trigger spec-to-code.yml |
| 266 | + GHA->>GHA: Generate code with Claude |
| 267 | + GHA->>Issues: Create PR |
| 268 | + GHA->>GHA: Run tests (multi-version) |
| 269 | + GHA->>GHA: Generate preview PNG |
| 270 | + GHA->>GCS: Upload preview |
| 271 | + GHA->>GHA: Multi-LLM quality check |
| 272 | + GHA->>Issues: Post results as comment |
| 273 | + |
| 274 | + alt Quality approved |
| 275 | + GHA->>Issues: Add label "quality-approved" |
| 276 | + Note over Issues: Human merges PR |
| 277 | + GHA->>API: POST /specs/{id}/deployed |
| 278 | + API->>DB: Update metadata |
| 279 | + GHA->>Issues: Comment "Deployed to pyplots.ai!" |
| 280 | + else Quality rejected |
| 281 | + GHA->>Issues: Add label "quality-failed-attempt-1" |
| 282 | + GHA->>GHA: Regenerate with feedback |
| 283 | + end |
| 284 | +``` |
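The quality branch in the diagram above is effectively a small label state machine. A minimal sketch follows; the limit of three attempts is an assumption inferred from the "attempt 1, 2, 3" feedback loops mentioned earlier, and the function names are hypothetical.

```python
MAX_ATTEMPTS = 3  # assumption: inferred from "attempt 1, 2, 3 results"


def next_label(quality_approved: bool, attempt: int) -> str:
    """Label to add to the issue after a quality check."""
    if not 1 <= attempt <= MAX_ATTEMPTS:
        raise ValueError(f"attempt must be between 1 and {MAX_ATTEMPTS}")
    if quality_approved:
        return "quality-approved"
    return f"quality-failed-attempt-{attempt}"


def should_regenerate(quality_approved: bool, attempt: int) -> bool:
    """Regenerate with feedback only while attempts remain."""
    return not quality_approved and attempt < MAX_ATTEMPTS


label_after_first_failure = next_label(quality_approved=False, attempt=1)
retry_after_first_failure = should_regenerate(quality_approved=False, attempt=1)
retry_after_last_failure = should_regenerate(quality_approved=False, attempt=MAX_ATTEMPTS)
```

Keeping the attempt number in the label means the issue itself records how far the feedback loop has progressed, which fits the "Issues as state machine" principle.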
| 285 | + |
| 286 | +--- |
| 287 | + |
| 288 | +## Communication Protocols |
| 289 | + |
| 290 | +### Frontend ↔ API |
| 291 | +- Protocol: REST over HTTPS |
| 292 | +- Format: JSON |
| 293 | +- Authentication: API keys (for premium features) |
| 294 | +- Endpoints: See [api.md](./api.md) |
| 295 | + |
| 296 | +### n8n ↔ API |
| 297 | +- Protocol: REST over HTTPS (n8n Cloud is external and calls the public API endpoint) |
| 298 | +- Format: JSON |
| 299 | +- Authentication: Service account token |
| 300 | +- Use cases: Queue management, metadata updates |
| 301 | + |
| 302 | +### GitHub Actions ↔ API |
| 303 | +- Protocol: REST over HTTPS |
| 304 | +- Format: JSON |
| 305 | +- Authentication: GitHub Actions service account |
| 306 | +- Use cases: Deployment notifications, metadata updates |
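A deployment notification from GitHub Actions could look like the request below. The path `/specs/{id}/deployed` comes from the sequence diagram in this document; the bearer-token header, the empty JSON body, and the function name are assumptions, since the source only says a service account is used.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.pyplots.ai"  # production backend URL listed in this document


def build_deployed_notification(spec_id: str, token: str) -> urllib.request.Request:
    """Build POST /specs/{id}/deployed (auth scheme and body are hypothetical)."""
    safe_id = urllib.parse.quote(spec_id, safe="")
    return urllib.request.Request(
        url=f"{API_BASE}/specs/{safe_id}/deployed",
        data=b"{}",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


notification = build_deployed_notification("scatter-basic", token="example-token")
```

In a workflow, the token would come from a repository secret rather than a literal string.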
| 307 | + |
| 308 | +### API ↔ Database |
| 309 | +- Protocol: PostgreSQL wire protocol (Private IP) |
| 310 | +- ORM: SQLAlchemy async |
| 311 | +- Connection pooling: Yes |
| 312 | +- Migrations: Alembic |
| 313 | + |
| 314 | +--- |
| 315 | + |
| 316 | +## Technology Stack Summary |
| 317 | + |
| 318 | +| Component | Technology | Hosting | Access | |
| 319 | +|-----------|-----------|---------|--------| |
| 320 | +| **Frontend** | Next.js 14, TypeScript, Tailwind | Cloud Run | Public HTTPS | |
| 321 | +| **Backend** | FastAPI, Python 3.10+, SQLAlchemy | Cloud Run | Public HTTPS | |
| 322 | +| **Database** | PostgreSQL 15 | Cloud SQL | Private IP (API only) | |
| 323 | +| **Storage** | Google Cloud Storage | GCS | Public read | |
| 324 | +| **CI/CD** | GitHub Actions | GitHub-hosted runners | - | |
| 325 | +| **Automation** | n8n Cloud Pro | n8n Cloud | External (webhooks to API) | |
| 326 | +| **Code** | Python, uv package manager | GitHub | Public | |
| 327 | +| **State** | GitHub Issues, Labels | GitHub | Public | |
| 328 | + |
| 329 | +--- |
| 330 | + |
| 331 | +## Deployment Environments |
| 332 | + |
| 333 | +### Development (Local) |
| 334 | +- Frontend: `localhost:3000` |
| 335 | +- Backend: `localhost:8000` |
| 336 | +- Database: Local PostgreSQL or Cloud SQL proxy |
| 337 | +- Storage: Local filesystem or GCS test bucket |
| 338 | + |
| 339 | +### Production |
| 340 | +- Frontend: `https://pyplots.ai` (Cloud Run) |
| 341 | +- Backend: `https://api.pyplots.ai` (Cloud Run) |
| 342 | +- Database: Cloud SQL (Private IP) |
| 343 | +- Storage: `gs://pyplots-images` (GCS) |
| 344 | + |
| 345 | +--- |
| 346 | + |
| 347 | +## Scalability Considerations |
| 348 | + |
| 349 | +**Current Design** (Solo Developer, Cost-Conscious): |
| 350 | +- Cloud Run: Auto-scaling from 0 to 10 instances |
| 351 | +- Database: db-f1-micro (start small, upgrade as needed) |
| 352 | +- Storage: Pay-per-use (cheap for images) |
| 353 | +- GitHub Actions: Included in Pro subscription |
| 354 | + |
| 355 | +**Future Scaling** (If Needed): |
| 356 | +- Increase Cloud Run max instances |
| 357 | +- Add Cloud CDN for GCS images |
| 358 | +- Upgrade database instance |
| 359 | +- Add Redis cache for API responses |
| 360 | +- Separate API into microservices (specs, plots, data) |
| 361 | + |
| 362 | +--- |
| 363 | + |
| 364 | +*For detailed information on specific components, see:* |
| 365 | +- [Repository Structure](./repository-structure.md) |
| 366 | +- [Automation Workflows](./automation-workflows.md) |
| 367 | +- [Database Schema](./database.md) |
| 368 | +- [API Specification](./api.md) |