Architecture Guide

This document defines the standard architecture patterns for building products in the AI Product OS.

All architecture and engineering agents must reference this guide when designing systems.

System Architecture Principles

1. Monolith-First for MVPs

Start with a single Next.js application containing frontend, backend API routes, and cron endpoints
Avoid premature microservices - they add complexity without MVP-stage benefits
Split services only when you have clear scalability requirements backed by metrics

2. Serverless-Native Design

Platform: Default to Vercel for Next.js deployments (serverless functions + edge runtime)
Constraints:
- 10-second timeout for Hobby tier, 60s for Pro (design accordingly)
- No persistent processes or long-running background jobs
- Stateless functions only
Fan-Out Pattern: Cron jobs should trigger, not process. Use worker invocations for per-entity operations.
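
One way to "design accordingly" for the timeout constraint is to track a time budget inside the function and skip optional work when too little time is left. The following is a minimal sketch, not a platform API: `createDeadline`, the 10,000 ms budget, and the 1,500 ms safety margin are illustrative assumptions.

```typescript
// Hypothetical helper: tracks how much of a function's time budget is left.
// The clock is injectable so the logic can be tested without waiting.
type Clock = () => number;

interface Deadline {
    remainingMs(): number;
    hasTimeFor(estimatedMs: number): boolean;
}

function createDeadline(
    budgetMs: number,
    safetyMarginMs: number,
    now: Clock = Date.now
): Deadline {
    const startedAt = now();
    // Budget minus the reserved margin minus elapsed time, never below zero.
    const remainingMs = () =>
        Math.max(0, budgetMs - safetyMarginMs - (now() - startedAt));
    return {
        remainingMs,
        // True only if the estimated work fits in what is left of the budget.
        hasTimeFor: (estimatedMs) => estimatedMs <= remainingMs(),
    };
}

// Example: a 10 s budget with 1.5 s reserved for building the response.
const deadline = createDeadline(10000, 1500);
if (!deadline.hasTimeFor(2000)) {
    console.warn('Not enough time left; skipping optional work');
}
```

A guard like this complements the fan-out pattern rather than replacing it: it limits damage inside one invocation, while fan-out keeps each invocation small in the first place.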

3. Database-First Planning

Define schema before writing application code
Use relational databases (PostgreSQL via Supabase) for structured data
Enable Row-Level Security (RLS) from day one, even in MVP
Write schema.sql files that can be run idempotently (for example, CREATE TABLE IF NOT EXISTS, and DROP POLICY IF EXISTS before each CREATE POLICY)

Standard Tech Stack

Frontend

Framework: Next.js 16+ (App Router)
Language: TypeScript (strict mode)
Styling: Tailwind CSS 4+ (utility-first, responsive design)
UI Components:
- Framer Motion (animations)
- Lucide React (icons)
- Radix UI or Shadcn (accessible primitives)
State Management: React hooks + optimistic UI patterns
Client-Side Routing: Next.js App Router navigation

Backend

API Layer: Next.js Route Handlers (app/api/*/route.ts, deployed as serverless functions)
Database: Supabase (PostgreSQL + Auth + Storage)
ORM/Client: @supabase/supabase-js (official client)
Authentication: Supabase GoTrue (email/password, OAuth providers)
Cron Jobs: Vercel Cron or external triggers (Upstash QStash)

AI Integration

Primary: Google Gemini (@google/genai SDK)
- Use gemini-2.5-flash for speed-critical operations (<2s response)
- Use gemini-2.5-pro for complex reasoning
- Always use Structured Outputs (JSON Schema) to constrain responses to a known shape, and still validate the parsed result before using it
Alternatives: OpenAI GPT-4, Anthropic Claude
Prompt Engineering: Store prompts in code, version control them
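
Storing prompts in code can be as simple as a versioned map plus a render function. The sketch below is illustrative only: `PROMPTS`, `renderPrompt`, the `categorizeTask` entry, and the `{{placeholder}}` syntax are assumptions, not part of any SDK.

```typescript
// Hypothetical prompt registry: each prompt lives in source control with an
// explicit version, so changes show up in diffs and can be logged with events.
interface PromptTemplate {
    version: string;
    template: string; // uses {{key}} placeholders (an illustrative convention)
}

const PROMPTS: Record<string, PromptTemplate> = {
    categorizeTask: {
        version: '2',
        template: 'Categorize the following task.\nTask: {{input}}',
    },
};

// Fill in placeholders; fail loudly on unknown prompts or missing variables
// instead of sending a half-rendered prompt to the model.
function renderPrompt(
    promptName: string,
    vars: Record<string, string>
): { text: string; version: string } {
    const prompt = PROMPTS[promptName];
    if (!prompt) throw new Error(`Unknown prompt: ${promptName}`);
    const text = prompt.template.replace(
        /\{\{(\w+)\}\}/g,
        (_match, key: string) => {
            const value = vars[key];
            if (value === undefined) {
                throw new Error(`Missing prompt variable: ${key}`);
            }
            return value;
        }
    );
    return { text, version: prompt.version };
}

const rendered = renderPrompt('categorizeTask', { input: 'Buy milk' });
console.log(rendered.version, rendered.text);
```

Returning the version alongside the text makes it easy to attach the prompt version to telemetry events when comparing prompt changes.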

Analytics & Monitoring

Product Analytics: PostHog (web + server-side events)
Error Tracking: Vercel Runtime Logs (MVP), Sentry (production)
Performance: Built-in Next.js analytics, Web Vitals

Architecture Patterns

API Route Design

RESTful Endpoints

// app/api/[resource]/route.ts
export async function GET(req: Request) {
    // List/Read operations
    // MUST include .limit() on queries
}

export async function POST(req: Request) {
    // Create operations
    // Validate input, call AI if needed, persist to DB
}

export async function PUT(req: Request) {
    // Update operations
    // Use query params for IDs: /api/tasks?id=123
}

export async function DELETE(req: Request) {
    // Delete operations (use soft deletes where possible)
}

Request Flow

1. Validation: Check auth, validate input shape/size
2. External Calls: AI APIs, third-party services (MUST await)
3. Database Operation: Single source of truth
4. Response: Return JSON with success/error structure
5. Telemetry: Log event to PostHog before returning
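
The validation step and the success/error response structure can be sketched as plain functions. Everything named here is an assumption for illustration: the `ApiResult` envelope, `validateTaskInput`, the `text` field, and the 2,000-character cap.

```typescript
// Hypothetical success/error envelope shared by all routes.
type ApiResult<T> =
    | { success: true; data: T }
    | { success: false; error: string; status: number };

const MAX_INPUT_LENGTH = 2000; // illustrative size cap, not from this guide

// Validate input shape and size before any AI or database call is made.
function validateTaskInput(body: unknown): ApiResult<{ text: string }> {
    if (typeof body !== 'object' || body === null) {
        return { success: false, error: 'Body must be a JSON object', status: 400 };
    }
    const text = (body as Record<string, unknown>).text;
    if (typeof text !== 'string' || text.trim().length === 0) {
        return { success: false, error: '"text" must be a non-empty string', status: 400 };
    }
    if (text.length > MAX_INPUT_LENGTH) {
        return { success: false, error: '"text" is too long', status: 400 };
    }
    return { success: true, data: { text: text.trim() } };
}

const checked = validateTaskInput({ text: '  Buy milk  ' });
if (checked.success) {
    console.log('Validated input:', checked.data.text);
}
```

In a route handler, a failed result would typically be returned as JSON with its `status`, and only a successful result would continue to the external call and database steps.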

Database Schema Patterns

Standard Table Structure

CREATE TABLE [entity_name] (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    user_id UUID REFERENCES auth.users(id),  -- If user-scoped

    -- Business fields
    [field_name] TEXT NOT NULL,

    -- Metadata
    created_at TIMESTAMP WITH TIME ZONE DEFAULT timezone('utc', now()) NOT NULL,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT timezone('utc', now()) NOT NULL
);

-- Enable RLS
ALTER TABLE [entity_name] ENABLE ROW LEVEL SECURITY;

-- Create policy
CREATE POLICY "Users can only access their own data"
    ON [entity_name]
    FOR ALL
    USING (auth.uid() = user_id);

Enums

Use PostgreSQL ENUMs for constrained string fields:

CREATE TYPE task_status AS ENUM ('todo', 'in_progress', 'done');
CREATE TYPE priority AS ENUM ('low', 'medium', 'high');

Indexes

-- Index frequently queried columns
CREATE INDEX idx_tasks_user_id ON tasks(user_id);
CREATE INDEX idx_tasks_status ON tasks(status);
CREATE INDEX idx_tasks_created_at ON tasks(created_at DESC);

AI Integration Patterns

Structured Output Pattern

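import { GoogleGenAI, Type } from '@google/genai';

// Server-only: never expose GEMINI_API_KEY to the client
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({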
    model: 'gemini-2.5-flash',
    contents: prompt,
    config: {
        responseMimeType: "application/json",
        responseSchema: {
            type: Type.OBJECT,
            properties: {
                category: {
                    type: Type.STRING,
                    enum: ['option1', 'option2', 'option3']
                },
                confidence: { type: Type.NUMBER }
            },
            required: ["category"]
        }
    }
});

Fallback Handling

let result: any = null;
let isFallback = false;

try {
    // Strip markdown codeblocks
    // response.text can be undefined (e.g. blocked or empty responses)
    const cleanText = (response.text ?? '')
        .replace(/```json\n?/g, '')
        .replace(/```\n?/g, '')
        .trim();

    result = JSON.parse(cleanText);

    // Validate shape
    if (!isValidResult(result)) {
        isFallback = true;
    }
} catch (e) {
    console.error("AI parsing failed:", e);
    isFallback = true;
}

if (isFallback) {
    // Apply safe default to prevent data loss
    result = { category: 'uncategorized', title: rawInput };
}
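
The fallback code calls `isValidResult` without defining it. One possible shape check, assuming the result should match the schema from the Structured Output Pattern (a `category` from the enum and an optional numeric `confidence`), is sketched below; the `CATEGORIES` constant and `CategorizationResult` type are illustrative names.

```typescript
// Assumed to mirror the enum in the Structured Output Pattern schema.
const CATEGORIES = ['option1', 'option2', 'option3'] as const;

interface CategorizationResult {
    category: (typeof CATEGORIES)[number];
    confidence?: number;
}

// Type guard: true only if the parsed JSON has the expected fields and types.
function isValidResult(value: unknown): value is CategorizationResult {
    if (typeof value !== 'object' || value === null) return false;
    const candidate = value as Record<string, unknown>;

    // "category" is required and must be one of the known options.
    const categoryOk =
        typeof candidate.category === 'string' &&
        (CATEGORIES as readonly string[]).includes(candidate.category);

    // "confidence" is optional, but must be a finite number when present.
    const confidenceOk =
        candidate.confidence === undefined ||
        (typeof candidate.confidence === 'number' &&
            Number.isFinite(candidate.confidence));

    return categoryOk && confidenceOk;
}

console.log(isValidResult({ category: 'option1', confidence: 0.9 }));
```

Writing the check as a type guard lets TypeScript narrow the parsed value after the check, which removes the need for `any` downstream.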

Cron Job Architecture

Fan-Out Pattern (REQUIRED)

// ❌ ANTI-PATTERN: Master cron processes all users synchronously
export async function GET() {
    const users = await fetchAllUsers();
    for (const user of users) {
        await processUser(user); // Sequential, hits timeout
    }
}

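// ✅ CORRECT: Master cron triggers, workers process
export async function GET(req: Request) {
    const users = await fetchAllUsers();

    // Server-side fetch needs an absolute URL; derive it from the incoming request
    const workerUrl = new URL('/api/worker', req.url);

    // Fan out to individual worker invocations
    await Promise.allSettled(
        users.map(user =>
            fetch(workerUrl, {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({ userId: user.id })
            })
        )
    );

    return new Response('Triggered');
}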
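
`Promise.allSettled()` never rejects, so failed worker triggers are silent unless the results are inspected. The helper below is a hedged sketch of one way to summarize them for logging; `summarizeFanOut` and `FanOutSummary` are hypothetical names.

```typescript
// Hypothetical helper: reduce Promise.allSettled() output to counts plus
// the reasons for any failed worker triggers, so failures are not silent.
interface FanOutSummary {
    triggered: number;
    failed: number;
    reasons: string[];
}

function summarizeFanOut(
    results: PromiseSettledResult<unknown>[]
): FanOutSummary {
    const summary: FanOutSummary = { triggered: 0, failed: 0, reasons: [] };
    for (const result of results) {
        if (result.status === 'fulfilled') {
            // Note: a fulfilled fetch() can still carry a non-2xx response;
            // checking response.ok is left to the caller in this sketch.
            summary.triggered++;
        } else {
            summary.failed++;
            summary.reasons.push(String(result.reason));
        }
    }
    return summary;
}

const exampleSummary = summarizeFanOut([
    { status: 'fulfilled', value: 'ok' },
    { status: 'rejected', reason: new Error('network down') },
]);
console.log(exampleSummary);
```

In the cron route, the array returned by `Promise.allSettled()` would be passed to this helper and the summary logged or sent as a telemetry event before returning.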

Batch Database Queries

// ❌ ANTI-PATTERN: N+1 queries
for (const userId of userIds) {
    const data = await db.from('tasks').select().eq('user_id', userId);
}

// ✅ CORRECT: Single batched query
const allData = await db
    .from('tasks')
    .select()
    .in('user_id', userIds);
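
After a batched query, per-user logic usually needs the rows regrouped by user. A minimal sketch of that step follows; the `TaskRow` shape and `groupByUserId` helper are assumptions, and only the `user_id` column is taken from the schema in this guide.

```typescript
// Hypothetical row shape; the helper only requires user_id.
interface TaskRow {
    id: string;
    user_id: string;
}

// Group the rows from one batched query by user, so per-user logic can run
// in memory without issuing another query per user.
function groupByUserId<T extends { user_id: string }>(
    rows: T[]
): Map<string, T[]> {
    const grouped = new Map<string, T[]>();
    for (const row of rows) {
        const bucket = grouped.get(row.user_id);
        if (bucket) {
            bucket.push(row);
        } else {
            grouped.set(row.user_id, [row]);
        }
    }
    return grouped;
}

const sampleRows: TaskRow[] = [
    { id: 't1', user_id: 'u1' },
    { id: 't2', user_id: 'u2' },
    { id: 't3', user_id: 'u1' },
];
const tasksByUser = groupByUserId(sampleRows);
console.log(tasksByUser.get('u1')?.length);
```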

Authentication Flow

Supabase Auth

// Client-side
import { supabase } from '@/lib/supabase';

const { data, error } = await supabase.auth.signUp({
    email,
    password
});

// Server-side (API route)
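const token = req.headers.get('authorization')?.split(' ')[1];
// getUser() resolves to { data: { user }, error }; user is null if the token is invalid
const { data: { user }, error } = await supabase.auth.getUser(token);
if (error || !user) {
    return new Response('Unauthorized', { status: 401 });
}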
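
Splitting the header on a space accepts any scheme, not just `Bearer`. A stricter parser might look like the sketch below; `extractBearerToken` is a hypothetical helper, not part of the Supabase client.

```typescript
// Hypothetical helper: returns the token from "Authorization: Bearer <token>",
// or null if the header is missing, uses another scheme, or has no token.
function extractBearerToken(
    header: string | null | undefined
): string | null {
    if (!header) return null;
    // Scheme is matched case-insensitively; the token must be one
    // whitespace-free run that ends the header value.
    const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
    return match?.[1] ?? null;
}

console.log(extractBearerToken('Bearer abc.def.ghi'));
console.log(extractBearerToken('Basic dXNlcjpwYXNz'));
```

A `null` result can be turned into a 401 response before any call to Supabase is made.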

RLS Policies

-- Read own data
CREATE POLICY "Users read own tasks"
    ON tasks FOR SELECT
    USING (auth.uid() = user_id);

-- Insert own data
CREATE POLICY "Users insert own tasks"
    ON tasks FOR INSERT
    WITH CHECK (auth.uid() = user_id);

-- Update own data
CREATE POLICY "Users update own tasks"
    ON tasks FOR UPDATE
    USING (auth.uid() = user_id);

Error Classification & Handling

Third-Party API Errors

function classifyError(error: any): 'transient' | 'permanent' | 'unknown' {
    const status = error.response?.status;

    // Permanent errors (don't retry, may suspend user)
    if ([400, 401, 403, 404].includes(status)) return 'permanent';

    // Transient errors (safe to retry)
    if ([429, 500, 502, 503, 504].includes(status)) return 'transient';

    return 'unknown';
}

// Only apply account-level consequences for permanent errors
if (classifyError(twilioError) === 'permanent') {
    await db.from('users').update({ is_active: false }).eq('id', userId);
}
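
For errors classified as transient, a retry policy with exponential backoff is a common follow-up. The sketch below shows only the policy arithmetic; the constants and the `backoffDelayMs` and `shouldRetry` helpers are illustrative assumptions to tune per provider.

```typescript
// Hypothetical retry policy for errors classified as 'transient'.
const BASE_DELAY_MS = 500; // illustrative values, tune per provider
const MAX_DELAY_MS = 8000;
const MAX_ATTEMPTS = 4;

type ErrorClass = 'transient' | 'permanent' | 'unknown';

// Exponential backoff: 500, 1000, 2000, ... capped at MAX_DELAY_MS.
// `attempt` is 1-based and refers to the attempt that just failed.
function backoffDelayMs(attempt: number): number {
    return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** (attempt - 1));
}

// Retry only transient errors, and only while attempts remain.
function shouldRetry(errorClass: ErrorClass, attempt: number): boolean {
    return errorClass === 'transient' && attempt < MAX_ATTEMPTS;
}

for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    console.log(
        `attempt ${attempt}: retry=${shouldRetry('transient', attempt)}, ` +
        `delay=${backoffDelayMs(attempt)}ms`
    );
}
```

In a serverless function the sum of the delays has to fit inside the platform timeout, so long retry chains are usually better handed to a later cron run or a queue.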

Pagination & Limits

API Pagination

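// REQUIRED: Always enforce limits
import { NextResponse } from 'next/server';

export async function GET(req: Request) {
    const url = new URL(req.url);
    // Fall back to defaults on missing or non-numeric params; page is 1-based
    const page = Math.max(1, parseInt(url.searchParams.get('page') ?? '1', 10) || 1);
    const limit = Math.min(
        Math.max(1, parseInt(url.searchParams.get('limit') ?? '100', 10) || 100),
        100  // Hard cap
    );

    const { data, error } = await supabase
        .from('tasks')
        .select()
        .range((page - 1) * limit, page * limit - 1);  // Inclusive row offsets

    if (error) {
        return NextResponse.json({ error: error.message }, { status: 500 });
    }
    return NextResponse.json({ data, page, limit });
}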

External API Loops

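// REQUIRED: Both page limit AND temporal bound
// Assumes `gmail` is an authenticated googleapis Gmail client
let pageToken: string | undefined;
let pageCount = 0;
const MAX_PAGES = 5;
const LOOKBACK_DAYS = 30;

// do/while so the first page is fetched before any page token exists
do {
    const res = await gmail.users.messages.list({
        userId: 'me',
        pageToken,
        q: `newer_than:${LOOKBACK_DAYS}d`  // Temporal bound
    });

    pageToken = res.data.nextPageToken ?? undefined;
    pageCount++;
} while (pageToken && pageCount < MAX_PAGES);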

Telemetry Architecture

Event Taxonomy

Structure events by lifecycle stage:

User Onboarding

landing_page_viewed
signup_started
signup_completed
first_action_completed

Core Feature Usage

[feature]_submitted (user initiates action)
[feature]_completed (system confirms success)
[feature]_failed (system encounters error)

System Health

ai_fallback_triggered (AI processing failed)
api_timeout (external service slow)
retry_exhausted (permanent failure)
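
The taxonomy can be enforced at compile time with template literal types, so a misspelled event name fails the build instead of polluting analytics. This is a sketch under assumptions: the type names and the `featureEvent` helper are hypothetical, and `task` stands in for any feature name.

```typescript
// Hypothetical typed event names following the taxonomy above.
type OnboardingEvent =
    | 'landing_page_viewed'
    | 'signup_started'
    | 'signup_completed'
    | 'first_action_completed';

type SystemHealthEvent =
    | 'ai_fallback_triggered'
    | 'api_timeout'
    | 'retry_exhausted';

type FeatureStage = 'submitted' | 'completed' | 'failed';
type FeatureEvent<F extends string> = `${F}_${FeatureStage}`;

// Every event the app is allowed to emit; 'task' is an example feature.
type AppEvent = OnboardingEvent | SystemHealthEvent | FeatureEvent<'task'>;

// Build a "[feature]_[stage]" event name with a matching literal type.
function featureEvent<F extends string>(
    feature: F,
    stage: FeatureStage
): FeatureEvent<F> {
    return `${feature}_${stage}` as FeatureEvent<F>;
}

const submittedEvent: AppEvent = featureEvent('task', 'submitted');
console.log(submittedEvent);
```

A thin wrapper around the analytics client that accepts only `AppEvent` would then reject names outside the taxonomy.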

Implementation

// Frontend (posthog-js)
import { usePostHog } from 'posthog-js/react';

const posthog = usePostHog();
posthog.capture('task_submitted', {
    input_length: text.length,
    timestamp: Date.now()
});

// Backend (posthog-node)
import { PostHog } from 'posthog-node';

const client = new PostHog(process.env.NEXT_PUBLIC_POSTHOG_KEY!, {
    host: process.env.NEXT_PUBLIC_POSTHOG_HOST
});
client.capture({
    distinctId: userId,
    event: 'task_categorized',
    properties: {
        category,
        ai_latency_ms: latency
    }
});
await client.shutdown();

Deployment Architecture

Vercel Production Setup

# Environment Variables (set in Vercel dashboard)
NEXT_PUBLIC_SUPABASE_URL=https://xxx.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJxxx
SUPABASE_SERVICE_ROLE_KEY=eyJxxx  # Server-only
GEMINI_API_KEY=AIzxxx            # Server-only
NEXT_PUBLIC_POSTHOG_KEY=phc_xxx
NEXT_PUBLIC_POSTHOG_HOST=https://app.posthog.com

Build & Deploy Checklist

Schema Applied: Run schema.sql against Supabase before first deploy
Env Vars Set: All required secrets configured in platform
Build Success: npm run build completes without errors
RLS Enabled: Row-level security active on all tables
Telemetry Wired: PostHog events firing on critical paths
Error Handling: All API routes return proper 400/500 responses
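
The "Env Vars Set" item can be checked in code rather than by eye. The sketch below uses the variable names from the Vercel Production Setup list; the `findMissingEnvVars` helper itself is a hypothetical name, and the environment is passed in as a plain object so the check is easy to test.

```typescript
// Names taken from the "Vercel Production Setup" list in this guide.
const REQUIRED_ENV_VARS = [
    'NEXT_PUBLIC_SUPABASE_URL',
    'NEXT_PUBLIC_SUPABASE_ANON_KEY',
    'SUPABASE_SERVICE_ROLE_KEY',
    'GEMINI_API_KEY',
    'NEXT_PUBLIC_POSTHOG_KEY',
    'NEXT_PUBLIC_POSTHOG_HOST',
];

// Hypothetical helper: returns the names that are unset or blank.
function findMissingEnvVars(
    env: Record<string, string | undefined>,
    required: string[] = REQUIRED_ENV_VARS
): string[] {
    return required.filter((key) => {
        const value = env[key];
        return value === undefined || value.trim() === '';
    });
}

// Usage at startup in Node.js (shown as a comment to keep the sketch portable):
//   const missing = findMissingEnvVars(process.env);
//   if (missing.length > 0) {
//       throw new Error(`Missing env vars: ${missing.join(', ')}`);
//   }
console.log(findMissingEnvVars({ GEMINI_API_KEY: 'set' }).length);
```

Failing fast at startup turns a missing secret into one clear build or boot error instead of scattered runtime failures.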

Anti-Patterns to Avoid

❌ Don't Do This

Fire-and-forget promises in serverless functions
Unbounded database queries without .limit()
Sequential await loops over independent async operations (use Promise.all() or Promise.allSettled())
Processing all users in a single cron execution
Using AI snippets/previews instead of full payloads
Treating all third-party errors as permanent failures
Skipping RLS because "it's just an MVP"
Adding telemetry after QA instead of during implementation

Decision Framework

When designing architecture, answer these questions:

Scalability: Will this work with 10x users? 100x?
Failure Modes: What happens if the AI times out? If the DB is slow?
Cost: Does this architecture fit within platform free tiers?
Observability: Can I debug this in production with logs alone?
Security: Is user data properly isolated and encrypted?

References

Next.js App Router: https://nextjs.org/docs/app
Supabase Docs: https://supabase.com/docs
Vercel Limits: https://vercel.com/docs/limits
PostHog Docs: https://posthog.com/docs

This guide is a living document. Update it when postmortems reveal new patterns or anti-patterns.
