---
status: Draft
date: 2026-03-15
deciders:
  - aaronsb
  - claude
related:
  - ADR-031
  - ADR-041
  - ADR-042
  - ADR-049
---

# ADR-800: Dynamic Model Catalog and OpenRouter Support

## Context

Model lists for each AI provider are currently hardcoded in `ai_providers.py`. When providers add or retire models, the code must be updated and redeployed. There is no way for operators to discover available models at runtime or curate a preferred subset for their deployment.

Additionally, pricing information is driven by environment variables (`TOKEN_COST_*`) with static defaults. This makes cost tracking fragile — prices change, new models appear, and operators must manually research and update values.

The system currently supports three inference providers (OpenAI, Anthropic, Ollama). **OpenRouter** is a fourth provider type that offers a unified API across 200+ models from multiple upstream providers (OpenAI, Anthropic, Google, Meta, Mistral, etc.) via an OpenAI-compatible endpoint. OpenRouter is interesting because:

- It exposes the same models available directly from other providers (e.g., `openai/gpt-4o`, `anthropic/claude-sonnet-4`), creating overlap
- It includes per-model pricing in its catalog API (`GET /api/v1/models`), with prompt and completion costs per token
- It provides automatic provider routing and fallback for the same model across multiple GPU providers
- Its API is OpenAI-SDK-compatible (`https://openrouter.ai/api/v1`), so the implementation can reuse existing OpenAI client code with a different base URL

The desired operator workflow is:

1. **Select a provider endpoint** (OpenAI, Anthropic, Ollama, OpenRouter)
2. **Validate the connection** (API key check or endpoint reachability)
3. **Browse available models** — either from a previously fetched (cached) catalog, or by fetching the full list from the provider API
4. **Curate a subset** — select which models to offer for extraction/embedding use
5. **Persist the curated list** — stored per-provider in the database, including pricing metadata where available

## Decision

### 1. New database table: `kg_api.provider_model_catalog`

A single table stores the cached model catalog for all providers. Each row is one model from one provider.

```sql
CREATE TABLE kg_api.provider_model_catalog (
    id SERIAL PRIMARY KEY,
    provider VARCHAR(50) NOT NULL,          -- 'openai', 'anthropic', 'ollama', 'openrouter'
    model_id VARCHAR(300) NOT NULL,         -- Provider's model identifier
    display_name VARCHAR(300),              -- Human-friendly name
    category VARCHAR(50) NOT NULL,          -- 'extraction', 'embedding', 'vision', 'translation'
    context_length INTEGER,
    max_completion_tokens INTEGER,
    supports_vision BOOLEAN DEFAULT FALSE,
    supports_json_mode BOOLEAN DEFAULT FALSE,
    supports_tool_use BOOLEAN DEFAULT FALSE,
    supports_streaming BOOLEAN DEFAULT TRUE,

    -- Pricing (USD per 1M tokens, NULL = unknown/free)
    price_prompt_per_m NUMERIC(12, 6),      -- Input/prompt cost
    price_completion_per_m NUMERIC(12, 6),  -- Output/completion cost
    price_cache_read_per_m NUMERIC(12, 6),  -- Cached input cost (if applicable)

    -- Curation
    enabled BOOLEAN DEFAULT FALSE,          -- Operator has selected this model
    is_default BOOLEAN DEFAULT FALSE,       -- Default model for this provider+category
    sort_order INTEGER DEFAULT 0,           -- Display ordering

    -- Metadata
    upstream_provider VARCHAR(100),         -- For OpenRouter: the actual provider (e.g., 'anthropic')
    raw_metadata JSONB,                     -- Full provider response for this model
    fetched_at TIMESTAMPTZ,                 -- When catalog was last refreshed
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    UNIQUE(provider, model_id, category)
);

-- One default per provider+category
CREATE UNIQUE INDEX idx_catalog_default
ON kg_api.provider_model_catalog(provider, category)
WHERE is_default = TRUE;
```
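
As a usage sketch, resolving the curated default extraction model (and its pricing) for a provider reduces to a single indexed lookup; the partial unique index above guarantees at most one matching row:

```sql
-- Hypothetical lookup: the operator's default extraction model for OpenRouter.
-- idx_catalog_default guarantees at most one default per provider+category.
SELECT model_id, price_prompt_per_m, price_completion_per_m
FROM kg_api.provider_model_catalog
WHERE provider = 'openrouter'
  AND category = 'extraction'
  AND enabled = TRUE
  AND is_default = TRUE;
```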

### 2. Provider catalog fetch implementations

Each provider implements a `fetch_model_catalog()` class method that returns normalized model metadata:

| Provider | Source | Pricing available? |
|----------|--------|--------------------|
| **OpenAI** | `GET /v1/models` (API) | No — hardcode known prices, flag unknown models |
| **Anthropic** | Hardcoded list (the `/v1/models` endpoint lists IDs only, no pricing/capability metadata) | Hardcode known prices |
| **Ollama** | `GET /api/tags` (local instance) | N/A — local, cost is $0 |
| **OpenRouter** | `GET /api/v1/models` (API) | Yes — `pricing.prompt` and `pricing.completion` per-token in response |
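
A minimal sketch of the normalized shape each `fetch_model_catalog()` implementation might return — the field names mirror the catalog table columns, but the type itself is an assumption, not existing code:

```python
from typing import Optional, TypedDict


class CatalogEntry(TypedDict, total=False):
    """Hypothetical normalized record; keys mirror provider_model_catalog columns."""
    model_id: str                            # Provider's model identifier
    display_name: Optional[str]
    category: str                            # 'extraction', 'embedding', ...
    context_length: Optional[int]
    supports_vision: bool
    price_prompt_per_m: Optional[float]      # USD per 1M tokens, None = unknown
    price_completion_per_m: Optional[float]
    upstream_provider: Optional[str]         # OpenRouter only
    raw_metadata: dict                       # Full provider response for this model
```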

For OpenRouter, the catalog response includes:
```json
{
  "id": "anthropic/claude-sonnet-4",
  "name": "Claude Sonnet 4",
  "context_length": 200000,
  "pricing": { "prompt": "0.000003", "completion": "0.000015" },
  "architecture": { "modality": "text->text", "input_modalities": ["text", "image"] },
  "supported_parameters": ["temperature", "tools", "response_format", ...]
}
```

Pricing values from OpenRouter are per-token strings; the fetch implementation converts to per-1M-token numeric values for storage.
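
The conversion is a straight multiply, but using `Decimal` avoids float drift before the value lands in the `NUMERIC(12, 6)` column. A minimal sketch (the function name is an assumption):

```python
from decimal import Decimal
from typing import Optional


def per_token_to_per_million(price: Optional[str]) -> Optional[Decimal]:
    """Convert OpenRouter's per-token price string to USD per 1M tokens."""
    if price is None:
        return None
    return Decimal(price) * 1_000_000


per_token_to_per_million("0.000003")   # Decimal('3.000000')  -> $3.00 per 1M prompt tokens
per_token_to_per_million("0.000015")   # Decimal('15.000000') -> $15.00 per 1M completion tokens
```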

### 3. OpenRouter provider implementation

`OpenRouterProvider` extends the provider interface, reusing the OpenAI Python SDK with:

```python
import openai

client = openai.OpenAI(
    api_key=openrouter_api_key,
    base_url="https://openrouter.ai/api/v1",
    default_headers={
        # Optional OpenRouter attribution headers
        "HTTP-Referer": "https://github.com/aaronsb/knowledge-graph-system",
        "X-Title": "Knowledge Graph System"
    }
)
```

Key differences from direct OpenAI:
- Model IDs are namespaced: `openai/gpt-4o`, `anthropic/claude-sonnet-4`, `google/gemini-2.5-pro`
- No direct embedding support — extraction only, pairs with existing embedding providers
- Provider routing preferences can be passed via `extra_body={"provider": {...}}` (see the sketch below)
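
A hedged sketch of an extraction call through OpenRouter, reusing the `client` from above. `extra_body` is the OpenAI SDK's escape hatch for non-standard fields; the provider-preference keys (`order`, `allow_fallbacks`) follow OpenRouter's routing documentation and should be verified against the current version:

```python
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",    # namespaced model ID
    messages=[{"role": "user", "content": "Extract concepts from: ..."}],
    extra_body={
        "provider": {
            "order": ["Anthropic"],       # preferred upstream route
            "allow_fallbacks": True,      # permit rerouting if unavailable
        }
    },
)
print(response.choices[0].message.content)
```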

### 4. Operator workflow via configure.py and API

**CLI flow** (via `configure.py models`):
```
$ configure.py models list openai                       # Show cached catalog (enabled models)
$ configure.py models refresh openai                    # Fetch fresh catalog from provider API
$ configure.py models enable openai gpt-4o              # Enable a model for use
$ configure.py models disable openai gpt-4o
$ configure.py models default openai gpt-4o extraction  # Set default
```

**API endpoints** (admin):
```
GET  /admin/models/catalog?provider=openai    # List catalog
POST /admin/models/catalog/refresh            # Fetch from provider
PUT  /admin/models/catalog/{id}/enable        # Enable/disable
PUT  /admin/models/catalog/{id}/default       # Set as default
```
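
For illustration only — the refresh endpoint path comes from the list above, but the request body shape, host/port, and auth scheme are all assumptions this ADR does not pin down:

```python
import os

import requests

# Hypothetical admin call; the {"provider": ...} body shape is assumed
resp = requests.post(
    "http://localhost:8000/admin/models/catalog/refresh",
    headers={"Authorization": f"Bearer {os.environ['ADMIN_TOKEN']}"},
    json={"provider": "openrouter"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```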

**Validation flow on first configuration**:
1. Operator selects provider and provides API key
2. System validates connectivity (existing `validate_api_key` pattern)
3. If no cached catalog exists, prompt to fetch
4. Operator selects models from fetched list
5. Selected models stored as `enabled=TRUE` in catalog table (persistence sketch below)
6. `ai_extraction_config` references catalog entries for the active model
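
A sketch of the persistence step: refreshed rows upsert on the `(provider, model_id, category)` key so re-fetching never duplicates, while the operator's curation flags are deliberately preserved:

```sql
-- Hypothetical upsert used by the refresh step
INSERT INTO kg_api.provider_model_catalog
    (provider, model_id, display_name, category,
     price_prompt_per_m, price_completion_per_m, raw_metadata, fetched_at)
VALUES
    ('openrouter', 'anthropic/claude-sonnet-4', 'Claude Sonnet 4', 'extraction',
     3.0, 15.0, '{"id": "anthropic/claude-sonnet-4"}'::jsonb, NOW())
ON CONFLICT (provider, model_id, category) DO UPDATE SET
    display_name           = EXCLUDED.display_name,
    price_prompt_per_m     = EXCLUDED.price_prompt_per_m,
    price_completion_per_m = EXCLUDED.price_completion_per_m,
    raw_metadata           = EXCLUDED.raw_metadata,
    fetched_at             = EXCLUDED.fetched_at,
    updated_at             = NOW();
-- enabled / is_default are intentionally not overwritten by refresh
```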

### 5. Cost tracking integration

The existing `job_analysis.py` cost estimator currently looks up `TOKEN_COST_*` env vars. This changes to the following lookup order (sketched after the list):

1. Look up the active model in `provider_model_catalog`
2. Use `price_prompt_per_m` and `price_completion_per_m` from the catalog row
3. Fall back to env vars if catalog pricing is NULL (backward compatibility)
4. OpenRouter pricing auto-populates from their catalog API; other providers use hardcoded defaults that operators can override via `configure.py models price <provider> <model> --prompt <cost> --completion <cost>`
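
A minimal sketch of that fallback chain. The catalog helper is a stub standing in for the real query, and the exact `TOKEN_COST_*` key format is an assumption:

```python
import os
from typing import Optional


def catalog_prompt_price(provider: str, model_id: str) -> Optional[float]:
    """Stub: would SELECT price_prompt_per_m FROM provider_model_catalog."""
    return None  # pretend the catalog has no price for this model


def resolve_prompt_price(provider: str, model_id: str) -> Optional[float]:
    """Catalog-first price lookup with env var fallback (USD per 1M tokens)."""
    price = catalog_prompt_price(provider, model_id)
    if price is not None:
        return price
    # Legacy TOKEN_COST_* fallback; the key-derivation scheme here is assumed
    key = "TOKEN_COST_" + model_id.upper().replace("-", "_").replace("/", "_").replace(".", "_")
    value = os.environ.get(key)
    return float(value) if value is not None else None
```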

### 6. OpenRouter model overlap handling

When the same underlying model is available both directly and via OpenRouter (e.g., `gpt-4o` via OpenAI and `openai/gpt-4o` via OpenRouter):

- Both appear in the catalog as separate rows (different `provider` column)
- The `upstream_provider` field on OpenRouter entries identifies the actual provider
- Cost comparison is visible in the catalog listing
- The operator chooses which route to use — no automatic arbitrage
- The UI/CLI can flag overlap: "This model is also available directly via OpenAI at $X vs OpenRouter at $Y" (see the query sketch below)
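
Detecting overlap is a self-join on the catalog, assuming the OpenRouter namespace prefix matches the direct provider's name (true for `openai/` and `anthropic/`). A sketch:

```sql
-- Hypothetical overlap report: same model reachable directly and via OpenRouter
SELECT d.provider           AS direct_provider,
       d.model_id           AS direct_model,
       d.price_prompt_per_m AS direct_prompt_price,
       o.price_prompt_per_m AS openrouter_prompt_price
FROM kg_api.provider_model_catalog d
JOIN kg_api.provider_model_catalog o
  ON o.provider = 'openrouter'
 AND o.upstream_provider = d.provider
 AND o.model_id = d.provider || '/' || d.model_id
WHERE d.provider <> 'openrouter';
```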

## Consequences

### Positive

- Models are discoverable at runtime — no code changes when providers add models
- Pricing data is fetched from the source (OpenRouter) or maintained in one place (catalog table) rather than scattered across env vars
- Operators can curate exactly which models are available to users
- OpenRouter support opens access to 200+ models through a single API key
- Cost estimates become more accurate with per-model pricing from the catalog
- The pattern is extensible — future providers (Google AI, AWS Bedrock) fit the same `fetch_model_catalog()` interface

### Negative

- Additional database table and migration to maintain
- Catalog staleness — fetched data can drift from reality (mitigated by `fetched_at` timestamp and refresh workflow)
- OpenRouter adds a proxy hop and markup vs. direct provider access
- Anthropic's models endpoint lists IDs only — pricing and capability metadata remain partially hardcoded until Anthropic exposes them

### Neutral

- `ai_extraction_config` remains the "active config" table — this ADR adds a catalog that feeds into it, not a replacement
- Existing env var cost overrides continue to work as fallback
- The hardcoded `AVAILABLE_MODELS` dicts in `ai_providers.py` become seed data for initial catalog population rather than the runtime source of truth

## Alternatives Considered

### A. Keep model lists hardcoded, just add OpenRouter

Simpler, but doesn't solve the maintenance burden. Every new model requires a code change and redeploy. Pricing stays in env vars. Rejected because the catalog pattern solves multiple problems at once.

### B. External model registry (separate service or config file)

A YAML/JSON config file or separate microservice for model metadata. Rejected because we already have PostgreSQL for configuration (ADR-041) and adding another config source increases operational complexity.

### C. Auto-select cheapest provider for a given model

Automatically route requests to the cheapest available provider when the same model is offered by multiple providers. Rejected for now — adds complexity, and the operator should make deliberate cost/latency/reliability tradeoffs. Can be revisited as an enhancement.

## Implementation Notes

### Migration sequence

1. Schema migration: create `provider_model_catalog` table
2. Seed migration: populate with currently hardcoded models + known pricing
3. Add `OpenRouterProvider` class to `ai_providers.py`
4. Add `fetch_model_catalog()` to each provider
5. Update `configure.py` with `models` subcommand
6. Add admin API endpoints for catalog management
7. Update `job_analysis.py` cost estimator to read from catalog
8. Update web UI provider configuration to show catalog

### OpenRouter API details

- Base URL: `https://openrouter.ai/api/v1`
- Auth: `Authorization: Bearer <key>`
- Catalog: `GET /api/v1/models` — returns full model list with pricing (no auth required, but rate-limited)
- Completions: `POST /api/v1/chat/completions` — OpenAI-compatible format
- Generation stats: `GET /api/v1/generation?id={id}` — token usage and cost for a specific request
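
Since the catalog endpoint is public, OpenRouter's `fetch_model_catalog()` can be sketched with plain `requests`, applying the per-token-to-per-1M conversion from section 2. The function and the normalized field names are assumptions; the `{"data": [...]}` envelope and `pricing` strings match OpenRouter's documented response:

```python
from decimal import Decimal

import requests


def fetch_openrouter_catalog() -> list[dict]:
    """Sketch: pull OpenRouter's public model catalog and normalize pricing."""
    resp = requests.get("https://openrouter.ai/api/v1/models", timeout=30)
    resp.raise_for_status()
    entries = []
    for model in resp.json()["data"]:
        pricing = model.get("pricing", {})
        entries.append({
            "model_id": model["id"],                    # e.g. 'anthropic/claude-sonnet-4'
            "display_name": model.get("name"),
            "context_length": model.get("context_length"),
            "upstream_provider": model["id"].split("/")[0],
            "price_prompt_per_m": Decimal(pricing["prompt"]) * 1_000_000
                if pricing.get("prompt") else None,
            "price_completion_per_m": Decimal(pricing["completion"]) * 1_000_000
                if pricing.get("completion") else None,
            "raw_metadata": model,                      # stored in raw_metadata JSONB
        })
    return entries
```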