Commit bc15563 (parent 1ad4005)

docs(adr): ADR-800 dynamic model catalog and OpenRouter support

Proposes replacing hardcoded model lists with a database-backed catalog per provider, adding OpenRouter as a fourth inference endpoint, and integrating per-model pricing into cost estimation.

2 files changed: 233 additions, 0 deletions

docs/architecture/INDEX.md (2 additions)

```diff
@@ -46,6 +46,7 @@ _Apache AGE, migrations, schema design, PostgreSQL_
 | [ADR-061](./database-schema/ADR-061-operator-pattern-lifecycle.md) | Operator Pattern for Platform Lifecycle Management | Accepted |
 | [ADR-200](./database-schema/ADR-200-annealing-ontologies-self-organizing-knowledge-graph-structure.md) | Annealing Ontologies — Self-Organizing Knowledge Graph Structure | Accepted |
 | [ADR-201](./database-schema/ADR-201-in-memory-graph-acceleration-extension.md) | In-Memory Graph Acceleration Extension | Draft |
+| [ADR-202](./database-schema/ADR-202-timestamp-timezone-normalization.md) | Timestamp Timezone Normalization | Proposed |

 ## Ingestion
 _Content processing, jobs, extraction, deduplication_
@@ -150,6 +151,7 @@ _Providers, extraction, convergence, prompts_
 | [ADR-058](./ai-embeddings/ADR-058-polarity-axis-triangulation.md) | Polarity Axis Triangulation for Grounding Calculation | Accepted |
 | [ADR-068](./ai-embeddings/ADR-068-source-text-embeddings.md) | Source Text Embeddings for Grounding Truth Retrieval | Accepted |
 | [ADR-070](./ai-embeddings/ADR-070-polarity-axis-analysis.md) | Polarity Axis Analysis for Bidirectional Semantic Dimensions | Accepted |
+| [ADR-800](./ai-embeddings/ADR-800-dynamic-model-catalog-and-openrouter-support.md) | Dynamic Model Catalog and OpenRouter Support | Draft |

 ## Meta/Process
 _Documentation, workflow, access models, ADR system_
```
docs/architecture/ai-embeddings/ADR-800-dynamic-model-catalog-and-openrouter-support.md (231 additions, new file)
---
status: Draft
date: 2026-03-15
deciders:
  - aaronsb
  - claude
related:
  - ADR-031
  - ADR-041
  - ADR-042
  - ADR-049
---

# ADR-800: Dynamic Model Catalog and OpenRouter Support

## Context

Model lists for each AI provider are currently hardcoded in `ai_providers.py`. When providers add or retire models, the code must be updated and redeployed. There is no way for operators to discover available models at runtime or curate a preferred subset for their deployment.

Additionally, pricing information is driven by environment variables (`TOKEN_COST_*`) with static defaults. This makes cost tracking fragile — prices change, new models appear, and operators must manually research and update values.

The system currently supports three inference providers (OpenAI, Anthropic, Ollama). **OpenRouter** is a fourth provider type that offers a unified API across 200+ models from multiple upstream providers (OpenAI, Anthropic, Google, Meta, Mistral, etc.) via an OpenAI-compatible endpoint. OpenRouter is interesting because:

- It exposes the same models available directly from other providers (e.g., `openai/gpt-4o`, `anthropic/claude-sonnet-4`), creating overlap
- It includes per-model pricing in its catalog API (`GET /api/v1/models`), with prompt and completion costs per token
- It provides automatic provider routing and fallback for the same model across multiple GPU providers
- Its API is OpenAI-SDK-compatible (`https://openrouter.ai/api/v1`), so the implementation can reuse existing OpenAI client code with a different base URL

The desired operator workflow is:

1. **Select a provider endpoint** (OpenAI, Anthropic, Ollama, OpenRouter)
2. **Validate the connection** (API key check or endpoint reachability)
3. **Browse available models** — either from a previously-fetched cached catalog, or by fetching the full list from the provider API
4. **Curate a subset** — select which models to offer for extraction/embedding use
5. **Persist the curated list** — stored per-provider in the database, including pricing metadata where available

## Decision

### 1. New database table: `kg_api.provider_model_catalog`

A single table stores the cached model catalog for all providers. Each row is one model from one provider.

```sql
CREATE TABLE kg_api.provider_model_catalog (
    id SERIAL PRIMARY KEY,
    provider VARCHAR(50) NOT NULL,         -- 'openai', 'anthropic', 'ollama', 'openrouter'
    model_id VARCHAR(300) NOT NULL,        -- Provider's model identifier
    display_name VARCHAR(300),             -- Human-friendly name
    category VARCHAR(50) NOT NULL,         -- 'extraction', 'embedding', 'vision', 'translation'
    context_length INTEGER,
    max_completion_tokens INTEGER,
    supports_vision BOOLEAN DEFAULT FALSE,
    supports_json_mode BOOLEAN DEFAULT FALSE,
    supports_tool_use BOOLEAN DEFAULT FALSE,
    supports_streaming BOOLEAN DEFAULT TRUE,

    -- Pricing (USD per 1M tokens, NULL = unknown/free)
    price_prompt_per_m NUMERIC(12, 6),     -- Input/prompt cost
    price_completion_per_m NUMERIC(12, 6), -- Output/completion cost
    price_cache_read_per_m NUMERIC(12, 6), -- Cached input cost (if applicable)

    -- Curation
    enabled BOOLEAN DEFAULT FALSE,         -- Operator has selected this model
    is_default BOOLEAN DEFAULT FALSE,      -- Default model for this provider+category
    sort_order INTEGER DEFAULT 0,          -- Display ordering

    -- Metadata
    upstream_provider VARCHAR(100),        -- For OpenRouter: the actual provider (e.g., 'anthropic')
    raw_metadata JSONB,                    -- Full provider response for this model
    fetched_at TIMESTAMPTZ,                -- When catalog was last refreshed
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    UNIQUE(provider, model_id, category)
);

-- One default per provider+category
CREATE UNIQUE INDEX idx_catalog_default
    ON kg_api.provider_model_catalog(provider, category)
    WHERE is_default = TRUE;
```
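For illustration only (not part of the proposed migration), a consumer such as the extraction pipeline could resolve the operator's default extraction model and its pricing with a query like:

```sql
-- Sketch: look up the curated default extraction model for one provider.
-- The partial unique index above guarantees at most one such row.
SELECT model_id, price_prompt_per_m, price_completion_per_m
FROM kg_api.provider_model_catalog
WHERE provider = 'openrouter'
  AND category = 'extraction'
  AND enabled = TRUE
  AND is_default = TRUE;
```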

### 2. Provider catalog fetch implementations

Each provider implements a `fetch_model_catalog()` class method that returns normalized model metadata:

| Provider | Source | Pricing available? |
|----------|--------|--------------------|
| **OpenAI** | `GET /v1/models` (API) | No — hardcode known prices, flag unknown models |
| **Anthropic** | Hardcoded list (no catalog API) | Hardcode known prices |
| **Ollama** | `GET /api/tags` (local instance) | N/A — local, cost is $0 |
| **OpenRouter** | `GET /api/v1/models` (API) | Yes — `pricing.prompt` and `pricing.completion` per-token in response |

For OpenRouter, the catalog response includes:

```json
{
  "id": "anthropic/claude-sonnet-4",
  "name": "Claude Sonnet 4",
  "context_length": 200000,
  "pricing": { "prompt": "0.000003", "completion": "0.000015" },
  "architecture": { "modality": "text->text", "input_modalities": ["text", "image"] },
  "supported_parameters": ["temperature", "tools", "response_format", ...]
}
```

Pricing values from OpenRouter are per-token strings; the fetch implementation converts to per-1M-token numeric values for storage.
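A minimal sketch of that conversion, assuming the response shape above (the helper names are illustrative, not existing code):

```python
from decimal import Decimal
from typing import Optional

def per_token_to_per_million(raw: Optional[str]) -> Optional[Decimal]:
    """Convert OpenRouter's per-token price string (e.g. "0.000003")
    to USD per 1M tokens for the catalog columns; None/empty means unknown."""
    if not raw:
        return None
    return Decimal(raw) * 1_000_000

def normalize_openrouter_model(entry: dict) -> dict:
    """Map one /api/v1/models entry onto a subset of the catalog columns.
    Hypothetical helper; column names match the table in this ADR."""
    pricing = entry.get("pricing", {})
    return {
        "provider": "openrouter",
        "model_id": entry["id"],
        "display_name": entry.get("name"),
        "context_length": entry.get("context_length"),
        # OpenRouter namespaces IDs as "<upstream>/<model>"
        "upstream_provider": entry["id"].split("/", 1)[0],
        "price_prompt_per_m": per_token_to_per_million(pricing.get("prompt")),
        "price_completion_per_m": per_token_to_per_million(pricing.get("completion")),
        "raw_metadata": entry,
    }
```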

### 3. OpenRouter provider implementation

`OpenRouterProvider` extends the provider interface, reusing the OpenAI Python SDK with:

```python
import openai

client = openai.OpenAI(
    api_key=openrouter_api_key,
    base_url="https://openrouter.ai/api/v1",
    default_headers={
        # OpenRouter's optional attribution headers
        "HTTP-Referer": "https://github.com/aaronsb/knowledge-graph-system",
        "X-Title": "Knowledge Graph System",
    },
)
```

Key differences from direct OpenAI:

- Model IDs are namespaced: `openai/gpt-4o`, `anthropic/claude-sonnet-4`, `google/gemini-2.5-pro`
- No direct embedding support — extraction only; pairs with existing embedding providers
- Provider routing preferences can be passed via `extra_body={"provider": {...}}`

### 4. Operator workflow via configure.py and API

**CLI flow** (via `configure.py models`):
```
$ configure.py models list openai                       # Show cached catalog (enabled models)
$ configure.py models refresh openai                    # Fetch fresh catalog from provider API
$ configure.py models enable openai gpt-4o              # Enable a model for use
$ configure.py models disable openai gpt-4o
$ configure.py models default openai gpt-4o extraction  # Set default
```

**API endpoints** (admin):
```
GET  /admin/models/catalog?provider=openai   # List catalog
POST /admin/models/catalog/refresh           # Fetch from provider
PUT  /admin/models/catalog/{id}/enable       # Enable/disable
PUT  /admin/models/catalog/{id}/default      # Set as default
```

**Validation flow on first configuration**:

1. Operator selects provider and provides API key
2. System validates connectivity (existing `validate_api_key` pattern)
3. If no cached catalog exists, prompt to fetch
4. Operator selects models from fetched list
5. Selected models stored as `enabled=TRUE` in catalog table
6. `ai_extraction_config` references catalog entries for the active model
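The refresh in step 3 can be sketched as a merge that keeps operator curation intact. This is a hypothetical in-memory stand-in for the catalog upsert, keyed the same way as the table's `UNIQUE(provider, model_id, category)` constraint:

```python
def merge_catalog(cached: dict[tuple, dict], fetched: list[dict],
                  provider: str, category: str) -> dict[tuple, dict]:
    """Merge a freshly fetched model list into the cached catalog.

    Fetched metadata overwrites cached metadata, but curation flags
    (enabled / is_default) on existing rows are preserved, and brand-new
    models start disabled until the operator enables them.
    """
    merged = dict(cached)
    for model in fetched:
        key = (provider, model["model_id"], category)
        old = merged.get(key, {})
        row = {**old, **model, "provider": provider, "category": category}
        row["enabled"] = old.get("enabled", False)
        row["is_default"] = old.get("is_default", False)
        merged[key] = row
    return merged
```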

### 5. Cost tracking integration

The existing `job_analysis.py` cost estimator currently looks up `TOKEN_COST_*` env vars. This changes to:

1. Look up the active model in `provider_model_catalog`
2. Use `price_prompt_per_m` and `price_completion_per_m` from the catalog row
3. Fall back to env vars if catalog pricing is NULL (backward compatibility)
4. OpenRouter pricing auto-populates from their catalog API; other providers use hardcoded defaults that operators can override via `configure.py models price <provider> <model> --prompt <cost> --completion <cost>`

### 6. OpenRouter model overlap handling

When the same underlying model is available both directly and via OpenRouter (e.g., `gpt-4o` via OpenAI and `openai/gpt-4o` via OpenRouter):

- Both appear in the catalog as separate rows (different `provider` column)
- The `upstream_provider` field on OpenRouter entries identifies the actual provider
- Cost comparison is visible in the catalog listing
- The operator chooses which route to use — no automatic arbitrage
- The UI/CLI can flag overlap: "This model is also available directly via OpenAI at $X vs OpenRouter at $Y"

## Consequences

### Positive

- Models are discoverable at runtime — no code changes when providers add models
- Pricing data is fetched from the source (OpenRouter) or maintained in one place (catalog table) rather than scattered across env vars
- Operators can curate exactly which models are available to users
- OpenRouter support opens access to 200+ models through a single API key
- Cost estimates become more accurate with per-model pricing from the catalog
- The pattern is extensible — future providers (Google AI, AWS Bedrock) fit the same `fetch_model_catalog()` interface

### Negative

- Additional database table and migration to maintain
- Catalog staleness — fetched data can drift from reality (mitigated by `fetched_at` timestamp and refresh workflow)
- OpenRouter adds a proxy hop and markup vs. direct provider access
- Anthropic has no catalog API — their model list remains partially hardcoded until they offer one

### Neutral

- `ai_extraction_config` remains the "active config" table — this ADR adds a catalog that feeds into it, not a replacement
- Existing env var cost overrides continue to work as fallback
- The hardcoded `AVAILABLE_MODELS` dicts in `ai_providers.py` become seed data for initial catalog population rather than the runtime source of truth

## Alternatives Considered

### A. Keep model lists hardcoded, just add OpenRouter

Simpler, but doesn't solve the maintenance burden. Every new model requires a code change and redeploy. Pricing stays in env vars. Rejected because the catalog pattern solves multiple problems at once.

### B. External model registry (separate service or config file)

A YAML/JSON config file or separate microservice for model metadata. Rejected because we already have PostgreSQL for configuration (ADR-041) and adding another config source increases operational complexity.

### C. Auto-select cheapest provider for a given model

Automatically route requests to the cheapest available provider when the same model is offered by multiple providers. Rejected for now — adds complexity, and the operator should make deliberate cost/latency/reliability tradeoffs. Can be revisited as an enhancement.
## Implementation Notes

### Migration sequence

1. Schema migration: create `provider_model_catalog` table
2. Seed migration: populate with currently hardcoded models + known pricing
3. Add `OpenRouterProvider` class to `ai_providers.py`
4. Add `fetch_model_catalog()` to each provider
5. Update `configure.py` with `models` subcommand
6. Add admin API endpoints for catalog management
7. Update `job_analysis.py` cost estimator to read from catalog
8. Update web UI provider configuration to show catalog

### OpenRouter API details

- Base URL: `https://openrouter.ai/api/v1`
- Auth: `Authorization: Bearer <key>`
- Catalog: `GET /api/v1/models` — returns full model list with pricing (no auth required, but rate-limited)
- Completions: `POST /api/v1/chat/completions` — OpenAI-compatible format
- Generation stats: `GET /api/v1/generation?id={id}` — token usage and cost for a specific request
