
Commit 9ca4da8

apartsin and claude committed
feat: implement round 3 — 20 robustness, adoption & documentation improvements
Documents & Tutorials (8 new guides):
- MigrationGuide.md — from OpenAI SDK, LangChain, LiteLLM
- ProductionGuide.md — scaling, health checks, monitoring, secrets
- CostOptimization.md — pricing tables, budget strategies, free-tier stacking
- StreamingGuide.md — SSE, error recovery, browser patterns
- IntegrationRecipes.md — FastAPI, Flask, Django, Express, Next.js
- ArchitecturePatterns.md — 7 real-world multi-provider topologies
- QuickReference.md — one-page API cheat sheet
- AsyncGuide.md — concurrency, parallel requests, backpressure

Robustness Features (7):
- TypeScript Circuit Breaker Mixin (3-state: CLOSED/OPEN/HALF_OPEN)
- TypeScript Retry Mixin (exponential backoff + jitter)
- TypeScript Cache Mixin (LRU eviction + TTL)
- TypeScript Rate Limiter Mixin (token bucket algorithm)
- Request Correlation ID middleware (Python + TypeScript)
- Enhanced MockClient: error simulation, delays, chaos testing
- OpenTelemetry distributed tracing middleware (Python + TypeScript)

More Connectors (5):
- TypeScript Prometheus metrics connector (43 total connectors)
- LangChain BaseChatModel adapter (Python)
- Correlation ID middleware for request tracing
- OpenTelemetry span creation middleware
- Enhanced mock client failure/latency simulation

All 1,893 tests pass (1,180 Python + 713 TypeScript).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 1af8154 commit 9ca4da8

27 files changed: +6,273 −18 lines

docs/guides/ArchitecturePatterns.md: 521 additions, 0 deletions (large diff not rendered by default)
docs/guides/AsyncGuide.md: 528 additions, 0 deletions (large diff not rendered by default)
docs/guides/CostOptimization.md: 434 additions, 0 deletions (large diff not rendered by default)
docs/guides/IntegrationRecipes.md: 441 additions, 0 deletions (large diff not rendered by default)
docs/guides/MigrationGuide.md: 370 additions, 0 deletions
# Migration Guide

Migrate to ModelMesh from the OpenAI SDK, LangChain, or LiteLLM. ModelMesh provides an OpenAI SDK-compatible interface with multi-provider routing, automatic failover, and budget enforcement built in. For initial setup, see the [Quick Start](QuickStart.md). For the full configuration reference, see [System Configuration](../SystemConfiguration.md).

## From OpenAI SDK

The `MeshClient` returned by `modelmesh.create()` is a drop-in replacement for the OpenAI client. The `client.chat.completions.create()` signature is identical. Change two lines and you get automatic failover across every provider you configure.
### Python

**Before (OpenAI SDK):**

```python
from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```

**After (ModelMesh):**

```python
import modelmesh

client = modelmesh.create("chat-completion")

response = client.chat.completions.create(
    model="chat-completion",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```
### TypeScript

**Before (OpenAI SDK):**

```typescript
import OpenAI from 'openai';

const client = new OpenAI({ apiKey: 'sk-...' });

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
  temperature: 0.7,
  max_tokens: 512,
});
console.log(response.choices[0].message?.content);
```

**After (ModelMesh):**

```typescript
import { create } from '@nistrapa/modelmesh-core';

const client = create('chat-completion');

const response = await client.chat.completions.create({
  model: 'chat-completion',
  messages: [{ role: 'user', content: 'Hello!' }],
  temperature: 0.7,
  max_tokens: 512,
});
console.log(response.choices[0].message?.content);
```
### What Changes

| Concern | OpenAI SDK | ModelMesh |
|---------|-----------|-----------|
| Import | `from openai import OpenAI` | `import modelmesh` |
| Client creation | `OpenAI(api_key="sk-...")` | `modelmesh.create("chat-completion")` |
| Model parameter | Literal model ID (`"gpt-4o"`) | Virtual pool name (`"chat-completion"`) |
| API key | Passed to constructor or env var | Resolved via [secret stores](SecretStores.md) |
| Streaming | `stream=True` | `stream=True` (identical) |
| Response shape | `ChatCompletion` object | Same `ChatCompletion` shape |
| Error types | `openai.APIError` | `ModelMeshError` hierarchy (see [Error Handling](ErrorHandling.md)) |
### Streaming Migration

Streaming works the same way. The response is an iterator of chunks with `delta.content`:

```python
# OpenAI SDK — streaming
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    token = chunk.choices[0].delta.content or ""
    print(token, end="")

# ModelMesh — identical call, but routes across providers
stream = client.chat.completions.create(
    model="chat-completion",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    token = chunk.choices[0].delta.content or ""
    print(token, end="")
```
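If you need the complete response text in addition to incremental printing, accumulate the deltas as you go. The sketch below stubs the chunk shape with `SimpleNamespace` so the accumulation logic is self-contained; `make_chunk` is a stand-in for what the client actually yields, not a ModelMesh API:

```python
from types import SimpleNamespace

def make_chunk(text):
    # Stand-in for a streaming chunk: mirrors chunk.choices[0].delta.content
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def collect_stream(stream):
    """Print tokens as they arrive and return the full text."""
    parts = []
    for chunk in stream:
        token = chunk.choices[0].delta.content or ""
        print(token, end="")
        parts.append(token)
    return "".join(parts)

# Simulated stream; in real code, pass the iterator returned by
# client.chat.completions.create(..., stream=True). A final chunk with
# content=None (common in OpenAI-style streams) is handled by the `or ""`.
full = collect_stream([make_chunk("Hel"), make_chunk("lo"), make_chunk(None)])
```

The same `or ""` guard used in the loops above matters here too: final chunks often carry `delta.content = None`.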
### Keeping OpenAI as Fallback

You can still use the OpenAI SDK alongside ModelMesh. Point the OpenAI SDK at the ModelMesh proxy for routing benefits while keeping direct access for edge cases:

```python
from openai import OpenAI

# Route through ModelMesh proxy
routed_client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",
)

# Direct OpenAI access (bypass routing)
direct_client = OpenAI(api_key="sk-...")
```
## From LangChain

ModelMesh provides a LangChain-compatible chat model wrapper. Use it as the LLM backend in any LangChain chain, agent, or pipeline.

### Python

**Before (LangChain with OpenAI):**

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o", api_key="sk-...")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}"),
])
chain = prompt | llm
result = chain.invoke({"input": "What is ModelMesh?"})
print(result.content)
```

**After (LangChain with ModelMesh):**

```python
from modelmesh.integrations.langchain import ChatModelMesh
from langchain_core.prompts import ChatPromptTemplate

llm = ChatModelMesh(capability="chat-completion")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}"),
])
chain = prompt | llm
result = chain.invoke({"input": "What is ModelMesh?"})
print(result.content)
```
### TypeScript

**Before (LangChain.js with OpenAI):**

```typescript
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';

const llm = new ChatOpenAI({ model: 'gpt-4o', apiKey: 'sk-...' });

const prompt = ChatPromptTemplate.fromMessages([
  ['system', 'You are a helpful assistant.'],
  ['user', '{input}'],
]);
const chain = prompt.pipe(llm);
const result = await chain.invoke({ input: 'What is ModelMesh?' });
console.log(result.content);
```

**After (LangChain.js with ModelMesh):**

```typescript
import { ChatModelMesh } from '@nistrapa/modelmesh-langchain';
import { ChatPromptTemplate } from '@langchain/core/prompts';

const llm = new ChatModelMesh({ capability: 'chat-completion' });

const prompt = ChatPromptTemplate.fromMessages([
  ['system', 'You are a helpful assistant.'],
  ['user', '{input}'],
]);
const chain = prompt.pipe(llm);
const result = await chain.invoke({ input: 'What is ModelMesh?' });
console.log(result.content);
```
### LangChain Integration Details

The `ChatModelMesh` wrapper supports:

| Feature | Support |
|---------|---------|
| `invoke()` / `ainvoke()` | Synchronous and async invocation |
| `stream()` / `astream()` | Token-by-token streaming |
| `batch()` / `abatch()` | Batch invocations |
| Tool calling | Pass tools via `bind_tools()` |
| Structured output | Use `with_structured_output()` |
| Callbacks | LangChain callback handlers |

The wrapper delegates to `MeshClient.chat.completions.create()` internally, so all ModelMesh routing, failover, and budget controls apply transparently.
## From LiteLLM

LiteLLM and ModelMesh both solve multi-provider routing. The migration is conceptual rather than a drop-in swap. Here is how key concepts map between the two libraries.

### Concept Mapping

| LiteLLM | ModelMesh | Notes |
|---------|-----------|-------|
| `litellm.completion()` | `client.chat.completions.create()` | ModelMesh uses an OpenAI-compatible client object |
| `model="openai/gpt-4o"` | `model="chat-completion"` | ModelMesh uses virtual pool names, not provider-prefixed model IDs |
| `fallbacks=["gpt-4o", "claude-3"]` | YAML pool with rotation strategy | ModelMesh separates routing config from code |
| `litellm.Router()` | `modelmesh.create()` | Both return a routing client |
| `max_budget` | `budget.daily_limit` / `budget.monthly_limit` | ModelMesh supports per-provider and per-pool budgets |
| `set_callbacks(["langfuse"])` | `observability` config section | ModelMesh has built-in observability connectors |
| Environment variable keys | Same env var names | Both read `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc. |
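For the `max_budget` row, the dotted paths `budget.daily_limit` and `budget.monthly_limit` suggest a nested `budget` block under a provider. A rough sketch of that shape follows; the nesting and units are assumptions here, so verify the exact field names against [System Configuration](../SystemConfiguration.md):

```yaml
providers:
  openai.llm.v1:
    api_key: ${secrets:OPENAI_API_KEY}
    budget:
      daily_limit: 10.00     # assumed: USD per day for this provider
      monthly_limit: 200.00  # assumed: USD per calendar month
```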
### Python

**Before (LiteLLM):**

```python
import litellm

response = litellm.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    fallbacks=["anthropic/claude-3-5-sonnet-20241022"],
)
print(response.choices[0].message.content)
```

**After (ModelMesh):**

```python
import modelmesh

client = modelmesh.create("chat-completion")

response = client.chat.completions.create(
    model="chat-completion",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Fallback behavior is configured in YAML rather than in code:

```yaml
providers:
  openai.llm.v1:
    api_key: ${secrets:OPENAI_API_KEY}
  anthropic.claude.v1:
    api_key: ${secrets:ANTHROPIC_API_KEY}

models:
  gpt-4o:
    provider: openai.llm.v1
    capabilities: [generation.text-generation.chat-completion]
  claude-3-5-sonnet:
    provider: anthropic.claude.v1
    capabilities: [generation.text-generation.chat-completion]

pools:
  chat-completion:
    strategy: modelmesh.priority-selection.v1
    model_priority: [gpt-4o, claude-3-5-sonnet]
```
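Conceptually, a priority-selection pool tries each model in `model_priority` order and falls through on failure, which is the behavior LiteLLM's `fallbacks` list gives you in code. The sketch below illustrates that idea in plain Python; it is not ModelMesh's implementation, and the real router also applies budgets, health checks, and the other rotation strategies:

```python
def complete_with_priority(model_priority, providers, prompt):
    """Try each model in priority order; return the first success.

    `providers` maps model name -> callable that returns a response
    string or raises on failure. Schematic illustration only.
    """
    errors = {}
    for model in model_priority:
        try:
            return model, providers[model](prompt)
        except Exception as exc:  # a real router would filter error types
            errors[model] = exc
    raise RuntimeError(f"all models failed: {errors}")

# Simulated providers: gpt-4o is down, claude-3-5-sonnet answers.
def flaky(prompt):
    raise TimeoutError("provider unavailable")

providers = {"gpt-4o": flaky, "claude-3-5-sonnet": lambda p: f"echo: {p}"}
model, reply = complete_with_priority(
    ["gpt-4o", "claude-3-5-sonnet"], providers, "Hello!"
)
```

The key difference from LiteLLM is that the priority list lives in the YAML pool definition above, so rerouting needs a config change, not a code change.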
### TypeScript

**Before (LiteLLM via API):**

```typescript
const response = await fetch('http://localhost:4000/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);
```

**After (ModelMesh):**

```typescript
import { create } from '@nistrapa/modelmesh-core';

const client = create('chat-completion');

const response = await client.chat.completions.create({
  model: 'chat-completion',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message?.content);
```
### What ModelMesh Adds Over LiteLLM

| Feature | LiteLLM | ModelMesh |
|---------|---------|-----------|
| Provider routing | Model-prefix routing (`openai/gpt-4o`) | Capability-based pools with 8 rotation strategies |
| Configuration | Python dicts / env vars | Declarative YAML with env-specific overrides |
| Extensibility | Custom provider classes | CDK with typed connector interfaces |
| Budget enforcement | Global max budget | Per-provider, per-pool, daily/monthly limits |
| Secret management | Env vars only | 7 secret store backends (env, dotenv, AWS, GCP, Azure, 1Password, encrypted file) |
| Observability | Callback-based | Built-in connectors (console, file, webhook, Prometheus) |
| Deployment | Python proxy | Embedded library, proxy, or Docker |
| Mock testing | Not built-in | `mock_client()` with call inspection |
| TypeScript SDK | API proxy only | Native TypeScript client library |
## Feature Comparison Table

| Feature | OpenAI SDK | LiteLLM | ModelMesh |
|---------|-----------|---------|-----------|
| OpenAI-compatible API | Native | Yes | Yes |
| Multi-provider routing | No | Yes | Yes |
| Automatic failover | No | Yes (fallbacks) | Yes (8 strategies) |
| Capability-based pools | No | No | Yes |
| Budget enforcement | No | Global limit | Per-provider/pool/model |
| Secret store backends | No | Env vars | 7 backends |
| CDK extensibility | No | Limited | Full connector SDK |
| Mock testing | No | No | Built-in `mock_client()` |
| Native TypeScript SDK | Yes | Proxy only | Yes |
| Streaming | Yes | Yes | Yes |
| Tool calling | Yes | Yes | Yes |
| Structured output | Yes | Yes | Yes |
| LangChain integration | Native | Via proxy | `ChatModelMesh` wrapper |
| Observability | No | Callbacks | 7 connectors |
| YAML configuration | No | Limited | Full declarative config |
## Migration Checklist

1. Install ModelMesh: `pip install modelmesh-lite[yaml]` or `npm install @nistrapa/modelmesh-core`
2. Set API keys as environment variables (same variable names as before)
3. Replace client creation with `modelmesh.create("chat-completion")`
4. Change the `model=` parameter from literal model IDs to pool names
5. Replace provider-specific error handling with [ModelMesh exceptions](ErrorHandling.md)
6. Move fallback/routing logic from code to [YAML configuration](../SystemConfiguration.md)
7. Add [middleware](Middleware.md) for logging, caching, or request transforms
8. Set up [budget controls](../SystemConfiguration.md#providers) to prevent surprise bills
9. Replace test mocks with `mock_client()` (see [Testing Guide](Testing.md))
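For the last step, the testing pattern is dependency injection: code under test takes the client as a parameter, and tests pass a recording fake instead. The sketch below hand-rolls such a fake to show the general shape; `mock_client()` from the Testing Guide provides an equivalent ready-made, so the names here are illustrative, not ModelMesh's API:

```python
from types import SimpleNamespace

class FakeCompletions:
    """Records create() calls and returns a canned response."""
    def __init__(self, reply):
        self.reply = reply
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        message = SimpleNamespace(content=self.reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

def fake_client(reply="stubbed"):
    """Stand-in for a mock client; same shape as the real one."""
    return SimpleNamespace(
        chat=SimpleNamespace(completions=FakeCompletions(reply))
    )

# Code under test accepts the client as a dependency.
def summarize(client, text):
    response = client.chat.completions.create(
        model="chat-completion",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return response.choices[0].message.content

client = fake_client(reply="a summary")
result = summarize(client, "long document")
calls = client.chat.completions.calls  # inspect what was sent
```

Because the fake mirrors the OpenAI-compatible response shape, tests written this way keep working when you swap in the real client.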
---

See also: [Quick Start](QuickStart.md) · [Error Handling](ErrorHandling.md) · [System Configuration](../SystemConfiguration.md) · [Connector Catalogue](../ConnectorCatalogue.md)
