Commit 6928f8b

apartsin and claude committed
feat: add 7 developer experience features, 50+ audit fixes, bump to v0.2.0
New features (Python + TypeScript parity):
- Structured exception hierarchy (ModelMeshError base + 7 typed exceptions)
- Request/response middleware pipeline (before/after/error hooks)
- Async context manager and close() for resource cleanup
- Usage tracking API (cost, tokens, by-model/provider breakdowns)
- Mock testing client (mock_client/mockClient with call recording)
- Capability discovery API (list, resolve, search, tree)
- Routing explanation/debug API (dry-run routing with candidates)

Audit fixes:
- Replace bare RuntimeError/ValueError with typed exceptions in router and budget
- Fix 50+ code-docs-tests inconsistencies across Python and TypeScript
- Cross-language parity: matching method signatures, parameter names, return types

Testing:
- 63 new Python tests (test_developer_experience.py)
- 69 new TypeScript tests (developer-experience.test.ts)
- 1879 total tests passing (1166 Python + 713 TypeScript)

Samples & docs:
- 12 quickstart samples (06-11 for both Python and TypeScript)
- 5 new guides (QuickStart, ErrorHandling, Middleware, Testing, Capabilities)
- Updated docs/index.md with new features and navigation

Package updates:
- TypeScript: add sub-path exports for testing, capabilities, middleware, exceptions, usage
- Both packages bumped from 0.1.1 to 0.2.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent bdd0954 commit 6928f8b

55 files changed: +7258 -131 lines changed

docs/ConnectorCatalogue.md

Lines changed: 8 additions & 7 deletions
@@ -301,13 +301,13 @@ Interface: [ConnectorInterfaces.md — Rotation Policy](ConnectorInterfaces.html
301301
| Policy | Description |
302302
| --- | --- |
303303
| **`rotation.modelmesh.stick-until-failure.v1`** | Use the current model until it fails, then rotate. Default policy. |
304-
| **`rotation.modelmesh.priority-selection.v1`** | *(Planned)* Follow an ordered model/provider preference list; fall back on exhaust. |
305-
| **`rotation.modelmesh.round-robin.v1`** | *(Planned)* Cycle through active models in sequence. |
306-
| **`rotation.modelmesh.cost-first.v1`** | *(Planned)* Select the cheapest active model for each request. |
307-
| **`rotation.modelmesh.latency-first.v1`** | *(Planned)* Select the model with the lowest observed latency. |
308-
| **`rotation.modelmesh.session-stickiness.v1`** | *(Planned)* Route all requests in a session to the same model. |
309-
| **`rotation.modelmesh.rate-limit-aware.v1`** | *(Planned)* Switch models preemptively before hitting rate limits. |
310-
| **`rotation.modelmesh.load-balanced.v1`** | *(Planned)* Distribute requests proportionally to each model's rate-limit headroom. |
304+
| **`rotation.modelmesh.priority-selection.v1`** | Follow an ordered model/provider preference list; fall back on exhaust. |
305+
| **`rotation.modelmesh.round-robin.v1`** | Cycle through active models in sequence. |
306+
| **`rotation.modelmesh.cost-first.v1`** | Select the model with the lowest accumulated cost. |
307+
| **`rotation.modelmesh.latency-first.v1`** | Select the model with the lowest observed latency. |
308+
| **`rotation.modelmesh.session-stickiness.v1`** | Route all requests in a session to the same model via consistent hashing. |
309+
| **`rotation.modelmesh.rate-limit-aware.v1`** | Track per-model request/token quotas and switch before exhaustion. |
310+
| **`rotation.modelmesh.load-balanced.v1`** | Distribute requests proportionally using weighted round-robin. |
311311
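
To make two of the policies in the table concrete, here is a minimal Python sketch of `stick-until-failure` and `round-robin` selection. The class and method names (`StickUntilFailure`, `RoundRobin`, `report_failure`, `next_model`) are illustrative assumptions and not the ModelMesh connector interface; health checks and standby handling are left out.

```python
from itertools import cycle


class StickUntilFailure:
    """Keep returning the same model until a failure is reported."""

    def __init__(self, models):
        self._models = list(models)
        self._index = 0

    def current(self):
        return self._models[self._index]

    def report_failure(self):
        # Rotate to the next model, wrapping around at the end of the list.
        self._index = (self._index + 1) % len(self._models)
        return self.current()


class RoundRobin:
    """Cycle through the models in sequence, one per request."""

    def __init__(self, models):
        self._cycle = cycle(list(models))

    def next_model(self):
        return next(self._cycle)


sticky = StickUntilFailure(["model-a", "model-b"])
rotating = RoundRobin(["model-a", "model-b", "model-c"])
```

The difference is where the rotation decision lives: the sticky policy only moves when the caller reports a failure, while round-robin advances on every request regardless of outcome.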

312312
---
313313

@@ -560,6 +560,7 @@ All trace entries carry a severity level. The `min_severity` configuration optio
560560
| **`observability.modelmesh.webhook.v1`** | HTTP POST | Sends routing events and logs to a configurable URL. Use for alerting, dashboards, or external log aggregation. |
561561
| **`observability.modelmesh.json-log.v1`** | JSONL file | Appends JSON Lines records to a file. Each line is a self-contained JSON object with type, timestamp, and payload. Optimized for log aggregation pipelines. |
562562
| **`observability.modelmesh.callback.v1`** | Python callback | Invokes a user-supplied Python callable for each event. Useful for custom integrations, in-process dashboards, and testing. |
563+
| **`observability.modelmesh.prometheus.v1`** | Prometheus text | Exposes metrics in Prometheus text exposition format. Zero-dependency implementation with counters, gauges, and histograms. Call `render_metrics()` for scrape output. |
563564
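
For readers unfamiliar with the Prometheus text exposition format mentioned in the last row, the sketch below renders a set of counters in that format. It is an assumption-laden illustration: `render_counters` and the metric name `modelmesh_requests_total` are hypothetical, it is not the connector's `render_metrics()` implementation, and label-value escaping is omitted.

```python
def render_counters(counters, help_text):
    """Render counters as Prometheus text exposition lines.

    counters maps a metric name to {label_pairs: value}, where label_pairs
    is a tuple of (key, value) tuples. Label escaping is not handled here.
    """
    lines = []
    for name in sorted(counters):
        lines.append(f"# HELP {name} {help_text[name]}")
        lines.append(f"# TYPE {name} counter")
        for labels, value in sorted(counters[name].items()):
            label_str = ",".join(f'{key}="{val}"' for key, val in labels)
            series = f"{name}{{{label_str}}}" if label_str else name
            lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


counters = {
    "modelmesh_requests_total": {
        (("model", "model-a"),): 3,
        (("model", "model-b"),): 1,
    }
}
help_text = {"modelmesh_requests_total": "Total routed requests."}
print(render_counters(counters, help_text), end="")
```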

564565
### Connector-Specific Configuration
565566

docs/CoverageMatrix.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ title: "Test Coverage Matrix"
55

66
# Test Coverage Matrix
77

8-
Correlates documented features with test coverage. The project includes 855 Python tests across 15 test files and 511 TypeScript tests across 13 test files, for a total of 1,366 tests.
8+
Correlates documented features with test coverage. The project includes 1,028 Python tests across 18 test files and 644 TypeScript tests across 13 test files, for a total of 1,672 tests.
99

1010
---
1111

docs/guides/Capabilities.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
1+
# Capability Discovery
2+
3+
ModelMesh uses a hierarchical capability system to route requests. Instead of memorizing full dotted paths like `generation.text-generation.chat-completion`, you can use short aliases and the discovery API.
4+
5+
## Capability Aliases
6+
7+
| Alias | Full Path |
8+
|-------|-----------|
9+
| `chat-completion` | `generation.text-generation.chat-completion` |
10+
| `text-generation` | `generation.text-generation` |
11+
| `code-generation` | `generation.text-generation.code-generation` |
12+
| `text-embeddings` | `representation.embeddings.text-embeddings` |
13+
| `text-to-speech` | `generation.audio.text-to-speech` |
14+
| `speech-to-text` | `understanding.audio.speech-to-text` |
15+
| `text-to-image` | `generation.image.text-to-image` |
16+
| `image-to-text` | `representation.image.image-to-text` |
17+
18+
## Discovery API
19+
20+
### Python
21+
```python
import modelmesh

# List all aliases
caps = modelmesh.capabilities.list_all()
# ['chat-completion', 'code-generation', 'image-to-text', ...]

# Resolve alias → full path
path = modelmesh.capabilities.resolve("chat-completion")
# 'generation.text-generation.chat-completion'

# Dotted paths pass through unchanged
modelmesh.capabilities.resolve("generation.text-generation")
# 'generation.text-generation'

# Unknown aliases return unchanged
modelmesh.capabilities.resolve("custom-cap")
# 'custom-cap'

# Search by keyword (case-insensitive)
modelmesh.capabilities.search("text")
# ['text-embeddings', 'text-generation', 'text-to-image', 'text-to-speech']

# View the hierarchy tree
tree = modelmesh.capabilities.tree()
# {
#     'generation': {
#         'text-generation': {
#             'chat-completion': {},
#             'code-generation': {},
#         },
#         'audio': {'text-to-speech': {}},
#         'image': {'text-to-image': {}},
#     },
#     'representation': { ... },
#     'understanding': { ... },
# }
```
60+
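
The resolution rules described above (known alias maps to its full path, everything else passes through) can be sketched as a plain dictionary lookup. This is a hypothetical stand-in built from the alias table in this guide, not ModelMesh's internal implementation; `ALIASES`, `resolve`, and `list_all` here are local names for illustration only.

```python
# Hypothetical stand-in for the alias table in this guide.
ALIASES = {
    "chat-completion": "generation.text-generation.chat-completion",
    "text-generation": "generation.text-generation",
    "code-generation": "generation.text-generation.code-generation",
    "text-embeddings": "representation.embeddings.text-embeddings",
    "text-to-speech": "generation.audio.text-to-speech",
    "speech-to-text": "understanding.audio.speech-to-text",
    "text-to-image": "generation.image.text-to-image",
    "image-to-text": "representation.image.image-to-text",
}


def resolve(name: str) -> str:
    """Map a known alias to its full path; return anything else unchanged."""
    return ALIASES.get(name, name)


def list_all() -> list[str]:
    """Return all known aliases in sorted order."""
    return sorted(ALIASES)
```

Because unknown names fall through untouched, full dotted paths and custom capabilities need no special casing.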
61+
### TypeScript
62+
```typescript
import * as capabilities from '@nistrapa/modelmesh-core/capabilities';

const caps = capabilities.listAll();
const path = capabilities.resolve('chat-completion');
const matches = capabilities.search('text');
const tree = capabilities.tree();
```
71+
72+
## Using Capabilities
73+
74+
When creating a client, the capability name determines which pool handles your requests:
75+
```python
# These are equivalent:
client = modelmesh.create("chat-completion")
client = modelmesh.create("generation.text-generation.chat-completion")
```
81+
82+
When calling `create()`, ModelMesh resolves the capability to find or create a pool containing all models that support it.
83+
84+
## Hierarchy
85+
86+
The capability tree follows a three-level hierarchy:
87+
```
Category
└── Domain
    └── Specific Capability
```
93+
94+
Categories:
95+
- **generation** — Create new content (text, audio, images)
96+
- **representation** — Transform content into structured forms (embeddings, descriptions)
97+
- **understanding** — Analyze and interpret content (transcription, classification)
98+
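
One way to picture the hierarchy is as a nested dictionary keyed by path segment, which is also the shape shown for `tree()` earlier in this guide. The helper below is a minimal sketch under that assumption; `build_tree` is a hypothetical name, not part of the ModelMesh API.

```python
def build_tree(paths):
    """Fold dotted capability paths into a nested dict, one level per segment."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("."):
            # Descend into the child for this segment, creating it if needed.
            node = node.setdefault(part, {})
    return tree


example = build_tree([
    "generation.text-generation.chat-completion",
    "generation.text-generation.code-generation",
    "generation.audio.text-to-speech",
])
```

Leaf capabilities end up as empty dictionaries, so the same structure accommodates paths of any depth, including custom ones.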
99+
## Custom Capabilities
100+
101+
You can register custom capabilities in your YAML configuration:
102+
```yaml
models:
  my-model:
    provider: my-provider
    capabilities:
      - generation.custom.my-capability

pools:
  my-pool:
    capability: generation.custom.my-capability
```
114+
115+
Custom capability paths don't need aliases — use the full dotted path directly.

docs/guides/ErrorHandling.md

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
1+
# Error Handling
2+
3+
ModelMesh provides a structured exception hierarchy so you can catch failures at the right level of specificity.
4+
5+
## Exception Tree
6+
```
ModelMeshError (base)
├── RoutingError
│   ├── NoActiveModelError (retryable)
│   └── AllProvidersExhaustedError
├── ProviderError
│   ├── AuthenticationError
│   ├── RateLimitError (retryable, has retry_after)
│   └── ProviderTimeoutError (retryable)
├── ConfigurationError
└── BudgetExceededError
```
19+
20+
All exceptions inherit from `ModelMeshError`, so you can use a single broad catch or handle specific failure modes individually.
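
A minimal Python sketch of how such a hierarchy could be declared is shown here for orientation. It covers only three of the classes, uses the field names from this guide (`message`, `details`, `retryable`, `provider_id`, `model_id`, `retry_after`), and the constructor signatures are assumptions rather than the library's actual definitions.

```python
class ModelMeshError(Exception):
    """Base class: a single broad catch covers every subclass."""

    retryable = False  # subclasses override this hint

    def __init__(self, message, details=None):
        super().__init__(message)
        self.message = message
        self.details = details or {}


class ProviderError(ModelMeshError):
    """Failure attributed to a specific provider and model."""

    def __init__(self, message, provider_id=None, model_id=None, details=None):
        super().__init__(message, details)
        self.provider_id = provider_id
        self.model_id = model_id


class RateLimitError(ProviderError):
    """Rate limit or quota exceeded; may succeed after waiting."""

    retryable = True

    def __init__(self, message, retry_after=None, **kwargs):
        super().__init__(message, **kwargs)
        self.retry_after = retry_after


error = RateLimitError("quota exceeded", retry_after=2.0, provider_id="provider-a")
```

Keeping `retryable` as a class attribute means callers can branch on it without knowing which concrete subclass they caught.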
21+
22+
## Quick Start
23+
24+
### Python
25+
```python
import modelmesh
from modelmesh.exceptions import (
    ModelMeshError,
    NoActiveModelError,
    AllProvidersExhaustedError,
    RateLimitError,
    BudgetExceededError,
)

# Client created as shown in the Capabilities guide.
client = modelmesh.create("chat-completion")

try:
    response = client.chat.completions.create(
        model="chat-completion",
        messages=[{"role": "user", "content": "Hello"}],
    )
except RateLimitError as e:
    print(f"Rate limited by {e.provider_id}, retry after {e.retry_after}s")
except NoActiveModelError:
    print("No models available — try again shortly")
except AllProvidersExhaustedError as e:
    print(f"All {e.attempts} attempts failed: {e.last_error}")
except BudgetExceededError as e:
    print(f"Budget exceeded: {e.limit_type} limit of {e.limit_value}")
except ModelMeshError as e:
    # Broad fallback: must come after the specific handlers above.
    print(f"ModelMesh error: {e}")
```
51+
52+
### TypeScript
53+
```typescript
import {
  ModelMeshError,
  NoActiveModelError,
  AllProvidersExhaustedError,
  RateLimitError,
  BudgetExceededError,
} from '@nistrapa/modelmesh-core';

try {
  const response = await client.chat.completions.create({
    model: 'chat-completion',
    messages: [{ role: 'user', content: 'Hello' }],
  });
} catch (e) {
  if (e instanceof RateLimitError) {
    console.log(`Rate limited, retry after ${e.retryAfter}s`);
  } else if (e instanceof NoActiveModelError) {
    console.log('No models available');
  } else if (e instanceof AllProvidersExhaustedError) {
    console.log(`All ${e.attempts} attempts failed`);
  } else if (e instanceof ModelMeshError) {
    console.log(`ModelMesh error: ${e.message}`);
  }
}
```
80+
81+
## Exception Details
82+
83+
Every ModelMesh exception carries structured metadata:
84+
85+
| Field | Type | Description |
86+
|-------|------|-------------|
87+
| `message` | `str` | Human-readable error description |
88+
| `details` | `dict` | Arbitrary structured context |
89+
| `retryable` | `bool` | Hint: may succeed on retry |
90+
91+
### Routing Exceptions
92+
93+
| Exception | Extra Fields | When Raised |
94+
|-----------|-------------|-------------|
95+
| `NoActiveModelError` | `pool_name` | All models in a pool are in standby |
96+
| `AllProvidersExhaustedError` | `pool_name`, `attempts`, `last_error` | All retry/rotation attempts failed |
97+
98+
### Provider Exceptions
99+
100+
| Exception | Extra Fields | When Raised |
101+
|-----------|-------------|-------------|
102+
| `AuthenticationError` | `provider_id`, `model_id` | Invalid API key or credentials |
103+
| `RateLimitError` | `provider_id`, `model_id`, `retry_after` | Rate limit or quota exceeded |
104+
| `ProviderTimeoutError` | `provider_id`, `model_id`, `timeout_seconds` | Request timed out |
105+
106+
### Other Exceptions
107+
108+
| Exception | Extra Fields | When Raised |
109+
|-----------|-------------|-------------|
110+
| `ConfigurationError` | (none) | Invalid config, missing fields |
111+
| `BudgetExceededError` | `limit_type`, `limit_value`, `actual_value` | Cost limit breached |
112+
113+
## Retry Guidance
114+
115+
Use the `retryable` field to decide whether to retry:
116+
```python
import time

from modelmesh.exceptions import ModelMeshError

try:
    response = client.chat.completions.create(...)
except ModelMeshError as e:
    if e.retryable:
        # Safe to retry: the model may become available or the rate limit may reset.
        # retry_after may be absent or None, so fall back to 5 seconds.
        time.sleep(getattr(e, "retry_after", None) or 5)
        response = client.chat.completions.create(...)
    else:
        # Permanent failure: fix config, check credentials, or increase budget.
        raise
```
129+
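
A single retry is often not enough in practice. The sketch below wraps the same idea in a bounded loop with exponential backoff that prefers the server-provided `retry_after` when present. It is an illustration under stated assumptions: `call_with_retry` is a hypothetical helper, and `RetryableError` is a local stand-in for `ModelMeshError` so the example runs without the library installed.

```python
import time


class RetryableError(Exception):
    """Local stand-in for ModelMeshError, carrying the same two hints."""

    def __init__(self, message, retryable=False, retry_after=None):
        super().__init__(message)
        self.retryable = retryable
        self.retry_after = retry_after


def call_with_retry(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying retryable failures up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableError as e:
            # Give up on permanent failures or when attempts are exhausted.
            if not e.retryable or attempt == max_attempts:
                raise
            # Prefer the provider's hint; otherwise back off 1x, 2x, 4x, ...
            if e.retry_after is not None:
                delay = e.retry_after
            else:
                delay = base_delay * 2 ** (attempt - 1)
            sleep(delay)
```

In application code, `fn` would be a zero-argument callable wrapping `client.chat.completions.create(...)`, and the `except` clause would name `ModelMeshError`. The `sleep` parameter exists only so tests can substitute a recorder for `time.sleep`.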
130+
## Backward Compatibility
131+
132+
The new exceptions maintain backward compatibility:
133+
134+
- `AllProvidersExhaustedError` inherits from `ModelMeshError` → `Error`
135+
- `BudgetExceededError` inherits from `ModelMeshError` → `Error`
136+
- Code catching base `Error` continues to work
137+
- All new fields are additive — no existing behavior changed
