
Commit 191c516

mikasenghaas, eligotts, willccbb, and cursoragent authored
unified client interface (#897)
* feat(clients): introduce unified client interface and provider adapters
* refactor(environment): migrate rollout core to Response + tool_defs
* refactor(envs): align tool/multiturn/integrations with new message and tool flow
* refactor(eval): unify eval outputs/serialization and improve display plumbing
* test(migration): update env/tool trajectory tests for unified client/types
* test(envs): harden env smoke tests and fixture setup for clean installs
* chore(endpoints): annotate endpoint registry with explicit client types
* dont strip client types
* small reversions
* format checks
* cleaner handling of pure completions with custom types
* formatting
* patch single turn to convert message list back to raw string
* safe serialization, output has both oai_tools and tool_defs
* test needs to accept kwargs
* update tests with tool_defs
* rename convert_func_to_tool_def
* formatting
* fix lines in types
* harden tool_defs/oai_tools to error when oai tools are passed in
* harden downstream reading of tool_defs
* use vf.client
* much tighter use of custom types
* get rid of legacy response.choices
* patch test
* clean up into unified normalized_messages util function
* bugbot fixes
* cli tool normalization
* more cli stuff
* add text message to client message types
* bugbot, normalize no tool calls across clients
* reasoning first
* ty check
* fix anthropic tool call blocks
* cli agent intercept back to openai types
* usage in anthropic types
* normalize raw tool calls oai tools
* cleaner errors
* revert to error string matches and pass through interleaved thinking to anthropic client
* small prepend system prompt edit
* small formatting
* max completion tokens rename
* anthropic overlong error
* bugbot shallow copy
* Codex-generated pull request (#889)
  Co-authored-by: will brown <willccbb@users.noreply.github.com>
  Co-authored-by: Cursor Agent <cursoragent@cursor.com>
* flatten back into a string at the end for completions
* preserve thinking block anthropic content:
* remove interleaved from eval config
* minor
* each client in its own file
* move some stuff around
* rename to content_to_text
* do not warn on pydantic serialization on env worker
* client everywhere
* fix ty
* fix endpoints
* pop unrecognized stop arg
* add deepseek reasoner
* remove interleaved thinking from env
* fix tests with claude
* rename to mock_client
* update docs
* updated more reasoning content fields
* do not make interleaved settable on client
* simplify
* make thinking block part of our ass msg type
* fix endpoints
* rename to oai_tools
* allow closing clients
* removed unused native client
* add docs
* conditional warn
* only add reasoning content if its a string

---------

Co-authored-by: eligotts <78387377+eligotts@users.noreply.github.com>
Co-authored-by: will brown <williambrown97@gmail.com>
Co-authored-by: will brown <willccbb@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
1 parent 54005ee commit 191c516

71 files changed

Lines changed: 4779 additions & 2552 deletions


configs/endpoints.py

Lines changed: 71 additions & 9 deletions
@@ -4,187 +4,249 @@
         "model": "allenai/olmo-3-32b-think",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "olmo3-7b-i": {
         "model": "allenai/olmo-3-7b-instruct",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "olmo3-7b-t": {
         "model": "allenai/olmo-3-7b-think",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     # arcee
     "trinity-mini": {
         "model": "arcee/trinity-mini",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     # anthropic
     "haiku": {
-        "model": "anthropic/claude-4.5-haiku",
-        "url": "https://api.pinference.ai/api/v1",
-        "key": "PRIME_API_KEY",
+        "model": "claude-haiku-4-5",
+        "url": "https://api.anthropic.com",
+        "key": "ANTHROPIC_API_KEY",
+        "type": "anthropic_messages",
     },
     "sonnet": {
-        "model": "anthropic/claude-4.5-sonnet",
-        "url": "https://api.pinference.ai/api/v1",
-        "key": "PRIME_API_KEY",
+        "model": "claude-sonnet-4-5",
+        "url": "https://api.anthropic.com",
+        "key": "ANTHROPIC_API_KEY",
+        "type": "anthropic_messages",
     },
     "opus": {
-        "model": "anthropic/claude-4.5-opus",
-        "url": "https://api.pinference.ai/api/v1",
-        "key": "PRIME_API_KEY",
+        "model": "claude-opus-4-5",
+        "url": "https://api.anthropic.com",
+        "key": "ANTHROPIC_API_KEY",
+        "type": "anthropic_messages",
+    },
+    # deepseek
+    "deepseek-chat": {
+        "model": "deepseek-chat",
+        "url": "https://api.deepseek.com/v1",
+        "key": "DEEPSEEK_API_KEY",
+        "type": "openai_chat_completions",
+    },
+    "deepseek-reasoner": {
+        "model": "deepseek-reasoner",
+        "url": "https://api.deepseek.com/v1",
+        "key": "DEEPSEEK_API_KEY",
+        "type": "openai_chat_completions",
+    },
+    # deepseek (Anthropic-compatible)
+    "deepseek-chat-anth": {
+        "model": "deepseek-chat",
+        "url": "https://api.deepseek.com/anthropic",
+        "key": "DEEPSEEK_API_KEY",
+        "type": "anthropic_messages",
+    },
+    "deepseek-reasoner-anth": {
+        "model": "deepseek-reasoner",
+        "url": "https://api.deepseek.com/anthropic",
+        "key": "DEEPSEEK_API_KEY",
+        "type": "anthropic_messages",
     },
     # google
     "gemini-2.5-flash": {
         "model": "google/gemini-2.5-flash",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "gemini-2.5-pro": {
         "model": "google/gemini-2.5-pro",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "gemini-3-flash": {
         "model": "google/gemini-3-flash",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "gemini-3-pro": {
         "model": "google/gemini-3-pro-preview",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "gemini-3-pro-exp": {
         "model": "google/gemini-3-pro-preview",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     # qwen
     "qwen3-30b-i": {
         "model": "qwen/qwen3-30b-a3b-instruct-2507",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "qwen3-30b-t": {
         "model": "qwen/qwen3-30b-a3b-thinking-2507",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "qwen3-235b-i": {
         "model": "qwen/qwen3-235b-a22b-instruct-2507",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "qwen3-235b-t": {
         "model": "qwen/qwen3-235b-a22b-thinking-2507",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "qwen3-vl-30b-i": {
         "model": "qwen/qwen3-vl-30b-a3b-instruct",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "qwen3-vl-30b-t": {
         "model": "qwen/qwen3-vl-30b-a3b-thinking",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "qwen3-vl-235b-i": {
         "model": "qwen/qwen3-vl-235b-a22b-instruct",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "qwen3-vl-235b-t": {
         "model": "qwen/qwen3-vl-235b-a22b-thinking",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     # moonshot
     "kimi-k2": {
         "model": "moonshotai/kimi-k2-0905",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "kimi-k2-t": {
         "model": "moonshotai/kimi-k2-thinking",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     # openai
     "gpt-oss-120b": {
         "model": "openai/gpt-oss-120b",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "gpt-oss-20b": {
         "model": "openai/gpt-oss-20b",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "gpt-4.1-nano": {
         "model": "gpt-4.1-nano",
         "url": "https://api.openai.com/v1",
         "key": "OPENAI_API_KEY",
+        "type": "openai_chat_completions",
     },
     "gpt-4.1-mini": {
         "model": "gpt-4.1-mini",
         "url": "https://api.openai.com/v1",
         "key": "OPENAI_API_KEY",
+        "type": "openai_chat_completions",
     },
     "gpt-4.1": {
         "model": "gpt-4.1",
         "url": "https://api.openai.com/v1",
         "key": "OPENAI_API_KEY",
+        "type": "openai_chat_completions",
     },
     "gpt-5-nano": {
         "model": "gpt-5-nano",
         "url": "https://api.openai.com/v1",
         "key": "OPENAI_API_KEY",
+        "type": "openai_chat_completions",
     },
     "gpt-5-mini": {
         "model": "gpt-5-mini",
         "url": "https://api.openai.com/v1",
         "key": "OPENAI_API_KEY",
+        "type": "openai_chat_completions",
     },
     "gpt-5": {
         "model": "gpt-5",
         "url": "https://api.openai.com/v1",
         "key": "OPENAI_API_KEY",
+        "type": "openai_chat_completions",
     },
     "gpt-5.1": {
         "model": "gpt-5.1",
         "url": "https://api.openai.com/v1",
         "key": "OPENAI_API_KEY",
+        "type": "openai_chat_completions",
     },
     "gpt-5.2": {
         "model": "gpt-5.2",
         "url": "https://api.openai.com/v1",
         "key": "OPENAI_API_KEY",
+        "type": "openai_chat_completions",
     },
     # z-ai
     "glm-4.5": {
         "model": "z-ai/glm-4.5",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "glm-4.5-air": {
         "model": "z-ai/glm-4.5-air",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "glm-4.6": {
         "model": "z-ai/glm-4.6",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
     "glm-4.7": {
         "model": "z-ai/glm-4.7",
         "url": "https://api.pinference.ai/api/v1",
         "key": "PRIME_API_KEY",
+        "type": "openai_chat_completions",
     },
 }
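
Note on the `configs/endpoints.py` change: every entry now carries an explicit `"type"`, so a caller can pick a client implementation from the registry alone. The sketch below illustrates one way an alias might be expanded into connection settings. The names `ENDPOINTS`, `resolve_endpoint`, and `DEFAULT_CLIENT_TYPE` are assumptions for illustration and are not taken from the commit, and falling back to `openai_chat_completions` when `"type"` is missing is likewise an assumption borrowed from the documented TOML default.

```python
import os

# Two entries copied from the diff; the dict name ENDPOINTS is an assumption.
ENDPOINTS = {
    "haiku": {
        "model": "claude-haiku-4-5",
        "url": "https://api.anthropic.com",
        "key": "ANTHROPIC_API_KEY",
        "type": "anthropic_messages",
    },
    "gpt-4.1-mini": {
        "model": "gpt-4.1-mini",
        "url": "https://api.openai.com/v1",
        "key": "OPENAI_API_KEY",
        "type": "openai_chat_completions",
    },
}

# Assumed fallback for entries that omit "type".
DEFAULT_CLIENT_TYPE = "openai_chat_completions"


def resolve_endpoint(alias, registry=ENDPOINTS, env=None):
    """Hypothetical helper: expand an alias into concrete connection settings."""
    env = os.environ if env is None else env
    entry = registry[alias]  # raises KeyError for an unknown alias
    return {
        "model": entry["model"],
        "base_url": entry["url"],
        # "key" names an environment variable; the result is None if it is unset.
        "api_key": env.get(entry["key"]),
        "client_type": entry.get("type", DEFAULT_CLIENT_TYPE),
    }
```

For example, `resolve_endpoint("haiku", env={"ANTHROPIC_API_KEY": "sk-test"})` yields settings with `client_type` set to `"anthropic_messages"`, which a dispatcher could then map to the matching client class.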

docs/development.md

Lines changed: 3 additions & 4 deletions
@@ -146,13 +146,12 @@ class TestFeature:
 
 ### Using Mocks
 
-The test suite provides mock OpenAI clients:
+The test suite provides a `MockClient` in `conftest.py` that implements the `Client` interface:
 
 ```python
-from tests.mock_openai_client import MockOpenAIClient
-
 def test_with_mock(mock_client):
-    env = vf.SingleTurnEnv(client=mock_client)
+    mock_client.set_default_responses(chat_response="test answer")
+    env = vf.SingleTurnEnv(client=mock_client, model="test", ...)
     # Test without real API calls
 ```
 
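
Note on the `docs/development.md` change: the updated snippet configures a canned reply on the mock before handing it to the environment. The following is a minimal, self-contained sketch of that pattern only. `SketchMockClient`, its `get_response` method, and `run_once` are hypothetical names invented here; the real `MockClient` and `Client` interface live in the repository and may look quite different. Only the `set_default_responses(chat_response=...)` call is taken from the diff, and its full signature is assumed.

```python
import asyncio


class SketchMockClient:
    """Hypothetical stand-in; not the MockClient from the test suite's conftest.py."""

    def __init__(self):
        self.chat_response = "default"
        self.calls = []  # recorded requests, so tests can assert on them

    def set_default_responses(self, chat_response):
        # Method name mirrors the docs snippet; the real signature is assumed.
        self.chat_response = chat_response

    async def get_response(self, prompt, model):
        # Hypothetical method: return the canned text instead of calling an API.
        self.calls.append({"prompt": prompt, "model": model})
        return self.chat_response


def run_once(client, prompt, model="test"):
    """Drive the async client from synchronous test code."""
    return asyncio.run(client.get_response(prompt, model))
```

A test would then set the canned reply, call `run_once(client, "hi")`, and assert on both the returned text and `client.calls`, with no network access involved.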

docs/evaluation.md

Lines changed: 11 additions & 1 deletion
@@ -17,7 +17,7 @@ This section explains how to run evaluations with Verifiers environments. See [E
 - [TOML Configuration](#toml-configuration)
 - [Configuration Precedence](#configuration-precedence)
 
-Use `prime eval` to execute rollouts against any OpenAI-compatible model and report aggregate metrics.
+Use `prime eval` to execute rollouts against any supported model provider and report aggregate metrics. Supported providers include OpenAI-compatible APIs (the default) and the Anthropic Messages API (via `--api-client-type anthropic_messages`).
 
 ## Basic Usage
 
@@ -66,6 +66,7 @@ prime eval run my-env -x '{"max_turns": 20}'
 | `--model` | `-m` | `openai/gpt-4.1-mini` | Model name or endpoint alias |
 | `--api-base-url` | `-b` | `https://api.pinference.ai/api/v1` | API base URL |
 | `--api-key-var` | `-k` | `PRIME_API_KEY` | Environment variable containing API key |
+| `--api-client-type` || `openai_chat_completions` | Client type: `openai_chat_completions`, `openai_completions`, `openai_chat_completions_token`, or `anthropic_messages` |
 | `--endpoints-path` | `-e` | `./configs/endpoints.toml` | Path to TOML endpoints registry |
 | `--header` ||| Extra HTTP header (`Name: Value`), repeatable |
 
@@ -83,8 +84,17 @@ endpoint_id = "qwen3-235b-i"
 model = "qwen/qwen3-235b-a22b-instruct-2507"
 url = "https://api.pinference.ai/api/v1"
 key = "PRIME_API_KEY"
+
+[[endpoint]]
+endpoint_id = "claude-sonnet"
+model = "claude-sonnet-4-5-20250929"
+url = "https://api.anthropic.com"
+key = "ANTHROPIC_API_KEY"
+api_client_type = "anthropic_messages"
 ```
 
+Each endpoint entry supports an optional `api_client_type` field to select the client implementation (defaults to `"openai_chat_completions"`). Use `"anthropic_messages"` for Anthropic models when calling the Anthropic API directly.
+
 To define equivalent replicas, add multiple `[[endpoint]]` entries with the same `endpoint_id`.
 
 Then use the alias directly:
