[OAI] Allow forcing Responses API for non-gpt-5 model names (#190)
## Summary
* A per-call `use_responses_api` (py) / `useResponsesApi` (js) flag forces
  the Responses API. Routing becomes
  `isGPT5Model(model) || useResponsesApi`; the flag is stripped before the
  request is sent.
* Motivation: internal proxies may rewrite the model name for routing
  (e.g. by adding a service-tier prefix), so a model that *requires* the
  Responses API can arrive under a name that doesn't start with `gpt-5`.
  The name check then sends it to Chat Completions, where it fails with no
  way to override. This flag lets such a model work regardless of its name.
* Per-call, not global: the model is chosen per call, so a global switch
  can't say "this model yes, that model no". This keeps the flag next to
  `model`, like `temperature`/`maxTokens`.
* Also fixes a Responses API bug found while testing: `reasoning_effort`
  was sent top-level (the API wants `reasoning.effort`), so any reasoning
  call routed to Responses returned a 400.
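
The routing rule and the `reasoning.effort` fix described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this change: `is_gpt5_model` and `route_request` are hypothetical names, the name check is assumed to be a plain `gpt-5` prefix test, and any other parameter translation between the two APIs is left out.

```python
from typing import Any


def is_gpt5_model(model: str) -> bool:
    """Hypothetical stand-in for the name check (assumed: a `gpt-5` prefix test)."""
    return model.startswith("gpt-5")


def route_request(params: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Pick an endpoint path and build the request body without mutating `params`."""
    body = dict(params)
    # The flag only affects routing, so it is stripped before the request is built.
    force_responses = bool(body.pop("use_responses_api", False))
    if not (is_gpt5_model(body.get("model", "")) or force_responses):
        return "/chat/completions", body
    # The Responses API wants reasoning.effort, not a top-level reasoning_effort.
    effort = body.pop("reasoning_effort", None)
    if effort is not None:
        body["reasoning"] = {"effort": effort}
    return "/responses", body


if __name__ == "__main__":
    # "tier/gpt-5-x" is a made-up example of a proxy-rewritten model name.
    print(route_request({"model": "gpt-4.1"}))
    print(route_request({"model": "tier/gpt-5-x", "use_responses_api": True,
                         "reasoning_effort": "medium"}))
```

Keeping the flag in the same dict as `model` mirrors the per-call design: each call decides its own routing, and nothing global has to know which model names a proxy may have rewritten.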
## Test plan
* [x] unit tests (js + py, incl. built-in named scorers and
reasoning.effort)
* [x] manual smoke test — scratch scripts below, each runs a scorer 3
ways and prints the endpoint hit:
```bash
OPENAI_API_KEY=sk-... [OPENAI_BASE_URL=https://us.api.openai.com/v1] python test.py
OPENAI_API_KEY=sk-... [OPENAI_BASE_URL=https://us.api.openai.com/v1] node test.mjs # after `pnpm run build`
```
<details><summary><code>test.py</code></summary>
```python
"""Scratch check: gpt-4.1 supports both Chat Completions and Responses APIs.

Run with OPENAI_API_KEY set. The request hook prints which endpoint each call hits.
If your org is region-pinned, also set OPENAI_BASE_URL (e.g. https://us.api.openai.com/v1):

    OPENAI_API_KEY=sk-... OPENAI_BASE_URL=https://us.api.openai.com/v1 python test.py
"""

import os

import httpx
from openai import OpenAI

from autoevals import Factuality, LLMClassifier, init

# Route every scorer call through a client whose request hook logs the endpoint path.
init(
    OpenAI(
        base_url=os.environ.get("OPENAI_BASE_URL"),  # None → SDK default (api.openai.com)
        http_client=httpx.Client(
            event_hooks={"request": [lambda r: print(" request →", r.url.path)]}
        ),
    )
)

data = dict(output="6", expected="6", input="Add the numbers 1, 2, 3")

print("gpt-4.1 (default → expect /chat/completions):")
print(" score =", Factuality(model="gpt-4.1").eval(**data).score)

print("gpt-4.1 + use_responses_api=True (→ expect /responses):")
print(" score =", Factuality(model="gpt-4.1", use_responses_api=True).eval(**data).score)

# Built-in named scorers don't forward reasoning_effort yet, so use LLMClassifier here.
print("gpt-5.4 + medium reasoning (gpt-5 family → expect /responses):")
clf = LLMClassifier(
    name="match",
    prompt_template="Is the submission {{output}} equal to {{expected}}? Answer Y or N.",
    choice_scores={"Y": 1, "N": 0},
    model="gpt-5.4",
    reasoning_effort="medium",
)
print(" score =", clf.eval(**data).score)
```
</details>
<details><summary><code>test.mjs</code></summary>
```js
// Scratch check: gpt-4.1 supports both Chat Completions and Responses APIs.
// Run with OPENAI_API_KEY set. The fetch wrapper prints which endpoint each call hits.
// If your org is region-pinned, also set OPENAI_BASE_URL (e.g. https://us.api.openai.com/v1):
//   OPENAI_API_KEY=sk-... OPENAI_BASE_URL=https://us.api.openai.com/v1 node test.mjs
import { OpenAI } from "openai";
import { Factuality, LLMClassifierFromTemplate, init } from "./jsdist/index.mjs";

const client = new OpenAI({
  baseURL: process.env.OPENAI_BASE_URL, // undefined → SDK default (api.openai.com)
  // Log the endpoint path, then delegate to the global fetch.
  fetch: (url, opts) => {
    // `url` may be a string, a URL, or a Request; normalize it to a string.
    const u = url instanceof Request ? url.url : String(url);
    console.log(" request →", new URL(u).pathname);
    return fetch(url, opts);
  },
});
init({ client });

const data = { output: "6", expected: "6", input: "Add the numbers 1, 2, 3" };

console.log("gpt-4.1 (default → expect /chat/completions):");
console.log(" score =", (await Factuality({ ...data, model: "gpt-4.1" })).score);

console.log("gpt-4.1 + useResponsesApi:true (→ expect /responses):");
console.log(
  " score =",
  (await Factuality({ ...data, model: "gpt-4.1", useResponsesApi: true })).score,
);

// Built-in named scorers don't forward reasoningEffort yet, so use LLMClassifierFromTemplate here.
console.log("gpt-5.4 + medium reasoning (gpt-5 family → expect /responses):");
const clf = LLMClassifierFromTemplate({
  name: "match",
  promptTemplate: "Is the submission {{output}} equal to {{expected}}? Answer Y or N.",
  choiceScores: { Y: 1, N: 0 },
  model: "gpt-5.4",
  reasoningEffort: "medium",
});
console.log(" score =", (await clf({ ...data })).score);
```
</details>