Commit 3492708

lyzgeorge and claude committed

docs: document reasoning and thinking translation, add handler tests

Add a Reasoning & Extended Thinking section to the README, highlight the feature in the intro and features list, and cover the capability gating with new handler tests for the Anthropic /v1/messages surface and additional cases for /v1/chat/completions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 3a65946 · commit 3492708

4 files changed: 474 additions & 1 deletion

File tree

README.md

Lines changed: 53 additions & 0 deletions
@@ -1,5 +1,7 @@
# Copilot API Proxy

**One Copilot subscription. Every frontier reasoning model. OpenAI and Anthropic shaped.** Point Claude Code, Cline, or your own scripts at a single localhost URL and unlock Claude Sonnet 4.6, GPT-5, Gemini, and friends — with real reasoning traces and thinking budgets routed to whichever knob the upstream model actually supports.

> [!WARNING]
> This is a reverse-engineered proxy of the GitHub Copilot API. It is not supported by GitHub and may break unexpectedly. Use at your own risk.

@@ -32,6 +34,7 @@ A reverse-engineered proxy for the GitHub Copilot API that exposes it as an Open
## Features

- **OpenAI & Anthropic Compatibility**: Exposes GitHub Copilot as an OpenAI-compatible (`/v1/chat/completions`, `/v1/models`, `/v1/embeddings`) and Anthropic-compatible (`/v1/messages`) API.
- **Reasoning & Extended Thinking**: Capability-aware translation of `reasoning_effort` and Anthropic `thinking` blocks. Thinking traces, signatures, and `reasoning_opaque` tokens flow through both non-streaming and streaming responses without you having to know which upstream flag each model wants.
- **Claude Code Integration**: Easily configure and launch [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) to use Copilot as its backend with a simple command-line flag (`--claude-code`).
- **Usage Dashboard**: A web-based dashboard to monitor your Copilot API usage, view quotas, and see detailed statistics.
- **Rate Limit Control**: Manage API usage with rate-limiting options (`--rate-limit`) and a waiting mechanism (`--wait`) to prevent errors from rapid requests.

@@ -278,6 +281,56 @@ The dashboard provides a user-friendly interface to view your Copilot usage data
- **URL-based Configuration**: You can also specify the API endpoint directly in the URL using a query parameter. This is useful for bookmarks or sharing links. For example:
  `https://ericc-ch.github.io/copilot-api?endpoint=http://your-api-server/usage`

## Reasoning & Extended Thinking

Each Copilot model advertises its own reasoning knobs under `capabilities.supports`. The proxy reads them at startup and translates requests accordingly, so the same client call works across Claude, GPT, Gemini, and friends.
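The capability gating described above can be sketched as follows. This is a simplified TypeScript illustration, not the proxy's actual code; the `Supports` shape and the `gateReasoning` helper are hypothetical names:

```typescript
// Simplified sketch of capability-gated reasoning translation.
// `Supports` mirrors the `capabilities.supports` shape described above.
interface Supports {
  reasoning_effort?: string[]
  adaptive_thinking?: boolean
}

function gateReasoning(
  supports: Supports,
  reasoningEffort?: string,
  thinkingBudget?: number,
) {
  const hasEffort = (supports.reasoning_effort?.length ?? 0) > 0
  return {
    // passed through only when the model lists supported effort levels
    reasoningEffort: hasEffort ? reasoningEffort : undefined,
    // passed through only for adaptive-thinking models
    thinkingBudget: supports.adaptive_thinking ? thinkingBudget : undefined,
  }
}
```

The same client request can therefore be sent to any model; fields the model does not advertise simply never reach the upstream API.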
### OpenAI-shaped requests (`/v1/chat/completions`)

- `reasoning_effort` (`low` | `medium` | `high`, plus `minimal` for the GPT-5 family) is passed through to any model whose `supports.reasoning_effort` is non-empty. Other models get it stripped.
- `thinking_budget` is passed through only when the model advertises `supports.adaptive_thinking` (currently Claude Sonnet 4.5+/4.6, Opus 4.6). Unsupported models silently drop it.
- Claude reasoning responses surface as `reasoning_text` and `reasoning_opaque` on the assistant message.

```sh
# GPT-5 mini with heavy reasoning
curl http://localhost:4141/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5-mini",
    "reasoning_effort": "high",
    "messages": [{"role": "user", "content": "Think carefully: what is 17*23?"}]
  }'

# Claude Sonnet 4.6 with an explicit thinking budget
curl http://localhost:4141/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "reasoning_effort": "high",
    "thinking_budget": 2048,
    "messages": [{"role": "user", "content": "Think carefully: what is 17*23?"}]
  }'
```
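For a Claude model, the assistant message in a non-streaming response then carries the reasoning fields described above. A rough illustration (field values are made up and the response is truncated to the relevant parts):

```json
{
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "17 * 23 = 391.",
        "reasoning_text": "17 * 23 = 17 * 20 + 17 * 3 = 340 + 51 = 391.",
        "reasoning_opaque": "<opaque signature token>"
      }
    }
  ]
}
```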
### Anthropic-shaped requests (`/v1/messages`)

- `thinking: {"type": "enabled", "budget_tokens": N}` is translated into `reasoning_effort: "high"` for any reasoning-capable model, plus `thinking_budget` for adaptive-thinking models.
- `thinking: {"type": "disabled"}` suppresses both fields upstream.
- If the selected model supports neither knob, the thinking config is silently stripped and logged at debug level — the request still succeeds.
- Claude thinking streams emit `content_block_start` / `thinking_delta` / `signature_delta` / `content_block_stop` events before the text block, so Claude Code and similar clients see native thinking UIs.

```sh
# Extended thinking via the Anthropic surface
curl http://localhost:4141/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "max_tokens": 1024,
    "thinking": {"type": "enabled", "budget_tokens": 2048},
    "messages": [{"role": "user", "content": "Think carefully: what is 17*23?"}]
  }'
```
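The translation rules above can be sketched in TypeScript. This is a simplified illustration; `translateThinking` and its types are hypothetical names, not the proxy's actual helpers:

```typescript
// Simplified sketch of the Anthropic `thinking` → upstream field
// translation described above. Names and types are illustrative.
type Thinking =
  | { type: "enabled"; budget_tokens: number }
  | { type: "disabled" }

interface Supports {
  reasoning_effort?: string[]
  adaptive_thinking?: boolean
}

function translateThinking(thinking: Thinking | undefined, supports: Supports) {
  // disabled (or absent) thinking suppresses both upstream fields
  if (!thinking || thinking.type === "disabled") return {}
  const reasoningCapable = (supports.reasoning_effort?.length ?? 0) > 0
  return {
    // any reasoning-capable model gets effort pinned to "high"
    reasoningEffort: reasoningCapable ? "high" : undefined,
    // only adaptive-thinking models keep the explicit token budget
    thinkingBudget: supports.adaptive_thinking
      ? thinking.budget_tokens
      : undefined,
  }
}
```

Note that a model supporting neither knob falls through to an effectively empty context, matching the "silently stripped" behavior above.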
## Using with Claude Code

This proxy can be used to power [Claude Code](https://docs.anthropic.com/en/claude-code), an experimental conversational AI assistant for developers from Anthropic.

tests/anthropic-request.test.ts

Lines changed: 120 additions & 1 deletion
@@ -3,8 +3,36 @@ import { z } from "zod"

import type { AnthropicMessagesPayload } from "~/routes/messages/anthropic-types"

import type { Model } from "../src/services/copilot/get-models"

import { translateToOpenAI } from "../src/routes/messages/non-stream-translation"
import {
  buildAnthropicReasoningContext,
  buildOpenAIReasoningContext,
} from "../src/routes/reasoning-context"

function makeModel(
  id: string,
  supports: Model["capabilities"]["supports"],
): Model {
  return {
    id,
    model_picker_enabled: true,
    name: id,
    object: "model",
    preview: false,
    vendor: "test",
    version: "1",
    capabilities: {
      family: id,
      limits: {},
      object: "model_capabilities",
      supports,
      tokenizer: "test",
      type: "chat",
    },
  }
}

const disabledReasoningContext = {
  reasoningEffort: undefined,
@@ -364,6 +392,97 @@ describe("reasoning context helpers", () => {
  })
})

test("reasoning_effort-only model gets reasoning_effort but no thinking_budget", () => {
  expect(
    buildAnthropicReasoningContext(
      {
        model: "gpt-5-mini",
        messages: [],
        max_tokens: 1024,
        thinking: { type: "enabled", budget_tokens: 2048 },
      },
      makeModel("gpt-5-mini", {
        reasoning_effort: ["low", "medium", "high"],
      }),
    ),
  ).toEqual({
    reasoningEffort: "high",
    thinkingBudget: undefined,
  })
})

test("disabled thinking returns an empty context regardless of capability", () => {
  expect(
    buildAnthropicReasoningContext(
      {
        model: "claude-sonnet-4.6",
        messages: [],
        max_tokens: 1024,
        thinking: { type: "disabled" },
      },
      makeModel("claude-sonnet-4.6", {
        adaptive_thinking: true,
        reasoning_effort: ["low", "medium", "high"],
      }),
    ),
  ).toEqual({})
})

test("buildOpenAIReasoningContext keeps supported fields and drops unsupported ones", () => {
  const claudeModel = makeModel("claude-sonnet-4.6", {
    adaptive_thinking: true,
    reasoning_effort: ["low", "medium", "high"],
  })
  expect(
    buildOpenAIReasoningContext(
      {
        model: "claude-sonnet-4.6",
        messages: [],
        reasoning_effort: "high",
        thinking_budget: 2048,
      },
      claudeModel,
    ),
  ).toEqual({
    reasoningEffort: "high",
    thinkingBudget: 2048,
  })

  const gptModel = makeModel("gpt-5-mini", {
    reasoning_effort: ["low", "medium", "high"],
  })
  expect(
    buildOpenAIReasoningContext(
      {
        model: "gpt-5-mini",
        messages: [],
        reasoning_effort: "high",
        thinking_budget: 2048,
      },
      gptModel,
    ),
  ).toEqual({
    reasoningEffort: "high",
    thinkingBudget: undefined,
  })

  const plainModel = makeModel("gpt-4o", {})
  expect(
    buildOpenAIReasoningContext(
      {
        model: "gpt-4o",
        messages: [],
        reasoning_effort: "high",
        thinking_budget: 2048,
      },
      plainModel,
    ),
  ).toEqual({
    reasoningEffort: undefined,
    thinkingBudget: undefined,
  })
})

test("unsupported model does not expose Anthropic adaptive thinking fields", () => {
  expect(
    buildAnthropicReasoningContext(

tests/chat-completions-handler.test.ts

Lines changed: 88 additions & 0 deletions
@@ -200,4 +200,92 @@ describe("handleCompletion reasoning normalization", () => {
      "gpt-adaptive",
    )
  })

  test("reasoning_effort-only model keeps reasoning_effort and drops thinking_budget", async () => {
    state.models = {
      object: "list",
      data: [
        {
          id: "gpt-reasoning",
          name: "GPT Reasoning",
          object: "model",
          model_picker_enabled: true,
          preview: false,
          vendor: "openai",
          version: "1",
          capabilities: {
            family: "gpt",
            object: "model_capabilities",
            tokenizer: "gpt",
            type: "chat",
            supports: {
              reasoning_effort: ["low", "medium", "high"],
            },
            limits: {
              max_output_tokens: 4096,
            },
          },
        },
      ],
    }

    const payload = {
      messages: [{ role: "user", content: "hello" }],
      model: "gpt-reasoning",
      reasoning_effort: "high",
      thinking_budget: 2048,
    } satisfies ChatCompletionsPayload

    await handleCompletion(createContext(payload))

    expect(fetchMock).toHaveBeenCalledTimes(1)
    const body = getLastRequestBody()
    expect(body.reasoning_effort).toBe("high")
    expect(body.thinking_budget).toBeUndefined()
    expect(debugMock).toHaveBeenCalledWith(
      "Dropping unsupported OpenAI thinking_budget for model:",
      "gpt-reasoning",
    )
  })

  test("plain model without reasoning capabilities drops both fields", async () => {
    state.models = {
      object: "list",
      data: [
        {
          id: "gpt-4o",
          name: "GPT-4o",
          object: "model",
          model_picker_enabled: true,
          preview: false,
          vendor: "openai",
          version: "1",
          capabilities: {
            family: "gpt",
            object: "model_capabilities",
            tokenizer: "gpt",
            type: "chat",
            supports: {},
            limits: {
              max_output_tokens: 4096,
            },
          },
        },
      ],
    }

    const payload = {
      messages: [{ role: "user", content: "hello" }],
      model: "gpt-4o",
      reasoning_effort: "high",
      thinking_budget: 2048,
    } satisfies ChatCompletionsPayload

    await handleCompletion(createContext(payload))

    expect(fetchMock).toHaveBeenCalledTimes(1)
    const body = getLastRequestBody()
    expect(body.reasoning_effort).toBeUndefined()
    expect(body.thinking_budget).toBeUndefined()
  })
})
