diff --git a/README.md b/README.md
index 0d36c13c9..f80bd152a 100644
--- a/README.md
+++ b/README.md
@@ -1,351 +1,110 @@
 # Copilot API Proxy

 > [!WARNING]
-> This is a reverse-engineered proxy of GitHub Copilot API. It is not supported by GitHub, and may break unexpectedly. Use at your own risk.
+> This project proxies GitHub Copilot behind OpenAI-compatible and Anthropic-compatible endpoints. It is unofficial and may break if upstream behavior changes.

-> [!WARNING]
-> **GitHub Security Notice:**
-> Excessive automated or scripted use of Copilot (including rapid or bulk requests, such as via automated tools) may trigger GitHub's abuse-detection systems.
-> You may receive a warning from GitHub Security, and further anomalous activity could result in temporary suspension of your Copilot access.
->
-> GitHub prohibits use of their servers for excessive automated bulk activity or any activity that places undue burden on their infrastructure.
->
-> Please review:
->
-> - [GitHub Acceptable Use Policies](https://docs.github.com/site-policy/acceptable-use-policies/github-acceptable-use-policies#4-spam-and-inauthentic-activity-on-github)
-> - [GitHub Copilot Terms](https://docs.github.com/site-policy/github-terms/github-terms-for-additional-products-and-features#github-copilot)
->
-> Use this proxy responsibly to avoid account restrictions.
-
-[![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/E1E519XS7W)
-
----
+## Overview

-**Note:** If you are using [opencode](https://github.com/sst/opencode), you do not need this project. Opencode supports GitHub Copilot provider out of the box.
+This service exposes GitHub Copilot through a small compatibility layer so it can be used by tools that expect OpenAI- or Anthropic-style APIs.

----
-
-## Project Overview
-
-A reverse-engineered proxy for the GitHub Copilot API that exposes it as an OpenAI and Anthropic compatible service. This allows you to use GitHub Copilot with any tool that supports the OpenAI Chat Completions API or the Anthropic Messages API, including to power [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview).
+Compared with the upstream project, this fork intentionally keeps the README simpler and adds support for the `responses` passthrough endpoint.

 ## Features

-- **OpenAI & Anthropic Compatibility**: Exposes GitHub Copilot as an OpenAI-compatible (`/v1/chat/completions`, `/v1/models`, `/v1/embeddings`) and Anthropic-compatible (`/v1/messages`) API.
-- **Claude Code Integration**: Easily configure and launch [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) to use Copilot as its backend with a simple command-line flag (`--claude-code`).
-- **Usage Dashboard**: A web-based dashboard to monitor your Copilot API usage, view quotas, and see detailed statistics.
-- **Rate Limit Control**: Manage API usage with rate-limiting options (`--rate-limit`) and a waiting mechanism (`--wait`) to prevent errors from rapid requests.
-- **Manual Request Approval**: Manually approve or deny each API request for fine-grained control over usage (`--manual`).
-- **Token Visibility**: Option to display GitHub and Copilot tokens during authentication and refresh for debugging (`--show-token`).
-- **Flexible Authentication**: Authenticate interactively or provide a GitHub token directly, suitable for CI/CD environments.
-- **Support for Different Account Types**: Works with individual, business, and enterprise GitHub Copilot plans.
- -## Demo - -https://github.com/user-attachments/assets/7654b383-669d-4eb9-b23c-06d7aefee8c5 - -## Prerequisites - -- Bun (>= 1.2.x) -- GitHub account with Copilot subscription (individual, business, or enterprise) +- OpenAI-compatible endpoints for chat, models, embeddings, and responses +- Anthropic-compatible messages endpoint +- Optional downstream API key protection from CLI arg or environment variable +- Usage and token inspection endpoints +- Optional rate limit control and manual approval flow +- Support for individual, business, and enterprise Copilot accounts ## Installation -To install dependencies, run: - ```sh bun install ``` -## Using with Docker - -Build image - -```sh -docker build -t copilot-api . -``` - -Run the container - -```sh -# Create a directory on your host to persist the GitHub token and related data -mkdir -p ./copilot-data - -# Run the container with a bind mount to persist the token -# This ensures your authentication survives container restarts - -docker run -p 4141:4141 -v $(pwd)/copilot-data:/root/.local/share/copilot-api copilot-api -``` - -> **Note:** -> The GitHub token and related data will be stored in `copilot-data` on your host. This is mapped to `/root/.local/share/copilot-api` inside the container, ensuring persistence across restarts. - -### Docker with Environment Variables - -You can pass the GitHub token directly to the container using environment variables: - -```sh -# Build with GitHub token -docker build --build-arg GH_TOKEN=your_github_token_here -t copilot-api . - -# Run with GitHub token -docker run -p 4141:4141 -e GH_TOKEN=your_github_token_here copilot-api - -# Run with additional options -docker run -p 4141:4141 -e GH_TOKEN=your_token copilot-api start --verbose --port 4141 -``` - -### Docker Compose Example - -```yaml -version: "3.8" -services: - copilot-api: - build: . - ports: - - "4141:4141" - environment: - - GH_TOKEN=your_github_token_here - restart: unless-stopped -``` - -The Docker image includes: - -- Multi-stage build for optimized image size -- Non-root user for enhanced security -- Health check for container monitoring -- Pinned base image version for reproducible builds - -## Using with npx +## Run -You can run the project directly using npx: +Development: ```sh -npx copilot-api@latest start -``` - -With options: - -```sh -npx copilot-api@latest start --port 8080 +bun run dev ``` -For authentication only: +Production: ```sh -npx copilot-api@latest auth +bun run start ``` -## Command Structure +## Common Commands -Copilot API now uses a subcommand structure with these main commands: +- Build: `bun run build` +- Lint: `bun run lint` +- Test: `bun test` +- Start: `bun run start` -- `start`: Start the Copilot API server. This command will also handle authentication if needed. -- `auth`: Run GitHub authentication flow without starting the server. This is typically used if you need to generate a token for use with the `--github-token` option, especially in non-interactive environments. -- `check-usage`: Show your current GitHub Copilot usage and quota information directly in the terminal (no server required). -- `debug`: Display diagnostic information including version, runtime details, file paths, and authentication status. Useful for troubleshooting and support. +## API Key Protection -## Command Line Options +You can require clients to send an API key for all incoming requests. 
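+
+Once a key is configured (see the priority order below), a client call looks like this minimal TypeScript sketch; the key value and prompt are placeholders:
+
+```ts
+// Hypothetical client call; assumes the proxy was started with
+// `--api-key my-secret-key` on the default port 4141.
+const response = await fetch("http://localhost:4141/v1/chat/completions", {
+  method: "POST",
+  headers: {
+    "content-type": "application/json",
+    // Sending `x-api-key: my-secret-key` instead works as well.
+    authorization: "Bearer my-secret-key",
+  },
+  body: JSON.stringify({
+    model: "gpt-4.1",
+    messages: [{ role: "user", content: "Hello" }],
+  }),
+})
+
+console.log(response.status, await response.json())
+```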
-### Start Command Options +Priority order: -The following command line options are available for the `start` command: +- CLI arg `--api-key` +- env `API_KEY` +- env `COPILOT_API_KEY` +- default empty, meaning disabled -| Option | Description | Default | Alias | -| -------------- | ----------------------------------------------------------------------------- | ---------- | ----- | -| --port | Port to listen on | 4141 | -p | -| --verbose | Enable verbose logging | false | -v | -| --account-type | Account type to use (individual, business, enterprise) | individual | -a | -| --manual | Enable manual request approval | false | none | -| --rate-limit | Rate limit in seconds between requests | none | -r | -| --wait | Wait instead of error when rate limit is hit | false | -w | -| --github-token | Provide GitHub token directly (must be generated using the `auth` subcommand) | none | -g | -| --claude-code | Generate a command to launch Claude Code with Copilot API config | false | -c | -| --show-token | Show GitHub and Copilot tokens on fetch and refresh | false | none | -| --proxy-env | Initialize proxy from environment variables | false | none | +Example: -### Auth Command Options +`bun run start -- --port 3000 --api-key my-secret-key` -| Option | Description | Default | Alias | -| ------------ | ------------------------- | ------- | ----- | -| --verbose | Enable verbose logging | false | -v | -| --show-token | Show GitHub token on auth | false | none | +Then call the API with either header: -### Debug Command Options - -| Option | Description | Default | Alias | -| ------ | ------------------------- | ------- | ----- | -| --json | Output debug info as JSON | false | none | +- `Authorization: Bearer my-secret-key` +- `x-api-key: my-secret-key` ## API Endpoints -The server exposes several endpoints to interact with the Copilot API. It provides OpenAI-compatible endpoints and now also includes support for Anthropic-compatible endpoints, allowing for greater flexibility with different tools and services. - -### OpenAI Compatible Endpoints - -These endpoints mimic the OpenAI API structure. +### OpenAI-compatible -| Endpoint | Method | Description | -| --------------------------- | ------ | --------------------------------------------------------- | -| `POST /v1/chat/completions` | `POST` | Creates a model response for the given chat conversation. | -| `GET /v1/models` | `GET` | Lists the currently available models. | -| `POST /v1/embeddings` | `POST` | Creates an embedding vector representing the input text. | +| Endpoint | Method | Notes | +| --- | --- | --- | +| `/chat/completions` | `POST` | Chat completions passthrough | +| `/v1/chat/completions` | `POST` | Chat completions passthrough | +| `/embeddings` | `POST` | Embeddings passthrough | +| `/v1/embeddings` | `POST` | Embeddings passthrough | +| `/models` | `GET` | Model list | +| `/v1/models` | `GET` | Model list | +| `/responses` | `POST` | Responses passthrough | +| `/v1/responses` | `POST` | Responses passthrough | -### Anthropic Compatible Endpoints +### Anthropic-compatible -These endpoints are designed to be compatible with the Anthropic Messages API. 
+| Endpoint | Method | Notes | +| --- | --- | --- | +| `/v1/messages` | `POST` | Anthropic messages compatibility | +| `/v1/messages/count_tokens` | `POST` | Token counting | -| Endpoint | Method | Description | -| -------------------------------- | ------ | ------------------------------------------------------------ | -| `POST /v1/messages` | `POST` | Creates a model response for a given conversation. | -| `POST /v1/messages/count_tokens` | `POST` | Calculates the number of tokens for a given set of messages. | +### Utility -### Usage Monitoring Endpoints +| Endpoint | Method | Notes | +| --- | --- | --- | +| `/usage` | `GET` | Usage information | +| `/token` | `GET` | Current Copilot token | -New endpoints for monitoring your Copilot usage and quotas. +## `responses` Support -| Endpoint | Method | Description | -| ------------ | ------ | ------------------------------------------------------------ | -| `GET /usage` | `GET` | Get detailed Copilot usage statistics and quota information. | -| `GET /token` | `GET` | Get the current Copilot token being used by the API. | +This fork adds direct passthrough for the OpenAI-style `responses` API: -## Example Usage +- `POST /responses` +- `POST /v1/responses` -Using with npx: - -```sh -# Basic usage with start command -npx copilot-api@latest start - -# Run on custom port with verbose logging -npx copilot-api@latest start --port 8080 --verbose - -# Use with a business plan GitHub account -npx copilot-api@latest start --account-type business - -# Use with an enterprise plan GitHub account -npx copilot-api@latest start --account-type enterprise - -# Enable manual approval for each request -npx copilot-api@latest start --manual - -# Set rate limit to 30 seconds between requests -npx copilot-api@latest start --rate-limit 30 - -# Wait instead of error when rate limit is hit -npx copilot-api@latest start --rate-limit 30 --wait - -# Provide GitHub token directly -npx copilot-api@latest start --github-token ghp_YOUR_TOKEN_HERE - -# Run only the auth flow -npx copilot-api@latest auth - -# Run auth flow with verbose logging -npx copilot-api@latest auth --verbose - -# Show your Copilot usage/quota in the terminal (no server needed) -npx copilot-api@latest check-usage - -# Display debug information for troubleshooting -npx copilot-api@latest debug - -# Display debug information in JSON format -npx copilot-api@latest debug --json - -# Initialize proxy from environment variables (HTTP_PROXY, HTTPS_PROXY, etc.) -npx copilot-api@latest start --proxy-env -``` - -## Using the Usage Viewer - -After starting the server, a URL to the Copilot Usage Dashboard will be displayed in your console. This dashboard is a web interface for monitoring your API usage. - -1. Start the server. For example, using npx: - ```sh - npx copilot-api@latest start - ``` -2. The server will output a URL to the usage viewer. Copy and paste this URL into your browser. It will look something like this: - `https://ericc-ch.github.io/copilot-api?endpoint=http://localhost:4141/usage` - - If you use the `start.bat` script on Windows, this page will open automatically. - -The dashboard provides a user-friendly interface to view your Copilot usage data: - -- **API Endpoint URL**: The dashboard is pre-configured to fetch data from your local server endpoint via the URL query parameter. You can change this URL to point to any other compatible API endpoint. -- **Fetch Data**: Click the "Fetch" button to load or refresh the usage data. The dashboard will automatically fetch data on load. 
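+
+A minimal request sketch (the model and input values are illustrative; add the API key header if one is configured):
+
+```ts
+// Hypothetical call to the responses passthrough; the JSON body follows the
+// OpenAI-style responses shape.
+const response = await fetch("http://localhost:4141/v1/responses", {
+  method: "POST",
+  headers: { "content-type": "application/json" },
+  body: JSON.stringify({
+    model: "gpt-4.1",
+    input: [{ role: "user", content: "Summarize this repository." }],
+  }),
+})
+
+console.log(response.status, await response.json())
+```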
-- **Usage Quotas**: View a summary of your usage quotas for different services like Chat and Completions, displayed with progress bars for a quick overview. -- **Detailed Information**: See the full JSON response from the API for a detailed breakdown of all available usage statistics. -- **URL-based Configuration**: You can also specify the API endpoint directly in the URL using a query parameter. This is useful for bookmarks or sharing links. For example: - `https://ericc-ch.github.io/copilot-api?endpoint=http://your-api-server/usage` - -## Using with Claude Code - -This proxy can be used to power [Claude Code](https://docs.anthropic.com/en/claude-code), an experimental conversational AI assistant for developers from Anthropic. - -There are two ways to configure Claude Code to use this proxy: - -### Interactive Setup with `--claude-code` flag - -To get started, run the `start` command with the `--claude-code` flag: - -```sh -npx copilot-api@latest start --claude-code -``` - -You will be prompted to select a primary model and a "small, fast" model for background tasks. After selecting the models, a command will be copied to your clipboard. This command sets the necessary environment variables for Claude Code to use the proxy. - -Paste and run this command in a new terminal to launch Claude Code. - -### Manual Configuration with `settings.json` - -Alternatively, you can configure Claude Code by creating a `.claude/settings.json` file in your project's root directory. This file should contain the environment variables needed by Claude Code. This way you don't need to run the interactive setup every time. - -Here is an example `.claude/settings.json` file: - -```json -{ - "env": { - "ANTHROPIC_BASE_URL": "http://localhost:4141", - "ANTHROPIC_AUTH_TOKEN": "dummy", - "ANTHROPIC_MODEL": "gpt-4.1", - "ANTHROPIC_DEFAULT_SONNET_MODEL": "gpt-4.1", - "ANTHROPIC_SMALL_FAST_MODEL": "gpt-4.1", - "ANTHROPIC_DEFAULT_HAIKU_MODEL": "gpt-4.1", - "DISABLE_NON_ESSENTIAL_MODEL_CALLS": "1", - "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1" - }, - "permissions": { - "deny": [ - "WebSearch" - ] - } -} -``` - -You can find more options here: [Claude Code settings](https://docs.anthropic.com/en/docs/claude-code/settings#environment-variables) - -You can also read more about IDE integration here: [Add Claude Code to your IDE](https://docs.anthropic.com/en/docs/claude-code/ide-integrations) - -## Running from Source - -The project can be run from source in several ways: - -### Development Mode - -```sh -bun run dev -``` - -### Production Mode - -```sh -bun run start -``` +The server forwards the incoming request body to Copilot's `responses` endpoint and returns the upstream response body, headers, and status code directly. -## Usage Tips +## Notes -- To avoid hitting GitHub Copilot's rate limits, you can use the following flags: - - `--manual`: Enables manual approval for each request, giving you full control over when requests are sent. - - `--rate-limit `: Enforces a minimum time interval between requests. For example, `copilot-api start --rate-limit 30` will ensure there's at least a 30-second gap between requests. - - `--wait`: Use this with `--rate-limit`. It makes the server wait for the cooldown period to end instead of rejecting the request with an error. This is useful for clients that don't automatically retry on rate limit errors. -- If you have a GitHub business or enterprise plan account with Copilot, use the `--account-type` flag (e.g., `--account-type business`). 
See the [official documentation](https://docs.github.com/en/enterprise-cloud@latest/copilot/managing-copilot/managing-github-copilot-in-your-organization/managing-access-to-github-copilot-in-your-organization/managing-github-copilot-access-to-your-organizations-network#configuring-copilot-subscription-based-network-routing-for-your-enterprise-or-organization) for more details. +- Requires Bun and a valid GitHub Copilot subscription +- Authentication and runtime behavior still follow the existing project implementation +- Use responsibly and avoid abusive automated traffic patterns diff --git a/bun.lock b/bun.lock index 20e895e7f..9ece87578 100644 --- a/bun.lock +++ b/bun.lock @@ -1,5 +1,6 @@ { "lockfileVersion": 1, + "configVersion": 0, "workspaces": { "": { "name": "copilot-api", diff --git a/src/lib/api-key.ts b/src/lib/api-key.ts new file mode 100644 index 000000000..2ca0008ed --- /dev/null +++ b/src/lib/api-key.ts @@ -0,0 +1,137 @@ +import type { Context, Next } from "hono" + +import consola from "consola" + +import { state } from "~/lib/state" + +const AUTH_WINDOW_MS = 5 * 60 * 1000 +const AUTH_MAX_FAILURES = 10 +const AUTH_BLOCK_MS = 15 * 60 * 1000 + +function getBearerToken( + authorizationHeader: string | undefined, +): string | undefined { + if (!authorizationHeader) return undefined + + const [scheme, token] = authorizationHeader.trim().split(/\s+/, 2) + if (scheme.toLowerCase() !== "bearer" || !token) return undefined + + return token +} + +function getForwardedIp( + forwardedHeader: string | undefined, +): string | undefined { + if (!forwardedHeader) return undefined + + const match = forwardedHeader.match(/for="?\[?([^;,"]+)/i) + return match?.[1]?.trim() +} + +function getClientAddress(c: Context): string { + const candidates = [ + c.req.header("cf-connecting-ip"), + c.req.header("x-real-ip"), + c.req.header("x-client-ip"), + c.req.header("x-forwarded-for")?.split(",")[0]?.trim(), + getForwardedIp(c.req.header("forwarded")), + c.req.header("fly-client-ip"), + ] + + return candidates.find((value) => value && value.length > 0) ?? 
"unknown" +} + +function getRequestTarget(c: Context): string { + try { + const url = new URL(c.req.url) + return `${c.req.method} ${url.pathname}` + } catch { + return `${c.req.method} unknown` + } +} + +function isClientBlocked(clientAddress: string, now: number): boolean { + const entry = state.authFailures.get(clientAddress) + if (!entry?.blockedUntil) return false + + if (entry.blockedUntil <= now) { + state.authFailures.delete(clientAddress) + return false + } + + return true +} + +function recordAuthFailure(clientAddress: string, now: number): void { + const entry = state.authFailures.get(clientAddress) + + if (!entry || entry.resetAt <= now) { + state.authFailures.set(clientAddress, { + blockedUntil: undefined, + count: 1, + resetAt: now + AUTH_WINDOW_MS, + }) + return + } + + entry.count += 1 + if (entry.count >= AUTH_MAX_FAILURES) { + entry.blockedUntil = now + AUTH_BLOCK_MS + } +} + +function clearAuthFailures(clientAddress: string): void { + state.authFailures.delete(clientAddress) +} + +function rejectUnauthorized() { + return { + error: { + message: "Invalid API key", + type: "authentication_error", + }, + } +} + +export async function safeRequestLogger(c: Context, next: Next) { + const startedAt = Date.now() + const target = getRequestTarget(c) + + consola.info(`<-- ${target}`) + await next() + consola.info(`--> ${target} ${c.res.status} ${Date.now() - startedAt}ms`) +} + +export async function requireApiKey(c: Context, next: Next) { + if (!state.apiKey) { + await next() + return + } + + const now = Date.now() + const clientAddress = getClientAddress(c) + + if (isClientBlocked(clientAddress, now)) { + consola.warn( + `Blocked API key request from ${clientAddress} to ${getRequestTarget(c)}`, + ) + return c.json(rejectUnauthorized(), 429) + } + + const authorization = c.req.header("authorization") + const bearerToken = getBearerToken(authorization) + const xApiKey = c.req.header("x-api-key") + + if (bearerToken === state.apiKey || xApiKey === state.apiKey) { + clearAuthFailures(clientAddress) + await next() + return + } + + recordAuthFailure(clientAddress, now) + consola.warn( + `Rejected API key request from ${clientAddress} to ${getRequestTarget(c)}`, + ) + + return c.json(rejectUnauthorized(), 401) +} diff --git a/src/lib/error.ts b/src/lib/error.ts index c39c22596..02b42fc15 100644 --- a/src/lib/error.ts +++ b/src/lib/error.ts @@ -3,6 +3,8 @@ import type { ContentfulStatusCode } from "hono/utils/http-status" import consola from "consola" +import { buildPassthroughHeaders } from "~/lib/transport" + export class HTTPError extends Error { response: Response @@ -12,8 +14,107 @@ export class HTTPError extends Error { } } +function getJsonErrorBody(errorJson: unknown, fallbackText: string) { + if ( + typeof errorJson === "object" + && errorJson !== null + && "error" in errorJson + && typeof errorJson.error === "object" + && errorJson.error !== null + ) { + return errorJson + } + + return { + error: { + message: fallbackText, + type: "error", + }, + } +} + +function isAnthropicRoute(c: Context): boolean { + return ( + c.req.path === "/messages" + || c.req.path === "/v1/messages" + || c.req.path === "/messages/count_tokens" + || c.req.path === "/v1/messages/count_tokens" + ) +} + +function extractErrorMessage(errorJson: unknown, fallbackText: string): string { + if ( + typeof errorJson === "object" + && errorJson !== null + && "error" in errorJson + && typeof errorJson.error === "object" + && errorJson.error !== null + && "message" in errorJson.error + && typeof 
errorJson.error.message === "string" + ) { + return errorJson.error.message + } + + return fallbackText +} + +function extractErrorType(errorJson: unknown): string { + if ( + typeof errorJson === "object" + && errorJson !== null + && "error" in errorJson + && typeof errorJson.error === "object" + && errorJson.error !== null + && "type" in errorJson.error + && typeof errorJson.error.type === "string" + ) { + return errorJson.error.type + } + + return "api_error" +} + +function getAnthropicErrorBody(errorJson: unknown, fallbackText: string) { + if ( + typeof errorJson === "object" + && errorJson !== null + && "type" in errorJson + && errorJson.type === "error" + && "error" in errorJson + && typeof errorJson.error === "object" + && errorJson.error !== null + ) { + return errorJson + } + + if ( + typeof errorJson === "object" + && errorJson !== null + && "error" in errorJson + && typeof errorJson.error === "object" + && errorJson.error !== null + ) { + return { + type: "error", + error: { + type: extractErrorType(errorJson), + message: extractErrorMessage(errorJson, fallbackText), + }, + } + } + + return { + type: "error", + error: { + type: "api_error", + message: fallbackText, + }, + } +} + export async function forwardError(c: Context, error: unknown) { consola.error("Error occurred:", error) + const surface = isAnthropicRoute(c) ? "anthropic" : "openai" if (error instanceof HTTPError) { const errorText = await error.response.text() @@ -24,24 +125,37 @@ export async function forwardError(c: Context, error: unknown) { errorJson = errorText } consola.error("HTTP error:", errorJson) - return c.json( + + return new Response( + JSON.stringify( + surface === "anthropic" ? + getAnthropicErrorBody(errorJson, errorText) + : getJsonErrorBody(errorJson, errorText), + ), { - error: { - message: errorText, - type: "error", - }, + status: error.response.status as ContentfulStatusCode, + headers: buildPassthroughHeaders(error.response.headers, surface, { + includeContentType: true, + }), }, - error.response.status as ContentfulStatusCode, ) } return c.json( - { - error: { - message: (error as Error).message, + surface === "anthropic" ? 
+ { type: "error", + error: { + type: "api_error", + message: (error as Error).message, + }, + } + : { + error: { + message: (error as Error).message, + type: "error", + }, }, - }, 500, ) } diff --git a/src/lib/models.ts b/src/lib/models.ts new file mode 100644 index 000000000..dc86cdd65 --- /dev/null +++ b/src/lib/models.ts @@ -0,0 +1,135 @@ +import type { Model, ModelsResponse } from "~/services/copilot/get-models" + +import { state } from "~/lib/state" + +const MODEL_CREATED = 0 +const MODEL_CREATED_AT = new Date(MODEL_CREATED).toISOString() + +export interface ResolvedModel { + requestedModel: string + resolvedModel: string + canonicalModel?: Model +} + +interface PublicModelEntry { + id: string + object: "model" + type: string + created: number + created_at: string + owned_by: string + display_name: string + root: string + parent: string | null + canonical_model_id: string + capabilities: Model["capabilities"] & { + supports: Model["capabilities"]["supports"] & { + streaming: boolean + vision: boolean + reasoning?: boolean + } + } +} + +function supportsReasoning(model: Model): boolean { + const candidates = [model.id, model.name, model.capabilities.family].map( + (value) => value.toLowerCase(), + ) + + return candidates.some( + (value) => + value.includes("reasoning") + || /^o\d/.test(value) + || value.includes("claude") + || value.includes("gpt-4.1"), + ) +} + +function getModelsResponse(): ModelsResponse { + if (!state.models) { + throw new Error("Models are not cached") + } + + return state.models +} + +function normalizeModelName(value: string): string { + return value.trim().toLowerCase() +} + +function getModelAliases(model: Model): Array { + return [model.id] +} + +export function resolveModel(modelId: string): ResolvedModel { + const normalized = normalizeModelName(modelId) + const models = getModelsResponse().data + + for (const model of models) { + const aliases = getModelAliases(model) + const matchedAlias = aliases.find( + (alias) => normalizeModelName(alias) === normalized, + ) + + if (matchedAlias) { + return { + requestedModel: modelId, + resolvedModel: model.id, + canonicalModel: model, + } + } + } + + return { + requestedModel: modelId, + resolvedModel: modelId, + canonicalModel: undefined as never, + } +} + +export function getPublicModels(): Array { + const models = getModelsResponse().data + const publicModels = new Map() + + for (const model of models) { + const aliases = getModelAliases(model) + + for (const alias of aliases) { + publicModels.set(alias, { + id: alias, + object: "model", + type: model.capabilities.type, + created: MODEL_CREATED, + created_at: MODEL_CREATED_AT, + owned_by: model.vendor, + display_name: model.name, + root: model.id, + parent: null, + canonical_model_id: model.id, + capabilities: { + ...model.capabilities, + supports: { + ...model.capabilities.supports, + streaming: true, + vision: Boolean(model.capabilities.supports.vision), + reasoning: supportsReasoning(model), + }, + }, + }) + } + } + + return [...publicModels.values()].sort((left, right) => + left.id.localeCompare(right.id), + ) +} + +export function normalizeResolvedModel( + payload: T, +): T & { model: string } { + const resolved = resolveModel(payload.model) + return { + ...payload, + model: resolved.resolvedModel, + } +} diff --git a/src/lib/state.ts b/src/lib/state.ts index 5ba4dc1d1..dc7e1dd34 100644 --- a/src/lib/state.ts +++ b/src/lib/state.ts @@ -3,6 +3,7 @@ import type { ModelsResponse } from "~/services/copilot/get-models" export interface State { githubToken?: 
string copilotToken?: string + apiKey?: string accountType: string models?: ModelsResponse @@ -15,10 +16,16 @@ export interface State { // Rate limiting configuration rateLimitSeconds?: number lastRequestTimestamp?: number + + authFailures: Map< + string, + { count: number; resetAt: number; blockedUntil?: number } + > } export const state: State = { accountType: "individual", + authFailures: new Map(), manualApprove: false, rateLimitWait: false, showToken: false, diff --git a/src/lib/transport.ts b/src/lib/transport.ts new file mode 100644 index 000000000..f30f0af5e --- /dev/null +++ b/src/lib/transport.ts @@ -0,0 +1,69 @@ +const requestIdHeaders = [ + "x-request-id", + "request-id", + "anthropic-request-id", +] as const + +const passthroughResponseHeaders = [ + ...requestIdHeaders, + "x-github-request-id", + "openai-processing-ms", + "anthropic-processing-ms", + "retry-after", +] as const + +export type TransportSurface = "openai" | "anthropic" + +export function getRequestId(headers: Headers): string | null { + for (const headerName of requestIdHeaders) { + const headerValue = headers.get(headerName) + if (headerValue) { + return headerValue + } + } + + return null +} + +export function buildPassthroughHeaders( + source: Headers, + surface: TransportSurface, + options?: { + includeContentType?: boolean + streaming?: boolean + }, +): Headers { + const headers = new Headers() + + for (const headerName of passthroughResponseHeaders) { + const headerValue = source.get(headerName) + if (headerValue) { + headers.set(headerName, headerValue) + } + } + + if (options?.includeContentType) { + const contentType = source.get("content-type") + if (contentType) { + headers.set("content-type", contentType) + } + } + + const requestId = getRequestId(source) + if (requestId) { + if (surface === "anthropic") { + headers.set("request-id", requestId) + headers.set("anthropic-request-id", requestId) + } else { + headers.set("x-request-id", requestId) + headers.set("request-id", requestId) + } + } + + if (options?.streaming) { + headers.set("content-type", "text/event-stream") + headers.set("cache-control", "no-cache") + } + + return headers +} diff --git a/src/routes/chat-completions/handler.ts b/src/routes/chat-completions/handler.ts index 04a5ae9ed..045e03dbd 100644 --- a/src/routes/chat-completions/handler.ts +++ b/src/routes/chat-completions/handler.ts @@ -1,28 +1,44 @@ import type { Context } from "hono" import consola from "consola" -import { streamSSE, type SSEMessage } from "hono/streaming" +import { streamSSE } from "hono/streaming" import { awaitApproval } from "~/lib/approval" +import { resolveModel } from "~/lib/models" import { checkRateLimit } from "~/lib/rate-limit" import { state } from "~/lib/state" import { getTokenCount } from "~/lib/tokenizer" +import { buildPassthroughHeaders } from "~/lib/transport" import { isNullish } from "~/lib/utils" import { createChatCompletions, - type ChatCompletionResponse, type ChatCompletionsPayload, } from "~/services/copilot/create-chat-completions" +import type { ChatCompletionResult } from "./types" + export async function handleCompletion(c: Context) { await checkRateLimit(state) let payload = await c.req.json() + if ( + isNullish(payload.max_tokens) + && !isNullish(payload.max_completion_tokens) + ) { + payload = { + ...payload, + max_tokens: payload.max_completion_tokens, + } + } consola.debug("Request payload:", JSON.stringify(payload).slice(-400)) - // Find the selected model + const resolvedModel = resolveModel(payload.model) + payload = { + 
...payload, + model: resolvedModel.resolvedModel, + } const selectedModel = state.models?.data.find( - (model) => model.id === payload.model, + (model) => model.id === resolvedModel.resolvedModel, ) // Calculate and display token count @@ -50,19 +66,41 @@ export async function handleCompletion(c: Context) { const response = await createChatCompletions(payload) if (isNonStreaming(response)) { - consola.debug("Non-streaming response:", JSON.stringify(response)) - return c.json(response) + consola.debug("Non-streaming response:", JSON.stringify(response.body)) + return new Response(JSON.stringify(response.body), { + status: 200, + headers: buildPassthroughHeaders(response.headers, "openai", { + includeContentType: true, + }), + }) } consola.debug("Streaming response") + const responseHeaders = buildPassthroughHeaders(response.headers, "openai", { + streaming: true, + }) + for (const [key, value] of responseHeaders.entries()) { + c.header(key, value) + } + return streamSSE(c, async (stream) => { - for await (const chunk of response) { + let sawDone = false + + for await (const chunk of response.stream) { consola.debug("Streaming chunk:", JSON.stringify(chunk)) - await stream.writeSSE(chunk as SSEMessage) + if (chunk.data === "[DONE]") { + sawDone = true + } + await stream.writeSSE(chunk) + } + + if (!sawDone) { + await stream.writeSSE({ data: "[DONE]" }) } }) } const isNonStreaming = ( - response: Awaited>, -): response is ChatCompletionResponse => Object.hasOwn(response, "choices") + response: ChatCompletionResult, +): response is Extract => + Object.hasOwn(response, "body") diff --git a/src/routes/chat-completions/types.ts b/src/routes/chat-completions/types.ts new file mode 100644 index 000000000..e28d1c134 --- /dev/null +++ b/src/routes/chat-completions/types.ts @@ -0,0 +1,17 @@ +import type { SSEMessage } from "hono/streaming" + +import type { ChatCompletionResponse } from "~/services/copilot/create-chat-completions" + +export interface ChatCompletionJsonResult { + headers: Headers + body: ChatCompletionResponse +} + +export interface ChatCompletionStreamResult { + headers: Headers + stream: AsyncIterable +} + +export type ChatCompletionResult = + | ChatCompletionJsonResult + | ChatCompletionStreamResult diff --git a/src/routes/messages/anthropic-types.ts b/src/routes/messages/anthropic-types.ts index 881fffcc8..75640711e 100644 --- a/src/routes/messages/anthropic-types.ts +++ b/src/routes/messages/anthropic-types.ts @@ -42,7 +42,7 @@ export interface AnthropicImageBlock { export interface AnthropicToolResultBlock { type: "tool_result" tool_use_id: string - content: string + content: string | Array is_error?: boolean } @@ -56,6 +56,7 @@ export interface AnthropicToolUseBlock { export interface AnthropicThinkingBlock { type: "thinking" thinking: string + signature?: string } export type AnthropicUserContentBlock = @@ -133,7 +134,7 @@ export interface AnthropicContentBlockStartEvent { | (Omit & { input: Record }) - | { type: "thinking"; thinking: string } + | { type: "thinking"; thinking: string; signature?: string } } export interface AnthropicContentBlockDeltaEvent { @@ -196,6 +197,7 @@ export interface AnthropicStreamState { messageStartSent: boolean contentBlockIndex: number contentBlockOpen: boolean + currentContentBlockType?: "thinking" | "text" | "tool_use" toolCalls: { [openAIToolIndex: number]: { id: string diff --git a/src/routes/messages/handler.ts b/src/routes/messages/handler.ts index 85dbf6243..8de6f6c88 100644 --- a/src/routes/messages/handler.ts +++ 
b/src/routes/messages/handler.ts @@ -4,14 +4,17 @@ import consola from "consola" import { streamSSE } from "hono/streaming" import { awaitApproval } from "~/lib/approval" +import { resolveModel } from "~/lib/models" import { checkRateLimit } from "~/lib/rate-limit" import { state } from "~/lib/state" +import { buildPassthroughHeaders } from "~/lib/transport" import { createChatCompletions, type ChatCompletionChunk, - type ChatCompletionResponse, } from "~/services/copilot/create-chat-completions" +import type { ChatCompletionResult } from "../chat-completions/types" + import { type AnthropicMessagesPayload, type AnthropicStreamState, @@ -29,6 +32,8 @@ export async function handleCompletion(c: Context) { consola.debug("Anthropic request payload:", JSON.stringify(anthropicPayload)) const openAIPayload = translateToOpenAI(anthropicPayload) + const resolvedModel = resolveModel(openAIPayload.model) + openAIPayload.model = resolvedModel.resolvedModel consola.debug( "Translated OpenAI request payload:", JSON.stringify(openAIPayload), @@ -43,26 +48,43 @@ export async function handleCompletion(c: Context) { if (isNonStreaming(response)) { consola.debug( "Non-streaming response from Copilot:", - JSON.stringify(response).slice(-400), + JSON.stringify(response.body).slice(-400), ) - const anthropicResponse = translateToAnthropic(response) + const anthropicResponse = translateToAnthropic(response.body) consola.debug( "Translated Anthropic response:", JSON.stringify(anthropicResponse), ) - return c.json(anthropicResponse) + return new Response(JSON.stringify(anthropicResponse), { + status: 200, + headers: buildPassthroughHeaders(response.headers, "anthropic", { + includeContentType: true, + }), + }) } consola.debug("Streaming response from Copilot") + const responseHeaders = buildPassthroughHeaders( + response.headers, + "anthropic", + { + streaming: true, + }, + ) + for (const [key, value] of responseHeaders.entries()) { + c.header(key, value) + } + return streamSSE(c, async (stream) => { const streamState: AnthropicStreamState = { messageStartSent: false, contentBlockIndex: 0, contentBlockOpen: false, + currentContentBlockType: undefined, toolCalls: {}, } - for await (const rawEvent of response) { + for await (const rawEvent of response.stream) { consola.debug("Copilot raw stream event:", JSON.stringify(rawEvent)) if (rawEvent.data === "[DONE]") { break @@ -87,5 +109,6 @@ export async function handleCompletion(c: Context) { } const isNonStreaming = ( - response: Awaited>, -): response is ChatCompletionResponse => Object.hasOwn(response, "choices") + response: ChatCompletionResult, +): response is Extract => + Object.hasOwn(response, "body") diff --git a/src/routes/messages/non-stream-translation.ts b/src/routes/messages/non-stream-translation.ts index dc41e6382..94078659d 100644 --- a/src/routes/messages/non-stream-translation.ts +++ b/src/routes/messages/non-stream-translation.ts @@ -2,7 +2,9 @@ import { type ChatCompletionResponse, type ChatCompletionsPayload, type ContentPart, + type Delta, type Message, + type ResponseMessage, type TextPart, type Tool, type ToolCall, @@ -103,7 +105,7 @@ function handleUserMessage(message: AnthropicUserMessage): Array { newMessages.push({ role: "tool", tool_call_id: block.tool_use_id, - content: mapContent(block.content), + content: mapToolResultContent(block.content), }) } @@ -143,21 +145,23 @@ function handleAssistantMessage( (block): block is AnthropicTextBlock => block.type === "text", ) - const thinkingBlocks = message.content.filter( + const 
primaryThinkingBlock = message.content.find( (block): block is AnthropicThinkingBlock => block.type === "thinking", ) - // Combine text and thinking blocks, as OpenAI doesn't have separate thinking blocks - const allTextContent = [ - ...textBlocks.map((b) => b.text), - ...thinkingBlocks.map((b) => b.thinking), - ].join("\n\n") + const allTextContent = textBlocks.map((b) => b.text).join("\n\n") return toolUseBlocks.length > 0 ? [ { role: "assistant", content: allTextContent || null, + ...(primaryThinkingBlock === undefined ? + {} + : { + reasoning_text: primaryThinkingBlock.thinking, + signature: primaryThinkingBlock.signature, + }), tool_calls: toolUseBlocks.map((toolUse) => ({ id: toolUse.id, type: "function", @@ -171,11 +175,27 @@ function handleAssistantMessage( : [ { role: "assistant", - content: mapContent(message.content), + content: allTextContent || null, + ...(primaryThinkingBlock === undefined ? + {} + : { + reasoning_text: primaryThinkingBlock.thinking, + signature: primaryThinkingBlock.signature, + }), }, ] } +function mapToolResultContent( + content: AnthropicToolResultBlock["content"], +): string | Array | null { + if (typeof content === "string") { + return content + } + + return mapContent(content) +} + function mapContent( content: | string @@ -278,38 +298,82 @@ function translateAnthropicToolChoiceToOpenAI( // Response translation +function getThinkingText( + thinking: Delta | ResponseMessage, +): string | undefined { + if (thinking.cot_summary) { + return thinking.cot_summary + } + if (thinking.reasoning_text) { + return thinking.reasoning_text + } + if (thinking.thinking) { + return thinking.thinking + } + return undefined +} + +function getThinkingId(thinking: Delta | ResponseMessage): string | undefined { + if (thinking.cot_id) { + return thinking.cot_id + } + if (thinking.reasoning_opaque) { + return thinking.reasoning_opaque + } + if (thinking.signature) { + return thinking.signature + } + return undefined +} + +function getAnthropicThinkingBlocks( + message: ResponseMessage, +): Array { + const text = getThinkingText(message) + const id = getThinkingId(message) + + if (!text && !id) { + return [] + } + + return [ + { + type: "thinking", + thinking: text ?? "", + ...(id ? { signature: id } : {}), + }, + ] +} + export function translateToAnthropic( response: ChatCompletionResponse, ): AnthropicResponse { - // Merge content from all choices const allTextBlocks: Array = [] + const allThinkingBlocks: Array = [] const allToolUseBlocks: Array = [] let stopReason: "stop" | "length" | "tool_calls" | "content_filter" | null = - null // default - stopReason = response.choices[0]?.finish_reason ?? stopReason + response.choices[0]?.finish_reason ?? 
null - // Process all choices to extract text and tool use blocks for (const choice of response.choices) { const textBlocks = getAnthropicTextBlocks(choice.message.content) + const thinkingBlocks = getAnthropicThinkingBlocks(choice.message) const toolUseBlocks = getAnthropicToolUseBlocks(choice.message.tool_calls) allTextBlocks.push(...textBlocks) + allThinkingBlocks.push(...thinkingBlocks) allToolUseBlocks.push(...toolUseBlocks) - // Use the finish_reason from the first choice, or prioritize tool_calls - if (choice.finish_reason === "tool_calls" || stopReason === "stop") { + if (choice.finish_reason === "tool_calls") { stopReason = choice.finish_reason } } - // Note: GitHub Copilot doesn't generate thinking blocks, so we don't include them in responses - return { id: response.id, type: "message", role: "assistant", model: response.model, - content: [...allTextBlocks, ...allToolUseBlocks], + content: [...allThinkingBlocks, ...allTextBlocks, ...allToolUseBlocks], stop_reason: mapOpenAIStopReasonToAnthropic(stopReason), stop_sequence: null, usage: { diff --git a/src/routes/messages/stream-translation.ts b/src/routes/messages/stream-translation.ts index 55094448f..52a8b7ed2 100644 --- a/src/routes/messages/stream-translation.ts +++ b/src/routes/messages/stream-translation.ts @@ -6,14 +6,79 @@ import { } from "./anthropic-types" import { mapOpenAIStopReasonToAnthropic } from "./utils" -function isToolBlockOpen(state: AnthropicStreamState): boolean { +function getThinkingText( + thinking: ChatCompletionChunk["choices"][number]["delta"], +): string | undefined { + if (thinking.cot_summary) { + return thinking.cot_summary + } + if (thinking.reasoning_text) { + return thinking.reasoning_text + } + if (thinking.thinking) { + return thinking.thinking + } + return undefined +} + +function getThinkingId( + thinking: ChatCompletionChunk["choices"][number]["delta"], +): string | undefined { + if (thinking.cot_id) { + return thinking.cot_id + } + if (thinking.reasoning_opaque) { + return thinking.reasoning_opaque + } + if (thinking.signature) { + return thinking.signature + } + return undefined +} + +function closeCurrentContentBlock( + events: Array, + state: AnthropicStreamState, +): void { if (!state.contentBlockOpen) { - return false + return + } + + events.push({ + type: "content_block_stop", + index: state.contentBlockIndex, + }) + state.contentBlockIndex++ + state.contentBlockOpen = false + state.currentContentBlockType = undefined +} + +function ensureContentBlock( + events: Array, + state: AnthropicStreamState, + block: + | { type: "text"; text: string } + | { type: "thinking"; thinking: string; signature?: string } + | { + type: "tool_use" + id: string + name: string + input: Record + }, +): void { + if (state.contentBlockOpen && state.currentContentBlockType !== block.type) { + closeCurrentContentBlock(events, state) + } + + if (!state.contentBlockOpen) { + events.push({ + type: "content_block_start", + index: state.contentBlockIndex, + content_block: block, + }) + state.contentBlockOpen = true + state.currentContentBlockType = block.type } - // Check if the current block index corresponds to any known tool call - return Object.values(state.toolCalls).some( - (tc) => tc.anthropicBlockIndex === state.contentBlockIndex, - ) } // eslint-disable-next-line max-lines-per-function, complexity @@ -29,6 +94,8 @@ export function translateChunkToAnthropicEvents( const choice = chunk.choices[0] const { delta } = choice + const thinkingText = getThinkingText(delta) + const thinkingId = getThinkingId(delta) 
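+
+  // Note: a single chunk can carry reasoning fields, visible content, and
+  // tool-call deltas at once. ensureContentBlock() closes the open block
+  // whenever the block type changes, so thinking, text, and tool_use
+  // blocks never overlap.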
if (!state.messageStartSent) { events.push({ @@ -57,28 +124,41 @@ export function translateChunkToAnthropicEvents( state.messageStartSent = true } - if (delta.content) { - if (isToolBlockOpen(state)) { - // A tool block was open, so close it before starting a text block. + if (thinkingText || thinkingId) { + ensureContentBlock(events, state, { + type: "thinking", + thinking: thinkingText ?? "", + ...(thinkingId ? { signature: thinkingId } : {}), + }) + + if (thinkingText) { events.push({ - type: "content_block_stop", + type: "content_block_delta", index: state.contentBlockIndex, + delta: { + type: "thinking_delta", + thinking: thinkingText, + }, }) - state.contentBlockIndex++ - state.contentBlockOpen = false } - if (!state.contentBlockOpen) { + if (thinkingId) { events.push({ - type: "content_block_start", + type: "content_block_delta", index: state.contentBlockIndex, - content_block: { - type: "text", - text: "", + delta: { + type: "signature_delta", + signature: thinkingId, }, }) - state.contentBlockOpen = true } + } + + if (delta.content) { + ensureContentBlock(events, state, { + type: "text", + text: "", + }) events.push({ type: "content_block_delta", @@ -93,15 +173,8 @@ export function translateChunkToAnthropicEvents( if (delta.tool_calls) { for (const toolCall of delta.tool_calls) { if (toolCall.id && toolCall.function?.name) { - // New tool call starting. if (state.contentBlockOpen) { - // Close any previously open block. - events.push({ - type: "content_block_stop", - index: state.contentBlockIndex, - }) - state.contentBlockIndex++ - state.contentBlockOpen = false + closeCurrentContentBlock(events, state) } const anthropicBlockIndex = state.contentBlockIndex @@ -111,17 +184,12 @@ export function translateChunkToAnthropicEvents( anthropicBlockIndex, } - events.push({ - type: "content_block_start", - index: anthropicBlockIndex, - content_block: { - type: "tool_use", - id: toolCall.id, - name: toolCall.function.name, - input: {}, - }, + ensureContentBlock(events, state, { + type: "tool_use", + id: toolCall.id, + name: toolCall.function.name, + input: {}, }) - state.contentBlockOpen = true } if (toolCall.function?.arguments) { @@ -143,13 +211,7 @@ export function translateChunkToAnthropicEvents( } if (choice.finish_reason) { - if (state.contentBlockOpen) { - events.push({ - type: "content_block_stop", - index: state.contentBlockIndex, - }) - state.contentBlockOpen = false - } + closeCurrentContentBlock(events, state) events.push( { diff --git a/src/routes/models/route.ts b/src/routes/models/route.ts index 5254e2af7..12dd1756a 100644 --- a/src/routes/models/route.ts +++ b/src/routes/models/route.ts @@ -1,6 +1,7 @@ import { Hono } from "hono" import { forwardError } from "~/lib/error" +import { getPublicModels } from "~/lib/models" import { state } from "~/lib/state" import { cacheModels } from "~/lib/utils" @@ -13,19 +14,9 @@ modelRoutes.get("/", async (c) => { await cacheModels() } - const models = state.models?.data.map((model) => ({ - id: model.id, - object: "model", - type: "model", - created: 0, // No date available from source - created_at: new Date(0).toISOString(), // No date available from source - owned_by: model.vendor, - display_name: model.name, - })) - return c.json({ object: "list", - data: models, + data: getPublicModels(), has_more: false, }) } catch (error) { diff --git a/src/routes/responses/route.ts b/src/routes/responses/route.ts new file mode 100644 index 000000000..00db65f6b --- /dev/null +++ b/src/routes/responses/route.ts @@ -0,0 +1,29 @@ +import { Hono } from 
"hono" + +import { forwardError } from "~/lib/error" +import { normalizeResolvedModel } from "~/lib/models" +import { buildPassthroughHeaders } from "~/lib/transport" +import { + createResponses, + type ResponsesPayload, +} from "~/services/copilot/create-responses" + +export const responsesRoutes = new Hono() + +responsesRoutes.post("/", async (c) => { + try { + const payload = normalizeResolvedModel( + await c.req.json(), + ) + const response = await createResponses(payload) + + return new Response(response.body, { + status: response.status, + headers: buildPassthroughHeaders(response.headers, "openai", { + includeContentType: true, + }), + }) + } catch (error) { + return await forwardError(c, error) + } +}) diff --git a/src/server.ts b/src/server.ts index 462a278f3..4d76b8187 100644 --- a/src/server.ts +++ b/src/server.ts @@ -1,24 +1,27 @@ import { Hono } from "hono" import { cors } from "hono/cors" -import { logger } from "hono/logger" +import { requireApiKey, safeRequestLogger } from "./lib/api-key" import { completionRoutes } from "./routes/chat-completions/route" import { embeddingRoutes } from "./routes/embeddings/route" import { messageRoutes } from "./routes/messages/route" import { modelRoutes } from "./routes/models/route" +import { responsesRoutes } from "./routes/responses/route" import { tokenRoute } from "./routes/token/route" import { usageRoute } from "./routes/usage/route" export const server = new Hono() -server.use(logger()) +server.use(safeRequestLogger) server.use(cors()) +server.use("*", requireApiKey) server.get("/", (c) => c.text("Server running")) server.route("/chat/completions", completionRoutes) server.route("/models", modelRoutes) server.route("/embeddings", embeddingRoutes) +server.route("/responses", responsesRoutes) server.route("/usage", usageRoute) server.route("/token", tokenRoute) @@ -26,6 +29,7 @@ server.route("/token", tokenRoute) server.route("/v1/chat/completions", completionRoutes) server.route("/v1/models", modelRoutes) server.route("/v1/embeddings", embeddingRoutes) +server.route("/v1/responses", responsesRoutes) // Anthropic compatible endpoints server.route("/v1/messages", messageRoutes) diff --git a/src/services/copilot/create-chat-completions.ts b/src/services/copilot/create-chat-completions.ts index 8534151da..41061f1ad 100644 --- a/src/services/copilot/create-chat-completions.ts +++ b/src/services/copilot/create-chat-completions.ts @@ -1,16 +1,29 @@ import consola from "consola" import { events } from "fetch-event-stream" +import type { ChatCompletionResult } from "~/routes/chat-completions/types" + import { copilotHeaders, copilotBaseUrl } from "~/lib/api-config" import { HTTPError } from "~/lib/error" import { state } from "~/lib/state" export const createChatCompletions = async ( payload: ChatCompletionsPayload, -) => { +): Promise => { if (!state.copilotToken) throw new Error("Copilot token not found") - const enableVision = payload.messages.some( + const normalizedPayload: ChatCompletionsPayload = { + ...payload, + max_tokens: payload.max_tokens ?? payload.max_completion_tokens ?? null, + stream_options: + payload.stream ? { include_usage: true } : payload.stream_options, + messages: payload.messages.map((message) => ({ + ...message, + role: message.role === "developer" ? 
"system" : message.role, + })), + } + + const enableVision = normalizedPayload.messages.some( (x) => typeof x.content !== "string" && x.content?.some((x) => x.type === "image_url"), @@ -18,7 +31,7 @@ export const createChatCompletions = async ( // Agent/user check for X-Initiator header // Determine if any message is from an agent ("assistant" or "tool") - const isAgentCall = payload.messages.some((msg) => + const isAgentCall = normalizedPayload.messages.some((msg) => ["assistant", "tool"].includes(msg.role), ) @@ -31,7 +44,7 @@ export const createChatCompletions = async ( const response = await fetch(`${copilotBaseUrl(state)}/chat/completions`, { method: "POST", headers, - body: JSON.stringify(payload), + body: JSON.stringify(normalizedPayload), }) if (!response.ok) { @@ -39,11 +52,17 @@ export const createChatCompletions = async ( throw new HTTPError("Failed to create chat completions", response) } - if (payload.stream) { - return events(response) + if (normalizedPayload.stream) { + return { + headers: new Headers(response.headers), + stream: events(response), + } } - return (await response.json()) as ChatCompletionResponse + return { + headers: new Headers(response.headers), + body: (await response.json()) as ChatCompletionResponse, + } } // Streaming types @@ -69,8 +88,14 @@ export interface ChatCompletionChunk { } } -interface Delta { +export interface Delta { content?: string | null + reasoning_opaque?: string + reasoning_text?: string + cot_id?: string + cot_summary?: string + thinking?: string + signature?: string role?: "user" | "assistant" | "system" | "tool" tool_calls?: Array<{ index: number @@ -109,9 +134,15 @@ export interface ChatCompletionResponse { } } -interface ResponseMessage { +export interface ResponseMessage { role: "assistant" content: string | null + reasoning_opaque?: string + reasoning_text?: string + cot_id?: string + cot_summary?: string + thinking?: string + signature?: string tool_calls?: Array } @@ -130,9 +161,13 @@ export interface ChatCompletionsPayload { temperature?: number | null top_p?: number | null max_tokens?: number | null + max_completion_tokens?: number | null stop?: string | Array | null n?: number | null stream?: boolean | null + stream_options?: { + include_usage?: boolean + } | null frequency_penalty?: number | null presence_penalty?: number | null @@ -162,6 +197,12 @@ export interface Tool { export interface Message { role: "user" | "assistant" | "system" | "tool" | "developer" content: string | Array | null + reasoning_opaque?: string + reasoning_text?: string + cot_id?: string + cot_summary?: string + thinking?: string + signature?: string name?: string tool_calls?: Array diff --git a/src/services/copilot/create-responses.ts b/src/services/copilot/create-responses.ts new file mode 100644 index 000000000..1884ef9e2 --- /dev/null +++ b/src/services/copilot/create-responses.ts @@ -0,0 +1,110 @@ +import { copilotHeaders, copilotBaseUrl } from "~/lib/api-config" +import { HTTPError } from "~/lib/error" +import { state } from "~/lib/state" + +type ResponsesInputMessageRole = + | "user" + | "assistant" + | "system" + | "developer" + | "tool" + +interface ResponsesInputMessage { + role?: ResponsesInputMessageRole + [key: string]: unknown +} + +interface ResponsesReasoning { + effort?: "low" | "medium" | "high" | "minimal" | null + summary?: "auto" | "concise" | "detailed" | null + [key: string]: unknown +} + +export interface ResponsesPayload { + stream?: boolean | null + input?: unknown + max_tokens?: number | null + max_output_tokens?: number | 
null + truncation?: "auto" | "disabled" | null + include?: Array | null + store?: boolean | null + reasoning?: ResponsesReasoning | null + previous_response_id?: string | null + [key: string]: unknown +} + +const DEFAULT_RESPONSES_INCLUDE = "reasoning.encrypted_content" + +function isInputMessage(value: unknown): value is ResponsesInputMessage { + return typeof value === "object" && value !== null +} + +function normalizeInput(input: unknown): unknown { + if (!Array.isArray(input)) { + return input + } + + return input.map((item): unknown => { + if (!isInputMessage(item) || item.role !== "developer") { + return item + } + + return { + ...item, + role: "system", + } + }) +} + +function getInitiator(input: unknown): "agent" | "user" { + if (!Array.isArray(input)) { + return "user" + } + + return ( + input.some( + (item) => + isInputMessage(item) + && ["assistant", "tool"].includes(item.role ?? ""), + ) + ) ? + "agent" + : "user" +} + +function normalizeResponsesPayload( + payload: ResponsesPayload, +): ResponsesPayload { + const include = new Set(payload.include ?? []) + include.add(DEFAULT_RESPONSES_INCLUDE) + + return { + ...payload, + input: normalizeInput(payload.input), + max_output_tokens: + payload.max_output_tokens ?? payload.max_tokens ?? undefined, + store: payload.store ?? false, + truncation: payload.truncation ?? "disabled", + include: [...include], + } +} + +export const createResponses = async (payload: ResponsesPayload) => { + if (!state.copilotToken) throw new Error("Copilot token not found") + + const normalizedPayload = normalizeResponsesPayload(payload) + const headers: Record = { + ...copilotHeaders(state), + "X-Initiator": getInitiator(normalizedPayload.input), + } + + const response = await fetch(`${copilotBaseUrl(state)}/responses`, { + method: "POST", + headers, + body: JSON.stringify(normalizedPayload), + }) + + if (!response.ok) throw new HTTPError("Failed to create responses", response) + + return response +} diff --git a/src/services/copilot/get-models.ts b/src/services/copilot/get-models.ts index 3cfa30af0..9b7b6da9a 100644 --- a/src/services/copilot/get-models.ts +++ b/src/services/copilot/get-models.ts @@ -28,6 +28,7 @@ interface ModelSupports { tool_calls?: boolean parallel_tool_calls?: boolean dimensions?: boolean + vision?: boolean } interface ModelCapabilities { diff --git a/src/start.ts b/src/start.ts index 14abbbdff..d03b0df9b 100644 --- a/src/start.ts +++ b/src/start.ts @@ -18,6 +18,7 @@ interface RunServerOptions { port: number verbose: boolean accountType: string + apiKey?: string manual: boolean rateLimit?: number rateLimitWait: boolean @@ -38,6 +39,8 @@ export async function runServer(options: RunServerOptions): Promise { } state.accountType = options.accountType + state.apiKey = + options.apiKey ?? process.env.API_KEY ?? process.env.COPILOT_API_KEY ?? "" if (options.accountType !== "individual") { consola.info(`Using ${options.accountType} plan GitHub account`) } @@ -144,6 +147,12 @@ export const start = defineCommand({ default: "individual", description: "Account type to use (individual, business, enterprise)", }, + "api-key": { + alias: "k", + type: "string", + description: + "API key required by downstream clients. 
Falls back to API_KEY or COPILOT_API_KEY env vars", + }, manual: { type: "boolean", default: false, @@ -195,6 +204,7 @@ export const start = defineCommand({ port: Number.parseInt(args.port, 10), verbose: args.verbose, accountType: args["account-type"], + apiKey: args["api-key"], manual: args.manual, rateLimit, rateLimitWait: args.wait, diff --git a/tests/anthropic-request.test.ts b/tests/anthropic-request.test.ts index 06c663778..20648d4f5 100644 --- a/tests/anthropic-request.test.ts +++ b/tests/anthropic-request.test.ts @@ -150,7 +150,8 @@ describe("Anthropic to OpenAI translation logic", () => { const assistantMessage = openAIPayload.messages.find( (m) => m.role === "assistant", ) - expect(assistantMessage?.content).toContain( + expect(assistantMessage?.content).toBe("2+2 equals 4.") + expect(assistantMessage?.reasoning_text).toBe( "Let me think about this simple math problem...", ) expect(assistantMessage?.content).toContain("2+2 equals 4.") @@ -188,7 +189,7 @@ describe("Anthropic to OpenAI translation logic", () => { const assistantMessage = openAIPayload.messages.find( (m) => m.role === "assistant", ) - expect(assistantMessage?.content).toContain( + expect(assistantMessage?.reasoning_text).toContain( "I need to call the weather API", ) expect(assistantMessage?.content).toContain( @@ -197,6 +198,55 @@ describe("Anthropic to OpenAI translation logic", () => { expect(assistantMessage?.tool_calls).toHaveLength(1) expect(assistantMessage?.tool_calls?.[0].function.name).toBe("get_weather") }) + + test("should translate tool_result content blocks into tool messages", () => { + const anthropicPayload: AnthropicMessagesPayload = { + model: "claude-3-5-sonnet-20241022", + messages: [ + { role: "user", content: "Read this image" }, + { + role: "user", + content: [ + { + type: "tool_result", + tool_use_id: "tool_1", + content: [ + { type: "text", text: "found data" }, + { + type: "image", + source: { + type: "base64", + media_type: "image/png", + data: "abc123", + }, + }, + ], + }, + ], + }, + ], + max_tokens: 100, + } + + const openAIPayload = translateToOpenAI(anthropicPayload) + const toolMessage = openAIPayload.messages.find( + (message) => message.role === "tool", + ) + + expect(toolMessage).toEqual({ + role: "tool", + tool_call_id: "tool_1", + content: [ + { type: "text", text: "found data" }, + { + type: "image_url", + image_url: { + url: "data:image/png;base64,abc123", + }, + }, + ], + }) + }) }) describe("OpenAI Chat Completion v1 Request Payload Validation with Zod", () => { diff --git a/tests/anthropic-response.test.ts b/tests/anthropic-response.test.ts index ecd71aacc..9fd6bc10f 100644 --- a/tests/anthropic-response.test.ts +++ b/tests/anthropic-response.test.ts @@ -20,6 +20,12 @@ const anthropicContentBlockTextSchema = z.object({ text: z.string(), }) +const anthropicContentBlockThinkingSchema = z.object({ + type: z.literal("thinking"), + thinking: z.string(), + signature: z.string().optional(), +}) + const anthropicContentBlockToolUseSchema = z.object({ type: z.literal("tool_use"), id: z.string(), @@ -34,6 +40,7 @@ const anthropicMessageResponseSchema = z.object({ content: z.array( z.union([ anthropicContentBlockTextSchema, + anthropicContentBlockThinkingSchema, anthropicContentBlockToolUseSchema, ]), ), @@ -160,6 +167,45 @@ describe("OpenAI to Anthropic Non-Streaming Response Translation", () => { } }) + test("should translate Copilot reasoning fields into thinking blocks", () => { + const openAIResponse: ChatCompletionResponse = { + id: "chatcmpl-reasoning", + object: 
"chat.completion", + created: 1677652288, + model: "gpt-4o-2024-05-13", + choices: [ + { + index: 0, + message: { + role: "assistant", + content: "Answer text", + reasoning_opaque: "copilot-thinking-123", + reasoning_text: "copilot reasoning process", + }, + finish_reason: "stop", + logprobs: null, + }, + ], + usage: { + prompt_tokens: 10, + completion_tokens: 20, + total_tokens: 30, + }, + } + + const anthropicResponse = translateToAnthropic(openAIResponse) + + expect(anthropicResponse.content[0]).toEqual({ + type: "thinking", + thinking: "copilot reasoning process", + signature: "copilot-thinking-123", + }) + expect(anthropicResponse.content[1]).toEqual({ + type: "text", + text: "Answer text", + }) + }) + test("should translate a response stopped due to length", () => { const openAIResponse: ChatCompletionResponse = { id: "chatcmpl-789", @@ -191,6 +237,161 @@ describe("OpenAI to Anthropic Non-Streaming Response Translation", () => { }) }) +describe("OpenAI to Anthropic Streaming Response Translation", () => { + test("should translate stream reasoning fields into thinking deltas", () => { + const openAIStream: Array = [ + { + id: "cmpl-thinking", + object: "chat.completion.chunk", + created: 1677652288, + model: "gpt-4o-2024-05-13", + choices: [ + { + index: 0, + delta: { reasoning_text: "Analy", reasoning_opaque: "cot-1" }, + finish_reason: null, + logprobs: null, + }, + ], + }, + { + id: "cmpl-thinking", + object: "chat.completion.chunk", + created: 1677652288, + model: "gpt-4o-2024-05-13", + choices: [ + { + index: 0, + delta: { reasoning_text: "zing" }, + finish_reason: null, + logprobs: null, + }, + ], + }, + { + id: "cmpl-thinking", + object: "chat.completion.chunk", + created: 1677652288, + model: "gpt-4o-2024-05-13", + choices: [ + { index: 0, delta: {}, finish_reason: "stop", logprobs: null }, + ], + }, + ] + + const streamState: AnthropicStreamState = { + messageStartSent: false, + contentBlockIndex: 0, + contentBlockOpen: false, + toolCalls: {}, + } + const translatedStream = openAIStream.flatMap((chunk) => + translateChunkToAnthropicEvents(chunk, streamState), + ) + + expect(translatedStream).toContainEqual({ + type: "content_block_start", + index: 0, + content_block: { + type: "thinking", + thinking: "Analy", + signature: "cot-1", + }, + }) + expect(translatedStream).toContainEqual({ + type: "content_block_delta", + index: 0, + delta: { + type: "thinking_delta", + thinking: "Analy", + }, + }) + expect(translatedStream).toContainEqual({ + type: "content_block_delta", + index: 0, + delta: { + type: "signature_delta", + signature: "cot-1", + }, + }) + expect(translatedStream).toContainEqual({ + type: "content_block_delta", + index: 0, + delta: { + type: "thinking_delta", + thinking: "zing", + }, + }) + }) + + test("should close thinking blocks before starting text blocks", () => { + const openAIStream: Array = [ + { + id: "cmpl-mixed", + object: "chat.completion.chunk", + created: 1677652288, + model: "gpt-4o-2024-05-13", + choices: [ + { + index: 0, + delta: { reasoning_text: "Plan" }, + finish_reason: null, + logprobs: null, + }, + ], + }, + { + id: "cmpl-mixed", + object: "chat.completion.chunk", + created: 1677652288, + model: "gpt-4o-2024-05-13", + choices: [ + { + index: 0, + delta: { content: "Answer" }, + finish_reason: null, + logprobs: null, + }, + ], + }, + { + id: "cmpl-mixed", + object: "chat.completion.chunk", + created: 1677652288, + model: "gpt-4o-2024-05-13", + choices: [ + { index: 0, delta: {}, finish_reason: "stop", logprobs: null }, + ], + }, + ] + + const 
streamState: AnthropicStreamState = { + messageStartSent: false, + contentBlockIndex: 0, + contentBlockOpen: false, + currentContentBlockType: undefined, + toolCalls: {}, + } + + const translatedStream = openAIStream.flatMap((chunk) => + translateChunkToAnthropicEvents(chunk, streamState), + ) + + const thinkingStopIndex = translatedStream.findIndex( + (event) => event.type === "content_block_stop" && event.index === 0, + ) + const textStartIndex = translatedStream.findIndex( + (event) => + event.type === "content_block_start" + && event.index === 1 + && event.content_block.type === "text", + ) + + expect(thinkingStopIndex).toBeGreaterThan(-1) + expect(textStartIndex).toBeGreaterThan(thinkingStopIndex) + }) +}) + describe("OpenAI to Anthropic Streaming Response Translation", () => { test("should translate a simple text stream correctly", () => { const openAIStream: Array = [ @@ -251,6 +452,7 @@ describe("OpenAI to Anthropic Streaming Response Translation", () => { messageStartSent: false, contentBlockIndex: 0, contentBlockOpen: false, + currentContentBlockType: undefined, toolCalls: {}, } const translatedStream = openAIStream.flatMap((chunk) => @@ -351,6 +553,7 @@ describe("OpenAI to Anthropic Streaming Response Translation", () => { messageStartSent: false, contentBlockIndex: 0, contentBlockOpen: false, + currentContentBlockType: undefined, toolCalls: {}, } const translatedStream = openAIStream.flatMap((chunk) => diff --git a/tests/api-key.test.ts b/tests/api-key.test.ts new file mode 100644 index 000000000..82fdf721c --- /dev/null +++ b/tests/api-key.test.ts @@ -0,0 +1,99 @@ +import { afterEach, beforeEach, expect, test } from "bun:test" + +import { state } from "../src/lib/state" +import { server } from "../src/server" + +const originalApiKey = state.apiKey + +beforeEach(() => { + state.authFailures.clear() + state.apiKey = "test-key" +}) + +afterEach(() => { + state.authFailures.clear() + state.apiKey = originalApiKey +}) + +test("rejects request without presenting an API key", async () => { + const response = await server.request("http://localhost/v1/models") + + expect(response.status).toBe(401) + expect(await response.json()).toEqual({ + error: { + message: "Invalid API key", + type: "authentication_error", + }, + }) +}) + +test("accepts bearer token authentication", async () => { + const response = await server.request("http://localhost/", { + headers: { + Authorization: "Bearer test-key", + }, + }) + + expect(response.status).toBe(200) + expect(await response.text()).toBe("Server running") +}) + +test("accepts x-api-key authentication", async () => { + const response = await server.request("http://localhost/", { + headers: { + "x-api-key": "test-key", + }, + }) + + expect(response.status).toBe(200) + expect(await response.text()).toBe("Server running") +}) + +test("allows requests when API key protection is disabled", async () => { + state.apiKey = "" + + const response = await server.request("http://localhost/") + + expect(response.status).toBe(200) + expect(await response.text()).toBe("Server running") +}) + +test("uses proxy headers for auth failure rate limiting", async () => { + for (let index = 0; index < 10; index += 1) { + const response = await server.request("http://localhost/v1/models", { + headers: { + "x-forwarded-for": "203.0.113.10, 127.0.0.1", + }, + }) + + expect(response.status).toBe(401) + } + + const blockedResponse = await server.request("http://localhost/v1/models", { + headers: { + "x-forwarded-for": "203.0.113.10, 127.0.0.1", + }, + }) + + 
expect(blockedResponse.status).toBe(429) +}) + +test("successful authentication clears previous auth failures", async () => { + const failedResponse = await server.request("http://localhost/v1/models", { + headers: { + "x-real-ip": "198.51.100.1", + }, + }) + + expect(failedResponse.status).toBe(401) + + const successResponse = await server.request("http://localhost/", { + headers: { + Authorization: "Bearer test-key", + "x-real-ip": "198.51.100.1", + }, + }) + + expect(successResponse.status).toBe(200) + expect(state.authFailures.has("198.51.100.1")).toBe(false) +}) diff --git a/tests/chat-completions-handler.test.ts b/tests/chat-completions-handler.test.ts new file mode 100644 index 000000000..be404ec71 --- /dev/null +++ b/tests/chat-completions-handler.test.ts @@ -0,0 +1,232 @@ +import { afterEach, beforeEach, expect, test } from "bun:test" + +import { state } from "../src/lib/state" +import { server } from "../src/server" + +const testModels = { + object: "list", + data: [ + { + id: "gpt-4o-2024-05-13", + object: "model", + name: "GPT-4o", + version: "2024-05-13", + vendor: "openai", + preview: false, + model_picker_enabled: true, + capabilities: { + object: "capabilities", + type: "chat", + family: "gpt-4o", + tokenizer: "o200k_base", + limits: { + max_context_window_tokens: 128000, + max_output_tokens: 4096, + }, + supports: { + tool_calls: true, + parallel_tool_calls: true, + vision: true, + }, + }, + }, + ], +} as const + +const originalApiKey = state.apiKey +const originalModels = state.models +const originalFetch = globalThis.fetch + +beforeEach(() => { + state.apiKey = "test-key" + state.copilotToken = "test-token" + state.vsCodeVersion = "1.0.0" + state.accountType = "individual" + state.models = structuredClone(testModels) +}) + +afterEach(() => { + state.apiKey = originalApiKey + state.models = originalModels + globalThis.fetch = originalFetch +}) + +test("accepts max_completion_tokens on chat completions requests", async () => { + let forwardedBody: Record | undefined + + globalThis.fetch = ((_url: string | URL | Request, init?: RequestInit) => { + const requestBody = typeof init?.body === "string" ? init.body : "{}" + forwardedBody = JSON.parse(requestBody) as Record + + return new Response( + JSON.stringify({ + id: "chatcmpl-test", + object: "chat.completion", + created: 0, + model: "gpt-test", + choices: [], + }), + { + status: 200, + headers: { + "content-type": "application/json", + }, + }, + ) + }) as typeof fetch + + const response = await server.request( + "http://localhost/v1/chat/completions", + { + method: "POST", + headers: { + Authorization: "Bearer test-key", + "content-type": "application/json", + }, + body: JSON.stringify({ + model: "gpt-4o", + max_completion_tokens: 77, + messages: [{ role: "user", content: "hello" }], + }), + }, + ) + + expect(response.status).toBe(200) + expect(forwardedBody?.max_tokens).toBe(77) +}) + +test("rejects unknown chat completion models before forwarding", async () => { + let forwardedBody: Record | undefined + + globalThis.fetch = ((_url: string | URL | Request, init?: RequestInit) => { + const requestBody = typeof init?.body === "string" ? 
init.body : "{}" + forwardedBody = JSON.parse(requestBody) as Record + + return new Response( + JSON.stringify({ + error: { + message: "The model `gpt-missing` does not exist", + type: "invalid_request_error", + param: "model", + code: "model_not_found", + }, + }), + { + status: 404, + headers: { + "content-type": "application/json", + }, + }, + ) + }) as typeof fetch + + const response = await server.request( + "http://localhost/v1/chat/completions", + { + method: "POST", + headers: { + Authorization: "Bearer test-key", + "content-type": "application/json", + }, + body: JSON.stringify({ + model: "gpt-missing", + messages: [{ role: "user", content: "hello" }], + }), + }, + ) + + expect(forwardedBody?.model).toBe("gpt-missing") + expect(response.status).toBe(404) + expect(await response.json()).toEqual({ + error: { + message: "The model `gpt-missing` does not exist", + type: "invalid_request_error", + param: "model", + code: "model_not_found", + }, + }) +}) + +test("propagates request-id aliases for successful chat completions", async () => { + globalThis.fetch = ((_url: string | URL | Request, _init?: RequestInit) => { + return new Response( + JSON.stringify({ + id: "chatcmpl-test", + object: "chat.completion", + created: 0, + model: "gpt-test", + choices: [], + }), + { + status: 200, + headers: { + "content-type": "application/json", + "x-request-id": "req_chat_success_123", + "openai-processing-ms": "42", + }, + }, + ) + }) as typeof fetch + + const response = await server.request( + "http://localhost/v1/chat/completions", + { + method: "POST", + headers: { + Authorization: "Bearer test-key", + "content-type": "application/json", + }, + body: JSON.stringify({ + model: "gpt-4o", + messages: [{ role: "user", content: "hello" }], + }), + }, + ) + + expect(response.status).toBe(200) + expect(response.headers.get("x-request-id")).toBe("req_chat_success_123") + expect(response.headers.get("request-id")).toBe("req_chat_success_123") + expect(response.headers.get("openai-processing-ms")).toBe("42") +}) + +test("terminates streaming chat completions with a single done sentinel", async () => { + globalThis.fetch = ((_url: string | URL | Request, _init?: RequestInit) => { + return new Response( + [ + 'data: {"id":"chatcmpl-stream","object":"chat.completion.chunk","created":0,"model":"gpt-test","choices":[{"index":0,"delta":{"content":"hello"},"finish_reason":null,"logprobs":null}]}\n\n', + 'data: {"id":"chatcmpl-stream","object":"chat.completion.chunk","created":0,"model":"gpt-test","choices":[{"index":0,"delta":{},"finish_reason":"stop","logprobs":null}],"usage":{"prompt_tokens":1,"completion_tokens":1,"total_tokens":2}}\n\n', + ].join(""), + { + status: 200, + headers: { + "content-type": "text/event-stream", + "x-request-id": "req_chat_stream_123", + }, + }, + ) + }) as typeof fetch + + const response = await server.request( + "http://localhost/v1/chat/completions", + { + method: "POST", + headers: { + Authorization: "Bearer test-key", + "content-type": "application/json", + }, + body: JSON.stringify({ + model: "gpt-4o", + stream: true, + messages: [{ role: "user", content: "hello" }], + }), + }, + ) + + const body = await response.text() + + expect(response.status).toBe(200) + expect(response.headers.get("content-type")).toContain("text/event-stream") + expect(response.headers.get("x-request-id")).toBe("req_chat_stream_123") + expect(response.headers.get("request-id")).toBe("req_chat_stream_123") + expect(body.match(/data: \[DONE\]/g)?.length).toBe(1) +}) diff --git 
a/tests/chat-completions-route.test.ts b/tests/chat-completions-route.test.ts new file mode 100644 index 000000000..9aa4f5ede --- /dev/null +++ b/tests/chat-completions-route.test.ts @@ -0,0 +1,102 @@ +import { afterEach, beforeEach, expect, test } from "bun:test" + +import type { ModelsResponse } from "../src/services/copilot/get-models" + +import { state } from "../src/lib/state" +import { server } from "../src/server" + +const testModels: ModelsResponse = { + object: "list", + data: [ + { + id: "gpt-4o-2024-05-13", + object: "model", + name: "GPT-4o", + version: "2024-05-13", + vendor: "openai", + preview: false, + model_picker_enabled: true, + capabilities: { + object: "capabilities", + type: "chat", + family: "gpt-4o", + tokenizer: "o200k_base", + limits: { + max_context_window_tokens: 128000, + max_output_tokens: 4096, + }, + supports: { + tool_calls: true, + parallel_tool_calls: true, + vision: true, + }, + }, + }, + ], +} + +const originalApiKey = state.apiKey +const originalModels = state.models +const originalFetch = globalThis.fetch + +beforeEach(() => { + state.apiKey = "test-key" + state.copilotToken = "test-token" + state.vsCodeVersion = "1.0.0" + state.accountType = "individual" + state.models = structuredClone(testModels) +}) + +afterEach(() => { + state.apiKey = originalApiKey + state.models = originalModels + globalThis.fetch = originalFetch +}) + +test("returns upstream-style error envelopes for locally rejected chat completion models", async () => { + globalThis.fetch = (() => { + return new Response( + JSON.stringify({ + error: { + message: "The model `gpt-missing` does not exist", + type: "invalid_request_error", + param: "model", + code: "model_not_found", + }, + }), + { + status: 404, + headers: { + "content-type": "application/json", + "x-request-id": "req_test_123", + }, + }, + ) + }) as unknown as typeof fetch + + const response = await server.request( + "http://localhost/v1/chat/completions", + { + method: "POST", + headers: { + Authorization: "Bearer test-key", + "content-type": "application/json", + }, + body: JSON.stringify({ + model: "gpt-missing", + messages: [{ role: "user", content: "hello" }], + }), + }, + ) + + expect(response.status).toBe(404) + expect(response.headers.get("x-request-id")).toBe("req_test_123") + expect(await response.json()).toEqual({ + error: { + message: "The model `gpt-missing` does not exist", + type: "invalid_request_error", + param: "model", + code: "model_not_found", + }, + }) +}) diff --git a/tests/create-chat-completions.test.ts b/tests/create-chat-completions.test.ts index d18e741aa..6f44e630d 100644 --- a/tests/create-chat-completions.test.ts +++ b/tests/create-chat-completions.test.ts @@ -23,7 +23,17 @@ const fetchMock = mock( // @ts-expect-error - Mock fetch doesn't implement all fetch properties ;(globalThis as unknown as { fetch: typeof fetch }).fetch = fetchMock +function getLastForwardedRequest() { + const lastCall = fetchMock.mock.calls.at(-1) + if (!lastCall) { + throw new Error("Expected fetch to be called") + } + + return lastCall[1] as { headers: Record; body: string } +} + test("sets X-Initiator to agent if tool/assistant present", async () => { + fetchMock.mockClear() const payload: ChatCompletionsPayload = { messages: [ { role: "user", content: "hi" }, @@ -33,13 +43,12 @@ test("sets X-Initiator to agent if tool/assistant present", async () => { } await createChatCompletions(payload) expect(fetchMock).toHaveBeenCalled() - const headers = ( - fetchMock.mock.calls[0][1] as { headers: Record } - ).headers + const 
headers = getLastForwardedRequest().headers expect(headers["X-Initiator"]).toBe("agent") }) test("sets X-Initiator to user if only user present", async () => { + fetchMock.mockClear() const payload: ChatCompletionsPayload = { messages: [ { role: "user", content: "hi" }, @@ -49,8 +58,84 @@ test("sets X-Initiator to user if only user present", async () => { } await createChatCompletions(payload) expect(fetchMock).toHaveBeenCalled() - const headers = ( - fetchMock.mock.calls[1][1] as { headers: Record } - ).headers + const headers = getLastForwardedRequest().headers expect(headers["X-Initiator"]).toBe("user") }) + +test("maps max_completion_tokens to max_tokens before forwarding", async () => { + fetchMock.mockClear() + const payload: ChatCompletionsPayload = { + messages: [{ role: "user", content: "hi" }], + model: "gpt-test", + max_completion_tokens: 321, + } + + await createChatCompletions(payload) + + const forwardedBody = JSON.parse( + getLastForwardedRequest().body, + ) as ChatCompletionsPayload + + expect(forwardedBody.max_tokens).toBe(321) +}) + +test("normalizes developer role to system before forwarding", async () => { + fetchMock.mockClear() + const payload: ChatCompletionsPayload = { + messages: [{ role: "developer", content: "Behave like a shell." }], + model: "gpt-test", + } + + await createChatCompletions(payload) + + const forwardedBody = JSON.parse( + getLastForwardedRequest().body, + ) as ChatCompletionsPayload + + expect(forwardedBody.messages[0]?.role).toBe("system") +}) + +test("adds stream_options.include_usage for streaming requests", async () => { + fetchMock.mockClear() + const payload: ChatCompletionsPayload = { + messages: [{ role: "user", content: "stream please" }], + model: "gpt-test", + stream: true, + } + + await createChatCompletions(payload) + + const forwardedBody = JSON.parse( + getLastForwardedRequest().body, + ) as ChatCompletionsPayload + + expect(forwardedBody.stream_options).toEqual({ include_usage: true }) +}) + +test("returns upstream headers alongside non-streaming responses", async () => { + fetchMock.mockClear() + fetchMock.mockImplementationOnce( + (_url: string, opts: { headers: Record }) => { + return { + ok: true, + json: () => ({ id: "123", object: "chat.completion", choices: [] }), + headers: new Headers({ + ...opts.headers, + "x-request-id": "req_upstream_123", + }), + } + }, + ) + + const response = await createChatCompletions({ + messages: [{ role: "user", content: "hi" }], + model: "gpt-test", + }) + + if (!("body" in response)) { + throw new TypeError("Expected non-streaming chat completion result") + } + + expect(response.headers.get("x-request-id")).toBe("req_upstream_123") + expect(response.body.id).toBe("123") +}) diff --git a/tests/create-responses.test.ts b/tests/create-responses.test.ts new file mode 100644 index 000000000..2a795680e --- /dev/null +++ b/tests/create-responses.test.ts @@ -0,0 +1,171 @@ +import { afterEach, beforeEach, expect, mock, test } from "bun:test" + +import { state } from "../src/lib/state" +import { + createResponses, + type ResponsesPayload, +} from "../src/services/copilot/create-responses" + +const originalFetch = globalThis.fetch + +beforeEach(() => { + state.copilotToken = "test-token" + state.vsCodeVersion = "1.0.0" + state.accountType = "individual" +}) + +afterEach(() => { + globalThis.fetch = originalFetch +}) + +function getRequestInit(callIndex: number = 0): RequestInit { + return (fetchMock.mock.calls[callIndex]?.[1] as RequestInit | undefined) ?? 
{} +} + +function parseRequestBody(callIndex: number = 0): ResponsesPayload { + const body = getRequestInit(callIndex).body + + if (typeof body !== "string") { + throw new TypeError("Expected request body to be a JSON string") + } + + return JSON.parse(body) as ResponsesPayload +} + +let fetchMock: ReturnType + +test("posts payload to copilot responses endpoint", async () => { + const responseBody = JSON.stringify({ id: "resp_123", object: "response" }) + fetchMock = mock((_url: string, _opts?: RequestInit) => + Promise.resolve( + new Response(responseBody, { + status: 200, + headers: { "content-type": "application/json" }, + }), + ), + ) + globalThis.fetch = fetchMock as unknown as typeof fetch + + const payload: ResponsesPayload = { + model: "gpt-4.1", + input: "hello", + } + + const response = await createResponses(payload) + + expect(fetchMock).toHaveBeenCalledTimes(1) + expect(fetchMock.mock.calls[0]?.[0]).toBe( + "https://api.githubcopilot.com/responses", + ) + expect(fetchMock.mock.calls[0]?.[1]).toMatchObject({ + method: "POST", + }) + expect(await response.json()).toEqual({ id: "resp_123", object: "response" }) +}) + +test("normalizes responses payload toward copilot upstream semantics", async () => { + fetchMock = mock((_url: string, _opts?: RequestInit) => + Promise.resolve( + new Response(JSON.stringify({ id: "resp_456", object: "response" }), { + status: 200, + headers: { "content-type": "application/json" }, + }), + ), + ) + globalThis.fetch = fetchMock as unknown as typeof fetch + + await createResponses({ + model: "gpt-4.1", + max_tokens: 321, + input: [ + { + type: "message", + role: "developer", + content: [{ type: "input_text", text: "system rule" }], + }, + { + type: "message", + role: "assistant", + content: [{ type: "output_text", text: "prior answer" }], + }, + ], + }) + + const requestInit = getRequestInit() + const headers = new Headers(requestInit.headers) + const requestBody = parseRequestBody() + + expect(headers.get("X-Initiator")).toBe("agent") + expect(requestBody.max_output_tokens).toBe(321) + expect(requestBody.store).toBe(false) + expect(requestBody.truncation).toBe("disabled") + expect(requestBody.include).toContain("reasoning.encrypted_content") + expect(requestBody.input).toEqual([ + { + type: "message", + role: "system", + content: [{ type: "input_text", text: "system rule" }], + }, + { + type: "message", + role: "assistant", + content: [{ type: "output_text", text: "prior answer" }], + }, + ]) +}) + +test("preserves caller supplied responses-specific fields", async () => { + fetchMock = mock((_url: string, _opts?: RequestInit) => + Promise.resolve( + new Response(JSON.stringify({ id: "resp_789", object: "response" }), { + status: 200, + headers: { "content-type": "application/json" }, + }), + ), + ) + globalThis.fetch = fetchMock as unknown as typeof fetch + + await createResponses({ + model: "gpt-4.1", + previous_response_id: "resp_prev", + max_output_tokens: 111, + truncation: "auto", + include: ["file_search_call.results"], + reasoning: { effort: "high", summary: "detailed" }, + store: true, + input: "hello", + }) + + const requestBody = parseRequestBody() + + expect(requestBody.previous_response_id).toBe("resp_prev") + expect(requestBody.max_output_tokens).toBe(111) + expect(requestBody.truncation).toBe("auto") + expect(requestBody.store).toBe(true) + expect(requestBody.reasoning).toEqual({ effort: "high", summary: "detailed" }) + expect(requestBody.include).toEqual([ + "file_search_call.results", + "reasoning.encrypted_content", + ]) +}) + 
+test("preserves streaming response metadata", async () => { + fetchMock = mock((_url: string, _opts?: RequestInit) => + Promise.resolve( + new Response("data: hello\n\n", { + status: 200, + headers: { "content-type": "text/event-stream" }, + }), + ), + ) + globalThis.fetch = fetchMock as unknown as typeof fetch + + const response = await createResponses({ + model: "gpt-4.1", + input: "hello", + stream: true, + }) + + expect(response.headers.get("content-type")).toBe("text/event-stream") + expect(await response.text()).toBe("data: hello\n\n") +}) diff --git a/tests/messages-route.test.ts b/tests/messages-route.test.ts new file mode 100644 index 000000000..5eff1d7da --- /dev/null +++ b/tests/messages-route.test.ts @@ -0,0 +1,159 @@ +import { afterEach, beforeEach, expect, test } from "bun:test" + +import { state } from "../src/lib/state" +import { server } from "../src/server" + +const testModels = { + object: "list", + data: [ + { + id: "gpt-4o-2024-05-13", + object: "model", + name: "GPT-4o", + version: "2024-05-13", + vendor: "openai", + preview: false, + model_picker_enabled: true, + capabilities: { + object: "capabilities", + type: "chat", + family: "gpt-4o", + tokenizer: "o200k_base", + limits: { + max_context_window_tokens: 128000, + max_output_tokens: 4096, + }, + supports: { + tool_calls: true, + parallel_tool_calls: true, + vision: true, + }, + }, + }, + ], +} as const + +const originalApiKey = state.apiKey +const originalModels = state.models +const originalFetch = globalThis.fetch + +beforeEach(() => { + state.apiKey = "test-key" + state.copilotToken = "test-token" + state.vsCodeVersion = "1.0.0" + state.accountType = "individual" + state.models = structuredClone(testModels) +}) + +afterEach(() => { + state.apiKey = originalApiKey + state.models = originalModels + globalThis.fetch = originalFetch +}) + +test("propagates anthropic request-id headers from upstream success responses", async () => { + globalThis.fetch = ((_url: string | URL | Request, _init?: RequestInit) => { + return new Response( + JSON.stringify({ + id: "chatcmpl-anthropic-success", + object: "chat.completion", + created: 0, + model: "gpt-4o-2024-05-13", + choices: [ + { + index: 0, + message: { + role: "assistant", + content: "Hello from upstream", + }, + finish_reason: "stop", + logprobs: null, + }, + ], + usage: { + prompt_tokens: 3, + completion_tokens: 4, + total_tokens: 7, + }, + }), + { + status: 200, + headers: { + "content-type": "application/json", + "x-request-id": "req_anthropic_success_123", + "anthropic-processing-ms": "24", + }, + }, + ) + }) as typeof fetch + + const response = await server.request("http://localhost/v1/messages", { + method: "POST", + headers: { + Authorization: "Bearer test-key", + "content-type": "application/json", + "anthropic-version": "2023-06-01", + }, + body: JSON.stringify({ + model: "gpt-4o", + max_tokens: 32, + messages: [{ role: "user", content: "hello" }], + }), + }) + + expect(response.status).toBe(200) + expect(response.headers.get("request-id")).toBe("req_anthropic_success_123") + expect(response.headers.get("anthropic-request-id")).toBe( + "req_anthropic_success_123", + ) + expect(response.headers.get("anthropic-processing-ms")).toBe("24") +}) + +test("converts upstream OpenAI error envelopes into Anthropic error envelopes", async () => { + globalThis.fetch = ((_url: string | URL | Request, _init?: RequestInit) => { + return new Response( + JSON.stringify({ + error: { + message: "The model `gpt-missing` does not exist", + type: "invalid_request_error", + param: 
"model", + code: "model_not_found", + }, + }), + { + status: 404, + headers: { + "content-type": "application/json", + "x-request-id": "req_anthropic_error_123", + }, + }, + ) + }) as typeof fetch + + const response = await server.request("http://localhost/v1/messages", { + method: "POST", + headers: { + Authorization: "Bearer test-key", + "content-type": "application/json", + "anthropic-version": "2023-06-01", + }, + body: JSON.stringify({ + model: "gpt-missing", + max_tokens: 32, + messages: [{ role: "user", content: "hello" }], + }), + }) + + expect(response.status).toBe(404) + expect(response.headers.get("request-id")).toBe("req_anthropic_error_123") + expect(response.headers.get("anthropic-request-id")).toBe( + "req_anthropic_error_123", + ) + expect(await response.json()).toEqual({ + type: "error", + error: { + type: "invalid_request_error", + message: "The model `gpt-missing` does not exist", + }, + }) +}) diff --git a/tests/models-route.test.ts b/tests/models-route.test.ts new file mode 100644 index 000000000..9c8eed731 --- /dev/null +++ b/tests/models-route.test.ts @@ -0,0 +1,128 @@ +import { afterEach, beforeEach, expect, test } from "bun:test" + +import { state } from "../src/lib/state" +import { server } from "../src/server" + +const testModels = { + object: "list", + data: [ + { + id: "gpt-4o-2024-05-13", + object: "model", + name: "GPT-4o", + version: "2024-05-13", + vendor: "openai", + preview: false, + model_picker_enabled: true, + capabilities: { + object: "capabilities", + type: "chat", + family: "gpt-4o", + tokenizer: "o200k_base", + limits: { + max_context_window_tokens: 128000, + max_output_tokens: 4096, + }, + supports: { + tool_calls: true, + parallel_tool_calls: true, + vision: true, + }, + }, + }, + { + id: "claude-sonnet-4.5-20250929", + object: "model", + name: "Claude Sonnet 4.5", + version: "2025-09-29", + vendor: "anthropic", + preview: false, + model_picker_enabled: true, + capabilities: { + object: "capabilities", + type: "chat", + family: "claude-sonnet-4.5", + tokenizer: "claude", + limits: { + max_context_window_tokens: 200000, + max_output_tokens: 8192, + }, + supports: { + tool_calls: true, + parallel_tool_calls: true, + }, + }, + }, + ], +} as const + +const originalApiKey = state.apiKey +const originalModels = state.models + +beforeEach(() => { + state.apiKey = "test-key" + state.models = structuredClone(testModels) +}) + +afterEach(() => { + state.apiKey = originalApiKey + state.models = originalModels +}) + +test("lists canonical models with enriched capabilities metadata", async () => { + const response = await server.request("http://localhost/v1/models", { + headers: { + Authorization: "Bearer test-key", + }, + }) + + expect(response.status).toBe(200) + + const json = (await response.json()) as { + object: string + data: Array> + has_more: boolean + } + + expect(json.object).toBe("list") + expect(json.has_more).toBe(false) + + const gptModel = json.data.find((model) => model.id === "gpt-4o-2024-05-13") + expect(gptModel).toMatchObject({ + id: "gpt-4o-2024-05-13", + root: "gpt-4o-2024-05-13", + parent: null, + canonical_model_id: "gpt-4o-2024-05-13", + }) + expect(gptModel?.capabilities).toMatchObject({ + family: "gpt-4o", + supports: { + streaming: true, + tool_calls: true, + parallel_tool_calls: true, + vision: true, + }, + }) + + const claudeModel = json.data.find( + (model) => model.id === "claude-sonnet-4.5-20250929", + ) + expect(claudeModel).toMatchObject({ + id: "claude-sonnet-4.5-20250929", + root: "claude-sonnet-4.5-20250929", + parent: 
null, + canonical_model_id: "claude-sonnet-4.5-20250929", + }) + expect(claudeModel?.capabilities).toMatchObject({ + family: "claude-sonnet-4.5", + supports: { + streaming: true, + tool_calls: true, + parallel_tool_calls: true, + vision: false, + reasoning: true, + }, + }) + + expect(json.data).toHaveLength(2) +}) diff --git a/tests/responses-route.test.ts b/tests/responses-route.test.ts new file mode 100644 index 000000000..985345d52 --- /dev/null +++ b/tests/responses-route.test.ts @@ -0,0 +1,157 @@ +import { afterEach, beforeEach, expect, test } from "bun:test" + +import { state } from "../src/lib/state" +import { server } from "../src/server" + +const testModels = { + object: "list", + data: [ + { + id: "gpt-4.1-2025-04-14", + object: "model", + name: "GPT-4.1", + version: "2025-04-14", + vendor: "openai", + preview: false, + model_picker_enabled: true, + capabilities: { + object: "capabilities", + type: "chat", + family: "gpt-4.1", + tokenizer: "o200k_base", + limits: { + max_context_window_tokens: 128000, + max_output_tokens: 32768, + }, + supports: { + tool_calls: true, + parallel_tool_calls: true, + vision: true, + }, + }, + }, + ], +} as const + +const originalApiKey = state.apiKey +const originalModels = state.models +const originalFetch = globalThis.fetch + +beforeEach(() => { + state.apiKey = "test-key" + state.copilotToken = "test-token" + state.vsCodeVersion = "1.0.0" + state.accountType = "individual" + state.models = structuredClone(testModels) +}) + +afterEach(() => { + state.apiKey = originalApiKey + state.models = originalModels + globalThis.fetch = originalFetch +}) + +test("preserves upstream OpenAI-style error envelopes for responses", async () => { + globalThis.fetch = (() => { + return new Response( + JSON.stringify({ + error: { + message: "The previous_response_id provided is invalid.", + type: "invalid_request_error", + param: "previous_response_id", + code: "invalid_previous_response_id", + }, + }), + { + status: 400, + headers: { + "content-type": "application/json", + "x-request-id": "req_responses_123", + }, + }, + ) + }) as typeof fetch + + const response = await server.request("http://localhost/v1/responses", { + method: "POST", + headers: { + Authorization: "Bearer test-key", + "content-type": "application/json", + }, + body: JSON.stringify({ + model: "gpt-4.1", + previous_response_id: "resp_missing", + input: "hello", + }), + }) + + expect(response.status).toBe(400) + expect(response.headers.get("x-request-id")).toBe("req_responses_123") + expect(await response.json()).toEqual({ + error: { + message: "The previous_response_id provided is invalid.", + type: "invalid_request_error", + param: "previous_response_id", + code: "invalid_previous_response_id", + }, + }) +}) + +test("preserves original response model names when forwarding", async () => { + let forwardedBody: Record | undefined + + globalThis.fetch = ((_url: string | URL | Request, init?: RequestInit) => { + const requestBody = typeof init?.body === "string" ? 
init.body : "{}" + forwardedBody = JSON.parse(requestBody) as Record + + return new Response(JSON.stringify({ id: "resp_123" }), { + status: 200, + headers: { + "content-type": "application/json", + }, + }) + }) as typeof fetch + + const response = await server.request("http://localhost/v1/responses", { + method: "POST", + headers: { + Authorization: "Bearer test-key", + "content-type": "application/json", + }, + body: JSON.stringify({ + model: "gpt-4.1", + input: "hello", + }), + }) + + expect(response.status).toBe(200) + expect(forwardedBody?.model).toBe("gpt-4.1") +}) + +test("mirrors request-id aliases for forwarded responses", async () => { + globalThis.fetch = ((_url: string | URL | Request, _init?: RequestInit) => { + return new Response(JSON.stringify({ id: "resp_123" }), { + status: 200, + headers: { + "content-type": "application/json", + "x-request-id": "req_responses_success_123", + }, + }) + }) as typeof fetch + + const response = await server.request("http://localhost/v1/responses", { + method: "POST", + headers: { + Authorization: "Bearer test-key", + "content-type": "application/json", + }, + body: JSON.stringify({ + model: "gpt-4.1", + input: "hello", + }), + }) + + expect(response.status).toBe(200) + expect(response.headers.get("x-request-id")).toBe("req_responses_success_123") + expect(response.headers.get("request-id")).toBe("req_responses_success_123") +})