Update docs for OpenAI-compliant endpoints. Add a section on the newly supported Responses endpoint.

inf3rnus · inf3rnus · commit 62a911159852 · 2026-01-09T19:36:43.000-05:00
diff --git a/docs/docs.json b/docs/docs.json
@@ -52,7 +52,8 @@
                 "icon": "code",
                 "pages": [
                   "http-reference/examples/openai-compliant/chatCompletionsExample",
-                  "http-reference/examples/openai-compliant/completionsExample"
+                  "http-reference/examples/openai-compliant/completionsExample",
+                  "http-reference/examples/openai-compliant/responsesExample"
                 ]
               },
               {
@@ -165,7 +166,8 @@
                 "group": "Open AI Completions",
                 "pages": [
                   "http-reference/examples/openai-compliant/chatCompletionsExample",
-                  "http-reference/examples/openai-compliant/completionsExample"
+                  "http-reference/examples/openai-compliant/completionsExample",
+                  "http-reference/examples/openai-compliant/responsesExample"
                 ]
               },
               {
diff --git a/docs/http-reference/examples/openai-compliant/chatCompletionsExample.mdx b/docs/http-reference/examples/openai-compliant/chatCompletionsExample.mdx
@@ -11,11 +11,7 @@ To specify a provider, prefix the model with the provider, e.g. `gpt-4` should b
 
 We provide access to models from `openai`, `mistral`, and `google`.
 
-You will need to supply a header `provider-key` in order to make requests to `anthropic`, and `cohere` models.
-
-e.g. If you are trying to run `anthropic/claude-sonnet-4-5`, `provider-key` will be an Anthropic key.
-
-For unlimited rate limits you will need to supply a header `provider-key`.
+You must supply a `provider-key` header to make requests to `cohere` models.
 
 **NOTE:** Logprobs are supported for all models!
 
diff --git a/docs/http-reference/examples/openai-compliant/completionsExample.mdx b/docs/http-reference/examples/openai-compliant/completionsExample.mdx
@@ -11,13 +11,7 @@ To specify a provider, prefix the model with the provider, e.g. `davinci-002` sh
 
 We provide access to models from `openai`, `mistral`, and `google`.
 
-You will need to supply a header `provider-key` in order to make requests to `anthropic`, and `cohere` models.
-
-For unlimited rate limits you will need to supply a header `provider-key`.
-
-e.g. If you are trying to run `openai/davinci-002` with unlimited rate limits, `provider-key` will be an Open AI key.
-
-If it were an `anthropic` model it would be an Anthropic key.
+You must supply a `provider-key` header to make requests to `cohere` models.
 
 **NOTE:** Logprobs are supported for all models!
 
diff --git a/docs/http-reference/examples/openai-compliant/responsesExample.mdx b/docs/http-reference/examples/openai-compliant/responsesExample.mdx
@@ -0,0 +1,208 @@
+---
+title: 'Responses'
+description: Use the OpenAI-compatible Responses endpoint via OpenAI clients, supporting streaming, tool calling, and reasoning ("thinking") parameters.
+icon: 'robot'
+mode: 'wide'
+---
+
+Provides Responses for all closed-source providers: `openai` and `anthropic`.
+
+The Responses API is the unified successor to Chat Completions: you send `input` (text, images, files, tool outputs, etc.) and receive a `response` object that can contain messages, tool calls, and (for reasoning models) reasoning items.
+
+**NOTE:** Tool calls are not yet supported for `anthropic` models.
+
+To specify a provider, prefix the model with the provider. For example, `gpt-5.1` should be passed as `openai/gpt-5.1`.
+
+## Thinking (Reasoning) parameters
+
+Some OpenAI reasoning models (e.g. `openai/gpt-5.x`, `openai/o3`, `openai/o4-mini`) support the `reasoning` object:
+
+- `reasoning.effort`: `"none" | "low" | "medium" | "high" | ...` (model-dependent)
+- `reasoning.summary`: `"none" | "auto" | "detailed"` (optional)
+
+All Anthropic models should support thinking.
+
+⚠️ `max_output_tokens` limits **reasoning tokens + visible output tokens**, so if you increase `reasoning.effort`, consider raising `max_output_tokens`.
+
+<AccordionGroup>
+  <Accordion defaultOpen="false" title="Basic usage (Closed Source + Thinking)">
+
+      <CodeGroup>
+        ```javascript javascript
+        import OpenAI from "openai";
+
+        const client = new OpenAI({
+          apiKey: "BYTEZ_KEY",
+          baseURL: "https://api.bytez.com/models/v2/openai/v1"
+        });
+
+        const response = await client.responses.create({
+          model: "openai/gpt-5.1",
+          input: [
+            { role: "system", content: "You are a friendly chatbot" },
+            { role: "user", content: "Hello bot, what is the capital of England?" }
+          ],
+
+          // Thinking controls (OpenAI reasoning models only)
+          reasoning: {
+            effort: "medium",   // try: "none" | "low" | "medium" | "high" (model-dependent)
+            summary: "auto"     // "none" | "auto" | "detailed"
+          },
+
+          // Caps reasoning + visible output tokens together
+          max_output_tokens: 300
+        });
+
+        console.log("Answer:", response.output_text);
+
+        // Optional: read a reasoning summary item (if requested & returned)
+        const reasoningItem = response.output?.find((it) => it.type === "reasoning");
+        if (reasoningItem?.summary?.length) {
+          console.log("Reasoning summary:", reasoningItem.summary.map(s => s.text).join("\n"));
+        }
+        ```
+        ```python python
+        from openai import OpenAI
+
+        client = OpenAI(
+            api_key="BYTEZ_KEY",
+            base_url="https://api.bytez.com/models/v2/openai/v1"
+        )
+
+        response = client.responses.create(
+            model="openai/gpt-5.1",
+            input=[
+                {"role": "system", "content": "You are a friendly chatbot"},
+                {"role": "user", "content": "Hello bot, what is the capital of England?"},
+            ],
+            reasoning={
+                "effort": "medium",
+                "summary": "auto",
+            },
+            max_output_tokens=300,
+
+        )
+
+        print("Answer:", response.output_text)
+
+        reasoning_items = [it for it in response.output if it.type == "reasoning"]
+        if reasoning_items and getattr(reasoning_items[0], "summary", None):
+            print("Reasoning summary:", "\n".join(s.text for s in reasoning_items[0].summary))
+        ```
+        ```bash http
+        curl -X POST 'https://api.bytez.com/models/v2/openai/v1/responses' \
+        -H 'Authorization: BYTEZ_KEY' \
+        -H 'provider-key: PROVIDER_KEY' \
+        -H 'Content-Type: application/json' \
+        --data '{
+          "model": "openai/gpt-5.1",
+          "input": [
+            {"role": "system", "content": "You are a friendly chatbot"},
+            {"role": "user", "content": "Hello bot, what is the capital of England?"}
+          ],
+          "reasoning": { "effort": "medium", "summary": "auto" },
+          "max_output_tokens": 300
+        }'
+        ```
+      </CodeGroup>
+
+  </Accordion>
+
+  <Accordion defaultOpen="false" title="Streaming (Closed Source + Thinking + Reasoning Summary Events)">
+
+    <CodeGroup>
+      ```javascript javascript
+      import OpenAI from "openai";
+
+      const client = new OpenAI({
+        apiKey: "BYTEZ_KEY",
+        baseURL: "https://api.bytez.com/models/v2/openai/v1"
+      });
+
+      const stream = await client.responses.create({
+        model: "openai/gpt-5.1",
+        input: [
+          { role: "system", content: "You are a friendly chatbot" },
+          { role: "user", content: "Hello bot, what is the capital of England?" }
+        ],
+        reasoning: { effort: "medium", summary: "auto" },
+        max_output_tokens: 400,
+        stream: true
+      });
+
+      let text = "";
+      let reasoningSummary = "";
+
+      for await (const event of stream) {
+        if (event.type === "response.output_text.delta") {
+          text += event.delta;
+          process.stdout.write(event.delta);
+        }
+
+        // Optional: stream reasoning summary text (if enabled by reasoning.summary)
+        if (event.type === "response.reasoning_summary_text.delta") {
+          reasoningSummary += event.delta;
+        }
+
+        if (event.type === "response.completed") break;
+      }
+
+      console.log("\n\nFinal:", { text });
+      if (reasoningSummary) console.log("\nReasoning summary:\n", reasoningSummary);
+      ```
+      ```python python
+      from openai import OpenAI
+
+      client = OpenAI(
+          api_key="BYTEZ_KEY",
+          base_url="https://api.bytez.com/models/v2/openai/v1"
+      )
+
+      stream = client.responses.create(
+          model="openai/gpt-5.1",
+          input=[
+              {"role": "system", "content": "You are a friendly chatbot"},
+              {"role": "user", "content": "Hello bot, what is the capital of England?"},
+          ],
+          reasoning={"effort": "medium", "summary": "auto"},
+          max_output_tokens=400,
+          stream=True,
+      )
+
+      text = ""
+      reasoning_summary = ""
+
+      for event in stream:
+          if event.type == "response.output_text.delta":
+              text += event.delta
+              print(event.delta, end="", flush=True)
+          elif event.type == "response.reasoning_summary_text.delta":
+              reasoning_summary += event.delta
+          elif event.type == "response.completed":
+              break
+
+      print("\n\nFinal:", {"text": text})
+      if reasoning_summary:
+          print("\nReasoning summary:\n", reasoning_summary)
+      ```
+      ```bash http
+      curl -N -X POST 'https://api.bytez.com/models/v2/openai/v1/responses' \
+      -H 'Authorization: BYTEZ_KEY' \
+      -H 'provider-key: PROVIDER_KEY' \
+      -H 'Content-Type: application/json' \
+      --data '{
+        "model": "openai/gpt-5.1",
+        "input": [
+          {"role": "system", "content": "You are a friendly chatbot"},
+          {"role": "user", "content": "Hello bot, what is the capital of England?"}
+        ],
+        "reasoning": { "effort": "medium", "summary": "auto" },
+        "max_output_tokens": 400,
+        "stream": true
+      }'
+      ```
+    </CodeGroup>
+
+  </Accordion>
+
+</AccordionGroup>
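
Reviewer note: a minimal sketch, not part of this commit, of how the request body documented in `responsesExample.mdx` could be assembled before passing it to `client.responses.create(**body)` or sending it as JSON. The helper name `build_responses_request` and its parameters are hypothetical. Only the `model`, `input`, `reasoning`, and `max_output_tokens` fields, and the `provider/model` prefix convention, come from the doc above.

```python
import json
from typing import Any, Optional


def build_responses_request(
    provider: str,
    model: str,
    messages: list[dict[str, str]],
    effort: Optional[str] = None,
    summary: Optional[str] = None,
    max_output_tokens: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a Responses request body (hypothetical helper, for illustration)."""
    if not provider or not model:
        raise ValueError("both provider and model are required")

    # The docs prefix the model with the provider, e.g. "openai/gpt-5.1".
    body: dict[str, Any] = {"model": f"{provider}/{model}", "input": list(messages)}

    # Only include the reasoning object when at least one field is set.
    reasoning = {}
    if effort is not None:
        reasoning["effort"] = effort
    if summary is not None:
        reasoning["summary"] = summary
    if reasoning:
        body["reasoning"] = reasoning

    # Per the doc, this cap covers reasoning tokens plus visible output tokens.
    if max_output_tokens is not None:
        body["max_output_tokens"] = max_output_tokens

    return body


if __name__ == "__main__":
    body = build_responses_request(
        "openai",
        "gpt-5.1",
        [{"role": "user", "content": "Hello bot, what is the capital of England?"}],
        effort="medium",
        summary="auto",
        max_output_tokens=300,
    )
    print(json.dumps(body, indent=2))
```

Keeping the body construction separate from the network call makes it easy to check that the provider prefix and the optional `reasoning` fields are set as intended before any request is sent.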
