
Commit 8521722

MaanavD, Copilot, and apsonawane authored
feat(sdk/python): add Responses API client (#670)
## Summary

Implements the OpenAI Responses API client for the Foundry Local Python SDK. HTTP-only, sync pattern matching the existing `chat_client.py` style.

## New files

| File | Description |
|------|-------------|
| `src/openai/responses_types.py` | Full type system: content parts, response items, tools, config, `ResponseObject` (with `output_text` property), all streaming event dataclasses, `parse_streaming_event` factory, `_to_dict` serializer |
| `src/openai/responses_client.py` | HTTP client: `ResponsesClient`, `ResponsesClientSettings`, `ResponsesAPIError`. Methods: `create`, `create_streaming` (SSE generator), `get`, `delete`, `cancel`, `get_input_items`, `list` |
| `examples/responses.py` | 5 end-to-end scenarios: basic create, streaming, multi-turn, tool calling, vision |
| `test/openai/test_responses_client.py` | 56 unit tests with mocked HTTP |
| `test/openai/test_responses_integration.py` | 14 integration tests (gated on `FOUNDRY_INTEGRATION_TESTS=1`) |

## Modified files

- `foundry_local_manager.py` — `create_responses_client(model_id)` factory
- `imodel.py` / `detail/model.py` / `detail/model_variant.py` — factory wired through the model hierarchy
- `src/__init__.py` / `src/openai/__init__.py` — all new public types exported

## Test results

- **Unit tests**: 56/56 passing (no server needed)
- **Integration tests**: 14/14 passing against live `qwen2.5-0.5b` server

## Related

Closes #505 (the earlier C# Responses API PR predates this but covers a different SDK)

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Akshay Sonawane <asonawane@microsoft.com>
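For orientation, here is a minimal usage sketch of the new client, inferred from the method names in the summary above. The import path, initialization pattern, and argument names are assumptions, not taken from the diff; the authoritative API lives in `src/openai/responses_client.py`.

```python
# Hypothetical sketch -- the create_responses_client factory and the
# create / create_streaming method names come from the summary above;
# the import path, init pattern, and argument names are assumptions.
from foundry_local_sdk import Configuration, FoundryLocalManager

FoundryLocalManager.initialize(Configuration(app_name="responses_sketch"))
manager = FoundryLocalManager.instance

client = manager.create_responses_client("qwen2.5-0.5b")  # new factory

response = client.create(input="Say hello in one sentence.")  # sync HTTP call
print(response.output_text)  # ResponseObject convenience property

for event in client.create_streaming(input="Count to three."):  # SSE generator
    print(getattr(event, "type", None))
```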
1 parent 4ea0fca commit 8521722

10 files changed

Lines changed: 600 additions & 1 deletion


samples/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -10,5 +10,5 @@ Explore complete working examples that demonstrate how to use Foundry Local —
 |----------|---------|-------------|
 | [**C#**](cs/) | 13 | .NET SDK samples including native chat, embeddings, audio transcription, tool calling, model management, web server, and tutorials. Uses WinML on Windows for hardware acceleration. |
 | [**JavaScript**](js/) | 13 | Node.js SDK samples including native chat, embeddings, audio transcription, Electron desktop app, Copilot SDK integration, LangChain, tool calling, web server, and tutorials. |
-| [**Python**](python/) | 10 | Python samples using the OpenAI-compatible API, including chat, embeddings, audio transcription, LangChain integration, tool calling, web server, and tutorials. |
+| [**Python**](python/) | 11 | Python samples using the OpenAI-compatible API, including chat, embeddings, audio transcription, LangChain integration, tool calling, web server, Responses API, and tutorials. |
 | [**Rust**](rust/) | 9 | Rust SDK samples including native chat, embeddings, audio transcription, tool calling, web server, and tutorials. |
```

samples/python/README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -14,6 +14,7 @@ These samples demonstrate how to use Foundry Local with Python.
 | [embeddings](embeddings/) | Generate single and batch text embeddings using the Foundry Local SDK. |
 | [audio-transcription](audio-transcription/) | Transcribe audio files using the Whisper model. |
 | [web-server](web-server/) | Start a local OpenAI-compatible web server and call it with the OpenAI Python SDK. |
+| [web-server-responses](web-server-responses/) | Call a running local OpenAI-compatible web server with the Responses API, including streaming and tool calling. |
 | [tool-calling](tool-calling/) | Tool calling with custom function definitions (get_weather, calculate). |
 | [langchain-integration](langchain-integration/) | LangChain integration for building translation and text generation chains. |
 | [tutorial-chat-assistant](tutorial-chat-assistant/) | Build an interactive multi-turn chat assistant (tutorial). |
```
Lines changed: 45 additions & 0 deletions
# Foundry Local Python Vision Sample (Responses API)

This sample demonstrates vision (image understanding) capabilities using the Foundry Local web service and the OpenAI Responses API.

It demonstrates:

- Streaming a vision response via the Responses API
- Falling back to a default test image (`src/test_image.jpg`) when no image path is provided

## What gets installed

Install the sample dependencies from `requirements.txt`:

```bash
pip install -r requirements.txt
```

That installs:

- `foundry-local-sdk`
- `openai`
- `Pillow` (for image resizing)

The sample downloads the specified model the first time it runs, skipping the download if the model is already cached.

## Run the sample

From this directory:

```bash
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
python src\app.py qwen3.5-0.8b
```

You can also pass a custom image path as the second argument.

On macOS or Linux, activate the virtual environment with:

```bash
source .venv/bin/activate
```

The sample starts the local web service, sends vision requests via the Responses API to `http://localhost:<port>/v1`, prints the model output, and then stops the web service.
Lines changed: 3 additions & 0 deletions
```text
foundry-local-sdk
openai
Pillow
```
Lines changed: 107 additions & 0 deletions
```python
# <complete_code>
# <imports>
import base64
import io
import os
import sys

from PIL import Image
from openai import OpenAI

from foundry_local_sdk import Configuration, FoundryLocalManager
# </imports>

if len(sys.argv) < 2:
    print("Usage: python src/app.py <model_alias> [image_path]")
    print("  Example: python src/app.py qwen3.5-0.8b")
    sys.exit(1)

model_alias = sys.argv[1]
default_image = os.path.join(os.path.dirname(__file__), "test_image.jpg")
image_path = sys.argv[2] if len(sys.argv) > 2 else default_image


def resize_and_encode(path, max_dim=512):
    """Load and resize a local image, returning (base64_str, media_type)."""
    img = Image.open(path)
    if max(img.size) > max_dim:
        img.thumbnail((max_dim, max_dim))
        print(f"  (resized to {img.size[0]}x{img.size[1]})")
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    return base64.b64encode(buf.getvalue()).decode(), "image/jpeg"


# <init>
config = Configuration(app_name="foundry_local_samples")
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance
# </init>

# <model_setup>
model = manager.catalog.get_model(model_alias)
if model is None:
    available = [m.alias for m in manager.catalog.list_models()]
    print(f"\nModel '{model_alias}' not found in catalog.")
    print(f"Available models: {available}")
    sys.exit(1)

if not model.is_cached:
    print(f"\nDownloading model {model_alias}...")
    model.download(
        lambda progress: print(f"\rDownloading model: {progress:.2f}%", end="", flush=True)
    )
    print("\nModel downloaded")

print("\nLoading model...")
model.load()
print("Model loaded")
# </model_setup>

# <server_setup>
print("\nStarting web service...")
manager.start_web_service()
base_url = manager.urls[0].rstrip("/") + "/v1"
print("Web service started")

# <<<<<< OPENAI SDK USAGE >>>>>>
# Use the OpenAI SDK to call the local Foundry web service Responses API
openai = OpenAI(base_url=base_url, api_key="notneeded")
# </server_setup>

# <inference>
print(f"\nPreparing image: {image_path}")
image_b64, media_type = resize_and_encode(image_path)

vision_input = [
    {
        "type": "message",
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Describe this image."},
            {
                "type": "input_image",
                "image_data": image_b64,
                "media_type": media_type,
            },
        ],
    }
]

print("\nStreaming vision response...")
# The placeholder string satisfies the SDK's required `input` argument;
# the structured vision input is passed via `extra_body`, which takes
# precedence in the JSON body sent to the local service.
stream = openai.responses.create(
    model=model.id,
    input="placeholder",
    extra_body={"input": vision_input},
    stream=True,
)

print("[ASSISTANT]: ", end="", flush=True)
for event in stream:
    if getattr(event, "type", None) == "response.output_text.delta":
        print(getattr(event, "delta", ""), end="", flush=True)
print()
# </inference>

openai.close()
manager.stop_web_service()
model.unload()
```
Binary file (6.67 KB): the sample's default test image, `src/test_image.jpg`.
Lines changed: 44 additions & 0 deletions
# Foundry Local Python Responses Web-Service Sample

This sample starts the Foundry Local OpenAI-compatible web service, then calls the Responses API with the official OpenAI Python client.

It demonstrates:

- A non-streaming `/v1/responses` call
- A streaming `/v1/responses` call
- A function/tool-calling round trip using `previous_response_id`

## What gets installed

Install the sample dependencies from `requirements.txt`:

```bash
pip install -r requirements.txt
```

That installs:

- `foundry-local-sdk` on non-Windows platforms
- `foundry-local-sdk-winml` on Windows
- `openai`

The sample downloads and registers Foundry Local execution providers and downloads the `qwen2.5-0.5b` model the first time it runs.

## Run the sample

From this directory:

```bash
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
python src\app.py
```

On macOS or Linux, activate the virtual environment with:

```bash
source .venv/bin/activate
```

The sample starts the local web service, sends Responses API requests to `http://localhost:<port>/v1`, prints the model output, and then unloads the model and stops the web service.
Lines changed: 3 additions & 0 deletions
```text
foundry-local-sdk; sys_platform != "win32"
foundry-local-sdk-winml; sys_platform == "win32"
openai
```
Lines changed: 152 additions & 0 deletions
```python
# <complete_code>
# <imports>
import json
from typing import Any

from openai import OpenAI

from foundry_local_sdk import Configuration, FoundryLocalManager
# </imports>


def get_response_text(response: Any) -> str:
    if isinstance(getattr(response, "output_text", None), str):
        return response.output_text
    return "".join(
        getattr(part, "text", "")
        for item in getattr(response, "output", []) or []
        for part in getattr(item, "content", []) or []
        if getattr(part, "type", None) == "output_text"
    )


# <init>
# Initialize the Foundry Local SDK
config = Configuration(app_name="foundry_local_samples")
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

# Download and register all execution providers.
current_ep = ""


def _ep_progress(ep_name: str, percent: float):
    global current_ep
    if ep_name != current_ep:
        if current_ep:
            print()
        current_ep = ep_name
    print(f"\r  {ep_name:<30} {percent:5.1f}%", end="", flush=True)


manager.download_and_register_eps(progress_callback=_ep_progress)
if current_ep:
    print()
# </init>

# <model_setup>
model_alias = "qwen2.5-0.5b"
model = manager.catalog.get_model(model_alias)

print(f"\nDownloading model {model_alias}...")
model.download(
    lambda progress: print(
        f"\rDownloading model: {progress:.2f}%",
        end="",
        flush=True,
    )
)
print("\nModel downloaded")

print("\nLoading model...")
model.load()
print("Model loaded")
# </model_setup>

# <server_setup>
print("\nStarting web service...")
manager.start_web_service()
base_url = manager.urls[0].rstrip("/") + "/v1"
print("Web service started")

# <<<<<< OPENAI SDK USAGE >>>>>>
# Use the OpenAI SDK to call the local Foundry web service Responses API
openai = OpenAI(
    base_url=base_url,
    api_key="notneeded",
)
# </server_setup>

try:
    print("\nTesting a non-streaming Responses call...")
    response = openai.responses.create(
        model=model.id,
        input="Reply with one short sentence about local AI.",
    )
    print(f"[ASSISTANT]: {get_response_text(response)}")

    print("\nTesting a streaming Responses call...")
    stream = openai.responses.create(
        model=model.id,
        input="Count from one to three.",
        stream=True,
    )

    print("[ASSISTANT STREAM]: ", end="", flush=True)
    for event in stream:
        if getattr(event, "type", None) == "response.output_text.delta":
            print(getattr(event, "delta", ""), end="", flush=True)
    print()

    print("\nTesting Responses tool calling...")
    tools = [
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get the current weather. This sample always returns Seattle weather.",
            "parameters": {
                "type": "object",
                "properties": {},
                "additionalProperties": False,
            },
        },
    ]

    tool_response = openai.responses.create(
        model=model.id,
        input="Use the get_weather tool and then answer with the weather.",
        tools=tools,
        tool_choice="required",
        store=True,
    )

    function_call = next(
        (
            item
            for item in getattr(tool_response, "output", []) or []
            if getattr(item, "type", None) == "function_call"
        ),
        None,
    )
    if function_call is None:
        raise RuntimeError("Expected the model to call get_weather.")

    print(f"[TOOL CALL]: {function_call.name}({function_call.arguments})")

    final_response = openai.responses.create(
        model=model.id,
        previous_response_id=tool_response.id,
        input=[
            {
                "type": "function_call_output",
                "call_id": function_call.call_id,
                "output": json.dumps({"location": "Seattle", "weather": "72 degrees F and sunny"}),
            }
        ],
        tools=tools,
    )

    print(f"[ASSISTANT FINAL]: {get_response_text(final_response)}")
    # <<<<<< END OPENAI SDK USAGE >>>>>>
finally:
    # Tidy up
    openai.close()
    manager.stop_web_service()
    model.unload()
# </complete_code>
```
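Since the client is plain HTTP underneath, the non-streaming call above can also be reproduced without the OpenAI SDK. A sketch assuming only that the local service accepts the standard Responses JSON payload at `/v1/responses` (the endpoint named in the README above); `base_url` and `model.id` are the variables from the sample:

```python
# Raw-HTTP sketch of the non-streaming call above. Assumes the standard
# Responses payload shape at <base_url>/responses, where base_url ends in /v1.
import json
import urllib.request


def raw_responses_call(base_url: str, model_id: str, text: str) -> dict:
    """POST a minimal Responses request and return the parsed JSON body."""
    request = urllib.request.Request(
        f"{base_url}/responses",
        data=json.dumps({"model": model_id, "input": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Example, using the sample's variables:
# body = raw_responses_call(base_url, model.id, "Say hello.")
```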
