
Commit 8521722

MaanavD, Copilot, and apsonawane authored
feat(sdk/python): add Responses API client (#670)
## Summary

Implements the OpenAI Responses API client for the Foundry Local Python SDK. HTTP-only, sync pattern matching the existing `chat_client.py` style.

## New files

| File | Description |
|------|-------------|
| `src/openai/responses_types.py` | Full type system: content parts, response items, tools, config, `ResponseObject` (with `output_text` property), all streaming event dataclasses, `parse_streaming_event` factory, `_to_dict` serializer |
| `src/openai/responses_client.py` | HTTP client: `ResponsesClient`, `ResponsesClientSettings`, `ResponsesAPIError`. Methods: `create`, `create_streaming` (SSE generator), `get`, `delete`, `cancel`, `get_input_items`, `list` |
| `examples/responses.py` | 5 end-to-end scenarios: basic create, streaming, multi-turn, tool calling, vision |
| `test/openai/test_responses_client.py` | 56 unit tests with mocked HTTP |
| `test/openai/test_responses_integration.py` | 14 integration tests (gated on `FOUNDRY_INTEGRATION_TESTS=1`) |

## Modified files

- `foundry_local_manager.py` — `create_responses_client(model_id)` factory
- `imodel.py` / `detail/model.py` / `detail/model_variant.py` — factory wired through the model hierarchy
- `src/__init__.py` / `src/openai/__init__.py` — all new public types exported

## Test results

- **Unit tests**: 56/56 passing (no server needed)
- **Integration tests**: 14/14 passing against live `qwen2.5-0.5b` server

## Related

Closes #505 (the earlier C# Responses API PR predates this but covers a different SDK)

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Akshay Sonawane <asonawane@microsoft.com>
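For orientation, here is a minimal usage sketch of the new client, inferred from the method names in the summary above. The import path, initialization pattern, and argument names are assumptions, not taken from the diff; the authoritative API lives in `src/openai/responses_client.py`.

```python
# Hypothetical sketch -- the create_responses_client factory and the
# create / create_streaming method names come from the summary above;
# the import path, init pattern, and argument names are assumptions.
from foundry_local_sdk import Configuration, FoundryLocalManager

FoundryLocalManager.initialize(Configuration(app_name="responses_sketch"))
manager = FoundryLocalManager.instance

client = manager.create_responses_client("qwen2.5-0.5b")  # new factory

response = client.create(input="Say hello in one sentence.")  # sync HTTP call
print(response.output_text)  # ResponseObject convenience property

for event in client.create_streaming(input="Count to three."):  # SSE generator
    print(getattr(event, "type", None))
```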
1 parent 4ea0fca commit 8521722

10 files changed

Lines changed: 600 additions & 1 deletion


samples/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -10,5 +10,5 @@ Explore complete working examples that demonstrate how to use Foundry Local —
 |----------|---------|-------------|
 | [**C#**](cs/) | 13 | .NET SDK samples including native chat, embeddings, audio transcription, tool calling, model management, web server, and tutorials. Uses WinML on Windows for hardware acceleration. |
 | [**JavaScript**](js/) | 13 | Node.js SDK samples including native chat, embeddings, audio transcription, Electron desktop app, Copilot SDK integration, LangChain, tool calling, web server, and tutorials. |
-| [**Python**](python/) | 10 | Python samples using the OpenAI-compatible API, including chat, embeddings, audio transcription, LangChain integration, tool calling, web server, and tutorials. |
+| [**Python**](python/) | 11 | Python samples using the OpenAI-compatible API, including chat, embeddings, audio transcription, LangChain integration, tool calling, web server, Responses API, and tutorials. |
 | [**Rust**](rust/) | 9 | Rust SDK samples including native chat, embeddings, audio transcription, tool calling, web server, and tutorials. |
```

samples/python/README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -14,6 +14,7 @@ These samples demonstrate how to use Foundry Local with Python.
 | [embeddings](embeddings/) | Generate single and batch text embeddings using the Foundry Local SDK. |
 | [audio-transcription](audio-transcription/) | Transcribe audio files using the Whisper model. |
 | [web-server](web-server/) | Start a local OpenAI-compatible web server and call it with the OpenAI Python SDK. |
+| [web-server-responses](web-server-responses/) | Call a running local OpenAI-compatible web server with the Responses API, including streaming and tool calling. |
 | [tool-calling](tool-calling/) | Tool calling with custom function definitions (get_weather, calculate). |
 | [langchain-integration](langchain-integration/) | LangChain integration for building translation and text generation chains. |
 | [tutorial-chat-assistant](tutorial-chat-assistant/) | Build an interactive multi-turn chat assistant (tutorial). |
```
Lines changed: 45 additions & 0 deletions
# Foundry Local Python Vision Sample (Responses API)

This sample demonstrates vision (image understanding) capabilities using the Foundry Local web service and the OpenAI Responses API.

It demonstrates:

- Streaming a vision response via the Responses API
- Falling back to a default test image (`src/test_image.jpg`) when no image path is provided

## What gets installed

Install the sample dependencies from `requirements.txt`:

```bash
pip install -r requirements.txt
```

That installs:

- `foundry-local-sdk`
- `openai`
- `Pillow` (for image resizing)

The sample downloads the specified model the first time it runs, skipping the download if the model is already cached.

## Run the sample

From this directory:

```bash
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
python src\app.py qwen3.5-0.8b
```

You can also pass a custom image path as the second argument.

On macOS or Linux, activate the virtual environment with:

```bash
source .venv/bin/activate
```

The sample starts the local web service, sends vision requests via the Responses API to `http://localhost:<port>/v1`, prints the model output, and then stops the web service.
Lines changed: 3 additions & 0 deletions
```text
foundry-local-sdk
openai
Pillow
```
Lines changed: 107 additions & 0 deletions
```python
# <complete_code>
# <imports>
import base64
import io
import os
import sys

from PIL import Image
from openai import OpenAI

from foundry_local_sdk import Configuration, FoundryLocalManager
# </imports>

if len(sys.argv) < 2:
    print("Usage: python src/app.py <model_alias> [image_path]")
    print("  Example: python src/app.py qwen3.5-0.8b")
    sys.exit(1)

model_alias = sys.argv[1]
default_image = os.path.join(os.path.dirname(__file__), "test_image.jpg")
image_path = sys.argv[2] if len(sys.argv) > 2 else default_image


def resize_and_encode(path, max_dim=512):
    """Load and resize a local image, returning (base64_str, media_type)."""
    img = Image.open(path)
    if max(img.size) > max_dim:
        img.thumbnail((max_dim, max_dim))
        print(f"  (resized to {img.size[0]}x{img.size[1]})")
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    return base64.b64encode(buf.getvalue()).decode(), "image/jpeg"


# <init>
config = Configuration(app_name="foundry_local_samples")
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance
# </init>

# <model_setup>
model = manager.catalog.get_model(model_alias)
if model is None:
    available = [m.alias for m in manager.catalog.list_models()]
    print(f"\nModel '{model_alias}' not found in catalog.")
    print(f"Available models: {available}")
    sys.exit(1)

if not model.is_cached:
    print(f"\nDownloading model {model_alias}...")
    model.download(
        lambda progress: print(f"\rDownloading model: {progress:.2f}%", end="", flush=True)
    )
    print("\nModel downloaded")

print("\nLoading model...")
model.load()
print("Model loaded")
# </model_setup>

# <server_setup>
print("\nStarting web service...")
manager.start_web_service()
base_url = manager.urls[0].rstrip("/") + "/v1"
print("Web service started")

# <<<<<< OPENAI SDK USAGE >>>>>>
# Use the OpenAI SDK to call the local Foundry web service Responses API
openai = OpenAI(base_url=base_url, api_key="notneeded")
# </server_setup>

# <inference>
print(f"\nPreparing image: {image_path}")
image_b64, media_type = resize_and_encode(image_path)

vision_input = [
    {
        "type": "message",
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Describe this image."},
            {
                "type": "input_image",
                "image_data": image_b64,
                "media_type": media_type,
            },
        ],
    }
]

print("\nStreaming vision response...")
# The placeholder string satisfies the SDK's required `input` argument;
# the structured vision input is passed via `extra_body`, which takes
# precedence in the JSON body sent to the local service.
stream = openai.responses.create(
    model=model.id,
    input="placeholder",
    extra_body={"input": vision_input},
    stream=True,
)

print("[ASSISTANT]: ", end="", flush=True)
for event in stream:
    if getattr(event, "type", None) == "response.output_text.delta":
        print(getattr(event, "delta", ""), end="", flush=True)
print()
# </inference>

openai.close()
manager.stop_web_service()
model.unload()
```
Binary file (6.67 KB): the sample's default test image, `src/test_image.jpg`.
Lines changed: 44 additions & 0 deletions
# Foundry Local Python Responses Web-Service Sample

This sample starts the Foundry Local OpenAI-compatible web service, then calls the Responses API with the official OpenAI Python client.

It demonstrates:

- A non-streaming `/v1/responses` call
- A streaming `/v1/responses` call
- A function/tool-calling round trip using `previous_response_id`

## What gets installed

Install the sample dependencies from `requirements.txt`:

```bash
pip install -r requirements.txt
```

That installs:

- `foundry-local-sdk` on non-Windows platforms
- `foundry-local-sdk-winml` on Windows
- `openai`

The sample downloads and registers Foundry Local execution providers and downloads the `qwen2.5-0.5b` model the first time it runs.

## Run the sample

From this directory:

```bash
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
python src\app.py
```

On macOS or Linux, activate the virtual environment with:

```bash
source .venv/bin/activate
```

The sample starts the local web service, sends Responses API requests to `http://localhost:<port>/v1`, prints the model output, and then unloads the model and stops the web service.
Lines changed: 3 additions & 0 deletions
```text
foundry-local-sdk; sys_platform != "win32"
foundry-local-sdk-winml; sys_platform == "win32"
openai
```
Lines changed: 152 additions & 0 deletions
```python
# <complete_code>
# <imports>
import json
from typing import Any

from openai import OpenAI

from foundry_local_sdk import Configuration, FoundryLocalManager
# </imports>


def get_response_text(response: Any) -> str:
    if isinstance(getattr(response, "output_text", None), str):
        return response.output_text
    return "".join(
        getattr(part, "text", "")
        for item in getattr(response, "output", []) or []
        for part in getattr(item, "content", []) or []
        if getattr(part, "type", None) == "output_text"
    )


# <init>
# Initialize the Foundry Local SDK
config = Configuration(app_name="foundry_local_samples")
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

# Download and register all execution providers.
current_ep = ""


def _ep_progress(ep_name: str, percent: float):
    global current_ep
    if ep_name != current_ep:
        if current_ep:
            print()
        current_ep = ep_name
    print(f"\r  {ep_name:<30} {percent:5.1f}%", end="", flush=True)


manager.download_and_register_eps(progress_callback=_ep_progress)
if current_ep:
    print()
# </init>

# <model_setup>
model_alias = "qwen2.5-0.5b"
model = manager.catalog.get_model(model_alias)

print(f"\nDownloading model {model_alias}...")
model.download(
    lambda progress: print(
        f"\rDownloading model: {progress:.2f}%",
        end="",
        flush=True,
    )
)
print("\nModel downloaded")

print("\nLoading model...")
model.load()
print("Model loaded")
# </model_setup>

# <server_setup>
print("\nStarting web service...")
manager.start_web_service()
base_url = manager.urls[0].rstrip("/") + "/v1"
print("Web service started")

# <<<<<< OPENAI SDK USAGE >>>>>>
# Use the OpenAI SDK to call the local Foundry web service Responses API
openai = OpenAI(
    base_url=base_url,
    api_key="notneeded",
)
# </server_setup>

try:
    print("\nTesting a non-streaming Responses call...")
    response = openai.responses.create(
        model=model.id,
        input="Reply with one short sentence about local AI.",
    )
    print(f"[ASSISTANT]: {get_response_text(response)}")

    print("\nTesting a streaming Responses call...")
    stream = openai.responses.create(
        model=model.id,
        input="Count from one to three.",
        stream=True,
    )

    print("[ASSISTANT STREAM]: ", end="", flush=True)
    for event in stream:
        if getattr(event, "type", None) == "response.output_text.delta":
            print(getattr(event, "delta", ""), end="", flush=True)
    print()

    print("\nTesting Responses tool calling...")
    tools = [
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get the current weather. This sample always returns Seattle weather.",
            "parameters": {
                "type": "object",
                "properties": {},
                "additionalProperties": False,
            },
        },
    ]

    tool_response = openai.responses.create(
        model=model.id,
        input="Use the get_weather tool and then answer with the weather.",
        tools=tools,
        tool_choice="required",
        store=True,
    )

    function_call = next(
        (
            item
            for item in getattr(tool_response, "output", []) or []
            if getattr(item, "type", None) == "function_call"
        ),
        None,
    )
    if function_call is None:
        raise RuntimeError("Expected the model to call get_weather.")

    print(f"[TOOL CALL]: {function_call.name}({function_call.arguments})")

    final_response = openai.responses.create(
        model=model.id,
        previous_response_id=tool_response.id,
        input=[
            {
                "type": "function_call_output",
                "call_id": function_call.call_id,
                "output": json.dumps({"location": "Seattle", "weather": "72 degrees F and sunny"}),
            }
        ],
        tools=tools,
    )

    print(f"[ASSISTANT FINAL]: {get_response_text(final_response)}")
    # <<<<<< END OPENAI SDK USAGE >>>>>>
finally:
    # Tidy up
    openai.close()
    manager.stop_web_service()
    model.unload()
# </complete_code>
```
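Since the client is plain HTTP underneath, the non-streaming call above can also be reproduced without the OpenAI SDK. A sketch assuming only that the local service accepts the standard Responses JSON payload at `/v1/responses` (the endpoint named in the README above); `base_url` and `model.id` are the variables from the sample:

```python
# Raw-HTTP sketch of the non-streaming call above. Assumes the standard
# Responses payload shape at <base_url>/responses, where base_url ends in /v1.
import json
import urllib.request


def raw_responses_call(base_url: str, model_id: str, text: str) -> dict:
    """POST a minimal Responses request and return the parsed JSON body."""
    request = urllib.request.Request(
        f"{base_url}/responses",
        data=json.dumps({"model": model_id, "input": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Example, using the sample's variables:
# body = raw_responses_call(base_url, model.id, "Say hello.")
```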
